Data Science in Real Life: A Closer Look | Part 1
- May 9, 2022
- Category: Data Science
Data science has changed nearly every aspect of our lives. From predicting diseases to finding the best route to work, the enormous amount of data generated every day makes it almost impossible to look the other way rather than invest in getting the most out of it. In 2021, a staggering 79 zettabytes of data was generated worldwide. To get a sense of how big this number is, imagine that each gigabyte in a single zettabyte is a brick: given that there are a trillion gigabytes in one zettabyte, and taking one Great Wall of China to be about 3,873,000,000 bricks, that is enough bricks to build 258 Great Walls.
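The brick comparison is plain arithmetic, so it is easy to check with a few lines of Python. This is only a back-of-the-envelope sketch: the number of bricks per wall is the figure quoted above, taken as given, and the variable names are made up for illustration.

```python
# Back-of-the-envelope check of the brick comparison.
GIGABYTES_PER_ZETTABYTE = 10**12        # one trillion gigabytes in a zettabyte
BRICKS_PER_GREAT_WALL = 3_873_000_000   # figure quoted in the article, taken as given
ZETTABYTES_IN_2021 = 79                 # data generated worldwide in 2021

# One brick per gigabyte: how many walls does a single zettabyte build?
walls_per_zettabyte = GIGABYTES_PER_ZETTABYTE / BRICKS_PER_GREAT_WALL

# The same comparison scaled up to everything generated in 2021.
walls_in_2021 = ZETTABYTES_IN_2021 * walls_per_zettabyte

print(f"Walls per zettabyte: {walls_per_zettabyte:.0f}")
print(f"Walls for all of 2021: {walls_in_2021:,.0f}")
```

A single zettabyte works out to roughly 258 walls, so the full 79 zettabytes would be a little over 20,000 walls.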
In this article, we’ll take a closer look at how different industries use Data Science to leverage their potential, solve problems, and improve results. First, we’ll see how Uber, one of the most successful ride-hailing companies, uses Machine Learning to predict ride demand and estimate fares. Then, we’ll move on to the healthcare sector and discover how companies like Health Catalyst and Oncora Medical provide better healthcare services to thousands of users. Next, we’ll dig into Airbnb’s strategy for understanding its customers’ needs, and finally we’ll see how valuable it is for banks and other financial companies to hire data scientists.
Table of contents
- 1- Data Science in transportation (Uber)
- 2- Data Science in healthcare (Oncora Medical)
- 3- Data Science in business (Airbnb)
- 4- Data Science in finance (PayPal)
- 5- Predicting strokes with machine learning (with code)
1- Data Science in transportation (Uber)
Transportation is one of the many fields that benefit from Data Science, specifically big data. Implementing the right analytical techniques in the transportation industry helps decision-makers with:
- Estimating fares
- Predicting traffic jams and riders’ behaviour
- Suggesting faster routes, etc.
Uber is living proof of how data can completely transform a business. Uber relies on big data, which is collected in a data lake so that data scientists can analyze it using Spark and Hadoop. So how does data science work at Uber?
Let’s say you’re the user and you want to book a ride. First, you enter the destination (input), then you specify your current location (input). That’s when you’ll see the estimated fare for the ride. Next, you confirm the ride and, in less than a minute, a nearby driver accepts your request and an estimated time of arrival is displayed on the screen.
From a data science point of view, it goes like this:
Uber maintains a massive database of all its drivers. Your current location and destination serve as inputs, along with other factors such as traffic and the number of cars available at that time. Prediction algorithms use these inputs to output the estimated fare, the estimated time of arrival, and the nearest driver’s information. And as with most Machine Learning models, more (and more representative) training data generally leads to more accurate predictions. That’s what makes Uber stand out: it depends on its rich databases to provide better services.
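To make the input-to-output flow above concrete, here is a minimal sketch in Python. It is not Uber’s actual system: the past rides, driver positions, traffic factor, and average speed are all invented for illustration, and a simple least-squares line on distance stands in for the real prediction algorithms.

```python
import math

# Hypothetical past rides: (distance_km, fare_usd). Invented for illustration.
PAST_RIDES = [(2.0, 7.0), (5.0, 13.0), (8.0, 19.0), (12.0, 27.0)]

# Hypothetical available drivers: name -> (latitude, longitude).
DRIVERS = {
    "driver_a": (40.7580, -73.9855),
    "driver_b": (40.7306, -73.9866),
    "driver_c": (40.7829, -73.9654),
}


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in points)
             / sum((x - mean_x) ** 2 for x, _ in points))
    return slope, mean_y - slope * mean_x


def quote_ride(pickup, destination, drivers, traffic_factor=1.0, speed_kmh=30.0):
    """Return (estimated fare, nearest driver, that driver's ETA in minutes)."""
    slope, intercept = fit_line(PAST_RIDES)          # "training" on past rides
    trip_km = haversine_km(pickup, destination)
    fare = (slope * trip_km + intercept) * traffic_factor
    nearest = min(drivers, key=lambda name: haversine_km(drivers[name], pickup))
    eta_min = haversine_km(drivers[nearest], pickup) / speed_kmh * 60 * traffic_factor
    return round(fare, 2), nearest, round(eta_min, 1)


# Inputs: current location and destination (hypothetical coordinates).
pickup = (40.7484, -73.9857)
destination = (40.7061, -74.0087)
print(quote_ride(pickup, destination, DRIVERS, traffic_factor=1.2))
```

The fare and arrival time here use straight-line distance, which a real service would replace with road distances and live traffic data. The point is only the shape of the problem: locations and context go in, and a fare, a driver, and an arrival time come out.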
Data Science at Uber at a glance:
- Demand prediction
- Car positioning
- Fare estimation
- Resource allocation
- Business decisions
2- Data Science in healthcare (Oncora Medical)
If we were to record all the activity of a human body in a given day, we would end up with nearly 2 terabytes of data, and that is before counting any medical procedure or examination and the data it generates for that one individual. Companies like IBM and MedAware are making the best use of this big data by implementing different analytical processes to drive the best results for patients everywhere. Applications of Data Science in the healthcare sector include but are not limited to:
- Enhancing the accuracy of diagnosing illnesses
- Medical imaging
- Developing more effective drugs
- Monitoring and preventing health problems
- Resource allocation in hospitals
- Virtual medical support
“Fighting cancer with data” is the first thing you see when you open the Oncora Medical website. Scientists at Oncora believe in the power of Machine Learning and its ability to bring real value and ease unnecessary pain. Their primary goal is to improve the quality of care and treatment for cancer patients, and they pursue it with three products:
- Oncora Patient Care: A workflow designed to help the physician with the treatment journey of every patient.
- Oncora Analytics: Visualization software that links clinical data from all sources, helping researchers gain insights and improve the design of clinical trials.
- Oncora Quality: Focuses on streamlining operations to improve the quality of care, and supports different clinical quality measures.
The backbone of Oncora’s research is its Machine Learning pipeline, which is dedicated to predicting certain clinical outcomes, such as survival after radiotherapy. It goes through 5 phases to reach the desired outcome:
1- Pre-clinical development
At this phase, researchers start with the crucial question in any data science process: what outcome can be predicted? Knowing the answer matters to the whole medical team, especially the physician. How the data is collected and processed is initially determined by when exactly the medical team needs to see the prediction in the workflow.
2- Retrospective validation
Knowing the desired outcome, timing, and the proper clinical intervention, the team can start training and testing the model.
3- Prospective validation
Now that the model is trained, it goes through evaluation just like any machine learning model. The team collects new patient data and evaluates the model’s performance with those new patients.
4- Clinical use evaluation
That’s when the model comes to life: it is applied in a real clinical setting by physicians, who use the predictive algorithm to reduce negative outcomes in treatment.
5- Continuous learning
The last phase is ongoing: the team keeps testing the model’s accuracy, recall, and precision, and uses new incoming data to improve its performance.
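Accuracy, recall, and precision, the three metrics named in the last phase, can be computed directly from a model’s predictions. The sketch below uses made-up labels (1 means the clinical outcome occurred, 0 means it did not) and hypothetical function names. It only illustrates the definitions and is not Oncora’s evaluation code.

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, false positives, false negatives, true negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def evaluate(y_true, y_pred):
    """Return accuracy, precision, and recall for binary labels."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return {
        # Share of all predictions that were correct.
        "accuracy": (tp + tn) / len(y_true),
        # Of the cases predicted positive, how many really were positive.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Of the truly positive cases, how many the model caught.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical outcomes for eight new patients, and the model's predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

metrics = evaluate(y_true, y_pred)
print(metrics)
```

In a clinical setting recall often deserves special attention, since a missed positive case (a false negative) can be more costly than a false alarm. Re-running an evaluation like this on each new batch of patient data is one simple way to notice when a model starts to drift.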
3- Data Science in business (Airbnb)
Business is where some of the clearest uses of data science can be seen. There’s an entire specialization called Business Analysis, which focuses on identifying business needs and problems and then proposing suitable solutions. Since businesses deal with people and with other companies, which are also made up of people, data is everywhere, and with the right analytical skill set success is within reach. There are many uses of data science in business, including:
- Predicting trends
- Forecasting future sales
- Improving services and products
- Reducing risks
- Optimizing marketing strategies
The applications are practically endless, but let’s look at how data science helped Airbnb become a unicorn company with a total valuation of nearly $31 billion.
Although the business aim is straightforward (matching travellers who want a place to stay with residents willing to rent out their place), Airbnb applies the whole data science toolkit in each of its processes. Similar to Uber, Airbnb uses Hadoop to manage the enormous amounts of data it holds, which feed the data science workflow. The culture inside the company is healthy and diverse, earning it first place on Glassdoor’s “Best Places to Work” list for 2016.
How does Data Science work at Airbnb?
Since Airbnb is at heart a recommendation system, predictive modelling is at the core of its work. The process looks simple to the user but is complex behind the scenes. Just as when booking a ride with Uber, the user enters the desired destination and selects from a list of preferences; using its vast database, the system then recommends places based on factors like price and amenities.
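The idea of recommending a place from a destination, a budget, and a list of preferences can be sketched as a simple filter-and-rank step. The listings, field names, and scoring rule below are all invented for illustration. This is a hand-written rule-based ranking, not the predictive models Airbnb actually uses.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    name: str
    city: str
    price_per_night: float
    amenities: frozenset


# Hypothetical catalogue of places.
LISTINGS = [
    Listing("Loft", "Lisbon", 90.0, frozenset({"wifi", "kitchen"})),
    Listing("Studio", "Lisbon", 60.0, frozenset({"wifi"})),
    Listing("Villa", "Lisbon", 250.0, frozenset({"wifi", "kitchen", "pool"})),
    Listing("Cabin", "Porto", 70.0, frozenset({"wifi", "kitchen"})),
]


def recommend(listings, city, max_price, wanted_amenities, top_n=3):
    """Rank in-budget listings in `city` by amenity match, then by lower price."""
    wanted = set(wanted_amenities)
    candidates = [
        item for item in listings
        if item.city == city and item.price_per_night <= max_price
    ]

    def score(item):
        # Fraction of the requested amenities that this listing offers.
        matched = len(wanted & item.amenities) / len(wanted) if wanted else 0.0
        # Negative price so that, on a tie, the cheaper listing ranks higher.
        return (matched, -item.price_per_night)

    return sorted(candidates, key=score, reverse=True)[:top_n]


# User input: destination, budget, and selected preferences.
results = recommend(LISTINGS, "Lisbon", 120.0, {"wifi", "kitchen"})
for item in results:
    print(item.name, item.price_per_night)
```

A learned model would replace the fixed `score` function with one fitted to past bookings, but the surrounding flow (filter the candidates, score them, return the top few) stays much the same.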
They also use A/B testing, which means trying different designs or versions of the same system and seeing which one customers respond to best. To test the effectiveness of their recommendation system, they try different ranking algorithms and observe how users behave. Next, they record the user’s rating: if the rating matches the behaviour, the recommendation system is doing its job.
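One common way to decide whether version B really beats version A is a two-proportion z-test on the conversion rates. The sketch below assumes that approach. The source does not say which statistical test Airbnb uses, and the booking counts are invented.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates.

    conv_a / conv_b: number of users who converted in each variant.
    n_a / n_b: number of users who saw each variant.
    Returns (z statistic, two-sided p-value).
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that both variants perform equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: bookings out of searches for two ranking variants.
z, p_value = two_proportion_z_test(conv_a=120, n_a=2400, conv_b=156, n_b=2400)
print(f"z = {z:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Variant B's booking rate differs significantly from variant A's.")
else:
    print("No significant difference detected between the variants.")
```

With these made-up numbers the p-value falls below the conventional 0.05 threshold, so variant B’s higher booking rate would be unlikely to be due to chance alone. The threshold and the sample sizes should be fixed before the experiment starts, not after looking at the results.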
4- Data Science in finance (PayPal)
As in business, big data and data science are used in finance to solve problems and improve outcomes. Nowadays, banks are more inclined than ever to hire data scientists, because they’re fully aware of how powerful data and machine learning are. To become a data scientist in the financial sector, one must have some domain expertise, including the basics of economics, financial markets, and risk analytics. To further understand the role of data science in financial processes, let’s look at one example of a well-known unicorn company that depends on big data to drive success.
PayPal was founded in 1998. It has almost half a billion users and total revenue of 25 billion USD. The data team at PayPal depends on big data analytics to improve services for its users. Because the company has users all over the world, the data generated is huge and comes in many different formats. The first option for handling such data is Hadoop, on which the team can carry out sentiment analysis to optimize the recommendation engine.
How does data science work at PayPal?
Marketing is all about understanding your customers’ psychology and interpreting how and why they behave the way they do. And since nobody believes in the “one size fits all” principle any longer, data experts make use of the millions of transactions happening every day to design the marketing approaches that best suit each customer segment. The following are some of the ways this is implemented:
- Scientists collect data from transactions and carry out sentiment analysis to improve customer segmentation.
- They use deep learning and linear regression algorithms to predict fraudulent transactions; a thorough analysis of users’ behaviour helps the predictive model detect whether fraud is likely.
- Since the transactions are analyzed constantly, they use predictive modelling to send personalized offers and discounts. This is made possible by knowing several data points, such as purchase history, preferences, and spending behaviour.
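The behaviour-based fraud check in the list above can be illustrated with a deliberately simple stand-in: flagging a transaction whose amount is far from a user’s usual spending. This z-score rule is not PayPal’s method and is much cruder than the deep learning models mentioned. The transaction history, the threshold, and the function name are all assumptions made for the example.

```python
import statistics


def is_suspicious(history, amount, threshold=3.0):
    """Flag `amount` if it lies more than `threshold` standard deviations
    away from the mean of the user's past transaction amounts."""
    if len(history) < 2:
        return False  # not enough history to judge this user's behaviour
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        # Every past amount was identical, so any different amount is unusual.
        return amount != mean
    return abs(amount - mean) / stdev > threshold


# Hypothetical past transaction amounts (USD) for one user.
history = [22.0, 35.5, 18.0, 41.0, 27.5, 30.0]

print(is_suspicious(history, 33.0))    # an amount in the user's usual range
print(is_suspicious(history, 950.0))   # an amount far outside it
```

A production model would look at many more signals than the amount (location, device, time of day, merchant), but the principle is the same: learn what normal behaviour looks like for each user, then score new transactions against it.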
In the next part of this article, we will walk through one example of a data science project end to end, from the initial assessment to model evaluation.