In this second part of the data science roadmap, we’ll go over a structured learning path for data analysis and machine learning in Python.
Introduction Everyone views data differently: one person may extract certain insights from a dataset while another extracts different insights from the same data. Different audiences also have different informational needs, so when you’re building your dashboard, ask the decision makers: “What are we trying to extract and learn from this analysis to […]
Neither the Titanic dataset nor sklearn is new to any data scientist, but scikit-learn has some important features that make model pre-processing and tuning easier. Specifically, this notebook will cover the following concepts: ColumnTransformer, Pipeline, SimpleImputer, StandardScaler, OneHotEncoder, OrdinalEncoder, and GridSearch. The dataset used in this article can be found […]
In this article, we will analyze a COVID-19 dataset using the community edition of the Databricks unified analytics platform, which is completely free and can serve as your playground for testing Apache Spark applications in Python or R, depending on your preferred development API. Dataset will be used in this […]
Extracting information from string columns is a recurring necessity in the day-to-day tasks of data engineers, data scientists, and business analysts. The task can be done with a programming language such as Python or with SQL, depending on your application and the task required. In this tutorial, we will discover together how […]
Dates and times are part of almost any dataset a data scientist, data engineer, or data analyst will work on, so knowing how to handle this kind of data is a crucial skill that will save you a lot of time and effort. In this tutorial, we will discuss various methods for handling dates and […]
Data preparation and data discovery consume a great amount of time in any data science or data analytics job. One solution is to write a template script that you can reuse in this phase of your work, but wouldn’t it be great to add interactive, dynamic controls to your scripts? […]
Apache Spark is a powerful processing platform for big data applications that supports different types of big data processing. In this article, we will discover together how an Apache Spark application can be executed in multiple modes, depending on the environment architecture and the application requirements. Before going into details, if you would like to set up […]