Analyze COVID-19 Dataset with Databricks | Databricks Unified Analytics Platform

In this article, we will analyze a COVID-19 dataset on the Databricks Unified Analytics Platform, using the Community Edition, which is free and can serve as a playground for testing Apache Spark applications in Python or R, depending on which API you prefer. The dataset used in this […]

Data Engineering Detailed Roadmap | Data Engineering

Over the past few years, data engineering has become a critical function in almost any organization that relies heavily on data. I am sure you have heard many comparisons between data engineers and data scientists, and debates about which is better, but in fact neither role is better than the other; each […]

Detailed Guide for String Wrangling in SQL | MySQL | SQL Analysis

Extracting information from string columns is a recurring necessity in the day-to-day tasks of data engineers, data scientists, and business analysts. It can be done with a programming language such as Python or with SQL, depending on your application and the task at hand. In this tutorial, we will discover together how […]

Handling Dates and Time in Pandas

Dates and times appear in almost any dataset that a data scientist, data engineer, or data analyst will work on, so knowing how to handle this kind of data is a crucial skill that will save you a lot of time and effort. In this tutorial, we will discuss various methods of handling dates and times […]

So Which Machine Learning Algorithm to use?

Many data science practitioners find the process of selecting a machine learning algorithm overwhelming and confusing. That’s because many algorithms can do the same task. For example, classification can be done using a Decision Tree, SVM, Logistic Regression, Naive Bayes, KNN, or a Neural Network. Now, which one should […]

Apache Hive Table Types | Apache Hive

Apache Hive is designed to give data engineers and data scientists SQL-like access to the big data available in a Hadoop cluster, so we can think of it as a normal RDBMS. In a normal RDBMS we have databases and tables; in Hive we have the same, except that in Hive we have two […]

Data Science Roadmap .. Concepts, Tools, and Technologies

In this article, we will outline some skills and concepts that must be learned on the journey to becoming a data scientist. But first, what is data science? Data science is the art of uncovering the insights and trends in data. It has been around since ancient times. The ancient Egyptians used census data to […]

Introduction to Hive | Apache Hive

Hive was initially developed by Facebook in 2007 to help the company handle massive amounts of new data. At the time Hive was created, Facebook had a 15TB dataset they needed to work with. A few short years later, that data had grown to 700TB. Their RDBMS data warehouse was taking too long to process […]

Implement SCD Type 2 on Talend Open Studio

Introduction: In this article, we will explore together how to use Talend's data integration capabilities to implement one of the most important use cases in data warehouse implementation: Slowly Changing Dimensions (SCD) tables. Before moving on to the next steps, make sure you have a running Talend solution; you can check our […]

Setup Talend Open Studio on Linux

Introduction: Talend is an open-source data integration platform. It provides different solutions and services for data integration, data quality, cloud storage, and Big Data. According to the latest Gartner report, Talend is named in the leaders quadrant among data integration solutions. In this article, we will show you step-by-step how to install and configure Talend […]