Dimensional Modeling: Dimensional modeling is one of the data modeling techniques used to design data warehouses. It is also considered well suited to representing analytic data, because it delivers data in a form users can easily understand and is optimized for query performance, which increases data retrieval speed. Normalized databases are very useful in transactional processing because […]
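The excerpt stops before naming a specific layout, so this is an illustration only. The most common dimensional model is a star schema, in which a central fact table of measurements references surrounding dimension tables of descriptive attributes. The sketch uses Python's built-in sqlite3 module; the table names (fact_sales, dim_product, dim_date) and all sample rows are hypothetical.

```python
import sqlite3

# Hypothetical star schema: one fact table (fact_sales) referencing
# two dimension tables (dim_product, dim_date). All names are illustrative.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    category     TEXT
);
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY,
    year     INTEGER,
    month    INTEGER
);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    date_key    INTEGER REFERENCES dim_date(date_key),
    quantity    INTEGER,
    revenue     REAL
);
""")

# Descriptive attributes live in the dimensions; numeric measures live in the fact table.
cur.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                [(1, "Laptop", "Electronics"), (2, "Desk", "Furniture"), (3, "Phone", "Electronics")])
cur.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                [(20230105, 2023, 1), (20240210, 2024, 2)])
cur.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                [(1, 20230105, 2, 2000.0), (2, 20230105, 1, 300.0),
                 (3, 20240210, 3, 1500.0), (1, 20240210, 1, 1000.0)])

# A typical analytic query: one join per dimension, then aggregate the facts.
query = """
SELECT d.year, p.category, SUM(f.revenue) AS total_revenue
FROM fact_sales AS f
JOIN dim_product AS p ON f.product_key = p.product_key
JOIN dim_date    AS d ON f.date_key    = d.date_key
GROUP BY d.year, p.category
ORDER BY d.year, p.category
"""
rows = cur.execute(query).fetchall()
for row in rows:
    print(row)
# (2023, 'Electronics', 2000.0)
# (2023, 'Furniture', 300.0)
# (2024, 'Electronics', 2500.0)
```

Each analytic question needs at most one join per dimension, which is part of what makes this layout easy for users to read and to query.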
ETL vs ELT | Differences and Use Cases
1. What is ETL? ETL stands for Extract, Transform, and Load. The ETL process starts by extracting data from one or more sources, then transforms this data to match the data warehouse schema, and finally loads the transformed data into the data warehouse. An ETL system should enforce data quality and consistency standards, and ensure that separated data […]
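The three ETL steps described above can be sketched in plain Python under some simplifying assumptions: two hypothetical sources export orders as CSV text in different layouts, the warehouse is an in-memory SQLite database, and the only data-quality rule is that every order must have an amount. The functions extract, transform, and load are illustrative names, not part of any ETL framework.

```python
import csv
import io
import sqlite3

# Two hypothetical source systems exporting orders in different layouts.
SOURCE_A = "order_id,customer,amount\n1,alice,10.50\n2,bob,\n"
SOURCE_B = "id;client;total_cents\n3;Carol;2599\n"

def extract():
    """Extract: read raw rows from each source as dictionaries."""
    rows_a = list(csv.DictReader(io.StringIO(SOURCE_A)))
    rows_b = list(csv.DictReader(io.StringIO(SOURCE_B), delimiter=";"))
    return rows_a, rows_b

def transform(rows_a, rows_b):
    """Transform: map both layouts onto one warehouse schema
    (order_id INTEGER, customer TEXT, amount REAL) and drop bad rows."""
    clean = []
    for r in rows_a:
        if not r["amount"]:  # data-quality rule: amount is required
            continue
        clean.append((int(r["order_id"]), r["customer"].title(), float(r["amount"])))
    for r in rows_b:
        # Source B stores amounts in cents; convert to match the warehouse schema.
        clean.append((int(r["id"]), r["client"].title(), int(r["total_cents"]) / 100))
    return clean

def load(rows, conn):
    """Load: write the transformed rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS fact_orders "
                 "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)
    conn.commit()

warehouse = sqlite3.connect(":memory:")
load(transform(*extract()), warehouse)
result = warehouse.execute("SELECT * FROM fact_orders ORDER BY order_id").fetchall()
print(result)  # [(1, 'Alice', 10.5), (3, 'Carol', 25.99)]
```

Order 2 is dropped by the quality rule, and source B's amounts are converted from cents, so both sources match the warehouse schema before anything is loaded.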
DNA Sequencing with Machine Learning
Introduction: What if a small sample of each baby's saliva was sent out to a lab where, for just a few dollars, the baby's DNA was analyzed and a multitude of "risk scores" returned? These would not be diagnoses but prognostications: this baby is at elevated risk of developing heart disease in 40 years, is more […]
Denormalization: When, Why, and How?
What is denormalization? Denormalization is an optimization technique that makes a database respond faster to queries by reducing the number of joins needed to answer users' requests. In denormalization, we mainly aim to reduce the number of tables a query needs by joining tables back together and adding redundant data. Denormalization is commonly used with […]
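As a minimal sketch of the idea, assuming a hypothetical customers/orders schema in SQLite: the normalized design needs a join to report revenue per city, while the denormalized table copies each customer's name and city into every order row, so the same question is answered from a single table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized design: customer details live only in customers.
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Alice', 'Cairo'), (2, 'Bob', 'Giza');
INSERT INTO orders VALUES (10, 1, 50.0), (11, 1, 20.0), (12, 2, 35.0);

-- Denormalized design: the join is done once, up front, and the
-- customer's name and city are copied (redundantly) into every order row.
CREATE TABLE orders_denorm AS
SELECT o.order_id, o.customer_id, c.name AS customer_name, c.city, o.amount
FROM orders AS o JOIN customers AS c ON o.customer_id = c.customer_id;
""")

# Normalized: "revenue per city" needs a join at query time.
normalized = conn.execute("""
    SELECT c.city, SUM(o.amount) FROM orders AS o
    JOIN customers AS c ON o.customer_id = c.customer_id
    GROUP BY c.city ORDER BY c.city
""").fetchall()

# Denormalized: the same question is answered from one table, with no join.
denormalized = conn.execute("""
    SELECT city, SUM(amount) FROM orders_denorm
    GROUP BY city ORDER BY city
""").fetchall()

print(normalized)    # [('Cairo', 70.0), ('Giza', 35.0)]
print(denormalized)  # [('Cairo', 70.0), ('Giza', 35.0)]
```

The trade-off is the redundancy itself: if a customer's city changes, every copied row in orders_denorm must be updated too, or the two designs will disagree.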
Introduction to Apache Airflow – Powerful and Dynamic Orchestrator
What is Apache Airflow? Apache Airflow is a platform that helps you programmatically design, schedule, and monitor big data pipelines. With a rich set of tasks you can execute and link together, you can design almost any pipeline, no matter how complicated it is. In this article, we discover what are […]
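Airflow pipelines are defined as DAGs (directed acyclic graphs) of linked tasks. The sketch below is not Airflow code; it only illustrates the underlying idea with the standard library's graphlib module (Python 3.9+). Each hypothetical task declares the tasks it depends on, and a topological sort yields an order in which every task runs after its upstream tasks.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each key maps a task to the set of tasks it depends on.
pipeline = {
    "extract_orders": set(),
    "extract_customers": set(),
    "transform": {"extract_orders", "extract_customers"},
    "load_warehouse": {"transform"},
    "send_report": {"load_warehouse"},
}

def run(task_name):
    # Stand-in for real work; a scheduler would execute an operator here.
    print(f"running {task_name}")

# static_order() raises graphlib.CycleError if the dependencies contain a cycle.
order = list(TopologicalSorter(pipeline).static_order())
for task in order:
    run(task)
```

Airflow adds scheduling, retries, and monitoring on top of this dependency ordering. Tasks with no path between them (here, the two extract tasks) may run in either order or in parallel.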
Normalization in Depth
Designing and understanding a data model is all about understanding the concepts and the options available for your use case, and knowing which use case best suits each design option. In this article, we will go through the normalization types and see how to implement each one, along with the pros and cons […]
COVID-19 Data Analysis with Python
Our Use Case and Objective: In this article, we demonstrate the data discovery process on a COVID-19 dataset; data discovery is a necessary milestone in any data science project. We will cover the following topics: What Is Data Science in a Big Data World? Why Become a Data Scientist? What Are the Most Frequently Mentioned […]
Ensemble Learning | Machine Learning | Data Science
In some complex data science problems, we find that the performance of our machine learning algorithm is very poor even after spending time understanding the problem and performing feature engineering. At that point, we realize that combining several machine learning algorithms may come to the rescue! Ensemble […]
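The excerpt is cut off before it introduces specific ensemble methods, so this is a sketch of just one common technique, hard (majority) voting, in plain Python. The labels and the three models' predictions are hypothetical and were chosen so that each model makes mistakes on different samples.

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Hard voting: for each sample, return the label predicted by most models.
    predictions_per_model is a list of equal-length prediction lists."""
    ensemble = []
    for sample_preds in zip(*predictions_per_model):
        label, _count = Counter(sample_preds).most_common(1)[0]
        ensemble.append(label)
    return ensemble

def accuracy(y_true, y_pred):
    """Fraction of samples where the prediction matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical true labels and predictions from three weak models.
y_true  = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
model_a = [1, 0, 1, 0, 0, 1, 1, 0, 1, 1]  # wrong on samples 3 and 6
model_b = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1]  # wrong on samples 1 and 5
model_c = [0, 0, 1, 1, 0, 1, 0, 1, 0, 1]  # wrong on samples 0, 7 and 8

ensemble = majority_vote([model_a, model_b, model_c])

print(accuracy(y_true, model_a))   # 0.8
print(accuracy(y_true, model_b))   # 0.8
print(accuracy(y_true, model_c))   # 0.7
print(accuracy(y_true, ensemble))  # 1.0
```

Here each model alone scores 0.7 or 0.8, while the majority vote is correct on every sample, because no sample is misclassified by more than one model. Voting helps only when the models' errors are not strongly correlated.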
Data Lake Concept and Solutions on GCP Using Cloud Storage | GCP Cloud Storage
Introduction to Data Lakes: Let's start with a discussion about what data lakes are, and then where they fit in as a critical component of your overall data engineering ecosystem. So what is a data lake? Well, it's a fairly broad term, but it generally describes a place where you can securely store various types […]
NoSQL Database Services | Cloud Datastore, Cloud Firestore, and Cloud Bigtable
Introduction: The relational database (RDBMS) model completely dominated database technology for over 20 years. Today this "one size fits all" stability has been disrupted by a relatively recent explosion of new database technologies. These paradigm-busting technologies are powering the "Big Data" and "NoSQL" revolutions, as well as forcing fundamental changes in databases across the board. […]