What is Apache Airflow? Apache Airflow is a platform that helps you programmatically design, schedule, and monitor big data pipelines. With a rich set of tasks that you can execute and link together, you can design almost any pipeline, no matter how complicated it is. In this article, we discover what are […]
Ahmed Ibrahem
Your Guide to NoSQL Databases | Data Engineering
One of the major reasons the era of big data started was the increase in the number of data sources and the variety of data types that each organization has. Nowadays, almost any organization has different types of data: not only structured data, but also unstructured or semi-structured data, and each type […]
Quick Reference to six D’s of the data field
Whether you are a professional or a beginner in the data field, and regardless of your specialty or the technology you work with, you will hear about one or more of the following concepts. It is important for any data professional to know at least the general idea behind each of them. […]
Azure Data Factory – Modern ETL On Cloud – Data Migration Use Case | Azure Data Factory
Introduction: ETL is one of the major tasks for any data engineer, and many solutions, both on-premises and cloud-based, are available on the market to implement it. In Microsoft Azure, Azure Data Factory is the ETL solution for implementing data pipelines using data from cloud sources or on-premises sources, […]
Analyze COVID-19 Dataset with Databricks | Databricks Unified Analytics Platform
In this article, we will analyze a COVID-19 dataset with the Databricks unified analytics platform, using the community edition, which is completely free and which you can use as a playground to test Apache Spark applications in Python or R, depending on your preferred development API. Dataset will be used in this […]
Data Engineering Detailed Roadmap | Data Engineering
Data engineering has become a critical function over the past few years in almost any organization that uses data heavily in its systems. I am sure you have heard a lot about the comparison between data engineers and data scientists and which is better, but actually, no role is better than another; each […]
Build Data Analysis and Data Discovery Web Application for Data Science projects in few minutes | Data Science | Data Analytics
Data preparation and data discovery consume a great amount of time in any data science or data analytics job. One solution is to write a template script that you can reuse in this phase of your work, but what about adding interactive and dynamic controls to your scripts? Wouldn’t that be great? […]
How to choose your ETL solution | Data Integration
ETL, which stands for Extract, Transform, Load, is a common concept in data engineering. As the name implies, it consists of three types of operations: Extract, the process of extracting data from the source system; Transform, the process of manipulating the data […]
Azure Storage Account | Microsoft Azure
Storage Account: A storage account is a container that groups a set of Azure Storage services together. Only data services from Azure Storage can be included in a storage account (Azure Blobs, Azure Files, Azure Queues, and Azure Tables). A storage account is an Azure resource, so it can be grouped under a resource group. Under […]
ER vs Dimensional Modeling simplified under 10 Minutes
In this video, we go through the main differences between ER modeling and dimensional modeling using simple, straightforward examples, and we explain the importance of dimensional modeling in data warehouse design.
Dimension Keys – Part 1 – Natural Keys | Data Warehouse
Dimension tables are a core part of any data warehouse model. In general, dimension tables store the descriptive details of an event or business process. For example, for a purchase at a retail store, we would have dimension tables to store customer information, product information, store information, and so on. On the other hand, fact tables […]
Functions in Scala – Part 1 | Scala
Scala is a multi-paradigm language that supports both functional and object-oriented programming. With a growing community and many useful features, Scala is worth learning, and it has been adopted by big enterprises such as LinkedIn, Twitter, and many others. As functional programming is one of the main strengths of Scala and understanding […]
Migrate Files from local files system to Amazon S3 with Python Application | AWS S3 | Python
Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance. S3 storage fits a range of use cases, such as websites, mobile applications, backup and restore, archive, enterprise applications, IoT devices, and big data analytics. Amazon S3 provides easy-to-use management features so you can organize […]
Create Scala Project on Intellij with Scala Worksheets | Scala
Scala is a multi-paradigm language that supports both functional and object-oriented programming. With a growing community and many useful features, Scala is worth learning, and it has been adopted by big enterprises such as LinkedIn, Twitter, and many others. When you start experimenting with Scala, you can use the Scala interactive REPL (Read-Evaluate-Print Loop) […]
Setup Apache Spark environment on Windows | Apache Spark
Apache Spark is an easy-to-use, unified platform for big data processing, equipped with a rich set of APIs for different application needs, such as Spark DataFrame and Spark SQL for structured data processing, Spark Streaming and Structured Streaming for streaming applications, Spark MLlib for machine learning applications, Spark GraphX for graph analytics […]