DataValley
Dimensional Modeling | Part 1: Introduction and Fact Types

April 12, 2021 | Seifalden Hany | Data Engineering, Data Modeling

Dimensional Modeling: Dimensional modeling is one of the data modeling techniques used for designing data warehouses. It is also considered a suitable technique for representing analytic data, because it delivers data in a form users can understand and is optimized for query performance, which speeds up data retrieval. Normalized databases are very useful in transactional processing because […]

ETL vs ELT | Differences and Use Cases

April 5, 2021 | Seifalden Hany | Data Engineering, Data Integration

1. What is ETL? ETL stands for Extract, Transform, and Load. The ETL process starts by extracting data from one or more sources, then transforms this data to match the data warehouse schema, and finally loads the transformed data into the data warehouse. An ETL system should enforce data quality and consistency standards, and ensure that separated data […]
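As a rough illustration of the three steps named in this excerpt, here is a minimal sketch in Python. It is not taken from the article: the source rows, the `sales` table, and the function names are all hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import sqlite3


def extract():
    # Hypothetical source: raw rows as they might arrive from a CSV export.
    return [
        {"id": "1", "name": " Alice ", "amount": "10.50"},
        {"id": "2", "name": "Bob", "amount": "7"},
    ]


def transform(rows):
    # Cast types and trim whitespace so the rows match the target schema.
    return [(int(r["id"]), r["name"].strip(), float(r["amount"])) for r in rows]


def load(conn, records):
    # Hypothetical warehouse table; SQLite is used only as a stand-in.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()


def run_etl():
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract()))
    return conn.execute("SELECT id, name, amount FROM sales ORDER BY id").fetchall()
```

Calling `run_etl()` runs the steps in order; the transformation happens before the load, which is what distinguishes ETL from ELT.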

Denormalization: when, why, and how?

March 25, 2021 | Seifalden Hany | Data Engineering, Data Modeling

What is de-normalization? De-normalization is an optimization technique that makes a database respond faster to queries by reducing the number of joins needed to satisfy user needs. In de-normalization, we mainly aim to reduce the number of tables needed by re-joining these tables together and adding redundant data. De-normalization is commonly used with […]
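The idea of re-joining tables and accepting redundant data can be sketched in a few lines of Python. This is only an illustration under assumed data: the `customers` and `orders` structures and both function names are hypothetical and do not come from the article.

```python
# Hypothetical normalized data: customers and orders live in separate "tables".
customers = {
    1: {"name": "Alice", "city": "Cairo"},
    2: {"name": "Bob", "city": "Giza"},
}
orders = [
    {"order_id": 100, "customer_id": 1, "total": 25.0},
    {"order_id": 101, "customer_id": 2, "total": 40.0},
    {"order_id": 102, "customer_id": 1, "total": 15.0},
]


def denormalize(orders, customers):
    # Join once, up front, and copy customer attributes onto every order row.
    wide = []
    for order in orders:
        customer = customers[order["customer_id"]]
        wide.append(
            {
                **order,
                "customer_name": customer["name"],
                "customer_city": customer["city"],
            }
        )
    return wide


def total_by_city(wide_orders):
    # The query now reads a single wide table and needs no join.
    totals = {}
    for row in wide_orders:
        city = row["customer_city"]
        totals[city] = totals.get(city, 0.0) + row["total"]
    return totals
```

The trade-off is visible in the data: Alice's name and city are stored twice in the wide table, so reads get simpler while updates to customer details must touch several rows.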

Normalization in Depth

March 17, 2021 | Seifalden Hany | Data Engineering, Data Modeling

Designing and understanding a data model is all about understanding the concepts and the options available for your use case, and knowing which use case suits each design option best. In this article we will go through the normalization types and understand how to implement each option and pros and cons […]

Building a data pipeline using Dataflow | GCP Dataflow

September 11, 2020 | aliaa.amr | Cloud Computing, Data pipeline, Dataflow, ETL, Google Cloud Platform, Streaming

Data uncovers deep insights, supports informed decisions, and enhances process efficiency. But when data comes from various sources, in varying formats, and is stored across different infrastructures, data pipelines come in as the first step to centralizing data for reliable business intelligence, operational insights, and analytics. By contrast, the data pipeline is a […]

Introduction to Impala: Architecture and Components | Impala

September 10, 2020 | mtarek | Big Data, Data Engineering, Databases

Cloudera Impala provides fast, interactive SQL queries directly on your Apache Hadoop data stored in HDFS or HBase. In addition to using the same unified storage platform, Impala also uses the same metadata, SQL syntax (Hive SQL), ODBC driver, and user interface (Cloudera Impala query UI in Hue) as Apache Hive. This provides a familiar […]

Dimensional Modeling … Design Methodology for Analytics Oriented Data Warehouse | Data Warehouse

August 30, 2020 | radwa.ali | Data Engineering, Data Modeling, Data Warehouse

Data warehouses have been around since the 80s. Throughout these years, they have proven their capability to support decision making and business analysis. Data warehouses allow integrating many source systems such as databases, spreadsheets, and flat files. Cleansing and transformation can be applied to this data after integration, and the data is then organized in a way that […]

Getting Started with Containers & Dockers | Dockers

August 17, 2020 | mahmoud.feteha | Containerization, Data Engineering, Docker

Introduction: Containerization has revolutionized software development and has become a common building block in today’s architectures. Applications, big data environments, and data engineering applications can all be deployed and developed inside containers. In this article, we will learn more about containers and their advantages, and we will discuss Docker, which is a container image that packages all […]

Aggregation Queries in Apache Hive | Apache Hive

August 13, 2020 | mtarek | Apache Hive, Data Engineering

Introduction: Data aggregation is the process of gathering data and expressing it in summary form to get more information about particular groups based on specific conditions. HiveQL offers several built-in aggregate functions, such as max, min, and avg. It also supports advanced aggregation using functions such as variance and standard deviation, as well as different types of window functions. […]

Quick Reference to six D’s of the data field

August 10, 2020 | Ahmed Ibrahem | Concepts and Technologies

Whether you are a professional or a beginner in the data field, and regardless of the specialty or technology you work with, you will hear about one or more of the following concepts. It is important for any data professional to know at least the general idea behind each of them. […]

Analyze COVID-19 Dataset with Databricks | Databricks Unified Analytics Platform

July 25, 2020 | Ahmed Ibrahem | Apache Spark, Big Data, Data Analytics, Data Engineering

In this article, we will analyze a COVID-19 dataset using the community edition of the Databricks unified analytics platform, which is completely free and can serve as a playground for testing Apache Spark applications in Python or R, depending on your preferred development API. Dataset will be used in this […]

Data Engineering Detailed Roadmap | Data Engineering

July 23, 2020 | Ahmed Ibrahem | Data Engineering

Over the past few years, data engineering has become a critical part of almost any organization that uses data heavily in its systems. I am sure you have heard a lot about the comparison between data engineers and data scientists and which is better, but actually no role is better than another, each […]

Detailed Guide for String Wrangling in SQL | MySQL | SQL Analysis

July 22, 2020 | Omar Mohamed | Data Analytics, Data Engineering, Data Science, SQL

Extracting information from string columns is an almost constant necessity in the day-to-day tasks of data engineers, data scientists, and business analysts. This task can be done using a programming language such as Python, or with SQL, depending on your application and the task required. In this tutorial, we will discover together how […]

Apache Hive Table Types | Apache Hive

July 10, 2020 | mtarek | Apache Hive, Big Data, Data Engineering

Apache Hive is designed to give data engineers and data scientists SQL-like access to the big data available in a Hadoop cluster, so we can think of it as a normal RDBMS. In a normal RDBMS we have databases and tables; in Hive we have the same, except that in Hive we have two […]

Introduction to Hive | Apache Hive

July 5, 2020 | mtarek | Apache Hive, Big Data

Hive was initially developed by Facebook in 2007 to help the company handle massive amounts of new data. At the time Hive was created, Facebook had a 15TB dataset they needed to work with. A few short years later, that data had grown to 700TB. Their RDBMS data warehouse was taking too long to process […]

About

DataValley is the e-learning platform for everything data science. From beginners to gurus, data geeks of all levels can find something at DataValley to help them enhance their skills.

Contact

DataValley Technologies.

Copyright © 2021 DataValley Technologies.