• Courses
  • Knowledge Hub
  • Cheat Sheets
  • Market Place
  • Plans and Pricing
  • Contact Us
  • Become an Instructor
DataValley
Category
Data Engineering
Data Modeling
Data Science
Machine Learning
Data Visualization

BigQuery

Building a data warehouse solution using BigQuery | GCP BigQuery

September 20, 2020 | aliaa.amr | Big Query, Data Engineering, Data Warehouse, Google Cloud Platform

An enterprise data warehouse brings data together and makes it available for querying and processing, so it should consolidate data from many sources. All data in a data warehouse should be available for querying, and it's important to ensure that those queries are quick. Another reason to consolidate all of your data, besides standardizing […]

Introduction to Impala: Architecture and Components | Impala

September 10, 2020 | mtarek | Big Data, Data Engineering, Databases

Cloudera Impala provides fast, interactive SQL queries directly on your Apache Hadoop data stored in HDFS or HBase. In addition to using the same unified storage platform, Impala also uses the same metadata, SQL syntax (Hive SQL), ODBC driver, and user interface (Cloudera Impala query UI in Hue) as Apache Hive. This provides a familiar […]

Aggregation Queries in Apache Hive | Apache Hive

August 13, 2020 | mtarek | Apache Hive, Data Engineering

Introduction: Data aggregation is the process of gathering data and expressing it in summary form to learn more about particular groups based on specific conditions. HiveQL offers several built-in aggregate functions, such as max, min, and avg. It also supports statistical aggregates such as variance and standard deviation, as well as different types of window functions. […]
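
To make the idea of summarizing data per group concrete, here is a minimal Python sketch rather than HiveQL. The rows, the column meanings, and the `aggregate_by_group` helper are hypothetical and only for illustration; the population forms of variance and standard deviation are an assumption made for this example.

```python
from collections import defaultdict
from statistics import mean, pstdev, pvariance

# Hypothetical rows of (department, salary); not taken from the article.
rows = [
    ("sales", 100.0),
    ("sales", 300.0),
    ("ops", 200.0),
    ("ops", 200.0),
]


def aggregate_by_group(rows):
    """Group values by key, then compute summary aggregates per group."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return {
        key: {
            "max": max(values),
            "min": min(values),
            "avg": mean(values),
            # Population variance and standard deviation (an assumption
            # chosen for this sketch).
            "variance": pvariance(values),
            "stddev": pstdev(values),
        }
        for key, values in groups.items()
    }


summary = aggregate_by_group(rows)
for department, stats in summary.items():
    print(department, stats)
```

Each key of `summary` plays the role of one group, and each inner dictionary holds that group's aggregates.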

Introduction to Hive | Apache Hive

July 5, 2020 | mtarek | Apache Hive, Big Data

Hive was initially developed by Facebook in 2007 to help the company handle massive amounts of new data. At the time Hive was created, Facebook had a 15TB dataset they needed to work with. A few short years later, that data had grown to 700TB. Their RDBMS data warehouse was taking too long to process […]

Create a Kafka Pipeline using Java Application | Apache Kafka

June 6, 2020 | mtarek | Apache Kafka, Big Data, Data Engineering, Streaming

Introduction: This article is about programming an Apache Kafka producer and consumer in Java. As we'll see, with Java we can reproduce what the CLI does and more. Prerequisites: the Kafka installation and configuration article (to set up the cluster used in this article); any Java editor, e.g. (NetBeans – IntelliJ […]

Setup Apache Kafka Environment | Apache Kafka

May 3, 2020 | mtarek | Apache Kafka, Big Data, Data Engineering, Streaming

Introduction: This article is about configuring and starting an Apache Kafka server on Windows and Linux. It also provides instructions for setting up Java and Apache ZooKeeper, and after the setup we will create a simple pipeline to test our installation. Kafka on Windows: Make sure you have the following prerequisites […]

Setup Apache Spark environment on Windows | Apache Spark

April 6, 2020 | Ahmed Ibrahem | Apache Spark, Big Data, Data Engineering

Apache Spark is an easy-to-use, unified platform for big data processing, equipped with a rich set of APIs for different application needs: Spark DataFrame and Spark SQL for structured data processing, Spark Streaming and Structured Streaming for streaming applications, Spark MLlib for machine learning applications, Spark GraphX for graph analytics […]

Apache Spark Application Execution Mode | Apache Spark

April 5, 2020 | Ahmed Ibrahem | Apache Spark, Big Data, Data Analytics, Data Engineering

Apache Spark is a powerful processing platform for big data applications that supports different types of big data processing. In this article we will discover how an Apache Spark application can be executed in multiple modes, depending on the environment architecture and the application requirements. Before going into details, if you would like to set up […]

About

DataValley is the e-learning platform for everything data science. From beginners to gurus, data geeks of all levels can find something at DataValley to help them enhance their skills.

Contact

DataValley Technologies.

wecare@datavalley.technology

Copyright © 2021 DataValley Technologies.