Building a data pipeline using Dataflow | GCP Dataflow

Data uncovers deep insights, supports informed decisions, and makes processes more efficient. But when data comes from various sources, in varying formats, and is stored across different infrastructures, data pipelines are the first step toward centralizing it for reliable business intelligence, operational insights, and analytics. By contrast, the data pipeline is a […]

Apache Kafka and Apache Spark Integration | Apache Kafka | Apache Spark

Introduction: Apache Kafka is a scalable, high-performance, low-latency platform that allows reading and writing streams of data like a messaging system. We can start writing Kafka applications in Java fairly easily; see our previous article on how to design a Kafka pipeline in Java. If you research the variety of real-world use-cases for Kafka, you will very […]

Create a Kafka Pipeline using Java Application | Apache Kafka

Introduction: This article is about programming an Apache Kafka producer and consumer in Java. As we’ll see, with Java we can reproduce what the CLI does and more. Prerequisites: the Kafka installation and configuration article (to set up the cluster used in this article); any Java programming editor, e.g. (NetBeans – IntelliJ […]

Setup Apache Flink Environment Standalone on Windows | Apache Flink

Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. For an introduction to Apache Flink components, please check our previous article. In this article we will learn how to set up and run Apache Flink in standalone mode. Run Apache Flink Standalone: Flink has been designed to […]

Introduction to Apache Flink | Apache Flink

Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. Flink has been designed to run in all common cluster environments and to perform computations at in-memory speed and at any scale. Apache Flink is a powerful open source engine that provides: batch processing, interactive processing, real-time (streaming) processing, graph […]

Setup Apache Kafka Environment | Apache Kafka

Introduction: This article is about configuring and starting an Apache Kafka server on Windows and Linux. This guide also provides instructions to set up Java and Apache ZooKeeper; after the setup, we will create a simple pipeline to test our installation. Kafka on Windows: Make sure you have the following prerequisites […]

Apache Kafka Components

What Is Apache Kafka? Apache Kafka is an open source project, initially created by LinkedIn, that is designed to be a distributed, partitioned, replicated commit log service. It provides the functionality of a messaging system, but with a unique design, which we will investigate in more detail in this article. Kafka was designed with a […]