Introduction to Apache Flink | Apache Flink

DataValley Team
May 10, 2020
9:49 pm

Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. Flink has been designed to run in all common cluster environments, perform computations at in-memory speed and at any scale.
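To make "stateful computation over bounded and unbounded streams" concrete, here is a minimal sketch in plain Python. It does not use Flink; the names `running_counts` and `endless_clicks` are made up for illustration. The same stateful function is applied to a finite list (bounded) and to an endless generator (unbounded), of which we can only ever inspect a prefix.

```python
from collections import defaultdict
from itertools import count, islice


def running_counts(events):
    """Yield (key, count_so_far) per event; `state` persists across events."""
    state = defaultdict(int)  # per-key state kept between events
    for key in events:
        state[key] += 1
        yield key, state[key]


def endless_clicks():
    """Hypothetical unbounded source: it never terminates on its own."""
    for i in count():
        yield "even" if i % 2 == 0 else "odd"


# Bounded stream: a finite list, processed to completion.
bounded_result = list(running_counts(["a", "b", "a"]))

# Unbounded stream: take only the first five results.
unbounded_prefix = list(islice(running_counts(endless_clicks()), 5))

print(bounded_result)
print(unbounded_prefix)
```

The point of the sketch is that the processing logic is identical in both cases; only the source differs.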

Apache Flink is a powerful open-source engine that supports:

Batch Processing
Interactive Processing
Real-time (Streaming) Processing
Graph Processing
Iterative Processing
In-memory Processing

Flink can handle all of these types of requirements. The list above collects the different requirements found in industry, and Flink can address all of them. Previously we had to combine multiple frameworks, for example MapReduce for batch and Storm for streaming, which made things very complex. With Flink there is a single unified platform that addresses all of these requirements, with lightning-fast speed, ease of use, and sophisticated analytics.

Advantages of Flink:

Flink is a true streaming engine. It does not cut the stream into micro-batches the way Spark does; it processes each record as soon as it arrives.
Flink's core is a streaming dataflow engine that provides distribution, communication, and fault tolerance for distributed computations.
Flink is a general-purpose framework that aims to unify different data workloads. There is no need for separate specialized engines; a single unified platform, Apache Flink, covers all of these requirements.
It processes events at a consistently high rate, with latency as low as milliseconds.
Flink is an open-source platform for distributed stream and batch processing.
It supports large-scale data processing.
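The difference between processing each record on arrival and collecting records into micro-batches can be shown with a toy timing model. This is a simplified sketch in plain Python, not a description of how Flink or Spark actually schedule work: processing time is ignored, and the arrival times and the 500 ms batch interval are invented for illustration.

```python
def per_event_emit_times(arrival_times):
    """Toy model: each event is handled the moment it arrives."""
    return list(arrival_times)


def micro_batch_emit_times(arrival_times, batch_interval):
    """Toy model: each event waits for the end of the interval it falls in."""
    emit_times = []
    for t in arrival_times:
        batch_index = t // batch_interval
        emit_times.append((batch_index + 1) * batch_interval)
    return emit_times


arrivals = [0, 120, 450, 990, 1010]  # hypothetical arrival times in ms
streaming = per_event_emit_times(arrivals)
batched = micro_batch_emit_times(arrivals, batch_interval=500)

# Extra waiting time introduced purely by batching, per event.
added_latency = [b - a for a, b in zip(arrivals, batched)]
print(added_latency)
```

In this model an event that arrives just after a batch boundary waits almost a full interval, while an event-at-a-time engine adds no such wait.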

Ecosystem:

The ecosystem diagram shows the Flink core layer and the other layers available on top of it.

Runtime (Kernel):

At the heart of Apache Flink is the runtime, also known as the Flink kernel. It is a distributed streaming dataflow engine.
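A streaming dataflow can be pictured as a chain of operators through which records flow one at a time. The sketch below models that idea with plain Python generators in a single process; it is only an analogy, since the real runtime distributes operators across machines. All function names here are hypothetical.

```python
def source(lines):
    """Source operator: emits one record at a time."""
    for line in lines:
        yield line


def split_words(stream):
    """Flat-map style operator: one line in, several lowercase words out."""
    for line in stream:
        for word in line.split():
            yield word.lower()


def keep_long(stream, min_len):
    """Filter operator: drops words shorter than min_len."""
    for word in stream:
        if len(word) >= min_len:
            yield word


# Operators are chained; nothing runs until the final stage pulls records.
pipeline = keep_long(
    split_words(source(["Flink streams data", "a data flow"])),
    min_len=4,
)
result = list(pipeline)
print(result)
```

Because generators are lazy, each record passes through the whole chain before the next one is read, which mirrors the record-at-a-time nature of a streaming dataflow.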

APIs and Libraries:

On top of the runtime layer there are several APIs and libraries. Broadly, they fall into two categories:

DataSet API for batch processing
- FlinkML for machine learning
- Gelly for graph processing
- Table for SQL processing
DataStream API for stream processing
- Table

Note that this ecosystem is only a processing engine. There is no storage layer, so Flink depends on third-party storage systems.

Deploy:

Flink can be deployed in one of the following modes.

Local mode, used for development and testing, deploys on
- Local machine (single JVM)
Cluster mode, deploys on
- Standalone
- YARN (the usual choice)
- Mesos
- Tez
Cloud mode, deploys on
- Google Compute Engine (GCE)
- Amazon EC2

Storage:

Flink can read data from various storage systems, such as:

File systems
- Local file system
- HDFS
- S3
Databases
- MongoDB
- HBase
- Relational databases
Streams
- RabbitMQ
- Kafka
- Flume
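Because the engine has no storage of its own, the processing logic is kept separate from where the records come from. The following plain-Python sketch illustrates that separation under simple assumptions: `file_source` reads from a temporary local file, and `queue_source` uses an in-memory queue as a stand-in for a message broker. Neither is a Flink connector; both names are made up for this example.

```python
import os
import tempfile
from collections import Counter, deque


def file_source(path):
    """Read records line by line from a file on the local file system."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            yield line.strip()


def queue_source(queue):
    """Drain records from an in-memory queue (stand-in for a broker)."""
    while queue:
        yield queue.popleft()


def count_records(records):
    """Processing step: it does not care where the records came from."""
    return Counter(records)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "events.txt")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("login\nlogout\nlogin\n")
    from_file = count_records(file_source(path))

from_queue = count_records(queue_source(deque(["login", "login", "logout"])))
print(from_file == from_queue)
```

Swapping the source does not change `count_records`, which is the property that lets a storage-less engine work with many external systems.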
