Apache Spark Application Execution Modes

Apache Spark is a powerful processing platform that supports many types of big data workloads. In this article we will look at the different modes in which a Spark application can be executed, depending on the environment architecture and on the application requirements.

Before going into the details: if you would like to set up an Apache Spark environment on your Windows machine, check our previous article.

Cluster Mode

In cluster mode, the user submits the packaged application file to the cluster; the cluster manager launches the driver process on a worker node and then launches the executor processes on the cluster's worker nodes. The cluster manager maintains the application from start to finish: when the application completes, successfully or with failures, the cluster manager stores the application status, which can be checked from the cluster monitoring interface. The following steps describe the communication between the different parties in this mode.

Application execution will go as follows:

1- The user packages the Spark application and submits it to the cluster manager using spark-submit

2- The cluster manager launches the driver process on one of the worker nodes

3- The driver process starts executing the application code; since SparkSession is the entry point of a Spark application to the cluster, the driver begins by starting a SparkSession (see the sketch at the end of this section)

4- The SparkSession communicates with the cluster manager to allocate the required resources and launch the number of executors requested in the submission

5- The cluster manager launches the executor processes as requested and sends the locations of all executors (worker node connection details) to the driver process

6- The driver process starts communicating with the executor processes, data starts moving around, and the submitted application is physically executed

Note: In this mode, the state of the client machine does not affect the running application, as it is fully managed by the cluster manager and all processes run on the cluster.
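Step 3 above refers to starting a SparkSession. As a rough sketch, a minimal Scala application could create its session as follows; the object name, application name, and job logic here are all hypothetical, and the master URL and deploy mode are deliberately not hard-coded, since spark-submit supplies them:

import org.apache.spark.sql.SparkSession

object SampleApp {
  def main(args: Array[String]): Unit = {
    // SparkSession is the application's entry point to the cluster (step 3).
    // No master URL is set here; spark-submit provides it, so the same JAR
    // can run in cluster, client, or local mode.
    val spark = SparkSession.builder()
      .appName("SampleApp") // hypothetical application name
      .getOrCreate()

    // A trivial job so the launched executors have tasks to run.
    val total = spark.range(1, 1001).selectExpr("sum(id)").first().getLong(0)
    println(s"Sum of 1..1000 = $total")

    // Stopping the session releases the executors back to the cluster manager.
    spark.stop()
  }
}

Packaged into a JAR, it could be submitted in cluster mode like this (the host name and JAR name are placeholders):

./bin/spark-submit --class SampleApp --master spark://spark-master:7077 --deploy-mode cluster sample-app.jar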

Client Mode

In client mode, the user submits the packaged application file, but the driver process is started locally on the machine from which the application is submitted. The driver process begins by initiating a SparkSession, which communicates with the cluster manager to allocate the required resources. The following steps describe the communication between the different parties in this mode:

1- The user packages the Spark application and submits it in client mode using spark-submit

2- The driver process starts on the local machine from which the application was submitted

3- The driver process starts executing the application code; since SparkSession is the entry point of a Spark application to the cluster, the driver begins by starting a SparkSession

4- The SparkSession communicates with the cluster manager to allocate the required resources and launch the number of executors requested in the submission

5- The cluster manager launches the executor processes as requested and sends the locations of all executors (worker node connection details) to the driver process

6- The driver process starts communicating with the executor processes on the worker nodes, data starts moving around, and the submitted application is physically executed

Note: In this mode, the client machine is responsible for the application from start to finish, since the driver process runs on it. An important performance factor for this mode is therefore the stability and speed of the network connection between the client machine and the cluster nodes.
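For comparison, the same hypothetical application from the cluster mode sketch could be submitted in client mode; only the deploy mode changes, and the driver now runs on the submitting machine (the host name, memory value, and JAR name are placeholders):

./bin/spark-submit --class SampleApp --master spark://spark-master:7077 --deploy-mode client --conf spark.executor.memory=2g sample-app.jar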

Local Mode

This mode has the same general behavior as the previous modes, except that all processes run as threads on a single local machine; the entire Spark "cluster" is co-located on that machine.
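For example, the same hypothetical JAR could be run entirely on one machine through a local master URL; local[4] runs Spark with four worker threads, and local[*] uses as many threads as there are cores:

./bin/spark-submit --class SampleApp --master local[4] sample-app.jar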

Submit Command

The following is the command syntax for submitting your Spark application in the different modes:

./bin/spark-submit --class <main-class> --master <master-url> --deploy-mode <deploy-mode> --conf <key>=<value> <application-jar> [application-arguments]
Property                Description
main-class              (Required) Name of the main class in your application
master-url              (Required) URL of the cluster on which your application will execute; this can point to YARN, Mesos, Kubernetes, Spark Standalone, or local mode, e.g. spark://host:port, mesos://host:port, yarn, local
deploy-mode             (Optional, defaults to client) Where to run the driver process: cluster or client; local mode is selected through the master URL instead
conf <key>=<value>      (Optional) Extra configuration such as the number of executors, executor memory, etc.
application-jar         (Required) Path to your application JAR file
application-arguments   (Optional) Any arguments needed as input to your application
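To make the syntax concrete, a complete submission to a YARN cluster might look like the following; the class name, configuration values, JAR name, and input path are all placeholders:

./bin/spark-submit --class com.example.SampleApp --master yarn --deploy-mode cluster --conf spark.executor.instances=4 --conf spark.executor.memory=2g sample-app.jar hdfs:///input/data.txt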

For a full description of the command attributes and other extra options, please check the Apache Spark documentation:

http://spark.apache.org/docs/latest/submitting-applications.html
