Apache Spark Application Execution Modes

Apache Spark is a powerful processing platform that supports many types of big data workloads. In this article we will look at the different modes in which a Spark application can be executed, depending on the environment architecture and on the application requirements.

Before going into the details: if you would like to set up an Apache Spark environment on your Windows machine, check our previous article.

Cluster Mode

In cluster mode, the user submits the packaged application file to the cluster; the cluster manager launches the driver process on a worker node and then launches the executor processes on the cluster's worker nodes. The cluster manager maintains the application from start to finish: when the application completes, successfully or with failures, the cluster manager stores the application status, which can be checked from the cluster monitoring interface. The following steps describe the communication between the different parties in this mode.

Application execution will go as follows:

1- The user packages the Spark application and submits it to the cluster manager using spark-submit

2- The cluster manager launches the driver process on one of the worker nodes

3- The driver process starts executing the application code; since SparkSession is the entry point of a Spark application to the cluster, the driver begins by starting a SparkSession (see the sketch at the end of this section)

4- The SparkSession communicates with the cluster manager to allocate the required resources and launch the number of executors requested in the submission

5- The cluster manager launches the executor processes as requested and sends the locations of all executors (worker node connection details) to the driver process

6- The driver process starts communicating with the executor processes, data starts moving around, and the submitted application is physically executed

Note: In this mode, the state of the client machine does not affect the running application, as it is fully managed by the cluster manager and all processes run on the cluster.
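Step 3 above refers to starting a SparkSession. As a rough sketch, a minimal Scala application could create its session as follows; the object name, application name, and job logic here are all hypothetical, and the master URL and deploy mode are deliberately not hard-coded, since spark-submit supplies them:

import org.apache.spark.sql.SparkSession

object SampleApp {
  def main(args: Array[String]): Unit = {
    // SparkSession is the application's entry point to the cluster (step 3).
    // No master URL is set here; spark-submit provides it, so the same JAR
    // can run in cluster, client, or local mode.
    val spark = SparkSession.builder()
      .appName("SampleApp") // hypothetical application name
      .getOrCreate()

    // A trivial job so the launched executors have tasks to run.
    val total = spark.range(1, 1001).selectExpr("sum(id)").first().getLong(0)
    println(s"Sum of 1..1000 = $total")

    // Stopping the session releases the executors back to the cluster manager.
    spark.stop()
  }
}

Packaged into a JAR, it could be submitted in cluster mode like this (the host name and JAR name are placeholders):

./bin/spark-submit --class SampleApp --master spark://spark-master:7077 --deploy-mode cluster sample-app.jar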

Client Mode

In client mode, the user submits the packaged application file, but the driver process is started locally on the machine from which the application is submitted. The driver process begins by initiating a SparkSession, which communicates with the cluster manager to allocate the required resources. The following steps describe the communication between the different parties in this mode:

1- The user packages the Spark application and submits it in client mode using spark-submit

2- The driver process starts on the local machine from which the application was submitted

3- The driver process starts executing the application code; since SparkSession is the entry point of a Spark application to the cluster, the driver begins by starting a SparkSession

4- The SparkSession communicates with the cluster manager to allocate the required resources and launch the number of executors requested in the submission

5- The cluster manager launches the executor processes as requested and sends the locations of all executors (worker node connection details) to the driver process

6- The driver process starts communicating with the executor processes on the worker nodes, data starts moving around, and the submitted application is physically executed

Note: In this mode, the client machine is responsible for the application from start to finish, since the driver process runs on it. An important performance factor for this mode is therefore the stability and speed of the network connection between the client machine and the cluster nodes.
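For comparison, the same hypothetical application from the cluster mode sketch could be submitted in client mode; only the deploy mode changes, and the driver now runs on the submitting machine (the host name, memory value, and JAR name are placeholders):

./bin/spark-submit --class SampleApp --master spark://spark-master:7077 --deploy-mode client --conf spark.executor.memory=2g sample-app.jar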

Local Mode

This mode has the same general behavior as the previous modes, except that all processes run as threads on a single local machine; the entire Spark "cluster" is co-located on that machine.
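For example, the same hypothetical JAR could be run entirely on one machine through a local master URL; local[4] runs Spark with four worker threads, and local[*] uses as many threads as there are cores:

./bin/spark-submit --class SampleApp --master local[4] sample-app.jar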

Submit Command

The following is the command syntax for submitting your Spark application in the different modes:

./bin/spark-submit --class <main-class> --master <master-url> --deploy-mode <deploy-mode> --conf <key>=<value> <application-jar> [application-arguments]
Property                Description
main-class              (Required) Name of the main class in your application
master-url              (Required) URL of the cluster on which your application will execute; this can point to YARN, Mesos, Kubernetes, Spark Standalone, or local mode, e.g. spark://host:port, mesos://host:port, yarn, local
deploy-mode             (Optional, defaults to client) Where to run the driver process: cluster or client; local mode is selected through the master URL instead
conf <key>=<value>      (Optional) Extra configuration such as the number of executors, executor memory, etc.
application-jar         (Required) Path to your application JAR file
application-arguments   (Optional) Any arguments needed as input to your application
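To make the syntax concrete, a complete submission to a YARN cluster might look like the following; the class name, configuration values, JAR name, and input path are all placeholders:

./bin/spark-submit --class com.example.SampleApp --master yarn --deploy-mode cluster --conf spark.executor.instances=4 --conf spark.executor.memory=2g sample-app.jar hdfs:///input/data.txt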

For a full description of the command attributes and other extra options, please check the Apache Spark documentation:

http://spark.apache.org/docs/latest/submitting-applications.html
