This tutorial covers deploying and managing Apache Spark applications in production environments.
Standalone mode is the simplest deployment mode: Spark ships with its own built-in cluster manager, so no external dependencies are needed.

Start the master:

  ./sbin/start-master.sh

Start the workers, pointing each one at the master's URL:

  ./sbin/start-worker.sh spark://<master-url>
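With the master and workers running, an application can be submitted against the standalone master. A minimal sketch, where the class name, master URL, and application jar are placeholders to be filled in for your deployment:

```shell
spark-submit --class <main-class> \
  --master spark://<master-url> \
  --deploy-mode cluster \
  <application-jar>
```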
Spark can also run on Hadoop YARN, Hadoop's resource manager.

Configure yarn-site.xml for your cluster, and set HADOOP_CONF_DIR so Spark can locate the Hadoop configuration files.

Submit the application:

  spark-submit --class <main-class> \
    --master yarn \
    --deploy-mode cluster \
    <application-jar>
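A fuller YARN submission usually sets executor resources explicitly. A sketch with illustrative sizes (the class name, jar, and all resource values are placeholders, not recommendations):

```shell
spark-submit --class <main-class> \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 4 \
  --executor-cores 2 \
  --executor-memory 4g \
  <application-jar>
```

With --deploy-mode cluster the driver runs inside a YARN container rather than on the submitting machine.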
Spark can also run on Kubernetes, using the Kubernetes API server as the cluster manager.

Create (or obtain access to) a Kubernetes cluster.

Configure Spark to use Kubernetes, including a container image for the driver and executors.

Submit the application:

  spark-submit --class <main-class> \
    --master k8s://https://<api-server-host>:<port> \
    --deploy-mode cluster \
    <application-jar>
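On Kubernetes, Spark additionally needs a container image for the driver and executor pods, set via spark.kubernetes.container.image. A sketch where the API server address, image name, class, and jar are all placeholders:

```shell
spark-submit --class <main-class> \
  --master k8s://https://<api-server-host>:<port> \
  --deploy-mode cluster \
  --conf spark.kubernetes.container.image=<your-spark-image> \
  <application-jar>
```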
Spark can also run on Apache Mesos (note that Mesos support is deprecated in recent Spark releases).

Configure Mesos and make sure the Mesos master is reachable from the submitting machine.

Submit the application:

  spark-submit --class <main-class> \
    --master mesos://<mesos-master-host>:5050 \
    --deploy-mode cluster \
    <application-jar>
Static resource allocation is enabled by default: each application requests a fixed amount of resources up front, controlled by properties such as spark.executor.cores and spark.executor.memory.

The fair scheduler allocates resources fairly across applications sharing the cluster.

Dynamic allocation adjusts the number of executors at runtime based on workload, releasing idle executors and requesting new ones as tasks queue up.
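These resource settings typically live in spark-defaults.conf. A sketch enabling dynamic allocation, where the sizes and executor bounds are illustrative assumptions (dynamic allocation also requires either shuffle tracking, as shown, or an external shuffle service):

```
spark.executor.cores                             2
spark.executor.memory                            4g
spark.dynamicAllocation.enabled                  true
spark.dynamicAllocation.minExecutors             1
spark.dynamicAllocation.maxExecutors             10
spark.dynamicAllocation.shuffleTracking.enabled  true
```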
The Spark web UI is available at http://<driver-node>:4040 while an application is running.

Monitoring options are configured in spark-defaults.conf.
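One common monitoring entry in spark-defaults.conf is event logging, which lets the Spark history server replay the UI of finished applications. A sketch, where the log directory is a placeholder for a path all nodes can reach:

```
spark.eventLog.enabled         true
spark.eventLog.dir             hdfs:///<spark-event-log-dir>
spark.history.fs.logDirectory  hdfs:///<spark-event-log-dir>
```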
Enable authentication to secure the cluster.
Control access to resources based on user roles.
For high availability, the master can be configured to handle failover, for example with ZooKeeper-based standby masters in standalone mode.
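For standalone-master failover, one documented approach is ZooKeeper-based recovery: standby masters register with ZooKeeper and take over if the active master dies. A sketch for conf/spark-env.sh on each master node, where the ZooKeeper hosts are placeholders:

```shell
SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
  -Dspark.deploy.zookeeper.url=<zk-host1>:2181,<zk-host2>:2181"
```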