I have set up a Spark Standalone cluster with two virtual machines.
In the 1st VM (8 cores, 64 GB memory), I started the master manually.
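(The exact master command was not included above; on a standalone cluster the master is typically launched like this, analogous to the worker command below:)

```shell
# Typical standalone master launch (invocation assumed, not shown in the original post).
# The master then listens on port 7077 by default and serves its web UI on 8080.
bin/spark-class org.apache.spark.deploy.master.Master
```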
In the 2nd VM (8 cores, 64 GB memory), I started the worker manually using

bin/spark-class org.apache.spark.deploy.worker.Worker spark://<hostname of master>:7077
Then in the 1st VM, I also started a worker using the same worker command. As shown in the screenshot below, both workers and the master are started and ALIVE.
But when I run my Spark applications, only the worker on the 2nd VM (worker-20160613102937-10.0.37.150-47668) runs; the worker on the 1st VM (worker-20160613103042-10.0.37.142-52601) doesn't. See the screenshot below.

I want both workers to be used by my Spark applications. How can this be done?
EDIT: See this screenshot of the Executor Summary, where the executors corresponding to the worker on the 1st VM have failed. When I click on any stderr link, it shows an "invalid log directory" error. See the screenshot below.
The error is resolved. Spark was not able to create its log directory on the 1st VM: the user I was submitting the Spark job as didn't have permission to create files under /usr/local/spark. Changing the read/write permissions of the directory (chmod -R 777 /usr/local/spark) did the trick.
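As a sketch of the fix, assuming Spark is installed at /usr/local/spark: the quick chmod above works, but a narrower alternative is to give ownership of the tree to whichever user runs spark-submit ("sparkuser" below is a placeholder, not a name from the original post):

```shell
# Quick fix used above: make the whole Spark tree world-readable/writable
sudo chmod -R 777 /usr/local/spark

# Less permissive alternative: hand ownership to the submitting user.
# "sparkuser" is a placeholder for whatever account runs spark-submit.
sudo chown -R sparkuser:sparkuser /usr/local/spark
```

Either way, the worker on the 1st VM can then create its executor log directories under the Spark install path and the executors stop failing.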