Problem description:

I'm migrating from Spark 1.6 to 2.1.0 and I've run into a problem migrating my PySpark application.

I'm dynamically setting up my SparkConf object based on configurations in a file and when I was on Spark 1.6, the app would run with the correct configs. But now, when I open the Spark UI, I can see that NONE of those configs are loaded into the SparkContext.

Here's my code:

from pyspark import SparkConf, SparkContext

# Keep only the spark.* entries from the config file and apply them all at once
spark_conf = SparkConf().setAll(
    filter(lambda x: x[0].startswith('spark.'), conf_dict.items())
)

sc = SparkContext(conf=spark_conf)
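
For context, conf_dict is built from the config file that I pass on the command line (see the spark-submit call below). The real parsing is more involved; a simplified sketch, assuming a plain key=value file, would be something like this (load_conf and the file format here are just illustrative):

import sys

# Illustrative sketch only: assumes one "key=value" pair per line,
# e.g. "spark.executor.memory=3G"; the real config format is more involved.
def load_conf(path):
    conf_dict = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith('#'):
                key, _, value = line.partition('=')
                conf_dict[key.strip()] = value.strip()
    return conf_dict

conf_dict = load_conf(sys.argv[1])  # /path/to/config/file passed to spark-submit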

I've also added a print before initializing the SparkContext to make sure the SparkConf has all the relevant configs:

for key, value in spark_conf.getAll():
    print("{0}: {1}".format(key, value))

And this outputs all the configs I need:

  • spark.app.name: MyApp
  • spark.akka.threads: 4
  • spark.driver.memory: 2G
  • spark.streaming.receiver.maxRate: 25
  • spark.streaming.backpressure.enabled: true
  • spark.executor.logs.rolling.maxRetainedFiles: 7
  • spark.executor.memory: 3G
  • spark.cores.max: 24
  • spark.executor.cores: 4
  • spark.streaming.blockInterval: 350ms
  • spark.memory.storageFraction: 0.2
  • spark.memory.useLegacyMode: false
  • spark.memory.fraction: 0.8
  • spark.executor.logs.rolling.time.interval: daily
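
For completeness, the equivalent check on the created context itself would be a dump via sc.getConf().getAll(), which returns a copy of the conf the context was started with and is the programmatic counterpart of what the Spark UI's Environment tab shows. A sketch of that check:

# Sketch: dump what the SparkContext actually ended up with,
# to compare against the Environment tab in the Spark UI.
for key, value in sc.getConf().getAll():
    print("{0}: {1}".format(key, value))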

I submit my job with the following:

/usr/local/spark/bin/spark-submit --conf spark.driver.host=i-${HOSTNAME} --master spark://i-${HOSTNAME}:7077 /path/to/main/file.py /path/to/config/file
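
For reference, I'm still creating a bare SparkContext as shown above rather than going through the new 2.x SparkSession entry point; if that matters, the builder-based equivalent would look roughly like this sketch:

from pyspark.sql import SparkSession

# Sketch of the Spark 2.x builder-style entry point using the same SparkConf
spark = SparkSession.builder.config(conf=spark_conf).getOrCreate()
sc = spark.sparkContext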

Does anybody know why my SparkContext doesn't get initialized with my SparkConf?

Thanks :)
