Problem description:

I need to use SparkContext instead of JavaSparkContext for accumulableCollection (if you don't agree, check out the linked question and answer it, please!).

Clarified question: SparkContext is callable from Java, but its parallelize wants a Scala sequence. How do I make it happy -- in Java?

I have this code doing a simple parallelize that I was using with JavaSparkContext, but SparkContext wants a Scala collection. I thought I was building a Scala Range here and converting it to a Java list; I'm not sure how to turn that Range into the Scala Seq that SparkContext's parallelize is asking for.

// The JavaSparkContext way; I was trying to get around the MAXINT limit, which is not the issue here.
// Set up bogus lists of size M and N for parallelize:
//List<Integer> rangeM = IntStream.rangeClosed(startM, endM).boxed().collect(Collectors.toList());
//List<Integer> rangeN = IntStream.rangeClosed(startN, endN).boxed().collect(Collectors.toList());
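For comparison, here is a minimal sketch of the JavaSparkContext route those commented lines came from (assuming sparkConf and the int bounds startM/endM are already in scope; JavaSparkContext.parallelize accepts the java.util.List directly):

import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

JavaSparkContext jsc = new JavaSparkContext(sparkConf);

// JavaSparkContext takes the boxed Java list as-is, no Scala conversion needed.
List<Integer> rangeM = IntStream.rangeClosed(startM, endM).boxed().collect(Collectors.toList());
JavaRDD<Integer> dataSetM = jsc.parallelize(rangeM);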

The money line is next: how can I create a Scala Seq in Java to hand to parallelize?

// These lists need to be Scala objects now that we've switched to SparkContext.
scala.collection.Seq<Integer> rangeMscala =
  scala.collection.immutable.List(startM to endM); // does not compile -- this is the part I can't get right

// setup sparkConf and create SparkContext
... SparkConf setup
SparkContext jsc = new SparkContext(sparkConf);

RDD<Integer> dataSetMscala = jsc.parallelize(rangeMscala);

Answer:

You should use it this way:

// Build the Scala Range through its companion object (note: apply(1, 10) excludes 10).
scala.collection.immutable.Range rangeMscala =
  scala.collection.immutable.Range$.MODULE$.apply(1, 10);

SparkContext sc = new SparkContext();

// Scala's default arguments aren't visible from Java, so pass the slice count
// and a ClassTag explicitly; Range erases its Ints to Object in Java signatures.
RDD<Object> dataSetMscala =
  sc.parallelize(rangeMscala, 3, scala.reflect.ClassTag$.MODULE$.Object());
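One caveat: Range$.MODULE$.apply(1, 10) stops at 9, so to mirror the rangeClosed(startM, endM) bounds from the question you would presumably want the inclusive factory instead (a sketch, assuming startM/endM are in scope):

// Inclusive variant, matching IntStream.rangeClosed(startM, endM):
scala.collection.immutable.Range rangeMscalaInclusive =
  scala.collection.immutable.Range$.MODULE$.inclusive(startM, endM);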

Hope it helps! Regards
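An alternative that keeps the original java.util.List is to convert it with scala.collection.JavaConverters before calling parallelize. A sketch under the assumption of a Scala 2.11/2.12 standard library on the classpath (rangeM is the boxed list from the question, sc the SparkContext):

import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import org.apache.spark.SparkContext;
import org.apache.spark.rdd.RDD;
import scala.collection.JavaConverters;
import scala.collection.Seq;
import scala.reflect.ClassTag$;

// Build the Java list exactly as in the question...
List<Integer> rangeM = IntStream.rangeClosed(startM, endM).boxed().collect(Collectors.toList());

// ...convert it to a Scala Seq...
Seq<Integer> rangeMscala = JavaConverters.asScalaBufferConverter(rangeM).asScala().toSeq();

// ...and keep the RDD typed by supplying a ClassTag for Integer.
RDD<Integer> dataSetMscala =
  sc.parallelize(rangeMscala, 3, ClassTag$.MODULE$.apply(Integer.class));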
