Problem description:

This question already has an answer here:

  • Converting CSV to ORC with Spark


Answer:

Persisting data in ORC format to persistent storage (such as HDFS) is only available with the HiveContext.

As a workaround, you can write the ORC files out, read them back, and register the result as a temporary table. Something like this:

// write the DataFrame as ORC, then read it back and register a temp table
myDF.write.mode("overwrite").orc("myDF.orc")
val orcDF = sqlCtx.read.orc("myDF.orc")
orcDF.registerTempTable("<Table Name>")
Answer:

As of now, saving as ORC can only be done with the HiveContext.

So the approach looks like this:

import org.apache.spark.sql.hive.HiveContext

// create the HiveContext first, then import its implicits
val sqlContext = new HiveContext(sc)
import sqlContext.implicits._

val data: RDD[MyObject] = createMyData()
data.toDF.write.format("orc").save(outputPath)