Problem description:

I'm working with two rather large RDDs: a list of latitudes, lat, and a list of longitudes, long. I'm trying to combine them with latlong = lat.zip(long), but when I try to look at the result, I get a ValueError:

 File "<stdin>", line 1, in <lambda>

ValueError: could not convert string to float:

at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:166)

at org.apache.spark.api.python.PythonRunner$$anon$1.next(PythonRDD.scala:129)

at org.apache.spark.api.python.PythonRunner$$anon$1.next(PythonRDD.scala:125)

at org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:43)

at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)

at scala.collection.Iterator$class.foreach(Iterator.scala:727)

at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)

at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:452)

at org.apache.spark.api.python.PythonRunner$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:280)

at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)

at org.apache.spark.api.python.PythonRunner$WriterThread.run(PythonRDD.scala:239)

I also tried other simple commands, and even lat.count() returns the same error.

What does this mean? I'm not converting any strings into floats?

EDIT: How I build the lat and long RDDs:

I have a taxi dataset with pickup latitude and longitude, and a ZipCode dataset. I have an equation that I'm trying to use to find the nearest zip code.

pickuplat = taxirdd.map(lambda p: float(p[2])) #taxi pickup lat

zclat = zc.map(lambda p: (p[3]) ) #zip code center lat

alllat = pickuplat.cartesian(zclat) #compare taxi_lat with each zip_lat

lat = alllat.map(lambda x: (x[0], x[1], x[0] - x[1])) #each lat record is (pickup lat, zip lat, DIFFERENCE between the 2 latitudes), my bad!!

pickuplong = taxirdd.map(lambda p: float(p[3]) ) #same with long

zclong = zc.map(lambda p: (p[4]) )

alllong = pickuplong.cartesian(zclong)

long = alllong.map(lambda x: (x[0], x[1], x[0] - x[1])) #get pickuplong, zclong, and the diff.

Now I want to zip lat and long together because I need to use both of them together in my next equation, and I can't use 2 RDDs in a function.
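
One way to avoid the zip step altogether, sketched here in plain Python rather than Spark: keep each point's latitude and longitude together as a pair before taking the cartesian product, so a single map sees both values. As far as I know, RDD.zip also assumes both RDDs have the same number of partitions and the same number of elements per partition, which is another reason to sidestep it. The sample coordinates and the helper name diffs are hypothetical; itertools.product stands in for RDD.cartesian.

```python
from itertools import product

# Hypothetical sample data: (lat, long) pairs for pickups and zip code centers.
pickups = [(40.75, -73.99), (40.71, -74.00)]
centers = [(40.76, -73.98), (40.70, -74.01)]


def diffs(pair):
    """Unpack one (pickup, center) pair and return both points plus the differences."""
    (plat, plong), (zlat, zlong) = pair
    return (plat, plong, zlat, zlong, plat - zlat, plong - zlong)


# product() plays the role of cartesian(): every pickup against every center.
latlong = [diffs(pair) for pair in product(pickups, centers)]

# Assumed PySpark equivalent (not run here):
#   pickups = taxirdd.map(lambda p: (float(p[2]), float(p[3])))
#   centers = zc.map(lambda p: (float(p[3]), float(p[4])))
#   latlong = pickups.cartesian(centers).map(diffs)
```

With this shape, the next equation can read both differences from one record, and no second RDD is needed inside the function.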

*After some more testing, I am able to use all these functions on the taxirdd before I convert the values to floats, but calling pickuplat.count() afterwards raises the same ValueError. I tried converting the results back into unicode, as they were in the original taxirdd, and tried again, but I still get the same error.
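
One possible reading of these symptoms, offered as a sketch rather than a confirmed diagnosis: RDD transformations such as map are lazy, so the float(...) call inside the lambda only runs when an action like count() is invoked, and float raises this kind of message when a field is blank. The plain-Python example below mimics that with a generator. The sample rows and the helper name to_float_or_none are hypothetical, and the real taxi data may fail for a different reason.

```python
def to_float_or_none(value):
    """Return float(value), or None when value is blank or not numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


# Hypothetical rows shaped like the taxi records: lat at index 2, long at index 3.
rows = [
    ["id1", "2013-01-01", "40.75", "-73.99"],
    ["id2", "2013-01-01", "", ""],  # blank coordinates
    ["id3", "2013-01-01", "40.71", "-74.00"],
]

# Like RDD.map, a generator expression does no work until it is consumed.
lazy_lats = (float(r[2]) for r in rows)  # no error raised here

error_message = None
try:
    list(lazy_lats)  # consuming it plays the role of count()
except ValueError as exc:
    error_message = str(exc)

# Dropping rows whose coordinates do not parse keeps lat and long aligned.
clean_rows = [r for r in rows
              if to_float_or_none(r[2]) is not None
              and to_float_or_none(r[3]) is not None]
clean_lats = [float(r[2]) for r in clean_rows]

# Assumed PySpark equivalent (not run here):
#   good = taxirdd.filter(lambda p: to_float_or_none(p[2]) is not None
#                                   and to_float_or_none(p[3]) is not None)
#   pickuplat = good.map(lambda p: float(p[2]))
```

If this reading is right, counting the rows that the filter rejects would show how many records have blank or malformed coordinates.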
