Problem description:

I want to fill null/NaN values in Spark conditionally, so that I consider each corner case of my data instead of simply filling everything with one replacement value.

A sample could look like this:

import spark.implicits._   // spark is the SparkSession; needed for toDF, $"..." and .as[FooBar]

case class FooBar(foo: String, bar: String)

val myDf = Seq(("a", "first"), ("b", "second"), ("c", null), ("third", "fooBar"), ("someMore", "null"))
  .toDF("foo", "bar")
  .as[FooBar]

+--------+------+
|     foo|   bar|
+--------+------+
|       a| first|
|       b|second|
|       c|  null|
|   third|fooBar|
|someMore|  null|
+--------+------+

Unfortunately

import org.apache.spark.sql.functions.when

myDf
  .withColumn(
    "bar",
    when(
      (($"foo" === "c") and ($"bar".isNull)), "someReplacement"
    )
  ).show

resets all the other regular values in the column to null:

+--------+---------------+
|     foo|            bar|
+--------+---------------+
|       a|           null|
|       b|           null|
|       c|someReplacement|
|   third|           null|
|someMore|           null|
+--------+---------------+

and

myDf
  .withColumn(
    "bar",
    when(
      (($"foo" === "c") and ($"bar".isNull)) or
      (($"foo" === "someMore") and ($"bar".isNull)), "someReplacement"
    )
  ).show

which I would really like to use to fill in the values for the different classes / categories of foo, does not work either.

I am curious how to fix this.

Answer:

Use otherwise:

when(
  (($"foo" === "c") and ($"bar".isNull)) or
  (($"foo" === "someMore") and ($"bar".isNull)), "someReplacement"
).otherwise($"bar")
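Applied to the myDf from the question, that could look roughly like this (a minimal sketch; it assumes spark.implicits._ is in scope for the $ syntax):

import org.apache.spark.sql.functions.when

myDf
  .withColumn(
    "bar",
    when(
      (($"foo" === "c") and ($"bar".isNull)) or
      (($"foo" === "someMore") and ($"bar".isNull)), "someReplacement"
    ).otherwise($"bar")  // rows that match no condition keep their original bar
  )
  .show

The otherwise($"bar") branch is what was missing in the question: without it, when returns null for every row that matches no condition, which is why all the other values were reset.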

or coalesce:

coalesce(
  $"bar",  
  when(($"foo" === "c") or ($"foo" === "someMore"), "someReplacement")
)

The advantage of coalesce is less typing: you don't have to repeat the $"bar".isNull check for every category of foo.
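Applied the same way (again just a sketch, under the same assumptions as above):

import org.apache.spark.sql.functions.{coalesce, when}

myDf
  .withColumn(
    "bar",
    coalesce(
      $"bar",  // an existing non-null value always wins
      when(($"foo" === "c") or ($"foo" === "someMore"), "someReplacement")
    )
  )
  .show

coalesce returns the first non-null argument per row, so the when expression only takes effect where bar is null; a null bar whose foo matches none of the listed categories simply stays null.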
