问题描述:

I have the following actual data:

actual.code actual.distance

a 1

c 6

e 10

And also the following interpolated data for all observations:

interpolated.code interpolated.distance

a 2

b 4

c 7

d 4

e 9

f 1

And I want the following output:

 output.code output.distance

a 1

b 4

c 6

d 4

e 10

f 1

So for any rows where I don't have the actual data, I want to use the interpolated data. I tried an inner join but that didn't seem to work, as I want to keep all observations from the first dataframe, and only use data from the second if necessary.

网友答案:

A long version is to use merge while keeping the indicator in the result, and then then replace the interpolated.distance with actual.distance based on the indicator:

df_merge = df1.merge(df2, left_on="actual.code", right_on="interpolated.code", how="right", indicator=True)
df_merge["interpolated.distance"] = df_merge["interpolated.distance"].where(df_merge["_merge"] != "both", df_merge["actual.distance"]) 
df_merge.drop(["actual.code", "actual.distance", "_merge"], axis=1).sort_values("interpolated.code")

# interpolated.code interpolated.distance
#0          a                  1.0
#3          b                  4.0
#1          c                  6.0
#4          d                  4.0
#2          e                 10.0
#5          f                  1.0

Or another option, you can use combine_first() with df1 as the calling object which will be prioritized when the values are picked up, Basically set the codes from both data frames as index, and combine the distance with actual.distance as the calling object, this will prioritize the values from actual.distance if the codes match:

(df1.set_index("actual.code")["actual.distance"]
 .combine_first(df2.set_index("interpolated.code")["interpolated.distance"])
 .rename_axis("output.code").rename("output.distance").reset_index())

#output.code    output.distance
#0     a              1.0
#1     b              4.0
#2     c              6.0
#3     d              4.0
#4     e             10.0
#5     f              1.0
相关阅读:
Top