I am completely new with R, and I tried googling a representative solution for my problem for some time, but haven't found an adequate answer so far, so I hope that asking for help might solve this one here.

I should merge two different size data sets (other includes annual data: df_f, and other monthly data: df_m). I should merge the smaller df_f to the larger df_m in a way that rows of df_f are merged conditionally with df_m.

Here is a descriptive example of my problem (with some very basic reproducible numbers):

# first dataset

``a <- c(1990)b <- c(1980:1981)c <- c(1994:1995)aa <- rep("A", 1)bb <- rep("B", 2)cc <- rep("C", 2)df1 <- data.frame(comp=factor(c(aa, bb, cc)))df2 <- data.frame(year=factor(c(a, b, c)))other.columns <- rep("other_columns", length(df1))df_f <- cbind(df1, df2, other.columns ) # first dataset``

# second dataset

``z <- c(10:12)x <- c(7:12)xx <- c(1:9)v <- c(2:9)w <- rep(1990, length(z))e <- rep(1980, length(x))ee <- rep (1981, length(xx))r <- rep(1995, length(v))t <- rep("A", length(z))y <- rep("B", length(x) + length(xx))u <- rep("C", length(v))df3 <- data.frame(month=factor(c(z, x, xx, v)))df4 <- data.frame(year=factor(c(w, e, ee, r)))df5 <- data.frame(comp=factor(c(t, y, u)))df_m <- cbind(df5, df4, df3) # second dataset``

# Output:

``> df_mcomp year month1 A 1990 102 A 1990 113 A 1990 124 B 1980 75 B 1980 86 B 1980 97 B 1980 108 B 1980 119 B 1980 1210 B 1981 111 B 1981 212 B 1981 313 B 1981 414 B 1981 515 B 1981 616 B 1981 717 B 1981 818 B 1981 919 C 1995 220 C 1995 321 C 1995 422 C 1995 523 C 1995 624 C 1995 725 C 1995 826 C 1995 9> df_fcomp year other.columns1 A 1990 other_columns2 B 1980 other_columns3 B 1981 other_columns4 C 1994 other_columns5 C 1995 other_columns``

I want to have the rows from df_f placed to df_m (store the data from df_f to new columns in df_m) according to the conditions comp, year, and month. Comp (company) needs to match always, but matching the year is conditional to month: if month is >6 then year is matched between datasets, if month is <7 then year + 1 (in df_m) is matched with year (in df_f). Note that a certain row in df_f should be placed into several rows in df_m according to the conditions.

The wanted output clarifies the problem and the goal:

# Wanted output:

`` comp year month comp year other.columns1 A 1990 10 A 1990 other_columns2 A 1990 11 A 1990 other_columns3 A 1990 12 A 1990 other_columns4 B 1980 7 B 1980 other_columns5 B 1980 8 B 1980 other_columns6 B 1980 9 B 1980 other_columns7 B 1980 10 B 1980 other_columns8 B 1980 11 B 1980 other_columns9 B 1980 12 B 1980 other_columns10 B 1981 1 B 1980 other_columns11 B 1981 2 B 1980 other_columns12 B 1981 3 B 1980 other_columns13 B 1981 4 B 1980 other_columns14 B 1981 5 B 1980 other_columns15 B 1981 6 B 1980 other_columns16 B 1981 7 B 1981 other_columns17 B 1981 8 B 1981 other_columns18 B 1981 9 B 1981 other_columns19 C 1995 2 C 1994 other_columns20 C 1995 3 C 1994 other_columns21 C 1995 4 C 1994 other_columns22 C 1995 5 C 1994 other_columns23 C 1995 6 C 1994 other_columns24 C 1995 7 C 1995 other_columns25 C 1995 8 C 1995 other_columns26 C 1995 9 C 1995 other_columns``

Thank you very much in advance! I hope the question is clear enough, it was somewhat difficult to explain it at least.

The basic idea to solve your problem is to add an extra column with the year that should be used for matching. I will use the package `dpylr` for this and other manipulation steps.

Before the tables can be combined, the numeric columns must be converted to be numeric:

``````library(dplyr)
df_m <- mutate(df_m, year = as.numeric(as.character(year)),
month = as.numeric(as.character(month)))
df_f <- mutate(df_f, year = as.numeric(as.character(year)))
``````

The reason is that you want to be able to do numerical comparison with the month (`month > 6`) and subtract one from the year. You cannot do this with a factor.

Then I add the column to be used for matching:

``````df_m <- mutate(df_m, match_year = ifelse(month >= 7, year, year - 1))
``````

And in the last step, I join the two tables:

``````df_new <- left_join(df_m, df_f, by = c("comp", "match_year" = "year"))
``````

The argument `by` determines which columns of the two data frames should be matched. The output agrees with your result:

``````##    comp year month match_year other.columns
## 1     A 1990    10       1990 other_columns
## 2     A 1990    11       1990 other_columns
## 3     A 1990    12       1990 other_columns
## 4     B 1980     7       1980 other_columns
## 5     B 1980     8       1980 other_columns
## 6     B 1980     9       1980 other_columns
## 7     B 1980    10       1980 other_columns
## 8     B 1980    11       1980 other_columns
## 9     B 1980    12       1980 other_columns
## 10    B 1981     1       1980 other_columns
## 11    B 1981     2       1980 other_columns
## 12    B 1981     3       1980 other_columns
## 13    B 1981     4       1980 other_columns
## 14    B 1981     5       1980 other_columns
## 15    B 1981     6       1980 other_columns
## 16    B 1981     7       1981 other_columns
## 17    B 1981     8       1981 other_columns
## 18    B 1981     9       1981 other_columns
## 19    C 1995     2       1994 other_columns
## 20    C 1995     3       1994 other_columns
## 21    C 1995     4       1994 other_columns
## 22    C 1995     5       1994 other_columns
## 23    C 1995     6       1994 other_columns
## 24    C 1995     7       1995 other_columns
## 25    C 1995     8       1995 other_columns
## 26    C 1995     9       1995 other_columns
``````

Top