问题描述:

I'm a little confused by what the following in= step does.

Here is the code:

data data1;

merge data2 data3 (in=inb);

by ID;

if inb;

run;

I would really appreciate if someone can tell me what the in=inb here does.

网友答案:
DS_A         DS_B

ID VAR1      ID VAR2
A   X        A   X
B   X        B   X
C   X        D   X

data want;
merge ds_a ds_b;
by id;
run;

will produce this

WANT:

ID VAR1 VAR2
A   X    X
B   X    X
C   X  
D        X

If you add the IN= option you add a temporary and hidden variable that is 1 when the observation is present in that dataset, 0 otherwise, like this:

DS_A         DS_B

ID VAR1      ID VAR2
A   X        A   X
B   X        B   X
C   X        D   X

data want;
merge ds_a (in=frs) ds_b (in=scn);
by id;
run;

WANT:
ID VAR1 VAR2  FRS  SCN
A   X    X     1    1
B   X    X     1    1
C   X          1    0
D        X     0    1

So you can play with this hidden variable to keep observations from one dataset or from both or from only one etc...

if frs; ---> keep ID=A B C
if scn; ---> keep ID=A B D
if frs and scn ---> keep ID=A B
if frs and not scn --> keep ID=C
etc..
网友答案:

One other aspect of the behaviour of the in= option that I don't think anyone else has mentioned - if you merge two different datasets using the same in= variable for both, and a row is in one but not the other, a value of 1 takes precedence over a value of 0. E.g.

data test;
    merge sashelp.class(where = (sex = 'F') in = a) 
          sashelp.class(where = (sex = 'M') in = a);
    by name;          
    put _all_;
run;    

In this case, a = 1 for every row, even though each row is only present in one of the input datasets.

网友答案:

According to SYNTAX section of the merge Statement documentation, the data sets you are merging can have options. In this case you are using IN= Data Set Option. Below is the explanation of this option:

Creates a Boolean variable that indicates whether the data set contributed data to the current observation.

So in this case, you are naming this boolean variable inb.

网友答案:

Thanks all for the help. But what does that do in my particular case?

let's assume the syntax is now something like the following:

data data1;
merge data2 data3 (in=inb);
by ID;
if inb;
drop Var1 Var2;
run; 
网友答案:

because the option (in=inb) is after data3, it is referred to this dataset. hence, you will have a boolean variable that will be 1 in data1 (final dataset) if that observation was present in data3, 0 otherwise.

Data2 Data3
ID       ID
A        A
B        B
C        D

You will have

Data3
ID INB
A  1
B  1
C  0
D  1

Adding the statement if INB; you will keep only observations with INB=1 (observations coming from data3)

Data3
ID
A
B
D
网友答案:

Functionally,

 merge data2 data3 (in=inb);
 by ID;
    if inb;

is the same as a right join in SQL.

Technically, "inb" is a 0/1 flag set to "1" for each record found in data3. "if inb" is shorthand for "if inb is true [then keep the record]", and for numeric fields "true" means greater than zero.

相关阅读:
Top