Question:

I have two tab-delimited CSV files (with headers) that I need to merge in Python.

In the merged file I also want to add a column at the end that identifies which file each row came from. The files have the same format, but they contain different data that I need to separate later on.

So I want to add a column called 'source' to each line of output, which is 0 for file1 and 1 for file2.

I have got as far as using the csv module, but writerow adds an additional newline character between each line it writes, and this code doesn't write anything from file2. What am I doing wrong here? Also, how do I add the extra 'source' column to the line object?

import os, csv

path1 = os.path.abspath("../data/file1.txt")
path2 = os.path.abspath("../data/file2.txt")
merged_path = os.path.abspath('../data/output.txt')

# merge the two files for further processing
merged_file = csv.writer(open(merged_path, 'a'), delimiter = '\t')

#file1
fg = csv.reader(open(path1, 'r'), delimiter = '\t')
for line in fg:
    if line[7] != '\N':
        merged_file.writerow(line)

#file2
bg = csv.reader(open(path2, 'r'), delimiter = '\t')
for line in bg:
    if line[16] != '\N':
        merged_file.writerow(line)

Answer:

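I prefer to use csv.DictWriter for this. As for the extra newline: in Python 2 the csv module expects files to be opened in binary mode ('rb', 'ab'). In text mode on Windows, the writer's '\r\n' line terminator is turned into '\r\r\n', which shows up as a blank line between rows. (In Python 3, open the files in text mode with newline='' instead.)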

import os, csv

path1 = os.path.abspath("../data/file1.txt")
path2 = os.path.abspath("../data/file2.txt")
merged_path = os.path.abspath('../data/output.txt')

#file1
fg = csv.DictReader(open(path1, 'rb'), delimiter = '\t')

fieldnames = fg.fieldnames
fieldnames.append('source')
# merge the two files for further processing
merged_file = csv.DictWriter(open(merged_path, 'ab'), delimiter = '\t', fieldnames=fieldnames)
merged_file.writeheader()

for row in fg:
    row['source'] = os.path.basename(path1)
    merged_file.writerow(row)

#file2
bg = csv.DictReader(open(path2, 'rb'), delimiter = '\t')

for row in bg:
    row['source'] = os.path.basename(path2)
    merged_file.writerow(row)
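
The code above is written for Python 2 and tags rows with the file name. For anyone on Python 3 who wants the 0/1 flag asked for in the question, here is a rough sketch of the same DictReader/DictWriter approach. The function name merge_with_source and its parameters are my own and purely illustrative, and it assumes every input file is non-empty and has the same header as the first one.

```python
import csv


def merge_with_source(paths, merged_path, delimiter="\t"):
    """Merge delimited files that share a header, adding a 'source' column.

    'source' is the position of the file in `paths` (0 for the first file,
    1 for the second, and so on). Hypothetical helper, not part of csv.
    """
    writer = None
    # newline="" lets the csv module control line endings itself, which
    # avoids the blank line between rows on Windows (Python 3 only).
    # Mode "w" is used so that re-running does not append a second header.
    with open(merged_path, "w", newline="") as out:
        for source, path in enumerate(paths):
            with open(path, newline="") as f:
                reader = csv.DictReader(f, delimiter=delimiter)
                if writer is None:
                    # Take the header from the first file and extend it.
                    fieldnames = list(reader.fieldnames) + ["source"]
                    writer = csv.DictWriter(
                        out, fieldnames=fieldnames, delimiter=delimiter
                    )
                    writer.writeheader()
                for row in reader:
                    row["source"] = source
                    writer.writerow(row)
```

With the paths from the question it would be called as merge_with_source([path1, path2], merged_path). If the files' headers differ, DictWriter raises a ValueError for the unexpected column, so that case would need extra handling.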