问题描述:

I'm using Mac 10.9.5, bash shell, and perl 5, version 16, subversion 3 (v5.16.3). I have the following script ...

#!/bin/bash

perl -pi -e "s/([^,]+?),([^,]+?),([^,]+?),([^,]+?),([^,]+?),([^,]+?),([^,]+?),([^,]+?),([^,]+?),([^,]+?),([^,]+?),([^,]+?),([^,]+?),([^,]+?),([^,]+?),([^,]+?),([^,]+?),([^,]+?),([^,]+?),([^,]+?),([^,]+?),([^,]+?),([^,]+?),([^,]+?),([^,]+?),([^,]+?),([^,]+?),([^,]+?)/REPLACE INTO student (ID, SIS_ID, STUDENT_NUM, USER_ID, OTHER_USER_ID) VALUES (REPLACE(uuid(), '-', ''), '\$24', '\$26', '\$2', '\$27');/g" $1

However, when I run the script against a file ...

 sh myscript.sh ~/Downloads/myfile.csv

The above only gets run against the first line of the file and not against every line in the file, despite the fact the file has thousands of lines ...

davea$ wc -l ~/Downloads/myfile.csv

91552 /Users/davea/Downloads/myfile.csv

How do I adjust the above so that it applies the search and replace to every line of the file?

Edit: here is an example of a file that I'm passing in as input

 app.app.first_name,app.app.id,app.app.last_name,app.app.max_time,app.app.url,app.app.user_name,thirdparty.created,thirdparty.district,thirdparty.dob,thirdparty.ell_status,thirdparty.email,thirdparty.frl_status,thirdparty.gender,thirdparty.grade,thirdparty.hispanic_ethnicity,thirdparty.iep_status,thirdparty.last_modified,thirdparty.location.zip,thirdparty.name.first,thirdparty.name.last,thirdparty.name.middle,thirdparty.race,thirdparty.school,thirdparty.sis_id,thirdparty.state_id,thirdparty.student_number,thirdparty.id,matchmaker_result

FirstName,0040FBA053464647BD51141EECF4437F,LastName,2014-09-15 20:46:11,cityunifiedca.springboardonline.org,mlastname,2014-04-04T23:03:29.916Z,51e76ab1d93412f47b000c32,6/12/2000,,,Paid,F,10,Y,Y,2015-08-19T21:33:13.989Z,90033-1803,FIRSTNAME,LASTNAME,A,Caucasian,51f811478a86244d2900033f,061200F010,6124939964,061200F010,533f3a412a1f1fea24c8e164,match

and here is the output of running the above

 REPLACE INTO student (ID, SIS_ID, STUDENT_NUM, USER_ID, OTHER_USER_ID) VALUES (REPLACE(uuid(), '-', ''), 'thirdparty.sis_id', 'thirdparty.student_number', 'app.app.id', 'thirdparty.id');atchmaker_result

FirstName,0040FBA053464647BD51141EECF4437F,LastName,2014-09-15 20:46:11,cityunifiedca.springboardonline.org,mlastname,2014-04-04T23:03:29.916Z,51e76ab1d93412f47b000c32,6/12/2000,,,Paid,F,10,Y,Y,2015-08-19T21:33:13.989Z,90033-1803,FIRSTNAME,LASTNAME,A,Caucasian,51f811478a86244d2900033f,061200F010,6124939964,061200F010,533f3a412a1f1fea24c8e164,match

网友答案:

Provide the path to your input file as the first command line argument.

Note: array indices might be off because I simply took your regex match variables and shifted them down by one (i.e., I didn't test this code).

use strict;
use warnings;

use Text::CSV;

my $csv = Text::CSV->new({ binary => 1 }) or die Text::CSV->error_diag;
open(my $fh, '<', $ARGV[0]) or die $!;

while (my $row = $csv->getline($fh)) {
    print "REPLACE INTO student (ID, SIS_ID, STUDENT_NUM, USER_ID, OTHER_USER_ID) VALUES (REPLACE(uuid(), '-', ''), '$row->[23]', '$row->[25]', '$row->[1]', '$row->[26]');\n";
}

$csv->eof or $csv->error_diag;
close($fh);
网友答案:

Your s/// seems to be only matching the first line. Don't know why. However, that's a ridiculous regular expression. You want to split on commas into a list

perl -F, -lane '
    BEGIN { $t="REPLACE INTO student (ID, SIS_ID, STUDENT_NUM, USER_ID, OTHER_USER_ID) VALUES (REPLACE(uuid(), \047-\047, \047\047), \047%s\047, \047%s\047, \047%s\047, \047%s\047);\n"; }
    printf $t, $F[23], $F[25], $F[1], $F[26];
' file
REPLACE INTO student (ID, SIS_ID, STUDENT_NUM, USER_ID, OTHER_USER_ID) VALUES (REPLACE(uuid(), '-', ''), 'thirdparty.sis_id', 'thirdparty.student_number', 'app.app.id', 'thirdparty.id');
REPLACE INTO student (ID, SIS_ID, STUDENT_NUM, USER_ID, OTHER_USER_ID) VALUES (REPLACE(uuid(), '-', ''), '061200F010', '061200F010', '0040FBA053464647BD51141EECF4437F', '533f3a412a1f1fea24c8e164');
网友答案:

Lets start by fixing your script to be a Perl script, one-liners are for the command line.

#!/usr/bin/perl
# example code from `man perlrun`

use warnings;
use strict;
my $extension = '.orig';
my $oldargv;
my $backup;
my $subre = "([^,]+?)";
my $bigre = "$subre," x 27 . $subre;
my $presub = "REPLACE INTO student (ID, SIS_ID, STUDENT_NUM, USER_ID, OTHER_USER_ID) VALUES (REPLACE(uuid(), '-', '')";
LINE: while (<>) {
if ($ARGV ne $oldargv) {
    if ($extension !~ /\*/) {
      $backup = $ARGV . $extension;
    } else {
      ($backup = $extension) =~ s/\*/$ARGV/g;
    }
    rename($ARGV, $backup);
    open(ARGVOUT, ">$ARGV");
    select(ARGVOUT);
    $oldargv = $ARGV;
}
  s/$bigre/$presub, '\$24', '\$26', '\$2', '\$27');/g;
} continue {
  print;    # this prints to original filename
}
select(STDOUT);

Then, looking at that regex, there's probably lines that contain ,, for an empty field, so... you could fix the regex, but using one is kind of wrong here. Lets just change that one line from the above to this:

  my @f = split /,/;
  $_ = $presub . ", '${f[23]}', '${f[25]}', '${f[1]}', '${f[26]}');"

This assumes that there are no fields which, containing ,, become quoted or escaped fields. For all that you'd use Text::CSV as Matt Jacob showed. I have similar caveats.


Alternatively you can stick with the regex if you must, but remove the g modifier, anchor the lines, and allow for empty captured groups.

s/^([^,]*?),([^,]*?),([^,]*?),([^,]*?),([^,]*?),([^,]*?),([^,]*?),([^,]*?),([^,]*?),([^,]*?),([^,]*?),([^,]*?),([^,]*?),([^,]*?),([^,]*?),([^,]*?),([^,]*?),([^,]*?),([^,]*?),([^,]*?),([^,]*?),([^,]*?),([^,]*?),([^,]*?),([^,]*?),([^,]*?),([^,]*?),([^,]*?)$/REPLACE INTO student (ID, SIS_ID, STUDENT_NUM, USER_ID, OTHER_USER_ID) VALUES (REPLACE(uuid(), '-', ''), '\$24', '\$26', '\$2', '\$27');/;

This doesn't timeout in regex101.com and works when given the flags mg for sample input, if you remove the $ from the substitution's references to the captured fields.

Or modify the first script above changing these lines:

my $subre = "([^,]*?)";
my $bigre = '^' . "$subre," x 27 . $subre . '$';
...
s/$bigre/$presub, '\$24', '\$26', '\$2', '\$27');/;
相关阅读:
Top