问题描述:

This is a followup to my first question "Porting “SQL” export to T-SQL".

I am working with a 3rd party program that I have no control over and I can not change. This program will export it's internal database in to a set of .sql each one with a format of:

INSERT INTO [ExampleDB] ( [IntField] , [VarcharField], [BinaryField])

VALUES

(1 , 'Some Text' , 0x123456),

(2 , 'B' , NULL),

--(SNIP, it does this for 1000 records)

(999, 'E' , null);

(1000 , 'F' , null);

INSERT INTO [ExampleDB] ( [IntField] , [VarcharField] , BinaryField)

VALUES

(1001 , 'asdg', null),

(1002 , 'asdf' , 0xdeadbeef),

(1003 , 'dfghdfhg' , null),

(1004 , 'sfdhsdhdshd' , null),

--(SNIP 1000 more lines)

This pattern continues till the .sql file has reached a file size set during the export, the export files are grouped by EXPORT_PATH\%Table_Name%\Export#.sql Where the # is a counter starting at 1.

Currently I have about 1.3GB data and I have it exporting in 1MB chunks (1407 files across 26 tables, All but 5 tables only have one file, the largest table has 207 files).

Right now I just have a simple C# program that reads each file in to ram then calls ExecuteNonQuery. The issue is I am averaging 60 sec/file which means it will take about 23 hrs for it to do the entire export.

I assume if I some how could format the files to be loaded with a BULK INSERT instead of a INSERT INTO it could go much faster. Is there any easy way to do this or do I have to write some kind of Find & Replace and keep my fingers crossed that it does not fail on some corner case and blow up my data.

Any other suggestions on how to speed up the insert into would also be appreciated.


UPDATE:

I ended up going with the parse and do a SqlBulkCopy method. It went from 1 file/min. to 1 file/sec.

网友答案:

Well, here is my "solution" for helping convert the data into a DataTable or otherwise (run it in LINQPad):

var i = "(null, 1 , 'Some''\n Text' , 0x123.456)";
var pat = @",?\s*(?:(?<n>null)|(?<w>[\w.]+)|'(?<s>.*)'(?!'))";
Regex.Matches(i, pat,
      RegexOptions.IgnoreCase | RegexOptions.Singleline).Dump();

The match should be run once per value group (e.g. (a,b,etc)). Parsing of the results (e.g. conversion) is left to the caller and I have not tested it [much]. I would recommend creating the correctly-typed DataTable first -- although it may be possible to pass everything "as a string" to the database? -- and then use the information in the columns to help with the extraction process (possibly using type converters). For the captures: n is null, w is word (e.g. number), s is string.

Happy coding.

网友答案:

Apparently your data is always wrapped in parentheses and starts with a left parenthesis. You might want to use this rule to split(RemoveEmptyEntries) each of those lines and load it into a DataTable. Then you can use SqlBulkCopy to copy all at once into the database.

This approach would not necessarily be fail-safe, but it would be certainly faster.

Edit: Here's the way how you could get the schema for every table:

private static DataTable extractSchemaTable(IEnumerable<String> lines)
{
    DataTable schema = null;
    var insertLine = lines.SkipWhile(l => !l.StartsWith("INSERT INTO [")).Take(1).First();
    var startIndex = insertLine.IndexOf("INSERT INTO [") + "INSERT INTO [".Length;
    var endIndex = insertLine.IndexOf("]", startIndex);
    var tableName = insertLine.Substring(startIndex, endIndex - startIndex);
    using (var con = new SqlConnection("CONNECTION"))
    {
        using (var schemaCommand = new SqlCommand("SELECT * FROM " tableName, con))
        {
            con.Open();
            using (var reader = schemaCommand.ExecuteReader(CommandBehavior.SchemaOnly))
            {
                schema = reader.GetSchemaTable();
            }
        }
    }
    return schema;
}

Then you simply need to iterate each line in the file, check if it starts with ( and split that line by Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries). Then you could add the resulting array into the created schema-table.

Something like this:

var allLines = System.IO.File.ReadAllLines(path);
DataTable result = extractSchemaTable(allLines);
for (int i = 0; i < allLines.Length; i++)
{
    String line = allLines[i];
    if (line.StartsWith("("))
    {
        String data = line.Substring(1, line.Length - (line.Length - line.LastIndexOf(")")) - 1);
        var fields = data.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries);
        // you might need to parse it to correct DataColumn.DataType
        result.Rows.Add(fields);
    }
}
相关阅读:
Top