问题描述:

I have two externally hosted third-party .txt files that are updated on an irregular basis by someone other than myself. I have written a script that pulls this information in, manipulates it, and creates a merged array of data suitable for use in a database. I'm not looking for exact code but rather a description of a good process that will work efficiently in inserting a new row from this array if it doesn't already exist, updating a row in the table if any values have changed, or deleting a row in the table if it no longer exists in the array of data.

The data is rather simple and has the following structure:

map (string) | route (string) | time (decimal) | player (string) | country (string)

where a map and route combination must be unique.

Is there any way to do all needed actions without having to loop through all of the external data and all of the data from the table in my database? If not, what would be the most efficient method?

Below is what I have written. It takes care of all but the delete part:

require_once('includes/db.php');

require_once('includes/helpers.php');

$data = array_merge(

custom_parse_func('http://example1.com/ex.txt'),

custom_parse_func('http://example2.com/ex.txt')

);

try {

$dsn = "mysql:host=$dbhost;dbname=mydb";

$dbh = new PDO($dsn, $dbuser, $dbpass);

$dbh->setAttribute(PDO::ATTR_EMULATE_PREPARES, false);

$dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

foreach ($data as $value) {

$s = $dbh->prepare('INSERT INTO table SET map=:map, route=:route, time=:time, player=:player, country=:country ON DUPLICATE KEY UPDATE map=:map2, route=:route2, time=:time2, player=:player2, country=:country2');

$s->execute(array(

':map' => $value['map'],

':route' => $value['route'],

':time' => $value['time'],

':player' => $value['player'],

':country' => $value['country'],

':map2' => $value['map'],

':route2' => $value['route'],

':time2' => $value['time'],

':player2' => $value['player'],

':country2' => $value['country']

));

}

} catch(PDOException $e) {

echo $e;

}

网友答案:

You mention that you're using MySQL, which has a handy INSERT ... ON DUPLICATE KEY UPDATE ... statement (documentation here). You will have to iterate over your collection of data (but not the existing table). I would handle it a little differently than @Tim B does...

  1. create a temporary table to hold the new data.

  2. loop through your new data and insert it into the new table

  3. run an INSERT ... ON DUPLICATE KEY UPDATE ... statement inserting from the temporary table into the existing table - that takes care of both inserting new records and updated changed records.

  4. run a DELETE FROM [existing table] t1 LEFT JOIN [temporary table] t2 ON [whatever key(s) you have] WHERE t2.id IS NULL - this will delete everything from the existing table that does not appear in the temporary table.

The nice thing about temporary tables is that they are automatically dropped when the connection closes (as well has having some other nice features like being invisible to other connections).

The other nice thing about this method is that you can do some (or all) of your data manipulation in the database after you insert it into a table in step 1. It is often faster and simpler to do this kind of thing through SQL instead of looping through and changing values in your array.

网友答案:

The simplest way would be to truncate the table and then insert all the values. This will handle all of your requirements.

Assuming that is not viable though then you need to remember which rows have been modified, that can be done using a flag, a version number, or a timestamp. For example:

  • Update the table, set the "updated" flag to 0 on every row

  • Loop through doing an upsert for every item (http://dev.mysql.com/doc/refman/5.6/en/insert-on-duplicate.html). Set the flag to 1 in each upsert.

  • Delete every entry from the database with the flag set to 0.

相关阅读:
Top