I have a table with around 1million rows.
In the fileID column there has been some accidental duplication.
This column can have 7 or 8 digit numbers stored as a string.
I need to search for all 7-digit values that have a corresponding entry that is the same seven digits with a '0' (zero) prefixing it.
So, while there should only be a row with value '1234567' in fileID, there is now a new row with '01234567' that has been entered in error. I believe around 10,000 rows have been affected, and the incorrect entries need to be removed.
How would I write a SQL query to select only rows that have a corresponding 0-prefixed row, and return both rows in the results?
Some valid data has already been entered into the records with incorrect fileID, so the output I need would ideally have a row with a 7-digit fileID followed by any corresponding rows that have an eight digit fileID with leading '0'.
fileID other1 other2
1234567 xxxxxx xxxxxxx
01234567 xxxxxx xxxxxxx
1234566 xxxxxx xxxxxxx
01234566 xxxxxx xxxxxxx
You could group outputs by the last 7 characters like so:
SELECT RIGHT(fileID ,7) AS FileId, count(*) RecordCount FROM [YOUR_TABLE] GROUP BY RIGHT(fileID ,7) HAVING COUNT(*) > 1
This will highlight duplicate records based on the last 7 characters. You could use this result in a temp table and then filter that to highlight the rows that start with
So first insert in to a temp table:
SELECT RIGHT(fileID ,7) AS FileId, count(*) RecordCount INTO #temp FROM [YOUR_TABLE] GROUP BY RIGHT(fileID ,7) HAVING COUNT(*) > 1
Then output those starting with
0 with 8 characters:
SELECT * FROM #temp WHERE LEFT (FileId,1) = '0' and LEN(FileId) = 8
Based on your description in your question, I think the
like '0%' statement will just do fine. However, like you said, you may delete way too many records, so my other suggestion is to check for the length.
Since you mentioned the fileID should be only 7 digit, so you can check for condition if the length is more than 7 digit:
Select fileID from SomeTable where fileID like '0%' and len(fileID) > 7
This should produce the required output:
SELECT * FROM table WHERE CAST(fileID AS NUMERIC(18,0)) IN( SELECT cast(fileID AS NUMERIC(18,0)) FROM table group BY CAST(fileID AS NUMERIC(18,0)) HAVING COUNT(*)>1)
SELECT distinct T1.* FROM TABLE AS T1 RIGHT JOIN TABLE T2 ON T1.fileID like '%0'+T2.fileID