
MS SQL Deduplication

I have a table with a couple hundred thousand records, provided by the state nursing registrar, for potential nurse hires. Somehow, 24,000 of the records in the database have been duplicated.

I copied the table and created a new column named duplicated. Using a query with COUNT(*) > 1, I flagged all the rows that were duplicates as True. Once they were flagged, I pulled all the addresses flagged as duplicates out of that table into a new table.
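For reference, the flagging step described above could be sketched roughly as follows. The table names dbo.NurseRegistry_Copy and dbo.NurseDuplicates are hypothetical, the [duplicated] column is assumed to be a BIT, and grouping on [first], [last], [street] is an assumption about which columns define a duplicate.

```sql
-- Hypothetical names: dbo.NurseRegistry_Copy is the copied table,
-- [duplicated] is the BIT column added to it.
UPDATE t
SET    t.[duplicated] = 1
FROM   dbo.NurseRegistry_Copy AS t
JOIN  (
        -- Every name/address combination that appears more than once
        SELECT [first], [last], [street]
        FROM   dbo.NurseRegistry_Copy
        GROUP BY [first], [last], [street]
        HAVING COUNT(*) > 1
      ) AS d
  ON  d.[first]  = t.[first]
  AND d.[last]   = t.[last]
  AND d.[street] = t.[street];

-- Copy the flagged rows into a separate working table (hypothetical name).
SELECT *
INTO   dbo.NurseDuplicates
FROM   dbo.NurseRegistry_Copy
WHERE  [duplicated] = 1;
```

Note that the join compares with =, so rows where any of the three columns is NULL would not be flagged by this sketch.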

I now have a table with all the duplicates. I have been trying to figure out how to select only one of each set of duplicated records. I have tried several variations of

SELECT DISTINCT *

SELECT DISTINCT [col1], [col2], [col3]

and so on. What I am running into is that when I list all the columns after DISTINCT, the duplicate records are not filtered out.

What I would like to do is go through and set the flag to 0 on one record from each group that matches on [first], [last], and [street], so that I can pull those addresses out and INSERT them back into the database.

This is more complicated because some addresses have more than one person. For example, I could have Tom Smith and Jane Smith at the same address, and I need to keep the records for both.
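To make the goal concrete, here is a minimal sketch of one way the flagging might be expressed, using ROW_NUMBER() (available in SQL Server 2005 and later). It assumes a hypothetical working table dbo.NurseDuplicates with columns [first], [last], [street], and a BIT column [duplicated]; none of these names are confirmed, and this is one possible approach rather than a definitive solution.

```sql
-- Hypothetical table: dbo.NurseDuplicates ([first], [last], [street], [duplicated] BIT, ...)
WITH ranked AS (
    SELECT [duplicated],
           ROW_NUMBER() OVER (
               PARTITION BY [first], [last], [street]  -- one group per person per address
               ORDER BY (SELECT NULL)                  -- arbitrary pick within each group
           ) AS rn
    FROM dbo.NurseDuplicates
)
UPDATE ranked
SET    [duplicated] = 0   -- keep exactly one row from each group
WHERE  rn = 1;

-- The rows flagged 0 are the ones to INSERT back into the database.
SELECT [first], [last], [street]
FROM   dbo.NurseDuplicates
WHERE  [duplicated] = 0;
```

Because [first] is part of the PARTITION BY, Tom Smith and Jane Smith at the same street fall into separate groups, so one record for each of them would be kept. If it matters which copy survives, the ORDER BY (SELECT NULL) could be replaced with a real ordering column.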

Any suggestions?

Thanks,

