Delete duplicate records from a table

If the use enters duplicate records in a table, a delete statement with a where clause will delete all the records that satisfy the where condition specified in the delete statement.

In that case delete statement is executed as:

DELETE FROM tableName WHERE columnName IN (value1,value2,value3)

Here all the records with value1, value2, value3 will be deleted.

What is required is that we have to retain one record and delete all other duplicate records. To retain one records, the where clause must be defined on the column that uniquely identified the record.

Lets do it  by example.

Suppose we have a table called tblDocs with 3 columns


CREATE TABLE tblDocs(
 id INT IDENTITY(1, 1)
,docId      INT
,orgId       INT
)

Here we insert some values in tblDocs


INSERT INTO tblDocs VALUES (1, 1)
INSERT INTO tblDocs VALUES (1, 2)
INSERT INTO tblDocs VALUES (1, 3)
INSERT INTO tblDocs VALUES (2, 1)
INSERT INTO tblDocs VALUES (3, 2)
INSERT INTO tblDocs VALUES (2, 2)
INSERT INTO tblDocs VALUES (1, 1)    --Duplicate record
INSERT INTO tblDocs VALUES (1, 3)    --Duplicate record
INSERT INTO tblDocs VALUES (4, 3)

Now we have result area of tblDocs as below












Now here if we see then record with id 7 & 8 are duplicate with id 1 & 3. Now suppose we have a requirement where we need to put the unique index then we need to delete duplicate record. Here we can either keep id 1 & 3 or id 7 & 8, depends upon MIN or MAX in the below query. We can achieve the requirement with the help of below query:

DELETE FROM tblDocs
WHERE id NOT IN(
SELECT MIN(id) FROM tblDocs GROUP BY docId, orgId
)

There might be other way to fulfill this requirement. This one I used during my project. Comments and other solutions are always welcome.


No comments:

Post a Comment