Find Duplicates From a Table in SQL Server

Summary: in this tutorial, you will learn how to use the GROUP BY clause or ROW_NUMBER() function to find duplicate values in a table.

Technically, you use the UNIQUE constraints to enforce the uniqueness of rows in one or more columns of a table. However, sometimes you may find duplicate values in a table due to the poor database design, application bugs, or uncleaned data from external sources. Your job is to identify these duplicate values in effective ways.

To find the duplicate values in a table, you follow these steps:

First, define criteria for duplicates: values in a single column or multiple columns.
Second, write a query to search for duplicates.

Let’s set up a sample table for the demonstration.

Setting up a sample table

First, create a new table named t1 that contains three columns id, a, and b.

1

2

3

4

5

6

7

DROP TABLE IF EXISTS t1;

CREATE TABLE t1 (

    id INT IDENTITY(1, 1), 

    a  INT, 

    b  INT, 

    PRIMARY KEY(id)

);

Then, insert some rows into the t1 table:

1

2

3

4

5

6

7

8

9

10

11

INSERT INTO

    t1(a,b)

VALUES

    (1,1),

    (1,2),

    (1,3),

    (2,1),

    (1,2),

    (1,3),

    (2,1),

    (2,2);

The t1 table contains the following duplicate rows:

1

2

3

(1,2)

(2,1)

(1,3)

Your goal is to write a query to find the above duplicate rows.

Using `GROUP BY` clause to find duplicates in a table

This statement uses the GROUP BY clause to find the duplicate rows in both a and b columns of the t1 table:

1

2

3

4

5

6

7

8

9

10

SELECT 

    a, 

    b, 

    COUNT(*) occurrences

FROM t1

GROUP BY

    a, 

    b

HAVING 

    COUNT(*) > 1;

Here is the result:

SQL Server Find Duplicates using GROUP BY clause

How it works:

First, the GROUP BY clause groups the rows into groups by values in both a and b columns.
Second, the COUNT() function returns the number of occurrences of each group (a,b).
Third, the HAVING clause keeps only duplicate groups, which are groups that have more than one occurrence.

To return the entire row for each duplicate row, you join the result of the above query with the t1 table using a common table expression (CTE):

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

WITH cte AS (

    SELECT

        a, 

        b, 

        COUNT(*) occurrences

    FROM t1

    GROUP BY 

        a, 

        b

    HAVING 

        COUNT(*) > 1

)

SELECT 

    t1.id, 

    t1.a, 

    t1.b

FROM t1

    INNER JOIN cte ON 

        cte.a = t1.a AND 

        cte.b = t1.b

ORDER BY 

    t1.a, 

    t1.b;

Here is the output:

Generally, the query for finding the duplicate values in one column using the GROUP BY clause is as follows:

1

2

3

4

5

6

7

8

9

SELECT 

    col, 

    COUNT(col)

FROM 

    table_name

GROUP BY 

    col

HAVING 

    COUNT(col) > 1;

The query for finding the duplicate values in multiple columns using the GROUP BY clause :

1

2

3

4

5

6

7

8

9

SELECT 

    col1,col2,...

    COUNT(*)

FROM 

    table_name

GROUP BY 

    col1,col2,...

HAVING 

    COUNT(*) > 1;

Using `ROW_NUMBER()` function to find duplicates in a table

The following statement uses the ROW_NUMBER() function to find duplicate rows based on both a and b columns:

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

WITH cte AS (

    SELECT 

        a, 

        b, 

        ROW_NUMBER() OVER (

            PARTITION BY a,b

            ORDER BY a,b) rownum

    FROM 

        t1

) 

SELECT 

  * 

FROM 

    cte 

WHERE 

    rownum > 1;

Here is the result:

SQL Server Find Duplicates using ROW_NUMBER

How it works:

First, the ROW_NUMBER() distributes rows of the t1 table into partitions by values in the a and b columns. The duplicate rows will have repeated values in the a and b columns, but different row numbers as shown in the following picture:

Second, the outer query removes the first row in each group.

Generally, This statement uses the ROW_NUMBER() function to find the duplicate values in one column of a table:

1

2

3

4

5

6

7

8

9

10

11

WITH cte AS (

    SELECT 

        col,

        ROW_NUMBER() OVER (

            PARTITION BY col

            ORDER BY col) row_num

    FROM 

        t1

) 

SELECT * FROM cte 

WHERE row_num > 1;

In this tutorial, you have learned how to use the GROUP BY clause or ROW_NUMBER() function to find duplicate values in SQL Server.

KANSIRIS

Find Duplicates From a Table in SQL Server

Setting up a sample table

Using `GROUP BY` clause to find duplicates in a table

Using `ROW_NUMBER()` function to find duplicates in a table

0 comments:

Post a Comment

Complete Tutorials

Popular Posts

Labels

KANSIRIS

Find Duplicates From a Table in SQL Server

Setting up a sample table

Using GROUP BY clause to find duplicates in a table

Using ROW_NUMBER() function to find duplicates in a table

0 comments:

Post a Comment

Complete Tutorials

Popular Posts

Labels

Using `GROUP BY` clause to find duplicates in a table

Using `ROW_NUMBER()` function to find duplicates in a table