How To Delete Duplicate Rows in PostgreSQL
Summary: in this tutorial, you will learn how to use various techniques to delete duplicate rows in PostgreSQL.
Preparing sample data
First, create a new table named basket
that stores fruits:
Second, insert some fruits into the basket
table.
Third, query data from the basket
table:
Output:
The output indicates some duplicate rows such as 2 apples and 3 oranges in the basket
table.
Finding duplicate rows
If the table has few rows, you can easily see which ones are duplicates immediately. However, this is not the case with a table that has lots of rows.
The find the duplicate rows, you use the following statement:
Output:
Deleting duplicate rows using DELETE USING statement
The following statement uses the DELETE USING
statement to remove duplicate rows:
In this example, we joined the basket
table to itself and checked if two different rows (a.id < b.id) have the same value in the fruit
column.
The following query retrieves data from the basket
table to verify the duplication removal:
Output:
The output indicates that the statement removes the duplicate rows with the lowest IDs and keeps the one with the highest id.
If you want to keep the duplicate rows with the lowest id, you need to flip the operator in the WHERE
clause:
To check whether the statement works correctly, let’s verify the data in the basket
table:
Output:
The output indicates that duplicate rows with the lowest ids are retained.
Deleting duplicate rows using subquery
The following statement uses a subquery to delete duplicate rows and keep the row with the lowest id.
In this example, the subquery returned the duplicate rows except for the first row in the duplicate group. The outer DELETE
statement deleted the duplicate rows returned by the subquery.
If you want to keep the duplicate row with the highest ID, just change the order in the subquery:
In case you want to delete duplicates based on values of multiple columns, here is the query template:
In this case, the statement will delete all rows with duplicate values in the column_1
and column_2
columns.
Deleting duplicate rows using an immediate table
To delete rows using an immediate table, you use the following steps:
- Create a new table with the same structure as the one whose duplicate rows should be removed.
- Insert distinct rows from the source table to the immediate table.
- Drop the source table.
- Rename the immediate table to the name of the source table.
The following illustrates the steps for removing duplicate rows from the basket
table:
In this tutorial, you have learned how to delete duplicate rows in PostgreSQL using the DELETE USING
statement, subquery, and the immediate table techniques.