Handling duplicate data in MySQL

A MySQL table may contain duplicate records. Sometimes duplicates are acceptable, but in other cases they need to be prevented or removed.

This chapter shows how to prevent duplicate data from entering a table and how to delete duplicates that are already there.


Preventing duplicate data in a table

To guarantee uniqueness, define the relevant columns as a PRIMARY KEY or give them a UNIQUE index.

Consider the following example. The table has no indexes and no primary key, so it accepts any number of duplicate records.

CREATE TABLE person_tbl
(
    first_name CHAR(20),
    last_name CHAR(20),
    sex CHAR(10)
);

To make sure that the combination of first_name and last_name cannot repeat, you can define a composite primary key over both columns. Primary key columns cannot contain NULL, so declare them NOT NULL, as follows:

CREATE TABLE person_tbl
(
   first_name CHAR(20) NOT NULL,
   last_name CHAR(20) NOT NULL,
   sex CHAR(10),
   PRIMARY KEY (last_name, first_name)
);

Once a primary key or unique index is in place, an INSERT that would create a duplicate fails and MySQL returns an error.
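
As a minimal sketch of that failure, assume person_tbl was created with PRIMARY KEY (last_name, first_name) as shown above and does not yet contain this row. The value 'M' for sex is only an illustrative placeholder:

```sql
-- Assumes person_tbl has PRIMARY KEY (last_name, first_name)
-- and does not yet contain a row for ('Jay', 'Thomas').
INSERT INTO person_tbl (last_name, first_name, sex)
VALUES ('Jay', 'Thomas', 'M');   -- succeeds

INSERT INTO person_tbl (last_name, first_name, sex)
VALUES ('Jay', 'Thomas', 'M');   -- rejected with error 1062 (duplicate entry)
```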

INSERT IGNORE INTO differs from INSERT INTO in that it skips rows that would duplicate an existing key. If no matching row exists, the new row is inserted; if one already exists, the new row is discarded. Existing data is left untouched, and only the missing rows are filled in.

The following example uses INSERT IGNORE INTO. Neither statement raises an error, and the duplicate row is not inserted a second time:

mysql> INSERT IGNORE INTO person_tbl (last_name, first_name)
    -> VALUES( 'Jay', 'Thomas');
Query OK, 1 row affected (0.00 sec)
mysql> INSERT IGNORE INTO person_tbl (last_name, first_name)
    -> VALUES( 'Jay', 'Thomas');
Query OK, 0 rows affected (0.00 sec)

When the table has a primary key or unique index, INSERT IGNORE INTO does not return an error for a duplicate row; it reports a warning instead. REPLACE INTO behaves differently: if a row with the same primary or unique key already exists, that row is deleted first and the new row is then inserted.

Another way to enforce uniqueness is to add a UNIQUE index, as follows:
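
The difference between the two statements can be sketched as follows. This assumes person_tbl has a primary key or unique index on (last_name, first_name) and already holds a row for ('Jay', 'Thomas'); the sex values are illustrative placeholders:

```sql
-- The existing row is kept; the new row is discarded with a warning.
INSERT IGNORE INTO person_tbl (last_name, first_name, sex)
VALUES ('Jay', 'Thomas', 'M');
SHOW WARNINGS;   -- lists the duplicate-key warning raised by the statement above

-- The existing row is deleted and the new row takes its place.
-- The reported affected-rows count is the sum of rows deleted and inserted.
REPLACE INTO person_tbl (last_name, first_name, sex)
VALUES ('Jay', 'Thomas', 'F');
```

Because REPLACE INTO deletes the old row, any column not listed in the statement falls back to its default value rather than keeping the old one.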

CREATE TABLE person_tbl
(
   first_name CHAR(20) NOT NULL,
   last_name CHAR(20) NOT NULL,
   sex CHAR(10),
   UNIQUE (last_name, first_name)
);
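
If the table already exists, the same constraint can be added afterwards. A minimal sketch, where uq_person_name is a hypothetical index name chosen for this example:

```sql
-- uq_person_name is a hypothetical index name.
-- The statement fails if the table already contains duplicate
-- (last_name, first_name) pairs, so remove those first.
ALTER TABLE person_tbl
  ADD UNIQUE INDEX uq_person_name (last_name, first_name);
```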

Counting duplicate data

The following query counts how many times each combination of first_name and last_name is repeated in the table:

mysql> SELECT COUNT(*) as repetitions, last_name, first_name
    -> FROM person_tbl
    -> GROUP BY last_name, first_name
    -> HAVING repetitions > 1;

The query returns every name combination that occurs more than once in person_tbl, together with its number of occurrences. In general, to find duplicate values, do the following:

  • Determine which columns may contain repeated values.
  • List those columns in the SELECT list, together with COUNT(*).
  • List the same columns in the GROUP BY clause.
  • Add a HAVING clause that keeps only groups whose count is greater than 1.
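
The same steps can be extended to list the complete duplicate rows rather than only their counts. The following is a sketch that joins the table against its own grouped result; the aliases p and dup are arbitrary names used for this example:

```sql
-- Show every row whose (last_name, first_name) pair occurs more than once.
SELECT p.last_name, p.first_name, p.sex
FROM person_tbl AS p
JOIN (
    SELECT last_name, first_name
    FROM person_tbl
    GROUP BY last_name, first_name
    HAVING COUNT(*) > 1
) AS dup
  ON dup.last_name = p.last_name
 AND dup.first_name = p.first_name
ORDER BY p.last_name, p.first_name;
```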

Filtering duplicate data

To read only unique rows, use the DISTINCT keyword in the SELECT statement:

mysql> SELECT DISTINCT last_name, first_name
    -> FROM person_tbl
    -> ORDER BY last_name;

You can also use GROUP BY to return each combination only once:

mysql> SELECT last_name, first_name
    -> FROM person_tbl
    -> GROUP BY last_name, first_name;
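
To see how many rows such a filter would drop, you can compare the total row count with the number of distinct name pairs. A small sketch (the column aliases are illustrative):

```sql
-- total_rows minus unique_names is the number of surplus duplicate rows.
-- COUNT(DISTINCT ...) skips rows in which any of the listed columns is NULL.
SELECT COUNT(*) AS total_rows,
       COUNT(DISTINCT last_name, first_name) AS unique_names
FROM person_tbl;
```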

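Deleting duplicate data

To delete duplicate data from a table, you can copy one row per name combination into a new table, drop the original, and rename the copy. The sex column is wrapped in MIN() so that the query is also valid under the ONLY_FULL_GROUP_BY SQL mode, which is enabled by default since MySQL 5.7:

mysql> CREATE TABLE tmp SELECT last_name, first_name, MIN(sex) AS sex
    ->                  FROM person_tbl
    ->                  GROUP BY last_name, first_name;
mysql> DROP TABLE person_tbl;
mysql> ALTER TABLE tmp RENAME TO person_tbl;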
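
A table created with CREATE TABLE ... SELECT does not inherit the indexes of the original, and dropping the original before renaming leaves a short window in which person_tbl does not exist. A possible variation, sketched here with the hypothetical table names person_tbl_dedup and person_tbl_old, copies the definition with LIKE and swaps both tables in a single RENAME TABLE statement:

```sql
-- Copy the column and index definitions of the original table.
CREATE TABLE person_tbl_dedup LIKE person_tbl;

-- Keep one row per (last_name, first_name) pair.
INSERT INTO person_tbl_dedup (last_name, first_name, sex)
SELECT last_name, first_name, MIN(sex)
FROM person_tbl
GROUP BY last_name, first_name;

-- Swap the tables in one statement; the old data stays available as a backup.
RENAME TABLE person_tbl TO person_tbl_old,
             person_tbl_dedup TO person_tbl;

-- Once the result has been checked, the backup can be removed:
-- DROP TABLE person_tbl_old;
```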

You can also remove duplicate records by adding a UNIQUE index or PRIMARY KEY to the table with ALTER IGNORE TABLE. Note that the IGNORE clause of ALTER TABLE was removed in MySQL 5.7.4, so this statement works only on earlier versions. Here's how:

mysql> ALTER IGNORE TABLE person_tbl
    -> ADD PRIMARY KEY (last_name, first_name);
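
On servers where ALTER IGNORE TABLE is not available (the IGNORE clause was removed in MySQL 5.7.4), a similar effect can be approximated with INSERT IGNORE into a keyed copy of the table. This is a sketch under assumptions: person_tbl_keyed and person_tbl_old are hypothetical names, and the name columns are assumed to contain no NULL values:

```sql
-- Empty copy of the table, with the primary key added up front.
CREATE TABLE person_tbl_keyed LIKE person_tbl;
ALTER TABLE person_tbl_keyed
  ADD PRIMARY KEY (last_name, first_name);

-- Rows that would violate the primary key are skipped with a warning.
-- Which of several duplicate rows is kept is not specified.
INSERT IGNORE INTO person_tbl_keyed (first_name, last_name, sex)
SELECT first_name, last_name, sex
FROM person_tbl;

-- Put the deduplicated table in place and keep the original as a backup.
RENAME TABLE person_tbl TO person_tbl_old,
             person_tbl_keyed TO person_tbl;
```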