Can a database table have multiple rows with a NULL value in a column if there is a unique index on it? That seems like a question that we should have known the answer to from a basic RDBMS course from our studies. But for some reasons, some developers will say NO and some will strongly defend the YES option. Which group are you in?
If you thought yes, multiple NULLs are fine, you won. However, if you voted for no way!, you won too. Do you see what I am doing? It simply depends on a database engine. MySQL allows multiple NULL values in a column with a unique index on. But do not think that is true for all database engines. For example, SQL Server will strongly protest if you try to insert the second NULL.
What the standard says
SQL has its own standard. It is published by the ISO committee. The latest version is SQL:2016. It defines a NULL value as a special value that is used to indicate the absence of any data value.
- 3.1.1.13 null value
- special value that is used to indicate the absence of any data value
The problems emerge when we dive deep into the details. The standard does not define whether multiple NULL values are taken into account for a unique constraint or not. Honestly, even if it clearly indicated the behavior, it would not automatically make all the various database engines to follow the principle. There are various levels of ISO SQL standard adoption between the major RDBMS vendors.
If we take a look at the MySQL compatibility to the ISO standard, we will notice that they do not clearly specify to which version MySQL is compatible to. They are more like inspired by the standard than devoting themselves to be fully compliant. And that is not an exception. It is a similar case for most major database engines.
To be precise, the SQL standard does not define what should happen when inserting multiple NULL values into a column with a unique index.
What is an index
All major database engines use unique indexes to enforce values uniqueness. That makes absolute sense. Imagine a table like this:
ID | NAME | YEAR_OF_BIRTH |
---|---|---|
1 | Alice | 1980 |
2 | Greg | null |
3 | Natalie | 1990 |
Assume that there is a unique constraint on the YEAR_OF_BIRTH column.
It has three rows at the moment, but the following insert statement is executed.
insert into EMPLOYEES (NAME, YEAR_OF_BIRTH)
values ('Patrick', 1985);
How the database engine may verify the uniqueness constraint? Of course, it can go over each row and check each value from the YEAR_OF_BIRTH column. Compare it to the value from the insert statement (1985) and it will know. Obviously, performance would be a major problem in such a scenario if the table had many rows. Because of that fact, most database engines enforce uniqueness by a unique index.
If there is an index on YEAR_OF_BIRTH, finding the desired value (1985 in this case) is fast. Computational complexity is O(log n) vs O(n). The database engine may go even further and not stop on creating an index on YEAR_OF_BIRTH, but create a unique index instead. It will enforce uniqueness by definition as it does not allow leaves with the same values in the index B-tree.
To summarize, when you create a unique constraint on a column in MySQL, a unique index is automatically created on that column in the background. It is crucial for further uniqueness enforcement.
Multiple NULL values in a unique index
You may wonder why I tell you all these details about unique indexes. There is a reason for this. I hope you already see it.
A unique constraint is enforced by a unique index, so the main question is reduced to this - can a unique index have multiple NULL values? We already know that it does not allow two the same regular values like 1985 and 1985. But how about NULLs? To answer that, let's look again at the NULL definition - it is a value that indicates the absence of any data value. Looking at our EMPLOYEES table, Greg was not born in NULL year, but we simply do not know when Greg was born. We may try to insert Mark data into the table with an unknown year of birth (NULL):
insert into EMPLOYEES (NAME, YEAR_OF_BIRTH)
values ('Mark', NULL);
Is it a collision with Greg's birthdate? No, maybe? Exactly, we do not know, because Greg's and Mark's birthdates are unknown to us. Because of that explanation, I strongly think that it should not be reported as a uniqueness violation as it is unknown if the birthdates are equal. But who cares about my opinion, right? Fortunately, the MySQL engine works exactly like that - it allows inserting another NULL year of birth. So the following table is fully valid in MySQL.
ID | NAME | YEAR_OF_BIRTH |
---|---|---|
1 | Alice | 1980 |
2 | Greg | null |
3 | Natalie | 1990 |
4 | Mark | null |
It is correct even though there is a unique constraint (index) on the YEAR_OF_BIRTH column.
The database world would be so beautiful if that was enough on that topic. Oh no, no, there are two more things I have to mention. The first one is - do not extrapolate MySQL's behavior to all database engines. In turn, SQL Server works differently. Two NULL values violate the uniqueness constraint. Be sure to check the database engine you use, and how it behaves there. By the way, Oracle is similar to MySQL in that aspect - multiple NULL values can coexist in a unique column.
The second one is an important technical consequence of ignoring NULL values in unique constraints by MySQL and Oracle. As you already know, MySQL's unique constraint allows multiple NULL values. It is possible, because MySQL does not index NULLs. To be clear - NULL values never get to index leaves in MySQL. This is why, MySQL does not complain about multiple NULLs in a unique index - there are simply no NULLs in it. Why do I even dive into such a technical aspect? Here it comes. As NULL values are not indexed, they cannot be effectively searched by MySQL. For example, consider the following query:
select *
from EMPLOYESS
where YEAR_OF_BIRTH is null
The database engine cannot use the index to find matching rows, because the index does not contain NULLs. It will go with a full table scan to execute that query. The same thing happens in Oracle, but not in SQL Server. But that is a story for another time.