Today I learned you can't join tables on NULLs because SQL Server deems a NULL to be an ambiguous quantity so two NULLs are not the same NULL. You can force SQL Server to do the join by doing something like this
... colors c JOIN points p ON (c.color = p.color OR (c.color IS NULL AND p.color IS NULL))
This example assumes only one team will have a NULL color.
CREATE TABLE colors (
team varchar(16) NOT NULL,
color varchar(32) NULL,
)
CREATE TABLE points (
color varchar(32) NULL,
points int
)
INSERT INTO colors VALUES ('lobsters','red')
INSERT INTO colors VALUES ('swans','white')
INSERT INTO colors VALUES ('jellyfish',NULL)
INSERT INTO points VALUES ('red',100)
INSERT INTO points VALUES ('white',90)
INSERT INTO points VALUES (NULL,80)
SELECT * FROM colors c JOIN points p ON c.color = p.color
-- returns:
--lobsters red red 100
--swans white white 90
SELECT * FROM colors c JOIN points p ON (c.color = p.color OR (c.color IS NULL AND p.color IS NULL))
-- returns:
--lobsters red red 100
--swans white white 90
--jellyfish NULL NULL 80
1 comment:
I also assumed NULL joins would work, expecting the only issue to be runaway record duplication in the case of NULL-to-NULL joins. As you've discovered, SQL won't join on NULL, or any ambiguous values.
You suggested:
... colors c JOIN points p ON (c.color = p.color OR (c.color IS NULL AND p.color IS NULL))
This will work, but the ORing of two ON conditions, and the explicit ANDing of IS NULL checks may put a significant load on server resources, especially when joining large tables. I would suggest using the IsNull function to alias NULLs to a reserved literal, such as the pipe character:
... colors c JOIN points p ON (IsNull (c.color, '|') = IsNull (p.color, '|'))
This will allow the server to optimize the NULL conversion with the native logic of the IsNull function. You should probably explicitly trap the NULL-to-NULL case as well, to prevent wasting resources generating massive unusable datasets.
(apologies for the cross-post in LinkedIn comments)
Post a Comment