注册 登录  
 加关注
   显示下一条  |  关闭
温馨提示!由于新浪微博认证机制调整,您的新浪微博帐号绑定已过期,请重新绑定!立即重新绑定新浪微博》  |  关闭

银河里的星星

落在人间

 
 
 

日志

 
 

Information about Null (SQL)  

2013-04-20 13:07:22|  分类: 技术专题 |  标签: |举报 |字号 订阅

  下载LOFTER 我的照片书  |

Null is a special marker used to indicate that a data value is unknown in the Structured Query Language (SQL). Introduced by the creator of the relationaldatabase model, Dr. E.F. Codd,SQL Null serves to fulfill the requirement that all true relational database management systems (RDBMS) support a representation of "missing information and inapplicable information". Dr. Codd also introduced the use of the lowercase Greek omega (ω) symbol to represent Null indatabase theoryNULL is also an SQL reserved keyword used to identify the Null special marker. 

Null has been the focus of controversy and a source of debate because of its associated Three-Valued Logic (3VL), special requirements for its use in SQL joins, and the special handling required by aggregate functions and SQL grouping operators. Though special functions and predicates are provided to properly handle Nulls, opponents feel that resolving these issues introduces unnecessary complexity and inconsistency into the relational model of databases.

History

Null was introduced by Dr. E.F. Codd as a method of representing missing data in the relational model. Dr. Codd later reinforced his requirement that all RDBMS' support Null to indicate missing data in a two-part series published inComputerWorld magazine.[1][2] Dr. Codd also introduced a ternary (three-valued)logic, consisting of the truth values True, False, and Unknown, which is closely tied to the concept of Null. The Unknown truth value is generated whenever Null is compared with any data value, or with another Null. 

Dr. Codd indicated in his 1990 book The Relational Model for Database Management, Version 2 that the single Null mandated by the SQL standard was inadequate, and should be replaced by two separate Null-type markers to indicate the reason why data is missing. These two Null-type markers are commonly referred to as 'A-Values' and 'I-Values', representing 'Missing But Applicable' and 'Missing But Inapplicable', respectively.[3] Dr. Codd's recommendation would have required SQL's logic system be expanded to accommodate a four-valued logic system. Because of this additional complexity, the idea of multiple Null-type values has not gained widespread acceptance.

Three-valued logic (3VL)

Since Null is not a member of any data domain, it is not considered a "value", but rather a marker (or placeholder) indicating the absence of value. Because of this, comparisons with Null can never result in either True or False, but always in a third logical result, Unknown.[3] The logical result of the expression below, which compares the value 10 to Null, is Unknown: 

10 = NULL -- Results in Unknown 

However, certain operations on Null can return values if the value of Null is not relevant to the outcome of the operation. For example: 

TRUE OR NULL -- Results in True 

In this case, the fact that the value on the right of OR is unknowable is irrelevant, because the outcome of the OR operation would be True regardless of the value on the right. 

SQL implements three logical results, so SQL implementations must provide for a specialized three-valued logic (3VL). The rules governing SQL three-valued logic are shown in the tables below (p and q represent logical states)"[4] 

p AND qp
TrueFalseUnknown
qTrueTrueFalseUnknown
FalseFalseFalseFalse
UnknownUnknownFalseUnknown
p OR qp
TrueFalseUnknown
qTrueTrueTrueTrue
FalseTrueFalseUnknown
UnknownTrueUnknownUnknown
pNOT p
TrueFalse
FalseTrue
UnknownUnknown
Basic SQL comparison operators always return Unknown when comparing anything with Null, so the SQL standard provides for two special Null-specific comparison predicates. The IS NULL and IS NOT NULL predicates test whether data is, or is not, Null.[5]


Data typing

Null is untyped in SQL, meaning that it is not designated as an integer, character, or any other specific data type.[3] Because of this, it is sometimes mandatory (or desirable) to explicitly convert Nulls to a specific data type. For example, if overloaded functions are supported by the RDBMS, SQL might not be able to automatically resolve to the correct function without knowing the data types of all parameters, including those for which Null is passed.

Data Manipulation Language

SQL three-valued logic is encountered in Data Manipulation Language (DML) in comparison predicates of DML statements and queries. The WHERE clause causes the DML statement to act on only those rows for which the predicate evaluates to True. Rows for which the predicate evaluates to either False or Unknown are not acted on by INSERTUPDATE, or DELETE DML statements, and are discarded by SELECT queries. Interpreting Unknown and False as the same logical result is a common error encountered while dealing with Nulls.[4] The following simple example demonstrates this fallacy: 

SELECT * FROM t WHERE i = NULL; 

The example query above always returns zero rows because the comparison of the i column with Null always returns Unknown, even for those rows where i is Null. The Unknown result causes the SELECT statement to summarily discard each and every row.


CASE expressions

SQL CASE expressions operate under the same rules as the DML WHERE clause rules for Null. Because it can be evaluated as a series of equality comparison conditions, a simple CASE expression cannot check for the existence of Null directly. A check for Null in a simple CASE expression always results in Unknown, as in the following: 

SELECT CASE i WHEN NULL THEN 'Is Null' -- This will never be returned WHEN 0 THEN 'Is Zero' -- This will be returned when i = 0 WHEN 1 THEN 'Is One' -- This will be returned when i = 1 END FROM t; 

Because the expression i = NULL evaluates to Unknown no matter what value column i contains (even if it contains Null), the string 'Is Null' will never be returned. 

searched CASE expression also returns the first value for which the result of the comparison predicate evaluates to True, including comparisons using the IS NULL and IS NOT NULL comparison predicates. The following example shows how to use a searched CASE expression to properly check for Null: 

SELECT CASE WHEN i IS NULL THEN 'Null Result' -- This will be returned when i is NULL WHEN i = 0 THEN 'Zero' -- This will be returned when i = 0 WHEN i = 1 THEN 'One' -- This will be returned when i = 1 END FROM t; 

In the searched CASE expression, the string 'Null Result' is returned for all rows in which i is Null.

Check constraints

The primary place in which SQL three-valued logic intersects with SQL Data Definition Language (DDL) is in the form of check constraints. A check constraint placed on a column operates under a slightly different set of rules than those for the DML WHERE clause. While a DML WHERE clause must evaluate to True for a row, a check constraint must not evaluate to False. This means that a check constraint will succeed if the result of the check is either True or Unknown. The following example table with a check constraint will prohibit any integer values from being inserted into column i, but will allow Null to be inserted since the result of the check will always evaluate to Unknown for Nulls.[7] 

CREATE TABLE t ( i INTEGER, CONSTRAINT ck_i CHECK ( i < 0 AND i = 0 AND i > 0 ) ); 

In order to constrain a column to reject Nulls, the NOT NULL constraint can be applied, as shown in the example below. The NOT NULL constraint is semantically equivalent to a check constraint with an IS NOT NULL predicate. 

CREATE TABLE t ( i INTEGER NOT NULL );

Procedural extensions

SQL/PSM (SQL Persistent Stored Modules) defines procedural extensions for SQL, such as the IF statement. However, the major SQL vendors have historically included their own proprietary procedural extensions. Procedural extensions for looping and comparisons operate under Null comparison rules similar to those for DML statements and queries. The following code fragment, in ISO SQL standard format, demonstrates the use of Null 3VL in an IFstatement. 

IF i = NULL THEN SELECT 'Result is True' ELSEIF NOT(i = NULL) THEN SELECT 'Result is False' ELSE SELECT 'Result is Unknown'; 

The IF statement performs actions only for those comparisons that evaluate to True. For statements that evaluate to False or Unknown, the IF statement passes control to the ELSEIF clause, and finally to the ELSE clause. The result of the code above will always be the message 'Result is Unknown' since the comparisons with Null always evaluate to Unknown.

Joins

SQL outer joins, including left outer joins, right outer joins, and full outer joins, automatically produce Nulls as placeholders for missing values in related tables. For left outer joins, for instance, Nulls are produced in place of rows missing from the table appearing on the right-hand side of theLEFT OUTER JOIN operator. The following simple example uses two tables to demonstrate Null placeholder production in a left outer join. 

The first table (Employee) contains employee ID numbers and names, while the second table (PhoneNumber) contains related employee ID numbers and phone numbers, as shown below. 

Employee
IDLastNameFirstName
1JohnsonJoe
2LewisLarry
3ThompsonThomas
4PattersonPatricia
PhoneNumber
IDNumber
1555-2323
3555-9876
The following sample SQL query performs a left outer join on these two tables. 

SELECT e.ID, e.LastName, e.FirstName, pn.Number FROM Employee e LEFT OUTER JOIN PhoneNumber pn ON e.ID = pn.ID; 

The result set generated by this query demonstrates how SQL uses Null as a placeholder for values missing from the right-hand (PhoneNumber) table, as shown below. 

Query Result
IDLastNameFirstNameNumber
1JohnsonJoe555-2323
2LewisLarryNULL
3ThompsonThomas555-9876
4PattersonPatriciaNULL
Inner joins and cross joins, also available in standard SQL, do not generate Null placeholders for missing values in related tables. 

Care must be taken when using nullable columns in SQL join criteria. Because a Null is not equal to any other Null, Nulls in a column of one table will not join to Nulls in the related column of another table using the standard equality comparison operators. The SQL COALESCE function or CASE expressions can be used to "simulate" Null equality in join criteria, and the IS NULL and IS NOT NULL predicates can be used in the join criteria as well. The following predicate tests for equality of the values A and B and treats Nulls as being equal. 

( A = B ) OR ( A IS NULL AND B IS NULL )

Mathematical and string concatenation

Because Null is not a data value, but a marker for an unknown value, using mathematical operators on Null results in an unknown value, which is represented by Null.[8] In the following example multiplying 10 by Null results in Null:

10 * NULL -- Result is NULL 

This can lead to unanticipated results. For instance, when an attempt is made to divide Null by zero, platforms may return Null instead of throwing an expected "data exception - division by zero".[8] In the example below, the Oracle and Microsoft SQL Server platforms both return a Null result. 

NULL / 0 

String concatenation operations, which are common in SQL, also result in Null when one of the operands is Null.[9]The following example demonstrates the Null result returned by using Null with the SQL || string concatenation operator. 

'Fish ' || NULL || 'Chips' -- Result is NULL

Aggregate functions

SQL defines aggregate functions to simplify server-side aggregate calculations on data. Almost all aggregate functions perform a Null-elimination step, so that Null values are not included in the final result of the calculation.[10] This implicit Null elimination, however, can have an impact on aggregate function results. 

The following example table results in different results being returned for each column when the SQL AVG (average) aggregate function is applied. 

Table
ij
150150
200200
350350
NULL0
The SQL AVG aggregate function returns 233 when applied to column i, but returns 175 when applied to column j. The aggregate function's Null-elimination step accounts for the difference in these results. The only aggregate function that does not implicitly eliminate Null is the COUNT(*) function.

Grouping and sorting

Because defines all Null markers as being unequal to one another, a special definition was required in order to group Nulls together when performing certain operations. SQL defines "any two values that are equal to one another, or any two Nulls", as "not distinct".[11] This definition of not distinct allows SQL to group and sort Nulls when the GROUP BY clause (and other keywords that perform grouping) are used. 

Other SQL operations, clauses, and keywords use "not distinct" in their treatment of Nulls. These include the following:
  • PARTITION BY clause of ranking and windowing functions like ROW_NUMBER
  • UNIONINTERSECT, and EXCEPT operator which treat NULLs as the same for row comparison/elimination purposes
  • DISTINCT keyword used in SELECT queries
  • Unique constraints on nullable columns, which allow only a single Null value to be stored in the column
The SQL standard does not explicitly define a default sort order for Nulls. Instead, on conforming systems, Nulls can be sorted before or after all data values by using the NULLS FIRST or NULLS LAST clauses of the ORDER BYlist, respectively. Not all DBMS vendors implement this functionality, however. Vendors who do not implement this functionality may specify different treatments for Null sorting in the DBMS.[7]


Null-handling functions

SQL defines two functions to explicitly handle Nulls: COALESCE and NULLIF. Both functions are abbreviations for searched CASE expressions.[12]


COALESCE

The COALESCE function accepts a list of parameters, returning the first non-Null value from the list: 

COALESCE(value1value2value3...

COALESCE is defined as shorthand for the following SQL CASE expression: 

CASE WHEN value1 IS NOT NULL THEN value1 WHEN value2 IS NOT NULL THEN value2 WHEN value3 IS NOT NULL THEN value3 ... END 

Some SQL DBMS's implement vendor-specific functions similar to COALESCE. Some systems implement anISNULL function, or other similar functions which are functionally similar to COALESCE.


NULLIF

The NULLIF function accepts two parameters. If the first parameter is equal to the second parameter, NULLIFreturns Null. Otherwise, the value of the first parameter is returned. 

NULLIF(value1value2

Thus, NULLIF is an abbrevation for the following CASE expression: 

CASE WHEN value1 = value2 THEN NULL ELSE value1 END


Controversy

Common mistakes

Misunderstanding of how Null works is the cause of a great number of errors in SQL code, both in ISO standard SQL statements and in the specific SQL dialects supported by real-world database management systems. These mistakes are usually the result of confusion between Null and either 0 (zero) or an empty string (a string value with a length of zero, represented in SQL as ''). Null is different from an empty string and the numerical value 0, however. While Null indicates the absence of any value, the empty string and numerical zero both represent actual values. 

For example, a WHERE clause or conditional statement might compare a column's value with a constant. It is often incorrectly assumed that a missing value would be "less than" or "not equal to" a constant if that field contains Null, but in fact, such expressions return Unknown. An example is below: 

SELECT * FROM sometable WHERE num <> 1; -- Rows where num is NULL will not be returned, -- contrary to many users' expectations. 

Similarly, Null values are often confused with empty strings. Consider the LENGTH function, which returns the number of characters in a string. When a Null is passed into this function, the function returns Null. This can lead to unexpected results, if users are not well versed in 3-value logic. An example is below: 

SELECT * FROM sometable WHERE LENGTH(string) < 20; -- Rows where string is NULL will not be returned.


Criticisms

The ISO SQL implementation of Null is the subject of criticism, debate and calls for change. In The Relational Model for Database Management: Version 2, Dr. E.F. Codd suggested that the SQL implementation of Null was flawed and should be replaced by two distinct Null-type markers. The markers he proposed were to stand for"Missing but Applicable" and "Missing but Inapplicable", known as A-values and I-values, respectively. Codd's recommendation, if accepted, would have required the implementation of a four-valued logic in SQL.[3] Others have suggested adding additional Null-type markers to Codd's recommendation to indicate even more reasons that a data value might be "Missing", increasing the complexity of SQL's logic system. At various times, proposals have also been put forth to implement multiple user-defined Null markers in SQL. Because of the complexity of the Null-handling and logic systems required to support multiple Null markers, none of these proposals have gained widespread acceptance. 

Still other Relational Management (RM) experts, like the authors of The Third Manifesto, Chris Date and Hugh Darwen, have suggested that the SQL Null implementation is inherently flawed and should be eliminated altogether.[14] These experts often point to inconsistencies and flaws in the implementation of SQL Null-handling (particularly in aggregate functions) as proof that the entire concept of Null is flawed and should be removed from the Relational Model.[15] Others, like author Fabian Pascal, have stated a belief that "how the function calculation should treat missing values is not governed by the relational model."


Closed world assumption

Another point of conflict concerning Nulls is that they violate the closed world assumption model of relational databases by introducing an open world assumption into it.[16] The Closed World Assumption, as it pertains to databases, states that "Everything stated by the database, either explicitly or implicitly, is true; everything else is false."[17] This view assumes that the knowledge of the world stored within a database is complete. Nulls, however, operate under the Open World Assumption, in which some items stored in the database are considered unknown, making the databases's stored knowledge of the world incomplete. Relational Management experts see this as an inconsistency within SQL.


Boolean simplification

A WHERE expression such as the following, might at first glance seem to be subject to simplification usingboolean algebra (presuming that x is of boolean datatype): 

SELECT * FROM stuff WHERE x OR NOT x; 

The incorrect presumption is that it would simplify to: 

SELECT * FROM stuff; 

Instead, the correct equivalent expression is: 

SELECT * FROM stuff WHERE x IS NOT NULL;


See also

References

1. ^ Codd, E.F. (14 October 1985). "Is Your Database Really Relational?".ComputerWorld.ComputerWorld&rft.date=14%20October%201985&rft.aulast=Codd&rft.aufirst=E.F."> 
2. ^ Codd, E.F. (21 October 1985). "Does Your DBMS Run By The Rules?".ComputerWorld.ComputerWorld&rft.date=21%20October%201985&rft.aulast=Codd&rft.aufirst=E.F."> 
3. ^ 1990">Codd, E.F. (1990). The Relational Model for Database Management, Version 2, Addison Wesley Publishing Company. ISBN 0-201-14192-2.1990&rft.edition=Version%202&rft.pub=Pearson%20PLC%7CAddison%20Wesley%20Publishing%20Company"> 
4. ^ Coles, Michael (June 272005). "Four Rules for Nulls". SQL Server Central.2005&rft.aulast=Coles,%20Michael&rft_id=http%3A%2F%2Fwww.sqlservercentral.com%2Fcolumnists%2Fmcoles%2Ffourrulesfornulls.asp"> 
5. ^ ISO/IEC (2003). ISO/IEC 9075-2:2003, "SQL/Foundation". ISO/IEC, Section 8.7: null predicate. 
6. ^ ISO/IEC (2003). ISO/IEC 9075-1:2003, "SQL/Framework". ISO/IEC, Section 4.4.2: The null value. 
7. ^ Coles, Michael (February 262007). "Null Versus Null?". SQL Server Central.2007&rft.aulast=Coles,%20Michael&rft_id=http%3A%2F%2Fwww.sqlservercentral.com%2Fcolumnists%2Fmcoles%2F2829.asp"> 
8. ^ ISO/IEC (2003). ISO/IEC 9075-2:2003, "SQL/Foundation". ISO/IEC, Section 6.2.6: numeric value expressions. .
9. ^ ISO/IEC (2003). ISO/IEC 9075-2:2003, "SQL/Foundation". ISO/IEC, Section 6.2.8: string value expression. .
10. ^ ISO/IEC (2003). ISO/IEC 9075-2:2003, "SQL/Foundation". ISO/IEC, Section 4.15.4: Aggregate functions. .
11. ^ ISO/IEC (2003). ISO/IEC 9075-2:2003, "SQL/Foundation". ISO/IEC, Section 3.1.6.8: Definitions: distinct. 
12. ^ ISO/IEC (2003). ISO/IEC 9075-2:2003, "SQL/Foundation". ISO/IEC, Section 6.11: case expression. .
13. ^ 1990">Codd, E.F. (1990). The Relational Model for Database Management, Version 2, Addison Wesley Publishing Company. ISBN 0-201-14192-2.1990&rft.edition=Version%202&rft.pub=Pearson%20PLC%7CAddison%20Wesley%20Publishing%20Company"> 
14. ^ Darwen, Hugh; Chris Date. The Third Manifesto. Retrieved on May 29, 2007.
15. ^ Darwen, Hugh. The Askew Wall (PDF). Retrieved on May 29, 2007.
16. ^ Date, Chris (May 2005). Database in Depth: Relational Theory for Practicioners. O'Reilly Media, Inc., 73. ISBN 0-5961-0012-4. 
17. ^ Date, Chris. Abstract: The Closed World Assumption. Data Management Association, San Francisco Bay Area Chapter. Retrieved on May 29, 2007.

Further reading

External links

  评论这张
 
阅读(1795)| 评论(0)
推荐 转载

历史上的今天

在LOFTER的更多文章

评论

<#--最新日志,群博日志--> <#--推荐日志--> <#--引用记录--> <#--博主推荐--> <#--随机阅读--> <#--首页推荐--> <#--历史上的今天--> <#--被推荐日志--> <#--上一篇,下一篇--> <#-- 热度 --> <#-- 网易新闻广告 --> <#--右边模块结构--> <#--评论模块结构--> <#--引用模块结构--> <#--博主发起的投票-->
 
 
 
 
 
 
 
 
 
 
 
 
 
 

页脚

网易公司版权所有 ©1997-2017