len vs datalength - measuring data size

Data stored in a database consumes disk space. A large amount of data usually means lower performance and higher storage costs. A database designer has some influence on this, for example by choosing the most suitable data type for each case. It is not an easy task, and a big book could be written on the subject, but two SQL Server functions make it a little easier: len() and datalength().

Both functions measure the size of data in some way, but they do it differently. This article concentrates on those differences.

 

Syntax

The syntax of both functions is as follows:

len( <string expression> )
datalength( <expression> )

The difference is small, but it exists. Len takes a string expression. Thanks to implicit conversion you can pass it a numeric value, but len will still treat it as a string expression.

Datalength, in turn, can operate on any expression, even if the expression result is a number or a date.
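
As a minimal sketch of that difference (the literal values and column aliases are only illustrative), the query below passes a number to both functions and a date to datalength:

```sql
-- len implicitly converts the integer to the string '12345' and counts characters.
-- datalength reports the storage size of the int value itself.
SELECT len(12345)        AS len_of_int,    -- 5 characters
       datalength(12345) AS bytes_of_int;  -- 4 bytes, because the literal is an int

-- datalength also accepts non-string types such as datetime.
SELECT datalength(CAST('20200101' AS datetime)) AS bytes_of_datetime;  -- 8 bytes
```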

 

Result

The len function returns the number of characters in the string expression passed as its parameter. One important detail: trailing spaces are not counted.

The datalength function returns the number of bytes required to store the expression. The number might depend on database settings such as collation or compatibility level.

While len operates strictly on the logical level, datalength is closely tied to the implementation of the particular data type.
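
The trailing-space rule is easy to overlook, so here is a small sketch of it. The variable name is hypothetical, and the byte count assumes a varchar value in which each character takes one byte:

```sql
DECLARE @padded varchar(20) = 'abc   ';    -- 'abc' followed by three spaces

SELECT len(@padded)        AS characters,  -- 3: trailing spaces are ignored
       datalength(@padded) AS bytes;       -- 6: every stored byte is counted

SELECT len('   abc') AS leading_counted;   -- 6: leading spaces do count
```

If trailing spaces matter for a comparison or a validation rule, datalength is probably the safer of the two to check.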

 

Examples

Numbers

For testing purposes, the following table was created and two rows were inserted.

CREATE TABLE testNumbers (
	tinyint1 tinyint,
	smallint1 smallint,
	int1 int,
	decimal1 decimal(10,4),
	real1 real
);
GO

INSERT INTO testNumbers
VALUES (0, 0, 0, 0, 0), (34, 34, 34, 34, 34);

The following queries return len and datalength for each value in each data type.

SELECT int1, len(tinyint1), len(smallint1), len(int1), len(decimal1), len(real1)
FROM testNumbers;
SELECT int1, datalength(tinyint1), datalength(smallint1), datalength(int1), datalength(decimal1), datalength(real1)
FROM testNumbers;

value  len(tinyint1)  len(smallint1)  len(int1)  len(decimal1)  len(real1)
0      1              1               1          6              1
34     2              2               2          7              2

value  datalength(tinyint1)  datalength(smallint1)  datalength(int1)  datalength(decimal1)  datalength(real1)
0      1                     2                      4                 5                     4
34     1                     2                      4                 5                     4

The values returned by len are the same for almost all data types: one for 0 and two for 34. The exception is the decimal column, which is easily explained: decimal(10,4) has a fixed number of decimal places, so 0 becomes the string 0.0000 and 34 becomes 34.0000. Note that although len expects a string expression, it also works for numbers: it converts them to a string and calculates the length of that string.

Datalength, in turn, returned a variety of numbers. Tinyint always consumes one byte no matter how big the number is, smallint consumes two bytes, int four bytes, and so on.
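
The same behaviour can be checked without a table. The sketch below uses explicit casts on literals (the aliases are illustrative) to show the fixed size of the integer types, and that len on a number gives the same result as len on its explicit string conversion:

```sql
-- Integer types have a fixed size that does not depend on the value stored.
SELECT datalength(CAST(0 AS tinyint))  AS tinyint_bytes,   -- 1
       datalength(CAST(0 AS smallint)) AS smallint_bytes,  -- 2
       datalength(CAST(0 AS int))      AS int_bytes,       -- 4
       datalength(CAST(0 AS bigint))   AS bigint_bytes;    -- 8

-- len on a decimal counts the characters of its string form, '34.0000'.
SELECT len(CAST(34 AS decimal(10,4)))                       AS implicit_len,  -- 7
       len(CONVERT(varchar(20), CAST(34 AS decimal(10,4)))) AS explicit_len;  -- 7
```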

 

Texts

A new table, testTexts, is used to show the differences in the calculations for string expressions: null, an empty string and 'abc'.

CREATE TABLE testTexts (
	char1 char(10),
	varchar1 varchar(10),
	nvarchar1 nvarchar(10),
	text1 text,
	image1 image
);
GO

INSERT INTO testTexts
VALUES (null, null, null, null, null), ('', '', '', '', ''), ('abc', 'abc', 'abc', 'abc', 'abc');

As in the example with numbers, the following queries show the results of len and datalength for string expressions. As len does not work for the text and image data types, they are skipped in the len query.

SELECT varchar1, len(char1), len(varchar1), len(nvarchar1)
FROM testTexts;
SELECT varchar1, datalength(char1), datalength(varchar1), datalength(nvarchar1), datalength(text1), datalength(image1)
FROM testTexts;

value  len(char1)  len(varchar1)  len(nvarchar1)
NULL   NULL        NULL           NULL
''     0           0              0
abc    3           3              3

value  datalength(char1)  datalength(varchar1)  datalength(nvarchar1)  datalength(text1)
NULL   NULL               NULL                  NULL                   NULL
''     10                 0                     0                      0
abc    10                 3                     6                      3

Len consistently counts the number of characters regardless of the data type. Because the physical implementation matters for datalength, the number of bytes consumed differs between columns. For example, a char(10) column is always padded to ten characters no matter how long the data is. In this example both the empty string and 'abc' require 10 bytes of disk space.

Another interesting observation concerns the nvarchar column. It supports national characters and is implemented so that two bytes are used to store a single character. This is clearly visible for the 'abc' value, which consumes 6 bytes in that data type.
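
Len rejects text columns, which is why text1 was left out of the len query. One possible workaround, sketched here on the assumption that the testTexts table from this article exists with its three rows, is to cast the column to varchar(max) first:

```sql
SELECT varchar1,
       len(CAST(text1 AS varchar(max))) AS len_text1,        -- NULL, 0 and 3 for the three rows
       datalength(text1)                AS datalength_text1  -- datalength accepts text directly
FROM testTexts;
```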
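
The effect of padding and of two-byte characters can also be seen on plain literals. This is only a sketch: the N prefix marks a Unicode (nvarchar) literal, the aliases are illustrative, and the varchar and char byte counts assume one byte per character:

```sql
SELECT datalength('abc')                     AS varchar_bytes,   -- 3
       datalength(N'abc')                    AS nvarchar_bytes,  -- 6: two bytes per character
       datalength(CAST('abc' AS char(10)))   AS char_bytes,      -- 10: padded with spaces
       datalength(CAST(N'abc' AS nchar(10))) AS nchar_bytes;     -- 20: padded, two bytes each

-- len ignores both the padding and the storage width: every column returns 3.
SELECT len('abc')                     AS varchar_len,
       len(N'abc')                    AS nvarchar_len,
       len(CAST('abc' AS char(10)))   AS char_len,
       len(CAST(N'abc' AS nchar(10))) AS nchar_len;
```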

 

Summary

Both functions are very useful. My impression is that len is more widely known and used by database developers, because calculating a string length is frequently needed for string manipulation. Datalength is less popular and tends to be used by database designers rather than developers. Although the two functions appear to do a similar job, they differ significantly.