YEAR(2) Challenge

You might remember this little thing called the Y2K bug.  That’s twelve years ago, but MySQL still supports a YEAR(2) data type.  A recent internal discussion made me realize there were aspects of YEAR(2) that I didn’t fully understand, and prompts me to challenge readers to come up with a truly viable use case for YEAR(2) data types.

A primary reason for storing only the last two digits of years is to save on storage space, so perhaps YEAR(2) makes sense for big data or in situations where storage is constrained, right?  Well, no – the number of bytes used to store YEAR(2) data is the same as what is required when YEAR(4) is used:

mysql> SHOW CREATE TABLE y2\G
*************************** 1. row ***************************
Table: y2
Create Table: CREATE TABLE `y2` (
`y` year(2) DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

mysql> SHOW CREATE TABLE y4\G
*************************** 1. row ***************************
Table: y4
Create Table: CREATE TABLE `y4` (
`y` year(4) DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

mysql> SELECT COUNT(*) FROM y2;
+----------+
| COUNT(*) |
+----------+
| 33554432 |
+----------+
1 row in set (0.00 sec)

mysql> SELECT COUNT(*) FROM y4;
+----------+
| COUNT(*) |
+----------+
| 33554432 |
+----------+
1 row in set (0.00 sec)

D:\mysql-5.5.20-win32\data\test>dir *.MYD
Volume in drive D is Data
Volume Serial Number is 4015-B2FF

Directory of D:\mysql-5.5.20-win32\data\test

04/25/2012  10:43 AM       234,881,024 y2.MYD
04/25/2012  10:45 AM       234,881,024 y4.MYD

Huh.

Well, that doesn’t make much sense, since YEAR(2) only stores the last two digits, while YEAR(4) stores all four – right?  Nope, YEAR(2) stores the full four digits, and has the same range of values as YEAR(4):

mysql> TRUNCATE y2;
Query OK, 0 rows affected (0.09 sec)

mysql> TRUNCATE y4;
Query OK, 0 rows affected (0.05 sec)

mysql> INSERT INTO y4 VALUES (1901), (2000), (2155);
Query OK, 3 rows affected (0.02 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> SELECT * FROM y4 ORDER BY y;
+------+
| y    |
+------+
| 1901 |
| 2000 |
| 2155 |
+------+
3 rows in set (0.00 sec)

mysql> INSERT INTO y2 SELECT * FROM y4;
Query OK, 3 rows affected (0.01 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> SELECT * FROM y2 ORDER BY y;
+------+
| y    |
+------+
|   01 |
|   00 |
|   55 |
+------+
3 rows in set (0.00 sec)

mysql> ALTER TABLE y2 MODIFY COLUMN y YEAR(4);
Query OK, 0 rows affected (0.13 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> SELECT * FROM y2 ORDER BY y;
+------+
| y    |
+------+
| 1901 |
| 2000 |
| 2155 |
+------+
3 rows in set (0.00 sec)

OK, so maybe one would use YEAR(2) because they really, really like the fun behavior that assumes that values 70-99 are 1970-1999, and values 01-69 are years 2001-2069:

mysql> TRUNCATE TABLE y2;
Query OK, 0 rows affected (0.02 sec)

mysql> ALTER TABLE y2 MODIFY COLUMN y YEAR(2);
Query OK, 0 rows affected (0.03 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> INSERT INTO y2 VALUES (69), (70);
Query OK, 2 rows affected (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> SELECT * FROM y2 ORDER BY y;
+------+
| y    |
+------+
|   70 |
|   69 |
+------+
2 rows in set (0.00 sec)

mysql> ALTER TABLE y2 MODIFY COLUMN y YEAR(4);
Query OK, 0 rows affected (0.06 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> SELECT * FROM y2 ORDER BY y;
+------+
| y    |
+------+
| 1970 |
| 2069 |
+------+
2 rows in set (0.00 sec)

Except that behavior is identical to YEAR(4):

mysql> TRUNCATE TABLE y4;
Query OK, 0 rows affected (0.02 sec)

mysql> INSERT INTO y4 VALUES (69), (70);
Query OK, 2 rows affected (0.02 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> SELECT * FROM y4 ORDER BY y;
+------+
| y    |
+------+
| 1970 |
| 2069 |
+------+
2 rows in set (0.00 sec)

There’s one area where behavior is different, and that’s in handling of the numeric value zero:

 

mysql> TRUNCATE TABLE y4;
Query OK, 0 rows affected (0.01 sec)

mysql> INSERT INTO y4 VALUES (0);
Query OK, 1 row affected (0.00 sec)

mysql> SELECT * FROM y4;
+------+
| y    |
+------+
| 0000 |
+------+
1 row in set (0.00 sec)

mysql> TRUNCATE TABLE y2;
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO y2 VALUES (0);
Query OK, 1 row affected (0.01 sec)

mysql> SELECT * FROM y2;
+------+
| y    |
+------+
|   00 |
+------+
1 row in set (0.00 sec)

Looks the same so far, right?  Under the hood, though, it’s not:

mysql> ALTER TABLE y2 MODIFY COLUMN y YEAR(4);
Query OK, 0 rows affected (0.03 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> SELECT * FROM y2;
+------+
| y    |
+------+
| 2000 |
+------+
1 row in set (0.00 sec)

So a numeric zero inserted into a YEAR(2) column will be effectively stored as 2000, while the same value inserted into a YEAR(4) will be stored as “0000” (invalid).  So, there’s one use case: When your application has to insert numeric zero values and have them stored as the year 2000.  Probably not terribly compelling.

Here’s one more not-so-compelling use case:  You really, really care about the number of bytes sent over the network, and only need the last two digits of a year.  As can be seen here, the number of bytes sent is lower for YEAR(2) data than YEAR(4) data – two bytes per row:

 

mysql> flush status;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from y2 limit 1;
+------+
| y    |
+------+
|   12 |
+------+
1 row in set (0.00 sec)

mysql> show session status like 'bytes_sent';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| Bytes_sent    | 77    |
+---------------+-------+
1 row in set (0.00 sec)

mysql> flush status;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from y4 limit 1;
+------+
| y    |
+------+
| 2012 |
+------+
1 row in set (0.00 sec)

mysql> show session status like 'bytes_sent';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| Bytes_sent    | 79    |
+---------------+-------+
1 row in set (0.00 sec)

But wait, if you can save bytes there, can you save bytes in other locations, like temporary tables created for sorting?  Nope – the full data is represented in the sort, otherwise this wouldn’t sort correctly:

mysql> INSERT INTO y2 VALUES (1969), (1970), (2000), (2071);
Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql> SELECT * FROM y2 ORDER BY y;
+------+
| y    |
+------+
|   69 |
|   70 |
|   00 |
|   71 |
+------+
4 rows in set (0.00 sec)

There is some funky behavior related to equality matches, though:

mysql> SELECT y2.y y2, y4.y y4 FROM y2 LEFT JOIN y4 ON (y2.y = y4.y);
+------+------+
| y2   | y4   |
+------+------+
|   69 | NULL |
|   70 | 1970 |
|   00 | 2000 |
|   71 | NULL |
+------+------+
4 rows in set (0.00 sec)

mysql> SELECT * FROM y2 WHERE y = 1969 OR y = 2071;
+------+
| y    |
+------+
|   69 |
|   71 |
+------+
2 rows in set (0.00 sec)

mysql> ALTER TABLE y2 MODIFY COLUMN y YEAR(4);
Query OK, 0 rows affected (0.05 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> SELECT y2.y y2, y4.y y4 FROM y2 LEFT JOIN y4 ON (y2.y = y4.y);
+------+------+
| y2   | y4   |
+------+------+
| 1969 | 1969 |
| 1970 | 1970 |
| 2000 | 2000 |
| 2071 | 2071 |
+------+------+
4 rows in set (0.00 sec)

Pretty strange, no?  I actually suspect this may be a bug and will report it as such.

So YEAR(2) really does have the same underlying data storage requirements as YEAR(4), and there’s no internal optimization for sorting or anything like that – in fact, the only areas where there might be optimization seem to produce bad results, as shown in the last example.  There’s extremely limited savings in terms of network overhead, and special handling of numeric zeros.  Since the MySQL server sends the results over the wire with just the last two years, that would be moderately helpful if that’s how your application needs to consume the data – but it also causes the following behavior, which is an absolute deal-breaker for me.  First, here’s the original table creation and data:

mysql> CREATE TABLE yd (y2 YEAR(2), y4 YEAR(4));
Query OK, 0 rows affected (0.03 sec)

mysql> INSERT INTO yd (y2) VALUES (1969), (1970), (2000), (2071);
Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql> UPDATE yd SET y4 = y2;
Query OK, 4 rows affected (0.01 sec)
Rows matched: 4  Changed: 4  Warnings: 0

mysql> SELECT * FROM yd ORDER BY y2;
+------+------+
| y2   | y4   |
+------+------+
|   69 | 1969 |
|   70 | 1970 |
|   00 | 2000 |
|   71 | 2071 |
+------+------+
4 rows in set (0.00 sec)

Guess what happens when you recreate this from mysqldump?

D:\>mysql-5.5.20-win32\bin\mysqldump -uroot -P3307 --add-drop-table test yd | mysql-5.1.38-win32\bin\mysql -uroot -P3307 test

D:\>mysql-5.5.20-win32\bin\mysql -uroot -P3307 test -e"SELECT * FROM yd ORDER BY y2;"
+------+------+
| y2   | y4   |
+------+------+
|   70 | 1970 |
|   71 | 2071 |
|   00 | 2000 |
|   69 | 1969 |
+------+------+

Ouch.  That’s right, mysqldump produces 2-digit output, which – when replayed as an INSERT – prefixes the year with the centuries as described in the manual here.  The following demonstrates what’s really stored in the YEAR(2) columns after restoring using mysqldump:

mysql> UPDATE yd SET y4 = y2;
Query OK, 2 rows affected (0.00 sec)
Rows matched: 4  Changed: 2  Warnings: 0

mysql> SELECT * FROM yd ORDER BY y2;
+------+------+
| y2   | y4   |
+------+------+
|   70 | 1970 |
|   71 | 1971 |
|   00 | 2000 |
|   69 | 2069 |
+------+------+
4 rows in set (0.00 sec)

Nasty stuff, and reason enough for me to never touch YEAR(2).

Knowing the above, my challenge for you as a reader is to identify any compelling use case for YEAR(2).  Please feel free to offer suggestions as comments.

Leave a Reply

Your email address will not be published. Required fields are marked *

Time limit is exhausted. Please reload CAPTCHA.