Fresh dogfood: Migrating to InnoDB fulltext search on bugs.mysql.com
Even frequent visitors to bugs.mysql.com can sometimes miss the little note in the bottom right corner of each page:
Page generated in 0.017 sec. using MySQL 5.6.11-enterprise-commercial-advanced-log
That text changed this past weekend, going from MySQL Enterprise 5.6.10 to 5.6.11. But more importantly, the collection of MyISAM tables which support the bugs system were also converted to InnoDB. There’s a little story to tell here about eating this particular helping of dogfood which also amplifies changelog comments, so here it is:
We like to keep bugs.mysql.com on a current release of MySQL, and we started looking to upgrade from 5.5.27 shortly after GA. In doing so, we also noticed that the majority of tables were still using MyISAM. Because bugs.mysql.com leverage fulltext search, we’ve been stuck on MyISAM for ages – the upgrade to 5.6 presented the opportunity to convert our tables to InnoDB. Naturally, we wanted to test and verify that the functionality of the website wouldn’t change dramatically due to the conversion, so we upgraded a slave and started some basic testing.
Our initial testing found that some searches of the staging system returned no results, while the same search in production returned many. In particular, searches for command-line options such as –skip-networking consistently returned no results in InnoDB. We filed internal bug 16419661, and the root cause was quickly determined. With 5.6.10, InnoDB used two different parsing algorithms when indexing values for a fulltext index and when processing a search string. Like MyISAM, InnoDB considers hyphens a word delimiter, so “–skip-networking” was indexed as two words (“skip” and “networking”) – easily confirmed with the new INFORMATION_SCHEMA tables supporting InnoDB fulltext:
mysql> SET GLOBAL innodb_ft_aux_table='test/ft'; Query OK, 0 rows affected (0.00 sec) mysql> SELECT word FROM information_schema.innodb_ft_index_cache; +------------+ | word | +------------+ | networking | | skip | +------------+ 2 rows in set (0.08 sec)mysql> SELECT * FROM
It’s a bit more work to get information on the indexed contents from MyISAM, but it can be done with the myisam_ftdump utility:
mysql> ALTER TABLE test.ft ENGINE = MyISAM; Query OK, 1 row affected (0.36 sec) Records: 1 Duplicates: 0 Warnings: 0 mysql> exit Bye D:\mysql-advanced-5.6.11-win32>bin\myisam_ftdump.exe data\test\ft 0 -d 0 0.9775171 networking 0 0.9775171 skip D:\mysql-advanced-5.6.11-win32>
So, if InnoDB and MyISAM both parse “–skip-networking” similarly when indexing values, why did MyISAM return results while InnoDB returned nothing? Because InnoDB parsed search terms differently, and would effectively look in the fulltext index for the full “–skip-networking” word – which it could never find, due to how values are parsed as they are indexed. This is fixed in 5.6.11, and the changelog has the following note:
InnoDB: When parsing a delimited search string such as “abc-def” in a full-text search,
InnoDBnow uses the same word delimiters as
MyISAM. (Bug #16419661)
Vlad extracted the unique search terms from the past 300+ days from the bugs.mysql.com logs, and scripted execution of the same against both the current production and staged servers, comparing the results on each of them. There were some with minor differences in terms of weighting, but the only other behavior which surfaced was related to how InnoDB handles BOOLEAN MODE modifiers:
MyISAM requires the modifiers to not have any whitespace separation, while InnoDB does allow it. For example:
mysql> INSERT INTO ft VALUES ('arugula basalt chameleon'); Query OK, 1 row affected (0.00 sec) mysql> SELECT * FROM ft -> WHERE MATCH(a) AGAINST ('arugula -basalt' IN BOOLEAN MODE); Empty set (0.00 sec) mysql> ALTER TABLE ft ENGINE = InnoDB; Query OK, 3 rows affected (1.55 sec)Records: 3 Duplicates: 0 Warnings: 0 mysql> SELECT * FROM ft -> WHERE MATCH(a) AGAINST ('arugula -basalt' IN BOOLEAN MODE); Empty set (0.00 sec) mysql> SELECT * FROM ft -> WHERE MATCH(a) AGAINST ('arugula - basalt' IN BOOLEAN MODE); Empty set (0.02 sec)
If you try to confuse matters by using multiple boolean search modifiers, InnoDB will look at the first modifier, while MyISAM appears to look at the one immediately preceding the word:
mysql> SELECT * FROM ft -> WHERE MATCH(a) AGAINST ('arugula -+base chameleon' IN BOOLEAN MODE); Empty set (0.00 sec) mysql> SELECT * FROM ft -> WHERE MATCH(a) AGAINST ('arugula -+basalt chameleon' IN BOOLEAN MODE); +--------------------------+ | a | +--------------------------+ | arugula basalt chameleon | +--------------------------+ 1 row in set (0.00 sec) mysql> ALTER TABLE ft ENGINE = InnoDB; Query OK, 3 rows affected (1.55 sec) Records: 3 Duplicates: 0 Warnings: 0 mysql> SELECT * FROM ft -> WHERE MATCH(a) AGAINST ('arugula -+base chameleon' IN BOOLEAN MODE); Empty set (0.03 sec) mysql> SELECT * FROM ft -> WHERE MATCH(a) AGAINST ('arugula -+basalt chameleon' IN BOOLEAN MODE) Empty set (0.00 sec)
Quite frankly, I’m not sure whether either is “correct” in any formal sense. I filed bug 69032 if you have opinions on the matter.
While we updated bugs.mysql.com to 5.6 in early March, we weren’t able to convert the tables which supported fulltext searches to InnoDB until this weekend’s upgrade to 5.6.11. After testing thousands of search criteria used over the past year and comparing the results side-by-side, we felt confident converting our remaining MyISAM tables to InnoDB, and expect the results for such searches today to be largely identical to what they would be using MyISAM. If you have been keeping MyISAM tables around due to dependence on fulltext search capabilities, now is the time to start looking at InnoDB’s new capabilities in 5.6.