It’s very handy to have FT search out of the box, but there are several drawbacks attached. Problem is that MyISAM Full-text search is not designed to handle big amounts of text data. If you plan to index more than 1M documents you will probably need to take a look on the external search system like Lucene or Sphinx. For the usual LAMP-based service I personally would prefer to use Sphinx as it provides simple transition from MySQL FT and easy to integrate into any application (Sphinx could be queried via native APIs or via MySQL protocol).
Say we have table called <my_table> with `title` and `content` text fields. In MySQL you have to fire query like this:
SELECT * FROM <my_table> WHERE MATCH(`title`,`content`) AGAINST ('I love Sphinx');
Let’s see how could we do the same query with Sphinx.
There are two steps to run the Sphinx as a MySQL FT replacement. First you will need to pull all needed text data from MySQL to Sphinx. For that you have to configuresource and index definition in Sphinx search config. Second step is to run indexer program which will connect to MySQL and fetch all the desired data. Then fire up the search daemon which will serve queries. Simplified Sphinx configuration example is below:
You need to let Sphinx know where to look for the data (source configuration):
type = mysql
sql_host = localhost
sql_user = sphinx
sql_pass = ********
sql_db = <my_database_name>
sql_port = 3306
sql_query = SELECT id, title, description FROM <my_table>
Please note that id field in sql_query. This field MUST be positive integer and have to be unique across all the documents in collection. Auto_incremented integer primary key from MySQL table with work in this case like a charm.
Now we need to tell Sphinx where to store all that data and configure index:
source = my_source
path = <path_to_sphinx_home>/var/index1
docinfo = extern
charset_type = utf-8
That’s it. Let’s add few more required sections to complete configuration:
mem_limit = 256M
write_buffer = 8M
and daemon configuration:
listen = 9312
listen = 9306:mysql41
pid_file = <sphinx_path>/var/searchd.pid
max_matches = 1000
Put blocks above to <sphinx_path>/etc/sphinx.conf file which will be your main Sphinx configuration file.
Please also make sure that <sphinx_path>/var directory is writable for user you planning to run sphinx daemon.
Now we have to perform initial indexing by running indexer binary
<sphinx_path>/bin/indexer my_first_sphinx_index -c <sphinx_path>/etc/sphinx.conf
indexer my_first_sphinx_index tells indexer to create index called my_first_sphinx_index described in sphinx config. To create all the indexes at once (if you have two or more) run indexer –all -c <sphinx_path>/etc/sphinx.conf
Now you have to run the search daemon apparently called searchd
<sphinx_path>/bin/searchd -c <sphinx_path>/etc/sphinx.conf
Now Sphinx should be able to answer queries.
Fire up mysql client and connect to brand new Sphinx installation:
$ mysql -h0 -P 9306
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 2.0.3-id64-dev (rel20-r3043)
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
Please note server version. It is Sphinx! Now you can hire all power of Sphinx full-text query language!
mysql> SELECT * FROM my_first_sphinx_index WHERE
MATCH('I love Sphinx') LIMIT 0,5; SHOW META;
| id | weight |
| 7637682 | 2652 |
| 6598265 | 2612 |
| 6941386 | 2612 |
| 6913297 | 2584 |
| 7139957 | 1667 |
5 rows in set (0.01 sec)
| Variable_name | Value |
| total | 51 |
| total_found | 51 |
| time | 0.013 |
| keyword | love |
| docs | 227990 |
| hits | 472541 |
| keyword | sphinx |
| docs | 114 |
| hits | 178 |
9 rows in set (0.00 sec)
Please note – Sphinx returns document IDs, not a document content, so you need to query MySQL to fetch additional fields: SELECT * FROM <my_table> WHERE id IN (7637682, 6598265, …, 7139957)
Config above is fully working but very simple and provided as an example. You can download it from this website directly or using wget:
Another way to create initial Sphinx configuration is to adopt Sphinx configuration sample called sphinx-min.conf.dist bundled to the Sphinx RPM and Deb packets.
You could also learn more about Sphinx tips and tricks from my talks on various conferences and meetups, read blog posts about Sphinx and follow me on twitter.
If you are looking for help with Sphinx installation and integration, troubleshooting and fine tuning please contact me for a quote with your problem description.
P.S. If you found this article useful please share it!