Replacing MySQL Full-text search with Sphinx

It’s very handy to have FT search out of the box, but there are several drawbacks attached. Problem is that MyISAM Full-text search is not designed to handle big amounts of text data. If you plan to index more than 1M documents you will probably need to take a look on the external search system like Lucene or Sphinx. For the usual LAMP-based service I personally would prefer to use Sphinx as it provides simple transition from MySQL FT and easy to integrate into any application (Sphinx could be queried via native APIs or via MySQL protocol).

Say we have table called <my_table> with `title` and `content` text fields. In MySQL you have to fire query like this:

SELECT * FROM <my_table> WHERE MATCH(`title`,`content`) AGAINST ('I love Sphinx');

Let’s see how could we do the same query with Sphinx.

There are two steps to run the Sphinx as a MySQL FT replacement. First you will need to pull all needed text data from MySQL to Sphinx. For that you have to configuresource and index definition in Sphinx search config. Second step is to run indexer program which will connect to MySQL and fetch all the desired data. Then fire up the search daemon which will serve queries. Simplified Sphinx configuration example is below:

You need to let Sphinx know where to look for the data (source configuration):

source my_source
{
    type      = mysql
    sql_host  = localhost
    sql_user  = sphinx
    sql_pass  = ********
    sql_db    = <my_database_name>
    sql_port  = 3306
    sql_query = SELECT id, title, description FROM <my_table>
}

Please note that id field in sql_query. This field MUST be positive integer and have to be unique across all the documents in collection. Auto_incremented integer primary key from MySQL table with work in this case like a charm.

Now we need to tell Sphinx where to store all that data and configure index:

index my_first_sphinx_index
{
    source        = my_source
    path          = <path_to_sphinx_home>/var/index1
    docinfo       = extern
    charset_type  = utf-8
}

That’s it. Let’s add few more required sections to complete configuration:

Indexer settings:

indexer
{
    mem_limit    = 256M
    write_buffer = 8M
}

and daemon configuration:

searchd
{
    listen                  = 9312
    listen                  = 9306:mysql41
    pid_file                = <sphinx_path>/var/searchd.pid
    max_matches             = 1000
}

Put blocks above to <sphinx_path>/etc/sphinx.conf file which will be your main Sphinx configuration file.
Please also make sure that <sphinx_path>/var directory is writable for user you planning to run sphinx daemon.

Now we have to perform initial indexing by running indexer binary

<sphinx_path>/bin/indexer my_first_sphinx_index -c <sphinx_path>/etc/sphinx.conf

indexer my_first_sphinx_index tells indexer to create index called my_first_sphinx_index described in sphinx config. To create all the indexes at once (if you have two or more) run indexer –all -c <sphinx_path>/etc/sphinx.conf

Now you have to run the search daemon apparently called searchd

<sphinx_path>/bin/searchd -c <sphinx_path>/etc/sphinx.conf

Now Sphinx should be able to answer queries.

Fire up mysql client and connect to brand new Sphinx installation:

$ mysql -h0 -P 9306
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 2.0.3-id64-dev (rel20-r3043)

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql>

Please note server version. It is Sphinx! Now you can hire all power of Sphinx full-text query language!

mysql> SELECT * FROM my_first_sphinx_index WHERE
MATCH('I love Sphinx') LIMIT 0,5; SHOW META;
+---------+--------+
| id      | weight |
+---------+--------+
| 7637682 |   2652 |
| 6598265 |   2612 |
| 6941386 |   2612 |
| 6913297 |   2584 |
| 7139957 |   1667 |
+---------+--------+
5 rows in set (0.01 sec)

+---------------+--------+
| Variable_name | Value  |
+---------------+--------+
| total         | 51     |
| total_found   | 51     |
| time          | 0.013  |
| keyword[0]    | love   |
| docs[0]       | 227990 |
| hits[0]       | 472541 |
| keyword[1]    | sphinx |
| docs[1]       | 114    |
| hits[1]       | 178    |
+---------------+--------+
9 rows in set (0.00 sec)
mysql>

Please note – Sphinx returns document IDs, not a document content, so you need to query MySQL to fetch additional fields: SELECT * FROM <my_table> WHERE id IN (7637682, 6598265, …, 7139957)

Config above is fully working but very simple and provided as an example. You can download it from this website directly or using wget:

wget http://astellar.com/downloads/sphinx.conf

Another way to create initial Sphinx configuration is to adopt Sphinx configuration sample called sphinx-min.conf.dist bundled to the Sphinx RPM and Deb packets.

You could also learn more about Sphinx tips and tricks from my talks on various conferences and meetups, read blog posts about Sphinx and follow me on twitter.

If you are looking for help with Sphinx installation and integration, troubleshooting and fine tuning please contact me for a quote with your problem description.

Enjoy!

P.S. If you found this article useful please share it!

Anuncios
Publicado en Uncategorized

Responder

Introduce tus datos o haz clic en un icono para iniciar sesión:

Logo de WordPress.com

Estás comentando usando tu cuenta de WordPress.com. Cerrar sesión / Cambiar )

Imagen de Twitter

Estás comentando usando tu cuenta de Twitter. Cerrar sesión / Cambiar )

Foto de Facebook

Estás comentando usando tu cuenta de Facebook. Cerrar sesión / Cambiar )

Google+ photo

Estás comentando usando tu cuenta de Google+. Cerrar sesión / Cambiar )

Conectando a %s

A %d blogueros les gusta esto: