How to make MySQL treat underscore as a word separator for fulltext search?


6 Answers

90%

If not, does such a plugin already exist or do I have to learn to make my own?

I have never heard of such a plugin. Sure, you could make your own plugin.

However, I have a rather wild suggestion if you can afford the disk space.

Let's take an example table:

CREATE TABLE mydb.mytable(
   id int not null auto_increment,
   txt text,
   primary key(id),
   fulltext txt(txt)
) ENGINE = MyISAM;
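
The excerpt does not show the suggestion itself. As background, the built-in full-text parser counts letters, digits, and underscores as word characters, so a value such as user_name is indexed as a single word. One way to get the separator behavior, offered here only as an assumption and not as the answer's own method, is to keep a second copy of the text with underscores replaced by spaces and put the FULLTEXT index on that copy. A minimal Python sketch of the idea follows; both function names are hypothetical, and the tokenizer is a simplification that ignores apostrophes, minimum word length, and stopwords:

```python
import re


def split_words_default(text):
    # Simplified stand-in for the built-in parser: a word is a run of
    # letters, digits, and underscores.
    return re.findall(r"[A-Za-z0-9_]+", text)


def underscores_to_spaces(text):
    # Normalization applied before the text is stored in the indexed copy.
    return text.replace("_", " ")


original = "user_name lookup"
print(split_words_default(original))
print(split_words_default(underscores_to_spaces(original)))
```

The cost of this approach is the extra storage for the duplicated column, which fits the remark about disk space.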
88%

For large data sets, it is much faster to load your data into a table that has no FULLTEXT index and then create the index after that, than to load data into a table that has an existing FULLTEXT index.

Although the use of multiple character sets within a single table is supported, all columns in a FULLTEXT index must use the same character set and collation.

Note that in some contexts, if you cast an indexed column to BINARY, MySQL is not able to use the index efficiently.

As of MySQL 5.5.6, the stopword file is loaded and searched using latin1 if character_set_server is ucs2, utf16, or utf32. If any table was created with FULLTEXT indexes while the server character set was ucs2, utf16, or utf32, it should be repaired using this statement:

By default or with the IN NATURAL LANGUAGE MODE modifier, the MATCH() function performs a natural language search for a string against a text collection. A collection is a set of one or more columns included in a FULLTEXT index. The search string is given as the argument to AGAINST(). For each row in the table, MATCH() returns a relevance value; that is, a similarity measure between the search string and the text in that row in the columns named in the MATCH() list.

mysql> CREATE TABLE articles (
    ->   id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
    ->   title VARCHAR(200),
    ->   body TEXT,
    ->   FULLTEXT (title, body)
    -> ) ENGINE = MyISAM;
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO articles (title, body) VALUES
    -> ('MySQL Tutorial', 'DBMS stands for DataBase ...'),
    -> ('How To Use MySQL Well', 'After you went through a ...'),
    -> ('Optimizing MySQL', 'In this tutorial we will show ...'),
    -> ('1001 MySQL Tricks', '1. Never run mysqld as root. 2. ...'),
    -> ('MySQL vs. YourSQL', 'In the following database comparison ...'),
    -> ('MySQL Security', 'When configured properly, MySQL ...');
Query OK, 6 rows affected (0.00 sec)
Records: 6  Duplicates: 0  Warnings: 0

mysql> SELECT * FROM articles
    -> WHERE MATCH (title, body)
    -> AGAINST ('database' IN NATURAL LANGUAGE MODE);
+----+-------------------+------------------------------------------+
| id | title             | body                                     |
+----+-------------------+------------------------------------------+
|  5 | MySQL vs. YourSQL | In the following database comparison ... |
|  1 | MySQL Tutorial    | DBMS stands for DataBase ...             |
+----+-------------------+------------------------------------------+
2 rows in set (0.00 sec)

To simply count matches, you could use a query like this:

mysql> SELECT COUNT(*) FROM articles
    -> WHERE MATCH (title, body)
    -> AGAINST ('database' IN NATURAL LANGUAGE MODE);
+----------+
| COUNT(*) |
+----------+
|        2 |
+----------+
1 row in set (0.00 sec)

However, you might find it quicker to rewrite the query as follows:

mysql> SELECT
    -> COUNT(IF(MATCH (title, body) AGAINST ('database' IN NATURAL LANGUAGE MODE), 1, NULL))
    -> AS count
    -> FROM articles;
+-------+
| count |
+-------+
|     2 |
+-------+
1 row in set (0.03 sec)

The preceding example is a basic illustration that shows how to use the MATCH() function where rows are returned in order of decreasing relevance. The next example shows how to retrieve the relevance values explicitly. Returned rows are not ordered because the SELECT statement includes neither WHERE nor ORDER BY clauses:

mysql> SELECT id, MATCH (title, body)
    -> AGAINST ('Tutorial' IN NATURAL LANGUAGE MODE) AS score
    -> FROM articles;
+----+------------------+
| id | score            |
+----+------------------+
|  1 | 0.65545833110809 |
|  2 |                0 |
|  3 | 0.66266459226608 |
|  4 |                0 |
|  5 |                0 |
|  6 |                0 |
+----+------------------+
6 rows in set (0.00 sec)

The following example is more complex. The query returns the relevance values and it also sorts the rows in order of decreasing relevance. To achieve this result, you should specify MATCH() twice: once in the SELECT list and once in the WHERE clause. This causes no additional overhead, because the MySQL optimizer notices that the two MATCH() calls are identical and invokes the full-text search code only once.

mysql> SELECT id, body, MATCH (title, body) AGAINST
    -> ('Security implications of running MySQL as root'
    -> IN NATURAL LANGUAGE MODE) AS score
    -> FROM articles WHERE MATCH (title, body) AGAINST
    -> ('Security implications of running MySQL as root'
    -> IN NATURAL LANGUAGE MODE);
+----+-------------------------------------+-----------------+
| id | body                                | score           |
+----+-------------------------------------+-----------------+
|  4 | 1. Never run mysqld as root. 2. ... | 1.5219271183014 |
|  6 | When configured properly, MySQL ... | 1.3114095926285 |
+----+-------------------------------------+-----------------+
2 rows in set (0.00 sec)

Such a technique works best with large collections (in fact, it was carefully tuned this way). For very small tables, word distribution does not adequately reflect their semantic value, and this model may sometimes produce bizarre results. For example, although the word "MySQL" is present in every row of the articles table shown earlier, a search for the word produces no results:

mysql> SELECT * FROM articles
    -> WHERE MATCH (title, body)
    -> AGAINST ('MySQL' IN NATURAL LANGUAGE MODE);
Empty set (0.00 sec)
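
The empty result comes from the rule that MyISAM natural language searches apply to common words: a word present in 50% or more of the rows is treated like a stopword and matches nothing. The Python sketch below only illustrates that rule on the six sample rows; the function name and the simplified tokenizer are assumptions, not MySQL code:

```python
import re


def words_over_threshold(rows, threshold=0.5):
    """Return the words that occur in at least `threshold` of the rows."""
    counts = {}
    for row in rows:
        # Count each word once per row (document frequency, not term frequency).
        for word in set(re.findall(r"[a-z0-9_]+", row.lower())):
            counts[word] = counts.get(word, 0) + 1
    return {w for w, n in counts.items() if n / len(rows) >= threshold}


# Title and body of each sample row, joined into one string.
rows = [
    "MySQL Tutorial DBMS stands for DataBase ...",
    "How To Use MySQL Well After you went through a ...",
    "Optimizing MySQL In this tutorial we will show ...",
    "1001 MySQL Tricks 1. Never run mysqld as root. 2. ...",
    "MySQL vs. YourSQL In the following database comparison ...",
    "MySQL Security When configured properly, MySQL ...",
]

too_common = words_over_threshold(rows)
print("mysql" in too_common)     # present in all six rows
print("database" in too_common)  # present in two of six rows
```

Boolean mode does not apply the 50% threshold, so the same search in boolean mode would return rows.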
72%

CONCAT_WS(separator,str1,str2,...)

The inverse of this function (when called with a single argument) is the EXP() function.

These functions are not implemented in MySQL.

This section discusses XML and related functionality in MySQL.

When an operator is used with operands of different types, type conversion occurs to make the operands compatible. Some conversions occur implicitly. For example, MySQL automatically converts numbers to strings as necessary, and vice versa.

mysql> SELECT 1 + '1';
        -> 2
mysql> SELECT CONCAT(2, ' test');
        -> '2 test'

It is also possible to convert a number to a string explicitly using the CAST() function. Conversion occurs implicitly with the CONCAT() function because it expects string arguments.

mysql> SELECT 38.8, CAST(38.8 AS CHAR);
        -> 38.8, '38.8'
mysql> SELECT 38.8, CONCAT(38.8);
        -> 38.8, '38.8'

The following examples illustrate conversion of strings to numbers for comparison operations:

mysql> SELECT 1 > '6x';
        -> 0
mysql> SELECT 7 > '6x';
        -> 1
mysql> SELECT 0 > 'x6';
        -> 0
mysql> SELECT 0 = 'x6';
        -> 1

For comparisons of a string column with a number, MySQL cannot use an index on the column to look up the value quickly. If str_col is an indexed string column, the index cannot be used when performing the lookup in the following statement:

SELECT * FROM tbl_name WHERE str_col = 1;

Comparisons that use floating-point numbers (or values that are converted to floating-point numbers) are approximate because such numbers are inexact. This might lead to results that appear inconsistent:

mysql> SELECT '18015376320243458' = 18015376320243458;
        -> 1
mysql> SELECT '18015376320243459' = 18015376320243459;
        -> 0

Such results can occur because the values are converted to floating-point numbers, which have only 53 bits of precision and are subject to rounding:

mysql> SELECT '18015376320243459' + 0.0;
        -> 1.8015376320243e+16

The results shown will vary on different systems, and can be affected by factors such as computer architecture, compiler version, or optimization level. One way to avoid such problems is to use CAST() so that a value is not converted implicitly to a floating-point number:

mysql> SELECT CAST('18015376320243459' AS UNSIGNED) = 18015376320243459;
        -> 1
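
The rounding can be reproduced outside MySQL. The Python sketch below is only an illustration of the 53-bit limit, not of MySQL's internal conversion path: Python's float is an IEEE 754 double, so an integer above 2**53 may not survive a round trip through it, while an integer-only conversion (the analogue of CAST(... AS UNSIGNED)) stays exact. The variable names are arbitrary.

```python
BIG = 18015376320243459  # larger than 2**53, so a double cannot hold it exactly

# Round trip through a double: the value snaps to the nearest representable
# number, and in this range representable integers are 4 apart.
through_float = int(float(BIG))

# Integer-only conversion from the string form keeps every digit.
through_int = int("18015376320243459")

print(through_float)         # 18015376320243460
print(through_float == BIG)  # False
print(through_int == BIG)    # True
```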
65%

Full-text searches have three modes: the natural language mode, the boolean mode, and the query expansion mode.

When running full-text searches in MySQL, keep in mind that there are three search types to choose from:

These operators allow you to expand the functionality of the search: for example, if you would want to retrieve all rows that contain the word “Demo”, but not “Demo2”, you could use a query like so:

A natural language search mode, as noted above, is enabled by default or when the IN NATURAL LANGUAGE MODE modifier is specified. This mode performs a natural language search against a given text collection (one or more columns). The basic query format of full-text searches in MySQL should be similar to the following:

SELECT * FROM tbl_name WHERE MATCH(col_name) AGAINST('string' IN NATURAL LANGUAGE MODE);
75%

CONCAT_WS(separator,str1,str2,...)

MySQL has support for full-text indexing and searching:

In all other cases, the arguments are compared as case-insensitive strings.

In a comparison, BINARY affects the entire operation; it can be given before either operand with the same result.

Operator precedences are shown in the following list, from highest precedence to the lowest. Operators that are shown together on a line have the same precedence.

INTERVAL
BINARY, COLLATE
!
- (unary minus), ~ (unary bit inversion)
^
*, /, DIV, %, MOD
-, +
<<, >>
&
|
=, <=>, >=, >, <=, <, <>, !=, IS, LIKE, REGEXP, IN
BETWEEN, CASE, WHEN, THEN, ELSE
NOT
&&, AND
XOR
||, OR
:=

The precedence of operators determines the order of evaluation of terms in an expression. To override this order and group terms explicitly, use parentheses. For example:

mysql> SELECT 1 + 2 * 3;
        -> 7
mysql> SELECT (1 + 2) * 3;
        -> 9
40%

The key word ILIKE can be used instead of LIKE to make the match case-insensitive according to the active locale. This is not in the SQL standard but is a PostgreSQL extension.

A regular expression is defined as one or more branches, separated by |. It matches anything that matches one of the branches.

The substring function with three parameters provides extraction of a substring that matches an SQL regular expression pattern. The function can be written according to standard SQL syntax:

In addition to these facilities borrowed from LIKE, SIMILAR TO supports these pattern-matching metacharacters borrowed from POSIX regular expressions:

string LIKE pattern [ESCAPE escape-character]
string NOT LIKE pattern [ESCAPE escape-character]

Some examples:

'abc' LIKE 'abc'    true
'abc' LIKE 'a%'     true
'abc' LIKE '_b_'    true
'abc' LIKE 'c'      false
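
In a LIKE pattern, an underscore matches any single character and a percent sign matches any sequence of zero or more characters, and the pattern must cover the whole string. The four results above can be checked with a small Python sketch that translates a LIKE pattern into a regular expression. The helper names are hypothetical, and the ESCAPE clause is deliberately not handled:

```python
import re


def like_to_regex(pattern):
    # Translate each LIKE wildcard; every other character is matched literally.
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")  # any sequence of characters, possibly empty
        elif ch == "_":
            parts.append(".")   # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def like(string, pattern):
    # LIKE succeeds only if the pattern matches the entire string.
    return like_to_regex(pattern).fullmatch(string) is not None


for pat in ("abc", "a%", "_b_", "c"):
    print(pat, like("abc", pat))
```

This wildcard meaning of the underscore in LIKE is separate from how the full-text parser treats underscores inside words.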
