Thursday, March 27, 2008

Text Mining with SQL 2008 - C is just noise

I see some interesting possibilities; however, one limitation is that FTS doesn't show you the positions of words within a document, which makes ranking a bit difficult.

Here's another function to show how Full Text determines which words work in your search and which ones are just "noise".

SELECT *
FROM sys.dm_fts_parser ('C or c or C++ or c++ or C# or c#', 2057, 0, 0)

It returns the following, which shows what you need to put in to get an exact search on C++ or C#: capitalise the C. What's also interesting is that C and C++ both relate to C as well, but C# doesn't, which means that if C is removed from the noise word list, then a search for C++ would return any document containing the word C.
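To see the noise-word effect more directly, here is a rough sketch that trims the output down to the columns that matter and then reruns the same string with no stoplist. It assumes the documented columns of sys.dm_fts_parser (display_term, special_term, source_term, occurrence) and that passing NULL as the third argument means no stoplist is applied; I haven't pasted results here, so run it and compare the special_term column yourself.

```sql
-- System stoplist (stoplist_id = 0): special_term shows whether each
-- token is treated as an exact match or as a noise word.
SELECT display_term, special_term, source_term, occurrence
FROM sys.dm_fts_parser ('C or c or C++ or c++ or C# or c#', 2057, 0, 0);

-- No stoplist (stoplist_id = NULL): the same tokens, but nothing is
-- filtered out as noise, so you can see what a search would match
-- if C were removed from the noise word list.
SELECT display_term, special_term, source_term, occurrence
FROM sys.dm_fts_parser ('C or c or C++ or c++ or C# or c#', 2057, NULL, 0);
```

Comparing the two result sets side by side should make it clear which terms only drop out because of the stoplist.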

SQL Server 2008 – iFTS Transparency – dm_fts_parser - SimonS Blog on SQL Server Stuff
