Thursday, March 27, 2008

Text Mining in SQL 2008

One way to build a tag cloud from your full-text document repository: the full-text Dynamic Management Views (DMVs) in SQL Server 2008.

If we had access to the words that had been indexed, it would have been clear what had happened. In SQL Server 2008 we do. Two dynamic management views return the keywords of the index and the keywords of each document in the index: sys.dm_fts_index_keywords and sys.dm_fts_index_keywords_by_document, respectively. Both take the database ID and the object ID of the full-text indexed table as parameters.

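-- Keywords for the whole full-text index on Table_1
select *
from sys.dm_fts_index_keywords(db_id(), object_id('Table_1'));

-- Keywords for every document (row) in that index
select *
from sys.dm_fts_index_keywords_by_document(db_id(), object_id('Table_1'));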
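To turn the first DMV into tag-cloud data, one approach is to rank terms by how many documents contain them and bucket the top terms into a handful of display sizes. The query below is a minimal sketch, assuming Table_1 has a full-text index in the current database; the limit of 50 terms and the 5 size buckets are arbitrary choices for illustration. It skips the special keyword 0xFF, which the DMV uses for the END OF FILE marker, and sums document_count across indexed columns, so a document containing a term in two columns is counted twice.

```sql
-- Sketch: tag-cloud weights from the full-text index on Table_1.
-- Assumes Table_1 is full-text indexed in the current database.
with terms as (
    select top (50)
           display_term,
           sum(document_count) as doc_count   -- summed across indexed columns
    from sys.dm_fts_index_keywords(db_id(), object_id('Table_1'))
    where keyword <> 0xFF                      -- skip the END OF FILE marker row
    group by display_term
    order by doc_count desc
)
select display_term,
       doc_count,
       ntile(5) over (order by doc_count) as cloud_size  -- 1 = smallest, 5 = largest
from terms
order by display_term;                         -- alphabetical, as tag clouds usually are
```

Computing NTILE inside the outer query, after TOP has picked the 50 most common terms, spreads those 50 terms across all five sizes instead of ranking them against the whole vocabulary.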

Note: Although the second DMV is "by document", it does not take a document ID as a parameter, so it returns all the keywords for all the documents in an index. That can be a lot of rows. For example, if you store the definitions from sys.all_sql_modules in master in a table, you get ~1,780 rows. Indexed with the default stoplist, that results in ~153,000 rows being returned from sys.dm_fts_index_keywords_by_document.
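Since there is no document parameter, the practical workaround is to filter or aggregate the function's output. The sketch below shows both; it assumes the same hypothetical Table_1, and the document_id value 42 is a made-up example, assumed here to be the row's integer full-text key. The WHERE clause is applied to the function's output rather than passed in as an argument, so it narrows what you get back but should not be relied on to make the function cheaper.

```sql
-- 1) Keywords for a single document (42 is a hypothetical full-text key value).
select display_term, column_id, occurrence_count
from sys.dm_fts_index_keywords_by_document(db_id(), object_id('Table_1'))
where document_id = 42
  and keyword <> 0xFF              -- skip the END OF FILE marker row
order by occurrence_count desc;

-- 2) Size check before running select *: keyword rows per document.
select document_id, count_big(*) as keyword_rows
from sys.dm_fts_index_keywords_by_document(db_id(), object_id('Table_1'))
group by document_id
order by keyword_rows desc;
```

Running the second query first is a cheap way to see whether a full select * will return hundreds or hundreds of thousands of rows.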

SimonS Blog on SQL Server Stuff : Tips and Tricks
