
Search Server 2008 – recap

Wednesday, December 2nd, 2009

A note to the reader: this post is a technical recap of a project I was involved in. If you are reading it out of interest in the author, you can stop now. Thanks, Mom ;) However, if you are involved in implementing a MOSS 2007 search project, then by all means carry on.

I recently implemented a fairly large search system as part of a larger application at my company. While deploying Search Server 2008 Express as a standalone server is fairly straightforward, you may encounter some of the following issues during development:

PDF indexing – this can be a pain. For a good tutorial on how to install and configure all the required bits, see here, and for 64-bit, see here.

Dictionary, noise word and thesaurus files – pay attention to the location of the files; this is a tricky one.

Note: a thesaurus file cannot contain duplicate records. Search will crash when it encounters one, leaving a message in the event log.
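Since a single duplicate is enough to bring search down, it can pay to check the file before deploying it. Below is a minimal sketch, not something I used in the project: the class and method names are made up, and it assumes the usual thesaurus layout of expansion/replacement entries built from `<sub>` and `<pat>` elements. It deliberately flags every term that appears more than once (compared case-insensitively, which is also an assumption), so some of what it reports may be legitimate; treat the output as a list to review.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical pre-deployment check for thesaurus files.
public static class ThesaurusChecker
{
    // Returns every <pat>/<sub> term that appears more than once in the file.
    public static List<string> FindRepeatedTerms(string thesaurusXml)
    {
        XDocument doc = XDocument.Parse(thesaurusXml);

        return doc.Descendants()
            // Match on local name so the x-schema:tsSchema.xml namespace is ignored.
            .Where(e => e.Name.LocalName == "pat" || e.Name.LocalName == "sub")
            // Case-insensitive comparison is an assumption about how terms are matched.
            .Select(e => e.Value.Trim().ToLowerInvariant())
            .GroupBy(term => term)
            .Where(g => g.Count() > 1)
            .Select(g => g.Key)
            .ToList();
    }
}
```

For example, run it on `File.ReadAllText(...)` of the thesaurus before copying the file to the server, and only deploy when the list is empty or every entry has been checked by hand.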

This is a fairly good tutorial on total hits and paging search results.
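The arithmetic behind paging is simple but easy to get off by one. A small sketch with hypothetical names: it assumes a 0-based page index in the UI and the 1-based StartAt value that the query packet uses. Keep in mind that the total row count reported by search may be an estimate unless IsTotalRowsExact is true, so the page count can change as the user pages forward.

```csharp
using System;

// Hypothetical paging helper for search result pages.
public static class SearchPaging
{
    // 1-based index of the first row on the given 0-based page (the packet's StartAt).
    public static int StartAt(int pageIndex, int pageSize)
    {
        if (pageIndex < 0) throw new ArgumentOutOfRangeException("pageIndex");
        if (pageSize <= 0) throw new ArgumentOutOfRangeException("pageSize");
        return pageIndex * pageSize + 1;
    }

    // Number of pages needed to show totalRows results, rounded up.
    public static int PageCount(int totalRows, int pageSize)
    {
        if (pageSize <= 0) throw new ArgumentOutOfRangeException("pageSize");
        if (totalRows <= 0) return 0;
        return (totalRows + pageSize - 1) / pageSize;
    }
}
```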

“Did you mean” issue – this is actually a feature request waiting to materialize. Our customer wanted to control the terms offered by the “did you mean” feature. That seems reasonable enough; however, it is not manageable at all. The functionality relies on a file, nlgindexlexicon.lex, that is generated by the indexing process and has no interface or API. Its default location is <drive>:\Program Files\Microsoft Office Servers\12.0\Data\Office Server\Applications\<guid>\Projects\Portal_Content\Indexer\CiFiles. If you do need to change it, MS recommends listening for file events and changing the file after a change event has occurred, as in the following code:

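using System;
using System.Collections.Generic;
using System.Configuration;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Text;

private FileSystemWatcher fsw;
private System.Timers.Timer nlgLexiconWaitTimer;

private void SetNlgLexiconFileWatcher()
{
    Logger.LogEntry("start SetNlgLexiconFileWatcher");
    fsw = new FileSystemWatcher();
    fsw.NotifyFilter = NotifyFilters.LastWrite;
    fsw.Path = ConfigurationManager.AppSettings["nlgIndexLexiconFolder"];
    fsw.Filter = "nlgindexlexicon.lex";
    fsw.Changed += new FileSystemEventHandler(nlgIndexLexicon_Changed);

    // Retry timer, used while the indexer still holds a lock on the file.
    nlgLexiconWaitTimer = new System.Timers.Timer();
    nlgLexiconWaitTimer.Interval = 1000 * 60 * 5; // 5 minutes
    nlgLexiconWaitTimer.Elapsed += new System.Timers.ElapsedEventHandler(nlgLexiconWaitTimer_Elapsed);
    nlgLexiconWaitTimer.Enabled = false;

    // Start watching only after the handler and the retry timer are in place.
    fsw.EnableRaisingEvents = true;
}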

void nlgLexiconWaitTimer_Elapsed(object sender, System.Timers.ElapsedEventArgs e)
{
    Logger.LogEntry("in nlgLexiconWaitTimer_Elapsed");
    MergeNlgIndexLexicon();
    Logger.LogEntry("end nlgLexiconWaitTimer_Elapsed");
}

public void MergeNlgIndexLexicon()
{
    Logger.LogEntry("run MergeNlgIndexLexicon");

    try
    {
        fsw.EnableRaisingEvents = false;
        string nlgLexPath = ConfigurationManager.AppSettings["nlgIndexLexiconFolder"];
        nlgLexPath = Path.Combine(nlgLexPath, "nlgindexlexicon.lex");
        string[] words = File.ReadAllLines(nlgLexPath);
        HashSet<string> hashSet = new HashSet<string>(words);

        string customDictionaryPath = ConfigurationManager.AppSettings["CustomDictionaryPath"];
        string[] customWords = File.ReadAllLines(customDictionaryPath);
        HashSet<string> customHashSet = new HashSet<string>(customWords);

        hashSet.UnionWith(customHashSet);
        File.WriteAllLines(nlgLexPath, hashSet.ToArray(), Encoding.Unicode);
        nlgLexiconWaitTimer.Stop(); // succeeded, so stop retrying
        Logger.LogEntry("MergeNlgIndexLexicon finished");
    }
    catch (Exception ex)
    {
        Logger.LogEntry("error in MergeNlgIndexLexicon : " + ex.ToString(), EventLogEntryType.Error);
        try
        {
            nlgLexiconWaitTimer.Start(); // failed, so start retrying
        }
        catch (Exception ex1)
        {
            Logger.LogEntry("error in MergeNlgIndexLexicon, could not start nlgLexiconWaitTimer : " + ex1.ToString(), EventLogEntryType.Error);
        }
    }
    finally
    {
        fsw.EnableRaisingEvents = true;
    }
}

void nlgIndexLexicon_Changed(object sender, System.IO.FileSystemEventArgs e)
{
    Logger.LogEntry(string.Format("nlgIndexLexicon_Changed fired for change Type : {0}",e.ChangeType));
    MergeNlgIndexLexicon();
}

You see, when SharePoint accesses the file at the end of the indexing process, it locks it for several minutes, so you have to wait until the locks are gone; hence the timer.
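Rather than waiting for the merge to fail, you can also probe the lock directly by trying to open the file exclusively. The helper below is a hypothetical addition, not part of the code above; it treats any I/O error on an existing file as "still locked", which is a simplification.

```csharp
using System;
using System.IO;

// Hypothetical helper: reports whether another process (e.g. the indexer)
// still has the file open, by attempting an exclusive open.
public static class FileLockProbe
{
    public static bool IsLocked(string path)
    {
        if (!File.Exists(path))
            throw new FileNotFoundException("File not found.", path);

        try
        {
            using (FileStream stream = File.Open(path, FileMode.Open, FileAccess.ReadWrite, FileShare.None))
            {
                // The exclusive open succeeded, so nobody else holds the file.
                return false;
            }
        }
        catch (IOException)
        {
            // Typically a sharing violation: the file is still in use.
            return true;
        }
    }
}
```

For example, MergeNlgIndexLexicon could call FileLockProbe.IsLocked(nlgLexPath) first and simply start the retry timer when it returns true.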

LIKE statement bug – in this lovely undocumented feature, you will discover that the LIKE statement does not yield the expected results on long strings. That is because in one of the tables in the search DB, the property value field is limited to 64 characters. I saw it with my own eyes; read my previous blog post here.
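One possible workaround, assuming the stored value really is the first 64 characters of the property, is to turn a long "starts with" search into a prefix match on those first 64 characters. The helper below is hypothetical and only builds the WHERE-clause fragment for a FullTextSqlQuery; the query may then return extra hits that share the same first 64 characters, so you may need to filter those out yourself. It doubles single quotes but does not escape LIKE wildcards such as % inside the value.

```csharp
using System;

// Hypothetical builder for a LIKE prefix condition that respects
// the (assumed) 64-character limit on stored property values.
public static class LikeClauseBuilder
{
    // Assumption: the search DB keeps only the first 64 characters.
    public const int StoredValueLength = 64;

    public static string StartsWith(string propertyName, string value)
    {
        if (string.IsNullOrEmpty(propertyName)) throw new ArgumentException("propertyName");
        if (value == null) throw new ArgumentNullException("value");

        // Longer patterns can never match the truncated stored value, so cut them.
        string prefix = value.Length > StoredValueLength
            ? value.Substring(0, StoredValueLength)
            : value;

        // Double single quotes so the value stays inside the SQL string literal.
        string escaped = prefix.Replace("'", "''");
        return string.Format("\"{0}\" LIKE '{1}%'", propertyName, escaped);
    }
}
```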

If you plan to use the QueryEx method of Search.asmx, then stop! Read carefully:

There are at least two show-stopper issues that will prevent you from using Search.asmx as is.

1. A query for HitHighlightedSummary or HitHighlightedProperties (used for search term highlighting) may result in the following error:

----------------------------------------

There was an error generating the XML document. ---> The surrogate pair (0xD86E, 0x79) is invalid. A high surrogate character (0xD800 - 0xDBFF) must always be paired with a low surrogate character (0xDC00 - 0xDFFF).

----------------------------------------

This is because the search service does not remove illegal characters before it sends the dataset back to you.

2. The query timeout is hardcoded in the search service code to 10 seconds (!). I know 10 seconds is a long time, yet a complicated search may take longer than that, and the user, well, she just wants to get some results.

Queries involving dates or custom sorting often timed out because of this limitation.
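For the first issue, note that the broken character in the error above is a lone high surrogate (0xD86E followed by an ordinary 'y', 0x79). A regular expression that excludes the whole surrogate range gets rid of the error, but it also removes valid surrogate pairs, i.e. characters outside the Basic Multilingual Plane. Here is a sketch of a character-by-character filter that keeps valid pairs and drops only what XML 1.0 does not allow; the class and method names are my own.

```csharp
using System;
using System.Text;

// Hypothetical XML 1.0 character filter that preserves valid surrogate pairs.
public static class XmlCharCleaner
{
    public static string StripInvalidXmlChars(string s)
    {
        if (string.IsNullOrEmpty(s)) return string.Empty;

        StringBuilder sb = new StringBuilder(s.Length);
        for (int i = 0; i < s.Length; i++)
        {
            char c = s[i];
            if (char.IsHighSurrogate(c))
            {
                // A high surrogate is valid only when a low surrogate follows it.
                if (i + 1 < s.Length && char.IsLowSurrogate(s[i + 1]))
                {
                    sb.Append(c).Append(s[i + 1]);
                    i++; // skip the low half we just copied
                }
            }
            else if (char.IsLowSurrogate(c))
            {
                // Low surrogate without a preceding high surrogate: drop it.
            }
            else if (c == '\t' || c == '\n' || c == '\r' || (c >= '\u0020' && c <= '\uFFFD'))
            {
                // Tab, LF, CR and the BMP ranges allowed by XML 1.0.
                sb.Append(c);
            }
            // Anything else (other control characters, U+FFFE, U+FFFF) is dropped.
        }
        return sb.ToString();
    }
}
```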

To overcome these issues, MS recommends writing your own service, to which I can only say: gotcha! But fear not, below you will find all kinds of goodies that will make this task easier.

First, you may either register your web service inside SharePoint or run it on a separate website alongside your SharePoint installation. I run it on the side and it works.

One thing to watch when you access the search from a different website is the shared scope. It's easy but not well documented: you have to create the scopes and then make them shared, and the place to do it is this URL: http://<serveradmin>/ssp/admin/_layouts/viewscopes.aspx?mode=ssp

After copying a scope as shared, remember to change the name of the scope.
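On the query side, the shared scope is then referenced by name in the query text. A hedged sketch, assuming the Enterprise Search SQL syntax that FullTextSqlQuery accepts (FROM SCOPE() with a "SCOPE" = '...' condition); the column list, the helper name and the scope name are placeholders.

```csharp
using System;

// Hypothetical builder for a full-text SQL query restricted to one shared scope.
public static class ScopedQueryBuilder
{
    public static string Build(string scopeName, string freeText)
    {
        if (string.IsNullOrEmpty(scopeName)) throw new ArgumentException("scopeName");
        if (freeText == null) throw new ArgumentNullException("freeText");

        // Single quotes are doubled so the input stays inside the string literals.
        return string.Format(
            "SELECT Title, Path, Rank FROM Scope() " +
            "WHERE FREETEXT(DefaultProperties, '{0}') AND (\"SCOPE\" = '{1}') " +
            "ORDER BY Rank DESC",
            freeText.Replace("'", "''"),
            scopeName.Replace("'", "''"));
    }
}
```

The resulting string is what you would put into the QueryText element of the packet sent to the custom service.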

Paste this code into your web service and implement a simple ValueHelper class with a generic GetValue<T> method (can't give you mine):

[WebService()]
public class MySearch
{
    [WebMethod]
    public DataSet QueryEx(string xmlPacket)
    {
        XDocument xdoc = XDocument.Parse(xmlPacket);
        var q = from x in xdoc.Descendants()
                select new SearchQuery()
                {
                    QueryText = x.Descendants("QueryText").First().Value,
                    Language = x.Descendants("QueryText").First().Attribute("language").Value,
                    EnableStemming = ValueHelper.GetValue<bool>(x.Descendants("EnableStemming").First().Value, false),
                    HighlightedSentenceCount = x.Descendants("HighlightedSentenceCount").Count() > 0 ? ValueHelper.GetValue<int>(x.Descendants("HighlightedSentenceCount").First(), 3) : 3,
                    IgnoreAllNoiseQuery = ValueHelper.GetValue<bool>(x.Descendants("IgnoreAllNoiseQuery").First().Value, true),
                    RowLimit = ValueHelper.GetValue<int>(x.Descendants("Count").First().Value, 10),
                    StartRow = ValueHelper.GetValue<int>(x.Descendants("StartAt").First().Value, 1),
                    Timeout = x.Descendants("Timeout").Count() > 0 ? ValueHelper.GetValue<int>(x.Descendants("Timeout").First().Value, 10000) : 10000,
                    TrimDuplicates = ValueHelper.GetValue<bool>(x.Descendants("TrimDuplicates").First().Value, true),
                    KeywordInclusion = ValueHelper.GetValue<bool>(x.Descendants("ImplicitAndBehavior").First().Value, false) ? KeywordInclusion.AllKeywords : KeywordInclusion.AnyKeyword

                };
        SearchQuery searchQuery = q.First();
        return QueryEx1(searchQuery);
    }
    [WebMethod]
    public DataSet QueryEx1(SearchQuery searchQuery)
    {
        string spSiteUrl = ConfigurationManager.AppSettings["SPSiteUrl"];
        using (SPSite site = new SPSite(spSiteUrl))
        {

            ServerContext context = ServerContext.GetContext(site);
            using (FullTextSqlQuery query = new FullTextSqlQuery(context))
            {
                query.QueryText = searchQuery.QueryText;
                query.HighlightedSentenceCount = searchQuery.HighlightedSentenceCount;
                query.EnableStemming = searchQuery.EnableStemming;
                query.IgnoreAllNoiseQuery = searchQuery.IgnoreAllNoiseQuery;
                query.RowLimit = searchQuery.RowLimit;
                query.StartRow = searchQuery.StartRow - 1; // StartAt in the packet is 1-based, StartRow is 0-based
                query.Timeout = searchQuery.Timeout;
                query.TrimDuplicates = searchQuery.TrimDuplicates;
                query.ResultTypes = ResultType.RelevantResults;
                query.KeywordInclusion = searchQuery.KeywordInclusion;

                query.TotalRowsExactMinimum = searchQuery.StartRow + (searchQuery.RowLimit * 2);
                query.SiteContext = new Uri(site.Url);
                CultureInfo info = (searchQuery.Language == null) ? CultureInfo.CurrentCulture : new CultureInfo(searchQuery.Language);

                ResultTableCollection resultsCollection = query.Execute();
                ResultTable relevantResults = resultsCollection[ResultType.RelevantResults];

                DataTable results = new DataTable();
                results.Load(relevantResults, LoadOption.OverwriteChanges);
                results.ExtendedProperties.Add("TotalRows", relevantResults.TotalRows);
                results.ExtendedProperties.Add("IsTotalRowsExact", relevantResults.IsTotalRowsExact);

                // Strip characters that are illegal in XML (e.g. in the hit-highlighted
                // summary); otherwise serializing the DataSet back to the caller fails.
                foreach (DataRow dr in results.Rows)
                {
                    for (int i = 0; i < results.Columns.Count; i++)
                    {
                        string value = dr[i] as string;
                        if (value != null)
                        {
                            dr[i] = stripNonValidXMLCharacters(value);
                        }
                    }
                }
                results.AcceptChanges();

                DataSet ds = new DataSet();
                ds.Tables.Add(results);
                ds.ExtendedProperties.Add("SpellingSuggestion", resultsCollection.SpellingSuggestion);
                ds.ExtendedProperties.Add("QueryTerms", resultsCollection.QueryTerms);
                ds.ExtendedProperties.Add("IgnoredNoiseWords", resultsCollection.IgnoredNoiseWords);

                return ds;
            }
        }
    }

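    // Removes characters that are not allowed in XML 1.0. Note that this pattern
    // removes every surrogate code unit, so valid surrogate pairs (characters
    // outside the Basic Multilingual Plane) are dropped along with unpaired ones.
    public static string stripNonValidXMLCharacters(string s)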
    {
        if (string.IsNullOrEmpty(s)) return string.Empty;

        string xml = Regex.Replace(s, "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD]", "", RegexOptions.Compiled);

        return xml;
    }
}

public class SearchQuery
{
    public string Language { get; set; }
    public bool EnableStemming { get; set; }
    public int HighlightedSentenceCount { get; set; }

    public bool IgnoreAllNoiseQuery { get; set; }
    public KeywordInclusion KeywordInclusion { get; set; }

    public string QueryText { get; set; }

    public int RowLimit { get; set; }
    public int StartRow { get; set; }   // 1-based, as in the packet's StartAt
    public int Timeout { get; set; }    // milliseconds

    public bool TrimDuplicates { get; set; }

}
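Since the service depends on a ValueHelper I can't share, here is one possible stand-in. It is a sketch, not my original class: the calls are of the form ValueHelper.GetValue<T>(value, defaultValue), so it exposes a generic method on a static class. It relies on Convert.ChangeType, so it covers simple types such as int and bool, and returns the default whenever the value is missing or cannot be parsed. The XElement overload lets you pass an element directly; a missing (null) element also yields the default.

```csharp
using System;
using System.Globalization;
using System.Xml.Linq;

// Hypothetical stand-in for the ValueHelper used by QueryEx.
public static class ValueHelper
{
    // Converts text to T with the invariant culture; returns defaultValue when
    // the text is null/empty or cannot be converted.
    public static T GetValue<T>(string value, T defaultValue)
    {
        if (string.IsNullOrEmpty(value)) return defaultValue;

        try
        {
            return (T)Convert.ChangeType(value.Trim(), typeof(T), CultureInfo.InvariantCulture);
        }
        catch (FormatException)
        {
            return defaultValue;
        }
        catch (InvalidCastException)
        {
            return defaultValue;
        }
        catch (OverflowException)
        {
            return defaultValue;
        }
    }

    // Same conversion, taking the element itself; a null element yields the default.
    public static T GetValue<T>(XElement element, T defaultValue)
    {
        return element == null ? defaultValue : GetValue(element.Value, defaultValue);
    }
}
```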

 

I know it’s not beautiful, but it solved my problems and the search seems to be working really well for my customers, so I’m happy.

HTH, Roi