Archive for the ‘Work’ Category

Search server 2008 – recap

Wednesday, December 2nd, 2009

A note to the reader: this post is a technical recap of a project I was involved in. If you are reading it out of interest in the author, you can stop now. Thanks, mom ;). However, if you are involved in implementing a MOSS 2007 search project, then by all means carry on.

I recently implemented a fairly large search system as part of a larger application at my company. While deploying Search Server 2008 Express as a standalone server is fairly straightforward, you may encounter some of the following issues during development:

PDF indexing – this can be a pain. For a good tutorial on how to install and configure all the required bits see here, and for 64-bit see here.

Dictionary, noise word and thesaurus files – pay attention to the location of the files; this is a tricky one.

Note: a thesaurus file cannot contain duplicate records. Loading it will crash on the first duplicate, leaving a message in the event log.
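Since a duplicate only shows up as a crash and an event log entry, it can help to scan the file before deploying it. The following is a minimal sketch, not anything MS ships: the class and method names are made up, and it assumes the usual thesaurus layout where terms live in <pat> and <sub> elements. It treats any repeated term as a duplicate, which may over-report, so take the output as a list of candidates to review.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of any SharePoint API.
public static class ThesaurusChecker
{
    // Returns every <pat> or <sub> term that appears more than once,
    // compared case-insensitively. Element names are matched by local
    // name so the check works with or without an XML namespace.
    public static List<string> FindDuplicateTerms(string thesaurusXml)
    {
        XDocument doc = XDocument.Parse(thesaurusXml);
        return doc.Descendants()
            .Where(e => e.Name.LocalName == "pat" || e.Name.LocalName == "sub")
            .Select(e => e.Value.Trim())
            .GroupBy(term => term, StringComparer.OrdinalIgnoreCase)
            .Where(g => g.Count() > 1)
            .Select(g => g.Key)
            .ToList();
    }
}
```

Run it against the file contents (for example via File.ReadAllText) and fix whatever it reports before restarting the search service.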

This is a fairly good tutorial on total hits and paging search results.
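The paging arithmetic itself is small. Here is a sketch of it, assuming a 1-based StartAt value as in the QueryEx packet handled by the service code later in this post; the class and method names are hypothetical. Keep in mind that the total row count returned by the search can be an estimate (see IsTotalRowsExact in that code), so the page count may change as the user pages forward.

```csharp
using System;

// Hypothetical helper for translating page numbers into query offsets.
public static class Paging
{
    // 1-based StartAt value for a 1-based page number.
    public static int StartAt(int pageNumber, int pageSize)
    {
        if (pageNumber < 1) throw new ArgumentOutOfRangeException("pageNumber");
        if (pageSize < 1) throw new ArgumentOutOfRangeException("pageSize");
        return (pageNumber - 1) * pageSize + 1;
    }

    // Number of pages needed to show totalRows hits (rounds up).
    public static int PageCount(int totalRows, int pageSize)
    {
        if (pageSize < 1) throw new ArgumentOutOfRangeException("pageSize");
        if (totalRows <= 0) return 0;
        return (totalRows + pageSize - 1) / pageSize;
    }
}
```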

“Did you mean” issue – this is actually a feature request waiting to materialize. Our customer wanted to control the terms offered by the “did you mean” feature. This seems reasonable enough, but it is not manageable: the functionality relies on a file, nlgindexlexicon.lex, that is generated by the indexing process and has no interface or API. Its default location is <drive>:\Program Files\Microsoft Office Servers\12.0\Data\Office Server\Applications\<guid>\Projects\Portal_Content\Indexer\CiFiles. If you do need to change it, MS recommends listening for the file's change events and modifying it after a change event has occurred, as in the following code:

private void SetNlgLexiconFileWatcher()
{
    Logger.LogEntry("start SetNlgLexiconFileWatcher");
    fsw = new System.IO.FileSystemWatcher();
    fsw.NotifyFilter = NotifyFilters.LastWrite;
    fsw.Path = ConfigurationManager.AppSettings["nlgIndexLexiconFolder"];
    fsw.Filter = "nlgindexlexicon.lex";
    fsw.EnableRaisingEvents = true;
    fsw.Changed += new System.IO.FileSystemEventHandler(nlgIndexLexicon_Changed);
    nlgLexiconWaitTimer = new System.Timers.Timer();
    nlgLexiconWaitTimer.Interval = 1000 * 60 * 5; // 5 minutes
    nlgLexiconWaitTimer.Elapsed += new System.Timers.ElapsedEventHandler(nlgLexiconWaitTimer_Elapsed);
    nlgLexiconWaitTimer.Enabled = false;
}

void nlgLexiconWaitTimer_Elapsed(object sender, System.Timers.ElapsedEventArgs e)
{
    Logger.LogEntry("in nlgLexiconWaitTimer_Elapsed");
    MergeNlgIndexLexicon();
    Logger.LogEntry("end nlgLexiconWaitTimer_Elapsed");
}

public void MergeNlgIndexLexicon()
{
    Logger.LogEntry("run MergeNlgIndexLexicon");

    try
    {
        fsw.EnableRaisingEvents = false;
        string nlgLexPath = ConfigurationManager.AppSettings["nlgIndexLexiconFolder"];
        nlgLexPath = Path.Combine(nlgLexPath, "nlgindexlexicon.lex");
        string[] words = File.ReadAllLines(nlgLexPath);
        HashSet<string> hashSet = new HashSet<string>(words);

        string customDictionaryPath = ConfigurationManager.AppSettings["CustomDictionaryPath"];
        string[] customWords = File.ReadAllLines(customDictionaryPath);
        HashSet<string> customHashSet = new HashSet<string>(customWords);

        hashSet.UnionWith(customHashSet);
        File.WriteAllLines(nlgLexPath, hashSet.ToArray(), Encoding.Unicode);
        nlgLexiconWaitTimer.Stop(); // succeeded, so stop retrying
        Logger.LogEntry("MergeNlgIndexLexicon finished");
    }
    catch (Exception ex)
    {
        Logger.LogEntry("error in MergeNlgIndexLexicon : " + ex.ToString(), EventLogEntryType.Error);
        try
        {
            nlgLexiconWaitTimer.Start(); // failed, so start retrying
        }
        catch (Exception ex1)
        {
            Logger.LogEntry("error in MergeNlgIndexLexicon, could not start nlgLexiconWaitTimer : " + ex1.ToString(), EventLogEntryType.Error);
        }
    }
    finally
    {
        fsw.EnableRaisingEvents = true;
    }
}

void nlgIndexLexicon_Changed(object sender, System.IO.FileSystemEventArgs e)
{
    Logger.LogEntry(string.Format("nlgIndexLexicon_Changed fired for change Type : {0}",e.ChangeType));
    MergeNlgIndexLexicon();
}

You see, when SharePoint accesses the file it locks it for several minutes at the end of the indexing process, so you have to wait until the locks are gone, hence the timer.

LIKE statement bug – in this lovely undocumented feature, you will discover that the LIKE statement does not yield the expected results on long strings. That is because, in some table in the search DB, the property value field is limited to 64 characters. I saw it with my own eyes; read my previous blog post here.
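If you would rather check for the lock than wait for the merge to throw, one option is to try opening the file exclusively. This is only a sketch with made-up names, and it is an assumption on my part that an exclusive open is a good enough signal. The answer can also be stale by the time you act on it, so the retry timer is still needed.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the code above.
public static class FileLockProbe
{
    // Returns true when the file can be opened with no sharing right now,
    // i.e. no other process (or handle) currently holds it open.
    public static bool CanOpenExclusively(string path)
    {
        try
        {
            using (new FileStream(path, FileMode.Open, FileAccess.ReadWrite, FileShare.None))
            {
                return true;
            }
        }
        catch (IOException)
        {
            // Sharing violation, or the file does not exist.
            return false;
        }
        catch (UnauthorizedAccessException)
        {
            // No permission to open the file for writing.
            return false;
        }
    }
}
```

A timer handler could call CanOpenExclusively on the lexicon path first and skip the merge attempt while it returns false.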

If you plan to use the QueryEx method of Search.asmx, then stop and read carefully:

There are at least two showstopper issues that will prevent you from using Search.asmx as is.

1. A query for HitHighlightedSummary or HitHighlightedProperties (used for search term highlighting) may result in the following error:

----------------------------------------

There was an error generating the XML document. ---> The surrogate pair (0xD86E, 0x79) is invalid. A high surrogate character (0xD800 - 0xDBFF) must always be paired with a low surrogate character (0xDC00 - 0xDFFF).

----------------------------------------

This is because the search service does not strip illegal characters before it sends the dataset back to you.

2. The query timeout is hardcoded in the search service code to 10 seconds (!). I know 10 seconds is a long time, yet a complicated search may take longer than that, and the user, well, she just wants to get some results.

Queries that involve dates or custom sorting often timed out because of this limitation.
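On the first issue: the error above is about a lone high surrogate (0xD86E followed by 0x79, a plain "y"). The cleanup in the service code later in this post simply drops every surrogate, valid or not. If you need to keep valid pairs (characters outside the Basic Multilingual Plane), a variant could look like the sketch below. The names are hypothetical and this is not what I ran in production.

```csharp
using System.Text;

// Hypothetical alternative to a regex-based cleanup.
public static class XmlText
{
    // Removes characters that are illegal in XML 1.0 but keeps valid
    // surrogate pairs; unpaired surrogates are dropped.
    public static string RemoveInvalidXmlChars(string s)
    {
        if (string.IsNullOrEmpty(s)) return string.Empty;

        var sb = new StringBuilder(s.Length);
        for (int i = 0; i < s.Length; i++)
        {
            char c = s[i];
            if (char.IsHighSurrogate(c) && i + 1 < s.Length && char.IsLowSurrogate(s[i + 1]))
            {
                // Well-formed pair: copy both halves and skip the second.
                sb.Append(c).Append(s[i + 1]);
                i++;
            }
            else if (IsLegalBmpChar(c))
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }

    // Tab, LF, CR and the two legal BMP ranges from the XML 1.0 Char production.
    private static bool IsLegalBmpChar(char c)
    {
        return c == '\t' || c == '\n' || c == '\r'
            || (c >= 0x20 && c <= 0xD7FF)
            || (c >= 0xE000 && c <= 0xFFFD);
    }
}
```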

To overcome these issues MS recommends writing your own service, to which I can only say: gotcha! But fear not, below you will find all kinds of goodies that will make this task easier.

First, you may register your web service inside SharePoint, or you may run it on a separate website alongside your SharePoint installation. I run it on the side and it works.

One thing to watch when you access the search from a different website is shared scopes. It's easy but not well documented: you have to create the scopes and then make them shared, and the place to do it is this URL: http://<serveradmin>/ssp/admin/_layouts/viewscopes.aspx?mode=ssp

After you copy a scope as shared, remember to change its name.

Paste this code into your web service and implement a simple ValueHelper class with a generic GetValue<T> method (can't give you that one).

[WebService()]
public class MySearch
{
    [WebMethod]
    public DataSet QueryEx(string xmlPacket)
    {
        XDocument xdoc = XDocument.Parse(xmlPacket);
        var q = from x in xdoc.Descendants()
                select new SearchQuery()
                {
                    QueryText = x.Descendants("QueryText").First().Value,
                    Language = x.Descendants("QueryText").First().Attribute("language").Value,
                    EnableStemming = ValueHelper.GetValue<bool>(x.Descendants("EnableStemming").First().Value, false),
                    HighlightedSentenceCount = x.Descendants("HighlightedSentenceCount").Count() > 0 ? ValueHelper.GetValue<int>(x.Descendants("HighlightedSentenceCount").First().Value, 3) : 3,
                    IgnoreAllNoiseQuery = ValueHelper.GetValue<bool>(x.Descendants("IgnoreAllNoiseQuery").First().Value, true),
                    RowLimit = ValueHelper.GetValue<int>(x.Descendants("Count").First().Value, 10),
                    StartRow = ValueHelper.GetValue<int>(x.Descendants("StartAt").First().Value, 1),
                    Timeout = x.Descendants("Timeout").Count() > 0 ? ValueHelper.GetValue<int>(x.Descendants("Timeout").First().Value, 10000) : 10000,
                    TrimDuplicates = ValueHelper.GetValue<bool>(x.Descendants("TrimDuplicates").First().Value, true),
                    KeywordInclusion = ValueHelper.GetValue<bool>(x.Descendants("ImplicitAndBehavior").First().Value, false) ? KeywordInclusion.AllKeywords : KeywordInclusion.AnyKeyword

                };
        SearchQuery searchQuery = q.First();
        return QueryEx1(searchQuery);
    }
    [WebMethod]
    public DataSet QueryEx1(SearchQuery searchQuery)
    {
        string spSiteUrl = ConfigurationManager.AppSettings["SPSiteUrl"];
        using (SPSite site = new SPSite(spSiteUrl))
        {

            ServerContext context = ServerContext.GetContext(site);
            using (FullTextSqlQuery query = new FullTextSqlQuery(context))
            {
                query.QueryText = searchQuery.QueryText;
                query.HighlightedSentenceCount = searchQuery.HighlightedSentenceCount;
                query.EnableStemming = searchQuery.EnableStemming;
                query.IgnoreAllNoiseQuery = searchQuery.IgnoreAllNoiseQuery;
                query.RowLimit = searchQuery.RowLimit;
                query.StartRow = searchQuery.StartRow - 1;
                query.Timeout = searchQuery.Timeout;
                query.TrimDuplicates = searchQuery.TrimDuplicates;
                query.ResultTypes = ResultType.RelevantResults;
                query.KeywordInclusion = searchQuery.KeywordInclusion;

                query.TotalRowsExactMinimum = searchQuery.StartRow + (searchQuery.RowLimit * 2);
                query.SiteContext = new Uri(site.Url);
                CultureInfo info = (searchQuery.Language == null) ? CultureInfo.CurrentCulture : new CultureInfo(searchQuery.Language);

                DateTime start = DateTime.Now;
                DateTime end = DateTime.Now;
                try
                {

                    ResultTableCollection resultsCollection = query.Execute();

                    end = DateTime.Now;
                    ResultTable relevantResults = resultsCollection[ResultType.RelevantResults];

                    DataTable results = new DataTable();
                    results.Load(relevantResults, LoadOption.OverwriteChanges);
                    results.ExtendedProperties.Add("TotalRows", relevantResults.TotalRows);
                    results.ExtendedProperties.Add("IsTotalRowsExact", relevantResults.IsTotalRowsExact);

                    foreach (DataRow dr in results.Rows)
                    {
                        for (int i = 0; i < dr.ItemArray.Length; i++)
                        {
                            if (dr[i] is string)
                            {
                                dr[i] = stripNonValidXMLCharacters(dr[i].ToString());
                            }
                        }
                    }
                    results.AcceptChanges();

                    DataSet ds = new DataSet();
                    ds.Tables.Add(results);
                    ds.ExtendedProperties.Add("SpellingSuggestion", resultsCollection.SpellingSuggestion);
                    ds.ExtendedProperties.Add("QueryTerms", resultsCollection.QueryTerms);
                    ds.ExtendedProperties.Add("IgnoredNoiseWords", resultsCollection.IgnoredNoiseWords);

                    return ds;
                }
                catch (Exception ex)
                {
                    throw; // rethrow without resetting the stack trace
                }
            }
        }
        throw new Exception();
    }

    public static string stripNonValidXMLCharacters(string s)
    {
        if (string.IsNullOrEmpty(s)) return string.Empty;

        string xml = Regex.Replace(s, "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD]", "", RegexOptions.Compiled);

        return xml;
    }
}

public class SearchQuery
{
    public string Language { get; set; }
    public bool EnableStemming { get; set; }
    public int HighlightedSentenceCount { get; set; }

    public bool IgnoreAllNoiseQuery { get; set; }
    public KeywordInclusion KeywordInclusion { get; set; }

    public string QueryText { get; set; }

    public int RowLimit { get; set; }
    public int StartRow { get; set; }
    public int Timeout { get; set; }

    public bool TrimDuplicates { get; set; }

}
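Since I can't share the ValueHelper I used, here is a minimal sketch of what such a helper might look like, inferred only from how QueryEx calls it: GetValue<T>(string, T) returns the parsed value, or the default when the text is missing or malformed. Treat it as an assumption, not the original class.

```csharp
using System;
using System.Globalization;

// Hypothetical stand-in for the ValueHelper used by QueryEx.
public static class ValueHelper
{
    // Converts raw to T, or returns fallback when raw is null, empty,
    // or cannot be converted (bad format, wrong type, out of range).
    public static T GetValue<T>(string raw, T fallback)
    {
        if (string.IsNullOrEmpty(raw)) return fallback;
        try
        {
            return (T)Convert.ChangeType(raw.Trim(), typeof(T), CultureInfo.InvariantCulture);
        }
        catch (FormatException) { return fallback; }
        catch (InvalidCastException) { return fallback; }
        catch (OverflowException) { return fallback; }
    }
}
```

Convert.ChangeType covers the bool and int cases QueryEx needs; anything fancier (enums, nullable types) would need extra handling.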

 

I know it’s not beautiful, but it solved my problems and the search seems to be working really well for my customers, so I’m happy.

HTH, Roi

Microsoft Search LIKE statement oddity

Monday, August 31st, 2009

I was working on a project involving Search Server, building a search query from user input, when I stumbled upon a weird bug in the LIKE statement that caused the query to miss relevant data. Simply put, the LIKE statement fails on long strings. I was trying to match some long pipe-concatenated metadata such as “|item a|item b|…”. The LIKE statement would work for item a, but for items further along the string it failed silently, returning no results. I counted up to the threshold: it was 64 characters. That was odd. Searching for this issue returned nothing (“like” is a difficult search term), so I solved the problem with plan B and carried on.

Today I came across a post stating that this is in fact a known limitation. Known to whom, you might ask?

Well, to Steve Curran, MVP, who helpfully discloses:

"Yes this is a known limitation. You should avoid using the LIKE predicate in FullTextSQL and use the CONTAINS predicate. It works very well with the Path managed property. In you case just do CONTAINS(Path,’http://servername/sitename/listname/folder’)."

Thank you Steve.

Indeed, using a CONTAINS statement solves the problem and does not (in my specific case) impact rank.
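For illustration, this is roughly how the query text could be built with CONTAINS instead of LIKE. It is a sketch under assumptions: the selected columns and the helper name are made up, wrapping the phrase in double quotes is my choice for exact-phrase matching, and doubling single quotes is the standard SQL way to escape them, which I assume applies here too.

```csharp
using System;

// Hypothetical query builder, not part of the SharePoint API.
public static class SearchSql
{
    // Builds a full-text SQL query that matches phrase inside the given
    // managed property using CONTAINS rather than LIKE.
    public static string BuildContainsQuery(string managedProperty, string phrase)
    {
        if (string.IsNullOrEmpty(managedProperty)) throw new ArgumentException("managedProperty is required");
        if (phrase == null) throw new ArgumentNullException("phrase");

        // Double single quotes for the SQL literal; drop double quotes
        // because they delimit the phrase inside the literal.
        string safe = phrase.Replace("'", "''").Replace("\"", "");
        return "SELECT Title, Path FROM SCOPE() WHERE CONTAINS("
            + managedProperty + ", '\"" + safe + "\"')";
    }
}
```

The resulting string would go into the QueryText of a FullTextSqlQuery.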

If you read this, HTH.

Searching documents in MOSS 2007

Sunday, August 9th, 2009

I’ve been doing some work on SharePoint search recently. While the tool is fairly impressive, it comes with its own pitfalls, one of which I will discuss now. From a certain content scope you want to return only documents, and the scope contains content crawled from a web site file share external to SharePoint. This is somewhat less common and most examples do not cover it, which may be the cause of this peculiarity. Searching the web for how to meet this requirement turns up several suggestions:

1. Add a scope property rule: IsDocument = 1. Not working; no results are returned from any query (do not try this on production).

2. Add different variations of (“IsDocument” = 1) to your query. This gives an array of weird results and COM exceptions.

3. Add freetext(DEFAULTPROPERTIES,’IsDocument:1’) to your query. This is actually documented as unsupported, and while it did give promising results, further tests proved it to be inaccurate and insufficient.

So after much fiddling with scope rules I came up with the following solution: since folders do not have a contenttype, I created a scope rule like so: [contenttype = ], yes, blank. It works for me; hope it will work for you too.

[Image: contentTypeRule]

AtBroker.exe remote desktop error

Tuesday, June 23rd, 2009

This had been a long-standing issue for me.

When you try to connect to a remote desktop that happens to be Windows Server 2008 (though it can be a Vista edition as well), you sometimes get this despicable error message:

AtBroker.exe

The application failed to initialize properly…

And then the black screen of death. (Yes, black.)

What I used to do was restart the machine from another machine using the command line, or just go to sleep. (Embarrassing, I know.)

Strangely enough, I was not able to find a solution to this problem.

Today, finally, after receiving this error once again, the Google karma was in my favor and the solution popped up immediately at Spat’s blog

CTRL+ALT+END will get you to the logoff screen. I just chose to lock the machine, then entered the password again.

And I was in.

Ahh.. the pains one will go through to stay late at night and work…

IIS7 “cannot open configuration file” issue

Tuesday, May 26th, 2009

 

So I’m working in the IT industry.

I encountered a weird bug the other day:

IIS7 could not save simple settings such as MIME types and default documents, failing with “cannot open configuration file”.

So I looked around a bit and found this hotfix from Microsoft,

which you can order and receive by mail, password protected.

For you, corporate IT girl: you may want to make sure the Windows Update service is running before installing the hotfix.

A restart is required after the update.

And it works!

One thing though: it appears that all sites are stopped after the restart, so you may need to start them again. Weird, but this is what I experienced.