Archive for December, 2009

Last-minute sell-off

Thursday, December 31st, 2009

As we moved into the last minutes of the year, a sell-off developed in the market: indexes fell by 1.17% in just a few minutes.

Will this be the face of the coming year, or is it just what it is: last-minute profit realization?

How fast is Google realtime anyway

Wednesday, December 9th, 2009

The much-talked-about Google realtime search release arrived here today, so I was eager to test it.

I created a tweet, “How fast is Google real time anyway? http://bit.ly/66Z5Cj”, and voilà! Not 2 minutes later I was at the head of the Google search results. This is awesome!

image

So I was wondering: is it possible to take over the fire hose?

In theory, you can create a tweet-storm by connecting a few Twitter and Facebook accounts with retweets and status updates. The realtime fire hose will deliver this directly to Google, allowing for instant astroturfing of topics.

I created a new account and posted a tweet, this time on the Copenhagen climate conference, and waited. Nothing happened.

I then retweeted it from my real Twitter account and got the immediate Google take:

image

I played with it just a little bit, but it seems possible to create a trend or take over a topic using such a combination.

Leaving the conspiracy theme aside, the interface is really cool and useful. The only thing is that in a normal search, the “latest results” section appears “somewhere in the middle” in a messy kind of way:

image

All in all, this is a really nice feature.

AdSense this

Thursday, December 3rd, 2009

So I was test-driving the latest Chrome version today (I know I’m late; I penalized Chrome a few months ago after it caused some slowness and crashed on my sluggish laptop), and then I saw this:

chrome-adsense

And I thought to myself: AdSense... what AdSense?

BTW: don’t you just die for this theme? You can download it here.

Search server 2008 – recap

Wednesday, December 2nd, 2009

A note to the reader: this post is a technical recap of a project I was involved in. If you are reading it out of interest in the author, you can stop now. Thanks, mom ;). However, if you are involved in an implementation of a MOSS 2007 search project, then by all means, carry on.

I recently implemented a fairly large search system as part of a larger application in my company. While deploying Search Server 2008 Express as a standalone server is fairly straightforward, along the development process you may encounter some of the following issues:

PDF indexing – this can be a pain. For a good tutorial on how to install and configure all the bits required, see here, and for 64-bit, here.

Dictionary, noise words and thesaurus files – pay attention to the location of the files; this is a tricky one.

Note: a thesaurus file cannot contain duplicate records and will crash when encountering one, leaving a message in the event log.
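
Since a duplicate only shows up as a crash and an event log entry, it may be worth checking the thesaurus file before deploying it. The following is a minimal sketch, not part of the original project. The post does not spell out what exactly counts as a duplicate record, so the sketch assumes it means the same term text repeated in elements of one kind (for example the pat entries). ThesaurusChecker and FindDuplicateTerms are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: reports terms that occur more than once in a thesaurus file.
public static class ThesaurusChecker
{
    // thesaurusXml: the content of the thesaurus file.
    // elementName:  local name of the elements to compare, e.g. "pat" (an assumption
    //               about the file layout; adjust it to match your file).
    public static List<string> FindDuplicateTerms(string thesaurusXml, string elementName)
    {
        XDocument doc = XDocument.Parse(thesaurusXml);
        return doc.Descendants()
            // Compare by local name so a default namespace on the file does not matter.
            .Where(e => e.Name.LocalName == elementName)
            .Select(e => e.Value.Trim())
            // Treat terms that differ only in case as the same term.
            .GroupBy(term => term, StringComparer.OrdinalIgnoreCase)
            .Where(g => g.Count() > 1)
            .Select(g => g.Key)
            .ToList();
    }
}
```

Running this against the file content and failing the deployment when the returned list is not empty is cheaper than finding out from the event log.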

This is a fairly good tutorial on total hits and paging search results.
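
The paging arithmetic itself is small. Here is a rough sketch, assuming the 1-based StartAt and the Count values used by the web service code later in this post; the Paging class and its method names are made up for illustration. Keep in mind that the total reported by the search may be an estimate when IsTotalRowsExact is false, so the page count can change as the user pages forward.

```csharp
using System;

// Hypothetical helper for turning a page number into a search range and back.
public static class Paging
{
    // 1-based index of the first result on a 1-based page.
    public static int StartAt(int pageNumber, int pageSize)
    {
        if (pageNumber < 1) throw new ArgumentOutOfRangeException("pageNumber");
        if (pageSize < 1) throw new ArgumentOutOfRangeException("pageSize");
        return (pageNumber - 1) * pageSize + 1;
    }

    // Number of pages needed to show totalRows results (rounds up).
    public static int PageCount(int totalRows, int pageSize)
    {
        if (pageSize < 1) throw new ArgumentOutOfRangeException("pageSize");
        if (totalRows <= 0) return 0;
        return (totalRows + pageSize - 1) / pageSize;
    }
}
```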

“Did you mean” issue – this is actually a feature request waiting to materialize. Our customer wanted to be able to control the terms in the “did you mean” feature. This seems reasonable enough; however, it’s not even manageable. The functionality relies on a file, nlgindexlexicon.lex, that is generated through the indexing process and does not have any interface or API. Its default location is <drive>:\Program Files\Microsoft Office Servers\12.0\Data\Office Server\Applications\<guid>\Projects\Portal_Content\Indexer\CiFiles. If you do need to change it, MS recommends listening for the file’s change events and modifying it after a change event has occurred, as in the following code:

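// Fields shared by the methods below.
private System.IO.FileSystemWatcher fsw;
private System.Timers.Timer nlgLexiconWaitTimer;

private void SetNlgLexiconFileWatcher()
{
    Logger.LogEntry("start SetNlgLexiconFileWatcher");
    // Retry timer: created first so it exists before any file event can fire;
    // it is only started after a failed merge.
    nlgLexiconWaitTimer = new System.Timers.Timer();
    nlgLexiconWaitTimer.Interval = 1000 * 60 * 5; // 5 minutes
    nlgLexiconWaitTimer.Elapsed += new System.Timers.ElapsedEventHandler(nlgLexiconWaitTimer_Elapsed);
    nlgLexiconWaitTimer.Enabled = false;
    // Watch the lexicon file for writes made by the indexer.
    fsw = new System.IO.FileSystemWatcher();
    fsw.NotifyFilter = NotifyFilters.LastWrite;
    fsw.Path = ConfigurationManager.AppSettings["nlgIndexLexiconFolder"];
    fsw.Filter = "nlgindexlexicon.lex";
    // Subscribe before enabling events so that no change is missed.
    fsw.Changed += new System.IO.FileSystemEventHandler(nlgIndexLexicon_Changed);
    fsw.EnableRaisingEvents = true;
}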

void nlgLexiconWaitTimer_Elapsed(object sender, System.Timers.ElapsedEventArgs e)
{
    Logger.LogEntry("in nlgLexiconWaitTimer_Elapsed");
    MergeNlgIndexLexicon();
    Logger.LogEntry("end nlgLexiconWaitTimer_Elapsed");
}

public void MergeNlgIndexLexicon()
{
    Logger.LogEntry("run MergeNlgIndexLexicon");

    try
    {
        fsw.EnableRaisingEvents = false;
        string nlgLexPath = ConfigurationManager.AppSettings["nlgIndexLexiconFolder"];
        nlgLexPath = Path.Combine(nlgLexPath, "nlgindexlexicon.lex");
        string[] words = File.ReadAllLines(nlgLexPath);
        HashSet<string> hashSet = new HashSet<string>(words);

        string customDictionaryPath = ConfigurationManager.AppSettings["CustomDictionaryPath"];
        string[] customWords = File.ReadAllLines(customDictionaryPath);
        HashSet<string> customHashSet = new HashSet<string>(customWords);

        hashSet.UnionWith(customHashSet);
        File.WriteAllLines(nlgLexPath, hashSet.ToArray(), Encoding.Unicode);
        nlgLexiconWaitTimer.Stop(); // succeeded, so stop retrying
        Logger.LogEntry("MergeNlgIndexLexicon finished");
    }
    catch (Exception ex)
    {
        Logger.LogEntry("error in MergeNlgIndexLexicon : " + ex.ToString(), EventLogEntryType.Error);
        try
        {
            nlgLexiconWaitTimer.Start(); // failed, so start retrying
        }
        catch (Exception ex1)
        {
            Logger.LogEntry("error in MergeNlgIndexLexicon, could not start nlgLexiconWaitTimer : " + ex1.ToString(), EventLogEntryType.Error);
        }
    }
    finally
    {
        fsw.EnableRaisingEvents = true;
    }
}

void nlgIndexLexicon_Changed(object sender, System.IO.FileSystemEventArgs e)
{
    Logger.LogEntry(string.Format("nlgIndexLexicon_Changed fired for change Type : {0}",e.ChangeType));
    MergeNlgIndexLexicon();
}

You see, when SharePoint accesses the file, it locks it for several minutes at the end of the indexing process, so you have to wait until the locks are gone; hence the timer.
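
If you want to log whether the file is still locked before attempting the merge, a common trick is to try opening it with exclusive access. The sketch below is not from the original project, and FileLockProbe is a hypothetical name. The answer can be stale a moment later, so the try/catch and retry timer above are still needed.

```csharp
using System.IO;

// Hypothetical helper: probes whether a file can be opened exclusively right now.
public static class FileLockProbe
{
    // Returns true when opening the file for exclusive access fails with an
    // IOException. That is usually a sharing violation, but a missing file or
    // folder is reported the same way.
    public static bool IsLocked(string path)
    {
        try
        {
            // FileShare.None asks for exclusive access; the stream is closed at once.
            using (new FileStream(path, FileMode.Open, FileAccess.ReadWrite, FileShare.None))
            {
                return false;
            }
        }
        catch (IOException)
        {
            return true;
        }
    }
}
```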

LIKE statement bug – in this lovely undocumented feature, you will discover that using the LIKE statement will not yield the expected results on long strings. That is because in some table in the search DB, the property value field is limited to 64 characters. I saw it with my own eyes; read my previous blog post here.

If you plan to use the QueryEx method of Search.asmx, then stop! And read carefully:

There are at least 2 show-stopper issues that will prevent you from using Search.asmx as is.

1. A query for the HitHighlightedSummary or HitHighlightedProperties (used for search term highlighting) may result in the following error:

----------------------------------------

There was an error generating the XML document. ---> The surrogate pair (0xD86E, 0x79) is invalid. A high surrogate character (0xD800 - 0xDBFF) must always be paired with a low surrogate character (0xDC00 - 0xDFFF).

----------------------------------------

This is because the search service does not strip illegal characters before it sends the dataset back to you.

2. The query timeout is hardcoded in the search service code to 10 seconds (!). I know 10 seconds is a long time, yet a complicated search may take longer than that, and the user, well, she just wants to get some results.

It was observed that queries which involve dates or custom sorting often timed out due to this limitation.
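
Returning to the first issue: the stripNonValidXMLCharacters helper in the web service code later in this post removes every surrogate code unit, including correctly paired ones, because its character class leaves out the whole 0xD800 - 0xDFFF range. If you would rather keep valid pairs and drop only the broken ones, something along these lines may work. It is a sketch under that assumption, and XmlText and StripInvalidXmlChars are made-up names.

```csharp
using System.Text;

// Hypothetical alternative to a regex-based strip.
public static class XmlText
{
    // Removes characters that XML 1.0 does not allow, keeping valid
    // surrogate pairs and dropping unpaired surrogates.
    public static string StripInvalidXmlChars(string s)
    {
        if (string.IsNullOrEmpty(s)) return string.Empty;

        StringBuilder sb = new StringBuilder(s.Length);
        for (int i = 0; i < s.Length; i++)
        {
            char c = s[i];
            if (char.IsHighSurrogate(c))
            {
                // Keep the pair only when a low surrogate follows directly.
                if (i + 1 < s.Length && char.IsLowSurrogate(s[i + 1]))
                {
                    sb.Append(c).Append(s[i + 1]);
                    i++; // skip the low surrogate that was just copied
                }
            }
            else if (char.IsLowSurrogate(c))
            {
                // A low surrogate with no high surrogate before it: drop it.
            }
            else if (c == '\t' || c == '\n' || c == '\r' || (c >= '\u0020' && c <= '\uFFFD'))
            {
                // Tab, line feed, carriage return and 0x20 - 0xFFFD are legal;
                // surrogates in that range were already handled above.
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

For the pair in the error message above (0xD86E followed by 0x79), this drops the lone high surrogate and keeps the 0x79 character.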

To overcome these issues, MS recommends writing your own service, to which I can only say: gotcha! But fear not, below you will find all kinds of goodies that will make this task easier.

First, you may register your web service inside SharePoint, or you may run it on a separate website alongside your SharePoint installation. I run it on the side and it works.

One thing to handle when you access the search from a different website is the shared scope. It’s easy but not well documented: you have to create the scopes, then make them shared, and the place to do it is at this URL: http://<serveradmin>/ssp/admin/_layouts/viewscopes.aspx?mode=ssp

After using “copy as shared”, remember to change the name of the scope.

Paste this code into your web service and implement a simple ValueHelper class with a generic GetValue<T> method (can’t give you that):

[WebService()]
public class MySearch
{
    [WebMethod]
    public DataSet QueryEx(string xmlPacket)
    {
        XDocument xdoc = XDocument.Parse(xmlPacket);
        var q = from x in xdoc.Descendants()
                select new SearchQuery()
                {
                    QueryText = x.Descendants("QueryText").First().Value,
                    Language = x.Descendants("QueryText").First().Attribute("language").Value,
                    EnableStemming = ValueHelper.GetValue<bool>(x.Descendants("EnableStemming").First().Value, false),
                    HighlightedSentenceCount = x.Descendants("HighlightedSentenceCount").Count() > 0 ? ValueHelper.GetValue<int>(x.Descendants("HighlightedSentenceCount").First().Value, 3) : 3,
                    IgnoreAllNoiseQuery = ValueHelper.GetValue<bool>(x.Descendants("IgnoreAllNoiseQuery").First().Value, true),
                    RowLimit = ValueHelper.GetValue<int>(x.Descendants("Count").First().Value, 10),
                    StartRow = ValueHelper.GetValue<int>(x.Descendants("StartAt").First().Value, 1),
                    Timeout = x.Descendants("Timeout").Count() > 0 ? ValueHelper.GetValue<int>(x.Descendants("Timeout").First().Value, 10000) : 10000,
                    TrimDuplicates = ValueHelper.GetValue<bool>(x.Descendants("TrimDuplicates").First().Value, true),
                    KeywordInclusion = ValueHelper.GetValue<bool>(x.Descendants("ImplicitAndBehavior").First().Value, false) ? KeywordInclusion.AllKeywords : KeywordInclusion.AnyKeyword

                };
        SearchQuery searchQuery = q.First();
        return QueryEx1(searchQuery);
    }
    [WebMethod]
    public DataSet QueryEx1(SearchQuery searchQuery)
    {
        string spSiteUrl = ConfigurationManager.AppSettings["SPSiteUrl"];
        using (SPSite site = new SPSite(spSiteUrl))
        {

            ServerContext context = ServerContext.GetContext(site);
            using (FullTextSqlQuery query = new FullTextSqlQuery(context))
            {
                query.QueryText = searchQuery.QueryText;
                query.HighlightedSentenceCount = searchQuery.HighlightedSentenceCount;
                query.EnableStemming = searchQuery.EnableStemming;
                query.IgnoreAllNoiseQuery = searchQuery.IgnoreAllNoiseQuery;
                query.RowLimit = searchQuery.RowLimit;
                query.StartRow = searchQuery.StartRow - 1;
                query.Timeout = searchQuery.Timeout;
                query.TrimDuplicates = searchQuery.TrimDuplicates;
                query.ResultTypes = ResultType.RelevantResults;
                query.KeywordInclusion = searchQuery.KeywordInclusion;

                query.TotalRowsExactMinimum = searchQuery.StartRow + (searchQuery.RowLimit * 2);
                query.SiteContext = new Uri(site.Url);
                CultureInfo info = (searchQuery.Language == null) ? CultureInfo.CurrentCulture : new CultureInfo(searchQuery.Language);

                DateTime start = DateTime.Now;
                DateTime end = DateTime.Now;
                try
                {

                    ResultTableCollection resultsCollection = query.Execute();

                    end = DateTime.Now;
                    ResultTable relevantResults = resultsCollection[ResultType.RelevantResults];

                    DataTable results = new DataTable();
                    results.Load(relevantResults, LoadOption.OverwriteChanges);
                    results.ExtendedProperties.Add("TotalRows", relevantResults.TotalRows);
                    results.ExtendedProperties.Add("IsTotalRowsExact", relevantResults.IsTotalRowsExact);

                    foreach (DataRow dr in results.Rows)
                    {
                        for (int i = 0; i < dr.ItemArray.Length; i++)
                        {
                            if (dr[i] is string)
                            {
                                dr[i] = stripNonValidXMLCharacters(dr[i].ToString());
                            }
                        }
                    }
                    results.AcceptChanges();

                    DataSet ds = new DataSet();
                    ds.Tables.Add(results);
                    ds.ExtendedProperties.Add("SpellingSuggestion", resultsCollection.SpellingSuggestion);
                    ds.ExtendedProperties.Add("QueryTerms", resultsCollection.QueryTerms);
                    ds.ExtendedProperties.Add("IgnoredNoiseWords", resultsCollection.IgnoredNoiseWords);

                    return ds;
                }
                catch (Exception ex)
                {
                    throw; // rethrow without resetting the stack trace
                }
            }
        }
        throw new Exception();
    }

    public static string stripNonValidXMLCharacters(string s)
    {
        if (string.IsNullOrEmpty(s)) return string.Empty;

        string xml = Regex.Replace(s, "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD]", "", RegexOptions.Compiled);

        return xml;
    }
}

public class SearchQuery
{
    public string Language { get; set; }
    public bool EnableStemming { get; set; }
    public int HighlightedSentenceCount { get; set; }

    public bool IgnoreAllNoiseQuery { get; set; }
    public KeywordInclusion KeywordInclusion { get; set; }

    public string QueryText { get; set; }

    public int RowLimit { get; set; }
    public int StartRow { get; set; }
    public int Timeout { get; set; }

    public bool TrimDuplicates { get; set; }

}
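
The ValueHelper class is not included above. Judging only from how it is called (ValueHelper.GetValue<bool>(text, false) and so on), one possible shape is a static class with a generic method that converts the text and falls back to the default. This is a guess at the intent, not the author's implementation.

```csharp
using System;
using System.Globalization;

// Assumed shape of the helper, inferred from the calls in QueryEx.
public static class ValueHelper
{
    // Converts text to T, returning defaultValue when the text is missing
    // or cannot be converted.
    public static T GetValue<T>(string text, T defaultValue)
    {
        if (string.IsNullOrEmpty(text)) return defaultValue;
        try
        {
            // Handles bool, int and the other basic convertible types.
            return (T)Convert.ChangeType(text.Trim(), typeof(T), CultureInfo.InvariantCulture);
        }
        catch (FormatException) { return defaultValue; }
        catch (InvalidCastException) { return defaultValue; }
        catch (OverflowException) { return defaultValue; }
    }
}
```

With this version, GetValue<bool> accepts "true" and "false" in any letter case; other spellings such as "1" fall back to the default.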

 

I know it’s not beautiful, but it solved my problems and the search seems to be working really well for my customers, so I’m happy.
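
One last note for whoever calls QueryEx: the LINQ query in it looks the elements up by unqualified name, so the packet must not declare a default XML namespace, and the elements it reads with First() (QueryText with its language attribute, EnableStemming, IgnoreAllNoiseQuery, Count, StartAt, TrimDuplicates, ImplicitAndBehavior) must all be present. A caller could build such a packet roughly like this. The QueryPacketBuilder name, the root element and the nesting are assumptions; only the element names come from the code above.

```csharp
using System.Xml.Linq;

// Hypothetical client-side helper that builds the XML packet QueryEx parses.
public static class QueryPacketBuilder
{
    public static string Build(string queryText, string language, int startAt, int count)
    {
        // No namespace is declared on purpose: QueryEx matches unqualified names.
        XElement packet = new XElement("QueryPacket",
            new XElement("Query",
                new XElement("Context",
                    new XElement("QueryText", new XAttribute("language", language), queryText)),
                new XElement("Range",
                    new XElement("StartAt", startAt),   // 1-based, QueryEx1 subtracts 1
                    new XElement("Count", count)),
                new XElement("EnableStemming", "false"),
                new XElement("TrimDuplicates", "true"),
                new XElement("IgnoreAllNoiseQuery", "true"),
                new XElement("ImplicitAndBehavior", "false")));
        return packet.ToString(SaveOptions.DisableFormatting);
    }
}
```

Because QueryEx1 runs a FullTextSqlQuery, the queryText passed here is expected to be a SQL syntax query rather than plain keywords.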

HTH, Roi