As we moved into the last minutes of the year, a sell-off developed in the market: indexes fell by 1.17% in just a few minutes.
Will this be the face of the coming year, or is it just what it looks like: last-minute profit-taking?
The much-talked-about Google real-time search release arrived here today, so I was eager to test it.
I created a tweet, “How fast is Google real time anyway? http://bit.ly/66Z5Cj”, and voilà! Not two minutes later I was at the head of the Google search results. This is awesome!
So I was wondering: is it possible to take over the fire hose?
In theory, you can create a tweet storm by connecting a few Twitter and Facebook accounts with retweets and status updates. The real-time fire hose will deliver this directly to Google, allowing for instant astroturfing of topics.
I created a new account and posted a tweet, this time on the Copenhagen climate convention, and waited. Nothing happened.
I then retweeted it from my real Twitter account, and Google picked it up immediately.
I played with it just a little bit, but it seems possible to create a trend or take over a topic using such a combination.
Leaving the conspiracy theme aside, the interface is really cool and useful. The only thing is that in a normal search, the “latest results” section appears “somewhere in the middle”, in a messy kind of way.
All in all, this is a really nice feature.
So I was test-driving the latest Chrome version today (I know I’m late; I penalized Chrome a few months ago after it caused some slowness and crashed on my sluggish laptop), then I saw this:
And I thought to myself, AdSense... what AdSense?
BTW: don’t you just die for this theme? You can download it here.
A note to the reader: this post is a technical recap of a project I was involved in. If you are reading it out of interest in the author, you can stop now. Thanks, Mom. However, if you are involved in an implementation of a MOSS 2007 search project, then by all means, carry on.
I recently implemented a fairly large search system as part of a larger application in my company. While the deployment of Search Server 2008 Express as a standalone server is fairly straightforward, along the development process you may encounter some of the following issues:
PDF indexing – this can be a pain. For a good tutorial on how to install and configure all the bits required, see here, and for 64-bit, here.
Dictionary, noise words and thesaurus files – pay attention to the location of the files; this is a tricky one.
Note: a thesaurus file cannot contain duplicate records. Encountering one causes a crash and leaves a message in the event log.
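Since a duplicate only shows up as a crash and an event log entry, it may be worth checking a thesaurus file before deploying it. The following is a minimal sketch, not anything shipped with SharePoint: ThesaurusCheck and FindDuplicateTerms are made-up names, and it assumes that the terms live in <pat> and <sub> elements and that "duplicate" means the same term text appearing more than once, ignoring case. Adjust it if your file or your definition of a duplicate differs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of SharePoint or Search Server.
public static class ThesaurusCheck
{
    // Returns every term (the text of a <pat> or <sub> element) that occurs
    // more than once in the file, compared case-insensitively.
    // Elements are matched by local name, so the file's XML namespace does not matter.
    public static List<string> FindDuplicateTerms(string thesaurusXml)
    {
        XDocument doc = XDocument.Parse(thesaurusXml);
        return doc.Descendants()
            .Where(e => e.Name.LocalName == "pat" || e.Name.LocalName == "sub")
            .Select(e => e.Value.Trim())
            .GroupBy(t => t, StringComparer.OrdinalIgnoreCase)
            .Where(g => g.Count() > 1)
            .Select(g => g.Key)
            .ToList();
    }
}
```

Calling FindDuplicateTerms(File.ReadAllText(path)) and refusing to deploy when the list is non-empty is one way to use it.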
This is a fairly good tutorial on total hits and paging search results.
“Did you mean” issue – this is actually a feature request waiting to materialize. Our customer wanted to be able to control the terms in the “did you mean” feature. This seems reasonable enough; however, it’s not even manageable. The functionality relies on a file, nlgindexlexicon.lex, that is generated by the indexing process and has no interface or API. Its default location is <drive>:\Program Files\Microsoft Office Servers\12.0\Data\Office Server\Applications\<guid>\Projects\Portal_Content\Indexer\CiFiles. If you do need to change it, MS recommends listening for the file’s change events and modifying it after a change event has occurred, as in the following code:
private void SetNlgLexiconFileWatcher()
{
    Logger.LogEntry("start SetNlgLexiconFileWatcher");
    fsw = new System.IO.FileSystemWatcher();
    fsw.NotifyFilter = NotifyFilters.LastWrite;
    fsw.Path = ConfigurationManager.AppSettings["nlgIndexLexiconFolder"];
    fsw.Filter = "nlgindexlexicon.lex";
    fsw.EnableRaisingEvents = true;
    fsw.Changed += new System.IO.FileSystemEventHandler(nlgIndexLexicon_Changed);
    nlgLexiconWaitTimer = new System.Timers.Timer();
    nlgLexiconWaitTimer.Interval = 1000 * 60 * 5; // 5 minutes
    nlgLexiconWaitTimer.Elapsed += new System.Timers.ElapsedEventHandler(nlgLexiconWaitTimer_Elapsed);
    nlgLexiconWaitTimer.Enabled = false;
}

void nlgLexiconWaitTimer_Elapsed(object sender, System.Timers.ElapsedEventArgs e)
{
    Logger.LogEntry("in nlgLexiconWaitTimer_Elapsed");
    MergeNlgIndexLexicon();
    Logger.LogEntry("end nlgLexiconWaitTimer_Elapsed");
}

public void MergeNlgIndexLexicon()
{
    Logger.LogEntry("run MergeNlgIndexLexicon");
    try
    {
        fsw.EnableRaisingEvents = false;
        string nlgLexPath = ConfigurationManager.AppSettings["nlgIndexLexiconFolder"];
        nlgLexPath = Path.Combine(nlgLexPath, "nlgindexlexicon.lex");
        string[] words = File.ReadAllLines(nlgLexPath);
        HashSet<string> hashSet = new HashSet<string>(words);
        string customDictionaryPath = ConfigurationManager.AppSettings["CustomDictionaryPath"];
        string[] customWords = File.ReadAllLines(customDictionaryPath);
        HashSet<string> customHashSet = new HashSet<string>(customWords);
        hashSet.UnionWith(customHashSet);
        File.WriteAllLines(nlgLexPath, hashSet.ToArray(), Encoding.Unicode);
        nlgLexiconWaitTimer.Stop(); // succeeded, so stop retrying
        Logger.LogEntry("MergeNlgIndexLexicon finished");
    }
    catch (Exception ex)
    {
        Logger.LogEntry("error in MergeNlgIndexLexicon : " + ex.ToString(), EventLogEntryType.Error);
        try
        {
            nlgLexiconWaitTimer.Start(); // failed, so start retrying
        }
        catch (Exception ex1)
        {
            Logger.LogEntry("error in MergeNlgIndexLexicon, could not start nlgLexiconWaitTimer : " + ex1.ToString(), EventLogEntryType.Error);
        }
    }
    finally
    {
        fsw.EnableRaisingEvents = true;
    }
}

void nlgIndexLexicon_Changed(object sender, System.IO.FileSystemEventArgs e)
{
    Logger.LogEntry(string.Format("nlgIndexLexicon_Changed fired for change Type : {0}", e.ChangeType));
    MergeNlgIndexLexicon();
}
You see, when SharePoint accesses the file, it locks it for several minutes at the end of the indexing process, so you have to wait until the locks are gone; hence the timer.
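If you would rather check for the lock explicitly than find out through a failed merge, one option is to probe the file before touching it. This is only a sketch, under the assumption that the indexer holds the file open without sharing; FileLockProbe and CanOpenExclusively are hypothetical names, not part of SharePoint.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration only.
public static class FileLockProbe
{
    // Returns true when the file exists and can be opened with exclusive access,
    // i.e. no other process currently holds it open in a conflicting way.
    public static bool CanOpenExclusively(string path)
    {
        try
        {
            using (new FileStream(path, FileMode.Open, FileAccess.ReadWrite, FileShare.None))
            {
                return true;
            }
        }
        catch (IOException)
        {
            // Sharing violation (still locked) or the file does not exist.
            return false;
        }
        catch (UnauthorizedAccessException)
        {
            // No permission to open the file for writing.
            return false;
        }
    }
}
```

The probe is inherently racy (the file can be locked again right after it returns), so the retry timer is still needed; the probe just avoids a noisy exception on every attempt.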
LIKE statement bug – in this lovely undocumented feature, you will discover that using the LIKE statement will not yield the expected results on long strings. That is because in some table in the search DB, the property value field is limited to 64 characters. I saw it with my own eyes; read my previous blog post here.
If you plan to use the QueryEx method of Search.asmx, then stop! And read carefully:
There are at least two showstopper issues that will prevent you from using Search.asmx as is.
1. A query for the HitHighlightedSummary or HitHighlightedProperties (used for search term highlighting) may result in the following error:
----------------------------------------There was an error generating the XML document. ---> The surrogate pair (0xD86E, 0x79) is invalid. A high surrogate character (0xD800 - 0xDBFF) must always be paired with a low surrogate character (0xDC00 - 0xDFFF).----------------------------------------
This is because the search service does not strip illegal characters before it sends the dataset back to you.
2. The query timeout is hardcoded in the search service code to 10 seconds (!). I know 10 seconds is a long time, yet a complicated search may take longer than that, and the user, well, she just wants to get some results.
Queries that involve dates or custom sorting often timed out because of this limitation.
To overcome these issues, MS recommends writing your own service, to which I can only say, gotcha! But fear not: below you will find all kinds of goodies that will make this task easier.
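For the first issue, the general idea is to clean string values before the DataSet is serialized. Here is a minimal sketch of such a filter; it is not the search service’s code, and the names XmlSanitizer and StripInvalidXmlChars are made up for illustration. It drops characters outside the XML 1.0 character range and unpaired surrogates, such as the 0xD86E followed by 0x79 in the error above, while keeping well-formed surrogate pairs.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only.
public static class XmlSanitizer
{
    public static string StripInvalidXmlChars(string s)
    {
        if (string.IsNullOrEmpty(s)) return string.Empty;

        StringBuilder sb = new StringBuilder(s.Length);
        for (int i = 0; i < s.Length; i++)
        {
            char c = s[i];
            if (char.IsHighSurrogate(c) && i + 1 < s.Length && char.IsLowSurrogate(s[i + 1]))
            {
                // A well-formed pair encodes a character in U+10000..U+10FFFF: keep both halves.
                sb.Append(c).Append(s[i + 1]);
                i++;
            }
            else if (c == '\t' || c == '\n' || c == '\r'
                     || (c >= 0x20 && c <= 0xD7FF)
                     || (c >= 0xE000 && c <= 0xFFFD))
            {
                // Allowed single-unit XML 1.0 character.
                sb.Append(c);
            }
            // Anything else (control characters, lone surrogates, U+FFFE/U+FFFF) is dropped.
        }
        return sb.ToString();
    }
}
```

With this filter, StripInvalidXmlChars("a\uD86Eyb") returns "ayb": the lone high surrogate is removed and the "y" that followed it survives.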
First, you may register your web service inside SharePoint, or you may run it on a separate website alongside your SharePoint installation. I run it on the side and it works.
One thing to watch when you access the search from a different website is the shared scope. It’s easy but not well documented: you have to create the scopes and then make them shared, and the place to do it is at this URL: http://<serveradmin>/ssp/admin/_layouts/viewscopes.aspx?mode=ssp
After you copy a scope as shared, remember to change its name.
Paste this code into your web service and implement a simple ValueHelper class with a generic GetValue<T> method (can’t give you that):
using System;
using System.Configuration;
using System.Data;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;
using System.Web.Services;
using System.Xml.Linq;
using Microsoft.Office.Server;
using Microsoft.Office.Server.Search.Query;
using Microsoft.SharePoint;

[WebService()]
public class MySearch
{
    // Accepts the same XML query packet as Search.asmx and maps it to a SearchQuery.
    [WebMethod]
    public DataSet QueryEx(string xmlPacket)
    {
        XDocument xdoc = XDocument.Parse(xmlPacket);
        var q = from x in xdoc.Descendants()
                select new SearchQuery()
                {
                    QueryText = x.Descendants("QueryText").First().Value,
                    Language = x.Descendants("QueryText").First().Attribute("language").Value,
                    EnableStemming = ValueHelper.GetValue<bool>(x.Descendants("EnableStemming").First().Value, false),
                    HighlightedSentenceCount = x.Descendants("HighlightedSentenceCount").Count() > 0
                        ? ValueHelper.GetValue<int>(x.Descendants("HighlightedSentenceCount").First().Value, 3)
                        : 3,
                    IgnoreAllNoiseQuery = ValueHelper.GetValue<bool>(x.Descendants("IgnoreAllNoiseQuery").First().Value, true),
                    RowLimit = ValueHelper.GetValue<int>(x.Descendants("Count").First().Value, 10),
                    StartRow = ValueHelper.GetValue<int>(x.Descendants("StartAt").First().Value, 1),
                    // The timeout comes from the packet instead of being hardcoded to 10 seconds.
                    Timeout = x.Descendants("Timeout").Count() > 0
                        ? ValueHelper.GetValue<int>(x.Descendants("Timeout").First().Value, 10000)
                        : 10000,
                    TrimDuplicates = ValueHelper.GetValue<bool>(x.Descendants("TrimDuplicates").First().Value, true),
                    KeywordInclusion = ValueHelper.GetValue<bool>(x.Descendants("ImplicitAndBehavior").First().Value, false)
                        ? KeywordInclusion.AllKeywords
                        : KeywordInclusion.AnyKeyword
                };
        SearchQuery searchQuery = q.First();
        return QueryEx1(searchQuery);
    }

    [WebMethod]
    public DataSet QueryEx1(SearchQuery searchQuery)
    {
        string spSiteUrl = ConfigurationManager.AppSettings["SPSiteUrl"];
        using (SPSite site = new SPSite(spSiteUrl))
        {
            ServerContext context = ServerContext.GetContext(site);
            using (FullTextSqlQuery query = new FullTextSqlQuery(context))
            {
                query.QueryText = searchQuery.QueryText;
                query.HighlightedSentenceCount = searchQuery.HighlightedSentenceCount;
                query.EnableStemming = searchQuery.EnableStemming;
                query.IgnoreAllNoiseQuery = searchQuery.IgnoreAllNoiseQuery;
                query.RowLimit = searchQuery.RowLimit;
                query.StartRow = searchQuery.StartRow - 1;
                query.Timeout = searchQuery.Timeout;
                query.TrimDuplicates = searchQuery.TrimDuplicates;
                query.ResultTypes = ResultType.RelevantResults;
                query.KeywordInclusion = searchQuery.KeywordInclusion;
                query.TotalRowsExactMinimum = searchQuery.StartRow + (searchQuery.RowLimit * 2);
                query.SiteContext = new Uri(site.Url);
                CultureInfo info = (searchQuery.Language == null)
                    ? CultureInfo.CurrentCulture
                    : new CultureInfo(searchQuery.Language);
                DateTime start = DateTime.Now;
                DateTime end = DateTime.Now;
                try
                {
                    ResultTableCollection resultsCollection = query.Execute();
                    end = DateTime.Now;
                    ResultTable relevantResults = resultsCollection[ResultType.RelevantResults];
                    DataTable results = new DataTable();
                    results.Load(relevantResults, LoadOption.OverwriteChanges);
                    results.ExtendedProperties.Add("TotalRows", relevantResults.TotalRows);
                    results.ExtendedProperties.Add("IsTotalRowsExact", relevantResults.IsTotalRowsExact);
                    // Clean every string cell so the DataSet can be serialized to XML.
                    foreach (DataRow dr in results.Rows)
                    {
                        for (int i = 0; i < dr.ItemArray.Length; i++)
                        {
                            if (dr[i] is string)
                            {
                                dr[i] = stripNonValidXMLCharacters(dr[i].ToString());
                            }
                        }
                    }
                    results.AcceptChanges();
                    DataSet ds = new DataSet();
                    ds.Tables.Add(results);
                    ds.ExtendedProperties.Add("SpellingSuggestion", resultsCollection.SpellingSuggestion);
                    ds.ExtendedProperties.Add("QueryTerms", resultsCollection.QueryTerms);
                    ds.ExtendedProperties.Add("IgnoredNoiseWords", resultsCollection.IgnoredNoiseWords);
                    return ds;
                }
                catch (Exception)
                {
                    throw; // rethrow without resetting the stack trace
                }
            }
        }
        // Unreachable: the try block above either returns or throws.
        throw new Exception();
    }

    // Removes every UTF-16 code unit outside the single-unit XML 1.0 character ranges.
    public static string stripNonValidXMLCharacters(string s)
    {
        if (string.IsNullOrEmpty(s)) return string.Empty;
        string xml = Regex.Replace(s, "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD]", "", RegexOptions.Compiled);
        return xml;
    }
}

public class SearchQuery
{
    public string Language { get; set; }
    public bool EnableStemming { get; set; }
    public int HighlightedSentenceCount { get; set; }
    public bool IgnoreAllNoiseQuery { get; set; }
    public KeywordInclusion KeywordInclusion { get; set; }
    public string QueryText { get; set; }
    public int RowLimit { get; set; }
    public int StartRow { get; set; }
    public int Timeout { get; set; }
    public bool TrimDuplicates { get; set; }
}
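The listing calls ValueHelper.GetValue<T> with a string and a default value, but that helper is not included in the post. One possible shape for it is sketched below. This is a guess at the intent (parse the string, fall back to the default when it is missing or malformed), not the original class, and it assumes every caller passes a string.

```csharp
using System;
using System.Globalization;

// A guess at the missing helper; the original implementation is not shown in the post.
public static class ValueHelper
{
    // Converts raw to T, or returns defaultValue when raw is null, empty,
    // or cannot be converted (bad format, wrong type, out of range).
    public static T GetValue<T>(string raw, T defaultValue)
    {
        if (string.IsNullOrEmpty(raw)) return defaultValue;
        try
        {
            return (T)Convert.ChangeType(raw.Trim(), typeof(T), CultureInfo.InvariantCulture);
        }
        catch (FormatException)
        {
            return defaultValue;
        }
        catch (InvalidCastException)
        {
            return defaultValue;
        }
        catch (OverflowException)
        {
            return defaultValue;
        }
    }
}
```

For example, GetValue<int>("42", 10) returns 42, while GetValue<int>("abc", 10) falls back to 10. Swallowing the conversion errors keeps a sloppy query packet from failing the whole request, at the cost of hiding typos in it.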
I know it’s not beautiful, but it solved my problems and the search seems to be working really well for my customers, so I’m happy.
HTH, Roi
I’m writing this from my iPod, after downloading my first Kindle book to the Kindle for iPod app. I bought “The Wealth of Nations” for less than a dollar; Adam Smith would have been thrilled. While I’m knee-deep in a Microsoft environment and Google is my home on the web, it’s Amazon (big bookworm) and Apple (I know I’m a fashion victim) that really get me to the cashbox. It’s the micropayments and the digital delivery that make the Kindle model so appealing; Kindle for iPod is the perfect solution for areas without coverage.
Check out my friend’s recipe blog at http://Barons.co.il/matkonim
Good luck, Erez!
This is becoming a tradition: a server error in the afternoon implies a new version in the evening.
Google has released a new version of Google Finance with a new feature, Google Domestic Trends.
I will update this post later this evening after some exploration.
I was working on a project involving Search Server, building a search query from user input, when I stumbled upon a weird bug in the LIKE statement that caused the query to miss relevant data. Simply put, the LIKE statement fails on long strings. I was trying to match against some long pipe-concatenated metadata of the form “|item a|item b|…”. The LIKE statement would work for item a, but for items further along the line it failed silently, returning no results. I counted up to the threshold: it was 64 characters. That was odd. Searching for this issue returned nothing (“like” is a difficult search term), so I solved the problem with plan B and carried on.
Today I came across a post stating that this is in fact a known limitation. Known to whom, you might ask?
Well, to Steve Curran, MVP, who cleverly discloses:
"Yes this is a known limitation. You should avoid using the LIKE predicate in FullTextSQL and use the CONTAINS predicate. It works very well with the Path managed property. In your case just do CONTAINS(Path,'http://servername/sitename/listname/folder')."
Thank you, Steve.
Indeed, using a CONTAINS statement solves the problem and does not (in my specific case) impact rank.
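To make the switch concrete, here is a small sketch of building such a predicate from user input. SearchSqlBuilder and ContainsPhrase are hypothetical names, and the quoting rules (phrase wrapped in double quotes, single quotes doubled) are my assumption about FullTextSQL rather than something taken from Steve’s answer, so verify them against your own server before relying on this.

```csharp
using System;

// Hypothetical helper for illustration only.
public static class SearchSqlBuilder
{
    // Builds a CONTAINS predicate that matches a phrase in one managed property.
    public static string ContainsPhrase(string managedProperty, string phrase)
    {
        if (string.IsNullOrEmpty(managedProperty)) throw new ArgumentException("managedProperty is required");
        if (phrase == null) throw new ArgumentNullException("phrase");

        // Double any single quote (it delimits the SQL string) and drop double
        // quotes (they delimit the phrase inside the search condition).
        string safe = phrase.Replace("'", "''").Replace("\"", "");
        return "CONTAINS(" + managedProperty + ", '\"" + safe + "\"')";
    }
}
```

ContainsPhrase("Path", "http://servername/sitename") yields CONTAINS(Path, '"http://servername/sitename"'), which can be placed in the WHERE clause of the query text where the LIKE condition used to be.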
If you read this, HTH.
The Jewish Buddha says:
If there is no self, whose arthritis is this?
Be here now. Be someplace else later. Is that so complicated?
Drink tea and nourish life; with the first sip, joy; with the second sip, satisfaction; with the third sip, peace; with the fourth, a Danish.
Wherever you go, there you are. Your luggage is another story.
Accept misfortune as a blessing. Do not wish for perfect health, or a life without problems. What would you talk about?
The journey of a thousand miles begins with a single Oy.
There is no escaping karma. In a previous life, you never called, you never wrote, you never visited. And whose fault was that?
Zen is not easy. It takes effort to attain nothingness. And then what do you have? Bupkis.
The Tao does not speak. The Tao does not blame. The Tao does not take sides. The Tao has no expectations. The Tao demands nothing of others. The Tao is not Jewish.
Breathe in. Breathe out. Breathe in. Breathe out. Forget this and attaining Enlightenment will be the least of your problems.
Let your mind be as a floating cloud. Let your stillness be as a wooded glen. And sit up straight. You’ll never meet the Buddha with such rounded shoulders.
Deep inside you are ten thousand flowers. Each flower blossoms ten thousand times. Each blossom has ten thousand petals. You might want to see a specialist.
Be aware of your body. Be aware of your perceptions. Keep in mind that not every physical sensation is a symptom of a terminal illness.
The Torah says, Love your neighbor as yourself. The Buddha says, There is no self. So … maybe we’re off the hook?
excellent by The Big Picture