Is Speed Worth It? (Federated Search vs. Unified Index)

This post was written by Darcy Katzman on October 30, 2009
Posted Under: Federated Search

The appeal of a unified index to search all of an organization’s collections is undeniable. Lightning-fast results make you feel as though you are searching Google, and this is particularly appealing for those trained to research on some of the more popular, less credible search engines. There are, however, some problems with the unified index approach, which mirrors the approach taken in traditional web indexing.

The Need for Speed
Most researchers understand a unified index as an aggregate of records from multiple databases that quickly returns results for a user query. Unlike popular search engines such as Google, a unified index does not crawl publishers’ content, but is updated on a schedule from metadata repositories. This creates an information lag, whose size depends on how often the publisher content is updated and how long it takes to get the information into the repository. For example, if content is updated weekly, researchers needing up-to-the-minute information will miss relevant results from the publisher’s repository because the unified index is not current.
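The lag described above can be put into rough numbers. The following is a minimal sketch, assuming the index is refreshed at fixed intervals; the function name, its parameters, and the dates in the usage note are hypothetical and are not taken from any particular product.

```python
from datetime import datetime, timedelta


def first_visible(published, last_harvest, interval, ingest_delay=timedelta(0)):
    """Estimate when a newly published record becomes searchable in the index.

    Assumes (hypothetically) that harvests run at last_harvest,
    last_harvest + interval, and so on, and that each harvest needs
    ingest_delay before its records can be searched.
    """
    if published <= last_harvest:
        # The record was already available at the last harvest.
        return last_harvest + ingest_delay
    elapsed = published - last_harvest
    # Ceiling division on timedeltas: number of harvest cycles to wait.
    cycles = -(-elapsed // interval)
    return last_harvest + cycles * interval + ingest_delay


def index_lag(published, last_harvest, interval, ingest_delay=timedelta(0)):
    """Return how long the record stays invisible to index users."""
    return first_visible(published, last_harvest, interval, ingest_delay) - published
```

With a weekly refresh, an article published a day after the last harvest would stay out of the index for most of a week under these assumptions, for example index_lag(datetime(2009, 10, 27, 9), datetime(2009, 10, 26), timedelta(days=7)).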

Federated Search, on the other hand, searches repositories in real-time with a time lapse of usually less than thirty seconds, returning articles just published to the database as well as other relevant results. While this relies on each publisher’s index being healthy and ready for search, the majority of publishers monitor their applications to minimize downtime. For many researchers, real-time results are the critical factor to success.
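As an illustration of the real-time approach, here is a minimal sketch of a federated query that is sent to every source at once and keeps whatever comes back within a time limit. The function name, the `sources` mapping of source names to search callables, and the thirty-second default are assumptions made for the example; a real connector would call a publisher's own search interface.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def federated_search(query, sources, timeout_s=30.0):
    """Query all sources in parallel and collect the answers that arrive in time.

    `sources` maps a source name to a callable that takes the query and
    returns a list of results. Both are hypothetical stand-ins for real
    publisher connectors.
    """
    pool = ThreadPoolExecutor(max_workers=max(1, len(sources)))
    futures = {pool.submit(search, query): name for name, search in sources.items()}
    done, not_done = wait(futures, timeout=timeout_s)
    pool.shutdown(wait=False)  # do not block the user on slow sources

    results, failed = {}, []
    for future in done:
        name = futures[future]
        try:
            results[name] = future.result()
        except Exception:
            # The source was reachable but its search raised an error.
            failed.append(name)
    # Sources that did not answer before the time limit.
    failed.extend(futures[future] for future in not_done)
    return results, sorted(failed)
```

The second return value lists the sources that were down or too slow, which reflects the dependence on each publisher's index being healthy.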

Index Decay

Every index decays over time; sources change their data structure, and the metadata must be reloaded and updated in the index. The less frequently an index is updated, the greater the lag between the index results and the changes made by a publisher. This may not be a problem if you do not need real-time information, but it can present problems for researchers needing time-sensitive data. Anyone who has done a search on the web and then hit a 404 or 403 error has encountered an example of index decay.

Who’s Who?
Part of the beauty of federated search is the opportunity to discover new sources of information.  If a particular source of information is returning 2000 results for the term “blue light semiconductor” directly on the source page, and 100 relevant results to the federated search engine, it may be worthwhile to investigate that source further.

In a unified index, the water is muddied because the index serves as the source, and it isn’t clear which source returned which results or how many. This severely cripples the source discovery process of so-called “unified discovery services.” Not knowing what you are searching can be very problematic for researchers or business people.
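The difference in source visibility can be sketched in a few lines. This is only an illustration, assuming federated results arrive grouped by source name; the function name and the data layout are hypothetical.

```python
def merge_with_attribution(results_by_source):
    """Merge per-source result lists while keeping track of where each hit came from.

    `results_by_source` maps a source name to the list of results it returned
    (a hypothetical layout chosen for this example).
    """
    merged = [
        (source, result)
        for source, results in results_by_source.items()
        for result in results
    ]
    # Per-source counts are what make source discovery possible.
    counts = {source: len(results) for source, results in results_by_source.items()}
    return merged, counts
```

A merged list that drops the source label, as a single index effectively does, still answers the query but can no longer show which collection is worth investigating further.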

Vital Collections
Most unified index discovery engines encourage publisher relationships at the expense of broadening their source lists.  For organizations using a small set of publishers, this may not present a problem.  For those depending on other publishers for important data, this is no molehill, but a mountain.

By eliminating the need for “connectors” and the maintenance associated with them, a unified index has created just a single “connector” to information: the index itself. Inflexibility in adding new collections indicates that what you are getting is another boxed solution.

The unified index presents a viable option for those desiring search speed and a cost-efficient approach to distributed search. The lack of customization for these services may present significant problems for those organizations needing up-to-date information from publishers they specify.

Is speed worth it to your organization?

Reader Comments

Librarians have to struggle with what we believe is *best* for the user, and what the users think is *good enough* for them.

It is also heartbreaking (well, at least for me) to watch hundreds of thousands of dollars in subscriptions to databases just Not Get Used. If speed means it will get used, then even if it’s not “the answer,” it most certainly becomes part of the equation.

I also bet that, as federated search tools have become better at what they do, so will unified index tools.

#1 
Written By Alejandro Garza on November 2nd, 2009 @ 2:07 pm

I think you raise an interesting point. Both tools increase use of databases. Both approaches have their strengths. You just can’t use speed as the sole criterion in a solution. There are also other issues. It’s pretty clear, though, that having databases in disparate silos is a bad idea since it leads to content databases sitting idle.

#2 
Written By Brian Despain on November 5th, 2009 @ 6:55 pm

