Discovering Discovery Services

This post was written by Darcy Pedersen on February 26, 2010
Posted Under: Marketing Announcements

This article was written by Sol Lederman and is republished from the Federated Search Blog.

Discovery services have begun to spring up. This article is my attempt to catalog and characterize them. Consider this article to be an introduction that sets the stage for future analysis articles.

What is a discovery service?
A discovery service is a search interface to pre-indexed meta data and/or full text documents. Discovery services differ from federated search applications in that discovery services don’t search live sources. By searching pre-indexed data discovery services return search results very quickly. Discovery services are touted as an evolution beyond federated search and in some ways they are. Some discovery services either provide integration with federated search or provide an API for others to do the integration. I believe that hybrid “federated discovery” services are likely to prevail over pure discovery services and I will dedicate an article to them.

It’s useful to note that discovery services aren’t new. IngentaConnect makes 4.5 million documents searchable from over 13,000 publishers. Infotrieve provides a document search and delivery service. And, there’s Thomson Reuters’ Web of Science. These are just three examples of discovery services that have existed for a long time. What is new about the recently introduced discovery services is the focus on integration with other content, typically the library’s OPAC. I’ll discuss integration in a separate article.

What is a unified search index?
The terms “unified index” and “unified search index” are associated with discovery services. Just as the terms imply, discovery services use a unified search index to search content from all sources they have access to from a single index. The discovery service must deal with differences in the structure of meta data (e.g. names and contents of fields) from different sources to produce the unified search index.

What is the motivation for discovery services?
In a word, speed. It’s no surprise that users don’t like to wait tens of seconds for their search results. In terms of response time, live searching can’t compete with index searching. A second factor driving the creation of discovery services is the willingness of publishers and content aggregators to form partnerships with developers of the services. Given the pressure to deliver search results in “Google time,” publishers have an incentive to cooperate with one another and with discovery service providers.

Some people say that a third driving factor is cost. While it’s possible that libraries could save money accessing sources via discovery services vs. via federated search, cost figures are very difficult to come by for either so cost may or may not, in reality, be a factor.
Another reason for the big interest in discovery services is that the onerous task of building, monitoring, and repairing connectors disappears since there are no connectors.

Unified indexes provide benefits due to their “homogenization” of meta data. Duplicates should be much easier to remove via discovery services than by federated search engines. And, discovery services will produce more “complete” results, i.e. results with titles, authors, publications dates and other fields of interest that federated search can’t reliably get. With better fielded results it will be easier to cluster and otherwise organize search results.

A potential benefit, but also a potential concern, is relevance ranking. It may be better or worse with discovery services depending on how search is performed. See the next section for further discussion.

Are there downsides to discovery services?
Yes – source lock-in. I’ve written, perhaps ad nauseam, about my concern that discovery services, if not integrated with federated search, force organizations that want a single search tool to choose one service or the other. Federated search is very important for organizations that have particular sources they want to search that are not available from one of the discovery services.

Even if an organization is happy with the set of sources provided through a discovery service, the availability of sources is dependent on the relationship with the publishers (and/or aggregators.) Discovery services are too new to know how publisher relationships will evolve, especially given the competition.

It’s also not clear how discovery services perform search. Let’s say that a particular discovery service has an index that’s built from meta data of its documents and not from its full text. In that case searching the index won’t produce results that are as relevant as results obtained by searching the native source, assuming the native source provides full-text search capability.

Another concern with discovery services is how current their indexes are. When one searches a source via federated search, the content is current because it is searched live. It’s not clear how frequently the discovery service indexes are updated.

The Oregon State University (OSU) Libraries evaluated WorldCat Local and other discovery services and recommended further evaluation and testing.

Reader Comments

Good post and points about downside of discovery services.

#1 
Written By dave tribbett on April 13th, 2010 @ 5:38 am

Another concern with discovery services is how current their indexes are. When one searches a source via federated search, the content is current because it is searched live. It’s not clear how frequently the discovery service indexes are updated.

#2 
Written By http://www.danjiw.com/ on May 19th, 2010 @ 10:12 am

Add a Comment

required, use real name
required, will not be published
optional, your blog address