3 Reasons an Indexed Discovery Service Doesn’t Work For Serious Researchers

This post was written by Darcy Katzman on February 24, 2015
Posted Under: Federated Search

Last week we alluded to three obstacles serious researchers face when using a Discovery Service index to retrieve information from their premium sources. Deep Web Technologies uses a real-time approach, choosing to search information sources on-the-fly rather than compiling information into an index as Discovery Services do, for reasons we’ll discuss in a moment. But we should get a few definitions out of the way before we jump into why an index doesn’t work for all researchers.

What is an index as used by Discovery Services?  An index holds all of the metadata, and sometimes full text, for one or more sources of information, usually external public or premium sources.  For example, Google has an index; its spiders compile information from millions of public websites across the Internet, funneling the information into a single, unified database for users to search. On a much smaller scale, Discovery Services compile information from an organization’s internal catalog, as well as a subset of the customer’s premium and public information sources, if these sources permit their information to be indexed. Discovery Services establish relationships with the information sources directly (if possible), procure the metadata (if they can) and add it to their index.
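To make the idea concrete, here is a minimal sketch of the harvest-then-search pattern described above. It is an illustration only, not how any particular Discovery Service is built: the `MetadataIndex` class, its method names, and the sample records are all hypothetical.

```python
from collections import defaultdict


class MetadataIndex:
    """Toy central index: metadata is copied in ahead of time, then searched locally."""

    def __init__(self):
        self.records = {}                 # record id -> stored copy of the metadata
        self.postings = defaultdict(set)  # title word -> ids of records containing it

    def harvest(self, source, records):
        # Copy metadata from one source into the index. Only sources that
        # permit harvesting ever reach this step.
        for position, record in enumerate(records):
            record_id = f"{source}:{position}"
            self.records[record_id] = {"source": source, **record}
            for word in record["title"].lower().split():
                self.postings[word].add(record_id)

    def search(self, query):
        # Answer from the stored copy; the original sources are not contacted.
        words = query.lower().split()
        if not words:
            return []
        matches = set.intersection(*(self.postings.get(w, set()) for w in words))
        return [self.records[record_id] for record_id in sorted(matches)]


index = MetadataIndex()
index.harvest("catalog", [{"title": "Federated Search Primer"},
                          {"title": "Search Engines"}])
hits = index.search("federated search")
```

Note that `search` only sees what was present at the last `harvest`, so anything a source publishes afterwards is invisible until the next harvest.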

What is real-time (on-the-fly) searching?  Real-time, or “on-the-fly” searching, sends queries directly to the source – the catalog, premium databases, public sources – retrieves the metadata of top ranked results from each source relative to the user’s search,  then normalizes and aggregates the information into a single display of results.  The results are listed with direct links to the original source of information so users can explore the record and the full text.  There is no index built and the information is not stored.
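The steps in that definition (query each source, take its top results, normalize, aggregate) can be sketched as follows. This is a simplified illustration under stated assumptions, not Deep Web Technologies’ implementation: the two connector functions, their field names, and the example URLs are hypothetical stand-ins for live sources, and rescaling each source’s scores against its own top hit is just one possible way to make results comparable.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    source: str
    title: str
    url: str      # direct link back to the record at the source
    score: float  # relevance rescaled to 0..1 within its own source


def search_catalog(query, limit):
    # Hypothetical connector: a real one would query the live catalog.
    records = [
        {"name": "Federated search primer", "link": "https://catalog.example/1", "rank": 8.0},
        {"name": "Cataloging handbook", "link": "https://catalog.example/2", "rank": 2.0},
    ]
    return records[:limit]


def search_premium_db(query, limit):
    # Hypothetical connector with a different record shape and score scale.
    records = [
        {"title": "Real-time search in practice", "href": "https://premium.example/a", "relevance": 0.9},
        {"title": "Indexing at scale", "href": "https://premium.example/b", "relevance": 0.6},
    ]
    return records[:limit]


# Each source: (connector, title field, link field, score field).
SOURCES = {
    "catalog": (search_catalog, "name", "link", "rank"),
    "premium": (search_premium_db, "title", "href", "relevance"),
}


def normalize(source, records, title_key, url_key, score_key):
    # Map source-specific fields onto one shape and rescale scores by the top hit.
    top = max((r[score_key] for r in records), default=0) or 1.0
    return [Result(source, r[title_key], r[url_key], r[score_key] / top) for r in records]


def federated_search(query, limit_per_source=10):
    def run(item):
        source, (connector, title_key, url_key, score_key) = item
        records = connector(query, limit_per_source)
        return normalize(source, records, title_key, url_key, score_key)

    # Query every source at request time, in parallel; nothing is stored afterwards.
    with ThreadPoolExecutor() as pool:
        batches = list(pool.map(run, SOURCES.items()))
    merged = [result for batch in batches for result in batch]
    return sorted(merged, key=lambda r: (-r.score, r.source, r.title))


results = federated_search("federated search")
```

Each `Result` keeps its `source` and `url`, so the merged list can still be limited to a single source or followed back to the original record.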

To be perfectly clear, both of these approaches have their pros and their cons.  And, we certainly love the Google and Bing indexes for the everyday quick search for the nearest coffee house.  But, as we’ve heard from our customers time and again, “We’re serious about our research, and a single index just doesn’t work for us.”  (Note that in this post we’re singling out Discovery Service indexes, but will need to address how this applies to Enterprise Search indexes in a separate post.)

Abe Lederman, our CEO and CTO, addressed this issue before in other blog posts, but this is a good opportunity to reiterate that not all researchers benefit from a single master index such as those used by Discovery Services.  Let’s look at some of the obstacles that serious researchers face when searching an index through a Discovery Service.

  1. Limited access to content
    Serious researchers like to search for critical information on sources they trust.  In fact, some researchers search only their chosen few information sources, excluding all other databases to which their organization may subscribe.  So, a Discovery Service had better, by gum, include all of the “trusted sources” that its serious researchers use, or it may not get much use from those researchers.  And herein lies the problem: There are information vendors, trusted sources with critical information, that simply don’t want to share their information with a Discovery Service.  That information is theirs.  This is particularly true of sources within industries such as legal and medical.  When an information vendor doesn’t permit its information to be indexed, the Discovery Service index won’t contain information from that source.  Bottom line: Serious researchers may not use a Discovery Service because it’s incomplete.  It just doesn’t contain what they need, or want, to search.
  2. Frequency of content changes and updates
    Information vendors, or content sources, vary in how frequently they update their own database of information.  Some sources update daily or even hourly while other sources may update their database with new information every few weeks.  Depending on the need, a serious researcher may require up-to-the-minute, just-published information for a critical, time-sensitive topic. The drive for data may not allow for old information, no, not even one day old.  Could you imagine Kayak, a travel metasearch site, showing day-old data?  It just wouldn’t work.   Current information can make or break a researcher, and a Discovery Service index that is updated weekly from some, or even half, of the trusted data sources it indexes may not give serious researchers what they need to stay current.  Bottom line: Serious researchers require current information, not stale data.
  3. Muddiness about source searching and clickthrough
    In some Discovery Services, searching one specific source (or even just a handful of sources) is difficult, even impossible.  An index contains vast amounts of information, but doesn’t always allow researchers to limit a search to a specific source easily, pinpoint the source they need, or click through to the metadata or full text of a document directly at the source.  Discovery Services often display results as uniform-looking records for a consistent look and feel, but provide no way to click directly to the record available at the source.  Knowing where information is coming from and clicking through to the result directly at the source can be an important step in the research process.  Bottom line:  A serious researcher can’t afford to spend extra time narrowing down their information sources, or performing extra clicks to go directly to the source itself.  They need to be able to search one or more of their trusted sources rather than the entire index, and click directly to the source for further review and additional information.

At Deep Web Technologies we perform a real-time search, offering an alternative to the Discovery Service indexed approach. Our federated search retrieves information from sources that can’t be accessed through an index, sources that perform frequent updates to metadata, and sources that users want to search individually due to their more extensive, trusted database of information.  If your organization already uses a Discovery Service but you need more precise information for your serious researchers, federated search can complement your Discovery Service by adding real-time results from content sources you don’t currently include in your Discovery Service.

We’d love to hear from researchers about this – What has your experience been with Discovery Services?
