Posted Under: Federated Search, The Deep Web
Filtering results is as much an art as a science.
Refining or clarifying an initial set of search results is a fairly common practice among search engine users who don’t know exactly what they are looking for. Yet in federated searching, refining a group of results is often not the best way for a user to find what they are looking for.
Let’s do a quick walk-through of how a user might use a federated search application. Let’s say a user is interested in nanotechnology and is using our Explorit federated search application. When the user enters the query “nanotechnology”, the search brings back roughly 2,500 results (the exact number varies depending on the sources searched). After sorting through some of the results, the user decides that their interest really lies in medical nanotechnology, so they want to narrow the search to just “medical nanotechnology”.
To get the best possible results from a federated search application, it makes sense to re-run the search with those two terms. The initial search was run with the term “nanotechnology”, so its results may include some that contain the term “medical”. But since the original search strategy targeted nanotechnology as a whole, the odds that those results include the best ones for “medical nanotechnology” are slim to none. It’s far better for a user of federated search to type in the new query and get results from the various sources that match the new search strategy.
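To make the difference concrete, here is a minimal sketch in Python. It is a toy model, not how Explorit works internally: the `Source` class, the per-source `limit`, and the sample titles are all assumptions made up for illustration, and each source's stored order stands in for its own relevancy ranking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    source: str
    title: str


def matches(doc, query):
    """True if every query term appears as a word in the title."""
    words = doc.title.lower().split()
    return all(term in words for term in query.lower().split())


class Source:
    """Toy source: returns only its first `limit` matches for a query."""

    def __init__(self, name, titles, limit):
        self.docs = [Doc(name, t) for t in titles]
        self.limit = limit  # hypothetical per-source result cap

    def search(self, query):
        hits = [d for d in self.docs if matches(d, query)]
        return hits[: self.limit]


def federated_search(sources, query):
    """Send the same query to every source and pool the results."""
    results = []
    for source in sources:
        results.extend(source.search(query))
    return results


# Made-up sample data; stored order stands in for each source's ranking.
sources = [
    Source("A", [
        "nanotechnology industry outlook",
        "nanotechnology materials review",
        "nanotechnology manufacturing trends",
        "medical nanotechnology drug delivery",
        "medical nanotechnology imaging",
    ], limit=3),
    Source("B", [
        "medical nanotechnology sensors",
        "nanotechnology policy",
        "nanotechnology coatings",
        "medical nanotechnology implants",
    ], limit=2),
]

# Strategy 1: filter the results of the original, broader search.
initial = federated_search(sources, "nanotechnology")
refined = [d for d in initial if matches(d, "medical")]

# Strategy 2: re-run the search with both terms.
rerun = federated_search(sources, "medical nanotechnology")

print(len(initial), len(refined), len(rerun))  # prints: 5 1 4
```

In this toy data, filtering the original five results surfaces only one medical title, while re-running the query lets each source return its own best matches for the new terms.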
Now that the user has the information, how do they sort through it? Most search engines have their own relevancy ranking, tuned to their own content. This means a two-term search might be handled differently than a broader search, and likely more efficiently.
The real goal, of course, is to get the maximum number of results that match the user’s search strategy, and to give them the tools for sorting through them. This strategy also holds true for discovery services, since many discovery services (such as Summon) only return 1,000 results in response to a query. Those 1,000 results are tuned to the initial query, and refining such a small result set further wouldn’t serve the needs of their users. The user would be better off, as we mentioned before, re-initiating the search.
Just as a side note, this 1,000-result limit is pretty common in large-scale indexes. Google, for example, will only bring back 1,000 results for any query, even though Google indicates that thousands or millions of results are available. Google brings back the 1,000 most relevant to the initial query, from an index that is filled with a huge amount of spam. So it’s a better idea to re-initiate your query with the refined search strategy to get access to more information. This 1,000-result limit makes less sense in a product like Summon, since presumably the index doesn’t have any spam in it.
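A quick back-of-the-envelope sketch shows how much a 1,000-result cap can hide. Only the cap comes from the discussion above; the index size and the share of documents that also mention “medical” are invented numbers, and the ranking is a simple stand-in, so treat this as an illustration rather than a measurement of any real index.

```python
CAP = 1000           # result cap discussed above
INDEX_SIZE = 50_000  # hypothetical index size
MEDICAL_EVERY = 25   # hypothetical: 1 in 25 documents also mentions "medical"

# Document i sits at rank i for the broad query (0 = most relevant).
index = [{"rank": i, "medical": i % MEDICAL_EVERY == 0} for i in range(INDEX_SIZE)]


def run_query(docs, predicate, cap=CAP):
    """Return at most `cap` matching documents, best rank first."""
    matching = [d for d in docs if predicate(d)]
    matching.sort(key=lambda d: d["rank"])
    return matching[:cap]


# Broad query, then refine within the capped result set.
broad = run_query(index, lambda d: True)
refined = [d for d in broad if d["medical"]]

# Re-initiate the query with the narrower strategy.
rerun = run_query(index, lambda d: d["medical"])

print(len(refined), len(rerun))  # prints: 40 1000
```

Under these assumptions, refining within the capped set leaves 40 medical results, while re-initiating the query returns a full 1,000 of the 2,000 that exist in the index.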