Deep Web Search – How to optimize queries?

This post was written by Brian Despain on February 3, 2009
Posted Under: The Deep Web

When conducting a Deep Web search, one of the common problems is understanding and optimizing queries. Most deep web search engines are federated, which means the query is passed simultaneously to multiple engines. If the federated search provider has done their job, the query should map seamlessly onto the logic of each native interface. At Deep Web Technologies we develop our connectors to mimic the capabilities of the native interface. However, in a federated solution a connector cannot exceed the capabilities of the native interface it talks to. Very often, because we have made the federated search so seamless, users make assumptions about the capabilities of the sources being searched. So here are some simple rules to remember when doing a federated search.
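To make the idea of passing one query to multiple engines concrete, here is a minimal Python sketch. It is not how our connectors are implemented: the source names, the in-memory documents, and the functions `search_source` and `federated_search` are all hypothetical stand-ins, and a real connector would translate the query for a remote search interface instead of doing a substring match.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "sources"; a real connector would query a remote interface.
SOURCES = {
    "source_a": ["fibromyalgia treatment trial", "sleep disorders review"],
    "source_b": ["fibromyalgia and chronic pain", "asthma outcomes"],
}

def search_source(name, query):
    """Stand-in connector: case-insensitive substring match against one source."""
    return [(name, doc) for doc in SOURCES[name] if query.lower() in doc.lower()]

def federated_search(query):
    """Send the same query to every source in parallel and merge the hits."""
    with ThreadPoolExecutor() as pool:
        # pool.map keeps results in the same order as the source names.
        per_source = list(pool.map(lambda name: search_source(name, query), SOURCES))
    return [hit for hits in per_source for hit in hits]

if __name__ == "__main__":
    for source, doc in federated_search("fibromyalgia"):
        print(source, "->", doc)
```

Each source only answers with what its own matching logic can find, which is the point of the paragraph above: the federated layer can forward and merge, but it cannot add capabilities a source does not have.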

1. Not all sources are created equal. Your complex query might run well on one source, but 15 ORs strung together with 4 NOTs isn’t going to run well on most search interfaces. (That’s a real-life example, by the way.) So start with a simple query, then use a refine-search mechanism or our own smart clustering engine to really narrow it down.
2. Start queries simply. Our software provides a robust clustering engine that lets you sort through the results and categorize them.
3. Remember this is the deep web, which means the sources are more technical and have far more detail. You are far less likely to encounter keyword spam in the deep web (unlike the surface web, which is infested with it). This means highly technical, uncommon keywords do well and the sources provide far better information. Here’s an example: search for fibromyalgia on Google, then conduct the same search at Mednar. See the difference in depth in a Deep Web search?
4. Fielded search is your friend. Deep web sources almost always support fielded search, which is a key advantage of the deep web over the surface web. This means you can search multiple fields and expect relevant, reasonable responses.
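The rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `field:"value"` syntax, the upper-case `AND`/`OR`/`NOT` operators, and the threshold of three operators are all hypothetical choices, not the query language or limits of any particular source or of our software.

```python
import re

def count_boolean_operators(query):
    """Count standalone upper-case AND/OR/NOT tokens in a query string."""
    return len(re.findall(r"\b(?:AND|OR|NOT)\b", query))

def is_simple_enough(query, max_operators=3):
    """Rough check for rules 1 and 2; the limit of 3 is an arbitrary assumption."""
    return count_boolean_operators(query) <= max_operators

def build_fielded_query(fields):
    """Join field/value pairs as field:"value" terms with AND (assumed syntax)."""
    return " AND ".join(f'{name}:"{value}"' for name, value in fields.items())

if __name__ == "__main__":
    # A query shaped like the real-life example: 15 ORs and 4 NOTs.
    heavy = " OR ".join(f"term{i}" for i in range(16)) + " NOT a NOT b NOT c NOT d"
    print(count_boolean_operators(heavy), is_simple_enough(heavy))
    print(build_fielded_query({"title": "fibromyalgia", "author": "Smith"}))
```

A check like `is_simple_enough` is just a reminder to start small; a couple of fielded terms will usually travel across sources better than a long chain of operators.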
