Crawling the Deep Web

Nimish Sawant of LiveMint.com recently published a post on the Deep Web and some of the services that search it. He points to the differences between Google and other search approaches such as federated search. Nimish raises the most popular search question of our time, “If Google can’t find the data, where exactly is it and why can’t it be crawled?” He came at this question from a slightly different perspective:

Let’s try to decode the deep Web by virtue of content. A database contains information stored in tables that are created by programs such as Access, SQL or Oracle. This data can only be retrieved by posting a query. The query, when executed, searches the database to come up with the result that has been specified. This is very different from searching static Web pages that can be accessed directly by crawlers.
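The distinction in the quote can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the "site" is a toy dictionary of pages and links, the database is an in-memory SQLite table, and the names STATIC_PAGES, crawl, build_database and search_database are invented for the example. The crawler reaches every static page by following links, but the database rows surface only when a query is posted.

```python
import sqlite3

# Hypothetical static site: each page lists the pages it links to.
STATIC_PAGES = {
    "/": ["/about", "/products"],
    "/about": [],
    "/products": ["/"],
}


def crawl(start):
    """Follow links from `start` and return every page reached."""
    seen, queue = set(), [start]
    while queue:
        page = queue.pop()
        if page in seen:
            continue
        seen.add(page)
        queue.extend(STATIC_PAGES.get(page, []))
    return seen


def build_database():
    """Create a toy database whose rows have no URL of their own."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE articles (title TEXT, body TEXT)")
    conn.executemany(
        "INSERT INTO articles VALUES (?, ?)",
        [
            ("Protein folding survey", "A review of folding methods."),
            ("Deep Web basics", "Content reachable only through queries."),
        ],
    )
    return conn


def search_database(conn, term):
    """Post a query; in this sketch it is the only way to reach the rows."""
    rows = conn.execute(
        "SELECT title FROM articles WHERE body LIKE ? ORDER BY title",
        (f"%{term}%",),
    )
    return [title for (title,) in rows]


crawled = crawl("/")
found = search_database(build_database(), "queries")
```

Here crawled contains only the three static pages, while found holds the one article whose body matches the query. Nothing in the link graph points at either article.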

Deep Web Technologies made the list of four companies that utilize federated search for the deep web.  It’s always nice to see articles that recognize our web portals such as Biznar and Mednar for both their Deep Web search capabilities and the federated search technology that powers them.

NFAIS on Multilingual Search

Abe Lederman, our company president, presented on multilingual search at the NFAIS conference yesterday. His talk, entitled “Federated Search: Breaking Down the Language Barrier,” addressed the development of translation tools on the web and how Deep Web Technologies is helping organizations find and deliver the most relevant results, regardless of language. The presentation highlights the WorldWideScience.org translation environment currently in our engineering department; our translation application will integrate with the current version of WorldWideScience.org in June.

Federated Searching – Good Ideas Never Die

Barbara Quint, editor-in-chief of Information Today’s Searcher: The Magazine for Database Professionals, has written yet another dazzling federated search article. As she puts it, “A good federated system imposes a tremendous burden on the builders so the users can feel the search process as effortless.” Indeed, at Deep Web Technologies, that is exactly what we feel we are doing. We’re creating search systems for our clients that require very little effort on their end, but do exactly what they need. Her assessment of what produces quality searches (or the lack of them) is spot on as well:

More important, however, are the problems of truly making the systems perform effectively for end-users. Basically, a lot of human intelligence and expertise, not to mention sweat and persistent effort, has to go into these systems to make them “simple” and effective for users. For example, most of the databases have field structures where key metadata resides. A good federated system has to know just how each field in each database is structured and how to transform a search query to extract the needed data. Author or name searching alone involves layers of questions. Do the names appear firstname-lastname or last name-comma-firstname? Are there middle names or middle initials? What separates the components of the names – periods, periods and spaces, just spaces? The list goes on and on – and that’s just for one component.

The article also mentions our company President, Abe Lederman as well as several public-facing portals powered by Deep Web Technologies.  Thank you, Barbara!
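To make the author-name question from Barbara’s quote concrete, here is a minimal Python sketch of one normalization step. It is an illustration only, not how any particular federated search product works: the function name normalize_author and the target form “lastname, first initial” are assumptions chosen for the example, and it deliberately ignores suffixes and multi-word surnames.

```python
import re


def normalize_author(raw):
    """Return a name as 'lastname, f' regardless of the source's layout.

    Handles 'Last, First M.' and 'First M. Last'. Suffixes and
    multi-word surnames are out of scope for this sketch.
    """
    name = raw.strip()
    if not name:
        return ""
    if "," in name:
        # 'Last, First' layout: the surname comes before the comma.
        last, _, given = name.partition(",")
    else:
        # 'First Last' layout: assume the final token is the surname.
        parts = name.split()
        last, given = parts[-1], " ".join(parts[:-1])
    # Periods and repeated spaces are separators, not part of the name.
    given = re.sub(r"[.\s]+", " ", given).strip()
    first_initial = given[0].lower() if given else ""
    return f"{last.strip().lower()}, {first_initial}".rstrip(", ")


variants = ["Quint, Barbara", "Barbara Quint", "B. Quint", "Quint, B.E."]
normalized = {normalize_author(v) for v in variants}
```

All four variants collapse to the same key, which is what lets a query for one form match records that store another. A real system would need a rule like this for every field of every source, which is the burden the quote describes.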


Thematically Speaking

Many people are familiar with Google’s homepages, and the “themes” option where users can customize the look and feel of their homepage with gadgets and skins to make it feel more personalized. In our new federated search application, Deep Web Technologies will have a theming page, where administrators can select a theme for their search engine, or even tailor themes to their preference. We’ve had several top-notch designers working on basic themes and will be adding more monthly for a wider selection at launch.

Your organization can easily adapt one of our default themes to your preferences, or upload your own design. Altering themes will require some basic CSS knowledge (or a technical knack). Administrators will have the ability to change the look and feel of their search engine on the fly, without ever needing to talk to us!

The theming feature will be available in the new product launched later this year.  Deep Web Technologies is currently accepting applicants to beta test the new product and give us feedback on new features such as themes.

Want to be a beta tester or development partner? Let us know.

Discovering Discovery Services

This article was written by Sol Lederman and is republished from the Federated Search Blog.

Discovery services have begun to spring up. This article is my attempt to catalog and characterize them. Consider this article to be an introduction that sets the stage for future analysis articles.

What is a discovery service?
A discovery service is a search interface to pre-indexed meta data and/or full text documents. Discovery services differ from federated search applications in that discovery services don’t search live sources. By searching pre-indexed data, discovery services return search results very quickly. Discovery services are touted as an evolution beyond federated search and in some ways they are. Some discovery services either provide integration with federated search or provide an API for others to do the integration. I believe that hybrid “federated discovery” services are likely to prevail over pure discovery services and I will dedicate an article to them.

It’s useful to note that discovery services aren’t new. IngentaConnect makes 4.5 million documents searchable from over 13,000 publishers. Infotrieve provides a document search and delivery service. And, there’s Thomson Reuters’ Web of Science. These are just three examples of discovery services that have existed for a long time. What is new about the recently introduced discovery services is the focus on integration with other content, typically the library’s OPAC. I’ll discuss integration in a separate article.

What is a unified search index?
The terms “unified index” and “unified search index” are associated with discovery services. Just as the terms imply, a discovery service searches the content of all the sources it has access to through a single index. To produce that unified search index, the discovery service must deal with differences in the structure of meta data (e.g. names and contents of fields) from different sources.
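As a rough illustration of what dealing with differences in meta data structure can involve, the Python sketch below renames fields from two made-up sources into one schema. Everything in it is an assumption for the example: the source names, the field names in FIELD_MAPS, and the unified fields title, authors and year do not describe any actual discovery service.

```python
# Hypothetical per-source field mappings: source field -> unified field.
FIELD_MAPS = {
    "source_a": {"ti": "title", "au": "authors", "py": "year"},
    "source_b": {"Title": "title", "Creator": "authors", "Date": "year"},
}


def to_unified(source, record):
    """Rename a source record's fields to the unified schema."""
    mapping = FIELD_MAPS[source]
    unified = {mapping[k]: v for k, v in record.items() if k in mapping}
    # Authors may arrive as one delimited string or as a list.
    authors = unified.get("authors", [])
    if isinstance(authors, str):
        authors = [a.strip() for a in authors.split(";") if a.strip()]
    unified["authors"] = authors
    # Reduce full dates such as '2010-02-01' to a four-digit year.
    if "year" in unified:
        unified["year"] = int(str(unified["year"])[:4])
    unified["source"] = source
    return unified


def build_index(batches):
    """Merge records from every source into one list, the 'unified index'."""
    return [to_unified(src, rec) for src, recs in batches.items() for rec in recs]


index = build_index({
    "source_a": [{"ti": "Deep Web Search", "au": "Lee, A.; Kim, B.", "py": "2009"}],
    "source_b": [{"Title": "Federated Search", "Creator": ["Ng, C."], "Date": "2010-02-01"}],
})
```

After this step both records share the same field names and value types, so a single search routine can treat them alike.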

What is the motivation for discovery services?
In a word, speed. It’s no surprise that users don’t like to wait tens of seconds for their search results. In terms of response time, live searching can’t compete with index searching. A second factor driving the creation of discovery services is the willingness of publishers and content aggregators to form partnerships with developers of the services. Given the pressure to deliver search results in “Google time,” publishers have an incentive to cooperate with one another and with discovery service providers.

Some people say that a third driving factor is cost. While it’s possible that libraries could save money by accessing sources through discovery services rather than through federated search, cost figures are very difficult to come by for either, so cost may or may not, in reality, be a factor.

Another reason for the big interest in discovery services is that the onerous task of building, monitoring, and repairing connectors disappears since there are no connectors.

Unified indexes provide benefits due to their “homogenization” of meta data. Duplicates should be much easier to remove via discovery services than by federated search engines. And, discovery services will produce more “complete” results, i.e. results with titles, authors, publication dates and other fields of interest that federated search can’t reliably get. With better fielded results it will be easier to cluster and otherwise organize search results.
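A short Python sketch shows why homogenized fields make duplicate removal easier. It assumes records that already carry a title and a year under the same field names, and it treats two records as duplicates when their normalized title and year match. That matching rule is a simplification chosen for the example, not a description of how any service actually deduplicates.

```python
import re


def dedup_key(record):
    """Build a comparison key from the title and year of a record."""
    # Lowercase the title and collapse punctuation and spacing.
    title = re.sub(r"[^a-z0-9]+", " ", record["title"].lower()).strip()
    return (title, record.get("year"))


def remove_duplicates(records):
    """Keep the first record seen for each key, preserving input order."""
    seen, unique = set(), []
    for record in records:
        key = dedup_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


records = [
    {"title": "Federated Search: A Primer", "year": 2009, "source": "a"},
    {"title": "federated search - a primer", "year": 2009, "source": "b"},
    {"title": "Federated Search: A Primer", "year": 2010, "source": "c"},
]
unique = remove_duplicates(records)
```

The first two records differ only in case and punctuation, so the second is dropped. The third survives because its year differs. Without a reliable year field, as is often the case with live federated results, the key would have to rest on the title alone.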

A potential benefit, but also a potential concern, is relevance ranking. It may be better or worse with discovery services depending on how search is performed. See the next section for further discussion.

Are there downsides to discovery services?
Yes – source lock-in. I’ve written, perhaps ad nauseam, about my concern that discovery services, if not integrated with federated search, force organizations that want a single search tool to choose one service or the other. Federated search is very important for organizations that have particular sources they want to search that are not available from one of the discovery services.

Even if an organization is happy with the set of sources provided through a discovery service, the availability of sources is dependent on the relationship with the publishers (and/or aggregators.) Discovery services are too new to know how publisher relationships will evolve, especially given the competition.

It’s also not clear how discovery services perform search. Let’s say that a particular discovery service has an index that’s built from meta data of its documents and not from its full text. In that case searching the index won’t produce results that are as relevant as results obtained by searching the native source, assuming the native source provides full-text search capability.
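The gap between a meta data index and a full-text index can be shown with a toy inverted index in Python. The two documents, their field names, and the helper names build_index and search are invented for this sketch. It demonstrates only that a term appearing solely in the body of a document cannot be found through an index built from titles and abstracts.

```python
import re
from collections import defaultdict

# Hypothetical documents: meta data fields plus the full text.
DOCS = {
    1: {
        "title": "Solar cell efficiency",
        "abstract": "Measured output of thin films.",
        "full_text": "Perovskite layers degraded after heating.",
    },
    2: {
        "title": "Perovskite stability",
        "abstract": "A study of degradation.",
        "full_text": "Samples were stored in dry air.",
    },
}


def build_index(docs, fields):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for field in fields:
            for word in re.findall(r"[a-z0-9]+", doc[field].lower()):
                index[word].add(doc_id)
    return index


def search(index, term):
    """Return the sorted ids of documents that contain `term`."""
    return sorted(index.get(term.lower(), set()))


metadata_index = build_index(DOCS, ["title", "abstract"])
full_text_index = build_index(DOCS, ["title", "abstract", "full_text"])

metadata_hits = search(metadata_index, "perovskite")
full_text_hits = search(full_text_index, "perovskite")
```

The meta data index finds only document 2, while the full-text index finds both documents, because document 1 mentions the term only in its body.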

Another concern with discovery services is how current their indexes are. When one searches a source via federated search, the content is current because it is searched live. It’s not clear how frequently the discovery service indexes are updated.

The Oregon State University (OSU) Libraries evaluated WorldCat Local and other discovery services and recommended further evaluation and testing.

Our New Partnership with Swets

Deep Web Technologies is pleased to announce our new partnership with Swets!

Our next-generation federated search product, Explorit, now powers the SwetsWise Searcher module, designed to simplify access to subscription sources with a single, easy-to-use interface.

Swets is the world’s leading subscription services company, serving over 160 countries. Building on more than 105 years of experience, Swets is a true “long tail” powerhouse that provides the most comprehensive and sophisticated e-commerce platform currently available in its field.

Read our press release here.

D.C. For A Week

UPDATE: Due to weather restrictions, Abe has delayed his trip to the week of March 8th.  Please let us know if you are interested in meeting with him during his visit.

Abe Lederman, Deep Web Technologies’ president, will be visiting the D.C. area next week from Monday the 8th through Thursday the 11th.  Along with a few client meetings, he’ll also be demonstrating an Alpha version of our new federated search application. We’re looking forward to getting feedback. (Note:  If you’re in the D.C. area and would like to meet with Abe, please let us know A.S.A.P. – info at deepwebtech.com.)

We’ll also be looking for additional feedback on our new application in the near future through beta testers…interested?

We Have A Winner!

The Federated Search Blog contest, sponsored by Deep Web Technologies, now has a winner!  The contest was this:

Tell us about the most impressive federated search application you’ve ever seen, or about one you’ve dreamed up. How innovative can federated search be? What unique problems can it solve?

You can read more about the contest, the judges and prizes on the Federated Search Blog.

SaaS-y Federated Search

Software as a Service, otherwise known as SaaS, is quickly becoming THE standard for purchasing software solutions. Systems such as Google Apps, Salesforce and Constant Contact have helped exponentially increase the popularity of the SaaS application delivery model.

Deep Web Technologies is undergoing an intense engineering effort to deliver our best-in-class federated search platform under a SaaS-based delivery model, and we are looking to the Deep Web Technologies community to recruit and select beta testers, focus group members and development partners. Many clients are finding that budgets simply can’t support the IT staff it takes to maintain a locally installed application. The SaaS model allows clients to shift their investments to where they count, providing significant ROI through a reduction in overhead and maintenance costs by using a shared application hosting environment. Here at Deep Web Technologies, we want to make your life easier by making access to our product fast and efficient. If federated search is right for you, or you currently have an application installed locally, you may be interested in some of the benefits of our SaaS application:

  • No installation hassles: SaaS federated search is hosted on our servers, so you don’t need to involve IT staff. One concern with a hosted application is downtime should our servers go down. To address this, Deep Web Technologies hosts its applications using a failover architecture and load balancing to ensure your application is always up and running.
  • Automatic upgrades: A centralized architecture means you get the benefit of fast, reliable and free patches and upgrades.
  • Fast Implementation: The SaaS model makes new applications a breeze to build through an intuitive point-and-click interface.
  • Predictable budgets: With a fixed, hosted application fee, your budget for federated search stays consistent each month, in contrast to installed software.
  • Connector monitoring and maintenance: Connector maintenance is a big part of federated search.  When a source is unavailable, our connector team is alerted and investigates the problem immediately.  If the connector requires an update, we take care of it for you to make sure your sources are at their optimal health at no additional charge.
  • Scalability: As you grow, so do we!
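
The connector monitoring described in the list above can be pictured with a small Python sketch. It is a hypothetical illustration, not our production monitoring: the function check_connectors, the probe query, and the three stand-in connectors are all invented for the example. The idea is simply to run a probe search against each source and flag any connector that raises an error or comes back empty.

```python
def check_connectors(connectors, probe_query="test"):
    """Run a probe search against each connector and report failures.

    `connectors` maps a source name to a callable that performs a search
    and returns a list of results, raising an exception on failure.
    """
    failures = {}
    for name, search in connectors.items():
        try:
            results = search(probe_query)
        except Exception as exc:  # any error means the source needs attention
            failures[name] = f"error: {exc}"
            continue
        if not results:
            failures[name] = "no results for probe query"
    return failures


# Stand-in connectors; real ones would call a live source.
def healthy_source(query):
    return [f"result for {query}"]


def changed_source(query):
    return []  # e.g. the source changed its result page layout


def offline_source(query):
    raise ConnectionError("source unreachable")


failures = check_connectors({
    "healthy": healthy_source,
    "changed": changed_source,
    "offline": offline_source,
})
```

In this sketch failures lists the "changed" and "offline" sources with a reason for each, which is the kind of signal that would prompt someone to investigate or update a connector.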

Granted, many of our clients require secure, standalone installations behind firewalls.  For these clients, licensing an application to ensure proprietary information is safeguarded is a must and we have the solution for you.  At Deep Web Technologies, we offer the choice of an installed or hosted application. The SaaS alternative is an excellent choice for those organizations wishing to bypass the overhead associated with on-site installations of federated search.

If you are interested in participating in our Beta Program or becoming a development partner, please sign up here and we’ll send you information.  Our Beta Program is slated to begin the 2nd quarter of this year.

Stanford’s Excellent xSearch

Yesterday I read Grace Baysinger’s post on Stanford University’s new xSearch – Multidisciplinary Search Tool, which uses Deep Web Technologies’ federated search, and I have to say that I’m thrilled! Our product offers features that Stanford wouldn’t have found in one package elsewhere. For example, we coordinated with their IT department to integrate federated search directly with their WebAuth and LDAP servers – a powerful way for users to authenticate automatically to secure web pages and applications.

We also integrated our Search builder tool into xSearch so that an unlimited number of unique search engines can be created for courses or individuals. Grace mentioned this capability in her article, describing how Stanford users can access these specially created search engines:

Create Your Own Search Cluster

It is possible to create your own custom search engine by choosing among the resources available in xSearch. The saved cluster of sources is accessible later in three different ways:

  • Through the xSearch interface after logging in,
  • As a link that you can bookmark or include on a Web page, and
  • As an embedded search box that you can include on a Web page.

Working with Stanford was a unique experience. They were a great team!  I look forward to our continuing relationship as we unveil new functionality and continue to refine our product to Stanford’s specifications.