Connectors – Federated Search’s Strength or Achilles Heel?

This post was written by Brian Despain on November 5, 2010
Posted Under: Federated Search, Product Development

Federated search is only as good as the connectors to the underlying sources. The care and quality with which connectors are written directly affect the quality of the results brought back from a source’s API or native search interface. Connectors are typically cited as a weak point in federated search. However, Deep Web Technologies’ (DWT) approach to connectors and connector development is one of our organizational strengths.

Deep Web Technologies has spent a lot of time building the connector framework we call C2. This framework is based on JRuby, and incorporates the many years of development DWT has spent building connectors. The C2 framework is designed to handle the most complex APIs and authentication mechanisms used to secure content in the deep Web. Additionally, it’s extensible enough to allow the full range of metadata that might be found at specific sources to be extracted. Many traditional federated search providers simplify the metadata returned, or keep it to a subset that all sources share.
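To make the metadata point concrete, here is a minimal sketch in Python of a record that keeps a few shared fields alongside everything else a source returns, rather than discarding whatever is not common to all sources. It is not C2 code (C2 is JRuby-based and its internals are not shown in this post); the names `Record`, `normalize`, `CORE_FIELDS`, and the sample field names are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical set of fields that every source is assumed to share.
CORE_FIELDS = ("title", "authors", "year")


@dataclass
class Record:
    source: str
    title: str = ""
    authors: list = field(default_factory=list)
    year: str = ""
    # Source-specific metadata is preserved here instead of being dropped.
    extra: dict = field(default_factory=dict)


def normalize(source, raw):
    """Split a raw result into shared fields and source-specific extras."""
    core = {k: raw[k] for k in CORE_FIELDS if k in raw}
    extra = {k: v for k, v in raw.items() if k not in CORE_FIELDS}
    return Record(source=source, extra=extra, **core)


record = normalize(
    "example-source",  # hypothetical source name
    {"title": "Deep Web Search", "authors": ["A. Author"], "year": "2010",
     "doi": "10.0000/example", "subject_headings": ["federated search"]},
)
print(record.title)
print(sorted(record.extra))
```

A provider that returns only the shared subset would stop at the core fields; in this sketch the DOI and subject headings survive in `extra` and remain available for display or filtering.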

DWT has a rapid development cycle for connectors that can best be described as a variant on extreme programming. The connector team monitors connectors on an 8-hour cycle for any issues. If the monitoring system or a customer reports an issue, the connector team quickly triages it, develops a fix, and updates the connector across multiple deployments. Since DWT monitors collections across multiple applications (collection monitoring is a standard feature for all customers at DWT), a wider variety of connector issues is uncovered and corrected.
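The monitoring cycle described above can be sketched roughly as follows. This is an illustrative Python outline, not DWT’s monitoring system: each connector is modeled as a plain function that takes a query and returns a list of results, and the names `check_connector`, `run_cycle`, `failing`, and the probe query are assumptions.

```python
from dataclasses import dataclass

# The 8-hour cycle mentioned in the post, expressed in seconds.
CHECK_INTERVAL_SECONDS = 8 * 60 * 60


@dataclass
class CheckResult:
    name: str
    ok: bool
    detail: str


def check_connector(name, search, probe_query="test"):
    """Run one probe search and classify the connector as healthy or failing."""
    try:
        results = search(probe_query)
    except Exception as exc:  # any connector error counts as a failure
        return CheckResult(name, False, f"error: {exc}")
    if not results:
        return CheckResult(name, False, "no results for probe query")
    return CheckResult(name, True, f"{len(results)} results")


def run_cycle(connectors):
    """Check every connector once; `connectors` maps a name to a search function."""
    return [check_connector(name, search) for name, search in connectors.items()]


def failing(results):
    """Names of connectors that need triage."""
    return [r.name for r in results if not r.ok]
```

A scheduler (not shown) would call `run_cycle` every `CHECK_INTERVAL_SECONDS` and hand the output of `failing` to the connector team. Running the same checks across many deployments is what lets a source change that breaks only some connectors show up early.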

What does all this technology mean in practical terms?

Well, it’s not uncommon for major sources to change every 6 months. To illustrate the difference our approach makes, let’s look at how one major source changed and how DWT handled it. Here’s a report of EBSCO being broken in Serials Solutions 360 Search. What’s important to notice here is the date: this librarian posted the issue on September 7th, 2010. Earlier, in August, EBSCO substantively changed their interface, which broke 360 Search’s EBSCO connector. Serials Solutions posted a notice to their customers on September 7th, 2010. That is nearly one month after DWT had fixed the issue with our own EBSCO connector.

Here is the account from DWT’s Alan Dawson, Lead Connector Developer:

First monitor reports showed failing connectors for EBSCO on August 6th. We had a tentative fix for the new form parameters by end of day. I say tentative because as is the case with large publishers, not all our deployed connectors to EBSCO broke at once. For a period of time you’ll see both interfaces in a kind of duality while caches and individual nodes become synchronized across EBSCO’s infrastructure. Most connectors running the old version of EBSCO connector code were confirmed ‘completely broken’ by August 9th and we had new code deployed everywhere by August 11th.

Monitor reports again indicated failing connectors as early as August 13th, and propagating failures were observed as late as the 16th. Here the timeline starts getting cloudy: in retrospect this was clearly another adjustment by EBSCO, but at the time some of these failures got tagged as bad fixes to the original problem. The second round of fixes was fully deployed by August 19th.

The advantage of a rapid, iterative approach to connector development and monitoring is clear. DWT discovered the changes to the EBSCO interface sooner, and updated customer applications before the competition noticed the problem; EBSCO remained broken in Serials Solutions 360 Search a full three weeks after DWT had repaired the issue. That’s why our connectors and our connector development approach are our true strength: they allow us to address problems and respond to customers’ needs quickly.
