Federated Search – The Killer iWatch App?

The Beagle Research Group Blog posted “Apple iWatch: What’s the Killer App” on March 10, including this line: “An alert might also come from a pre-formed search that could include a constant federated search and data analysis to inform the wearer of a change in the environment that the wearer and only a few others might care about, such as a buy or sell signal for a complex derivative.” While this enticing suggestion is just a snippet in a full post, we thought we’d consider the possibilities this one-liner presents. Could federated search become the next killer app?

Well, no, not really. Federated search in and of itself isn’t an application; it’s more of a supporting technology. It supports real-time searching, rather than indexing, and provides current information on fluctuating data such as weather, stocks, flights, etc. And that is exactly why it’s killer: federated search finds new information of any kind, anywhere, singles out the most precise data to display, and notifies the user to take a look.

In other words, it’s a great technology for mobile apps to use. Federated search connects directly to the source of the information, whether medical, energy, academic journals, social media, weather, etc., and finds information as soon as it’s available. Rather than storing information away, federated search links a person to the data circulating that minute, passing on the newest details as soon as they are available, which makes a huge difference with need-to-know information. In addition, alerts can be set up to notify the person, researcher, or iWatch wearer of critical data, such as a buy or sell signal, as The Beagle Research Group suggests.
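The alert idea can be illustrated with a minimal Python sketch: a saved search is re-run on a schedule, and only results that were not returned on an earlier run trigger a notification. The names here (`Result`, `new_results`, `check_alert`, and the `run_search` and `notify` callables) are hypothetical placeholders, not part of any DWT or Apple API; a real deployment would plug in an actual federated query and a push-notification service.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set


@dataclass(frozen=True)
class Result:
    """One hit from a federated query (hypothetical shape)."""
    source: str
    title: str
    url: str


def new_results(latest: Iterable[Result], seen_urls: Set[str]) -> List[Result]:
    """Return results whose URL has not been seen before, recording them as seen."""
    fresh = []
    for result in latest:
        if result.url not in seen_urls:
            seen_urls.add(result.url)  # remember it so the next run won't alert again
            fresh.append(result)
    return fresh


def check_alert(
    run_search: Callable[[], Iterable[Result]],
    seen_urls: Set[str],
    notify: Callable[[Result], None],
) -> int:
    """Re-run a saved search once and notify about each new result; return how many were new."""
    fresh = new_results(run_search(), seen_urls)
    for result in fresh:
        notify(result)
    return len(fresh)
```

Calling `check_alert` from a timer keeps the wearer’s screen quiet until something genuinely new appears. Keying on the URL is a simplifying assumption, since the same item can sometimes be reached through different links.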

Of course, there’s also the issue of screen real estate to keep in mind: the iWatch wraps less than 2 inches of display on a wrist. That’s not much room for a hefty list of information, much less junky results. What’s important is that the single, most accurate piece of information, hand-picked (so to speak) just for you, pops up on the screen. Again, federated search can make that happen quite easily...it has connections.

There is a world of possibility when it comes to using federated search technology to build applications, whether mobile or for desktop uses. Our on-demand lifestyles require federating, analyzing, and applying all sorts of data, from health, to environment, to social networking. Federated search is not just for librarians finding subscription content anymore.  The next-generation federated search is for everyone in need of information on-the-fly. Don’t worry about missing information (you won’t).  Don’t worry if information is current (it is).  In fact, don’t worry at all. Relax, sit back and get alert notifications to buy that stock, watch the weather driving home, or check out an obscure tweet mentioning one of your hobbies. Your world reports to you what you need to know.  And that, really, is simply killer.

Deep Web:  Legal Due Diligence

Editor’s Note: This is a guest article by Lisa Brownlee. The 2015 edition of her book, “Intellectual Property Due Diligence in Corporate Transactions: Investment, Risk Assessment and Management”, originally published in 2000, will dive into discussions about using the Deep Web and the Dark Web for Intellectual Property research, emphasizing its importance and usefulness when performing legal due-diligence.

Lisa M. Brownlee is a private consultant and has become an authority on the Deep Web and the Dark Web, particularly as they apply to legal due-diligence. She writes and blogs for Thomson Reuters.  Lisa is an internationally-recognized pioneer on the intersection between digital technologies and law.


In this blog post I will delve in some detail into the Deep Web. This expedition will focus exclusively on that part of the Deep Web that excludes the Dark Web.  I cover both Deep Web and Dark Web legal due diligence in more detail in my blog and book, Intellectual Property Due Diligence in Corporate Transactions: Investment, Risk Assessment and Management. In particular, in this article I will discuss the Deep Web as a resource of information for legal due diligence.

When Deep Web Technologies invited me to write this post, I initially intended to delve primarily into the ongoing confusion regarding Deep Web and Dark Web terminology. The misuse of the terms Deep Web and Dark Web, among other related terms, is problematic from a legal perspective if confusion about those terms spills over into licenses and other contracts and into laws and legal decisions. The terms are so hopelessly intermingled that I decided it is not useful to even attempt untangling them here. In this post, as mentioned, I will specifically cover the Deep Web excluding the Dark Web. The definitions I use are provided in a blog post I wrote on the topic earlier this year, entitled The Deep Web and the Dark Web – Why Lawyers Need to Be Informed.

Deep Web: a treasure trove of data and other information

The Deep Web is populated with vast amounts of data and other information that are essential to investigate during a legal due diligence in order to find information about a company that is a target for possible licensing, merger or acquisition. A Deep Web (as well as Dark Web) due diligence should be conducted in order to ensure that information relevant to the subject transaction and target company is not missed or misrepresented. Lawyers and financiers conducting the due diligence have essentially two options: conduct the due diligence themselves by visiting each potentially relevant database and conducting each search individually (potentially ad infinitum), or hire a specialized company such as Deep Web Technologies to design and set up such a search. Hiring an outside firm to conduct such a search saves time and money.

Deep Web data mining is a science that cannot be mastered by lawyers or financiers in a single or a handful of transactions. Using a specialized firm such as DWT has the added benefit of being able to replicate the search on-demand and/or have ongoing updated searches performed. Additionally, DWT can bring multilingual search capacities to investigations—a feature that very few, if any, other data mining companies provide and that would most likely be deficient or entirely missing in a search conducted entirely in-house.

What information is sought in a legal due diligence?

A legal due diligence will investigate a wide and deep variety of topics, from real estate to human resources, to basic corporate finance information, industry and company pricing policies, and environmental compliance. Due diligence nearly always also investigates intellectual property rights of the target company, in a level of detail that is tailored to specific transactions, based on the nature of the company’s goods and/or services. DWT’s Next Generation Federated Search is particularly well-suited for conducting intellectual property investigations.

In sum, the goal of a legal due diligence is to identify and confirm basic information about the target company and determine whether there are any undisclosed infirmities with the target company’s assets and information as presented. In view of these goals, the investing party will require the target company to produce a checklist full of items about the various aspects of the business (and more) discussed above. An abbreviated correlation between the information typically requested in a due diligence and the information that is available in the Deep Web is provided in the chart attached below. In the absence of assistance by Deep Web Technologies with the due diligence, either someone within the investor company or its outside counsel will need to search in each of the databases listed, in addition to others, in order to confirm that the information provided by the target company is correct and complete. While representations and warranties are typically given by the target company as to the accuracy and completeness of the information provided, it is also typical for the investing company to confirm all or part of that information, depending on the sensitivities of the transaction and the areas in which value, and possible risk, might be uncovered.

Deep Web Legal Due-Diligence Resource List (PDF)

Check Out Our Article in Multilingual.com Magazine

The April/May 2015 issue of Multilingual.com Magazine features a new article, “Advancing science by overcoming language barriers,” co-authored by DWT’s own Abe Lederman and Darcy Katzman. The article discusses the Deep Web vs. the Dark Web, and the technology needed to find results in scientific and technical, multilingual Deep Web databases. It also describes the efforts of the WorldWideScience Alliance to address the global need for multilingual search through the creation of the WorldWideScience.org federated search application.

Think of the Deep Web as more academic: it is used by knowledge workers, librarians and corporate researchers to access the latest scientific and technical reports, gather competitive intelligence or gain insights from the latest government data published. Most of this information is hidden simply because it has not been “surfaced” to the general public by Google or Bing spiders, or is not available globally because of language barriers. If a publication reaches Google Scholar, chances are it now floats in the broad net of the shallow web, no longer submerged in the Deep Web. A large number of global science publications sit in the Deep Web, accessible only through passwords and subscriptions, or only to native-language speakers. Their confinement to the Deep Web limits the spread of science and discovery.

The current issue of Multilingual.com Magazine is free to view at the time of this post.

Signs of Spring – Pollen Allergies!

March 20th marked the first day of spring. Here in northern New Mexico we have seen signs of spring (and allergies) for over a month. The crocus stretched out of the soil in February marking both a celebratory moment for my family, and one of concern.  The weather is already warm and beautiful causing the apricot, plum, and juniper trees to bloom like mad. But because they’ve bloomed so early, will a late freeze wipe out our delicate fruit?  And will we all sniffle and sneeze longer from the thick pollen collecting on our cars and sidewalks?

My questions took me to three different federated search engines to see if I could see what “spring” topics were circulating.

On Biznar, a social media and business search engine, I couldn’t help but search out how others were handling their spring allergies. Some dive into the Claritin box, while others go for a Kettlebell workout.  My family claims to have zero allergies, although we slyly keep a tissue box handy once the juniper pollen begins to circulate.  However, it looks like some research indicates that dairy may offer relief.  I shall eat more yogurt from here forth.

Speaking of pollen, Environar, a federated search portal dedicated to life, medical and earth sciences, had excellent research on pollen through the ages. Pollen has been used to document climate cycles and to indicate many other factors, such as temperature and precipitation, over the past 140,000 years or so. Pollen, atchoo!, is scientifically important.

I particularly enjoyed browsing the government portal, Science.gov, on the effects of climate change on allergies.  I thought this interesting from the Annals of the American Thoracic Society found in PubMed regarding a survey on climate change and health: “A majority of respondents indicated they were already observing health impacts of climate change among their patients, most commonly as increases in chronic disease severity from air pollution (77%), allergic symptoms from exposure to plants or mold (58%), and severe weather injuries (57%).”  I shall buy more tissue.

While my questions may not have precise answers, I can at least plan ahead at the grocery store when I see high pollen counts – yogurt and tissues.  And perhaps I’ll have a new appreciation for the contributions pollen has made to our scientific community.

Explore your Pollen Allergy Forecast at Pollen.com:  http://www.pollen.com/allergy-forecast.asp.  Happy Spring and Happy Searching!


DWT Visits the Library of Congress

Our CEO and CTO, Abe Lederman, journeyed to the Washington D.C. area this week to meet with FEDLINK librarians at the Library of Congress. The Federal Library and Information Network (FEDLINK) serves federal libraries and information centers as their purchasing, training and resource-sharing consortium.

Abe’s main purpose in visiting The District, aside from eating delicious food and admiring the start of the Cherry Blossom bloom (although the peak bloom is next week), was to expand FEDLINK librarians’ knowledge about federated search. His presentation explored public federated search science portals as well as multilingual translation tools. A large part of his talk was on Google, and why it isn’t suitable for serious government researchers.

View Abe’s presentation here.

Welcome to our Board of Advisors

We’d like to officially welcome the new Deep Web Technologies’ Board of Advisors!  The Board of Advisors will provide insights and strategic input to our executive staff and DWT’s mission.

Mary Ellen Bates
Bates Information Services, Inc.

Mary Ellen Bates is the owner of Bates Information Services Inc., which she founded in 1991. She provides business research and analysis to strategic decision-makers, and consulting services to the information industry. She is a frequent keynote speaker and the author of seven books and innumerable articles on the information industry. Prior to starting her business, Bates managed specialized libraries within businesses and the federal government for over a decade. She received her Master’s in Library and Information Science from the University of California Berkeley in 1982 and her BA in Philosophy from the University of California Santa Barbara in 1976. She, her spouse and her dogs live near Boulder, Colorado.

Richard Boulderstone
British Library

Richard Boulderstone is the British Library’s Chief Digital Officer with responsibility to drive the long-term digital transformation of the Library. At the British Library, Richard has led the engagement with the scientific community; its digital library, preservation and web archiving programmes; and the Library’s IT function. Formerly a Chief Technology Officer and Product Development Director at a number of international information providers, he has led the creation of many information-based products both in the UK and USA. Richard also serves as the Chair on the WorldWideScience Alliance Board.

Doug Dennis
CEO of GridSentry

Doug is currently the CEO of GridSentry. Recently, he was CFO and Vice President of Strategic Partnering at inno360. In that role, Mr. Dennis was responsible for partner strategy development and implementation, which included building relationships with the portfolio of companies across the knowledge spectrum required to support the inno360 Open Innovation platform, including networks of content, community, application, and service providers. Mr. Dennis has an extensive background in sales, marketing, and manufacturing. His vision for electronic commerce and web-deployed applications dates back to early uses of the Internet. He has published articles on electronic commerce and manufacturing information deployment over the web. He holds patents for 3D-based view and markup technology that are currently being deployed in web browsers.

Andrew Mendelsohn
Oracle Database Server Technologies

Andrew Mendelsohn is Executive Vice President for Database Server Technologies at Oracle. He is responsible for the development and product management of Oracle’s family of database products, including software products such as Oracle Database, Oracle TimesTen In-Memory Database, Oracle Berkeley DB, and Oracle NoSQL Database and engineered systems such as Oracle Exadata Database Machine, Oracle Database Appliance, and Oracle Big Data Appliance.

Mr. Mendelsohn has been at Oracle since May 1984. He began his career at Oracle as a developer on Release 5.1 of Oracle Database. Prior to joining Oracle, he worked at HP and ESVEL.

Mr. Mendelsohn holds a BSE in electrical engineering and computer science from Princeton University and performed graduate work in computer science at M.I.T.

Gary Voight
Summa Technologies, Appfluent Technology

Gary Voight has thirty years of experience leading large corporations and industry-changing startups. Currently, he is a board member at Summa Technologies and Appfluent Technology. Previously, he was President and CEO of CorasWorks Corporation from January 2008 through July 2014. CorasWorks provided applications for the Microsoft SharePoint platform. Prior to CorasWorks, Gary was President and CEO of Archivas, Inc., a venture-capital-backed digital preservation software firm that was acquired by Hitachi in February 2007. Under his direction, Archivas developed digital archiving software and sold it to financial services, medical, government, services and educational enterprises.

Prior to joining Archivas Gary was President and CEO of Software AG (Americas), where he grew the business to well over $200 million. He was Senior Vice President of Sales, Marketing and Services of SAGA Software at the time of its acquisition by Software AG.  Before SAGA Software, Gary was Vice President of Worldwide Transactions Systems at IBM/Transarc – where he was responsible for most of IBM’s application server, middleware, security and file systems products.  Prior to his tenure at IBM and Transarc, Gary was Vice President of Sales at ISIS Distributed Systems, a subsidiary of Stratus Computer.  In addition, Gary spent over 13 years at Stratus Computer in a wide variety of field and headquarters positions.

Thank you all for contributing to DWT’s success!

Is Vendor-Neutral Searching Important?

As a matter of fact, YES! In doing research, being vendor-neutral implies, in part, that the order of the results set is impartial, and not biased toward any one information provider. A results set partial to an information vendor, source or database could cripple a serious research endeavor, particularly if a vital component of research is missed, hidden, or dropped due to a biased display of results, with a particular vendor’s results bubbling to the top. In addition, being vendor-neutral implies a design approach compatible with different technologies, and a company philosophy that is willing to integrate with a broad spectrum of sources, technologies and products.

Back in 2010, our CEO and CTO Abe Lederman questioned the bias of information vendors in light of Google’s possible “preferential placement” in search results — “If Google Might Be Doing It…”  He asks, “If Google is being accused of such bias, might not EBSCO or ProQuest also have a bias?”

Deep Web Technologies recently received an email inquiry from someone looking, very specifically, for a vendor-neutral search service.

“We’re looking for a no vendor biased, easily applied and customised, reliable federated search. Can you help?”

The short answer is ABSOLUTELY!  Vendor-neutrality is a Deep Web Technologies specialty and a core value.  In addition to the Deep Web Technologies simple setup and deployment process and our next-generation single-search technology, we believe strongly that no vendor, information provider, database or source should be weighted differently, return results more frequently, or appear highlighted in any way UNLESS requested by you, our customer.  The choice to assign different databases a higher or lower ranking should be dictated by you.  It is your search, after all.
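One way to make “no source weighted differently unless the customer asks” concrete is to rescale each source’s relevance scores to a common 0-to-1 range before merging, so a vendor that happens to report larger raw numbers cannot crowd the top of the list, and to apply per-source weights only when the customer supplies them. The Python below is a hedged sketch of that idea under those assumptions (min-max normalization, a default weight of 1.0, ties broken by title); it is not DWT’s ranking algorithm, and `Hit`, `normalize` and `merge_neutral` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Hit:
    source: str
    title: str
    score: float  # raw relevance score, on whatever scale the source happens to use


def normalize(hits: List[Hit]) -> List[Tuple[float, Hit]]:
    """Min-max rescale one source's scores to [0, 1] so sources are comparable."""
    if not hits:
        return []
    low = min(h.score for h in hits)
    high = max(h.score for h in hits)
    span = high - low
    # If every hit from this source has the same score, treat them all as top-ranked.
    return [(1.0 if span == 0 else (h.score - low) / span, h) for h in hits]


def merge_neutral(
    hits_by_source: Dict[str, List[Hit]],
    weights: Optional[Dict[str, float]] = None,
) -> List[Hit]:
    """Merge all sources; every source has weight 1.0 unless the customer overrides it."""
    weights = weights or {}
    scored: List[Tuple[float, Hit]] = []
    for source, hits in hits_by_source.items():
        weight = weights.get(source, 1.0)
        scored.extend((weight * norm, hit) for norm, hit in normalize(hits))
    # Highest weighted score first; ties are broken by title, not by which source answered first.
    scored.sort(key=lambda pair: (-pair[0], pair[1].title))
    return [hit for _, hit in scored]
```

With no `weights` argument, a source scoring on a 0-100 scale and one scoring on a 0-1 scale compete evenly; passing, say, `{"B": 2.0}` is the customer’s explicit choice to favor source `B`.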

Training for Federal Librarians – March 31, 2015

Federal librarians are part of an ever-growing pool of knowledge workers who increasingly need timely knowledge, greater research efficiency and information accuracy. Aggregated databases, journals and e-book sources are now an integral part of a researcher’s repertoire. One way to speed the research process while still finding accurate information is to use a federated search application to search multiple databases at one time.

Abe Lederman, CEO and CTO of Deep Web Technologies, will visit FEDLINK librarians at the Library of Congress on March 31st for training on the strategic researching of government and public federated search engines (see FEDLINK post). There will be a morning and afternoon session in the Adams Building at the Library of Congress.  Attendees will:

  • Discuss what federated search is
  • Explore science research portals together
  • Access global resources using multilingual translation tools

Contact Jim Oliver (joli@loc.gov) to reserve your space.

3 Reasons an Indexed Discovery Service Doesn’t Work For Serious Researchers

Last week we alluded to three obstacles serious researchers face when using a Discovery Service index to retrieve information from their premium sources. Deep Web Technologies uses a real-time approach, searching information sources on-the-fly rather than compiling information into an index as Discovery Services do, for reasons we’ll discuss in a moment. But first, we should get a few definitions out of the way before we jump into why an index doesn’t work for all researchers.

What is an index as used by Discovery Services?  An index holds all of the metadata, and sometimes full text, for one or more sources of information, usually external public or premium sources.  For example, Google has an index; its spiders compile information from millions of public websites across the Internet, funneling the information into a single, unified database for users to search. On a much smaller scale, Discovery Services compile information from an organization’s internal catalog, as well as a subset of the customer’s premium and public information sources, if these sources permit their information to be indexed. Discovery Services establish relationships with the information sources directly (if possible), procure the metadata (if they can) and add it to their index.

What is real-time (on-the-fly) searching?  Real-time, or “on-the-fly” searching, sends queries directly to the source – the catalog, premium databases, public sources – retrieves the metadata of top ranked results from each source relative to the user’s search,  then normalizes and aggregates the information into a single display of results.  The results are listed with direct links to the original source of information so users can explore the record and the full text.  There is no index built and the information is not stored.
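The real-time steps described above (send the query to each source, take its top-ranked results, normalize them into one record shape, and aggregate them with links back to the source) can be sketched in Python. This is an illustrative sketch, not DWT’s implementation: each source is modeled as a hypothetical pair of functions, one that fetches raw results and one that maps that source’s field names into a common `Record`. `Record`, `Connector` and `search_realtime` are invented names. Nothing is written to an index; the results exist only for the duration of the call.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Record:
    source: str
    title: str
    url: str  # direct link to the record at the original source, for click-through


# A hypothetical connector: (fetch(query, limit) -> raw dicts, to_record(raw dict) -> Record).
Connector = Tuple[Callable[[str, int], List[dict]], Callable[[dict], Record]]


def search_realtime(query: str, connectors: Dict[str, Connector], top_n: int = 10) -> List[Record]:
    """Query every source in parallel, keep each one's top_n results, and normalize them."""
    results: List[Record] = []
    with ThreadPoolExecutor(max_workers=max(1, len(connectors))) as pool:
        # Send the query to all sources at once rather than one after another.
        futures = {name: pool.submit(fetch, query, top_n) for name, (fetch, _) in connectors.items()}
        for name, future in futures.items():
            _, to_record = connectors[name]
            try:
                raw = future.result()
            except Exception:
                continue  # a failing source is skipped instead of failing the whole search
            # Map source-specific fields into the shared Record shape.
            results.extend(to_record(item) for item in raw[:top_n])
    return results
```

A production connector would also need per-source timeouts, authentication for premium databases, and a ranking and deduplication step; those are deliberately left out of this sketch.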

To be perfectly clear, both of these approaches have their pros and their cons.  And, we certainly love the Google and Bing indexes for the everyday quick search for the nearest coffee house.  But, as we’ve heard from our customers time and again, “We’re serious about our research, and a single index just doesn’t work for us.”  (Note that in this post we’re singling out Discovery Service indexes, but will need to address how this applies to Enterprise Search indexes in a separate post.)

Abe Lederman, our CEO and CTO, addressed this issue before in other blog posts, but this is a good opportunity to reiterate that not all researchers benefit from a single master index such as those used by Discovery Services.  Let’s look at some of the obstacles that serious researchers face when searching an index through a Discovery Service.

  1. Limited access to content
    Serious researchers like to search for critical information on sources they trust.  In fact, there are some researchers who choose to search only and ever their chosen few information sources, excluding all other databases to which their organization may subscribe.  So, a Discovery Service had better, by gum, include all of the “trusted sources” that their serious researchers use or the Discovery Service may not get much use by that researcher.   And herein lies the problem: There are information vendors, trusted sources with critical information, that simply don’t want to share their information with a Discovery Service.  That information is theirs. This is particularly true of sources within industries such as legal and medical. When an information vendor doesn’t permit their information to be indexed, the Discovery Service index won’t contain information from that source.  Bottom line: Serious researchers may not use a Discovery Service because it’s incomplete.  It just doesn’t contain what they need, or want to search.  
  2. Frequency of content changes and updates
    Information vendors, or content sources, vary in how frequently they update their own database of information.  Some sources update daily or even hourly while other sources may update their database with new information every few weeks.  Depending on the need, a serious researcher may require up-to-the-minute, just-published information for a critical, time-sensitive topic. The drive for data may not allow for old information, no, not even one day old.  Could you imagine Kayak, a travel metasearch site, showing day-old data?  It just wouldn’t work.   Current information can make or break a researcher, and a Discovery Service index that is updated weekly from some, or even half, of the trusted data sources it indexes may not give serious researchers what they need to stay current.  Bottom line: Serious researchers require current information, not stale data.
  3. Muddiness about source searching and clickthrough
    In some Discovery Services, searching one specific source (or even just a handful of sources) is difficult, even impossible. An index contains vast amounts of information, but doesn’t always allow researchers to limit or search a specific source easily, pinpoint the source they need, or click through to the metadata or full text of a document directly at the source. Discovery Services often create a uniform-looking record for results to create a consistent look and feel, but provide no way to click directly to the record available at the source. Knowing where information is coming from and clicking through to the result directly at the source can be an important step in the research process. Bottom line: A serious researcher can’t afford to spend extra time narrowing down their information sources, or performing extra clicks to go directly to the source itself. They need to be able to search one or more of their trusted sources rather than the entire index, and click directly to the source for further review and additional information.
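The first two obstacles can be made concrete with a small check: given the documents a live search of each source returns right now, and the date the index last harvested each source, which documents could the index not yet return? The Python below is a hypothetical sketch (`Doc`, `invisible_to_index` and the example dates are invented for illustration); it demonstrates the argument rather than measuring any real Discovery Service.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Doc:
    source: str
    title: str
    published: datetime


def invisible_to_index(live_docs: Iterable[Doc], last_harvest: Dict[str, datetime]) -> List[Doc]:
    """Return documents a live search sees but an index cannot.

    A document is missing from the index if its source was never harvested
    (obstacle 1: the vendor does not permit indexing) or if it was published
    after the index last harvested that source (obstacle 2: stale data).
    """
    missing = []
    for doc in live_docs:
        harvested_at = last_harvest.get(doc.source)
        if harvested_at is None or doc.published > harvested_at:
            missing.append(doc)
    return missing
```

For example, if source `A` was last harvested on March 1 and source `B` was never harvested, a document from `A` published on March 2 and any document from `B` would both come back from a real-time search but not from the index.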

At Deep Web Technologies we perform a real-time search, offering an alternative to the Discovery Service indexed approach. Our federated search retrieves information from sources that can’t be accessed through an index, sources that perform frequent updates to metadata, and sources that users want to search individually due to their more extensive, trusted database of information. If your organization already uses a Discovery Service but you need more precise information for your serious researchers, federated search can complement your Discovery Service by adding real-time results from content sources you don’t currently include in your Discovery Service.

We’d love to hear from researchers about this – What has your experience been with Discovery Services?

Do You Mean Federated Search?

Occasionally, DWT employees will test the water when describing what we do by dropping the term “federated search” and trying a more generic description.

“We search all of the databases from a single search box in real-time.”

“We perform a single search of subscribed, public and deep web databases…”

“We capture results from your premium sources, the stuff Google can’t reach, and return them to you ranked and deduplicated…”

If a person is plugged in to the search world, we usually get this response: “Oh!  Do you mean you do Federated Search?”  Bingo.

Over the years, the concept of Federated Search has gone by many different names. We’ve heard our technology called:

  • Distributed Search
  • Broadcast Search
  • Unified Search
  • Data Fusion
  • Meta-Search
  • Parallel Search
  • Cross-Database Search
  • Single Search
  • One-Search
  • Integrated Search
  • Universal Search

For the most part, all of these mean about the same thing: An application or a service that allows users to submit their query to search multiple, distributed information sources, and retrieve aggregated, ranked and deduplicated results.
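The “aggregated, ranked and deduplicated” part of that definition can be sketched briefly. This Python example is an assumption-laden illustration, not any product’s algorithm: it treats two results as duplicates when their titles match after case-folding and stripping punctuation (real systems might compare DOIs or URLs instead), keeps the higher-scoring copy, and assumes scores have already been made comparable across sources. `Result`, `dedup_key` and `aggregate` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Result:
    source: str
    title: str
    score: float  # assumed already comparable across sources


def dedup_key(title: str) -> str:
    """Case-fold and collapse punctuation/whitespace so trivially different titles match."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def aggregate(results_by_source: Dict[str, List[Result]]) -> List[Result]:
    """Merge all sources' results, drop duplicates, and rank by score (highest first)."""
    best: Dict[str, Result] = {}
    for results in results_by_source.values():
        for result in results:
            key = dedup_key(result.title)
            # Keep only the highest-scoring copy of each duplicate.
            if key not in best or result.score > best[key].score:
                best[key] = result
    return sorted(best.values(), key=lambda r: r.score, reverse=True)
```

Title matching is the weakest part of this sketch: two different papers can share a title, and one paper can appear under slightly different titles, which is why stronger identifiers are preferable when sources provide them.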

But the question remains: Is “federated search” a master index, or a real-time (on-the-fly) search? And this is a very good question, given our familiarity with Google and their enormous public index.  Sol Lederman raised this question back in 2007 on the Federated Search Blog, What’s in a Name?

“The distinction, of course, is crucial. If you’re meeting with a potential customer who believes you’re discussing an approach where all content is harvested, indexed and accessed from one source but you think you’re discussing live search of heterogeneous sources then you’re talking apples and oranges.”

Deep Web Technologies’ real-time approach gives us an advantage over building a master index, which we’ll discuss in our next blog post. In the meantime, can you think of any other names for what we do? We’d love to hear from you!