Last week we alluded to three obstacles serious researchers face when using a Discovery Service index to retrieve information from their premium sources. Deep Web Technologies uses a real-time approach, searching information sources on-the-fly rather than compiling information into an index as Discovery Services do, for reasons we’ll discuss in a moment. But we should get a few definitions out of the way before we jump into why an index doesn’t work for all researchers.
What is an index as used by Discovery Services? An index holds all of the metadata, and sometimes full text, for one or more sources of information, usually external public or premium sources. For example, Google has an index; its spiders compile information from millions of public websites across the Internet, funneling the information into a single, unified database for users to search. On a much smaller scale, Discovery Services compile information from an organization’s internal catalog, as well as a subset of the customer’s premium and public information sources, if these sources permit their information to be indexed. Discovery Services establish relationships with the information sources directly (if possible), procure the metadata (if they can) and add it to their index.
What is real-time (on-the-fly) searching? Real-time, or “on-the-fly” searching, sends queries directly to the source – the catalog, premium databases, public sources – retrieves the metadata of top ranked results from each source relative to the user’s search, then normalizes and aggregates the information into a single display of results. The results are listed with direct links to the original source of information so users can explore the record and the full text. There is no index built and the information is not stored.
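To make the real-time flow concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not how DWT's connectors are actually built: the two in-memory "sources", their field names, and the functions `search_source` and `federated_search` are all hypothetical. It shows the three steps described above: send the query to each source, normalize the results, and aggregate them with direct links back to the source.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "sources". A real connector would call each
# source's own search interface over the network instead.
SOURCES = {
    "catalog": [
        {"title": "Fusion Energy Basics", "url": "https://catalog.example/1"},
        {"title": "Library Hours", "url": "https://catalog.example/2"},
    ],
    "premium_db": [
        {"name": "Thermonuclear Fusion Review", "link": "https://premium.example/a"},
    ],
}


def search_source(source_name, query, limit=10):
    """Query one source and return its top results in a common shape."""
    hits = []
    for record in SOURCES[source_name]:
        # Sources name their fields differently, so read either variant.
        title = record.get("title") or record.get("name")
        if query.lower() in title.lower():
            # Normalize: same fields for every source, plus a direct link.
            hits.append({
                "source": source_name,
                "title": title,
                "url": record.get("url") or record.get("link"),
            })
    return hits[:limit]


def federated_search(query, source_names):
    """Send the query to every source at once and aggregate the results."""
    with ThreadPoolExecutor(max_workers=len(source_names)) as pool:
        per_source = pool.map(lambda name: search_source(name, query), source_names)
        # Nothing is stored: results exist only for this one search.
        return [hit for hits in per_source for hit in hits]


results = federated_search("fusion", ["catalog", "premium_db"])
for hit in results:
    print(hit["source"], "|", hit["title"], "|", hit["url"])
```

Because nothing is written to an index, each call reflects whatever the sources hold at the moment of the search.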
To be perfectly clear, both of these approaches have their pros and their cons. And, we certainly love the Google and Bing indexes for the everyday quick search for the nearest coffee house. But, as we’ve heard from our customers time and again, “We’re serious about our research, and a single index just doesn’t work for us.” (Note that in this post we’re singling out Discovery Service indexes, but will need to address how this applies to Enterprise Search indexes in a separate post.)
Abe Lederman, our CEO and CTO, addressed this issue before in other blog posts, but this is a good opportunity to reiterate that not all researchers benefit from a single master index such as those used by Discovery Services. Let’s look at some of the obstacles that serious researchers face when searching an index through a Discovery Service.
- Limited access to content
Serious researchers like to search for critical information on sources they trust. In fact, there are some researchers who choose to search only and ever their chosen few information sources, excluding all other databases to which their organization may subscribe. So, a Discovery Service had better, by gum, include all of the “trusted sources” that their serious researchers use or the Discovery Service may not get much use by that researcher. And herein lies the problem: There are information vendors, trusted sources with critical information, that simply don’t want to share their information with a Discovery Service. That information is theirs. This is particularly true of sources within industries such as legal and medical. When an information vendor doesn’t permit their information to be indexed, the Discovery Service index won’t contain information from that source. Bottom line: Serious researchers may not use a Discovery Service because it’s incomplete. It just doesn’t contain what they need, or want to search.
- Frequency of content changes and updates
Information vendors, or content sources, vary in how frequently they update their own database of information. Some sources update daily or even hourly while other sources may update their database with new information every few weeks. Depending on the need, a serious researcher may require up-to-the-minute, just-published information for a critical, time-sensitive topic. The drive for data may not allow for old information, no, not even one day old. Could you imagine Kayak, a travel metasearch site, showing day-old data? It just wouldn’t work. Current information can make or break a researcher, and a Discovery Service index that is updated weekly from some, or even half, of the trusted data sources it indexes may not give serious researchers what they need to stay current. Bottom line: Serious researchers require current information, not stale data.
- Muddiness about source searching and clickthrough
In some Discovery Services, searching one specific source (or even just a handful of sources) is difficult, even impossible. An index contains vast amounts of information, but doesn’t always allow researchers to limit or search a specific source easily, pinpoint the source they need, or click through to the metadata or full text of a document directly at the source. Discovery Services often create a uniform-looking record for results to create a consistent look and feel, but contain no way to click directly to the record available at the source. Knowing where information is coming from and clicking through to the result directly at the source can be an important step in the research process. Bottom line: A serious researcher can’t afford to spend extra time narrowing down their information sources, or performing extra clicks to go directly to the source itself. They need to be able to search one or more of their trusted sources rather than the entire index, and click directly to the source for further review and additional information.
At Deep Web Technologies we perform a real-time search, offering an alternative to the Discovery Service indexed approach. Our federated search retrieves information from sources that can’t be accessed through an index, sources that perform frequent updates to metadata, and sources that users want to search individually due to their more extensive, trusted database of information. If your organization already uses a Discovery Service but you need more precise information for your serious researchers, federated search can complement your Discovery Service by adding real-time results from content sources you don’t currently include in your Discovery Service.
We’d love to hear from researchers about this – What has your experience been with Discovery Services?
Occasionally, DWT employees will test the water when describing what we do by dropping the term “federated search” and trying a more generic description.
“We search all of the databases from a single search box in real-time.”
“We perform a single search of subscribed, public and deep web databases…”
“We capture results from your premium sources, the stuff Google can’t reach, and return them to you ranked and deduplicated…”
If a person is plugged in to the search world, we usually get this response: “Oh! Do you mean you do Federated Search?” Bingo.
Over the years, the concept of Federated Search has gone by many different names. We’ve heard our technology called:
- Distributed Search
- Broadcast Search
- Unified Search
- Data Fusion
- Parallel Search
- Cross-Database Search
- Single Search
- Integrated Search
- Universal Search
For the most part, all of these mean about the same thing: An application or a service that allows users to submit their query to search multiple, distributed information sources, and retrieve aggregated, ranked and deduplicated results.
But the question remains: Is “federated search” a master index, or a real-time (on-the-fly) search? And this is a very good question, given our familiarity with Google and their enormous public index. Sol Lederman raised this question back in 2007 on the Federated Search Blog, in “What’s in a Name?”:
“The distinction, of course, is crucial. If you’re meeting with a potential customer who believes you’re discussing an approach where all content is harvested, indexed and accessed from one source but you think you’re discussing live search of heterogeneous sources then you’re talking apples and oranges.”
Deep Web Technologies’ real-time approach gives us an advantage over building a master index, which we’ll discuss in our next blog post. In the meantime, can you think of any other names for what we do? We’d love to hear from you!
I have at least 50 alerts set up through several different services. I know when my husband gives a talk (that he forgot to mention), when my son’s school is referenced in the news, if there’s a storm coming, and when topics of interest are discussed on the web. I stay connected to the information that I need because it finds my email inbox every day, without me even lifting a finger. Bottom line: My daily alerts are important to me.
About two-thirds of my alerts are set up through Deep Web Technologies’ public portals. I have alerts through Biznar, Mednar, Environar, Techscout and other portals. Each application searches a different set of sources (e.g. Business and Marketing, Medical and Health, Environment, Technical) and each returns specialized results on my topics of interest. I have alerts that are set for only a single source which may not be indexed by Google and doesn’t have an alerts system through its native interface. In this case, I’m getting unique information that I would otherwise have to find by going directly to the source and running a search myself, which is not something I’d carve time out of my day to do.
DWT alerts are simple to set up and forget about. Here’s my preferred routine:
- Log in to a portal, and go to the Alerts homepage.
- Choose my topic, keyword, author or title that I want to automatically search.
- Choose my email preference, which is almost always “If New Results Only”. (I don’t like logging in to view my alerts, nor do I like to get a bunch of blank emails.)
- Choose the sources or categories that I’d like searched.
- Leave almost everything else as default – frequency (daily), HTML format, etc.
The application scours all of my selected sources for my query term and within an hour, I have a long list of results. After the first alert, I should see only new results that the application hasn’t sent to me yet. And this is what I fell in love with because that’s exactly what I want: New Results, Every Day, In My Email.
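For the curious, the “new results only” behavior can be sketched in a few lines of Python. This is a simplified illustration, not DWT's implementation: the `id` field, the `seen` set, and the function name `new_results_only` are assumptions made for the example. The idea is to remember what has already been sent and skip the email when nothing new turns up.

```python
def new_results_only(current_results, seen_ids):
    """Return results that have not been sent before, and remember them."""
    fresh = [r for r in current_results if r["id"] not in seen_ids]
    seen_ids.update(r["id"] for r in fresh)
    return fresh


# Hypothetical results from two consecutive daily runs of the same alert.
seen = set()
day_one = [
    {"id": "a1", "title": "Storm warning"},
    {"id": "b2", "title": "School news"},
]
day_two = day_one + [{"id": "c3", "title": "Conference talk"}]

first_email = new_results_only(day_one, seen)    # everything is new
second_email = new_results_only(day_two, seen)   # only the added result
third_email = new_results_only(day_two, seen)    # nothing new

for label, email in [("day 1", first_email), ("day 2", second_email), ("day 3", third_email)]:
    if email:  # "If New Results Only": no blank emails
        print(label, [r["title"] for r in email])
```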
Of course, there are plenty of other options for the DWT alerts feature. Some of our customers set up RSS feeds for community news or to feed results directly into a blog sidebar. For groups of researchers, the RSS feed option allows for an extremely productive search process, without duplicating efforts.
Alerts keep me plugged in to my communities quite effectively and efficiently and are a well-used feature in the DWT applications. Researchers gravitate toward alerts for the time-saving aspects, particularly for sources that do not have an alerts system in place. Try setting up an alert today on Mednar or Biznar and simplify your information gathering processes!
Let’s face it, organizations love website and application data. While the number of users who “hit” your website applications may be important for understanding how well your marketing campaigns are going, when it comes to your search applications the number of times each of your subscription and premium sources is “hit” may be more important when the fiscal end of year rolls around. Being able to justify the expense for those high-priced subscriptions with data on application and source usage, clickthroughs and errors is valuable in and of itself.
Explorit Everywhere! tracks:
- User Queries – this is the number of daily and hourly queries sent to each source. If your researchers are only searching a handful of sources and often excluding the rest, you may decide that some of those resources are superfluous.
- Actual Search Expression – what did researchers search for? Did they find the results they were looking for by clicking through to the result?
- Documents/Results – this shows you how well the source connectors, and the source itself, are performing. Sometimes a connector should be re-evaluated for how it searches a source and how many results it returns.
- Errors – Errors can be the result of an interrupted search or a problem with a source/connector. DWT monitors errors closely and proactively fixes any connector issues.
- Ranking – find which sources are returning relevant results to your user queries (and which aren’t). This often shows surprising results!
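As a rough illustration of how per-source numbers like these could be rolled up, here is a small Python sketch. The log format, the source names, and the `summarize` function are hypothetical and are not the Explorit Everywhere! reporting format; the sketch only counts queries, results and errors for each source.

```python
from collections import Counter

# Hypothetical log entries: one per query sent to a source.
query_log = [
    {"source": "Source A", "query": "solar storage", "results": 25, "error": False},
    {"source": "Source A", "query": "wind turbines", "results": 40, "error": False},
    {"source": "Source B", "query": "solar storage", "results": 0, "error": True},
]


def summarize(log):
    """Count queries, returned results and errors for each source."""
    queries = Counter(entry["source"] for entry in log)
    errors = Counter(entry["source"] for entry in log if entry["error"])
    results = Counter()
    for entry in log:
        results[entry["source"]] += entry["results"]
    # A Counter returns 0 for sources that never had an error.
    return {
        source: {
            "queries": queries[source],
            "results": results[source],
            "errors": errors[source],
        }
        for source in queries
    }


summary = summarize(query_log)
for source, stats in summary.items():
    print(source, stats)
```

A source with very few queries, or many errors, is a natural candidate for a closer look before renewal time.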
Do we track everything? Nope, not by a long shot. There are great, FREE analytics available, such as Google Analytics, that track, for example, IP addresses, landing pages, user location, browsers, and mobile usage. Supplementing with a separate analytics provider is a great idea. You can capture all of the other important marketing information that you may want to look at: How many people went to your application? What pages did they go to? How long were they there? DWT is happy to include the code snippets for your own analytics in your Explorit Everywhere! application.
With a well-rounded approach to capturing your analytics, you can track the success of your application, or the failure of some of your sources. That can mean a more tailored approach to your next year, weeding out sources that aren’t used by your researchers, and money in your pocket.
Our customers like Explorit Everywhere! applications because they don’t have to “think” about how relevant the results are; all DWT applications have a five star, relevance-ranking system. Since many researchers just look at the first page of results, Explorit Everywhere! merges, ranks and de-duplicates results from all sources searched so the most relevant results appear at the top of the list. Easy-Peasy.
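Here is a minimal Python sketch of the merge, rank and de-duplicate step. It is a simplified stand-in, not DWT's ranking algorithm: the numeric `score` field and the title-based duplicate check are assumptions made for the example, and the five-star display is not modeled.

```python
def merge_rank_dedupe(results):
    """Merge results from all sources, drop duplicates, best score first."""
    best = {}
    for result in results:
        # Treat titles that differ only in case or spacing as duplicates.
        key = " ".join(result["title"].lower().split())
        if key not in best or result["score"] > best[key]["score"]:
            best[key] = result
    return sorted(best.values(), key=lambda r: r["score"], reverse=True)


# Hypothetical results gathered from two sources, with made-up scores.
combined = [
    {"source": "A", "title": "Deep Web Search", "score": 0.72},
    {"source": "B", "title": "deep web  search", "score": 0.91},
    {"source": "B", "title": "Index Freshness", "score": 0.80},
]

ranked = merge_rank_dedupe(combined)
for result in ranked:
    print(result["score"], result["source"], result["title"])
```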
But many of our researchers perform advanced searches, refine their queries, and want to know exactly what results the source returned and in the order that the source returned them without DWT’s ranking applied. In this case, they have two choices: they can either open a new browser tab, go directly to their source, perform the same search and review their results (the long route), or they can simply filter their Explorit Everywhere! results set by the source and then sort by Source Order. Voilà.
You can see this filter in action by visiting one of our publicly available applications such as Biznar, Mednar or Environar. On the results page:
- “Limit” your results set to the source you want to see results from.
- Select the “Sort by” filter to sort the results by “Source Order”.
You should now see the results page display results in the order that we received them from the source.
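Conceptually, those two steps amount to a filter followed by a sort. The Python sketch below is an illustration only: the `position` field (the order in which a source returned each result) and the function name `source_order` are assumptions for the example, not part of the Explorit Everywhere! interface.

```python
def source_order(results, source):
    """Limit results to one source and restore that source's own ordering."""
    limited = [r for r in results if r["source"] == source]
    return sorted(limited, key=lambda r: r["position"])


# Hypothetical merged result list, already re-ranked across sources.
# "position" records where each result sat in its source's own list.
merged = [
    {"source": "B", "position": 2, "title": "Second from B"},
    {"source": "A", "position": 1, "title": "First from A"},
    {"source": "B", "position": 1, "title": "First from B"},
    {"source": "A", "position": 2, "title": "Second from A"},
]

for result in source_order(merged, "B"):
    print(result["position"], result["title"])
```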
For researchers who prefer to see the results from sources directly, this is an efficient substitute for searching sources one by one and can save hours of research time. One search of all of the sources a researcher wants to include, then viewed individually by Source Order can mean less burnout and faster discovery. Source Order may be just a “little” filter on Explorit Everywhere! but for some researchers it’s one of the biggest benefits.
On January 8, 2015, Microsoft published a new Customer Solution Case Study about Deep Web Technologies’ innovative search technology developed in collaboration with the WorldWideScience Alliance. Using the Microsoft Translation services, the search application WorldWideScience.org allows users to search in their native language, find results from sources around the world, and read the results translated back into their language. In light of the enormous strides made each year in the global scientific community where timely dissemination of the vast published knowledge is critical, WorldWideScience.org increases access to many important databases and encourages international collaboration.
The WorldWideScience Alliance turned to Abe Lederman, Chief Executive Officer and Chief Technology Officer of Deep Web Technologies, to realize its vision of a better, more automated solution with multilingual support. “We wanted to create an application that would make scholarly material more accessible worldwide to both English and non-English speakers,” he says. “For instance, we wanted a French-speaking user to be able to type in a query and find documents written in any language.”
The Case Study, posted to the Microsoft “Customer Stories” page, comes on the heels of a WorldWideScience.org update in 2014, improving the application look and feel and speed. Additionally, 2015 holds a bright future as the study mentions: “To provide better accessibility, WorldWideScience.org also offers a mobile interface. Deep Web Technologies is launching a streamlined HTML5 version that will work with virtually any device, whether PC, phone, or tablet. Other future enhancements include a localization feature that will provide search portals in the user’s native language.”
In response to the Case Study, Olivier Fontana, Director of Product Marketing for Microsoft Translator said, “Microsoft Translator can help customers better reach their internal and external stakeholders across languages. By building on the proven, customizable and scalable Translator API, Deep Web Technologies has developed a solution that has a direct impact on researcher’s ability to learn and exchange with their peers around the world, thereby improving their own research impact.” The Microsoft Translator Team Blog has followed up on the Case Study here.
Oh, and one more thing…WorldWideScience.org is not Deep Web Technologies’ only multilingual application. WorldWideEnergy translates energy-related content into four languages and the United Nations Economic Commission for Africa will be rolling out a multilingual search in 2015.
Read DWT’s Press Release
Deep Web Technologies is pleased to announce that we have signed a partnership agreement with SirsiDynix to resell our Explorit Everywhere!™ search platform. Explorit Everywhere! will complement and enhance the library management technology solutions provided by SirsiDynix to their customers, providing the best solution in the marketplace for library patrons to access all their subscription content together with their holdings from one search box.
SirsiDynix, the world’s leading provider of library automation solutions, serves more than 23,000 public, academic and special libraries around the world. SirsiDynix is well known for their Integrated Library Systems, Symphony and Horizon.
On September 22, 2014, Swets Information Services B.V. filed for bankruptcy, which was subsequently accepted by the court in Amsterdam. The unfortunate announcement of the Swets bankruptcy took DWT by surprise, along with many others in the library world. However, Swets and DWT are continuing conversations as Swets determines their path forward.
Our partner since 2010, Swets sold Explorit, rebranded as SwetsWise Searcher, to their global markets. By pushing the envelope in service and product knowledge, Swets created many mutually happy customers.
DWT has been in contact with our SwetsWise Searcher customers to ensure that there is no lapse in customer service during this time. While we are working directly with customers, we are closely monitoring what is happening with Swets.
In June of 2014, the Energy Technology Data Exchange (ETDE) ended. It must have been a sad day for the Office of Scientific and Technical Information (OSTI), having nurtured ETDE from a fledgling search site over 27 years ago to a venerable collection of over 5 million literature citations and over 90 participating countries.
But OSTI recognizes a good thing when they have one. ETDE hasn’t disappeared completely, but has fused into a bigger, stronger application – WorldWideEnergy.org – with a robust search by Deep Web Technologies. Perhaps the song “Hello, Goodbye” by the Beatles should be playing in the background here – Hello, Hello, WorldWideEnergy.org. Goodbye, Goodbye, ETDE.
Aside from the new look, WorldWideEnergy.org boasts a truly foreign object: Multilingual support. Users can enter their query term in not only English, but Spanish, German and Swedish. On the results page, translate the results into your language. Talk about making it simple!
Search Language: Swedish
Search Term: termonukleär fusion (In English: thermonuclear fusion)
Results return in English, German, Swedish and Spanish; with Swedish chosen as the search language, you can Translate the results to Swedish!
Aside from fueling the discovery of citations from participating countries, OSTI expects this feature to attract additional countries to the WWE Consortium, helping their content become more discoverable. The future is brilliant…thermonuclear perhaps.
WorldWideScience.org lets users search for science around the world in a matter of seconds. First deployed in 2008, this transformational technology unearths content that under normal circumstances would remain only available to users searching a database specifically. And if that’s not enough, scientists and researchers can search in their own language for cutting-edge information from other countries and have it translated back into their own language. Of course, WorldWideScience.org uses the Explorit Everywhere! Multilingual search feature by Deep Web Technologies.
Now, after 6 years of fabulous press and high usage, OSTI has decided to shake things up. They are making WorldWideScience.org BETTER. Eh? How’s that, you say?
I feel blue…
When an organization able to make a bright purple look authoritative says it’s time to make a change, we listen. WorldWideScience.org now sports a progressive, modern look reflective of the digital age. It’s an impressive new look and feel.
My country tis of thee…
For researchers looking for results from a specific country, or to explore and discover results by country, we have good news! WorldWideScience.org now has a country cluster, grouping results from specific countries for easier viewing.
Hello, Salut, Hola and Hallo
Using our new Explorit Everywhere! Multilingual architecture, the updated language search and translation of WorldWideScience.org is faster and more thorough than before. Users simply select their query language from the 10 language options and search!
The folks at OSTI have more up their sleeve, so don’t be surprised if you hear more good news in the coming months!