March 20th marked the first day of spring. Here in northern New Mexico we have seen signs of spring (and allergies) for over a month. The crocuses stretched out of the soil in February, marking both a moment of celebration for my family and one of concern. The weather is already warm and beautiful, causing the apricot, plum, and juniper trees to bloom like mad. But because they’ve bloomed so early, will a late freeze wipe out our delicate fruit? And will we all sniffle and sneeze longer from the thick pollen collecting on our cars and sidewalks?
My questions took me to three different federated search engines to find out what “spring” topics were circulating.
On Biznar, a social media and business search engine, I couldn’t help but look up how others were handling their spring allergies. Some dive into the Claritin box, while others go for a kettlebell workout. My family claims to have zero allergies, although we slyly keep a tissue box handy once the juniper pollen begins to circulate. However, some research suggests that dairy may offer relief. I shall eat more yogurt from here on.
Speaking of pollen, Environar, a federated search portal dedicated to the life, medical and earth sciences, turned up excellent research on pollen through the ages. Pollen has been used to document climate cycles and to infer other factors, such as temperature and precipitation, over the past 140,000 years or so. Pollen (atchoo!) is scientifically important.
I particularly enjoyed browsing Science.gov, the government portal, for research on the effects of climate change on allergies. One find that interested me comes from a survey on climate change and health published in the Annals of the American Thoracic Society and indexed in PubMed: “A majority of respondents indicated they were already observing health impacts of climate change among their patients, most commonly as increases in chronic disease severity from air pollution (77%), allergic symptoms from exposure to plants or mold (58%), and severe weather injuries (57%).” I shall buy more tissues.
While my questions may not have precise answers, I can at least plan ahead at the grocery store when I see high pollen counts – yogurt and tissues. And perhaps I’ll have a new appreciation for the contributions pollen has made to our scientific community.
Explore your Pollen Allergy Forecast at Pollen.com: http://www.pollen.com/allergy-forecast.asp. Happy Spring and Happy Searching!
Our CEO and CTO, Abe Lederman, journeyed to the Washington, D.C., area this week to meet with FEDLINK librarians at the Library of Congress. The Federal Library and Information Network (FEDLINK) serves federal libraries and information centers as their purchasing, training and resource-sharing consortium.
Abe’s main purpose in visiting the District, aside from eating delicious food and admiring the start of the cherry blossom bloom (peak bloom is next week), was to expand FEDLINK librarians’ knowledge of federated search. His presentation explored public federated search science portals as well as multilingual translation tools. A large part of his talk focused on Google and why it isn’t suitable for serious government researchers.
View Abe’s presentation here.
We’d like to officially welcome the new Deep Web Technologies Board of Advisors! The Board will provide insight and strategic input to our executive staff in support of DWT’s mission.
Mary Ellen Bates
Bates Information Services, Inc.
Mary Ellen Bates is the owner of Bates Information Services Inc., which she founded in 1991. She provides business research and analysis to strategic decision-makers, and consulting services to the information industry. She is a frequent keynote speaker and the author of seven books and innumerable articles on the information industry. Before starting her business, Bates managed specialized libraries within businesses and the federal government for over a decade. She received her Master’s in Library and Information Science from the University of California, Berkeley in 1982 and her BA in Philosophy from the University of California, Santa Barbara in 1976. She, her spouse and her dogs live near Boulder, Colorado.
Richard Boulderstone is the British Library’s Chief Digital Officer, responsible for driving the long-term digital transformation of the Library. At the British Library, Richard has led the Library’s engagement with the scientific community, its digital library, preservation and web archiving programmes, and its IT function. Formerly a Chief Technology Officer and Product Development Director at a number of international information providers, he has led the creation of many information-based products in both the UK and the USA. Richard also serves as Chair of the WorldWideScience Alliance Board.
CEO of GridSentry
Doug is currently the CEO of GridSentry. Previously, he was CFO and Vice President of Strategic Partnering at inno360. In that role, Mr. Dennis was responsible for developing and implementing partner strategy, which included building relationships with the portfolio of companies across the knowledge spectrum required to support the inno360 Open Innovation platform, including networks of content, community, application, and service providers. Mr. Dennis has an extensive background in sales, marketing, and manufacturing. His vision for electronic commerce and web-deployed applications dates back to early uses of the Internet. He has published articles on electronic commerce and on deploying manufacturing information over the web. He holds patents for 3D-based viewing and markup technology that is currently being deployed in web browsers.
Oracle Database Server Technologies
Andrew Mendelsohn is Executive Vice President for Database Server Technologies at Oracle. He is responsible for the development and product management of Oracle’s family of database products, including software products such as Oracle Database, Oracle TimesTen In-Memory Database, Oracle Berkeley DB, and Oracle NoSQL Database and engineered systems such as Oracle Exadata Database Machine, Oracle Database Appliance, and Oracle Big Data Appliance.
Mr. Mendelsohn has been at Oracle since May 1984. He began his career at Oracle as a developer on Release 5.1 of Oracle Database. Prior to joining Oracle, he worked at HP and ESVEL.
Mr. Mendelsohn holds a BSE in electrical engineering and computer science from Princeton University and performed graduate work in computer science at M.I.T.
Summa Technologies, Appfluent Technology
Gary Voight has thirty years of experience leading large corporations and industry-changing startups. Currently, he is a board member at Summa Technologies and Appfluent Technology. Previously, he was President and CEO of CorasWorks Corporation from January 2008 through July 2014. CorasWorks provided applications for the Microsoft SharePoint platform. Prior to CorasWorks, Gary was President and CEO at Archivas, Inc., a venture-capital-backed digital preservation software firm that was acquired by Hitachi in February 2007. Under his direction, Archivas developed digital archiving software and sold it to financial services, medical, government, services and educational enterprises.
Prior to joining Archivas, Gary was President and CEO of Software AG (Americas), where he grew the business to well over $200 million. He was Senior Vice President of Sales, Marketing and Services of SAGA Software at the time of its acquisition by Software AG. Before SAGA Software, Gary was Vice President of Worldwide Transactions Systems at IBM/Transarc, where he was responsible for most of IBM’s application server, middleware, security and file systems products. Prior to his tenure at IBM and Transarc, Gary was Vice President of Sales at ISIS Distributed Systems, a subsidiary of Stratus Computer. In addition, Gary spent over 13 years at Stratus Computer in a wide variety of field and headquarters positions.
Thank you all for contributing to DWT’s success!
As a matter of fact, YES! In doing research, being vendor-neutral implies, in part, that the order of the results set is impartial, and not biased toward any one information provider. A results set partial to an information vendor, source or database could cripple a serious research endeavour, particularly if a vital component of research is missed, hidden, or dropped due to a biased display of results, with a particular vendor’s results bubbling to the top. In addition, being vendor-neutral implies a design approach compatible with different technologies, and a company philosophy that is willing to integrate with a broad spectrum of sources, technologies and products.
Back in 2010, in a post titled “If Google Might Be Doing It…”, our CEO and CTO Abe Lederman questioned the bias of information vendors in light of Google’s possible “preferential placement” in search results. He asked, “If Google is being accused of such bias, might not EBSCO or ProQuest also have a bias?”
Deep Web Technologies recently received an email inquiry from someone looking, very specifically, for a vendor-neutral search service.
“We’re looking for a no vendor biased, easily applied and customised, reliable federated search. Can you help?”
The short answer is ABSOLUTELY! Vendor-neutrality is a Deep Web Technologies specialty and a core value. In addition to offering a simple setup and deployment process and our next-generation single-search technology, we believe strongly that no vendor, information provider, database or source should be weighted differently, return results more frequently, or appear highlighted in any way UNLESS requested by you, our customer. The choice to assign different databases a higher or lower ranking should be yours. It is your search, after all.
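For the technically curious, here is a minimal Python sketch of what “neutral unless the customer says otherwise” can look like in a ranking step. It assumes each result already carries a relevance score; the `Result` class, the `rank` function and the `source_weights` parameter are hypothetical illustrations, not DWT’s actual ranking code. Every source gets the same default weight of 1.0, and only a weight the customer explicitly supplies moves a source up or down.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Result:
    source: str       # which database or vendor returned this result
    title: str
    relevance: float  # query relevance score; a 0.0-1.0 scale is assumed here


def rank(results: List[Result],
         source_weights: Optional[Dict[str, float]] = None) -> List[Result]:
    """Order results by relevance.

    Every source is weighted 1.0 by default, so no vendor gets a boost.
    Only weights the customer explicitly supplies change the ordering.
    """
    weights = source_weights or {}
    return sorted(
        results,
        key=lambda r: r.relevance * weights.get(r.source, 1.0),
        reverse=True,
    )


if __name__ == "__main__":
    results = [Result("VendorA", "Pollen records", 0.9),
               Result("VendorB", "Juniper allergies", 0.8)]
    print([r.source for r in rank(results)])                    # neutral order
    print([r.source for r in rank(results, {"VendorB": 1.5})])  # customer-requested boost
```

Calling `rank(results)` orders purely by relevance; passing something like `{"VendorB": 1.5}` lifts one source only because the customer asked for it.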
Federal librarians are part of an ever-growing pool of knowledge workers who increasingly need timely knowledge, greater research efficiency and information accuracy. Aggregated databases, journals and e-book sources are now an integral part of a researcher’s repertoire. One way to speed the research process while still finding accurate information is to use a federated search application to search multiple databases at one time.
Abe Lederman, CEO and CTO of Deep Web Technologies, will visit FEDLINK librarians at the Library of Congress on March 31st to lead training on strategic research with government and public federated search engines (see FEDLINK post). There will be a morning session and an afternoon session in the Adams Building at the Library of Congress. Attendees will:
- Discuss what federated search is
- Explore science research portals together
- Access global resources using multilingual translation tools
Contact Jim Oliver (email@example.com) to reserve your space.
Last week we alluded to three obstacles serious researchers face when using a Discovery Service index to retrieve information from their premium sources. Deep Web Technologies uses a real-time approach, searching information sources on-the-fly rather than compiling information into an index as Discovery Services do, for reasons we’ll discuss in a moment. But first, we should get a few definitions out of the way before we jump into why an index doesn’t work for all researchers.
What is an index as used by Discovery Services? An index holds all of the metadata, and sometimes full text, for one or more sources of information, usually external public or premium sources. For example, Google has an index; its spiders compile information from millions of public websites across the Internet, funneling the information into a single, unified database for users to search. On a much smaller scale, Discovery Services compile information from an organization’s internal catalog, as well as a subset of the customer’s premium and public information sources, if these sources permit their information to be indexed. Discovery Services establish relationships with the information sources directly (if possible), procure the metadata (if they can) and add it to their index.
What is real-time (on-the-fly) searching? Real-time, or “on-the-fly,” searching sends queries directly to each source (the catalog, premium databases, public sources), retrieves the metadata of the top-ranked results from each source for the user’s search, then normalizes and aggregates that information into a single display of results. The results are listed with direct links to the original source so users can explore the record and the full text. No index is built and the information is not stored.
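To illustrate the flow just described, here is a minimal Python sketch of a real-time search over two made-up sources. The connector functions, field names and `.example` URLs are hypothetical stand-ins (a real connector would send the query over the network to the source). The sketch queries the sources in parallel at search time, keeps the top results from each, and normalizes them into one shape that retains the link back to the source; nothing is indexed or stored.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


# Hypothetical connectors. Each returns records in its source's own field layout.
def search_catalog(query: str) -> List[dict]:
    return [{"Title": f"Catalog record on {query}",
             "URL": "https://catalog.example/rec/1"}]


def search_premium_db(query: str) -> List[dict]:
    return [{"headline": f"Article about {query}",
             "link": "https://premium.example/art/a"}]


# source name -> (connector, field holding the title, field holding the link)
CONNECTORS: Dict[str, Tuple[Callable[[str], List[dict]], str, str]] = {
    "catalog": (search_catalog, "Title", "URL"),
    "premium_db": (search_premium_db, "headline", "link"),
}


def normalize(source: str, raw: dict, title_key: str, url_key: str) -> dict:
    """Map a source-specific record onto one common shape, keeping the link to the source."""
    return {"source": source, "title": raw[title_key], "url": raw[url_key]}


def federated_search(query: str, top_n: int = 10) -> List[dict]:
    """Send the query to every source in parallel at search time; nothing is indexed or stored."""
    merged: List[dict] = []
    with ThreadPoolExecutor(max_workers=len(CONNECTORS)) as pool:
        futures = {name: pool.submit(fn, query)
                   for name, (fn, _, _) in CONNECTORS.items()}
        for name, future in futures.items():
            _, title_key, url_key = CONNECTORS[name]
            try:
                raw_results = future.result()
            except Exception:
                continue  # one failing source should not sink the whole search
            for raw in raw_results[:top_n]:  # keep only each source's top results
                merged.append(normalize(name, raw, title_key, url_key))
    return merged


if __name__ == "__main__":
    for record in federated_search("pollen"):
        print(record["source"], "|", record["title"], "|", record["url"])
```

Relevance ranking and deduplication would run on the merged list before display; they are left out here to keep the sketch short.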
To be perfectly clear, both of these approaches have their pros and their cons. And, we certainly love the Google and Bing indexes for the everyday quick search for the nearest coffee house. But, as we’ve heard from our customers time and again, “We’re serious about our research, and a single index just doesn’t work for us.” (Note that in this post we’re singling out Discovery Service indexes, but will need to address how this applies to Enterprise Search indexes in a separate post.)
Abe Lederman, our CEO and CTO, addressed this issue before in other blog posts, but this is a good opportunity to reiterate that not all researchers benefit from a single master index such as those used by Discovery Services. Let’s look at some of the obstacles that serious researchers face when searching an index through a Discovery Service.
- Limited access to content
Serious researchers like to search for critical information on sources they trust. In fact, some researchers choose to search only their chosen few information sources, excluding all other databases to which their organization may subscribe. So a Discovery Service had better, by gum, include all of the “trusted sources” its serious researchers use, or those researchers may not use it much. And herein lies the problem: There are information vendors, trusted sources with critical information, that simply don’t want to share their information with a Discovery Service. That information is theirs. This is particularly true of sources in industries such as legal and medical. When an information vendor doesn’t permit its information to be indexed, the Discovery Service index won’t contain information from that source. Bottom line: Serious researchers may not use a Discovery Service because it’s incomplete. It just doesn’t contain what they need, or want, to search.
- Frequency of content changes and updates
Information vendors, or content sources, vary in how frequently they update their own database of information. Some sources update daily or even hourly while other sources may update their database with new information every few weeks. Depending on the need, a serious researcher may require up-to-the-minute, just-published information for a critical, time-sensitive topic. The drive for data may not allow for old information, no, not even one day old. Could you imagine Kayak, a travel metasearch site, showing day-old data? It just wouldn’t work. Current information can make or break a researcher, and a Discovery Service index that is updated weekly from some, or even half, of the trusted data sources it indexes may not give serious researchers what they need to stay current. Bottom line: Serious researchers require current information, not stale data.
- Muddiness about source searching and clickthrough
In some Discovery Services, searching one specific source (or even just a handful of sources) is difficult, even impossible. An index contains vast amounts of information, but doesn’t always let researchers easily limit a search to a specific source, pinpoint the source they need, or click through to the metadata or full text of a document directly at the source. Discovery Services often normalize results into uniform-looking records for a consistent look and feel, but provide no way to click directly to the record at the source. Knowing where information is coming from and clicking through to the result directly at the source can be an important step in the research process. Bottom line: A serious researcher can’t afford to spend extra time narrowing down their information sources, or performing extra clicks to go directly to the source itself. They need to be able to search one or more of their trusted sources rather than the entire index, and click directly to the source for further review and additional information.
At Deep Web Technologies we perform a real-time search, offering an alternative to the Discovery Service indexed approach. Our federated search retrieves information from sources that can’t be accessed through an index, sources that update their metadata frequently, and sources that users want to search individually because of their extensive, trusted databases of information. If your organization already uses a Discovery Service but your serious researchers need more precise information, federated search can complement it by adding real-time results from content sources your Discovery Service doesn’t currently include.
We’d love to hear from researchers about this – What has your experience been with Discovery Services?
Occasionally, DWT employees will test the water when describing what we do by dropping the term “federated search” and trying a more generic description.
“We search all of the databases from a single search box in real-time.”
“We perform a single search of subscribed, public and deep web databases…”
“We capture results from your premium sources, the stuff Google can’t reach, and return them to you ranked and deduplicated…”
If a person is plugged in to the search world, we usually get this response: “Oh! Do you mean you do Federated Search?” Bingo.
Over the years, the concept of Federated Search has gone by many different names. We’ve heard our technology called:
- Distributed Search
- Broadcast Search
- Unified Search
- Data Fusion
- Parallel Search
- Cross-Database Search
- Single Search
- Integrated Search
- Universal Search
For the most part, all of these mean about the same thing: An application or a service that allows users to submit their query to search multiple, distributed information sources, and retrieve aggregated, ranked and deduplicated results.
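Deduplication is the step that keeps the same article from appearing once per source. A common heuristic, shown here as a Python sketch and not necessarily how DWT does it, is to match on a DOI when one is present and otherwise on a normalized title. The `doi`, `title` and `source` field names are assumptions made for the example.

```python
import re
from typing import Dict, List


def dedup_key(record: dict) -> str:
    """Match on DOI when available, otherwise on a normalized title."""
    if record.get("doi"):
        return "doi:" + record["doi"].strip().lower()
    title = re.sub(r"[^a-z0-9\s]", "", record["title"].lower())  # drop punctuation
    return "title:" + " ".join(title.split())                    # collapse whitespace


def deduplicate(records: List[dict]) -> List[dict]:
    """Collapse records that several sources returned, remembering every source."""
    merged: Dict[str, dict] = {}
    for rec in records:
        key = dedup_key(rec)
        if key in merged:
            merged[key]["sources"].append(rec["source"])
        else:
            merged[key] = {**rec, "sources": [rec["source"]]}
    return list(merged.values())  # first occurrence of each record, in input order
```

Keeping every source in `sources` means a merged record can still link to each place it was found.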
But the question remains: Is “federated search” a master index, or a real-time (on-the-fly) search? This is a very good question, given our familiarity with Google and its enormous public index. Sol Lederman raised this question back in 2007 in the Federated Search Blog post “What’s in a Name?”
“The distinction, of course, is crucial. If you’re meeting with a potential customer who believes you’re discussing an approach where all content is harvested, indexed and accessed from one source but you think you’re discussing live search of heterogeneous sources then you’re talking apples and oranges.”
Deep Web Technologies’ real-time approach gives us an advantage over building a master index, which we’ll discuss in our next blog post. In the meantime, can you think of any other names for what we do? We’d love to hear from you!
I have at least 50 alerts set up through several different services. I know when my husband gives a talk (that he forgot to mention), when my son’s school is referenced in the news, if there’s a storm coming, and when topics of interest are discussed on the web. I stay connected to the information that I need because it finds my email inbox every day, without me even lifting a finger. Bottom line: My daily alerts are important to me.
About two-thirds of my alerts are set up through Deep Web Technologies public portals. I have alerts through Biznar, Mednar, Environar, Techscout and other portals. Each application searches a different set of sources (e.g. Business and Marketing, Medical and Health, Environment, Technical) and each returns specialized results on my topics of interest. Some of my alerts are set for a single source that may not be indexed by Google and has no alerts system in its native interface. In these cases, I’m getting unique information that I would otherwise have to search for directly at the source, which is not something I’d carve time out of my day to do.
DWT alerts are simple to set up and forget about. Here’s my preferred routine:
- Log in to a portal, and go to the Alerts homepage.
- Choose my topic, keyword, author or title that I want to automatically search.
- Choose my email preference, which is almost always “If New Results Only”. (I don’t like logging in to view my alerts, nor do I like to get a bunch of blank emails.)
- Choose the sources or categories that I’d like searched.
- Leave almost everything else as default – frequency (daily), HTML format, etc.
The application scours all of my selected sources for my query term and within an hour, I have a long list of results. After the first alert, I should see only new results that that application hasn’t sent to me yet. And this is what I fell in love with because that’s exactly what I want: New Results, Every Day, In My Email.
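The “If New Results Only” behavior boils down to remembering what each alert has already sent. Here is a minimal Python sketch under that assumption; `run_alert`, `should_email` and the in-memory `seen` store are hypothetical, and a real alerts service would persist the sent-results history between daily runs and might identify results by something other than their URL.

```python
from typing import Dict, List, Set


def run_alert(alert_id: str, results: List[dict],
              seen: Dict[str, Set[str]]) -> List[dict]:
    """Return only results this alert has not delivered before, then mark them as sent.

    `seen` maps an alert id to the URLs already emailed for it. It is an
    in-memory stand-in for whatever store a real service would persist.
    """
    already_sent = seen.setdefault(alert_id, set())
    new_results = [r for r in results if r["url"] not in already_sent]
    already_sent.update(r["url"] for r in new_results)
    return new_results


def should_email(new_results: List[dict], if_new_results_only: bool = True) -> bool:
    """With "If New Results Only" on, skip the email when nothing is new."""
    return bool(new_results) or not if_new_results_only
```

On the first run every result counts as new; on later runs only results the alert has never sent come through, and an empty list means no email at all.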
Of course, there are plenty of other options for the DWT alerts feature. Some of our customers set up RSS feeds for community news or to feed results directly into a blog sidebar. For groups of researchers, the RSS feed option allows for an extremely productive search process, without duplicating efforts.
Alerts keep me plugged in to my communities quite effectively and efficiently and are a well-used feature in the DWT applications. Researchers gravitate toward alerts for the time-saving aspects, particularly for sources that do not have an alerts system in place. Try setting up an alert today on Mednar or Biznar and simplify your information gathering processes!
Let’s face it, organizations love website and application data. While the number of users who “hit” your website applications may be important for understanding how well your marketing campaigns are going, when it comes to your search applications, the number of times each of your subscription and premium sources is “hit” may matter more when the fiscal year-end rolls around. Being able to justify the expense of those high-priced subscriptions by showing application and source usage, clickthroughs and errors is valuable in and of itself.
Explorit Everywhere! tracks:
- User Queries – this is the number of daily and hourly queries sent to each source. If your researchers are only searching a handful of sources and often excluding the rest, you may decide that some of those resources are superfluous.
- Actual Search Expression – what did researchers search for? Did they find the results they were looking for by clicking through to the result?
- Documents/Results – this shows you how well the source connectors, and the source itself, are performing. Sometimes a connector should be re-evaluated for how it searches a source and how many results it returns.
- Errors – Errors can be the result of an interrupted search or a problem with a source/connector. DWT monitors errors closely and proactively fixes any connector issues.
- Ranking – find which sources are returning relevant results to your user queries (and which aren’t). This often shows surprising results!
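As a rough illustration of how those per-source numbers can be rolled up, here is a Python sketch that summarizes a log of events. The event format (an ISO timestamp, a source name and a type of `query`, `click` or `error`), the sample values and the function names are assumptions made for the example, not Explorit Everywhere!’s actual log schema.

```python
from collections import Counter, defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Tuple


def source_usage(events: Iterable[dict]) -> Tuple[Counter, Dict[str, Counter]]:
    """Roll up log events per source.

    Each event is assumed to look like
    {"time": "2015-03-02T09:15:00", "source": "SourceA", "type": "query"},
    where type is one of "query", "click" or "error".
    """
    hourly_queries: Counter = Counter()                 # (source, hour) -> query count
    totals: Dict[str, Counter] = defaultdict(Counter)   # source -> counts by event type
    for event in events:
        totals[event["source"]][event["type"]] += 1
        if event["type"] == "query":
            hour = datetime.fromisoformat(event["time"]).strftime("%Y-%m-%d %H:00")
            hourly_queries[(event["source"], hour)] += 1
    return hourly_queries, totals


def unused_sources(subscribed: List[str], totals: Dict[str, Counter]) -> List[str]:
    """Subscribed sources that received no queries in the period."""
    return sorted(s for s in subscribed if totals.get(s, Counter())["query"] == 0)
```

A source that keeps showing up in `unused_sources` is a candidate for the end-of-year budget conversation.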
Do we track everything? Nope, not by a long shot. There are great, FREE analytics available, such as Google Analytics, that track, for example, IP addresses, landing pages, user location, browsers, and mobile usage. Supplementing with a separate analytics provider is a great idea. You can capture all of the other important marketing information that you may want to look at: How many people went to your application? What pages did they go to? How long were they there? DWT is happy to include the code snippets for your own analytics in your Explorit Everywhere! application.
With a well-rounded approach to capturing your analytics, you can track the success of your application and the failure of some of your sources. That can mean a more tailored approach next year, weeding out sources your researchers don’t use, and money in your pocket.
Our customers like Explorit Everywhere! applications because they don’t have to “think” about how relevant the results are; all DWT applications have a five-star relevance-ranking system. Since many researchers just look at the first page of results, Explorit Everywhere! merges, ranks and de-duplicates results from all sources searched so the most relevant results appear at the top of the list. Easy-Peasy.
But many of our researchers perform advanced searches, refine their queries, and want to know exactly which results the source returned, in the order the source returned them, without DWT’s ranking applied. In this case, they have two choices: they can open a new browser tab, go directly to their source, perform the same search and review the results (the long route), or they can simply filter their Explorit Everywhere! results set by source and then sort by Source Order. Voila.
You can see this filter in action by visiting one of our publicly available applications such as Biznar, Mednar or Environar. On the results page:
- “Limit” your results set to the source you want to see results from.
- Select the “Sort by” filter to sort the results by “Source Order”.
You should now see the results page display results in the order that we received them from the source.
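Conceptually, Source Order only works if the position each result held in its source’s own list is kept alongside DWT’s relevance score. The Python sketch below assumes exactly that; the `source_rank` and `relevance` field names and the `limit_and_sort` function are hypothetical, not the application’s real internals.

```python
from typing import List, Optional


def limit_and_sort(results: List[dict], source: Optional[str] = None,
                   sort_by: str = "relevance") -> List[dict]:
    """Optionally limit to one source, then sort by relevance or by source order.

    "source_order" restores the order the source itself returned, using the
    position ("source_rank") assumed to be recorded when each result was retrieved.
    """
    if source is not None:
        results = [r for r in results if r["source"] == source]
    if sort_by == "source_order":
        return sorted(results, key=lambda r: r["source_rank"])
    if sort_by == "relevance":
        return sorted(results, key=lambda r: r["relevance"], reverse=True)
    raise ValueError(f"unknown sort option: {sort_by}")
```

Sorting by `source_rank` only makes sense once results are limited to a single source (otherwise every source’s first result ties at rank 1), which is why the “Limit” and “Sort by” steps go together.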
For researchers who prefer to see the results from sources directly, this is an efficient substitute for searching sources one by one and can save hours of research time. One search of all of the sources a researcher wants to include, then viewed individually by Source Order can mean less burnout and faster discovery. Source Order may be just a “little” filter on Explorit Everywhere! but for some researchers it’s one of the biggest benefits.