<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Deep Web Technologies Blog &#187; Features</title>
	<atom:link href="http://deepwebtechblog.com/category/features/feed/" rel="self" type="application/rss+xml" />
	<link>http://deepwebtechblog.com</link>
	<description>covering federated search and how to get the best from the Deep Web.</description>
	<lastBuildDate>Tue, 25 Oct 2011 16:35:36 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1</generator>
		<item>
		<title>The Age of Discovery</title>
		<link>http://deepwebtechblog.com/the-age-of-discovery/</link>
		<comments>http://deepwebtechblog.com/the-age-of-discovery/#comments</comments>
		<pubDate>Fri, 24 Jun 2011 16:06:22 +0000</pubDate>
		<dc:creator>Darcy Pedersen</dc:creator>
				<category><![CDATA[Features]]></category>
		<category><![CDATA[Federated Search]]></category>

		<guid isPermaLink="false">http://deepwebtechblog.com/?p=1696</guid>
		<description><![CDATA[Abe Lederman is heading to the ALA Annual Conference this weekend in New Orleans to take part in a fascinating panel discussion: The Age of Discovery: Understanding Discovery Services, Federated Search and Web Scale.   Here&#8217;s a brief description: Findability, discovery services, federated search, web scale—ways to discover content are increasing all the time, but [...]]]></description>
			<content:encoded><![CDATA[<p>Abe Lederman is heading to the ALA Annual Conference this weekend in New Orleans to take part in a fascinating panel discussion: <a href="http://connect.ala.org/node/145429">The Age of Discovery: Understanding Discovery Services, Federated Search and Web Scale</a>.   Here&#8217;s a brief description:<img class="alignright size-full wp-image-1697" title="download" src="http://deepwebtechblog.com/wp-content/uploads/2011/06/download1.png" alt="" width="223" height="165" /></p>
<blockquote><p>Findability, discovery services, federated search, web scale—ways to discover content are increasing all the time, but how do we discover which discovery mechanism is appropriate? Join us to learn more about the discovery landscape. When is it appropriate to use federated search over a discovery service? How does this differ by type of researcher? What kinds of resources should be included in discovery tools? Learn discovery implementation from two librarians in the trenches; learn about “web scale” and how federated search and discovery are evolving from the experts; and how the rest of us can sort out this tangle of access methods!</p></blockquote>
<p>Join Abe and the other panelists Sunday, June 26, 2011, 4:00-5:30 pm, at the Hilton Riverside – Grand Salon C.</p>
<p>Abe&#8217;s presentation is available here: <a href="http://www.deepwebtech.com/ala2011.ppt">http://www.deepwebtech.com/ala2011.ppt</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://deepwebtechblog.com/the-age-of-discovery/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>WorldWideScience receives warm welcome at the UN</title>
		<link>http://deepwebtechblog.com/worldwidescience-receives-warm-welcome-at-the-un/</link>
		<comments>http://deepwebtechblog.com/worldwidescience-receives-warm-welcome-at-the-un/#comments</comments>
		<pubDate>Wed, 15 Jun 2011 15:05:10 +0000</pubDate>
		<dc:creator>Sol</dc:creator>
				<category><![CDATA[Clients]]></category>
		<category><![CDATA[Features]]></category>
		<category><![CDATA[Federated Search]]></category>
		<category><![CDATA[Multilingual Search]]></category>

		<guid isPermaLink="false">http://deepwebtechblog.com/?p=1680</guid>
		<description><![CDATA[WorldWideScience is a global science gateway that combines national and international scientific databases into a search engine. From a single search form, a scientist, researcher, or curious citizen can search over fifty databases in English and now 22 multilingual sources (with translation to the searcher&#8217;s native language) and seven multimedia sources. WorldWideScience is the brainchild [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://worldwidescience.org">WorldWideScience</a> is a global science gateway that combines national and international scientific databases into a search engine.<a href="http://www.worldwidescience.org"><img class="alignright size-medium wp-image-1691" title="WorldWideScience now includes multilingual and multimedia sources!" src="http://deepwebtechblog.com/wp-content/uploads/2011/06/download-300x141.png" alt="" width="300" height="141" /></a> From a single search form, a scientist, researcher, or curious citizen can search over fifty databases in English and now 22 multilingual sources (with translation to the searcher&#8217;s native language) and seven multimedia sources. WorldWideScience is the brainchild of the director of the DOE Office of Scientific and Technical Information (OSTI), Dr. Walt Warnick. The gateway is maintained and hosted by OSTI and governed by the <a href="http://worldwidescience.org/alliance.html">WorldWideScience Alliance</a>.</p>
<p><a href="http://deepwebtech.com">Deep Web Technologies</a> is proud to have developed the federated search technology behind WorldWideScience. And, with the cooperation of the Microsoft Translation services team, Deep Web Technologies also implemented the multilingual technology. It was a major undertaking but a worthwhile one for the science community, whose members can now greatly expand their reach to scientific papers in languages beyond their own.</p>
<p>Dr. Warnick was invited to deliver a <a href="http://www.osti.gov/speeches/fy2011/warnick/UNC2011/index.shtml">presentation</a> at the 14th session of the United Nations Commission on Science and Technology for Development (CSTD). In a post at the <a href="http://www.osti.gov/ostiblog/worldwidescience-opens-international-doors">OSTI Blog</a>, Dr. Warnick describes the warm reception that WorldWideScience received.</p>
<blockquote><p>I wish more of my OSTI colleagues could have been in Geneva to share the warm response from the attendees.   Several country representatives offered up new sources for WorldWideScience (WWS).  Another member of the audience searched mobile WWS for his own name and remarked that he found many of his papers.  I received enthusiastic comments, so many that I couldn’t address all of them because of time constraints.  Significantly, the Chair of CSTD volunteered to pay the costs of becoming a member of the WorldWideScience Alliance.  There was great excitement about the possibilities for its use within the home countries of the attendees and how WWS advances the goals of CSTD.</p></blockquote>
<p>The paper &#8220;<a href="http://iospress.metapress.com/content/f767t1076251xu84/">Breaking down language barriers through multilingual federated search</a>,&#8221; co-authored by Abe Lederman (founder and president of Deep Web Technologies) and by Dr. Warnick, Brian Hitson, and Lorrie Johnson of OSTI, explains the importance of the gateway:</p>
<blockquote><p>&#8220;WorldWideScience.org (WWS) is a global science gateway developed by the US Department of Energy Office of Scientific and Technical Information (OSTI) in partnership with federated search vendor Deep Web Technologies. WWS provides a simultaneous live search of 69 databases from government and government-sanctioned organizations from 66 participating nations. The WWS portal plays a leading role in bringing together the world&#8217;s scientists to accelerate the discoveries needed to solve the planet&#8217;s most pressing problems. In this paper we present a brief history of the development of WWS and discuss how a new technology, multilingual federated search, greatly increases WWS&#8217; ability to facilitate the advancement of science.&#8221;</p></blockquote>
<p>Deep Web Technologies is delighted to be working with OSTI and other organizations to push the envelope of search technology and to make the world a smaller place.</p>
]]></content:encoded>
			<wfw:commentRss>http://deepwebtechblog.com/worldwidescience-receives-warm-welcome-at-the-un/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Federated search: the challenges of incremental results</title>
		<link>http://deepwebtechblog.com/federated-search-the-challenges-of-incremental-results/</link>
		<comments>http://deepwebtechblog.com/federated-search-the-challenges-of-incremental-results/#comments</comments>
		<pubDate>Fri, 13 May 2011 16:07:55 +0000</pubDate>
		<dc:creator>Sol</dc:creator>
				<category><![CDATA[Features]]></category>
		<category><![CDATA[Federated Search]]></category>
		<category><![CDATA[View from Inside]]></category>

		<guid isPermaLink="false">http://deepwebtechblog.com/?p=1628</guid>
		<description><![CDATA[Welcome to the second edition of &#8220;Best of the Federated Search Blog.&#8221; In this series I pull articles out of the Federated Search Blog archive and comment on them for the benefit of those considering Deep Web Technologies&#8217; offerings. In March 2008 I explored the &#8220;incremental results&#8221; feature which Deep Web Technologies makes available in [...]]]></description>
			<content:encoded><![CDATA[<p>Welcome to the second edition of &#8220;Best of the Federated Search Blog.&#8221; In this series I pull articles out of the <a href="http://federatedsearchblog.com">Federated Search Blog</a> archive and<a href="http://www.federatedsearchblog.com"><img class="alignright size-medium wp-image-1608" title="bestof" src="http://deepwebtechblog.com/wp-content/uploads/2011/05/bestof-300x76.png" alt="" width="300" height="76" /></a> comment on them for the benefit of those considering <a href="http://www.deepwebtech.com">Deep Web Technologies</a>&#8217; offerings.</p>
<p>In March 2008 I explored the &#8220;<a href="http://federatedsearchblog.com/2008/03/28/federated-search-the-challenges-of-incremental-results/">incremental results</a>&#8221; feature, which Deep Web Technologies makes available in all its federated search applications. As a consultant to Deep Web Technologies I may be somewhat biased, but I do believe that this feature is a huge differentiator for the company.</p>
<p>What are incremental results?</p>
<blockquote><p>The idea is simple: display results in chunks as they are received from the sources being searched. <a href="http://www.science.gov">Science.gov</a>, <a href="http://WorldWideScience.org">WorldWideScience.org</a>, and <a href="http://scitopia.org">Scitopia.org</a> are three applications that display incremental results.</p></blockquote>
<p>Why is it a big deal to provide incremental results? Because we live in the age of Google speed. Users don&#8217;t want to wait the 30 seconds it could take a content source to provide its results. The Achilles&#8217; heel of federated search is that we have no control over how quickly sources respond. If a federated search application is searching 30 sources at once and 29 of them return results quickly but one is slow to respond, the traditional approach makes users wait until the last source has returned its results. This is bad news for the impatient user.</p>
<p>Deep Web Technologies&#8217; approach is to wait just a few seconds, long enough to get a variety of documents from a number of sources. It then ranks those documents by relevance and displays them to users right away. While users are inspecting those first results, Explorit (Deep Web Technologies&#8217; federated search engine) keeps gathering results from the other sources to display when the user is ready.</p>
<p>Explorit is polite to users. It doesn&#8217;t simply overwrite the first set of search results with a later batch. Instead, it informs the user that a newer set is available and asks whether to show it. The user can accept the offer, turn it down, or defer it (waiting until later to refresh the results).</p>
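<p>The behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Explorit&#8217;s actual code: the source names, delays, scores and the helper names <code>search_source</code>, <code>rank</code> and <code>incremental_search</code> are all made up, and remote sources are simulated with <code>time.sleep</code>.</p>

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def search_source(name, delay, docs):
    """Hypothetical stand-in for a remote source: sleeps, then returns (score, title) pairs."""
    time.sleep(delay)
    return [(score, f"{name}: {title}") for score, title in docs]


def rank(futures):
    """Merge the results of finished searches, highest score first."""
    merged = []
    for future in futures:
        merged.extend(future.result())
    return sorted(merged, reverse=True)


def incremental_search(sources, first_wait):
    """Return (first_batch, get_update).

    first_batch holds ranked results from the sources that answered within
    first_wait seconds. get_update() blocks until the remaining sources have
    finished and returns the full ranked list. The caller decides whether and
    when to show it, so the first batch is never overwritten silently.
    """
    pool = ThreadPoolExecutor(max_workers=len(sources))
    futures = [pool.submit(search_source, *source) for source in sources]
    done, _pending = wait(futures, timeout=first_wait)
    first_batch = rank(done)

    def get_update():
        wait(futures)      # block until every source has answered
        pool.shutdown()
        return rank(futures)

    return first_batch, get_update


# Made-up sources: (name, simulated delay in seconds, scored documents).
sources = [
    ("fast-source", 0.01, [(0.9, "Solar cells"), (0.4, "Wind power")]),
    ("slow-source", 0.5, [(0.7, "Fuel cells")]),
]
first_batch, get_update = incremental_search(sources, first_wait=0.2)
print("first batch:", first_batch)
full = get_update()        # in a real UI this runs only if the user accepts the offer
print("newer set:  ", full)
```

<p>With these made-up delays the first batch contains only the fast source&#8217;s documents, and the slow source&#8217;s document appears once the newer set is requested. Returning a callable instead of pushing the update mirrors the accept, decline or defer choice described above.</p>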
<p>Incremental results are a nice way to reconcile slow sources with the user&#8217;s demand for speed. We think the feature works well. You can judge for yourself at <a href="http://www.science.gov">Science.gov</a>, <a href="http://WorldWideScience.org">WorldWideScience.org</a>, and <a href="http://scitopia.org">Scitopia.org</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://deepwebtechblog.com/federated-search-the-challenges-of-incremental-results/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Deep Web Tech Launches Webinar Series</title>
		<link>http://deepwebtechblog.com/deep-web-tech-launches-webinar-series/</link>
		<comments>http://deepwebtechblog.com/deep-web-tech-launches-webinar-series/#comments</comments>
		<pubDate>Wed, 08 Dec 2010 21:58:00 +0000</pubDate>
		<dc:creator>Nicholas</dc:creator>
				<category><![CDATA[Features]]></category>
		<category><![CDATA[Marketing Announcements]]></category>
		<category><![CDATA[Product Development]]></category>
		<category><![CDATA[Deep]]></category>
		<category><![CDATA[deep web technologies]]></category>
		<category><![CDATA[federated knowledge]]></category>
		<category><![CDATA[Federated Search]]></category>
		<category><![CDATA[Federated Search  Blog]]></category>
		<category><![CDATA[Webinar]]></category>

		<guid isPermaLink="false">http://deepwebtechblog.com/?p=1305</guid>
		<description><![CDATA[December 15th, 2010 kicks off Deep Web Technologies&#8217; first webinar installment! Titled: &#8220;Explorit Federated Search Key Differentiators&#8221; Interested parties will have the option to sign up for different dates, depending on your convenience. Our webinars typically last 30-40 mins, with the last 10 minutes devoted to questions and answers, which means that it will be [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://deepwebtechblog.com/wp-content/uploads/2010/11/webinar.jpg"><img class="alignleft size-medium wp-image-1306" title="webinar" src="http://deepwebtechblog.com/wp-content/uploads/2010/11/webinar-300x225.jpg" alt="" width="300" height="225" /></a></p>
<div id="_mcePaste">
<p>December 15th, 2010 kicks off Deep Web Technologies&#8217; first webinar installment!</p>
<p>Titled: &#8220;Explorit Federated Search Key Differentiators&#8221;</p>
<p>Interested parties will have the option to sign up for different dates, whichever is most convenient. Our webinars typically last 30-40 minutes, with the last 10 minutes devoted to questions and answers, so this one will be a short overview of all the value-added benefits our product has to offer.</p>
<p>Our VP of Business Development, Andy Alsop, said,</p>
<blockquote><p>&#8220;Building on our reputation for developing next-generation federated search applications, these no-cost webinars bring the experience of the latest federated search technology right to your desktop. It&#8217;s an engaging and educational process that allows us to show off the key differentiators that Deep Web Technologies&#8217; Explorit has to offer. By attending the webinar you will get a firsthand demonstration of how the Explorit application will give your users a single search box to discover knowledge based on the investment you&#8217;ve made in your subscription content.&#8221;</p></blockquote>
<p>Over time, the series will incorporate user-suggested topics to maintain consistent interaction between speakers and listeners.</p>
<p>Signing up is easy and free: simply click on the link below. I&#8217;ll see you online!</p>
<table class="MsoNormalTable" style="width: 100%;" border="0" cellspacing="0" cellpadding="0" width="100%">
<tbody>
<tr>
<td style="padding: 0in;"><a href="https://www3.gotomeeting.com/register/561693254">Wed, Feb 16, 2011, 11:00 AM &#8211; 12:00 PM MST</a></td>
</tr>
</tbody>
</table>
</div>
]]></content:encoded>
			<wfw:commentRss>http://deepwebtechblog.com/deep-web-tech-launches-webinar-series/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>If Google might be doing it…</title>
		<link>http://deepwebtechblog.com/if-google-might-be-doing-it%e2%80%a6/</link>
		<comments>http://deepwebtechblog.com/if-google-might-be-doing-it%e2%80%a6/#comments</comments>
		<pubDate>Wed, 01 Dec 2010 17:47:17 +0000</pubDate>
		<dc:creator>Abe</dc:creator>
				<category><![CDATA[Alerts]]></category>
		<category><![CDATA[Federated Search]]></category>
		<category><![CDATA[Reviews]]></category>
		<category><![CDATA[View from Inside]]></category>

		<guid isPermaLink="false">http://deepwebtechblog.com/?p=1324</guid>
		<description><![CDATA[Last week (on November 23rd) Sol wrote an article for the Federated Search Blog, Beyond search results bias, which raises the concern over search result bias by Google and by discovery services. Sol refers to the allegation by Harvard Professor Benjamin Edelman that Google is biasing some of its search results by first displaying results [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://deepwebtechblog.com/wp-content/uploads/2010/12/evil_google.jpg"><img class="alignleft size-full wp-image-1326" title="evil_google" src="http://deepwebtechblog.com/wp-content/uploads/2010/12/evil_google.jpg" alt="" width="300" height="270" /></a> Last week (on November 23rd) Sol wrote an article for the Federated Search Blog, <a href="http://federatedsearchblog.com/2010/11/23/beyond-search-result-bias/">Beyond search results bias</a>, which raises the concern over search result bias by Google and by discovery services. Sol refers to the allegation by Harvard Professor Benjamin Edelman that Google is biasing some of its search results by first displaying results from its own properties. Edelman is not just conjecturing; he has performed research to back his allegation.</p>
<p>Sol’s article was prescient, as just yesterday the New York Times published the article <a href="http://www.nytimes.com/2010/12/01/technology/01google.html?_r=2&amp;src=me">E.U. Launches Formal Antitrust Investigation of Google</a>, in which the E.U. Commission informed the world:</p>
<p>“&#8230; that it was also looking into whether Google may have given its own services ‘preferential placement’ in search results.”</p>
<p>So as <a href="http://federatedsearchblog.com/2010/11/23/beyond-search-result-bias/">Sol</a>, <a href="http://commentary.exlibrisgroup.com/2010/10/gladiators-to-perform-sleight-of-hand.html">Carl Grant</a> and <a href="http://deepwebtechblog.com/discovery-services-over-hyped-and-under-performed/">I</a> have pointed out in recent blog articles, librarians who are evaluating whether to subscribe to a discovery service such as Summon or EDS need to be really concerned about “vendor neutrality.” Might Summon or EDS, today or in the future, favor results from some publishers to increase usage of some sources? If Google is being accused of such bias, might not EBSCO or ProQuest also have a bias?</p>
<p>Setting aside the question of whether I would subscribe to a discovery service at all, if I did, I would much prefer to buy it from an independent third-party vendor whose main business is not selling content.</p>
]]></content:encoded>
			<wfw:commentRss>http://deepwebtechblog.com/if-google-might-be-doing-it%e2%80%a6/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Exit Stage Right&#8211; WebFeat.  Center Stage&#8211; Explorit</title>
		<link>http://deepwebtechblog.com/exit-stage-right-webfeat-center-stage-explorit/</link>
		<comments>http://deepwebtechblog.com/exit-stage-right-webfeat-center-stage-explorit/#comments</comments>
		<pubDate>Fri, 19 Nov 2010 19:05:42 +0000</pubDate>
		<dc:creator>Nicholas</dc:creator>
				<category><![CDATA[Features]]></category>
		<category><![CDATA[Marketing Announcements]]></category>
		<category><![CDATA[Product Development]]></category>
		<category><![CDATA[deep web technologies]]></category>
		<category><![CDATA[Federated Search]]></category>
		<category><![CDATA[WebFeat]]></category>

		<guid isPermaLink="false">http://deepwebtechblog.com/?p=1221</guid>
		<description><![CDATA[SWITCH TO DEEP WEB TECHNOLOGIES AND SAVE THOUSANDS Deep Web Technologies is offering a unique chance to replace  existing WebFeat installations. The catch? You must be willing to save over $3,800 on next-generation federated search for your library!  Now, with Deep Web Technologies next-generation federated search, you can go deeper and faster than you have [...]]]></description>
			<content:encoded><![CDATA[<h1><a href="http://deepwebtechblog.com/wp-content/uploads/2010/11/piggy-bank-red.jpg"><img class="alignleft size-medium wp-image-1226" title="piggy bank red" src="http://deepwebtechblog.com/wp-content/uploads/2010/11/piggy-bank-red-300x199.jpg" alt="" hspace="10" width="300" height="199" /></a><a href="http://www.deepwebtech.com/WebFeat/"><span style="font-weight: normal;">SWITCH TO DEEP WEB TECHNOLOGIES<br />
AND SAVE THOUSANDS</span></a></h1>
<p>Deep Web Technologies is offering a unique chance to replace existing WebFeat installations. The catch? You must be willing to save over $3,800 on next-generation federated search for your library! Now, with Deep Web Technologies&#8217; next-generation federated search, you can go deeper and faster than ever before!</p>
<p>Explorit, Deep Web Technologies’ federated search product, enables libraries to search 50+ academic sources simultaneously and quickly, and it includes value-added features such as alerts, reporting statistics and a customized interface. We are offering Explorit at $5,995 for existing WebFeat customers, a savings of <strong>$3,800</strong>.</p>
<p>With Explorit there are no hidden fees: just one low fixed price for the length of your contract, plus free lifetime upgrades. We don&#8217;t abandon our customers.</p>
<p><a href="http://www.deepwebtech.com/WebFeat/">Click here</a> for more information.</p>
]]></content:encoded>
			<wfw:commentRss>http://deepwebtechblog.com/exit-stage-right-webfeat-center-stage-explorit/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Art and Science of De-duping Results</title>
		<link>http://deepwebtechblog.com/art-and-science-of-de-duping-results/</link>
		<comments>http://deepwebtechblog.com/art-and-science-of-de-duping-results/#comments</comments>
		<pubDate>Fri, 27 Aug 2010 19:09:44 +0000</pubDate>
		<dc:creator>Brian Despain</dc:creator>
				<category><![CDATA[Features]]></category>
		<category><![CDATA[Federated Search]]></category>
		<category><![CDATA[de-dupe]]></category>
		<category><![CDATA[duplication]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[results]]></category>
		<category><![CDATA[The Deep Web]]></category>

		<guid isPermaLink="false">http://deepwebtechblog.com/?p=896</guid>
		<description><![CDATA[One of the critical problems in federated searching is de-duplication of results. Many sources contain the same journal articles and, clearly, presenting the same result multiple times isn&#8217;t useful to users. To solve this thorny problem, Deep Web Technologies has taken a flexible and configurable approach to de-duplication. The Explorit application de-dupes on multiple fields to [...]]]></description>
			<content:encoded><![CDATA[<p>One of the critical problems in federated searching is de-duplication of results.  Many sources contain the same journal articles and, <a href="http://deepwebtechblog.com/wp-content/uploads/2010/08/pluto10.jpg"><img class="alignright size-full wp-image-929" src="http://deepwebtechblog.com/wp-content/uploads/2010/08/pluto10.jpg" alt="" width="213" height="160" /></a>clearly, presenting the same result multiple times isn&#8217;t useful to users. To solve this thorny problem, Deep Web Technologies has taken a flexible and configurable approach to de-duplication. The Explorit application de-dupes on multiple fields to ensure that users don&#8217;t see duplicate results across multiple sources. The application has conditional logic that compares various fields to decide whether two results should be considered duplicates.</p>
<p>The de-duplication mechanism can compare multiple fields and combine the comparisons using Boolean logic. This means that field matches can be combined with Boolean operators, as in (field A OR field B) or (field C AND field D). If either condition is true, the result is determined to be a duplicate and is removed according to the source de-dupe order. Additional fields can be added to the mix to improve accuracy, for example (field A AND field N) OR (field B AND field D).</p>
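<p>One way to picture such a rule is as a list of field groups, where every field within a group must match (AND) and any one matching group is enough (OR). The sketch below is an illustration only: the function name <code>is_duplicate</code>, the generic field names and the choice to treat empty or missing fields as non-matching are assumptions, not a description of how Explorit is configured.</p>

```python
def is_duplicate(a, b, rules):
    """Return True if results a and b match under the given rules.

    rules is a list of field groups: OR across groups, AND within a group.
    A field counts as matching only when it is present and non-empty in
    both results and the values are equal (an assumption of this sketch).
    """
    def group_matches(group):
        return all(a.get(field) and a.get(field) == b.get(field) for field in group)

    return any(group_matches(group) for group in rules)


# (field_a) OR (field_b) OR (field_c AND field_d), with placeholder field names
rules = [("field_a",), ("field_b",), ("field_c", "field_d")]

r1 = {"field_a": "x", "field_b": "1", "field_c": "t", "field_d": "2010"}
r2 = {"field_a": "y", "field_b": "2", "field_c": "t", "field_d": "2010"}
r3 = {"field_a": "z", "field_b": "3", "field_c": "t", "field_d": "2011"}

print(is_duplicate(r1, r2, rules))  # True: field_c and field_d both match
print(is_duplicate(r1, r3, rules))  # False: no group matches completely
```

<p>Adding a field to a group makes that condition stricter, which is the sense in which extra fields can improve accuracy.</p>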
<p>De-duplication order specifies which sources take priority over others, and customers can set this order themselves. When results are determined to be duplicates, the copies from sources lower on the de-dupe list are removed. For example, suppose you have source A, source B, and source C in your federated search application. Source A has a de-dupe order of 1 (the highest priority), source B has a de-dupe order of 10, and source C has a de-dupe order of 5. If the same result is in both sources B and C, the result from source C will be displayed. If the same result is also in source A, only source A&#8217;s copy will be displayed.</p>
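<p>The source A, B and C example can be traced with a short Python sketch. It is a simplified illustration: the function name <code>dedupe</code> is hypothetical, and the duplicate test used here (identical titles) is just a placeholder for whatever field comparison is configured.</p>

```python
def dedupe(results, dedupe_order, is_duplicate):
    """Keep one copy of each duplicate: the copy from the source with the
    lowest de-dupe order number (1 is the highest priority)."""
    by_priority = sorted(results, key=lambda r: dedupe_order[r["source"]])
    kept = []
    for result in by_priority:
        if not any(is_duplicate(result, other) for other in kept):
            kept.append(result)
    return kept


def same_title(x, y):
    """Placeholder duplicate test used only for this example."""
    return x["title"] == y["title"]


dedupe_order = {"A": 1, "B": 10, "C": 5}
results = [
    {"source": "B", "title": "Fuel cells"},
    {"source": "C", "title": "Fuel cells"},
    {"source": "B", "title": "Solar cells"},
    {"source": "A", "title": "Solar cells"},
    {"source": "C", "title": "Solar cells"},
]
survivors = dedupe(results, dedupe_order, same_title)
print([(r["source"], r["title"]) for r in survivors])
# [('A', 'Solar cells'), ('C', 'Fuel cells')]
```

<p>The sketch returns survivors in priority order for simplicity; a real application would presumably re-rank them by relevance before display.</p>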
<p>We have found the following de-duplication rules most effective for our federated search applications. The application first checks the full text URL of the result. If two results have the same full text URL, they are assumed to be duplicates, and the application does not display the result from the source with the lower de-duplication priority. Next we check a combination of the title of the article and the publication date. If these two fields match, the results are considered duplicates and the lower priority results are removed.</p>
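<p>As a rough sketch, the two checks could be written as a single Python function. The field names <code>fulltext_url</code>, <code>title</code> and <code>pub_date</code> are hypothetical, the comparison is a plain exact match, and treating missing values as non-matching is an assumption made here so that two results without a URL are not merged by accident.</p>

```python
def same_article(a, b):
    """First compare full text URLs; failing that, compare title plus publication date."""
    url_a, url_b = a.get("fulltext_url"), b.get("fulltext_url")
    if url_a and url_a == url_b:
        return True
    title_a, title_b = a.get("title"), b.get("title")
    date_a, date_b = a.get("pub_date"), b.get("pub_date")
    return bool(title_a and date_a) and title_a == title_b and date_a == date_b


first = {"fulltext_url": "http://example.org/a.pdf", "title": "Fuel cells", "pub_date": "2010-08-01"}
mirror = {"fulltext_url": "http://example.org/a.pdf", "title": "Fuel Cells (preprint)", "pub_date": "2010-07-15"}
reindexed = {"fulltext_url": "http://example.net/x", "title": "Fuel cells", "pub_date": "2010-08-01"}
unrelated = {"fulltext_url": "", "title": "Fuel cells", "pub_date": "2009-01-01"}

print(same_article(first, mirror))     # True: same full text URL
print(same_article(first, reindexed))  # True: same title and publication date
print(same_article(first, unrelated))  # False: neither check matches
```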
<p>Finding the right balance of fields for de-duplication in a federated search application can be difficult, but Deep Web Technologies has the capability and the knowledge to determine which fields are best for your sources and your search needs.</p>
]]></content:encoded>
			<wfw:commentRss>http://deepwebtechblog.com/art-and-science-of-de-duping-results/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Reminiscing on a 12-Year Partnership with OSTI</title>
		<link>http://deepwebtechblog.com/reminiscing-on-a-12-year-partnership-with-osti/</link>
		<comments>http://deepwebtechblog.com/reminiscing-on-a-12-year-partnership-with-osti/#comments</comments>
		<pubDate>Wed, 25 Aug 2010 04:17:10 +0000</pubDate>
		<dc:creator>Abe</dc:creator>
				<category><![CDATA[Features]]></category>
		<category><![CDATA[Federated Search]]></category>
		<category><![CDATA[Multilingual Search]]></category>
		<category><![CDATA[View from Inside]]></category>

		<guid isPermaLink="false">http://deepwebtechblog.com/?p=901</guid>
		<description><![CDATA[This afternoon, I put aside an hour from yet another hectic day to read Dr. Walter Warnick’s article, “Federated Search as a Transformational Technology Enabling Knowledge Discovery: the Role of WorldWideScience.org.” This article by Dr. Warnick&#8211;or Walt to me&#8211;presents a wonderful overview of OSTI’s mission dating all the way back to 1947. OSTI (Department of Energy [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignnone size-medium wp-image-903" title="Blog Post Pic" src="http://deepwebtechblog.com/wp-content/uploads/2010/08/Blog-Post-Pic-300x139.jpg" alt="" width="300" height="139" /></p>
<p>This afternoon, I put aside an hour from yet another hectic day to read Dr. Walter Warnick’s article, “<a href="http://www.osti.gov/ILDS_38_2Warnick2010.pdf">Federated Search as a Transformational Technology Enabling Knowledge Discovery: the Role of WorldWideScience.org</a>.” This article by Dr. Warnick&#8211;or Walt to me&#8211;presents a wonderful overview of OSTI’s mission dating all the way back to 1947. OSTI (Department of Energy Office of Scientific and Technical Information), originally known as the Technical Information Division, was tasked with collecting and disseminating the wealth of non-classified research from the Manhattan Project.  Having lived in Los Alamos the past 15 years, where development of the atomic bomb took place, I’m very familiar with the history of the Manhattan Project and the reasons behind the creation of OSTI. Nevertheless, I found Walt’s article to be an informative and insightful read that provided a unique insider’s perspective.</p>
<p>Dr. Warnick talks quite a bit about the OSTI corollary, which asserts that accelerating the diffusion of scientific knowledge will accelerate the advancement of science.  In the 12 years that I have known him, it has been Dr. Warnick’s singular goal to do everything in his power to increase the speed of scientific discovery.  I know Walt to be a trail-blazer, highly respected among federal government employees for his dedication and leadership at OSTI.  He has made major strides towards making science more accessible to “science-attentive” citizens, researchers and students.</p>
<p>The article focuses on the major role played by OSTI in championing, supporting and adopting federated search, which is the enabling technology for WorldWideScience.org, Science.gov, DOE Science Accelerator and other sites developed and maintained by OSTI. Deep Web Technologies has benefitted greatly from our 12-year partnership with OSTI, which has supported the development of the Explorit federated search technology, motivated us to keep pushing the boundaries of federated search capabilities, and been an eager early adopter of our products.</p>
<p>In my next blog article, I will be highlighting a few of the many accomplishments achieved through our partnership with OSTI, so please stay tuned.</p>
]]></content:encoded>
			<wfw:commentRss>http://deepwebtechblog.com/reminiscing-on-a-12-year-partnership-with-osti/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Breaking Down the Language Barriers</title>
		<link>http://deepwebtechblog.com/breaking-down-the-language-barriers/</link>
		<comments>http://deepwebtechblog.com/breaking-down-the-language-barriers/#comments</comments>
		<pubDate>Wed, 30 Jun 2010 17:15:34 +0000</pubDate>
		<dc:creator>Abe</dc:creator>
				<category><![CDATA[Features]]></category>
		<category><![CDATA[Multilingual Search]]></category>

		<guid isPermaLink="false">http://deepwebtechblog.com/?p=860</guid>
		<description><![CDATA[It was an honor to attend and for my company to have played a key role in the launch of multilingual WorldWideScience.org in Helsinki this past June 11th. Beginning more than three years ago, the R&#38;D effort that ultimately resulted in the launch of our ground-breaking multilingual federated search capability involved plenty of hard work [...]]]></description>
			<content:encoded><![CDATA[<div id="attachment_862" class="wp-caption alignnone" style="width: 310px"><a href="http://deepwebtechblog.com/wp-content/uploads/2010/06/MWWS-photo11.jpg"><img class="size-medium wp-image-862  " title="Deep Web Partners in WWS Multilingual Launch" src="http://deepwebtechblog.com/wp-content/uploads/2010/06/MWWS-photo11-300x200.jpg" alt="" width="300" height="200" /></a><p class="wp-caption-text">Photo credit: Jakke Nikkarinen/STT Info Kuva Pictured, from left, Dr. Walter Warnick, U.S. Department of Energy Office of Scientific and Technical Information (OSTI) Director; Yuri Arskiy, All-Russian Institute of Scientific and Technical Information (VINITI) Director; Tony Hey, Microsoft Research Corporate Vice-President; Richard Boulderstone of the British Library and the WorldWideScience Alliance Chairman; and Wu Yishan, Institute of Scientific and Technical Information of China (ISTIC) Chief Engineer.</p></div>
<p>It was an honor to attend and for my company to have played a key role in the launch of multilingual WorldWideScience.org in Helsinki this past June 11<sup>th</sup>. Beginning more than three years ago, the R&amp;D effort that ultimately resulted in the launch of our ground-breaking multilingual federated search capability involved plenty of hard work by lots of folks at Deep Web Technologies. It certainly could not have been accomplished without our invaluable partnerships with the Department of Energy Office of Scientific and Technical Information (OSTI), the WorldWideScience Alliance, and Microsoft Research.<span id="more-860"></span></p>
<p>Multilingual WorldWideScience was launched at a special ceremony culminating the International Council for Scientific and Technical Information’s (ICSTI) Annual Conference titled <em><a href="http://www.vtt.fi/sites/icsti2010/icsti2010_welcome.jsp?lang=en">From Information to Innovation</a></em>, which was attended by several hundred people.</p>
<p>Dr. Walt Warnick, Director of OSTI, gave the <a href="http://worldwidescience.org/speeches/June2010/warnick_multi.html">keynote</a> presentation at the launch. As the driving force behind the creation of Science.gov and WorldWideScience.org, Dr. Warnick is someone with whom I have had the pleasure of working closely in the past decade, and he has continually pushed my company to advance to the next level in state-of-the-art Federated Search.</p>
<p>Dr. Warnick started his presentation with a quote from Sir Isaac Newton (written on Feb 15, 1676 to fellow British researcher Robert Hooke):</p>
<p style="text-align: center;">“If I have seen further it is only by standing on the shoulders of giants.”</p>
<p>Presenting his vision of accelerating access to worldwide scientific information in order to advance scientific discovery, Dr. Warnick talked about the significant role that multilingual search can play in providing non-English speakers with translated access to research in languages other than their own, and English speakers with access to the ever-increasing body of non-English scientific content.</p>
<p>After Dr. Warnick’s opening remarks, I had the opportunity to demo and explain how multilingual WorldWideScience works to the Conference attendees. Rather than go into detail about my demonstration, I’d like to point you to a wonderful review of multilingual WorldWideScience written by <a href="http://intellogist.wordpress.com/2010/06/14/conduct-a-global-literature-search-in-seconds/">Kristin Whitman – Conduct a global literature search in seconds!</a></p>
<p>Next up was Richard Boulderstone, Chairman of the WorldWideScience Alliance and Director of eStrategy at the British Library. He spoke on the growth and significance of WorldWideScience:</p>
<p style="text-align: center;"><em>“Since its launch in 2007 WorldWideScience.org has grown at an absolutely phenomenal rate, providing researchers with easy access to the publicly funded research output of 65 different countries from around the world. Fast becoming a key resource for researchers around the world, these new search and translation tools are absolutely essential to opening up research and enabling the global scientific community to share knowledge in the pursuit of progress.”</em></p>
<p><em> </em></p>
<p>Tony Hey, Corporate Vice-President of External Research for Microsoft, next talked about the significance of Microsoft’s partnership with the WorldWideScience.org Alliance:</p>
<p style="text-align: center;"><em>“The launch of multilingual WorldWideScience.org adds yet another resource that we can all leverage in support of collaborative relationships. Those relationships, in turn, expedite our ability to drive research that has the power to improve lives around the world. All of us at Microsoft Research look forward to more meaningful contributions to multilingual WorldWideScience.org to make the world’s scientific and technical information globally accessible. It has been an honor to be involved in this groundbreaking project.”</em></p>
<p><em> </em></p>
<p>Following Tony Hey, Wu Yishan, Chief Engineer of the Institute of Scientific and Technical Information of China (ISTIC), addressed the audience and commented on the volume of scientific literature being published in Chinese and the growing need for multilingual searching:</p>
<p><em> </em></p>
<p style="text-align: center;"><em>“In 2008, while Chinese scholars published 110,000 papers on international journals recorded by SCI, they also published 470,000 papers on domestic Chinese journals. Without accessing these 470,000 papers, it is impossible to obtain a realistic feeling about the thrust of scientific and technological advancement in China. Therefore, the need for mutual translation between English and Chinese and for cross-language retrieval is increasingly urgent.”</em></p>
<p>The final remarks at this event were made by Yuri Arskiy, Director of the All-Russian Institute of Scientific and Technical Information, who spoke through a translator. The unavailability of Russian scientific research in English was a major impetus for the development of multilingual WorldWideScience. Dr. Arskiy addressed the many challenges in searching multiple databases, such as different specifications and software platforms and various classification systems across scientific disciplines. WorldWideScience.org&#8217;s federated search technology overcomes these challenges and will make Russian science results more accessible than ever before.</p>
<p>Finally, Wu Yishan returned to the stage to announce that ISTIC will be hosting the next ICSTI Annual Conference in Beijing in June 2011. If I’m going to have a reason to attend the ICSTI Conference in Beijing next year, my team at DWT, in collaboration with our partners at OSTI and the WorldWideScience Alliance, had better get busy planning and implementing the next round of enhancements to WorldWideScience.</p>
]]></content:encoded>
			<wfw:commentRss>http://deepwebtechblog.com/breaking-down-the-language-barriers/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Clusters That Think</title>
		<link>http://deepwebtechblog.com/clusters-that-think/</link>
		<comments>http://deepwebtechblog.com/clusters-that-think/#comments</comments>
		<pubDate>Thu, 03 Jun 2010 23:50:37 +0000</pubDate>
		<dc:creator>Brian Despain</dc:creator>
				<category><![CDATA[Features]]></category>
		<category><![CDATA[clustering]]></category>
		<category><![CDATA[deep web analysis]]></category>
		<category><![CDATA[Deep Web Clustering]]></category>
		<category><![CDATA[deep web semantic]]></category>
		<category><![CDATA[latent semantic analysis]]></category>
		<category><![CDATA[latent semantic index]]></category>
		<category><![CDATA[LSA]]></category>
		<category><![CDATA[LSA clustering]]></category>

		<guid isPermaLink="false">http://deepwebtechblog.com/?p=619</guid>
		<description><![CDATA[One of the most interesting features of our Explorit search product is our clustering engine, which analyzes results and produces &#8220;clusters&#8221; that represent a new and powerful way to navigate search results. The true power of these clusters is often overlooked, for they superficially resemble the output generated by the keyword-based systems and fixed taxonomies [...]]]></description>
			<content:encoded><![CDATA[<p>One of the most interesting features of our Explorit search product is our clustering engine, which analyzes results and produces &#8220;clusters&#8221; that represent a new and powerful way to navigate search results. The true power of these clusters is often overlooked, for they superficially resemble the output generated by the keyword-based systems and fixed taxonomies of other search engines. Our clustering technology, however, is more akin to a document-discovery engine, which provides a significant improvement over the alternatives in the library world.</p>
<p>The Explorit engine provides a unique approach to clustering taken from <a title="Learn more about LSA" href="http://en.wikipedia.org/wiki/Latent_semantic_analysis">Latent Semantic Analysis</a> (or LSA). We took a look at some of the traditional methods of taxonomy generation (e.g. learning approaches, semantic knowledge bases, and word nets) and, after carefully examining their advantages and shortcomings, we chose latent semantic analysis and a <em>&#8220;description comes first&#8221;</em> approach to provide a rich result analysis tool for customers. LSA is a fully automatic mathematical/statistical technique for extracting and inferring relations of contextual usage of words in search results. This technology provides a concept-based approach to analyzing and clustering results from a result set. Applying the LSA approach, our clustering engine analyzes the relationships between a set of documents and the terms contained within the documents to produce a set of concepts related to the results. In other words, our search engines can generate more sophisticated and nuanced result clusters, which will help to cut down on the time and tries it takes for users to find the desired information.</p>
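<p>To make the idea concrete, here is a minimal sketch of textbook LSA in Python with NumPy. It is not the Explorit implementation: the sample documents, the choice of two latent dimensions, the similarity threshold and the greedy grouping step are all assumptions made for illustration.</p>

```python
import numpy as np

# Hypothetical mini result set; real input would be titles and snippets from a search.
docs = [
    "satellite antenna pointing control",
    "antenna pointing error for satellite links",
    "protein folding simulation methods",
    "simulation of protein structure folding",
]

# Build the term-document count matrix (rows = terms, columns = documents).
vocab = sorted({word for doc in docs for word in doc.split()})
row = {word: i for i, word in enumerate(vocab)}
counts = np.zeros((len(vocab), len(docs)))
for j, doc in enumerate(docs):
    for word in doc.split():
        counts[row[word], j] += 1.0

# Truncated SVD: keep only the k strongest latent dimensions ("concepts").
k = 2
u, s, vt = np.linalg.svd(counts, full_matrices=False)
doc_vectors = (np.diag(s[:k]) @ vt[:k]).T  # one k-dimensional vector per document

def cosine(a, b):
    """Cosine similarity between two document vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def group_documents(vectors, threshold=0.5):
    """Greedy grouping: join the first group whose seed document is similar enough."""
    groups = []
    for i, vec in enumerate(vectors):
        for group in groups:
            if cosine(vec, vectors[group[0]]) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups

groups = group_documents(doc_vectors)
print(groups)
```

<p>In this toy case the two satellite documents land in one group and the two protein documents in another, because each pair lines up with its own latent dimension. A production system would also need weighting, label generation and far larger inputs.</p>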
<h2>More Meaningful Searches, Superior Cluster Results</h2>
<p>A solid introduction to LSA can be found in the study, <a title="Read the Study" href="http://lsa.colorado.edu/papers/dp1.LSAintro.pdf">An Introduction to Latent Semantic Analysis</a>, by Landauer, Foltz and Laham.</p>
<blockquote><p>Latent Semantic Analysis (LSA) is a theory and method for extracting and representing the contextual-usage meaning of words by statistical computations applied to a large corpus of text (Landauer and Dumais, 1997). The underlying idea is that the aggregate of all the word contexts in which a given word does and does not appear provides a set of mutual constraints that largely determines the similarity of meaning of words and sets of words to each other. The adequacy of LSA’s reflection of human knowledge has been established in a variety of ways. For example, its scores overlap those of humans on standard vocabulary and subject matter tests; it mimics human word sorting and category judgments; it simulates word–word and passage–word lexical priming data . . .</p></blockquote>
<p>This means our clusters, leveraging the concepts behind LSA, actually discover relationships in the results and present them in a way that mimics the way users actually think. The superior quality of our clusters can perhaps best be demonstrated by comparing them to those of one of our competitors. Consider a search for <em>&#8220;satellite communication&#8221;:</em></p>
<div style="padding: 0px 0px 0px 10px;float: right;width: 169px;text-align: center"><strong>Our Clusters</strong><br />
<img class="alignleft size-full wp-image-644" src="http://deepwebtechblog.com/wp-content/uploads/2010/05/scitopia_clustering.png" alt="" width="169" height="723" /></div>
<div style="padding: 0px 0px 0px 10px;float: right;width: 207px;text-align: center"><strong>Competitor&#8217;s Clusters</strong><br />
<img class="alignright size-full wp-image-643" src="http://deepwebtechblog.com/wp-content/uploads/2010/05/summon_dartmouth.png" alt="" width="207" height="223" /></div>
<p>As you can see, our clusters (on the far right) provide far more meaningful results. The top cluster term provided by our competitor is &#8220;studies,&#8221; which provides no concrete information about the documents the set contains. Additionally, synonymous terms such as &#8220;United States&#8221; and &#8220;US&#8221; are treated as separate keywords by our competitor, which forces the user to sort through them manually to find what they are looking for. With our LSA-based clustering, results tend to be more relevant and more narrowly focused, with stop words removed from the cluster results. A user interested in &#8220;satellite communications pointing systems,&#8221; for example, can easily find the articles they are looking for with our clusters, while end users of the competition will no doubt have to run another search.</p>
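<p>The two problems noted above, split synonyms and uninformative labels, can be illustrated with a short Python sketch. The alias table and stop-word list here are hypothetical examples, not the lists Explorit actually uses.</p>

```python
from collections import Counter

# Hypothetical alias table and stop-word list, for illustration only.
ALIASES = {"us": "united states", "u.s.": "united states", "usa": "united states"}
STOP_WORDS = {"studies", "analysis", "research"}

def normalize_label(label):
    """Lowercase a label, collapse whitespace, and map known aliases to one form."""
    key = " ".join(label.lower().split())
    return ALIASES.get(key, key)

def count_labels(labels):
    """Count labels after merging aliases and dropping uninformative terms."""
    counts = Counter()
    for label in labels:
        canonical = normalize_label(label)
        if canonical not in STOP_WORDS:
            counts[canonical] += 1
    return counts

raw = ["United States", "US", "Studies", "Pointing Systems", "studies", "U.S."]
merged = count_labels(raw)
print(merged.most_common())
```

<p>With these assumed lists, the three spellings of the country collapse into a single label and &#8220;studies&#8221; disappears from the output entirely.</p>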
<h2>Users Think in Concepts, not Keywords</h2>
<p>Our approach utilizes the entire set of search results and performs an LSA-type analysis, which helps reduce the cluster size and provide more granular results.  Users can control cluster breadth (i.e. maximum number of top level clusters), cluster depth (i.e. maximum number of hierarchical levels), cluster arrangement (i.e. alphabetically or by occurrence), and cluster size.  This means that the type of clustering can be configured to match the data sources in the federated search, narrowing or broadening clusters as desired. Simple keyword-based clustering cannot be customized in these ways. The Explorit approach matches the way that users actually think&#8211; which is in concepts, not keywords.</p>
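<p>As a rough illustration of how such controls might act on a cluster tree, consider the Python sketch below. The class and field names are hypothetical, and reading &#8220;cluster size&#8221; as a minimum document count is an assumption; the actual Explorit settings are not shown in this post.</p>

```python
from dataclasses import dataclass, field

@dataclass
class Cluster:
    label: str
    doc_count: int
    children: list = field(default_factory=list)

@dataclass
class ClusterSettings:
    # Hypothetical names for the four controls described above.
    max_top_level: int = 10          # breadth
    max_depth: int = 2               # depth
    arrangement: str = "occurrence"  # or "alphabetical"
    min_docs: int = 1                # one possible reading of "cluster size"

def apply_settings(clusters, settings, level=1):
    """Return a pruned, ordered copy of a cluster tree."""
    kept = [c for c in clusters if c.doc_count >= settings.min_docs]
    if settings.arrangement == "alphabetical":
        kept.sort(key=lambda c: c.label.lower())
    else:
        kept.sort(key=lambda c: c.doc_count, reverse=True)
    if level == 1:
        kept = kept[: settings.max_top_level]
    result = []
    for c in kept:
        if level >= settings.max_depth:
            children = []  # deepest allowed level reached, drop anything below
        else:
            children = apply_settings(c.children, settings, level + 1)
        result.append(Cluster(c.label, c.doc_count, children))
    return result

tree = [
    Cluster("Pointing systems", 12,
            [Cluster("Antenna control", 7), Cluster("Error models", 1)]),
    Cluster("Ground stations", 30),
    Cluster("Launch costs", 2),
]
settings = ClusterSettings(max_top_level=2, max_depth=2, min_docs=2)
view = apply_settings(tree, settings)
for cluster in view:
    print(cluster.label, [child.label for child in cluster.children])
```

<p>Here the breadth limit keeps the two largest top-level clusters, the size limit drops the one-document sub-cluster, and the depth limit stops the tree at two levels.</p>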
<p>The clusters produced by our search engines can be enhanced and customized by utilizing synonyms (i.e. word aliases), label filtering (i.e. excluding offensive words), label boosting (i.e. promoting terms), and more. At Deep Web Technologies, we can tailor many of these settings per client request to create the best possible user experience for any project.</p>
<h2>Benefits of Explorit&#8217;s LSA-Based Clusters over Traditional Taxonomy Methods</h2>
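<p>Label filtering and boosting can be pictured as a small post-processing step over scored labels. The following Python sketch uses a hypothetical blocked-word list, boost table and scoring scheme; none of these names or values come from Explorit itself.</p>

```python
# Hypothetical per-client settings, for illustration only.
BLOCKED_WORDS = {"badword"}
BOOSTS = {"pointing systems": 2.0}

def rank_labels(scored_labels):
    """Drop labels containing blocked words, apply boosts, and sort by final score."""
    ranked = []
    for label, score in scored_labels.items():
        words = set(label.lower().split())
        if words.intersection(BLOCKED_WORDS):
            continue  # label filtering
        boost = BOOSTS.get(label.lower(), 1.0)  # label boosting
        ranked.append((label, score * boost))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked

scores = {"Ground stations": 5.0, "Pointing systems": 3.0, "Badword interference": 4.0}
ranked = rank_labels(scores)
print(ranked)
```

<p>In this example the blocked label is removed and the boosted label moves ahead of one that originally scored higher.</p>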
<ul>
<li>Clusters that reveal the concepts contained within the results, not just keywords</li>
<li>Natural language clusters, not keyword snippets</li>
<li>Discovery of concepts across disparate collections, journals and ebooks</li>
<li>Customization of synonyms for concepts</li>
<li>Tailored approach for unique settings (i.e. label filtering, boosting, sorting and more)</li>
</ul>
<p>Our clustering solution provides capabilities far beyond a simple keyword-based system&#8211; it provides significant insight into the result set itself through the use of semantic analysis.  This approach allows users to employ Explorit as a true discovery tool, identifying relationships between documents contained across multiple collections and sources.  With our <em>&#8220;deeper, richer&#8221;</em> snippets approach to searching, the deep semantic discovery engine presents users with a more efficient and more powerful way to research.</p>
<p>For more information on our clustering capability and/or LSA, you may be interested in the following studies:</p>
<ul>
<li><a href="http://project.carrot2.org/publications.html">Related LSA Cluster papers</a></li>
<li><a href="http://www.cs.put.poznan.pl/dweiss/site/publications/download/iipwm-osinski-weiss-stefanowski-2004-lingo.pdf">Lingo: Search Results Clustering Algorithm Based on Singular Value Decomposition</a></li>
<li><a href="http://stanislaw.osinski.name/papers/pdf/osinski-ecir2006.pdf">Improving Quality of Search Results Clustering with Approximate Matrix Factorisations</a></li>
<li><a href="http://www.computer.org/portal/site/intelligent/">A Concept-Driven Algorithm for Clustering Search Results.</a></li>
</ul>
<p>For a real-world view of our clusters in action, you may be interested in one or more publicly available research portals below:</p>
<ul>
<li><a title="Visit mednar" href="http://www.mednar.com/">www.mednar.com</a></li>
<li><a title="Visit the Science.gov alliance" href="http://www.science.gov/">www.science.gov</a></li>
<li><a title="Visit scitopia.org" href="http://www.scitopia.org/">www.scitopia.org</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://deepwebtechblog.com/clusters-that-think/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>

