ED456863 2001-10-00 Uncovering the Hidden Web, Part I: Finding What the Search Engines Don't. ERIC Digest.

ERIC Identifier: ED456863
Publication Date: 2001-10-00
Author: Mardis, Marcia
Source: ERIC Clearinghouse on Information and Technology Syracuse NY.

Uncovering the Hidden Web, Part I: Finding What the Search Engines Don't. ERIC Digest.

THIS DIGEST WAS CREATED BY ERIC, THE EDUCATIONAL RESOURCES INFORMATION CENTER. FOR MORE INFORMATION ABOUT ERIC, CONTACT ACCESS ERIC 1-800-LET-ERIC

Currently, the World Wide Web contains an estimated 7.4 million sites (OCLC, 2001). Yet even the most experienced searcher, using the most robust search engines, can access only about 16% of these pages (Dahn, 2000). The other 84% of the publicly available information on the Web is referred to as the "hidden," "invisible," or "deep" Web.

Despite the explosion in Web content, commonly used search processes have not changed significantly since the Web's inception. Information is commonly found now as it was ten years ago, with directories and search engines. But the ever-quickening pace of the World Wide Web's growth demands an expanded set of search tools and skills. This article provides tips on augmenting traditional search techniques with knowledge of the hidden Web, helping readers to access some of the Web's most valuable content.

THE WRATH OF THE MATH

Recent estimates place the size of the hidden Web at about 500 times larger than the known "surface" Web indexed by search engines. There are billions of documents obscured in databases, written in non-HTML formats, and hosted through non-HTTP means. According to experts (Bergman, 2000), the hidden Web comprises:

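* Nearly 550 billion individual documents

* The largest growing category of new information on the Internet

* Content that is highly relevant to every information need, market and domain

* More focused content than surface Web sites

* Total quality content that is up to 2,000 times greater than that of the surface Web

* 95% publicly accessible information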
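
The 500-to-1 size estimate quoted above can be restated as a share of the whole Web. The following is only a rough sketch: it assumes "500 times larger" means 500 units of hidden content for every one unit of surface content, and the function name `surface_share` is hypothetical, introduced here for illustration.

```python
# Assumption: "500 times larger" is read as hidden = 500 x surface.
HIDDEN_TO_SURFACE_RATIO = 500


def surface_share(ratio: float) -> float:
    """Return the fraction of all Web content that sits on the surface Web,
    given how many times larger the hidden Web is than the surface Web."""
    return 1 / (1 + ratio)


if __name__ == "__main__":
    share = surface_share(HIDDEN_TO_SURFACE_RATIO)
    print(f"Surface Web share of all content: {share:.2%}")
```

Under that reading, the surface Web is one part in 501 of the total, which is why even a search engine with a complete surface index would reach only a small slice of what is available.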

While Web directories are obviously constrained by human limits, search engines fail because they primarily index documents written in HTML. Spiders cannot index pages generated dynamically, like those in Microsoft's Searchable Knowledge Base, or documents written using methods like Adobe Acrobat, Active Server Pages, or Cold Fusion. Likewise, database contents are excluded from the indexing process; spiders cannot transform search terms into database queries or complete a login process. And, in many instances, protocols other than HTTP (e.g., FTP, gopher) are excluded.
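
The reasons listed above can be sketched as a few rough checks on a URL. This is an illustration only, not how any particular search engine works: the file extensions standing in for Acrobat, Active Server Pages, and Cold Fusion, the treatment of a query string as a sign of a database lookup, the function name `why_hidden`, and the example.org addresses are all assumptions made for the example.

```python
from typing import List
from urllib.parse import urlparse

# Assumption: these extensions stand in for the formats named in the text
# (Adobe Acrobat, Active Server Pages, Cold Fusion). Real spiders differ.
NON_HTML_EXTENSIONS = (".pdf", ".asp", ".cfm")


def why_hidden(url: str) -> List[str]:
    """Return the reasons a simple HTML-only spider might skip this URL.

    An empty list means none of the rough checks fired.
    """
    parts = urlparse(url)
    reasons = []

    # Protocols other than HTTP (e.g., FTP, gopher) are often excluded.
    if parts.scheme.lower() not in ("http", "https"):
        reasons.append("non-HTTP protocol")

    # Acrobat files and server-generated pages are not static HTML.
    if parts.path.lower().endswith(NON_HTML_EXTENSIONS):
        reasons.append("non-HTML or dynamically generated page")

    # A query string usually means the page is built from a database lookup.
    if parts.query:
        reasons.append("database query")

    return reasons


if __name__ == "__main__":
    samples = [
        "http://example.org/index.html",
        "http://example.org/report.pdf",
        "http://example.org/kb/search.asp?q=printing",
        "ftp://example.org/pub/data.txt",
    ]
    for url in samples:
        print(url, "->", why_hidden(url) or ["likely indexable"])
```

Pages behind a login cannot be detected from the URL alone, so that case is left out of the sketch.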

FINDING THE HIDDEN WEB

The first step to accessing the hidden Web is much like that of other search processes: use familiar and reliable resources. Although directories have limitations as primary search tools, directory categories often contain hidden Web databases. Also, professional journals and magazines provide a wealth of current knowledge; look for reviews of new reference tools and subject directories. In addition to these basic steps, Web-based and desktop solutions are available to access the hidden Web.

With over 7,000 topic-specific databases, there is no way to access every hidden Web resource. But Web-based gateways, collections, and desktop tools point to specialized databases. These tools are most effective when a few of them are used regularly and integrated into an overall search strategy.

A SMATTERING OF SOLUTIONS

* Around the Web in 80 Sites: The Best of the Invisible Web (http://websearch.about.com/library/blow2000.htm) The search gurus at About.com created this list of hidden Web resources, strong in categorization and expert selection.

* LexiBot (http://www.lexibot.com/) Desktop software that is able to make dozens of queries simultaneously. Surface and hidden Web results are tested for dead links and presented in a format that allows previewing or Web browser viewing. Made for PCs only, this tool is free to try.

* Searchability (http://www.searchability.com)

REFERENCES

Dahn, M. (2000, January/February). Counting angels on a pinhead: Critically interpreting web size estimates. "Online," 35-40.

Diaz, K. (2000). The invisible Web: Navigating the Web outside traditional search engines. "Reference & User Services Quarterly," 40 (2), 131-134.

Ensor, P. (2001, June 14). "Toolkit for the expert web searcher." Library Information Technology Association. Retrieved August 15, 2001, from the World Wide Web: http://www.lita.org/committe/toptech/toolkit.htm

OCLC (Online Computer Library Center). (2001, July 13). "Statistics." Online Computer Library Center, Inc. Retrieved August 15, 2001, from the World Wide Web: http://wcp.oclc.org/stats.html

O'Leary, M. (2000, January). Invisible Web uncovers hidden treasures. "Information Today," 16-18.

Price, G., & Sherman, C. (2001, July/August). Exploring the invisible Web. "Online," 32-34.

Price, G., & Sherman, C. (2001). "The invisible Web: Uncovering information sources search engines can't see." CyberAge Books.

Sherman, C. (2000, n.d.). "Worth a look: Searching the invisible Web." About.com. Retrieved August 15, 2001, from the World Wide Web: http://websearch.about.com/library/searchwiz/bl_invisibleweb_apra.htm

Sherman, C., & Price, G. (2001). The invisible Web. "Searcher," 9 (6), 62-74.

Snow, B. (2000, May). The Internet's hidden content and how to find it. "Online," 24. (EJ 613 396).

ABOUT THE AUTHOR

Marcia Mardis, MILS, a former K-12 media specialist, is Program Coordinator and Internet Media Specialist at the Center to Support Technology in Education at Merit Network, Inc. She presents on Web searching issues at conferences around the country and writes frequently on K-12 use of the Internet.

-----

ERIC Digests are in the public domain and may be freely reproduced and disseminated.

-----

ERIC Clearinghouse on Information & Technology, Syracuse University, 621 Skytop Road, Suite 160, Syracuse, NY 13244-5290; 800-464-9107; 315-443-3640; Fax: 315-443-5448; e-mail: eric@ericit.org

-----

This publication is funded in part with Federal funds from the U.S. Department of Education under contract number ED-99-CO-0005. The content of this publication does not necessarily reflect the views or policies of the U.S. Department of Education nor does mention of trade names, commercial products, or organizations imply endorsement by the U.S. government. Visit the Department of Education's Web site at http://www.ed.gov

Title: Uncovering the Hidden Web, Part I: Finding What the Search Engines Don't. ERIC Digest.
Note: For part II, see IR 058 326.
Document Type: Information Analyses---ERIC Information Analysis Products (IAPs) (071); Information Analyses---ERIC Digests (Selected) in Full Text (073);
Available From: ERIC Clearinghouse on Information & Technology, Syracuse University, 621 Skytop Rd., Suite 160, Syracuse, NY 13244-5290. Tel: 315-443-3640; Tel: 800-464-9107 (Toll Free); Fax: 315-443-5448; e-mail: eric@ericit.org; Web site: http://ericit.org/ithome.
Descriptors: Access to Information, Databases, Information Retrieval, Information Sources, Online Searching, Search Strategies, World Wide Web
Identifiers: Search Engines

###