International Journal of Scientometrics, Informetrics and Bibliometrics
ISSN 1137-5019

 

 

FIRST GENERATION TOOLS:
INVISIBLE WEB

Invisible Internet or Infranet

Search engine robots fail to collect a very important part of the contents of the Web, so that information is not indexed in their databases and is therefore invisible. The reason is that robots download only the contents of the pages themselves, not the data obtained through a gateway or protected by a password.

The great library catalogues; full-text, factual and numeric databases; many electronic journals that require registration to access their contents; and other equally interesting data are not indexed by the search engines. The volume of the invisible Internet, or infranet, is very large, so tools are needed to recover at least some of this information.
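The behaviour described above can be illustrated with a minimal sketch. It assumes a toy, in-memory site (the `SITE` and `PROTECTED` names and all page paths are hypothetical, not taken from any real robot) and a robot that discovers pages only by following links. It is an illustration of the idea, not a description of how any particular search engine works.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href targets of <a> tags, as a simple robot would."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only hyperlinks are followed; <form> elements are ignored.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical toy site: page path -> HTML content.
SITE = {
    "/": '<a href="/about">About</a> <form action="/search"><input name="q"></form>',
    "/about": '<a href="/">Home</a> <a href="/members">Members</a>',
    "/search?q=z39.50": "<p>Record reachable only by submitting the form</p>",
    "/members": "<p>Subscriber content</p>",
}
PROTECTED = {"/members"}  # hypothetical password-protected pages


def crawl(start="/"):
    """Breadth-first crawl that follows links and skips protected pages."""
    seen, queue = set(), [start]
    while queue:
        path = queue.pop(0)
        if path in seen or path not in SITE or path in PROTECTED:
            continue
        seen.add(path)
        parser = LinkExtractor()
        parser.feed(SITE[path])
        queue.extend(parser.links)
    return seen


visible = crawl()
invisible = set(SITE) - visible
print("indexed:", sorted(visible))
print("invisible:", sorted(invisible))
```

In this sketch the gateway result page and the password-protected page are never reached, because no link leads the robot to them.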

Classification

Excluded multimedia invisible web

Webliography

Sullivan, Danny (2000). "Invisible Web Gets Deeper". The Search Engine Report. Aug. 2, 2000.

Lynch, Clifford A. (1997). "Z39.50 Information Retrieval Standard".

1. Directories and compilations of invisible resources

Far from offering exhaustive coverage, these directories offer lists of heterogeneous databases organized into categories.

Name | URL | Description
Search | www.search.com | Large number of databases in a well-conceived classification, with direct gateways to some of them.
Internet Sleuth | www.isleuth.com; www.thebighub.com | One of our favorite reference sites. New URL, with the databases slightly hidden in the middle of the page (Feb. 2000).
Invisible Web Catalog | dir.lycos.com | Large: over 9,000 databases. Although part of the Lycos directory, it is the result of an independent effort (Invisible Web) bought by the portal. Perhaps not as well maintained as before. Now embedded in the categories; look for "Searchable Databases" entries (May 2000).
Invisible Web | www.profusion.com | Very large. This is probably the directory that was bought by Lycos. It seems to be updated regularly.
Easy Searcher 2 | www.easysearcher.com
Internet invisible | www.internetinvisible.com | Updated frequently. Spanish resources.
Search-It-All | www.searchitall.net | Not very large, but updated recently (Feb. 2000).
LincOn (ex-Edison) | www.lincon.com/srclist.htm | As usual, interesting but very heterogeneous and not exhaustive.
Galileo | www.peachnet.edu/cgi-bin/intres.cgi
Abyan | www.abyan.com | Includes invisible resources, but it is mainly a directory of directories.
Langenberg | www.langenberg.com | Useful links, including some gems. Not very large.
Search Broker | webglimpse.net/sb
Columbia University Digital Library Collections | wwwapp.cc.columbia.edu/ldpd/app/rti/ | Another heterogeneous list, with links to the websites of the main database producers.
The Invisible Web | www.searchwise.net/p/iw-fla2003.htm
Invisible-web.net | www.invisible-web.net
Invisible Web UC Berkeley | www.lib.berkeley.edu/TeachingLib/Guides/Internet/InvisibleWeb.html
FindLaw Law Crawler | lawcrawler.findlaw.com
Complete Planet | www.completeplanet.com
Scirus | www.scirus.com
Singingfish | search.singingfish.com
Animal Search | animalsearch.net
Educator's Reference Desk | www.eduref.org
LookSmart's FindArticles | www.findarticles.com
Directory of Open Access Journals | www.doaj.org
IncyWincy | www.incywincy.com
Invisible Web Search Engines | library.trinity.wa.edu.au/library/invis/invisible.htm
Academic Index | www.academicindex.net
SearchLight Social Sciences / Humanities | searchlight.cdlib.org/cgi-bin/searchlight?SSH
SearchLight Sciences / Technologies | searchlight.cdlib.org/cgi-bin/searchlight?Science
Discovering The Invisible Web | www.lakenet.org/net_ref/manuals/invisible.html
The Invisible (Deep) Web Introduction | www.angelfire.com/az3/info-center/Search/invisible_web.html
Library Servers via WWW | sunsite.berkeley.edu/Libweb
Turbo 10 | turbo10.com

2.1. Preprints and other document repositories



2.2. Directories of Electronic Journals

This is a very important topic for scientometric work, as the number of e-journals is increasing and many of them are beginning to exchange citations between electronic and paper media. This is a very basic list of the main directories of e-journals available.

Name | URL | Description
Virtual Library | www.edoc.com/ejournal | DISCONTINUED. Do you know where this comprehensive directory is available now?
NewJour | gort.ucsd.edu/newjour | Mainly new journals, but also older ones transferred recently.
Australian Journals | www.nla.gov.au/ajol | At the moment the coverage of Australian journals seems very good.
Voice of the Shuttle | vos.ucsb.edu | Good coverage of humanities journals.
Electronic Journal Access | ejournal.coalliance.org | A very large collection, periodically updated and easy to browse.
Highwire Press | highwire.stanford.edu | A very well designed interface to some of the best paper journals that have an electronic version.
Selection of Nordic WWW Journals | www.vtt.fi/inf/nordep/projects/webpilot/journals
Full-Text Archives of Scholarly Society Serial Publications | www.lib.uwaterloo.ca/society/full-text_soc.html
Ejournal SiteGuide : a MetaSource | www.library.ubc.ca/ejour/abc.html
Ent'revues | www.enes.org
Electronic Journals Resource Directory | library.usask.ca/~scottp/links
e-journals | www.e-journals.org
Pub-list | www.publist.com
Directories of Electronic Journals | www.ukoln.ac.uk/isg/hyperjournal/director.htm
Social Sciences Virtual Library | www.clas.ufl.edu/users/gthursby/socsci/ejournal.htm
Online Book Pages | http://onlinebooks.library.upenn.edu
Directory of Electronic Journals | www.arl.org/scomm/edir/archive.html
CiteSeer | citeseer.ist.psu.edu


3. Protocol Z39.50

The Z39.50 protocol gives access to a large part of the invisible Internet, including the possibility of searching several catalogues simultaneously, even when they run different software. See the section about this topic in "second generation tools".
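The idea of one query sent simultaneously to several catalogues running different software can be sketched as follows. This is not an implementation of the Z39.50 protocol itself: the two catalogues, their record layouts, the titles and the function names are all hypothetical, and the sketch only shows how a common search interface can hide differences between systems.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory catalogues standing in for remote library OPACs.
# They deliberately store records in different shapes ("different software").
CATALOGUE_A = [{"title": "Bibliometrics Handbook"}, {"title": "Cataloguing Rules"}]
CATALOGUE_B = [("Introduction to Informetrics", 1990), ("Web Indexing", 1999)]


def search_a(term):
    """Search catalogue A, whose records are dictionaries."""
    return [r["title"] for r in CATALOGUE_A if term.lower() in r["title"].lower()]


def search_b(term):
    """Search catalogue B, whose records are (title, year) tuples."""
    return [title for title, _year in CATALOGUE_B if term.lower() in title.lower()]


# Each target exposes the same call signature, whatever its internal format.
TARGETS = {"Library A": search_a, "Library B": search_b}


def broadcast_search(term):
    """Send the same query to every target in parallel and group the hits."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, term) for name, fn in TARGETS.items()}
        return {name: future.result() for name, future in futures.items()}


results = broadcast_search("metrics")
for library, titles in results.items():
    print(library, titles)
```

A real client would replace the two search functions with network calls to Z39.50 targets; the fan-out and merge step would stay much the same.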

Name | URL | Description
WebCats | www.libdex.com | A very comprehensive collection of OPACs accessible through WWW gateways from all over the world. Organised by region, library type and automation system.
Library of Congress | loc.gov/z3950 | The central repository of information about the standard.
National Library of Canada | novanet.ns.ca/vCucadm.html | Canada is increasing its Z39.50 coverage in order to build a large Virtual Library.
National Library of Australia | www.nla.gov.au/libraries | Very useful because it offers updated material from Australia.
Index Data | www.indexdata.dk/targettest | This server and Bookwhere's (Seachange) are the main sources for Z39.50 configuration files.
MultiOpac | www.multiopac.com | Italian system with a European vocation. Includes not only Z39.50 sites but also more than 50 OPACs.
Karlsruhe Virtueller Katalog | www.ubka.uni-karlsruhe.de/kvk.html | Germany. About 20 European libraries, but it is not a Z39.50 product.
CIC | cicvel.lib.uiowa.edu | For searching the OPACs of the ten largest US libraries.
ANSI/NISO Z39.50 Protocol | www.cni.org/pub/NISO/docs/Z39.50-brochure
Z39.50 in Europe | www.ukoln.ac.uk/dlis/z3950
ILRT Z39.50 | www.ilrt.bris.ac.uk/discovery/z3950/resources
Z39.50 Resource Page | www.niso.org/z39.50/z3950.html


4. Special format documents

Some of the most important document formats (those built with rich text formats and page description languages) are also invisible to robots. More importantly, the growing number of PDF and PS files forms a "quality archipelago", as these files represent definitive versions of articles or refereed papers.

An important tool is:

Search PDF | searchpdf.adobe.com | More than 1 million summaries (May 2000)

Another way of estimating the size of this archipelago is the indirect use of search engines:

May 31st, 2000
Engine | Query | pdf | ps | ppt | doc
ALTAVISTA | anchor: | 1,837,033 | 444,852 | 123,722 | 639,618
ALTAVISTA | link: | 3,954,225 | 1,784,542 | 281,237 | 4,347,961
FAST | must__in the link name | 1,049,053 | 388,884 | 106,805 | 654,287
FAST | must__in the link to URL | 2,750,099 | 3,838,102 | 197,057 | 2,620,531

March 8th, 2005
Engine | pdf | ps | ppt | doc | xls
GOOGLE | 135,000,000 | 14,400 | 2,570,000 | 24,400,000 | 3,220,000
YAHOO | 109,000,000 | 2,680,000 | 2,890,000 | 14,200,000 | 2,690,000
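As a rough illustration of how such counts can be read, the following sketch computes the share of each format within the March 8th, 2005 figures above. The counts are copied from the table; the function name and the choice to sum the five formats into a single total are assumptions made for this example only, and the result says nothing about documents the engines did not report.

```python
# Counts copied from the March 8th, 2005 table.
COUNTS_2005 = {
    "GOOGLE": {"pdf": 135_000_000, "ps": 14_400, "ppt": 2_570_000,
               "doc": 24_400_000, "xls": 3_220_000},
    "YAHOO": {"pdf": 109_000_000, "ps": 2_680_000, "ppt": 2_890_000,
              "doc": 14_200_000, "xls": 2_690_000},
}


def format_shares(counts):
    """Return each format's fraction of the engine's summed counts."""
    total = sum(counts.values())
    return {fmt: n / total for fmt, n in counts.items()}


for engine, counts in COUNTS_2005.items():
    shares = format_shares(counts)
    summary = ", ".join(f"{fmt} {share:.1%}" for fmt, share in shares.items())
    print(f"{engine}: total {sum(counts.values()):,}; {summary}")
```

With these figures, PDF accounts for more than four fifths of the special-format documents reported by each engine.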

January 2001. Google began to index PDF files (over 13 million documents), offering a direct way to access this part of the invisible Web with the delimiter intheurl:pdf.

Google has begun indexing PDF files, becoming the first engine to offer searchers an easy way to find these high-quality documents that make up a significant portion of the Invisible Web.

October 2004. Google Desktop Search is how our brains would work if we had photographic memories. It's a desktop search application that provides full text search over your email, computer files, chats and web pages you've viewed. By making your computer searchable, Desktop Search puts your information easily within your reach and frees you from having to manually organize your files, emails and bookmarks.