Archive.org's Wayback Machine now holds more material than the Library of Congress
A Library as Big as the World
by Heather Green
Business Week, February 28, 2002
Brewster Kahle has the technology to assemble the ultimate archive of human knowledge. What's stopping him? Restrictive copyright laws.
After you've tried the search engines, you're going to have to develop more strategies. Many people at this point turn to specialized searches. Some of them, you can pay for. Lexis-Nexis, in Dayton, Ohio, has almost two million subscribers. Half of them use the service each month, some of them extensively. What's there? Almost 1.5 billion documents. You will not always end up with nicely formatted and illustrated .htm pages. But you will end up with a lot of information. Their database rivals the Web itself in size, and it's available over the Internet via the Telnet protocol rather than the Hypertext Transfer Protocol (http://), so it's not on the Web. No search engine ever indexes these documents unless they're also available on the public Web.
BrightPlanet's white paper, The Deep Web, estimates that more than 100,000 "content-rich" searchable databases are available on the Web. They comprise some 550 billion individual documents, and 95% of that content is publicly available via the thousands of search engines listed at CompletePlanet. See especially the Deep Web White Paper.
Our key findings from this study include the following:
Public information on the deep Web is currently 400 to 550 times larger than the commonly defined World Wide Web
The deep Web contains 7,500 terabytes of information, compared to 19 terabytes of information in the surface Web
On average, deep Web sites receive about 50% greater monthly traffic than surface sites and are more highly linked to than surface sites
A full 95% of the deep Web is publicly accessible information – not subject to fees or subscriptions.
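As a quick sanity check on the findings above, the two terabyte figures can be divided to see how they relate to the "400 to 550 times" claim. This is only a sketch of the arithmetic; the constant and function names are made up for illustration, and the numbers are the ones quoted from the white paper.

```python
# Figures quoted in the white paper's key findings (terabytes).
DEEP_WEB_TB = 7_500
SURFACE_WEB_TB = 19


def size_ratio(deep_tb: float, surface_tb: float) -> float:
    """Return how many times larger the deep Web is than the surface Web."""
    return deep_tb / surface_tb


ratio = size_ratio(DEEP_WEB_TB, SURFACE_WEB_TB)
print(f"The deep Web is about {ratio:.0f} times the size of the surface Web")
# prints: The deep Web is about 395 times the size of the surface Web
```

By raw volume the ratio comes out just under 400, which sits at the low end of the quoted range.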
ProFusion is a "search engine for search engines" that can access deep web databases.
Bringing a Much Bigger Internet to Light
by Brian McDonough
NewsFactor, July 17, 2002
Deep mining of online data could transform the way information is collected and
analyzed. It promises companies an easier and more effective way to keep up with
their rivals and manage their brands, and it will help researchers develop a
deeper understanding of social and economic trends. It also might alter the way
the average citizen is targeted and analyzed, bringing more personal information
into the glare of the spotlight. ...
What may prove really valuable about deep-Web mining is the ability to interpret
various individual pieces of data that might not otherwise be of much use. At UC
Berkeley, Hellerstein has worked at mining the deep Web in collaboration with
social science researchers trying to find new ways to draw answers from the
increasing collections of disparate data available online. ...
Currently, Hellerstein is working with Hal Varian, dean of UC Berkeley's School
of Information Management and Systems, on research into worker migratory
patterns.
"We're always trying to look for leading indicators or forecasts of
different economic variables," said Varian. The question at hand is how
long someone will look for a job in a region that has suffered an economic hit
before leaving the area.
Taking Texas in the wake of the Enron scandal as a model case, Varian,
Hellerstein and others plan to create a picture of how workers weather economic
storms. They intend to scour online resume postings and job-oriented databases,
as well as standard economic data and a range of diverse factors -- freeway
traffic patterns, for example.
What about stuff that just got put onto a web page that the search engines haven't had time to find yet?
Moreover.com recently received Search Engine Watch's 2000 award for Best Specialty Search: "Moreover crawls a large number of sites with news content, making it easy to find the freshest information on current event topics."
TechWeb: The Business Technology Network's TechEncyclopedia
TechTarget's WhatIs?
Internet.com's Webopedia: Online Computer Dictionary for Internet Terms and Technical Support
Celebrity Sleuth?
Forbes' People Tracker
The largest collection of free public records on the internet. ... to help you to find the public record information you need in order to make critical decisions.
Metor - "the gate of information" - information from hundreds of databases, archives and catalogs whose content cannot be retrieved by traditional search engines
Subject Directory of Search Engines
WebData.com - Collection of searchable databases on the Web organized into topics maintained by ExperTelligence, Inc.
Searcher: The Magazine for Database Professionals
Explores and deliberates on a comprehensive range of issues important to the professional database searcher. The magazine is targeted to experienced, knowledgeable searchers and combines evaluations of data content with discussions of delivery media. Searcher includes evaluated online news, searching tips and techniques, reviews of search aid software and database documentation, revealing interviews with leaders and entrepreneurs of the industry, and trenchant editorials.
Tip | Place a dictionary button on your browser's Links toolbar. Then highlight any word you want to look up and click the dictionary icon.
The MetaIQ site will let you select from a range of search services.
xrefer.com has a "reference engine", a meta-search of several dozen reference titles, such as:
Oxford Dictionary of Art
Concise Medical Dictionary, Oxford University Press
Oxford Dictionary of Quotations
Penguin Dictionary of Psychology
Penguin Dictionary of Sociology
The Grove Concise Dictionary of Music
The New Grove Dictionary of Jazz
Oxford Companion to Philosophy
Concise Oxford Dictionary of Linguistics
Fowler's Modern English Usage
Dictionary of English Place Names
A Dictionary of First Names, Oxford University Press
A Dictionary of Shakespeare
American Heritage Concise Dictionary
The American Heritage Dictionary of Idioms
The Compact American Dictionary of Computer Words
Roget’s II: The New Thesaurus
Wall Street Words: An Essential A to Z Guide for Today’s Investor
The Houghton Mifflin Dictionary of Geography
VISION STATEMENT: Refdesk is not about revenue. Refdesk is not about traffic. Refdesk is not about promotional vehicles or any form of commercialism. Refdesk is only about indexing quality Internet sites and assisting visitors in navigating these sites. At Refdesk that is all that counts and that is all that will ever count.
Librarians' Index to the Internet
A searchable, annotated subject directory of more than 7,200 Internet resources selected and evaluated by librarians for their usefulness to users of public libraries.
search4science - 50,000 scientific words and expressions
MegaConverter - "The Web's (and the Universe's) Best Place to Figure What Equals What"
IPSearchEngine - search engine for intellectual property information
The New York State Society of Certified Public Accountants' Accounting Terminology Guide
University of Virginia's Electronic Text Center -- 1,200 free ebooks
eBook Directory -- some 12,000 books free for the download
EpistemeLinks.com -- philosophy
XYZFind - Indexes collections of XML data. Specify a schema and then keyword-search it. Their home page asks, does your search engine:
Differentiate between "$1999", "made in 1999", and "1999 miles"?
Understand comparisons like "less than" or "at least"?
Distinguish between orange the color and orange the fruit?
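The questions above come down to searching by field rather than by bare keyword. The sketch below is not XYZFind's implementation; it is a minimal illustration using Python's standard `xml.etree.ElementTree`. The sample catalog, its element names (`price`, `year`, `miles`, `color`, `fruit`), and the `field_search` helper are all invented for the example.

```python
import operator
import xml.etree.ElementTree as ET

# Hypothetical sample data; every element name here is an assumption.
CATALOG = """
<catalog>
  <item><name>hatchback</name><price>1999</price><year>1999</year>
        <miles>1999</miles><color>orange</color></item>
  <item><name>fruit crate</name><price>12</price><fruit>orange</fruit></item>
</catalog>
"""

# Plain-language comparisons mapped onto Python operators.
COMPARISONS = {
    "equals": operator.eq,
    "less than": operator.lt,
    "at least": operator.ge,
}


def field_search(root, field, comparison, target):
    """Return the names of items whose <field> satisfies the comparison."""
    compare = COMPARISONS[comparison]
    hits = []
    for item in root.iter("item"):
        raw = item.findtext(field)
        if raw is None:
            continue  # this item has no such field
        # Compare as a number when the target is a number, otherwise as text.
        # Assumes numeric fields contain clean integers, as in the sample.
        value = int(raw) if isinstance(target, int) else raw
        if compare(value, target):
            hits.append(item.findtext("name"))
    return hits


root = ET.fromstring(CATALOG.strip())
print(field_search(root, "price", "less than", 2000))  # ['hatchback', 'fruit crate']
print(field_search(root, "year", "at least", 1999))    # ['hatchback']
print(field_search(root, "color", "equals", "orange"))  # ['hatchback']
print(field_search(root, "fruit", "equals", "orange"))  # ['fruit crate']
```

Because the element name carries the meaning, the same digits "1999" can be a price, a year, or a mileage, and "orange" can be a color or a fruit, without the query confusing them.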
Places Named -- This geographic encyclopedia has over 200,000 place names and last names. Or query by zip code or area code. For example, it told me that Douglas is the 45th most popular male first name in the U.S. and that Anderson is the 11th most popular last name. That really made me feel special.
Quoteland - Search for a quotation by topic or author; or, a literary, special, or random quotation.
What about symbols?
The graphic index at Symbols.com is the largest online encyclopedia of graphic symbols. You can also match word descriptions with the appropriate symbol.
What about images?
Google's image search on the advanced search page
AltaVista -- click "images" to the left of the query window
Ditto -- emphasizes family and education users
Scour -- search for MP3 files and video as well as images
Lycos -- search the Image Gallery, a collection of photos that are free for personal or classroom use. Lycos also uses this MultiMedia search for pictures, movies, streams, and sounds.
GoGraph.com -- graphics database that gives access to several thousand animated GIFs, icons, photos, and clip art.
SearchTurtle - "Simple Remote Control Web Search & Navigation" - Web, MP3, Images, News, Audio, Video
What about sounds?
What about maps?
Maps Index - city plans, earthquakes, ski slopes, historical maps
Napster, Gnutella, and the FreeNet Project, among many others, introduced a new idea -- online users searching each other's computers. When the peers are sharing computing power instead of information, it is called grid computing.
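To make "users searching each other's computers" concrete, here is a toy, in-memory sketch of one way such a search can work: a query is passed from neighbor to neighbor, with a hop limit so it does not travel forever. This is a simplified illustration of the flooding idea associated with Gnutella-style networks, not the actual protocol of any of the systems named above; the `Peer` class, the peer names, and the file names are all invented.

```python
class Peer:
    """One user's computer: a name, some shared files, and known neighbors."""

    def __init__(self, name, files):
        self.name = name
        self.files = set(files)
        self.neighbors = []


def connect(a, b):
    """Link two peers so each can forward queries to the other."""
    a.neighbors.append(b)
    b.neighbors.append(a)


def flood_search(start, filename, ttl):
    """Ask neighbors, then their neighbors, up to ttl hops from start.

    Returns the names of peers that share the file, nearest first.
    """
    seen = {start.name}  # never ask the same peer twice
    frontier = [start]
    hits = []
    for _ in range(ttl):
        next_frontier = []
        for peer in frontier:
            for neighbor in peer.neighbors:
                if neighbor.name in seen:
                    continue
                seen.add(neighbor.name)
                if filename in neighbor.files:
                    hits.append(neighbor.name)
                next_frontier.append(neighbor)
        frontier = next_frontier
    return hits


# A chain of four hypothetical peers: alice - bob - carol - dave
alice = Peer("alice", [])
bob = Peer("bob", ["notes.txt"])
carol = Peer("carol", ["song.mp3"])
dave = Peer("dave", ["song.mp3"])
connect(alice, bob)
connect(bob, carol)
connect(carol, dave)

print(flood_search(alice, "song.mp3", ttl=2))  # ['carol']
print(flood_search(alice, "song.mp3", ttl=3))  # ['carol', 'dave']
```

The hop limit is the design choice to notice: a larger value finds more copies but asks more peers, and no central index is consulted at any point.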
Lists of Worldwide Search Engines and Directories from About.com
African | Asian | European | Latin and South American | Middle Eastern | Oceania
African Web Pages - WoYaa!
The longest list of specialized search engines that I know of is Beaucoup, where you'll find dozens of categories containing over a thousand engines, including:
Research-It!
Information Please
Whatis.com
Dictionary.com
AcronymFinder
A bunch I've used recently:
Bartleby.com: encyclopedia, dictionary, etc.
searchgov.com
searchedu.com
searchebooks.com
GuruNet: "one-click information service."
Instead of having to go to a lot of sites and use their search forms, The BigHub has brought them together in categories all on the same page. A time saver.
SearchAbility has a complete list of guides (with descriptions) to thousands of search engines by size and category, including specific search engines for academic, regional, popular, and children's topics.
What about the specialized search engine for web developers (that's you, now) at Project Cool's DevSearch? Here's a partial list of the sites indexed (click the + sign next to Other Site to see the full list):
Builder.com
devhead
High Five
HTML Help
Microsoft's Sitebuilder
Netscape's DevEdge
Project Cool's Developer Zone
WebCoder
Web Developer
Web Developer's Virtual Library
Web Monkey
Web Page Design for Designers
Web Reference
Web Review
Mailing lists to keep you informed:
Charles Kessler's Cool Tricks and Trinkets Newsletter
weekly insights into new, cool, useful, fun, unusual and interesting sites on the Internet
Librarians' Index to the Internet
The Scout Report
Gleason Sackmann's Net-Happenings
Gary Price's Resource Shelf
Marylaine Block's Neat New Stuff on the Net