Lighthouse

on Port 80: the harbor at the end of Ricci Street

Matteo Ricci, your host

Internet Searching

Information and Research



The Internet is the world's biggest library. The Web has more than a billion "pages". Great, but how do you find anything? After you find it, how do you tell whether it's any good?

Wise professionals -- whatever their specialty or industry or job category or responsibility level -- will know how to search the Web effectively. What are they going to do, ask for the morning off to drive down to the Erie County Public Library?

Psst.... You at the card catalog. Your competitors are back at their offices firing up their Web browsers.

The Internet has leveled the research playing field, not only for small colleges competing with the wealthy Harvards and Stanfords but also for small businesses competing with large ones. Money still buys a lot of privileged information, and a smart MBA with a list of good links can command some of that money in salary. One of the best things you can do as a student is develop those links as a centerpiece of every course you take. Share them with other students. Ask your professors to share their lists of links.

Site Searching

At a large site, look for a search option.

Ricci Street's Search Bureau

Tips for searching Ricci Street

The Bistro has its own search for when you know that what you're looking for is there. It is not a full search engine, since it has no robot crawling for pages; it is just the word-matching part, so it's fast and accurate, and it searches the full text of every message at the Bistro.


Tutorials

How to Search the World Wide Web: A Tutorial for Beginners and Non-Experts
The Complete Planet's Tutorial: Guide to Effective Searching of the Internet
Bare Bones 101
The Pandia Goalgetter
Web Search Strategies
Searching, by Chris Sherman, research guru of About.com
Google's Help Central
AltaVista's Advanced Search Tutorial - especially good on the difference between directories and search engines and on how a search engine's index works
Other Search Tutorials


Best Bet

Tara Calishain's ResearchBuzz.com. Subscribe to her free email newsletter (on top right).

Google's Desktop Search: First Impressions
by Harry McCracken
PC World, October 17, 2004

It's bizarre when you think about it: At the moment, it's easier for most PC users to find information in the billions of pages that make up the Web than it is to find it on their own hard drives. That's because Windows' built-in search tools are so crude, and Google is so good. But with Google's new Desktop Search utility, help is at hand -- because you can now use Google to search your drive.

No, Google isn't the first company to come up with a fast and effective disk search tool. In fact, our current issue has Dennis O'Reilly's review of a bunch of such utilities. But the existing programs haven't been all that widely adopted, most aren't free (Dennis's favorite costs $199), and none has the advantage of integrating into the search engine that's nearly synonymous with searching.

Install Google Desktop Search and let it index your drive, and when you go to Google, you'll get an extra item (along with Web, Images, Groups, etc.) called Desktop. Use it, and Google finds text inside the documents on your drive, including Microsoft Office files, AOL IM chat sessions, cached Web pages, Outlook and Outlook Express e-mail, and more.

It's a completely Googlesque search experience, which means it's fast, uncluttered, and accurate.


Search Engines

To learn most of what you need to know about search engines, you should read all the material at Search Engine Watch. A good place to start would be the collection of tips for using search engines better. Then go to Search Engine World, which explores URL slash counts, top level domains, document size, and file types.

Search Engine Guide

Daily news about search engines and the search engine industry and information on using search engines to market your web site.

How Internet Search Engines Work
by Marshall Brain

How a Search Engine Works
by Elizabeth Liddy
Searcher, May 2001

Search engines match queries against an index that they create. The index consists of the words in each document, plus pointers to their locations within the documents. This is called an inverted file. A search engine or IR system comprises four essential modules:

> A document processor
> A query processor
> A search and matching function
> A ranking capability

While users focus on "search," the search and matching function is only one of the four modules. Each of these four modules may cause the expected or unexpected results that consumers get when they use a search engine.
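To make the "inverted file" concrete, here is a minimal Python sketch, assuming a toy collection of two documents with hypothetical names a.htm and b.htm. For each word it records pointers to the documents and word positions where that word occurs; a real document processor would also handle stemming, stop words, and vastly larger volumes.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into words made of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_file(docs):
    """Map each word to a list of (document, position) pointers."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for position, word in enumerate(tokenize(text)):
            index[word].append((doc_id, position))
    return dict(index)


# Hypothetical two-document collection.
docs = {
    "a.htm": "Search engines match queries",
    "b.htm": "An index of words and search terms",
}
index = build_inverted_file(docs)
print(index["search"])  # [('a.htm', 0), ('b.htm', 5)]
```

Looking up a word is then a single dictionary access rather than a scan of every document, which is why engines build the index ahead of time.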

Search Engine Marketing
by Danny Sullivan and Paul J. Bruemmer
ClickZ column

Search Engine Marketing is both an art and science. Its goal is to optimize a site's ability to be found on Internet search engines and directories by employing relevant keywords, phrases and design. This column will help you understand the basics of optimizing your site for better positioning in search databases and will update you on search industry trends.

Web search forum - what's next?
by Gwen Harris
Information Highways, July - August 2001

KartOO -- shows results of search on a two-dimensional "map"

Marketleap -- link popularity tool -- who is linking to your site -- benchmarking reports

Pandia SEO

Gateway to search engine optimization and submission sites and resources. Find information on how to improve your rankings and gain access to tools for Web site promotion.

Bruce Clay's URL Ranking Methodology

Ways to optimize and improve search engine results with ranking and promotion advice, placement hints, tips, and clues to improve your search engine keywords relative to existing leaders. After all, better keyword ranking is your real objective.

Step by step

A small software program called a bot (from robot) automatically, systematically, and frequently (weekly or monthly) scours the Web. It requests the home page of a domain name, such as RicciStreet.net or IBM.com, and follows all the links within that domain (in HTML, the <a href="x.htm"> tags). The files the bot encounters are harvested, that is, stored in a database and indexed.
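The bot's job can be sketched as a breadth-first crawl that starts at a home page and follows only links within the same domain. This is a minimal Python sketch, not how any particular engine's bot works: example.com and the FAKE_WEB dictionary are hypothetical stand-ins for real HTTP requests, and a real bot would also honor robots.txt and pace its requests.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(home_url, fetch):
    """Breadth-first crawl that never leaves home_url's domain.

    fetch(url) returns a page's HTML, or None if the page can't be retrieved.
    Returns the harvested pages as a dict of url -> html.
    """
    domain = urlparse(home_url).netloc
    harvested = {}
    seen = {home_url}
    queue = deque([home_url])
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        harvested[url] = html
        extractor = LinkExtractor()
        extractor.feed(html)
        for href in extractor.links:
            # Resolve relative links and drop #fragments so each page is queued once.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == domain and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return harvested


# Hypothetical three-page site standing in for real HTTP requests.
FAKE_WEB = {
    "http://example.com/": '<a href="x.htm">x</a> <a href="http://other.org/">elsewhere</a>',
    "http://example.com/x.htm": '<a href="/">home</a> <a href="y.htm#top">y</a>',
    "http://example.com/y.htm": "<p>No links here.</p>",
}
pages = crawl("http://example.com/", FAKE_WEB.get)
print(sorted(pages))
```

The link to other.org is skipped because it leaves the domain, and the link back to the home page is ignored because that page was already seen.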

A search indexer is another software program. It scans the files harvested from a site, usually stripping out the irrelevancies like HTML tags, and creates an index file according to preset criteria of relevance.

The index file stores all the words on a site in a special format for speedy lookup.
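A minimal Python sketch of the indexer step, assuming the harvested pages arrive as a dictionary of URL to HTML (the page names here are hypothetical). It strips the tags with the standard library's HTMLParser, keeps only the text, and maps each word to the set of pages that contain it. Real indexers apply much richer criteria of relevance than this.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Keep the text between tags and discard the tags and their attributes."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_tags(html):
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    return " ".join(extractor.chunks)


def build_index(harvested):
    """Map each lowercase word to the set of URLs whose visible text contains it."""
    index = {}
    for url, html in harvested.items():
        for word in re.findall(r"[a-z0-9]+", strip_tags(html).lower()):
            index.setdefault(word, set()).add(url)
    return index


# Hypothetical harvested pages.
pages = {
    "/a.htm": "<h1>Harbor</h1><p>Lighthouse tips</p>",
    "/b.htm": '<p class="harbor">Lighthouse</p>',
}
index = build_index(pages)
print(sorted(index["lighthouse"]))  # ['/a.htm', '/b.htm']
print(sorted(index["harbor"]))      # ['/a.htm'] (the class attribute was stripped)
```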

At the site, you use a search form to type in search terms and set various options.

When you submit the form, a database query program, sometimes called a search engine, scans the previously created index file for matches to the search terms.

The matches or "hits" get formatted into an HTML page called the results listing.

The results listing is usually sorted in order of relevance, with the closest matches at the top.
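The query step can be sketched the same way. In this minimal Python example, relevance is simply the number of times the search terms occur on a page, a deliberately crude assumption; commercial engines weigh many more signals. The page names and text are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def build_counts(docs):
    """word -> {url: how many times the word appears on that page}."""
    index = {}
    for url, text in docs.items():
        for word, count in Counter(tokenize(text)).items():
            index.setdefault(word, {})[url] = count
    return index


def search(index, query):
    """Return (url, score) hits sorted best first.

    The score is simply the total number of times the query terms occur on the
    page; ties are broken alphabetically so the order is predictable.
    """
    scores = Counter()
    for term in tokenize(query):
        for url, count in index.get(term, {}).items():
            scores[url] += count
    return sorted(scores.items(), key=lambda hit: (-hit[1], hit[0]))


# Hypothetical pages, already stripped of HTML.
docs = {
    "/a.htm": "Search tips and more search tips",
    "/b.htm": "A search engine",
    "/c.htm": "Harbor news",
}
index = build_counts(docs)
print(search(index, "search tips"))  # [('/a.htm', 4), ('/b.htm', 1)]
```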

Search engines distinguish themselves by the relevance of the results and the speed with which they are returned to you.

Popular search engines

As of September 2000, a half dozen or so general, Web-wide search engines dominated the lists in the number of search requests they processed and the relative size of their databases. These seven rated at the top according to Search Engine Showdown:

Google
Fast (Fast Search)
Northern Light
AltaVista
Inktomi (used by AOL, MSN, Snap, HotBot, iWon, GoTo, and LookSmart; purchased in 2003 by Yahoo!, which will soon stop using Google)
Excite
Infoseek

Google increasingly has the largest market share. An article in the July 8, 1999, issue of Nature reported a study done in February 1999. In a summary of the article, the authors estimated that the searchable Web had 800 million static pages and that none of the top six search engines covered more than 16% of them. A year later, an Inktomi-NEC study claimed the Web had passed 1 billion pages.

In February 2004, the industry had changed.

Google Bulks Up As Competition Looms (reg req)
article summary from Newscan
by AP, Los Angeles Times, February 18, 2004

Google added an additional 1 billion pages to its Web index yesterday, increasing the number of pages it indexes from 3.3 billion to 4.28 billion. The search leader said it also had doubled the number of images in its index from 400 million to 880 million. Even those impressive numbers don't come close to covering the whole Web, however, which is estimated at somewhere around 10 billion pages. Meanwhile, rivals Yahoo and Microsoft are girding for battle. Yahoo plans to dump Google as its search engine and switch over to technology acquired through its purchase last year of Inktomi and Overture. At the same time, Microsoft is spending millions to develop its own proprietary search engine to use on MSN.com. According to comScore Media Metrix, Google's Web sites handled 35% of all Web searches in December, while Yahoo claimed 27% and Microsoft 15%. AOL and other Web sites owned by Time Warner made up 16% of the market.

So Google handled 62% of searches, through its own sites and through Yahoo. Another 31% went to Microsoft and AOL. That left 7% for all the hundreds of other search sites.
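The arithmetic behind that paragraph, as a quick Python check using the comScore Media Metrix percentages quoted above:

```python
# Shares of Web searches in December, per comScore Media Metrix as quoted above.
shares = {"Google": 35, "Yahoo": 27, "Microsoft": 15, "AOL/Time Warner": 16}

google_powered = shares["Google"] + shares["Yahoo"]  # Yahoo still used Google's results
microsoft_and_aol = shares["Microsoft"] + shares["AOL/Time Warner"]
everyone_else = 100 - google_powered - microsoft_and_aol

print(google_powered, microsoft_and_aol, everyone_else)  # 62 31 7
```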

The largest and fastest (thus, the "best"?), Google, claimed in mid-2001 to have a searchable database with 2.07 billion pages -- that's billion, not million.

The LA Times article above says 4.28 billion in February 2004.

Which percentage does a search engine cover?

All together, the top search engines covered well under half the Web in 2001. By the figures in the LA Times article above (3.3 billion, then 4.28 billion pages, out of an estimated 10 billion), Google jumped from about 33% to nearly 43% of the Web in one day.
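The same estimate in Python, assuming the article's round figure of 10 billion pages for the whole Web:

```python
WEB_SIZE_ESTIMATE = 10e9  # the article's rough size of the whole Web, in pages


def coverage(indexed_pages, web_pages=WEB_SIZE_ESTIMATE):
    """Fraction of the estimated Web that an index covers."""
    return indexed_pages / web_pages


print(f"{coverage(3.3e9):.1%}")   # 33.0%
print(f"{coverage(4.28e9):.1%}")  # 42.8%
```

Because the 10 billion figure is itself only an estimate, these percentages are only as good as that denominator.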

For example, copy this < site:riccistreet.net +of > without the angle brackets into the Google search box. It will then list all the Ricci Street pages in its database that have the word "of", which should be almost all of them. You should get something in excess of 350. Now go to AltaVista and type in (or copy and paste) < host:riccistreet.net > without the angle brackets. It will tell you (on the far right) how many Ricci Street pages are in AltaVista's database. Every time I try it, I get a much smaller number than I do at Google. Thus, if you go to AltaVista and search for something that's on a Ricci Street page but doesn't happen to be one of the Ricci Street pages in AltaVista's database, you're not ever going to find it, unless you get beyond using a single search engine.
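If you want to see how such a query travels to Google, this small Python sketch builds the results-page address for it. It relies only on Google's long-standing /search?q= address format; the site: operator is typed as ordinary query text and gets encoded like any other characters.

```python
from urllib.parse import urlencode


def google_search_url(query):
    """Build the address of a Google results page for a query typed as-is."""
    return "https://www.google.com/search?" + urlencode({"q": query})


print(google_search_url("site:riccistreet.net +of"))
# https://www.google.com/search?q=site%3Ariccistreet.net+%2Bof
```

Note that the colon becomes %3A, the plus sign becomes %2B, and the space becomes +, so the engine receives exactly what you would have typed in the box.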

Has anyone done a controlled study of the relative overlap?

Yes. Search Engine Showdown has a page that answers that question as well as it can be answered.

We know what's not covered:

dynamic pages, that is, pages generated from a database. The URLs are often very long and end in .asp, .cfm, .php, or .cgi, sometimes followed by a ? and a list of database fields and query terms. Learn more about the "deep Web".

For example, on Ricci Street, each Bistro message is a separate file that gets called when needed. The search bots never get back there.
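A rough Python heuristic for spotting the kind of URL described above, assuming that a query string or one of the listed extensions signals a database-generated page. It is only a guess: plenty of dynamic pages use other extensions, and some static-looking URLs are generated on the fly. The example.com URL is hypothetical.

```python
from urllib.parse import urlparse

DYNAMIC_EXTENSIONS = (".asp", ".cfm", ".php", ".cgi")


def looks_dynamic(url):
    """Guess whether a URL points to a page generated from a database.

    Heuristic only: a query string (?...) or one of the extensions above.
    """
    parts = urlparse(url)
    return bool(parts.query) or parts.path.lower().endswith(DYNAMIC_EXTENSIONS)


print(looks_dynamic("http://RicciStreet.net/port80/lighthouse/searching/index.html"))  # False
print(looks_dynamic("http://example.com/bistro/message.cfm?id=1234&topic=search"))     # True
```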

static pages that are orphaned. They do not have links to them from the home page of a domain name or from any page linked to the home page. They are sitting on a server but are available only if you type (or bookmark) the whole URL.

The Ricci Street server contains dozens of these private pages, which I put there myself. Then I send the URL to someone. The chance of anyone other than my intended audience typing in that exact filename along the correct directory path is so slim that I don't worry about it.
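Whether a page is orphaned is a reachability question: follow links from the home page and see which files on the server are never reached. This Python sketch assumes you already have a list of every file on the server and a map of which page links to which; the filenames are hypothetical.

```python
from collections import deque


def reachable(home, links):
    """Every page a bot could reach by following links from the home page."""
    seen = {home}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def orphans(all_files, home, links):
    """Files on the server that no chain of links from the home page reaches."""
    return set(all_files) - reachable(home, links)


# Hypothetical server contents and link map.
files = ["index.html", "tips.html", "search.html", "private/draft.html"]
links = {
    "index.html": ["tips.html"],
    "tips.html": ["search.html", "index.html"],
}
print(orphans(files, "index.html", links))  # {'private/draft.html'}
```

A search bot only ever sees the reachable set, which is why the private page stays out of every index unless someone links to it.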


Best Bet

Dave Bau's Quick Search Deskbar

This tiny textbox is designed for search hounds with weary mouse-fingers. Unlike the Google Toolbar, this little deskbar lets you launch searches without starting a web browser first, directly from your Windows Explorer Taskbar.

Google finally caught up with Dave ...

Google Desktop

Search with Google from any application without lifting your fingers from the keyboard. Installs easily in your Windows taskbar.

Key Features:

Search using Google, even when your browser isn't running
Preview search results in a small inset window that closes automatically
Access Google from any application by typing Ctrl+Alt+G

Delivering the goods
by Jack Schofield
The Guardian, January 8, 2004

There's no doubting Google's power and popularity. Yet few of us use the search engine effectively. ...

People could also get better results simply by improving their search techniques. Few bother, which is a pity, because fruitless searches waste a lot of time. If you make more than a dozen searches a day, then a small improvement in your techniques can deliver dramatic benefits. With that in mind, here are my top 10 search tips.

Reviews

Watching Google Like A Hawk - News & Commentary On The World's Most Popular Search Engine

Information maze
by Cecilia Kang
Detroit Free Press, November 26, 2000

Search engines often lead you astray in your quest for knowledge.

Better Internet Search Engines is a three-part overview by Online Journalism Review columnist Paul Grabowicz. Note the links to parts two and three on the right.

In Search of...
by Nancy Sirapyan
PC Magazine, December 5, 2000

Reviews of 20 search engines. Top honors to Google, Northern Light, HotBot and Oingo.

SearchShots -- type in keywords and see both text descriptions and thumbnail pictures of each pertinent website from a database of more than 1.3 million screenshots of websites listed in the Open Directory Project, the most comprehensive directory of websites on the Internet.

Engines Idling Roughly
by David Lake
Industry Standard, February 9, 2001

Less than half of all Web pages are indexed by search engines, but 6 out of 10 Web surfers spend one hour or more using them each week.

Docster: The Future of Document Delivery?
by Daniel Chudnov
oss4lib (Open Source Systems for Libraries), April 2000

In walks docster

Imagine all the researchers you know with a new bibliographic management tool that combined file storage with a Napster-like communications protocol -- docster. Instead of just citations, docster also stores the files themselves and retains a connection between the citation metadata and each corresponding file. Somewhere in the ether is a docster server to which those researchers connect. They're reading one of their articles, and they find a new reference they want to pull up. What to do? Just query docster for it. Docster will figure out who else among those connected has a copy of that article and, if it's found, requests and saves a copy for our friendly researcher.

Of course, we cannot do this. Libraries depend too much on copyright to attack the system so directly.


Keywords

A search engine is only as good as the keywords you ask it to search for. The keywords that are obvious to you may not prove fruitful. You should experiment with different ones.

You should also use more than one keyword. Depending on which engine you're using, you would combine those words differently. Most Web sites where you access search engines provide Help or Advanced Search options that tell you how to combine and group your keywords. The technical term is Boolean searching, named after George Boole, the 19th-century British mathematician who developed the logic. You'll find a summary explanation of Boolean operators on Ricci Street's Search Tips.
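Boolean operators correspond to set operations on the lists of pages that contain each word. This Python sketch assumes a tiny hypothetical index; each search engine spells the operators its own way (AND, +, -, NOT), but the underlying logic is the same.

```python
# Hypothetical index: word -> set of pages containing it.
index = {
    "search": {"a.htm", "b.htm", "c.htm"},
    "engine": {"a.htm", "c.htm"},
    "directory": {"b.htm"},
    "yahoo": {"c.htm"},
}


def pages_with(word):
    return index.get(word.lower(), set())


both = pages_with("search") & pages_with("engine")       # search AND engine
either = pages_with("engine") | pages_with("directory")  # engine OR directory
without = pages_with("search") - pages_with("yahoo")     # search AND NOT yahoo

print(sorted(both))     # ['a.htm', 'c.htm']
print(sorted(either))   # ['a.htm', 'b.htm', 'c.htm']
print(sorted(without))  # ['a.htm', 'b.htm']
```

AND narrows the results, OR widens them, and NOT removes pages, which is why adding a second keyword with AND usually cuts a long results list down fast.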

When keywords don't work, what about regular English sentences? Some search services use natural language searches. Wouldn't it be great to just type in questions in English instead of having to fuss with these keywords? Well, why not ask Jeeves?

What keywords do others type in? The Lycos 50 Daily Report gives you a glimpse into their traffic.

Fred Langa's More Clever Google Tricks.

Google's Advanced Search -- Use the "all of the words" box, and lower down specify the domain (for example, RicciStreet.net or content.techweb.com) in the Domain box ("Only return results from the site or domain").

What People Search For - Most Popular Keywords

keywordcount.com

We've created a way to determine the keyword frequency as a percentage of total words (excluding HTML tags), and compare those numbers to those of another URL. Why is someone else ranking higher than you in Infoseek, even though your meta tags are more descriptive? Check their keyword frequency against yours!
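The keyword-frequency measure described above can be approximated in a few lines of Python. This is a sketch of the idea, not keywordcount.com's actual method: it drops everything inside tags (including meta tags), counts the remaining words, and reports what percentage of them match the keyword. The sample page is hypothetical.

```python
import re
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text between tags; tags, attributes, and meta tags are skipped."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def keyword_frequency(html, keyword):
    """Percentage of the page's words (excluding HTML tags) that equal keyword."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    words = re.findall(r"[a-z0-9]+", " ".join(parser.parts).lower())
    if not words:
        return 0.0
    return 100.0 * words.count(keyword.lower()) / len(words)


page = "<title>Search tips</title><p>Search the Web, then search again.</p>"
print(keyword_frequency(page, "search"))  # 37.5
```

To compare with a competitor, run the same function on both pages and the same keyword.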


Meta-Search Engines

It probably also won't surprise you to learn that some "search engines" are meta-engines, also known as meta-crawlers. Most meta-engines send your search terms to one or more of the top six. MetaCrawler, SavvySearch, ProFusion, SurfWax, C|Net's Search.com, ixquick, and Mamma, The Mother of All Search Engines, will collate the results, eliminate the duplicates, and present the rest to you. The whole process happens faster than you could slide open two drawers of a library's card catalog. For many reasons, if only their sense of humor, Dogpile is my favorite.
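Collating results from several engines and eliminating duplicates can be sketched as a round-robin merge: take each engine's first hit, then each engine's second hit, and so on, skipping any URL already listed. This Python sketch is one simple strategy, not the method any of the named meta-engines actually uses, and its duplicate check (ignoring case and a trailing slash) is crude. The URLs are hypothetical.

```python
def collate(result_lists):
    """Merge ranked result lists from several engines, dropping duplicates.

    Round-robin: every engine's first hit, then every engine's second hit, etc.
    """
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                url = results[rank]
                key = url.lower().rstrip("/")  # crude: ignore case and a trailing slash
                if key not in seen:
                    seen.add(key)
                    merged.append(url)
    return merged


# Hypothetical results from two engines.
engine_a = ["http://a.com/", "http://b.com/x", "http://c.com/"]
engine_b = ["http://b.com/x", "http://d.com/", "http://A.com"]
print(collate([engine_a, engine_b]))
# ['http://a.com/', 'http://b.com/x', 'http://d.com/', 'http://c.com/']
```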

LookOff - helping you select a search engine from thousands depending on the topic you are searching for.

LLEK-Bookmarks' Scientific Search Engines

Learn more about Usenet newsgroups. The Newsgroup FAQs and Google's Newsgroup Archives are terrific sources of information and experts. With GrabIt, you can search Usenet for software: MP3s, files, programs, images, games and more.

search tools

While I recommend a Web-based search engine, especially Google, for quick searches, you need something more powerful for the kind of in-depth research you'll do as a professional.

I recommend a meta-search software program called Copernic 4.55. It's a separate piece of software that you will download and run when you are online. It will send your search terms to many search engines and quickly collate the results. Even better, it will let you save and reuse the results on your computer.

Search Engines Worldwide -- a collection of 1,400+ search engines sorted by the country as well as region.


Directories

If you've been doing this Internet stuff for a while, it probably won't surprise you to learn that common knowledge is often wrong. For example:

Yahoo is not a search engine.

Yahoo ...

is a subject directory compiled by humans that covers way less than a quarter of one percent of the possible sites, although it covers most of the popular sites.
almost always links to home pages, not to specific pages within a site.
is full of links that don't work any more.
is browsable but not searchable.
has a Sponsored Sites program, which "allows commercial web sites already listed in the Yahoo! Directory to enhance their placement."

The difference:

for a search engine, you use keywords and get a list of results from a database

for a directory, you click on increasingly specialized topic subsets, like thinner and thinner branches on a tree, until you reach lists of links.
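A directory is essentially a tree of topics, whereas a search engine looks keywords up in an index. This Python sketch, with hypothetical category names, walks down the branches the way you click through a directory, ending at a list of links.

```python
# Hypothetical slice of a directory: each level is a more specialized topic.
DIRECTORY = {
    "Computers": {
        "Internet": {
            "Searching": {
                "Search Engines": ["Google", "AltaVista", "Northern Light"],
                "Directories": ["Yahoo", "LookSmart", "Open Directory"],
            },
        },
    },
    "Reference": {
        "Libraries": ["World Wide Web Virtual Library"],
    },
}


def browse(tree, path):
    """Follow a sequence of topic clicks down the tree."""
    node = tree
    for topic in path:
        node = node[topic]
    return node


print(sorted(browse(DIRECTORY, ["Computers", "Internet", "Searching"])))
# ['Directories', 'Search Engines']
print(browse(DIRECTORY, ["Computers", "Internet", "Searching", "Directories"]))
# ['Yahoo', 'LookSmart', 'Open Directory']
```

The catch is that you must guess which branch a site lives under; if a human editor filed it somewhere else, you won't find it by browsing.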

You can, however, search the Yahoo directory as well as launch a full Internet search while you're at Yahoo. They used Inktomi until mid-2000 when they switched to Google. In other words, Yahoo will search Google's index if there are no results in the Yahoo directory. Why not go to Google in the first place?

Other popular directories include LookSmart, Snap, Network Solutions' dot com, and the World Wide Web Virtual Library. Note its marketing links page.

The Open Directory project is attempting to build a structure for a self-organizing directory. Google uses it.

On August 31, 2000, they listed 2,041,461 sites sorted into 309,934 categories by 28,882 editors. By April 17, the count was 3,296,572 sites, 344,574 categories, and 47,836 editors. They have a long way to go. The best part is the brief annotation that each entry carries. How current will they be able to keep it?

By November 2003, they boasted over 3.8 million sites in over 460,000 categories by 59,855 editors.





modified: April 17, 2002
by Douglas Anderson
http://RicciStreet.net/port80/lighthouse/searching/index.html