

Internet Searching
The Internet is the world's biggest library. The Web has more than a billion "pages". Great, but how do you find anything? After you find it, how do you tell whether it's any good?
Wise professionals -- whatever their specialty or industry or job category or responsibility level -- will know how to search the Web effectively. What are they going to do, ask for the morning off to drive down to the Erie County Public Library?
Psst.... You at the card catalog. Your competitors are back at their offices firing up their Web browsers.
The Internet has leveled the research playing field, not only for small colleges compared to the wealthy Harvards and Stanfords but also for small businesses compared to large ones. Money still buys a lot of privileged information, and a smart MBA with a list of good links can command some of that money in salary. One of the best things you can do as a student is develop those links as a centerpiece of every course you take. Share with other students. Ask your professors to share their lists of links.
At a large site, look for a search option.
Ricci Street's Search Bureau
Tips for searching Ricci Street
The Bistro has a separate search, useful if you know that what you're looking for is there. It is not a full search engine with a robot. It's just the word-match part, so it's fast, it's accurate, and it searches the full text of every message at the Bistro.
How to Search the World Wide Web: A Tutorial for Beginners and Non-Experts
The Complete Planet's Tutorial: Guide to Effective Searching of the Internet
Bare Bones 101
The Pandia Goalgetter
Web Search Strategies
Searching, by Chris Sherman, research guru of About.Com
Google's Help Central
AltaVista's Advanced Search Tutorial - especially good on the difference between directories and search engines and on how a search engine's index works
Other Search Tutorials
Tara Calishain's ResearchBuzz.com. Subscribe to her free email newsletter (on top right).
Google's Desktop Search: First Impressions
by Harry McCracken
PC World, October 17, 2004
It's bizarre when you think about it: At the moment, it's easier for most PC users to find information in the billions of pages that make up the Web than it is to find it on their own hard drives. That's because Windows' built-in search tools are so crude, and Google is so good. But with Google's new Desktop Search utility, help is at hand -- because you can now use Google to search your drive.
No, Google isn't the first company to come up with a fast and effective disk search tool. In fact, our current issue has Dennis O'Reilly's review of a bunch of such utilities. But the existing programs haven't been all that widely adopted, most aren't free (Dennis's favorite costs $199), and none has the advantage of integrating into the search engine that's nearly synonymous with searching.
Install Google Desktop Search and let it index your drive, and when you go to Google, you'll get an extra item (along with Web, Images, Groups, etc.) called Desktop. Use it, and Google finds text inside the documents on your drive, including Microsoft Office files, AOL IM chat sessions, cached Web pages, Outlook and Outlook Express e-mail, and more.
It's a completely Googlesque search experience, which means it's fast, uncluttered, and accurate.
To learn most of what you need to know about search engines, you should read all the material at Search Engine Watch. A good place to start would be the collection of tips for using search engines better. Then go to Search Engine World, which explores URL slash counts, top level domains, document size, and file types.
Daily news about search engines and the search engine industry and information on using search engines to market your web site.
How Internet Search Engines Work
by Marshall Brain
How a Search Engine Works
by Elizabeth Liddy
Searcher, May 2001
Search engines match queries against an index that they create. The index consists of the words in each document, plus pointers to their locations within the documents. This is called an inverted file. A search engine or IR system comprises four essential modules:
> A document processor
> A query processor
> A search and matching function
> A ranking capability
While users focus on "search," the search and matching function is only one of the four modules. Each of these four modules may cause the expected or unexpected results that consumers get when they use a search engine.
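The inverted file and the four modules in the excerpt above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three sample documents, the function names, and the ranking rule (count of occurrences) are invented here and are not taken from the article or from any real search engine.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Document/query processor: lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(documents):
    """Inverted file: map each word to {doc_id: [positions of the word]}."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for position, word in enumerate(tokenize(text)):
            index[word].setdefault(doc_id, []).append(position)
    return index

def search(index, query):
    """Match: keep documents that contain every query word.
    Rank: sort by total number of occurrences, highest first."""
    words = tokenize(query)
    if not words:
        return []
    postings = [index.get(word, {}) for word in words]
    matching = set(postings[0])
    for posting in postings[1:]:
        matching &= set(posting)
    scores = {doc: sum(len(p[doc]) for p in postings) for doc in matching}
    return sorted(scores, key=lambda doc: (-scores[doc], doc))

# Hypothetical documents, used only to exercise the sketch.
documents = {
    "a.htm": "Search engines match queries against an index",
    "b.htm": "An index of words is called an inverted file",
    "c.htm": "The index points to an index entry in the index",
}
index = build_inverted_index(documents)
print(search(index, "an index"))
```

Real engines use far more elaborate ranking than a raw word count. The point is only that the lookup runs against the prebuilt index, not against the documents themselves.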
Search Engine Marketing
by Danny Sullivan and Paul J. Bruemmer
ClickZ column
Search Engine Marketing is both an art and science. Its goal is to optimize a site's ability to be found on Internet search engines and directories by employing relevant keywords, phrases and design. This column will help you understand the basics of optimizing your site for better positioning in search databases and will update you on search industry trends.
Web search forum - what's next?
by Gwen Harris
Information Highways, July - August 2001
KartOO -- shows results of search on a two-dimensional "map"
Marketleap -- link popularity tool -- who is linking to your site -- benchmarking reports
Gateway to search engine optimization and submission sites and resources. Find information on how to improve your rankings and gain access to tools for Web site promotion.
Bruce Clay's URL Ranking Methodology
ways to optimize and improve search engine results with ranking and promotion advice, placement hints, tips, and clues to improve your search engine keywords relative to existing leaders. After all, better keyword ranking is your real objective.
A small software program called a bot (from robot) automatically, systematically, and frequently (weekly or monthly) scours the Web. It requests the "home page" of every domain name, such as RicciStreet.net or IBM.com, and follows all the links (in HTML, the <a href="x.htm"> tags) within that domain. The files the bot encounters are harvested, that is, stored in a database and indexed.
A search indexer is another software program. It scans the files harvested from a site, usually stripping out the irrelevancies like HTML tags, and creates an index file according to preset criteria of relevance. The index file stores all the words on a site in a special format for speedy lookup.
At the site, you use a search form to type in search terms and set various options.
When you submit the form, a database query program, sometimes called a search engine, scans the previously created index file for matches to the search terms.
The matches or "hits" get formatted into an HTML page called the results listing.
The results listing is usually sorted in order of relevance, with the closest matches at the top.
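The harvesting step, in which the bot starts at a home page and follows links within the domain, might look roughly like the sketch below. It uses Python's standard html.parser and urllib.parse modules. The in-memory FAKE_WEB dictionary and the example.test addresses are hypothetical stand-ins for real HTTP requests, so nothing here touches the network.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(home_page, fetch):
    """Breadth-first walk from the home page, staying inside its domain.
    fetch is a caller-supplied function: URL -> HTML string, or None."""
    domain = urlparse(home_page).netloc
    harvested = {}
    queue = deque([home_page])
    seen = {home_page}
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        harvested[url] = html  # stored for the indexer to process later
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            target = urljoin(url, href)  # resolve relative links
            if urlparse(target).netloc == domain and target not in seen:
                seen.add(target)
                queue.append(target)
    return harvested

# Hypothetical site: three linked pages and one page nothing links to.
FAKE_WEB = {
    "http://example.test/": '<a href="x.htm">X</a> <a href="http://other.test/">out</a>',
    "http://example.test/x.htm": '<a href="/">home</a> <a href="y.htm">Y</a>',
    "http://example.test/y.htm": "no links here",
    "http://example.test/orphan.htm": "never linked",
}
pages = crawl("http://example.test/", FAKE_WEB.get)
print(sorted(pages))
```

The page that no other page links to is never harvested, because the bot only discovers files by following links.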
Search engines distinguish themselves by the relevance of the results and the speed with which they are returned to you.
As of September 2000, seven general, Web-wide search engines dominate the lists in the number of search requests they process and the relative size of their databases. These seven:
Google
Fast (Fast Search)
Northern Light
AltaVista
Inktomi (used by AOL, MSN, Snap, Hotbot, iWon, GoTo, and LookSmart, purchased in 2003 by Yahoo!, which will soon stop using Google)
Excite
Infoseek
rate at the top according to Search Engine Showdown. Google increasingly has the largest market share. An article in the July 8, 1999, issue of Nature magazine reported a study done in February 1999. The authors posted a summary of the article, which estimated that the searchable Web had 800 million static pages and that none of the top six covered more than 16% of them. A year later, an Inktomi-NEC study claimed it was past 1 billion.
By February 2004, the industry had changed.
Google Bulks Up As Competition Looms (reg req)
article summary from Newscan
by AP, Los Angeles Times, February 18, 2004
Google added an additional 1 billion pages to its Web index yesterday, increasing the number of pages it indexes from 3.3 billion to 4.28 billion. The search leader said it also had doubled the number of images in its index from 400 million to 880 million. Even those impressive numbers don't come close to covering the whole Web, however, which is estimated at somewhere around 10 billion pages. Meanwhile, rivals Yahoo and Microsoft are girding for battle. Yahoo plans to dump Google as its search engine and switch over to technology acquired through its purchase last year of Inktomi and Overture. At the same time, Microsoft is spending millions to develop its own proprietary search engine to use on MSN.com. According to comScore Media Metrix, Google's Web sites handled 35% of all Web searches in December, while Yahoo claimed 27% and Microsoft 15%. AOL and other Web sites owned by Time Warner made up 16% of the market.
So Google handled 62% of searches, counting its own sites and Yahoo's. Another 31% went to Microsoft and AOL. That left 7% for all the hundreds of other search sites.
The largest and fastest (thus, the "best"?), Google, claimed in mid-2001 to have a searchable database with 2.07 billion pages -- that's billion, not million.
The LA Times article above says 4.28 billion in February 2004.
What percentage of the Web does a search engine cover?
All together, the top search engines combined covered well under half the Web in 2001. By the figures in the LA Times article above, Google's coverage jumped from about 33% to about 43% of the Web in one day.
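The arithmetic behind the market-share and coverage figures can be checked with a few lines of Python. Every number comes from the article summary quoted above. The variable and function names are just labels chosen for this sketch.

```python
# Percent of all Web searches, as quoted from comScore Media Metrix above.
shares = {"Google": 35, "Yahoo": 27, "Microsoft": 15, "AOL/Time Warner": 16}

google_powered = shares["Google"] + shares["Yahoo"]               # Yahoo still ran on Google
microsoft_and_aol = shares["Microsoft"] + shares["AOL/Time Warner"]
everyone_else = 100 - sum(shares.values())

def coverage_percent(indexed_pages, estimated_web_pages):
    """Share of the estimated Web that an index of the given size covers."""
    return 100 * indexed_pages / estimated_web_pages

# Google's index before and after the update, against an estimated 10 billion pages.
before = coverage_percent(3.3e9, 10e9)
after = coverage_percent(4.28e9, 10e9)

print(google_powered, microsoft_and_aol, everyone_else)
print(round(before, 1), round(after, 1))
```

Dividing 4.28 billion by 10 billion gives about 42.8%, so the one-day jump was a little under ten percentage points.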
For example, copy this < site:riccistreet.net +of > without the angle brackets into the Google search box. It will then list all the Ricci Street pages in its database that have the word "of", which should be almost all of them. You should get something in excess of 350. Now go to AltaVista and type in (or copy and paste) < host:riccistreet.net > without the angle brackets. It will tell you (on the far right), how many Ricci Street pages are in AltaVista's database. Every time I try it, I get a much smaller number than I do at Google. Thus, if you go to AltaVista and search for something that's on a Ricci Street page but doesn't happen to be one of the Ricci Street pages in AltaVista's database, you're not ever going to find it. Unless you get beyond using a single search engine.
How long would it take Yahoo's employees to place one billion web pages into the appropriate directories and keep the whole thing up to date?
Has anyone done a controlled study of the relative overlap?
Yes. Search Engine Showdown has a page that answers that question as well as it can be answered.
We know what's not covered:
dynamic pages, that is, pages generated from a database. The URLs are often very long and end in .asp or .cfm, .php, or .cgi, sometimes followed by a ? and a list of database fields and query terms. Learn more about the "deep Web".
For example, on Ricci Street, each Bistro message is a separate file that gets called when needed. The search bots never get back there.
static pages that are orphaned. They do not have links to them from the home page of a domain name or from any page linked to the home page. They are sitting on a server but are available only if you type (or bookmark) the whole URL.
The Ricci Street server contains dozens of these private pages. I put a page there, then send its URL to someone. The chance of someone other than my intended audience typing in that exact filename along the correct directory path is so slim that I don't worry about it.
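The URL pattern described above for dynamic pages (a script extension, sometimes followed by a ? and query terms) is easy to test for. The sketch below is a rough heuristic written for illustration, not how any particular search bot decides what to skip. The function name and the example.test addresses are made up.

```python
from urllib.parse import urlparse

# Extensions named in the text as typical of database-generated pages.
DYNAMIC_EXTENSIONS = (".asp", ".cfm", ".php", ".cgi")

def looks_dynamic(url):
    """Guess whether a URL points at a page generated from a database."""
    parts = urlparse(url)
    has_query = bool(parts.query)  # anything after the ?
    has_script_extension = parts.path.lower().endswith(DYNAMIC_EXTENSIONS)
    return has_query or has_script_extension

for url in (
    "http://example.test/bistro/message.cfm?id=42",
    "http://example.test/search.htm",
):
    print(url, looks_dynamic(url))
```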
The search engines are great if they give you what you want. If they don't, they're only the beginning of your search. Learn more.
Dave Bau's Quick Search Deskbar
This tiny textbox is designed for search hounds with weary mouse-fingers. Unlike the Google Toolbar, this little deskbar lets you launch searches without starting a web browser first, directly from your Windows Explorer Taskbar.
Google finally caught up with Dave ...
Search with Google from any application without lifting your fingers from the keyboard. Installs easily in your Windows taskbar.
Key Features:
Search using Google, even when your browser isn't running
Preview search results in a small inset window that closes automatically
Access Google from any application by typing Ctrl+Alt+G
Delivering the goods
by Jack Schofield
The Guardian, January 8, 2004
There's no doubting Google's power and popularity. Yet few of us use the search engine effectively. ...
People could also get better results simply by improving their search techniques. Few bother, which is a pity, because fruitless searches waste a lot of time. If you make more than a dozen searches a day, then a small improvement in your techniques can deliver dramatic benefits. With that in mind, here are my top 10 search tips.
Watching Google Like A Hawk - News & Commentary On The World's Most Popular Search Engine
Information maze
by Cecilia Kang
Detroit Free Press, November 26, 2000
Search engines often lead you astray in your quest for knowledge.
Better Internet Search Engines is a three-part overview by Online Journalism Review columnist Paul Grabowicz. Note the links to parts two and three on the right.
In Search of...
by Nancy Sirapyan
PC Magazine, December 5, 2000
Reviews of 20 search engines. Top honors to Google, Northern Light, HotBot and Oingo.
SearchShots -- type in keywords and see both text descriptions and thumbnail pictures of each pertinent website from a database of more than 1.3 million screenshots of websites listed in the Open Directory Project, the most comprehensive directory of websites on the Internet.
Engines Idling Roughly
by David Lake
Industry Standard, February 9, 2001
Less than half of all Web pages are indexed by search engines, but 6 out of 10 Web surfers spend one hour or more using them each week.
Docster: The Future of Document Delivery?
by Daniel Chudnov
oss4lib (Open Source Systems for Libraries), April 2000
In walks docster
Imagine all the researchers you know with a new bibliographic management tool that combined file storage with a Napster-like communications protocol -- docster. Instead of just citations, docster also stores the files themselves and retains a connection between the citation metadata and each corresponding file.
Somewhere in the ether is a docster server to which those researchers connect. They're reading one of their articles, and they find a new reference they want to pull up. What to do? Just query docster for it. Docster will figure out who else among those connected has a copy of that article and, if it's found, requests and saves a copy for our friendly researcher.
Of course, we cannot do this. Libraries depend too much on copyright to attack the system so directly.
A search engine is only as good as the keywords you ask it to search for. The keywords that are obvious to you may not prove fruitful. You should experiment with different ones.
You should also use more than one keyword. Depending on which engine you're using, you would combine those words differently. Most web sites where you access search engines will provide Help or Advanced Search options to tell you how to combine and group your keywords. The technical term is Boolean searching, named after George Boole, the 19th-century British mathematician who developed the logic. You'll find a summary explanation of Boolean operators on Ricci Street's Search Tips.
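The three basic Boolean operators correspond directly to set operations, which Python can show in a few lines. The keywords and page names below are hypothetical. Real engines each have their own query syntax (AND, +, quotation marks, and so on), so check the Help page of the one you are using.

```python
# Hypothetical data: each keyword maps to the set of pages that contain it.
pages_with = {
    "search": {"a.htm", "b.htm", "c.htm"},
    "engine": {"a.htm", "c.htm"},
    "directory": {"b.htm", "d.htm"},
}

both = pages_with["search"] & pages_with["engine"]        # search AND engine
either = pages_with["engine"] | pages_with["directory"]   # engine OR directory
without = pages_with["search"] - pages_with["directory"]  # search NOT directory

print(sorted(both))
print(sorted(either))
print(sorted(without))
```

AND narrows a search, OR widens it, and NOT removes pages you already know you don't want.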
Tip | When you find an especially relevant page, view its source code. Look in the meta tags in the page's head for a list of keywords. Some of them may be worth adding to your own list of search keywords.
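If you would rather not read the source by eye, the keywords meta tag can be pulled out with Python's standard html.parser module. This is a minimal sketch. The sample page is invented, and many pages carry no keywords tag at all, in which case the list simply comes back empty.

```python
from html.parser import HTMLParser

class MetaKeywords(HTMLParser):
    """Collect the comma-separated terms from <meta name="keywords" ...>."""
    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "keywords":
            content = attributes.get("content") or ""
            self.keywords += [k.strip() for k in content.split(",") if k.strip()]

def extract_keywords(html):
    parser = MetaKeywords()
    parser.feed(html)
    return parser.keywords

# Hypothetical page source, used only to exercise the sketch.
sample = (
    '<html><head><meta name="Keywords" '
    'content="search engines, meta-search, directories"></head>'
    "<body></body></html>"
)
print(extract_keywords(sample))
```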
When keywords don't work, what about regular English sentences? Some search services use natural language searches. Wouldn't it be great to just type in questions in English instead of having to fuss with these keywords? Well, why not ask Jeeves?
What keywords do others type in? The Lycos 50 Daily Report gives you a glimpse into their traffic.
Fred Langa's More Clever Google Tricks.
Google's Advanced Search -- Use the "all of the words" box and lower down specify the domain (for example, RicciStreet.net or content.techweb.com) in the "Domains Only results from the site or domain" box.
What People Search For - Most Popular Keywords
We've created a way to determine the keyword frequency as a percentage of total words (excluding HTML tags), and compare those numbers to those of another URL. Why is someone else ranking higher than you in Infoseek, even though your meta tags are more descriptive? Check their keyword frequency against yours!
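The calculation that description refers to, keyword frequency as a percentage of total words with HTML tags excluded, can be approximated as follows. This is a sketch of the general idea, not the tool's actual code. The regular-expression tag stripping is deliberately crude, and the two sample pages are invented.

```python
import re

def keyword_frequency(html, keyword):
    """Percentage of the page's words (HTML tags excluded) equal to keyword."""
    text = re.sub(r"<[^>]*>", " ", html)             # crude tag stripping
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    return 100 * words.count(keyword.lower()) / len(words)

# Hypothetical pages to compare.
yours = "<h1>Search tips</h1><p>Tips for every search</p>"
theirs = "<p>Search search search tips</p>"

print(round(keyword_frequency(yours, "search"), 1))
print(round(keyword_frequency(theirs, "search"), 1))
```

A higher percentage is not automatically better: a page that repeats one keyword over and over reads badly to humans.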
It probably also won't surprise you to learn that some "search engines" are meta-engines, also known as meta-crawlers. Most meta-engines send your search terms to one or more of the top six. MetaCrawler, SavvySearch, ProFusion, SurfWax, C|Net's Search.com, ixquick, and Mamma, The Mother of All Search Engines, will collate the results, eliminate the duplicates, and present the rest to you. The whole process happens faster than you could slide open two drawers of a library's card catalog. For many reasons, if only their sense of humor, Dogpile is my favorite.
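The collate-and-deduplicate step that a meta-engine performs can be sketched like this. The engine names, the URLs, and the scoring rule (a page earns 1/rank from each engine that lists it) are assumptions made for illustration. Each real meta-engine merges results in its own way.

```python
def collate(results_by_engine):
    """Merge ranked URL lists, drop duplicates, and order by combined score."""
    scores = {}
    for urls in results_by_engine.values():
        for rank, url in enumerate(urls, start=1):
            key = url.rstrip("/")  # treat "site/" and "site" as the same page
            scores[key] = scores.get(key, 0.0) + 1.0 / rank
    # Highest combined score first, with ties broken alphabetically.
    return sorted(scores, key=lambda url: (-scores[url], url))

# Hypothetical result lists from two engines.
results = {
    "engine_a": ["http://one.test/", "http://two.test/page.htm"],
    "engine_b": ["http://two.test/page.htm", "http://three.test/", "http://one.test"],
}
merged = collate(results)
print(merged)
```

A page that several engines agree on rises to the top, which is part of the appeal of a meta-search.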
LookOff - helping you select a search engine from thousands depending on the topic you are searching for.
LLEK-Bookmarks' Scientific Search Engines
Learn more about Usenet newsgroups. The Newsgroup FAQs and Google's Newsgroup Archives are terrific sources of information and experts. With GrabIt, you can search Usenet for software: MP3s, files, programs, images, games and more.
While I recommend using a web-based search engine, especially Google, for quick searches, you need something more powerful for the kind of in-depth research you need to do as a professional.
I recommend a meta-search software program called Copernic 4.55. It's a separate piece of software that you will download and run when you are online. It will send your search terms to many search engines and quickly collate the results. Even better, it will let you save and reuse the results on your computer.
Search Engines Worldwide -- a collection of 1,400+ search engines sorted by the country as well as region.
If you've been doing this Internet stuff for a while, it probably won't surprise you to learn that common knowledge is often wrong. For example:
Yahoo is not a search engine.
Yahoo ...
is a subject directory compiled by humans that covers way less than a quarter of one percent of the possible sites, although it covers most of the popular sites.
almost always links to home pages, not to specific pages within a site.
is full of links that don't work any more.
is browsable but not searchable.
has a Sponsored Sites program, which "allows commercial web sites already listed in the Yahoo! Directory to enhance their placement."
The difference:
for a search engine, you use keywords and get a list of results from a database
for a directory, you click on increasingly specialized topic subsets, like thinner and thinner branches on a tree, until you reach lists of links.
You can, however, search the Yahoo directory as well as launch a full Internet search while you're at Yahoo. They used Inktomi until mid-2000 when they switched to Google. In other words, Yahoo will search Google's index if there are no results in the Yahoo directory. Why not go to Google in the first place?
Other popular directories include LookSmart, Snap, Network Solutions' dot com, and the World Wide Web Virtual Library. Note its marketing links page.
The Open Directory project is attempting to build a structure for a self-organizing directory. Google uses it.
As the web grows, automated search engines and directories with small editorial staffs will be unable to cope with the volume of sites. The Open Directory Project's goal is to produce the most comprehensive directory of the web, by relying on a vast army of volunteer editors.
On August 31, 2000, they listed 2,041,461 sites sorted into 309,934 categories by 28,882 editors. By April 17, it was 3,296,572 sites, 344,574 categories, and 47,836 editors. They have a long way to go. The best part is the brief annotation that each entry carries. How current will they be able to keep it?
By November 2003, they boasted over 3.8 million sites in over 460,000 categories by 59,855 editors.