
The Anatomy of a Web Site


this page
what is a web site?
how to read a URL | the home page
what's on the site? | what's behind the scenes?
what's out on the Web? | where are the files?

FAQ


To many Web surfers, the Web site itself is a classic "black box". There could be chipmunks on treadmills making it all run. Many people are afraid to learn about what's going on because it seems too geeky. As long as the site works, they don't really care.

a note on terminology

It gets confusing when people use "site" to mean every unit from a page to a home page to a web to a web site.

They say, "Go to that site," when they mean "Go to that page." They say, "On that site," when they mean "On that page."

Let's make sure that we're using webmaking terms such as site, web, and page the same way.

W3C's Web Characterization Terminology

The Web has proceeded for a surprisingly long time without consistent definitions for concepts which have become part of the common vernacular, such as "Web site" or "Web page". This can lead to a great deal of confusion when attempting to develop, interpret, and compare Web metrics.


What is a web site?

Some web sites have only a home page and a couple of pages linked to it and probably to each other. These sites are common, simple, and static, a set of linked .htm files in the same directory.

Other web sites have hundreds or even thousands of pages, some assembled on the fly from pieces in a database. Seldom have you explored all the parts of one of these complex and dynamic sites.

The simple and complex sites have enough in common that we can learn a lot from a site like Ricci Street between these extremes. Most importantly, all sites use the same protocol, hypertext transfer protocol. HTTP puts constraints on them, as does the operating system, that force similar structures.

That's behind the scenes, where most users never venture. On the screen, web pages are as dissimilar as rooms, web sites as dissimilar as buildings. Yes, all the rooms have walls, floors, and ceilings made out of wood and metal, but they're decorated and furnished and populated very differently. A person's experience in a room is another level of structure.

physical spaces | information spaces
building | web site
wood, metal, concrete | files and folders
architecture | site design
hallways and stairs | navigational link system
furnishing and decorations | images and styled text
user's experience and mental map | user's experience and mental map

People form mental models of the rooms in a building. When you go into a building for the first time, you form a mental map of how to get from one room to another. If you spend a day or two, you quickly adjust your map to be more accurate.

In that sense, there are as many different web sites as there are people clicking around in them. Many sites also have site maps, usually a list of pages, but they may be color-coded or even arranged according to some metaphor. For example, Ricci Street uses a geographic metaphor for top-level navigation.

Just as all buildings have infrastructures and floor plans, so web pages are arranged in hierarchical directory structures that probably have little similarity to the web maker's site map or the web user's mental map.

To make Ricci Street more instructive, I have made the underlying directory structure similar to the surface structure.

I recommend that you make your web pages conform to industry standards and best practices, as well as to the most common practices developed in the late 1990s under the guidance of Tim Berners-Lee's World Wide Web Consortium at M.I.T.


How to read a URL

Every document on the Web has a unique address, the uniform resource identifier or locator (URI or URL). This address is a key part of Tim Berners-Lee's invention. He writes:

The Web is an information space. Human beings have a lot of mental machinery for manipulating, imagining, and finding their way in spaces. URIs are the points in that space.

Unlike web data formats, where HTML is an important one, but not the only one, and web protocols, where HTTP has a similar status, there is only one Web naming/addressing technology: URIs.

Uniform Resource Identifiers (URIs, aka URLs) are short strings that identify resources in the web: documents, images, downloadable files, services, electronic mailboxes, and other resources. They make resources available under a variety of naming schemes and access methods such as HTTP, FTP, and Internet mail addressable in the same simple way. They reduce the tedium of "log in to this server, then issue this magic command ..." down to a single click.

For example, the address (URI / URL) of this page is  

http://RicciStreet.net/gizmos/toolkit/webmaking/websiteanatomy.htm

The first part, http, is the protocol. Other protocols you will see in Internet addresses include ftp and mailto.

The next part is the domain name, RicciStreet.net (not case sensitive). RicciStreet.net is registered to me. The .net, like .com, .edu, .org and the country codes like .hu and .uk, is the TLD or top-level domain.

The next part, beginning with gizmos (case-sensitive from there on), is the path, the sequence of directories or folders through the RicciStreet.net hierarchy to get to the folder containing the page. The gizmos directory has files and subdirectories, one of which is toolkit. The toolkit directory has files and subdirectories, one of which is webmaking, which has many files, one of which is websiteanatomy.htm.

The final part is the file name, in this case websiteanatomy.htm. If you don't ask for a specific file, that is if you request only the domain and path, for example, RicciStreet.net/gizmos/toolkit/webmaking/, you will get the default page, in this case, index.html. I have an index.html page in every directory that has HTML files.
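
The breakdown above can be checked mechanically. Here is a minimal Python sketch that splits this page's address into the same parts using the standard library's urllib.parse. The variable names are chosen only for illustration.

```python
from urllib.parse import urlsplit
import posixpath

url = "http://RicciStreet.net/gizmos/toolkit/webmaking/websiteanatomy.htm"
parts = urlsplit(url)

protocol = parts.scheme      # the part before "://"
domain = parts.hostname      # returned in lowercase, since domain names are not case sensitive
# Split the path into the directory sequence and the file name.
directories, file_name = posixpath.split(parts.path)

print("protocol:   ", protocol)
print("domain:     ", domain)
print("directories:", directories)
print("file name:  ", file_name)
```

Note that only the domain is lowercased. The path and file name are left exactly as typed because they are case-sensitive.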


The home page

By convention, the default file name for a folder is index.html. This convention lets you use shorter addresses. For example, when you use the address http://RicciStreet.net, the page that will display is the index.html file in the http://RicciStreet.net folder. Its full address is http://RicciStreet.net/index.html.

For your Plaza web, it means you can give out the already too long address http://RicciStreet.net/dwares/plaza/lastname, with or without the trailing slash. What will display is the file index.html in that folder.

As a principle of sound web construction, every folder in your web should have an index.html file as its home page or index page. That page should at the least link to every other page in that folder as well as back up in the web's hierarchy, often to the whole web's home page.

I have followed this principle on Ricci Street with one exception. If you go to any RicciStreet.net address and remove parts of the address back to any slash, you will get the index.html file in that folder.

Here's the exception. If you don't put a default page in every one of your directories, your user will see a plain dump of all the file names, for example, RicciStreet.net/gizmos/toolkit/images/. The server is saying, "Pick one."
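
To make the server's decision concrete, here is a simplified Python sketch of the logic described above: return the requested file if it exists, otherwise the folder's index.html, otherwise a plain listing of the folder. It is not how Apache is actually implemented (Apache sets the default page name with its DirectoryIndex directive and handles many more cases). The file names in the sample set are hypothetical, and subfolders are left out of the listing for brevity.

```python
import posixpath

DEFAULT_PAGE = "index.html"

def resolve(url_path, files):
    """Decide what to send back for url_path, given the set of file paths on the server."""
    if url_path in files:
        return ("file", url_path)
    # Treat the request as a folder, with or without the trailing slash.
    folder = url_path.rstrip("/") + "/"
    default = folder + DEFAULT_PAGE
    if default in files:
        return ("file", default)
    # No default page: fall back to a plain dump of the file names in that folder.
    listing = sorted(
        posixpath.basename(f) for f in files if f.rpartition("/")[0] + "/" == folder
    )
    if listing:
        return ("listing", listing)
    return ("error", "404 Not Found")

# Hypothetical contents of the server, for illustration only.
files = {
    "/index.html",
    "/gizmos/toolkit/webmaking/index.html",
    "/gizmos/toolkit/webmaking/websiteanatomy.htm",
    "/gizmos/toolkit/images/logo.gif",
    "/gizmos/toolkit/images/spacer.gif",
}

print(resolve("/gizmos/toolkit/webmaking", files))
print(resolve("/gizmos/toolkit/images/", files))
print(resolve("/gizmos/toolkit/webmaking/missing.htm", files))
```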

Partial URLs

Once you are viewing a document located somewhere on a server (say, the document at RicciStreet.net/gizmos/toolkit/webmaking/websiteanatomy.htm), you can use a partial, or relative, URL to point to another file in the same directory, on the same machine, being served by the same server software. For example, if another file exists in that same directory called "webedit.htm", then webedit.htm is a valid partial URL at that point.

This provides an easy way to build sets of hypertext documents. If a set of hypertext documents are sitting in a common directory, they can refer to one another (i.e., be hyperlinked) by just their filenames. However readers get to one of the documents, they can jump to any other document in the same directory by merely using the other document's filename as the partial URL at that point. The additional information (access method, hostname, port number, directory name, etc.) will be assumed based on the URL used to reach the first document.
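
As a small illustration of how the missing pieces are filled in, Python's urllib.parse.urljoin performs the same kind of resolution a browser does. This is only a sketch: the base address is this page's URL, and webedit.htm is the example file name used above.

```python
from urllib.parse import urljoin

# The URL used to reach the first document.
current_page = "http://RicciStreet.net/gizmos/toolkit/webmaking/websiteanatomy.htm"

# A partial URL found in that document: just a file name.
partial = "webedit.htm"

# The protocol, host, and directory are taken from the current page.
full_url = urljoin(current_page, partial)
print(full_url)
# http://RicciStreet.net/gizmos/toolkit/webmaking/webedit.htm
```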


What's on the site?

How to FTP files to the server

When I use my FTP client, I can see my C: drive directory on the left and the server directory in North Carolina on the right. Note the differences in the paths. I keep the content of the site updated by making or editing pages on my desktop and then transferring the files frequently. Transfer text files like .htm as ASCII and .gif and .jpg files as binary.
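
If you would rather script the transfer than use a graphical FTP client, the ASCII versus binary rule can be sketched with Python's standard ftplib module. This is an illustration under assumptions, not the setup used for Ricci Street: the set of text extensions beyond .htm is my own guess, and the host name, user name, and password in the commented usage are placeholders.

```python
import os
from ftplib import FTP

# Extensions treated as text. Only .htm comes from the text above; the rest are assumptions.
TEXT_EXTENSIONS = {".htm", ".html", ".css", ".txt"}

def transfer_mode(file_name):
    """Return "ascii" for text files and "binary" for everything else (.gif, .jpg, ...)."""
    extension = os.path.splitext(file_name)[1].lower()
    return "ascii" if extension in TEXT_EXTENSIONS else "binary"

def upload(ftp, local_path, remote_name):
    """Send one file over an already logged-in FTP connection in the appropriate mode."""
    with open(local_path, "rb") as handle:
        if transfer_mode(local_path) == "ascii":
            ftp.storlines("STOR " + remote_name, handle)    # line-by-line text transfer
        else:
            ftp.storbinary("STOR " + remote_name, handle)   # byte-for-byte transfer

# Usage (placeholders, not real account details):
# with FTP("ftp.example.com") as ftp:
#     ftp.login("username", "password")
#     upload(ftp, "index.html", "index.html")
#     upload(ftp, "images/copper.gif", "copper.gif")
```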

[Screenshot: FTP interface]

When your browser asks the server for http://RicciStreet.net/, with or without the slash, the server will return the index.html page. When the browser asks for http://RicciStreet.net/dwares/, with or without the slash, the server will return the index.html page from the screenshot below.

[Screenshot: Digital Wares directory]

Again, note how the path is growing. When the browser asks for http://RicciStreet.net/dwares/lane/, with or without the slash, the server will return the index.html page from the screenshot below. At this level, I start using templates. The thanks.htm page is what you should see after you fill out a form anywhere on my course webs.

[Screenshot: Lantern Lane directory]

When the browser asks for http://RicciStreet.net/dwares/lane/mba600/, with or without the slash, the server will return the index.html page from the screenshot below. However, if it asks for a specific page that's in this directory, the server will return it. If it asks for a page that's not in this directory, you'll see some sort of error page depending on the server's configuration.

[Screenshot: MBA 600 directory]

You probably thought some of the course web URLs were long. Take a look at the path on the screenshot above.


What's behind the scenes?

the httpd directory

Let's back up and go behind the scenes. The name httpd stands for HTTP daemon, the server program that runs in the background and answers requests from browsers. If you download this web server from Apache.org, this is what you'll get. I'm using Apache 1.3.7 with the Linux operating system, which is by far the most common combination.

[Screenshot: Apache daemon directories]

Starting from the bottom, the logs directory stores the traffic information. The icons directory stores the icons used in various administrative interfaces. The htdocs directory stores what you experience as "the Web site". For the conf directory, see the next paragraph. The cgi-src directory stores source code for cgi (Common Gateway Interface) files. Mine is empty because I store them on my desktop. For the cgi-bin directory, see below.

the conf directory

[Screenshot: Apache configuration files]

The conf directory stores the configuration files. The access file is where I define the services and features available on all or parts of the site.

The httpd file is the main Apache server configuration file. It contains the configuration directives that give the server its instructions based upon the NCSA server configuration files originally by Rob McCool.

The mime types file lists the media types that are sent to the browser or other client. Sending the correct media type to the client is important so it knows how to handle the content of the file. You give similar information to your PC's operating system when you set your file associations. For more information about Internet media types, download the registry.
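
To see this kind of extension-to-media-type lookup in action, Python's standard mimetypes module keeps a similar table. The sketch below is only an analogy to what the server does with its own mime types file. The file names are examples, and the exact results can vary slightly with the system's own type tables.

```python
import mimetypes

# Example file names; the extension decides which media type is reported.
for name in ["websiteanatomy.htm", "index.html", "copper.gif", "photo.jpg"]:
    media_type, _encoding = mimetypes.guess_type(name)
    print(name, "->", media_type)
```

On a typical installation the .htm and .html files are reported as text/html, the .gif as image/gif, and the .jpg as image/jpeg.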

The srm file defines the server settings which affect how requests are serviced and the results formatted.

the cgi-bin directory

Finally, the cgi-bin directory (for Common Gateway Interface - binary) is where I store the scripts and password information that let you, among other things:

send email from Ricci Street
fill out forms
search Ricci Street
participate at the Bistro

[Screenshot: Ricci Street's cgi-bin directory]

If this were my server alone and I were the only one who would see these directories, I probably would have set it up differently. I would have changed the names of several files and directories. However, I share it with other faculty members, and students need to be able to generalize from it. Thus, I have left as many of the defaults as possible while still being able to do what I need to do.

the other directories

As you can see from the file path in these screen shots, there are usr, local, and etc directories further up the tree structure to the / or root directory.

I need to add screen shots and explanations.


What's out on the Web?

Within the htdocs directory is everything that you experience through a web browser. The names and slashes in the URL correspond to the directory structure on the server.
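
The correspondence between a URL and a location on the server's disk can be sketched in a few lines of Python. The document root used here is an assumption pieced together from the directory names mentioned above (usr, local, etc, httpd, htdocs), so the real path on the server may differ. The sketch also skips safety checks, such as rejecting ".." in the path, that a real server performs.

```python
import posixpath
from urllib.parse import urlsplit

# Assumed location of htdocs on the server; the actual path may be different.
DOCUMENT_ROOT = "/usr/local/etc/httpd/htdocs"

def url_to_server_path(url):
    """Map the path part of a URL onto the directory tree under htdocs."""
    url_path = urlsplit(url).path
    return posixpath.join(DOCUMENT_ROOT, url_path.lstrip("/"))

print(url_to_server_path("http://RicciStreet.net/dwares/lane/mba600/biorite/"))
# /usr/local/etc/httpd/htdocs/dwares/lane/mba600/biorite/
```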

Last year, I made a directory for every team that has given me a team name: biorite, events, i3, suresire, unlimited. Here's a reduced-size screen shot of what's in the mba600 directory on the server in North Carolina. The others, erp, mp3, music, and travel, are from last semester; note the dates of last update.

Right now, your directories are empty except for a sub-directory called "images", which itself is empty. If you try this URL:

http://RicciStreet.net/dwares/lane/mba600/biorite/

the server will look for a default page and return an error message or a default directory listing because it won't find an index.html page.

This year, I have a directory for every student who wants one at Parkside Plaza, and the same logic applies.

http://RicciStreet.net/dwares/plaza/ 

Add your last name and you'll see the index page.


Where are the files?

Mirror your web on your hard drive

The source (the SRC attribute of the image tag) specifies which image to display as well as where the image is. In the examples below, let's say you are inserting an image file called "copper.gif". Using an absolute URL (or complete address), the source would be "http://domainname.com/images/copper.gif".

Using a relative URL (or partial address), the browser will look for the image named copper.gif in the same folder (or directory) as the html document itself.

The key to relative URLs is to exactly duplicate your directory structure on the server and on your PC. Then the key to using FrontPage is to have FrontPage recognize the "root" of that directory structure as a "FrontPage web".

If you have FrontPage doing it correctly, you should see the whole web in the folder list when you have the web open in FrontPage. In the folder list, the very top-left yellow folder icon should say C:\ and then the path to the web on your PC. As you can see on the screen shot on the right of FrontPage open to the page you are now reading, my root directory for Ricci Street reads C:\Windows\Desktop\riccistreet. For your purposes, this is equivalent to the plaza/yourname directory on the server. What is in it should have the same set of subdirectories as what's on the server.

Directly underneath the C:\ yellow folder on the folder list should be a folder named _private. There may be some other underscored folders such as _vti_cnf. And underneath them should be all the folders (same thing as directories) that are on the server.

That's the key to making the links work.
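
One way to check that the two structures match is to compare folder lists. The Python sketch below walks the web's root folder on your PC and reports any server folders that have no counterpart locally. The function names are invented for this example, and the list of server folders is something you would type in or copy from your FTP client, not something the code fetches.

```python
import os

def relative_folders(root):
    """Return every subfolder under root as a slash-separated path relative to root."""
    folders = set()
    for current, subdirs, _files in os.walk(root):
        for name in subdirs:
            full = os.path.join(current, name)
            folders.add(os.path.relpath(full, root).replace(os.sep, "/"))
    return folders

def missing_on_pc(local_root, server_folders):
    """List server folders that do not exist under the local root."""
    return sorted(set(server_folders) - relative_folders(local_root))

# Usage (the local path and folder names are examples only):
# print(missing_on_pc(r"C:\Windows\Desktop\riccistreet", {"images", "mba624", "mba624/pets"}))
```

Folders that exist only on the PC, such as FrontPage's underscored folders, are ignored by this check.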


FAQ

My guess is that you didn't save company.htm to where the home page could find it.

If the link in index.html (your home page) goes to href="company.htm", then the two .htm files should be in the same directory.

Let's say, on the other hand, you have index.html in the yourlastname directory. Within it, you make a subdirectory called mba624 and within that you make a directory called pets and within that directory, you put the company.htm file. Then the link from index.html to company.htm will read href="mba624/pets/company.htm". This is called the path from one file to the other. You can read it as, "Go down two directories, through the mba624 directory into the pets directory and find the file company.htm."

Now what about going in the other direction?

In the file company.htm, you want to say "Go up two directories and find the file index.html." The link will read href="../../index.html".
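
If you are unsure what a link between two files should look like, you can let a program work it out. This minimal Python sketch uses the standard posixpath.relpath function. The helper name link_between is invented for this example, and the folder names are the ones from the answer above.

```python
import posixpath

def link_between(from_file, to_file):
    """Return the relative href to write in from_file so that it points at to_file."""
    start_folder = posixpath.dirname(from_file)
    return posixpath.relpath(to_file, start=start_folder)

home = "yourlastname/index.html"
company = "yourlastname/mba624/pets/company.htm"

print(link_between(home, company))   # going down: mba624/pets/company.htm
print(link_between(company, home))   # going up:   ../../index.html
```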

You probably have absolute paths to images on your computer's hard drive instead of relative paths to the images in your images folder.

Why? 1) You edited your pages with FrontPage but not inside a FrontPage Web -- a web created by doing File > New > Web. 2) You inserted your images by browsing to them on your hard drive.

You should always drag/drop your images into your images folder and then insert them onto pages by browsing to the images within your web.
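
A quick way to find the offending images is to scan a page for sources that point at a disk drive. Here is a rough Python sketch using the standard html.parser module. The class name, the helper function, and the sample HTML are all made up for illustration, and the check only covers file: addresses and drive-letter paths such as C:\.

```python
from html.parser import HTMLParser

class ImageSourceCollector(HTMLParser):
    """Collect the src value of every img tag in a page."""
    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names, so SRC and src both match.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)

def is_local_disk_path(src):
    """True for file: addresses and drive-letter paths like C:\\... or C:/..."""
    if src.lower().startswith("file:"):
        return True
    return len(src) > 2 and src[0].isalpha() and src[1] == ":" and src[2] in "\\/"

# Made-up sample page with one good relative source and two bad absolute ones.
sample = (
    '<img src="file:///C:/Windows/Desktop/copper.gif">'
    '<img SRC="images/copper.gif">'
    '<img src="C:\\My Pictures\\logo.jpg">'
)

collector = ImageSourceCollector()
collector.feed(sample)
problems = [src for src in collector.sources if is_local_disk_path(src)]
print(problems)
```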

Let's look at the same thing from another point of view. Below are a few diagrams, thanks to the original version (1996) of the terrific tutorials now at Joe Barta's Page Tutor.

You will find a complementary discussion about relative links, which must take into account this

SRC="copper.gif"

The image is in the same folder as the html document calling for it.

SRC="images/copper.gif"

The image is one folder down from the html document calling for it. You can go down as many levels as necessary.

SRC="../copper.gif"

The image is in one folder up from the html document calling for it. You can go up as many levels as necessary.

SRC="../../copper.gif"

For example, this image is two folders up from the html document calling for it.

SRC="../images/copper.gif"

The image is one folder up and then another folder down in the images directory.

SRC="../../../other/images/copper.gif"

The image is three folders up and then two folders down. This path would let you call an image from another Ricci Street neighborhood.
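
To check where each of these sources actually leads, you can resolve them the way a browser would. The Python sketch below does so with urllib.parse.urljoin, assuming for the sake of example that the html document calling for the image is this page.

```python
from urllib.parse import urljoin

# Assumed location of the html document calling for the image.
page = "http://RicciStreet.net/gizmos/toolkit/webmaking/websiteanatomy.htm"

sources = [
    "copper.gif",
    "images/copper.gif",
    "../copper.gif",
    "../../copper.gif",
    "../images/copper.gif",
    "../../../other/images/copper.gif",
]

# Map each relative source to the full address the browser would request.
resolved = {src: urljoin(page, src) for src in sources}
for src, full in resolved.items():
    print(src, "->", full)
```

Each "../" removes one folder from the end of the page's path before the rest of the source is appended, so the last example ends up at http://RicciStreet.net/other/images/copper.gif.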

Why use relative URLs? Because you can build your site locally with its own relative integrity, and all the links will work. When your pages are done, you just upload the whole set to your server and everything will work just fine. You could move it to another server as a set. Call it a web with a small "w". Another advantage is that it is easier for the browser to get the images and other pages and everything will load faster. Learn more about relative or partial URLs.

Is there ever a reason to use an absolute URL? Sure, if the page you're linking to resides on a completely different server or different domain.

If you have more questions, please email me. If you'd like credit or a personal response, don't forget to include your name and email address.


modified: August 15, 2004
by Douglas Anderson
http://RicciStreet.net/gizmos/toolkit/webmaking/websiteanatomy.htm