COMS 490
Communication Programming
Resource page

If you're a Mac OS X user, you may appreciate Apple's new Safari browser. It's small, fast and nearly invisible: it hides itself behind the content you actually want to see. That, and it's standards-friendly, built on the open-source KHTML rendering engine and not made by Microsoft. It'll be shipping with Mac OS X from now on, so testing on it is probably a good idea if you have OS X access. If not, it's very close to Mozilla with the font zoom at about 80% in how it handles markup.

Notes

week 1: introductions

Pleased to meet you. I'll do my best to get everyone's names straight as quickly as possible... but please don't be offended if I need reminding for the first little while. I'll be one of your technical resources (along with Marie-Christiane, each other, the Internet and paper publications) over the term. My particular interests lie in programming, computer modelling and simulation, accessibility, interoperability and efficiency. I've put together a bunch of resources I think you'll find useful throughout the course (although not so much in the next week or two). In addition to reading the content of this page, you may want to view its source: I've added comments every few lines explaining roughly what's going on. I haven't commented it yet, but here's the CSS that goes with this page: {generic_page.css ; non47extensions.css}

Here's a link to some vintage COMS 490 material. There is a bunch of sample Flash code as well as a demonstration of how to do pop-up windows with JavaScript and HTML.

You may want to browse past some of the technical specifications for HTML, XHTML and CSS... they are quite dry reading, but describe the languages exhaustively. This page is written in XHTML 1.0 and laid out using CSS level 2.

You may also want to get yourself a (free) copy of Mozilla. Version 1.0 is stable and in wide use, version 1.1 has a few more features and is a little faster. When testing out web pages, Mozilla gives you the closest approximation of standards-compliant display. Once a page looks nice in Mozilla, you can then fix it up so it also looks good in other, older and/or buggier browsers (e.g. Lynx, pre-6 Netscape, IE). Mozilla also has a setting that lets you turn off unsolicited popup windows and a first-rate debugging tool (the DOM inspector) for HTML.

For accessibility testing of pages, nothing beats Lynx, as it displays only text, roughly in the order that a blind person listening to the page using a speech agent would hear it. If your page makes sense in Lynx, it's probably well-formed under all the visual stuff, and it probably conforms to legislation in some countries (e.g. the USA) regarding Web accessibility.

A brief note on contacting me by e-mail and sending me files: I use a dial-up connection of sometimes dubious stability to get my e-mail. If you have a question related to a Flash file, or want to show me something you've done, please follow a couple of guidelines. Keep the size small: use a Zip or StuffIt utility to compress the file, send multiple files in their own e-mails, and try to put together a minimal test case or piece of a project rather than a large project. Send me only the working file (the .fla file, not the .swf file, in the case of Flash; a zipped or stuffed folder of files for a Web site) or the URI of any Web work. Under 100 KB per e-mail is fine. If it's substantially bigger, you might want to just describe the problem (I'll ask for the file if necessary) or bring me a CD to look at. I will do my best to get you a coherent answer to your e-mails (or at least a note to say I'm working on it) within 48 hours.

See you next week!

week 2: Photoshop in a nutshell

The Photoshop framework

At a fundamental level, Photoshop defines a grid of locations for each image, and then assigns colours to every point on that grid. Understanding this should help in understanding Photoshop's strengths and weaknesses.

Location and shape

Photoshop is a program designed to take images from outside sources (e.g. scanned paper or film, digital photos or video stills, and images downloaded legally, of course), allow the user to edit and combine them, and then return the results in any number of useful (specialized) formats. Images of this type are most usefully dealt with in raster format: any format which represents an image as a sequence of discrete pixels. Pixel is short for "picture element", and is a little bit like a tile in a mosaic. An image will have a set number of pixels per unit of length (usually some number of pixels per inch, although you can set them by more esoteric units like centimetres or picas); this is the image's resolution. Pixels can, but don't need to, correspond to screen dots or clusters of printer dots.

Unlike a run-of-the-mill mosaic, and some of its competition, Photoshop can have several layers of pixels in a file and create composite pixels for the screen on the fly. The final picture is a combination of all the layers, each pixel in each layer letting some percentage of the layers below it through depending on its transparency or alpha. This makes smooth transitions and fades between images easy... and can lead to overly busy, eyesore pieces if one is not careful. Another caveat with pixels generally: because every pixel is a little discrete dot, finding edges that make visual sense can be hard, particularly in cases where the area for which an edge is needed has subtle colour variation or a smooth gradient. Photoshop's "magic wand" tool has the job of finding edges, and it often takes several tries with different starting points and tolerances to select the desired area with it.

The other major method for representing shape on a computer is vector format: shapes are defined as mathematical curves and can be enlarged, shrunk, stretched, trimmed with type or used as cookie-cutters at will. Flash and Illustrator use this representation. Its weaknesses are in representing fine patterns and gradients (i.e. many photos, particularly of people, animals, vegetation, etc.) and in the fact that almost all screens take raster data, so the computer has to take an extra conversion step to put vector shapes on screen. Photoshop uses vector information for paths and type, but as soon as you flatten the image or save it out to a Web format, that information is typically lost.
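
If it helps to see the layer idea as arithmetic, here is a rough sketch in Python of how one composite pixel could be computed from a stack of layers. It assumes the simplest possible blend (a straight weighted average by alpha) and 8-bit RGB values; the function names are my own, and Photoshop's actual blending modes are more varied than this.

```python
def composite_pixel(top, alpha, bottom):
    """Blend one RGB pixel over another.

    top, bottom: (r, g, b) tuples with channel values from 0 to 255.
    alpha: opacity of the top pixel, from 0.0 (transparent) to 1.0 (opaque).
    """
    return tuple(round(alpha * t + (1 - alpha) * b) for t, b in zip(top, bottom))


def flatten(layers, background=(255, 255, 255)):
    """Composite a stack of (pixel, alpha) pairs, listed bottom to top."""
    result = background
    for pixel, alpha in layers:
        result = composite_pixel(pixel, alpha, result)
    return result


# An opaque blue layer with a half-transparent red layer on top of it.
print(flatten([((0, 0, 255), 1.0), ((255, 0, 0), 0.5)]))
```

Flattening an image amounts to running something like this for every grid location and keeping only the results, which is why the layer information can't be recovered afterwards.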

Colour

Photoshop can represent the colour of pixels in a number of ways. Most common for multimedia work is RGB mode: each pixel has a colour defined by a red value, a green value and a blue value. Zeroes for all these colour elements give black, and 255 is the maximum value, giving white if red, green and blue are all maxed out. RGB colour doesn't mix quite like pigment colour... the least intuitive mix is that red plus green equals yellow. RGB colour is a form of additive colour: the colour is measured in terms of the light hitting the eye. CMYK colour is the dominant representation mode for colour printing (at least on low-cost inkjet printers) and represents colour as a combination of pigments: Cyan, Magenta, Yellow and blacK. CMYK colour is a subtractive way of measuring colour, as it works in terms of the pigments which reduce the reflected light from the printed surface. There is not an exact mapping between all of RGB and all of CMYK. Some colours display in RGB but don't print using CMYK process inks (CMYK colour is also called process colour). These colours (out-of-gamut colours) can cause some grief while converting a file from RGB to CMYK, and because most computer screens operate on the RGB model, what you see while editing may not be what the printer will give you (even for in-gamut colours, an uncalibrated screen, changes in lighting and changes in paper can cause these issues).

If you are working in black and white, your choices are to use one of the full-colour modes; greyscale, where each pixel has a single lightness value; or bitmap, where each pixel is either black or white. Greyscale images are typically much smaller than RGB or CMYK before compression, and bitmaps (not to be confused with Windows bitmap [.bmp] files) are even smaller.
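
The colour arithmetic above can be sketched in a few lines of Python. Everything here is illustrative: the function names are mine, the RGB to CMYK formula is the naive textbook one (real conversions go through colour profiles and deal with gamut, which this ignores), and the size estimate assumes 8 bits per channel with no compression.

```python
def add_light(a, b):
    """Additive mixing: sum two RGB colours channel by channel, capped at 255."""
    return tuple(min(255, x + y) for x, y in zip(a, b))


def rgb_to_cmyk(r, g, b):
    """Naive conversion from 0-255 RGB to CMYK ink fractions (0.0 to 1.0)."""
    if (r, g, b) == (0, 0, 0):
        return (0.0, 0.0, 0.0, 1.0)  # pure black: black ink only
    c, m, y = 1 - r / 255, 1 - g / 255, 1 - b / 255
    k = min(c, m, y)  # the part all three inks share is replaced by black
    return tuple((x - k) / (1 - k) for x in (c, m, y)) + (k,)


def uncompressed_bytes(width, height, bits_per_pixel):
    """Raw storage needed for a width x height image, before compression."""
    return width * height * bits_per_pixel // 8


print(add_light((255, 0, 0), (0, 255, 0)))  # red light plus green light
print(rgb_to_cmyk(255, 255, 0))
for mode, bits in [("CMYK", 32), ("RGB", 24), ("greyscale", 8), ("bitmap", 1)]:
    print(mode, uncompressed_bytes(640, 480, bits))
```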

Affecting parts of a Photoshop document

Operations in Photoshop can apply to various areas of the document, and one can define them in a few different ways: by selecting an area, by painting an area, or by affecting the whole document.

Selection operations

One basic way of affecting a Photoshop document is through a selection: an area of the document which is more or less marked off from the rest of the document and is modified by subsequent commands. Many of the menu items (e.g. adjusting levels, applying filters, creating borders and fills) act on the selection if there is one, and the whole document otherwise. One can “build up” a selection by selecting an area (with the lasso, the marquee, the magic wand or through the selection menu), then holding down shift and selecting another area. The two will be joined together. One can also “carve back” a selection by holding option on subsequent selection operations, and switch between the two. Another viable tactic, particularly if one wants to select most of the document but not one or more small points, is to select the areas one wants to ignore, and then choose invert from the select menu. Last, the “quick mask” mode (toggled at the bottom of the tool palette) allows you to “paint” a selection.
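
One way to think about building up, carving back and inverting is as set operations on pixel locations. The Python sketch below is only a mental model (hard-edged rectangular selections on a tiny 8 by 8 canvas, with a helper name I made up); it says nothing about how Photoshop stores selections internally, and it ignores feathered or partially selected pixels.

```python
def rect(x0, y0, x1, y1):
    """Set of (x, y) pixel locations in a rectangle; x1 and y1 are exclusive."""
    return {(x, y) for x in range(x0, x1) for y in range(y0, y1)}


canvas = rect(0, 0, 8, 8)

selection = rect(0, 0, 4, 4)      # first marquee drag
selection |= rect(2, 2, 6, 6)     # shift-drag: build up (union)
selection -= rect(0, 0, 2, 2)     # option-drag: carve back (difference)
inverted = canvas - selection     # invert: everything that was not selected

print(len(selection), len(inverted))
```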

Painted operations

In addition to the pencil, eraser and brushes, Photoshop provides “paintable” effect tools to smudge, soften, sharpen, lighten, darken, copy/paste and otherwise modify the image. Each one of those tools has a “brush” which can be picked from or customized in the brushes palette. If there is a selection, brushed operations only work within the selected area: the rest is “masked”, allowing you to brush right out to the edges of a selection without worrying about spilling out.

Document-wide operations

Selection operations lacking a selection often affect the whole document, but there are also operations that always have global effect. Many of these affect the way the document is stored: changing the colour space, changing the resolution, flattening the image (merging all the layers into one layer containing their combined image). To change these qualities in smaller areas of an image, the image would need some resolution-independent way of locating these areas. This is something better done in a vector format that allows placing of raster images (such as PostScript, Flash, Illustrator or Quark) rather than in Photoshop.

Features and gimmicks

Two big selling points for Photoshop are its support for multiple layers of varying transparency and its support for built-in and outside filters.

Layers

As discussed above, Photoshop has the ability to composite layers of graphics together. In simple cases, this may be just two layers of photographic content, perhaps a few foreground objects cut out from their original pictures sitting on a background layer. More complicated for Photoshop to deal with is the (relatively new) text layer, which contains text stored as text: editable, movable and resizable. (It is also possible and common to use “pictures of text”; the trade-off is the difficulty of removing typos from pictures, especially if they are carelessly painted into a layer as opposed to put on their own layer.) The rendering part of the software converts the text to pixels at the last minute, so you can see the image as a whole. Adobe has introduced other special layer types, including "effects layers" which allow one to experiment with various filters and adjustments in a mix-and-match, non-committal way.

Layers have quite a few benefits. First is the ability to keep source images more or less intact on a layer so as to make adjustments easier. Replacing images becomes much easier as well... one can, for example, do English and French text for the same image in different layers and export the two versions of the document separately, or easily mat in holiday decorations (e.g. painted eggs for Easter or tropical birds and speckled moths for Darwin Day) for special holiday editions of a page. Temporary hiding of layers can also make detail work on specific elements easier. Layers reflect a growing trend in computer science and interface design towards compartmentalizing a job to keep separate parts safe from each other and allow more decisions to be revisable at a later date.

GIF, JPEG, TIFF and other file formats don't support layers — when Photoshop writes to these formats, it has to “flatten” the images, leaving only the rendered image and not the layer information. Result: you will probably want to save a Photoshop (.psd) file and one or more distribution files (GIF, JPEG, TIFF, etc.) for each image you create. Why not just distribute Photoshop-native files? It's a bit impractical because they're huge compared to other formats, and Photoshop is a rather expensive piece of commercial software with no free reader. Law-abiding citizens without lots of cash to spare won't be able to view your work. There are free viewers either built into browsers or easily available (e.g. QuickTime, GraphicConverter, Preview, etc. on the Mac) that deal nicely with the other file formats.

Filters

Filters in Photoshop apply a change to the selection. They range from blurring, sharpening and adding noise to (often clichéd) stylizations, lens flares, edge finders and distorting the image in a variety of ways. If Adobe hasn't supplied a specific filter, their plug-in architecture allows other developers to write (and sell) filters that serve specific needs: from the loud, technically impressive but often obnoxious filters in the “Eye Candy” tradition to solid rendering, colour vetting and antiquing filters, there exist piles and piles of add-ons, most of which wind up in the Filters menu. Before you use too many of them, take a tour of personal sites, look at what has been done to otherwise nice design by these tools (and the nice design that has been done without them) and then be very selective about when and if you use filters. I've found one can get by quite nicely with the blurs, sharpens and unsharp mask, plus Gaussian noise and offsets. Much more time should be spent getting the colours, crop and contrast just right than in the filters menu (in my humble opinion).

That's it for the Coles Notes on Photoshop. Experiment, get the feel for the brushes, learn to love colour histograms and come to the workshops for some of the less theoretical stuff.

Don't forget to view source!

week 3: [X]HTML (and CSS) part I

This week's content is rather long... but much of it is intended as reference material for the rest of the year. Get a sense of where we are in HTML history, get a rough idea of how it all fits together. Take a quick look at the page source: every element in the page source is discussed in the definitions below. We'll do some practical examples this week.

What does HTML do?

HTML is a markup language: at its basic level it adds meaning to text in ways that conventional [English | French | Inuttitut | Serbo-Croat | etc. ] grammar and style don't add easily. HTML is not a document language like PostScript or an encoding like Photoshop... you can open up an HTML file and often you will see mostly human-readable text with a few markup tags.

Content delivery

HTML is about defining the parts of a text in a universally readable way. It provides a standard for indicating headings, paragraphs, emphasis, various types of citation, listing and tables... and perhaps more importantly, the ability to place other types of data inline or as links.

Formatting language

As a consequence of having defined these parts of a text, the browser is supposed to render these headings, paragraphs, lists, etc. as best it can and in a way that makes sense in the particular context in which it is running. HTML was originally designed as a set of tags indicating structural parts of text (with a bias towards academic and technical documents), with some suggestions on how browsers might display them.

What actually happened was a bit different. Browser implementors used a strange blend of uniformity and one-upmanship in their design: basic elements (headings, lists, paragraphs) all rendered more or less consistently. The leader (Netscape at first, then Microsoft) would then add proprietary tags and try to convince page developers to use them (often wrecking the page for browsers without the proprietary tags built in). Print designers (e.g. David Siegel) came in and wanted dictatorial control over layout and typography, and they got it by hacking tables, inserting invisible graphics and generally brutalizing the language. David Siegel eventually and famously admitted: “The Web is dead and I killed it” with reference to his hacks and tricks written up in Killer Web Design. He has repented a bit, but a search for Siegel turns up lots of killer design and no apology (that page must've been taken down). The HTML code for these shimmed table layouts became increasingly painful for mere humans to create, read and adjust. Reliance on expensive and inefficient web design tools soared. Accessible sites became a rarity. Fortunately, we don't have to live in this era.

Folks, first at the standards organizations and eventually at the browser manufacturers, recognized that HTML was getting badly bloated and drifting further and further from its original purpose (and its persistent basic design). Modern HTML is an attempt to write the content and its structure in individual documents using a simple rule set, and keep formatting information in a separate file that can be referred to by many different texts (thus reducing the work involved in revising the look and feel of a site as well as making it easier to dig out the content in a given HTML file). The formatting information was to be written up in a specialized, fairly intuitive language called CSS.

The way a modern HTML document works is as follows: the document itself defines a text with parts: headings, lists, paragraphs, links, etc. It also includes or refers to some Cascading Style Sheet (CSS) code. The CSS tells the browser (also called a user agent as it represents the human user to the machines out on the Internet and [usually] does the user's bidding there) how to format the text. One nice thing is that the CSS can request one way of formatting for paper, another for cell phones, another for a screen reader (a program people with vision problems can use to read text on a computer) and still another for the regular old consumer model web browser. If a given page doesn't have the CSS for a given display/rendering technology (or if the user has their own preferences as to how to display the page), then the user or the browser may have their own CSS to deal with standard tags... as long as the document is in well-formed, recognizable HTML (or XHTML).

Web glue

The Web isn't just about stretches of linear text. It is also teeming with other types of data: images, sound, video, downloadable programs, page layouts (in PDF, PostScript, TeX, etc.). For the most part, this data is held together by, searched from and tied into HTML. HTML files also link between each other, creating the web/network aspect of the Web. All these functions are accomplished in fairly similar ways: the HTML code supplies a URI address to the item (also known as a URL) and a few hints on what to do with it: open it up in a current or new window, display it on the page (including hints as to where and how big), use it in case some other content doesn't work, play it in the background or hold it in memory for later use by a script. Thus although all the graphics and multimedia on a Web page are stored in some format other than HTML, they are placed and managed in accordance with HTML instructions. HTML is flexible enough to be adapted to new image and file formats without changing its basic grammar, and has several ways to embed instructions in other languages (e.g. JavaScript, CSS, XML) in itself.

Flavours of HTML

Finding an HTML 1.0-compliant document out on the Web is not an easy task. Many things that are taken for granted (e.g. control over colour, images, tables, fonts and layout) were added later (and often added a few times, differently, by different people).

HTML 2.0 + Netscape extensions

This was the version with which HTML became popular around 1994–95. Netscape Navigator 1.0 and 1.1 pushed the envelope on a pretty dry, academic markup language with coloured text (sort of), background colours and images, tables, inline images (that's images on the same screen as text) and the soon-to-be-loathed BLINKING TEXT (I can't even demonstrate this on most browsers now... it's been so thoroughly killed by browser manufacturers since).

Many people still use a variant on HTML 2.0 when creating small personal pages... it doesn't require a whole lot of training, it works everywhere (even on very old computers) and the code still looks pretty much like English.

Microsoft and Netscape HTML

From the 2.0 through 4.x browsers, Netscape and Microsoft each added one or two new, non-backwards-compatible technologies with every major version. Layers (now pretty dead), frames (still in fairly wide use), point-casting (now very dead), early stabs at stylesheets (now more or less standardized), weird object IDs (unfortunately not going away), table cell colouring, the (troublesome) FONT tag, inline frames (never caught on), assorted variants on JavaScript, the MARQUEE tag (which creates scrolling banner text) and a 1500% increase in browser file size mark this period. Designers had two choices: design for specific browsers (shutting out others) or do mental and coding contortions to try to satisfy four or five divergent “standards”. As Internet Explorer started to pull ahead in market share, more and more designers pulled the plug on other browsers.

HTML 3.0

HTML 3.0 was the first attempt by the W3C to try to bring HTML back on track by codifying the private changes made to HTML 2.0, working them into a consistent framework and adding lots of new, non-academic types of content as well as a first stab at internationalization. It was a very nice standard but nobody built it into their browsers. It died a slow, mostly unnoticed death on the drawing board. Standardized recognition of stylesheets was suggested in this version.

HTML 3.2

HTML 3.2 made more concessions to current practice and basically represented a snapshot of what designers could get away with using on all recent browsers. HTML 3.2 was thus adopted before it was written and codified something which everyone could use on the Web. Microsoft and Netscape HTML continued to churn out new developments around HTML 3.2.

HTML 4.0.x

HTML 4.0 was a housecleaning revision at the W3C: Dave Raggett and company deprecated all of the non-structural markup (but left it available for backwards compatibility), added lots of hooks for CSS instructions, tried to standardize where scripts could go, filled in missing bits and pieces, and did tons of work on internationalization (language indicators, right-to-left and bidirectional layout). Browser makers actually brought their browsers mostly in line with the suggestions in HTML 4.0 by the time Netscape and IE released their 4.0 versions. CSS support was still hovering between flaky and non-existent.

XHTML (hallelujah!)

XHTML is an attempt to take HTML 4.0's grammatical quirks and idiosyncrasies out and rewrite HTML in XML, which closely resembles HTML but is specifically designed to be general, extensible and have a consistent set of grammatical rules. XHTML almost demands CSS in its 1.0 version (but keeps some old HTML visual stuff in its “transitional” mode) and specifically demands it (banning the mixing of format and content) in the 1.1 version. We'll be using 1.0 for now as 1.1 support is pretty thin and 1.1 compatibility with older browsers leaves some things to be desired.

Good HTML citizenship

Points I'd like to emphasize about the prosocial use of HTML

Don't kill the Web

The balkanization and fragmentation of HTML plus reliance on (often buggy) behaviour of specific browsers was a major hindrance to the browser developers getting any useful improvements made for years. The increasing difficulty of putting together pages was a consequence of that stalling process. By accepting the mutability of HTML as a markup language and by calling in more appropriate measures (PDF, PostScript, Flash, etc.) when dictatorial control of layout and type is required, a designer can protect their pages against obsolescence and let browser writers get on with improving their code at the same time.

Be nice to disabled people and execs with browser-phones

The simplest way to make your page accessible to people whose user agents don't do graphics is to make sure your documents read sensibly from the beginning of your HTML code to the end: your text and illustrations should be placed in a logical order before you mark them up, and your text should make sense without your graphics. You can and should help this along by using alt attributes: text descriptions or replacements which you include with each meaningful image. A further test of this is to load your page into Lynx (see week 1) and see if it still makes sense.

Things to expect in an HTML document

Tags!

I've now referred to tags a few times without defining them. Tags are bits of text that tell the browser where something starts, exists or ends. The end user never sees tags; they only see their effects. Each tag is enclosed in <chevrons> (also known as less-than and greater-than symbols or angle brackets). Slashes (/) indicate what general class of tag we have: no slash at the beginning or the end indicates a start tag. For example, <h1> indicates the beginning of a top-level (large, important) heading. A slash at the beginning of a tag indicates an end tag. For example, </div> indicates the end of a division in a document. A slash at the end of a tag indicates an empty tag, something that causes a once-off effect, such as a line break: <br />. (This is new in XHTML; before, start and empty tags were structurally the same.) Notice the space between the br and the slash. This space is important.

Start and empty tags may also have attributes and values. These define additional details about the tag. For instance, the tag <a href="http://www.metawidget.net/coms490"> is an a (anchor, meaning link) tag with a href (hyperlink reference) attribute set to the value “http://www.metawidget.net/coms490” (the URI for this page). Another common attribute-value pair is found in the image tag: <img src="../images/foo_baz.gif" height="32" width="55" alt="[Foo-baz campaign participant]" /> is an img (image) tag with a src (source) attribute set to the value “../images/foo_baz.gif” (meaning go up to the directory containing this one, open the “images” folder, then get the “foo_baz.gif” and display it), the height attribute set to 32 pixels and the width attribute set to 55 pixels (these two alert the browser to save a box 32 pixels high and 55 pixels wide for when the image downloads). The alt (alternate text) attribute's value is displayed in the place of the image if the image isn't downloaded: this attribute is not optional under the standard and is really helpful for people without graphical browsers. Images don't contain or format other things, so the tag is an empty one, indicated by the ending slash.
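
To see start, end and empty tags (and their attributes) the way a program does, here is a small Python sketch using the standard library's html.parser module. The TagLister class and its seen list are names I've invented for the example; note that this parser is forgiving HTML tooling, not an XHTML validator, and it reports tag names in lowercase.

```python
from html.parser import HTMLParser


class TagLister(HTMLParser):
    """Record every tag as (kind, name, attributes) in document order."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append(("start", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.seen.append(("end", tag, {}))

    def handle_startendtag(self, tag, attrs):
        # Called for XHTML-style empty tags such as <br />.
        self.seen.append(("empty", tag, dict(attrs)))


lister = TagLister()
lister.feed(
    '<h1>Resource page</h1><br />'
    '<a href="http://www.metawidget.net/coms490">COMS 490</a>'
    '<img src="../images/foo_baz.gif" height="32" width="55" '
    'alt="[Foo-baz campaign participant]" />'
)
lister.close()

for kind, name, attributes in lister.seen:
    print(kind, name, attributes)
```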

Tags can be placed one within the other (e.g. <p>This is a <em>paragraph</em> -- p means paragraph, em means emphasize.</p>) but crossing them so they intersect (e.g. <em>bad <strong>markup</em> here</strong>) is illegal and may cause unexpected things to happen to your page.
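
The nesting rule is easy to check mechanically: keep a stack of open tags, and insist that each end tag matches the most recently opened one. Here is a rough Python sketch of that idea, again with html.parser from the standard library. The class and function names are hypothetical, and this is nowhere near a full validator (it knows nothing about which tags are allowed inside which).

```python
from html.parser import HTMLParser


class NestingChecker(HTMLParser):
    """Track open tags on a stack and note end tags that arrive out of order."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.errors = []

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Empty tags like <br /> arrive as a start followed by an end,
        # so they push and pop straight away.
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        else:
            self.errors.append(tag)


def is_properly_nested(markup):
    """True if every end tag closes the most recently opened tag."""
    checker = NestingChecker()
    checker.feed(markup)
    checker.close()
    return not checker.errors and not checker.open_tags


print(is_properly_nested("<p>This is a <em>paragraph</em>.</p>"))
print(is_properly_nested("<em>bad <strong>markup</em> here</strong>"))
```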

You can learn a lot of tags by watching the code pane in Dreamweaver or viewing other people's source code on the Web, but the definitive dictionaries of tags are the W3C standards (see week 1) for HTML and XHTML. Certain tags, such as html, head, title and body, occur in every well-formed HTML document (the browser does its best to infer them if they're not there). Most of the other tags are used to add specific invisible information to the head, or organize the visible parts of the web page in the body.

Character entities

You may be wondering how I can display chevrons without the browser turning them into tags. HTML, like most mixes of content and instructions, has a way of escaping characters in content that might be confused for instructions or aren't available in the character set used. HTML uses an ampersand followed by the character name and a semicolon. The left chevron (a less-than symbol to the type designers) is hence &lt;. The ampersand (&, which I needed to code that last bit) is &amp;.

There is a pretty complete list of character entities in the W3C standards, but here are some frequently-used ones:

selected character entities
Display	Description	Entity
escapes for markup characters
&	ampersand	&amp;
<	less-than or left chevron	&lt;
>	greater-than or right chevron	&gt;
selected accents
á	a with acute accent	&aacute;
Á	A with acute accent	&Aacute;
â	a with circumflex accent	&acirc;
Â	A with circumflex accent	&Acirc;
à	a with grave accent	&agrave;
À	A with grave accent	&Agrave;
ç	c with cedilla	&ccedil;
Ç	C with cedilla	&Ccedil;
é	e with acute accent	&eacute;
É	E with acute accent	&Eacute;
ê	e with circumflex accent	&ecirc;
Ê	E with circumflex accent	&Ecirc;
è	e with grave accent	&egrave;
È	E with grave accent	&Egrave;
ë	e with umlaut accent	&euml;
Ë	E with umlaut accent	&Euml;
í	i with acute accent	&iacute;
Í	I with acute accent	&Iacute;
î	i with circumflex accent	&icirc;
Î	I with circumflex accent	&Icirc;
ì	i with grave accent	&igrave;
Ì	I with grave accent	&Igrave;
ï	i with umlaut accent	&iuml;
Ï	I with umlaut accent	&Iuml;
ñ	n with tilde accent	&ntilde;
Ñ	N with tilde accent	&Ntilde;
ô	o with circumflex accent	&ocirc;
Ô	O with circumflex accent	&Ocirc;
ö	o with umlaut accent	&ouml;
Ö	O with umlaut accent	&Ouml;
û	u with circumflex accent	&ucirc;
Û	U with circumflex accent	&Ucirc;
ü	u with umlaut accent	&uuml;
Ü	U with umlaut accent	&Uuml;
typography
 	non-breaking space: great for joining together numbers, e.g. 7 890, so they don't break between lines	&nbsp;
–	typographer's n dash: used for number ranges, trips, etc., e.g. 7 890 – 9 876; Montréal–Lachute	&ndash;
—	typographer's m dash: used for abrupt changes in thought	&mdash;
“	left curly (double) quotes	&ldquo;
”	right curly (double) quotes	&rdquo;
‘	left curly (single) quote	&lsquo;
’	right curly (single) quote	&rsquo;
‹	left guillemet (single angle quote)	&lsaquo;
›	right guillemet (single angle quote)	&rsaquo;
©	copyright symbol	&copy;
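
If you ever generate HTML from a script, you don't have to do this escaping by hand. As an illustration (the two wrapper function names are mine, but html.escape and html.unescape are part of Python's standard library), here is the round trip between plain text and entity-escaped markup:

```python
import html


def to_markup(text):
    """Escape the characters that would otherwise be read as markup."""
    # quote=False leaves quotation marks alone; only &, < and > are escaped.
    return html.escape(text, quote=False)


def from_markup(markup):
    """Turn named entities such as &eacute; back into characters."""
    return html.unescape(markup)


print(to_markup("A line break looks like <br /> & that's that"))
print(from_markup("Montr&eacute;al&ndash;Lachute &copy;"))
```

Escaping the ampersand matters as much as escaping the chevrons: without it, a literal piece of text that happens to look like an entity would be converted by the browser.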

The DTD: Information on the HTML flavour

The first thing in a well-formed HTML document is a document type declaration (in XHTML, sometimes preceded by an XML declaration). The document type declaration names the document type definition (DTD) the document follows, which tells the browser which standard the document follows and which variant on that standard. The XML declaration (XHTML only) indicates which character set to use and which version of XML the document falls under.

These items are not tags per se, although they look pretty similar. Whichever standard you follow, copy a document type declaration that will give the browser a clue what you're up to. This document uses a flexible interpretation of XHTML which allows for some hinting to older browsers:

<?xml version="1.0" encoding="iso-8859-1" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

If you're hand-coding XHTML, this is a pretty good pair to use. Dreamweaver will automatically choose a conforming DTD for its output. In the XML declaration, iso-8859-1 is “Latin-1”, a standard character set that works well in English and other Western European languages.

Leaving out the DTD will generally prompt modern browsers to go into “bugwards compatibility” mode — they will revive all kinds of bugs which designers may have used in hacks in the past. It is probably not a good idea for a fresh, standards-compliant document to revive these bugs for its display, so include a valid DTD!

The HEAD: Information for search engines, recurring scripts, style and windowing

The head element takes care of information that is largely invisible in the browser pane, because it is addressed to non-human agents such as browsers, web servers and search engines. It makes itself felt to the user first by defining the title that appears in the title bar of the window, and then by providing formatting rules for CSS-based pages. It may also contain scripts waiting to spring into action, character set and server instructions, and link tags that document how the page relates to other documents (link tags which are generally ignored by browsers, unfortunately).

After the DTD, every well-formed HTML document has a head element (an element is a stretch between a start tag and its corresponding end tag; an empty element is the thing in the browser's internal model of the page defined by an empty tag). Minimally, this element contains a title element, whose text appears in the window title bar: <title>Put title here</title>. It's also a good idea to include a meta tag (a multipurpose empty tag that contains information about the document) reiterating the character set information from the XML declaration: <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />. If the page uses a stylesheet in a separate file (which saves you more work than other ways of attaching CSS), the head will contain a link tag to the stylesheet, e.g. <link rel="stylesheet" type="text/css" href="my_stylesheet.css" />. Alternatively (or additionally), there may be a pair of style tags (with the opening tag indicating the type of stylesheet) containing in-file style information, e.g. <style type="text/css">body {font-family: "verdana", "monaco", "mishawaka", sans-serif; color: #000000; background-color: #e8e8ff;}</style>. The head element can also hold script tags, which usually contain JavaScript functions to be run from elsewhere on the page.
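Put together, the pieces described above might look something like the following. This is only a sketch: the stylesheet file name and the sayHello function are placeholders made up for illustration, and a real page will rarely need every one of these tags.

```html
<head>
  <!-- Text shown in the window title bar -->
  <title>Put title here</title>
  <!-- Repeats the character set for agents that skip the XML declaration -->
  <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
  <!-- External stylesheet; my_stylesheet.css is a placeholder file name -->
  <link rel="stylesheet" type="text/css" href="my_stylesheet.css" />
  <!-- In-file style rules, read after the external sheet -->
  <style type="text/css">
    body { color: #000000; background-color: #e8e8ff; }
  </style>
  <!-- A function (hypothetical) that waits here until the page calls it -->
  <script type="text/javascript">
    function sayHello() { alert("Hello"); }
  </script>
</head>
```

Nothing in this block draws anything in the browser pane except the title bar text; the style rules only take effect once there is a body for them to format.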

The head element is automatically created by Dreamweaver, given a page title and any changes you make to the page's appearance that need accounting for in the head. Dreamweaver's head element will generally include a script tag with a bunch of functions whose names start with MM_: these are functions that Dreamweaver uses to run its “actions” and, in some cases, to provide bugwards compatibility with various browsers.

The BODY: Content for the browser pane

Everything you see between the toolbars at the top of the window and the status bar at the bottom is provided and logically structured in the body element. Text, links, embedded images and sound: these are the meat of the body element. “Bare text” can exist in the body element, but more control and interoperability can be had by making sure all text is in more specific elements: paragraphs, headings, lists, tables, addresses, objects, etc.

In modern HTML, the body should be almost entirely content and its logical labels. The core language provides a stack of logical tags, a selection of which appear below (block and inline will be explained later):

h1,h2,h3,h4,h5,h6: Headings are titles, section heads, and headlines. Lower numbers (h1, h2) are more important and by default render bigger and bolder. Make the most important heading on a site an h1 and then step down one number at a time for each level of headings (h2 for sections, h3 for subsections, etc.) — the actual look of the headings can be customized using CSS. Headings are block elements.
p: Paragraphs should be used as paragraphs, and (although empty paragraph tags are often incorrectly used as line breaks) should use a start tag and an end tag to enclose a block of text. Paragraphs are block elements.
br: Breaks force a new line. HTML ignores new lines and whitespace by default, so these are the way to break a line without any side effects. Breaks are empty tags and are inline elements.
ul, ol, menu, dir, dl: Lists contain list items and display them by default in a numbered (ol: ordered list), bulleted (ul, dir and menu, all roughly equivalent: unordered list, directory and menu) or glossary (dl: definition list) format. Except for definition lists, list items are contained in li tags. Definition lists use two types of list items: dt (definition term) tags and dd (definition description) tags. Lists can be nested: a whole list (or other block or inline element) can be an item in a larger list. Lists are block elements.
div: Divisions are abstract blocks. They can be used to work with large blocks of markup to align them, style them or organize them thematically. They really come into their own with the class attribute and CSS.
pre: Preformatted text allows one to drop in typewriter-style text, with spaces, returns and tabs (although tabs may cause unpredictable results) left intact. From ASCII art to output from old mainframe programs, preformatted text is a block-level element with a variety of uses.
hr: Horizontal rules are those familiar horizontal "grooves" across the page (their appearance can be modified using CSS). They count as block-level elements.
blockquote: Block quotations hold blocks of quoted material and usually render indented from both margins. They shouldn't be relied upon to indent text (use CSS for that). They are block elements.
address: Address blocks mark off contact, dateline and author information. By default they render in italic type.
table, form: Tables and forms contain special-purpose markup to hold table data and form fields and widgets. Tables have also been used to mark up columns, toolbars and sidebars, and fix “pixel perfect” positions, much to the detriment of people with more or fewer pixels on their screens, and even more to the detriment of those using speech or braille browsers. CSS can define a visual layout for a page without abusing the logical structure of the content which it lays out... and often with fewer graphical shims, less pixel-counting and less arithmetic (which varies from browser to browser) on the part of the designer. More about tables and forms (which are both block-level elements) later.
a: Anchors are multipurpose tags: they can be either links or mid-page destinations for them. An anchor with a href attribute (which has a URI as a value) is a link. An anchor with a name attribute (which has some text without spaces as a value) is a destination. An address with a pound (#) in it has a name after the pound: loading the address will load the document and then jump within it to the anchor bearing that name. Link anchors can include a title attribute which should contain a long description of the linked item.
span: Spans are the inline equivalent of divisions. Absolutely invisible by default, giving them a class attribute allows arbitrary formatting to be performed on them using CSS.
em, strong: Emphasis and strong emphasis are applied to inline text to, well, emphasize. Emphasis usually renders as italics and strong emphasis usually renders in bold.
code, samp, kbd, var: Source code, sample code, keyboard and variable tags are particularly good for technical and mathematical discussions. The first three typically render in typewriter font and variables render in italic type. They are all inline elements. For blocks of code or computer output, pre is a good choice.
cite: Citation tags are used to mark the title and sometimes the bibliographic information of a work. A citation may be wrapped in an anchor whose href attribute points to an online version, and may carry a title attribute to give a long version of the title. Citation elements are inline.
abbr, acronym: Abbreviation and acronym tags both mark off shortened forms of words and phrases. Their title attributes should be used to provide their expanded form. Both elements are inline.
q: Quotes mark off quotations inline, adding quotation marks and allowing for title or cite attributes to describe or source the material (the cite attribute takes the address of the source). Unfortunately, they typically use ugly straight quotes as opposed to “typographer’s quotes”. Some browsers don't quote items in the q tag at all.
dfn: Definition tags indicate inline the defining instance of a word or phrase.
sup,sub: Superscript and subscript tags put enclosed inline text up or down from the main flow of text and at a slightly smaller size.
img: Image tags are empty tags which are replaced by an image (a GIF, JPEG/JFIF or PNG specified by the src attribute) on the page. If the image cannot be displayed for one reason or another, then the element is displayed as its alt text. Even if an image is simply decorative or a shim (not that you should use shims), putting a pair of empty quotes as the value of alt will keep the browser from displaying something like [image] or a broken-image icon should it not be able to load the image. If the image has no narrative or expository value, then your blind, cell-phone-using and Lynx-enabled users who don't do graphics won't miss the image and won't have any replacement text getting in their way. Images may have a usemap attribute which points to a map element (not described here) elsewhere, turning the image into a clickable image map. Images are inline by default, but CSS or an align attribute can turn them into blocks.
object: Object tags are a generalization of the image tags for a wider variety of data types: they can call images, but also Flash, video, audio files, esoteric graphics formats and (theoretically) other HTML. The exact attributes required depend on the data type. Object elements often contain param tags to provide additional information about the file, and due to inter-browser wrangling, they often contain an embed tag for older Netscape browsers (and strangely enough, IE 5.1 for Mac) which does much the same thing. Object elements can contain other object elements or markup: if the outer object fails to display, the browser should try rendering whatever is inside, going as many levels deep as required. Not all browsers have this straight for text inside object tags, but objects within objects are a good way of providing a backup for a file which requires an esoteric plugin.
applet: Applets are Java programs designed for use on the Web. The syntax of the applet tag is very similar to that of the object tag.
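To see a handful of the tags above working together, here is a small body written with logical markup only. Treat it as a sketch rather than a model answer: the text, the image file diagram.png and the anchor name lists are all invented for the example.

```html
<body>
  <h1>Course notes</h1>
  <!-- An anchor with a name attribute is a destination -->
  <h2><a name="lists">Lists</a></h2>
  <p>
    <abbr title="Hypertext Markup Language">HTML</abbr> has
    <em>ordered</em> and <em>unordered</em> lists.
  </p>
  <ul>
    <li>First point</li>
    <li>Second point</li>
  </ul>
  <p>
    <!-- The alt text stands in for the image if it cannot be shown -->
    <img src="diagram.png" alt="Diagram of a nested list" />
    <br />
    <!-- An anchor with an href attribute is a link; the name follows the pound -->
    <a href="#lists" title="Jump back to the start of the lists section">Back to lists</a>
  </p>
  <address>Written by A. Student</address>
</body>
```

Notice that nothing here says what size, colour or typeface to use. All of that can be supplied later by a stylesheet without touching the markup.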

Block and inline elements

The simplest approach to block and inline elements is to note that all block elements begin and end with new lines (this doesn't work for floating blocks, but it's a good start). Blocks are rectangular chunks of content with definite tops, bottoms, left edges and right edges. Inline elements can be thought of as strings of content: they can wrap around lines, they can be in the midst of a block with other content all around them, and if they actually are rectangular blocks, that's luck and window size, not a necessary thing. Inline elements can contain other inline elements, but not blocks. Block elements can contain either inline or block elements.

How does this work? This means that, for instance, a block quotation can contain some paragraphs (block), and one paragraph (block) can contain a citation (inline), and that citation can contain an abbreviation (inline). The citation and the abbreviation could contain other inline elements: the citation could have two separate abbreviations in it, or an abbreviation and an anchor (inline)... but none of those inline elements could contain a horizontal rule or another paragraph (both block).
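In markup, the nesting just described could look something like this. The book title is invented for the example.

```html
<!-- block -->
<blockquote>
  <!-- block inside a block: allowed -->
  <p>
    The full story is told in
    <!-- inline inside a block, then inline inside an inline: both allowed -->
    <cite>The <abbr title="Hypertext Markup Language">HTML</abbr> Handbook</cite>,
    which also covers style sheets.
  </p>
  <p>A second paragraph inside the same quotation.</p>
</blockquote>
<!-- Not allowed: a p or an hr inside the cite or the abbr,
     since those are blocks and cite and abbr are inline -->
```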

It should be noted that browsers will try to accommodate attempts to wrap inline elements around blocks, but the degree of success is not always great.

How Dreamweaver helps

We will use Macromedia Dreamweaver over the next couple of lessons. Despite some limitations, it has a couple of things going for it:

Syntax guidance

Dreamweaver usually generates correct code. What's more, clicking on an item pops up many of its design items in an inspector window. This can let you get comfortable with the concepts of HTML without spending 75% of your time leafing through printouts of specifications and manuals. Dreamweaver can also read outside code, and provides a measure of validation by highlighting things it doesn't understand. These systems attempt to be in line with current practice at press time (which, for Dreamweaver 4, is a couple of years ago now) and generate functional web pages with a minimum of fuss.

WYSIWYG and quick testing

Dreamweaver's preview pane is pretty close to what you see of a page in the browser, and you can drag, resize and modify things in that pane. Under the hood, Dreamweaver can translate the user's dragging and dropping into a number of coding approaches. Dreamweaver can also throw its working copies of a document to a browser (or some browsers) of your choice for real-world testing.

Quick conversion and tried-and-true hacks

Dreamweaver has an internal model of a page that allows it to switch a layout from tables to CSS to layers and back again. Because Dreamweaver tends towards pixel measurements as soon as the user drags things around, and because pixel-perfection is very difficult to achieve without some hacking by a hand-coder, Dreamweaver has several hacks built in to keep the look of a page exactly consistent across (most) browsers.

How Dreamweaver hinders

Dreamweaver (like all WYSIWYG HTML editors) suffers from being a program that needs to anticipate the whims of a huge variety of designers. This means one can wind up with some rather ugly shortcuts taken in the creation of pages.

One size fits all

You press return, you get a break. Repeat, get another. Typing away at a document, the user may have a logical structure in mind, but Dreamweaver doesn't know that and it doesn't ask. Mostly, it keeps to physical markup: markup that defines where something is and what it looks like, rather than what it is. This means that a wide variety of documents can be represented using a small vocabulary of tags. Meaning suffers at this point, particularly for non-graphical users and (more importantly in many cases) for the lexical analysis done by search engines to estimate what your page is about and how useful it is. Dreamweaver does have the full repertoire of logical markup tags, but it makes the physical ones so much more convenient that it's easy to forget the logical ones are there.
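The difference between physical and logical markup is easier to see side by side. The first version below is written in the physical style (it is a hand-made illustration, not actual Dreamweaver output); the second says the same thing with logical tags.

```html
<!-- Physical markup: says how the text looks -->
<p><font size="5"><b>Opening hours</b></font></p>
<p>Monday: 9 to 5<br />
Tuesday: 9 to 5</p>

<!-- Logical markup: says what the text is -->
<h2>Opening hours</h2>
<ul>
  <li>Monday: 9 to 5</li>
  <li>Tuesday: 9 to 5</li>
</ul>
```

A graphical browser can be made to show both versions almost identically, but only the second tells a search engine or a speech browser that there is a heading followed by a list of two items.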

Code bloat and rigid pixels

Another consequence of the limited tag vocabulary, combined with the assumption that the designer is always looking for a pixel-perfect layout, is that Dreamweaver can use a lot of code to say something that would seem very simple. This makes your HTML files slower to download, more demanding to render and harder to debug by hand. It also relies on assumptions about the size of the receiving browser window to work: too big and it looks strange, too small and the user has to scroll left and right as well as up and down.

A working knowledge of what is going on underneath can help you minimize these penalties, and it can also help you rework Dreamweaver files for publication. Dreamweaver is especially good at dealing with tedious work like creating image maps and inserting correct dimensions for images and classids for Flash files, which makes it a valuable tool (along with a good text editor) for Web design. Dreamweaver tacitly acknowledges this by including support for switching over to BBEdit, an industrial-strength text editor (with a free “lite” version).

Mixing hand-coding and Dreamweaver

Depending on your confidence level with coding and the type of work you're doing, you may find Dreamweaver’s weak points tiresome after a while. You may prefer to start hand-coding your sites. Or you may have a particularly high-traffic site that needs to be optimized within an inch of its life (or a site with legal and/or audience requirements for particularly accessible code). Going cold turkey may be a little intimidating... but there are features of Dreamweaver and ways of using it that can get you launched into hand coding and familiar with the inner workings of HTML.

One feature of Dreamweaver that lends itself to the transition is the code pane. It gives you a split view of your page: on top is your code, colour-coded by function. On the bottom is the familiar drag-and-drop Dreamweaver WYSIWYG interface. By watching what happens to the code as you work on the bottom, and watching the WYSIWYG change as you work up top (you may need to click the refresh button to see changes down below), you can get a feel for how the two relate.

Another way of working is to start a file in Dreamweaver — let it deal with the DTD, drop in Flash files and images, do image maps, etc. — and then type in the text, tag it, and do the CSS formatting in your preferred text editor. You'll get used to seeing most of the Dreamweaver bookkeeping and you'll be able to concentrate on developing your sense of document structure. Try deleting Dreamweaver stuff that seems irrelevant (make backups), tweaking measurements, removing fixed pixel values where you want something to be “stretchy” (stretchiness refers to the quality of a page that lets it nicely fill the window, be it 600 pixels across or 1100, usually achieved by leaving the width of some elements unspecified or by setting all the widths in percentages)... gradually you'll be exercising more and more direct control over your code, making it cleaner, easier to maintain, closer to the spirit of the language and more compact than any machine-generated code will be for a while.
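As a rough illustration of “stretchy” versus rigid measurements, compare the rules below. The class names rigid, sidebar and main are invented for this sketch, and the percentages are arbitrary starting points rather than recommended values.

```html
<style type="text/css">
  /* Rigid: assumes a window at least 760 pixels wide */
  div.rigid { width: 760px; }

  /* Stretchy: percentage widths follow the window */
  div.sidebar { float: left; width: 25%; }
  /* Width left unspecified, so this block takes whatever room remains */
  div.main { margin-left: 27%; }
</style>

<div class="sidebar">Links go here.</div>
<div class="main">The main text goes here.</div>
```

The style element belongs in the head and the two divisions in the body; they are shown together here only to keep the example short.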

Another way to make hand-coding easier is to save a few starting points for files, for instance a file with a valid DTD, a fill-in-the-blanks head and a blank body ready to start typing. A similar CSS starter file will save you a lot of typing and contain code to jog your memory as you work, often saving you time spent leafing through standards. (Some people suggest starting with a file which duplicates the usual browser way of formatting and then changing and adding parts to reflect your customizations; the W3C standard for CSS [see week 1] includes one such stylesheet.) This way, you do your research once, then use it for all your projects, making changes or spinning off specialized variants as you learn more about the technology, the technology changes and you feel your own work patterns developing.
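A starter file along these lines might look like the sketch below. The capitalized text and the file name starter.css are placeholders to be replaced in each new project. The html element with its xmlns attribute is not discussed above, but XHTML 1.0 requires it as the wrapper around the head and body.

```html
<?xml version="1.0" encoding="iso-8859-1" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
  <title>TITLE GOES HERE</title>
  <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
  <!-- starter.css is a placeholder name for your own CSS starter file -->
  <link rel="stylesheet" type="text/css" href="starter.css" />
</head>
<body>
  <h1>MAIN HEADING GOES HERE</h1>
  <p>First paragraph goes here.</p>
</body>
</html>
```

Save a copy under a new name each time you start a page, and fold improvements back into the starter as you find them.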

Book

conflict of interest note: I do get a small commission if you purchase the book through the following link. Camelot Books (Phillips Square, Metro McGill) has it too... shop around a bit...

Moock, Colin. ActionScript: the Definitive Guide. Sebastopol, California: O’Reilly and Associates, 2001. Moock’s companion web site (with errata, sample files and goodies) is at http://www.moock.org. His site is worth looking at as a site as well as as a reference.