Markup for indexing, not for formatting

(I am “reposting” this because I found a great link, near the bottom, and because I don’t think this got enough air-time the first time I posted it.)

“Every serious book of nonfiction should have an index if it is to achieve its maximum usefulness. A good index records every pertinent statement made within the body of the text. The key word here is pertinent.” (18.1, p. 512, The Chicago Manual of Style 1982)

A web page is a document and a web site, a book. Books go through a process of editing: several drafts are written before the final draft, and several proofs may be produced before the book is sent to the press for mass reproduction. At one point in the process, an index is created to facilitate finding the information a reader is looking for.

In brief, the indexer (the person creating the index) annotates a proof of the book to show what should be indexed and how. The annotations even include a “header” section with data about the context of the document, or general document-wide indexing information (page, chapter, date of publication, etc.).

Here is a sample of how books are marked up for indexing (The Chicago Manual of Style 1982):

[Figure: sample of markup for indexing]

The figures, lines, colons, and other annotations all have very specific meanings, but that is not what concerns us here. The point is that some effort is made to objectively indicate which pieces of the text are considered important and how they relate to each other. This is a standard practice in manuscript and book publishing. If you want to maximize your web site’s usefulness, it should be a standard practice in web site publishing for any serious, nonfiction site.

What Is HTML?

The original intention of HTML was to let scientists share academic articles with some form of search and retrieval, so that, among other things, they could find articles that referenced their work. To accomplish this, web page documents are annotated in a fashion similar to the sample shown above. The annotations are known as “markup,” from the verb “to mark up,” and look similar to this: <title>Chocolate Chip Cookies in 10 Minutes</title>.
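To make the retrieval idea concrete, here is a minimal sketch of how a program might read the title annotation back out of a page. It uses Python's standard html.parser module; the class name TitleExtractor, the helper extract_title, and the sample page are my own illustrative assumptions, and real search engines do far more than this.

```python
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collects the text found inside the <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append rather than assign.
        if self._in_title:
            self.title += data


def extract_title(html_text):
    """Return the page title with surrounding whitespace removed."""
    parser = TitleExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.title.strip()


# Hypothetical page, made up for illustration.
SAMPLE_PAGE = """<html>
<head><title>Chocolate Chip Cookies in 10 Minutes</title></head>
<body><p>Preheat the oven.</p></body>
</html>"""

print(extract_title(SAMPLE_PAGE))
```

Because the title is marked up rather than merely displayed in large type, the program can find it without guessing.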

HyperText Markup Language is the term for this type of markup (“hypertext” = text that links to other text, a third dimension to a linear experience). The goal of HTML is to aid the retrieval of manuscripts from immense information systems, but historically HTML has been (ab)used primarily to aid the presentation (formatting) of a document. HTML optimized for indexing by the search engines is hard to produce. Most “modern” web pages are either missing such markup or carry inaccurate markup, which may be worse. A common example of inaccurate markup is a web site that uses the same title for every page.
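The repeated-title problem is easy to check for yourself. The sketch below assumes you have already collected each page's title into a dictionary; the function name find_shared_titles and the site data are hypothetical, invented only for illustration.

```python
from collections import defaultdict


def find_shared_titles(titles_by_page):
    """Return {title: [pages]} for every title used by more than one page."""
    pages_by_title = defaultdict(list)
    for page, title in titles_by_page.items():
        # Normalize case and whitespace so trivial differences do not hide duplicates.
        key = " ".join(title.split()).lower()
        pages_by_title[key].append(page)
    return {t: sorted(p) for t, p in pages_by_title.items() if len(p) > 1}


# Hypothetical site: the paths and titles are made up for illustration.
SITE_TITLES = {
    "/index.html": "Acme Bakery",
    "/cookies.html": "Acme Bakery",
    "/contact.html": "Acme  bakery",
    "/recipes/chocolate-chip.html": "Chocolate Chip Cookies in 10 Minutes",
}

for title, pages in find_shared_titles(SITE_TITLES).items():
    print(f"{title!r} is shared by {len(pages)} pages: {', '.join(pages)}")
```

Any title the check reports tells a search engine nothing about how those pages differ from one another.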

Many HTML editors provide the ability to include some of this markup in your documents. Markup that is strictly for formatting may make indexing more difficult and should be implemented via other means, typically Cascading Style Sheets (CSS).

A good document index communicates the internal document structure to the search engines. Specifically, it communicates which parts are titles, subtitles, word emphases, citations, diagrams, tables, etc. This is where it gets difficult: How do you know what sections need to be identified and how to identify them? What markup do you use to tell the search engines, “This is a main menu”? There are many guides on the internet on how to do this, and the answers are beyond the scope of this blog post. I will say, however, that technologies such as Flash, Java, and JavaScript cannot serve as markup and should be avoided if you hope to have your web site properly indexed by the search engines.
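As a rough illustration of what “communicating structure” can mean, the sketch below builds an outline from a page's heading elements (h1 through h6) using Python's standard html.parser module. The names OutlineExtractor and extract_outline and the sample page are my own assumptions, and this is not how any particular search engine works. Note that the paragraph styled to look like a heading with formatting-only markup never shows up in the outline.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class OutlineExtractor(HTMLParser):
    """Records (level, text) for each heading, in document order."""

    def __init__(self):
        super().__init__()
        self._current_level = None
        self._buffer = []
        self.outline = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._current_level = int(tag[1])
            self._buffer = []

    def handle_data(self, data):
        # Only keep text that appears inside a heading element.
        if self._current_level is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._current_level is not None:
            text = " ".join("".join(self._buffer).split())
            self.outline.append((self._current_level, text))
            self._current_level = None


def extract_outline(html_text):
    """Return the list of (level, heading text) pairs found in the page."""
    parser = OutlineExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.outline


# Hypothetical page: the fourth line only *looks* like a heading.
SAMPLE_PAGE = """<body>
<h1>Markup for indexing, not for formatting</h1>
<p>A web page is a <em>document</em>.</p>
<h2>What Is HTML?</h2>
<p><font size="5"><b>This looks like a heading</b></font></p>
<h2>The Bottom Line</h2>
</body>"""

for level, text in extract_outline(SAMPLE_PAGE):
    print("  " * (level - 1) + text)
```

A reader sees four headings on that page; the program sees three, because only three were marked up as headings.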

The Bottom Line

If you facilitate indexing by the search engines, there is a higher likelihood they will get it right and return your page as the first result for exactly the right query. This can be accomplished with few or no drawbacks for the design (if you have the right tools, some training, and are motivated). Furthermore, this is likely to make your potential clients very happy: the first result was exactly the answer they were looking for and the design led them directly to the answer. This, to me, is the definition of a web page optimized for search engines: your page is the first result for specific queries and the design makes the answer clear.

I would estimate 90% of all web sites would benefit from this kind of web site optimization. At that point, though, they would all be equally optimized rather than equally un-optimized, and the benefit could be nil.

Bruce Clay has an excellent description of the search engine optimization process.

Bibliography:

The University of Chicago Press. 1982. The Chicago Manual of Style. Chicago: The University of Chicago Press.
