Since the summer of 2009 I have been building the web store for Carolan, a costume and party-supply shop based in Las Palmas de Gran Canaria. I have long since lost count of the total hours I have put into this project, because it has turned into an obsession with following “best practices”.

Conventional wisdom says that when you want to set up a web store, you grab whichever free-software system is currently in fashion and just install it. The alternative (developing a custom store from scratch) would surely have cost me twice as much, or would it? That is what this article is about…

Carolan sells party supplies and single-use products. They carry a huge variety of items: costumes, cups, napkins, garlands, decorations and much more. In fact they have more than 20,000 items in their catalogue. When I started working with the Carolan team, one of their main wishes was a “live” connection between their database and the web store, to cut costs and increase efficiency. They also wanted the site to be easy to use (with items easy to find) and dynamic, with a home page that changes with the season, and they wanted the store to rank well in search engine results.

After analysing their wishes and needs, I decided the best option was to adopt a free-software store. I believed it would save me a great deal of time, but now I am not so sure, and I want to know what you think. Below is a summary of the modifications that have been necessary since I first installed the store. Bear in mind that I am not detailing ALL of the modifications, only the biggest and most important ones.

URL

By default, Zen Cart ships with an option to make URLs “search engine safe” (readable by the search engines). Basically it converts
/index.php?main_page=product_info&products_id=1234
into
/index.php/main_page/product_info/products_id/1234
(or something along those lines). Not bad, but nowadays this kind of transformation is no longer necessary.

What I did want was to turn the URL into something like /disfraces-y-complementos/disfraz/adulto/mujer/disfraz-abeja-adulto.html, so that the URL itself contains keywords. I managed it with a couple of modifications to the Zen Cart code base plus some transformations via mod_rewrite. It took me a while to get the method right, but for the moment it seems to be working reasonably well.
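
To give an idea of the mod_rewrite half, here is a minimal sketch (not my exact production rules; the rewrite_path parameter is an illustrative name, and the slug-to-product lookup happens afterwards in PHP):

RewriteEngine On
# If the request is not a real file, hand the keyword path to the store's
# front controller, which looks the slug up and maps it to a product id.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.+)\.html$ index.php?rewrite_path=$1 [L,QSA]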

The Title Tag

Zen Cart’s default value for the Title tag does not lend itself to good indexing by the search engines. I had to modify it in quite a few places so that it would always be unique (never repeated), since repeated values hurt usability.
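
The idea, roughly, was to build the title from the most specific information available on each page. A simplified sketch of the logic (not Zen Cart’s actual template code; the variable names are purely illustrative):

<?php
// Prefer the product name, then the category name, then the plain store
// name, so that no two pages end up sharing the same title.
if (!empty($product_name)) {
    $page_title = $product_name . ' - ' . STORE_NAME;
} elseif (!empty($category_name)) {
    $page_title = $category_name . ' - ' . STORE_NAME;
} else {
    $page_title = STORE_NAME;
}
echo '<title>' . htmlspecialchars($page_title) . '</title>';
?>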

Speed and Source Code Optimization

More and more, the speed of a web site determines its ranking within search results. I have tried to follow all of Yahoo!’s and Google’s recommendations for improving site speed. By my measurements, the site initially took 7 seconds to load (uncached, including network latency); after these changes, requests finish in under 1.5 seconds. Implementing the recommendations meant touching:

  • the overall design: sprites were implemented, and style sheets and .js files were combined
  • the product detail page (I reduced the amount of HTML the store generates)
  • the search results page (the links to subsequent pages were improved, the presentation of results was changed, the Title tag of result pages was changed to include the product names rather than just the page number, and a product index was created)
  • the design, and the flow, of the checkout process
  • “encoded” characters were removed (&aacute; became á)
  • the number of requests per page was cut from around 40 to fewer than 20
  • cookie-less domains and a Content Distribution Network (CDN) were put in place (see the sketch after this list)
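
For the static-asset side of that last point, the Apache part is short. A sketch (the real host names and cache lifetimes differ, and it assumes mod_expires and mod_headers are available):

# Let browsers cache static files aggressively, and make sure the host
# serving them never sets cookies.
<IfModule mod_expires.c>
    ExpiresActive On
    ExpiresByType image/jpeg "access plus 1 month"
    ExpiresByType image/png "access plus 1 month"
    ExpiresByType text/css "access plus 1 week"
    ExpiresByType application/javascript "access plus 1 week"
</IfModule>
Header unset Set-Cookie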

In the end, after all these changes, and bearing in mind that they only use a single payment method, I wonder whether I really saved anything by using Zen Cart (which itself took an investment of time to learn). What do you think?

I’ve been experimenting with a couple of tools for creating cross-platform web designs. I’m quite happy with the results (which will be used on production sites in the coming weeks). I’m no longer plagued by the woes of differing font sizes, incorrect positioning, CSS hacks, and the other things that make a web developer’s life a misery.

I am using the 960 grid system for managing layout, in combination with a blog post on how to get cross-browser compatibility every time: a simple list of DOs and DON’Ts to follow when writing the HTML and CSS for the first time.

The combination has been a major time-saver (and FOR ONCE I can have a multi-column design WITHOUT using tables)! I can hardly recommend these two links enough. The only remaining doubt I have is whether to use EMs or pixels for padding and margin sizes. My brief experimentation suggests avoiding setting such values altogether, but if they must be set, use pixels.
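
To illustrate what I mean, here is a sketch using 960.gs class names (the .inner wrapper is my own convention, not part of the grid system):

<div class="container_12">
  <div class="grid_8"><div class="inner">main content</div></div>
  <div class="grid_4"><div class="inner">sidebar</div></div>
</div>

/* Padding goes on an inner element, in pixels, so the grid columns keep
   the widths the grid system calculated for them. */
.inner { padding: 10px; }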

I’d love to hear about others’ experiences with these tools (and other, similar tools).

Several months ago I moved tedmasterweb.com to a subdirectory of clevernet.biz. I expected the PageRank of tedmasterweb.com (4) to flow upstream and increase the PageRank of clevernet.biz (3). For months I waited and waited, but nothing seemed to happen. In fact, I appeared to have completely lost the PageRank I had for tedmasterweb.com.

About a month ago I did several things to try and regain my lost PageRank. Specifically, I planned on asking people linking to the old site to update their links to point to the new location (clevernet.biz/tedmasterweb), but in many cases I was unable to find the authors of those links, or the links came from USENET or forum lists and they couldn’t be changed.

I’m not sure which of those things did the trick, but I seem to have regained my PageRank, or at least, clevernet.biz/tedmasterweb now has a PageRank of its own (it is now 4).

Over the weekend we moved clevernet.biz to a new hosting provider. I do not expect such a move to impact PageRank since all we are doing is changing the IP address. I do wonder, though, how it will affect searches originating from Spain since the new server is housed in Germany and previously we were hosting our web site ourselves right in the Canary Islands. As of this writing, a search for “salir primero en Google” has us at number 1. This holds true when searching from the Canary Islands or from Chicago. We’ll see how this changes over time…

What I find really interesting, though, is that my tedmasterweb subdirectory has a higher PageRank than my main web site (clevernet.biz). My theory that a higher ranked subdirectory will boost the main domain has been debunked. It will be interesting to see if that ever changes, especially when we publish the new version of my PHP BBEdit Clippings set (which gave me a PageRank of 5 on tedmasterweb.com some years ago).

Around 2001 I started a web site called tedmasterweb.com. It was my professional face on the web with “articles” I’d written and software I’d produced available for download. Some of the software was quite popular and garnered me a PageRank 5 for the site as recently as October 2006.

In the first half of 2007 I decided to try an experiment. Being a partner in Clevernet, it seemed logical to merge the resources on my personal site with those on clevernet.biz. Since merging the content could have been a time-consuming task, I decided to make my entire personal site a subdirectory of clevernet.biz: Basically, tedmasterweb.com would become clevernet.biz/tedmasterweb/.

Tedmasterweb.com had a PageRank of 5 prior to the move. Concerned that such a massive change could eliminate my PageRank, I followed Matt Cutts’s guidelines on moving a domain using a 301 redirect so that search engines (Google) would know that the content had moved, permanently, to a new location (and in this case, a new domain). I had also read other articles regarding 301 redirects and tried to follow their advice as well. I have since discovered a similar, but more concrete, 301 redirect experiment. The primary difference, though, between all of these experiments and suggestions and my own case is that I was not just moving content to a new domain, but also into a subdirectory, which may have been my downfall.
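
For reference, the redirect itself is the easy part. A sketch of the kind of rule involved (the general idea rather than my exact configuration), placed in the old domain’s .htaccess:

# Permanently redirect every path on tedmasterweb.com to the same path
# under the new subdirectory on clevernet.biz.
RewriteEngine On
RewriteCond %{HTTP_HOST} ^(www\.)?tedmasterweb\.com$ [NC]
RewriteRule ^(.*)$ http://www.clevernet.biz/tedmasterweb/$1 [R=301,L]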

Today, several months later, it would seem that my PageRank of 5 for tedmasterweb.com has been completely lost. Just issuing an appropriate redirect, in my case, was not enough to maintain it. I suspect that if I could get all the people linking to resources on the old page to update their links to point to the new location I would regain some, or all, of my PageRank, minus the penalty for not updating the content regularly and for the links being somewhat old. As soon as I post this article, I’m going to go after those old links. My guess is my PageRank will suddenly improve.

Bottom line: migrating a domain with a decent PageRank to a subdirectory of another domain is a bad idea, and in the end, inbound links really are the key to a high PageRank.

Roger Johansson over at 456 Berea Street, reflecting on a series of articles by John Allsopp regarding HTML semantics, asks the question: “Should there be another way of extending and improving the semantics of HTML without requiring the specification to be updated?”

Personally, I think the issue revolves around the misuse of HTML to mark up something other than research papers.

It is my understanding that HTML is an application of SGML, a markup language used to mark up documents such as research papers for mass reproduction on offset printers. As such, the vocabulary (the tags) in HTML reflects the type of data being marked up. Consequently, when HTML is used to mark up documents that are not academic in nature (are not research papers), authors are left cobbling together solutions to retain the semantic value, and that rarely works. For example, if you want to mark up a mathematical equation, you need the MathML specification precisely because HTML doesn’t have the vocabulary necessary for describing the content.

I find it a little ironic that Tim Berners-Lee has basically turned everyone into an academic in some sense, by enabling them to do massive research and post their findings. However, current technology limits us to “browsing” research papers, even though we’ve creatively found ways to publish much, much more than that.

I think the world is missing a browser that is able to render a variety of markup languages (vocabularies), including HTML, MathML, XHTML, XHTML2, XForms, SMIL, and others (although the last 2 are not technically markup languages). I can imagine a world in which marketers define their own markup specification for sharing data (a problem I think microformats are trying to solve) safely. In fact, markup languages can be defined for nearly any field. The problem is, we don’t have web browsers capable of rendering the data in the source documents in any meaningful fashion because no formatting information is associated with any of the elements of these foreign markup languages. In fact, I find it hard to imagine what a marketing database or recipe list would look like if not some kind of document.

So, in conclusion, I’m not sure if I’ve made my point, but basically I think any semantic improvements in HTML will come from focusing on the domain it was originally intended for (academia) rather than from trying to extend it to other domains that have little or nothing to do with writing research papers.

“Every serious book of nonfiction should have an index if it is to achieve its maximum usefulness. A good index records every pertinent statement made within the body of the text. The key word here is pertinent.” (18.1, p. 512, The Chicago Manual of Style 1982)

A web page is a document and a web site, a book. Books go through a process of editing: several drafts are written before the final draft, and several proofs may be produced before the book is sent to the press for mass reproduction. At one point in the process, an index is created to facilitate finding the information a reader is looking for.

In brief, the Indexer (the person creating the index) makes annotations on a proof of the book about what should be indexed and how. They even include a “header” section with data about the context of the document, or general document-wide indexing information (page, chapter, date of publication, etc.).

Here is a sample of how books are marked up for indexing (The Chicago Manual of Style 1982):

[Figure: a sample of manuscript markup for indexing]

The figures, lines, colons, and other annotations all have very specific meaning, but that is not what concerns us here. The point is some effort is made to objectively indicate what pieces of the text are considered important and how they relate to each other. This is a standard practice in manuscript and book publishing. If you want to maximize your web site’s usefulness, it should be a standard practice in web site publishing for any serious, nonfiction site.

What Is HTML?

The original intention of HTML was to allow scientists to share academic articles with some sort of search and retrieval capabilities to aid in finding articles that referenced their work, among other things. To accomplish this, web page documents are annotated in a fashion similar to the sample shown above. The annotations are known as “markup”, from the verb “to mark up”, and look similar to this: <title>Chocolate Chip Cookies in 10 Minutes</title>.

Hyper Text Markup Language is the term used to refer to this type of markup (“hyper text” = text that links to other text, a third dimension added to a linear experience). The goal of HTML is to aid the retrieval of manuscripts from immense information systems, but historically HTML has been (ab)used primarily to aid the presentation (formatting) of a document. HTML optimized for indexing by the search engines is hard to produce. Most “modern” web pages are either missing such markup or the markup is inaccurate, which may be worse. A common example of inaccurate markup is a web site that has the same title for every page.

Many HTML editors provide the ability to include some of this markup in your documents. Markup that is strictly for formatting may make indexing more difficult and should be implemented via other means, typically Cascading Style Sheets (CSS).

A good document index communicates the internal document structure to the search engines. Specifically, it communicates which parts are titles, subtitles, word emphases, citations, diagrams, tables, etc. This is where it gets difficult: How do you know which sections need to be identified, and how do you identify them? What markup do you use to tell the search engines “this is a main menu”? There are many guides on how to do this on the internet, and the answers are beyond the scope of this blog post. I will say, however, that technologies such as Flash, Java, and JavaScript cannot serve as markup and should be avoided if you hope to have your web site properly indexed by the search engines.
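
As a small illustration (a sketch, not a complete page), here is the earlier recipe fragment marked up so that a search engine can tell the title, the navigation, and an emphasized word apart:

<title>Chocolate Chip Cookies in 10 Minutes</title>
<!-- ... -->
<h1>Chocolate Chip Cookies in 10 Minutes</h1>
<ul id="main-menu">
  <li><a href="/recipes/">All recipes</a></li>
</ul>
<h2>Ingredients</h2>
<p>Butter, flour, sugar and <em>real</em> chocolate chips.</p>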

The Bottom Line

If you facilitate indexing by the search engines, there is a higher likelihood they will get it right and return your page as the first result for exactly the right query. This can be accomplished with few or no drawbacks for the design (if you have the right tools, some training, and the motivation). Furthermore, this is likely to make your potential clients very happy: the first result was exactly the answer they were looking for, and the design led them directly to that answer. This, to me, is the definition of a web page optimized for search engines: your page is the first result for specific queries, and the design makes the answer clear.

I would estimate 90% of all web sites would benefit from this kind of web site optimization. Although at that point, they would all be equally optimized rather than equally un-optimized and the benefit could be nil.

Bruce Clay has an excellent description of the search engine optimization process.

Bibliography:

The Chicago Manual of Style. Chicago: The University of Chicago Press, 1982.

The other day someone asked me whether we really do get clients over the Internet. The answer is a clear and resounding “yes”, which led to the next question: “And how do you get them?”

An Online Persona

Getting clients over the Internet does not end with ranking first in Google. You also need an online persona, so to speak. The visitors to your site, your potential clients, need some way of verifying that you are who you say you are. They need to check out that persona before they ever contact you, and there are several ways to build an online persona:

  • Create a personal web site
  • Keep a blog (you are reading one right now)
  • Take part in forums and mailing lists (LISTSERVs)
  • “Donate” content to other sites (as long as your name appears with it)
  • Get your details to appear on other people’s sites (as a reference, and provided they speak well of you, of course)
  • Publish articles that show what you know and which fields you specialise in

In my case, when someone wants to “investigate” me by putting my name into Google, plenty of results come up, and they can form an idea of who I am and whether it matches what I say on my personal site. That gives the potential client a certain amount of confidence, enough to get in touch with us and, depending on how the subsequent interaction goes, to hire us (or not).

I am not claiming that everyone who contacts us becomes a client, or that enquiries arrive every single day, but they do arrive, and often enough to keep us going. So when others talk about the dedication it takes to rank first in the search engines, I would go further and talk about the dedication it takes to build an online persona that people can trust.

By the way, I published this the other day, but since we had a small problem with the page I will take the opportunity to repeat it: we have published our web application development process which, even if you never work with us, might be of interest to you.

Taking a break from writing and posting my own articles, I’d like to direct your attention to a post about the real meaning of Search Engine Optimization (SEO) and a great example of the kind of smarmy people trying their best to steal your money. Would anybody ever really buy anything from the guy in the video? I almost feel sorry for him, almost. The bottom line is, SEO really just means good web design (something I’ve been trying to communicate ever since I arrived here).

People visiting your site can easily download or copy your copyrighted images without your permission using four different methods: hotlinking, right clicking, screen shots, and web archiving.

What follows is a description of each method including:

  • counter measures for inhibiting unauthorized copying
  • methods for circumventing the counter measures
  • potential incompatibilities / issues of implementing a counter measure
  • some of the system requirements for each counter measure.

Hotlinking

Description
Hotlinking is when someone sets the SRC attribute of an IMG tag to an image on your web site. Besides making it look like your images are on their web page, they also consume some of your allotted bandwidth. The following is an example of the HTML one would use to display an image from susansexton.com on their own web site.

<img src="http://www.susansexton.com/images/horse1.jpg" alt="" />

Counter Measures
Limit display of images to users who are visiting your site (your domain). This is accomplished by implementing two server-side technologies: Sessions and custom File Handler rules.

Sessions
When a user first visits your site, they are sent a cookie that uniquely identifies them. Before each image is sent to the browser, the system checks to see if the cookie has been set. If it has, the image is sent. If it hasn’t, no image is sent. This method requires users to view your images in the context of a web page on the first visit. From that point on, they can, theoretically, access the images directly by entering the image’s SRC attribute in their browser’s Location bar.
Custom File Handler Rules
This method can be modified so that images can only be viewed in the context of a web page. This is accomplished by telling the web server to handle image files using a custom script. The custom script looks for a unique key in the Session variable. If the key exists, the image is served. If not, it isn’t. In practice, we set the key at the start of the page; the server then serves the image and destroys the key immediately after the image has been sent. Since a missing key means no image, the images cannot be viewed except in the context of a web page. This is accomplished in Apache by adding the following two lines to .htaccess (for the directory you wish to protect):
Action blockhotlinking /hotlinkcheck.php
AddHandler blockhotlinking .jpg .jpeg

where hotlinkcheck.php is a PHP script that does what we’ve described.
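
A minimal sketch of what hotlinkcheck.php might contain (simplified, and assuming the session key is called image_key; the production script is more involved and handles more image types):

<?php
// Serve the requested image only if the one-time key set by the
// surrounding page exists in the visitor's session.
session_start();
if (empty($_SESSION['image_key'])) {
    header('HTTP/1.0 403 Forbidden');
    exit;
}
unset($_SESSION['image_key']);        // the key is single-use
$file = $_SERVER['PATH_TRANSLATED'];  // the .jpg Apache handed to this script
if (!is_file($file)) {
    header('HTTP/1.0 404 Not Found');
    exit;
}
header('Content-Type: image/jpeg');
header('Content-Length: ' . filesize($file));
readfile($file);
?>

On the gallery page itself, the key is simply set (for example, $_SESSION['image_key'] = true;) before any of the images are referenced.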

Any anti-hotlinking counter measure incorporating both of these techniques will be very successful.

Circumvention Techniques
This technique can be circumvented by hacking the server, which is not the kind of activity people seeking to copy pictures are likely to engage in. This technique may also be circumvented by Web Archiving programs, but in our experience, if the script is written correctly, even these programs can be prevented from siphoning the images.
Potential Incompatibilities/Issues
Such a system makes designing the pages more difficult because all images are “generated” on the fly and thus cannot be previewed in a standard WYSIWYG HTML design program (such as Dreamweaver or GoLive). The system can, however, be used with automated image gallery creation tools such as iView Media Pro. Users who don’t accept cookies will also experience problems. In fact, people with hotmail.com email accounts frequently experience problems, since by default Internet Explorer for Windows doesn’t allow framed documents from another domain to send cookies. The work-around is to include some JavaScript that breaks your site out of frames (keeps your site from being framed by someone else).

System Requirements

Server-side
Any server that can be configured to handle image files using a custom script, or with some sort of server-side scripting capabilities. This system does not require the GD library but could benefit from having it available.
Client-side
Cookies are used by default, but the system can be configured to use a transparent ID in cases where cookies are not allowed.

Right Clicking

Description
Users can right click on an image and choose to save it on their computer. This is the same as dragging the image to the desktop.
Counter Measures
There are four counter measures that can be used to inhibit right clicking or dragging and dropping of images: Spacer w/ Background Image, JavaScript Hide Source, Microsoft Meta Tag, and A Thousand Images. These counter measures can be used together or separately as they each address a different aspect of the right click feature.

Spacer w/Background Image
This is my personal favorite technique because it is widely supported, does not require any special JavaScript, fools most attempts at copying the image, can be used in a WYSIWYG HTML editor, and, when used in conjunction with the hotlinking counter measures, is nearly foolproof. Basically, each image is displayed as a transparent GIF whose size and background image are set to the image you want to display. The result is that the user sees the image as she normally would, but when she tries to right click to save the image, she ends up saving the clear GIF rather than the actual image.
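A sketch of the technique (the file names here are just for illustration):
<!-- The visitor sees horse1.jpg, but "Save Image As..." grabs the 1x1 clear.gif. -->
<img src="clear.gif" width="400" height="300" alt="" style="background: url(images/horse1.jpg) no-repeat;" />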
JavaScript Hide Source
Using JavaScript, the functionality of the right mouse button can be altered to prohibit downloading of images. It can also be used to hide the source code of a page so that the user cannot navigate directly to the image outside the context of the actual page. Because this is a very complicated topic, no examples of how to do this are provided here. Regardless, here are a couple of links in case you’re interested: hide source code and disable right click.
Microsoft Meta Tag
This doesn’t really disable right clicking, but it does disable the Microsoft “Image Bar” that appears when Internet Explorer (version 6) users hover their mouse over an image. Simply put this code in the HEAD element of your web pages:
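<meta http-equiv="imagetoolbar" content="no" />
(That is the standard tag Internet Explorer 6 recognizes for suppressing its Image Toolbar.)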
A Thousand Images
Adobe ImageReady (the web companion to Photoshop) allows you to slice images up into little pieces. You could slice each image up into individual pixels making it very time consuming to download each pixel and recreate the image. This may be more practical than it seems…
Circumvention Techniques
To view the source of a page that is using JavaScript to hide the source, you can either enter javascript:alert(document.body.innerHTML); into the location bar and press Enter, or visit the site with Mozilla and use the DOM Inspector to view the source code as a Document Object Model tree (and thus see the SRC attribute of the image you are after). To view an image that is being protected by an invisible GIF (the first method above), just look at the source and point your browser to the actual image you wish to download (it will appear in isolation in its own window). There is no way, that I’m aware of, to circumvent the capturing of an image that has been split into a thousand pieces, except by downloading the pieces one by one.
Potential Incompatibilities/Issues
Incompatible JavaScript versions: it is very, very difficult to write JavaScript code that works the same on all browsers and platforms. In all likelihood, any script you find or write yourself will fail in one environment or another. Incompatible CSS/browser: not all browsers know how to display background images, and if a browser fails to do so, the user won’t see your image.
System Requirements
In any situation where JavaScript is required, you must include <noscript> tags with some sort of content that explains why JavaScript is needed. Similarly, you may need to test whether or not Cascading Style Sheets are enabled before displaying your page, if you want to be sure that your visitors see what you intend.

Screen Shots

Description
The user presses PrtScr (Print Screen) on Windows, or Command+Shift+3 (or 4) on the Macintosh, to capture the computer display as a separate image. On Windows systems, the capture is placed on the clipboard and needs to be pasted into an application before it can be viewed. On the Macintosh, the capture is saved as a file on the user’s desktop, usually named Picture 1.pdf. There are a variety of system utilities that also provide this functionality, and it is worth noting that being able to take a picture of your screen and share it with someone else is genuinely valuable.
Counter Measures
Counter measures for this method are virtually impossible. This is because web pages have very limited ability to control, or even be aware of, when certain keys are pressed. You could, conceivably, swap the image out for a blank image whenever a key is pressed. Because this counter measure relies on JavaScript, you have to contend with everything that entails (browser compatibility, noscript fallback tags, etc.).
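A sketch of that idea (illustration only, assuming a blank.gif exists alongside the page; I would not rely on it):
document.onkeydown = function () {
    // Swap every image for a blank placeholder the moment any key goes down.
    var imgs = document.getElementsByTagName('img');
    for (var i = 0; i < imgs.length; i++) {
        imgs[i].src = 'blank.gif';
    }
};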
Circumvention Techniques
Even if you could disable all keys and mouse activity, the visitor could write a script that opens their browser, goes to your page, and executes a screen shot, all without touching any of the keys or the mouse. This seems extreme, but it hints at the limited capabilities of browsers to control what their owners do.
Potential Incompatibilities/Issues
JavaScript implementations across browsers and platforms.
System Requirements
In any situation where JavaScript is required, you must include <noscript> tags with some sort of content that explains why JavaScript is needed.

Web Archiving

Description
Users save the page as a Web Archive (a self-contained version of the entire page, including all images) or print the page (or save it as a PDF).
Counter Measures
Same as for screen shots, but even more limited: display the image in a modal window using a Java applet or Flash.
Circumvention Techniques
It depends on what kind of counter measure is used, but usually, taking a screen shot can circumvent them.
Potential Incompatibilities/Issues
Many people like to use Flash as the “player” for their images because it protects their images rather well (by disabling most of the methods listed on this page, except for screen shots). For the most part, Flash is widely supported (available on most computers). If a Java applet is used instead, however, or if visitors don’t have the Flash player installed, it can mean that they don’t see anything at all.
System Requirements
Depends on the method…