Although I’ve been using XSLT on a variety of projects for nearly 10 years now, I’m still stuck using XSLT 1.0-only processors and frequently turn to “keys” to solve complex grouping problems. A few weeks ago I was presented with a grouping problem that put my knowledge to the test and forced me to fully understand how keys work and how best to set them up. Thanks to the fabulous people on the XSL mailing list, I was given lots of valuable feedback and pointed in the right direction. I feel it’s only fair to share what I learned: the USE attribute holds the XPath expression that selects the node whose data you want to use as the key (the “grouper,” if you will), and that XPath is evaluated relative to the ELEMENT named in the MATCH attribute.
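For reference, a minimal key looks like this (the element and attribute names here are made up for illustration):

<xsl:key name="items-by-type" match="item" use="@type"/>

Every item element is indexed by the string value of its type attribute, and a group can then be retrieved with key('items-by-type', 'some-value').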

Normally it’s as easy as specifying an attribute to use as the key, but let’s consider an example in which all the elements are the same and the only thing that uniquely identifies them is their location relative to one another. Consider a table with the following structure (TD elements with nothing below them have ROWSPANs set):

td td td td td td
td td td td td td
      td td td td
      td td td td
      td td td td
   td td td td td
      td td td td
      td td td td
   td td td td td
      td td td td
      td td td td
      td td td td
td td td td td td
      td td td td
      td td td td
   td td td td td
      td td td td
      td td td td
      td td td td
      td td td td
   td td td td td
td td td td td td
      td td td td
   td td td td td
      td td td td
      td td td td
   td td td td td
      td td td td
      td td td td
      td td td td
      td td td td
      td td td td

In a KEY element, you identify the source element you wish to capture (TR in this case) in the MATCH attribute. In the USE attribute you specify an XPath expression indicating the node the matched element will belong to (be grouped by). This is fairly straightforward when the input is well defined, but in a situation like ours, where structure is the only thing we can key on, and the structure itself is somewhat amorphous, it can be quite difficult to write the proper XPath.

Let’s say the first row is the header row. It should not be included in the output, as it simply contains labels for the columns. Rather than accounting for it in the key element, we’ll skip it when applying templates (select="//tr[position() > 1]").
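In the stylesheet that might look something like this (a sketch only; I’m assuming the rows are reached from a template on the table element):

<xsl:template match="table">
	<xsl:apply-templates select="//tr[position() &gt; 1]"/>
</xsl:template>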

The key (or group name) for the matched elements will come from rows containing 6 or more TD elements; specifically, the data will come from the first TD element in those rows. The XPath would be something like this: ancestor-or-self::tr[count(td) >= 6]/td[1]. Unfortunately, this expression only produces keys for rows in which there are 6 or more TD elements; rows with 5 or fewer TD elements are left out of the result set. For rows with 5 or fewer TD elements we will need to look up in document order and stop at the first row above them containing 6 or more TD elements. This is where it gets complicated… It could be solved with some sort of IF-THEN-ELSE construct, but since we’re using XSLT, that’s not the best approach.
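Expressed as a key element (using the same name and match as the final version below), that first attempt would be:

<xsl:key 
	name="ports-by-ship" 
	match="tr" 
	use="ancestor-or-self::tr[count(td) &gt;= 6]/td[1]" 
/>

It works for the 6-column rows but produces an empty node-set, and therefore no key at all, for everything else.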

Instead, we’re going to capture ALL the potential keys above the current row and filter out the ones we don’t need.

To “look up” we use preceding-sibling: ancestor-or-self::tr/preceding-sibling::tr[count(td) >= 6]. This gives us ALL the rows with 6 or more TD elements preceding the currently matched row. However, we only want the row [with 6 or more TD elements] that immediately precedes the currently matched row, not all of the preceding siblings. Thus we append [position() = count(.)], which gives us the last item in the set in document order. (Since count(.) is always 1, this predicate is effectively [1]; and because preceding-sibling is a reverse axis, position 1 is the nearest preceding row. That is also why last() doesn’t work here: on a reverse axis, last() selects the row farthest away, i.e., the first qualifying row in the document.) We then take the first TD element in that row: /td[1].

Finally we filter out the nodes we don’t need. We do this by joining the two expressions via the pipe (union) character | and enclosing the whole thing in parentheses; from that combined node-set, which is in document order, we take the very last node: [last()]. That node is the first TD of the nearest qualifying row at or above the matched row, which is exactly the key we are looking for. Here is the final key element:

 

<xsl:key 
	name="ports-by-ship" 
	match="tr" 
	use="(ancestor-or-self::tr[count(td) &gt;= 6]/td[1] 
		| ancestor-or-self::tr/preceding-sibling::tr[count(td) &gt;= 6][position() = count(.)]/td[1])[last()]" 
/>
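For completeness, here is a rough sketch of how a key like this might be consumed in XSLT 1.0. The actual input and output files are linked below, so the output element names here are invented:

<!-- each row with 6 or more cells starts a group -->
<xsl:template match="tr[count(td) &gt;= 6]">
	<group label="{normalize-space(td[1])}">
		<!-- pull in every row whose key matches this row's first cell -->
		<xsl:copy-of select="key('ports-by-ship', td[1])"/>
	</group>
</xsl:template>

<!-- shorter rows are already copied into their group above -->
<xsl:template match="tr"/>

Combined with the apply-templates call that skips the header row, each 6-column row emits one group containing itself and the rows that belong to it.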

 

Because it’s hard to see the result of such a complex XPath, I first run the transformation using a template that matches TR elements and copies the result of the expression (via xsl:copy-of) to the output. That way I can see what my XPath is actually producing. Once I’ve got the set I’m looking for, I move the expression into a key element.
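Something along these lines does the trick (the debug wrapper element is just there to make the output easy to scan):

<!-- temporary template: dump whatever the use expression selects for each row -->
<xsl:template match="tr">
	<debug>
		<xsl:copy-of select="(ancestor-or-self::tr[count(td) &gt;= 6]/td[1] 
			| ancestor-or-self::tr/preceding-sibling::tr[count(td) &gt;= 6][position() = count(.)]/td[1])[last()]"/>
	</debug>
</xsl:template>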

They say you don’t really know something until you can explain it to someone else. I’m not sure if I’ve succeeded in explaining it or not, but I feel like I’m much, much closer than I was when I was presented with this problem a few weeks ago.

My XML input.

My XML output.

My XSLT.

Carolan

Highlights

  • More than 10,000 items online
  • Highly SEO Optimized
    • SEO Friendly URLs
    • Main content delivered first in document order, navigation last
    • Extensive use of CSS sprites
    • All HTML, JavaScript, and CSS files minified and gzipped
  • Fully automated integration with backend database, including roll-back capabilities
  • Fully automated connection between products and their images
  • Fully automated watermarking of images
  • Custom store administration interface for new items

Description

Carolanfiestas.com is a “bricks to clicks” solution for Carolan, a party supply store in the Canary Islands. With more than 40,000 items in their inventory, Carolan needed an online store to expand their presence to other parts of Spain. Furthermore, they wanted to improve their ranking in the search engines and sell, sell, sell!

Technical

Carolanfiestas.com was a Zen Cart store with a custom theme and a few minor tweaks to improve search engine optimization. On the back end, the server would receive updates to articles in XML and update the Zen Cart database every few minutes with the changes. Likewise, whenever an item was sold, an XML file was sent to the backend database to keep inventory accurate. This site made extensive use of Bash scripts to move the data from XML to MySQL and to manage the creation of images with their watermarks (a minimum of 4 sizes per image). During the busy season, the site received more than 10,000 visits per day. You can read more about this project here (in Spanish).

Since the summer of 2009 I have been developing the online store for Carolan, a costume and party supply shop based in Las Palmas de Gran Canaria. I have long since lost count of the total hours I have invested in this project, because it has turned into an obsession with following best practices.

Conventional wisdom says that when you want to build an online store, you grab the fashionable open-source package and simply set it up. The alternative (developing a custom store from scratch) would surely have cost me twice as much, right? That is what this article is about…

Carolan sells party supplies and single-use products. They carry a wide variety of items such as costumes, cups, napkins, garlands, decorations and much more; in fact, they have more than 20,000 items catalogued. When I started working with the Carolan team, one of their main wishes was to create a “live” connection between their database and the online store in order to minimize costs and increase efficiency. They also wanted the site to be easy to use (with items easy to find) and dynamic, with a home page that changed with the season, and they wanted the store to rank well in search engine results.

After analyzing their wishes and needs, I decided the best option was to adopt an open-source store. I thought it would save me a ton of time, but now I’m not so sure, and I would like to know what you think. Below I summarize the modifications that have had to be made since I began installing the store. Keep in mind that I am not detailing ALL of the modifications, only the biggest and most important ones.

URL

By default, Zen Cart comes with an option that lets you make URLs “search engine safe” (readable by search engines). Basically it converts
/index.php?main-page=product-info&products-id=1234
into
/index.php/main-page/product-info/products-id/1234
(or something along those lines). Not bad, but nowadays this kind of transformation is not necessary.

What I really wanted was to turn the URL into something like /disfraces-y-complementos/disfraz/adulto/mujer/disfraz-abeja-adulto.html so the URL would contain keywords. I achieved this with a couple of modifications to the Zen Cart code base and some rewriting via mod_rewrite. Perfecting the method took some effort, but for now it seems to be working more or less well.

Title Tag

The default value of Zen Cart’s Title tag does not lend itself to good indexing by the search engines. I had to modify it in quite a few places so that it would always be unique (never repeated), since repeated values hurt usability.

Speed and Source Code Optimization

More and more, a site’s speed determines its ranking in search results. I have tried to follow all of the suggestions from Yahoo! and Google about improving site speed. By my measurements, the site initially loaded in 7 seconds (uncached, and including network latency); after these changes, requests finish in under 1.5 seconds. To implement these suggestions I had to touch:

  • the overall design: sprites were implemented, and stylesheets and .js files were combined
  • the product detail page (I minimized the amount of HTML generated by the store)
  • the search pages (the links to subsequent pages were redesigned, the presentation of results was changed, the Title tag of result pages was modified to include product names rather than just the page number, and a product index was created)
  • the “checkout” design and process were reworked
  • “encoded” character entities were eliminated (&aacute; became á)
  • the number of requests per page was reduced from around 40 to fewer than 20
  • cookie-less domains and a Content Distribution Network (CDN) were put to use

In the end, after all these changes, and bearing in mind that they only use a single payment method, I wonder whether I really saved anything by using Zen Cart (which itself required an investment of time to learn). What do you think?

I recently implemented a newsletter subscription form in ASP.NET (2.0) for the CGIAR Secretariat on behalf of CGNET. This is the second project I’ve done in ASP.NET for CGIAR. I’ve never considered myself a skilled ASP developer and, like many, picked up my ASP skills from code I’d seen on the intertubes and via transfer from related languages. In other words, prior to this project, I was a somewhat capable ASP spaghetti coder.

Tired of producing mediocre code and eager to learn what this whole .Net thing was all about, I decided to invest some time learning how to write better ASP and take advantage of as many features of .Net as I could. Armed with two really good books on the subject (Beginning ASP.NET 3.5 in C# and VB and Programming ASP.NET 3.5, 4th Edition), I learned a lot about the .Net revolution and in the end I significantly improved the quality of my code.

.Net borrows heavily from other MVC-like frameworks. I was surprised by the number of similarities between the ASP.NET-way of doing things and the Fusebox-way of doing things. The rest of this post examines some of these similarities and other aspects of working with ASP.NET. This is mostly an examination of ASP.NET from a (PHP) Fusebox developer’s point of view.

The Project

The CGIAR Secretariat is responsible for www.cgiar.org. The CGIAR Newsroom is one of the primary sections of their web site. It includes an aggregate RSS news feed of news items coming out of many of the CGIAR centers. Since many people are still unaware of the advantages of RSS, the CGIAR Secretariat asked if we could set up a system that would allow people to subscribe to the feed via email. Specifically, the system we set up allows people to subscribe and/or unsubscribe via cgiar.org, which then automatically sends periodic emails of recently added news items (as they’re added, of course).

The Bottom Line

As an “experienced” web application developer I very much appreciated the ASP.NET-way of doing things. There was nothing in this project that ASP.NET wasn’t able to handle elegantly and more or less efficiently. The project consisted of implementing the following features:

  • A public subscribe and unsubscribe form with CAPTCHA
  • A nightly script that produces a notification email, with alternate views (plain text and html), consisting of news items that had not been sent in prior emails (which implies keeping track of what’s been sent and what hasn’t)
  • A password protected administration interface

The entire project was completed in about 80 hours (including an initial version of the administration interface which was later tabled).

If given the choice of doing the same project in PHP Fusebox, assuming I had the same knowledge and experience with PHP that I had with ASP when I started, would I have chosen PHP Fusebox over ASP.NET? Maybe…

Master Pages

One of the goals of any application framework is to maximize code reuse (and, conversely, minimize code duplication). Functions (methods) are one example of how this is accomplished, but when it comes to the presentation layer, developers often find they need a more powerful programming model. Both ASP.NET and Fusebox (and many other web application frameworks) provide tools that offer a complete solution at this level. In ASP.NET, a Master Page is a template for all the pages of a site, although its functionality goes beyond a simple templating engine: Master Pages also let you define “behaviors” common to all pages, somewhat like the Fusebox 3 fbx_Settings file or the Fusebox 4 fusebox.init file.

A powerful templating engine will frequently go beyond “one layer” and allow the developer to subdivide sections of the Master to be handled by other parts of the application. ASP.NET offers this functionality directly and at least one project I’ve worked on in Fusebox had the same functionality. I found a few opportunities to use this feature on this project.

Fusebox does not offer a templating engine out of the box, but you can easily create much of this capability in Fusebox 4 (and to some degree in Fusebox 3) using Content Variables and a “layout” circuit. Most projects I work on offer some such circuit.

In the end I got a lot of mileage out of Master Pages for this project and hope to be able to use them in future projects for CGIAR.

Postback

By default, every ASP.NET page contains a form that executes on the server. The value of the “action” attribute for the form is the current file. Microsoft has termed this approach “postback” because you post the form back to the same document that created it. In some respects this is similar to the Fusebox implementation of the Front Controller design pattern, where every request is for the same server-side script (e.g.: /index.php) followed by a Query String containing directions on what files, functions, and procedures to execute.

ASP.NET offers the developer server-side elements known as panels. Panels are “controls” (elements?) that contain other controls. By setting the visible property of a panel you can control whether or not its contents are visible on the page. For the CGIAR project mentioned above, I used this technique to display either the subscription form or the “thank you” message following a successful subscription. I suppose that, if each ASP.NET page is the equivalent of a Fusebox circuit, each panel could be the equivalent of a Fuseaction. You would simply set the display for all of them to false (except the default fuseaction 😉 ) and then display them as needed. Coming from Fusebox, I found the concept very easy to grasp.

Code-behind

Microsoft made an attempt to separate application logic from presentation with .Net. In my opinion, they succeeded.

In order to minimize the amount of raw code found in HTML, .Net provides something known as “code-behind” pages, which are essentially includes with the same name as the file they are attached to. The idea is that your application code goes in the code-behind page and if you need to modify the presentation (the HTML) from within the application code, you do so by referencing elements in your HTML page via their ID attribute (this is an oversimplification but summarizes the approach).

Fusebox, on the other hand, tries to separate code by prefixing file names with one of dsp_, qry_, act_ (and sometimes lyt_).

  • qry_ files contain datasource queries and (usually) return some sort of object or array echoed to the browser by a dsp_ file.
  • act_ files are for those instances in which you need to process data prior to executing a query or echoing to the browser.
  • lyt_ files are, in essence, the same as Master Pages in ASP.NET.

In ASP.NET the “HTML” files contain a LOT of .Net namespaced elements rather than blocks of inline code, which means the files can be completely valid XHTML (with the single, notable exception of the Processing Instructions found at the top of each page). The benefit is that these files are suddenly very portable and can be consumed by any system capable of reading XML.

If you wanted to reproduce this ASP.NET functionality in Fusebox, you would need to write a plug-in that parses the dsp_ files looking for <fbx: elements and responds accordingly. You could put all of your code in act_ files which would essentially turn them into code-behind files. Now there’s a potential open source time sucker!

Data Controls

As one would expect, the data controls are very complete, but figuring out how to do something like nesting GridViews was not obvious, and were it not for a Nested GridView walk-through article on MSDN, I never would have figured out how to do it. Furthermore, I’m not convinced executing SQL on EVERY ROW of a record set is really a good idea (the authors of the walk-through admit this is not the best approach, but only as it concerns caching…).

Much of the CGIAR Newsroom revolves around their RSS feed (which is a compilation of feeds from all of the CGIAR Centers). ASP.NET 2.0 and above include controls for using XML as a data source and thus facilitating the display of XML data in a web page.

Unfortunately, in version 2.0 of .Net (and possibly higher), using XML as a data source only allows you to display the data. It does not allow you to use the built-in INSERT, UPDATE, and DELETE features of the GridView control. There are work-arounds for implementing this missing functionality, but I have to wonder if you save any time hacking in the functionality vs. building the entire administration interface the “old” way (which can be done pretty quickly using XSLT). By my estimates, it’s a draw, at best.

ASP.NET Advantages

For my needs, web controls, validation in particular, significantly reduce web application development time. I simply cannot express how much I like the validation controls and WISH PHP had something similar!

IntelliSense greatly speeds up coding. Microsoft offers several free (web) application development tools that work quite well and are more than adequate for the projects I usually work on, and many of them include IntelliSense.

I don’t think I could have completed this project as quickly without IntelliSense.

ASP.NET Ambiguities and Disadvantages

Since no language is free of sin, here is my list of things that got the best of me while working on this project:

  • web.config cannot be part of the code repository since it is machine-dependent. If application configuration options are so different between deployment environments, then maybe the author of the application should consider using a different development environment. I would prefer to have the application configuration right in the application, tucked into code that “sniffs” the environment and configures accordingly. That makes for MUCH more portable code.
  • .Net developers frequently publish their source code, but it usually needs to be compiled, so unless you’re into that (or have the time to learn how to do it, and do it right), you won’t find the same kind of huge Open Source community of code that exists for PHP.
  • ASP.NET 2.0 includes some Authentication and Authorization controls, but as with most things of this sort, you have to do things the ASP.NET-way or you can’t use these controls. In other words, there is no (apparent) way to retrofit these controls onto existing authentication mechanisms. In the end this is probably a good thing, since most existing authentication methods are not very secure, but in the real world most clients simply are not willing to put money into changing existing systems unless you can clearly demonstrate they are broken.
  • One of the main complaints about Fusebox 4 was its XML files. Rather than being used simply for configuration, you could easily add business logic to them. More than one programmer has asked herself: “Why bother with XML to represent classes? Why not just use classes directly?” I must say, when programming in ASP.NET, I often feel like I’m simply setting application configuration parameters, which, for anything but the most basic interactions, makes programming harder (and possibly more time-consuming) rather than easier (and faster), since you have to have a clear understanding of what state the application is in at the exact point where your code appears. This can be harder than it seems.

In the end, if I had to do it all over again (and had the choice), I would probably stick with PHP Fusebox, but I’m grateful I had the opportunity to improve my knowledge of ASP.NET and I wish the CGIAR Secretariat the best with their new system.

Additional Reading and Links

Convert ASP.NET applications into real MVC frameworks

Fusebox Basics

Comparison of Web Application Frameworks

CGIAR

Home page of the CGIAR Newsroom. Please note that the page did NOT look like this when I worked on it. This is a NEW design that I had nothing to do with.

Highlights

  • Centralized news for all of CGIAR
  • Subscribe by Email Interface

Description

I created an RSS aggregator with an option for receiving updates via email. The CGIAR, part of the World Bank, wanted to centralize their communications through the CGIAR Newsroom and web site. When the project started, the various departments of the CGIAR all had their own web sites, designs, CMSs, and ideas about how to get the word out about what they were doing. In 2008 I worked with the CGIAR Newsroom staff to consolidate the RSS feeds from all of their centers (where available) into a single feed on their main Newsroom web page. For centers that did not have their own web site, we created an RSS feed from the Newsroom’s own CMS and encouraged them to use it. In the end, users could subscribe to a single feed for all the CGIAR news.

Technical

The entire solution was done in ASP.NET using Visual Basic, with XSLT to transform the XML news feeds into HTML. Subscribing to receive alerts whenever a new article appeared was handled by a web form that included a CAPTCHA test to minimize subscriptions by robots.
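As an illustration only (the actual stylesheet is not reproduced here), the XSLT for turning RSS items into an HTML list looks roughly like this:

<xsl:template match="/rss/channel">
	<ul>
		<xsl:for-each select="item">
			<li>
				<a href="{link}"><xsl:value-of select="title"/></a>
			</li>
		</xsl:for-each>
	</ul>
</xsl:template>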

Hinshaw & Culbertson LLP Intranet Home Page

Highlights

  • Role-based, Page-level Security
  • Integrated Windows Authentication
  • WYSIWYG Content Management
  • Modular Application Framework (Fusebox)
  • Firm Directory with Photos
  • Firm Calendar, News, and Links
  • Security Aware Recently Modified Block
  • Security Aware Search Engine with Support for Phrase Searching and “Best Match” Results Ranking

Description

This is the first version of the firm’s intranet. The Managers and Directors provided the required input on the type of content they wanted on the intranet and how they hoped to be able to update it. Self-publishing was a requirement, as was role-based security. The system includes a variety of content management features, such as custom display order of section contents, recategorizing entire sections (and their children), attaching documents to a given section, limiting access to a section to members of administrator-defined groups, and much more. The system was first released in the summer of 2002 and remains in active use.

 

Technical Info

The system was built in PHP with the Fusebox application framework (http://www.fusebox.org/). The database is SQL Server. Access to the database server is provided by the ADODB Database Abstraction Library (http://php.weblogs.com/ADODB/). The WYSIWYG editing functionality is provided by Interactive Tools (http://www.htmlarea.com/). The system uses the Htdig search engine (http://www.htdig.org/). The system is served on Windows 2000, IIS 5 and all pages are valid and compliant XHTML 1.0 Transitional.