Monday, April 20, 2009

Downstream caching and pixel tagging

Copies of Web pages served to browsers are often 'captured' by content caching systems. 'Downstream' caching systems are typically operated by third parties such as the ISPs and other organisations through whose networks the pages travel on their route to users' computers. These caching systems are able to serve pages of which they hold copies in response to subsequent requests for them without reference to the origin server.

From an Internet-wide perspective, caching content downstream, close to the browsers, is a good thing: serving content to topologically nearby browsers is quicker and consumes fewer network resources than transmitting it from the origin servers. It also reduces the load on the origin servers.

To ensure that a website interoperates properly with downstream caches (for example, to avoid out-of-date pages being served to users), it is important that the content it serves carries appropriate cache control directives in its HTTP headers. Getting this right normally involves having your server administrator configure the web server software appropriately. Note that it is not appropriate to attempt to control downstream caches using HTML markup elements, because the special-purpose appliances typically used for caching act only upon the directives in the HTTP headers.
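
As a rough illustration of what such directives look like, the Python sketch below builds the response headers for a page that downstream caches may reuse for a limited time. The helper name freshness_headers and the one-hour lifetime are assumptions made for this example; in practice the equivalent settings would normally live in the web server's configuration.

```python
import time
from email.utils import formatdate


def freshness_headers(max_age_seconds, now=None):
    """Build headers telling caches how long a response may be reused.

    Hypothetical helper: the name and the choice of headers are
    assumptions for illustration, not a prescribed configuration.
    """
    if max_age_seconds < 0:
        raise ValueError("max_age_seconds must not be negative")
    if now is None:
        now = time.time()
    return {
        # Lifetime in seconds, understood by HTTP/1.1 caches.
        "Cache-Control": f"public, max-age={max_age_seconds}",
        # Absolute expiry time for older HTTP/1.0 caches.
        "Expires": formatdate(now + max_age_seconds, usegmt=True),
    }


# Example: allow caches to reuse a page for one hour.
for name, value in freshness_headers(3600).items():
    print(f"{name}: {value}")
```

Sending both headers is a common belt-and-braces choice: Cache-Control takes precedence where it is understood, and Expires covers caches that only implement the older mechanism.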

The growing deployment of downstream caches on the Internet has an important consequence for website traffic measurement. Pages served from a downstream cache typically leave no record in your traffic log, so standard origin web server logs tend to underestimate the number of your pages that users have actually viewed, and the shortfall grows as more caches are deployed.

The pixel tag approach

One way of achieving more accurate page view counts in origin web server logs is to ensure that every page contains a content element whose HTTP headers mark it as non-cacheable. This can be achieved by including a tiny transparent image, referred to as a pixel tag, in each HTML page. The pixel tag is typically served from a directory whose contents the web server has been configured to serve with HTTP headers marking them as non-cacheable.

In a pixel-tagging regime, the number of page impressions served (including those served from downstream caches) can be estimated by counting the number of pixel tags served. If more detailed information is required about which pages have been served, all or part of the page's own URL can be included as a query string on the end of the pixel tag's URL.
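
The counting step can be sketched in Python as follows. This is only an illustration: it assumes access log lines in the common log format, a pixel tag served from /nocache/trans.gif (the names used in the example later in this post), and a query string holding the page path. The function name count_pixel_hits and the sample log lines are invented for the example.

```python
import re
from collections import Counter
from urllib.parse import unquote, urlsplit

# Matches the request part of a common-log-format line, e.g. "GET /x HTTP/1.1".
REQUEST_RE = re.compile(r'"GET (\S+) HTTP/[\d.]+"')


def count_pixel_hits(log_lines, pixel_path="/nocache/trans.gif"):
    """Count pixel tag requests, grouped by the page named in the query string."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue
        parts = urlsplit(match.group(1))
        if parts.path == pixel_path:
            # Requests without a query string are grouped under one label.
            counts[unquote(parts.query) or "(untagged)"] += 1
    return counts


# Invented sample lines, for illustration only.
sample = [
    '192.0.2.1 - - [20/Apr/2009:10:00:00 +0000] "GET /nocache/trans.gif?insideoee/index.shtml HTTP/1.1" 200 43',
    '192.0.2.2 - - [20/Apr/2009:10:00:05 +0000] "GET /nocache/trans.gif?insideoee/index.shtml HTTP/1.1" 200 43',
    '192.0.2.2 - - [20/Apr/2009:10:00:09 +0000] "GET /index.html HTTP/1.1" 200 5120',
]

hits = count_pixel_hits(sample)
print("Estimated page impressions:", sum(hits.values()))
for page, count in hits.most_common():
    print(page, count)
```

The total across all keys estimates page impressions; the per-key counts show which pages were viewed when the query string is present.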

Examples of pixel tagging

A basic pixel tag could be generated by including the following image element in HTML pages (conventionally just before the closing </body> tag):
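
An element of this kind might look like the one printed by the short Python sketch below. The path /nocache/trans.gif follows the directory and file names used in this example; the width, height and alt attributes, and the helper name pixel_tag, are assumptions rather than a definitive form of the markup.

```python
from html import escape


def pixel_tag(src="/nocache/trans.gif"):
    """Return an HTML image element for a 1x1 transparent pixel tag.

    The attribute set is an assumption for illustration; an empty alt
    text keeps the image invisible to screen readers.
    """
    return f'<img src="{escape(src, quote=True)}" width="1" height="1" alt="">'


print(pixel_tag())
```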


In this example, the directory named 'nocache' resides at the root of the web server. The web server would be configured to include HTTP headers marking any files served out of the 'nocache' directory as non-cacheable. The file named 'trans.gif' would be a transparent GIF image, one pixel square.
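
The server-side half of this arrangement would normally be a few lines of web server configuration. Purely to make the behaviour concrete, here is a minimal sketch using Python's built-in http.server module, which adds non-cacheable headers to anything requested under /nocache/. The handler name, the helper headers_for and the exact header values are assumptions for illustration, not a recommended production setup.

```python
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

NOCACHE_PREFIX = "/nocache/"

# Header values chosen for illustration; they ask both HTTP/1.1 and
# older HTTP/1.0 caches not to store or reuse the response.
NOCACHE_HEADERS = {
    "Cache-Control": "no-store, no-cache, must-revalidate, max-age=0",
    "Pragma": "no-cache",
    "Expires": "0",
}


def headers_for(path):
    """Return the extra headers to send for a given request path."""
    if path.startswith(NOCACHE_PREFIX):
        return dict(NOCACHE_HEADERS)
    return {}


class PixelAwareHandler(SimpleHTTPRequestHandler):
    """Serves files from the current directory, marking /nocache/ as non-cacheable."""

    def end_headers(self):
        # self.path includes any query string, so a prefix test is enough.
        for name, value in headers_for(self.path).items():
            self.send_header(name, value)
        super().end_headers()


def serve(port=8000):
    """Run the sketch server until interrupted."""
    ThreadingHTTPServer(("", port), PixelAwareHandler).serve_forever()
```

Calling serve() from a directory that contains nocache/trans.gif would then answer requests for the pixel tag with the non-cacheable headers, while other files are served without them.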

If it is required to track the actual pages visited by users, all or part of each page's URL can be appended to the pixel tag's URL as a query string. In this case, the pixel tag in, for example, the file at:

http://www.e-envoy.gov.uk/insideoee/index.shtml, would be:
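
A tag of this kind could be built as in the Python sketch below, which appends the page's path to the pixel tag's URL as a query string. The exact form of the query string (the path without its leading slash), the image attributes and the helper name tracked_pixel_tag are assumptions for illustration.

```python
from html import escape
from urllib.parse import quote, urlsplit


def tracked_pixel_tag(page_url, pixel_src="/nocache/trans.gif"):
    """Return a pixel tag whose URL carries the page's path as a query string."""
    # Keep only the path of the page URL and drop the leading slash.
    page_path = urlsplit(page_url).path.lstrip("/")
    # Percent-encode anything unsafe in a query string, keeping "/" readable.
    src = f"{pixel_src}?{quote(page_path, safe='/')}"
    return f'<img src="{escape(src, quote=True)}" width="1" height="1" alt="">'


print(tracked_pixel_tag("http://www.e-envoy.gov.uk/insideoee/index.shtml"))
```

Because each page requests a distinct pixel tag URL, the origin server's log records which page triggered each request.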
