Sunday, April 19, 2009

Evaluation Website - Not the full picture

You should be aware that there are limitations to the information that can be discovered from the analysis of Web server log files. The principal issues are:
  • Most ISPs use dynamic IP addressing. This means they maintain a pool of IP addresses from which an IP number is ‘loaned out’ to each dial-up call for the duration of the call. A particular IP number will therefore be used by many different users and a particular user may appear at your website with many different IP numbers. The firewalls used at the interface between the Internet and corporate networks typically use a process named Network Address Translation (NAT) which has a similar effect. Firewalls also often use a process named Port Address Translation (PAT). With PAT, many users behind the firewall ‘share’ a single Internet IP number. The result of all this is that a specific IP number only rarely corresponds to a specific user and it is inappropriate to attempt to base estimates of the number of visitors to your website on a count of the different IP numbers found in server log files alone.
  • Caches - almost all ISPs and many corporate users deploy 'perimeter caches' to conserve their Internet connection bandwidth and improve the speed with which web pages can be served to their users. These are often set up to work ‘transparently’ regardless of whether users have configured their browser’s cache settings. Perimeter caches work by storing a copy of pages fetched by the client systems on whose behalf they are deployed. Subsequent requests for pages from other users behind a cache will be served from the cache if it already has a copy of the page. This may be done without any further reference to the origin server. Therefore web pages may be served to users without any record being created in the origin server's log file.
  • Dynamic proxies - dynamic IP addressing and perimeter caching make the identification of page requests from specific users uncertain. This uncertainty is further compounded by the fact that some organisations assign proxy devices such as perimeter caches dynamically during the course of a user’s Internet session. The result is that a sequence of page requests that is in fact from a single user may appear to come from several users even during the course of a single visit or session. AOL is an example of an organisation that uses dynamic proxying.
  • Cookie manipulation - users can delete, or otherwise manipulate cookies stored by their browsers. Browsers can convert persistent cookies to session cookies. Cookies cannot therefore be relied upon as the basis for accurately measuring the number of users of a website or for identifying users that revisit a website.
  • Browsers - some browsers are known to incorrectly identify the referring URL by indicating the previous page that the client was viewing even if the user recalled a bookmarked URL or typed a URL into their browser, as opposed to following a link on the displayed page.
  • Anonymisers - some clients use 'anonymisers' which deliberately send false browser and referrer data.
All of these issues mean that there have to be reservations concerning the reliability of estimates derived from standard web server logs of the number of users of a website or of their browsing behaviour when they visit a website. The Internet advertising industry develops and promotes standard website traffic metrics and methodologies for calculating them. It is recognised that the measurements are flawed for the reasons outlined above; however, it is believed that the metrics provide a basis for comparing one website's usage with another, on the assumption that these issues will affect all websites to broadly the same extent. There is, however, no sound basis for this belief.
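The effect of dynamic addressing and PAT on a simple count of IP numbers can be illustrated with a small sketch. The user names and addresses below are entirely hypothetical (the addresses come from ranges reserved for documentation); a real log gives you only the right-hand column, never the left.

```python
# Hypothetical ground truth: (real user, IP number seen by the web server).
# In a real server log only the IP number is visible.
observed = [
    ("alice", "203.0.113.10"),  # first dial-up call
    ("alice", "203.0.113.57"),  # second call, different address from the ISP's pool
    ("bob", "198.51.100.1"),    # behind a corporate firewall using PAT
    ("carol", "198.51.100.1"),  # same firewall, same shared address
    ("dave", "198.51.100.1"),   # same firewall again
]

real_users = len({user for user, _ in observed})
distinct_ips = len({ip for _, ip in observed})

print("real users:  ", real_users)
print("distinct IPs:", distinct_ips)
```

Here one user is counted twice and three users are counted once, so the two figures disagree, and nothing in the log itself indicates by how much or in which direction.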

The Joint Industry Committee for Web Standards in the UK and Ireland

JICWEBS is the body created by the UK and Ireland media industry whose aim is to ensure independent development and ownership of standards for measuring use and effectiveness of advertising on electronic media.

The International Federation of Audit Bureaux of Circulations

The IFABC Web Standards Committee promotes similar aims on a worldwide basis.
  • www.jicwebs.org
  • www.ifabc.org
User agent masquerading

The term 'user agent masquerading' refers to browsers that transmit an incorrect browser identification string in the requests that they send to servers. Some browsers just do not properly identify themselves and are therefore not being identified in server log file records. Deliberate masquerading is also used for a number of reasons:
  • Some websites alter the content they serve based on the browser identification string, so masquerading can be used to work around this.
  • Some websites reject requests from browsers that they are not intended to work with, so masquerading can be used to work around this.
  • Some users simply wish to remain as anonymous as possible.

Wednesday, April 15, 2009

Evaluation and website metrics - Advanced techniques

Log files can be further analysed through advanced techniques. For example:
  • Sessions and visits - the identification of sequences of page requests from individual users.
  • Session and visit duration - the measurement of the length of time that individual users spend viewing a website.
  • Categorisation - a process whereby similar items, eg URLs, browsers, platforms, a specific directory, are grouped together for pattern matching.
  • Aggregation - a process by which all combinations of entities and their resulting measurements are combined.
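The identification of visits and their duration can be sketched as follows. This is a minimal illustration, not a standard method: it assumes that requests have already been parsed into (client, timestamp, URL) tuples, that a client is identified by its IP number (which is unreliable in practice), and that a gap of more than 30 minutes starts a new visit. The timeout value, the second client address and the `contents.htm` URL are assumptions made for the example.

```python
from datetime import datetime, timedelta

# Assumption: a gap of more than 30 minutes between requests starts a new visit.
VISIT_TIMEOUT = timedelta(minutes=30)

def split_into_visits(requests, timeout=VISIT_TIMEOUT):
    """Group (client, timestamp, url) tuples into visits, one client at a time.

    Returns a list of dicts holding the client, the URLs requested in order,
    and the visit duration (first request to last request).
    """
    visits = []
    open_visits = {}  # client -> the visit currently in progress for that client
    for client, when, url in sorted(requests, key=lambda r: r[1]):
        visit = open_visits.get(client)
        if visit is None or when - visit["last"] > timeout:
            visit = {"client": client, "start": when, "last": when, "urls": []}
            open_visits[client] = visit
            visits.append(visit)
        visit["urls"].append(url)
        visit["last"] = when
    for visit in visits:
        # Time spent on the final page of a visit cannot be seen in the log.
        visit["duration"] = visit["last"] - visit["start"]
    return visits

requests = [
    ("193.63.182.194", datetime(2001, 3, 3, 11, 30, 35), "/webguidelines/index.htm"),
    ("193.63.182.194", datetime(2001, 3, 3, 11, 32, 10), "/webguidelines/contents.htm"),
    ("198.51.100.1", datetime(2001, 3, 3, 11, 33, 0), "/index.htm"),
    ("193.63.182.194", datetime(2001, 3, 3, 14, 5, 0), "/webguidelines/index.htm"),
]
visits = split_into_visits(requests)
for visit in visits:
    print(visit["client"], len(visit["urls"]), "pages", visit["duration"])
```

Note that a visit consisting of a single request has a measured duration of zero, which is one reason why duration figures derived from logs tend to understate real viewing time.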
Other website server software may also keep logs that can provide useful insights into the way visitors use your website. For example, it may be possible to configure search facility software to record the search terms that visitors have used when they are attempting to find information on your website. This information can be useful when considering whether there are areas of the site that are not easy to find and can help with organising navigation. It may also indicate what other information users are expecting to be on the website, which would be of use when considering whether additional content should be included on your website.
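As a rough sketch of how recorded search terms might be summarised, the example below counts queries after normalising case and spacing. The list of queries is invented for illustration; how your search software stores its log, and whether it records terms at all, will vary.

```python
from collections import Counter

# Hypothetical search log: one raw query per entry, as typed by visitors.
queries = [
    "Passport renewal",
    "passport  renewal",
    "council tax",
    "opening hours",
    "Council Tax ",
]

def normalise(query):
    """Lower-case the query and collapse runs of whitespace."""
    return " ".join(query.lower().split())

counts = Counter(normalise(q) for q in queries)
for term, count in counts.most_common():
    print(count, term)
```

Terms that appear often in the search log but have no obvious route from the home page are candidates for clearer navigation or for new content.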

Tuesday, April 14, 2009

Using a server log file

A standard HTTP server log entry may look like this:

193.63.182.194 [03/March/2001:11:30:35]
‘GET /webguidelines/index.htm HTTP/1.0’ 200 35000

What this means:
  • 193.63.182.194 is in principle the IP number of the client (the visitor’s computer) making the request. In fact it may actually be the IP number of a ‘proxy’ device that made the HTTP request on behalf of the real user. Such devices include the web content caching appliances that ISPs are increasingly deploying (‘perimeter caches’) and the firewalls that are typically deployed between corporate networks and the Internet. See section 1.4.5 Not the whole picture!
  • 03/March/2001 indicates the date of the access.
  • 11:30:35 indicates the time (hours:minutes:seconds) of the access.
  • ‘GET /webguidelines/index.htm HTTP/1.0’ is the request that the browser sent to the server.
  • 200 is the HTTP status code with which the request completed (code 200 means that the file was served successfully). See annex I Common HTTP server status codes.
  • 35000 is the size in bytes of the file that was transferred to the client’s browser.
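The fields described above can be extracted with a short script. The sketch below assumes the simplified layout of the sample entry, written on one line, with straight double quotes around the request and a space between the method and the path, as servers normally write it. Real server log formats usually carry more fields, so the pattern is an illustration rather than a general parser.

```python
import re
from datetime import datetime

# Assumed layout, modelled on the sample entry:
#   client [day/Month/year:HH:MM:SS] "METHOD path protocol" status bytes
LOG_PATTERN = re.compile(
    r'(?P<client>\S+) '
    r'\[(?P<when>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) '
    r'(?P<size>\d+|-)'
)

def parse_entry(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # %B is the full month name ("March"), as used in the sample entry.
    fields["when"] = datetime.strptime(fields["when"], "%d/%B/%Y:%H:%M:%S")
    fields["status"] = int(fields["status"])
    # Some servers write "-" instead of a byte count when no body was sent.
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields

sample = ('193.63.182.194 [03/March/2001:11:30:35] '
          '"GET /webguidelines/index.htm HTTP/1.0" 200 35000')
entry = parse_entry(sample)
print(entry)
```

Lines that do not match the pattern return `None` rather than raising an error, so a malformed entry does not stop the processing of a large log file.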
Depending upon the logging capabilities of the web server software and how the web server logging has been configured, web server logs may contain a large amount of additional information such as:
  • HTTP_REFERRER - this records the URL of the web page that referred the visitor to the current page. Taken across successive requests, it can show how a user (client) makes their way through your website.
  • USER_AGENT - this records the program name and version number of the browser that the user (client) employed. For example, Microsoft Internet Explorer/4.04 (Windows 95).
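Where the server is configured to record these extra fields, they typically appear as additional quoted strings at the end of each entry. The sketch below assumes the widely used 'combined' log layout (host, identity, user, timestamp, request, status, size, referrer, user agent); your server may be configured differently. The referring URL and the user agent string in the sample line are illustrative only.

```python
import re

# Assumed 'combined' layout:
#   host ident user [time] "request" status size "referrer" "user agent"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

line = ('193.63.182.194 - - [03/Mar/2001:11:30:35 +0000] '
        '"GET /webguidelines/index.htm HTTP/1.0" 200 35000 '
        '"http://www.example.org/links.htm" '
        '"Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)"')

match = COMBINED.match(line)
if match:
    print("referrer:  ", match.group("referrer"))
    print("user agent:", match.group("user_agent"))
```

Both values are supplied by the client and can be missing or deliberately false, so they should be treated as indicative rather than reliable.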

Sunday, April 12, 2009

Understanding user statistics

Website usage statistics are generally obtained by analysing the server logs. A typical HTTP server log contains a log entry for each HTTP request (or hit) on the server. This entry will contain information about the web resource requested and the browser to which it was served. Software can be used to analyse and process these log files and provide a picture of the traffic to the website, including:
  • the number of visitors,
  • visitor duration and traffic pattern,
  • visitor origin including which country, when it can be identified,
  • visitor IP address,
  • visitors’ technical preferences, such as browser type and version, platform.
This analysis will also indicate:
  • traffic peaks and troughs against time of day and day of the week,
  • average daily user load,
  • what obstacles may turn visitors away,
  • which pages get high traffic,
  • which directories are getting high traffic,
  • which graphic files are acceptable in terms of size and download time,
  • type of browsers (user agent) being used.
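Several of these figures are simple tallies over parsed log entries. The sketch below counts requests by hour of the day, by day of the week and by directory. The entries are hypothetical and are assumed to have been reduced already to a timestamp and a requested path.

```python
from collections import Counter
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Hypothetical, already-parsed entries: (timestamp, requested path).
entries = [
    (datetime(2001, 3, 3, 11, 30, 35), "/webguidelines/index.htm"),
    (datetime(2001, 3, 3, 11, 45, 2), "/webguidelines/index.htm"),
    (datetime(2001, 3, 3, 14, 5, 0), "/webguidelines/logo.gif"),
    (datetime(2001, 3, 4, 11, 10, 0), "/index.htm"),
]

def requests_by_hour(entries):
    """Tally requests by hour of the day (0-23)."""
    return Counter(when.hour for when, _ in entries)

def requests_by_weekday(entries):
    """Tally requests by day of the week; weekday() returns 0 for Monday."""
    return Counter(DAY_NAMES[when.weekday()] for when, _ in entries)

def requests_by_directory(entries):
    """Tally requests by directory: everything up to the last '/' in the path."""
    return Counter(path.rpartition("/")[0] + "/" for _, path in entries)

by_hour = requests_by_hour(entries)
by_weekday = requests_by_weekday(entries)
by_directory = requests_by_directory(entries)
print(by_hour.most_common(1))
print(by_weekday)
print(by_directory)
```

Packages such as those listed below produce the same kinds of tallies, along with many others, without any programming.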
There is a wide range of software available for processing and analysing the potentially huge amount of raw data contained in web server logs. This ranges from the commercially available Webtrends product family through to ‘shareware’ packages such as Wusage and free software like Analog.
  • www.webtrends.com
  • www.boutell.com/wusage/
  • www.analog.cx

Thursday, April 9, 2009

Evaluation and website metrics

Is the web strategy working? Does the navigation get people to the information they need? Is the server reliable? Measuring audience satisfaction, looking at feedback, understanding access statistics: without measures such as these you will not be able to demonstrate value for money, or that you are meeting the needs of users and the aims of management. Therefore, regular (quarterly will be sufficient), formal evaluation exercises of both the content and the technology are strongly recommended.

Evaluation of website design and content can be carried out by drawing on:
  • Website access statistics provided by the ISP/hosting service provider. (The ISP/hosting service provider may supply either the raw web server logs or the results of their having been processed by analysis software);
  • Responses via feedback tools (forms, databases, email addresses);
  • Feedback from contributors to the website;
  • Conventional audience research, for example, focus groups and professionally authored online questionnaires.
The effectiveness of the website can also be judged by measuring achievement in other ways. For example, one recruitment website was evaluated on:
  • The number of recruits that applied via the website.
  • The performance of web recruits measured against that of staff recruited by other means.
  • The cost per recruit measured against the cost per recruit of publicity in other media.
If the ISP/hosting service supplier provides the results of analysing the web server logs as opposed to providing the unprocessed raw logs, the minimum information that should be required from them is statistics on:

  • number of unique users (visitors)
  • number of visits, and
  • page impressions (page views).
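The difference between hits and page impressions can be sketched as below. The list of file types treated as 'not a page' is an assumption for the example, and distinct client addresses are only a rough stand-in for unique users. Visits are left out because they depend on a session timeout rule. The entries are hypothetical.

```python
# Assumption: requests for these file types are parts of a page, not pages.
NON_PAGE_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")

def summarise(entries):
    """Summarise an iterable of (client, path, status) tuples."""
    hits = 0
    page_impressions = 0
    clients = set()
    for client, path, status in entries:
        hits += 1  # every request is a hit, whatever was requested
        clients.add(client)
        is_page = not path.lower().endswith(NON_PAGE_SUFFIXES)
        if status == 200 and is_page:
            page_impressions += 1  # only successfully served pages count
    return {
        "hits": hits,
        "page_impressions": page_impressions,
        "distinct_clients": len(clients),
    }

# Hypothetical entries: (client address, requested path, status code).
entries = [
    ("193.63.182.194", "/webguidelines/index.htm", 200),
    ("193.63.182.194", "/webguidelines/logo.gif", 200),
    ("193.63.182.194", "/webguidelines/style.css", 200),
    ("198.51.100.1", "/webguidelines/index.htm", 200),
    ("198.51.100.1", "/webguidelines/missing.htm", 404),
]
summary = summarise(entries)
print(summary)
```

In this example five hits amount to only two page impressions, which is why hit counts on their own say little about how many pages were actually viewed.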
Some examples of other relevant metrics that can be identified from web server logs are:
  • error message counts (indicating that pages and other content were not served successfully); and
  • traffic analysis focussing on peak times (to assess bandwidth requirements) and ‘dead’ times (should it be necessary to switch the site off while maintenance is carried out).
Additional useful information can include:
  • successful requests;
  • unsuccessful requests;
  • most frequently visited pages;
  • least frequently visited pages;
  • top entry pages;
  • top referring websites.
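A report along these lines can be sketched with a few counters. The example assumes entries already parsed into (path, status, referrer) tuples, treats status codes below 400 as successful, and treats a referrer of "-" as 'none recorded'; all three are assumptions for the illustration, and the entries themselves are hypothetical.

```python
from collections import Counter

def report(entries):
    """Build a small usage report from (path, status, referrer) tuples."""
    # Assumption: status codes below 400 count as successful requests.
    successful = [e for e in entries if e[1] < 400]
    unsuccessful = [e for e in entries if e[1] >= 400]
    pages = Counter(path for path, _, _ in successful)
    referrers = Counter(ref for _, _, ref in entries if ref != "-")
    ranked = pages.most_common()  # most requested first
    return {
        "successful": len(successful),
        "unsuccessful": len(unsuccessful),
        "errors_by_status": Counter(status for _, status, _ in unsuccessful),
        "most_visited": ranked[:3],
        "least_visited": ranked[-3:],
        "top_referrers": referrers.most_common(3),
    }

# Hypothetical entries: (requested path, status code, referring URL).
entries = [
    ("/index.htm", 200, "http://www.example.org/links.htm"),
    ("/index.htm", 200, "-"),
    ("/about.htm", 200, "http://www.example.org/links.htm"),
    ("/missing.htm", 404, "http://www.example.com/old.htm"),
    ("/index.htm", 304, "-"),
]
usage = report(entries)
for name, value in usage.items():
    print(name, value)
```

Pages that exist on the server but never appear in such a report are worth checking, since they may be orphaned or hard to reach through the navigation.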
This information can be used to do such things as:
  • identify the most popular content,
  • review the navigation system, for example by identifying orphaned pages,
  • identify referring websites (the sites from which users arrive at your website),
  • audit the level of response to electronic forms,
  • assess the effectiveness of marketing/PR campaigns in bringing traffic to the website,
  • provide information on users’ platforms and browsers,
  • identify users’ DNS domains and thus visits from abroad or from within government.
It is, in addition, recommended that web teams should:
  • give more importance to visitors, unique visits and page impressions than to hits;
  • take as much notice of error logs as of any other statistics;
  • determine who is using the website the most;
  • monitor current bandwidth use, and attempt to project future requirements;
  • archive server logs to use for monitoring trends over time.

The web strategy and management team should ensure, at the procurement stage, that ISPs/hosting services are offering to provide a full range of server log information.

It is acceptable to use HTTP cookies or session identities to track visitors' paths through the website (and this will be essential in e-transactional sites). The website should contain a clear statement of policy on the use of cookies.

Good practice dictates that the need for attention to the accuracy and timeliness of information will increase as the level of activity of a site increases.

Web managers should, in the interests of open government, consider publishing a summary of usage statistics on their websites.

Wednesday, April 8, 2009

The commercial value of credits

The giving of credit to suppliers of web services that you employ directly within the functionality of your website can have commercial value. Significant reductions to the cost of features such as search engines can be negotiated especially if logos and links to suppliers’ sites are granted. The value will vary with the popularity of the specific web pages, and the relevance of the service to your readership.

The giving of credit to suppliers of web services (for example, by name or by email address), particularly if within your metadata, will also have commercial value. Reductions to costs should be negotiated.

Tuesday, April 7, 2009

Sponsorship

Sponsorship may be a useful means of saving public expenditure. Like all government publicity projects, websites should observe the guidance given in the Cabinet Office Guidance for Departments on Sponsorship of Government Activities. This document can be found online at: http://www.gics.gov.uk or published in the Directory of Civil Service Guidance. These guidelines should be consulted in full. Like all government guidelines they are subject to amendment and update.

In general, sponsorship:
  • must avoid any suggestion that the sponsors will be sympathetically regarded for other purposes;
  • must be seen to add significant benefit;
  • should add to, not replace, core funding for the project;
  • cannot be given by firms which are involved in significant commercial negotiations with the department or are licensed/regulated by it;
  • should be sought in an open and even-handed manner between organisations in a particular field, using the appropriate public sector procurement methods to secure the contractual arrangements;
  • must not be an endorsement by Government of the sponsor or its products or services;
  • must not dilute the effectiveness of your website or the message that lies behind it. Sponsors cannot influence the messages of Government communication in their business area;
  • must not bring adverse publicity to the project;
  • must be of websites and not of individual Ministers or civil servants;
  • does not place a Minister or a Department under an obligation to a sponsor.
Sponsorship of individual amounts, including value-in-kind, of more than £5,000 must be disclosed in Departmental Annual Reports.

To measure the value of in-kind sponsorship, where the sponsor provides goods or services that benefit the project, Departments should consider the opportunity cost, ie, how much it would have cost the department if it had paid for the support provided. Ongoing costs should also be taken into account for the lifetime of the sponsorship agreement.

Returns to the sponsor must be specified in writing as part of the sponsorship agreement. The agreement should cover, for example, the display of the name of the sponsor or whether there is to be a link to the sponsor’s website.

Credit to a sponsor must never create confusion about branding or your website’s identity.

Credit to a sponsor should only occur on those parts of your web space where the sponsor is directly contributing to its provision. This should be specified in the sponsorship agreement.

Acknowledgement should be concise. A company logo, if used, must not distract from clear branding of your website’s own identity or any government branding. A sponsor’s logo must comply with the universal accessibility and graphics requirements of these guidelines.

A company logo must be seen as appropriate and must not be, or be perceived to be, visually larger or more important than any official or campaign logo. A link to the sponsor’s own web page is acceptable. To retain your audience, you may wish to have it open in a new browser window.

If these guidelines have been followed, then no specific disclaimer for this instance of sponsorship should be necessary. It should be evident that the source of sponsorship is appropriate. It is, however, your responsibility to ensure that this relationship cannot be misinterpreted.

In the case that a disclaimer is necessary to avoid the semblance of an inappropriate relationship with the company, then it should be placed next to the credit line in the same heading level and typeface and on the same page. This is because disclaimers that are a link away from a credit have not in practice proved to be effective at avoiding the appearance of a problem.

It would be useful if the government’s policy on sponsorship were included alongside the disclaimer information, just off the home page, together with an assertion that all sponsorship of the site meets these criteria.