Sunday, April 19, 2009

Website Evaluation - Not the full picture

You should be aware that there are limits to the information that can be discovered by analysing Web server log files. The principal issues are:
  • Most ISPs use dynamic IP addressing. This means they maintain a pool of IP addresses from which an IP number is ‘loaned out’ to each dial-up call for the duration of the call. A particular IP number will therefore be used by many different users, and a particular user may appear at your website with many different IP numbers. The firewalls used at the interface between the Internet and corporate networks typically use a process named Network Address Translation (NAT), which has a similar effect. Firewalls also often use a process named Port Address Translation (PAT), with which many users behind the firewall ‘share’ a single Internet IP number. As a result, a specific IP number only rarely corresponds to a specific user, and it is inappropriate to base estimates of the number of visitors to your website on a count of the distinct IP numbers found in server log files alone.
  • Caches - almost all ISPs and many corporate users deploy 'perimeter caches' to conserve their Internet connection bandwidth and improve the speed with which web pages are served to their users. These are often set up to work ‘transparently’, regardless of how users have configured their browser’s cache settings. Perimeter caches work by storing copies of the pages fetched by the client systems on whose behalf they are deployed. Subsequent requests for a page from other users behind the cache will be served from the cache if it already holds a copy, possibly without any further reference to the origin server. Pages can therefore be delivered to users without any record appearing in the origin server's log file.
  • Dynamic proxies - dynamic IP addressing and perimeter caching make it uncertain which page requests come from which users. This uncertainty is compounded by the fact that some organisations assign proxy devices such as perimeter caches dynamically during the course of a user’s Internet session. As a result, a sequence of page requests that in fact comes from a single user may appear to come from several users, even within a single visit or session. AOL is an example of an organisation that uses dynamic proxying.
  • Cookie manipulation - users can delete or otherwise manipulate the cookies stored by their browsers, and browsers can be set to convert persistent cookies into session cookies. Cookies therefore cannot be relied upon to measure the number of users of a website accurately or to identify users who revisit it.
  • Browsers - some browsers are known to report the referring URL incorrectly: they give the previous page the user was viewing even when the user selected a bookmark or typed a URL into the browser rather than following a link on the displayed page.
  • Anonymisers - some clients use 'anonymisers' which deliberately send false browser and referrer data.
All of these issues mean that estimates derived from standard web server logs, whether of the number of users of a website or of their browsing behaviour during a visit, must be treated with reservations. The Internet advertising industry develops and promotes standard website traffic metrics and methodologies for calculating them. It is recognised that these measurements are flawed for the reasons outlined above; however, it is believed that the metrics still provide a basis for comparing one website's usage with another's, on the assumption that these issues affect all websites to broadly the same extent. There is, however, no sound basis for this belief.
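To make the IP-addressing and proxy issues concrete, here is a minimal Python sketch (not a production log analyser) that parses lines in the Apache/NCSA Combined Log Format and computes two naive metrics: the number of distinct IP numbers, and the number of 'visits'. A visit is assumed here to be a run of requests from the same IP number and user agent with no gap longer than 30 minutes. Both the 30-minute timeout and the IP-plus-user-agent grouping are common conventions assumed for illustration. The log lines are hypothetical, use documentation-only IP ranges, and describe three people making one visit each.

```python
import re
from datetime import datetime, timedelta

# Combined Log Format: host ident authuser [time] "request" status bytes "referrer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"
SESSION_TIMEOUT = timedelta(minutes=30)  # assumed visit timeout, a common convention

FIREFOX = "Mozilla/5.0 (Windows; U; Windows NT 5.1) Firefox/3.0"  # illustrative UA
MSIE = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"      # illustrative UA

# Hypothetical log: person A's requests arrive via three different proxy addresses;
# persons B and C sit behind one NAT/PAT address and use the same browser build.
SAMPLE_LOG = [
    f'192.0.2.10 - - [19/Apr/2009:10:00:00 +0100] "GET /index.html HTTP/1.1" 200 5120 "-" "{FIREFOX}"',
    f'192.0.2.77 - - [19/Apr/2009:10:02:00 +0100] "GET /about.html HTTP/1.1" 200 2048 "http://www.example.com/index.html" "{FIREFOX}"',
    f'198.51.100.20 - - [19/Apr/2009:10:04:00 +0100] "GET /contact.html HTTP/1.1" 200 1024 "http://www.example.com/about.html" "{FIREFOX}"',
    f'203.0.113.5 - - [19/Apr/2009:10:05:00 +0100] "GET /index.html HTTP/1.1" 200 5120 "-" "{MSIE}"',
    f'203.0.113.5 - - [19/Apr/2009:10:06:00 +0100] "GET /news.html HTTP/1.1" 200 4096 "-" "{MSIE}"',
]


def parse(lines):
    """Return (host, user_agent, timestamp) tuples for lines that match the format."""
    records = []
    for line in lines:
        m = LOG_PATTERN.match(line)
        if m:
            ts = datetime.strptime(m["time"], TIME_FORMAT)
            records.append((m["host"], m["agent"], ts))
    return records


def count_distinct_ips(records):
    """Naive 'visitor' count: one visitor per distinct IP number."""
    return len({host for host, _, _ in records})


def count_visits(records):
    """Naive visit count: a new visit starts when an (IP, user agent) pair is new
    or has been idle for longer than SESSION_TIMEOUT."""
    last_seen = {}
    visits = 0
    for host, agent, ts in sorted(records, key=lambda r: r[2]):
        key = (host, agent)
        if key not in last_seen or ts - last_seen[key] > SESSION_TIMEOUT:
            visits += 1
        last_seen[key] = ts
    return visits


if __name__ == "__main__":
    records = parse(SAMPLE_LOG)
    print(count_distinct_ips(records), count_visits(records))  # prints: 4 4
```

On this sample both metrics report 4, although three people made three visits. The dynamic proxy splits person A's single visit across three addresses, while NAT/PAT merges persons B and C into one address and one apparent visit. The two errors partly cancel, so a plausible-looking figure says little about its own accuracy.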

The Joint Industry Committee for Web Standards in the UK and Ireland

JICWEBS is the body created by the UK and Ireland media industry whose aim is to ensure independent development and ownership of standards for measuring use and effectiveness of advertising on electronic media.

The International Federation of Audit Bureaux of Circulations

The IFABC Web Standards Committee promotes similar aims on a worldwide basis.
  • www.jicwebs.org
  • www.ifabc.org
User agent masquerading

The term 'user agent masquerading' refers to browsers that transmit an incorrect browser identification string in the requests they send to servers. Some browsers simply do not identify themselves properly and so are not correctly identified in server log file records. Deliberate masquerading is also used for a number of reasons:
  • Some websites alter the content they serve based on the browser identification string, so masquerading can be used to work around this.
  • Some websites reject requests from browsers that they are not intended to work with, so masquerading can be used to work around this.
  • Some users simply wish to remain as anonymous as possible.
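Simple log analysers often classify browsers by searching the user-agent string for known tokens. The minimal Python sketch below shows why masquerading defeats this approach. It uses hypothetical token tables and illustrative user-agent strings. The first string is of the shape some Opera versions sent when set to identify as Internet Explorer: it contains both "MSIE" and "Opera", so the result depends on the order of the checks. The second is a fully spoofed Internet Explorer string, which no rule of this kind can tell apart from the real thing.

```python
from typing import List, Tuple

# Hypothetical token tables: (substring to look for, browser name to report).
# The first matching token wins, so the order of the entries matters.
MSIE_FIRST: List[Tuple[str, str]] = [
    ("MSIE", "Internet Explorer"), ("Opera", "Opera"), ("Firefox", "Firefox"),
]
OPERA_FIRST: List[Tuple[str, str]] = [
    ("Opera", "Opera"), ("MSIE", "Internet Explorer"), ("Firefox", "Firefox"),
]

# Illustrative user-agent strings, assumed for this example.
OPERA_AS_IE = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 8.50"
FULL_SPOOF = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"


def classify(agent: str, tokens: List[Tuple[str, str]]) -> str:
    """Return the browser name for the first token found in the user-agent string."""
    for token, name in tokens:
        if token in agent:
            return name
    return "Unknown"


if __name__ == "__main__":
    print(classify(OPERA_AS_IE, MSIE_FIRST))   # Internet Explorer
    print(classify(OPERA_AS_IE, OPERA_FIRST))  # Opera
    print(classify(FULL_SPOOF, OPERA_FIRST))   # Internet Explorer, whatever actually sent it
```

Reordering the checks rescues the partial masquerade only because Opera left its own token in the string. A browser or anonymiser that sends a complete Internet Explorer string will be counted as Internet Explorer by any token-matching rule.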
