Proposed Guideline

WD-countmethod-19990208

Internet Advertising Banner Counting Methodology

Working Draft WD-countmethod-19990208

Latest version:
http://www.basswoodassoc.com/standards/WD-countmethod.html
This version:
http://www.basswoodassoc.com/standards/WD-countmethod-19990208.html
Author:
Tom Shields <tom.shields@basswoodassoc.com>

Status of this document

This is a Working Draft for review by IAB members and other interested parties.  The IAB has NOT approved this draft for implementation; this draft is likely to change somewhat before approval.  Anyone using this as a basis for implementation deserves what they get.

Abstract

A standard, practical methodology for counting Internet banner ad impressions and clicks is presented.  The methodology is designed such that two compliant implementations will generate basic impression and basic click counts that differ by less than 5%.  The methodology is implementable by content sites, content networks, ad networks, advertisers, agencies, and audit firms in order to achieve comparable counts.

Introduction

Internet advertisers often place similar ad buys across multiple web sites or advertising networks, and they require the ability to compare the results of those buys in order to evaluate the value they have received.  Current Internet technology does not permit perfect accuracy in ad measurement, so ad delivery systems must use approximate methods of counting.  Because there is no standard way to count, the performance numbers reported by different sites and networks are often measured using different methods, making comparisons between them invalid.  Browser and proxy server issues often make even similar counting methods result in wildly differing numbers.  Nevertheless, advertisers want reports with numbers that are comparable—advertisers want to compare apples to apples, not apples to oranges.

The counting methodology has the following design goals:

There are two basic methods for ad counting in use on the majority of the Internet today: ad requests (sometimes also called ad insertions), and ad downloads.  Ad requests refer to the method of counting an ad impression when a page containing the ad HTML is requested.  The ad download method counts an ad impression when the ad media (in this case, an image) is requested from the server.  These two basic methods count at very different points in the communication channel between the browser and the server, and therefore produce different results.  Most ad networks count using a variation of the ad download metric; it is often not technically possible for them to count ad requests because they have no access to the server that serves the pages containing their ads.  Ad-supported web sites use variations on either metric.  For the many sites that use variations of the ad request method, however, it should be technically possible for them to switch to an ad download metric, although it may require additional development and hardware resources.  For this reason, the basic impression methodology described below is based on an ad download method.

The IAB MMTF has already published definitions of such terms as "ad requests" and "ad clicks" [IAB97].  These definitions have been invaluable to ensure that companies using the defined concepts and metrics for advertising are consistent.  That document also states: "It is important to note that for true comparability to exist, we need to define both the concepts and the metrics themselves as well as the methodology sites should use to generate those metrics."[IAB97]  This document is an attempt to define a standard methodology, so the resulting reports can be truly comparable.  This document also attempts to address the objections raised in the Q&A section titled "Why are images not a comparable measure?"[IAB97].  In particular, by defining the methodology clearly, we hope to mitigate the effects of caching and other "environmental" factors on the comparability of the counts, resulting in a measure that is both implementable by ad networks, and potentially more accurate than ad requests.

There have been many incompletely defined terms used to describe measurement methodologies to date, including "impressions", "requests", "downloads", "insertions", "views", "exposures", etc.  To distinguish this methodology from others, counts performed according to this method should be labelled "basic impressions" and "basic clicks".

For simplicity, this guideline only addresses measuring ads that use or include clickable image media, including GIFs, animated GIFs, and JPGs.  These ads constitute the vast majority of the advertising on the Internet today.  No attempt is made in this document to define a methodology that can measure HTML ads such as banner forms (except to the extent that banner images within them can be measured), Java ads, Javascript ads, or embeddable media such as Shockwave, nor do we intend to measure HTML ads delivered via IFRAME or ILAYER tags; these will be defined in a later revision.  Further, this guideline does not attempt to account for client-side counting by offline browsers, or filtering of hits from non-human browsers such as spiders and robots.  Finally, this guideline does not define the user action required to measure a basic impression; in particular, it makes no distinction between a basic impression as a result of a user-initiated event or one resulting from a timed refresh.

Methodology

We will define an ad counter as a program that responds to browser requests related to advertising.  For the purposes of this document, these requests will only include IMG SRC requests for ad media, and A HREF requests for ad clicks.

A valid basic impression may only be counted when an ad counter receives and responds to a request for an ad image from a browser.  This image request must be the result of an IMG tag in the page HTML.  In response, the ad counter must return a location redirect, specifying the location of a file or other program that will deliver the image media.  The location redirect must take the following form:

302 Moved Temporarily
Location: http://www.site.com/ad.gif
Pragma: no-cache
Cache-Control: no-cache
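
As a minimal sketch of the response above, the following Python function assembles the status line and headers for a given target.  The function name build_redirect and the HTTP/1.0 status line are assumptions made for illustration; the guideline itself only fixes the status code and the headers shown.

```python
def build_redirect(location: str) -> str:
    """Return a raw 302 response (hypothetical helper, not part of the guideline)."""
    # Refuse line breaks so the target cannot smuggle in extra headers.
    if "\r" in location or "\n" in location:
        raise ValueError("location must not contain line breaks")
    lines = [
        "HTTP/1.0 302 Moved Temporarily",  # assumed protocol version
        f"Location: {location}",
        "Pragma: no-cache",
        "Cache-Control: no-cache",
    ]
    # A blank line ends the header block; no entity body and no
    # Last-Modified header are sent.
    return "\r\n".join(lines) + "\r\n\r\n"


response = build_redirect("http://www.site.com/ad.gif")
```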

A valid basic click may only be recorded when an ad counter receives and responds to a click request from a browser.  This click request must be the result of a user clicking on an Anchor tag in the page HTML.  In response, the ad counter must return a location redirect, specifying the location of the destination for the ad.  The location redirect must take the following form:

302 Moved Temporarily
Location: http://www.advertiser.com/index.html
Pragma: no-cache
Cache-Control: no-cache

The response in both cases must be a location redirect - implementations which respond with a valid page or a status code other than 302 will NOT count a basic impression or basic click.  The response must NOT contain the Last-Modified header, as it may encourage caching.  Other headers may be used if desired.  Note that the IMG SRC URL MUST be unique across page requests.
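
The response rules above can be expressed as a simple check.  This is only a sketch: the function name is_basic_redirect and the representation of headers as a Python dict are assumptions, and a real audit tool would likely inspect the raw response.

```python
def is_basic_redirect(status: int, headers: dict) -> bool:
    """Check a response against the rules above (hypothetical helper)."""
    # Header names are case-insensitive, so normalise before comparing.
    names = {name.lower(): value.strip().lower() for name, value in headers.items()}
    return (
        status == 302
        and "location" in names
        and names.get("pragma") == "no-cache"
        and "no-cache" in names.get("cache-control", "")
        and "last-modified" not in names
    )


compliant = is_basic_redirect(302, {
    "Location": "http://www.site.com/ad.gif",
    "Pragma": "no-cache",
    "Cache-Control": "no-cache",
})
```

Other headers are ignored by the check, matching the statement that they may be used if desired.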

The URLs that are used to make the IMG SRC and A HREF requests may take the form of any valid URL (see [RFC1738]).  However, the IMG SRC URL MUST be unique for each page request by a single browser, in order to prevent browser caching.  The URLs that are then redirected to may be any valid URL.  In many cases, the ad counter functionality will be included in a more full-featured ad server, which chooses the appropriate image, or determines the correct ad destination, and may require additional parameters as part of each URL.

The server that responds to the basic click must also respond to a request for "/robots.txt", in the manner described in [KOST98].  This response must configure the server to disallow robots from clicking on ads.  An example "/robots.txt" file follows:

User-agent: *
Disallow: /
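
One way to confirm that such a file has the intended effect is to feed it to a robots.txt parser.  The sketch below uses the Python standard library's urllib.robotparser; the robot name "ExampleBot" is hypothetical, and the click URL is the one used in the example section of this document.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
# Parse the text directly instead of fetching it over the network.
parser.parse(ROBOTS_TXT.splitlines())

click_url = (
    "http://ad.counter.com/cgi-bin/adcounter.cgi"
    "?DEST=http%3A%2F%2Fwww.advertiser.com%2Findex.html"
)
# A well-behaved robot asks before requesting; here it is refused.
allowed = parser.can_fetch("ExampleBot", click_url)
```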

Commentary

This counting methodology is both accurate and efficient in its use of Internet resources.  Conceptually, a small control request must go "end-to-end" from the browser to the origin server, in order to ensure accurate counting (and, possibly, to select the correct ad).  The ad media, however, may come from a cache as close to the browser as possible.

The methodology requires the ability to defeat caching on a location redirect, in order to count accurately and efficiently.  There are several ways that caching can occur, most commonly either in the browser or in an intermediate proxy server.  There are also several mechanisms for defeating caching, including response headers, and URL construction techniques.  These and other issues are discussed in this section.

The mechanism chosen here to defeat proxy caching is to use the headers "Pragma", and "Cache-Control".  These methods should defeat most proxy caches that incorrectly consider a 302 to be cacheable.  In addition, the omission of the "Last-Modified" header should help prevent caching.  The "Set-Cookie" header also prevents proxy caches from storing documents, but it has social implications that often make its use undesirable, so it is not required.

To increase accuracy at some expense of simplicity, this guideline requires the IMG SRC URL to be unique across page requests by a single browser.  This is the only known consistent method of defeating browser caching of images.

One simple method for ensuring IMG SRC URL uniqueness is to insert the current time with seconds, or a sufficiently large random number, in the IMG SRC URL as the page is delivered to the browser.  Another popular method, which does not require server involvement, is to use Javascript to construct a unique URL - although not all browsers support Javascript, in practice this method generates basic impression counts that are comparable with server URL modification methods.  Other methods involve server side includes, or other client-side scripting.  It is not sufficient to ensure that IMG SRC URLs on different pages are different, because a single browser reloading the same page should generate multiple basic impressions.  Note that the server side modifications occur at the originating site that delivers the page, which may not be the site hosting the ad counter. Practically, many ad server systems require this unique identifier as part of both the IMG SRC and the A HREF URLs, to link the image served with the appropriate clickthrough without using cookies.
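
A server-side version of the first method might look like the sketch below.  It combines the current time in seconds with a random token, which goes slightly beyond the "either/or" wording above so that two reloads within the same second still differ.  The function name unique_img_src and the RND parameter are assumptions; the IMAGE and UTC parameters follow the example later in this document.

```python
import secrets
import time
from urllib.parse import urlencode


def unique_img_src(counter_url: str, image_url: str) -> str:
    """Build an IMG SRC URL that differs on every page delivery (sketch)."""
    params = {
        "IMAGE": image_url,
        "UTC": str(int(time.time())),  # current time in seconds
        "RND": secrets.token_hex(8),   # hypothetical random component
    }
    return f"{counter_url}?{urlencode(params)}"


COUNTER = "http://ad.counter.com/cgi-bin/adcounter.cgi"
first_url = unique_img_src(COUNTER, "http://www.site.com/ad.gif")
second_url = unique_img_src(COUNTER, "http://www.site.com/ad.gif")
```

The function would be called once per page delivery, so a browser reloading the same page receives a new URL each time.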

There have been reports of browser bugs causing incompatibilities with this counting methodology.  The most common problem is that in many versions of Netscape, animated GIFs set to rotate continuously will only rotate once, and then stop.  This problem has been fixed in the latest versions, and should become less important as the browser population upgrades.  Some versions of Netscape may also continuously re-request animated GIFs if the "Expires" header is used.  For this reason, we do not include the "Expires" header as part of the cache-defeating mechanism.

The methodology has been designed to handle multiple cascading ad counters.  For example, a small site may accept local advertising, but send the rest of its inventory to a larger ad network.  The network, in turn, may accept advertising from a large advertiser who serves their own ads, or uses a third party ad server.  In this case there are at least three ad counters that will be involved in each ad delivery.  This methodology has been designed so that the basic impression and basic click numbers produced by each counter should be within 5% of each other.
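
When benchmarking cascaded counters, the 5% target can be checked with a small calculation.  This guideline does not say which count the percentage is taken against; the sketch below assumes the larger of the two counts as the base, and the three counts are invented purely for illustration.

```python
from itertools import combinations


def relative_difference(count_a: int, count_b: int) -> float:
    """Difference between two counts as a fraction of the larger one (assumed base)."""
    larger = max(count_a, count_b)
    if larger == 0:
        return 0.0  # two empty counts agree trivially
    return abs(count_a - count_b) / larger


# Hypothetical basic impression counts from a site, a network and an advertiser.
counts = {"site": 10000, "network": 9700, "advertiser": 9600}
within_target = all(
    relative_difference(counts[a], counts[b]) < 0.05
    for a, b in combinations(counts, 2)
)
```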

Many sites are already performing ad measurement according to their own methodology, and may have contracts with advertisers based on that method.  The implementation period for adoption of this guideline will need to include time to benchmark the variances which may occur between different ad counters.

The HTTP 1.1 standard [RFC2068] recommends that servers responding to a request with a status 302 (location redirect) include an entity body that contains a short hypertext note with a hyperlink to the new URL.  For maximum efficiency, and because nearly all browsers will automatically follow a 302, this method does not encourage an entity body in the 302 response.  A "Content-Length: 0" header may be included, so browsers do not wait for the entity body to be transmitted, but it is not clear that this will actually improve performance, so this is not required.

As noted in the methodology, basic impressions may only be counted when an ad counter receives a request for an ad image from a browser, even though the measurement might be slightly more accurate if the counting were performed after the redirect was known to have transmitted successfully.  This is much more difficult to implement with today's web servers, however, and would probably result in fewer sites counting this way.  Also, because ad counters typically deal with requests in milliseconds, and the redirect is a relatively small response, the likelihood of successful transmission is high.

This methodology requires the use of "robots.txt", which will ensure that ad clicks are suppressed from well-behaved spiders, but does not suppress activity from ill-behaved spiders, nor from link validation utilities and offline readers.  Some counting methods attempt to use filters or timing to remove impressions caused by these other influences.  However, most spiders and robots don't request images, so few basic impressions should be actually recorded.  Also, requiring counting agents to filter against a potentially constantly changing list of known ill-behaved robots and spiders, or to set arbitrary limits on IP address click frequency, violates the simplicity constraint.  It is permissible, however, to use a unique identifier to mark each ad request and disallow multiple clicks per request identifier.
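
The permitted per-request click limit could be implemented along the following lines.  This is a sketch only: the class name ClickCounter is hypothetical, the identifiers are held in memory, and a production counter would need persistent storage and some expiry policy.

```python
class ClickCounter:
    """Counts at most one basic click per ad request identifier (sketch)."""

    def __init__(self) -> None:
        self.basic_clicks = 0
        self._seen_request_ids = set()

    def record_click(self, request_id: str) -> bool:
        """Return True if the click was counted, False if it was a repeat."""
        if request_id in self._seen_request_ids:
            return False
        self._seen_request_ids.add(request_id)
        self.basic_clicks += 1
        return True


counter = ClickCounter()
first = counter.record_click("908229511")   # counted
repeat = counter.record_click("908229511")  # same identifier, ignored
```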

Example

This example uses a hypothetical ad counter that is implemented as a CGI to a standard web server.  The following is a sample page containing an ad that will be counted by the ad counter.  Note the UTC parameter in the image URL - this is an example of how to make the image URL unique across browser requests.

<HTML><HEAD><TITLE>Ad Tester</TITLE></HEAD><BODY>
<H1>Ad Tester</H1>
<A HREF="http://ad.counter.com/cgi-bin/adcounter.cgi?DEST=http%3A%2F%2Fwww.advertiser.com%2Findex.html">
<IMG SRC="http://ad.counter.com/cgi-bin/adcounter.cgi?IMAGE=http%3A%2F%2Fwww.site.com%2Fad.gif&UTC=908229511"></A>
</BODY></HTML>

When this page is downloaded by a browser, the browser will make a subsequent request to "adcounter.cgi?IMAGE" (assuming images are turned on).  This hypothetical adcounter.cgi program would retrieve the correct image URL (in this case, from the query string), record a basic impression, and issue the following response:

Status: 302 Moved Temporarily
Location: http://www.site.com/ad.gif
Pragma: no-cache
Cache-Control: no-cache

The browser will receive this response, and automatically retrieve "ad.gif" and display it.  Note that "ad.gif" may in fact be retrieved from an intermediate proxy cache, or even the browser's own cache.  However, because the redirect has been marked non-cacheable, the next browser to retrieve the page will also make a call through to adcounter.cgi.

When the user clicks on the ad, the browser will make a request to "adcounter.cgi?DEST".  In this case, the ad counter will retrieve the appropriate destination URL (in this case, from the query string again), record a basic click, and issue the following response:

Status: 302 Moved Temporarily
Location: http://www.advertiser.com/index.html
Pragma: no-cache
Cache-Control: no-cache

Again, the browser will receive the response and retrieve the proper destination page.  Because the location redirect defeats caching, click counting is not subject to cache discrepancies.
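
The behaviour of the hypothetical adcounter.cgi described above can be sketched in Python as follows.  The function name handle, the in-memory counts dictionary, and the 400 response for unusable requests are all assumptions; only the IMAGE and DEST parameters and the 302 responses come from the example.

```python
from urllib.parse import parse_qs

counts = {"basic_impressions": 0, "basic_clicks": 0}

BAD_REQUEST = "Status: 400 Bad Request\r\n\r\n"  # assumed error response


def handle(query_string: str) -> str:
    """Count the request and return the CGI response headers (sketch)."""
    params = parse_qs(query_string)
    if "IMAGE" in params:
        key, target = "basic_impressions", params["IMAGE"][0]
    elif "DEST" in params:
        key, target = "basic_clicks", params["DEST"][0]
    else:
        return BAD_REQUEST
    # Decoded values may contain line breaks; refuse them rather than
    # emit extra headers.
    if "\r" in target or "\n" in target:
        return BAD_REQUEST
    counts[key] += 1
    return (
        "Status: 302 Moved Temporarily\r\n"
        f"Location: {target}\r\n"
        "Pragma: no-cache\r\n"
        "Cache-Control: no-cache\r\n"
        "\r\n"
    )


impression = handle("IMAGE=http%3A%2F%2Fwww.site.com%2Fad.gif&UTC=908229511")
click = handle("DEST=http%3A%2F%2Fwww.advertiser.com%2Findex.html")
```

The UTC parameter is parsed but not used here; its only purpose is to make the URL unique.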

Note that multiple ad counters may be involved in counting basic impressions or basic clicks, in particular when a third party ad server is delivering advertising on behalf of an advertiser.  To ensure comparability it is important for all ad counters involved to adopt the same methodology.  The first ad counter, for example, may redirect to another ad counter with a response like this:

Status: 302 Moved Temporarily
Location: http://www.othercounter.com/cgi-bin/adcounter.cgi?IMAGE=http%3A%2F%2Fwww.site.com%2Fad.gif&UTC=908229511
Pragma: no-cache
Cache-Control: no-cache
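
The Location value in such a cascaded redirect carries the image URL as an encoded parameter.  A sketch of how the first counter might construct it is given below; the function name cascade_location is hypothetical, and the parameter names are those of the example.

```python
from urllib.parse import urlencode


def cascade_location(next_counter: str, image_url: str, utc: str) -> str:
    """Build the Location for a redirect to a second ad counter (sketch)."""
    # urlencode percent-encodes the nested URL so that it survives as a
    # single query parameter.
    return f"{next_counter}?{urlencode({'IMAGE': image_url, 'UTC': utc})}"


location = cascade_location(
    "http://www.othercounter.com/cgi-bin/adcounter.cgi",
    "http://www.site.com/ad.gif",
    "908229511",
)
```

Passing the UTC value through unchanged keeps the second counter's URL unique as well.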

Acknowledgements

Thanks to the MMTF for their invaluable work on Metrics and Methodology [IAB97], and to the IAB for providing the infrastructure to refine and agree upon guideline proposals such as this one.  Thanks in particular to Paul Hart and Jim Jones, for their technical work on guideline proposals leading up to this one.  Thanks also to Mike Griffiths and David Zinman, for their specific feedback on this proposal.  Finally, thanks to Kate Everett-Thorpe and Rich LeFurgy, for their persistence in pushing this proposal through the standards process.

References

[IAB97]
IAB Media Measurement Task Force, Metrics and Methodology for Internet Advertising, June 1997.
[RFC2068]
R. Fielding, et al., Hypertext Transfer Protocol -- HTTP/1.1, January 1997.
[RFC1738]
T. Berners-Lee, L. Masinter, Uniform Resource Locators (URL), December 1994.
[KOST98]
Martijn Koster <m.koster@webcrawler.com>, Robots Exclusion.
$Id: WD-countmethod.html,v 1.4 1999/06/23 00:42:56 ts Exp $