MetaCDN - StreamShark

Optimising your website caching behaviour for MetaCDN

Overview

So - you have accelerated your website with MetaCDN, and have followed our integration guide for your web platform. Great! You are already well on your way to delivering a fantastic web experience for your customers. But - did you know you can do more to maximise your benefit from using the MetaCDN platform? By making some small changes on your web server or web application, you can give us 'hints' about what content can be cached, and for how long. This helps us deliver an even better experience for your customers.

Help us help you


Websites are becoming increasingly sophisticated and deliver richer experiences every year. This often means the size of the website (specifically the amount of data downloaded) increases as well. Fortunately, MetaCDN can cache and accelerate nearly any asset that makes up your website, so it is delivered quickly no matter where in the world your end users are. These assets can range from the webpage itself (i.e. the HTML) to images (e.g. png, jpg), stylesheets (e.g. css), scripts (e.g. js) and webfonts (e.g. woff, eot, ttf). However, MetaCDN works best when your website (which we refer to as the origin) tells us what can be cached (and for how long), and what cannot be cached (e.g. private pages or logged in sections of your website).

Controlling MetaCDN behaviour with HTTP headers

Your content can be cached at multiple levels between your website and your user. When you access a website on your browser, the content could actually be returned by a cache:

  • at the client's browser,
  • at a proxy (shared) cache, or
  • at a gateway cache (e.g. a content delivery network such as MetaCDN).
cache-levels.png

Caching is generally considered to be a 'good thing' as it can drastically speed up initial and subsequent access to your websites. However, care must be taken to ensure the right information is cached for an appropriate amount of time. Thankfully, HTTP (versions 1.0 and 1.1) allows headers to be set on responses to HTTP requests, which control how these intermediary caches store assets from your website. Note that setting the headers below is simply good web practice, regardless of whether you are using a CDN or not. When combined with a CDN such as MetaCDN, these settings give you fine-grained control over what content is cached, and for how long. Below is an example of the HTTP response headers returned when an asset on a web server is requested:

HTTP/1.1 200 OK
Date: Fri, 08 Feb 2013 03:32:00 GMT
Server: Apache/2.2.14 (Ubuntu)
Last-Modified: Fri, 27 Jul 2012 01:23:24 GMT
ETag: "680027-6493-4c5c58eed7b00"
Accept-Ranges: bytes
Content-Length: 25747
Cache-Control: max-age=86400, public
Content-Type: image/jpeg

In the following sections we will explain what some of these headers do, and how they relate to improving the cache-ability of your website.

The 'Expires' HTTP header

The Expires HTTP header gives all caches a date and time after which the requested asset is no longer considered fresh. Once this time has passed, the cache will check with your origin server to see if the asset has changed. Most caches support the Expires header. However, when both the Cache-Control and Expires headers are present, Cache-Control (specifically its max-age directive) takes precedence. As such, we recommend using the Cache-Control header (described next) rather than the Expires header.
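To illustrate the date format the Expires header uses, here is a minimal Python sketch that builds a value 24 hours after a given time. The function name expires_header is hypothetical and chosen only for this example; the starting time is the Date value from the example response above.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def expires_header(now: datetime, lifetime_seconds: int) -> str:
    """Return an Expires value `lifetime_seconds` after `now`.

    `now` must be a timezone-aware datetime in UTC, because HTTP dates
    are always expressed in GMT.
    """
    return format_datetime(now + timedelta(seconds=lifetime_seconds), usegmt=True)


# The Date value from the example response above, plus one day.
now = datetime(2013, 2, 8, 3, 32, 0, tzinfo=timezone.utc)
print("Expires:", expires_header(now, 86400))
# prints: Expires: Sat, 09 Feb 2013 03:32:00 GMT
```

Because the value is an absolute timestamp, it has to be recomputed for every response, which is one reason a relative max-age is usually easier to manage.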

The 'Cache-Control' HTTP header (recommended)

The Cache-Control header was introduced in HTTP 1.1 to address the many shortcomings of the Expires header. In nearly every instance, we recommend using the Cache-Control header rather than the Expires header. The Cache-Control header allows you to specify the maximum amount of time that an asset will be considered valid (relative to the time of the request) via the max-age parameter. It also allows you to specify whether the response may be stored by shared intermediary caches (public) or only by the end user's browser (private). There are further parameters that control how data is stored and validated. They are described well in an excellent document available here.
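As a rough illustration of how a cache might read this header, the Python sketch below splits a Cache-Control value into its directives and uses max-age to decide whether a stored copy is still fresh. The function names are hypothetical, and the parsing is deliberately simplified: it assumes no directive argument contains a comma, which a real cache could not assume.

```python
def parse_cache_control(value: str) -> dict:
    """Split a Cache-Control value into {directive: argument-or-None}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        # Directives without an argument (e.g. "public") map to None.
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def is_fresh(cache_control: str, age_seconds: int) -> bool:
    """Return True if a copy that is `age_seconds` old is within max-age."""
    max_age = parse_cache_control(cache_control).get("max-age")
    return max_age is not None and max_age.isdigit() and age_seconds < int(max_age)


header = "max-age=86400, public"       # from the example response above
print(parse_cache_control(header))
print(is_fresh(header, 3600))          # one hour old
print(is_fresh(header, 90000))         # just over a day old
```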

The 'Vary' HTTP header

Coming soon.

What happens if I don't set any caching headers?

When your web server returns files that do not have explicit freshness information (like a Cache-Control or Expires header), downstream caches may choose to estimate how fresh they are using other headers, such as the Last-Modified header. For example, if your response was last modified a week ago, a cache might decide to consider the response fresh for a day.

Consider adding a Cache-Control header; otherwise, your content may be cached for longer or shorter than you'd like, or not at all.
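The estimate described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how MetaCDN or any particular cache computes it: the function name is hypothetical, and the 10% fraction is an assumption based on a commonly cited rule of thumb. Real caches choose their own fraction and usually cap the result.

```python
from email.utils import parsedate_to_datetime


def heuristic_lifetime(date_header: str, last_modified_header: str,
                       fraction: float = 0.1) -> float:
    """Estimate a freshness lifetime in seconds from Date and Last-Modified.

    The older the asset was when it was served, the longer a cache is
    willing to assume it will stay unchanged.
    """
    date = parsedate_to_datetime(date_header)
    last_modified = parsedate_to_datetime(last_modified_header)
    age_when_served = (date - last_modified).total_seconds()
    # Never return a negative lifetime if the headers are inconsistent.
    return max(0.0, age_when_served * fraction)


# Date and Last-Modified values from the example response earlier in this article.
seconds = heuristic_lifetime("Fri, 08 Feb 2013 03:32:00 GMT",
                             "Fri, 27 Jul 2012 01:23:24 GMT")
print(f"Estimated freshness lifetime: {seconds / 86400:.1f} days")
```

The point of the sketch is that the result depends entirely on the cache's own policy, which is why an explicit max-age is more predictable.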

What happens if I don't want my content cached?

Often you do not wish certain pages or files to be cached, for instance a logged in section of your site, or a sensitive file. In these instances, it is possible to explicitly tell any caches (and CDNs) that these files should not be cached under any circumstances via the Cache-Control header. A Cache-Control header such as the one below would be appropriate in this instance:

Cache-Control: private, no-cache, no-store, proxy-revalidate, no-transform

In this instance we are instructing any downstream cache that the content is private, so no shared cache (such as a proxy or CDN) can store it. The no-cache directive maintains freshness of the content by forcing any caches to submit the request to the origin server for validation before releasing a cached copy. The no-store directive instructs caches not to keep a copy of the content under any circumstances. The proxy-revalidate directive is the shared-cache counterpart of must-revalidate: must-revalidate informs caches that they must honor any freshness information you give them about the content and cannot serve stale content without first checking with the origin, while proxy-revalidate applies the same rule to proxy caches only. Finally, the no-transform directive tells intermediaries not to modify the content, for example by recompressing images.
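To make the effect of these directives concrete, the following Python sketch shows how a shared cache might decide whether it is allowed to store a response. It is a simplified illustration under stated assumptions: the function names are hypothetical, only the private, no-store and no-cache directives are considered, and a real cache applies many more rules.

```python
def directive_names(cache_control: str) -> set:
    """Return the lower-cased directive names in a Cache-Control value."""
    return {
        part.split("=", 1)[0].strip().lower()
        for part in cache_control.split(",")
        if part.strip()
    }


def shared_cache_may_store(cache_control: str) -> bool:
    """A shared cache must not store responses marked private or no-store."""
    return not ({"private", "no-store"} & directive_names(cache_control))


def must_revalidate_before_reuse(cache_control: str) -> bool:
    """no-cache means a stored copy must be validated with the origin first."""
    return "no-cache" in directive_names(cache_control)


sensitive = "private, no-cache, no-store, proxy-revalidate, no-transform"
print(shared_cache_may_store(sensitive))                  # False
print(shared_cache_may_store("public, max-age=86400"))    # True
print(must_revalidate_before_reuse(sensitive))            # True
```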

Help - I need to purge the cache!

Sometimes you may set a long max-age or Expires value, causing the CDN to cache your page or asset for a long period of time. If you need to make a change in such an instance, you can use the MetaCDN invalidation interface. This is available in our web interface and also via the Web Service API for Sites. However, this option should be considered a last resort. If your pages or assets are changing frequently, we recommend you set an appropriately small max-age value so the assets are regularly refreshed from your website origin.

invalidate.png

How do I set HTTP headers?

Now that I have convinced you of the importance of setting the correct HTTP headers on your website, you might be wondering how best to do this. The right approach depends on the architecture of your website. For static websites (or static assets on a dynamic website), the best place to enable these headers is at the web server level. If you are using a dynamic web language (such as PHP, JSP, Ruby or Python) to generate web pages, you may wish to set your HTTP headers in your page generation or web framework code.

Setting headers on your web server

If you are running Apache, the best place to implement this change is in your .htaccess file. If you are running Microsoft IIS, follow this link to find instructions on how to modify the header. An example .htaccess code snippet for Apache that sets the Cache-Control header is shown below. In this particular example, any file that matches specific extensions (in this case webfonts, images, stylesheets and scripts) is marked as publicly cacheable for 1 day (86400 seconds):

<FilesMatch "\.(eot|ttf|woff|jpg|png|css|js)$">
   Header set Cache-Control "public, max-age=86400"
</FilesMatch>

In the next .htaccess Apache example, we wish to ensure all dynamic pages (php, cgi and pl) are not cached:

<FilesMatch "\.(php|cgi|pl)$">
   Header set Cache-Control "private, no-cache, no-store, proxy-revalidate, no-transform"
</FilesMatch>
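If you want to check which of the two rules above a given file name would fall under before deploying them, the matching logic can be mirrored in a short Python sketch. This is only an approximation for experimenting with the patterns: the function name is hypothetical, and it does not reproduce everything Apache does when evaluating FilesMatch.

```python
import re

# The same extension patterns used in the two FilesMatch blocks above.
STATIC_ASSETS = re.compile(r"\.(eot|ttf|woff|jpg|png|css|js)$")
DYNAMIC_PAGES = re.compile(r"\.(php|cgi|pl)$")


def cache_control_for(filename: str):
    """Return the Cache-Control value the rules above would set, or None."""
    if STATIC_ASSETS.search(filename):
        return "public, max-age=86400"
    if DYNAMIC_PAGES.search(filename):
        return "private, no-cache, no-store, proxy-revalidate, no-transform"
    return None  # no rule matches, so no header is set by these blocks


for name in ("logo.png", "index.php", "report.pdf"):
    print(name, "->", cache_control_for(name))
```

Files that match neither pattern, such as report.pdf here, get no Cache-Control header from these rules and fall back to whatever the cache decides on its own.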

Setting headers on your dynamic web page

If your web application is using dynamic page generation, it is often possible to set the appropriate cache control header. The methods differ from language to language (and often within different web frameworks running on the same language). Below are some simple examples for common web programming languages. 

JSP (Java)

You can set the cache headers in your servlet code. A common approach is to create a Java Filter which is mapped to specific url-patterns that match the assets or pages whose cache control headers you wish to modify.

// Inside Filter.doFilter(request, response, chain):
HttpServletResponse hresp = (HttpServletResponse) response;

// Lifetime of 24 hours, in milliseconds (Expires is an absolute timestamp).
long expires = 24L * 60 * 60 * 1000;
// HTTP/1.0 fallback: this content will expire in 24 hours.
hresp.setDateHeader("Expires", System.currentTimeMillis() + expires);
// Set standard HTTP/1.1 cache headers (86400 seconds = 24 hours).
hresp.setHeader("Cache-Control", "public, max-age=86400");
// Add legacy Internet Explorer cache extensions (use addHeader so the header above is kept).
hresp.addHeader("Cache-Control", "post-check=86400, pre-check=86400");

PHP

You can set cache headers in your PHP code. The header() function must be called before any other output is sent for the code snippet below to work correctly.

<?php
   // Cache lifetime of 24 hours, in seconds.
   $age = 60 * 60 * 24;
   // Send a single Cache-Control header: by default a second header() call
   // with the same name replaces the first rather than adding to it.
   header("Cache-Control: public, max-age=$age, must-revalidate");
?>

PHP + CakePHP

When using PHP combined with the CakePHP framework, the cache control headers can be set as follows:

public function view() {
   ...
   // set the Cache-Control as public for 3600 seconds
   $this->response->sharable(true, 3600);
}

public function my_data() {
   ...
   // set the Cache-Control as private for 3600 seconds
   $this->response->sharable(false, 3600);
}

Further details can be found here.

Python

Coming soon
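In the meantime, here is a minimal sketch of the same idea using only Python's standard WSGI interface, so it is not tied to any particular framework. The application name and response body are placeholders chosen for this example; most Python web frameworks offer their own way of setting response headers, which you should prefer where available.

```python
def application(environ, start_response):
    """A tiny WSGI application that marks its response as cacheable."""
    body = b"Hello from the origin\n"
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
        # Publicly cacheable for 1 day (86400 seconds).
        ("Cache-Control", "public, max-age=86400"),
    ]
    start_response("200 OK", headers)
    return [body]
```

To try it locally, the application can be served with make_server from the standard library's wsgiref.simple_server module, and the response headers inspected with your browser's developer tools.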

Ruby

Coming soon

Further reading

The following websites are invaluable resources when it comes to understanding the behaviour of web servers and caches.

CACHING TUTORIAL for Web Authors and Webmasters - Mark Nottingham

Accept-Encoding, It’s Vary important. - Justin Dorfman (NetDNA)

Speed up your site with Caching and cache-control - Charles Torvalds

The Resource Expert Droid helps check HTTP resources for common problems and makes suggestions about how to improve them.
