Outline of functionality
The basic idea is that all generated content would be cached, so that it does not have to be generated again until it changes. As far as we know, no web development platform offers anything like this out of the box at the moment. Aranha should provide a module by default to achieve this functionality.
Why would you want to do this?
- To avoid the Slashdot effect - dynamic pages should "feel" almost as fast as static pages if unchanged
- To reduce CPU load - there's no good reason to regenerate pages every time if they haven't changed
- To utilise the correct HTTP headers to make the content more cacheable (Cache-Control, Last-Modified, etc.)
Implementation
Pages must generally be loaded before it can be decided whether they need to be regenerated. Ideally the process should be as transparent as possible, so that programmers can use it without having to think too much, while still allowing them fine-grained control when needed.
Interface
The interface would only need one function, which allows the current script to give information about itself to the caching system. I assume this function will be called cache.setinfo(). When called, it determines how to cache the content, depending on its parameters:
- If the page has never been cached before, the caching system diverts the content and saves it, along with the information passed to cache.setinfo() and any HTTP headers the script has generated.
- If the page has been cached before, the cache checks whether the cached version has the same details as those passed by the script.
  - If it has, the cache system terminates the current script and responds with the cached version.
  - If it hasn't, the page is treated as if it had never been cached before, and is saved along with its headers.
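The decision steps above can be modelled roughly as follows. This is only a sketch, written in Python for brevity rather than as an Aranha script. PageCache, CacheEntry, store() and serve() are hypothetical names. A real cache.setinfo() would terminate the script on a hit, whereas this model simply returns the cached entry.

```python
from dataclasses import dataclass, field


@dataclass
class CacheEntry:
    info: dict      # whatever the script passed to setinfo()
    headers: dict   # HTTP headers the script had generated
    body: str       # the diverted page content


@dataclass
class PageCache:
    entries: dict = field(default_factory=dict)  # page identifier -> CacheEntry

    def setinfo(self, page_id, info):
        """Return the cached entry on a hit, or None if the page must be generated."""
        entry = self.entries.get(page_id)
        if entry is not None and entry.info == info:
            return entry   # hit: the caller stops the script and serves this
        return None        # never cached, or details changed: regenerate

    def store(self, page_id, info, headers, body):
        """Save freshly generated content together with its info and headers."""
        self.entries[page_id] = CacheEntry(info, headers, body)


def serve(cache, page_id, info, generate):
    """Hypothetical driver: `generate` returns (headers, body) for the page."""
    hit = cache.setinfo(page_id, info)
    if hit is not None:
        return hit.headers, hit.body
    headers, body = generate()
    cache.store(page_id, info, headers, body)
    return headers, body
```

Calling serve() twice with the same info runs generate only once. Changing the info (for example a newer last modified date) causes the page to be regenerated and the stored entry to be replaced.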
For fairly static pages, all that would need to be passed to cache.setinfo() is the last modified date.
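For a page backed by a single file, the last modified date could be taken from the file system and reused for the Last-Modified header. The following Python sketch assumes exactly that. static_page_info() and the dictionary keys are hypothetical and not part of any Aranha API.

```python
import os
from email.utils import formatdate


def static_page_info(path):
    """Collect the only detail a fairly static page needs to report."""
    mtime = int(os.path.getmtime(path))  # whole seconds, as HTTP dates have no fractions
    return {
        "last_modified": mtime,
        # RFC-style HTTP date, suitable for a Last-Modified header
        "http_date": formatdate(mtime, usegmt=True),
    }
```

The returned table is what such a page would hand to cache.setinfo(). As long as the file is untouched, the value stays the same and the cached copy keeps being served.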
For dynamic pages (i.e. ones which take parameters), a call to cache.setinfo() would also contain a table of any parameters which change from one page load to the next; together these should form a unique identifier for that page. This allows the cache to tell apart pages with different parameters, such as /page.lua?q=1 and /page.lua?q=2.
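One way to turn the script path and its parameter table into a unique identifier is to hash a canonical form of both. This is a minimal Python sketch under that assumption. cache_key() is a hypothetical helper, and the choice of SHA-256 is illustrative only.

```python
import hashlib
from urllib.parse import urlencode


def cache_key(script_path, params=None):
    """Build a stable identifier from the script path and its varying parameters."""
    # Sort the parameters so that their order never changes the key.
    query = urlencode(sorted((params or {}).items()))
    raw = script_path + "?" + query
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()
```

With this scheme /page.lua?q=1 and /page.lua?q=2 map to different keys, while the same parameters supplied in a different order map to the same key.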
You should also be able to disable the cache on a page-by-page basis.
Concerns
Because the cache is shared amongst the worker processes of an Aranha instance, it should be able to deal with conflicts, i.e. multiple subprocesses regenerating the same page at the same time.
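If the cache were stored as files on disk (an assumption; the storage backend is not decided here), two common techniques would cover most conflicts: publishing a finished page with an atomic rename, and using an exclusively created lock file so that only one worker regenerates a given page. A Python sketch follows, with publish(), try_claim() and release() as hypothetical names.

```python
import os
import tempfile


def publish(cache_dir, key, body):
    """Write a cache file so readers see the old or the new version, never a partial one."""
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir, prefix=key + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(body)
        # Renaming within one directory replaces the target in a single step.
        os.replace(tmp_path, os.path.join(cache_dir, key))
    except BaseException:
        os.unlink(tmp_path)  # do not leave half-written temporary files behind
        raise


def try_claim(cache_dir, key):
    """Return True if this worker won the right to regenerate `key`."""
    lock_path = os.path.join(cache_dir, key + ".lock")
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # another worker is already regenerating this page
    os.close(fd)
    return True


def release(cache_dir, key):
    """Give up the claim taken with try_claim()."""
    os.unlink(os.path.join(cache_dir, key + ".lock"))
```

A worker that fails to claim a page could serve the previous cached copy or wait for the new one. This sketch does not handle stale locks left behind by a crashed worker, which a real implementation would need to address.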
The cache module would have to sit very early in the module chain (i.e. before the templating module, etc.). This lets it take advantage of the anticipated module short-circuiting system, which would allow a module to roll back divert stack changes to a state consistent with its prepage hook and then jump directly to its postpage hook.
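Since the short-circuiting system is not yet specified, the following Python model is purely illustrative of why ordering matters. It assumes that each module pushes one entry onto the divert stack in its prepage hook, and that a short circuit is signalled with an exception. Module, ShortCircuit and run() are all hypothetical.

```python
class ShortCircuit(Exception):
    """Raised on behalf of a module that wants to skip to its postpage hook."""

    def __init__(self, module):
        super().__init__(module.name)
        self.module = module


class Module:
    def __init__(self, name):
        self.name = name
        self.depth = 0  # divert-stack depth recorded when prepage starts

    def prepage(self, stack):
        self.depth = len(stack)
        stack.append(self.name)  # stand-in for pushing an output divert

    def postpage(self, stack, log):
        log.append(self.name)    # record that the hook ran


def run(modules, page, log):
    """Run prepage hooks in order, then the page, then postpage hooks in reverse."""
    stack = []
    for m in modules:
        m.prepage(stack)
    try:
        page(stack)
        remaining = list(reversed(modules))
    except ShortCircuit as sc:
        idx = modules.index(sc.module)
        # Roll back to the state just after the module's own prepage hook...
        del stack[sc.module.depth + 1:]
        # ...and skip the postpage hooks of every module later in the chain.
        remaining = list(reversed(modules[:idx + 1]))
    for m in remaining:
        m.postpage(stack, log)
    return stack
```

In this model, a cache module placed before the templating module can discard the templating module's diverts on a cache hit, and only the cache module's own postpage hook runs. If it were placed later, the earlier modules' work could not be skipped.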
Comments
Please leave comments here so they can be addressed and factored into the main content.