digital-scurf wiki Aranha


A lot of this page comes as a direct cut and paste from [The Brave New World Document] but this wiki page is now definitive.


Repository info


Sub pages of interest


Process Structure

                       Controlling Aranha
                 /--------------^---------------\
                 |      |        |      |       |
              Worker  Worker  Worker  Worker  Cache

Some bits:

Because the parent handles all LISTEN operations and hands out work to the workers, it knows each time a worker uses up a state.

The socketpair between each worker and the parent serves two purposes. The parent uses it to pass FDs for handling requests, and the child uses it to request services of the parent. Those services include getting requests to the cache, responses from the cache and, at times, shared memory handles or IPC segments as appropriate.
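Passing an FD across a socketpair is normally done with an SCM_RIGHTS control message on a UNIX-domain socket. The following is a minimal sketch of that technique, not the code Aranha actually uses; the helper names send_fd and recv_fd are hypothetical, and the single marker byte is only there because some payload has to accompany the control message.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer big enough for exactly one int (the FD), suitably aligned. */
typedef union {
    char           buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
} fd_ctrl;

/* Hypothetical helper: send one FD over a UNIX-domain socket.
 * Returns 0 on success, -1 on error. */
static int send_fd(int sock, int fd)
{
    char marker = 'F';   /* one byte of ordinary data carries the control message */
    struct iovec iov = { .iov_base = &marker, .iov_len = 1 };
    fd_ctrl ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&ctrl, 0, sizeof ctrl);
    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof ctrl.buf;

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one FD sent by send_fd().
 * Returns the new FD, or -1 on error. */
static int recv_fd(int sock)
{
    char marker;
    struct iovec iov = { .iov_base = &marker, .iov_len = 1 };
    fd_ctrl ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd = -1;

    memset(&ctrl, 0, sizeof ctrl);
    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof ctrl.buf;

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;

    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS)
        return -1;

    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

In this sketch the parent would call send_fd() on its end of the socketpair after accepting a connection, and the worker would call recv_fd() on the other end. The received FD is a new descriptor in the worker that refers to the same open file.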


The Code Cache

Each worker has its own socketpair with the cache process. It is plausible that in the future this may be a set of caches which have an inter-cache protocol, but for now, this is not the case.

The cache structure looks as follows (pseudo C):

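typedef struct {
	uint32 op;              /* operation code, optionally or'ed with flags */
	uint32 keylen;          /* length of key in bytes */
	uint32 datalen;         /* length of data in bytes */
	char   key[keylen];
	char   data[datalen];
} cacheop;

typedef struct {
	uint32 result;          /* result code, optionally or'ed with flags */
	uint32 datalen;         /* length of data in bytes */
	char   data[datalen];
} cacheresponse;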
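Since real C cannot declare arrays sized by earlier members, one way to put a cacheop on the wire is a fixed header followed by the key bytes and then the data bytes. The sketch below assumes that layout, host byte order (both ends are on the same machine) and uint32_t for uint32. The names cacheop_header and cacheop_pack are hypothetical and do not come from the Aranha source.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Fixed-size part of a request; the key and data bytes follow it directly. */
typedef struct {
    uint32_t op;
    uint32_t keylen;
    uint32_t datalen;
} cacheop_header;

/* Hypothetical helper: pack a request into a freshly malloc()ed buffer.
 * On success returns the buffer and stores its size in *outlen;
 * returns NULL if allocation fails. The caller frees the buffer. */
static unsigned char *cacheop_pack(uint32_t op,
                                   const void *key, uint32_t keylen,
                                   const void *data, uint32_t datalen,
                                   size_t *outlen)
{
    cacheop_header hdr = { op, keylen, datalen };
    size_t total = sizeof hdr + (size_t)keylen + (size_t)datalen;
    unsigned char *buf = malloc(total);

    if (buf == NULL)
        return NULL;

    memcpy(buf, &hdr, sizeof hdr);
    if (keylen > 0)
        memcpy(buf + sizeof hdr, key, keylen);
    if (datalen > 0)
        memcpy(buf + sizeof hdr + keylen, data, datalen);

    *outlen = total;
    return buf;
}
```

The resulting buffer could then be written to the worker's socketpair in one call. A cacheresponse could be unpacked the same way, reading the two-word header first and then datalen bytes.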

The protocol looks a bit like:

Worker                          Direction     Cache

GET (key)                       ------->      Cache looks up the key and then
                                              responds once that entry in the
                                              cache can be locked.

If no data was found, the       <--------     LOCKED (data if available)
worker has the opportunity
to fill this entry in the
cache.

WRITE (data)                    ---------->   Cache writes the data given to the
                                              currently locked entry in the cache
                                              and unlocks it.
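The GET / LOCKED / WRITE exchange can be modelled as a small state machine on a single cache entry. This is only an illustrative sketch: the types and function names are hypothetical, the fixed ENTRY_MAX capacity is arbitrary, and where the real cache would hold its reply until the entry can be locked, this single-threaded model just reports that the caller would have to wait.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define ENTRY_MAX 64            /* arbitrary capacity for this sketch */

typedef struct {
    bool   locked;              /* true between a GET and the matching WRITE */
    size_t len;                 /* 0 means the entry has no data yet */
    char   data[ENTRY_MAX];
} cache_entry;

typedef enum {
    GET_WOULD_WAIT,             /* entry is locked by someone else */
    GET_LOCKED_EMPTY,           /* locked; the worker may fill the entry */
    GET_LOCKED_DATA             /* locked; existing data is available */
} get_outcome;

/* GET: lock the entry if it is free and report whether data is available. */
static get_outcome entry_get(cache_entry *e)
{
    if (e->locked)
        return GET_WOULD_WAIT;
    e->locked = true;
    return e->len > 0 ? GET_LOCKED_DATA : GET_LOCKED_EMPTY;
}

/* WRITE: store data in the currently locked entry, then unlock it.
 * Returns false if the entry is not locked or the data does not fit. */
static bool entry_write(cache_entry *e, const void *data, size_t len)
{
    if (!e->locked || len > ENTRY_MAX)
        return false;
    memcpy(e->data, data, len);
    e->len = len;
    e->locked = false;
    return true;
}
```

A worker that gets GET_LOCKED_EMPTY would generate the value and call entry_write(), which releases the lock so the next GET sees the data.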

Also note that the worker can optionally provide a flag in the GET() which says that if the entry does not exist, it does not want a lock, and that if the entry does exist, it wants the data but no lock.

In those circumstances the result codes of NO_DATA and OK_NO_LOCK can be returned.

Other operations include querying the state of the cache with certain special keys. Those keys can never be locked, and a request for one that does not set the NO_LOCK flag in its op will get a BAD_REQUEST response from the cache.

Since the key namespace of the cache is hierarchical, with / as the separator, it is also possible to query the contents of the cache.

The following namespaces are defined to exist...

Namespace   Contents

/           Files in the filesystem as they exist on the filesystem (these will be refreshed by stat() calls as well as expired by general LRU etc.)

*/          Files in the filesystem whose contents are processed (these will only be expired by stat() calls, not refreshed)

!           This namespace is reserved for names internal to the cache protocol and should not be queried directly

Any name not conforming to those names is considered a general cache entry and is subject only to LRU and memory usage expiry rules as defined in the caching policy for the aranha process tree.
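As an illustration of the namespace rules above, a key can be classified by its leading characters. This is a sketch only: the enum and function names are hypothetical, and it assumes the key is available as a NUL-terminated string, although on the wire keys are length-delimited.

```c
/* The three defined namespaces plus the catch-all for everything else. */
typedef enum {
    KEY_NS_FILE_RAW,        /* "/..."  files as they exist on the filesystem */
    KEY_NS_FILE_PROCESSED,  /* "*"+"/..." files whose contents are processed */
    KEY_NS_INTERNAL,        /* "!..."  names internal to the cache protocol */
    KEY_NS_GENERAL          /* anything else: general cache entry */
} key_namespace;

/* Hypothetical helper: decide which namespace a NUL-terminated key is in. */
static key_namespace classify_key(const char *key)
{
    if (key[0] == '/')
        return KEY_NS_FILE_RAW;
    if (key[0] == '*' && key[1] == '/')
        return KEY_NS_FILE_PROCESSED;
    if (key[0] == '!')
        return KEY_NS_INTERNAL;
    return KEY_NS_GENERAL;
}
```

Because only the first one or two characters are inspected, a general key may safely contain / later on.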

A suggested structure for those remaining names is:

organisational-unit.module.[subname.]object

E.g.

aranha.pagecache./foo/bar/wibble.lua

Entries with the 'aranha.' prefix are considered as reserved for the aranha process and for official modules. It is suggested that the Grunt Corporation might use the prefix 'org.grunt.'

The internal cache namespace contains the following pseudo-entries...

Pseudo Entry   Meaning

!usage         This returns a Lua table (source string) containing information about the cache usage level. E.g. How full it is, how much RAM it is using, how many processes are using it, how many entries it has, etc.

!contents      This returns the keys of all of the values known to the cache. This is an expensive request since the entire cache structure will be walked in order to answer your request. Note that newly created entries not yet given values will be listed, but will have a datalength of zero.

Values which can go in a cacheop.op:

CACHEOP_GET                 0x00000001
CACHEOP_WRITE               0x00000002
CACHEOP_DELETE              0x00000003
CACHEOP_NOCREATE            0x01000000
CACHEOP_NOLOCK              0x02000000
CACHEOP_NOWAIT              0x04000000
CACHEOP_NODATA              0x08000000

The last four are flags which can be or'ed in for appropriate effect.

Values available in a cacheresponse.result:

CACHERESULT_OK              0x00000000
CACHERESULT_BADREQUEST      0x00000001
CACHERESULT_WOULDBLOCK      0x00000002
CACHERESULT_NODATA          0x01000000

Again, CACHERESULT_NODATA is a flag which might be or'ed in.

As you can see, in both cases, the top 8 bits of the op or result are flags and should be masked off in order to check the lower bits.
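A minimal sketch of that masking, using the values listed above. The two mask names and the helper functions are hypothetical and are not part of the protocol definition.

```c
#include <stdint.h>

/* Operation codes and flags, as listed above. */
#define CACHEOP_GET            0x00000001u
#define CACHEOP_WRITE          0x00000002u
#define CACHEOP_DELETE         0x00000003u
#define CACHEOP_NOCREATE       0x01000000u
#define CACHEOP_NOLOCK         0x02000000u
#define CACHEOP_NOWAIT         0x04000000u
#define CACHEOP_NODATA         0x08000000u

/* Result codes and flag, as listed above. */
#define CACHERESULT_OK         0x00000000u
#define CACHERESULT_BADREQUEST 0x00000001u
#define CACHERESULT_WOULDBLOCK 0x00000002u
#define CACHERESULT_NODATA     0x01000000u

/* Hypothetical names: the top 8 bits are flags, the low 24 bits the code. */
#define CACHE_FLAG_MASK        0xFF000000u
#define CACHE_CODE_MASK        0x00FFFFFFu

/* Strip the flags and return just the operation or result code. */
static uint32_t cache_code(uint32_t value)
{
    return value & CACHE_CODE_MASK;
}

/* Return just the flag bits of an op or result. */
static uint32_t cache_flags(uint32_t value)
{
    return value & CACHE_FLAG_MASK;
}
```

For example, a worker could build an op as CACHEOP_GET | CACHEOP_NOLOCK, and the cache would compare cache_code(op) against CACHEOP_GET before testing individual bits of cache_flags(op).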


The Module Loader

Currently the loader is very manky indeed. The intention is to smarten it up, clean it out, rewrite the C-side implementation and make the general file structure more sensible.

In order to support the idea that there are three ways a module can be implemented, the following on-disk structures are assumed...

For each entry in LOADER_PATH:

If a <foo>.am file is present for a given UseModule "<foo>" and the corresponding <foo>.so is not, then the <foo>.am file is loaded as a pure Lua file.

If no <foo>.am file is present but the <foo>.so file is, the following pseudo-code is executed on behalf of the non-present .am:

   if dlopen("<foo>.so") == okay then
      if dlsym("__aranha_import") then
         if <foo>.__aranha_import == okay then
            -- Module is loaded
         else
            error("Unable to run __aranha_import in <foo>.so")
         end
      else
         error("Unable to find __aranha_import in <foo>.so")
      end
   else
      error("Unable to open <foo>.so: " .. dlerror())
   end

If both the <foo>.am and <foo>.so files are present, the <foo>.am file is parsed, then <foo>.so is loaded, passing in the version information found in the <foo>.am file. Following that, an IMPORT function in the <foo>.am file will be called to 'fix up' the Lua side of things before returning from the UseModule call.
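The three cases above boil down to a decision on which of the two files exist in a given LOADER_PATH entry. The sketch below shows that decision only, not the loading itself. All names are hypothetical, and it assumes that the search moves on to the next entry when neither file is found and stops at the first entry where either exists, which the description above does not state.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

typedef enum {
    LOAD_NOT_FOUND,        /* neither file present */
    LOAD_PURE_LUA,         /* only <foo>.am: load it as a pure Lua file */
    LOAD_NATIVE_ONLY,      /* only <foo>.so: dlopen it, run __aranha_import */
    LOAD_LUA_WITH_NATIVE   /* both: parse .am, load .so, then call IMPORT */
} load_plan;

/* Pick the loading strategy from which files are present. */
static load_plan choose_load_plan(bool has_am, bool has_so)
{
    if (has_am && has_so)
        return LOAD_LUA_WITH_NATIVE;
    if (has_am)
        return LOAD_PURE_LUA;
    if (has_so)
        return LOAD_NATIVE_ONLY;
    return LOAD_NOT_FOUND;
}

/* True if dir/name+ext exists; false if it does not or the path is too long. */
static bool module_file_exists(const char *dir, const char *name, const char *ext)
{
    char path[4096];
    int n = snprintf(path, sizeof path, "%s/%s%s", dir, name, ext);

    if (n < 0 || (size_t)n >= sizeof path)
        return false;
    return access(path, F_OK) == 0;
}

/* Walk the loader path (an array of directories) and return the plan for
 * the first entry that contains either file. Assumption: first match wins. */
static load_plan probe_module(const char *const *loader_path, size_t count,
                              const char *name)
{
    for (size_t i = 0; i < count; i++) {
        load_plan plan = choose_load_plan(
            module_file_exists(loader_path[i], name, ".am"),
            module_file_exists(loader_path[i], name, ".so"));
        if (plan != LOAD_NOT_FOUND)
            return plan;
    }
    return LOAD_NOT_FOUND;
}
```

Keeping choose_load_plan() free of filesystem access means the three documented cases can be checked without creating any files.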


Other notes

Built into Aranha: