Sunday, April 13, 2008

Adding a new primitive

I added two primitives to the Factor VM to allow setting and unsetting of environment variables. It's not that hard to do, but you have to edit several C files in the VM and a couple of .factor files in the core. Really, they should not be primitives, so eventually they will be moved into the core.

The primitives that I added are defined as follows:
IN: system
PRIMITIVE: set-os-env ( value key -- )
PRIMITIVE: unset-os-env ( key -- )

Adding a primitive to vm/

Since Factor's datatypes are not the same as C's datatypes, and because of the garbage collector, there are C functions for accessing and manipulating Factor objects. The data conversion functions are not documented yet, so here's a sampling of a few of them:
  • unbox_u16_string() - pop a Factor string off the datastack and return it as an F_CHAR*
  • from_u16_string() - convert a C string to a Factor object
  • REGISTER_C_STRING() - register a C string with Factor's garbage collector
  • UNREGISTER_C_STRING() - unregister a registered C string
  • dpush() - push an object onto the datastack
  • dpop() - pop an object off of the datastack
Registering a C string with the garbage collector is required when VM code calls code that may trigger a garbage collection (gc). Any call to Factor from the VM might trigger a gc, and if that happens the object could be moved, invalidating your C pointer. When a pointer is unregistered, it's popped from a gc stack with the corrected pointer value.
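To make the register/unregister idea concrete, here is a toy model of a root stack with a collector that moves objects. None of these names come from the Factor VM; the two fixed heaps, toy_register(), toy_unregister() and toy_gc() are all made up for illustration, and the real macros surely do more.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: none of these names exist in the Factor VM. */
#define MAX_ROOTS 16
#define HEAP_SIZE 64

static char heap_a[HEAP_SIZE];      /* space objects live in before a gc */
static char heap_b[HEAP_SIZE];      /* space objects are copied to */
static char *root_stack[MAX_ROOTS]; /* pointers the collector knows about */
static int root_top = 0;

/* Push a pointer so the collector can find it and update it. */
static void toy_register(char *ptr)
{
    assert(root_top < MAX_ROOTS);
    root_stack[root_top++] = ptr;
}

/* Pop the pointer back; it may differ from the value that was pushed. */
static char *toy_unregister(void)
{
    assert(root_top > 0);
    return root_stack[--root_top];
}

/* Pretend collection: copy everything to the other space, fix up the
   registered roots, and wipe the old space. */
static void toy_gc(void)
{
    memcpy(heap_b, heap_a, HEAP_SIZE);
    for (int i = 0; i < root_top; i++)
        root_stack[i] = heap_b + (root_stack[i] - heap_a);
    memset(heap_a, 0, HEAP_SIZE); /* old copies are now garbage */
}
```

A pointer that was registered before toy_gc() comes back from toy_unregister() pointing into the new space, while any unregistered copy of it still points at the wiped old space.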

Here is the implementation of set-os-env:
DEFINE_PRIMITIVE(set_os_env)
{
    /* key is on top of the stack, so it is unboxed first */
    F_CHAR *key = unbox_u16_string();
    REGISTER_C_STRING(key);
    /* unboxing value allocates and may trigger a gc */
    F_CHAR *value = unbox_u16_string();
    UNREGISTER_C_STRING(key);
    if(!SetEnvironmentVariable(key, value))
        general_error(ERROR_IO, tag_object(get_error_message()), F, NULL);
}
The function is defined with the DEFINE_PRIMITIVE macro, which takes only the function name. A corresponding DECLARE_PRIMITIVE goes in run.h as your function declaration. Not all primitives use these C preprocessor macros; bignums, for instance, don't, because it doesn't improve performance. Parameters to your primitive are popped or unboxed off the data stack rather than passed as C arguments, so the DEFINE_PRIMITIVE line expands to:
INLINE void primitive_set_os_env_impl(void);

F_FASTCALL void primitive_set_os_env(CELL word, F_STACK_FRAME *callstack_top) {
    save_callstack_top(callstack_top);
    primitive_set_os_env_impl();
}
INLINE void primitive_set_os_env_impl(void)
F_FASTCALL is a wrapper around FASTCALL, which on x86 will pass the first two arguments in registers as an optimization.  Note that while it declares that it takes no arguments (void), most primitives will do something to the data stack.
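A macro with this shape can be sketched in a few lines. This is a simplified stand-in, not the VM's actual definition: the CELL and F_STACK_FRAME types here are placeholders, F_FASTCALL and INLINE are left out, and count_call is a made-up primitive.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins; the real VM types and attributes differ. */
typedef unsigned long CELL;
typedef struct { int unused; } F_STACK_FRAME;

static F_STACK_FRAME *saved_top = NULL;

static void save_callstack_top(F_STACK_FRAME *top)
{
    assert(top != NULL);
    saved_top = top;
}

/* Sketch of the macro shape: forward-declare the body, emit the wrapper,
   then leave the body's header open so a { ... } block can follow. */
#define DEFINE_PRIMITIVE(name) \
    static void primitive_##name##_impl(void); \
    void primitive_##name(CELL word, F_STACK_FRAME *callstack_top) \
    { \
        (void)word; /* unused in this sketch */ \
        save_callstack_top(callstack_top); \
        primitive_##name##_impl(); \
    } \
    static void primitive_##name##_impl(void)

static int calls = 0;

/* A made-up primitive: the braces below become the body of
   primitive_count_call_impl(). */
DEFINE_PRIMITIVE(count_call)
{
    calls++;
}
```

Calling primitive_count_call(0, &frame) records the frame pointer and then runs the body, which is why the body itself can be written as if it took no arguments.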

Since unbox_u16_string() allocates memory, the second call to it could trigger a gc, so key is registered as a C string before value is unboxed. You can also register values using REGISTER_ROOT for cells, REGISTER_BIGNUM for bignums, and REGISTER_UNTAGGED for arrays, words, and other Factor object pointers for which the type is known. The key string can be unregistered immediately after unboxing the next stack value, since the rest of the function will not cause a gc. If the Win32 call fails, general_error() throws an exception. In this case, it's an ERROR_IO, and a helper function supplies the Windows error message.
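The code above is the Windows side only. As a rough sketch of the same check-and-report pattern on a POSIX system, assuming the work would map onto the standard setenv() and unsetenv() calls, something like the following could sit underneath. The sketch_ names are hypothetical and there is no Factor stack handling here; the functions just return an errno value that a caller could turn into an I/O error.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helpers, not VM code: return 0 on success, or the errno
   value reported by the C library on failure. */
static int sketch_set_os_env(const char *key, const char *value)
{
    assert(key != NULL && value != NULL);
    /* The third argument, 1, overwrites an existing variable. */
    if (setenv(key, value, 1) != 0)
        return errno;
    return 0;
}

static int sketch_unset_os_env(const char *key)
{
    assert(key != NULL);
    if (unsetenv(key) != 0)
        return errno;
    return 0;
}
```

Note the argument order: the Factor word takes ( value key -- ), while both SetEnvironmentVariable and setenv() take the key first.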

Now that the function is written, you have to add it to the list of primitives in primitives.c. The important thing is that this list remains in the same order as the list in core/, which you will edit in the next section. Also, add the DECLARE_PRIMITIVE prototype to run.h.
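Why the order matters can be shown with a toy pair of tables. This assumes a primitive is identified purely by its position in the list, which is how I read the ordering requirement; the table names and the call_primitive() helper are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*primitive_fn)(void);

static int last_called = -1;
static void toy_set_os_env(void)   { last_called = 0; }
static void toy_unset_os_env(void) { last_called = 1; }

/* VM side: a primitive is identified only by its position here. */
static const primitive_fn vm_primitives[] = {
    toy_set_os_env,
    toy_unset_os_env,
};

/* Core side: names listed in what must be the same order. */
static const char *core_primitives[] = {
    "set-os-env",
    "unset-os-env",
};

#define PRIMITIVE_COUNT (sizeof(vm_primitives) / sizeof(vm_primitives[0]))

/* Look up a name on the core side and call the VM slot with that index.
   Returns the index used, or -1 for an unknown name. */
static int call_primitive(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < PRIMITIVE_COUNT; i++) {
        if (strcmp(core_primitives[i], name) == 0) {
            vm_primitives[i]();
            return (int)i;
        }
    }
    return -1;
}
```

If the two entries in core_primitives were swapped, asking for "set-os-env" would silently run the unset function, which is the kind of bug a mismatched order would cause.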

Adding a primitive to core/

Everything in Factor compiles down to primitives.  Because they are by definition "primitive", the compiler cannot infer the stack effect and argument types. To make a primitive's stack effect "known", edit core/inference/known-words/known-words.factor:
\ set-os-env { string string } { } <effect> set-primitive-effect
The next step is to put your word in the file core/bootstrap/primitives.factor in the same order as in vm/primitives.c.

Sometime in the future there might be a PRIMITIVE: word that will reduce the number of different places to edit to add a primitive. If it used Factor's FFI, you could add a new primitive without even having to bootstrap again.
