Monday, April 28, 2008

Word renaming (part 2)

Here are the word names that have changed:
  • find* -> find-from
  • find-last* -> find-last-from
  • index* -> index-from
  • last-index* -> last-index-from
  • subset -> filter
New shorthand words:
  • 1 tail -> rest
  • 1 tail-slice -> rest-slice
  • swap compose -> prepose
Changes to existing word behavior:
  • reverse stack effect of assoc-diff, diff
  • before? after? before=? after=? are now generic
  • min, max can compare more objects than before, such as timestamps
  • between? can compare more objects
  • <=> returns symbols
There are several motivations at work here. One is that words named foo* are a variant of foo, but otherwise the * is no help to what the word actually does differently. We're trying to move away from word names with stars to something more meaningful.

Code with common patterns that come up a lot, like 1 tail and swap compose, is more clearly understood if these patterns are given a single name.

Factor's subset is not equivalent to the mathematical definition of subset, so it was renamed to filter to avoid confusion. Along these same lines, diff and assoc-diff are now the more mathematically intuitive; you can think of diff like set subtraction now, seq1 seq2 diff is like seq1 - seq2.

Finally, the "UFO operator" <=> now returns symbols +lt+ +eq+ +gt+ instead of negative, zero, and positive numbers. The before? and after? words can compare anything that defines a method on this operator. Since between?, min, and max
are defined in terms of these comparison words, they also work on more objects.

Please let me know if you have any more suggestions for things words that have awkward argument orders, imprecise names, or if you can suggest alternate names for words with stars in their names.

Monday, April 14, 2008

Word renaming

Several words have been renamed and moved around to make Factor more consistent:
  • new -> new-sequence
  • construct-empty -> new
  • construct-boa -> boa
  • diff -> assoc-diff
  • union -> assoc-union
  • intersect -> assoc-intersect
  • seq-diff -> diff
  • seq-intersect -> intersect
To make things symmetrical, a new word union operates on sequences.

Somehow, seq-diff and seq-intersect were implemented as O(n^2) algorithms. Now, they use hashtables and are O(n).

Lastly, a new vocabulary named ``sets'' contains the set theoretic words, along with a new word unique that converts a sequence to a hash table whose keys and values are the same. An efficient union and intersect are implemented in terms of this word.

Sunday, April 13, 2008

Adding a new primitive

I added two primitives to the Factor VM to allow setting and unsetting of environment variables. It's not that hard to do, but you have to edit several C files in the VM and a couple .factor files in the core.  Really they should not be primitives, so eventually they will be moved into the core.

The primitives that I added are defined as follows:
IN: system
PRIMITIVE: set-os-env ( value key -- )
PRIMITIVE: unset-os-env ( key -- )

Adding a primitive to vm/

Since Factor's datatypes are not the same as C's datatypes, and because of the garbage collector, there are C functions for accessing and manipulating Factor objects. The data conversion functions are not documented yet, so here's a sampling of a few of them:
  • unbox_u16_string() - pop a Factor string off the datastack and return it as a F_CHAR*
  • from_u16_string() - convert a C string to a Factor object
  • REGISTER_C_STRING() - register a C string with Factor's garbage collector
  • UNREGISTER_C_STRING() - unregister a registered C string
  • dpush() - push an object onto the datastack
  • dpop() - pop an object off of the datastack
Registering a C string with the garbage collector is required when VM code calls code that may trigger a garbage collection (gc).  Any call to Factor from the VM might trigger a gc, and if that happened the object could be moved, thus invalidating your C pointer.  When a pointer is unregistered, it's popped from a gc stack with the corrected pointer value.

Here is the call to set-os-env:
DEFINE_PRIMITIVE(set_os_env)
{
F_CHAR *key = unbox_u16_string();
REGISTER_C_STRING(key);
F_CHAR *value = unbox_u16_string();
UNREGISTER_C_STRING(key);
if(!SetEnvironmentVariable(key, value))
general_error(ERROR_IO, tag_object(get_error_message()), F, NULL);
}
The function is defined with a macro DEFINE_PRIMITIVE that takes only the function name.  A corresponding DECLARE_PRIMITIVE goes in run.h as your function declaration.  Not all primitives use these C preprocessor macros, for instance bignums don't because it doesn't improve the performance.  Parameters to your primitive are popped or unboxed off the data stack, so a primitive's declaration expands to:
F_FASTCALL primitive_set_os_env_impl(void);

F_FASTCALL void primitive_set_os_env(CELL word, F_STACK_FRAME *callstack_top) {
save_callstack_top(callstack_top);
primitive_set_os_env_impl();
}
INLINE void primitive_set_os_env_impl(void)
F_FASTCALL is a wrapper around FASTCALL, which on x86 will pass the first two arguments in registers as an optimization.  Note that while it declares that it takes no arguments (void), most primitives will do something to the data stack.

Since unbox_u16_string() allocates memory for the Factor object, it could trigger a gc, so it's registered as a string.  You can also register values using REGISTER_ROOT for cells, REGISTER_BIGNUM for bignums, and REGISTER_UNTAGGED for arrays, words, and other Factor object pointers for which the type is known.  The key string can immediately be unregistered after calling unbox on the next stack value since the rest of the function will not cause a gc.  If the win32 call fails, there's a function general_error() that throws an exception.  In this case, it's an ERROR_IO that calls a helper function to return the Windows error message.

Now that the function is written, you have to add it to the list of primitives in primitives.c.  The important thing is that this list remains in the same order as the list in core/ which you will edit in the next section.  Also, add a prototype to the run.h file.

Adding a primitive to core/

Everything in Factor compiles down to primitives.  Because they are by definition "primitive", the compiler cannot infer the stack effect and argument types. To make a primitive's stack effect "known", edit core/inference/known-words/known-words.factor:
\ set-os-env { string string } { } <effect> set-primitive-effect
The next step is to put your word in the file core/bootstrap/primitives.factor in the same order as in vm/primitives.c.

Sometime in the future there might be a PRIMITIVE: word that will reduce the number of different places to edit to add a primitive. If it used Factor's FFI, you could add a new primitive without even having to bootstrap again.