Sunday, October 19, 2008

Moving primitives out of the Factor VM: Factor vs C

In a previous lifetime, I added environment variable primitives to the VM. Well, it turns out that a better place for them is in the Factor basis vocabulary root, so this post is about moving them again.

To move a primitive out of the VM, implement its functionality in Factor code and replace usages with your word if necessary, remove it from vm/primitives.c and core/bootstrap/primitives.factor, remove the primitive code from the VM, make a new image, recompile Factor, and bootstrap. Basically, do the inverse of the previous post.

What's more interesting, unless you have to actually remove a primitive someday and need this reference, is comparing the code for each version side-by-side. The C code has to call functions to prevent the garbage collector from moving data around when it shouldn't, but it's written without the usual ifdefs you will find in most cross-platform C code, so overall it's fairly clean. The Factor version has a high-level protocol that is implemented by both backends across separate files, with one-liners for most of the Unix definitions and high-level combinators for the Windows ones. I find the Factor version much easier to understand and I believe it's more maintainable. Factor is a better C than C.

High-level environment variable interface

Factor's high-level environment variable words let you get a single variable or all of them, set a single variable or all of them, and unset a variable. On Windows you cannot set all of the variables at once, and on Windows CE the whole concept of environment variables does not exist.

Here is the code for the main vocabulary. Notice that there are hooks on the os word, which will be a value like macosx or winnt or linux. The boilerplate at the bottom is for loading the platform-specific code.

USING: assocs combinators kernel sequences splitting system
vocabs.loader ;
IN: environment

HOOK: os-env os ( key -- value )

HOOK: set-os-env os ( value key -- )

HOOK: unset-os-env os ( key -- )

HOOK: (os-envs) os ( -- seq )

HOOK: (set-os-envs) os ( seq -- )

: os-envs ( -- assoc )
(os-envs) [ "=" split1 ] H{ } map>assoc ;

: set-os-envs ( assoc -- )
[ "=" swap 3append ] { } assoc>map (set-os-envs) ;

{
{ [ os unix? ] [ "environment.unix" require ] }
{ [ os winnt? ] [ "environment.winnt" require ] }
{ [ os wince? ] [ ] }
} cond

Unix environment variables, before and after

DEFINE_PRIMITIVE(os_env)
{
char *name = unbox_char_string();
char *value = getenv(name);
if(value == NULL)
dpush(F);
else
box_char_string(value);
}

DEFINE_PRIMITIVE(os_envs)
{
GROWABLE_ARRAY(result);
REGISTER_ROOT(result);
char **env = environ;

while(*env)
{
CELL string = tag_object(from_char_string(*env));
GROWABLE_ARRAY_ADD(result,string);
env++;
}

UNREGISTER_ROOT(result);
GROWABLE_ARRAY_TRIM(result);
dpush(result);
}

DEFINE_PRIMITIVE(set_os_env)
{
char *key = unbox_char_string();
REGISTER_C_STRING(key);
char *value = unbox_char_string();
UNREGISTER_C_STRING(key);
setenv(key, value, 1);
}

DEFINE_PRIMITIVE(unset_os_env)
{
char *key = unbox_char_string();
unsetenv(key);
}

DEFINE_PRIMITIVE(set_os_envs)
{
F_ARRAY *array = untag_array(dpop());
CELL size = array_capacity(array);

/* Memory leak */
char **env = calloc(size + 1,sizeof(CELL));

CELL i;
for(i = 0; i < size; i++)
{
F_STRING *string = untag_string(array_nth(array,i));
CELL length = to_fixnum(string->length);

char *chars = malloc(length + 1);
char_string_to_memory(string,chars);
chars[length] = '\0';
env[i] = chars;
}

environ = env;
}
Factor
USING: alien alien.c-types alien.strings alien.syntax kernel
layouts sequences system unix environment io.encodings.utf8
unix.utilities vocabs.loader combinators alien.accessors ;
IN: environment.unix

HOOK: environ os ( -- void* )

M: unix environ ( -- void* ) "environ" f dlsym ;

M: unix os-env ( key -- value ) getenv ;

M: unix set-os-env ( value key -- ) swap 1 setenv io-error ;

M: unix unset-os-env ( key -- ) unsetenv io-error ;

M: unix (os-envs) ( -- seq )
environ *void* utf8 alien>strings ;

: set-void* ( value alien -- ) 0 set-alien-cell ;

M: unix (set-os-envs) ( seq -- )
utf8 strings>alien malloc-byte-array environ set-void* ;

os {
{ macosx [ "environment.unix.macosx" require ] }
[ drop ]
} case

MacOSX environment variables, before and after

On OSX, we have to use a function to access the environment variable.

#ifndef environ
extern char ***_NSGetEnviron(void);
#define environ (*_NSGetEnviron())
#endif
Factor
USING: alien.syntax system environment.unix ;
IN: environment.unix.macosx

FUNCTION: void* _NSGetEnviron ( ) ;

M: macosx environ _NSGetEnviron ;

Windows NT environment variables, before and after

Draw your own conclusions.

DEFINE_PRIMITIVE(os_env) 
{
F_CHAR *key = unbox_u16_string();
F_CHAR *value = safe_malloc(MAX_UNICODE_PATH * 2);
int ret;
ret = GetEnvironmentVariable(key, value, MAX_UNICODE_PATH * 2);
if(ret == 0)
dpush(F);
else
dpush(tag_object(from_u16_string(value)));
free(value);
}

DEFINE_PRIMITIVE(os_envs)
{
GROWABLE_ARRAY(result);
REGISTER_ROOT(result);

TCHAR *env = GetEnvironmentStrings();
TCHAR *finger = env;

for(;;)
{
TCHAR *scan = finger;
while(*scan != '\0')
scan++;
if(scan == finger)
break;

CELL string = tag_object(from_u16_string(finger));
GROWABLE_ARRAY_ADD(result,string);

finger = scan + 1;
}

FreeEnvironmentStrings(env);

UNREGISTER_ROOT(result);
GROWABLE_ARRAY_TRIM(result);
dpush(result);
}

DEFINE_PRIMITIVE(set_os_env)
{
F_CHAR *key = unbox_u16_string();
REGISTER_C_STRING(key);
F_CHAR *value = unbox_u16_string();
UNREGISTER_C_STRING(key);
if(!SetEnvironmentVariable(key, value))
general_error(ERROR_IO, tag_object(get_error_message()), F, NULL);
}

DEFINE_PRIMITIVE(unset_os_env)
{
if(!SetEnvironmentVariable(unbox_u16_string(), NULL)
&& GetLastError() != ERROR_ENVVAR_NOT_FOUND)
general_error(ERROR_IO, tag_object(get_error_message()), F, NULL);
}

DEFINE_PRIMITIVE(set_os_envs)
{
not_implemented_error();
}
Factor
USING: alien.strings fry io.encodings.utf16 kernel
splitting windows windows.kernel32 ;
IN: environment.winnt

M: winnt os-env ( key -- value )
MAX_UNICODE_PATH "TCHAR" <c-array>
[ dup length GetEnvironmentVariable ] keep over 0 = [
2drop f
] [
nip utf16n alien>string
] if ;

M: winnt set-os-env ( value key -- )
swap SetEnvironmentVariable win32-error=0/f ;

M: winnt unset-os-env ( key -- )
f SetEnvironmentVariable 0 = [
GetLastError ERROR_ENVVAR_NOT_FOUND =
[ win32-error ] unless
] when ;

M: winnt (os-envs) ( -- seq )
GetEnvironmentStrings [
<memory-stream> [
utf16n decode-input
[ "\0" read-until drop dup empty? not ]
[ ] [ drop ] produce
] with-input-stream*
] [ FreeEnvironmentStrings win32-error=0/f ] bi ;

Saturday, October 18, 2008

Introducing the Factor database library

Background

One of the first libraries I wrote in Factor was a binding to PostgreSQL library in May 2005. Factor makes it incredibly easy to bind to C functions -- just add a FUNCTION: declaration and call the C function. For example,

( scratchpad ) FUNCTION: int getuid ( ) ;
( scratchpad ) getuid .
0
At about the same time, Chris Double wrote a SQLite binding and a high-level library he called tuple-db. It had some good ideas, such as query-by-example and prepared SQL statements for better security.

Earlier this year, I took both of these libraries and merged them to come up with the the current database library. The supported backends are PostgreSQL and SQLite and there are protocols for implementing other backends with relative ease. Contributions are welcome.

Overview

Briefly, Factor tuples correspond one-to-one to tables in the database through a mapping defined with the define-persistent word. Tuples are then manipulated through insert, update, delete, and select words. Upon calling the insert word, all of the filled-in slots in a tuple will be saved to the database. The process is reversible, and by filling in slots for a tuple and calling select-tuples, the libary generates a select statement from the filled-in slots (query-by-example) and returns tuples in a sequence. More advanced queries and lower level raw SQL statements are possible as well.

First, we'll look at how to connect to a database.

Connecting to a database

Every database has its own setup which Factor encapsultes in a tuple. SQLite just needs a path to a file on disk, but since PostgreSQL has a server/client model, the setup is more complex. After making your database tuple, connecting works the same for any database by using the with-db word. This word takes a quotation (a block of code) and your tuple object, then calls the database open routine and your quotation. After running your quotation, the database is closed, even if your code throws an exception. Having an interface like this means you can simply swap out your SQLite database for a networked PostgreSQL one if you suddenly need more scalability or networked database access.

You should generally make a custom combinator with your project's connection information. Here are a couple of examples.

SQLite example combinator:

USING: db.sqlite db io.files ;
: with-sqlite-db ( quot -- )
"my-database.db" temp-file <sqlite-db> swap with-db ; inline

PostgreSQL example combinator:

USING: db.postgresql db ;
: with-postgresql-db ( quot -- )
<postgresql-db>
"localhost" >>host
5432 >>port
"user" >>username
"seeecrets?" >>password
"factor-test" >>database
swap with-db ; inline
To make sure your database connection works, you can test it with an empty quotation:
 [ ] with-postgresql-db 

If that line of code doesn't throw an error, you can safely assume connecting to the database worked.

Defining persistent tuples

The highest-level database abstraction Factor offers relates tuples directly to database tables. Tuples must map one-to-one to a database table, but foreign keys to other tables are allowed.

Primary keys

Each tuple needs a primary key for indexing by the database. The primary key can be an increasing integer assigned by the database (a +db-assigned-id+), an object assigned by the user (a +user-defined-id+), a random number, or even a compound key consisting of multiple values used together as a key.

Database defined primary keys are automatically set on the tuple after insertion. While SQLite makes this feature easy to implement with the sqlite3_last_insert_rowid library call, PostgreSQL lacks such a feature and instead, in the backend, the inserts are done through a SQL function that queries the most recently inserted row.

User-assigned ids can be anything that the programmer knows will be unique for each object in the table. The same is true for compound keys, but only one of the values has to differ in this case for it to be unique.

Sometimes, a randomly generated id is useful, and the database library makes it easy to associate a tuple with its random key. By default, it generates a 64-bit random integer and in the unlikely event that it collides with an existing entry, it tries again up to ten times. The random integer is then set in the primary key slot of the tuple.

Database types

Data should have a type to allow the database to optimize its queries and storage. The supported data types are booleans, varchars, text, integers, big-integers, doubles, reals, timestamps, byte-arrays, URLs, and Factor blobs, which store arbitrary Factor objects. Types can have default values, unique qualifiers, and not-null restrictions. The database framework does all of the marshalling of objects for you transparently.

Creating and dropping tables

There are quite a few options when it comes to creating tables. The most basic is the create-table word. The problem is that it throws an exception when the table exists, which happens often. So one alternative is ensure-table -- we make sure a table exists and silently ignore errors. However, if we're just developing an application and we want to make sure the table is always the latest version of the code, we might use recreate-table, which drops and creates the table without throwing errors, and of course drops all the data with it. To simply drop a table, use the drop-table word.

Inserting tuples

Tuples are inserted one at a time with the insert-tuple word. SQL insert commands are generated directly from the Factor object and its filled-in slots. An example will follow.

Selecting tuples

Select statements are generated from exemplar tuples, or tuples with certain slots filled in with any value besides f. Passing an empty exemplar tuple will select all tuples from the table and return them as a sequence. A useful feature we support is querying by ranges, sequences, or intervals, as shown in following demonstration.

Exams demo

About the simplest example of interest is one that matches students names with their grades on a particular exam. Here's aa exam tuple, its mapping to the database, and a utility word to generate random exam objects.

USING: math.ranges random db.types ;

TUPLE: exam id name score ;

exam "EXAM" {
{ "id" "ID" +db-assigned-id+ }
{ "name" "NAME" TEXT }
{ "score" "SCORE" INTEGER }
} define-persistent

: random-exam ( -- exam )
f
6 [ CHAR: a CHAR: z [a,b] random ] replicate >string
100 [0,b] random
exam boa ;

I'll use a trick for opening a database to use it interactively without putting a with-db wrapper around every call.

"my-database.db" temp-file <sqlite-db> db-open db set

Note that the database handle should be cleaned up with db get dispose if you don't want to leak memory.

Let's create the table and add some random exams to the database:

exam create-table

25 [ random-exam insert-tuple ] times

Now to demonstrate selects. You can select all the exams:

T{ exam } select-tuples .
{
T{ exam { id 1 } { name "qklkzk" } { score 38 } }
T{ exam { id 2 } { name "feeuwv" } { score 38 } }
T{ exam { id 3 } { name "hlzwuu" } { score 51 } }
T{ exam { id 4 } { name "liiptp" } { score 52 } }
T{ exam { id 5 } { name "mwzlmv" } { score 74 } }
...
}

Ranges can be used with any datatype that makes sense, like integers or timestamps.

A range of passing exams:

T{ exam } 70 100 [a,b] >>score select-tuples .
{
T{ exam { id 5 } { name "mwzlmv" } { score 74 } }
T{ exam { id 6 } { name "ftxhuw" } { score 90 } }
T{ exam { id 11 } { name "rorbyd" } { score 83 } }
T{ exam { id 16 } { name "ttvkar" } { score 78 } }
T{ exam { id 17 } { name "nnzkvs" } { score 99 } }
T{ exam { id 21 } { name "izoiqi" } { score 83 } }
}

You can search for specific grades by using a sequence:

T{ exam } { 10 20 30 40 50 60 70 80 90 100 } >>score select-tuples .
{
T{ exam { id 6 } { name "ftxhuw" } { score 90 } }
T{ exam { id 9 } { name "bdoztd" } { score 30 } }
T{ exam { id 20 } { name "lnmxfq" } { score 60 } }
}

An interval query. Nobody should have above a 100:

USE: math.intervals T{ exam } 100 1/0. (a,b) >>score select-tuples .
{ }

Let's order by the grade:

<query> T{ exam } >>tuple "score" >>order select-tuples .
...
T{ exam { id 21 } { name "izoiqi" } { score 83 } }
T{ exam { id 6 } { name "ftxhuw" } { score 90 } }
T{ exam { id 17 } { name "nnzkvs" } { score 99 } }
}

Or by (random-generated) name:

<query> T{ exam } >>tuple "name" >>order select-tuples .
{
T{ exam { id 13 } { name "aagnof" } { score 61 } }
T{ exam { id 36 } { name "ailftz" } { score 41 } }
T{ exam { id 9 } { name "bdoztd" } { score 30 } }
...

Updates and deletes are also by example and may use sequences and intervals in the 'where' clause in the same way as the selects with the update-tuples and delete-tuples words.

Raw SQL

You can drop down into SQL if you want to do anything complex or things that are not supported by the library. A drawback is that the SQL you write may not work across SQL databases. Notice that, for a select, you have to convert the data to Factor tuples yourself since the data is returned as an array of strings.

"select * from exam where name like '%a%'" sql-query .
{
{ "8" "rxoyga" "53" }
{ "10" "fchhha" "1" }
{ "13" "aagnof" "61" }
{ "16" "ttvkar" "78" }
{ "22" "yhqkav" "28" }
}

Raw SQL that does not return results is possible too:

"delete from exam where name like '%a%'" sql-command
"select * from exam where name like '%a%'" sql-query length .
0

Another tutorial

Here is a tutorial from the database documentation that shows moref basic usage of the library. If you run this from within the Factor environment itself, you can click on the code blocks and run them one by one without copy/pasting or typing them in yourself.

Real-world usage

The Factor Wiki uses the database library, as does Factor's web framework, Furnace. Expect to see more useful websites and applications using it in the future.

Lastly, if anyone can suggest a catchy name for the database library, please let me know.

Wednesday, September 03, 2008

Vocabulary and documentation tool: tools.scaffold

The Factor project is on a documentation binge until all of the core and basis vocabularies are documented. I noticed, as anyone tasked to write a bunch of documenation would, that quite a bit of it could be automated. To begin writing docs for a vocabulary that already exists, you can just run the tool in it and edit the generated help file.

I actually ended up writing two tools -- one for the docs, and another to create new vocabularies with a simple command.

Scaffold tool to create new vocabularies


First, an aside. Factor's module system is based on a directory layout, with several vocabulary roots which are stored in the vocab-roots symbol.
( scratchpad ) USE: vocabs.loader vocab-roots get .
V{ "resource:core" "resource:basis" "resource:extra" "resource:work" }
Knowing this, you can now create a new vocabulary and boilerplate empty Factor files from inside of Factor and click the links to edit them in your favorite editor.
( scratchpad ) "extra" "alchemy" scaffold-vocab
Creating scaffolding for P" resource:extra/alchemy/alchemy.factor"
Creating scaffolding for P" resource:extra/alchemy/alchemy-tests.factor"
Creating scaffolding for P" resource:extra/alchemy/authors.txt"

( scratchpad ) "extra" "alchemy" scaffold-help
Creating scaffolding for P" resource:extra/alchemy/alchemy-docs.factor"
The scaffold tool is programmed not to overwrite files if they already exist.

Scaffold tool for documenation



Say I have a Factor word that turns lead into gold:
: lead>gold ( lead -- gold ) ... ;

The implementation of this word is left as an exercise to the reader. We don't have to understand how it works to document what it does.

Without an automation tool, you would begin by creating a new file (extra/alchemy/alchemy-docs.factor) in the same directory as your code file (extra/alchemy/alchemy.factor). What follows is the standard bolierplate for documentation. Notice that documenation goes in the same vocabulary as your actual code, but comes from different files on disk.

! Copyright (C) 2008 Doug Coleman.
! See http://factorcode.org/license.txt for BSD license.
USING: help.markup help.syntax ;
IN: alchemy
Every doc file has such a header based on the date, who you are, and the vocabulary name. With those three pieces of information, the above can be auto-generated. I added a new symbol developer-name in tools.scaffold that should be set to your name when writing docs.

Now let's look at the documenation for lead>gold:
HELP: lead>gold
{ $values { "lead" "boring lead" } { "gold" "shiny gold" } }
{ $description "Turns lead into gold." } ;


Notice that we know the word name, the first strings in each stack effect in the $values array, and the markup structure.

Here is the generated scafford:
HELP: lead>gold
{ $values { "lead" null } { "gold" null } }
{ $description } ;
This is much less wrist punishment than typing manually. The scaffold tool can even generate the documentation with their Factor types given correct code and standard type names like "quot" and "hashtable". Types that are not known are given the null class. Documenation correctness tools can later search for null objects and report them as errors, since no words should operate on this type.

I'll finish by including a snippet of the output of running the scaffold help generator on itself.
! Copyright (C) 2008 Doug Coleman.
! See http://factorcode.org/license.txt for BSD license.
USING: arrays help.markup help.syntax io.streams.string kernel strings words ;
IN: tools.scaffold

HELP: check-root
{ $values
{ "string" string }
{ "string" string } }
{ $description } ;

HELP: check-scaffold
{ $values
{ "vocab-root" "a vocabulary root string" } { "string" string }
{ "vocab-root" "a vocabulary root string" } { "string" string } }
{ $description } ;

HELP: check-vocab-name
{ $values
{ "string" string }
{ "string" string } }
{ $description } ;

HELP: developer-name
{ $description } ;

HELP: help.
{ $values
{ "word" word } }
{ $description } ;

HELP: lookup-type
{ $values
{ "string" string }
{ "object/string" null } { "?" "a boolean" } }
{ $description } ;

HELP: main-file-string
{ $values
{ "vocab" "a vocabulary specifier" }
{ "string" string } }
{ $description } ;

HELP: not-a-vocab-root
{ $values
{ "string" string } }
{ $description } ;

HELP: not-a-vocab-root?
{ $values
{ "object" object }
{ "?" "a boolean" } }
{ $description } ;

HELP: root?
{ $values
{ "string" string }
{ "?" "a boolean" } }
{ $description } ;

HELP: scaffold-authors
{ $values
{ "path" "a pathname string" } }
{ $description } ;

HELP: scaffold-copyright
{ $description } ;

HELP: tests-file-string
{ $values
{ "vocab" "a vocabulary specifier" }
{ "string" string } }
{ $description } ;

HELP: using
{ $description } ;

HELP: vocab>scaffold-path
{ $values
{ "vocab-root" "a vocabulary root string" } { "string" string }
{ "path" "a pathname string" } }
{ $description } ;

ARTICLE: "tools.scaffold" "tools.scaffold"
;

ABOUT: "tools.scaffold"
I hope this new vocabulary makes documenation a lot less tedious and error-prone. Now to go write some docs...

Monday, April 28, 2008

Word renaming (part 2)

Here are the word names that have changed:
  • find* -> find-from
  • find-last* -> find-last-from
  • index* -> index-from
  • last-index* -> last-index-from
  • subset -> filter
New shorthand words:
  • 1 tail -> rest
  • 1 tail-slice -> rest-slice
  • swap compose -> prepose
Changes to existing word behavior:
  • reverse stack effect of assoc-diff, diff
  • before? after? before=? after=? are now generic
  • min, max can compare more objects than before, such as timestamps
  • between? can compare more objects
  • <=> returns symbols
There are several motivations at work here. One is that words named foo* are a variant of foo, but otherwise the * is no help to what the word actually does differently. We're trying to move away from word names with stars to something more meaningful.

Code with common patterns that come up a lot, like 1 tail and swap compose, is more clearly understood if these patterns are given a single name.

Factor's subset is not equivalent to the mathematical definition of subset, so it was renamed to filter to avoid confusion. Along these same lines, diff and assoc-diff are now the more mathematically intuitive; you can think of diff like set subtraction now, seq1 seq2 diff is like seq1 - seq2.

Finally, the "UFO operator" <=> now returns symbols +lt+ +eq+ +gt+ instead of negative, zero, and positive numbers. The before? and after? words can compare anything that defines a method on this operator. Since between?, min, and max
are defined in terms of these comparison words, they also work on more objects.

Please let me know if you have any more suggestions for things words that have awkward argument orders, imprecise names, or if you can suggest alternate names for words with stars in their names.

Monday, April 14, 2008

Word renaming

Several words have been renamed and moved around to make Factor more consistent:
  • new -> new-sequence
  • construct-empty -> new
  • construct-boa -> boa
  • diff -> assoc-diff
  • union -> assoc-union
  • intersect -> assoc-intersect
  • seq-diff -> diff
  • seq-intersect -> intersect
To make things symmetrical, a new word union operates on sequences.

Somehow, seq-diff and seq-intersect were implemented as O(n^2) algorithms. Now, they use hashtables and are O(n).

Lastly, a new vocabulary named ``sets'' contains the set theoretic words, along with a new word unique that converts a sequence to a hash table whose keys and values are the same. An efficient union and intersect are implemented in terms of this word.

Sunday, April 13, 2008

Adding a new primitive

I added two primitives to the Factor VM to allow setting and unsetting of environment variables. It's not that hard to do, but you have to edit several C files in the VM and a couple .factor files in the core.  Really they should not be primitives, so eventually they will be moved into the core.

The primitives that I added are defined as follows:
IN: system
PRIMITIVE: set-os-env ( value key -- )
PRIMITIVE: unset-os-env ( key -- )

Adding a primitive to vm/

Since Factor's datatypes are not the same as C's datatypes, and because of the garbage collector, there are C functions for accessing and manipulating Factor objects. The data conversion functions are not documented yet, so here's a sampling of a few of them:
  • unbox_u16_string() - pop a Factor string off the datastack and return it as a F_CHAR*
  • from_u16_string() - convert a C string to a Factor object
  • REGISTER_C_STRING() - register a C string with Factor's garbage collector
  • UNREGISTER_C_STRING() - unregister a registered C string
  • dpush() - push an object onto the datastack
  • dpop() - pop an object off of the datastack
Registering a C string with the garbage collector is required when VM code calls code that may trigger a garbage collection (gc).  Any call to Factor from the VM might trigger a gc, and if that happened the object could be moved, thus invalidating your C pointer.  When a pointer is unregistered, it's popped from a gc stack with the corrected pointer value.

Here is the call to set-os-env:
DEFINE_PRIMITIVE(set_os_env)
{
F_CHAR *key = unbox_u16_string();
REGISTER_C_STRING(key);
F_CHAR *value = unbox_u16_string();
UNREGISTER_C_STRING(key);
if(!SetEnvironmentVariable(key, value))
general_error(ERROR_IO, tag_object(get_error_message()), F, NULL);
}
The function is defined with a macro DEFINE_PRIMITIVE that takes only the function name.  A corresponding DECLARE_PRIMITIVE goes in run.h as your function declaration.  Not all primitives use these C preprocessor macros, for instance bignums don't because it doesn't improve the performance.  Parameters to your primitive are popped or unboxed off the data stack, so a primitive's declaration expands to:
F_FASTCALL primitive_set_os_env_impl(void);

F_FASTCALL void primitive_set_os_env(CELL word, F_STACK_FRAME *callstack_top) {
save_callstack_top(callstack_top);
primitive_set_os_env_impl();
}
INLINE void primitive_set_os_env_impl(void)
F_FASTCALL is a wrapper around FASTCALL, which on x86 will pass the first two arguments in registers as an optimization.  Note that while it declares that it takes no arguments (void), most primitives will do something to the data stack.

Since unbox_u16_string() allocates memory for the Factor object, it could trigger a gc, so it's registered as a string.  You can also register values using REGISTER_ROOT for cells, REGISTER_BIGNUM for bignums, and REGISTER_UNTAGGED for arrays, words, and other Factor object pointers for which the type is known.  The key string can immediately be unregistered after calling unbox on the next stack value since the rest of the function will not cause a gc.  If the win32 call fails, there's a function general_error() that throws an exception.  In this case, it's an ERROR_IO that calls a helper function to return the Windows error message.

Now that the function is written, you have to add it to the list of primitives in primitives.c.  The important thing is that this list remains in the same order as the list in core/ which you will edit in the next section.  Also, add a prototype to the run.h file.

Adding a primitive to core/

Everything in Factor compiles down to primitives.  Because they are by definition "primitive", the compiler cannot infer the stack effect and argument types. To make a primitive's stack effect "known", edit core/inference/known-words/known-words.factor:
\ set-os-env { string string } { } <effect> set-primitive-effect
The next step is to put your word in the file core/bootstrap/primitives.factor in the same order as in vm/primitives.c.

Sometime in the future there might be a PRIMITIVE: word that will reduce the number of different places to edit to add a primitive. If it used Factor's FFI, you could add a new primitive without even having to bootstrap again.

Friday, March 28, 2008

Recent Changes in Factor

Here are some things I've been working collaboratively with some other Factor devs lately.

cwd, cd, and pathname changes

The old way to change the current working directory was to call the cd word. This is now obsolete, and code that does this won't get the expected behavior anymore. A new word, normalize-pathname, is called before every top level word using paths and adjusts the pathname based on a new dynamic variable called current-directory.

To get pathnames relative to the Factor directory you used to use the words ?resource-path and resource-path. The code "resource:foo.txt" file-contents now does what "foo.txt" resource-path file-contents used to do, so user code should avoid calling resource-path on path literals.

Also, there's a new word called with-directory ( path quot -- ) that calls your quotation with a new current working directory and restores the old one when done. All of words that deal with reading and writing files have been updated to respect this change.

Using the cd word is still useful sometimes, like when calling with-fork ( parent-quot child-quot -- ) when you want the child process to inherit the current-directory variable as its current directory.

Pathnames in Factor are still strings, but may be either strings or pathname objects in the future.

IO Launcher Priorities

Since some Factor developers still use Windows XP on single-core laptops from 2004, tools.deploy now sets the child process to have a lower priority so that the system is still usable when deploying images.

Example:
<process>
{ "cmd" "/c" "dir" "/S" "\\" } >>command
+low-priority+ >>priority
run-process

BSD Port

Factor now supports FreeBSD, NetBSD, and OpenBSD 32 and 64bit versions. Please inform us of any bugs you find. Binary releases should happen within a week.

Fixing the Solaris port is next up, followed by the ARM Linux and WinCE port.

Random number generator protocol

Because we want to generate secure random numbers for the new web framework, the Mersenne twister algorithm just won't do as Factor's sole pseudo random number generator. So until we implement a Yarrow prng, the default prng for cryptography is /dev/random on Unix systems and the CryptGenRandom call on Windows. Does anyone have a suggestion as to which provider to use? I'm using PROV_RSA_AES right now, but I just chose it randomly from the list.

The random protocol is really simple.
GENERIC: seed-random ( tuple seed -- )
GENERIC: random-32* ( tuple -- r )
GENERIC: random-bytes* ( tuple n -- bytes )

M: object random-bytes* ( tuple n -- byte-array )
[ drop random-32* ] with map >c-uint-array ;

M: object random-32* ( tuple -- n )
4 random-bytes* le> ;
There are two ways to get randomness -- either 32 bits at a time, or several bytes at a time. The cool thing is that you only have to put a method on one of these words, and the other one is taken care of by the default method! Of course, if you don't define either, the callstack will overflow. You can define methods on both for efficiency if it makes sense.

Wednesday, March 12, 2008

Singletons in Factor

A singleton is a design pattern that only allows for a unique class that is only instantiated once. In other languages, singletons can contain global data, but why not just use a global to store global state instead?

In Factor, a singleton is simply a predicate class whose predicate tests if the class is identical to itself. Singletons are defined like this:
SINGLETON: factor
That's it! No messy boilerplate to copy and paste, no subtle reasons why your singleton might be wrong in some corner case. Using the singleton, we can replace some of the less elegant parts of the Factor core code and implement things in a simpler way.

For instance, until now the core words os and cpu have returned strings based on how the Factor binary is compiled. Soon, these strings will be parsed into whichever singleton they represent, allowing for generic dispatch. I wanted to have a cross-platform library for finding out basic hardware information about your computer, like how many cpus/cores, what speed, and how much RAM is in a machine. To do this without singletons, I had to redefine information already available as strings (the cpu and os words) as symbols. With singletons, this duplication can be removed. The same applies to the compiler code and loading libraries.

Here is the way the core supported operating systems can be defined using singletons.
SINGLETON: winnt
SINGLETON: wince
SINGLETON: macosx
SINGLETON: linux
UNION: windows winnt wince ;
Now we can dispatch on these to find the number of cores:
HOOK: #cpus os ( -- n )
M: macosx #cpus { 6 3 } sysctl-query-uint ;
M: winnt #cpus system-info SYSTEM_INFO-dwNumberOfProcessors ;
For loading code, the current idiom is this:
<< "alut" {
{ [ win32? ] [ "alut.dll" ] }
{ [ macosx? ] [ "/System/Library/Frameworks/OpenAL.framework/OpenAL" ] }
{ [ unix? ] [ "libalut.so" ] }
} cond "cdecl" add-library >>
Using singletons, we can shorten this to:
"alut" {
{ win32 "alut.dll" }
{ macosx "/System/Library/Frameworks/OpenAL.framework/OpenAL" }
{ unix "libalut.so" }
} add-library
The library is assumed to be 'cdecl', but if it were 'stdcall' you could specify this by adding a "stdcall" string after the libary name, thus making a triple instead of a pair. The amount of boilerplate is reduced and the programmer can be more productive and write fewer bugs.

The implementation of singleton is:
: define-singleton-class ( class -- )
\ word swap
dup [ eq? ] curry define-predicate-class ;
This generates code that looks like:
PREDICATE: word winnt \ winnt eq? ;
It makes a predicate class with a superclass of 'word' that you can dispatch on and only a single instance exists. Why are singletons so hard to define in some other languages?

Thursday, March 06, 2008

Google Summer of Code 2008 Project Ideas

The Factor project is applying for Summer of Code. Here are our project ideas.

Applications

  • Write a structure editor
  • Write a text editor
  • Write an IRC client
  • Write a package manager for Factor
  • Write a file manager

Enterprise Factor

  • Update the bindings to MySQL, Oracle, ODBC for the latest DB framework in extra/db
  • Add bindings for DB2, Informix, SQL Server, Sybase, etc
  • Write a MySQL binding in Factor instead of calling the C library to permit nonblocking I/O operation and avoid GPL licensing issues
  • Write high-level wrapper for OpenSSL binding which implements encrypted Factor streams
  • Write a SOAP stack
  • Improve the continuation-based web framework with ideas from Seaside

Embedded Languages

  • Improve the Prolog implementation in extra/prolog
  • Revive extra/lisp
  • Write an assembler to a microcontroller and cross-compile applications from within Factor
  • Write an infix math DSL
  • Write a 'language X' to Factor compiler, where X is C, Ruby, Python, Javascript, Elisp, etc

Miscellaneous Libraries

  • Write a binding and high level interface to zlib or implement gzip compression in Factor
  • Finish the tar library
  • Finish the packet sniffer libraries
  • Implement the math algorithm of your choice for extra/math
  • Implement the data structures described in Purely Functional Data Structures
  • Write a COM interface
  • Improve the JNI library
  • Write a JNI-like bridge for Android
  • Write a wrapper for Windows Mobile libraries to make calls and send SMSs
  • Integrate the zoneinfo timezone database with Factor's calendar library
  • Add more calendars to the library, such as Persian, Hebrew, Islamic, Chinese, Hindu, Buddhist

Factor VM

  • Write a GC profiler
  • Make GC memory areas shrink after garbage collection
  • Implement dtrace probes
  • Make continuations serializable

User Interface

  • Write a binding to an image manipulation library
  • Finish bitmap library
  • Improve the look of Factor's UI gadgets
  • Add drag and drop support
  • Add any of the following gadgets: tabs, syntax highlighting text editor using the xmode library, combo boxes, tables, trees, spinners
  • Implement native font rendering on Windows

Thursday, February 14, 2008

Disassembler Vocabulary "ported" to Windows

Slava wrote a vocabulary to send gdb a process id and a range of addresses to disassemble in order to streamline the process of writing compiler optimizations. The original code only worked on Unix, requiring a call unix:getpid, which is a Unix system call. The Windows equivalent is GetCurrentProcessId. Since we need to call the Unix version on MacOSX and Linux, and the Windows API call on Windows XP, we use a Factor design pattern -- the HOOK:.

The HOOK:



The word in question is called make-disassemble-cmd. Behold its majesty:

QUALIFIED: unix

M: pair make-disassemble-cmd
in-file [
"attach " write
unix:getpid number>string print
"disassemble " write
[ number>string write bl ] each
] with-file-out ;


You can see where it's calling unix:getpid. Knowing the Windows API call from above, it's easy to write a version that works on Windows:

M: pair make-disassemble-cmd
in-file [
"attach " write
GetCurrentProcessId number>string print
"disassemble " write
[ number>string write bl ] each
] with-file-out ;


Obviously, this is unacceptable, because now it doesn't work on Unix! If we rename the make-disassemble-cmd word for the new platform, then there are still two copies of the exact same word, and you'll be loading them both on platforms where they shouldn't be loaded. We really just want to rename the one word that changed, so...

Let's make a HOOK:

! in io.launcher
HOOK: current-process-handle io-backend ( -- handle )

! in io.unix.launcher
M: unix-io current-process-handle ( -- handle ) getpid ;

! in io.windows.launcher
M: windows-io current-process-handle ( -- handle ) GetCurrentProcessId ;

! in tools.disassembler
M: pair make-disassemble-cmd
in-file [
"attach " write
current-process-handle number>string print
"disassemble " write
[ number>string write bl ] each
] with-file-out ;


Now, there is just one word that will do the work on both platforms. The relevant code is only loaded into the image on the correct platform, and the problem is solved without renaming lots of words, without #ifdefs, and without copy/pasting.

Tuesday, February 12, 2008

Text Editor Integration and Directory Traversal

To edit a word in Factor you pass the edit word a defspec, like \ + edit or { float + } edit, or you press CTRL-SHIFT-E on a word. If an editor path is already set, then the word definition should pop up in your editor at the line of the definition. However, if you haven't configured your text editor, a restart will be thrown and you can select it from a list. If you selected gvim, previously it would use a braindead algorithm that makes a list of all paths in \Program Files and then traverses that list looking for the gvim.exe binary. This caused it to hang while doing tons of disk IO.

Now it's better in two ways -- it only looks in \Program Files\vim\ (oops), and secondly you can search with find-file-breadth or find-file-depth. The implementation is pretty simple -- breadth first search uses a dlist, pushes the new traversal results to the back of the list, and traverses the list in order. Depth first also pushes newly found directories to the end of the list, but it pops them from the end.

The next step is to extract out the depth/breadth first algorithm so it's generally useful, but I need to continue working on the database library.

Monday, February 11, 2008

Factor Tax Library

I've been doing my own taxes for a few years, so I decided to start writing some Factor words to help with the calculations. This library does FICA tax, Medicare tax, and federal income tax, along with Minnesota state tax.

To get paid by an employer you fill out a Form W-4 which they keep on file. The three relevant pieces of information are the tax year (2008), the number of allowances, and whether or not you are married.

In Factor this looks like:
TUPLE: w4 year allowances married? ;
C: <w4> w4


To calculate your withholding, or the amount your employer should withhold from your salary every pay period, the library starts with your yearly salary, a Form W-4, and a tax collector entity, such as <federal> or <minnesota> and calls the withholding word.

For the average single male programmer making $70k per year, the calculation looks like this:

( scratchpad ) 70000 2008 3 f <w4> <federal> withholding dup money.
$16,054.00
( scratchpad ) biweekly money.
$617.46


The government's cut of your paycheck isn't too hard to calculate. The employer withholds 6.2% of your paycheck for FICA (social security) and contributes a matching 6.2% that is over and above your salary, for a total of 12.4%. The FICA tax has what's known as a base rate, which is a ceiling on the amount of your salary that this tax is applied to. For 2008, the base rate is $102,000, which means that the FICA tax is not applied above this amount. Additionally, you and your employer pay Medicare tax, which is 1.45% each for a total of 3.9%, and this is applied to all income (no base rate).

Now for the Federal income tax. Take your salary and subtract $3500 times the number of allowances you claimed on your Form W-4. For instance, three allowances (working only one job, claiming self as a dependent, and head of household) will give you a $10,500 allowance, which will leave $59,500 taxable income. (Note: FICA and Medicare tax apply to the entire salary, while Federal income tax applies only to the salary - allowances) There are several equivalent charts for looking up the federal income tax withholding, so this library uses the annual one because it was easiest to encode.

So for the example salary of $70,000 with three allowances, the calculation looks like:
( scratchpad ) 70000 2008 3 f <w4> <federal> federal-tax money.
$10,699.00
( scratchpad ) 70000 2008 3 f <w4> fica-tax money.
$4,340.00
( scratchpad ) 70000 2008 3 f <w4> <federal> medicare-tax money.
$1,015.00
( scratchpad ) { 10699 4340 1015 } sum money.
$16,054.00
( scratchpad ) 70000 2008 3 f <w4> <federal> withholding money.
$16,054.00


Salaries are taxed at different rates as they increase. For example, the first $2,650 dollars you make are taxed at 0% federal income tax rate. This does not mean you pay no tax at all; FICA and Medicare still apply. From $2,650 to $10,300 the tax rate is 10%. So, if somehow you were making $8,650 per year after deducting allowances, you would be in the second tax bracket and your taxable income would be $6000, taxed at 10%, for a total of $600. Your employer would withhold this amount from your paychecks and pay it to the government quarterly. (Note: the numbers above are for single people, married people have a more advantageous table)

Minnesota has one of the highest state taxes. To calculate this tax, take the same allowances from your Form W-4 and calculate the allowance per the federal income tax, $3,500 per allowance. Again, there are two tables -- single and married. The tax amounts are 5.35%, 7.05%, and 7.85% in the highest bracket.
( scratchpad ) 70000  2008 3 f <w4> <minnesota> withholding money.
$3,686.68


So, altogether, the employer withholds your federal and state taxes from the paycheck. You can calculate this:
( scratchpad ) 70000  2008 3 f <w4>  employer-withhold dup money. biweekly money.
$19,740.68
$759.26
( scratchpad ) 70000 2008 3 f <w4> <minnesota> net money. biweekly money.
$50,259.33


So if you make $70k in Minnesota, your employer should withhold $759.26 per paycheck, to be split between the federal and state government and paid quarterly (or monthly). You would get to keep $50,259.33 and the employer would give you a Form W-2 with the relevant details. You could the file income tax returns on April 15th, where you could possibly get some of that back from various deductions and refunds.

All values are calculated in rational numbers, so they are exact until the end, where the cents are rounded to the nearest cent. Percentages are in the source as decimal numbers, and are parsed using the DECIMAL: word.

( scratchpad ) DECIMAL: .1 .
1/10


Please feel free to submit patches for other states and to help me correct any bugs.

Saturday, February 02, 2008

Parsing HTTP headers in Factor with multi-assocs

The implementation of setting and parsing http headers in Factor has previously used a hashtable with a single key/value pair. However, this is broken because certain fields can be sent twice, e.g. set-cookie. The new implementation is a hashtable with keys/vectors to store multiple values for the same key.

I originally tried to make this obey the assoc protocol so that you could convert from a hashtable of vectors back to any type of assoc (hashtable/alist/AVL tree/etc) but this turned out to be a really bad idea because not only was it not useful, but it breaks the semantics of the assoc protocol if set-at inserts an element instead of sets it.

So the implementation is in assocs.lib as a few helper words:

: insert-at ( value key assoc -- )
[ ?push ] change-at ;

: peek-at* ( key assoc -- obj ? )
at* dup [ >r peek r> ] when ;

: peek-at ( key assoc -- obj )
peek-at* drop ;

: >multi-assoc ( assoc -- new-assoc )
[ 1vector ] assoc-map ;

: multi-assoc-each ( assoc quot -- )
[ with each ] curry assoc-each ; inline

: insert ( value variable -- ) namespace insert-at ;


Of course, set-at and at still set and access the values, but there are a couple new utility words. The insert-at word has the same stack effect as set-at but pushes a value instead of setting it. peek-at will give you the last value set for a given key, and this is the standard way of accessing values when you only care about the last one.

To turn an assoc into a multi-assoc, call >multi-assoc. To iterate over all the key/value pairs, use multi-assoc-each.

The insert word is for use with the make-assoc word, which executes inside a new namespace and outputs the variables you set as a hashtable.

Here's an example of what the headers look like for a website:

( scratchpad ) USE: http.client "amazon.com" http-get drop .
H{
{ "connection" V{ "close" } }
{ "content-type" V{ "text/html; charset=ISO-8859-1" } }
{ "server" V{ "Server" } }
{ "x-amz-id-2" V{ "L0oid1yo1Z6cuq+VgwWCv0G/UdPov/0v" } }
{ "x-amz-id-1" V{ "15CPXN68HXB35FXE62CX" } }
{
"set-cookie"
V{
"skin=noskin; path=/; domain=.amazon.com; expires=Sun, 03-Feb-2008 04:57:59 GMT"
"session-id-time=1202544000l; path=/; domain=.amazon.com; expires=Sat Feb 09 08:00:00 2008 GMT"
"session-id=002-3595241-4867224; path=/; domain=.amazon.com; expires=Sat Feb 09 08:00:00 2008 GMT"
}
}
{ "vary" V{ "Accept-Encoding,User-Agent" } }
{ "date" V{ "Sun, 03 Feb 2008 04:57:59 GMT" } }
}


I have normalized the keys by converting them to all lower case. For some reason, Amazon sends two headers as Set-Cookie and the last one as Set-cookie, which is pretty weird.

Since the prettyprinter outputs valid Factor code, you can copy/paste the above headers into a Factor listener and run some of the multi-assoc words on them.