Sunday, October 19, 2008

Moving primitives out of the Factor VM: Factor vs C

In a previous lifetime, I added environment variable primitives to the VM. Well, it turns out that a better place for them is in the Factor basis vocabulary root, so this post is about moving them again.

To move a primitive out of the VM, implement its functionality in Factor code and replace usages with your word if necessary, remove it from vm/primitives.c and core/bootstrap/primitives.factor, remove the primitive code from the VM, make a new image, recompile Factor, and bootstrap. Basically, do the inverse of the previous post.

Unless you someday need this as a reference for removing a primitive yourself, the more interesting part is comparing the two versions side by side. The C code has to call functions such as REGISTER_ROOT to keep the garbage collector from moving data around when it shouldn't. It is also written without the #ifdefs found in most cross-platform C code, so overall it's fairly clean. The Factor version defines a high-level protocol that each backend implements in its own files. Most of the Unix definitions are one-liners, and the Windows ones use high-level combinators. I find the Factor version much easier to understand, and I believe it's more maintainable. Factor is a better C than C.

High-level environment variable interface

Factor's high-level environment variable words let you get a single variable or all of them, set a single variable or all of them, and unset a variable. On Windows you cannot set all of the variables at once, and on Windows CE the whole concept of environment variables does not exist.

Here is the code for the main vocabulary. Notice that the platform-specific words are hooks dispatching on the os word, whose value will be something like macosx, winnt, or linux. The boilerplate at the bottom loads the platform-specific code.

USING: assocs combinators kernel sequences splitting system
vocabs.loader ;
IN: environment

HOOK: os-env os ( key -- value )

HOOK: set-os-env os ( value key -- )

HOOK: unset-os-env os ( key -- )

HOOK: (os-envs) os ( -- seq )

HOOK: (set-os-envs) os ( seq -- )

: os-envs ( -- assoc )
    (os-envs) [ "=" split1 ] H{ } map>assoc ;

: set-os-envs ( assoc -- )
    [ "=" swap 3append ] { } assoc>map (set-os-envs) ;

{
    { [ os unix? ] [ "environment.unix" require ] }
    { [ os winnt? ] [ "environment.winnt" require ] }
    { [ os wince? ] [ ] }
} cond
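Before looking at the C primitives, it may help to compare what the two non-hook words above do with each entry. os-envs splits every "KEY=VALUE" string at the first '=' (Factor's split1), and set-os-envs joins each pair back together. Here is a minimal C sketch of that per-entry work. The names split_env_entry and join_env_entry are hypothetical helpers for illustration and are not part of the Factor VM.

```c
#include <stdio.h>
#include <string.h>

/* Split an environment entry "KEY=VALUE" at the first '=', like Factor's
   split1. Copies KEY into key_buf and returns a pointer to VALUE inside
   entry. Returns NULL if there is no '=' or KEY does not fit in key_buf. */
const char *split_env_entry(const char *entry, char *key_buf, size_t key_size)
{
    const char *eq = strchr(entry, '=');
    if (eq == NULL)
        return NULL;
    size_t key_len = (size_t)(eq - entry);
    if (key_len >= key_size)
        return NULL;
    memcpy(key_buf, entry, key_len);
    key_buf[key_len] = '\0';
    return eq + 1;
}

/* Inverse operation: write "KEY=VALUE" into out.
   Returns 0 on success, -1 if out is too small. */
int join_env_entry(const char *key, const char *value, char *out, size_t out_size)
{
    int n = snprintf(out, out_size, "%s=%s", key, value);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

Splitting at the first '=' matters because a value may itself contain '=' characters. For example, "A=b=c" yields the key "A" and the value "b=c".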

Unix environment variables, before and after

C:

DEFINE_PRIMITIVE(os_env)
{
    char *name = unbox_char_string();
    char *value = getenv(name);
    if(value == NULL)
        dpush(F);
    else
        box_char_string(value);
}

DEFINE_PRIMITIVE(os_envs)
{
    GROWABLE_ARRAY(result);
    REGISTER_ROOT(result);
    char **env = environ;

    while(*env)
    {
        CELL string = tag_object(from_char_string(*env));
        GROWABLE_ARRAY_ADD(result,string);
        env++;
    }

    UNREGISTER_ROOT(result);
    GROWABLE_ARRAY_TRIM(result);
    dpush(result);
}

DEFINE_PRIMITIVE(set_os_env)
{
    char *key = unbox_char_string();
    REGISTER_C_STRING(key);
    char *value = unbox_char_string();
    UNREGISTER_C_STRING(key);
    setenv(key, value, 1);
}

DEFINE_PRIMITIVE(unset_os_env)
{
    char *key = unbox_char_string();
    unsetenv(key);
}

DEFINE_PRIMITIVE(set_os_envs)
{
    F_ARRAY *array = untag_array(dpop());
    CELL size = array_capacity(array);

    /* Memory leak */
    char **env = calloc(size + 1,sizeof(CELL));

    CELL i;
    for(i = 0; i < size; i++)
    {
        F_STRING *string = untag_string(array_nth(array,i));
        CELL length = to_fixnum(string->length);

        char *chars = malloc(length + 1);
        char_string_to_memory(string,chars);
        chars[length] = '\0';
        env[i] = chars;
    }

    environ = env;
}
Factor:
USING: alien alien.c-types alien.strings alien.syntax kernel
layouts sequences system unix environment io.encodings.utf8
unix.utilities vocabs.loader combinators alien.accessors ;
IN: environment.unix

HOOK: environ os ( -- void* )

M: unix environ ( -- void* ) "environ" f dlsym ;

M: unix os-env ( key -- value ) getenv ;

M: unix set-os-env ( value key -- ) swap 1 setenv io-error ;

M: unix unset-os-env ( key -- ) unsetenv io-error ;

M: unix (os-envs) ( -- seq )
    environ *void* utf8 alien>strings ;

: set-void* ( value alien -- ) 0 set-alien-cell ;

M: unix (set-os-envs) ( seq -- )
    utf8 strings>alien malloc-byte-array environ set-void* ;

os {
    { macosx [ "environment.unix.macosx" require ] }
    [ drop ]
} case
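There is one behavioral difference between the two versions. The Factor set-os-env and unset-os-env methods pass the C return value to io-error, but the C primitive ignores whatever setenv returns. Here is a minimal C sketch of the checked behavior, assuming a POSIX system. The names set_env_checked and unset_env_checked are hypothetical, and printing to stderr stands in for throwing a Factor error.

```c
#define _POSIX_C_SOURCE 200112L
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Set key=value, overwriting any existing value (the 1 passed to setenv).
   Like io-error in the Factor version, report failure instead of ignoring it. */
int set_env_checked(const char *key, const char *value)
{
    if (setenv(key, value, 1) != 0) {
        fprintf(stderr, "setenv(%s): %s\n", key, strerror(errno));
        return -1;
    }
    return 0;
}

/* Remove key from the environment, reporting failure the same way. */
int unset_env_checked(const char *key)
{
    if (unsetenv(key) != 0) {
        fprintf(stderr, "unsetenv(%s): %s\n", key, strerror(errno));
        return -1;
    }
    return 0;
}
```

POSIX requires setenv to fail with EINVAL when the name is empty or contains '='. A call such as set_env_checked("BAD=NAME", "x") is therefore reported as an error instead of being silently ignored.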

MacOSX environment variables, before and after

On Mac OS X, code in shared libraries cannot refer to environ directly, so we have to call the _NSGetEnviron() function to get a pointer to it.

C:

#ifndef environ
extern char ***_NSGetEnviron(void);
#define environ (*_NSGetEnviron())
#endif
Factor:
USING: alien.syntax system environment.unix ;
IN: environment.unix.macosx

FUNCTION: void* _NSGetEnviron ( ) ;

M: macosx environ _NSGetEnviron ;

Windows NT environment variables, before and after

Draw your own conclusions.

C:

DEFINE_PRIMITIVE(os_env)
{
    F_CHAR *key = unbox_u16_string();
    F_CHAR *value = safe_malloc(MAX_UNICODE_PATH * 2);
    int ret;
    ret = GetEnvironmentVariable(key, value, MAX_UNICODE_PATH * 2);
    if(ret == 0)
        dpush(F);
    else
        dpush(tag_object(from_u16_string(value)));
    free(value);
}

DEFINE_PRIMITIVE(os_envs)
{
    GROWABLE_ARRAY(result);
    REGISTER_ROOT(result);

    TCHAR *env = GetEnvironmentStrings();
    TCHAR *finger = env;

    for(;;)
    {
        TCHAR *scan = finger;
        while(*scan != '\0')
            scan++;
        if(scan == finger)
            break;

        CELL string = tag_object(from_u16_string(finger));
        GROWABLE_ARRAY_ADD(result,string);

        finger = scan + 1;
    }

    FreeEnvironmentStrings(env);

    UNREGISTER_ROOT(result);
    GROWABLE_ARRAY_TRIM(result);
    dpush(result);
}

DEFINE_PRIMITIVE(set_os_env)
{
    F_CHAR *key = unbox_u16_string();
    REGISTER_C_STRING(key);
    F_CHAR *value = unbox_u16_string();
    UNREGISTER_C_STRING(key);
    if(!SetEnvironmentVariable(key, value))
        general_error(ERROR_IO, tag_object(get_error_message()), F, NULL);
}

DEFINE_PRIMITIVE(unset_os_env)
{
    if(!SetEnvironmentVariable(unbox_u16_string(), NULL)
        && GetLastError() != ERROR_ENVVAR_NOT_FOUND)
        general_error(ERROR_IO, tag_object(get_error_message()), F, NULL);
}

DEFINE_PRIMITIVE(set_os_envs)
{
    not_implemented_error();
}
Factor:
USING: alien.c-types alien.strings environment fry io io.encodings
io.encodings.utf16 io.streams.memory kernel sequences splitting system
windows windows.kernel32 ;
IN: environment.winnt

M: winnt os-env ( key -- value )
    MAX_UNICODE_PATH "TCHAR" <c-array>
    [ dup length GetEnvironmentVariable ] keep over 0 = [
        2drop f
    ] [
        nip utf16n alien>string
    ] if ;

M: winnt set-os-env ( value key -- )
    swap SetEnvironmentVariable win32-error=0/f ;

M: winnt unset-os-env ( key -- )
    f SetEnvironmentVariable 0 = [
        GetLastError ERROR_ENVVAR_NOT_FOUND =
        [ win32-error ] unless
    ] when ;

M: winnt (os-envs) ( -- seq )
    GetEnvironmentStrings [
        <memory-stream> [
            utf16n decode-input
            [ "\0" read-until drop dup empty? not ]
            [ ] [ drop ] produce
        ] with-input-stream*
    ] [ FreeEnvironmentStrings win32-error=0/f ] bi ;
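Both Windows versions walk the block returned by GetEnvironmentStrings. That block is a sequence of NUL-terminated strings ending with an empty string. The sketch below performs the same scan as the C primitive's finger/scan loop. It uses plain char so it compiles anywhere, whereas the real block holds TCHARs (UTF-16 in a Unicode build). The name split_env_block is hypothetical.

```c
#include <stddef.h>

/* Walk a block of NUL-terminated strings that ends with an empty string,
   e.g. "A=1\0B=2\0\0". Stores up to max pointers into out (pointing into
   block, no copying) and returns the total number of entries found. */
size_t split_env_block(const char *block, const char **out, size_t max)
{
    size_t count = 0;
    const char *finger = block;

    while (*finger != '\0') {          /* an empty string ends the block */
        const char *scan = finger;
        while (*scan != '\0')
            scan++;
        if (count < max)
            out[count] = finger;
        count++;
        finger = scan + 1;             /* skip past this entry's NUL */
    }
    return count;
}
```

Returning the total count, even when it exceeds max, lets a caller size its array in a first pass and fill it in a second.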

Saturday, October 18, 2008

Introducing the Factor database library

Background

One of the first libraries I wrote in Factor, in May 2005, was a binding to the PostgreSQL client library. Factor makes it incredibly easy to bind to C functions: just add a FUNCTION: declaration and call the C function. For example:

( scratchpad ) FUNCTION: int getuid ( ) ;
( scratchpad ) getuid .
0
At about the same time, Chris Double wrote a SQLite binding and a high-level library he called tuple-db. It had some good ideas, such as query-by-example and prepared SQL statements for better security.

Earlier this year, I took both of these libraries and merged them to create the current database library. The supported backends are PostgreSQL and SQLite, and there are protocols for implementing other backends with relative ease. Contributions are welcome.

Overview

Briefly, Factor tuples correspond one-to-one to tables in the database through a mapping defined with the define-persistent word. Tuples are then manipulated through insert, update, delete, and select words. When you call insert-tuple, all of the filled-in slots of a tuple are saved to the database. The process also works in reverse: if you fill in some slots of a tuple and call select-tuples, the library generates a select statement from the filled-in slots (query-by-example) and returns the matching tuples in a sequence. More advanced queries and lower-level raw SQL statements are possible as well.

First, we'll look at how to connect to a database.

Connecting to a database

Every database has its own setup, which Factor encapsulates in a tuple. SQLite just needs a path to a file on disk, but since PostgreSQL has a client/server model, its setup is more complex. After you make your database tuple, connecting works the same way for any database: use the with-db word. This word takes your database tuple and a quotation (a block of code), opens the database, and calls your quotation. After your quotation runs, the database is closed, even if your code throws an exception. With an interface like this, you can simply swap your SQLite database for a networked PostgreSQL one if you suddenly need more scalability or network access.
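C has no exceptions, but the shape of with-db can still be sketched in C: open the database, run the caller's code, and close it no matter how that code finished. This only illustrates the combinator pattern and is not how the Factor library is implemented. Every name here is hypothetical: struct db, with_db, and the counting stand-in backend. A nonzero return code plays the role of a thrown exception.

```c
#include <stddef.h>

/* Hypothetical backend interface: each database type supplies open/close. */
struct db {
    int (*open_fn)(struct db *self);    /* returns 0 on success */
    void (*close_fn)(struct db *self);
    void *state;                        /* backend-specific data */
};

/* The caller's "quotation": returns 0 on success, nonzero on error. */
typedef int (*db_body_fn)(struct db *db, void *arg);

/* Open db, run body, then close db whether or not body failed.
   Returns the open error if opening fails, otherwise body's status. */
int with_db(struct db *db, db_body_fn body, void *arg)
{
    int status = db->open_fn(db);
    if (status != 0)
        return status;
    status = body(db, arg);
    db->close_fn(db);   /* cleanup runs on both the success and error paths */
    return status;
}

/* Stand-in backend for demonstration: it only counts opens and closes. */
struct open_counts { int opens; int closes; };

int counting_open(struct db *self)
{
    ((struct open_counts *)self->state)->opens++;
    return 0;
}

void counting_close(struct db *self)
{
    ((struct open_counts *)self->state)->closes++;
}

/* A body that always reports an error, to show that close still runs. */
int failing_body(struct db *db, void *arg)
{
    (void)db;
    (void)arg;
    return -1;
}
```

Because the caller never opens or closes the handle directly, it can't forget to release it. This is the same property that makes swapping one backend for another painless.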

You should generally make a custom combinator with your project's connection information. Here are a couple of examples.

SQLite example combinator:

USING: db.sqlite db io.files kernel ;
: with-sqlite-db ( quot -- )
    "my-database.db" temp-file <sqlite-db> swap with-db ; inline

PostgreSQL example combinator:

USING: accessors db.postgresql db kernel ;
: with-postgresql-db ( quot -- )
    <postgresql-db>
        "localhost" >>host
        5432 >>port
        "user" >>username
        "seeecrets?" >>password
        "factor-test" >>database
    swap with-db ; inline

To make sure your database connection works, you can test it with an empty quotation:

[ ] with-postgresql-db

If that line of code doesn't throw an error, you can safely assume connecting to the database worked.

Defining persistent tuples

The highest-level database abstraction Factor offers relates tuples directly to database tables. Tuples must map one-to-one to a database table, but foreign keys to other tables are allowed.

Primary keys

Each tuple needs a primary key for indexing by the database. The primary key can be an increasing integer assigned by the database (a +db-assigned-id+), an object assigned by the user (a +user-defined-id+), a random number, or even a compound key consisting of multiple values used together as a key.

Database-assigned primary keys are set on the tuple automatically after insertion. SQLite makes this feature easy to implement with the sqlite3_last_insert_rowid library call. PostgreSQL has no equivalent library call, so the PostgreSQL backend instead performs inserts through a SQL function that queries the most recently inserted row.

User-assigned ids can be anything that the programmer knows will be unique for each object in the table. The same holds for compound keys, except that uniqueness applies to the combination of values: two rows may share some of the values, as long as at least one of them differs.

Sometimes, a randomly generated id is useful, and the database library makes it easy to associate a tuple with its random key. By default, it generates a 64-bit random integer and in the unlikely event that it collides with an existing entry, it tries again up to ten times. The random integer is then set in the primary key slot of the tuple.
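The retry policy described above can be sketched in C. This is an illustration and not the library's code. The id generator and the "already taken" check are passed in as hypothetical callbacks. xorshift64 is only a small stand-in generator and is not cryptographically secure. Treating "up to ten times" as ten attempts in total is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ID_ATTEMPTS 10  /* assumed: ten tries in total */

typedef uint64_t (*random_u64_fn)(void *state);
typedef bool (*id_taken_fn)(uint64_t id, void *ctx);

/* Draw random 64-bit ids until one is not already taken. Stores it in *out
   and returns true, or returns false after MAX_ID_ATTEMPTS collisions. */
bool assign_random_id(random_u64_fn next, void *rng_state,
                      id_taken_fn taken, void *ctx, uint64_t *out)
{
    for (int attempt = 0; attempt < MAX_ID_ATTEMPTS; attempt++) {
        uint64_t candidate = next(rng_state);
        if (!taken(candidate, ctx)) {
            *out = candidate;
            return true;
        }
    }
    return false;
}

/* Stand-in generator (xorshift64); the state must start nonzero. */
uint64_t xorshift64(void *state)
{
    uint64_t *s = state;
    uint64_t x = *s;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *s = x;
    return x;
}

/* Stand-in "table": a plain array of ids that already exist. */
struct id_set { const uint64_t *ids; size_t count; };

bool id_set_contains(uint64_t id, void *ctx)
{
    const struct id_set *set = ctx;
    for (size_t i = 0; i < set->count; i++)
        if (set->ids[i] == id)
            return true;
    return false;
}
```

With 64-bit ids a collision is very unlikely, so the loop almost always succeeds on the first try. The retry limit just keeps a pathological case, such as a broken generator, from looping forever.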

Database types

Data should have a type to allow the database to optimize its queries and storage. The supported data types are booleans, varchars, text, integers, big-integers, doubles, reals, timestamps, byte-arrays, URLs, and Factor blobs, which store arbitrary Factor objects. Types can have default values, unique qualifiers, and not-null restrictions. The database framework does all of the marshalling of objects for you transparently.

Creating and dropping tables

There are several ways to create tables. The most basic is the create-table word, but it throws an exception if the table already exists, which happens often. One alternative is ensure-table, which makes sure a table exists and silently ignores errors. However, if you are developing an application and want the table to always match the latest version of your code, you might use recreate-table. It drops and re-creates the table without throwing errors and, of course, deletes all of the table's data along with it. To simply drop a table, use the drop-table word.

Inserting tuples

Tuples are inserted one at a time with the insert-tuple word. SQL insert commands are generated directly from the Factor object and its filled-in slots. An example will follow.

Selecting tuples

Select statements are generated from exemplar tuples, that is, tuples with certain slots filled in with any value other than f. Passing an empty exemplar tuple selects all tuples from the table and returns them as a sequence. A useful feature is querying by ranges, sequences, or intervals, as shown in the following demonstration.
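To make query-by-example concrete, here is a hedged C sketch of how an exemplar could be turned into a SELECT statement. Unset slots (Factor's f) are skipped, an exact value becomes an equality test, and a range becomes BETWEEN. The struct, the function name, and the generated SQL text are all assumptions for illustration. The SQL the Factor library actually emits may differ.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical exemplar for the EXAM table. A slot left "unset" (the
   equivalent of f in Factor) does not constrain the query. */
enum score_kind { SCORE_UNSET, SCORE_EXACT, SCORE_RANGE };

struct exam_exemplar {
    const char *name;            /* NULL: any name */
    enum score_kind score_kind;
    int score_lo, score_hi;      /* SCORE_EXACT uses score_lo only */
};

/* Append text to buf, keeping it NUL-terminated; -1 if it would not fit. */
static int append(char *buf, size_t size, size_t *len, const char *text)
{
    size_t add = strlen(text);
    if (*len + add >= size)
        return -1;
    memcpy(buf + *len, text, add + 1);
    *len += add;
    return 0;
}

/* Build a parameterized SELECT for the exemplar. Slot values are not
   spliced into the SQL; they would be bound to the ? placeholders in order.
   Returns the statement length, or -1 if buf is too small. */
int build_select(const struct exam_exemplar *ex, char *buf, size_t size)
{
    size_t len = 0;
    const char *sep = " WHERE ";

    if (size == 0)
        return -1;
    buf[0] = '\0';
    if (append(buf, size, &len, "SELECT ID, NAME, SCORE FROM EXAM") < 0)
        return -1;

    if (ex->name != NULL) {
        if (append(buf, size, &len, sep) < 0 ||
            append(buf, size, &len, "NAME = ?") < 0)
            return -1;
        sep = " AND ";
    }

    if (ex->score_kind == SCORE_EXACT) {
        if (append(buf, size, &len, sep) < 0 ||
            append(buf, size, &len, "SCORE = ?") < 0)
            return -1;
    } else if (ex->score_kind == SCORE_RANGE) {
        if (append(buf, size, &len, sep) < 0 ||
            append(buf, size, &len, "SCORE BETWEEN ? AND ?") < 0)
            return -1;
    }
    return (int)len;
}
```

Using placeholders instead of pasting values into the string is what makes prepared statements safer: a name like "x' OR '1'='1" is bound as data, never parsed as SQL.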

Exams demo

About the simplest interesting example is one that matches students' names with their grades on a particular exam. Here's an exam tuple, its mapping to the database, and a utility word to generate random exam objects.

USING: db.tuples db.types kernel math.ranges random sequences
strings ;

TUPLE: exam id name score ;

exam "EXAM" {
    { "id" "ID" +db-assigned-id+ }
    { "name" "NAME" TEXT }
    { "score" "SCORE" INTEGER }
} define-persistent

: random-exam ( -- exam )
    f ! id: assigned by the database on insert
    6 [ CHAR: a CHAR: z [a,b] random ] replicate >string
    100 [0,b] random
    exam boa ;

I'll use a trick for opening a database to use it interactively without putting a with-db wrapper around every call.

"my-database.db" temp-file <sqlite-db> db-open db set

Note that the database handle should be cleaned up with db get dispose if you don't want to leak memory.

Let's create the table and add some random exams to the database:

exam create-table

25 [ random-exam insert-tuple ] times

Now to demonstrate selects. You can select all the exams:

T{ exam } select-tuples .
{
T{ exam { id 1 } { name "qklkzk" } { score 38 } }
T{ exam { id 2 } { name "feeuwv" } { score 38 } }
T{ exam { id 3 } { name "hlzwuu" } { score 51 } }
T{ exam { id 4 } { name "liiptp" } { score 52 } }
T{ exam { id 5 } { name "mwzlmv" } { score 74 } }
...
}

Ranges can be used with any datatype that makes sense, like integers or timestamps.

A range of passing exams:

T{ exam } 70 100 [a,b] >>score select-tuples .
{
T{ exam { id 5 } { name "mwzlmv" } { score 74 } }
T{ exam { id 6 } { name "ftxhuw" } { score 90 } }
T{ exam { id 11 } { name "rorbyd" } { score 83 } }
T{ exam { id 16 } { name "ttvkar" } { score 78 } }
T{ exam { id 17 } { name "nnzkvs" } { score 99 } }
T{ exam { id 21 } { name "izoiqi" } { score 83 } }
}

You can search for specific grades by using a sequence:

T{ exam } { 10 20 30 40 50 60 70 80 90 100 } >>score select-tuples .
{
T{ exam { id 6 } { name "ftxhuw" } { score 90 } }
T{ exam { id 9 } { name "bdoztd" } { score 30 } }
T{ exam { id 20 } { name "lnmxfq" } { score 60 } }
}

An interval query. Nobody should have scored above 100:

USE: math.intervals T{ exam } 100 1/0. (a,b) >>score select-tuples .
{ }
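Intervals differ from ranges in two ways. Each endpoint can be open or closed, and an endpoint can be infinite, as with 1/0. above. Here is a minimal C sketch of mapping such an interval to SQL comparison operators. interval_to_sql and struct interval are hypothetical names. Dropping an infinite bound is an assumption about a sensible translation, not necessarily what the library does.

```c
#include <math.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical interval: each endpoint may be open or closed, and either
   endpoint may be infinite (like 1/0. in Factor). */
struct interval {
    double lo, hi;
    bool lo_closed, hi_closed;
};

/* Write a WHERE fragment for column, with ? placeholders for the finite
   endpoints. Infinite endpoints impose no constraint, so they are dropped.
   Returns the fragment length, or -1 if buf is too small. */
int interval_to_sql(const char *column, const struct interval *iv,
                    char *buf, size_t size)
{
    const char *lo_op = iv->lo_closed ? ">=" : ">";
    const char *hi_op = iv->hi_closed ? "<=" : "<";
    bool has_lo = !isinf(iv->lo);
    bool has_hi = !isinf(iv->hi);
    int n;

    if (has_lo && has_hi)
        n = snprintf(buf, size, "%s %s ? AND %s %s ?",
                     column, lo_op, column, hi_op);
    else if (has_lo)
        n = snprintf(buf, size, "%s %s ?", column, lo_op);
    else if (has_hi)
        n = snprintf(buf, size, "%s %s ?", column, hi_op);
    else
        n = snprintf(buf, size, "1 = 1");  /* (-inf, inf) matches everything */
    return (n >= 0 && (size_t)n < size) ? n : -1;
}
```

Under this mapping, the open interval 100 1/0. (a,b) from the query above becomes "SCORE > ?". The closed range 70 100 [a,b] becomes "SCORE >= ? AND SCORE <= ?".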

Let's order by the grade:

<query> T{ exam } >>tuple "score" >>order select-tuples .
...
T{ exam { id 21 } { name "izoiqi" } { score 83 } }
T{ exam { id 6 } { name "ftxhuw" } { score 90 } }
T{ exam { id 17 } { name "nnzkvs" } { score 99 } }
}

Or by (randomly generated) name:

<query> T{ exam } >>tuple "name" >>order select-tuples .
{
T{ exam { id 13 } { name "aagnof" } { score 61 } }
T{ exam { id 36 } { name "ailftz" } { score 41 } }
T{ exam { id 9 } { name "bdoztd" } { score 30 } }
...

Updates and deletes also work by example. The update-tuples and delete-tuples words accept sequences and intervals in the WHERE clause, just as the selects do.

Raw SQL

You can drop down into raw SQL if you want to do something complex or something the library does not support. The drawback is that the SQL you write may not be portable across databases. Notice that, for a select, you have to convert the data to Factor tuples yourself, since each row is returned as an array of strings.

"select * from exam where name like '%a%'" sql-query .
{
{ "8" "rxoyga" "53" }
{ "10" "fchhha" "1" }
{ "13" "aagnof" "61" }
{ "16" "ttvkar" "78" }
{ "22" "yhqkav" "28" }
}
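Converting a raw row back into a typed value is the caller's job. Here is a minimal C sketch of that conversion for the exam rows above, assuming the column order ID, NAME, SCORE. struct exam and row_to_exam are hypothetical names. Checking strtol's end pointer rejects fields that are not entirely numeric.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical C counterpart of the exam tuple. */
struct exam {
    long id;
    char name[32];
    long score;
};

/* Parse a whole string as a base-10 long; reject empty or trailing junk. */
static bool parse_long(const char *s, long *out)
{
    char *end;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (errno != 0 || end == s || *end != '\0')
        return false;
    *out = v;
    return true;
}

/* Convert one row of strings (ID, NAME, SCORE) into a struct exam.
   Returns false if a number does not parse or the name is too long. */
bool row_to_exam(const char *const row[3], struct exam *out)
{
    if (!parse_long(row[0], &out->id) || !parse_long(row[2], &out->score))
        return false;
    size_t n = strlen(row[1]);
    if (n >= sizeof out->name)
        return false;
    memcpy(out->name, row[1], n + 1);
    return true;
}
```

For example, the row { "8", "rxoyga", "53" } from the query above becomes an exam with id 8, name "rxoyga", and score 53.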

Raw SQL that does not return results is possible too:

"delete from exam where name like '%a%'" sql-command
"select * from exam where name like '%a%'" sql-query length .
0

Another tutorial

Here is a tutorial from the database documentation that shows more basic usage of the library. If you run it from within the Factor environment itself, you can click on the code blocks to run them one by one instead of copying and pasting or typing them in yourself.

Real-world usage

The Factor Wiki uses the database library, as does Factor's web framework, Furnace. Expect to see more useful websites and applications using it in the future.

Lastly, if anyone can suggest a catchy name for the database library, please let me know.