The Awesome Factor: Introducing the Factor database library

Background

One of the first libraries I wrote in Factor was a binding to PostgreSQL library in May 2005. Factor makes it incredibly easy to bind to C functions -- just add a FUNCTION: declaration and call the C function. For example,

( scratchpad ) FUNCTION: int getuid ( ) ;
( scratchpad ) getuid .
0

At about the same time, Chris Double wrote a SQLite binding and a high-level library he called tuple-db. It had some good ideas, such as query-by-example and prepared SQL statements for better security.

Earlier this year, I took both of these libraries and merged them to come up with the the current database library. The supported backends are PostgreSQL and SQLite and there are protocols for implementing other backends with relative ease. Contributions are welcome.

Overview

Briefly, Factor tuples correspond one-to-one to tables in the database through a mapping defined with the define-persistent word. Tuples are then manipulated through insert, update, delete, and select words. Upon calling the insert word, all of the filled-in slots in a tuple will be saved to the database. The process is reversible, and by filling in slots for a tuple and calling select-tuples, the libary generates a select statement from the filled-in slots (query-by-example) and returns tuples in a sequence. More advanced queries and lower level raw SQL statements are possible as well.

First, we'll look at how to connect to a database.

Connecting to a database

Every database has its own setup which Factor encapsultes in a tuple. SQLite just needs a path to a file on disk, but since PostgreSQL has a server/client model, the setup is more complex. After making your database tuple, connecting works the same for any database by using the with-db word. This word takes a quotation (a block of code) and your tuple object, then calls the database open routine and your quotation. After running your quotation, the database is closed, even if your code throws an exception. Having an interface like this means you can simply swap out your SQLite database for a networked PostgreSQL one if you suddenly need more scalability or networked database access.

You should generally make a custom combinator with your project's connection information. Here are a couple of examples.

SQLite example combinator:

USING: db.sqlite db io.files ;
: with-sqlite-db ( quot -- )
    "my-database.db" temp-file <sqlite-db> swap with-db ; inline

PostgreSQL example combinator:

USING: db.postgresql db ;
: with-postgresql-db ( quot -- )
    <postgresql-db>
        "localhost" >>host
        5432 >>port
        "user" >>username
        "seeecrets?" >>password
        "factor-test" >>database
    swap with-db ; inline

To make sure your database connection works, you can test it with an empty quotation:

 [ ] with-postgresql-db

If that line of code doesn't throw an error, you can safely assume connecting to the database worked.

Defining persistent tuples

The highest-level database abstraction Factor offers relates tuples directly to database tables. Tuples must map one-to-one to a database table, but foreign keys to other tables are allowed.

Primary keys

Each tuple needs a primary key for indexing by the database. The primary key can be an increasing integer assigned by the database (a +db-assigned-id+), an object assigned by the user (a +user-defined-id+), a random number, or even a compound key consisting of multiple values used together as a key.

Database defined primary keys are automatically set on the tuple after insertion. While SQLite makes this feature easy to implement with the sqlite3_last_insert_rowid library call, PostgreSQL lacks such a feature and instead, in the backend, the inserts are done through a SQL function that queries the most recently inserted row.

User-assigned ids can be anything that the programmer knows will be unique for each object in the table. The same is true for compound keys, but only one of the values has to differ in this case for it to be unique.

Sometimes, a randomly generated id is useful, and the database library makes it easy to associate a tuple with its random key. By default, it generates a 64-bit random integer and in the unlikely event that it collides with an existing entry, it tries again up to ten times. The random integer is then set in the primary key slot of the tuple.

Database types

Data should have a type to allow the database to optimize its queries and storage. The supported data types are booleans, varchars, text, integers, big-integers, doubles, reals, timestamps, byte-arrays, URLs, and Factor blobs, which store arbitrary Factor objects. Types can have default values, unique qualifiers, and not-null restrictions. The database framework does all of the marshalling of objects for you transparently.

Creating and dropping tables

There are quite a few options when it comes to creating tables. The most basic is the create-table word. The problem is that it throws an exception when the table exists, which happens often. So one alternative is ensure-table -- we make sure a table exists and silently ignore errors. However, if we're just developing an application and we want to make sure the table is always the latest version of the code, we might use recreate-table, which drops and creates the table without throwing errors, and of course drops all the data with it. To simply drop a table, use the drop-table word.

Inserting tuples

Tuples are inserted one at a time with the insert-tuple word. SQL insert commands are generated directly from the Factor object and its filled-in slots. An example will follow.

Selecting tuples

Select statements are generated from exemplar tuples, or tuples with certain slots filled in with any value besides f. Passing an empty exemplar tuple will select all tuples from the table and return them as a sequence. A useful feature we support is querying by ranges, sequences, or intervals, as shown in following demonstration.

Exams demo

About the simplest example of interest is one that matches students names with their grades on a particular exam. Here's aa exam tuple, its mapping to the database, and a utility word to generate random exam objects.

USING: math.ranges random db.types ;

TUPLE: exam id name score ;

exam "EXAM"    {
    { "id" "ID" +db-assigned-id+ }
    { "name" "NAME" TEXT }
    { "score" "SCORE" INTEGER }
} define-persistent

: random-exam ( -- exam )
        f
        6 [ CHAR: a CHAR: z [a,b] random ] replicate >string
        100 [0,b] random
    exam boa ;

I'll use a trick for opening a database to use it interactively without putting a with-db wrapper around every call.

"my-database.db" temp-file <sqlite-db> db-open db set

Note that the database handle should be cleaned up with db get dispose if you don't want to leak memory.

Let's create the table and add some random exams to the database:

exam create-table

25 [ random-exam insert-tuple ] times

Now to demonstrate selects. You can select all the exams:

T{ exam } select-tuples .
{
    T{ exam { id 1 } { name "qklkzk" } { score 38 } }
    T{ exam { id 2 } { name "feeuwv" } { score 38 } }
    T{ exam { id 3 } { name "hlzwuu" } { score 51 } }
    T{ exam { id 4 } { name "liiptp" } { score 52 } }
    T{ exam { id 5 } { name "mwzlmv" } { score 74 } }
    ...
}

Ranges can be used with any datatype that makes sense, like integers or timestamps.

A range of passing exams:

T{ exam } 70 100 [a,b] >>score select-tuples .
{
    T{ exam { id 5 } { name "mwzlmv" } { score 74 } }
    T{ exam { id 6 } { name "ftxhuw" } { score 90 } }
    T{ exam { id 11 } { name "rorbyd" } { score 83 } }
    T{ exam { id 16 } { name "ttvkar" } { score 78 } }
    T{ exam { id 17 } { name "nnzkvs" } { score 99 } }
    T{ exam { id 21 } { name "izoiqi" } { score 83 } }
}

You can search for specific grades by using a sequence:

T{ exam } { 10 20 30 40 50 60 70 80 90 100 } >>score select-tuples .
{
    T{ exam { id 6 } { name "ftxhuw" } { score 90 } }
    T{ exam { id 9 } { name "bdoztd" } { score 30 } }
    T{ exam { id 20 } { name "lnmxfq" } { score 60 } }
}

An interval query. Nobody should have above a 100:

USE: math.intervals T{ exam } 100 1/0. (a,b) >>score select-tuples .
{ }

Let's order by the grade:

<query> T{ exam } >>tuple "score" >>order select-tuples .
...
    T{ exam { id 21 } { name "izoiqi" } { score 83 } }
    T{ exam { id 6 } { name "ftxhuw" } { score 90 } }
    T{ exam { id 17 } { name "nnzkvs" } { score 99 } }
}

Or by (random-generated) name:

<query> T{ exam } >>tuple "name" >>order select-tuples .
{
    T{ exam { id 13 } { name "aagnof" } { score 61 } }
    T{ exam { id 36 } { name "ailftz" } { score 41 } }
    T{ exam { id 9 } { name "bdoztd" } { score 30 } }
    ...

Updates and deletes are also by example and may use sequences and intervals in the 'where' clause in the same way as the selects with the update-tuples and delete-tuples words.

Raw SQL

You can drop down into SQL if you want to do anything complex or things that are not supported by the library. A drawback is that the SQL you write may not work across SQL databases. Notice that, for a select, you have to convert the data to Factor tuples yourself since the data is returned as an array of strings.

"select * from exam where name like '%a%'" sql-query .
{
    { "8" "rxoyga" "53" }
    { "10" "fchhha" "1" }
    { "13" "aagnof" "61" }
    { "16" "ttvkar" "78" }
    { "22" "yhqkav" "28" }
}

Raw SQL that does not return results is possible too:

"delete from exam where name like '%a%'" sql-command
"select * from exam where name like '%a%'" sql-query length .
    0

Another tutorial

Here is a tutorial from the database documentation that shows moref basic usage of the library. If you run this from within the Factor environment itself, you can click on the code blocks and run them one by one without copy/pasting or typing them in yourself.

Real-world usage

The Factor Wiki uses the database library, as does Factor's web framework, Furnace. Expect to see more useful websites and applications using it in the future.

Lastly, if anyone can suggest a catchy name for the database library, please let me know.

The Awesome Factor

Saturday, October 18, 2008

Introducing the Factor database library