Thursday, February 14, 2008

Disassembler Vocabulary "ported" to Windows

Slava wrote a vocabulary to send gdb a process id and a range of addresses to disassemble in order to streamline the process of writing compiler optimizations. The original code only worked on Unix, requiring a call unix:getpid, which is a Unix system call. The Windows equivalent is GetCurrentProcessId. Since we need to call the Unix version on MacOSX and Linux, and the Windows API call on Windows XP, we use a Factor design pattern -- the HOOK:.


The word in question is called make-disassemble-cmd. Behold its majesty:


M: pair make-disassemble-cmd
in-file [
"attach " write
unix:getpid number>string print
"disassemble " write
[ number>string write bl ] each
] with-file-out ;

You can see where it's calling unix:getpid. Knowing the Windows API call from above, it's easy to write a version that works on Windows:

M: pair make-disassemble-cmd
in-file [
"attach " write
GetCurrentProcessId number>string print
"disassemble " write
[ number>string write bl ] each
] with-file-out ;

Obviously, this is unacceptable, because now it doesn't work on Unix! If we rename the make-disassemble-cmd word for the new platform, then there are still two copies of the exact same word, and you'll be loading them both on platforms where they shouldn't be loaded. We really just want to rename the one word that changed, so...

Let's make a HOOK:

! in io.launcher
HOOK: current-process-handle io-backend ( -- handle )

! in io.unix.launcher
M: unix-io current-process-handle ( -- handle ) getpid ;

! in
M: windows-io current-process-handle ( -- handle ) GetCurrentProcessId ;

! in tools.disassembler
M: pair make-disassemble-cmd
in-file [
"attach " write
current-process-handle number>string print
"disassemble " write
[ number>string write bl ] each
] with-file-out ;

Now, there is just one word that will do the work on both platforms. The relevant code is only loaded into the image on the correct platform, and the problem is solved without renaming lots of words, without #ifdefs, and without copy/pasting.

Tuesday, February 12, 2008

Text Editor Integration and Directory Traversal

To edit a word in Factor you pass the edit word a defspec, like \ + edit or { float + } edit, or you press CTRL-SHIFT-E on a word. If an editor path is already set, then the word definition should pop up in your editor at the line of the definition. However, if you haven't configured your text editor, a restart will be thrown and you can select it from a list. If you selected gvim, previously it would use a braindead algorithm that makes a list of all paths in \Program Files and then traverses that list looking for the gvim.exe binary. This caused it to hang while doing tons of disk IO.

Now it's better in two ways -- it only looks in \Program Files\vim\ (oops), and secondly you can search with find-file-breadth or find-file-depth. The implementation is pretty simple -- breadth first search uses a dlist, pushes the new traversal results to the back of the list, and traverses the list in order. Depth first also pushes newly found directories to the end of the list, but it pops them from the end.

The next step is to extract out the depth/breadth first algorithm so it's generally useful, but I need to continue working on the database library.

Monday, February 11, 2008

Factor Tax Library

I've been doing my own taxes for a few years, so I decided to start writing some Factor words to help with the calculations. This library does FICA tax, Medicare tax, and federal income tax, along with Minnesota state tax.

To get paid by an employer you fill out a Form W-4 which they keep on file. The three relevant pieces of information are the tax year (2008), the number of allowances, and whether or not you are married.

In Factor this looks like:
TUPLE: w4 year allowances married? ;
C: <w4> w4

To calculate your withholding, or the amount your employer should withhold from your salary every pay period, the library starts with your yearly salary, a Form W-4, and a tax collector entity, such as <federal> or <minnesota> and calls the withholding word.

For the average single male programmer making $70k per year, the calculation looks like this:

( scratchpad ) 70000 2008 3 f <w4> <federal> withholding dup money.
( scratchpad ) biweekly money.

The government's cut of your paycheck isn't too hard to calculate. The employer withholds 6.2% of your paycheck for FICA (social security) and contributes a matching 6.2% that is over and above your salary, for a total of 12.4%. The FICA tax has what's known as a base rate, which is a ceiling on the amount of your salary that this tax is applied to. For 2008, the base rate is $102,000, which means that the FICA tax is not applied above this amount. Additionally, you and your employer pay Medicare tax, which is 1.45% each for a total of 3.9%, and this is applied to all income (no base rate).

Now for the Federal income tax. Take your salary and subtract $3500 times the number of allowances you claimed on your Form W-4. For instance, three allowances (working only one job, claiming self as a dependent, and head of household) will give you a $10,500 allowance, which will leave $59,500 taxable income. (Note: FICA and Medicare tax apply to the entire salary, while Federal income tax applies only to the salary - allowances) There are several equivalent charts for looking up the federal income tax withholding, so this library uses the annual one because it was easiest to encode.

So for the example salary of $70,000 with three allowances, the calculation looks like:
( scratchpad ) 70000 2008 3 f <w4> <federal> federal-tax money.
( scratchpad ) 70000 2008 3 f <w4> fica-tax money.
( scratchpad ) 70000 2008 3 f <w4> <federal> medicare-tax money.
( scratchpad ) { 10699 4340 1015 } sum money.
( scratchpad ) 70000 2008 3 f <w4> <federal> withholding money.

Salaries are taxed at different rates as they increase. For example, the first $2,650 dollars you make are taxed at 0% federal income tax rate. This does not mean you pay no tax at all; FICA and Medicare still apply. From $2,650 to $10,300 the tax rate is 10%. So, if somehow you were making $8,650 per year after deducting allowances, you would be in the second tax bracket and your taxable income would be $6000, taxed at 10%, for a total of $600. Your employer would withhold this amount from your paychecks and pay it to the government quarterly. (Note: the numbers above are for single people, married people have a more advantageous table)

Minnesota has one of the highest state taxes. To calculate this tax, take the same allowances from your Form W-4 and calculate the allowance per the federal income tax, $3,500 per allowance. Again, there are two tables -- single and married. The tax amounts are 5.35%, 7.05%, and 7.85% in the highest bracket.
( scratchpad ) 70000  2008 3 f <w4> <minnesota> withholding money.

So, altogether, the employer withholds your federal and state taxes from the paycheck. You can calculate this:
( scratchpad ) 70000  2008 3 f <w4>  employer-withhold dup money. biweekly money.
( scratchpad ) 70000 2008 3 f <w4> <minnesota> net money. biweekly money.

So if you make $70k in Minnesota, your employer should withhold $759.26 per paycheck, to be split between the federal and state government and paid quarterly (or monthly). You would get to keep $50,259.33 and the employer would give you a Form W-2 with the relevant details. You could the file income tax returns on April 15th, where you could possibly get some of that back from various deductions and refunds.

All values are calculated in rational numbers, so they are exact until the end, where the cents are rounded to the nearest cent. Percentages are in the source as decimal numbers, and are parsed using the DECIMAL: word.

( scratchpad ) DECIMAL: .1 .

Please feel free to submit patches for other states and to help me correct any bugs.

Saturday, February 02, 2008

Parsing HTTP headers in Factor with multi-assocs

The implementation of setting and parsing http headers in Factor has previously used a hashtable with a single key/value pair. However, this is broken because certain fields can be sent twice, e.g. set-cookie. The new implementation is a hashtable with keys/vectors to store multiple values for the same key.

I originally tried to make this obey the assoc protocol so that you could convert from a hashtable of vectors back to any type of assoc (hashtable/alist/AVL tree/etc) but this turned out to be a really bad idea because not only was it not useful, but it breaks the semantics of the assoc protocol if set-at inserts an element instead of sets it.

So the implementation is in assocs.lib as a few helper words:

: insert-at ( value key assoc -- )
[ ?push ] change-at ;

: peek-at* ( key assoc -- obj ? )
at* dup [ >r peek r> ] when ;

: peek-at ( key assoc -- obj )
peek-at* drop ;

: >multi-assoc ( assoc -- new-assoc )
[ 1vector ] assoc-map ;

: multi-assoc-each ( assoc quot -- )
[ with each ] curry assoc-each ; inline

: insert ( value variable -- ) namespace insert-at ;

Of course, set-at and at still set and access the values, but there are a couple new utility words. The insert-at word has the same stack effect as set-at but pushes a value instead of setting it. peek-at will give you the last value set for a given key, and this is the standard way of accessing values when you only care about the last one.

To turn an assoc into a multi-assoc, call >multi-assoc. To iterate over all the key/value pairs, use multi-assoc-each.

The insert word is for use with the make-assoc word, which executes inside a new namespace and outputs the variables you set as a hashtable.

Here's an example of what the headers look like for a website:

( scratchpad ) USE: http.client "" http-get drop .
{ "connection" V{ "close" } }
{ "content-type" V{ "text/html; charset=ISO-8859-1" } }
{ "server" V{ "Server" } }
{ "x-amz-id-2" V{ "L0oid1yo1Z6cuq+VgwWCv0G/UdPov/0v" } }
{ "x-amz-id-1" V{ "15CPXN68HXB35FXE62CX" } }
"skin=noskin; path=/;; expires=Sun, 03-Feb-2008 04:57:59 GMT"
"session-id-time=1202544000l; path=/;; expires=Sat Feb 09 08:00:00 2008 GMT"
"session-id=002-3595241-4867224; path=/;; expires=Sat Feb 09 08:00:00 2008 GMT"
{ "vary" V{ "Accept-Encoding,User-Agent" } }
{ "date" V{ "Sun, 03 Feb 2008 04:57:59 GMT" } }

I have normalized the keys by converting them to all lower case. For some reason, Amazon sends two headers as Set-Cookie and the last one as Set-cookie, which is pretty weird.

Since the prettyprinter outputs valid Factor code, you can copy/paste the above headers into a Factor listener and run some of the multi-assoc words on them.