# The Awesome Factor

## Thursday, December 13, 2007

### Sorting Filenames Containing Numbers

Jeff Atwood articulated a problem I usually run into when looking at files in Explorer. If you have a bunch of files with numbers, file "a100.txt" will come before file "a2.txt" when sorted by name. Of course, two is less than 100, so you'd expect "a2.txt" to come first.

Here's a solution in Factor:
`: human-sort ( seq -- newseq )    [ [ digit? ] cut3 >r string>number r> 3array ] map natural-sort    [ first3 >r number>string r> 3append ] map ;{ "a100.txt" "a10.txt" } human-sort .    { "a10.txt" "a100.txt" }`

The algorithm is simple: split a filename at the first number, convert that from a string to a number, and let Factor's `natural-sort` do the rest.

It maps over the sequence, { "a100.txt" "a10.txt" }, and cuts it at the first number inside each element so, after cut3, you get the result on the stack like "a" "100" ".txt", and "a" "10" ".txt". You dip under the top element and convert the to a number, `>r string>number r>`, and `3array` to make a sequence of sequences { { "a" 100 ".txt" } { "a" 10 ".txt" } }. You then call `natural-sort` on this. Natural-sort knows how to compare element by element, so seeing the "a" "a" is equal it moves onto the next comparison, 100 10. Now that it's in the right order, you still need to turn your sequence back into strings. natural-sort returned { { "a" 10 ".txt" } { "a" 100 ".txt" } }, so map (do something to each element in the sequence) over this and do `first3 >r number>string r> 3append`, which explodes the array, converts back to a string, and appends three strings together to get the filenames. The dot at the end prints it.

I had to write cut3, but it's a generally useful word for partitioning a sequence that I put into sequences.lib.

`: cut3 ( seq pred -- head match tail )    2dup find drop [        rot swap cut rot [ not ] compose        dupd find drop [ cut ] [ f ] if*    ] [        drop nip f like f f    ] if* ;  `

UPDATE:

I generalized this to work on { "a100b200.txt" "a100b2.txt" } with some suggestions from Slava. If you understood the previous code, this should be pretty understandable too. It keeps calling cut3 until it returns no matches and accumulates the result in a sequence, all the while applying a quotation to whatever matched the predicate. It then compares with `natural-sort` like before, and transforms these sequences back into filenames. `cut3` is cleaned up in this version. I ended up with two general words, `cut3` and `cut-all` that can be put into a library, sequences.lib, and used elsewhere.

`: cut-find ( seq pred -- before after )    dupd find drop dup [ cut ] when ;: cut3 ( seq pred -- first mid last )    [ cut-find ] keep [ not ] compose cut-find ;    : (cut-all) ( seq pred quot -- )    [ >r cut3 r> dip >r >r , r> [ , ] when* r> ] 2keep    pick [ (cut-all) ] [ 3drop ] if ;: cut-all ( seq pred quot -- seq )    [ (cut-all) ] { } make ;: human-sort ( seq -- newseq )    [ [ digit? ] [ string>number ] cut-all ] map natural-sort    [ [ dup string? [ number>string ] unless ] map concat ] map ;{ "a100b2.txt" "a100b200.txt" } human-sort .    { "a100b2.txt" "a100b200.txt" }`

UPDATE TWO:

Here is a much better algorithm that doesn't reconstruct the original filenames and thus can handle leading zeros. It constructs keys that can be sorted by `sort-values` or `sort-keys` (`sort-values` is just as efficient here and it lets you use `dup` in the map instead of a `[ ] keep`, which makes the code slightly more legible).
`: human-sort ( seq -- newseq )    [ dup [ digit? ] [ string>number ] cut-all ] { } map>assoc    sort-values keys ;{ "000a.txt" "00a.txt" } human-sort .    { "00a.txt" "000a.txt" }`

Robert Reiz said...

Thank you for the post. That's exactly what I am looking for. But I am new to ruby and I am not understanding your code. Could you please post an example how to use it? A method with an parameter and a return value?

Robert Reiz said...

Thank you for the post. That's exactly what I am looking for. But I am new to ruby and I am not understanding your code. Could you please post an example how to use it? A method with an parameter and a return value?

Robert Reiz said...

OK. I solved my problem now. Here is a pretty good solution in Ruby: https://github.com/reiz/naturalsorter
At least for me it's working fine ;-)