Here's a solution in Factor:
: human-sort ( seq -- newseq )
[ [ digit? ] cut3 >r string>number r> 3array ] map natural-sort
[ first3 >r number>string r> 3append ] map ;
{ "a100.txt" "a10.txt" } human-sort .
{ "a10.txt" "a100.txt" }
The algorithm is simple: split a filename at the first number, convert that from a string to a number, and let Factor's
natural-sort
do the rest.It maps over the sequence, { "a100.txt" "a10.txt" }, and cuts it at the first number inside each element so, after cut3, you get the result on the stack like "a" "100" ".txt", and "a" "10" ".txt". You dip under the top element and convert the to a number,
>r string>number r>
, and 3array
to make a sequence of sequences { { "a" 100 ".txt" } { "a" 10 ".txt" } }. You then call natural-sort
on this. Natural-sort knows how to compare element by element, so seeing the "a" "a" is equal it moves onto the next comparison, 100 10. Now that it's in the right order, you still need to turn your sequence back into strings. natural-sort returned { { "a" 10 ".txt" } { "a" 100 ".txt" } }, so map (do something to each element in the sequence) over this and do first3 >r number>string r> 3append
, which explodes the array, converts back to a string, and appends three strings together to get the filenames. The dot at the end prints it.I had to write cut3, but it's a generally useful word for partitioning a sequence that I put into sequences.lib.
: cut3 ( seq pred -- head match tail )
2dup find drop [
rot swap cut rot [ not ] compose
dupd find drop [ cut ] [ f ] if*
] [
drop nip f like f f
] if* ;
UPDATE:
I generalized this to work on { "a100b200.txt" "a100b2.txt" } with some suggestions from Slava. If you understood the previous code, this should be pretty understandable too. It keeps calling cut3 until it returns no matches and accumulates the result in a sequence, all the while applying a quotation to whatever matched the predicate. It then compares with
natural-sort
like before, and transforms these sequences back into filenames. cut3
is cleaned up in this version. I ended up with two general words, cut3
and cut-all
that can be put into a library, sequences.lib, and used elsewhere.
: cut-find ( seq pred -- before after )
dupd find drop dup [ cut ] when ;
: cut3 ( seq pred -- first mid last )
[ cut-find ] keep [ not ] compose cut-find ;
: (cut-all) ( seq pred quot -- )
[ >r cut3 r> dip >r >r , r> [ , ] when* r> ] 2keep
pick [ (cut-all) ] [ 3drop ] if ;
: cut-all ( seq pred quot -- seq )
[ (cut-all) ] { } make ;
: human-sort ( seq -- newseq )
[ [ digit? ] [ string>number ] cut-all ] map natural-sort
[ [ dup string? [ number>string ] unless ] map concat ] map ;
{ "a100b2.txt" "a100b200.txt" } human-sort .
{ "a100b2.txt" "a100b200.txt" }
UPDATE TWO:
Here is a much better algorithm that doesn't reconstruct the original filenames and thus can handle leading zeros. It constructs keys that can be sorted by
sort-values
or sort-keys
(sort-values
is just as efficient here and it lets you use dup
in the map instead of a [ ] keep
, which makes the code slightly more legible).
: human-sort ( seq -- newseq )
[ dup [ digit? ] [ string>number ] cut-all ] { } map>assoc
sort-values keys ;
{ "000a.txt" "00a.txt" } human-sort .
{ "00a.txt" "000a.txt" }