Friday, May 29, 2009

A new library to manage client connections, and a proof of concept chat server in Factor

Some thirty seconds into play-testing Joe Groff's terrain demo game, I realized there were no NPCs, no double-barreled shotguns or hand-cannons, and most disturbingly, no other players. Sure, you can fly around, but you can't gib your friends! So I decided to start doing something about it and write a managed server vocabulary with a generic protocol that you can extend to write your own servers. Hey, I'm not much of an artist -- the guns will have to wait.

Managed-server infrastructure

The HTTP server already uses a library called io.servers.connection, which implements a threaded-server with SSL support in 164 lines of code. A threaded-server listens on an open port for up to a set number of clients and handles each one individually; no client needs to know about any other. To keep the code so concise, it uses libraries for concurrency, logging, sockets, and SSL that are themselves reused elsewhere.
Features of the threaded-server vocabulary:
  • the best nonblocking I/O code on every platform (completion ports, kqueue, epoll, not select())
  • connection/error logging, log rotation
  • correct error handling and resource cleanup
  • SSL support on Unix platforms
  • IPv4 and IPv6 support by default
For an HTTP or FTP server, handling each connection individually is what you want. However, for games or chat servers, you really want your users to interact. Building on top of this threaded-server, I made a new tuple called a managed-server that tracks a list of connecting and disconnecting clients. You still get all of the features threaded-server implements, but now there's a new client handler that maintains a list of connected clients keyed by username, plus utility words to send data to all clients.
You can also use this code to make custom binary protocols, and I'm mostly through implementing an SRP6 library to allow secure logins over unencrypted connections after you create an account through an SSL connection. UDP support for first-person shooters and other faster-paced games will be added when someone needs it.

The implementation of managed-server

A managed-server inherits from the threaded-server class and adds a new slot called clients to store connections. Each connection's state -- the input/output streams, the local/remote socket addresses, username, a slot for passing quit messages, and a quit flag -- is wrapped inside a managed-client tuple and stored in the clients hashtable with the username as the key. In this way, it's easy to look up another client's stream and send it a message:
"wally" "hi wally!" send-client
You can also send a message to all connected clients with send-everyone, or to all but yourself with send-everyone-else:
"This one goes out to all the ladies." send-everyone-else
Here's what the tuple class code looks like:
TUPLE: managed-server < threaded-server clients ;

TUPLE: managed-client
    input-stream output-stream local-address remote-address
    username object quit? ;
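As a rough, self-contained illustration of the lookup-and-send idea (not the managed-server implementation itself), the sketch below keeps a hashtable from usernames to output streams and writes to one or all of them. Everything in it is an assumption made for the example: the broadcast-sketch vocabulary, the add-client word, the string writers standing in for socket streams, and these toy definitions of send-client and send-everyone.

```factor
USING: assocs io io.streams.string kernel namespaces prettyprint
sequences strings ;
IN: broadcast-sketch

! Hypothetical stand-in for the server's clients table:
! username -> output stream (string writers instead of sockets).
SYMBOL: clients
H{ } clone clients set-global

: add-client ( name -- )
    <string-writer> swap clients get set-at ;

! Look up one client's stream by username and write a line to it.
: send-client ( name string -- )
    swap clients get at [ print ] with-output-stream* ;

! Write the same line to every stream in the table.
: send-everyone ( string -- )
    clients get values
    [ [ dup print ] with-output-stream* ] each drop ;

"wally" add-client
"dilbert" add-client
"wally" "hi wally!" send-client
"meeting at ten" send-everyone

! Show what each fake client received.
clients get [ >string ] assoc-map .
```

Keying the table by username is what makes the single-recipient case a plain hashtable lookup, while the broadcast case simply walks the values.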

The managed-server protocol

A managed-server has some generics in place to guide you in creating your own servers. The first two generics, handle-login and handle-managed-client*, are required. The others have default methods, so you only implement them if you want to handle those events: handle-client-join and handle-client-disconnect default to no-ops, while the default method for handle-already-logged-in throws an error to prevent a new client from taking over another client's session or logging in multiple times. You can override this behavior with your own perversions. Of course, the clients are still tracked no matter what your methods do on the client-join or client-disconnect generics.
Here's the protocol:

HOOK: handle-login threaded-server ( -- username )
HOOK: handle-managed-client* managed-server ( -- )
HOOK: handle-already-logged-in managed-server ( -- )
HOOK: handle-client-join managed-server ( -- )
HOOK: handle-client-disconnect managed-server ( -- )

The implementation of a chat server using managed-server

Eventually someone will use managed-server for the networking code in a game, but until then I've implemented a simple chat server. Writing the chat server was fun and helped me iron out a couple of bugs, which I write about below.

A walkthrough of the chat server protocol

The chat server code begins by inheriting from the managed-server tuple:
TUPLE: chat-server < managed-server ;
From here you go about implementing the required parts of the protocol, handle-login and handle-managed-client*, so let's start there.
M: chat-server handle-login
    "Username: " write flush
    readln ;
The current input/output streams are bound to the client connection, so calling write sends the client the login prompt. To read back the username, readln reads until a newline is sent. If you were to connect with telnet at this point, you would see the prompt and could send back a username. Then the server would kick you off because there's no implementation of handle-managed-client*.
M: chat-server handle-managed-client*
    readln dup f = [ t client (>>quit?) ] when
    [
        "/" ?head [ handle-command ] [ handle-chat ] if
    ] unless-empty ;
This word handles every message the client sends after the login. Calling readln reads the client's message one line at a time and returns false when the stream closes. In that case the quit flag is set; it is explained later, so for now, suffice it to say that you're quitting if readln returns false. Next, the message is checked for any content -- both false and an empty string can be safely ignored here by the unless-empty combinator. Inside the quotation, the leading slash, if any, is stripped from the input, and the boolean returned by ?head decides whether the message was intended for the server or the chat room.
: handle-command ( string -- )
    dup " " split1 swap >lower commands get at* [
        call( string -- ) drop
    ] [
        2drop "Unknown command: " prepend print flush
    ] if ;
Commands sent to the server are normalized by converting the command name to lower case, which is then looked up in the commands table. If you send a recognized command such as /who or /nick, it gets executed; if not, you get the generic "Unknown command" error.
: handle-chat ( string -- )
    [
        [ username ": " ] dip
    ] "" append-outputs-as send-everyone ;
Sending a message to the chat room is the alternative to server commands. I'm using append-outputs-as here to append together a bunch of strings, although I could easily have used 3append instead. I left this in because it's easier to change the look of the chat if you don't have to keep track of how many strings you're appending and you just let the compiler infer it. Please take note: smart combinators in Factor are analogous to applying a function to a list of parameters in Lisp in that you don't need to know the number of parameters. The following two snippets demonstrate what I mean:
(+ 1 2 10 1200)
[ 1 2 10 1200 ] sum-outputs
That's pretty much the essence of the chat server since everything else was just added for fun.
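To make the 3append versus append-outputs-as trade-off concrete, here is a small standalone comparison that can be pasted into a listener. The strings are made up for the example; the words come from the sequences and combinators.smart vocabularies.

```factor
USING: combinators.smart io prettyprint sequences ;

! Fixed arity: 3append joins exactly three sequences.
"doug" ": " "hello" 3append print    ! prints doug: hello

! Smart combinator: the number of outputs is inferred from the
! quotation, so a fourth string needs no other change.
[ "<" "doug" "> " "hello" ] "" append-outputs-as print    ! prints <doug> hello

! The same idea applied to addition.
[ 1 2 10 1200 ] sum-outputs .    ! prints 1213
```

Changing the chat format from three pieces to four only touches the contents of the quotation, which is the convenience described above.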

Fun with default encodings

Default encodings are terrible! Of course, you can change the encoding of a stream whenever you want, but the encoding for threaded-servers defaulted to ASCII until I changed it this evening. When I made my chat server yesterday, I forgot to set the encoding to what I wanted -- UTF8. Sending a character above 127 caused the server to throw an exception since ASCII is only 7 bits wide, and the sender would get disconnected. The FTP server I wrote started out with this bug as well, before I changed it to latin1. But now that threaded-server takes an encoding on the stack, this bug can never happen again.
So what's wrong with picking a different default encoding, maybe UTF8? Well, if I'm making a binary server, the UTF8 decoder will replace bad bit sequences with replacement characters -- another latent bug! What about binary as the default encoding, i.e. no encoding? Binary is the best option for a default, but then people who need to use UTF8 or latin1 might not know that the stream protocol supports encodings at all, and will end up doing a lot of work by hand which should be handled by the stream implementation. So not having a default encoding 1) prevents latent bugs and 2) forces the programmer to think about what they really want in each situation -- surely a good idea.
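The ASCII failure is easy to reproduce away from any server. This is a minimal sketch using the encode word from io.encodings.string on an in-memory string; the sample text and the message printed on failure are made up for the example.

```factor
USING: continuations io io.encodings.ascii io.encodings.string
io.encodings.utf8 kernel prettyprint ;

! UTF-8 represents the accented character as two bytes.
"café" utf8 encode .    ! prints B{ 99 97 102 195 169 }

! ASCII has no representation for it, so encoding throws;
! recover catches the error so the script keeps running.
[ "café" ascii encode . ]
[ drop "cannot encode as ASCII" print ] recover
```

In the server the same error surfaced while writing to the client's stream, which is why the sender was disconnected.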

Quitting gracefully with quit flag

My first thought was just to throw an exception when I wanted to disconnect a client and cause the error handler to clean up the resources. Hopefully it's common knowledge that control flow implemented with exceptions is inefficient and bad design, in the general case. Maybe just this once? Nope, in my case the logging framework logs all exceptions, so the majority of the logs would be filled up with useless disconnect error messages. Clearly something better was needed -- the quit flag. Managed clients have a quit flag slot that is checked every time the server processes some data. Clients can quit gracefully by setting this flag to true and returning control back to the stream reader loop, and quits caused by exceptions are logged and worthy of further investigation.
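The shape of that loop can be sketched without any networking. The example below is an illustration under assumptions, not the managed-server code: fake-client, current-client, handle-line, and client-loop are hypothetical names, a string reader stands in for the socket, and "/quit" is an invented command. It is written for current Factor, where the slot setter this post spells (>>quit?) is spelled quit?<<.

```factor
USING: accessors io io.streams.string kernel namespaces ;
IN: quit-flag-sketch

! Hypothetical client state holding only the quit flag.
TUPLE: fake-client quit? ;
SYMBOL: current-client

! Read one line. End of stream (f) or "/quit" sets the flag;
! anything else is echoed.
: handle-line ( -- )
    readln dup [ "/quit" = ] [ t ] if*
    [ drop t current-client get quit?<< ] [ print ] if ;

! Keep handling input until the flag is set. No exception is
! thrown on a normal quit, so nothing extra reaches the logs.
: client-loop ( -- )
    [ handle-line current-client get quit?>> not ] loop ;

! Prints hello and world, then stops before "ignored".
"hello\nworld\n/quit\nignored\n" [
    fake-client new current-client [ client-loop ] with-variable
] with-string-reader
```

Because the flag is only checked between messages, a handler can finish whatever it is writing before the loop exits.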

Live coding on the server

After the chat server was up and running, I could add features without restarting. One of the first requested features was "/help", which required a redesign of how slash commands were handled. Instead of a case statement, now there's a word add-command that takes the implementation, the documentation, and the name of the command you want to add. Adding a command stores the code and the docs in symbols holding hashtables, indexed by the name of the command.
SYMBOL: commands
commands [ H{ } clone ] initialize

SYMBOL: chat-docs
chat-docs [ H{ } clone ] initialize

:: add-command ( quot docs key -- )
    quot key commands get set-at
    docs key chat-docs get set-at ;
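The post doesn't show the "/help" command itself, so here is one way it could be built on these two tables. This is a guess at the shape rather than the actual chat server code: handle-help and the help-sketch vocabulary are hypothetical, the definitions above are repeated so the example stands alone, and a plain string replaces the multiline string syntax.

```factor
USING: assocs io kernel locals namespaces sequences ;
IN: help-sketch

SYMBOL: commands
commands [ H{ } clone ] initialize

SYMBOL: chat-docs
chat-docs [ H{ } clone ] initialize

:: add-command ( quot docs key -- )
    quot key commands get set-at
    docs key chat-docs get set-at ;

! Hypothetical /help: with no argument, list the command names;
! with an argument, print that command's documentation.
: handle-help ( string -- )
    [
        commands get keys ", " join print
    ] [
        dup chat-docs get at
        [ nip print ] [ "Unknown command: " prepend print ] if*
    ] if-empty ;

[ handle-help ] "Syntax: /help [command]" "help" add-command

"" handle-help              ! prints help
"help" handle-help          ! prints Syntax: /help [command]
"frobnicate" handle-help    ! prints Unknown command: frobnicate
```

Since both tables are ordinary hashtables held in symbols, a command added while the server is running shows up in the listing the next time someone asks for help.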
I added a time command for fun:
[ drop gmt timestamp>rfc822 print flush ]
<" Syntax: /time
Returns the current GMT time."> "time" add-command
Someone else wanted a "/who" command -- easy enough.
[ drop clients keys [ "``" "''" surround ] map ", " join print flush ]
<" Syntax: /who
Shows the list of connected users.">
"who" add-command
The last feature I implemented was a way to change your nickname without reconnecting:
: handle-nick ( string -- )
    [
        "nick" usage
    ] [
        dup clients key? [
            username-taken-string print flush
        ] [
            [ username swap warn-name-changed ]
            [ username clients rename-at ]
            [ client (>>username) ] tri
        ] if
    ] if-empty ;

[ handle-nick ]
<" Syntax: /nick nickname
Changes your nickname.">
"nick" add-command
Changing your nickname is straightforward but takes the most steps of all the commands I implemented. Try to understand the code -- "string" in the stack effect is the requested nickname, and the clients word returns a hashtable of connections indexed by nickname. If the user didn't supply a nickname, remind them of the syntax for using /nick. If they did supply a nickname, check whether it's in use and, if so, refuse to change their name. Otherwise, the nick change succeeds, so tell all the users about the nickname change, apply the change in the clients hashtable, and set the new nickname in the client.

Chat server running live

You can try out the chat server by downloading Factor and running this command:
USING: io.servers.connection ; 8889 <chat-server> start-server
Or you can connect to my running chat server:
telnet 8889
It's just a demo and I didn't implement any limits on your nickname or what you can send, though it would be easy enough to do so. Have fun, and please let me know if you can find any bugs.


randy7 said...

Hi Doug, very clear and very cool.

There is one thing, if you care to explain:
M: chat-server handle-managed-client*
readln dup f = [ t client (>>quit?) ] when

I am guessing that client is a symbol that contains the user tuple, bound to the client's connection?

Can you explain this concept of binding: how to use it and how it works? Thanks

Doug Coleman said...

Since threaded-server binds a lot of variables to your local namespace, I used this approach and made some helper words in managed-server:

: server ( -- managed-server ) managed-server get ;
: client ( -- managed-client ) managed-client get ;
: clients ( -- assoc ) server clients>> ;
: client-streams ( -- assoc ) clients values ;
: username ( -- string ) client username>> ;
: everyone-else ( -- assoc )
clients [ drop username = not ] assoc-filter ;
: everyone-else-streams ( -- assoc ) everyone-else values ;

You can see which symbols are set by the server: managed-server and managed-client. It really helps readability; the code would look too long and repetitive without these helper words. So the snippet of code you posted sets the quit slot on the client in the current scope when readln returns f, meaning end of stream. The current scope holds the client you're talking to and the server, and the other clients are accessed through the server symbol -- this is done by managed-server and works really well. Another approach would be to pass around a tuple with all the state, but this makes stack shuffling harder.

randy7 said...

Yes, indeed it is very readable.
I haven't (yet) used this technique - I believe it can simplify and give an easier approach when programming. Still, I'm wondering how it works; that is, how does the chat server know not to confuse the different clients?
I browsed the managed-server code in git, but didn't see where the binding is happening. Can you point me to that line, please?
I think these parts are needed for my and other users' understanding in order to create new servers.
Again that was a very nice explanation, and I love these great abstractions, they make new things accelerate based on them. I like it when the code is so simple, and the promise of bugless code is again shown :)

Doug Coleman said...

Lines 69, 125 of io.servers.connection set the socket addresses and the threaded-server.
Lines 75, 82 in managed-server set the managed-client/server.

Each client gets its own scope because the with-stream combinator (called in handle-client in io.servers.connection) creates a new namespace (a new hashtable on the namestack).

darrint said...

Can you elaborate a bit on what you meant by "live coding on the server"? Was it that you changed the case statement to something more dynamic without restarting? Was your listener ui running on a different machine from your server? Can I have a pony?

Doug Coleman said...

By live coding on the server I meant that I could add features without restarting Factor or disconnecting anyone. I had Factor running in a terminal with the chat server running in the in-thread combinator (so I could still type), and another terminal was connected so I could chat/test. Restructuring the server command handling forced me to restart because I got the code wrong at first, but after adding /who and /time you could do a /help and see the new commands.

So the workflow was: run the server in-thread, edit files, refresh-all, and test without restarting. Sorry I didn't make this more clear in the blog post -- blogging about code with Blogger really sucks because I get bogged down formatting everything for whitespace and escaping html elements instead of writing actual content.

darrint said...

I'm really glad to see you meant live coding the way I hoped you meant it.

I'm hoping for a day where there's a Firefox based ui listener that I can hook up to running factor instances on my servers. (Maybe with some ssh tunneling or whatever.)

Anyway, very cool. Glad you wrote that up.
