SNAKELETS - Python Web Application Server

Benchmark: how fast is Snakelets?

Snakelets 1.40 Benchmark results
(updated with new server and new Snakelets version)

Test setup:
server machine: an AMD Athlon XP 2400+, 1 GB RAM, Mandrake Linux 10.1, Snakelets 1.40 without plugins and with loglevels set to WARNING, Python 2.4.1, 3Com 100Mbit network interface
client machine: Intel Pentium 4 3.40 GHz, 1 GB RAM, Windows XP (sp2), onboard Broadcom 100Mbit network interface, using ab.exe ApacheBench, Version 2.0.40-dev <$Revision: 1.121.2.4 $> (Do not use another version, I observed weird bugs in other versions)
Both computers are connected by a switched 100Mbit LAN using a Gigabyte-brand router.

Serial test, no concurrent requests: ab.exe -n 1000 -c 1 <testurl>
Concurrent test, 10 concurrent requests: ab.exe -n 1000 -c 10 <testurl>
Concurrent test, 60 concurrent requests: ab.exe -n 1000 -c 60 <testurl>
More than 60 concurrent requests caused ab.exe to crash on Windows XP.
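
To repeat the three runs without retyping them, the command lines above could be generated by a small helper. This is only a sketch: the function name, the default executable name and the test URL are assumptions made for illustration; the -n, -c and -i options are the ones used on this page.

```python
def ab_command(url, requests=1000, concurrency=1, head_only=False,
               executable="ab.exe"):
    """Build the argument list for one ApacheBench run."""
    argv = [executable, "-n", str(requests), "-c", str(concurrency)]
    if head_only:
        argv.append("-i")  # ask ab to send HEAD instead of GET
    argv.append(url)
    return argv


# Hypothetical test URL; replace it with the page you want to measure.
TEST_URL = "http://server.example/test.html"

for level in (1, 10, 60):
    print(" ".join(ab_command(TEST_URL, concurrency=level)))
```

Each list can be passed to subprocess.run() as-is if ab is installed on the client machine.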

Serving static HTML file, 41148 bytes
Without sendfile(2) support          Serial     Concurrent (10)   Concurrent (60)
Requests per second                  64.06      64.06             64.06
Time per request (msec) (*)          15.609     15.609            15.609
Transfer rate Kbyte/sec              2589.28    2589.28           2589.28

With sendfile(2) support             Serial     Concurrent (10)   Concurrent (60)
Requests per second                  64.00      64.06             64.06
Time per request (msec) (*)          15.625     15.609            15.609
Transfer rate Kbyte/sec              2586.69    2589.28           2589.28

(*) mean, across all conc. req's

You can see that Snakelets on my modest server machine is capable of filling about 21% (~2.5 MB/sec) of the maximum bandwidth possible on the 100 Mbit LAN, using a file of about 40 KB. Bigger files yield better bandwidth efficiency. When using a 400 KB test file, the transfer rate went up to 8.7 MB/sec.
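
As a sanity check on the bandwidth figure, the transfer rate reported by ab can be converted into a share of the link. This is a rough sketch under two assumptions: that ab's "Kbyte" means 1024 bytes, and that the link carries its nominal 100 Mbit/sec with no protocol overhead. The function name is made up for illustration.

```python
def link_utilization(kbytes_per_sec, link_mbit=100.0):
    """Fraction of the nominal link capacity used by a given transfer rate."""
    bits_per_sec = kbytes_per_sec * 1024 * 8   # assumes 1 Kbyte = 1024 bytes
    return bits_per_sec / (link_mbit * 1_000_000)


# Measured rate for the 41148 byte static file, taken from the table above.
share = link_utilization(2589.28)
print("%.1f%% of a 100 Mbit link" % (share * 100))
```

With the measured 2589.28 Kbyte/sec this works out to roughly a fifth of the nominal capacity.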

The use of sendfile(2) didn't have any influence on the transfer speed.

It is also interesting to note that the level of concurrency makes no difference to the performance (at least up to 60 concurrent requests; I could not test more because ab.exe crashed with more).

To eliminate the effect of transferring the HTML data over the network, and so to see roughly the maximum number of requests that Snakelets can process per second, I also tried a HEAD request instead of actually downloading the HTML file (ab.exe option -i). The results are:
Requests per second: 63.87 [#/sec] (mean); Time per request: 15.656 [ms] (mean, across all concurrent requests).
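
The two numbers ab reports here are two views of the same measurement: the mean time per request across all concurrent requests is the reciprocal of the requests per second. A minimal sketch of that relation (the function name is chosen for illustration only):

```python
def requests_per_second(ms_per_request):
    """Convert ab's mean time per request (msec) into requests per second."""
    return 1000.0 / ms_per_request


# Values taken from the results on this page.
for ms in (15.609, 15.625, 15.656):
    print("%.3f ms -> %.2f requests/sec" % (ms, requests_per_second(ms)))
```

This is why the requests per second and time per request rows always move together.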

Running the benchmark without the network (on the same machine that Snakelets is running on) gives a rather different picture. Transferring the file with concurrency yields 451 requests/sec with 18.6 MB/sec throughput. With 60 concurrent connections it is 385 requests/sec with 15.9 MB/sec throughput. The number of requests per second using only HEAD jumps up to 539, with a time per request of 1.85 ms.
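
The local throughput figures can be cross-checked from the request rate and the file size. The sketch below counts only the 41148 byte body (no HTTP headers) and uses decimal megabytes; both choices are assumptions, so the result is a little lower than what ab reports.

```python
STATIC_FILE_BYTES = 41148  # size of the static HTML test file


def payload_mb_per_sec(requests_per_sec, body_bytes=STATIC_FILE_BYTES):
    """Body bytes delivered per second, in decimal megabytes."""
    return requests_per_sec * body_bytes / 1_000_000.0


# Request rates measured on the server machine itself.
for rate in (451, 385):
    print("%d requests/sec -> %.2f MB/sec" % (rate, payload_mb_per_sec(rate)))
```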

Serving simple .y page, 39107 bytes

The Ypage that was used uses character encodings, and contains a simple loop of 10 iterations that generates 39107 output bytes.

                                Serial      Concurrent (10)   Concurrent (60)   Concurrent (60) local
Requests per second             63.75       64.06             64.06             620
Time per request (msec) (*)     15.688      15.609            15.609            1.61
Transfer rate Kbyte/sec         2449.34     2461.60           2461.60           524
Cpu usage (user/idle)           16% / 77%   16% / 77%         16% / 77%         68% / 0%

(*) mean, across all conc. req's

Because Ypages are dynamic pages with embedded scripts, some more processing (CPU usage %) is needed to send a response. But this depends heavily on what your Ypage does; if it does more complex processing, performance goes down of course. See below for the CPU usage of a static file, where no such processing is required.

Notice that the requests per second are almost the same as with the static HTML page. I don't know why that is; it seems that my network or router is the bottleneck here.

CPU usage of the python process when getting the static HTML file of 41148 bytes

This has been observed by using 50000 requests and running 'top' on the server machine.

Without sendfile(2) support
Serial:            user=10%, system=3%, idle=83%
Concurrent (10):   user=10%, system=3%, idle=83%
Concurrent (60):   user=10%, system=3%, idle=83%

Snakelets can keep up nicely.

With sendfile(2) support
Serial:            user=9%, system=1.5%, idle=87%
Concurrent (10):   user=9%, system=2%, idle=86%
Concurrent (60):   user=9%, system=2%, idle=84%

When the sendfile(2) system call extension is available, the picture is a bit different, but only slightly.

I'm looking for a better benchmark tool to run under Windows XP... one that can handle way more than 60 connections...

Considerations

Snakelets spawns a thread per request. Because of Python's GIL (Global Interpreter Lock), those threads may run less concurrently than you would expect. But while threads are busy doing I/O, other requests can be processed. This works better when the sendfile(2) system call extension is available (get it from http://www.snakefarm.org).
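
To illustrate what "a thread per request" means, here is a minimal sketch built on the standard library of a current Python 3. It is not Snakelets code and says nothing about how Snakelets is implemented internally; the handler class and the helper function are invented for this example. Each response body is the name of the thread that handled the request.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class ThreadNameHandler(BaseHTTPRequestHandler):
    """Replies with the name of the thread that is serving the request."""

    def do_GET(self):
        body = threading.current_thread().name.encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_thread_names(count):
    """Start a thread-per-request server, send `count` GETs, return the replies."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), ThreadNameHandler)  # port 0: any free port
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    names = []
    try:
        for _ in range(count):
            # A new socket connection for every request.
            conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
            conn.request("GET", "/")
            names.append(conn.getresponse().read().decode("ascii"))
            conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return names


if __name__ == "__main__":
    for name in fetch_thread_names(3):
        print(name)
```

Every request is answered by a freshly started worker thread. While one of them waits on socket I/O the GIL is released, so another thread can run.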

Furthermore, Snakelets does not yet support HTTP keep-alive requests or HTTP pipelining. This means that a new socket connection must be made for every HTTP request. This slows things down considerably if you're doing many small requests.

It also seems that my network is somehow limiting the number of connections per second to about 65. I don't know why this is. Perhaps my cheap-ass router is at fault here :-)