Monkey Project joins Google Summer of Code 2013

Thanks, Google, for the opportunity! This week the results for the mentoring organizations for Google Summer of Code 2013 were released. I am very happy to announce that the Monkey Project will be part of the program for a second consecutive year.

We are a very small group of people developing an open source stack for web technologies, and taking part in the program means a big recognition. We are glad to be able to mentor students who will be hacking with us during three months.

Our experience last year was very positive: most projects were developed successfully, and those students have taken an important role in our community and in project development in general, so we have high expectations for this second round. As you can see, there is a good range of project ideas to develop in areas such as protocols (SPDY), OS X, Raspberry Pi, web services, etc.

You can get more details about the upcoming work on our Monkey GSoC project ideas page.

Architecture of a Linux-based Web Server

This post is all about the Monkey HTTP server. I consider it very important that, besides opening our source code, we developers also be able to describe the project internals: there is no black magic, just thousands of hours of effort in programming, testing and improvement.

Introduction

Monkey is an open source project started in 2001 with the goal of learning C; the long story is here. Over the years, the code has been improved in many aspects, from nomenclature to heavy architecture changes, all made for good. Nowadays, thanks to the community of core developers and contributors around the project, Monkey is one of the top-performing web servers around, and I would claim the best option for Embedded Linux.

Understanding the basics of a human readable protocol: HTTP

The Hypertext Transfer Protocol is basically a language with a simple grammar used to communicate between two components: an HTTP client and an HTTP server. In a common scenario, the communication starts with a client performing a request to the server, and the server replying with some result for that request. The result can be a status response plus content, or simply an error.

Each HTTP request performed by the client is composed of a request method, a URI, the protocol version and, optionally, a set of headers. Given that, we can say a server must take care of the following:

  • Listen for new connections
  • Accept connections
  • Once a connection is accepted, start reading the HTTP request sent by the client
  • Parse the HTTP request and understand what the client wants
  • Depending on the request type, the server can: serve some content, close the connection because of an exception, proxy the request to somebody else, etc.
  • Close the connection or keep it open waiting for more requests; this depends on the protocol version and the client's HTTP headers.

Depending on the server's target, it can be implemented in many ways with different architecture strategies, so the architecture described in this post only aims to describe what has worked best for us in terms of high performance and low resource usage.

Architecture design facts

  • Monkey is a web server designed with a strong focus on Linux. It does not aim to be portable across other operating systems; focusing on the most widely used mainstream operating system lets us put our energy and effort in one place, in the best way, and of course take the most advantage of the Linux kernel to achieve high performance.
  • Event driven: also known as asynchronous, an event-driven web server aims to use non-blocking system calls to perform its work, reducing the computing time spent in the user-space context. For example, when sending file content to a client, we do not block the whole process or thread while sending the data; instead, we instruct the kernel through a system call to send N bytes from the file and to notify us when it is possible to send more; in the meantime, we process other connections and send other pending data.
  • Embedded friendly: our embedded context is Embedded Linux; we care a lot about resource consumption, which means not using more than 2.5 MB of memory under heavy load. The Monkey binary itself is around 80 KB; once loaded in memory it takes around 350 KB, and depending on the load, more resources may be needed.
  • Small core, flexible API: Monkey implements a basic core to handle the HTTP protocol and exposes a flexible API through the plugin interface, where it is possible to hook plugins for the transport layer, security, request types and event handlers.

Contexts

In Monkey, we have defined two contexts of work: the process context and the thread context. The process context represents the main process waiting for incoming connections and the scheduler balancing new connections across the worker threads. The thread context belongs to each thread working on the active connections:

The number of workers is defined in the configuration; it scales well on single and multi-core CPUs. There is no need to set thread affinity through CPU masks: the Linux kernel scheduler is smart enough to assign CPU time to each worker; by default all workers are assigned to all CPUs.

From a system administrator's point of view, it is possible to assign each worker to a different set of CPUs, but this approach is not suggested unless you are totally aware of what the Linux scheduler does in terms of interrupts, context switches and CPU time for kernel- and user-space applications. Do it only if you can do it better than the running scheduler.

Scheduler

Before entering the server loop, the scheduler launches and initializes each worker, taking care of setting up the initial data structures and the interfaces for the interaction between the components mentioned; this stage involves the creation of one epoll(7) queue per worker. It is good to mention that each epoll(7) queue created through epoll_create(2) is managed through a specific file descriptor.

Once the workers are up and running, the scheduler's next job is to manage the incoming connections. For each new connection accepted, it determines which worker has the lowest load and assigns the connection to it. The chosen worker is the one with the fewest connections in its epoll(7) interface, so the scheduler goes through the worker counters and picks one. At this specific point the scheduler has two file descriptors: the connection file descriptor returned by accept(2) and the file descriptor representing the epoll(7) queue of the chosen worker. It basically registers the new file descriptor in the proper epoll(7) queue.

Workers

Each worker, or thread, runs an infinite loop around the epoll(7) interface, which is basically a Linux-specific polling mechanism to register, enqueue and notify about events on the file descriptors registered by the scheduler (sockets in this case).

The worker stays in a loop waiting for events in the epoll_wait(2) system call. Every time the scheduler registers a new file descriptor, an event is reported in the worker's epoll(7) interface, and the same happens for subsequent events such as “there is data available for read” (EPOLLIN), “now you can write to the socket” (EPOLLOUT), “connection closed” (EPOLLHUP), etc.

So for each event triggered, the worker keeps a status for the connection to determine whether it is a new connection, is receiving the HTTP request, has completed the HTTP request, is parsing the request or is sending out some response. Besides events, at a fixed interval of seconds set in the configuration, it checks for connections that have timed out due to an incomplete request or another anomaly.

Plugins Architecture

Monkey defines four categories of API where plugins can hook: Context, Events, Stages and Networking.

Context
Defines callbacks that can be invoked when the server is starting up; it covers the process and thread contexts described earlier.

Events
For every type of event reported in a worker loop, a plugin can implement a hook to perform specific actions:

Stages
Every new connection enters a stage state; on each step of the HTTP cycle it passes through different phases, and each plugin can hook into a specific one:

Networking
Monkey is not aware of networking; it intentionally depends on a plugin that provides the transport layer. This approach allows switching from plain socket communication to an encrypted one such as SSL in an easy manner. The networking plugin only needs to provide the required API functions for the communication:

Scaling up

Every time a connection has performed a successful request, it is allocated in a global list in the worker scope (implemented through a pthread key). For each event reported, the worker needs to look up the internal data associated with it, so the file descriptor or socket number acts like a primary key for the search. The data-structure solution implemented for Monkey v1.2 is a red-black tree. This algorithm has shown to behave very fairly and to scale well when handling thousands of active connections per worker, maintaining a good balance between performance and cost.

The cost of each file descriptor lookup is critical for server performance: an O(n) solution will work fine for a few connections, but under high concurrency an O(log n) solution will end up providing the best performance.

Memory Management

One of the keys to reducing overhead in a server is to reduce as much as possible the memory allocation requests made to the system within the main loop. The current Monkey implementation performs only one memory allocation per new connection; if needed, because the incoming request posts too much data, it allocates more memory on demand. Other web server solutions implement caching mechanisms to reduce memory allocations even further; as our focus is Embedded Linux, we aim for speed at low resource usage, and implementing a caching mechanism would increase our costs. So we dropped that common approach in order not to abuse system memory: just a decision based on the target.

Linux Kernel system calls

The Linux kernel exposes a useful, non-portable set of system calls to achieve high performance when creating networking applications. The first one is epoll(7); as described earlier, this interface allows watching a set of file descriptors for certain defined events. Similar solutions such as select(2) or poll(2) do not perform as well as epoll(7) does.

When sending a static file, the old-fashioned way is to open the file, get its file descriptor and perform multiple read(2)/write(2) calls to write out the file content. This operation requires the kernel to copy data back and forth between kernel and user space, which obviously generates overhead. As a solution, the Linux kernel implements a zero-copy strategy through the sendfile(2) system call. This system call does not copy data to user space; instead, it sends the data directly to the other file descriptor, achieving good performance and reducing the latency of the old-fashioned approach.

In our architecture, the Logger plugin requires transferring data through a pipe(2) (a unidirectional data channel that can be used for interprocess communication). A common mechanism is to use read(2) and write(2) on each end, but in a similar way to sendfile(2), a newer system call serves this kind of situation: splice(2). This system call moves data from one point to another without the data-copy overhead. The main difference between sendfile(2) and splice(2) is that splice(2) requires one end to be a pipe.

In my previous post, I mentioned how to use the new Linux kernel feature called TCP_FASTOPEN. While it is very simple to implement, it requires the cooperation of both sides: the client and the server. If you have full control of your networking application (client and server), consider using TCP_FASTOPEN; it will increase performance by reducing the TCP handshake round trip.

Monkey Plugins

Based on the architecture and API described, the following plugins are distributed as part of the core:

Liana: basic sockets connectivity layer

PolarSSL: provides a transport layer based on SSL

Cheetah: plugin that provides a command-line interface to query the internals of a running server through a Unix socket

Mandril: security layer that aims to restrict access by URI strings or subnetworks.

Dirlisting: directory listing

Logger: log writer

CGI: old fashioned CGI interface

FastCGI: provides FastCGI support

 

Bonus track: Full HTTP Stack for web services implementation

Besides being a common web server that serves static or dynamic content, Monkey is a full stack for the development of web applications. In order to provide an easy API for web application or web service development, we have created Duda I/O, an event-driven C framework for rapid development based on the Monkey stack.

Duda implements a core API of pseudo-objects and provides extra features through a package system, everything in a friendly C API. The most relevant features supported at the moment are WebSocket, JSON, SQLite3, Redis, Base64 and SHA1.

Due to its high-performance nature and the open source ecosystem around it, Duda is being used in production from Embedded Linux products to Big Data solutions. The license of Duda allows creating closed-source services or applications and linking them to the Duda I/O stack at zero cost.

For more details please refer to Duda I/O main site.

The Monkey organization believes in open source and is fully committed to creating the best networking technology for different needs. If you are interested in participating as a contributor or testing our stack, feel free to reach us on our mailing lists or in the IRC channel #monkey at irc.freenode.net.

Linux TCP FASTOPEN in your sockets

A few years ago the concept of TCP_FASTOPEN (TFO) was introduced as a solution to improve the performance of TCP connections by reducing one round trip of the handshake process. The first operating system to implement TFO is Linux, and good improvements have been demonstrated when it is used on a common network.

The implementation in the Linux kernel has been done in parts: Linux kernel 3.6.1 was the first to implement the client-side requirements, and Linux kernel 3.7 implements the server-side socket behavior.

Client side

In a common TCP client flow, the following calls take place:

/* create socket file descriptor */
fd = socket(domain, type, protocol);

/* connect to the target server/port */
connect(fd,...);

/* send some data */
send(fd, buf, size);

/* wait for some reply and read into a local buffer */ 
while ((bytes  = recv(fd, ...))) {
    ...
}

When using TCP_FASTOPEN the behavior is a little different. You no longer need to use connect(2); instead you use sendto(2), which also gives you the opportunity to let the kernel buffer some initial outgoing data. In short, the sendto(2) call acts like an implicit connect(2) and send(2)/write(2) at the same time:

/* create the socket */
fd = socket();

/* connect and send out some data */
sendto(fd, buffer, buf_len, MSG_FASTOPEN, ...);

/* write more data */
send(fd, buf, size);

/* wait for some reply and read into a local buffer */ 
while ((bytes  = recv(fd, ...))) {
    ...
}

Server side

A common (old-fashioned) TCP server is created with the following calls:

/* create the socket */
fd = socket();

/* bind the socket to an address and port */
bind(fd, addr, addrlen);

/* this socket will listen for incoming connections */
listen(fd, backlog);

Adding TCP_FASTOPEN support to the server-side code is very easy; the required changes are minimal: you only need to set a new socket option between bind(2) and listen(2):

/* a hint value for the Kernel */
int qlen = 5;

/* create the socket */
fd = socket();

/* bind the address */
bind(fd, addr, addrlen);

/* change the socket option to TCP_FASTOPEN */
setsockopt(fd, SOL_TCP, TCP_FASTOPEN, &qlen, sizeof(qlen));

/* this socket will listen for incoming connections */
listen(fd, backlog);

Required macros

Even if you are running the latest Linux kernel 3.8, you may face some problems, as in most cases the required macro values for TCP_FASTOPEN and MSG_FASTOPEN will not be available at compile time. As a workaround you can include the following code in one of your header files:

/* 
 * A generic protection in case you include this 
 * from multiple files 
 */
#ifndef _KERNEL_FASTOPEN
#define _KERNEL_FASTOPEN

/* conditional define for TCP_FASTOPEN */
#ifndef TCP_FASTOPEN
#define TCP_FASTOPEN   23
#endif

/* conditional define for MSG_FASTOPEN */
#ifndef MSG_FASTOPEN
#define MSG_FASTOPEN   0x20000000
#endif

#endif

Enabling TCP_FASTOPEN in your Kernel

By default the TCP_FASTOPEN feature is not enabled at runtime (unless you instructed that in the sysctl.conf file). Before testing this new feature, make sure it is enabled with the following command:

# echo 1 > /proc/sys/net/ipv4/tcp_fastopen

If you try to use a client with TCP_FASTOPEN enabled, it is mandatory that the server have this same option set on the listener socket; otherwise the client will fail at the connection phase (due to protocol mismatch).

For a TCP_FASTOPEN server, it does not matter whether the client uses the new protocol or not; it will work anyway. So if you develop a TCP server, you can give it a try by adding one simple system call for this feature. If your project is open source, feel free to use the header macro example provided above.

By the way, of course the Monkey web server has added this feature recently in our development repository.


GSoC Mentors Summit 2012 at Google Plex

As members of the Monkey Project, which joined as a mentoring organization for the Google Summer of Code 2012 program, we were invited to attend the Mentors Summit conference at the Googleplex in California, US. Two members of our community flew to Google to represent the organization: Felipe Reyes and I.

Being part of a project selected for GSoC 2012 is really exciting, because of the recognition as a solid open source project and the opportunity to mentor three students around the world, instructing them about collaboration and core development in our project. It was hard work, and at the end the other exciting part began: the Mentors Summit.

The event took place in California; it started on Friday the 19th at the Wild Palms hotel with an open dinner around the pool and free beer for everyone. No formal things, just eating together and meeting the great people behind each project :)

On Saturday the 20th, we went to the famous Googleplex to have breakfast before the event started, and I cannot omit to mention that it is TRUE: Google has great free food services for everyone. I was amazed by the great details they have for their employees; the place is well designed with a lot of colors around, and the general campus is pretty friendly.

We ran into different (and parallel) sessions with topics about GSoC itself and technical things about each project. It was an unconference, so people proposed their own topics on a board with a flexible schedule. Honestly, I was not prepared to give a talk, as I was not aware that we could propose technical sessions... but well, we signed up for a talk about Monkey. We mostly talked about project internals and Duda web services; it went pretty well, interesting discussions about the project were raised, and new horizons may come…

I attended some technical and GSoC sessions; it was a really good opportunity for Google and the mentoring organizations to discuss ways to improve the program. I am impressed by how the Open Source department is committed to helping organizations grow and create networks with other projects. It is difficult work, but the synergy among the people involved in this program makes things easier; everybody was open to contribute. I would say that GSoC, more than a program, is a real community, an open program itself.

What a surprise: I finally met the good guys from the Open Source Lab, who have been working with us for about two years; they provide and maintain our hosting infrastructure, thanks! On the right, James Lopeman from Kernel.org.

OSL at left, Kernel.org at right

 

At night, the social activities continued in the hotel with another dinner around the pool and free drinks: having fun, sharing different technical interests, and more and more…

Summary

A great event. I went without clear expectations, and it was something totally great; I have attended many conferences in the past, and I have to say that this has been one of the best in terms of organization, people and objectives.

I hope Google Summer of Code runs again in 2013; if so, we will do our best to get Monkey in.

Duda I/O: Websocket Chat

There is not much to explain: WebSocket is widely used for realtime notifications over the web, and Duda I/O supports WebSocket through a package. I have written a simple chat example on the server side to demonstrate how it can be used; the front end is a tweaked client where I just performed minor modifications:

The interesting part is the service code:

#include "webservice.h"
#include "packages/websocket/websocket.h"

DUDA_REGISTER("Duda I/O Examples", "WebSocket Chat");

void cb_on_message(duda_request_t *dr, ws_request_t *wr)
{
    websocket->broadcast(wr, wr->payload, wr->payload_len, WS_OPCODE_TEXT);
}

void cb_handshake(duda_request_t *dr)
{
    websocket->handshake(dr);
}

int duda_main()
{
    /* Load the websocket package */
    duda_load_package(websocket, "websocket");

    /*
     * Define a callback, on every websocket message received,
     * trigger cb_on_message.
     */
    websocket->set_callback(WS_ON_MESSAGE, cb_on_message);

    /* Associate a static URL with a callback */
    map->static_add("/handshake/", "cb_handshake");

    /* Initialize the broadcaster interface */
    websocket->broadcaster();

    return 0;
}

In duda_main() we initialize the web service, loading the websocket package and setting a callback function to invoke when a websocket message arrives. Then we map the URL path that performs the WebSocket handshake, and finally we instruct the websocket package to launch the broadcaster service; this last one is necessary if you want to send broadcast messages.

Getting started
If you want simple steps to try this example, do:

  • git clone git://git.monkey-project.com/dudac
  • git clone git://git.monkey-project.com/duda-examples
  • cd dudac/ && ./dudac -g
  • ./dudac -w /path/to/duda-examples/050_websocket_chat/

Now you can point your browser at http://localhost:2001/wschat/

Documentation

For more details about the Websocket package and its available methods, please refer to the Websocket API documentation.

Duda I/O

For a few months I have been working on a C web services framework called Duda I/O, which can be considered a child of the Monkey Project. Duda I/O runs on top of Monkey and aims to expose a friendly C API for building fast and scalable web services. It is totally open source under the LGPL v2.

 

Framework Components


DudaC: formally the Duda Client Manager, it is a helper for the development and easy deployment of web services. It takes care of setting up the development environment, cloning the respective stack components and building each one. It also allows running a web service on the fly by just pointing to its source code.

Duda Plugin: this plugin is an extension for the Monkey web server; it mainly wraps the Monkey API and exposes a friendlier C API for building web services. This plugin also takes care of hiding the complexity of the HTTP stack in terms of threading, balancing and asynchronous socket events.

HTTP Server: as mentioned earlier, the HTTP stack is powered by Monkey, a high-performance open source web server. Monkey is an HTTP/1.1 non-blocking web server implemented through a strategy of a fixed number of threads, each one holding its own event queue. It is pretty scalable and can take the most advantage of SMP systems.

Web Services: a web service is a software component built on top of the Duda Plugin API which executes different instructions through a mapping of HTTP URL requests to callback functions. In technical terms, it is a shared library loaded by Duda at runtime.

 

Features


Non-blocking: the whole HTTP stack is based on the non-blocking model for sockets; this means it works on top of asynchronous events. Each worker thread can scale to thousands of active connections. It is good to mention that a non-blocking model will not reduce the computing time or delays caused by blocking calls used in your web service.

Lightweight: for a normal web service, the global size of the running components in memory can be around 400 KB. The memory used will depend on your web service implementation and the packages loaded. The stack components, Duda and Monkey, aim to be lightweight and to optimize the resources used.

Service oriented: one of the main features of Duda is that it allows registering multiple web services under the same HTTP instance; as well, each service can be assigned to a different virtual host (a virtual host can hold multiple web services).

Each service can map static URLs to specific callback functions or use the Map interface provided by Duda; the latter is pretty similar to REST and provides a very useful set of methods to handle each request's resources, such as methods, parameters and body content.

 

API Objects


When building/running a web service, a set of C pseudo-objects is exported to perform the setup and define callbacks for certain events; as well, many objects are helpers to build responses and minimize the effort for the developer. Some of the API objects available are:

  • Console
  • Cookie
  • Event
  • Param
  • Response
  • Request
  • Session

Each object exposes a set of methods; for more details about the methods available for each object, refer to the API documentation.

 

API Packages


Besides the built-in API objects, Duda supports a package system which aims to load external objects on demand to extend the core API capabilities. Some of the packages available are the following:

  • Base64
  • JSON
  • SHA1
  • SQLite
  • Websocket

Packages are included in Duda per users' demand; if you miss some specific package functionality, let us know so we can consider its development and further inclusion.

If you want to know more about Duda please refer to the following links:

More news coming soon; if you want to stay tuned, make sure to register on the new mailing list…

GSoC and Monkey Roadmap

GSoC Update
We are almost finishing the Google Summer of Code program, and the students have done a great job on the Monkey Project; I have to admit that I am quite impressed by the quality of the work delivered. It has been a good experience for both sides; in the name of the community I can only say “good job!”. I will share more details and a final evaluation once the program officially ends; as well, you will see a mini-post on the Google Code blog early in September.

Monkey Roadmap

The latest released version of Monkey is v1.0.1. In our development repository we have created the branch for v1.1.0; this one is in code-freeze status, which means we are preparing the release, and a few tasks are involved, such as minor bug fixing, testing the code base, verifying possible regressions, doing cleanups, packaging the binaries for Debian/Ubuntu, writing the release notes, etc. So we hope to have some news at the end of this week; more details will be shared in the release notes.

In Git master, we keep up the good work on Monkey v1.2, and we are planning the release for mid-October 2012. The most relevant features listed at the moment are the inclusion of the FastCGI plugin and the replacement of the SSL layer provided by MatrixSSL in favor of the SSL layer provided by PolarSSL.

Duda Roadmap

Our web services framework is still under active development; we are still working on some improvements before delivering an official release. If you are interested in how it is going, you can check the development repository, as well as the current API documentation.

For any of the projects listed, if you have a desired feature or extension, please let us know; we are focusing on delivering a high-quality open source web server stack based on people's needs :)

About Monkey vs. G-Wan and reputation

Weeks ago I performed some benchmarks to test the Monkey project against other web servers, and I included a test for one closed-source web server which claims to be (or not to be?) the fastest on earth: G-Wan. My benchmark shows that Monkey is faster than G-Wan under certain conditions: the number of requests per second for a file >= 200KB.

One person from our community started to perform his own benchmarks and shared his findings with Pierre, the author of G-Wan. It looks like Pierre's feelings were hurt, and his ironic words expressed strong disagreement with the existence of a project that performs better than G-Wan. He started to put my reputation in doubt and abused the use of my employer's name:

What makes an expert “cheat”? http://gwan.ch/blog/20120728.html

Oracle and I

Pierre, let’s make things clear here. Oracle is my employer, and it does not have any relationship with the Monkey open source project. It looks like a lack of professionalism to abuse the Oracle name to increase visibility and try to pseudo-expose your startup in a different manner. Why mention the word ‘Oracle’ 12 times without needing to? There are other ways to position your blog in the search engines.

 

G-Wan benchmarks

The G-Wan author decided the best way to benchmark his software, so he went straight to benchmarking under two specific static conditions:

  • KeepAlive: all tests are made with keep-alive enabled
  • Small file sizes: the tests are performed with a file of 100 bytes

A static benchmark method like this one is useful when measuring the current version against the development version, to see how they behave relative to each other, and this is good. But you cannot use the same method to measure other web servers; let's perform a brief analysis:

KeepAlive

The HTTP KeepAlive feature (the default in HTTP/1.1) aims to keep the communication channel open between the client and the server to perform multiple requests in a FIFO way. This approach reduces latency by avoiding the TCP handshake and producing less network traffic. In an HTTP keep-alive session, for example, you can perform 1,000,000 requests over the same persistent channel if the server allows it.

As stated before, with HTTP KeepAlive the transfer of user-level data is faster. But there is an important point that we cannot omit in a *benchmark* context: if the web server uses threads to balance the work due to SMP or another need, keep-alive sessions *hide* the overhead of balancing connections between workers, and hence the internal scheduling and the time to start processing each request. So using KeepAlive for benchmarking is useful to measure just one part of the web server core, and is *not* grounds to determine which is the fastest solution in the world.

From a real-life perspective, an HTTP client (browser) using keep-alive will rarely perform more than 25 requests over the same channel. So tests under KeepAlive do not reflect how the Internet/HTTP world behaves.

Small file sizes

G-Wan performs in-memory caching of requested files when they are pretty small, so when testing a 100-byte file it is not hitting I/O and does not need to perform extra expensive system calls. This is the common approach in almost all web servers; a few KB of extra memory is not a bad price to avoid I/O.

Caching is good, but testing against the same small cached file just determines how fast the server can access a memory buffer and send out the same data.

Now, having described how the static benchmark is done, does it sound like a good plan to determine that G-Wan is the fastest solution available by testing *only* in keep-alive mode, plus the same small cached file for all requests? Not really.

Anyone who benchmarks G-Wan and breaks one of those rules (KeepAlive/small file) will realize that Monkey and NginX perform faster than G-Wan.

The G-Wan author claims that testing in non-KeepAlive mode is equal to "testing the TCP/IP stack rather than the user-mode code of the server", which is wrong: a web server depends on the TCP/IP stack, and measuring the performance of a web server is more than testing a simple access to a memory buffer in user space.

Measuring tool

The G-Wan benchmarks are done with a wrapper utility called 'ab wrapper', described by Pierre as the "most capable tool" (http://gwan.ch/source/ab.c). The sad part is that it does not perform well with files of a few KB; it gets stuck when used with the Weighttp backend or when used without KeepAlive mode. The idea behind the tool is good, as it takes snapshots of user-mode/kernel-mode stats once each concurrent round of requests is finished.

As the code of ab.c is not legible and spawns third-party utilities to do its job, I wrote a similar tool based on the proc filesystem. The tool is named 'watch resources' (aka wr) and I have published the code on GitHub:

http://github.com/edsiper/wr

So with this new tool I performed a new set of benchmark tests, using concurrency the way ab.c does and getting more accurate memory usage results.

Benchmark: Monkey v/s G-Wan

This test was made in the following way:

  • Wrapped by the Watch Resources tool
  • Weighttp backend to stress the server
  • The URL tested hits a file of 200KB
  • Concurrency levels from 100 to 1,000 concurrent connections, increasing by 10 per round
  • KeepAlive enabled
  • Backend stress tool with 10 workers
  • Each round performed 500,000 requests

Requests per second

  • Monkey did an average of 29,231 requests per second
  • G-Wan did an average of 17,222 requests per second

Monkey served about 70% more requests per second than G-Wan under the same conditions

Memory Usage

It is expected to see higher memory consumption as the concurrency level puts a major load on the server, so the goal is to optimize resources and avoid memory allocations when they are not necessary:

  • Monkey consumed an average of 3.4MB along the whole test
  • G-Wan consumed an average of 7.09MB along the whole test

G-Wan consumed more than twice the memory that Monkey did for the same test

User and system time

In the design of the Linux kernel (which comes from the Unix family), there exist two execution contexts, or virtual address spaces: user space and kernel space (also called system space). User space contains the user applications and their resources, plus an interface to communicate with the kernel; kernel space covers I/O, memory allocation, scheduling of user-space tasks, and more.

All work that occurs in a user process/task without direct kernel intervention happens in user space; everything else is kernel space. We call user time the CPU cycles consumed by the user-space task itself, and kernel/system time the CPU cycles consumed on behalf of the user-space task through system calls into the kernel.

So this metric shows how much CPU time is spent in user and system space; depending on the point of view, it can be good or not so good:

  • Monkey user time: 5,267 milliseconds
  • G-Wan user time: 3,830 milliseconds
  • Monkey kernel/system time: 36,159 milliseconds
  • G-Wan kernel/system time: 50,999 milliseconds

G-Wan has focused too much on reducing user time, at the cost of being an unfriendly program for the Linux kernel. Monkey is a project devoted to running on the Linux kernel, and that is why it runs well optimized. With a basic knowledge of Linux system calls, it is not hard to achieve great performance.

In the blog post mentioned earlier, the comments on the graphics state:

As opposed to Monkey, with G-Wan, the Kernel is using CPU resources faster than G-Wan user mode CPU usage. As a result, the Linux Kernel is the bottleneck far before G-Wan

The previous comment denotes a lack of understanding of how the Linux kernel works internally; there is no awareness of user/system spaces or scheduling. Looking at the G-Wan project history, it comes from Windows, so we can excuse the lack of Linux knowledge. The Linux kernel is rarely the bottleneck at this level of operation.

In short, to avoid misunderstandings, I have learned my lesson and will provide more detailed benchmarks from now on. In the meanwhile, G-Wan has a couple of things to fix... starting from its design?

Other points I forgot to mention:

  • The test with Monkey took 25 minutes and 58 seconds; G-Wan took 44 minutes and 23 seconds
  • If you want to validate the information on this post, you can download the source reports and graphics from here.
  • I am emailing Pierre about this blog post