Thanks to Google for the opportunity! This week the results for the Mentoring Organizations of Google Summer of Code 2013 were released. I am very happy to announce that Monkey Project will be part of the program for the second consecutive year.

We are a very small group of people developing an open source stack for web technologies, and taking part in the program is a big recognition. We are glad to be able to mentor students who will be hacking with us for three months.

Our experience last year was very positive: most of the projects were developed successfully, and those students have taken an important role in our community and in project development in general, so we have high expectations for this second round. As you can see, there is a good range of project ideas to develop in areas such as protocols (SPDY), OSX, Raspberry Pi, Web Services, etc.

You can get more details about the upcoming work in our Monkey GSoC project ideas page.

Today something happened that I was told many times would never happen: iOS hung on my wife’s iPad and wrote some trace messages reporting a Kernel Panic.

Kernel Panics are Apple’s friends too 🙂

This post is all about the Monkey HTTP server. I consider it very important that, besides opening our source code, the developers are also able to describe the project internals. There is no black magic, just thousands of hours of effort in programming, testing and improvement.


Monkey is an open source project started in 2001 with the goal of learning C; the long story is here. Over the years the code has been improved in many aspects, from naming conventions to heavy architecture changes. All of it has been for good, and nowadays, thanks to the community of core developers and contributors around the project, Monkey is one of the top performing web servers around, and I would claim it is the best option for Embedded Linux.

Understanding the basics of a human readable protocol: HTTP

The Hypertext Transfer Protocol is basically a language with a simple grammar for communication between two components: an HTTP client and an HTTP server. In the common case, the communication starts with the client performing a request to the server, and the server then replying with some result for that request. The result is either a status response plus some content, or simply an error.

Each HTTP request performed by the client is composed of a request method, URI, protocol version and, optionally, a set of headers. Given that, we can say that a server must take care of the following:

  • Listen for new connections
  • Accept connections
  • Once the connection is accepted, start reading the HTTP request sent by the client
  • Parse the HTTP request and understand what the client wants
  • Depending on the request type, the server can: serve some content, close the connection because of some exception, proxy the request to somebody else, etc.
  • Close the connection or keep it open waiting for more requests. This depends on the protocol version and the client HTTP headers.
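
The parsing step in the list above can be sketched in a few lines of C. This is only an illustration of splitting the request line into method, URI and protocol version; the struct and function names are made up for this post and are not Monkey's parser, which is considerably more careful.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the three request-line fields. */
struct request_line {
    char method[16];
    char uri[256];
    char version[16];
};

/*
 * Parse "METHOD SP URI SP VERSION" from the first line of a request.
 * Returns 0 on success, -1 if the line is malformed.
 */
int parse_request_line(const char *buf, struct request_line *out)
{
    /* Field widths are one less than the array sizes to leave room for '\0'. */
    int n = sscanf(buf, "%15s %255s %15s", out->method, out->uri, out->version);

    if (n != 3) {
        return -1;
    }
    /* Only accept something that looks like an HTTP version token. */
    if (strncmp(out->version, "HTTP/", 5) != 0) {
        return -1;
    }
    return 0;
}
```

Given "GET /index.html HTTP/1.1" followed by the headers, the function fills the three fields and ignores the rest of the buffer; headers would be handled by a separate step.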

Depending on the server's target, this can be implemented in many ways with different architecture strategies, so this post only aims to describe what has worked best for us in terms of high performance and low resource usage.

Architecture design facts

  • Monkey is a web server designed with a strong focus on Linux. It does not aim to be portable across other operating systems; focusing on the most widely used mainstream operating system allows us to put our energy and effort in one place, and of course to get the most out of the Linux Kernel to achieve high performance.
  • Event driven: also known as asynchronous, an event-driven web server uses non-blocking system calls to do its work, reducing the computing time spent in the user-space context. For example, when we are sending a file to a client, we do not block the whole process or thread while sending the data; instead we instruct the kernel through a system call to send N bytes from the file and to notify us when we are able to send more. In the meantime, we process other connections and send other pending data.
  • Embedded friendly: our embedded context is Embedded Linux, so we care a lot about resource consumption. That means that under heavy load Monkey does not use more than 2.5MB of memory. The Monkey binary is around 80KB, and once it is loaded in memory it takes around 350KB; depending on the load, more resources can be needed.
  • Small core, flexible API: Monkey implements a basic core to handle the HTTP protocol and exposes a flexible API through the plugin interface, where it is possible to hook plugins for the transport layer, security, request types and event handlers.


In Monkey we have defined two contexts of work: the process context and the thread context. The process context represents the main process waiting for incoming connections, plus the scheduler balancing new connections across the worker threads. The thread context belongs to each thread handling its active connections.

The number of workers is defined in the configuration, and it scales well on both single-core and multi-core CPUs. There is no need to set thread affinity through CPU masks: the Linux Kernel scheduler is smart enough to assign CPU time to each worker, and by default all workers can run on all CPUs.

From a system administrator's point of view, it is possible to assign each worker to a different set of CPUs, but this approach is not suggested unless we are fully aware of what the Linux scheduler does in terms of interrupts, context switches and CPU time for Kernel and user-space applications. Do it only if you can do it better than the running scheduler.


Before entering the server loop, the scheduler launches and initializes each worker, taking care of setting up the initial data structures and the interfaces for the interaction between the components mentioned. This stage involves the creation of one epoll(7) queue per worker. It is worth mentioning that each epoll(7) queue created through epoll_create(2) is managed through a specific file descriptor.

Once the workers are up and running, the Scheduler's next job is to manage the incoming connections. For each new connection accepted, it determines which worker is the least loaded and assigns the connection to it. The chosen worker is the one that has the fewest connections in its epoll(7) interface, so the scheduler walks the worker counters and chooses one. At this point the scheduler has two file descriptors: the connection file descriptor returned by accept(2) and the file descriptor that represents the epoll(7) queue of the chosen worker. So it basically registers the new file descriptor in the proper epoll(7) queue.
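
The balancing step just described can be sketched roughly as follows. The struct worker type and both function names are assumptions made for this example, not Monkey's internal names; only epoll_ctl(2) and its flags are the real Linux API.

```c
#include <sys/epoll.h>

/* Hypothetical per-worker bookkeeping kept by the scheduler. */
struct worker {
    int epoll_fd;      /* descriptor returned by epoll_create(2) */
    int connections;   /* active connections currently assigned */
};

/* Return the index of the worker with the fewest connections. */
int pick_worker(const struct worker *workers, int n)
{
    int best = 0;

    for (int i = 1; i < n; i++) {
        if (workers[i].connections < workers[best].connections) {
            best = i;
        }
    }
    return best;
}

/*
 * Register a freshly accepted socket in the chosen worker's epoll queue.
 * Returns the index of the worker that got the connection, or -1 on error.
 */
int assign_connection(struct worker *workers, int n, int conn_fd)
{
    int idx = pick_worker(workers, n);
    struct epoll_event ev = {0};

    ev.events = EPOLLIN;        /* wake the worker when request data arrives */
    ev.data.fd = conn_fd;
    if (epoll_ctl(workers[idx].epoll_fd, EPOLL_CTL_ADD, conn_fd, &ev) == -1) {
        return -1;
    }
    workers[idx].connections++;
    return idx;
}
```

A linear scan over the counters is cheap here because the number of workers is small and fixed by the configuration.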


Each worker, or thread, runs in an infinite loop around the epoll(7) interface, which is basically a Linux-specific polling mechanism to register, enqueue and notify about events on the file descriptors registered by the Scheduler (sockets in this case).

The worker stays in a loop waiting for events in the epoll_wait(2) system call. Every time the Scheduler registers a new file descriptor, an event will be reported in the worker's epoll(7) interface, and the same happens for subsequent events such as “there is data available to read” (EPOLLIN), “now you can write to the socket” (EPOLLOUT), “connection closed” (EPOLLHUP), etc.

For each event triggered, the worker keeps a status of the connection to determine whether it is a new connection, it is receiving the HTTP request, the HTTP request is complete, the request is being parsed, or a response is being sent out. Besides events, at a fixed interval of seconds set in the configuration, it checks for connections that have timed out due to an incomplete request or some other anomaly.
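
As a rough sketch of what one pass of such a worker loop looks like, the function below waits in epoll_wait(2) and classifies the events it gets. The struct loop_stats counters and the function name are invented for the example; a real worker would run its per-connection state machine instead of counting.

```c
#include <sys/epoll.h>

#define MAX_EVENTS 32

/* Hypothetical counters so the caller can see what one iteration observed. */
struct loop_stats {
    int readable;   /* EPOLLIN events */
    int writable;   /* EPOLLOUT events */
    int closed;     /* EPOLLHUP / EPOLLERR events */
};

/*
 * One iteration of a worker loop: block in epoll_wait(2) for up to
 * timeout_ms, then classify every reported event. A real worker would
 * call this forever. Returns the number of events handled, or -1 on error.
 */
int worker_iterate(int epoll_fd, int timeout_ms, struct loop_stats *st)
{
    struct epoll_event events[MAX_EVENTS];
    int n = epoll_wait(epoll_fd, events, MAX_EVENTS, timeout_ms);

    for (int i = 0; i < n; i++) {
        if (events[i].events & (EPOLLHUP | EPOLLERR)) {
            st->closed++;       /* peer went away: drop the connection */
        } else if (events[i].events & EPOLLIN) {
            st->readable++;     /* request bytes are waiting to be read */
        } else if (events[i].events & EPOLLOUT) {
            st->writable++;     /* socket buffer has room for a response */
        }
    }
    return n;
}
```

The timeout argument is also a natural place to hook the periodic check for timed-out connections: when epoll_wait(2) returns 0, nothing happened during that interval.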

Plugins Architecture

Monkey defines four categories of API where the plugins can hook: Context, Events, Stages and Networking.

Context: defines callbacks that can be invoked when the server is starting up; it covers the process and thread contexts described earlier.

Events: for every type of event reported in a worker loop, a plugin can implement a hook to perform specific actions.

Stages: every new connection enters a stage status, so for each step of the HTTP cycle it passes through different phases, and each plugin can hook to a specific one.

Networking: Monkey is not aware of networking, hence it intentionally depends on a plugin that provides the transport layer. This approach makes it easy to change from plain socket communication to an encrypted one such as SSL. The networking plugin only needs to provide the required API functions for the communication.

Scaling up

Every time a connection has performed a successful request, it is stored in a global list in the worker scope (implemented through a pthread_key). For each event reported, the worker needs to look up the internal data associated with it, so the file descriptor, or socket number, acts as the primary key for the search. The data structure chosen for Monkey v1.2 is a red-black tree. It has shown to behave well and to scale when handling thousands of active connections per worker, maintaining a good balance between performance and cost.

The cost of each file descriptor lookup is critical for server performance: an O(n) solution will work fine for a few connections, but under high concurrency an O(log(n)) solution will end up providing the highest performance.
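
To give an idea of a lookup keyed by file descriptor, here is a small sketch built on the POSIX tsearch(3)/tfind(3) functions. It is not Monkey's own tree code: the struct conn record and the helper names are assumptions for the example, and it relies on the C library keeping the tree balanced (glibc, as far as I know, uses a red-black tree for this).

```c
#include <search.h>
#include <stdlib.h>

/* Hypothetical per-connection record, keyed by its socket descriptor. */
struct conn {
    int fd;
    int status;
};

static int conn_cmp(const void *a, const void *b)
{
    const struct conn *ca = a, *cb = b;

    return (ca->fd > cb->fd) - (ca->fd < cb->fd);
}

/* Insert a connection into the tree rooted at *root; NULL on failure. */
struct conn *conn_add(void **root, int fd, int status)
{
    struct conn *c = malloc(sizeof(*c));
    void *node;

    if (c == NULL) {
        return NULL;
    }
    c->fd = fd;
    c->status = status;

    node = tsearch(c, root, conn_cmp);
    if (node == NULL) {                  /* out of memory inside tsearch */
        free(c);
        return NULL;
    }
    if (*(struct conn **)node != c) {    /* fd already present: keep the old one */
        free(c);
    }
    return *(struct conn **)node;
}

/* Look up a connection by descriptor; NULL if the fd is unknown. */
struct conn *conn_find(void *const *root, int fd)
{
    struct conn key = { .fd = fd };
    void *node = tfind(&key, root, conn_cmp);

    return node ? *(struct conn **)node : NULL;
}
```

Each worker would keep its own root pointer, so no locking is needed around the tree.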

Memory Management

One of the keys to reducing overhead in a server is to reduce as much as possible the memory allocation requests made to the system within the main loop. The current Monkey implementation performs only one memory allocation per new connection; if the incoming request posts too much data, it allocates more memory as needed. Other web servers implement caching mechanisms to reduce memory allocations even further. As our focus is Embedded Linux, we focus on speed at low resource usage, and implementing a caching mechanism would increase our costs. So we dropped that common approach in order not to abuse system memory; it is just a decision based on the target.
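
The "one allocation per connection, grow only when needed" idea can be sketched like this. The initial size, the doubling policy and all names are assumptions chosen for the example, not the values or code Monkey uses.

```c
#include <stdlib.h>
#include <string.h>

#define CONN_BUF_INITIAL 4096   /* assumed initial size, not Monkey's value */

struct conn_buf {
    char *data;
    size_t len;   /* bytes used */
    size_t cap;   /* bytes allocated */
};

/* The single allocation done when a connection is accepted. */
int conn_buf_init(struct conn_buf *b)
{
    b->data = malloc(CONN_BUF_INITIAL);
    b->len = 0;
    b->cap = b->data ? CONN_BUF_INITIAL : 0;
    return b->data ? 0 : -1;
}

/* Append bytes; the allocator is only touched if the request outgrows the buffer. */
int conn_buf_append(struct conn_buf *b, const void *src, size_t n)
{
    if (n > b->cap - b->len) {
        size_t cap = b->cap ? b->cap : CONN_BUF_INITIAL;
        char *p;

        while (cap - b->len < n) {
            cap *= 2;
        }
        p = realloc(b->data, cap);
        if (p == NULL) {
            return -1;
        }
        b->data = p;
        b->cap = cap;
    }
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return 0;
}
```

A typical GET request fits in the initial buffer, so the common path never calls realloc(3); only a large POST body pays for extra allocations.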

Linux Kernel system calls

The Linux Kernel exposes a useful set of non-portable system calls to achieve high performance when writing networking applications. The first one is epoll(7): as described earlier, this interface allows us to watch a set of file descriptors for certain defined events. Similar solutions like select(2) or poll(2) do not perform as well as epoll(7) does.

When sending a static file, the old-fashioned way is to open the file, get the file descriptor and perform multiple read(2)/write(2) calls to write out the file content. This requires the Kernel to copy data back and forth between Kernel and user space, which obviously generates overhead. As a solution, the Linux Kernel implements a zero-copy strategy through the sendfile(2) system call. This system call does not copy data to user space; instead it sends the data directly to another file descriptor, achieving good performance and reducing the latency of the old-fashioned way described.

In our architecture, the Logger plugin needs to transfer data through a pipe(2) (a unidirectional data channel that can be used for interprocess communication). A common mechanism is to use read(2) and write(2) on each end, but, in a similar way to sendfile(2), there is a system call for this kind of situation called splice(2). This system call moves data from one point to another without the data-copy overhead. The main difference between sendfile(2) and splice(2) is that splice(2) requires one of the ends to be a pipe(2).

In my previous post I mentioned how to use the new Linux Kernel feature called TCP_FASTOPEN. While it is very simple to implement, it requires the cooperation of both sides: the client and the server. If you have full control of your networking application (client and server), consider using TCP_FASTOPEN; it improves performance by saving one roundtrip of the TCP handshake.
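
As an illustration of the pipe-to-file case, the sketch below drains the read end of a pipe into another descriptor with splice(2). The function name is an assumption and this is not the Logger plugin's code. One caveat from the man page: splice(2) fails with EINVAL if the target file was opened in append mode.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* splice(2) is a Linux extension */
#endif
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>

/*
 * Move up to `len` bytes from the read end of a pipe into out_fd
 * (for example an open log file) without a user-space buffer.
 * Blocks until `len` bytes were moved or the writer closes the pipe.
 * Returns the number of bytes moved, or -1 on error.
 */
ssize_t drain_pipe_to_fd(int pipe_rd, int out_fd, size_t len)
{
    size_t moved = 0;

    while (moved < len) {
        ssize_t n = splice(pipe_rd, NULL, out_fd, NULL, len - moved,
                           SPLICE_F_MOVE);

        if (n == -1) {
            if (errno == EINTR) {
                continue;   /* interrupted by a signal: just retry */
            }
            return -1;
        }
        if (n == 0) {
            break;          /* writer closed the pipe */
        }
        moved += (size_t)n;
    }
    return (ssize_t)moved;
}
```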

Monkey Plugins

Based on the architecture and API described, the following plugins are distributed as part of the core:

Liana: basic sockets connectivity layer

PolarSSL: provides a transport layer based on SSL

Cheetah: plugin that provides a command line interface to query the internals of a running server through a Unix socket

Mandril: security layer that aims to restrict access by URI strings or subnetworks

Dirlisting: directory listing

Logger: log writer

CGI: old-fashioned CGI interface

FastCGI: provides FastCGI support


Bonus track: Full HTTP Stack for web services implementation

Besides being a common web server to serve static or dynamic content, Monkey is a full stack for the development of web applications. In order to provide an easy API for web application or web services development, we have created Duda I/O, which is an event-driven C framework for rapid development based on the Monkey stack.

Duda implements a core API of pseudo-objects and provides extra features through a packages system, everything in a friendly C API. The most relevant features supported at the moment are WebSocket, JSON, SQLite3, Redis, Base64 and SHA1.

Due to its high performance nature and the open source ecosystem around it, it is being used in production from Embedded Linux products to Big Data solutions. The license of Duda allows you to create closed-source services or applications and link them to the Duda I/O stack at zero cost.

For more details please refer to the Duda I/O main site.

The Monkey organization believes in Open Source and is fully committed to creating the best networking technology for different needs. If you are interested in participating as a contributor or in testing our stack, feel free to reach us on our mailing lists or irc channel #monkey at

A few years ago the concept of TCP_FASTOPEN (TFO) was introduced as a solution to improve the performance of TCP connections by saving one roundtrip of the handshake process. The first operating system to implement TFO is Linux, and it has shown good improvements when used in a common network.

The implementation in the Linux Kernel was done in parts: Linux Kernel 3.6.1 was the first to implement the client side requirements, and then Linux Kernel 3.7 implemented the server side socket behavior.

Client side

In a common TCP client flow, the following calls take place:

/* create socket file descriptor */
fd = socket(domain, type, protocol);

/* connect to the target server/port */
connect(fd, addr, addrlen);

/* send some data */
send(fd, buf, size, 0);

/* wait for some reply and read into a local buffer */
while ((bytes = recv(fd, ...)) > 0) {
    /* process the bytes received */
}

When using TCP_FASTOPEN the behavior is a little different. You no longer need to use connect(2); instead you use sendto(2), which also gives you the opportunity to let the Kernel buffer some initial outgoing data. In short, the sendto(2) call is like an implicit connect(2) and send(2)/write(2) at the same time:

/* create the socket */
fd = socket(domain, type, protocol);

/* connect and send out some data */
sendto(fd, buffer, buf_len, MSG_FASTOPEN, ...);

/* write more data */
send(fd, buf, size, 0);

/* wait for some reply and read into a local buffer */
while ((bytes = recv(fd, ...)) > 0) {
    /* process the bytes received */
}
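
For reference, a more complete version of the client flow could look like the sketch below. The function name, the IPv4-only address type and the fallback policy (retrying with a plain connect(2) on a fresh socket when the Fast Open call fails) are choices made for this example, not part of any existing API.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

#ifndef MSG_FASTOPEN
#define MSG_FASTOPEN 0x20000000
#endif

/*
 * Open a TCP connection to `addr` and send the first chunk of data,
 * using TCP Fast Open when the kernel accepts it and falling back to
 * connect(2) + send(2) otherwise. *sent receives the bytes written.
 * Returns the connected socket, or -1 on error.
 */
int tfo_connect_send(const struct sockaddr_in *addr, const void *buf,
                     size_t len, size_t *sent)
{
    const struct sockaddr *sa = (const struct sockaddr *)addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    ssize_t n;

    if (fd == -1) {
        return -1;
    }

    /* Implicit connect(2) plus first write in one call. */
    n = sendto(fd, buf, len, MSG_FASTOPEN | MSG_NOSIGNAL, sa, sizeof(*addr));
    if (n == -1) {
        /* Fast Open refused or unavailable: retry the classic way. */
        close(fd);
        fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd == -1) {
            return -1;
        }
        if (connect(fd, sa, sizeof(*addr)) == -1 ||
            (n = send(fd, buf, len, MSG_NOSIGNAL)) == -1) {
            close(fd);
            return -1;
        }
    }
    *sent = (size_t)n;
    return fd;
}
```

MSG_NOSIGNAL is there so that a failed write on an unconnected socket reports an error instead of raising SIGPIPE.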

Server side

A common (old-fashioned) TCP server is created with the following calls:

/* create the socket */
fd = socket(domain, type, protocol);

/* bind the address */
bind(fd, addr, addrlen);

/* this socket will listen for incoming connections */
listen(fd, backlog);

Adding TCP_FASTOPEN support to the server side code is very easy and the required changes are minimal: you only need to set a new socket option between bind(2) and listen(2):

/* a hint value for the Kernel */
int qlen = 5;

/* create the socket */
fd = socket(domain, type, protocol);

/* bind the address */
bind(fd, addr, addrlen);

/* set the TCP_FASTOPEN socket option */
setsockopt(fd, SOL_TCP, TCP_FASTOPEN, &qlen, sizeof(qlen));

/* this socket will listen for incoming connections */
listen(fd, backlog);
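
Put together as a compilable function, the server side could look roughly like this. The function name and the tfo_on output flag are inventions for the example; it also uses IPPROTO_TCP, which has the same value as SOL_TCP on Linux, and treats a failing setsockopt(2) as non-fatal.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <arpa/inet.h>
#include <string.h>
#include <unistd.h>

#ifndef TCP_FASTOPEN
#define TCP_FASTOPEN 23
#endif

/*
 * Create an IPv4 listener on `port` (host byte order; 0 lets the kernel
 * pick one). *tfo_on is set to 1 when the kernel accepted TCP_FASTOPEN.
 * Returns the listening socket, or -1 on error.
 */
int tfo_listen(unsigned short port, int backlog, int *tfo_on)
{
    struct sockaddr_in addr;
    int qlen = 5;   /* hint for the Kernel: pending Fast Open requests */
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd == -1) {
        return -1;
    }

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
        close(fd);
        return -1;
    }

    /* Not fatal: without it the socket still accepts regular handshakes. */
    *tfo_on = setsockopt(fd, IPPROTO_TCP, TCP_FASTOPEN,
                         &qlen, sizeof(qlen)) == 0;

    if (listen(fd, backlog) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```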

Required macros

Even if you are running the latest Linux Kernel 3.8, you will face some problems, as in most cases the required macro values for TCP_FASTOPEN and MSG_FASTOPEN will not be available at compile time. As a workaround you can include the following code in one of your header files:

/*
 * A generic protection in case you include this
 * from multiple files (the guard name is only illustrative)
 */
#ifndef FASTOPEN_COMPAT_H
#define FASTOPEN_COMPAT_H

/* conditional define for TCP_FASTOPEN */
#ifndef TCP_FASTOPEN
#define TCP_FASTOPEN   23
#endif

/* conditional define for MSG_FASTOPEN */
#ifndef MSG_FASTOPEN
#define MSG_FASTOPEN   0x20000000
#endif
#endif


Enabling TCP_FASTOPEN in your Kernel

By default the TCP_FASTOPEN feature is not enabled at runtime (unless you set it in the sysctl.conf file). The setting is a bitmask: 1 enables the client side, 2 the server side, and 3 both. Before testing this new feature, make sure it is enabled with the following command:

# echo 1 > /proc/sys/net/ipv4/tcp_fastopen

If you try to use a client with TCP_FASTOPEN enabled, it is mandatory that the server has the same option set on the listener socket; otherwise the client will fail at the connection phase (due to protocol mismatch).

For a TCP_FASTOPEN server, it does not matter whether the client uses the new protocol or not; it will work anyway. So if you develop a TCP server, you can give it a try by adding a single system call to enable this feature. If your project is open source, feel free to use the header macros example provided above.

By the way, Monkey Web Server has of course recently added this feature in our development repository.


As a member of Monkey Project, which joined Google Summer of Code 2012 as a mentoring organization, we were invited to attend the Mentors Summit conference at the Googleplex in California, US. Two members of our community flew to Google to represent the organization: Felipe Reyes and I.

Being part of a project that was selected for GSoC 2012 is really exciting, because of the recognition as a solid open source project and the opportunity to mentor three students around the world and teach them about collaboration and core development in our project. It was hard work, and at the end the other exciting part began: the Mentors Summit.

The event took place in California. It started on Friday 19th at the Wild Palms hotel with an open dinner around the pool and free beer for everyone; nothing formal, just eating together and meeting the great people behind each project 🙂

On Saturday 20th we went to the famous Googleplex to have breakfast before the event started, and I cannot omit to mention that it is TRUE: Google has a great free food service for everyone. I was amazed by the attention to detail they have for their employees; the place is well designed, with a lot of colors around, and the campus in general is pretty friendly.

We ran into different (and parallel) sessions with topics about GSoC itself and technical things about each project. It was an unconference, so people proposed their own topics on a board with a flexible schedule. Honestly, I was not prepared to give a talk, as I was not aware that we could propose technical sessions, but we signed up for a talk about Monkey anyway. We mostly talked about project internals and Duda web services. It went pretty well, interesting discussions about the project were raised, and new horizons could come…

I attended some technical and GSoC sessions. It was a really good opportunity for Google and the mentoring organizations to discuss ways to improve the program. I am impressed by how committed the Open Source department is to helping organizations grow and create networks with other projects. It is difficult work, but the synergy among the people involved in this program makes things easier, and everybody was open to contributing. I would say that GSoC, more than a program, is a real community, an open program in itself.

What a surprise: I finally met the good guys from the Open Source Lab, with whom we have been working for about two years. They provide and maintain our hosting infrastructure, thanks! On the right, James Lopeman from .

OSL at left, at right


At night, social activities continued at the hotel with another dinner around the pool and free drinks: having fun, sharing different technical interests and more and more…














A great event. I went without clear expectations and it was something totally great. I have attended many conferences in the past and I have to say that this has been one of the best in terms of organization, people and objectives.

I hope Google Summer of Code runs again in 2013; if so, we will do our best to get Monkey in.

There is not much to explain: WebSocket is widely used for realtime notifications over the web, and Duda I/O supports WebSocket through a package. I have written a simple chat example on the server side to demonstrate how it can be used; the front-end part is a tweaked client where I just made minor modifications:

The interesting part is the service code:

#include "webservice.h"
#include "packages/websocket/websocket.h"

DUDA_REGISTER("Duda I/O Examples", "WebSocket Chat");

/* Relay every message received to all connected clients */
void cb_on_message(duda_request_t *dr, ws_request_t *wr)
{
    websocket->broadcast(wr, wr->payload, wr->payload_len, WS_OPCODE_TEXT);
}

/* Invoked for requests to the handshake URL */
void cb_handshake(duda_request_t *dr)
{
}

int duda_main()
{
    /* Load the websocket package */
    duda_load_package(websocket, "websocket");

    /*
     * Define a callback, on every websocket message received,
     * trigger cb_on_message.
     */
    websocket->set_callback(WS_ON_MESSAGE, cb_on_message);

    /* Associate a static URL with a callback */
    map->static_add("/handshake/", "cb_handshake");

    /* Initialize the broadcaster interface */

    return 0;
}

In duda_main() we initialize the web service, loading the websocket package and setting a callback function to invoke when a websocket message arrives. Then we map the URL path that handles the WebSocket handshake, and finally we instruct the websocket package to launch the Broadcaster service; this last step is necessary if you want to send broadcast messages.

Getting started
If you want some simple steps to try this example, do:

  • git clone git://
  • git clone git://
  • cd dudac/ && ./dudac -g
  • ./dudac -w /path/to/duda-examples/050_websocket_chat/

Now you can point your browser at http://localhost:2001/wschat/


For more details about the WebSocket package and its available methods, please refer to the WebSocket API documentation.

For a few months I have been working on a C web services framework called Duda I/O, which can be considered a child of Monkey Project. Duda I/O runs on top of Monkey and aims to expose a friendly C API for building fast and scalable web services. It is totally open source under the LGPLv2.


Framework Components

DudaC: Formally Duda Client Manager, it is a helper for the development and easy deployment of web services. It takes care of setting up the development environment, cloning the respective stack components and building each one. It also allows you to run a web service on the fly just by pointing to its source code.

Duda Plugin: This plugin is an extension for Monkey Web Server. It mainly wraps the Monkey API and exposes a friendlier C API for building web services. This plugin also takes care of hiding the complexity of the HTTP stack in terms of threading, balancing and asynchronous socket events.

HTTP Server: As mentioned earlier, the HTTP stack is powered by Monkey, a high performance and Open Source Web Server. Monkey is an HTTP/1.1 non-blocking web server implemented with a fixed number of threads, each one holding its own event queue. It is pretty scalable and can get the most out of SMP systems.

Web Services: A Web Service is a software component built on top of the Duda Plugin API which executes different instructions through a mapping between HTTP URL requests and callback functions. In technical terms, it is a shared library loaded by Duda at runtime.



Non-blocking: The whole HTTP stack is based on the non-blocking model for sockets, which means that it works on top of asynchronous events. Each working thread can scale to thousands of active connections. It is worth mentioning that a non-blocking model will not reduce the computing time or delays caused by blocking calls used in your own web service.

Lightweight: For a normal running web service, the total size of the running components in memory can be around 400KB. The memory used will depend on your web service implementation and the packages loaded. The stack components, Duda and Monkey, aim to be lightweight and to optimize the resources used.

Service oriented: One of the main features of Duda is that it allows you to register multiple web services under the same HTTP instance; in addition, each service can be assigned to a different Virtual Host (a Virtual Host can hold multiple web services).

Each service can map static URLs to specific callback functions or use the Map interfaces provided by Duda. The latter is pretty similar to REST and provides a very useful set of methods to handle the resources of each request, such as methods, parameters and body content.
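
To make the idea of mapping URLs to callbacks concrete, here is a generic sketch in plain C. It is not the Duda API: the callback type, the route table and the two demo callbacks are all assumptions made for the example (real Duda callbacks receive a duda_request_t).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type, returning an HTTP status code. */
typedef int (*route_cb)(const char *uri);

struct route {
    const char *prefix;
    route_cb cb;
};

/* Demo callbacks that only return a status code. */
static int cb_hello(const char *uri) { (void)uri; return 200; }
static int cb_handshake_demo(const char *uri) { (void)uri; return 101; }

/* Static URL map, similar in spirit to registering static URLs. */
static const struct route routes[] = {
    { "/hello/", cb_hello },
    { "/handshake/", cb_handshake_demo },
};

/* Return the callback of the first route whose prefix matches the URI. */
route_cb route_lookup(const struct route *table, size_t n, const char *uri)
{
    for (size_t i = 0; i < n; i++) {
        size_t plen = strlen(table[i].prefix);

        if (strncmp(uri, table[i].prefix, plen) == 0) {
            return table[i].cb;
        }
    }
    return NULL;   /* no service claims this URL */
}
```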


API Objects

When building or running a web service, a set of C pseudo-objects is exported to perform the setup and define callbacks for certain events. In addition, many objects are helpers to build responses and minimize the effort for the developer. Some of the API Objects available are:

  • Console
  • Cookie
  • Event
  • Param
  • Response
  • Request
  • Session

Each object exposes a set of methods. For more details about the methods available for each object, refer to the API documentation.


API Packages

Besides the built-in API Objects, Duda supports a packages system which loads external objects on demand to extend the core API capabilities. Some of the packages available are the following:

  • Base64
  • JSON
  • SHA1
  • SQLite
  • Websocket

Packages are included in Duda on user demand. If you miss some specific package functionality, let us know so we can consider its development and later inclusion.

If you want to know more about Duda please refer to the following links:

More news coming soon. If you want to stay tuned, make sure to register on the new mailing list…

GSoC Update
We are almost done with the Google Summer of Code program and the students have done a great job on Monkey Project. I have to admit that I am quite impressed by the quality of the work delivered. It has been a good experience for both sides; on behalf of the community I can only say “good job!”. I will share more details and a final evaluation once the program officially ends, and you will also see a mini-post on the Google Code blog early in September.

Monkey Roadmap

The latest release of Monkey is v1.0.1. In our development repository we have created a branch for v1.1.0, which is in code-freeze status. That means that we are preparing the release, and a few tasks are involved, such as: minor bug fixing, testing the code base, verifying possible regressions, doing cleanups, packaging the binaries for Debian/Ubuntu, writing the release notes, etc. We hope to have some news at the end of this week; more details will be shared in the release notes.

In Git master we keep up the good work for Monkey v1.2, and we are planning the release for mid-October 2012. The most relevant features listed at the moment are the inclusion of the FastCGI plugin and the replacement of the SSL layer provided by MatrixSSL with the one provided by PolarSSL.

Duda Roadmap

Our web services framework is still under active development, and we are working on some improvements before delivering an official release. If you are interested in how it is going, you can check the development repository or the current API documentation.

For any of the projects listed, if you have a desired feature or extension, please let us know. We are focused on delivering a high quality open source web server stack based on people's needs 🙂