November Updates

It has been a long time since my last post. It has been a really busy year at work, full of enjoyable things to do, but of course nothing is an excuse to stop sharing updates.

This year I have been working in several areas, including software development, community strategy and spreading the word at several conferences. I think it has been the perfect mix of things to learn and improve. All of this consumes a lot of time; fortunately my wife and kids are very supportive, and keeping that balance is very important to keep everyone healthy.

Below you will find a short update for:

  • Fluent Bit
  • Conferences and Fluentd
  • Monkey & Duda I/O


Three months at Treasure Data

Almost three months ago I started a new Open Source adventure. I call it an adventure because I had to make decisions that involved many different variables: family, stability, country, long-term plans, etc.

So how are things going now? I would say pretty well! There are a lot of fun things to do, and it feels more like contributing than just working. One thing that I am very pleased about at Treasure Data is the work environment: although I am a remote worker (I work from Costa Rica), I have had the opportunity to visit the HQ offices in Mountain View a couple of times, and I can say that the people are very friendly.



Ubuntu 14.10 + MacBook Pro Retina 13″ 2014

[Updated on March 10th, 2015]

I just got my new setup for work and, as usual, I made a software change: installing Ubuntu on the MacBook Pro Retina. With the default system I ran into the following issues:

  • Suspend takes a long time
  • On Resume after suspend, the wireless driver stopped working
  • CPU scaling governor is powersave, but the CPU is always at maximum! (high temperature)

I managed to solve the problems with the following steps:

Continue reading

OSX Yosemite + Ubuntu 14.04 / dual boot

Before you start playing with the commands suggested here, let me clarify my context:

  • I had a MacBook Pro with OS X Mavericks, dual booting with Ubuntu 14.04 through the rEFInd boot loader.
  • Today I upgraded OS X to Yosemite, and after everything was updated rEFInd stopped working.

So my goal was to get rEFInd working again to switch between operating systems. My primary OS is Linux, so if you are in a similar situation this may help:

Through a Twitter search I found that someone had linked the following solution:

That worked for me. I got a few warnings and I did not need to run the ‘mount’ command. Good luck!

Busy and Productive Weekend


I planned to have a quiet Saturday, but after checking my email in the morning I found a possible security issue reported for Monkey (1.5 onwards). It may lead to a DDoS based on the new optional FDT feature, so it was time to research and release a fix as soon as possible.

So Monkey v1.5.3 was released, which includes the fix for the reported problem. I also took the opportunity to include other patches sent by the reporter, so the goal was accomplished: act transparently and quickly.

Duda I/O

Also during the week I have been working on a new look & feel for the Duda I/O project. You cannot imagine how hard it is to create an image for software that doesn’t have a UI (yet). So the project just got 3 major things:

New logo

Duda I/O is a stack, hence from my perspective this represents its nature and simplicity:

Not sure if people will love it, but I am sure it will fit very well on project T-shirts and stickers :)

New Web Site

As the new logo arrived, it was a good time to complete the new responsive web site:

Disclaimer: I bought a Bootstrap template and made the adjustments; I am not a designer.

Compared to the previous project site, this is a major upgrade.

Google Summer of Code

We are in the pencils-down phase of the Google Summer of Code program. That means most of our students are making the final adjustments to their projects, as the final evaluations may start this evening. All of them have done a very good job and they deserve the credit. More news about the cool stuff that was made will be public in the coming days.

It’s a good journey; life is moving a lot in a positive way, in all aspects…

Architecture of a Linux based Web Server

This post is all about the Monkey HTTP server. I consider it very important that, besides opening our source code, we developers are also able to describe the project internals. There is no black magic, just thousands of hours of effort in programming, testing and improvement.


Monkey is an open source project started in 2001 with the goal of learning C; the long story is here. Over the years the code has been improved in many aspects, from naming conventions to heavy architecture changes. All of it has been for good, and nowadays, thanks to the community of core developers and contributors around the project, Monkey is one of the top performing web servers around, and I would claim it is the best option for Embedded Linux.

Understanding the basics of a human readable protocol: HTTP

The Hypertext Transfer Protocol is basically a language with a simple grammar for communication between two components: an HTTP client and an HTTP server. Typically, the communication starts with a client sending a request to the server, and the server replying with some result for that request. The result is either a status response plus some content, or simply an error.

Each HTTP request sent by the client is composed of a request method, a URI, a protocol version and, optionally, a bunch of headers. Given that, we can say that a server must take care of the following:

  • Listen for new connections
  • Accept connections
  • Once the connection is accepted, start reading the HTTP request sent by the client
  • Parse the HTTP request, understand what the client wants
  • Depending on the request type, the server can: serve some content, close the connection because of some exception, proxy the request to somebody else, etc.
  • Close the connection or keep it open waiting for more requests. This depends on the protocol version and the client’s HTTP headers.

Depending on the server’s target, it can be implemented in many ways with different architecture strategies, so this post only aims to describe what has worked best for us in terms of high performance and low resource usage.
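
As a rough illustration of the parsing step in the list above, here is a minimal sketch in C that splits the first line of a request into method, URI and protocol version. The struct `request_line` and the function `parse_request_line()` are hypothetical names made up for this example. This is not how Monkey parses requests, and a real parser also has to deal with headers, partial reads and size limits.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the three parts of an HTTP request line */
struct request_line {
    char method[16];
    char uri[256];
    char version[16];
};

/* Returns 0 on success, -1 if the line does not look like a request line */
int parse_request_line(const char *line, struct request_line *out)
{
    assert(line != NULL && out != NULL);

    /* The field widths leave room for the terminating NUL byte */
    if (sscanf(line, "%15s %255s %15s",
               out->method, out->uri, out->version) != 3) {
        return -1;
    }

    /* Only accept something that looks like "HTTP/x.y" */
    if (strncmp(out->version, "HTTP/", 5) != 0) {
        return -1;
    }

    return 0;
}
```

Given the line "GET /index.html HTTP/1.1", the three fields end up as "GET", "/index.html" and "HTTP/1.1"; the server can then decide what to do based on the method and URI.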

Architecture design facts

  • Monkey is a web server designed with a strong focus on Linux. It does not aim to be portable across other operating systems. Focusing on the most widely used mainstream operating system allows us to put our energy and effort in one place in the best way, and of course to get the most out of the Linux Kernel to achieve high performance.
  • Event driven: also known as asynchronous, an event-driven web server aims to use non-blocking system calls to perform its work, reducing the computing time in the user-space context. E.g. if we are sending file content to a client, we do not block the whole process or thread when sending the data. Instead, we instruct the kernel through a system call to send N bytes from the file and to notify us when we are able to send more bytes. In the meanwhile, we process other connections and send other pending data.
  • Embedded friendly: our embedded context is Embedded Linux. We care a lot about resource consumption, which means that under a heavy load Monkey does not use more than 2.5MB of memory. The Monkey binary size is around 80KB; once loaded in memory it takes about 350KB, and depending on the load, more resources can be needed.
  • Small core, flexible API: it implements a basic core to handle the HTTP protocol and exposes a flexible API through the plugin interface, where it is possible to hook plugins for transport layer, security, request type and event handlers.


In Monkey we have defined two contexts of work: process context and thread context. The process context represents the main process waiting for incoming connections and the scheduler balancing the new connections across the worker threads. The thread context belongs to each thread working on the active connections:

The number of workers is defined in the configuration, and it scales well on single and multi-core CPU systems. There is no need to set thread affinity through CPU masks: the Linux Kernel scheduler is smart enough to assign CPU time to each worker, and by default all workers are assigned to all CPUs.

From a system administrator’s point of view, it is possible to assign each worker to a different set of CPUs, but this approach is not suggested unless we are fully aware of what the Linux scheduler does in terms of interrupts, context switches and CPU time for Kernel and user-space applications. Do it only if you can do it better than the running scheduler.


Before entering the server loop, the scheduler launches and initializes each worker, taking care of setting up the initial data structures and the interfaces for the interaction between the components mentioned. This stage involves the creation of an epoll(7) queue per worker. It is good to mention that each epoll(7) queue created through epoll_create(2) is managed through a specific file descriptor.

Once the workers are up and running, the next scheduler job is to manage the incoming connections. For each new connection accepted, it determines which worker is the least loaded and assigns the connection to it. The chosen worker is the one that has the fewest connections in its epoll(7) interface, so the scheduler goes through the worker counters and chooses one. At this specific point the scheduler has two file descriptors: the connection file descriptor returned by accept(2) and the file descriptor that represents the epoll(7) queue of the chosen worker. So it basically registers the new file descriptor in the proper epoll(7) queue.
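
The balancing decision described above can be sketched in a few lines of C. This is only an illustration under assumptions: `struct worker` and `least_loaded_worker()` are hypothetical names and are not taken from the Monkey sources, which may keep their counters differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-worker bookkeeping kept by the scheduler */
struct worker {
    int epoll_fd;               /* descriptor of the worker's epoll(7) queue */
    unsigned long connections;  /* connections currently assigned to it */
};

/* Return the index of the worker with the fewest active connections */
size_t least_loaded_worker(const struct worker *workers, size_t n)
{
    assert(workers != NULL && n > 0);

    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        /* Strict comparison: on a tie the first worker found wins */
        if (workers[i].connections < workers[best].connections) {
            best = i;
        }
    }
    return best;
}
```

With the index in hand, the scheduler would register the socket returned by accept(2) in that worker's queue using epoll_ctl(2) with EPOLL_CTL_ADD and increase the counter.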


Each worker thread runs in an infinite loop around the epoll(7) interface, which is basically a Linux-specific polling mechanism to register, enqueue and notify about events on file descriptors registered by the scheduler (sockets in this case).

The worker stays in a loop waiting for events in the epoll_wait(2) system call. Every time the scheduler registers a new file descriptor, an event will be reported in the worker’s epoll(7) interface, and it will do the same for subsequent events such as “there is data available to read” (EPOLLIN), “now you can write to the socket” (EPOLLOUT), “connection closed” (EPOLLHUP), etc.

For each event triggered, the worker keeps a status of the connection to determine whether it is a new connection, is receiving the HTTP request, has a complete HTTP request, is parsing the request or is sending out some response. Besides events, at a fixed interval in seconds set in the configuration, it checks for connections that have timed out due to an incomplete request or another anomaly.
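
To make the register-and-wait cycle more concrete, the following sketch performs a single round of it: create an epoll(7) queue, register one descriptor for read events and wait once. The function name `wait_one_event()` is hypothetical; a real worker such as Monkey's would keep one queue for its whole lifetime and call epoll_wait(2) in a loop instead of creating a queue per call.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Register fd for read events, wait once and report what happened.
 * Returns the reported event mask, 0 on timeout, or -1 on error.
 */
int wait_one_event(int fd, int timeout_ms)
{
    struct epoll_event ev = { 0 };
    struct epoll_event out = { 0 };
    int result = -1;

    assert(fd >= 0);

    /* Each epoll queue is itself represented by a file descriptor */
    int efd = epoll_create1(0);
    if (efd < 0) {
        return -1;
    }

    ev.events = EPOLLIN;
    ev.data.fd = fd;
    if (epoll_ctl(efd, EPOLL_CTL_ADD, fd, &ev) == 0) {
        int n = epoll_wait(efd, &out, 1, timeout_ms);
        if (n == 0) {
            result = 0;                 /* nothing happened in time */
        } else if (n == 1) {
            result = (int) out.events;  /* e.g. EPOLLIN, EPOLLHUP */
        }
    }

    close(efd);
    return result;
}
```

A worker would test the returned mask against EPOLLIN, EPOLLOUT or EPOLLHUP and update the status it keeps for that connection accordingly.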

Plugins Architecture

Monkey defines four categories of API where plugins can hook: Context, Events, Stages and Networking.

Context: defines callbacks that can be invoked when the server is starting up. It covers the process and thread contexts described earlier.

Events: for every type of event reported in a worker loop, a plugin can implement a hook to perform specific actions:

Stages: every new connection enters a stage status, so at each step of the HTTP cycle it passes through different phases, and each plugin can hook into a specific one:

Networking: Monkey is not aware of networking, hence it intentionally depends on a plugin that provides the transport layer. This approach makes it easy to change from plain socket communication to an encrypted one such as SSL. The networking plugin only needs to provide the required API functions for the communication:

Scaling up

Every time a connection has performed a successful request, it is stored in a global list within the worker scope (implemented through a pthread_key). For each event reported, the worker needs to look up the internal data associated with it, so the file descriptor or socket number acts as a primary key for the search. The data structure implemented for this in Monkey v1.2 is a red-black tree. It has been shown to behave well and scale when handling thousands of active connections per worker, maintaining a good balance between performance and cost.

The cost of each file descriptor lookup is critical for server performance. An O(n) solution will work fine for a few connections, but under high concurrency an O(log(n)) solution will end up providing the highest performance.
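
To get a feel for a logarithmic lookup keyed by file descriptor without writing a tree by hand, the sketch below uses the POSIX tsearch(3) and tfind(3) functions. This is a stand-in chosen for brevity, not Monkey's own red-black tree: POSIX only promises a binary search tree here, although glibc happens to implement it as a red-black tree. `struct conn`, `conn_add()` and `conn_lookup()` are hypothetical names.

```c
#include <assert.h>
#include <search.h>
#include <stddef.h>

/* Hypothetical per-connection data, keyed by its socket descriptor */
struct conn {
    int fd;
    int status;
};

/* Order connections by file descriptor */
static int conn_cmp(const void *a, const void *b)
{
    const struct conn *ca = a;
    const struct conn *cb = b;
    return (ca->fd > cb->fd) - (ca->fd < cb->fd);
}

/* Insert c into the tree; returns -1 on failure or if the fd is already present */
int conn_add(void **root, struct conn *c)
{
    assert(root != NULL && c != NULL);

    void *node = tsearch(c, root, conn_cmp);
    if (node == NULL) {
        return -1;
    }
    /* tsearch() hands back the existing entry when the key was already stored */
    return (*(struct conn **) node == c) ? 0 : -1;
}

/* Find the connection that owns fd, or NULL if it is unknown */
struct conn *conn_lookup(void **root, int fd)
{
    struct conn key = { .fd = fd, .status = 0 };
    void *node = tfind(&key, root, conn_cmp);
    return node ? *(struct conn **) node : NULL;
}
```

The tree root would live in the worker's own scope, so every event reported for a descriptor turns into one `conn_lookup()` call on that worker's tree.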

Memory Management

One of the keys to reducing overhead in a server is to reduce as much as possible the memory allocation requests made to the system within the main loop. The current Monkey implementation performs only one memory allocation per new connection; if more is needed because the incoming request posts too much data, it allocates more memory as required. Other web servers implement caching mechanisms to reduce memory allocations even more. As our focus is Embedded Linux, we focus on speed with low resource usage, and implementing a caching mechanism would increase our costs. So we dropped that common approach in order not to abuse system memory; it is just a decision based on the target.

Linux Kernel system calls

The Linux Kernel exposes a useful set of non-portable system calls to achieve high performance when creating networking applications. The first one is epoll(7): as described earlier, this interface allows watching a set of file descriptors for certain defined events. Similar solutions like select(2) or poll(2) do not perform as well as epoll(7) does.

When sending a static file, the old-fashioned way is to open the file, get the file descriptor and perform multiple read(2)/write(2) calls to write out the file content. This requires the Kernel to copy data back and forth between Kernel and user space, which obviously generates overhead. As a solution, the Linux Kernel implements a zero-copy strategy through the system call sendfile(2). This system call does not copy data to user space; instead it sends it directly to the other file descriptor, achieving good performance and reducing the latency of the old-fashioned way described.

In our architecture, the Logger plugin needs to transfer data through a pipe(2) (a unidirectional data channel that can be used for interprocess communication). A common mechanism is to use read(2) and write(2) on each end, but, in a similar way to how sendfile(2) works, there is a system call for this kind of situation called splice(2). This system call moves data from one point to another without the copy overhead. The main difference between sendfile(2) and splice(2) is that splice(2) requires one end to be a pipe(2).

In my previous post I mentioned how to use the new Linux Kernel feature called TCP_FASTOPEN. While it is very simple to implement, it requires the cooperation of both sides: the client and the server. If you have full control of your networking application (client and server), consider using TCP_FASTOPEN: it will increase performance by saving a round trip of the TCP handshake.
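
As a minimal sketch of the sendfile(2) approach, the function below pushes a whole file to an already open descriptor without reading it into a user-space buffer. `send_path()` is a hypothetical helper written for this example, it assumes a blocking output descriptor, and it is not Monkey's code; an event-driven server using non-blocking sockets would additionally handle EAGAIN and resume from the saved offset on the next EPOLLOUT event.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Send the whole file at `path` to out_fd; returns bytes sent or -1 */
ssize_t send_path(int out_fd, const char *path)
{
    assert(path != NULL);

    int in_fd = open(path, O_RDONLY);
    if (in_fd < 0) {
        return -1;
    }

    struct stat st;
    if (fstat(in_fd, &st) < 0) {
        close(in_fd);
        return -1;
    }

    /* sendfile() advances `offset` by the number of bytes it sent */
    off_t offset = 0;
    while (offset < st.st_size) {
        ssize_t n = sendfile(out_fd, in_fd, &offset,
                             (size_t) (st.st_size - offset));
        if (n < 0 && errno == EINTR) {
            continue;           /* interrupted by a signal, retry */
        }
        if (n <= 0) {           /* error, or the file shrank meanwhile */
            close(in_fd);
            return -1;
        }
    }

    close(in_fd);
    return (ssize_t) offset;
}
```

In a web server, out_fd would be the client socket; on reasonably recent kernels it can also be a regular file, which is handy for trying the function out.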
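
A log writer fed through a pipe could use splice(2) roughly as sketched below. This is an assumption-laden illustration rather than the Logger plugin's real code: `drain_pipe_to_fd()` is a hypothetical name, and the output descriptor is assumed to be a file opened without O_APPEND, since splice(2) rejects append-mode targets.

```c
#define _GNU_SOURCE             /* splice(2) is a Linux-specific call */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Move up to `len` bytes from the read end of a pipe to out_fd.
 * Returns the number of bytes moved, or -1 on error.
 */
ssize_t drain_pipe_to_fd(int pipe_rd, int out_fd, size_t len)
{
    size_t left = len;

    assert(pipe_rd >= 0 && out_fd >= 0);

    while (left > 0) {
        /* NULL offsets: use the current position of each descriptor */
        ssize_t n = splice(pipe_rd, NULL, out_fd, NULL, left, SPLICE_F_MOVE);
        if (n < 0 && errno == EINTR) {
            continue;           /* interrupted by a signal, retry */
        }
        if (n < 0) {
            return -1;
        }
        if (n == 0) {
            break;              /* write end closed, nothing more to read */
        }
        left -= (size_t) n;
    }

    return (ssize_t) (len - left);
}
```

The data written by the workers into the pipe never passes through a user-space buffer on its way to the log file.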

Monkey Plugins

Based on the architecture and API described, the following plugins are distributed as part of the core:

Liana: basic sockets connectivity layer

PolarSSL: provides a transport layer based on SSL

Cheetah: plugin that provides a command line interface to query the internals of a running server through a unix socket

Mandril: security layer that aims to restrict the access by URI strings or sub networks.

Dirlisting: directory listing

Logger: log writer

CGI: old fashioned CGI interface

FastCGI: provides FastCGI support


Bonus track: Full HTTP Stack for web services implementation

Besides being a common web server for static or dynamic content, Monkey is a full stack for the development of web applications. In order to provide an easy API for web application or web service development, we have created Duda I/O, an event-driven C framework for rapid development based on the Monkey stack.

Duda implements a core API of pseudo-objects and provides extra features through a package system, everything in a friendly C API. The most relevant features supported at the moment are WebSocket, JSON, SQLite3, Redis, Base64 and SHA1.

Due to its high performance nature and the open source ecosystem around it, it is being used in production from Embedded Linux products to Big Data solutions. The license of Duda allows you to create closed-source services or applications and link them to the Duda I/O stack at zero cost.

For more details please refer to the Duda I/O main site.

The Monkey organization believes in Open Source and is fully committed to creating the best networking technology for different needs. If you are interested in participating as a contributor or in testing our stack, feel free to reach us on our mailing lists or IRC channel #monkey at

Linux TCP FASTOPEN in your sockets

A few years ago the concept of TCP_FASTOPEN (TFO) was introduced as a solution to improve performance on TCP connections by saving one round trip of the handshake process. The first operating system to implement TFO is Linux, and good improvements have been demonstrated when it is used on a common network.

The implementation in the Linux Kernel was done in parts: Linux Kernel 3.6.1 was the first one to implement the client side requirements, and then Linux Kernel 3.7 implemented the server side socket behavior.

Client side

In a common TCP client flow, the following calls take place:

/* create socket file descriptor */
fd = socket(domain, type, protocol);

/* connect to the target server/port */
connect(fd, addr, addrlen);

/* send some data */
send(fd, buf, size, 0);

/* wait for some reply and read into a local buffer */
while ((bytes = recv(fd, ...)) > 0) {
    ...
}

When using TCP_FASTOPEN the behavior is a little different. You no longer need to use connect(2); instead you use sendto(2), which also gives you the opportunity to let the Kernel buffer some initial outgoing data. In short, the call to sendto(2) is like an implicit connect(2) and send(2)/write(2) at the same time:

/* create the socket */
fd = socket();

/* connect and send out some data */
sendto(fd, buffer, buf_len, MSG_FASTOPEN, ...);

/* write more data */
send(fd, buf, size, 0);

/* wait for some reply and read into a local buffer */
while ((bytes = recv(fd, ...)) > 0) {
    ...
}

Server side

A common (old-fashioned) TCP server is created with the following calls:

/* create the socket */
fd = socket();

/* bind the address */
bind(fd, addr, addrlen);

/* this socket will listen for incoming connections */
listen(fd, backlog);

Adding TCP_FASTOPEN support to the server side code is very easy and the required changes are minimal: you only need to set a new socket option between bind(2) and listen(2):

/* a hint value for the Kernel */
int qlen = 5;

/* create the socket */
fd = socket();

/* bind the address */
bind(fd, addr, addrlen);

/* change the socket options to TCP_FASTOPEN */
setsockopt(fd, SOL_TCP, TCP_FASTOPEN, &qlen, sizeof(qlen));

/* this socket will listen for incoming connections */
listen(fd, backlog);

Required macros

Even if you are running the latest Linux Kernel 3.8, you will face some problems, as in most cases the required macro values for TCP_FASTOPEN and MSG_FASTOPEN will not be available at compile time. As a workaround you can include the following code in one of your header files:

/*
 * A generic protection in case you include this
 * from multiple files
 */

/* conditional define for TCP_FASTOPEN */
#ifndef TCP_FASTOPEN
#define TCP_FASTOPEN   23
#endif

/* conditional define for MSG_FASTOPEN */
#ifndef MSG_FASTOPEN
#define MSG_FASTOPEN   0x20000000
#endif


Enabling TCP_FASTOPEN in your Kernel

By default the TCP_FASTOPEN feature is not enabled at runtime (unless you enabled it in the sysctl.conf file). Before testing this new feature, make sure it is enabled with the following command (the value is a bit mask: 1 enables client support, 2 enables server support and 3 enables both):

# echo 1 > /proc/sys/net/ipv4/tcp_fastopen

If you try to use a client with TCP_FASTOPEN enabled, it’s mandatory that the server has this same option set on the listener socket, otherwise the client will fail at the connection phase (due to protocol mismatch).

For a TCP_FASTOPEN server, it does not matter whether the client uses the new protocol or not; it will work anyway. So if you develop a TCP server you can give it a try by adding a single system call. If your project is open source, feel free to use the header macros example provided above.

By the way, of course Monkey Web Server has recently added this feature in our development repository.


GSoC Mentors Summit 2012 at Googleplex

The Monkey Project joined the Google Summer of Code 2012 program as a mentoring organization, and as members we were invited to attend the Mentors Summit conference at the Googleplex in California, US. Two members of our community flew to Google to represent the organization: Felipe Reyes and I.

Being part of a project that was selected for GSoC 2012 is really exciting, because of the recognition as a solid open source project and the opportunity to mentor three students around the world and teach them about collaboration and core development in our project. It was hard work, and at the end the other exciting part begins: the Mentors Summit.

The event took place in California. It started on Friday 19th at the Wild Palms hotel with an open dinner around the pool and free beer for everyone: nothing formal, just eating together and meeting the great people behind each project :)

On Saturday 20th we went to the famous Googleplex to have breakfast before the event started, and I cannot fail to mention that it is TRUE: Google has great free food services for everyone. I was amazed by the attention to detail they have for their employees; the place is well designed with a lot of colors around, and the campus in general is pretty friendly.

We ran into different (and parallel) sessions with topics about GSoC itself and technical things about each project. It was an unconference, so people proposed their own topics on a board with a flexible schedule. Honestly, I was not prepared to give a talk as I was not aware that we could propose technical sessions, but well, we signed up for a talk about Monkey. We mostly talked about project internals and Duda web services. It went pretty well: interesting discussions about the project were raised and new horizons could come…

I attended some technical and GSoC sessions. It was a really good opportunity for Google and the mentoring organizations to discuss ways to improve the program, and I am impressed by how committed the Open Source department is to helping organizations grow and create networks with other projects. It’s difficult work, but the synergy among the people involved in this program makes things easier. Everybody was open to contributing; I would say that GSoC, more than a program, is a real community, an open program itself.

What a surprise: I finally met the good guys from the Open Source Lab, who have been working together with us for about two years. They provide and maintain our hosting infrastructure, thanks! On the right, James Lopeman from .

OSL at left, at right


At night, social activities continued at the hotel with another dinner around the pool and free drinks: having fun, sharing different technical interests and more and more…

A great event. I went without clear expectations and it was something totally great. I have attended many conferences in the past and I have to say that this has been one of the best in terms of organization, people and objectives.

I hope Google Summer of Code runs again in 2013; if so, we will do our best to get Monkey in.

Duda I/O: Websocket Chat

There is not much to explain: WebSocket is widely used for realtime notifications over the web, and Duda I/O supports WebSocket through a package. I have written a simple server-side chat example to demonstrate how it can be used; the front-end part is an existing client to which I made only minor modifications:

The interesting part is the service code:

#include "webservice.h"
#include "packages/websocket/websocket.h"

DUDA_REGISTER("Duda I/O Examples", "WebSocket Chat");

void cb_on_message(duda_request_t *dr, ws_request_t *wr)
{
    /* Relay every incoming text message to all connected clients */
    websocket->broadcast(wr, wr->payload, wr->payload_len, WS_OPCODE_TEXT);
}

void cb_handshake(duda_request_t *dr)
{
    /* WebSocket handshake for this request (call not shown) */
}

int duda_main()
{
    /* Load the websocket package */
    duda_load_package(websocket, "websocket");

    /*
     * Define a callback, on every websocket message received,
     * trigger cb_on_message.
     */
    websocket->set_callback(WS_ON_MESSAGE, cb_on_message);

    /* Associate a static URL with a callback */
    map->static_add("/handshake/", "cb_handshake");

    /* Initialize the broadcaster interface */

    return 0;
}

In duda_main() we initialize the web service, loading the websocket package and setting a callback function to invoke when a WebSocket message arrives. Then we map the URL path that handles the WebSocket handshake, and finally we instruct the websocket package to launch the Broadcaster service. This last step is necessary if you want to send broadcast messages.

Getting started
If you want some simple steps to try this example, do:

  • git clone git://
  • git clone git://
  • cd dudac/ && ./dudac -g
  • ./dudac -w /path/to/duda-examples/050_websocket_chat/

Now you can point your browser at http://localhost:2001/wschat/


For more details about the Websocket package and its available methods, please refer to the Websocket API documentation