For some time, a specific Duda I/O feature used by custom web services to connect to backend servers has behaved inconsistently: if a backend closed the connection, the web service could miss the connection close event.
This problem was detected months ago, but the root cause was not easy to determine. It affected the MariaDB/MySQL package and custom web services that use the stack event loop to be notified about sockets connected to third-party servers. Last night, while writing some example code for something unrelated, I was able to reproduce the problem, and the steps were very simple:
- create a TCP socket
- make it asynchronous
- run a connect(2) to a specific backend
- register the socket file descriptor into the event loop interface
- shut down the backend server
At that point a connection close notification should have arrived, but it never did. The result was an inconsistent state: a zombie TCP socket with no event notifications on it. Looking around and tracing Monkey HTTP Server, I found the following difference:
- when registering a new socket in the events interface, the following flags are always associated with the socket: EPOLLERR, EPOLLHUP and EPOLLRDHUP.
- when someone changes the socket direction (read to write, write to read), all flags are overwritten, but this time only EPOLLERR and EPOLLHUP were set; EPOLLRDHUP was missing!
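The reproduction steps and the flag difference above can be sketched in a small standalone program. This is only an illustration, not code from Monkey or Duda I/O: the helper name events_after_backend_close() is hypothetical, a throwaway loopback listener stands in for the real backend, and no EPOLLIN/EPOLLOUT is requested so that only the close-related flags matter.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: connect a non-blocking client to a loopback
 * "backend", register the client in epoll with the given mask, close
 * the backend side, and return the events reported within 200 ms
 * (0 if nothing arrived, -1 if the setup failed).
 */
int events_after_backend_close(uint32_t mask)
{
    int result = -1, n;
    int lfd = -1, cfd = -1, bfd = -1, efd = -1;
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    struct epoll_event ev, out;

    /* Stand-in backend: listen on an ephemeral loopback port. */
    lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0) goto done;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;
    if (bind(lfd, (struct sockaddr *) &addr, sizeof(addr)) < 0) goto done;
    if (listen(lfd, 1) < 0) goto done;
    if (getsockname(lfd, (struct sockaddr *) &addr, &len) < 0) goto done;

    /* Create a TCP socket, make it asynchronous, run connect(2). */
    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd < 0) goto done;
    if (fcntl(cfd, F_SETFL, fcntl(cfd, F_GETFL, 0) | O_NONBLOCK) < 0)
        goto done;
    if (connect(cfd, (struct sockaddr *) &addr, sizeof(addr)) < 0 &&
        errno != EINPROGRESS)
        goto done;
    bfd = accept(lfd, NULL, NULL); /* blocks until the handshake is done */
    if (bfd < 0) goto done;

    /* Register the socket file descriptor in the event loop. */
    efd = epoll_create1(0);
    if (efd < 0) goto done;
    memset(&ev, 0, sizeof(ev));
    ev.events = mask;
    ev.data.fd = cfd;
    if (epoll_ctl(efd, EPOLL_CTL_ADD, cfd, &ev) < 0) goto done;

    /* The backend goes away: closing its side sends a FIN. */
    close(bfd);
    bfd = -1;

    n = epoll_wait(efd, &out, 1, 200);
    if (n >= 0)
        result = (n == 0) ? 0 : (int) out.events;

done:
    if (efd >= 0) close(efd);
    if (bfd >= 0) close(bfd);
    if (cfd >= 0) close(cfd);
    if (lfd >= 0) close(lfd);
    return result;
}
```

On a Linux kernel with EPOLLRDHUP support (2.6.17 or later), calling the helper with EPOLLERR | EPOLLHUP is expected to time out and return 0, while adding EPOLLRDHUP to the mask should make the peer close visible.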
Googling around, I found this relevant comment:
A socket listening for epoll events will typically receive an EPOLLRDHUP (in addition to EPOLLIN) event flag upon the remote peer calling close or shutdown(SHUT_WR). This does not necessarily mean the socket is dead. Subsequent calls to recv() will return any unread data on the socket and eventually "0" will be returned to indicate EOF. It may even be possible to send data back if the remote peer only did a half-close of its socket.
The one notable exception is if the remote peer is using the SO_LINGER option enabled on its socket with a linger value of "0". The result of closing such a socket may result in a TCP RST getting sent instead of a FIN. From what I've read, a connection reset event will generate either a EPOLLHUP or EPOLLERR. (I haven't had time to confirm, but it makes sense).
There is some documentation to suggest there are older Linux implementations that don't support EPOLLRDHUP, as such EPOLLHUP gets generated instead.
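To illustrate the point made in the comment, here is a minimal sketch of what a handler might do once EPOLLRDHUP (or EPOLLIN) fires: keep calling recv() until it returns 0. The function name drain_until_eof() and its return convention are assumptions made for this example, not part of Monkey or Duda I/O.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper: read whatever is left on the socket. Adds the
 * number of bytes consumed to *total and returns:
 *    1  on EOF (recv() returned 0, the peer closed or half-closed),
 *    0  if a non-blocking socket would block (peer still open),
 *   -1  on a real error such as ECONNRESET.
 */
int drain_until_eof(int fd, size_t *total)
{
    char buf[4096];
    ssize_t n;

    for (;;) {
        n = recv(fd, buf, sizeof(buf), 0);
        if (n > 0) {
            *total += (size_t) n;   /* unread data is still delivered */
            continue;
        }
        if (n == 0)
            return 1;               /* EOF: the peer sent its FIN */
        if (errno == EINTR)
            continue;               /* interrupted, just retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return 0;               /* nothing more to read for now */
        return -1;
    }
}
```

A return value of 1 only says the peer will not send anything else; as the comment notes, after a half-close it may still be possible to write to the socket before closing it.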
After applying the fix, which adds EPOLLRDHUP back to the flags set when the socket direction changes, the issue is gone.
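The idea behind the change can be sketched roughly as follows. This is not the actual patch to src/mk_epoll.c: the enum, build_event_mask() and change_mode() are hypothetical names used only to show the close-related flags being kept in one place, so a direction change cannot drop EPOLLRDHUP.

```c
#include <stdint.h>
#include <string.h>
#include <sys/epoll.h>

/* Hypothetical mode constants; the real names in Monkey may differ. */
enum io_mode { IO_MODE_READ, IO_MODE_WRITE };

/*
 * Build the event mask used whenever a socket is armed or re-armed.
 * EPOLLERR, EPOLLHUP and EPOLLRDHUP are always present.
 */
uint32_t build_event_mask(enum io_mode mode)
{
    uint32_t events = EPOLLERR | EPOLLHUP | EPOLLRDHUP;

    events |= (mode == IO_MODE_READ) ? EPOLLIN : EPOLLOUT;
    return events;
}

/* Switch the direction of a descriptor already registered in efd. */
int change_mode(int efd, int fd, enum io_mode mode)
{
    struct epoll_event event;

    memset(&event, 0, sizeof(event));
    event.data.fd = fd;
    event.events = build_event_mask(mode);
    return epoll_ctl(efd, EPOLL_CTL_MOD, fd, &event);
}
```

Using the same mask builder for both the initial registration and later direction changes is one way to keep the two code paths from drifting apart again.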
Now, wondering when the problem was introduced (and by whom):
$ git blame src/mk_epoll.c -L362,+1
10c3eb27 (Eduardo Silva 2011-09-15...) event.events = EPOLLERR | EPOLLHUP;
Git blames me, and the lesson is learned... never miss EPOLLRDHUP again.
Note: the fix has been backported to Duda-Stable-Branch 1 (DST-1), so people developing services only need to update their stable stack version.