06 February, 2010

Lighttpd: zombie fastcgi process

When lighttpd starts, the mod_fastcgi plugin forks every FastCGI process it needs, depending on the configuration. If a FastCGI process terminates normally while lighttpd is still running, everything seems fine at first. However, under heavy traffic, when each process terminates after handling a single request (for a legitimate reason), the terminated FastCGI processes sometimes become zombies! This happens because mod_fastcgi is too slow to call waitpid(): by the time it gets there, the processes have already exited. As you know, a forked process that exits without its parent calling waitpid() on it becomes a zombie process. This is why I ended up with a bunch of zombies on my box.
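To make the zombie mechanics concrete, here is a minimal sketch in C. It is not lighttpd code: the names spawn_zombie() and reap_exited_children() are made up for illustration. The first function creates a child that has exited but has not been reaped yet, and the second one shows the kind of non-blocking waitpid() loop a parent needs to run to clean such children up.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits immediately, then block
 * until it has terminated WITHOUT reaping it (WNOWAIT). On return the
 * child is a zombie. Returns the child's pid, or -1 on error. */
pid_t spawn_zombie(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(0);                       /* child: terminate at once */

    siginfo_t info;
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) < 0)
        return -1;
    assert(info.si_code == CLD_EXITED); /* exited, but still unreaped */
    return pid;
}

/* Hypothetical helper: reap every child that has already terminated,
 * without blocking. Returns the number of children reaped. */
int reap_exited_children(void)
{
    int reaped = 0;
    int status;

    while (waitpid(-1, &status, WNOHANG) > 0)
        reaped++;
    return reaped;
}
```

If the parent never gets around to a loop like reap_exited_children(), or gets to it too rarely under load, the exited children pile up in the process table as zombies.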

Many tricks can prevent zombie processes. As described in my previous post, lighttpd already uses the "fork twice" trick to prevent them. Unfortunately, it gets something wrong.
In Bug #2144, other people observed the same situation. To solve this problem, I dug into the source code and found the root cause.
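For readers who have not seen the "fork twice" trick, the general shape is sketched below. This is my own illustration, not the code in lighttpd's server.c; spawn_detached() and example_worker() are hypothetical names. The parent forks a short-lived intermediate child, the intermediate child forks the real worker and exits right away, and the parent reaps only the intermediate child. The worker is then orphaned and adopted by init (or a subreaper), which reaps it when it exits.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical worker standing in for a FastCGI backend. */
void example_worker(void)
{
    /* A real worker would serve requests here. */
}

/* Run work() in a grandchild process that the caller never has to reap.
 * Returns 0 on success, -1 on failure. */
int spawn_detached(void (*work)(void))
{
    assert(work != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Intermediate child: fork the real worker, then exit at once. */
        pid_t worker = fork();
        if (worker < 0)
            _exit(1);
        if (worker == 0) {
            work();
            _exit(0);
        }
        _exit(0);
    }

    /* Original parent: reap the intermediate child, retrying on EINTR. */
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

After spawn_detached(example_worker) returns 0, the caller has no children left to wait for, so the worker cannot become a zombie of the caller.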

Here is my response to Bug #2144:
"
I encounter the same situation when I running my fastcgi program.

This problem is due to that we daemonize the lighttpd AFTER we initialize plugins. Therefore, the cgi/fcgi processes forked by mod_cgi/mod_fastcgi are still possible to become zombie processes.

Beside the order of invoking daemonize function in server.c, we also forget to ignore SIG_CHLD signal after the first fork, and it is the reason why the "fork twice" trick did not work.

The patch is for lighttpd-1.4.25, I did the following changes:
1. Invoke daemonize() before invoking plugins_call_init(srv).
2. Ignore SIG_CHLD after the first fork().
3. Do not handle SIG_CHLD signal after daemonize lighttpd.

"
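The effect of ignoring the child signal can be sketched as follows. This is not the patch itself, only an illustration under my own assumptions: ignore_sigchld() and child_is_auto_reaped() are hypothetical names, and the signal is spelled SIGCHLD, which is its name in <signal.h>. On POSIX systems with the XSI behaviour (Linux included), setting SIGCHLD to SIG_IGN means terminated children are discarded by the kernel instead of turning into zombies, so a later waitpid() on them fails with ECHILD.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>   /* for callers that assert on the results */
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: ask the kernel to discard terminated children
 * instead of keeping them as zombies. Returns 0 on success, -1 on error. */
int ignore_sigchld(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Hypothetical check: fork a child that exits immediately and try to
 * wait for it. Returns 1 if the kernel already discarded the child
 * (waitpid fails with ECHILD), 0 if the child had to be reaped by us,
 * and -1 if fork() failed. */
int child_is_auto_reaped(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(0);

    pid_t rc;
    do {
        errno = 0;
        rc = waitpid(pid, NULL, 0);
    } while (rc < 0 && errno == EINTR);

    return (rc == -1 && errno == ECHILD) ? 1 : 0;
}
```

Calling ignore_sigchld() first and child_is_auto_reaped() afterwards should report that nothing was left to reap. The trade-off is that the parent can no longer collect the exit status of those children.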

I hope my patch can solve this issue once and for all. :D
