
Network Timeouts in C

Recently, while coding up some P2P file-sharing software for class, I came across a problem that really got me stuck. (Note that I forked a separate process to handle each upload and download.) When reading data from another peer, my peer would generally have to block until the other peer responded, since with network programming we can never expect all of our requests to return immediately. The problem was that occasionally the other peer would decide to die entirely, and I was left with a process that would block essentially forever, since the data it was waiting for was never going to come. Now, the great thing about blocking reads is that they don't burn CPU time spinning in a circle waiting for data to arrive, but they do take up space in the process table and in memory. And of course, if my blocked reading processes stayed around forever, it would be very simple for a malicious peer to bring my OS to a grinding halt. Essentially, what I needed was a way to make read() time out. Anyone vaguely familiar with internet browsers and other such network-reliant programs is probably familiar with timeouts, but I had no idea how to implement one in C.
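For context, the troublesome pattern looked roughly like the sketch below; serve_transfer() and peer_fd are hypothetical stand-ins for my actual code, not the real thing.

```c
#include <stdlib.h>
#include <unistd.h>

/* Child process: service one peer over the connected socket peer_fd.
 * If that peer dies without closing the connection, read() blocks
 * forever and this child sits in the process table indefinitely. */
static void serve_transfer(int peer_fd)
{
    char buf[4096];
    ssize_t n;

    while ((n = read(peer_fd, buf, sizeof buf)) > 0) {
        /* ... append buf to the file being transferred ... */
    }
    _exit(n == 0 ? EXIT_SUCCESS : EXIT_FAILURE);
}

int main(void)
{
    int peer_fd = -1;   /* in the real program: accept() or connect() */

    if (fork() == 0)    /* one child per transfer */
        serve_transfer(peer_fd);

    /* parent goes back to handling other peers ... */
    return 0;
}
```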

My first thought was to use setrlimit(), a Unix function that lets the programmer cap the system resources a process may use (CPU time, virtual memory size, created file sizes, etc.; see "man 2 setrlimit" for more). When setrlimit() is used to set a maximum amount of CPU time, the process receives SIGXCPU when the soft CPU-time limit is reached, and is killed when the hard limit is reached. At the time I was a bit groggy, so setrlimit() seemed like a great solution, but of course anyone with half a brain (which I apparently didn't have at the time) will realize that setrlimit() is definitely not the solution to this problem. A blocked process doesn't run and therefore doesn't consume CPU time, so setting a maximum CPU time does pretty much nothing to the blocking process; it'll still keep blocking forever.
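For the curious, a minimal sketch of what I had in mind is below (the 30/35-second limits and the handler are illustrative assumptions, not my real code); it installs the limit correctly, but as explained above it never helps a blocked read().

```c
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <unistd.h>

static void on_xcpu(int sig)
{
    (void)sig;
    /* Soft CPU limit reached: bail out before the hard limit kills us.
     * _exit() is async-signal-safe, so it is safe to call here. */
    _exit(EXIT_FAILURE);
}

int main(void)
{
    /* 30 seconds of CPU time (soft), 35 (hard) -- illustrative values. */
    struct rlimit rl = { .rlim_cur = 30, .rlim_max = 35 };

    signal(SIGXCPU, on_xcpu);
    if (setrlimit(RLIMIT_CPU, &rl) == -1) {
        perror("setrlimit");
        return EXIT_FAILURE;
    }

    /* A read() blocked on a dead peer burns no CPU time, so neither
     * limit is ever reached and the process still hangs forever. */
    return 0;
}
```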

After a little bit of trawling the internet, I finally came upon the perfect solution: alarms! When alarm(unsigned int seconds) is called, it raises SIGALRM after that many seconds (real-time seconds, mind you, not CPU seconds consumed by the process), even if the process that called alarm() is blocking. I set the alarm right before beginning a read() and used signal() to bind a signal handler to SIGALRM, so that when the alarm went off my handler could gracefully kill the timed-out download process!
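Here's a minimal sketch of that approach. The 5-second timeout, the timed_read() wrapper, and the pipe-based demo in main() are my illustrative assumptions rather than the exact code I wrote; the handler simply exits the process, as described above.

```c
#include <signal.h>
#include <stdlib.h>
#include <unistd.h>

#define READ_TIMEOUT_SECS 5      /* illustrative timeout */

static void on_alarm(int sig)
{
    (void)sig;
    /* The read() never returned in time: give up on this peer.
     * _exit() is async-signal-safe, so it may be called from a handler. */
    _exit(EXIT_FAILURE);
}

/* Like read(), but the whole process dies if no data shows up in time. */
static ssize_t timed_read(int fd, void *buf, size_t len)
{
    ssize_t n;

    signal(SIGALRM, on_alarm);   /* bind the handler to SIGALRM */
    alarm(READ_TIMEOUT_SECS);    /* SIGALRM fires in 5 real-time seconds */

    n = read(fd, buf, len);      /* may block, but no longer forever */

    alarm(0);                    /* data arrived in time: cancel the alarm */
    return n;
}

int main(void)
{
    int fds[2];
    char buf[256];

    pipe(fds);                            /* nobody ever writes to fds[1]... */
    timed_read(fds[0], buf, sizeof buf);  /* ...so this process exits after 5s */
    return 0;
}
```

One design note: because the handler never returns, it doesn't matter here whether the kernel would have restarted the interrupted read(); if you wanted to keep the process alive instead, you'd install the handler with sigaction() without SA_RESTART and check for read() returning -1 with errno == EINTR.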