Network Timeouts in C

Reccently, while coding up some P2P sharing software for class, I came across a problem that really got me stuck. (Note, that I forked different processes to handle each upload and upload.) When reading data from another peer, my peer would generally have to block until the other peer responded, since with network programming we can never expect all of our requests to return immediately. The problem was that occassionally, the other peer would decide to die entirely and I was left with a process that would block essentially forever since the signal it was waiting for was never going to come. Now the great thing about blocking reads is that they don’t burn CPU time spinning around in a circle waiting for data to arrive, but they do take up space in the process table and in memory. And of course, if my blocked reading proccesses stayed around forever, it would be very simple for a malacious peer to bring my OS to a grinding halt. Essentially, what I needed was a way to make read() timeout. Now anyone vaguely familar with using internet browsers and other such network-reliant programs are probably familar with timeouts, but I had no idea how to implement it in C.

My first thought, was to use setrlimit(), a Unix function that allows the programmer to set the maximum amount of system resources (CPU time, VM size, created file sizes, etc., for more use “man 2 setrlimit”). When setrlimit() is used to set a maximum amount of CPU time, the process will recieve SIGXCPU when the CPU time soft limit is reached, and then the process is killed when the hard limit is reached. At the time, I was a bit groggy so setrlimit() seemed like a great solution, but of course anyone with half a brain (which I apparently didn’t have at the time) will realize that setrlimit() is definetely not the solution to this problem. A blocking process doesn’t run and therefore doesn’t consume CPU time, so setting a maximum CPU time does pretty much nothing to the blocking process; it’ll still keep blocking forever.

After a little bit of trawling the internet, I finally came upon the perfect solution: alarms! When alarm(unsigned int seconds) is called, it will raise SIGALRM after so many seconds, realtime seconds mind you and not CPU seconds consumed by the process, even if the process that called alarm() is blocking. I set the alarm right before I began a read() and used signal() to bind a  signal handler to SIGALRM, so that when the alarm went off my signal handler could gracefully kill the timed-out download process!
Read all »

Three Weeks in Biking

Distance Traveled: 13.497 miles
Average Speed: 12.9 MPH
Maximum Speed: 31.2 MPH
Elapsed Time: 1 hour, three minutes, and 12 seconds

Distance Traveled: 20.253 miles
Average Speed: 11.8 MPH
Maximum Speed: 31.2 MPH
Elapsed Time: 1 hour, forty-three minutes, and 10 seconds

Distance Traveled: 27.360 miles
Average Speed: 13.5 MPH
Maximum Speed: 31.2 MPH
Elapsed Time: 2 hours, two minutes, and 0 seconds

I really did a number on my two weeks ago. I took a shortcut through some bushes, which as it turned out were obscuring some rather large rocks, and somehow managed to derail my chain off both the front and back gears. After I reset the chain, I then noticed that the right pedal was spinning extremely freely; closer inspection revealed that the pedal was wobbling around on the axle and that the pedal housing was dropping bearings everywhere. When I got my bike back to my room and started to dissemble the pedal in hopes of repairing it, I noticed something quite odd.

It appeared that somehow, in the process of traveling through the bushes, I had managed to slam my pedal straight down into the ground, causing the plastic pedal housing to be pushed towards the bicycle body, which resulted in the pedal housing cracking down the center. Of course, the pedal was entirely non-rideable since putting much pressure on the pedal would make the pedal crack entirely in half.

So now I have a non-ridable bicycle. Go me.

The Evils of FAT

I think we’re all familar with the infamous FAT32 (File Allocation Table) file system, still in use after all these years despite the numerous superior alternatives avaliable. I myself am guilty for helping to extend FAT’s unnaturally long lifetime on this earth, since in order for me to be able to have an external hard drive that is easily mountable and readable by all the major operating systems (Linux, OS X, and Windows), I had to format the disk using a commonly used file system. Unfortunately, FAT32 was the only option avaliable. It would be nice if Microsoft and Apple started including by default, drivers for some modern file system in their operating systems so that all machines could easily share external media without suffering from a performance penalty that is inherent in the file system. But this is extremely unlikely, so I won’t spend too much time hoping. But what exactly is it that makes Fat such a terrible system?

Anyone using FAT32 on their external hard drive used to store media will probably be familar with FAT32’s maximum file size limitation of 4GiB, which is actually larger then the maximum file size allowed in the original FAT file system. FAT also has a linear seek time within files, since in order to find the ith sector used to store data in a file, we must first find all the sectors preceding that sector. And of course, FAT has some pretty serious internal fragmentation issues, especially once we start creating and deleting files on the drive. On a physical hard drive, large internal fragmentation of files results in the hard drive head having to constantly seek around the drive in order to find the next 4KiB block. This is of course a bad thing, especially since the bottleneck on a physical hard drive is its seek time. Ideally, we would like to have continguous blocks used for a single file.

Another serious performance issue with FAT that is not readily noticeable, is the size of the File Allocation Table, and how large it grows when with the drive. Ideally, we would like to be able to just stick the whole FAT into RAM in order to improve performance, since having to cache the FAT on disk would mean we would have to do an extra read in order to read the FAT just so we could find a file on the drive. Now, lets assume that you’re a movie fanatic, and in order to store your increasingly large movie collection, you’ve bought a new 1TB (sorry, not 1 TiB) hard drive. Of course, you want your drive to be able to function with multiple operating systems so you format it with FAT32. FAT32 uses 4KiB blocks, and each entry in the File Allocation Table takes up 4 bytes. Because we have a 1TB drive, we have about 238 entries in the table, which means that the File Allocation Table on your 1TB drive is about 1GiB. Some people might consider this to be a problem.

Of course, a fairly modern desktop computer will have 4GB of RAM so it could theoretically store the whole FAT in memory without having to cache it on disk, but I doubt that the operating system would like to dedicate a quarter of its memory to improving IO performance on one external hard disk.

So there you have, another good reason to stay away from FAT.

Something is Not Quite Right

Take a look at this C function, and see if you can can catch whats wrong with it. (That is, what about this function could produce an error?)

void wait_for_ready(void)
{
    while ((inb(0x1F7) & 0x80) || !(inb(0x1F7) & 0x40))
        continue;
}

This question showed up on my midterm and stumped the hell out of me at the time. Now that I know the answer, I feel like a complete idiot for not spotting the problem initially, seeing as its so amazingly obvious. When I took the exam, I spent way too much time on this problem, completely overthinking it.

In this function, we’re using a machine level instruction, inb, to read the value of some hardware port on an x86 machines. So far so good, or so I thought when I was taking my exam. But the problem is, is that the data we’re reading can be a shared resource, that is, other applications could be writing and reading to it at the same time, so a race condition ensues. Even on a machine with a single processor, this is still a problem, since wait_for_ready() could read the data memory into register, then a context switch could occur, some application could write to that location, and then wait_for_read() regains control but operates with an old data value.

So simple. And now I feel like an idiot.

GNU Readline

I reccently wrote a shell for a project in my CS class. One of the advanced features that my partner and I implmented in the shell, was tabbed completion, and in order to implemet this extremely useful shell feature, we used the GNU readline library. The GNU Readline library is a beast, and not in the good way. Its a great hulking pile of code and documentation, intended to provide a ton of features for reading typed lines. Once you figure out what all the function pointers, bindings, and generators in the Readline library are supposed to do, things become much more straightfoward, but it doesn’t negate the fact that initially figuring out Readline is a bit of a pain in the butt.

The first thing I did after we were told that the Readline library could make our project easier to design, was to pop open a terminal and type “man readline”. What I got was a basic summary of Readline, so in order to get the full library manual I had to resort to Google. I did however, happen to see this at the bottom of the manpage:

BUGS

It’s too big and too slow.

Now if even the guys working on the Readline library think thats its too big and too slow, we may have a potential problem on our hands.

One of the plus sides of however Readline’s enormity was that it does offer a whole slew of features, like a kill ring, and generators for tab-completion of filenames and usernames. It would be very nice though, if all these features could be implemented without the need for a manual that probably took me longer to read, then it did for me to code up a generator for command line completion.

My Google Interview

Its been a while now, but here it is anyways…

About six or seven weeks ago, I scored a phone interview (technically two) with Google’s IT department for a summer internship. Nobody was more surprised then I was that Google actually found my resume somewhat impressive enough to warrent a phone interview, especially considering my less-then-stellar GPA and the enormous number of super-intelligent applicants Google recieves every day. The two interviews were each forty-five minutes long, and the interviewers (both intelligent IT guys,  and not technically incompetent manager types) took pretty much all of the alloted time.

These days, almost every CS guy dreams of working for Google and so I’ve heard a few things about their interviews before, which I would like to mention before I get into my interview. A few years ago when I was an intern at Intel, they had a lady come in to tell all the high school interns about how to be successful in scoring future jobs. She spent a lot of time teaching us how to walk properly, shake hands, sit in a proper manner, dress, and answer generic interview questions. She told us that Google interviewers like to ask broad open ended questions like “how would you sell ice to an eskimo” and ”why are manhole covers round,” and promptly put us to answering similar questions. A few months later, I went to a Google Tech Talk at my University, where a Google software engineer was asked by someone in the audience if Google did in fact like to ask interview questions like “why are manhole covers round.” The Google rep resonded with the following:

“In my time at Google I have interviewed several software engineers and I have never asked a question like that before. Google is not in the business of making manhole covers. If we did make manhole covers, we might ask those kinds of questions.

I think occassionally, a Google interviewer might throw in a brain teaser if they just want to burn some time, but apparently they don’t do it too often.

Anyways, going on to my interview… I was interviewing for an IT position, so unlike the software developer positions where they barrage you with an endless stream of algorithim and programming questions (Why is quicksort log(n)? Whats the best sorting algorithim to use in this scenario? What data structure would you use for this? etc.) there was almost no programming invovled in my interview. And since the recruiter and HR person told me pretty much nothing about what I should expect, I went into the interview pretty much cold. Read all »

Dollhouse at the NSA

Over the previous summer, I rediscovered Angel and Buffy the Vampire Slayer and watched some old episodes at hulu. (And let me just say, think you hulu for throwing my resume aside so nonchalantly after you read the second line. Electrical engineering major I may be, but I know RoR, SQL, Unix C, and just as much crap as any other CS guy out there.) Anyhoo, some of you Joss Wheedon fans may have heard that Wheedon has a whole new series out by the name of Dollhouse, not nearly as popular as some of his other shows, but still pretty cool. Or at least it started out cool, lately, its just been getting a little kinky and weirdish.

But I digress, I’m not here to discuss the merits of Dollhouse (I’ll save that for another time) but rather to point out a few odd little facts about episode 9 (A Spy in the House of Love).  Theres a part in episode 9 where Sierra knocks out some Japanese chick on a train, steals her identity, and walks into some big NSA facility. Now as I was watching Sierra traipse through the sparse halls of the NSA, I had a strange feeling of deja vue. But it wasn’t until the part where she checks in with a security guard to access some giant government database, that I realized what I was seeing: the building that they used for the NSA in episode 9 was none other then UCLA’s infamous Boelter Hall, (aka my third home (the first being my real home, and the second being my dorm room)) and the room that Sierra and the security guard were in was none other then Boelter 6426. I know Boelter quite well, having spent far too much of my time there (or rather, here, since I’m in Boelter right now), and I know the building’s appearance right dsown to the ugly water fountains, square black digital clocks, distinctive green flecked staircases, and blocky gold door numbers.

There are of course people that still doubt me, I know. So go to hulu and watch the section from episode 9 again, and then look at these pictures that I took at 7:30 on Monday morning.

 

Request for a Change

Hauling out my change of major request essay…

Prior to UCLA my knowledge of anything vaguely associated to the field of electrical engineering was essentially nil, so it should come as no surprise when I say that I cannot for the life of me begin to remember why I chose electrical engineering as a major. And now, the more electrical courses I take, the more tiresome I find the subject matter to be. So the question I should be answering isn’t “why do I want to change majors,” but rather, “why did I wait till now to change?”

Over the last two years, I have trudged my way through the required introductory math and physics courses, never particularly excited by the material but always telling myself that it was going to get better when I reached the EE courses and that then, everything would be worth it. So it came as somewhat of a surprise to me, that I found the electromagnetics in EE 1 to be as unexciting as the materials in all the prerequisite courses. But after having spent the better part of two quarters convincing myself that “it was going to get better,” I continued to drag myself through another uninteresting class, and during my second year, I proceeded to take EE 2, 16, 116L, 10, and 102, continually hoping that things would get better. Happily, they did.

It was during second quarter of this year while I was doing EE homework one night, that I realized that the only EE classes that I have ever actually enjoyed were the multi-departmental classes (16, and 116L) shared by the EE and CS departments, and that CS 35L (informally called “intro to Linux) was without a doubt my most enjoyable class ever. Shocked by my newfound epiphany, I stood up from my desk and declared to my roommates that I did not like electrical engineering and that instead, computer science was the field for me. My roommates were unsurprised by my sudden realization, saying that they had always noticed that I liked computers more than electrical engineering, and that the energetic mannerisms that I adopted when talking about Linux or computer architecture deeply contrasted the dull and tiresome attitude I exhibited when I approached my EE homework. Indeed, during my past two years at UCLA, I have probably spent as much time tinkering with Linux and reading essays like Neal Stephenson’s In the Beginning was the Command Line, and Fred Brooks’ The Mythical Man-Month: Essays on Software Engineering, as I have dedicated to my EE classes.

I am, however, not choosing to switch into computer science, but rather computer science and engineering since my enthusiasm for all things computer related extends below (so to speak) the software level. Now lately, numerous people have pointed out to me the similarities between electrical engineering with the computer engineering option, and computer science and engineering, and asked why I simply do not stay with EE CE. A question to which I can only respond to by saying that the differences between the two majors is quantitatively quite small (That is to say, the difference in course requirements between the two is almost negligible.), but that the content of the almost negligible difference is quite significant to me; I would much rather take a compiler class rather than another electromagnetics course.

Having now realized what subjects I enjoy the most, I am loath to continue in a major I find no pleasure in. Though perhaps over-dramatic, it would not be inaccurate to say that this change of major could have the potential to dictate my future happiness.