Taking C Seriously

Dennis Ritchie, a co-creator of Unix and C, passed away a few weeks ago, and was honored with many online tributes this weekend for a Dennis Ritchie Day advocated by Tim O’Reilly.

It should hardly be necessary to state the importance of Ritchie’s work. C is the #2 language in use today according to the TIOBE rankings (which, while criticized in some quarters, are at least the best system we currently have for gauging such things). In fact, TIOBE’s preface to the October 2011 rankings predicted that a slow but consistent decline in Java will likely make C the #1 language when this month’s rankings come out.

Keep in mind that C was developed between 1969 and 1973, making it nearly 40 years old. I make this point often, but I can’t help saying it again: when Paul Graham considered the possible traits of The Hundred-Year Language, the one we might be using 100 years from now, he overlooked the fact that C had already made an exceptionally good start on a century-long reign.

And yet, despite being so widely used and so important, C is widely disparaged. It is easy, and popular, and eminently tolerated, to bitch and complain about C’s primitiveness.

I’ve already had my say about this, in the PragPub article Punk Rock Languages, in which I praised C’s lack of artifice and abstraction, its directness, and its ruthlessness. I shouldn’t repeat the major points of that article — buried as they are under a somewhat affected style — so instead, let me get personal.

As an ’80s kid, my first languages were various flavors of BASIC for whatever computers the school had: first Commodore PETs, later Apple IIs. Then came Pascal for the AP CS class, as well as a variety of languages that were part of the ACSL contests (including LISP, which reminds me I should offer due respect to the recent passing of its renowned creator, John McCarthy). I had a TI-99 computer at home (hey, it’s what was on sale at K-Mart) and its BASIC was godawful slow, so I ended up learning assembly for that platform, just so I could write programs that I could stand to run.

C was the language of second-year Computer Science at Stanford, and I would come back to it throughout college for various classes (along with LISP and a ruinous misadventure in Prolog) and some summer jobs. The funny thing is that back then, C was considered a high-level language. Abstracting away the CPU was enough to count as “high-level”; granted, in those days we also drew a distinction between “assembly language” and “machine language”, presumably because someone somewhere still lacked an assembler and was thus forced to provide the actual opcodes. Today, C is considered a low-level language. In my CodeMash 2010 talk on C, I postulated that a high-level language is now expected to abstract away not only the CPU, but memory as well. In Beyond Java, Bruce Tate predicted we’d never see another mainstream language that doesn’t run in a VM and offer the usual benefits of that environment, like memory protection and garbage collection, and I suspect he’s right.

But does malloc() make C “primitive”? I sure didn’t think so in 1986. In fact, C did a lot more than the other languages of the day. Dynamic memory allocation was not actually common then: all the flavors of BASIC of that time had stack variables only, no heap. To have, say, a variable number of enemies in your BASIC game, you probably needed to create arrays of some maximum size and then use some or all of their slots. And of course relative to assembly language, where you’re directly exposed to the CPU and RAM, C’s abstractions are profound. If you haven’t had that experience, you don’t appreciate that a = b + c involves loading b and c into CPU registers, invoking an “add” opcode, and then copying the result from a register out to memory. One line of C, many lines of assembly.

There is a great blog post from a few years ago assessing the Speed, Size, and Dependability of Programming Languages. It represents the relationship between code size and performance as a 2-D plot, where an ideal language delivers high performance with little code, and an obsolete language demands lots of work and is still slow. These two factors are a classic trade-off, and the other two quadrants are named after the traditional categorization: slow but expressive languages are “script”, fast but wordy are “system”. Go look up gcc: it’s clearly the fastest, but its wordiness is really not that bad.

Perhaps the reason C has stuck around so long is that its bang for the buck really is historically remarkable, and unlikely to be duplicated. For all its advantages over assembly, it maintains furious performance, and the abstractions since built atop C (with the arguable exception of Java, whose primary sin is being a memory pig) sacrifice performance for expressiveness. We’ve always known this, of course, but it takes a certain level of intellectual honesty to really acknowledge how many extra CPU cycles we burn by writing code in something like Ruby or Scala. If I’m going to run that slow, I think I’d at least want to get out of curly-brace / function-call hell and adopt a different style of thinking, like LISP.

I was away from C for many years… after college, I went on a different path and wrote for a living, not coming back to programming until the late ’90s. At that point, I learned Java, building on my knowledge of C and other programming languages. But it wasn’t until the mid-2000s that I revisited C, when I tired of the dead end that was Java media and tried writing some JNI calls to QuickTime and QTKit (the lloyd and keaton projects). I never got very far with these, as my C was dreadfully rusty, and furthermore I didn’t understand the conventions of Apple’s C-based frameworks, such as QuickTime and Core Foundation.

It’s only since immersing myself in iOS and Mac development in 2008 that I’ve really gotten good at writing C in anger again, because on these platforms, C is a first-class language. At the lower levels, including any framework with “Core” in its name, C is the only language.

And at the Core level, I’m sometimes glad to only have C. For doing something like signal processing in a Core Audio callback, handing me a void* is just fine. In the higher level media frameworks, we have to pass around samples and frame buffers and such as full-blown objects, and sometimes it feels heavier than it needs to. If you’re a Java/Swing programmer, have you ever had to deal with a big heavy BufferedImage and had to go look through the Raster object or whatever and do some conversions or lookups, when what you really want is to just get at the damn pixels already? Seems to happen a lot with media APIs written in high-level languages. I’m still not convinced that Apple’s AV Foundation is going to work out, and I gag at having to look through the docs for three different classes with 50-character names when I know I could do everything I want with QuickTime’s old GetMediaNextInterestingTime() if only it were still available to me.

C is underappreciated as an application programming language. Granted, there’s definitely a knack to writing C effectively, but it’s not just the language. Actually, it’s more the idioms of the various C libraries out there. OpenGL code is quite unlike Core Graphics / Quartz, just like OpenAL is unlike Core Audio. And that’s to say nothing of the classic BSD and other open-source libraries, some of which I still can’t crack. Much as I loathe NSXMLParser, my attempt to switch to libxml for the sake of a no-fuss DOM tree ended after about an hour. So maybe it’s always going to be a learning process.

But honestly, I don’t mind learning. In fact, it’s why I like this field. And the fact that a 40-year-old language can still be so clever, so austere and elegant, and so damn fast, is something to be celebrated and appreciated.

So, thanks Dennis Ritchie. All these years later, I’m still enjoying the hell out of C.



