« Metal Gear on MSXZZT »

Literate Programming

28/04/08

Permalink 10:57:45 am, 996 words
Categories: Code, Linguistics

Literate Programming

I’m just finishing up a project that used Don Knuth’s literate programming environment CWEB. It is a macro system for C that lets you write documentation and code in the same file, and then untangles them into two separate files. Then it compiles both. You end up with a program and a PDF. The PDF is divided into sections, which are supposed to contain a few paragraphs of writing and 5-20 lines of code. (At least that’s what the project was like that I worked with.) The code is peppered with references to other sections, which are then filled in by the macro processor. So a section looks like this:

Divide words into syllables using the Levenshtein-based algorithm written by the MUN CS student (whose name I cannot remember). If the word cannot be properly syllabified, it is divided based on fallback rules that attempt to divide based on simple Consonant-Vowel sequences that puts everything else into its own syllable.

for(int i = 0; i < SYLLABLES; i++) {
  int result = 0;
  <12: Find out if syllabification succeeds>
  if(result) {
    <14: Syllabify word>
  } else {
    <20: Divide word based on fallback rules>
  }
  <21: Add divided word to word list>
}

Literate programming was promoted in the 80s as a way to better organise your programs. The idea was the programming should be more like writing, and that you could understand the program faster by reading through the PDF. Literate programming has never become popular. Don Knuth thinks that there aren’t enough people who are programmers and writers, and this is probably true. But I think a more important reason is that it’s because it’s inferior to better technologies, like object-oriented programming, or even just languages with modules (unlike C, on which CWEB is based).

There are two problems with literate programming in practise. The first is that the recommended section size is too small for C when you code the actual innards. Honestly, you need 10-30 lines to do anything interesting in C. So in the example above, section 21 would be one line in Java but around five in C, depending on how pointy* your code is. It’s not really worth it’s own section. But you have to document it anyway, and you feel bad if you don’t have at least a paragraph even though the section title is good enough.
Even worse, in a language with a list class, the code is self-documenting because it’s words.add(divided). There’s no way not to write an inane paragraph about that. (What’s more, in a high-level language, there IS no section 21 because you wrapped everything in a list comprehension instead of a for loop.)

So now you have the high-level structure of your programming chopped into 5-20 line chunks (that’s good) with section references to the low-level structure separated into other 5-20 line chunks (that’s bad). Now to understand the program, a reader starts off with the high-level part (that’s good). But if you really need to understand how the program works, all the way to the lowest level, you have to read the low-level parts too (that’s bad).

The writing at the low-level sections isn’t helpful (it’s inane stuff like ‘assign x to y, then increment x’), so you have to read the low-level code. Now you have a problem. You can’t see enough low-level code at once: because the sections are so small, section 21’s code is bereft of context. So you need to look at other low-level sections. This means that you have to backtrack up to section 11, then down to section 20 or section 18 (via section 14) to find the final representation of a divided word so that it can be easily added to the list. And none of this is automated! It’s all on paper! There is a good index but it’s got nothing on Eclipse or even text search in Emacs.

The second problem is related to the first problem of too many little chunks. But it’s worse, because it’s a problem with the high-level structure, which is also cursed (that’s bad). Because the program relies on the CWEB macro system to structure the code, it doesn’t use the structures provided by the language. C only has a few ways to structure code, but if you judiciously use structs, functions and pointers, I believe you can write a good, comprehensible C program that contains small functions with well-contained variable scope so that you don’t have to understand the whole program to understand any single part of it. But programs written for CWEB look so organised already that it can’t hurt to use globals and have everything run in one long C function. C is just object code for CWEB anyway.

Well, the scope rules are still C scope rules. So when section 20 uses x, you have to know where it was created (that was section 7), who might have changed it (sections 12 and 13) and who uses it after 20 (section 21, at least). You’d be much better off working on breaking the above code into smaller functions whose scope is enforced by the compiler.

So the problem is the same as that of exceptions, I guess. Good code and bad code look the same in literate programming, and you don’t even get the massively cleaner code that exceptions give you. But literate programming is a lot less likely to be adopted than exceptions, because exceptions make your program shorter (usually), while literate programming makes it longer.

I told you I was finishing up a project based on some old code that was written using the literate programming environment CWEB. So the above problems are based on a real case study. It was a dissertation with a complicated algorithm that was fairly new at the time. Sounds like a perfect application for literate programming. It turns out that the design document (the dissertation) was a far better explanation of the program than the literate program was, even though it consisted mostly of math notation mixed with some pseudocode.

*Good at using pointers.

4 comments

Comment from: Kaleb [Visitor] Email · http://kalebcaptain.com
Ha,

As always, I enjoy how you pimp slap bad code, bad techniques, bad idea...
28/04/08 @ 12:27
Comment from: jeo [Visitor] Email
Sounds to me like the main problem was not literate programming itself, but that you slavishly followed rules instead of applying good taste.

Knuth's own literate implementation of Tex, which he published as a book, does not follow the rules like you did.
29/04/08 @ 07:34
Comment from: sandersn [Member]
True enough. Good taste outweighs any technique or technology. I think the question is whether literate programming helps the average programmer as much as OO or functional styles. Just "following the rules" in Java, would I think, produce slightly better code than in CWEB. If nothing else, the structure is compiler-enforced, and there are automated tools for dealing with byzantine messes. With literate programming you just get a PDF.

Literate programming was presented as the Next Step beyond structured programming at the time when Object Oriented programming was doing the same thing. But I think it does less than OO for improving code quality of the average programmer.

(Incidentally, this is not my own code, but a code base I have been working on recently)
29/04/08 @ 10:19
Comment from: Mike [Visitor] Email
But it comes with a free frogurt! (that's good)


Seriously, this pretty much matches my impression of literate programming. I remember reading Knuth's original paper when it came out and thinking then that it was a great idea. It still is--it just seems that no one can figure out how to make it work in practice.

29/04/08 @ 12:08

Leave a comment


Your email address will not be revealed on this site.

Your URL will be displayed.
(Line breaks become <br />)
(Name, email & website)
(Allow users to contact you through a message form (your email will not be revealed.)
powered by b2evolution free blog software