
Why documents fall apart

I'm working on a medium-sized technical document with a couple of other people, and the first 10 or 20 minutes of every editing pass I do goes to fixing (from my point of view) the structural screwups the other guy(s) have made: the TOC no longer works, references don't refer, formats don't flow, etc.

What I've come to believe is that the problem comes from (other people) working in too structured a fashion. Every paragraph, every bulleted list is part of their document's outline. It has a level, it has things that should happen to it because of that level, and when one randomly moves chunks around or plunks down big pieces of new text willy-nilly in the middle, as editors do, that organization gets broken. I, on the other hand, have a simple rule: If it's formatted as one of the Heading styles, it's a heading and goes in the TOC; if it's not, it's body text. Outline, schmoutline.
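The heading-style rule is simple enough to write down. Below is a minimal Python sketch of it. It assumes a hypothetical flat document model in which each paragraph is just a (style, text) pair and heading styles are named "Heading 1", "Heading 2", and so on. It isn't how OpenOffice builds its TOC; it only shows an outline being derived from styles on demand instead of being stored and maintained.

```python
import re

# Hypothetical flat document model: a list of (paragraph_style, text) pairs.
# The style names ("Heading 1", "Heading 2", ...) are an assumption modeled
# on common word-processor conventions.
HEADING_STYLE = re.compile(r"^Heading (\d+)$")


def build_toc(paragraphs):
    """Collect (level, text) for every heading-styled paragraph.

    Anything not styled as a heading is treated as body text and ignored;
    no outline is stored, so there is nothing for a cut-and-paste to break.
    """
    toc = []
    for style, text in paragraphs:
        match = HEADING_STYLE.match(style)
        if match:
            toc.append((int(match.group(1)), text))
    return toc


def render_toc(toc):
    """Indent each entry two spaces per level below level 1."""
    return "\n".join("  " * (level - 1) + text for level, text in toc)


doc = [
    ("Heading 1", "Introduction"),
    ("Body Text", "Some prose."),
    ("Heading 2", "Scope"),
    ("List Bullet", "A bullet is just body text."),
    ("Heading 1", "Installation"),
]

# Move the last section to the front, the way an editor might.
moved = doc[4:] + doc[:4]

print(render_toc(build_toc(doc)))
print(render_toc(build_toc(moved)))
```

The TOC is recomputed from the styles on every pass, so moving the Installation section to the front just yields a TOC in the new order. The trade-off is that nothing checks whether the result still makes sense as an outline: a Heading 3 directly under a Heading 1 is accepted as-is.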

(And yes, this can be serious. The last job I did with these people involved a couple of 100+ page documents assembled from many different highly structured files, each with its own structure, and it turned into a complete clusterfsck: tables vanished, illustrations lost their captions, callout fonts changed to Greek depending on the size of the illustration, blah blah blah. Twenty or thirty unbillable hours just patching things back together.)

Partly I blame this problem on the editor, but then OpenOffice wasn't exactly meant to do this kind of thing in the first place. The closest analogy that comes to mind is the battle between structure editors and text editors during the early Lisp era. Just as a Lisp program is a set of trees represented as S-expressions, the ideal structured text is a tree, seen on screen as a flow of characters. Which, of course, is the problem. The thing we get to manipulate is the printed representation of the tree(s), not the trees themselves, and many of the operations we perform on text strings leave the trees in invalid, inconsistent, or simply wrong configurations. The people who built structure editors in Interlisp thought long and hard about how to reconcile manipulations of what's displayed on the screen with the "true" underlying representation of the program, but for the most part I don't think text-editor and word-processor people have given these issues comparable thought. They have enough trouble, after all, just keeping valid versions of all the information needed to get attributed text strings out to the screen or the printer.
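To make the tree-versus-printed-representation point concrete, here is a toy Python sketch. It is not Interlisp's structure editor, and the reader and function names are hypothetical. It parses an S-expression into nested lists. A character-level cut of the printed text can leave unbalanced parentheses, while deleting a whole subtree from the parsed tree always prints back as a well-formed expression.

```python
def tokenize(text):
    """Split an S-expression string into parens and atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Parse exactly one S-expression into nested lists; raise ValueError if malformed."""
    def read(pos):
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        tok = tokens[pos]
        if tok == "(":
            items, pos = [], pos + 1
            while True:
                if pos >= len(tokens):
                    raise ValueError("missing )")
                if tokens[pos] == ")":
                    return items, pos + 1
                item, pos = read(pos)
                items.append(item)
        if tok == ")":
            raise ValueError("unexpected )")
        return tok, pos + 1

    tree, pos = read(0)
    if pos != len(tokens):
        raise ValueError("trailing tokens")
    return tree


def is_valid(text):
    """True if the text is exactly one well-formed S-expression."""
    try:
        parse(tokenize(text))
        return True
    except ValueError:
        return False


def show(tree):
    """Print a parsed tree back as an S-expression string."""
    if isinstance(tree, list):
        return "(" + " ".join(show(t) for t in tree) + ")"
    return tree


source = "(defun f (x) (if (> x 0) x (- x)))"

# Text-level edit: delete a character span that starts at "(if" but stops
# partway through the expression, as a careless cut in a text editor might.
text_cut = source.replace("(if (> x 0)", "", 1)

# Structure-level edit: delete one whole subtree, the else-branch (- x).
tree = parse(tokenize(source))
del tree[3][3]

print(is_valid(text_cut))  # the printed form no longer describes a tree
print(show(tree))          # the edited tree still prints as a valid expression
```

The text cut removes two open parens but only one close paren, so the reader finds a stray ")" at the end. The subtree deletion can't produce that kind of damage, because it never touches the printed form directly.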

I'm also kind of surprised (because in the Lisp world I adore the notion of structure editors) that, in effect, I'm coming down firmly on the side of pure text from which structure can be derived when it must be. Maybe it's because big, ugly natural-language(ish) documents offer so many more syntactic and structural options that structure editors just don't make sense for the way I write. Or maybe it's because I never wrote a big enough, ugly enough piece of code.