User accessible pages: /manuals/
Manual editing pages: /manuals/admin/
Manual administration pages: /admin/manuals/
Data model: /doc/sql/manuals.sql
The Big Picture
This is a system for managing a set of manuals or books through the database. Users can view a dynamically generated table of contents, view sections, and comment on sections. Administrators can add, delete, edit, and rearrange sections. Manuals can also include figures or use images to decorate their pages.
Printable versions of the manual are produced using HTMLDOC. Readers can download the complete manual in HTML or PDF; PostScript is an option but almost never what you want to offer for download because of size relative to PDF.
As an option, the system can be configured to use CVS to manage version control for section content. If there is any chance of concurrent edits, CVS should be installed and turned on in this module. Trying to avoid this by keeping track of locks in the database is ugly and we don't support it because basically you just end up with a bad re-implementation of CVS.
Figures and References
To handle figure and section references in an evolving document we have developed a reference system which is an extension of HTML. This system allows authors to refer to images and sections without knowing where the image is stored, where it appears in the text or what the numbering of a particular section happens to be.
Each section and figure has a label, stored in the database, which is used to make references. Instead of using IMG tags, authors insert images with the tag
References to figures in the text use
A similar construct is used for referring to sections:
When a manual section is served, the above tags are replaced as follows:
with values pulled out of the database as appropriate.
When the text of a section is uploaded or edited, we parse the file to look for any references that aren't already in the database. References to nonexistent sections are not allowed, and the user must go back and change the offending reference. References to unknown figures send the user to a page where they can upload a figure from their hard drive to the server.
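The check described above can be sketched in a few lines. This is only an illustration, not the module's actual code: the function name `check_references` and its arguments are hypothetical, and the sketch starts from labels that have already been extracted from the text, since the exact tag syntax is not reproduced here.

```python
def check_references(section_refs, figure_refs, known_sections, known_figures):
    """Classify the labels referenced by an uploaded section.

    All four arguments are collections of label strings (hypothetical
    inputs; the real system would pull the known labels from the database).
    Returns (unknown_sections, missing_figures), each sorted.
    """
    # References to sections that do not exist must be fixed by the author.
    unknown_sections = sorted(set(section_refs) - set(known_sections))
    # References to figures that do not exist lead to an upload page.
    missing_figures = sorted(set(figure_refs) - set(known_figures))
    return unknown_sections, missing_figures
```

A caller would reject the upload if the first list is non-empty, and otherwise prompt for an upload for each label in the second list.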
Our Data Model
We use three tables to store all content for a manual: manuals, manual_sections and manual_figures.
manuals holds the name of each manual stored on the system. Additional information we keep includes the owner of the manual and the scope of the document (public or restricted to a group).
create table manuals (
    manual_id       integer primary key,
    -- title of the manual
    title           varchar(500) not null unique,
    -- compact title used to generate file names, e.g. short_name.pdf
    short_name      varchar(100) not null unique,
    -- person responsible for the manual (editor-in-chief)
    owner_id        references users(user_id) not null,
    -- a string containing the author or authors which will
    -- be included on the title page of the printable version
    author          varchar(500),
    -- copyright notice (may be null)
    copyright       varchar(500),
    -- string describing the version and/or release date of the manual
    version         varchar(500),
    -- if scope=public, this manual is viewable by anyone
    -- if scope=group, this manual is restricted to group members
    scope           varchar(20) not null,
    -- if scope=group, this is the owning group_id
    group_id        references user_groups,
    -- is this manual currently active?
    active_p        char(1) default 'f' check (active_p in ('t','f')),
    -- notify the editor-in-chief on all changes to the manual
    notify_p        char(1) default 't' check (notify_p in ('t','f')),
    -- ensure consistent state
    constraint manual_scope_check check ((scope='group' and group_id is not null)
                                         or (scope='public'))
);
manual_sections holds information about the sections of the manuals:
create table manual_sections (
    section_id          integer primary key,
    -- which manual this section belongs to
    manual_id           integer references manuals not null,
    -- a string we use for cross-referencing this section
    label               varchar(100),
    -- used to determine where this section fits in the document hierarchy
    sort_key            varchar(50) not null,
    -- title of the section
    section_title       varchar(500) not null,
    -- user who first created the section
    creator_id          references users(user_id) not null,
    -- notify the creator whenever content is edited?
    notify_p            char(1) default 'f' check (notify_p in ('t','f')),
    -- user who last edited content for this section
    last_modified_by    references users(user_id),
    -- is there an html file associated with this section?
    content_p           char(1) default 'f' check (content_p in ('t','f')),
    -- determines whether a section is displayed on the user pages
    active_p            char(1) default 't' check (active_p in ('t','f')),
    -- we may want to shorten the table of contents by not displaying all sections
    display_in_toc_p    char(1) default 't' check (display_in_toc_p in ('t','f')),
    -- make sure that sort_keys are unique within a given manual
    unique(manual_id,sort_key)
    -- want to add the following but can't figure out the syntax
    -- constraint manual_label_check check ((label is null) or (unique(manual_id,label))
);
The sort key uses a system similar to that of the threaded bboard system: sections sort lexicographically, and the depth of a section is determined by the length of its sort key. For example:
00
01
  0100
  0101
  0102
02
  0200
    020000
    020001
...
...
Unlike the bboard system, we use only digits, since this simplifies the code and 100 seems like a reasonable limit on the number of subsections of a given section. Although the keys look like numbers, the database treats them as strings, so sort keys must always be single-quoted in SQL statements. Similarly, be careful with Tcl's habit of interpreting integers with leading zeros as octal (so 08 and 09 are rejected as invalid numbers).
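To make the sort-key scheme concrete, here is a minimal sketch in Python. It is not the module's code: the helper names `depth`, `parent_key`, and `next_child_key` are hypothetical, and the sketch assumes every level contributes exactly two digits, as in the example above. Keys are handled as strings throughout, and the two-digit suffix is parsed with an explicit base 10 to sidestep any leading-zero surprises.

```python
def depth(sort_key: str) -> int:
    """Nesting depth of a section: '01' -> 1, '0100' -> 2, '020001' -> 3."""
    if not sort_key or len(sort_key) % 2 or not (sort_key.isascii() and sort_key.isdigit()):
        raise ValueError(f"malformed sort key: {sort_key!r}")
    return len(sort_key) // 2


def parent_key(sort_key: str):
    """Sort key of the enclosing section, or None for a top-level section."""
    depth(sort_key)  # validates the key
    return sort_key[:-2] or None


def next_child_key(parent: str, existing_keys) -> str:
    """Next free key directly under `parent` ('' means top level)."""
    child_len = len(parent) + 2
    used = [
        int(key[-2:], 10)  # explicit base 10: '08' is eight, not bad octal
        for key in existing_keys
        if len(key) == child_len and key.startswith(parent)
    ]
    number = max(used, default=-1) + 1
    if number > 99:
        raise ValueError("a section can have at most 100 subsections")
    return f"{parent}{number:02d}"
```

Because the keys are fixed-width digit strings, a plain string sort (or `order by sort_key` in SQL) yields document order, with each section immediately followed by its subsections.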
Manual Administration
High level administration occurs in /admin/manuals/. Here, administrators can add or delete manuals, change owners, authorize editors or otherwise dramatically alter the properties of a manual.
Editorial tasks are handled in /manuals/admin/. Here the editor of a manual can add, delete or edit sections of a manual and manipulate the figures contained in a manual.
When CVS is enabled, the system supports multiple simultaneous editors: several editors can work on section content at the same time without clobbering each other's changes. Using CVS has the added bonus of keeping a record of what changes were made and by whom.
Figure numbers are generated automatically, based on the order in which figures are referenced within the sections of a manual. This requires global processing of the document and can be a relatively expensive operation (compared to the execution time to construct a typical web page). Figure numbers can get out of sync whenever figures are added, removed, or rearranged. A figure-numbering procedure runs nightly to update figure numbers, but this can also be done on demand from the admin page for a manual.
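The numbering pass can be sketched as follows. This is an illustration under stated assumptions rather than the actual nightly procedure: the function name `number_figures` is hypothetical, and each section is assumed to arrive as a pair of its sort key and the list of figure labels it references, in order of appearance.

```python
def number_figures(sections):
    """Assign figure numbers in order of first reference.

    `sections` is an iterable of (sort_key, figure_labels) pairs, where
    figure_labels lists the labels referenced in that section, in order.
    Returns a dict mapping each label to its 1-based figure number.
    """
    numbers = {}
    # Sorting by the string sort key walks the manual in document order.
    for _sort_key, labels in sorted(sections, key=lambda section: section[0]):
        for label in labels:
            if label not in numbers:  # only the first reference counts
                numbers[label] = len(numbers) + 1
    return numbers
```

The whole manual has to be walked because inserting or moving a single reference shifts the number of every figure that follows it, which is why the pass is comparatively expensive.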
HTMLDOC
We run a nightly proc to shove all the parts of each manual into one big file then run HTMLDOC on it to generate PostScript and PDF versions of the manual. This is easy.
The hard part is getting around how braindead HTMLDOC is. First, it requires a strict hierarchy for heading tags. We accomplish this by forbidding authors from inserting any heading tags by hand. All heading tags are generated on the fly at the appropriate level, based on the table of contents for the manual.
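One way to generate headings at the right level is to derive the level from the sort key. The sketch below is an assumption about how this could work, not the module's code: the function name `heading_html` is hypothetical, the mapping of depth 1 to H1, depth 2 to H2, and so on is a guess, and escaping the title is a choice made for this example.

```python
from html import escape


def heading_html(sort_key: str, section_title: str) -> str:
    """Return the heading tag for a section, with its level taken from the sort key."""
    if not sort_key or len(sort_key) % 2:
        raise ValueError(f"malformed sort key: {sort_key!r}")
    level = len(sort_key) // 2  # two digits per level of nesting
    if level > 6:
        raise ValueError("HTML only defines heading levels H1 through H6")
    return f"<h{level}>{escape(section_title)}</h{level}>"
```

Since a subsection's key is always exactly two characters longer than its parent's, headings produced this way never skip a level as long as every section's parent exists.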
It turns out that this is not the only stupidity in HTMLDOC's parser. Things like
some text
also confuse it. While most HTML editors are standards-compliant in this respect, it seems that MS Word really likes to produce stuff like this. I don't know of any solution to this other than strongly encouraging authors not to use MS Word to generate their documents. Since the HTML produced by Word is deficient in several other ways, this probably won't be a big problem.
Yet another fun aspect of HTMLDOC is that it seems to have some problems with images if an absolute path is given, although I'm just guessing here since this doesn't seem to be documented.
Currently we are using HTMLDOC version 1.7. The latest version is 1.8.4. Possibly some of these problems are solved in later releases. However, there doesn't seem to be a version history on their web page, so the only way to find out seems to be to download the new version and install it and see.
Future Improvements
Need to add CVS tagging so that a group of section revisions can be associated with a particular "release" of the manual and old versions of the manual can be retrieved on the fly.
Need to add full-text searching of manual content.