2: Starting small: Designing a CMS for one type of content


A full content management system needs to be able to handle many types of content, but our first step is designing a system for one type of content -- stories. The data model will be simpler that the full general data model, but we still need to deal with all the issues of concurrency, workflow, revisioning, etc.

For a story, we need to store the following information:

The data model

A system that stores multiple revisions of a story starts with two tables: one table with a row for every story, and a second table with one row for every revision of a story. Ignoring all other columns, we have:

create table stories (
  story_id              integer primary key,
  ...
);

create table story_revisions (
  revision_id           integer primary key,
  story_id              integer not null references stories,
  ...
);

When a story is created, a row is written into each table. When the next person checks out the story, a second row is added to story_revisions with the same story_id.

All the fields above need to find a place in either stories or story_revisions. If the field should be the same for all revisions, then we say it's a story-level attribute. For example, the story's url is a "story-level attribute" because the URL shouldn't change from revision to revision. If someone fixes a typo and makes a new revision go live, the new revision should be available at the same URL to prevent bookmarks from breaking. If a field can be different across revisions, we say it's a "revision-level attribute".

At the story level, we want to store the short name and department:

create table stories (
  story_id              integer primary key,
  department_id         integer references departments,
  short_name            varchar(200) unique,
  ...
);

The first columns to go into the revisions table are those that we think of as "content" -- the body of the story, the headline, and the subhead.

publication_date also goes into story_revisions because many stories will be updated over time, and the publication date will usually be updated when a new revision goes live.

create table story_revisions (
  revision_id           integer primary key,
  story_id              integer not null references stories,
  publication_date      date,
  headline              varchar(4000),
  subhead               varchar(4000),
  body                  clob,
  ...
);

URLs

Editors should be able to choose intuitive URLs for their stories. Instead of having urls like http://yoursite.com/0,3688,-----2155,00.html, they should be able to choose http://yoursite.com/interviews/2002/may/bush.html.

The solution is to create a virtual filesystem in the database. Stories will look like files, and editors will create directories to organize the stories. Attached at the story level are two fields which specify the story's location in the virtual file system: directory_id and filename.

create table directories (
  directory_id          integer primary key,
  parent_directory      -- for directories inside other directories
                        integer references directories,
  directory_name        -- since this will be part of a URL, we 
                        -- need to be sure this name doesn't have 
                        -- any spaces in it.
                        varchar(200)
);

create table stories (
  ...
  directory_id          integer references directories,
  filename              varchar(200),
  unique(directory_id, filename),
  ...
);

The unique constraint ensures that no two stories can be assigned the same url.

Keeping revisions

When someone checks out a story, a row is added to the revisions table. As they work and save their changes, that row is updated. Only the last revision can be modified, and only by the person who has it checked out. Once that person has checked in the item, the content for that revision will be locked by the system.

This approach has two benefits: (1) we never throw away anybody's contribution, and (2) every revision is the work of only one person, so we can search through the audit log and know exactly who is responsible for each change.

To maintain the audit trail, we add three columns to story_revisions.

create table story_revisions (
  ...
  audit_user            integer not null references users,
  audit_creation        date default sysdate not null,
  audit_modified        date default sysdate not null,
  ...
);

To store information about checked out stories, a table is created that has a row for every item checked out.

create table stories_checked_out (
  story_id              integer primary key references stories,
  checked_out_user      integer not null references users,
  checked_out_date      date default sysdate not null
);

Notice the primary key is story_id instead of revision_id. That's because only one person can have a story checked out, and they're always editing the last revision.

Creating an audit trail for information attached at the story level

To see who modified the text of a story and when, you can view the audit_modified and audit_user columns in story_revisions.. The same can't be done for the attributes at the story level because we keep only one copy of that data across all revisions, so we can't tell who changed what.

We have two options if we want audit information for those columns -- (1) we can create a second audit table for the items attached at the story level, or (2) move those columns to the revision table.

Do we want two different audit systems? In our case, we do. When a writer wants to change a headline, the story is checked out and we create a new revision. But what if the editor wants to change the story's department or URL? We don't want to create a new revision in this case. And we want the editor to be able to make these changes while a writer has the story checked out. The behavior we want is this: (1) checking out a story gives a user exclusive control over the information attached at the revision level, but not over the information attached at the story level; (2) any time a story-level attribute is changed, the old values are copied in to a separate audit table.

To audit story-level information, we first add audit columns to the stories:

create table stories (
  ...
  audit_user            integer not null references users,
  audit_date            date default sysdate not null
);

Then create an audit table that looks exactly like stories except it has a new primary key and no other unique or primary key constraints.

create table stories_audit (
  audit_id              integer primary key,
  story_id              integer references stories,
  department_id         integer references departments,
  directory_id          integer references directories,
  filename              varchar(200),
  short_name            varchar(200),
  audit_user            integer not null references users,
  audit_date            date default sysdate not null
);

stories_audit is populated with a trigger on stories.

Workflow

A possible workflow states for a story are:

Every revision will be in one of these states. When someone checks out a story, we ask them whether they're writing, formatting, or editing. When they check it in, we allow them to choose from any of the states.

Although state information is attached at the revision level, it's unique in that we allow someone to update the state after it has been checked in. If the editor looks at a story and decides that no changes are needed, he should be able to move it to the 'approved' state without creating a new revision. The revision history would look like:

Revision 1 Written by Bob Writer
Revision 2 Formatted by Steve Designer
Approved by Joe Editor

Satisfying this requirement means we use a style of auditing that looks similar to both types mentioned before: we attach state at the revision level, but add an audit table to save old states.

create table story_revisions (
  ...
  state                 varchar(20) not null,
  state_audit_user      integer not null references users,
  state_audit_date      date default sysdate not null
);

create table story_states_audit (
  audit_id              integer primary key,
  revision_id           integer references story_revisions,
  state                 varchar(20) not null,
  state_audit_user      integer not null references users,
  state_audit_date      date default sysdate not null
);

Whenever a revision's state is updated, the old state information is copied into story_states_audit.

The live revision

Being approved isn't the same as being live because there can be several approved revisions, but only one live revision. When the editor wants to make a revision go live, he chooses among approved revisions.

We create a mapping table to store the revision_id of the live revision. We also create an audit table so we have a record of which revisions were live when.

-- One row for every live story

create table story_live_revisions (
  story_id              integer references stories primary key,
  revision_id           integer not null references story_revisions,
  audit_user            integer not null references users,
  audit_date            date default sysdate not null
);

-- One row for every time the live revision changed. The 
-- revision_id column can be null if a story was made unlive.

create table story_live_revisions_audit (
  audit_id              integer primary key,
  story_id              integer not null references stories,
  revision_id           integer references story_revisions,
  audit_user            integer not null references users,
  audit_date            date default sysdate not null
);

Next: A general solution: A data model for all kinds of content


Send questions/comments/errors to dvr@dvrodriguez.com