3: A general solution: A data model for all kinds of content


This chapter expands on the ideas of the previous chapter to build a data model that will work for multiple types of content. In addition to storing news stories, our CMS will hold images and free-form pages (e.g., the "about us" page found on most sites).

For an image, we want to store:

For a free-form page, we want to store:

How are these types of content similar and dissimilar from stories? First, the similarities:

The dissimilar fields are headline, subhead, and body for a story -- in other words, the content fields. Reviewing all the columns from the last chapter, these are the only three that aren't common for all types of content.

The first step towards a general data model is creating a table to hold information about the types of content our system will support. For each type, we store: the name we show to users ('News story', etc), and a unique string ('story', 'image', 'generic_page')

create table cm_types (
  type_key              varchar(20) primary key,
  type_name             varchar(100) not null unique
);

The next step is separating the story-specific columns from the columns needed for all content. We remove headline, subhead, and body from the revisions table and rename those tables cm_items and cm_revisions.

create table cm_items (
  item_id               integer primary key,
  type_key references cm_types,
  short_name            varchar(1000) unique,
  department_id         integer not null references cm_departments,
  directory_id          integer references cm_directories,
  filename              varchar(200), 
  unique(directory_id, filename),
  audit_user            integer not null references users,
  audit_date            date default sysdate not null
);

create table cm_revisions (
  revision_id           integer primary key,
  item_id               integer not null references cm_items,
  publication_date      date,
  audit_user            integer not null references users,
  audit_creation_date   date default sysdate not null,
  audit_modified_date   date,
  ...
);

Since each content type has its own set of fields, each type gets its own set of "extension tables". The story extension table holds the columns we removed from cm_revisions.

create table cmx_r_story (
  revision_id           integer references cm_revisions primary key,
  headline              varchar(4000),
  subhead               varchar(4000),
  body                  clob
);

The explanation behind the table name: 'cmx' says it's an extension table for the cm_ tables, 'r' says this extends the revision table, and 'story' is the type_key. Following this convention allows anyone to look at a table name and know exactly why it's there.

For stories, every insert into cm_revisions is accompanied by an insert into cmx_r_story. Updates are also done in pairs. Since both tables are always modified at the same time, we don't need auditing columns in the extension table because this information is already captured in cm_revisions.

The 'image' content type has a similar extension table:

create table cmx_r_image (
  revision_id           integer references cm_revisions primary key, 
  file_num              integer,
  mimetype              varchar(100),
  caption               varchar(4000)
);

Notice that there's no column to store the body of the image. Images will be stored in the file system, each file being named <item_id>-<file_num>, where file_num is a counter that increases every time the file is modified. Here's how it works: when the image is uploaded the first time, we set file_num to 1. If someone checks it out to update the caption but doesn't modify the image, we create a second revision but leave the file_num at 1. If they upload a new image, we increment file_num and create a new file. This method allows us to avoid unnecessarily duplicating the image, which could be important if the images are large.

For free-form pages, the extension table is similar to the one for stories:

create table cmx_r_page (
  revision_id           integer references cm_revisions primary key,
  title                 varchar(1000),
  body                  clob
);

Workflow

Each type of content will have its own set of possible workflow states. Instead of hard-coding these into the application, we store them as data so they can be changed later without modifying code.

There will be one master list of all possible states, and one mapping table which says which states are allowed for each content type.

For each state we'll store:

In the state-type mapping table, we have a sort_key column that allows us to show the states in the order people expect:

Assigned
Writing
Written
Formatting
Formatted
Editing
Edited
Approved

Without the sort key, the best we could do is sort them alphabetically, which would be confusing:

Approved
Assigned
Edited
Editing
Formatted
Formatting
Written
Writing

The table definitions are:

create table cm_workflow_states (
  state_key             varchar(20) primary key,
  state_name            varchar(100) not null,
  editor_state_p        number(1) default 0 not null 
                        check(editor_state_p in (1, 0)),
  action_state_p        number(1) default 0 not null 
                        check(action_state_p in (1, 0))
);

create table cm_workflow_state_type_map (
  state_key             varchar(20) references cm_workflow_states,
  type_key              varchar(20) references cm_types, 
  primary key(state_key, type_key),
  sort_key              integer not null,
  -- be sure the sort_key is unique amoung states for that 
  -- content type 
  unique(type_key, sort_key)
);

create table cm_revisions (
  ...
  state_key             varchar(20) references cm_workflow_states,
  state_audit_user      integer not null references users,
  state_audit_date      date default sysdate not null,
  ...
);

(A note on naming conventions: the _p at the end of editor_state_p and checkout_state_p says it's a boolean field. We use number(1) to store 0 or 1 (false and true, respectively) because Oracle's boolean datatype can't be used in tables.)

Putting it all together

-- Users

create sequence users_sequence;

create table users (
  user_id               integer 
                        constraint users_pk
                        primary key,
  first_name            varchar(100),
  last_name             varchar(100),
  email                 varchar(100) unique,
  password              varchar(100)
);

create sequence cm_sequence;
create sequence cm_audit_sequence;

-- Content types

create table cm_types (
  type_key              varchar(20)
                        constraint cm_types_pk
                        primary key,
  type_name             varchar(100) not null unique
);

-- Departments

create table cm_departments (
  department_id         integer 
                        constraint cm_departments_pk
                        primary key,
  department_key        varchar(50) unique not null,
  department_name       varchar(50) unique not null
);

-- Workflow

create table cm_workflow_states (
  state_key             varchar(20)
                        constraint cm_workflow_states_pk
                        primary key,
  state_name            varchar(100) not null,
  editor_state_p        number(1) default 0 not null
                        constraint cm_workflow_states_edtr_st_ck
                        check(editor_state_p in (1, 0)),
  action_state_p        number(1) default 0 not null
                        constraint cm_workflow_states_actn_st_ck
                        check(action_state_p in (1, 0))
);

-- The workflow states possible for each type

create table cm_workflow_state_type_map (
  state_key             varchar(20)
                        constraint cm_workflow_s_t_map_state_fk
                        references cm_workflow_states,
  type_key              varchar(20)
                        constraint cm_workflow_s_t_map_type_fk
                        references cm_types,
  constraint cm_workflow_s_t_map_pk
  primary key(state_key, type_key),
  sort_key              integer not null,
  -- be sure the sort_key is unique amoung states for that 
  -- content type
  constraint cm_workflow_s_t_map_sort_un
  unique(type_key, sort_key)
);

-- Templates

create table cm_templates (
  template_id           integer 
                        constraint cm_templates_pk
                        primary key,
  filename              varchar(100) unique not null,
  description           varchar(4000)
);

-- The directories of our virtual filesystem. Items are the files.

create table cm_directories (
  directory_id          integer 
                        constraint cm_directories_pk
                        primary key,
  parent_directory      -- for directories inside other directories
                        integer 
                        constraint cm_directories_parent_fk
                        references cm_directories,
  directory_name        -- since this will be part of a URL, we need to 
                        -- be sure this name doesn't have any spaces in it.
                        varchar(200) not null
);

-- The root directory

insert into cm_directories (directory_id, directory_name)
    values (0, 'Root directory');

-- Items and revisions

create table cm_items (
  item_id               integer 
                        constraint cm_items_pk
                        primary key,
  type_key              constraint cm_items_type_fk
                        references cm_types,
  short_name            varchar(1000) unique,
  department_id         integer not null
                        constraint cm_items_department_fk
                        references cm_departments,
  directory_id          integer
                        constraint cm_items_directory_fk
                        references cm_directories,
  filename              varchar(200),
  constraint cm_item_dir_filename_un
  unique(directory_id, filename),
  audit_user            integer not null 
                        constraint cm_items_auser_fk
                        references users,
  audit_date            date default sysdate not null
);

create table cm_items_audit (
  audit_id              integer
                        constraint cm_items_audit_pk
                        primary key,
  item_id               integer 
                        constraint cm_items_audit_item_fk
                        references cm_items,
  type_key              constraint cm_items_audit_type_fk
                        references cm_types,
  short_name            varchar(1000),
  department_id         integer not null
                        constraint cm_items_audit_department_fk
                        references cm_departments,
  directory_id          integer
                        constraint cm_items_audit_directory_fk
                        references cm_directories,
  filename              varchar(200),
  audit_user            integer not null 
                        constraint cm_items_audit_auser_fk
                        references users,
  audit_date            date default sysdate not null
);

create table cm_revisions (
  revision_id           integer 
                        constraint cm_revision_pk
                        primary key,
  item_id               integer not null
                        constraint cm_revision_item_fk
                        references cm_items,
  publication_date      date,
  audit_user            integer not null 
                        constraint cm_revision_auser_fk
                        references users,
  audit_creation_date   date default sysdate not null,
  audit_modified_date   date,
  state_key             varchar(20)
                        constraint cm_revisions_state_fk
                        references cm_workflow_states,
  state_audit_user      integer not null
                        constraint cm_revision_state_auser_fk
                        references users,
  state_audit_date      date default sysdate not null,
  template_id           integer
                        constraint cm_revisions_template_fk
                        references cm_templates
);

-- If a revision's state is updated, the old state is written here:

create table cm_revisions_state_audit (
  audit_id              integer 
                        constraint cm_revisions_state_a_pk
                        primary key,
  revision_id           integer not null
                        constraint cm_revisions_state_a_reb_fk
                        references cm_revisions,
  state_key             varchar(20)
                        not null
                        constraint cm_revisions_state_a_state_fk
                        references cm_workflow_states,
  state_audit_user      integer not null 
                        constraint cm_revisions_state_a_auser_fk
                        references users,
  state_audit_date      date default sysdate not null
);

-- Live revisions. One row for every live item.

create table cm_live_revisions (
  item_id               integer
                        constraint cm_live_revisions_item_fk
                        references cm_items
                        constraint cm_live_revisions_pk
                        primary key,
  revision_id           integer not null
                        constraint cm_live_revisions_rev_fk
                        references cm_revisions,
  audit_user            integer not null 
                        constraint cm_live_revisions_auser_fk
                        references users,
  audit_date            date default sysdate not null
);

-- An audit table for cm_live_revisions so we know which revisions
-- where live and when. The revision_id column can be null if a item 
-- was made unlive

create table cm_live_revisions_audit (
  audit_id              integer 
                        constraint cm_live_revisions_audit_pk
                        primary key,
  item_id               integer not null
                        constraint cm_live_revisions_aud_item_fk
                        references cm_items,
  revision_id           integer
                        constraint cm_live_revisions_aud_rev_fk
                        references cm_revisions,
  audit_user            integer not null 
                        constraint cm_live_users_auser_fk
                        references users,
  audit_date            date default sysdate not null
);

-- One row for every item checked out

create table cm_items_checked_out (
  item_id               integer
                        constraint cm_items_checked_out_item_fk
                        references cm_items
                        constraint cm_items_checked_out_pk
                        primary key,
  user_id               integer not null
                        constraint cm_items_checked_out_user_fk
                        references users,
  checked_out_date      date default sysdate not null
);

--
-- Extension tables for content types
-- 

create table cmx_r_story (
  revision_id           integer 
                        constraint cms_v_story_revision_fk
                        references cm_revisions
                        constraint cmx_r_story_pk
                        primary key,
  headline              varchar(4000),
  subhead               varchar(4000),
  body                  clob
);

create table cmx_r_image (
  revision_id           integer
                        constraint cmx_r_image_revision_fk
                        references cm_revisions
                        constraint cmx_r_image_pk
                        primary key, 
  file_num              integer,
  mimetype              varchar(100),
  caption               varchar(4000)
);

create table cmx_r_page (
  revision_id           integer 
                        constraint cmx_r_page_revision_fk
                        references cm_revisions
                        constraint cmx_r_page_pk
                        primary key,
  title                 varchar(1000),
  body                  clob
);

Next: Encapsulating the system's rules within views and stored procedures


Send questions/comments/errors to dvr@dvrodriguez.com