3: A general solution: A data model for all kinds of content
This chapter expands on the ideas of the previous chapter to build a data model that will work for multiple types of content. In addition to storing news stories, our CMS will hold images and free-form pages (e.g., the "about us" page found on most sites).
For an image, we want to store:
- The caption
- The image's mime type
- The image file
For a free-form page, we want to store:
- The title of the page
- the body of the page
How are these types of content similar and dissimilar from stories? First, the similarities:
- The two-table structure is still correct. Regardless of the type of content, there is still fundamentally (1) the content item and (2) several revisions of that item.
- Every revision has a workflow state.
- Every revision should carry with it a record of who last modified its content and state.
- Every content item belongs to a department
- Every revision should be able to have a publication date.
- Every piece of content has either no name, or a name that's unique among all content.
- Some pieces of content will have URLs
The dissimilar fields are headline, subhead, and body for a story -- in other words, the content fields. Reviewing all the columns from the last chapter, these are the only three that aren't common for all types of content.
The first step towards a general data model is creating a table to hold information about the types of content our system will support. For each type, we store: the name we show to users ('News story', etc), and a unique string ('story', 'image', 'generic_page')
create table cm_types ( type_key varchar(20) primary key, type_name varchar(100) not null unique );The next step is separating the story-specific columns from the columns needed for all content. We remove headline, subhead, and body from the revisions table and rename those tables cm_items and cm_revisions.
create table cm_items ( item_id integer primary key, type_key references cm_types, short_name varchar(1000) unique, department_id integer not null references cm_departments, directory_id integer references cm_directories, filename varchar(200), unique(directory_id, filename), audit_user integer not null references users, audit_date date default sysdate not null ); create table cm_revisions ( revision_id integer primary key, item_id integer not null references cm_items, publication_date date, audit_user integer not null references users, audit_creation_date date default sysdate not null, audit_modified_date date, ... );Since each content type has its own set of fields, each type gets its own set of "extension tables". The story extension table holds the columns we removed from cm_revisions.
create table cmx_r_story ( revision_id integer references cm_revisions primary key, headline varchar(4000), subhead varchar(4000), body clob );The explanation behind the table name: 'cmx' says it's an extension table for the cm_ tables, 'r' says this extends the revision table, and 'story' is the type_key. Following this convention allows anyone to look at a table name and know exactly why it's there.
For stories, every insert into cm_revisions is accompanied by an insert into cmx_r_story. Updates are also done in pairs. Since both tables are always modified at the same time, we don't need auditing columns in the extension table because this information is already captured in cm_revisions.
The 'image' content type has a similar extension table:
create table cmx_r_image ( revision_id integer references cm_revisions primary key, file_num integer, mimetype varchar(100), caption varchar(4000) );Notice that there's no column to store the body of the image. Images will be stored in the file system, each file being named <item_id>-<file_num>, where file_num is a counter that increases every time the file is modified. Here's how it works: when the image is uploaded the first time, we set file_num to 1. If someone checks it out to update the caption but doesn't modify the image, we create a second revision but leave the file_num at 1. If they upload a new image, we increment file_num and create a new file. This method allows us to avoid unnecessarily duplicating the image, which could be important if the images are large.
For free-form pages, the extension table is similar to the one for stories:
create table cmx_r_page ( revision_id integer references cm_revisions primary key, title varchar(1000), body clob );Workflow
Each type of content will have its own set of possible workflow states. Instead of hard-coding these into the application, we store them as data so they can be changed later without modifying code.
There will be one master list of all possible states, and one mapping table which says which states are allowed for each content type.
For each state we'll store:
- the "pretty name" (the name we show to users).
- a flag which determines whether this state is available only to editors, or to all staff members.
- a flag which says whether this state is an "action state" (typically the states ending in 'ing' -- writing, editing, etc.) When someone checks out an item, we force them to choose from one of the action states.
In the state-type mapping table, we have a sort_key column that allows us to show the states in the order people expect:
Assigned
Writing
Written
Formatting
Formatted
Editing
Edited
ApprovedWithout the sort key, the best we could do is sort them alphabetically, which would be confusing:
Approved
Assigned
Edited
Editing
Formatted
Formatting
Written
Writing
The table definitions are:
create table cm_workflow_states ( state_key varchar(20) primary key, state_name varchar(100) not null, editor_state_p number(1) default 0 not null check(editor_state_p in (1, 0)), action_state_p number(1) default 0 not null check(action_state_p in (1, 0)) ); create table cm_workflow_state_type_map ( state_key varchar(20) references cm_workflow_states, type_key varchar(20) references cm_types, primary key(state_key, type_key), sort_key integer not null, -- be sure the sort_key is unique amoung states for that -- content type unique(type_key, sort_key) ); create table cm_revisions ( ... state_key varchar(20) references cm_workflow_states, state_audit_user integer not null references users, state_audit_date date default sysdate not null, ... );(A note on naming conventions: the _p at the end of editor_state_p and checkout_state_p says it's a boolean field. We use number(1) to store 0 or 1 (false and true, respectively) because Oracle's boolean datatype can't be used in tables.)
Putting it all together
-- Users create sequence users_sequence; create table users ( user_id integer constraint users_pk primary key, first_name varchar(100), last_name varchar(100), email varchar(100) unique, password varchar(100) ); create sequence cm_sequence; create sequence cm_audit_sequence; -- Content types create table cm_types ( type_key varchar(20) constraint cm_types_pk primary key, type_name varchar(100) not null unique ); -- Departments create table cm_departments ( department_id integer constraint cm_departments_pk primary key, department_key varchar(50) unique not null, department_name varchar(50) unique not null ); -- Workflow create table cm_workflow_states ( state_key varchar(20) constraint cm_workflow_states_pk primary key, state_name varchar(100) not null, editor_state_p number(1) default 0 not null constraint cm_workflow_states_edtr_st_ck check(editor_state_p in (1, 0)), action_state_p number(1) default 0 not null constraint cm_workflow_states_actn_st_ck check(action_state_p in (1, 0)) ); -- The workflow states possible for each type create table cm_workflow_state_type_map ( state_key varchar(20) constraint cm_workflow_s_t_map_state_fk references cm_workflow_states, type_key varchar(20) constraint cm_workflow_s_t_map_type_fk references cm_types, constraint cm_workflow_s_t_map_pk primary key(state_key, type_key), sort_key integer not null, -- be sure the sort_key is unique amoung states for that -- content type constraint cm_workflow_s_t_map_sort_un unique(type_key, sort_key) ); -- Templates create table cm_templates ( template_id integer constraint cm_templates_pk primary key, filename varchar(100) unique not null, description varchar(4000) ); -- The directories of our virtual filesystem. Items are the files. create table cm_directories ( directory_id integer constraint cm_directories_pk primary key, parent_directory -- for directories inside other directories integer constraint cm_directories_parent_fk references cm_directories, directory_name -- since this will be part of a URL, we need to -- be sure this name doesn't have any spaces in it. varchar(200) not null ); -- The root directory insert into cm_directories (directory_id, directory_name) values (0, 'Root directory'); -- Items and revisions create table cm_items ( item_id integer constraint cm_items_pk primary key, type_key constraint cm_items_type_fk references cm_types, short_name varchar(1000) unique, department_id integer not null constraint cm_items_department_fk references cm_departments, directory_id integer constraint cm_items_directory_fk references cm_directories, filename varchar(200), constraint cm_item_dir_filename_un unique(directory_id, filename), audit_user integer not null constraint cm_items_auser_fk references users, audit_date date default sysdate not null ); create table cm_items_audit ( audit_id integer constraint cm_items_audit_pk primary key, item_id integer constraint cm_items_audit_item_fk references cm_items, type_key constraint cm_items_audit_type_fk references cm_types, short_name varchar(1000), department_id integer not null constraint cm_items_audit_department_fk references cm_departments, directory_id integer constraint cm_items_audit_directory_fk references cm_directories, filename varchar(200), audit_user integer not null constraint cm_items_audit_auser_fk references users, audit_date date default sysdate not null ); create table cm_revisions ( revision_id integer constraint cm_revision_pk primary key, item_id integer not null constraint cm_revision_item_fk references cm_items, publication_date date, audit_user integer not null constraint cm_revision_auser_fk references users, audit_creation_date date default sysdate not null, audit_modified_date date, state_key varchar(20) constraint cm_revisions_state_fk references cm_workflow_states, state_audit_user integer not null constraint cm_revision_state_auser_fk references users, state_audit_date date default sysdate not null, template_id integer constraint cm_revisions_template_fk references cm_templates ); -- If a revision's state is updated, the old state is written here: create table cm_revisions_state_audit ( audit_id integer constraint cm_revisions_state_a_pk primary key, revision_id integer not null constraint cm_revisions_state_a_reb_fk references cm_revisions, state_key varchar(20) not null constraint cm_revisions_state_a_state_fk references cm_workflow_states, state_audit_user integer not null constraint cm_revisions_state_a_auser_fk references users, state_audit_date date default sysdate not null ); -- Live revisions. One row for every live item. create table cm_live_revisions ( item_id integer constraint cm_live_revisions_item_fk references cm_items constraint cm_live_revisions_pk primary key, revision_id integer not null constraint cm_live_revisions_rev_fk references cm_revisions, audit_user integer not null constraint cm_live_revisions_auser_fk references users, audit_date date default sysdate not null ); -- An audit table for cm_live_revisions so we know which revisions -- where live and when. The revision_id column can be null if a item -- was made unlive create table cm_live_revisions_audit ( audit_id integer constraint cm_live_revisions_audit_pk primary key, item_id integer not null constraint cm_live_revisions_aud_item_fk references cm_items, revision_id integer constraint cm_live_revisions_aud_rev_fk references cm_revisions, audit_user integer not null constraint cm_live_users_auser_fk references users, audit_date date default sysdate not null ); -- One row for every item checked out create table cm_items_checked_out ( item_id integer constraint cm_items_checked_out_item_fk references cm_items constraint cm_items_checked_out_pk primary key, user_id integer not null constraint cm_items_checked_out_user_fk references users, checked_out_date date default sysdate not null ); -- -- Extension tables for content types -- create table cmx_r_story ( revision_id integer constraint cms_v_story_revision_fk references cm_revisions constraint cmx_r_story_pk primary key, headline varchar(4000), subhead varchar(4000), body clob ); create table cmx_r_image ( revision_id integer constraint cmx_r_image_revision_fk references cm_revisions constraint cmx_r_image_pk primary key, file_num integer, mimetype varchar(100), caption varchar(4000) ); create table cmx_r_page ( revision_id integer constraint cmx_r_page_revision_fk references cm_revisions constraint cmx_r_page_pk primary key, title varchar(1000), body clob );Next: Encapsulating the system's rules within views and stored procedures
Send questions/comments/errors to dvr@dvrodriguez.com