4: Encapsulating the system's rules within views and stored procedures
The data model described in the previous chapter is clean, logical, but not entirely pleasant to work with. Getting a single piece of content into the database can take a half-dozen inserts, and there are a number of rules that need to be followed, for example:
- Only the last revision of any content item can be updated, and only by the person who has it checked out
- Every time content is updated, the audit_date must be updated
- Only editors can mark content 'approved'
- A revision can only go live if it's in the 'approved' state.
- Content can only be shown to users if it's been made live.
As the size of the development team grows, an increasing amount of energy could be spent making sure all members of the team know and follow the system's rules. A better solution is to create a set of views and stored procedures that encapsulate the system's rules and hide much of the complexity. If a developer is writing a page to update a revision's state, instead of writing 'update cm_revisions ...', he'll call a stored procedures that runs the necessary checks and throws an error if the update violates any of the system's rules. The set of views and plsql that implement these rules are called our "core plsql".
A major benefit of this approach is that only a fraction of the development team (perhaps only one person) needs to know and understand all the system's rules. This "core team" will know the CMS inside and out, and he/they will write the views and stored procedures; everyone else only needs to know how to call the stored procedures and be able to read the error messages that result from calling functions at the wrong time.
Maintaining the core won't be much work because it will be a small fraction of the code base for most sites. For a complicated site, the CMS might be around 25% of the site's code. Figure the core plsql will be equal to about 20% of the CMS' code, and you see that the core plsql is only 5% of the site's code. That's likely small enough to be maintained by one person.
The inside team should look at the outside team as its customer -- their job is to provide the outside team with the views and stored that make them as productive as possible.
Banning inserts and updates
If a database insert or update should happen only after a few conditions have been met, these columns should only be updated by calling stored procedures that verify these conditions have been met before doing the update. For example, the admin page that allows an editor to choose a url for an item won't do:
update cm_items set directory_id = :directory_id, filename = :filename where item_id = :item_idbut will instead call the procedure cm_item_set_url:
cm_item_set_url ( p_item_id => :item_id, p_directory_id => :directory_id, p_filename => :filename, p_user_id => :user_id );which runs the necessary checks before doing the insert/update. It verifies that:
- the url isn't taken by another item
- directory_id and filename are both null, or that both are non-null (we don't allow users to choose a filename without a directory, or a directory without a filename).
- The user is allowed to make this change
This third check explains why user_id is included as an argument. Since our CMS will support fine-grained permissions (explained in more detail later), all our stored procedures will take user_id as an argument.
About half our tables will be off-limits to raw updates and inserts. Stored procedures will be the only way we want people updating data in these tables:
- cm_items
- cm_items_audit
- cm_revisions_state_audit
- cm_live_revisions
- cm_live_revisions_audit
- cm_items_checked_out
In cm_revisions, state_key and all audit columns should be updated only through stored procedures.
Passing human-readable error message
If cm_item_set_url tries to assign two items the same url, it should throw an exception that would make sense to a human. Since the data model has a unique constraint on these columns, Oracle would throw a "unique constraint violated" if we tried the insert, but we want to catch the error before that point and return a message like "That url is already taken".
Human-readable error messages are required if we want the system's rules to be encapsulated inside the core plsql. Put yourself in the position of the developer writing the admin page that allows staff members to update a story's url: you write the form that allows them to choose the directory and write in a filename, and your code passes those values to cm_item_set_url. If cm_item_set_url returns a generic oracle error, you would need to figure out what went wrong. Since the most likely error is specifying a url that's already taken, your code would query against cm_items to see if that was the case. But at this point, the benefit of the stored procedure is lost because the logic inside the stored procedure is being duplicated outside the stored procedures. If cm_item_set_url returns a human-readable error, you can send any errors directly to the end-user and know they'll understand what went wrong, and what they need to do to fix it.
Therefore, beyond enforcing our system's rules, our core plsql should generate human-readable error message for any error that results from someone trying to violate one of the system's rules. In the case of cm_item_set_url, a human-readable error needs to be specified for the three things that might go wrong. Here's the source:
create or replace procedure cm_item_set_url ( -- -- Sets the directory and filename for an item. If the url is -- already taken, an exception will be thrown. -- -- We don't allowed a directory to be chosen without a filename, -- or vice-versa. An exception will be thrown in either case. -- -- If p_item_id and p_directory_id are both null, we delete the -- url for this item. -- -- p_user_id is used to verify the user is allowed to make this -- change. -- p_item_id in integer, p_directory_id in integer, p_filename in cm_items.filename%type, p_user_id in integer ) as v_url_taken_p integer; begin cm_item_verify_allowed ( p_item_id => p_item_id, p_user_id => p_user_id, p_action => 'edit_url', p_error => 'You''re not allowed to change this item''s url' ); if (p_directory_id is null and p_filename is null) or (p_directory_id is not null and p_filename is not null) then if p_directory_id is not null then -- See if another item is already using this url select count(*) into v_url_taken_p from cm_items where directory_id = p_directory_id and filename = p_filename and item_id not in (p_item_id); if v_url_taken_p = 1 then cm_throw_exception('That url is taken by another piece of content'); end if; end if; update cm_items set directory_id = p_directory_id, filename = p_filename, audit_user = p_user_id, audit_date = sysdate where item_id = p_item_id; else -- We don't allow them to specify a filename without a -- directory, or a directory without a filename. They must -- has both or neither cm_throw_exception('Either the filename and directory should be chosen, or both should be empty'); end if; end;Example:: When a user saves a change to an item they have checked out, the CMS first calls cm_revision_update with the revision_id of the revision being updated and the user_id of the user trying to edit the content. If this user isn't allowed to update this revision, the procedure throws an exception.
create or replace procedure cm_revision_update ( -- -- Update the audit information for a revision. If this -- user doesn't have the item checked out, of if this is -- not the last revision, throw an exception. -- p_revision_id in integer, p_user_id in integer ) as v_item_id integer; v_last_revision_id integer; v_checked_out_to_user_p integer; begin -- Get the item_id for this revision v_item_id := cm_item_for_revision(p_revision_id); -- Be sure the user has the item checked out select count(*) into v_checked_out_to_user_p from cm_items_checked_out where item_id = v_item_id and user_id = p_user_id; if v_checked_out_to_user_p = 0 then cm_throw_exception('This content isn''t checked out to you'); end if; -- We only allow updates to the last revision. select max(revision_id) into v_last_revision_id from cm_revisions where item_id = v_item_id; if v_last_revision_id <> p_revision_id then cm_throw_exception('Only the latest revision can be updated.'); end if; -- All the checks passed. Update the audit columns update cm_revisions set audit_user = p_user_id, audit_modified_date = sysdate where revision_id = p_revision_id; end;Example: The procedure to make an item go live, cm_item_set_live is called with a revision_id and the user_id of the user making the change. If this person isn't allowed to make content live, it should throw an exception.
cm_item_set_live also must follow these rules:
- The old revision_id must be written into the audit table before the new live revision_id is set.
- If the admin tries to make live a revision that's already live, we shouldn't change a thing (and the audit information shouldn't be updated).
- If we're removing the live revision (by calling cm_item_set_live with revision_id = null), we should delete the row from cm_live_revisions and write an extra row in the audit table to record who removed the live revision
The source:
create or replace procedure cm_item_set_live_revision ( -- -- Set the live revision for an item. If passed a null -- revision_id, remove the live revision if one exists. -- -- p_user_id is used to verify this user has permission to make -- the change. -- p_item_id in integer, p_revision_id in integer, p_user_id in integer ) as -- Is this revision already the live revision? If so, we -- don't want to update the audit information. v_no_change_p integer; begin cm_item_verify_allowed ( p_item_id => p_item_id, p_user_id => p_user_id, p_action => 'publish', p_error => 'You''re not allowed to pulish/unpublish this item' ); if p_revision_id is not null then -- Is this revision already the live revision? select count(*) into v_no_change_p from cm_live_revisions where item_id = p_item_id and revision_id = p_revision_id; if v_no_change_p = 0 then -- We're setting a new value, so copy the old value -- to the audit table if there is one. insert into cm_live_revisions_audit ( audit_id, item_id, revision_id, audit_user, audit_date) select cm_audit_sequence.nextval, item_id, revision_id, audit_user, audit_date from cm_live_revisions where item_id = p_item_id; -- If there's an old value, delete it. delete from cm_live_revisions where item_id = p_item_id; -- Insert the row with the new live revision_id insert into cm_live_revisions ( item_id, revision_id, audit_user, audit_date ) values ( p_item_id, p_revision_id, p_user_id, sysdate ); end if; else -- p_revision_id is null. If there's a live revision, we want -- to remove it. -- Save the current live revision, if there is one. insert into cm_live_revisions_audit ( audit_id, item_id, revision_id, audit_user, audit_date) select cm_audit_sequence.nextval, item_id, revision_id, audit_user, audit_date from cm_live_revisions where item_id = p_item_id; delete from cm_live_revisions where item_id = p_item_id; if sql%rowcount = 1 then -- A row was deleted, so a live revision existed. -- Add another row to the audit table to save the user_id -- and date of the time the live revision was removed. insert into cm_live_revisions_audit ( audit_id, item_id, revision_id, audit_user, audit_date ) values ( cm_audit_sequence.nextval, p_item_id, null, p_user_id, sysdate ); end if; end if; end cm_item_set_live_revision;Passing messages from the core plsql to the UI code
Our core plsql won't try to catch and give readable error messages for all errors that might happen. Data type checking is one area where we're not going to bother -- if the UI code sends a string where an integer is required, we let Oracle throw the error because we expect the UI code to do data checking before calling the procedures.
Error messages returned from our core plsql with cm_throw_exception will look exactly like other errors messages returned from oracle, but they shouldn't be treated as other errors. They should be shown to the user so they can know what went wrong (and perhaps fix their input to avoid the problem); errors that result from mistakes like using the wrong data type fall into the category of programming mistakes, and should be written to the error log and brought to the programmer's attention.
The issue is being able to distinguish between the two. Whenever Oracle throws an error, it includes an error code. Trying to violate a unique constraint results in:
ORA-00001: unique constraint violatedTrying to fill an integer field with something other than an integer gives:
ORA-01722: invalid numberThe solution is have all our core messages return with the same error code. Oracle allows us to use any code between 20000 and 20999, so we choose 20000. All errors thrown with cm_throw_exception will use 20000. The source for cm_throw_exception is just:
create or replace procedure cm_throw_exception ( -- -- Throws an exception and causes the calling code to exit. -- p_exception_text in varchar ) as v_exception_code integer := -20000; begin raise_application_error(v_exception_code, p_exception_text); end cm_throw_exception;If someone tries to check out an item that's already checked out, the core plsql will throw an error like:
ORA-20000: That content is checked out by someone else.When the UI code calls a core procedure, it should catch any error thrown; if the result code on the error message is 20000, it should extract the message from the error message and show it to the user so they know what went wrong. cm_core_db_execute is a drop-in replacement for db_execute that followed these rules:
function cm_core_db_execute ($q, $die_p = true) { // Execute a prepared statement. If it results in an error, // see if the error is a message from core, and if so, return // the error to the user or calling program. If it's a normal // error, rethrow the error. @db_execute($q); if (db_error($q)) { // There's an error $error_info = db_error($q); $code = $error_info[code]; $message = $error_info[message]; if ($code == CM_DB_CORE_ERROR_CODE) { // This is a message from the core. $message_from_core = cm_extract_core_message_from_db_error($message); if ($die_p) { // Show the message to the user and stop the page cm_die_with_message_for_user($message_from_core); } else { // The code that called this function wants to handle // the error itself. Return the message from core. return $message_from_core; } } else { // It's an error, but not a message from the core. // Rethrow it. trigger_error($error_info[message]); } } }Keeping our queries simple
Our core plsql comes with two advantages. First, it ensures the system's rules are followed. Second, once the core has been written, it makes development easier for the rest of the developers because there are fewer details for them to keep in their heads. They don't need to remember which table stores the url; they only need to know how to call cm_item_set_url.
But our stored procedures only handle database inserts and updates. Since most sites will have at least 10 times as many statements to fetch data as statements to update data, we need something that does for queries what our stored procedures do for inserts/updates.
We have two goals: (1) to have the ability to make data model changes without breaking the many queries scattered throughout our system; (2) to hide from developers details they don't need to know to accomplish their work.
What don't developers need to know? The largest part of a typical site's code deals with showing content to public users; queries on these pages should return only live content. A developer writing these pages will know their queries should return only live content, but shouldn't need to know how the system determines an item is live. The developer should be thinking "I want to show the headlines for all live stories", not "I want to show the headlines for all stories that have an item_id in cm_live_revisions."
If the developer has to think "... have an item_id in cm_live_revisions", we're not meeting our goals because references to cm_live_revisions will be in all his queries. Not only does this mean we can't change cm_live_revisions without breaking all his queries, but it also increases the chance of a bug allowing unapproved content to be seen by the public because it's easy to forget a join against cm_live_revisions, especially in complicated queries.
The developer will need to know how things relate to each other -- for example, that every piece of content has one url, but can have many writers, etc -- but doesn't need to which tables hold which fields. He doesn't need to know if an attribute is attached at the story level or revision level. To write the code that displays the public version of a story, he needs to know that revision's publication date, but not whether that value is coming from cm_items, cm_revisions, or somewhere else.
The trendy solution: Using a class
Many development teams have found that filling their pages with queries can lead to maintenance problems. Changing a table definition forces them to find and fix all queries that use that table. And the logic contained in queries often get copy-and-pasted from page file to file.
A common solution is to wrap each query with a class. If a page needed the headline, subhead, publication date, and url for a story, the code might look like:
Story myStory = new Story.getLiveStory(revisionId); headline = myStory.getAttribute("headline"); subhead = myStory.getAttribute("subhead"); url = myStory.getAttribute("url"); publication_date = myStory.getAttribute("publication_date");Inside the class would be a query that's executed whenever the instance is created.
select s.headline, s.subhead, i.publication_date, cm_url_for_directory_filename(i.directory_id, i.filename) as url, ... from cmx_r_story s, cm_live_revisions r, cm_items i where s.revision_id = :revision_id and s.revision_id = r.revision_id and r.item_id = i.item_id;If the choice is between calling this class and writing this query every time these fields are needed, the class approach has the two advantages:
- If you change a table name, you only need to make a change in the class. You won't need to search through large amounts of code because the table name will be used in a few classes at most.
- The developer calling this class is able to ignore the difference between item-level attributes and revision-level attributes.
A better solution: Using a view
The class approach has one major disadvantage: We can't use this abstraction within other queries. We can't use this abstraction inside other stored procedures or join against it. This limitation is huge when our queries become every slightly complicated. For example, a page to highlight a photographer's work might include links to all stories that include images by this photographer. The correct query would find all live images by the photographer, then joins to the set of all live stories. There's no class approach to this problem that one would feel good about using.
A better solution is to recreate the same level of abstraction in the database, using a view instead of a class. Here is a view that gives the same two benefits of the class above:
create or replace view cm_vlive_story as select i.*, cm_url_for_directory_filename(i.directory_id, i.filename) as url, r.revision_id, s.headline, s.subhead, s.body from cmx_r_story s, cm_live_revisions r, cm_items i where s.revision_id = r.revision_id and r.item_id = i.item_id;Using this view, the previous query becomes:
select headline, subhead, publication_date, url from cm_vlive_story where revision_id = :revision_idCompare this to the class approach. In both cases, revision_id is used to choose the revision. In both cases, headline, subhead, publication_date, and url are the attributes returned. The only difference is that we're using the view cm_vlive_story instead of the class Story.getLiveStory. But having our abstraction in a view allows us to use this abstraction in joins.
A standard set of views
Since it's best for developers to ignore the difference between item-level and revision-level attributes, the first step is creating a few views which combine the two. We'll include all columns from cm_items and cm_revisions, plus a column with the url (which is calculated using a few plsql functions, but there's no reason for the developer using this information to know this; we'll just make it look like any other column).
The first view is cm_v_content, which returns one row per revision:
create or replace view cm_v_content as select i.item_id, i.type_key, i.short_name, i.department_id, i.directory_id, i.filename, cm_url_for_directory_filename(i.directory_id, i.filename) as url, r.revision_id, r.template_id, r.publication_date, r.audit_user, r.audit_creation_date, r.audit_modified_date, r.state_key, r.state_audit_user, r.state_audit_date from cm_items i, cm_revisions r where i.item_id = r.item_id;Most of the time we don't want one row for every revision. The two most common tasks in our system will be (1) fetching the fields for an item's live revision, and (2) listing all live content of a certain type. In both cases, we want a view that returns the same information as cm_v_content, but include only live revisions. That view is cm_vlive_content:
create or replace view cm_vlive_content as select c.* from cm_live_revisions l, cm_v_content c where l.revision_id = c.revision_id;It's also common to want information on an item's last revision. This view should return the same columns as the previous two views, but return only one row per item.
create or replace view cm_vlast_content as select c.* from (select item_id, max(revision_id) as revision_id from cm_revisions group by item_id) l, cm_v_content c where l.revision_id = c.revision_id;The "view last" view is useful when building a preview mode. Once you have the code that shows the live version of a story to the public, a preview mode can be added with just a few lines of code that chooses the appropriate view:
if ($mode == 'preview') $content_view = 'cm_vlast_content'; else $content_view = 'cm_vlive_content'; $query = " select revision_id from $content_view where item_id = cm_item_for_url(:url)";The next step is building similar views for every content type. These views are the result of joining the three views above with the type's extension tables.
The three views for the 'story' content type are:
create or replace view cmx_v_story as select c.*, s.headline, s.subhead, s.body from cmx_r_story s, cm_v_content c where s.revision_id = c.revision_id; create or replace view cmx_vlive_story as select c.*, s.headline, s.subhead, s.body from cmx_r_story s, cm_vlive_content c where s.revision_id = c.revision_id; create or replace view cmx_vlast_story as select c.*, s.headline, s.subhead, s.body from cmx_r_story s, cm_vlast_content c where s.revision_id = c.revision_id;For the image content type, the three views are:
create or replace view cmx_v_image as select c.*, r.file_num, r.mimetype, r.caption from cmx_r_image r, cm_v_content c where r.revision_id = c.revision_id; create or replace view cmx_vlive_image as select c.*, r.file_num, r.mimetype, r.caption from cmx_r_image r, cm_vlive_content c where r.revision_id = c.revision_id; create or replace view cmx_vlast_image as select c.*, r.file_num, r.mimetype, r.caption from cmx_r_image r, cm_vlast_content c where r.revision_id = c.revision_id;The test of our abstraction is how well it simplifies common queries. The front page of the News department needs a query that lists all live news stories by date. If there is an image in the same directory as the story with the name "teaser.jpg", the image should be displayed next to the headline. If we query directly against the tables, we get:
select s.headline, s.subhead, s.publication_date, s.url as story_url, r.image_width, r.image_height r.url as image_url from (select i.item_id, s.headline, s.subhead, i.publication_date, i.directory_id, cm_url_for_directory_filename(i.directory_id, i.filename) as url from cmx_r_story s, cm_items i, cm_live_revisions r u where s.revision_id = r.revision_id and r.item_id = i.item_id and i.department_id = :department_id) s (select i.directory_id, r.image_width, r.image_height, cm_url_for_directory_filename(i.directory_id, i.filename) as url from cmx_r_image r, cm_live_revisions l, cm_items i where i.item_id = l.item_id and l.revision_id = r.revision_id and i.filename = 'teaser.jpg') i where i.directory_id = i.directory_id(+) order by s.publication_date descUsing our new views, the query becomes just:
select s.headline, s.subhead, s.publication_date, s.url as story_url, i.image_width, r.image_height r.url as image_url from cmx_vlive_story s, (select directory_id, image_width, image_height, url from cmx_vlive_image where filename = 'teaser.jpg') i where i.directory_id = i.directory_id(+) and s.department_id = :department_id order by s.publication_date descNot only does this reduce the length of the query, but the improvement in readability is substantial.
Next: Permissions
Send questions/comments/errors to dvr@dvrodriguez.com