November 24, 2003
"Planning is the key to successful content management, and you can't buy planning in a box. Content management is a deliberate process. This may seem obvious, but let me just make it clear that you have to plan." -- Phil Suh: Content Management Systems
The task of keeping the Library’s web site up-to-date has become problematic. For example, a co-op student has been hired to update pages and train library staff in XHTML. It will take at least a year to complete this pass through each of the Library’s 11,600 web pages, and a similar program to introduce Cascading Style Sheets to all the pages in the site could take another year. The entire site also needs to be revised to accommodate accessibility legislation. Web technology has grown more complicated, and a change in one area often breaks something in another. The Library’s web development staff is spending more time fixing problems, which leaves less time for developing new and improved client services. While some database components have been developed and new tools introduced, the handcrafted approach to web site creation and maintenance used in 1994 remains in place in most areas. In this environment, only a substantial increase in the resources dedicated to the task could keep the site up-to-date. One approach, content management, promises a solution while also helping to address increasing demand and rising expectations. The purpose of this project was to determine the feasibility of the content management approach for the Library.
The project members analyzed the current web site’s problems, workflows and existing site structure in order to develop a list of functional requirements for a Content Management System (CMS) in the UW Library environment. Using these requirements, the group briefly reviewed a number of commercial and open-source content management systems.
The group concluded that a content management approach would substantially improve the efficiency and effectiveness of the Library’s web services. The capabilities and costs of the many CMSs vary widely, but all require some degree of customization to give the web site a local brand. Using the Library’s existing tool set (principally ColdFusion) to build an entire customized CMS, although possible, would require a significant investment of the Library’s limited programming resources. Another promising option is offered by commercial and open source content management systems built upon a ColdFusion platform. Acquiring a CMS based upon the ColdFusion platform would provide many of the required options and also allow the Library to leverage the existing knowledge and expertise of its own staff and its TUG (TriUniversities Group of Libraries) partners. This is the key option that the committee recommends the Library explore.
There are a couple of other options that may warrant investigation. A Zope-based content management system named Plone was used to demonstrate the functions of content management to members of the committee. Plone was voted the number one Open Source project in the world in an O’Reilly & Associates/Comdex survey. The Library also has some expertise in Zope, which has been providing reliable access to the TUG Electronic Journals database and the LT3 CLOE (Co-operative Learnware Exchange) learnware database for over two years. In addition, the Open Text Corporation, a cooperating partner of the University of Waterloo, has announced that it has acquired access to the content management systems of IXOS Software AG and Gauss Enterprises AG. Because Open Text software is made available to the University without charge and is currently used by the Library for indexing purposes, this recent development opens up new possibilities. The Open Text and Plone options may be worth exploring. Since content management is a rapidly developing area, the Library should keep all its options open until a final direction for development is determined.
The group also studied the nature of the content management approach and its impact upon the web delivery operations of an organization, and made recommendations that address this critical area of implementation.
The Library's Web Site was originally published in 1994 by a nine-member Internet Resources Committee. The site won the OLITA (Ontario Library & Information Technology Association) Innovation award in 1995 as one of the first web-based electronic libraries. Through the implementation of a “buddy” system, the number of participants in the creation of the web site increased substantially, and as a result the site grew in breadth and depth. The main site currently includes roughly 11,600 documents (not including images and other supplementary files). Since 1994, technology has changed, growing more powerful and at the same time more complex. Yet aside from some improved tools and the implementation of some database services, the same handcrafted approach used to create and maintain the original Web site remains in place.
The Web Operational Management group (WebOps) was formed in 2001 to manage the day-to-day technical operations of the Web site and to act as a resource for staff contributing to the site. The handcrafted approach requires considerable technical knowledge to publish even a simple page. Besides knowledge of the content, expertise is required in HTML, JavaScript, and server technology (server-side includes), as well as in the software (Dreamweaver) used to maintain the site. Training is, therefore, a major WebOps concern, but much of this effort goes for naught because staff lack the time to master the technology and infrequent use prevents them from becoming familiar with the software. The expertise required will only increase as W3C industry standards such as XHTML and Cascading Style Sheets (CSS), along with standards for accessibility, are applied to every page on the site.
WebOps has experienced a couple of major revisions to the look and feel of the web site, and each implementation was a resource-intensive exercise. Updating the entire site to the latest standards requires major training and time commitments from all staff participating in site maintenance. WebOps plans to make multiple passes through the entire web site to tackle XHTML conversion, CSS (Cascading Style Sheets) implementation and conformance to accessibility guidelines. A co-op student has been hired to help comb through the site to upgrade its coding to the XHTML standard and to train participants in validating against and using that standard. It is expected to take at least a year to complete this pass through all the Library’s 11,600 web pages. Similar efforts will be required for Cascading Style Sheets and accessibility standards. The effort required to keep the Library’s web site up to standard is stretching existing resources, and the situation will only get worse with the increasing demand and rising expectations being placed upon web services.
WebOps is the only group that knows "the whole site" and is aware of badly out-of-date content, duplicate content and inconsistent presentation, all of which can make the site difficult to use. To remain effective, the Library's web site needs to move away from the handcrafted approach. An approach that separates the creation and maintenance of content from its presentational design and its technological functionality may provide an answer to these challenges. This approach, called “Content Management,” could allow the entire library staff to contribute to the vitality of the site's content by lowering the level of expertise required, while freeing the WebOps team to apply its expertise to the larger issues of site design and functionality. The desire to reach a point where the site can be changed and kept clean without repeatedly scouring it page by page prompted WebOps to consider a Content Management approach for the delivery of the Library’s web site.
In May 2003, WebOps proposed the establishment of a group to look into alternative web delivery methods. The result was a project charter “The Feasibility of a Content Management Approach for the Delivery of the University of Waterloo Library’s Public Web Site.”[1] The primary purpose was to determine whether a content management approach is a viable option for the publication of the Library’s public web site.
In June 2003 the charter was finalized and the group members selected.
The group was charged with producing an interim report with recommendations for submission to the University Librarian by the end of September 2003. The report will be reviewed by the Digital Library Coordination Committee (DLCC), WebOps, ISMC (Information Services Management Committee) and other interested parties. A final revised report will be issued November 30, 2003.
The inaugural meeting of the group was July 23, 2003 and the group continued to meet weekly until the end of September. A further meeting to discuss the interim report was held October 8, 2003. A web site was established[2] to collect documents and links to relevant resources and to maintain the minutes of the group’s activities.
Educating everyone on the nature of content management was the first order of business. Using an existing CMS, a slide show[3] was prepared and presented to the group outlining the advantages of content management and the processes needed for the successful implementation of any CMS. Five copies of the book Content Management Systems, by Dave Addey, James Ellis, Phil Suh and David Thiemecker, were ordered and arrived the week of August 13. Uses of content management by other institutions[4] were investigated, and the Movable Type[5] CMS, which is being explored for possible use at Waterloo, was demonstrated.
Guided by the project’s charter, the group analyzed the current problems[6] of the UW Library’s web site[7] and the workflows[8] used to maintain it. The structure[9] of the current site and the division of responsibilities were reviewed, as were a number of current database-delivered services[10]. These efforts led to the development of a list of functional requirements[11] that was used to evaluate a number of existing Content Management Systems.
Using all the information collected, the group examined the data, discussed the implications, and recommended a course of action for the Library. This report is the product of these deliberations.
An effective web site must contain up-to-date information. The site must be easy to navigate and technologically sound to avoid browser compatibility problems for clients. The site should provide equitable access to all library clients by meeting accessibility standards. And the site must be flexible enough to adapt easily to changing technologies and rising expectations. The current web site, while achieving its goals, has areas where information is dated. Browser incompatibilities occasionally arise, and although efforts are made to ensure accessibility, some areas are woefully inadequate. The current web site is so inflexible that these service problems can only be adequately addressed by increasing the personnel resources dedicated to the task.
Cascading Style Sheets (CSS), for example, are a more efficient publication technology, but implementing them will require a major training effort. Each maintainer must become proficient in the technology before it can be used effectively, and visiting all the pages to implement CSS is a massive task. This situation creates an inertia that severely limits the site’s flexibility.
Apart from changes made through server-side includes, global changes across the whole web site are not possible. For example, a maintainer using the capabilities of Dreamweaver may update a URL within their own area of responsibility, but if that URL appears elsewhere in the site it may not get updated, and a broken link will result. Although monthly Linkbot reports listing broken links and orphan files are issued, cleaning up these problems is time-consuming and frequently neglected, degrading the quality of the web site as a whole.
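A content management system can make this kind of change in one place. The short Python sketch below illustrates the idea under simplifying assumptions: it treats the whole site as an in-memory mapping from hypothetical page paths to HTML text, substitutes one URL everywhere, and reports which pages changed. It is an illustration only, not how Dreamweaver, Linkbot or any particular CMS actually works.

```python
from typing import Dict, Tuple


def replace_url_sitewide(
    pages: Dict[str, str], old_url: str, new_url: str
) -> Tuple[Dict[str, str], Dict[str, int]]:
    """Replace old_url with new_url in every page of a (hypothetical) site.

    Returns the updated pages plus a per-page count of replacements,
    so maintainers can see which pages were touched.
    """
    updated: Dict[str, str] = {}
    changes: Dict[str, int] = {}
    for path, html in pages.items():
        count = html.count(old_url)  # non-overlapping occurrences
        updated[path] = html.replace(old_url, new_url)
        if count:
            changes[path] = count
    return updated, changes


if __name__ == "__main__":
    # Hypothetical pages owned by different groups; the URLs are invented.
    site = {
        "/services/hours.html": '<a href="http://old.example/catalogue">Catalogue</a>',
        "/guides/history.html": 'See <a href="http://old.example/catalogue">the catalogue</a>.',
        "/about/index.html": "<p>No links here.</p>",
    }
    site, report = replace_url_sitewide(
        site, "http://old.example/catalogue", "http://new.example/catalogue"
    )
    print(report)  # pages that were changed, with counts
```

Because this is a plain substring replacement, it would also alter longer URLs that merely begin with the old one; a production system would more likely store links as managed objects, or parse the HTML, rather than rewrite raw text.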
Duplication of effort occurs. In response to a need, two or more maintainers may independently create information resources designed to achieve the same objective. Not only is this a duplication of effort, but different approaches to the same information can confuse our users. A system that promotes the reuse of information resources can reduce this confusion and wasted effort.
Timeliness is a problem. Because of workload constraints, many pages are ignored once they are published; there is simply not enough time to review them regularly. Dated pages degrade the quality of the web site as a whole.
Changes initiated by a maintainer may unwittingly result in technological problems. For example, a maintainer added a JavaScript capability that reused a variable name already used by the navigation bar, which rendered the navigation bar inoperable, and pages that did not work were published. In many cases this type of problem goes undetected until a client complaint is received. The sheer size and complexity of the current web site compounds all these problems. Web development resources are continually consumed fighting fires rather than developing improved services.
Web site maintainers must have a degree of expertise in HTML and now must update these skills to handle XHTML. With the increased complexity of web publishing, training and retraining are constant. Maintainers may use Dreamweaver without a thorough knowledge of HTML but end up frustrated when their pages do not display as expected. Some maintainers trained in Dreamweaver find the software too complex and the learning curve too steep. The problem is compounded when maintainers visit their pages only infrequently, because training often goes for naught when a learned skill is rarely used.
Each HTML page contains subject content, presentation coding (HTML or XHTML), and may even contain program code such as ColdFusion or JavaScript. The level of expertise required to deal with these pages competently is extensive. It is frequently necessary to coordinate a number of people to launch a webpage when the expertise for these different skills resides in different people. This need for coordination of diverse expertise increases the inertia around change and inhibits the speed with which new information resources can be introduced.
In some areas of the web site, the look and feel of pages from closely related parts of the site can vary so much that a client may believe they have entered a new site, which can be disorienting. The culture of page ownership can result in a variety of designs, some creative and some not, but the variation can confuse the users of the site.
WebOps has published "Standards & Practices for UW Library Webpages" to assist maintainers and introduced the use of “server side include” statements to help maintainers conform to Library webpage design and provide a level of global update. However, contributors still require knowledge of HTML and webpage design to be able to contribute directly to the Library web site. Many have resorted to relying on the help of designated web site maintainers. Others may contribute directly but need help in HTML or design from time to time. To encourage participation of library staff, help distribute the workload, and allow a quicker response to changes in service requirements and new technologies, a more efficient web publishing strategy is required for the Library.
One key to efficient operation is effective workflow design. This is particularly important in the automation of a process. Currently, UW Library web maintainers are assigned responsibility for a work area or directory[12]. Each directory has group permissions and ownership[13] associated with it: you cannot edit a file unless you are part of the UNIX group assigned to that directory. Many of our webpages are edited via Dreamweaver by page owners or their support staff, and a few staff members still edit their pages via pico. In at least one area, an approval mechanism is in place: once a page is prepared, it must be examined and approved before it is officially published.
In a number of instances, code fragments are maintained in separate files and then added to pages at the point of publication. The header or navigation bar used on most public webpages is a prime example. The Journal Indexes service has an HTML fragment in a static HTML page on the computer lap2, which the maintainer edits using pico. Headers are also retrieved from a Guelph web server for those TUG services (Journal Indexes, Ejournals and Electronic Reference) whose display is tailored to the source of the request. The OCUL Staff Search service also retrieves headers and footers from around the province for those libraries using the local directory option. At Waterloo, Dreamweaver and pico are the tools used to maintain these fragments and most other webpages.
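The fragment mechanism described above can be sketched in Python as follows. The sketch assumes Apache-style directives of the form <!--#include virtual="..." --> and a hypothetical in-memory table of fragments; it expands each directive once and does not handle nested includes or the remote, requester-tailored headers served from Guelph.

```python
import re
from typing import Dict

# Matches Apache-style server-side include directives, e.g.
# <!--#include virtual="/includes/header.html" -->
INCLUDE_RE = re.compile(r'<!--#include\s+virtual="([^"]+)"\s*-->')


def expand_includes(page: str, fragments: Dict[str, str]) -> str:
    """Replace each include directive in page with its fragment text.

    An unknown path is replaced by an HTML comment, so a missing
    fragment is visible in the output rather than silently dropped.
    Fragments are inserted as-is; nested includes are not expanded.
    """

    def substitute(match) -> str:
        path = match.group(1)
        return fragments.get(path, f"<!-- missing fragment: {path} -->")

    return INCLUDE_RE.sub(substitute, page)


if __name__ == "__main__":
    # Hypothetical shared navigation bar kept in one separate file.
    fragments = {"/includes/navbar.html": '<div class="nav">UW Library</div>'}
    page = '<body><!--#include virtual="/includes/navbar.html" --><p>Hours</p></body>'
    print(expand_includes(page, fragments))
```

Keeping the navigation bar in a single fragment is what gives server-side includes their limited form of global update: editing one file changes every page that includes it.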
A different set of workflows has evolved for the maintenance of the Library’s ColdFusion/Zope web publishing processes. In these processes a maintainer updates a database of metadata using one of several methods. For the TUG Ejournals system, a distributed input method using web forms loads a record with a temporary status; these records are examined and then added officially after an edit check. Batch update processes are also occasionally required for the Ejournals system when major changes affect a large number of records. A batch update process is also available in the OCUL Staff Search service. In most of the webpage-based database maintenance processes, changes are published instantaneously once entered. Besides web forms, there is one instance where Microsoft Access forms are used for updating and another where PGAdmin, a PostgreSQL database client, is used.
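The moderated Ejournals workflow (a record enters with a temporary status, passes an edit check, and is then added officially) amounts to a small state machine. The Python sketch below is illustrative only: the Record fields, the status names and the rules inside edit_check are assumptions, not the actual Ejournals schema or checks.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Status(Enum):
    TEMPORARY = "temporary"  # submitted via web form, not yet public
    PUBLISHED = "published"  # passed the edit check, visible to clients
    REJECTED = "rejected"    # failed the edit check


@dataclass
class Record:
    title: str
    url: str
    status: Status = Status.TEMPORARY
    problems: List[str] = field(default_factory=list)


def edit_check(record: Record) -> List[str]:
    """Hypothetical edit check: title present and URL absolute."""
    problems = []
    if not record.title.strip():
        problems.append("missing title")
    if not record.url.startswith(("http://", "https://")):
        problems.append("URL is not absolute")
    return problems


def review(record: Record) -> Record:
    """Move a temporary record to PUBLISHED or REJECTED."""
    if record.status is not Status.TEMPORARY:
        raise ValueError("only temporary records can be reviewed")
    record.problems = edit_check(record)
    record.status = Status.REJECTED if record.problems else Status.PUBLISHED
    return record


if __name__ == "__main__":
    ok = review(Record("Journal of Examples", "https://journal.example/"))
    bad = review(Record("", "journal.example"))
    print(ok.status, bad.status, bad.problems)
```

Making the status explicit is what distinguishes this workflow from the processes where changes are published instantaneously: nothing reaches clients until a record has moved out of the temporary state.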
These different workflow scenarios represent the variety of methods currently in use to maintain the UW Library public web site.
To determine the applicability of the content management approach to the UW Library’s public web site, a general analysis of the site was performed. The goal of this effort was to find areas with common characteristics that have a potential for standardization in an automated environment. The site can be broken into two components, database services and those areas delivered from static webpages.
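The current database-delivered web services[14] are:
Electronic Journals[15]
CLOE[16]
Electronic Theses[17]
Journal Indexes[18]
OCUL Staff Search[19]
Scholarly Societies Project[20]
Electronic Reference[21]
Electronic Reserves[22]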
Each of these services is a discrete package operating out of its own database. Metadata is maintained for the resources in each package, and pages are delivered through templates created with either Zope or ColdFusion. Although content in the form of metadata is kept separate from presentation, the templates mix presentation and transformation coding.
There are an estimated 11,600 HTML files in 60 directories in the Library’s web site, not including images and other supplementary files. Within the web site there are a number of areas where many pages share common characteristics. Within a CMS, these areas could potentially be delivered using a common template or templates.
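Delivering many similar pages from a common template means storing each page's content as data and keeping the markup in one place. A minimal Python sketch using the standard library's string.Template follows; the field names (title, summary, owner) and the markup are hypothetical and not drawn from any existing Library page.

```python
from html import escape
from string import Template
from typing import Dict, List

# One shared template for every page in an area; only the data differs.
# The field names and markup are hypothetical.
PAGE_TEMPLATE = Template(
    "<html><head><title>$title</title></head>\n"
    "<body>\n<h1>$title</h1>\n<p>$summary</p>\n"
    "<p>Maintained by: $owner</p>\n</body></html>"
)


def render(record: Dict[str, str]) -> str:
    """Render one content record; values are HTML-escaped first.

    Template.substitute raises KeyError if a required field is missing,
    which catches incomplete records before they are published.
    """
    safe = {key: escape(value) for key, value in record.items()}
    return PAGE_TEMPLATE.substitute(safe)


def render_area(records: List[Dict[str, str]]) -> List[str]:
    """Render every record in an area of the site with the same template."""
    return [render(record) for record in records]


if __name__ == "__main__":
    area = [
        {"title": "Hours & Locations", "summary": "Opening hours for all branches.", "owner": "Circulation"},
        {"title": "Interlibrary Loan", "summary": "How to request items.", "owner": "Document Delivery"},
    ]
    for html_page in render_area(area):
        print(html_page)
```

Editing PAGE_TEMPLATE once changes every page rendered from it, while content contributors only ever supply the field values and never touch the markup.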
A table[23] was prepared listing directories, number of pages and the groups responsible for the contents of each area.
The Library's Mission Statement was used as a starting point for determining the requirements of a system for the efficient and effective delivery of our web services. The audience for the web site was also determined. It was concluded that the primary audience is the faculty, staff, and students (including co-op, distance education, and special needs students), the secondary audience is users of the government publications depository collection, and the tertiary group is all others who visit the site. With distance education students among our primary clients, off-campus delivery, particularly of commercial access-controlled resources, becomes an essential requirement of the system. Similarly, for special needs students, accessibility standards must be met. Meeting clients’ needs is the overall objective of both the Library and its web site.
Based on the identified problems and strengths of the UW Library’s web site and the workflows of the existing maintenance processes, a list of functional requirements of a CMS for the UW Library was determined. Please note that the numbers in parentheses at the end of each line represent the priority for each requirement (1 = Essential, 2 = Desirable, 3 = Nice).
Using the requirements above, the group selected a short list[24] of commercial and open source content management systems for brief evaluation. The group also evaluated the Library’s existing tool set (ColdFusion, Apache, PostgreSQL, Dreamweaver) and its suitability for building a local customized content management system.
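The priority scheme (1 = Essential, 2 = Desirable, 3 = Nice) suggests a simple two-stage screening: discard any system that lacks an Essential requirement, then rank the remainder by the optional requirements it meets. The Python sketch below shows one way to do this; the requirement names, the weights and the scoring rule are illustrative assumptions, not the group's actual evaluation method.

```python
from typing import Dict, List, Set, Tuple

# Requirement name -> priority (1 = Essential, 2 = Desirable, 3 = Nice).
# These names are illustrative, not the Library's actual requirements list.
REQUIREMENTS: Dict[str, int] = {
    "separates content from presentation": 1,
    "approval workflow": 1,
    "accessible output": 1,
    "site-wide link management": 2,
    "content reuse": 2,
    "spell checking": 3,
}

# Hypothetical weights: a Desirable requirement counts twice as much as a Nice one.
WEIGHTS: Dict[int, int] = {2: 2, 3: 1}


def screen(systems: Dict[str, Set[str]]) -> List[Tuple[str, int]]:
    """Drop systems missing any Essential requirement and rank the rest.

    Returns (system, score) pairs, highest score first.
    """
    essential = {name for name, priority in REQUIREMENTS.items() if priority == 1}
    ranked: List[Tuple[str, int]] = []
    for system, features in systems.items():
        if not essential <= features:
            continue  # fails at least one Essential requirement
        met = features & set(REQUIREMENTS)  # ignore features not on the list
        score = sum(WEIGHTS.get(REQUIREMENTS[name], 0) for name in met)
        ranked.append((system, score))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    essentials = {"separates content from presentation", "approval workflow", "accessible output"}
    candidates = {
        "System A": essentials | {"content reuse"},
        "System B": essentials | {"content reuse", "site-wide link management", "spell checking"},
        "System C": {"content reuse", "spell checking"},  # no Essential requirements
    }
    print(screen(candidates))
```

Treating Essential requirements as a hard filter rather than a large weight keeps a system that excels at optional features from outranking one that meets every essential need.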
The purpose of these brief evaluations was to see what CMSs were available, how closely they met the Library’s requirements, and at what cost. It immediately became obvious that “Content Management” is an evolving technology, as a variety of terminology was encountered to describe similar functions. This made the evaluation process difficult.
It soon became apparent that many of the existing content management systems that could meet most of the Library’s requirements were very expensive. Other systems within the Library’s budget were often simplistic, such as weblogs, and not designed to operate a large and complex site like the Library’s. One other fact became obvious: any content management system taken out of the box will require significant customization in order to brand it as the UW Library web site.
Using the Library’s existing tool set to build a complete CMS, although possible, would require a significant investment of the Library’s limited programming resources. Another option is offered by commercial and open source content management systems built on a ColdFusion platform. Many of the functions required by the Library are available in these systems, and because they are built upon the same tool set used in the Library, they would allow easy customization and the potential preservation of some of the Library’s current programs. Most of these systems also appear to be within the Library’s budget.
A Zope-based content management system called Plone was used to demonstrate the capabilities of content management to participants. Zope is an open source, object-oriented web publishing environment that the Library has been using to deliver the TUG Electronic Journals services for over two years. Plone is built upon the Zope content management framework and, as previously mentioned, has gained an international reputation. The Plone system has demonstrated that it can meet most, if not all, of the Library’s requirements.
The University has an informal partnership arrangement with the Open Text Corporation, and the Library has been using Open Text’s Livelink software for web site indexing for a number of years. Near the completion of this project, Open Text announced an agreement with IXOS Software AG and Gauss Enterprises AG which essentially adds the CMSs of these companies to those offered by Open Text. Very preliminary information indicates that these are sophisticated content management systems. The Open Text agreement might allow the University to access this software free of charge. The systems of these companies are another option that the Library could explore.
There are many content management systems available in the marketplace, varying in price and sophistication. To reduce these options to a manageable number, the committee identified ColdFusion-based systems as a starting point. Plone, the Zope-based system, and the systems from Open Text are additional options.
Content management is the disciplined separation of content from its presentation, combined with the publication of that content. The primary purpose of this project was to determine whether a content management approach is feasible for the delivery of the Library’s web services and to provide some initial direction. Specifically, the group was assigned the tasks of analyzing current problems and workflows and determining the requirements of the Library’s web publishing program. With that information in hand, the group was charged with exploring a content management approach to determine whether it could meet the Library’s needs and substantially improve the functions of the Library and its services to its clients. The group determined that content management can provide a solution to the problems currently facing the Library’s web publication program.
Planning is critical to the implementation of a content management system. The Library’s web site has grown in importance since the early years and has become an essential method for the delivery of a wide variety of services, so it now affects almost all library staff members. Because the implementation of a content management system will affect all aspects of web publishing, it will also affect the work of most staff, and all staff must be brought on board for the implementation to succeed. The implementation process must acknowledge the need to acquaint people with the workings and the purpose of content management. From the beginning, feedback should be built into the entire process so that the system is adapted to the organization rather than forcing the organization to adapt to the CMS.
The Library’s web site can be broken down into a number of sections, with various groups assigned responsibility for each area. These groups of content experts, once familiar with the nature of the content management approach, are the natural groups to determine the details of template designs and specific workflows within their areas. Engaging these content experts in the development and implementation process ensures that the best client-centered approach will be front and center and that staff will be fully cognizant of the benefits of the content management approach.
The evaluation of the content management systems during the study was not exhaustive. Many systems offered basic services and did not meet the Library’s essential requirements. Some of the more sophisticated systems were very expensive. In all cases extensive customization effort is required to create a local brand. To build a fully functional system from scratch would require a significant investment of limited Library programming resources. The ideal would be a fully functional, affordable content management system or a system that would be easily customizable and extensible based upon existing Library technical expertise.
In summary, the group concluded that a content management approach could substantially improve the quality, efficiency, and effectiveness of the Library’s web services. Because a content management approach is transformative, it was concluded that all Library staff must be aware of the concept, and that those involved in web site maintenance must be intimately involved in the design and implementation process. Finally, to make the evaluation of available content management systems manageable, the group recommends that the Library proceed with detailed evaluations of systems built upon the ColdFusion platform. If a suitable ColdFusion-based system is not identified, alternatives including Plone and the products of Open Text should be evaluated.
[1] The Feasibility of a Content Management Approach for the Delivery of the University of Waterloo Library’s Public Web Site - http://www.lib.uwaterloo.ca/staff/dlcc/charter.html
[2] CMF - Content Management Feasibility - http://www.lib.uwaterloo.ca/staff/dlcc/cmf.html
[3] Slide Show - http://philip.greenspun.com/wp/display/2041/
[4] Other Institutions and CMS - http://www.lib.uwaterloo.ca/staff/dlcc/cmf/elsewhere/index.html
[5] Movable Type CMS
[6] Site Problems - http://www.lib.uwaterloo.ca/staff/dlcc/cmf/current_problems.html
[7] UW Electronic Library - http://www.lib.uwaterloo.ca/staff/dlcc/cmf/current_problems.html
[8] Workflows - http://www.lib.uwaterloo.ca/staff/dlcc/cmf/workflow.html
[9] Site Structure - http://www.lib.uwaterloo.ca/staff/dlcc/cmf/structure.html
[10] Current Database Services - http://www.lib.uwaterloo.ca/staff/dlcc/cmf/ColdFusionSystems.html
[11] CMS – UW Functional Requirements - http://www.lib.uwaterloo.ca/staff/dlcc/cmf/libcmsreq.html
[12] Work Areas/Directories Responsibilities - http://www.lib.uwaterloo.ca/staff/webmaint/groups.html
[13] Groups - http://www.lib.uwaterloo.ca/staff/webmaint/groups_all.html
[14] ColdFusion Systems - http://www.lib.uwaterloo.ca/staff/dlcc/cmf/ColdFusionSystems.html
[15] Electronic Journals - (System Update)
[16] CLOE (Cooperative Learning Object Exchange) (System Terminated)
[17] Electronic Theses - http://etheses.uwaterloo.ca/
[18] Journal Indexes - http://journal-indexes.uwaterloo.ca/
[19] OCUL Staff Search - http://ocul-staffsearch.uwaterloo.ca/
[20] Scholarly Societies Project - http://ssp-search.uwaterloo.ca/compound.cfm
[21] Electronic Reference - http://testtube.uwaterloo.ca/reference/
[22] Electronic Reserves - http://www.ereserves.uwaterloo.ca/ereservesSearch.cfm
[23] Structure and responsibility table - http://www.lib.uwaterloo.ca/staff/dlcc/cmf/structure.html
[24] Short List of CMSs - http://www.lib.uwaterloo.ca/staff/dlcc/cmf/short_list.html