June 13, 2001
I hope that we will have the initial encoding effort for the Electronic Reference Shelf complete by early July. About 500 items are in the database at present; I anticipate that the completed database will have between 800 and 850 items.
As we are moving closer to the point of working on obvious problems in this preliminary version of the database, I thought it would be useful to give a report on what the Dublin Core Metadata Initiative has to say about those problem areas.
As you will remember, I identified two obvious problem areas in the data: (1) how to deal with the broad subject descriptors, and (2) how to deal with the descriptors for the type of information to be found in a resource.
As matters stand, the Dublin Core Metadata Initiative (DCMI) is the best known and most respected of the attempts to standardize the application of metadata to the description of web-based resources. Their website is at: http://dublincore.org/
The DCMI has identified the important data elements (http://dublincore.org/documents/dces/) as:
Title
Creator
Subject [this corresponds to the 1st obvious problem area that I identified]
Description
Publisher
Contributor
Date
Type [this corresponds to the 2nd obvious problem area that I identified]
Format
Identifier
Source
Language
Relation
Coverage [this includes spatial location, and also temporal period, which I have separated in the first version of the data structure for the Electronic Reference Shelf]
Rights
For the Subject element, the DCMI says: “Recommended best practice is to select a value from a controlled vocabulary or formal classification scheme.” The TUG E-journals Group is in fact working towards achieving consensus on a set of about 60 broad subject terms that could be chosen from drop-down menus.
In looking at the “Social Justice” area of the Electronic Reference Shelf (and also at some subject discipline pages), it seems to me that we may also have to supplement the broad subject terms with some more specific subject terms, so that we can construct (from the databases) sub-sections of the “Social Justice” page and of some discipline pages. Just how to do that is an open question, except that an additional field is probably required.
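To make the idea of an additional field more concrete, here is a rough sketch in Python of how sub-sections might be built from it. This is only an illustration: the field names (broad_subject, specific_subjects), the sample items, and the helper build_subsections are my own assumptions and are not part of the current data structure.

```python
from collections import defaultdict

# Hypothetical records: the field names and values are illustrative only.
items = [
    {"title": "Item A", "broad_subject": "Social Justice",
     "specific_subjects": ["Human Rights"]},
    {"title": "Item B", "broad_subject": "Social Justice",
     "specific_subjects": ["Poverty", "Human Rights"]},
    {"title": "Item C", "broad_subject": "Chemistry",
     "specific_subjects": []},
]


def build_subsections(records, broad_subject):
    """Group titles under one broad subject by their specific subject terms."""
    sections = defaultdict(list)
    for record in records:
        if record["broad_subject"] != broad_subject:
            continue  # belongs to a different page
        for term in record["specific_subjects"]:
            sections[term].append(record["title"])
    return dict(sections)


# Sub-sections for the (hypothetical) "Social Justice" page.
subsections = build_subsections(items, "Social Justice")
```

An item with several specific terms simply appears in several sub-sections, and an item with none appears only on the broad page. Whether that is the behaviour we want is part of the open question.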
The DCMI has approached the Type element in layers. They were successful in reaching agreement on a recommendation for the vocabulary to be used at the broadest level. Their recommendation (http://dublincore.org/documents/dcmi-type-vocabulary/) is sometimes called DCT1 (for Dublin Core, Type Level 1). Here are the terms that they recommend:
Collection [This term refers to a webpage that has many different types of resources on it. It is the best term to use in describing our own discipline pages, since they have links on them to indexes, journals, dictionaries, etc. It is also the best term for a rather large number of the items in the new “Social Justice” area (which now contains over 170 items).]
Dataset
Event
Image
Interactive Resource
Service
Software
Sound
Text
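If we adopt these terms as a controlled vocabulary for the Type field, a check at data-entry time could look something like the following Python sketch. The term list is the one given above; the names DCT1_TERMS and is_valid_dct1 are hypothetical, and how the check would be wired into our database is not decided.

```python
# The DCT1 terms listed above, treated as a controlled vocabulary.
DCT1_TERMS = {
    "Collection", "Dataset", "Event", "Image", "Interactive Resource",
    "Service", "Software", "Sound", "Text",
}


def is_valid_dct1(type_value):
    """Return True if the value is one of the DCT1 terms (exact spelling)."""
    return type_value.strip() in DCT1_TERMS


# A discipline page would be entered as "Collection"; a more specific term
# such as "Dictionary" belongs to the draft sub-type list, not to DCT1.
collection_ok = is_valid_dct1("Collection")
dictionary_ok = is_valid_dct1("Dictionary")
```

A drop-down menu restricted to these nine terms would accomplish the same thing without any checking code.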
The DCMI also did some work on the 2nd Layer of descriptors (which they sometimes call “sub-types” or the DCT2). They produced a draft recommendation on 2000 September 28 at: http://lcweb.loc.gov/marc/dc/subtypes-20000928.html
NOTE: The recommendation includes one term for Indexes (which broadly covers everything in the Journal Indexes area). It includes several terms that might be useful for the E-Journals area: Journal, Magazine, Newsletter, and Newspaper. The term “Proceedings” may mean conference proceedings, and, if so, would be useful for the E-Text area (as well as Book, Thesis and TechReport).
Useful terms for the Electronic Reference Shelf include: Catalog, Dictionary, Numeric Data, Spectral Data, Statistical Data. But that leaves a very large number of other sections of the Electronic Reference Shelf.
I get the impression that work on this 2nd Layer has stalled. The Group has, however, compiled a survey of Domain Type Lists used by other agencies. This is dated 2001 April 20 and is found at: http://epub.mimas.ac.uk/DC/domainlists.html
Based on the feeling I get from Dublin Core work on the more specific Type vocabulary, I think that it may be some years before there is anything like an adequate set of “approved” descriptors for the data element Type. We can certainly use the recommendations in the DCT2 Draft, but it is important to understand that this draft may never be approved in this form. Also, in my opinion, it does not make fine enough distinctions to cover the Electronic Reference Shelf adequately.
The DCT1 contains some very broad terms that appear to be useful for the UW Library website in general. There are also some terms in the not-approved DCT2 that appear to be useful for several areas of the UW Library website. But we will probably have to broaden the list in the DCT2 in order to handle the Electronic Reference Shelf adequately.