
Cataloging and Knowledge Organization


If information cannot be found when it is wanted, it cannot be integrated into the world of human knowledge or into an individual’s personal knowledge base. Whether people want to write a newspaper article, complete a project, or learn about a new hobby, they need to be able to find information that relates to what they are doing and to what they want to know. The overall purpose of cataloging and knowledge organization is to help people achieve the goal of finding information as easily as possible when they need it. This goal may seem to be a simple one, but accomplishing it is not necessarily easy or straightforward. For example, the way in which information is described and organized should ideally be consistent within one information medium, compatible with other information media, and predictable and appropriate for different kinds of information in different media. In addition, description and organization of information should be flexible enough to accommodate all the different assumptions, views of the world, and natural languages that human beings currently employ, as well as those that they have employed throughout the history of recorded information.

People have recorded information in many ways and in many forms. One general term for all of these information containers is “item.” Information items can be textual (e.g., books or magazines), nontextual (e.g., paintings or sculptures), or a combination of the two (e.g., musical scores or maps). In addition, these information items can be physically stored in institutions such as libraries, museums, or archives, virtually stored in databases (textual or nontextual) for private use (e.g., within an organization) or for public use (e.g., through the Internet), or both. This variety of possibilities has motivated information professionals to develop standardized and nonstandardized ways of helping people find what they want.

Recording Data about Information Items

Three processes help information professionals create access for users. These are the description of an item, the choice of descriptive elements as access points (i.e., data that may be searched), and the entry of the description into a file that is either manually or electronically searchable.

The description of an information item is a surrogate representation for it. A surrogate record stands for the information item in a manual or an electronic file. The purpose of the description is to allow people to decide whether they want to look at the item itself. For example, the surrogate description for a book includes both physical characteristics (e.g., number of pages and dimensions) and intellectual characteristics (e.g., title and subject). These and other data elements (e.g., author, publisher, date of publication) help people decide whether they want to read the book. In general, the process of creating a description and assigning access points is known as “cataloging.” The process of creating “metadata” has roughly the same meaning, but it may also cover how the description is put into machine-readable form, where the item may be found, and how the item relates to other items. “Resource description” is a similarly broad term for methods of creating surrogates for any kind of item. The surrogate might be a cataloging record, an abstract or summary, or a thumbnail picture of the item. If descriptions of items are standardized and predictable, people will find the information they are looking for more easily because they can make a comprehensive search of an information file.
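The relationship between a surrogate record and its access points can be sketched in a few lines of Python. This is a minimal illustration, not a cataloging standard: the class name, the field names, the choice of title, author, and subject as access points, and the two sample books are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SurrogateRecord:
    """A stand-in description of one information item (hypothetical fields)."""
    title: str
    author: str
    publisher: str
    year: int
    pages: int  # a physical characteristic
    subjects: list[str] = field(default_factory=list)  # intellectual characteristics

    def access_points(self) -> set[str]:
        """Return the data elements chosen as searchable access points."""
        return {self.title.lower(), self.author.lower(),
                *(s.lower() for s in self.subjects)}


def search(catalog: list[SurrogateRecord], term: str) -> list[SurrogateRecord]:
    """Return the records that carry `term` as an exact access point."""
    return [rec for rec in catalog if term.lower() in rec.access_points()]


# Two invented records entered into a small searchable file.
catalog = [
    SurrogateRecord("Growing Roses", "A. Gardener", "Example Press", 1999, 120, ["Gardening"]),
    SurrogateRecord("Flying Basics", "B. Pilot", "Example Press", 2001, 200, ["Airplanes"]),
]
hits = search(catalog, "gardening")
print([rec.title for rec in hits])
```

In this sketch the publisher is part of the description but was not chosen as an access point, so a search for “Example Press” finds nothing even though both records contain it.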

Charles Cutter (1904) identified three purposes for cataloging: (1) to allow someone to find an item with a known creator, title, or subject, (2) to allow someone to discover what an institution has about a certain topic, and (3) to allow someone to select an appropriate item from among a number of similar ones. These three purposes still guide any catalog or other finding aid. The first purpose is met by cataloging rules and codes, the second is met by knowledge organization systems, and the third is met by both kinds of systems.

Knowledge Organization Systems

One of the purposes that Cutter (1904) identified for a surrogate system was to allow people to find items that have the same topic or subject. The topic of an item is what it is about (e.g., landscape painting, theoretical astrophysics, gardening, or how to fly an airplane). The term “knowledge organization” encompasses different methods for organizing information, but the term is sometimes used more narrowly for organizing information by topic or subject. Standardized methods (i.e., alphabetical systems and classification systems) and nonstandardized methods of specifying subjects have been developed, all of which can be used in both manual and electronic environments to help people retrieve the information they want.

Standardized Methods

A cataloger analyzes an information item to determine its topic and the concepts it uses and then translates the concepts in the analysis into a standardized or controlled vocabulary. Standardized methods of knowledge organization include systems that are primarily displayed alphabetically (e.g., subject-heading systems and thesauri) and systems that are primarily displayed systematically (e.g., classification and ontological systems). These two types of systems are not mutually exclusive because alphabetical systems include classificatory elements, and classificatory systems include alphabetical elements. Both kinds of system are used to organize resources on the Internet (e.g., Beyond Bookmarks) and in nonelectronic information environments.

Subject-heading and thesaural systems are called “controlled vocabularies” because the particular terms the system prefers for expressing each concept are chosen in advance and controlled by the system developers. Searchers are guided to these preferred terms by networks of references that are called the “syndetic structure” of the system. Assigning subject headings to information items is usually called “subject cataloging” and assigning thesaurus terms is usually called “indexing.”

Subject-heading lists provide words and/or phrases that may be used as access points for subjects. Subject-heading lists are often used in libraries and are usually created for knowledge in general. These systems provide networks of terms to describe the subjects in a document. Library of Congress Subject Headings, first published in 1914, is used in many large academic and national libraries in English-speaking countries. Usually, a cataloger gives a book more than one subject heading, and in an online system subject headings can be searched by keywords. That is, the searcher does not have to know the exact form of the subject heading in order to use it for searching.

Thesauri began to be developed in the 1950s. Thesaural systems are similar to subject-heading systems in providing lists of consistent terms that are assigned to an information item by an indexer. Unlike subject-heading systems, however, thesauri are usually created for a particular field. For example, the Art & Architecture Thesaurus, published by the Getty Information Institute, provides access to all kinds of heritage information items (e.g., texts, images, museum materials). In addition, the syndetic structures of thesauri are usually more strictly controlled than those of subject-heading systems, and the terms in them are defined for the particular purposes of that field of knowledge.

Both subject-heading and thesaural systems include codes that describe the relationships of one term to other terms. The most common relationships are “broader term” (BT), “narrower term” (NT), and “related term” (RT). A broader term names a concept that is wider in scope than another. For example, the concept “precipitation” is broader than “snow.” A narrower term names a concept that is more specific. For example, the concept “oak tree” is narrower than “tree.” A related term is associated in some way with the term in question but is neither broader nor narrower in scope. For example, “light” is related to “color” and may interest a searcher who has looked up “color,” but the two terms do not have a broader/narrower hierarchical relationship. In addition, the terms that the system prefers are called “used terms,” and the terms that it does not prefer are called “unused terms.” Unused terms are treated as synonyms for used terms and cannot themselves be used for searching. For example, “wig” may be treated as a synonym for “hair.” People who look up an unused term (e.g., “wig”) are directed to search with a used term instead (e.g., “hair”).
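The relationship codes and the redirection from unused to used terms can be modeled as a small lookup structure. The sketch below reuses the examples from the preceding paragraph; the dictionary layout and the function names `preferred` and `related` are assumptions for illustration and do not reflect how any particular thesaurus is stored.

```python
# Toy thesaurus built from the examples in the text; the layout is an assumption.
THESAURUS = {
    "precipitation": {"BT": [], "NT": ["snow"], "RT": []},
    "snow": {"BT": ["precipitation"], "NT": [], "RT": []},
    "tree": {"BT": [], "NT": ["oak tree"], "RT": []},
    "oak tree": {"BT": ["tree"], "NT": [], "RT": []},
    "color": {"BT": [], "NT": [], "RT": ["light"]},
    "light": {"BT": [], "NT": [], "RT": ["color"]},
    "hair": {"BT": [], "NT": [], "RT": []},
}

# Each unused term points to the used (preferred) term that replaces it.
USE = {"wig": "hair"}


def preferred(term: str) -> str:
    """Map any known term to the used term a searcher should search with."""
    term = term.lower()
    if term in THESAURUS:
        return term
    if term in USE:
        return USE[term]
    raise KeyError(f"{term!r} is not in the controlled vocabulary")


def related(term: str, relation: str) -> list[str]:
    """Follow one relationship code ("BT", "NT", or "RT") from a term."""
    return THESAURUS[preferred(term)][relation]


print(preferred("wig"))        # the searcher is redirected to the used term
print(related("snow", "BT"))   # broader terms of "snow"
print(related("color", "RT"))  # related terms of "color"
```

Because every lookup passes through `preferred`, a searcher who starts from an unused term still reaches the records indexed under the used term.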

Controlled vocabularies are useful in information retrieval systems because the terms assigned to information items can be used to search a database. Searching with an assigned term ensures that all the records that have been indexed with that term are retrieved. Certainty that all the relevant records have been found means that a searcher can feel confident that the search was comprehensive. Otherwise, the searcher would have to think of all the possible synonyms of a term in order to be sure that the search was complete.

Classification systems are structured systems that divide some knowledge domain into groups on the basis of likenesses and/or differences among the members of each group. The study of classification dates back at least to the philosophers of ancient Greece. Modern bibliographic classification systems started to appear in the late nineteenth century. In an ideal classification system, the classes are both mutually exclusive and jointly exhaustive. That is, the classes do not overlap (i.e., mutually exclusive), and all the classes taken together encompass all possible content so that nothing is left out (i.e., jointly exhaustive). This ideal cannot be fully achieved because new members of the classes can be discovered or invented at any time. Nevertheless, the ideal can be used to help evaluate classification systems because one can assess the classes for mutual exclusivity and joint exhaustivity.
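The two criteria can be checked mechanically when a classification is small enough to write down. The sketch below treats each class as a set of members. The class names, the members, and the fixed “universe” of things to be classified are invented for the example, and a real knowledge domain has no such closed list.

```python
from __future__ import annotations


def overlapping_pairs(classes: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Return pairs of classes that share a member (mutual exclusivity fails)."""
    names = sorted(classes)
    return [(a, b)
            for i, a in enumerate(names)
            for b in names[i + 1:]
            if classes[a] & classes[b]]


def unclassified(classes: dict[str, set[str]], universe: set[str]) -> set[str]:
    """Return members that fall into no class (joint exhaustivity fails)."""
    covered = set().union(*classes.values())
    return universe - covered


# Invented example: "pets" overlaps "animals", and "granite" has no class.
universe = {"dog", "cat", "oak", "rose", "granite"}
classes = {
    "animals": {"dog", "cat"},
    "plants": {"oak", "rose"},
    "pets": {"cat"},
}
print(overlapping_pairs(classes))
print(unclassified(classes, universe))
```

An empty result from both functions would indicate that the classes meet the ideal for the members known so far. A newly discovered member can still break joint exhaustivity later.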

In North America, most libraries use either the Dewey Decimal Classification or the Library of Congress Classification (in which each class is published separately). Both of these classification systems are called “enumerative systems” because they seek to list all of the possible topics that documents may have. In libraries, classification systems are used both to show the place of a particular topic in the context of the world of knowledge and also to provide a shelf address for each document. On the Internet, classification systems (e.g., DESIRE) often provide an address or hyperlink to the relevant site. Researchers into artificial intelligence have begun to create ontologies (i.e., classification systems) for real-world knowledge so computers can represent contexts, understand human languages, and recognize how things in the world are related to each other.

Most classification systems have a hierarchical structure in which the attributes of a class on a higher level are shared by those on the lower levels. For example, a document about Canadian history in general will not be as detailed on each of its constituent topics (e.g., the Canadian constitution) as a document that deals only with that topic. Conversely, a document about the narrower topic will also contain elements of the broader topic: a document about the Canadian constitution will also deal to some extent with Canadian history in general. Unlike subject-heading systems and thesauri, classification systems are displayed structurally, not as an alphabetical list. Each class has a notation that represents the place of the class in the world of knowledge and in the system and that shows its relationships to a hierarchy of other classes. For example, part of the Dewey Decimal Classification schedules for “technology” (with growing specificity) is 600 for technology (applied sciences), 630 for agriculture and related technologies, 636 for animal husbandry, 636.7 for dogs, and 636.8 for cats.

Notation can be numeric, alphabetical, or mixed alphanumeric. For example, the notation for the topic “economics of education” is 338.4337 in the Dewey Decimal Classification and LC65 in the Library of Congress Classification. Hierarchical relationships may also be shown in the notation. For example, in the Dewey Decimal Classification, “Canadian history” is notated as 971, where the 9 stands for “history,” the 7 stands for “North America,” and the 1 stands for “Canada.” The notation 971 thus shows that history is a broader class than the history of North America, which in turn is broader than the history of Canada.
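When a notation expresses hierarchy digit by digit, the broader classes can be recovered by shortening it. The following Python sketch applies that idea to Dewey-style numbers, using class labels taken from the examples in this article. It assumes that every digit marks exactly one level of the hierarchy, which is a simplification: real schedules do not follow this rule everywhere, and the function names are hypothetical.

```python
from __future__ import annotations


def format_notation(digits: str) -> str:
    """Pad to three digits and put the decimal point after the third digit."""
    padded = digits.ljust(3, "0")
    return padded if len(padded) == 3 else f"{padded[:3]}.{padded[3:]}"


def broader_notations(notation: str) -> list[str]:
    """List the broader classes implied by a notation, widest class first."""
    # Drop the decimal point and the padding zeros, keeping significant digits.
    significant = notation.replace(".", "").rstrip("0")
    return [format_notation(significant[:i]) for i in range(1, len(significant))]


# Labels copied from the examples given in this article.
LABELS = {
    "600": "technology (applied sciences)",
    "630": "agriculture and related technologies",
    "636": "animal husbandry",
    "636.7": "dogs",
    "636.8": "cats",
}

chain = broader_notations("636.8")
print(chain)
print(" > ".join(LABELS[n] for n in chain + ["636.8"]))
```

Under the same assumption, `broader_notations("971")` yields 900 and 970, matching the reading of 971 as history, then North America, then Canada.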

One relatively recent development in the creation of classification systems is the construction of faceted systems. Facet theory was developed by Shiyali R. Ranganathan in India and refined in his Colon Classification (1964). Facet analysis divides a subject field into mutually exclusive groups called “facets” and then divides each facet into its constituents. For example, the material facet for furniture would contain terms for the various kinds of materials from which furniture can be made (e.g., wood, metal, cloth, plastic). Each of these terms has its own notation, and notations from different facets can be synthesized to express a complex topic. For example, one might express the topic “red plastic tables” with notational elements from the color, material, and type facets. The idea of facet analysis has also been adopted for the development of thesauri. Its advantage is that not every topic has to be listed in advance, and a notational subject statement may be built up in a way that is similar to constructing a sentence from component words in a natural language. Another faceted classification system is the Bliss Bibliographic Classification (devised by Henry Evelyn Bliss and edited by Jack Mills and Vanda Broughton), which is based on Ranganathan’s theories and incorporates other advances from modern classification research.
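Synthesis from facets can be illustrated with the furniture example. In the sketch below, the facet names come from the text, but the notation codes (T1, M4, C1, and so on), the colon separator, and the fixed order in which facets are combined are assumptions made for the example. They are not taken from the Colon Classification or any other published scheme.

```python
# Hypothetical facets for furniture; every code here is invented.
FACETS = {
    "type": {"tables": "T1", "chairs": "T2"},
    "material": {"wood": "M1", "metal": "M2", "cloth": "M3", "plastic": "M4"},
    "color": {"red": "C1", "blue": "C2"},
}

# Assumed fixed order in which facets are combined into one statement.
CITATION_ORDER = ["type", "material", "color"]


def synthesize(**terms: str) -> str:
    """Build one notation from at most one term per facet."""
    unknown = set(terms) - set(FACETS)
    if unknown:
        raise ValueError(f"unknown facets: {sorted(unknown)}")
    parts = [FACETS[facet][terms[facet]]
             for facet in CITATION_ORDER if facet in terms]
    return ":".join(parts)


# "Red plastic tables" is never listed anywhere; it is assembled on demand.
print(synthesize(color="red", material="plastic", type="tables"))
print(synthesize(material="wood", type="chairs"))
```

The three small facets hold eight terms in total, yet they can express every combination of type, material, and color without any of those combinations being enumerated.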

The ability to search a database using notations as search terms means that the searcher does not have to know the human language that is used in the records. For example, using the Dewey Decimal Classification notation 636.8 (“cats”) for searching a database in which each record has been assigned one or more notations will retrieve records in English, Spanish, Chinese, Russian, or any other natural language. The searcher does not have to know the word for “cats” and its synonyms in all these languages. This ability is particularly useful in multilingual information environments.
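A short sketch shows why the notation works regardless of the language of the record. The three records and their titles are invented for the example, and the record layout is an assumption. Only the notations 636.7 and 636.8 come from the text.

```python
# Invented records; each carries one or more assigned notations.
RECORDS = [
    {"title": "The Cat Care Handbook", "language": "en", "notations": ["636.8"]},
    {"title": "El cuidado de los gatos", "language": "es", "notations": ["636.8"]},
    {"title": "Dog Training Basics", "language": "en", "notations": ["636.7"]},
]


def search_by_notation(records: list, notation: str) -> list:
    """Return every record assigned the notation, whatever its language."""
    return [rec for rec in records if notation in rec["notations"]]


def search_by_word(records: list, word: str) -> list:
    """Return records whose title contains the word (for comparison)."""
    return [rec for rec in records if word.lower() in rec["title"].lower()]


by_notation = search_by_notation(RECORDS, "636.8")
by_word = search_by_word(RECORDS, "cat")
print([rec["language"] for rec in by_notation])
print([rec["language"] for rec in by_word])
```

In this sketch the word search for “cat” finds only the English title, whereas the notation search also finds the Spanish one.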

Nonstandardized Methods

Nonstandardized methods of knowledge organization have been developed and are used for accessing the content of an individual document. An abstract is a brief summary that contains only the most salient points from the document and is often written by a professional abstractor, not by the originator of the document. Abstracts are often included at the beginning of a journal article and, in an electronic environment, these abstracts can be searched to find words in uncontrolled vocabulary that are of interest to the searcher. Individual documents such as books often have an index that refers only to that document and its page numbers. These back-of-the-book indexes are created by professional indexers, and no standardized method has been developed. Each book also has a table of contents that includes the names of chapters and/or sections in order to help readers find what they want. In the case of both abstracts and back-of-the-book indexes, searching with an uncontrolled vocabulary means that one can never be certain that all the relevant material has been retrieved or that the search has been comprehensive.

Producing Files in a Standardized Format

Individual surrogate records are entered into a file to create a manual or computerized catalog, list, directory, index, guide, or register that can be searched. In a manual (i.e., printed) file, the display format is usually established by a publisher (e.g., for a book) or by an institution (e.g., for a library catalog). For computerized resources (e.g., a database), information is encoded from descriptive standards such as AACR2, and the way this information is displayed can be customized. To encode information means to make it machine-readable. Institutions or individuals that want to exchange records can do so if they are using the same encoding standard or if a method has been developed to convert one standard format to another. Sharing records increases their accessibility for people who are trying to find information. Standardized encoding formats include, for example, Machine-Readable Cataloging (MARC) and Standard Generalized Markup Language (SGML), which allow data to be displayed in human languages. The MARC format is the oldest encoding standard and is used in many libraries. Markup languages such as SGML permit the structures of many different types of documents to be encoded. They show which elements are structural elements (e.g., a paragraph or a title) and which elements are content elements (e.g., the sentences in the paragraph). In addition, standards can be used to describe each other. For example, MARC records can be encoded with SGML.
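The idea of encoding a record so that another system can read it back can be sketched with a markup language. Python's standard library has no SGML writer, so the example below uses XML, a simplified descendant of SGML, as a stand-in. The element names `record`, `title`, and `author` are hypothetical and are not MARC fields or part of any published document type.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def encode_record(record: dict[str, str]) -> str:
    """Encode a flat description as markup, one element per data element."""
    root = ET.Element("record")
    for name, value in record.items():
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")


def decode_record(markup: str) -> dict[str, str]:
    """Read the markup back into a description on the receiving side."""
    root = ET.fromstring(markup)
    return {child.tag: child.text or "" for child in root}


# An invented description, exchanged between two hypothetical institutions.
original = {"title": "Growing Roses", "author": "A. Gardener"}
encoded = encode_record(original)
print(encoded)
print(decode_record(encoded) == original)
```

The tags carry the structure and the text between them carries the content, so a receiver that knows the same element names can rebuild the description without loss.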

Conclusion

Cataloging and knowledge organization systems have been developed to make it easier for people to find what they need within the complex worlds of information and knowledge. These systems are used in all kinds of information environments to improve access to actual and virtual documents in many formats, in many languages, and from many periods of history. The evolution of these systems is ongoing because information professionals are constantly striving to improve access for users of the systems.
