
Metadata and Digital Photography - Definition and Introduction, Use and requirements, Categories, Terminology, Metadata and retrieval, Standards, Storage and technologies


SIMON MARGULIES
University of Basel

Definition and Introduction

Metadata is usually described as data about data. More accurately defined, metadata is data that contains information about other data—so-called primary data or a resource for future retrieval needs. Metadata relates to primary data by describing and referencing it. This description provides information that is not apparent from the primary data itself. It makes the primary data meaningful and useful. For example, the metadata associated with digital photography is usually not pictorial data embedded in the image file. It could provide information about its creator, about its copyright protection, about the equipment used during the shot, about its contents, about its file format, etc. A picture stored in a computer system can contain metadata about the file in the filename, the file path, a database entry, or a header tag, as well as other metadata.

A definition of metadata from librarians, who have long experience in cataloging and creating metadata, is: structured, encoded data that describe characteristics of information-bearing entities to aid in the identification, discovery, assessment, and management of the described entities 1.

This definition introduces the purpose of metadata as a characteristic: Much digital data, and especially the content of pictures, is not (and probably never will be) searchable by machines or self-explanatory to human users without additional textual information. Often additional information such as copyright statements, information for display (such as a color profile), or information about the circumstances of the creation also needs to be stored with the primary data. In summary, metadata should guarantee future retrieval, readability, and usability of its primary data. In digital photography this means that all additional non-pictorial data must be stored as metadata to guarantee that images can be found and linked to their photographer, opened and correctly displayed by a computer system, and used by the final observer. Much of this is achieved by structuring the metadata and filling the structure with coherent content. As a result, metadata links different data, as in a library catalogue: If two different primary data match on one of their properties, and if those properties are stored in the structure of the metadata, the different primary data can be linked together. For example, all the books by a certain author can be found in a library, or all pictures from a certain date range can be identified in an image archive.
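The linking described above can be illustrated with a minimal Python sketch. The records, field names, and values are invented for illustration and do not come from any metadata standard; the point is only that two images become linkable once a shared property is stored in a structured field.

```python
from datetime import date

# Hypothetical metadata records; field names and values are illustrative only.
records = [
    {"id": "img-001", "creator": "A. Example", "created": date(2005, 3, 14)},
    {"id": "img-002", "creator": "B. Sample", "created": date(2005, 7, 2)},
    {"id": "img-003", "creator": "A. Example", "created": date(2006, 1, 9)},
]

def by_creator(items, creator):
    """Return the IDs of all records that share the given creator value."""
    return [r["id"] for r in items if r["creator"] == creator]

def in_date_range(items, start, end):
    """Return the IDs of all records created between start and end, inclusive."""
    return [r["id"] for r in items if start <= r["created"] <= end]

same_author = by_creator(records, "A. Example")
year_2005 = in_date_range(records, date(2005, 1, 1), date(2005, 12, 31))
```

Neither query looks at pixel data; both work only because creator and date were recorded in consistent, structured fields.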

Use and requirements

Different environments and users will require different metadata strategies, since retrieval, readability, and usability can have different significance. Independent of any usage scenario, however, metadata needs to guarantee retrieval, readability, and usability of the data. Otherwise the metadata is not adequate data about data.

For example, an archive or a library that performs long-term preservation of pictures and wants to provide access to many users, or to users in a distant future, will need to provide its data with a rich set of metadata. Metadata for digital images in such a scenario should provide extensive information about the data format and the color profile, along with details about the contents, the creation, the creator, and the coherence of the collection the picture belongs to. As another example, a company dealing with images needs to focus on its workflow and the requirements of its clients, and optimize the metadata for this purpose.

A photographer storing pictures basically needs to care about metadata as much as an archive does, if he wants to preserve and then retrieve the pictures. Naturally, a photographer won’t need extensive content and coherence description when managing a private and familiar collection. If images will be presented on the Internet or shared with other institutions, authors may be well advised to store a copyright notice and content description as metadata to render the images searchable and identifiable. Photographers can thereby profit from the use of metadata for their imaging businesses. Copyright information, password security for displaying the file, and content description should also be added as metadata. Embedding unique image identification as metadata makes it possible to prove ownership and follow the track of an image from the photographer to his client and to other involved parties. The whole workflow can be supported by metadata. It is then possible to record when and where an image file was processed and changed, allowing users to make invoices and complete transactions. Keywords and captions as metadata could render images searchable by content, providing faster access and processing in the production workflow. Using term-based search tools is far more efficient than browsing thumbnails, especially in large image collections.
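As a rough illustration of why term-based search scales better than browsing thumbnails, the following Python sketch builds an inverted index from keywords to image IDs. The IDs, keywords, and function names are hypothetical; a production system would more likely rely on a database or search engine.

```python
from collections import defaultdict

# Hypothetical image IDs and keywords.
keywords = {
    "img-001": ["harbour", "dawn", "boats"],
    "img-002": ["market", "people"],
    "img-003": ["harbour", "storm"],
}

def build_keyword_index(keywords_by_image):
    """Map each keyword to the set of image IDs tagged with it."""
    index = defaultdict(set)
    for image_id, terms in keywords_by_image.items():
        for term in terms:
            index[term.lower()].add(image_id)
    return index

def search(index, *terms):
    """Return the sorted IDs of images tagged with every given term."""
    if not terms:
        return []
    matches = [index.get(term.lower(), set()) for term in terms]
    return sorted(set.intersection(*matches))

index = build_keyword_index(keywords)
harbour_images = search(index, "harbour")
stormy_harbour = search(index, "harbour", "storm")
```

A lookup touches only the entries for the requested terms, so its cost does not grow with the number of images that would otherwise have to be inspected one by one.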

Categories

Metadata research has so far not produced a uniform way of categorizing metadata. Often metadata are subdivided into descriptive, structural, and administrative metadata. Sometimes a fourth category is added, technical metadata. Basically, metadata categories help users to group parts of the metadata and to better understand its uses and requirements for the primary data.

Descriptive metadata contains data that is needed for resource identification and discovery. All requirements for a successful search and retrieval of the primary data must be fulfilled. The described resource must be identifiable at least by an ID. In a library this ID corresponds to the shelf mark of the resource. This ID must be unique for the domain in which the resource is referenced. If this domain is a database, a primary key must be chosen; if the domain is the Internet, the key should correspond to the syntax of the Uniform Resource Identifier (URI) 2, of which Uniform Resource Locators (URLs) are a subset. As the term descriptive suggests, descriptive metadata should also contain information about the content and the coherence of a resource. This is provided by a textual description of what a picture displays and why it was taken.

Structural metadata provides data about the context of a resource, if it is part of a complex information object. For example, a picture could be part of a page of a book and therefore need information about its place on the page and about the book containing all pages. For digital photography this category normally remains secondary.

Administrative metadata are used to manage the object in the information system. This may be copyright and licensing information and also information about the data format and the equipment used to scan the picture. This is why administrative metadata is often split into administrative and technical metadata. This separation makes sense especially in the field of long-term preservation of digital images, because the technical part contains the information needed for the necessary actions to maintain and display the data (such as the data format) and the administrative part contains the information about the actions taken to preserve the data (such as a change log).

Terminology

Metadata consists of four layers, depicted in Figure 53, which describe, from the top down, the semantics, the data model, and the syntax of metadata. The combination of these layers and the mechanism of identifying the described resource is called a metadata framework.

Semantics represents different data fields, their structure, and their content. Data fields are named entities of information, such as creator, title, and date. Often these entities are called elements. The data structure defines the organization and the order of the data fields. For example, metadata about a picture in digital photography must provide information about the camera that was used to take the picture, but can store data from only one camera, because only one camera was used. Data fields and data structure offer the possibility of filling in content and thereby creating semantics.

To guarantee the integrity of such metadata, rules for data content and norms for data values must be used consistently. Content rules for metadata define in what form and with what logical constraints data fields can be filled. One rule, for example, could state that a data field called “creator” must be completed with the name followed by the surname of the creator. Likewise, a rule for dates could prescribe the form YYYY-MM-DD or the form DD-MM-YYYY. Norms for data values are mutually accepted thesauri or terms predefined by the authority of a domain. These norms are to be used when filling in the data fields. Such norms, data fields, and data structure often result in a complex arrangement of rules defining an interwoven model of a certain domain. The formal conceptualization of such a model is called an ontology in information sciences.
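Content rules and value norms of this kind can be checked mechanically. The following Python sketch is one possible way to do so, under stated assumptions: the field names, the choice of YYYY-MM-DD as the required date form, and the three-term controlled vocabulary are all invented for illustration.

```python
import re
from datetime import datetime

# Hypothetical controlled vocabulary standing in for a domain thesaurus.
SUBJECT_TERMS = {"portrait", "landscape", "architecture"}

def check_record(record):
    """Return a list of rule violations for one metadata record."""
    errors = []

    # Content rule: creator is a name followed by a surname (two or more words).
    if not re.fullmatch(r"\S+(?: \S+)+", record.get("creator", "")):
        errors.append("creator must be a name followed by a surname")

    # Content rule: date is written as YYYY-MM-DD and is a real calendar date.
    date_text = record.get("date", "")
    try:
        if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", date_text):
            raise ValueError(date_text)
        datetime.strptime(date_text, "%Y-%m-%d")
    except ValueError:
        errors.append("date must have the form YYYY-MM-DD")

    # Norm for data values: subject must be a term of the vocabulary.
    if record.get("subject") not in SUBJECT_TERMS:
        errors.append("subject is not a term of the controlled vocabulary")

    return errors

valid = check_record(
    {"creator": "Jane Doe", "date": "2005-03-14", "subject": "portrait"})
invalid = check_record(
    {"creator": "Doe", "date": "14-03-2005", "subject": "cat"})
```

Here `valid` is an empty list, while `invalid` reports one violation per rule. Rejecting or flagging such records at ingest is what keeps later searches reliable.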

The data model defines how statements about resources can be made. It states the grammar of the metadata. It represents an abstract layer below the semantics and above the syntax of a metadata framework. Usually a relational data model (tuples containing many attributes), hierarchical attribute/value combinations (as in XML), or statements with subject, predicate, and object (like triples in RDF) are used.
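To make the difference between the three data models concrete, the following Python sketch expresses the same small fact in each of them. All identifiers, field names, and values are made up; plain Python structures stand in for a real relational database, XML document, or RDF store.

```python
# One fact, "image img-001 was created by Jane Doe", in three data models.
# All identifiers and field names are made up for illustration.

# 1. Relational: a tuple whose positions correspond to named attributes.
columns = ("id", "creator", "title")
row = ("img-001", "Jane Doe", "Harbour at dawn")

# 2. Hierarchical attribute/value pairs, nested as in an XML document.
tree = {"image": {"id": "img-001",
                  "description": {"creator": "Jane Doe",
                                  "title": "Harbour at dawn"}}}

# 3. Statements: (subject, predicate, object) triples as in RDF.
triples = [
    ("img-001", "creator", "Jane Doe"),
    ("img-001", "title", "Harbour at dawn"),
]

def creator_from_row(columns, row):
    """Look up the creator by column name in a relational tuple."""
    return dict(zip(columns, row))["creator"]

def creator_from_tree(tree):
    """Look up the creator by walking a fixed path through the hierarchy."""
    return tree["image"]["description"]["creator"]

def creator_from_triples(triples, subject):
    """Look up the creator by matching subject and predicate."""
    return next(o for s, p, o in triples if s == subject and p == "creator")
```

The semantics (who created the image) are identical in all three cases; only the grammar used to state and retrieve the fact differs.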

The syntax describes the way the semantics and the data model of the metadata are recorded (for example, in Unicode and XML).

The crucial component of a metadata framework is the mechanism of identification of the described resource, because it is part of every other layer, and not only every resource but also every semantic rule, every data model, and every syntax used must be unambiguously identifiable to prevent any possibility of a mix-up between different assets.

Metadata and retrieval

By describing data, the person recording the metadata classifies the primary data according to its subjective contexts. This means that the recording person describes the data in the way he understands and interprets it. The result is subjectively encoded information: the semantics of the data. The recording process is illustrated in Figure 54.

The classification of metadata is fixed before data is ingested: The stored information is organized by structure and content of the data fields of the metadata. Such classifications depend on the recording person or the recording institution. In a broader sense, the classifications are domain-specific. This means that it is possible for every domain to use a different classification and other keywords to describe the data—their data has different semantics.

A researcher who wants to retrieve stored information has his own idea and understanding of the primary data and its metadata. The researcher needs to have sufficient knowledge about the classification of the data undertaken by the recording person to be able to use the system for retrieving information. In Figure 55 these processes are outlined: The person who is searching gets access to the searched primary data by researching the metadata. This metadata was ingested into the system by the recording person in Figure 54. As mentioned above, the recording person created the metadata by subjectively classifying it. This happens before the search and is the reason that the classification of the primary data and of metadata is marked as “historical” in Figure 55.

Only metadata that was generated consistently beforehand (input into data fields taking into account the content rules and norm values) can guarantee a later retrieval of the described primary data. To find the searched data, a researcher needs prior knowledge and understanding of the data model and the vocabulary used during the ingest of the data; conversely, the ingesting organization should consider the habits of the users searching the data. By mapping his own classification of contexts to the historical classification, someone who is doing a search tries to interpret the historical classification to retrieve the searched data. Knowing how the stored information is structured and what terms were used to describe it are both crucial for successful retrieval.

If various domains share their data, a researcher needs to semantically link different classifications. This gets easier the more consistently the metadata was ingested and the more the researcher knows about its classifications and its vocabulary; the metadata becomes correspondingly more functional. If the researcher knows the semantics of each domain, the mapping of the different classifications will be apparent. The closer the subjects of the domains, the closer their semantics and the easier their mapping. On the other end of the spectrum, information retrieval in an uncontrolled user community and with uncontrolled data (as on the Internet) is more difficult and error-prone than in a defined community, which shares only specific kinds of data and a common interest in accurately describing the data. This is why successful data retrieval can be difficult. To make it easier, the various metadata and their vocabulary should be shared across domains. For that purpose, standards for metadata have been defined.
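Mapping one domain's classification onto another is often written down as a crosswalk table. The following Python sketch shows the idea under simple assumptions: the local field names, the shared field names, and the one-to-one mapping between them are hypothetical, and real crosswalks frequently need to split, merge, or convert values as well.

```python
# Hypothetical crosswalk from one archive's field names to a shared vocabulary.
CROSSWALK = {
    "photographer": "creator",
    "caption": "description",
    "shot_on": "date",
}

def to_shared(record, crosswalk):
    """Rename known fields; keep unmapped fields under their old names."""
    return {crosswalk.get(field, field): value
            for field, value in record.items()}

local_record = {
    "photographer": "Jane Doe",
    "caption": "Fishing boats leaving the harbour",
    "shot_on": "2005-03-14",
    "shelf": "B-17",
}
shared_record = to_shared(local_record, CROSSWALK)
```

After the mapping, a researcher who knows only the shared vocabulary can search for `creator` without having to learn that this archive calls the field `photographer`.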

Standards

Dublin Core embodies the smallest common denominator of a metadata standard by defining only 15 core elements, which can be further specified with element refinements. The 15 elements are title, creator, subject, description, publisher, contributor, date, type, format, identifier, source, language, relation, coverage, and rights 3. Defined especially for Internet resources, this standard is widely used and also suitable for content description of digital images. Standards of the Library of Congress, including MARC, EAD, METS, MIX, and PREMIS, are more broadly defined and assume a more extensive effort in metadata generation. MIX is a schema for technical data elements required to manage digital image collections; PREMIS is a data dictionary with schemas for core preservation metadata needed to support the long-term preservation of digital materials. Both are valuable references for metadata standards for digital images 4.
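As an illustration, a Dublin Core description of an image can be written as XML with Python's standard library. This is a minimal sketch, not an official serialization: the wrapper element `metadata`, the function name, and the sample values are assumptions, while the 15 element names are those listed above and the namespace URI is the one published for the Dublin Core element set.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
DC_ELEMENTS = ("title", "creator", "subject", "description", "publisher",
               "contributor", "date", "type", "format", "identifier",
               "source", "language", "relation", "coverage", "rights")

def dublin_core_xml(values):
    """Serialize a dict of Dublin Core values; reject unknown element names."""
    unknown = set(values) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # wrapper element name is an assumption
    for name in DC_ELEMENTS:
        if name in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = values[name]
    return ET.tostring(root, encoding="unicode")

xml_text = dublin_core_xml({
    "title": "Harbour at dawn",
    "creator": "Jane Doe",
    "rights": "All rights reserved",
})
```

Because every element carries the shared namespace, a consumer from another domain can recognize `dc:creator` without knowing anything else about the producing system.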

The Exchangeable Image File Format (Exif) was created by the Japan Electronic Industries Development Association (JEIDA). Modern digital cameras automatically store Exif metadata in the file header of the recorded file at the moment of exposure. Metadata tags contain information about the data, the camera settings, the geographical location, and manually added description and copyright information. Although Exif is widely used, there is currently no public organization or official body behind it.

The ICC profile of the International Color Consortium describes the file’s color 5. This standard is used to describe color transformations, because every scanning or displaying device needs to provide color management systems with the information necessary to convert scanned color data between the device’s color space and a device-independent color space, thereby correctly displaying the color of an image. ICC profiles are widely used and supported by various manufacturers and file formats.

The IPTC-NAA standard of the International Press Telecommunications Council 6 is a widely shared standard for describing audiovisual data and is often used to exchange image data between photographers, digital imaging/stock photography companies, and news agencies.

The Extensible Metadata Platform (XMP) from Adobe Systems 7 is implemented into all Adobe products and therefore widely used in the imaging community. It embodies a metadata framework that tries to standardize the creation, processing, and sharing of metadata in publishing workflows. In addition to the ubiquitous availability of Adobe® products in imaging, this technology offers the advantage of being extensible by other standards or private metadata fields.

Storage and technologies

There are two possibilities for storing metadata: inside the file, mostly in the file header, or outside the file, in a database. While the first approach is more secure, the second approach guarantees faster access, because the metadata can be indexed in the database. For search and retrieval an external metadata database is crucial. Since much metadata (XMP, ICC profile, Exif) is saved automatically in the file header, it is advisable to store all metadata in the file header. Nevertheless, all metadata used to provide search and retrieval facilities can easily be exported into a database by an automated process.
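The export step can be sketched in Python with the standard `sqlite3` module. The sketch assumes the header metadata has already been read from the files by some other tool; the dictionary below is an invented stand-in for that result, and the table layout is only one possible design.

```python
import sqlite3

# Stand-in for metadata already read from file headers (values are invented).
embedded = {
    "harbour.tif": {"creator": "Jane Doe", "keywords": "harbour, dawn"},
    "market.jpg": {"creator": "John Smith", "keywords": "market, people"},
}

def build_index(header_metadata):
    """Copy header metadata into an in-memory SQLite table with an index."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE image (path TEXT PRIMARY KEY, creator TEXT, keywords TEXT)")
    db.execute("CREATE INDEX idx_creator ON image (creator)")
    db.executemany(
        "INSERT INTO image VALUES (?, ?, ?)",
        [(path, m["creator"], m["keywords"])
         for path, m in header_metadata.items()],
    )
    return db

db = build_index(embedded)
hits = [row[0] for row in db.execute(
    "SELECT path FROM image WHERE creator = ?", ("Jane Doe",))]
```

The files remain the authoritative store, and the database can be rebuilt from them at any time, which is consistent with keeping all metadata in the file header and treating the database as a search aid.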

For the long-term preservation of digital assets, it is recommended that metadata be stored in the file header, since preserving the metadata in a database would mean preserving the database system containing the metadata as well. Various file formats, such as TIFF, JPEG, and JPEG2000, support seamless integration of metadata into the file header. XML, the Extensible Markup Language of the W3C 8, is a markup language used to express almost all available metadata standards. It offers the possibility of arbitrarily defining markup and therefore storing data in a structured form. By defining an XML schema containing the rules and norms of a metadata standard, XML parsers can check that the metadata is entered consistently. Schemas can be interlinked by using namespaces and by defining mappings between different metadata standards. XML markup is usually generated and stored as plain text documents. Such documents can easily be opened in an editor and remain human-readable. Most metadata standards using XML are defined with a hierarchical data model.

The RDF, or Resource Description Framework of the W3C, is a framework for metadata that has been developed to describe data in a machine-understandable way and to allow semantically based retrieval on the Internet. The aim is to provide a standard data model for metadata. The semantics and the syntax are not fixed by the W3C. URIs must be used for identification. The semantics in this framework are integrated by using namespaces and vocabularies. The data model of RDF is the statement as a triple of subject, predicate, and object. The object is either a final value or the subject of another statement. The predicate can further be described by adding other statements. This way RDF provides high expressiveness by defining classes and rules for statements, and defining machine-understandable ontologies becomes possible. In this fashion, a community can provide software agents with the information needed to deduce the meaning and context of different source data and metadata. For example: if something matches the concept picture, then it has an author and a title. Author is of type person and equivalent to creator. A photographer is a sub-conceptualization of an author. Storing an author named John Smith of a certain picture somewhere as metadata and then searching for a picture by the photographer John Smith would provide a software agent not only with a possible result, but also with a way to deduce that the researcher does not mean John Smith the butcher, who is obviously another person. Possible examples for the syntax of RDF are Turtle and RDF/XML. RDF/XML is an XML representation of the RDF data model; Turtle uses an easy-to-read text notation.
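The John Smith example can be mimicked with a toy triple store in plain Python. This is a sketch of the reasoning only, not of RDF itself: plain strings stand in for the URIs that RDF requires, the predicate names `type`, `subClassOf`, and `equivalentTo` are simplified stand-ins for terms an ontology vocabulary would supply, and `author` is used loosely both as a concept and as a property, as in the text.

```python
# Hypothetical identifiers; plain strings stand in for the URIs RDF requires.
triples = {
    ("photographer", "subClassOf", "author"),
    ("author", "equivalentTo", "creator"),
    ("person:1", "name", "John Smith"),
    ("person:1", "type", "photographer"),
    ("person:2", "name", "John Smith"),
    ("person:2", "type", "butcher"),
    ("img-001", "creator", "person:1"),
}

def is_a(triples, resource, concept):
    """True if resource has type concept, directly or via subClassOf."""
    pending = [o for s, p, o in triples if s == resource and p == "type"]
    seen = set()
    while pending:
        current = pending.pop()
        if current == concept:
            return True
        if current not in seen:
            seen.add(current)
            pending += [o for s, p, o in triples
                        if s == current and p == "subClassOf"]
    return False

def pictures_by_photographer(triples, name):
    """Find pictures whose creator or author is a photographer with this name."""
    # Treat predicates declared equivalent to 'creator' as the same predicate.
    predicates = {"creator"}
    for s, p, o in triples:
        if p == "equivalentTo" and (s in predicates or o in predicates):
            predicates |= {s, o}
    people = {s for s, p, o in triples
              if p == "name" and o == name
              and is_a(triples, s, "photographer")}
    return sorted(s for s, p, o in triples if p in predicates and o in people)

result = pictures_by_photographer(triples, "John Smith")
```

The search returns only `img-001`: both persons are named John Smith, but only `person:1` is typed as a photographer, so the butcher is excluded by deduction rather than by string matching.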

Unicode of the Unicode Consortium 9 is the basis for all textual metadata; in the past, every computer system used its own encoding for characters in plain text files. Unicode is an internationally used norm for translating bytes into characters and therefore an obligatory standard for encoding metadata into computer systems.
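A short Python sketch shows why a shared encoding matters for textual metadata. The caption is an invented example, and UTF-8 is used here as one common encoding of Unicode; the text above does not prescribe a particular one.

```python
caption = "Zürich, Münsterbrücke"

utf8_bytes = caption.encode("utf-8")      # characters -> bytes
round_trip = utf8_bytes.decode("utf-8")   # bytes -> the same characters

# Decoding the same bytes with a different legacy encoding garbles the text.
misread = utf8_bytes.decode("latin-1")
```

`round_trip` equals the original caption, whereas `misread` does not: the bytes are unchanged, but without agreement on the encoding the stored metadata is no longer readable as intended.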

Handling metadata

Using the XMP metadata framework in Adobe Photoshop CS means opening the File Info dialog box (Figure 56).

This dialog box allows you to enter basic metadata information. Camera Data 1 (Figure 57) and Camera Data 2 (Figure 58) provide Exif metadata.

IPTC fields are not shipped with the standard version of Photoshop CS, but can be downloaded from the IPTC website and easily integrated into XMP. The File Browser offers an easier interface with a better overview (Figure 59).

Photoshop allows the creation of metadata templates with predefined values for the author and copyright status, and also allows you to apply the templates to multiple images. Modern computer systems retrieve the XMP metadata directly from the file header and support searching for such images.
