Erich Neuhold and Claudia Niederée
Fraunhofer Institute IPSI, Darmstadt, Germany

Definition: A Digital Library is an information system targeted at a specific community, in which content from different sources is collected and managed, structured and enriched with metadata, and made available to the user community through a set of services offered via a communication network, typically the Internet.

Digital Libraries are the electronic counterparts of traditional paper libraries. The digital medium opens new opportunities, especially improved access support, increased content availability, powerful content interlinking, and reduced costs, but it also imposes new challenges such as long-term preservation in the context of fast-changing storage technologies. Further important challenges are copyright and digital rights management, and the cost of digitizing content that was not born digital.

Multimedia Libraries are Digital Libraries in which the managed content is not restricted to the usual, mainly textual, documents. In addition to "textual" content, such libraries contain media types like music, videos, images, and maps, as well as mixtures of different content types (multimedia objects) as they are used, for example, in e-Learning or in the documentation of history. Multimedia libraries may also contain content types that were not supported in traditional libraries at all, such as 3D objects, executable software (e.g. computer games), or callable services. One of the main challenges for a multimedia library is to provide effective access to these types of content (based on adequate indexing) and to support the "real-time" integration of different content types. Some challenges of multimedia libraries are closely related to those of museums and archives that make multimedia representations of their artifacts available online.

This article starts with a discussion of the role of Digital Libraries in mediating between the individual information needs of the members of a community and the vast amount of globally available content. In this context the services provided by a library play a central role. Therefore, search services and further value adding services within a Digital Library are discussed next in the article with a focus on the special requirements of multimedia content. The article closes with some current trends and open research issues in Digital Library technology.

Role of Digital Libraries in Content-to-Community Mediation

A Digital Library mediates between the information needs of its user community and the globally available content. Contributions in four task areas are essential for supporting this mediation (see Figure 1):

  • Content preselection: The library selects high-quality content potentially relevant for the members of its user community;
  • Content structuring: The library structures the content according to the predominant domain understanding of its user community;
  • Content enrichment: The library enriches content objects with descriptive and value-adding metadata provided by domain experts, librarians, and community members;
  • Library services: Support for content retrieval, access, annotation, etc. enables the identification of relevant material and facilitates access to content and its use by community members as a group or as individuals.

These contributions allow a Digital Library to reduce the gap between the wide variety and large amount of globally available content and the specific information needs of individuals and small groups within its community. Ideally, many of these contributions should be achieved with little or no human intervention. However, for technological reasons, and also for reasons of quality control and trust, human involvement, especially the involvement of representatives of the library, is and will remain essential for these tasks.

Metadata and Search Services

Search services are required for efficiently and effectively finding relevant content in the content collection(s) managed by a Digital Library. There are two approaches to supporting search in Digital Libraries, which may also be combined: methods of information and multimedia retrieval on the one hand, and metadata-based search on the other.

Information retrieval is a search approach based on a direct analysis of the content objects of a Digital Library. For efficient information retrieval, retrieval indices are automatically built and updated. For (mainly) textual documents, indices are built from an analysis of word occurrences within documents (with additional processing steps such as stemming, removing stop words, and taking term frequency into account). For multimedia objects, the task of creating useful indices is more demanding: features that can easily be extracted automatically, like the color distribution of an image, are in most cases not the features the user is interested in. The challenge lies in finding adequate associations between low-level features like color distributions and high-level features that represent the aspects of content objects a user is interested in, e.g. the objects displayed in an image, the scenes of a movie, or the genre of a music title. Such association rules are highly domain dependent; they are still a subject of current research in multimedia retrieval and will remain so for some time.
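The text-indexing steps named above (stop-word removal, stemming, term frequency) can be illustrated with a minimal sketch. The stop-word list, the crude suffix-stripping "stemmer", the scoring by summed term frequency, and the three sample documents are all assumptions chosen for brevity; a production index would use an established stemmer and a weighting scheme such as TF-IDF.

```python
import re
from collections import Counter, defaultdict

# Assumed, deliberately tiny stop-word list.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "is", "are", "to", "for"}

def naive_stem(word):
    """Crude suffix stripping; a stand-in for a real stemming algorithm."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def tokenize(text):
    """Lowercase, split into words, drop stop words, stem the rest."""
    words = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(w) for w in words if w not in STOP_WORDS]

def build_index(documents):
    """Inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index

def search(index, query):
    """Rank documents by the summed term frequency of the query terms."""
    scores = Counter()
    for term in tokenize(query):
        for doc_id, tf in index.get(term, {}).items():
            scores[doc_id] += tf
    return [doc_id for doc_id, _ in scores.most_common()]

# Hypothetical sample collection.
documents = {
    "d1": "Digital libraries manage collections of documents.",
    "d2": "Indexing images is harder than indexing documents.",
    "d3": "Music retrieval and image retrieval.",
}
index = build_index(documents)
ranking = search(index, "indexing documents")
```

Because "images" and "image" are reduced to the same stem, a query for either form reaches both d2 and d3, which is the purpose of the stemming step.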

A second method of information access is the use of metadata for retrieving relevant content objects. Metadata is data about data or, more precisely in the Digital Library context, data for describing information objects and for supporting information processes within the respective domain. The described information object may be an individual content object like a scientific paper or an image, but also an entire collection or other metadata. Metadata form an additional information layer on top of the content objects managed by a Digital Library. They can be used in retrieving interesting information, in selecting relevant content objects from a search result list (e.g. selecting a paper based on its author or the proceedings it was published in), and in supporting an improved understanding of content objects (e.g. by taking into account the historical context in which a film was produced). The management of metadata is comparable to the management of library catalogues in traditional (paper) libraries.

There is a wide range of different metadata types, which can be classified in multiple ways, e.g. according to their purpose or the depth of description. Standardization of metadata formats plays an important role in Digital Libraries. Metadata standards like Dublin Core and MARC enable the re-use of metadata records in different libraries and the development of re-usable services that operate on such standards, and they facilitate search across different collections. Additional types of metadata are required for describing multimedia content objects like images and videos. These include, for example, information related to the production of a video or the description of the objects in an image. Special multimedia metadata standards such as MPEG-7 and MPEG-21 have been developed. In addition, there are multimedia metadata standards for specific application domains, like SCORM for the area of e-learning.
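As an illustration of how a shared metadata standard supports re-usable search services, the following sketch stores records that use the fifteen element names of the Dublin Core Metadata Element Set and offers a simple fielded search over them. The helper names make_record and metadata_search, the case-insensitive substring matching, and the three sample records are assumptions made for this example, not part of the standard.

```python
# The fifteen elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

def make_record(**fields):
    """Build a record; every element is repeatable, so values are stored as lists."""
    unknown = set(fields) - DC_ELEMENTS
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    return {
        name: value if isinstance(value, list) else [value]
        for name, value in fields.items()
    }

def metadata_search(records, **criteria):
    """Return records in which every given element contains the given text."""
    def matches(record):
        return all(
            any(wanted.lower() in value.lower() for value in record.get(name, []))
            for name, wanted in criteria.items()
        )
    return [record for record in records if matches(record)]

# Hypothetical records; identifiers, titles, and names are invented.
records = [
    make_record(identifier="ex:1", title="Harbour at Dawn",
                creator="A. Example", type="StillImage", date="1999"),
    make_record(identifier="ex:2", title="Harbour Walk",
                creator=["B. Sample", "A. Example"], type="MovingImage", date="2003"),
    make_record(identifier="ex:3", title="Notes on Indexing",
                creator="B. Sample", type="Text", date="2003"),
]

by_creator = metadata_search(records, creator="a. example")
texts_by_sample = metadata_search(records, creator="sample", type="text")
```

Since the search function depends only on the element names, it would work unchanged on records imported from any other collection that uses the same element set.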

Of course, the separation between content-based information retrieval and metadata-based search services is not that strict. Retrieval indices may be considered a special type of metadata, information retrieval methods can be applied to some types of metadata such as abstracts or annotations, and information retrieval and metadata-based search can be combined into more powerful access methods.

The dialog with the user has been identified as another starting point for improving the efficiency of multimedia retrieval. Going beyond entering text into query forms, approaches exist in which users formulate their information needs in innovative ways that are adapted to the type of content to be retrieved. These include humming the tune of the music one wants to access, painting a sketch of an image, or using an example image to retrieve similar pictures.
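Retrieval by example image can be sketched with the low-level color feature mentioned earlier. The example below assumes that an image is given as a plain list of (r, g, b) pixels with 8-bit channels, quantizes colors into a small histogram, and ranks a collection by histogram intersection with the query image. The image names and pixel values are invented, and real systems combine several features rather than color alone.

```python
def color_histogram(pixels, bins=2):
    """Normalized color histogram with `bins` levels per RGB channel."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        # Map each 0..255 channel value to a bin index in 0..bins-1.
        rb, gb, bb = r * bins // 256, g * bins // 256, b * bins // 256
        hist[rb * bins * bins + gb * bins + bb] += 1
    total = len(pixels)
    return [count / total for count in hist]

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means identical color distributions."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def query_by_example(example_pixels, collection, bins=2):
    """Rank image ids in the collection by color similarity to the example."""
    query_hist = color_histogram(example_pixels, bins)
    scored = {
        image_id: histogram_intersection(query_hist, color_histogram(pixels, bins))
        for image_id, pixels in collection.items()
    }
    return sorted(scored, key=scored.get, reverse=True)

# Hypothetical ten-pixel "images".
collection = {
    "forest": [(20, 200, 30)] * 9 + [(30, 30, 60)] * 1,
    "sunset": [(250, 80, 20)] * 8 + [(30, 30, 60)] * 2,
}
example = [(240, 60, 40)] * 10  # a reddish query image
ranking = query_by_example(example, collection)
```

The sketch also shows the limitation discussed above: two images with similar colors are ranked as similar regardless of what they actually depict.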

Mediation between content and community can be further improved by taking into account information about the user in retrieval and information filtering. Such personalization approaches require an adequate model of the user as well as methods for collecting and updating information for the individual user profiles based on this model. User models typically focus on cognitive patterns like interests and skills. Information about the user can be collected explicitly, by asking the user, or implicitly by observing user behavior and by analyzing this data to infer user characteristics like interests. The information in the user profiles is, for example, used to refine queries posed by the user and to give recommendations to the user.
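The implicit collection of profile information and its use in retrieval can be sketched as follows. The sketch assumes that each document carries a list of topic labels, that viewing a document counts as a sign of interest, and that older observations are discounted by a fixed decay factor; the class name UserProfile, the decay value, and the sample data are all hypothetical.

```python
from collections import defaultdict

class UserProfile:
    """Interest weights per topic, learned from observed behavior."""

    def __init__(self, decay=0.9):
        self.interests = defaultdict(float)
        self.decay = decay

    def observe(self, doc_topics):
        """Implicit feedback: the user viewed a document with these topics."""
        for topic in self.interests:
            self.interests[topic] *= self.decay  # older interests fade
        for topic in doc_topics:
            self.interests[topic] += 1.0

    def score(self, doc_topics):
        """How well a document's topics match the current interests."""
        return sum(self.interests.get(topic, 0.0) for topic in doc_topics)

def personalize(results, profile, topics_of):
    """Re-rank a result list by profile match; ties keep their original order."""
    return sorted(results, key=lambda doc: profile.score(topics_of[doc]), reverse=True)

# Hypothetical documents and their topic labels.
topics_of = {
    "d_maps": ["maps", "history"],
    "d_jazz": ["jazz", "audio"],
    "d_film": ["film", "history"],
}
profile = UserProfile()
profile.observe(topics_of["d_jazz"])
profile.observe(["jazz"])
reranked = personalize(["d_maps", "d_film", "d_jazz"], profile, topics_of)
```

An explicitly collected profile would fill the same interests table from a questionnaire instead of from observed behavior.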

Further retrieval challenges in Digital Libraries are multi- and cross-lingual search, adequate result visualization and structuring as well as federated search that efficiently manages searches over different collections (within one Digital Library or across the boundaries of Digital Libraries). Selection of promising collections, decomposition of queries and combination of query results, which requires duplicate detection and re-ranking of the combined result, are the challenges in federated search.
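The result-combination step of federated search can be sketched under simplifying assumptions: each collection returns hits with a title and a score on its own scale, scores are made comparable by min-max normalization per collection, and two hits count as duplicates when their titles agree after lowercasing and removing non-alphanumeric characters. Collection selection and query decomposition are outside this sketch, and all names and data are invented.

```python
import re

def normalize_scores(hits):
    """Rescale one collection's scores to [0, 1] so collections are comparable."""
    if not hits:
        return []
    scores = [hit["score"] for hit in hits]
    low, span = min(scores), max(scores) - min(scores)
    return [
        dict(hit, score=1.0 if span == 0 else (hit["score"] - low) / span)
        for hit in hits
    ]

def duplicate_key(title):
    """Assumed duplicate test: same title ignoring case and punctuation."""
    return re.sub(r"[^a-z0-9]", "", title.lower())

def merge_results(result_lists):
    """Combine per-collection results, drop duplicates, and re-rank."""
    best = {}
    for hits in result_lists:
        for hit in normalize_scores(hits):
            key = duplicate_key(hit["title"])
            if key not in best or hit["score"] > best[key]["score"]:
                best[key] = hit
    return sorted(best.values(), key=lambda hit: hit["score"], reverse=True)

# Hypothetical results from two collections with different score scales.
collection_a = [
    {"title": "Digital Libraries", "score": 12.0, "source": "A"},
    {"title": "Grid Computing", "score": 4.0, "source": "A"},
]
collection_b = [
    {"title": "digital libraries.", "score": 0.7, "source": "B"},
    {"title": "Music Retrieval", "score": 0.9, "source": "B"},
    {"title": "Peer-to-Peer", "score": 0.5, "source": "B"},
]
merged = merge_results([collection_a, collection_b])
```

Keeping the highest-scoring copy of a duplicate is one possible policy; a library might instead prefer the copy from its own collection or merge the metadata of both records.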

Digital Library Services beyond Search

In addition to search services, a Digital Library also supports other classes of services. This mainly includes community services, annotation services, and administrative services.

Services for supporting the community beyond search are, for example, services that foster community formation and services that support communication and collaboration between community members (chat, discussion forums, etc.). Collaborative filtering services are one way to combine community and search services: ratings provided by community members and similarities between community members are used to recommend relevant information objects in the Digital Library to community members. Other community services involve community members in the content collection process by enabling them to add their own content to the Digital Library collection or to build their own private libraries.
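Collaborative filtering as described here can be sketched in its user-based form. The example assumes explicit ratings on a 1 to 5 scale, measures the similarity of two members by the cosine of their rating vectors, and predicts a rating for an unseen object as the similarity-weighted average of other members' ratings. The member names, object ids, and ratings are invented, and other similarity measures would serve equally well.

```python
import math

def cosine_similarity(u, v):
    """Cosine of two sparse rating vectors given as {item: rating} dicts."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[item] * v[item] for item in common)
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v)

def recommend(ratings, user, top_n=3):
    """Recommend objects the user has not rated, best predicted rating first."""
    weighted, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        similarity = cosine_similarity(ratings[user], other_ratings)
        if similarity <= 0:
            continue
        for item, rating in other_ratings.items():
            if item in ratings[user]:
                continue  # only recommend unseen objects
            weighted[item] = weighted.get(item, 0.0) + similarity * rating
            weights[item] = weights.get(item, 0.0) + similarity
    predicted = {item: weighted[item] / weights[item] for item in weighted}
    return sorted(predicted, key=predicted.get, reverse=True)[:top_n]

# Hypothetical ratings by three community members.
ratings = {
    "ana": {"p1": 5, "p2": 4},
    "ben": {"p1": 5, "p2": 5, "p3": 5, "p4": 1},
    "cho": {"p1": 1, "p4": 5},
}
suggestions = recommend(ratings, "ana")
```

Here ana's ratings resemble ben's much more than cho's, so ben's favorite p3 is recommended ahead of p4.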

Annotation services enable members of the community to add annotations in the form of comments, ratings, etc. to content objects of the Digital Library. In this way, community members may profit from the experience and expertise of other community members. Advanced annotation services enable the annotation of annotations and support different types of annotations. Annotations can also be used in the retrieval process, since the comments about a document may provide additional information about its content and its reception in the community.
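One possible data model for such a service is sketched below. It assumes that every annotation points to a target id, which may identify either a content object or another annotation, so that annotations of annotations form a thread. The class names, the two annotation kinds, and the sample data are assumptions made for this illustration.

```python
from dataclasses import dataclass, field
from itertools import count

_next_id = count(1)

@dataclass
class Annotation:
    target: str  # id of a content object or of another annotation
    author: str
    kind: str    # assumed kinds: "comment" or "rating"
    body: str
    id: str = field(default_factory=lambda: f"ann{next(_next_id)}")

class AnnotationStore:
    def __init__(self):
        self._by_target = {}

    def add(self, annotation):
        self._by_target.setdefault(annotation.target, []).append(annotation)
        return annotation

    def thread(self, target_id):
        """All annotations on a target, including annotations of annotations."""
        collected = []
        for annotation in self._by_target.get(target_id, []):
            collected.append(annotation)
            collected.extend(self.thread(annotation.id))
        return collected

    def searchable_text(self, doc_id):
        """Comment text that a retrieval index could add to the document."""
        return " ".join(a.body for a in self.thread(doc_id) if a.kind == "comment")

# Hypothetical usage.
store = AnnotationStore()
first = store.add(Annotation("doc1", "ana", "comment", "Good survey of preservation"))
store.add(Annotation(first.id, "ben", "comment", "Agreed, but emulation is missing"))
store.add(Annotation("doc1", "cho", "rating", "4"))
text = store.searchable_text("doc1")
```

Feeding the output of searchable_text into the retrieval index is one way to let the community's comments improve search, as suggested above.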

The adequacy and the form of the aforementioned services clearly depend on the size of the community supported by the library. In small, well-connected communities, for example, self-organization and quality assurance might work on the basis of social networks and social control. In large communities, however, issues like quality control, intellectual property rights, and mutual trust, as well as avoiding information overload and spam, become more critical and have to be taken into account by the services.

In addition to the services provided to the user, a Digital Library also supports administrative processes for its own management. The main goal of these processes is to keep the collection focused and attractive for the targeted community. Collection management includes deciding which new content to acquire and possibly also when to delete content from the collection (based on an adequate collection strategy), and restructuring the collection when this is implied by changes in the underlying domain (e.g. new trends). Further necessary administrative processes include the management of users and user groups, the digitization of content, and the creation and acquisition of metadata records, to name only the most prominent. In addition, careful handling of access rights, copyrights, and intellectual property rights contributes to the "trust" of the community members in the services of a Digital Library.

Many Digital Libraries not only have to provide efficient and effective access to their content, but also have to take action to achieve long-term accessibility of content objects. Because of the fast changes in storage technology, special services and organizational strategies are required to achieve long-term preservation in Digital Libraries. Adequate methods for ensuring long-term preservation in the digital age are still a subject of research.

Current Trends in Digital Libraries and Multimedia Libraries

The first generation of Digital Libraries was built from scratch in an experimental fashion. Once a certain understanding of the core functionality of a typical Digital Library had been established, so-called Digital Library management systems like DSpace or Greenstone were developed that offer basic, out-of-the-box functionality for managing a Digital Library. Such systems are now available and used in various Digital Library projects. The latest trend in Digital Library technology is a more decentralized, service-oriented approach to Digital Library architectures. The overall goal is to systematically make Digital Library functionality available to a broader audience, to reduce the cost of entry for this technology, to improve flexibility and adaptability, and to foster shared and synergetic use of content, metadata, services, and other resources. In this context, current technological developments like Grid Computing, Web and Grid Services, and the Peer-to-Peer computing paradigm are exploited. The project DILIGENT (EU IST-004260), for example, works on building a Grid-based Digital Library infrastructure that enables the on-demand creation of tailored Digital Libraries, so-called Virtual Digital Libraries, on top of the generic infrastructure. In general, Digital Libraries are migrating from centralized systems to dynamic federations of services.

A second trend, already addressed in the previous part of the article, is the offering of additional services beyond search and collection management, which reflects a broadened understanding of the role of a Digital Library within a community. This includes community services that support community formation, the community's awareness of trends in the domain and of the role of individuals within the community, as well as services for fostering collaboration in the community. It also includes services that enable community members to take a more active part in content provision and annotation. In summary, the idea is to support the collaborative information processes of the community in a more comprehensive and participative way, migrating from the information access support provided by Digital Libraries to tailored virtual information and knowledge environments. For research libraries, this trend is reflected in current research activities in the area of e-Science.

A third trend in Digital Libraries is the use of Semantic Web technology for intelligent search services. This includes semantic annotation of content objects based on domain ontologies, the use of concepts and ontological knowledge instead of strings in search, and concept-based clustering of query results. Another area of research and development in intelligent search support is to take context into account more systematically. On the one hand, this refers to user context: more comprehensive, ontology-based models of the user and his or her current situation (including the user's tasks and relationships) are used to go beyond existing personalization approaches. On the other hand, the context of an information object, for example the information it is linked with or the annotations made about it, can be used to improve retrieval results.
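The use of concepts and ontological knowledge instead of strings in search can be illustrated with a minimal sketch. It assumes a toy domain ontology that consists only of broader/narrower relations between concepts, and content objects that have been semantically annotated with concepts from it. A query for a concept is expanded to all of its narrower concepts before matching. The ontology, the annotations, and the function names are invented for this illustration.

```python
# Toy ontology: concept -> directly narrower concepts (assumed data).
NARROWER = {
    "instrument": ["string instrument", "wind instrument"],
    "string instrument": ["violin", "cello"],
    "wind instrument": ["flute"],
}

def expand_concept(concept, narrower):
    """The concept itself plus all concepts below it in the hierarchy."""
    seen, stack = set(), [concept]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(narrower.get(current, []))
    return seen

def concept_search(concept, annotations, narrower):
    """Ids of objects annotated with the concept or any narrower concept."""
    wanted = expand_concept(concept, narrower)
    return sorted(
        doc_id for doc_id, concepts in annotations.items()
        if wanted & set(concepts)
    )

# Hypothetical semantic annotations of content objects.
annotations = {
    "rec1": ["violin", "concert"],
    "rec2": ["flute"],
    "img1": ["landscape"],
}
string_hits = concept_search("string instrument", annotations, NARROWER)
all_hits = concept_search("instrument", annotations, NARROWER)
```

A plain string search for "instrument" would find none of these objects, whereas the concept-based search reaches rec1 and rec2 through the hierarchy.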
