# Multimedia Fingerprinting - Collusion-resistant, c-secure, Traitor Tracing Codes, Random Distortion-Constrained Codes

### fingerprints set distribution unique

*William Luh and Deepa Kundur
Texas A&M University, Texas, USA*

**Definition:** A digital fingerprint is a buyer-dependent serial number that is inconspicuously embedded in the multimedia, such that it is used to trace the original legal buyer of the multimedia.

The ever-increasing consumption of digital data, such as digital images, video and audio, has lead to a number of security threats on multimedia content. Of these threats, multimedia piracy has dominated the technical, legal and business communities due to its impact on intellectual property protection, its influence on public policy, and its power to adapt business models. Multimedia piracy is defined as the unlawful copying, tampering, distribution, or downloading of multimedia content. The goals of multimedia pirates, who conduct the act of piracy, may include: (1) unrightfully claiming ownership of the intellectual property; (2) altering the multimedia data thus creating a misrepresentation of semantic content; (3) illegally selling or freely distributing the multimedia resulting in a loss of profits for members of the content distribution chain. The first two malicious objectives are popularly countered through the use of multimedia encryption, digital signatures, and robust and fragile watermarking methods. In this article, we present the problem of *multimedia fingerprinting,* a counterattack to the third goal.

Although it is possible to prevent the illegal selling or free distribution of multimedia through copy-protection on hardware consoles, multimedia fingerprinting is a cheaper approach that detects such illegal activities *after* they have taken place, or *passively deters* such illegal acts from taking place. Since detection of piracy occurs after-the-fact and by third parties often representing the content owners, consumers do not bear the burden of purchasing expensive tamper-proof hardware consoles.

Wagner introduced fingerprinting in 1983. In multimedia fingerprinting, a digital fingerprint is a buyer-dependent serial number that is inconspicuously embedded in the multimedia, such that it is used to trace the original legal buyer of the multimedia. In this way, the original buyer is deterred from illegally distributing her fingerprinted multimedia. A general weakness of digital fingerprinting occurs when a coalition of pirates compares their uniquely fingerprinted multimedia to exploit the differences amongst their unique fingerprints in hopes of detecting, removing, or altering the fingerprint so as to evade being traced, and at the same time possibly frame an innocent buyer. This attack is known as collusion. Successful digital fingerprints will partially survive specific collusion attacks and correctly identify the pirate(s). The method of identifying pirate(s) is also called the tracing algorithm as depicted in Figure 1. buyer. This attack is known as collusion. Successful digital fingerprints will partially survive specific collusion attacks and correctly identify the pirate(s). The method of identifying pirate(s) is also called the tracing algorithm as depicted in Figure 1.

Collusion has been the main research challenge in the realm of fingerprinting, and this article presents trend techniques in curbing this attack. Another area of interest pertaining to fingerprinting is the efficient distribution of fingerprinted multimedia. This article presents various methods of fingerprinting distribution.

## Collusion-resistant, c-secure, Traitor Tracing Codes

In this paradigm a *codeword,* which is an element of a *codebook,* is used as a watermark; this watermark is embedded into the host media via watermarking techniques. What differentiates this from robust and fragile watermarking is that the codebook is deigned such that its codewords are resistive to specific collusion attacks. Figure 2 depicts the codebook paradigm.

Formally, a codebook with *N* codewords of length *L* over an alphabet *S* is defined as G = { *w* 1 , *w* 2 ,*W N* }, such that for all codewords *i, w i = x i,l x i,2 x i,L* and *x i,j* are in *S* (in this notation the subscript *i* denotes the *i* th codeword, while the subscript *j* denotes the alphabet position in a codeword). In other words, the codewords are strings created by concatenating alphabets from the finite set *S* .

To define *collusion-resistant codebooks,* let the collusion attack be represented by a function *C* that takes a subset of codewords, and outputs a word of length *L* . Also define a tracing algorithm *A* to be a function that takes a word of length *L* , and outputs another word of length *L* . The codebook G is said to be *c* -collusion-resistant (or *c-secure* ) with *e* -error, for *0* *< e < 1* , if for any subset *S* of *G* with cardinality no greater than *c, Pr[A(C(S)) in P(S)] > 1 – e,* where *P(S)* is the power set of *S* excluding the null set. The codebook *G* is said to be *c-frameproof* if *C(S)* is a codeword then *C(S) in S* . This formal definition states that the tracing algorithm is capable of identifying at least one member of the malicious coalition given a word created by the coalition’s collusion attack function.

To make the problem of deriving a collusion-resistant codebook tractable, restrictions or assumptions are often placed on the collusion function *C* . A popular restriction known as the *Marking Assumption* , creates a word *C(S)* = *y 1 y 2* … *y L* , where for fixed position *j, y j* = *z* if *x i,j* = *z* (in which case position *j* is said to be *undetectable* ) for all codewords *w i in S* , otherwise *y j* can be any alphabet in *S* or an alphabet not in *S* , which is denoted*. Codebooks that are resistive to such collusion attacks are termed *fingerprinting codes* . On the other hand, if *y j* must be in *S* , then codebooks resistive to this more restrictive collusion attack are termed *traitor tracing codes* . Some common collusion attacks that obey the Marking Assumption are: majority voting ( *y j* = MODE, where MODE gives the most common value in the set); random ( *y j* = *x i,j* with probability *1/c* for all *w i in S* ). It is proven in that e must be greater than *0* , and the codebook design must employ an element of randomness under the Marking Assumption attack.

Some techniques for creating fingerprinting codes and tracing algorithms can be found in. Some techniques for creating traitor tracing codes and tracing algorithms can be found in. Much of the research in fingerprinting and traitor tracing codes has revolved around creating large codebooks with short codeword lengths, robustness against large collation sizes, small probability of error, and fast tracing algorithms. In all these cases, the codebook is designed independently of the multimedia, and only embedded into the multimedia through watermarking.

## Random Distortion-Constrained Codes

Another popular design is to randomly choose fingerprints according to some distribution. Some techniques and analysis for designing Gaussian fingerprints under different attack scenarios can be found in. Collusion attacks on Gaussian codebooks include *linear attacks* such as *averaging, weighted averaging* , or *additive multiple access channel* (MAC), and *nonlinear attacks* such as the order statistics: min, max, median, minmax, modified negative, randomized negative in. The only constraint on these collusion attacks is that the multimedia resulting from the attack should be perceptually similar to the pristine multimedia (i.e. the distortion-constraint), and hence the Marking Assumption and its variants do not apply.

Under various scenarios of this problem, information theoretic results on the fingerprinting capacity have also received some interest. In all cases, as the number of pirates increases, the ability to trace the pirates becomes increasing difficult. For example in, it is shown that the maximum amount of information pertaining to the fingerprints that survives after a distortion-constrained collusion attack is proportional to *O* ( *1/c* ), where *c* is the number of pirates.

## Joint Source Fingerprinting

The *joint source fingerprinting* (JSF) paradigm describes a method of designing fingerprints by directly using the source multimedia, hence the name joint source. This is in contrast with the two techniques above, in which the fingerprints are designed/chosen independently of the multimedia and then hidden in the multimedia. Another difference between the JSF and the two aforementioned techniques is that popular collusion attacks, such as averaging, result in perceptually degraded multimedia when the JSF is used, but does not when the aforementioned techniques are used. This property forces the pirates to employ more costly attacks to create a perceptually distortionless multimedia.

In, images are randomly warped (i.e. rotated) to create fingerprinted images. When these uniquely fingerprinted images are averaged, the result is a blurry image. In this method the original multimedia is directly used to create fingerprinted multimedia and a common collusion attack results in perceptual degradation. In video frames are used as fingerprints. When pirates average or randomly choose pixels between their copies, the resulting video is perceptually degraded.

The JSF methodology begins by decomposing the multimedia into two sets: the semantic set and the feature set. The semantic set is comprised of elements that result in a coarse representation of the original multimedia, and the feature set contains the details/features that when combined with the semantic set, result in the pristine multimedia. Fingerprinting begins by selecting subsets from the feature set such that these subsets when combined with the semantic set result in a perceptually distortionless multimedia. These subsets of the feature set are the fingerprints. The JSF methodology is depicted in Figure 3.

When a coalition of pirates compares their fingerprinted multimedia, the coalition can detect the fingerprints. However, removing all their fingerprints will result in the semantic set, which is a coarse representation of the multimedia, and hence valueless. In contrast, if the pirates include all their fingerprints in the attack on the multimedia, all the pirates will be detected. Finally, if the fingerprints are designed correctly, a common collusion attack such as averaging results in perceptual degradation.

## Fingerprinting Distribution

Upon creating *N* uniquely fingerprinted multimedia, the next task is to deliver these to the end-users over a communications network. The simplest and least efficient method is to send the *N* copies to the *N* different users. The ratio of the number of unique copies sent through the network to the number of unique fingerprints is thus equal to *1* .

In a more efficient distribution means is proposed. Two distinctly watermarked media are produced where each media is partitioned into *q* pieces, so that there are a total of 2q pieces between the two watermarked media. These *2q* pieces are encrypted with *2q* unique keys, and then transmitted to the end users. Each end user has a unique set of *q* keys that will only decrypt *q* pieces, resulting in uniquely fingerprinted multimedia. This distribution scheme is outlined in Figure 4. This technique is well suited for fingerprinting schemes that use a codebook built over the binary alphabet, since the two distinctly watermarked media represent two alphabets. Since two uniquely watermarked media are sent over the network, the ratio of the number of unique copies sent to the number of unique fingerprints is *2/N* , which is more efficient than the previous scheme.

The distribution of JSF fingerprinted multimedia differs from the above approach and may be more efficient depending on some parameters. The semantic set is encrypted and broadcast to all users. Since the semantic set is much smaller than the original multimedia, distribution of the semantic set is equivalent to the distribution of less than one copy through the network. The fingerprints are then encrypted for individual end users and either broadcast to all users, upon the receipt of which users can decrypt their own fingerprints, or, privately transmitted to each individual. Since the fingerprints are drawn from the feature set, approximately one copy in total is sent through the network. The ratio of the number of unique copies sent to the number of unique fingerprints is approximately *1/N* which is more efficient than the previous scheme. A distribution method for Gaussian codewords that is similar in spirit can be found in.

Another interesting approach to fingerprinting and efficient distribution relies on tamperproof proprietary players. In, a watermark *w* is inserted into multimedia that is required to be purchased legally in order to be played in the proprietary player. On the other hand multimedia that may be distributed freely without purchasing does not have such a watermark and can be played in the proprietary player. Identical copies of the watermarked multimedia are transmitted to all users. To play the protected multimedia, a user must supply the player with a unique key issued to him. If the unique key contains the watermark *w* , then the player is permitted to play the multimedia. Hence the goal of a coalition of pirates is to remove the watermark *w* so that anyone can play the multimedia. In trying to remove the watermark *w* , the pirates leave unique traces of their unique keys in the attacked multimedia, and these unique traces can be used to uniquely identify the pirates. Since only one watermarked multimedia is broadcast to all end users, the ratio of the number of unique copies sent to the number of unique keys is *1/N* . Figure 5 depicts this fingerprinting and distribution method.

Another distribution method involves fingerprinting multimedia at intermediate nodes in a network instead of at the source or the destination in the network. In this scheme the intermediate nodes are trusted and assumed not to be malicious. The desired multimedia is uniquely watermarked several times in proportion to the number of end users. These uniquely watermarked multimedia are then packetized. Hence for each part of the multimedia there exists a set of similar packets corresponding to that part of the multimedia. The entire set of packets corresponding to a part of the multimedia is sent through the network. However at each intermediate node, only some of the packets from that set are chosen to continue their journey to a specific end user. By following this methodology all the way to each destination node, each end user will receive slightly different packets for each part of the multimedia. Thus overall each user receives a uniquely fingerprinted multimedia. These distribution schemes can be found in. Figure 6 captures the spirit of such schemes.

Another distribution scheme involves fingerprinting at the user’s end through the use of a *joint fingerprinting and decryption* (JFD) device that not only decrypts the encrypted media, but also adds a fingerprint; these cryptographic algorithms are also called *chameleon ciphers* . The advantage of this method is that one encrypted copy of the multimedia is sent to all users, hence saving bandwidth. The fingerprints are encapsulated by each user’s private keys. The drawback of these schemes is the tradeoff between bandwidth efficiency and security, with the latter somewhat lowered. Some JFD schemes can be found in. In a lightweight video encryption scheme is used, offering additional computational efficiency. Figure 7 depicts the spirit of the JFD scheme.

## Buyer Protection

In the aforementioned fingerprinting schemes, the goal is to protect merchants against dishonest buyers who illegally distribute their multimedia. These are known as symmetric fingerprinting schemes. On the other hand, to protect buyers from dishonest merchants, protocols are needed in which buyer information is also incorporated into the fingerprinting scheme. In asymmetric fingerprinting, buyer information is disclosed to the merchant and then incorporated into the fingerprinting scheme. In anonymous fingerprinting, the buyer’s information is not disclosed to the merchant but rather to a third party, thus preserving the anonymity of the buyer.

## User Comments