Safeguard
Digital Libraries
Fred Mintzer, Jeffrey Lotspiech, Norishige Morimoto
IBM Research Division
Yorktown Heights, Almaden, Tokyo
D-Lib Magazine, December 1997
Introduction
In the traditional manufacture of paper, wet fiber is subjected to
high pressure to expel the moisture. If the press's mold has a slight
pattern, this pattern leaves an imprint in the paper, a watermark, best
viewed under transmitted light. Now, the old word "watermark"
has been borrowed by high technology. Digital watermarks are
imperceptible, or barely perceptible, transformations of digital data;
often the digital data set is a digital multimedia object. While digital
images are most often mentioned in the same breath as digital
watermarking, we note that watermarks can be applied to other forms of
digital data, for example, videos and music.
We will use the term invisible watermarks to describe digital
watermarks that are imperceptible, but which can be extracted
computationally. The term data hiding will be used when the
imperceptible watermarks themselves contain data. Often, the
computations that extract an invisible watermark require a password of
sorts to extract the digital watermark; here, the intent is to restrict
extraction of the watermark to authorized parties. The term watermark
key will be used to describe such a password. However, we strongly
caution the reader to be careful not to confuse watermark keys with
encryption keys; they have quite different properties, as will be
discussed below.
One of our colleagues is fond of saying that the topic of digital
watermarking makes a great graduate student thesis. By this she means
that a single individual can conceive a novel watermarking scheme,
implement it, and evaluate its effectiveness, all without needing a lot
of money or a team of programmers. The ease with which new schemes can
be defined has had a predictable effect on the marketplace: there is a
glut of watermarking schemes, clamoring for mind share. Sometimes the
hyperbole surrounding this technology can be deafening, and can hide its
true implications and applications.
While we are enthusiastic inventors of watermarking schemes, we are
careful not to describe digital watermarking as a panacea. We think of
digital watermarking as one of a triad of technologies (the other two
being encryption and digital signatures), that together offer a
reasonable level of copyright protection. Digital watermarking does not
stand alone. For example, the new U.S. $100 bills have a traditional
watermark, a picture of Ben Franklin. It is ludicrous to think of this
as their only protection against counterfeiting. It does, however,
"raise the bar"; it is one more feature the counterfeiter must
mimic to make a convincing fake. Digital watermarking can raise the bar
in the same way.
Digital watermarking is a relatively new and largely unproven
technology. In the following section, we will discuss a number of
proposed watermarking applications, even while admitting that many
watermarking applications remain unproven, including some that we will
describe. Indeed, the authors of this paper have been known to disagree
about their relative prospects.

Applications for Watermarking
One major application for digital watermarking is to convey ownership
information. Implicit in that word is the idea that there may be an
adversary who may try to misappropriate the material (by removing the
ownership information). This information may take one of two forms: the
watermark may identify the originator of the material, or it may
identify the recipient (the end-user or library) to whom the material
was given. On a separate axis, the watermarks may be visible or
invisible. All four possibilities make sense in the ownership
application, and have been implemented in real systems. Examples:
The idea of watermarking the recipient is perhaps the foremost
application for this technology. Many people wrongly feel that marking
the recipient represents an invasion of privacy. It does not. If the
recipient plays by the rules, the copyrighted material should never be
re-distributed widely. The watermarked material should remain in a
private place, unobservable by outsiders. Only miscreants, for example,
the people who post copyrighted material on the Web, risk exposing their
identities. Of course, there may be cases where recipients are allowed
to show the material to the other people (for example, to insert it in
their own works). In those cases, cryptography can be used to guarantee
that the message in the watermark can only be interpreted by the
material's creator.
The rationale for watermarking the owner needs a little more
explanation. If you are the owner of a set of materials, why do you need
to watermark it? Don't you know your own material when you see it? Of
course, you do. The reasons for watermarking your material are more
subtle. A visible watermark can act as an advertisement or as a
restriction. For example, you might be willing to give away
low-resolution, visibly watermarked images for free, but wish to provide
high-resolution unmarked versions of the same images for a fee. Even if
the free copies were of the same resolution as the priced ones, the
visible watermark can dissuade end-users from misusing them.
The rationale for watermarking your own content invisibly is
similarly subtle: you may want to "sniff" the Web
automatically, for example, looking for misappropriations of your
material. Marking allows you to detect your material automatically even
if it has been slightly altered. However, for efficiency, you may put
the same secret watermark key on every item. Alas, this gives the
adversary a large collection upon which to mount a statistically-based
attack; see below.
Several inventors have proposed using ownership watermarks to verify
the authenticity of material. Most invisible watermarks are designed to
be robust -- that is, the watermark robustly survives alterations of the
watermarked data. This application requires an invisible, but fragile
watermark -- one that is destroyed by any attempt to modify the
material. An important question for this application concerns how the
detector works. If every end-user needs a detector, the detectors are
susceptible to reverse engineering; the adversary can learn how to make
the secret
fragile watermark. If the detector is located within a secure server,
the secret is presumably safe. However, the server will certainly have
other ways to verify that something is authentic ("this is
mine"). The detractors of authentication watermarks argue that
digital signatures are superior for content authentication; the
advocates of authentication watermarks argue that there are additional
features provided by authentication watermarks which will make this
application viable.
Less problematic, in many ways, is an application we call captioning.
Here, the invisible watermark is embedded in the material together with
associated information: e.g., its name, its author, its date, its point
of contact, etc. Note that there is no adversary in this application:
the embedded information is useful to everyone. A good example for this
application is songs played on the radio. All parties, both the music
owners and the radio stations, are interested in an accurate count of
exactly what gets played on the air. If the songs are inaudibly
watermarked, it enables an automatic "radio listener"
monitoring station in each metropolitan region. Beyond music, this
application is of great interest to TV stations
as well as their sponsors. In local TV stations, the sponsor's mark will
be inserted in some part of a commercial. The local TV station can use
this invisible watermark automatically to detect and log the commercials
that have been played; a third party could even audit it.
Recently, with the advent of digital movies on satellite broadcasts
and Digital Video Disk (DVD) media, the movie studios have become very
interested in watermarking. The application here is to record an
invisible, robust "never copy", "copy once", or
"no more copy" watermark in each movie. Every recording
device will be required to detect them, and refuse to record any movie
whose mark prohibits copying. In return, the studios would indemnify the
recording device manufacturers against contributory copyright
infringement suits. As this article is being published, a sub-group of
the Copy Protection Technical Working Group of the DVD Consortium is
evaluating various schemes. If this application is deemed to be useful,
it may very well become the most popular use of watermarking technology
in the public eye. The largest advantage of this watermarking
application is the independence from the technology, protocol, or format
of the distribution. The mark will be there any time the movie is
viewed.

The Diversity of Watermarking Techniques
From the perspective of the content owner, it would be desirable if
there were a single watermarking technique that satisfied all of the
proposed applications. Unfortunately, this is not possible. The
different watermarking applications have different technical
requirements; a great diversity of techniques is needed to satisfy them.
One dimension of the diversity, perceptibility, has already been noted;
some of the applications are effectively satisfied with visible
(perceptible) watermarks, while others require watermarks that are
invisible.
Another dimension of the diversity is robustness. As was noted above,
many of the proposed applications use the watermark to carry ownership
information. For these applications, it is often desired that the
watermark be hard for a malicious party to remove.
A lesser level of robustness may be required even when there is no
expectation of malicious removal, since lossy compression (such
as JPEG or MPEG), image reduction, and contrast modification are often
part of the process normally used to prepare images for printing (or
display). Hence, even within this category, the needed level of
robustness can vary. Furthermore, there are other applications, such as
content verification, for which maximum fragility (and minimum
robustness) is desired. A discussion of watermarking robustness, as it
applies to still images, is given in {FM}.
Still another dimension of the diversity among watermarking
techniques is the type of multimedia object to be watermarked. All good
data hiding techniques exploit perceptual masking. For example, in the
case of audio, perception of low-volume tones is masked by the presence
of louder tones at slightly different frequencies. Not surprisingly,
different media types, e.g. audio, still images, and video, are subject
to different perceptual masking, and the best watermarking techniques
take advantage of the perceptual masking of the object to be marked. The
representation of the object can also add a dimension to the diversity.
Compressed objects, such as JPEG images, have had a great deal of their
redundancy removed. The sensitivity of a compressed object's appearance
to a change in its data is intrinsically different than it is for an
uncompressed object, and different techniques are needed to best exploit
perceptual masking in the presence of a different sensitivity
relationship.
Other dimensions of the diversity of watermarking techniques concern
the amount of data that is carried and the form of that data. (It turns
out that watermarking can encode more information than just its
"presence" or "absence".) The amount of data that
can be carried by watermarks used for copyright protection is often
small, as this data is carried redundantly to create robustness. The
amount of data that can be carried by watermarks used for applications
such as captioning can be large, as little robustness (or redundancy) is
required. Lastly, we note that a watermark is just digital data, but it
can be used to represent images, numbers, text, other multimedia
objects, or a host of other things.
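The trade-off between payload size and robustness can be illustrated with a simple repetition code. This is a minimal sketch, not any scheme described in this article: the capacity figure, the damage model (every third embedded bit flipped), and all function names are assumptions chosen for illustration.

```python
CAPACITY = 1024  # hypothetical number of embedding positions in one object


def payload_size(repeat):
    """Payload bits that fit when every bit is stored `repeat` times."""
    return CAPACITY // repeat


def embed_bits(payload, repeat):
    """Repetition code: write each payload bit `repeat` times in a row."""
    return [bit for bit in payload for _ in range(repeat)]


def extract_bits(carrier, repeat):
    """Recover each payload bit by majority vote over its copies."""
    recovered = []
    for start in range(0, len(carrier), repeat):
        group = carrier[start:start + repeat]
        recovered.append(1 if 2 * sum(group) > len(group) else 0)
    return recovered


def damage(carrier):
    """Stand-in for processing damage: flip every third embedded bit."""
    return [bit ^ 1 if i % 3 == 0 else bit for i, bit in enumerate(carrier)]


payload = [1, 0, 1, 1, 0, 0, 1, 0]

# Seven copies per bit: at most three copies of any bit are flipped,
# so the majority vote still recovers the payload.
robust = extract_bits(damage(embed_bits(payload, 7)), 7)

# One copy per bit: seven times the capacity, but the damage gets through.
fragile = extract_bits(damage(embed_bits(payload, 1)), 1)

print(payload_size(7), robust == payload)
print(payload_size(1), fragile == payload)
```

With sevenfold redundancy the payload shrinks to a seventh of the capacity but survives the damage; with no redundancy the full capacity is available but the payload is corrupted.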

Resistance to Attacks
In many watermarking applications, it is desired that the watermark
be hard to remove; in other watermarking applications, it is desired
that it be hard to forge a watermark. Let us now assume that a hacker,
Harry, is trying to remove (or forge) a digital watermark for his own
nefarious purposes. What are his chances? Actually, they are often
pretty good. (Examples of some attacks are described in {SC}). Indeed,
the underlying question is often not whether Harry can do it, but
whether it is worth the effort.
First, we recognize that Harry will have access to normal image
processing tools. He will be able to sharpen, smooth, compress, color
adjust, clip, and resize the image. (The analogous tools will be
available for audio and video content.) It is likely that each of these
processes will reduce the strength of the watermark. Note that these
tools and transformations may also be innocently applied by people who
have no idea that the content is watermarked and unknowingly attack the
watermark.
Ironically, the most common operations, clip and resize, create the
most difficult problems for the automatic detection of most watermarks.
Fortunately, automatic detection is not needed in all applications. As
an example, let us suppose I am a publisher, I have discovered a piece
of my content being illegally redistributed on the Web, and I want to
discover which library was the custodian of the particular copy. Here, a
fully-automated procedure is not required. I can look at the item, see
what resizing or rotation has been applied, and manually undo it before
I test for the watermark.
So, Harry can be assumed to possess some general techniques to
attenuate watermarks. He may also have specific information about how
the watermark was applied. Any specific information that Harry has about
the watermark can be used against it in an attack. Indeed, the following
line of attack should always be considered: detect the watermark,
estimate the watermark, invert the watermark application process to undo
it. If Harry knows what the watermark is and how it was applied, we
should assume he can remove it exactly.
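The "estimate and invert" line of attack is easiest to see for a purely additive watermark. The sketch below assumes a hypothetical scheme in which a keyed pattern of +1/-1 values, scaled by a fixed strength, is added to the pixel values; it is not one of the techniques described in this article, and the key, strength, and function names are illustrative.

```python
import random


def noise_pattern(key, length):
    """Pseudo-random +1/-1 pattern derived from an integer key."""
    rng = random.Random(key)
    return [1 if rng.getrandbits(1) else -1 for _ in range(length)]


def apply_watermark(pixels, key, strength):
    """Add the keyed pattern, scaled by `strength`, to the pixel values."""
    pattern = noise_pattern(key, len(pixels))
    return [p + strength * n for p, n in zip(pixels, pattern)]


def invert_watermark(pixels, key, strength):
    """Harry's attack: regenerate the same pattern and subtract it."""
    pattern = noise_pattern(key, len(pixels))
    return [p - strength * n for p, n in zip(pixels, pattern)]


# Mid-gray ramp; values stay well inside 0..255, so nothing is clipped.
original = [64 + (i % 128) for i in range(4096)]
marked = apply_watermark(original, key=1997, strength=2)
restored = invert_watermark(marked, key=1997, strength=2)
print(restored == original)
```

In this idealized case the removal is exact. An embedder that clips or rounds pixel values would leave Harry with a close approximation rather than a perfect copy.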
To add a degree of unpredictability to the watermark, many schemes
apply the watermark under a key, a randomization of the source noise
pattern (or some other secret) that must be known in the detector before
it can read the watermark. This term "key" is borrowed from
cryptography, but any self-respecting cryptographer would wince to see
it used in this context. A cryptographic key is a secret that is
extremely difficult to calculate even if you:
Completely understand the cryptographic algorithm, and
Have enormous amounts of data encrypted in that key to examine, and even
Have examples of data encrypted in that key for which you know the
unencrypted version.
Today's watermark keys could never survive such scrutiny. Even worse, in
some applications, the detector is widely distributed to the end-users.
This is true in many instances of the authenticity application, and it
is true of the "do not copy" watermarks used in devices for
recording movies. In these cases, Harry can obtain the detector and
reverse-engineer the key. But perhaps the detector is in hardware or
obfuscated software and too difficult for Harry to understand? He can
also accumulate large amounts of content watermarked with the same key
and hope that statistical analysis will reveal the key. Or, in another
attack described by Cox and Linnartz {IC}, Harry uses the detector as a
"black box." By performing numerous carefully-designed tests
on a "just barely watermarked" copy, and by making small
changes to see if the watermark disappears, in many schemes he is able
to calculate the key. Alternatively, if the watermark key is short,
Harry can just try all the possibilities. Or, if the key is circulated
to end users, Harry may simply acquire it from one of them.
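The exhaustive search over a short key can be sketched as follows. The additive embedding, the correlation detector, the 8-bit key space, and the threshold are all assumptions made for illustration; real schemes differ, but the attack has the same shape whenever Harry can run a detector and the key space is small.

```python
import random

KEY_BITS = 8   # deliberately short, hypothetical key space
STRENGTH = 2   # hypothetical embedding strength, in gray levels
SIDE = 64      # synthetic test image is SIDE x SIDE pixels


def noise_pattern(key, length):
    """Pseudo-random +1/-1 pattern derived from an integer key."""
    rng = random.Random(key)
    return [1 if rng.getrandbits(1) else -1 for _ in range(length)]


def embed(pixels, key):
    """Add the keyed pattern to the image."""
    pattern = noise_pattern(key, len(pixels))
    return [p + STRENGTH * n for p, n in zip(pixels, pattern)]


def detect(pixels, key):
    """Correlate the mean-removed image with the keyed pattern."""
    mean = sum(pixels) / len(pixels)
    pattern = noise_pattern(key, len(pixels))
    return sum((p - mean) * n for p, n in zip(pixels, pattern)) / len(pixels)


def brute_force(pixels, threshold=STRENGTH / 2):
    """Try every key; keep those whose detector response passes the threshold."""
    return [k for k in range(2 ** KEY_BITS) if detect(pixels, k) > threshold]


# Smooth synthetic image standing in for real content.
image = [64 + (x + y) // 4 for y in range(SIDE) for x in range(SIDE)]
secret_key = 173
marked = embed(image, secret_key)
found = brute_force(marked)
print(found)
```

The detector never sees the unmarked image, yet only the correct key produces a response near the embedding strength. Each additional key bit doubles the work, which is why the key length matters.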
Nonetheless, many inventors make sweeping claims about the security
of their particular watermark. At the heart of most of these claims is
an assumption that the watermark cannot be detected because it is not
possible to distinguish the small "noise" introduced by a
watermark from the naturally occurring noise in the content. Often, this
assumption is easy to disprove. Merely ask the proponent: if the content
is compressed, does it get larger after the watermark has been applied?
If it does, there is at least one model -- namely, whatever model the
compression scheme uses -- which would help an attacker decide between
watermark noise and natural noise. Of course, the indication might be
very weak, so this alone rarely yields a productive attack. It does,
however, discredit most "proofs" of absolute watermark
security.
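The compression question can be tried out on synthetic data. In this sketch zlib (DEFLATE) stands in for whatever compressor the content would normally use, the "image" is a smooth diagonal ramp, and the watermark is a hypothetical change of +1 or -1 gray level per pixel; none of these choices come from a particular watermarking scheme.

```python
import random
import zlib

SIDE = 64


def compressed_size(pixels):
    """Bytes needed by zlib (DEFLATE) for the raw 8-bit pixel data."""
    return len(zlib.compress(bytes(pixels), 9))


# Smooth synthetic image: a diagonal ramp, highly compressible.
original = [x + y for y in range(SIDE) for x in range(SIDE)]

# Hypothetical invisible watermark: +1 or -1 gray level per pixel,
# clipped to the valid 0..255 range.
rng = random.Random(1997)
marked = [
    max(0, min(255, p + (1 if rng.getrandbits(1) else -1)))
    for p in original
]

size_before = compressed_size(original)
size_after = compressed_size(marked)
print(size_before, size_after)
```

The marked data compresses to a larger size than the original, which means the compressor's model already separates the two to some degree. As the text notes, this is an indication rather than an attack.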
And what about collusion? What if Harry has a bunch of friends, and
they all have differently watermarked copies of the same item? The
simple average of all these copies starts to approach the true, unmarked
item. Now attacks that would normally destroy too much of the image
become fruitful even when applied weakly. The value of the watermark is already
radically attenuated by the averaging, and so much less needs to be done
to completely eliminate it.
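The attenuation caused by averaging can be sketched numerically. The example assumes a hypothetical additive scheme in which each recipient's copy carries a different keyed +1/-1 pattern, and a detector that is given the publisher's original (a simplification; many schemes, including those discussed later, do not need it). The keys, strength, and names are illustrative.

```python
import random

STRENGTH = 2.0  # hypothetical embedding strength, in gray levels


def noise_pattern(key, length):
    """Pseudo-random +1/-1 pattern derived from an integer key."""
    rng = random.Random(key)
    return [1 if rng.getrandbits(1) else -1 for _ in range(length)]


def mark_copy(original, recipient_key):
    """Produce the copy handed to one recipient."""
    pattern = noise_pattern(recipient_key, len(original))
    return [p + STRENGTH * n for p, n in zip(original, pattern)]


def response(copy, original, recipient_key):
    """Detector response for one recipient, given the publisher's original."""
    pattern = noise_pattern(recipient_key, len(original))
    total = sum((c - p) * n for c, p, n in zip(copy, original, pattern))
    return total / len(original)


def collude(copies):
    """Pixel-wise average of several differently marked copies."""
    return [sum(values) / len(values) for values in zip(*copies)]


original = [64 + (i % 128) for i in range(4096)]
friends = list(range(1, 17))  # sixteen hypothetical recipient keys
copies = [mark_copy(original, key) for key in friends]
averaged = collude(copies)

single = response(copies[0], original, friends[0])
after_collusion = response(averaged, original, friends[0])
print(single, after_collusion)
```

With sixteen colluders, each recipient's mark survives in the average at roughly one sixteenth of its original strength, so a mild follow-up attack has far less to remove.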
So where do all these attacks lead? The wildest proponents of digital
watermarking seem to envision a world where watermarks will be used to
establish guilt in a court of law -- the new digital DNA. (One company
has even taken out a trademark to evoke this phrase.) If watermarks are
not safe against attack, then a watermark can always be refuted in
court by claiming it to be a forgery (see Footnotes). There is an even more fundamental
question, raised by Cynthia Dwork {CD}. Does the presence of my
watermark on a piece of pirate content mean that I have necessarily done
anything wrong? For example, at our site everyone's files are on the
network. I routinely share all my files with my close colleagues. What
copyright have I violated by putting my legitimately obtained copy of a
piece of content in my file system? (At my site I have no other place to
put it, anyway.) Of course, my colleagues, if they were to copy the item
from me, might be violating a copyright. But I claim I have not done
anything wrong; my watermark in their possession is not a proof of my
guilt.
In the end, we feel the primary value of watermarks depends not on
legal proofs, not on technical security measures, but instead on
economic terms. I do not have to prove something in a court of law to
sever an ongoing voluntary relationship -- to revoke somebody's library
card, for example. That is the threat. The attacker's side, similarly,
is driven by economic terms. Harry can eliminate a watermark, but it may
take him more effort than the value he will obtain by doing so.

Specific Watermarking Technologies
Visible Image Watermark
The visible image watermark, available with IBM Digital Library,
embeds a visible mark onto a gray or color photographic image. An
example of a visibly watermarked image is given in Figure 1, which shows
a page of a Vatican Library manuscript, darkened with a watermark that
was modelled on the Vatican Library's seal. This technique was developed
at the request of the Vatican Library as part of a project that made
images of their manuscripts available through the Internet {FM2}; here
the intent was to make clear, to all who would see the images, that they
were the property of the Vatican Library, without detracting from their
utility for scholarship. This use of the watermark, like a copyright
notice, identifies the ownership of materials and reminds viewers of
their limited copying rights. We note that the visible watermark has
also been used to mark images owned by the Klau Library of Hebrew Union
College, as discussed in {HG}; here the intent was to provide a
reference to the Klau library within the images, so that anyone desiring
to see the scanned manuscripts would know where they might be found.
This watermark has several features that distinguish it from other
visible watermarking techniques. One constrains the watermarking process
to change only the brightness, and not the color, of the image to be
marked; this is intended to make the watermark less obtrusive. Another
uses a model of the human visual system to adjust the prominence of
applied watermarks; this is intended to produce more uniformly prominent
watermarks when the watermarking is applied in batch mode. A description
of the technique is given in {GB1}.
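The first constraint, changing brightness but not color, can be illustrated by scaling the red, green, and blue values of a pixel by one common factor. This is only a sketch of that single idea: it is not the algorithm of {GB1}, it omits the human visual system model entirely, and the mask format, the `darkening` parameter, and the function names are assumptions.

```python
def apply_visible_mark(pixel, mask_value, darkening=0.5):
    """Scale R, G and B by one common factor where the mask is set.

    `pixel` is an (r, g, b) tuple of 0..255 integers; `mask_value` is 0.0
    outside the mark and up to 1.0 inside it. Scaling all three channels by
    the same factor changes brightness but keeps the ratios between channels.
    """
    factor = 1.0 - darkening * mask_value
    return tuple(round(channel * factor) for channel in pixel)


def watermark_image(image, mask, darkening=0.5):
    """Apply the mark to an image stored as rows of (r, g, b) tuples."""
    return [
        [apply_visible_mark(px, m, darkening) for px, m in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


image = [[(200, 100, 50), (200, 100, 50)],
         [(40, 80, 120), (40, 80, 120)]]
mask = [[0.0, 1.0],
        [1.0, 0.0]]
marked = watermark_image(image, mask)
print(marked)
```

Pixels under the mark are darkened, but each keeps the same proportions of red, green, and blue, so the mark reads as a shadow rather than a tint.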
Reversible Visible Watermarks
Another form of visible image watermarking, developed at the IBM Tokyo
Research Laboratory for applications such as on-line content
distribution, is called Reversible Visible Watermarking. Here, the image is
marked with a Reversible Visible Watermark before distribution or
posting on the Internet, and the watermarked image content serves as a
"teaser" that users may view or obtain for free. Then, the
watermark can be removed to recreate the unmarked image by using a
"vaccine" program that is available for an additional fee.
Figure 4. An image marked with a reversible visible watermark.
Fragile Image Watermarks
IBM is investigating multiple techniques for fragile image
watermarking that would determine whether an image has been altered
since the time when it was watermarked. The targeted applications for
this "image authentication" include detection of altered (or
replaced) image content within a digital library, and the "secure
digital camera." We will mention two such techniques.
Both techniques require an image-specific authentication key to
extract the watermark from the watermarked image. This makes it more
difficult for a malicious party to detect or estimate the watermark in a
watermarked image (which could lead to it being inserted in altered
content to falsify the no-change condition). Both techniques permit the
display of the extracted watermark as an image for visual
authentication, and both permit automatic authentication. Both can
localize the changes that have taken place in an altered image.
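The general idea of a keyed fragile watermark that localizes changes can be sketched as follows. This is a swapped-in textbook construction, not either of the IBM techniques described in this section: each block's least significant bits are replaced with bits of a keyed digest (HMAC-SHA256) of the block's remaining bits. The block size, key, and function names are assumptions.

```python
import hashlib
import hmac

BLOCK = 64  # pixels per block; one authentication bit is stored per pixel


def _block_bits(key, index, block):
    """Keyed digest of the block's upper 7 bits, as one bit per pixel."""
    msbs = bytes(p & 0xFE for p in block)
    message = index.to_bytes(4, "big") + msbs
    digest = hmac.new(key, message, hashlib.sha256).digest()
    return [(digest[i // 8] >> (i % 8)) & 1 for i in range(len(block))]


def embed_fragile(pixels, key):
    """Overwrite each pixel's least significant bit with a digest bit."""
    out = []
    for start in range(0, len(pixels), BLOCK):
        block = pixels[start:start + BLOCK]
        bits = _block_bits(key, start // BLOCK, block)
        out.extend((p & 0xFE) | b for p, b in zip(block, bits))
    return out


def tampered_blocks(pixels, key):
    """Return the indices of blocks whose stored bits no longer verify."""
    bad = []
    for start in range(0, len(pixels), BLOCK):
        block = pixels[start:start + BLOCK]
        bits = _block_bits(key, start // BLOCK, block)
        if [p & 1 for p in block] != bits:
            bad.append(start // BLOCK)
    return bad


key = b"hypothetical-key"
image = [(7 * i) % 256 for i in range(256)]  # four blocks of 8-bit pixels
marked = embed_fragile(image, key)

altered = list(marked)
altered[130] ^= 0x10  # change one pixel inside block 2

print(tampered_blocks(marked, key), tampered_blocks(altered, key))
```

Marking changes each pixel by at most one gray level, an untouched image verifies cleanly, and a change to a single pixel is reported for the block that contains it. Without the key, an adversary cannot recompute the bits needed to make altered content verify.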
One technique, developed at the IBM Tokyo Research Laboratory, is
used to detect the presence of tampering in an image. A robust
watermark layer is embedded into the image simply to identify it as
an authenticated image; then a fragile watermark layer, designed to be
extremely sensitive to alteration of the image, is embedded on top of
it. The first layer tells the user
to check the second layer; the second layer acts as an "alarm"
that rings if the image has been tampered with.
Another technique {MY}, developed at the Watson Research Center, uses
error diffusion to preserve the color content of an image as it
undergoes watermarking; this can be important if the images are to be
used in a color-critical application. This technique also retains a
partial watermark if the watermarked image is cropped; the extraction
process can determine whether the remaining portion has otherwise been
altered.
Robust Image Watermarks
IBM is also investigating multiple techniques for robust image
watermarking that would apply watermarks that could later help identify
the owner or recipient of an image. We will briefly discuss two such
techniques.
Both techniques require an image-specific watermark key to extract
the watermark from the watermarked image. This makes it more difficult
for a malicious party to detect or estimate the watermark (which could
lead to it being removed to delete evidence of ownership). Both
techniques insert the watermark data many times; this redundancy permits
the watermark extraction to work more reliably. Neither technique
requires that an unmarked image be present in order to extract a
watermark.
One variation, a suite of technologies called DataHiding(TM), was
developed at the IBM Tokyo Research Laboratory to allow users to embed
invisible digital data into the digital content. The target media range
from still images to video and audio data. This suite of
technologies has a great deal of flexibility in the amount of data that
can be embedded (as well as the level of robustness). Also, the data can
be automatically extracted and detected without human observation or
interaction.
In still image and video DataHiding(TM), the data to be embedded is
converted to a binary bitstream and embedded into the image by altering
the luminance level of the pixels following a set of pre-defined rules.
The level of alteration to each pixel is based on baseband image
analysis (baseband means "before compression") to ensure the
preservation of the image quality. Data embedded by using this algorithm
was verified to survive normal image processing as well as JPEG
compression ratios up to 1:30. For video DataHiding(TM), 4 bits of
embedded control data were verified to survive a combination of normal video
processing, MPEG compression, digital-to-analog-to-digital video
conversion, as well as the recording to analog video tape.
The other variation is based on a single technique {GB2}, developed
at the IBM Watson Research Center, that modulates the brightness of the
image's pixels with a random noise field to embed the watermark. The
color of the pixels is not altered in the watermarking so that image
color is preserved; this may be important if the images are to be used
in a color-critical application. When a watermark is extracted from a
watermarked image, the watermark is normally displayed; even if the
watermark has been damaged by processing applied to the watermarked
image, a visual inspection will quickly verify the resemblance of the
inserted and extracted watermarks. In limited experiments, this
technique has been verified to produce watermarks that survive printing
and re-scanning, reduction by a factor of two, and JPEG compression by a
factor of 15.

Conclusions
Digital watermarking is an exciting new field: It is exciting for
researchers because it is a new field and there is an opportunity to do
pioneering work. It is exciting for entertainment companies, museums and
libraries because it offers the promise of better protecting their
multimedia content from piracy. It is exciting for consumers because
better multimedia protection could lead to cheaper, better, and more
freely available entertainment and educational materials.
However, the excitement about the promise of watermarking should not
mask the state of its fulfillment. In spite of the exaggerated claims
often made about digital watermarking, it is a new and largely unplowed
field. Many applications have been proposed for watermarking; most of
them remain unproven. Few careful examinations of the technical
requirements of the proposed applications have been undertaken. A common
application requirement is that the watermark resist attacks that would
remove it (or insert a false watermark). Some watermarks, described in
their advertising as being attack-resistant, may be accidentally removed
by unintended attacks such as cropping, reduction, or compression. Other
techniques exhibit greater (but differing) degrees of effectiveness in
resisting attacks, but all offer limited resistance. Few watermarking
techniques have been tested by a talented and well-motivated attacker.
But increasing the cost of wrongdoing is a potentially powerful
incentive to act properly.
Even though we point out that digital watermarking has many
limitations, we do not believe watermarks are without merit or
importance. Indeed, much of the value of watermarking may be gained by
watermarking content with an imperfect scheme. Creating the possibility
that wrongdoing may be identified is an incentive to act properly, even
when there is no certainty that wrongdoing will be identified.
We, and our IBM colleagues, have listened to many media content
owners who have proposed watermarking applications, and we have proposed
some scenarios ourselves; a number of these proposed applications were
discussed above. We have considered the technical requirements of these
applications and created watermarking techniques to address them.
Compared with other techniques described in the literature, we believe
ours to be relatively effective, if imperfect, in addressing the
requirements of the applications.
We feel obligated to emphasize that digital watermarking does not
stand alone. It is one of a triad of technologies (the other two being
encryption and digital signatures) that together can offer a reasonable
level of copyright protection at a reasonable cost. Because of the many
protection and watermarking options available, addressing an application
is not a simple matter. The technical requirements of the specific
application should first be studied. Then, the watermarking technique
(or techniques) most appropriate to that application should be chosen
and used in combination with the other protection technologies that best
meet the needs of that application. No tailor would sell the same suit
to all customers; nor should we.

Bibliography
{CD}
C. Dwork, "Copyright? Protection?", in The Mathematics of
Information Coding, edited by Cybenko, O'Leary, and Rissanen, (to
appear).
{FM}
F. Mintzer, G.B. Braudaway, and M.M. Yeung, "Effective and
Ineffective Digital Watermarks," Proceedings of IEEE ICIP'97, Santa
Barbara, CA, Oct. 1997.
{FM2}
F.C. Mintzer, et al., "Towards On-Line Worldwide Access to Vatican
Library Materials," IBM J. of Res. and Develop, pp. 139-162, March
1996.
{GB1}
G. Braudaway, K.A. Magerlein, and F. Mintzer, "Protecting Publicly
Available Images with a Visible Image Watermark," IS&T/SPIE
Symposium on Elect. Imaging Sci. and Tech., Proceedings of Symposium on
Optical Security and Counterfeit Deterrence Techniques, San Jose, CA,
Feb. 1996.
{GB2}
G. Braudaway, "Protecting Publicly Available Images with an
Invisible Image Watermark," Proceedings of IEEE ICIP'97, Santa
Barbara, CA, Oct. 1997.
{HB}
H. Berghel, "Watermarking Cyberspace", Communications of the
ACM, Vol. 40, No. 11, November 1997.
{HG}
H.M. Gladney, F.C. Mintzer, and F. Schiattarella, "Safeguarding
Digital Library Contents and Users: Digital Images of Treasured
Antiquities," D-Lib Magazine, http://www.dlib.org/dlib/july97/vatican/07gladney.html,
July 1997.
{IC}
Ingemar J. Cox, Jean-Paul M.G. Linnartz, "Public watermarks and
resistance to tampering", IEEE International Conference on Image
Processing, Oct. 1997.
{MY}
M.M. Yeung and F. Mintzer, "An Invisible Watermarking Technique for
Image Verification," Proceedings of IEEE ICIP'97, Santa Barbara,
CA, Oct. 1997.
{SC}
S. Craver, N. Memon, B.L. Yeo, and M.M. Yeung, "Resolving Rightful
Ownership with Invisible Watermarking Techniques: Limitations, Attacks
and Implications," IEEE JSAC, March 1997.
Footnotes:
It is even possible to forge a mark ex post facto. Many schemes can be
run in reverse: using the naturally occurring noise in an image, it is
possible to calculate a key that looks like a valid watermark. Such a
watermark could say anything: an evil publisher could assert that a
piece of content belongs to him, when in fact it does not.