Copy Detection, Message Hiding and DRM

The media industry and opposing commercial and other interests

The media content industry sells copyrighted books, other print publications, packaged software, TV programs, music recordings and movies distributed using various means. This industry has a commercial interest vested in preventing copying of content in a manner unauthorised by copyright holders.

The computer hardware, networking and much of the electronics industry has a commercial interest vested in manufacturing and selling devices and systems which can copy packets of information quickly, accurately and reliably. Copying digital information is what computers, cameras, set-top boxes, networks, routers, storage and recording devices do. Most computer software is developed not for sale, but for use, research or education. Those developing software for purposes other than its sale have an interest in sharing its development cost with other interested parties by making it freely available so that anyone can copy it and help develop and test it.

On the question of whether copying should be fast or slow, easy or difficult, accurate or impossible, free or strictly controlled, the commercial interests of these two multi-billion dollar industries therefore conflict.

DRM is referred to by the content industry as Digital Rights Management. Those opposed to the use of this technology sometimes refer to it as Digital Restrictions Management.

Copy Detection and Tracking


Detecting unauthorised copying is often more practical than preventing it. An example might be use of digital photography on a website. A photographic image library which sells photographs for publications might be able to scan websites within a particular industry, and automatically compare image files downloaded against their own files. However, someone taking and using an unauthorised copy might have edited the image, e.g. cropping it or changing the colour balance, or over or under exposing it slightly. Even deliberately changing a single pixel to a slightly different shade on a digital copy will result in a different file checksum or hash, resulting in no match being detected using an automated system which relies on checksums or hashes of whole files.
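The weakness of whole-file matching can be shown with a minimal sketch. The "image" here is a hypothetical stand-in of 64 one-byte pixels rather than a real photograph, and SHA-256 is assumed as the hash; any whole-file checksum behaves the same way.

```python
import hashlib

def file_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a whole file's bytes."""
    return hashlib.sha256(data).hexdigest()

# Stand-in for an image file: 64 grey pixels, one byte each (hypothetical data).
original = bytes([128] * 64)

# An unauthorised copy in which a single pixel is one shade lighter.
edited = bytearray(original)
edited[10] = 129
edited = bytes(edited)

# A byte-for-byte copy matches; the slightly edited copy does not.
exact_copy_matches = file_digest(original) == file_digest(bytes(original))
edited_copy_matches = file_digest(original) == file_digest(edited)
print(exact_copy_matches, edited_copy_matches)
```

An automated scan built on comparisons like this would flag only the exact copy.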

Hidden Errors

Those publishing content which can be legally reproduced by others at a cost will often want to be able to prove when their copyright was misused. Such copyrights will include:

Hidden errors can be deliberately introduced into such material, e.g. changing the last digit by one in a single logarithm in a book of logarithms. Someone who computes all the values themselves can be assumed unlikely to make this error by accident. For someone to find where such an error was, they would have to recompute the table of logarithms themselves, which would defeat the purpose of copying a compilation made by another publisher.
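The logarithm example might be sketched as follows. The table range, the position of the planted error (37) and the function names are all illustrative assumptions. Values are stored as whole numbers of ten-thousandths so that "changing the last digit by one" is an exact operation.

```python
import math

SCALE = 10_000  # four decimal places, stored as whole numbers

def honest_table(start, stop):
    """Compute log10 for each integer in the range, to four decimal places."""
    return {n: round(math.log10(n) * SCALE) for n in range(start, stop)}

def publish_with_trap(table, trap_at):
    """Return a copy of the table with the last digit of one entry raised by one."""
    marked = dict(table)
    marked[trap_at] += 1
    return marked

def copied_from(suspect, published, trap_at):
    """A suspect table reproducing the planted error was probably copied."""
    return suspect.get(trap_at) == published[trap_at]

published = publish_with_trap(honest_table(10, 100), trap_at=37)
independent = honest_table(10, 100)   # someone who did the work themselves
lifted = dict(published)              # someone who copied the published table

print(copied_from(independent, published, 37), copied_from(lifted, published, 37))
```

Only the publisher knows which entry to check, so the test is cheap for the publisher and expensive for anyone trying to find and remove the error.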

A thesaurus can be used to substitute a few words in a text with synonyms, and these changes will enable an exact text copy to be detected. Someone who uses OCR to scan both texts and then compares them with a file comparison program such as diff will be able to detect the changes.
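A rough sketch of that comparison, using Python's standard difflib module in place of the diff program. The two sentences are invented for illustration.

```python
import difflib

original = "The quick brown fox jumps over the lazy dog".split()
marked = "The fast brown fox leaps over the lazy dog".split()

def word_changes(a, b):
    """List the word runs that differ between two texts, as (old, new) pairs."""
    matcher = difflib.SequenceMatcher(None, a, b)
    return [
        (a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]

changes = word_changes(original, marked)
print(changes)
```

This cuts both ways: the publisher can confirm that a suspect text carries the substituted words, and a copier holding two differently marked editions can locate the substitutions.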


A more sophisticated solution to the copy detection problem is known as steganography which means the art of hiding secret information, known as stegotext, inside other information or covertext. In many cases the stegotext will be encrypted, making this more difficult to detect.

This is achieved using various means. Examples:
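One widely described means is to overwrite the least significant bit of each byte of the covertext, for example the colour values of an image. The sketch below is an illustration under that assumption, not a description of any particular product. The function names and the sample data are hypothetical, and the hidden message is not encrypted here.

```python
def embed(covertext: bytes, stegotext: bytes) -> bytes:
    """Hide stegotext in the lowest bit of successive covertext bytes."""
    bits = [(byte >> shift) & 1 for byte in stegotext for shift in range(7, -1, -1)]
    if len(bits) > len(covertext):
        raise ValueError("covertext is too small for this message")
    carrier = bytearray(covertext)
    for i, bit in enumerate(bits):
        carrier[i] = (carrier[i] & 0xFE) | bit  # replace the lowest bit only
    return bytes(carrier)

def extract(carrier: bytes, length: int) -> bytes:
    """Recover `length` hidden bytes from the lowest bits of the carrier."""
    out = bytearray()
    for k in range(length):
        value = 0
        for b in carrier[k * 8:(k + 1) * 8]:
            value = (value << 1) | (b & 1)
        out.append(value)
    return bytes(out)

covertext = bytes(range(200))          # stand-in for pixel data
carrier = embed(covertext, b"hi")
recovered = extract(carrier, 2)
largest_change = max(abs(a - b) for a, b in zip(covertext, carrier))
print(recovered, largest_change)
```

No byte changes by more than one, which is why the alteration is hard to see or hear but may still be found by statistical comparison.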


Steganalysis means detecting stegotext messages. This is likely to involve automated and statistical means, e.g. comparing edge effects in similar photographs or music MP3 recordings in suspected covertext files against collections of similar files, e.g. taken using the same make and model of digital camera, or produced using the same MP3 encoder program.

The cost of this analysis, and the facts:

makes this approach unlikely to be cost effective unless there are very good reasons to believe that a relatively small collection of data is likely to contain one or more stegotexts of enough significance.

Canary Trap

The canary trap is a name given to a technique enabling someone creating and distributing a number of slightly different versions of an information package to identify the party responsible for unauthorised disclosure or copying of it. This requires recording the variations in the information package communicated to specific recipients.

This approach might be used to identify the government minister or official responsible for leaking a discussion document to the press. In a situation where relatively few copies of software are distributed only to identifiable customers this would enable unauthorised copies to be tracked to a specific customer.
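A minimal sketch of the record keeping involved, assuming the variations are synonym choices in fixed positions of a short text. The template, word pairs and recipient names are all invented for illustration.

```python
# Each slot offers two interchangeable words; the bits of the recipient's
# index select which word that recipient's copy contains.
SYNONYM_SLOTS = [("begin", "start"), ("large", "big"), ("quickly", "rapidly")]
TEMPLATE = "We will {0} the {1} review and report {2}."

def make_variant(recipient_index: int) -> str:
    """Build the copy of the text issued to one recipient."""
    words = [
        pair[(recipient_index >> slot) & 1]
        for slot, pair in enumerate(SYNONYM_SLOTS)
    ]
    return TEMPLATE.format(*words)

recipients = ["minister_a", "official_b", "official_c", "adviser_d"]

# The record of which variation went to whom.
issued = {make_variant(i): name for i, name in enumerate(recipients)}

def identify_leaker(leaked_text: str):
    """Look the leaked wording up in the record; None if it is not on file."""
    return issued.get(leaked_text)

print(identify_leaker(make_variant(2)))
```

Three two-way slots distinguish up to eight recipients; each additional slot doubles that number.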

When managing subscriptions to multiple email lists, it can be useful to create a new email address for each one, so that if an address gets into the hands of spammers this might help identify the party that sold it on. It also enables messages from a list to be discontinued by ending use of the relevant address and rejecting messages sent to it.

Further reading

Digital Watermarking

Network Surveillance of Illegal copying

Detecting when people copy files on the Internet in breach of copyright is an activity engaged in by the RIAA (Recording Industry Association of America) and in the UK by the organisation misleadingly titled Federation against Software Theft (FAST). This name is misleading, because most of their activity is not directed against those stealing software, but is directed against those breaching software copyright. Copyright infringement for personal use is normally a civil offence which is not covered by legislation concerning theft which requires the owner of property to be permanently deprived of their use of the goods concerned.

In general, illegal file sharers make monitoring their activities fairly easy when globally searchable indexes are used concerning materials currently being copied, or made available for copying. Various peer-to-peer file sharing networks have been developed for this purpose and are also attacked by organisations such as the RIAA which are interested in preventing this activity from occurring. For example, the BitTorrent peer-to-peer protocol discloses the IP addresses of users participating within any given torrent.

Illegal file sharers intending to reduce the risk of detection are likely to do this by:

  1. Exchanging information using encrypted methods, e.g. by using HTTPS or the SSH File Transfer Protocol (SFTP).
  2. Putting files to be shared on web URLs which are not linked from other web pages.
  3. Excluding automated web search engines e.g. by using the Robots exclusion standard
  4. Using HTTP authentication and providing a password only to friends and family.
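As an illustration of item 3, the sketch below feeds a small robots.txt file to the parser in Python's standard library. The directory name, crawler name and URLs are made up. The standard is advisory: it only excludes search engines which choose to honour it.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt asking all crawlers to stay out of one directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# What a well-behaved crawler would conclude before fetching each URL.
may_fetch_private = parser.can_fetch("ExampleBot", "http://example.org/private/album.zip")
may_fetch_public = parser.can_fetch("ExampleBot", "http://example.org/index.html")
print(may_fetch_private, may_fetch_public)
```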

Copy Prevention Technologies

The problem: computers are designed to copy

To do its job cost effectively, the digital computer was designed to copy contents of memory between cheaper, larger and slower memory (starting with punched cards and later magnetic disk) and faster but smaller memory (e.g. magnetic core and later semiconductor RAM). Modern computers also use semiconductor cache memory within the CPU, which is smaller and more expensive than conventional RAM. This typically extends to 2 levels of CPU cache in addition to the CPU registers, which are the fastest and most expensive memory of them all.

For an operating system to load and execute programs, these have to be copied in disk blocks or individual instructions from slower large memory into smaller faster memory. The same considerations apply to the data these programs input, with copying of data on output being from the small and fast memory to the slow and cheap memory in stages, also speeded up through extensive use of caching which allows parallelism. The design of a cost effective computer therefore requires the copying of programs and data to be a very fast, error-free and integral operation.

Standardised computer networks are also designed to make the job of copying programs and data from one computer to another cheap, fast and efficient; a network exists to copy packets of information from one place to another.

It is difficult to combine knowledge of what digital computers and networks are by design, with the idea that thousands of millions of such systems can be managed in a way that prevents unauthorised copying of copyrighted content, but the media content industry has invested some resources behind this objective. How successful this can ever be remains an open question, but efforts in the past have been largely unsuccessful. There exists a long history of successful breaks in the security of content protection technologies.

Early Software Copy Prevention Schemes

Copy prevention has been an arms race between the copy prevention engineers on the one side, and software users interested either in making illegal copies or in avoiding nuisances created by frequently ill-conceived copy prevention technology on the other. For example, users may be exercising legitimate rights to take backup copies of software in case the master copy fails. The software company might no longer exist or be able to provide a new copy in this event, and the continuity of the software user's business may depend upon continuing to be able to use legitimately purchased software. In some cases a software user making use of personal data which comes under the 1998 Data Protection Act has a legal obligation to maintain secure access to the data concerned.

As soon as a new copy prevention scheme is created, technically skilled users will attempt to defeat it. Approaches to illegal copy prevention have included asking users questions which can be answered from the manual assumed to accompany all legitimate copies when this was printed rather than supplied on CD or DVD. This meant that for a user with a legitimate copy it became more convenient to use a cracked copy distributed illegally with the nuisance prompts removed, reducing the incentive to other users to buy legitimate copies.

Other approaches have involved installing non standard software drivers to read information from CDs and floppy disks formatted using non-standard formats, making it difficult for users with legitimate copies to take backups.

License Servers

Some software is designed to call home and register a serial number. An example of this is Windows XP. Alternatively some software used on a network which does not have Internet access might be required to register with a license server which will attempt to prevent more than the licensed number of copies being used simultaneously. Some commentators consider software which "calls home" over the Internet to infringe privacy. Vendors of this category of software will often provide a telephone backup for those using products behind restrictive firewalls or on non-networked computers.
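The seat-counting part of a license server can be sketched in a few lines. This is an illustration only: the class and method names are hypothetical, and a real product would also need networking, authentication and some way of reclaiming seats from crashed clients.

```python
class LicenseServer:
    """Grant at most `seats` simultaneous uses of a licensed program."""

    def __init__(self, seats: int):
        self.seats = seats
        self.in_use = set()

    def checkout(self, client_id: str) -> bool:
        """Return True if this client may run the software now."""
        if client_id in self.in_use:
            return True                      # already holds a seat
        if len(self.in_use) >= self.seats:
            return False                     # every licensed seat is taken
        self.in_use.add(client_id)
        return True

    def release(self, client_id: str) -> None:
        """Give the seat back when the client stops using the software."""
        self.in_use.discard(client_id)

server = LicenseServer(seats=2)
results = [server.checkout(name) for name in ("pc1", "pc2", "pc3")]
server.release("pc1")
after_release = server.checkout("pc3")
print(results, after_release)
```

The third client is refused until one of the first two gives up its seat.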

Anyone who has supported software in this category is likely to be aware that a proportion of the support effort has to go into maintaining the license server and the credentials needed to operate the software, and will often consider this kind of approach an expensive nuisance at best. In the experience of the author of this article, turnkey software became unusable after a crashed server was restored from backup media, because the license key file was then on different disk sectors. The users were denied access for some hours to software they had paid for while waiting for technical support from the software vendor concerned.

Hardware Dongles

A dongle is a hardware device that attaches to a computer to authenticate a piece of software. It is reasonable to assume that the hardware device will be more difficult to copy than the software it authenticates. The downside is that the hardware dongle will cost something and can easily be lost or borrowed and mislaid. It also doesn't protect the software vendor against cracked versions of the software with the dongle authentication disabled (often called warez) which will almost inevitably appear. This approach will ensure that those unwilling to run warez or use illegally reverse engineered dongles will pay to use the product, so this approach may be suited to relatively high-value proprietary software products.
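Why a cracked version defeats the dongle can be sketched as follows. The challenge-response scheme shown is an assumption made for illustration, not a description of any particular dongle product, and all names are hypothetical. The point is that however strong the hardware check is, the software decides whether to run based on the result of one function call.

```python
import hashlib
import hmac
import os

class Dongle:
    """Stand-in for the hardware device: it holds a secret and answers challenges."""

    def __init__(self, secret: bytes):
        self._secret = secret

    def respond(self, challenge: bytes) -> bytes:
        return hmac.new(self._secret, challenge, hashlib.sha256).digest()

def dongle_check(dongle, vendor_secret: bytes) -> bool:
    """Send a fresh random challenge and verify the dongle's answer."""
    if dongle is None:
        return False
    challenge = os.urandom(16)
    expected = hmac.new(vendor_secret, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(dongle.respond(challenge), expected)

def run_application(check) -> str:
    """The protected program: it runs only if the supplied check passes."""
    return "running" if check() else "refused"

SECRET = b"hypothetical vendor secret"

with_dongle = run_application(lambda: dongle_check(Dongle(SECRET), SECRET))
without_dongle = run_application(lambda: dongle_check(None, SECRET))
cracked = run_application(lambda: True)   # the check patched out entirely
print(with_dongle, without_dongle, cracked)
```

The cracked version never talks to the hardware at all, so the difficulty of copying the dongle is irrelevant to it.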

DVDs and DeCSS

Video DVDs use the Content Scrambling System (CSS). This uses a weak proprietary encryption algorithm with a 40 bit key, which was broken by Jon Lech Johansen (DVD Jon) and 2 other anonymous individuals (possibly Frank Stevenson and Derek Fawcus) who helped him. This resulted in the widespread distribution of unscrambling software known as DeCSS. The simplicity of DeCSS and the fact that this software has purposes which are legitimate has removed the barrier previously restricting the manufacture of multi-region DVD players.
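To give a sense of scale for a 40 bit key, the arithmetic below works out the size of the keyspace and how long trying every key would take. The trial rate of one million keys per second is an arbitrary assumption chosen for illustration, and the figure is an upper bound for exhaustive search only; it says nothing about weaknesses in the algorithm itself.

```python
KEY_BITS = 40
keyspace = 2 ** KEY_BITS                    # number of possible keys

ASSUMED_TRIALS_PER_SECOND = 1_000_000       # assumption, not a measurement
SECONDS_PER_DAY = 86_400

worst_case_days = keyspace / ASSUMED_TRIALS_PER_SECOND / SECONDS_PER_DAY

# For comparison: how many times larger a 128 bit keyspace is.
ratio_to_128_bit = 2 ** (128 - KEY_BITS)

print(f"{keyspace:,} keys, about {worst_case_days:.1f} days at the assumed rate")
```

At the assumed rate the whole keyspace is covered in under a fortnight, whereas each extra key bit would double that time.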

Jon's motivation was that he had purchased DVDs while in the USA which were unplayable at home in Norway due to region encoding - an attempt by the film studios to segment and control the market for movies based on geographical regions. Jon's defence of his actions was based on the view that he broke no law in Norway, where reverse engineering is legal, and that he had a right to view content which he had paid for and to use his constitutional right to freedom of speech to help others do the same. Jon was prosecuted in Norway but found not guilty.

In practice, illegal copying of DVDs including region encoding had been occurring on a large scale prior to the release of the DeCSS program, so DeCSS did not enable DVD illegal copying so much as remove technical restrictions on consumer product use unsupported by Norwegian law. (The US DMCA was motivated by the desire of the content industry to make reverse engineering of this kind illegal.)

The Sony Music CD Rootkit

Extended Copy Protection (XCP) is the official name given to software developed by the UK firm then known as First 4 Internet which was sold on a number of audio CDs by Sony. In October 2005 the security researcher Mark Russinovich released a description of this program as functionally equivalent to a rootkit, in the sense that it installed itself on the user's computer without end-user authorisation and compromised the security of the computer.

Based on research into DNS cache requests made by this software, which also infringed privacy by reporting usage back to base over the Internet, Dan Kaminsky estimated that 568,000 networks had one or more PCs infected by this rootkit. There was some criticism of anti-virus vendors at the time concerning their failure to include signatures of this software and disinfection routines in their products for some time after the nature of this trojan was published.

The Wikipedia article on this alleges that Sony violated copyrights on some components of this rootkit software licensed under the GNU General Public License. This article also mentions other legal investigations and actions concerning the allegedly unauthorised software modification carried out by this program.

Trusted or Treacherous Computing

The concept of trusted computing involves a computing environment in which all executable components are signed, checked and authorised starting with the initial boot sequence.

For this to work according to the specifications of the Trusted Computing Group this technical approach requires a custom hardware chip known as a Trusted Platform Module (TPM) to be included on the system motherboard.

TPM protected system boot sequence

  1. The hardware TPM confirms the BIOS checksum. If the checksum computed by the hardware agrees with the expected BIOS checksum, the BIOS code is run.
  2. The BIOS checksums the bootloader. If the result agrees with the expected bootloader checksum, the bootloader is run.
  3. The bootloader confirms checksums on configuration files, the OS kernel and other files needed to complete the boot sequence. If these are accepted, the OS kernel is loaded and run.
  4. Once the filesystem is loaded all other signed drivers are cryptographically checked.
  5. The kernel checks cryptographic signatures on all other programs and components.
  6. The kernel checks signatures on all other applications which are loaded and executed.
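The first three steps of this sequence can be sketched as a chain in which each stage is checked before it is allowed to run. This follows the simplified description above rather than the Trusted Computing Group specifications. The stage contents are placeholder byte strings, SHA-256 is assumed as the checksum, and the expected values would in practice be held somewhere the user cannot alter.

```python
import hashlib

def checksum(code: bytes) -> str:
    return hashlib.sha256(code).hexdigest()

# Placeholder stages, in boot order.
STAGES = [
    ("bios", b"bios code"),
    ("bootloader", b"bootloader code"),
    ("kernel", b"kernel code"),
]

# Reference checksums recorded when the system was known to be good.
EXPECTED = {name: checksum(code) for name, code in STAGES}

def boot(stages, expected):
    """Run each stage only if its checksum matches; stop at the first mismatch."""
    started = []
    for name, code in stages:
        if checksum(code) != expected[name]:
            return started, "halted at " + name
        started.append(name)
    return started, "booted"

tampered = [STAGES[0], ("bootloader", b"modified bootloader"), STAGES[2]]

print(boot(STAGES, EXPECTED))
print(boot(tampered, EXPECTED))
```

A modified bootloader stops the sequence before the kernel is ever loaded, whether the modification was made by an attacker or by the owner of the machine.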

An example of software which supports TPM functionality in connection with Linux is the Trusted GRUB bootloader.

An example of a proprietary system which makes use of TPM functionality and signed executables is the Microsoft Xbox. The TPM functionality of this games console enables Microsoft to control the software which can be sold for use on this hardware. Only programs signed with cryptographic keys used by Microsoft can be run. Modified versions of this system which defeat the TPM functionality are prevented from accessing the Xbox Live gaming network.

Richard Stallman, founder of the Free Software Foundation, and others have been critical of the trusted computing concept. He argues that when the end user does not have access to the encryption keys used to control the software allowed to run on his or her computer, the use of the word "trust" has nothing to do with whether the user can trust the system, but with whether the party controlling the system through the encryption keys trusts the user of it. Stallman argues that in this situation "trusted computing" should be renamed "treacherous computing", because the computer is not acting in the interests of its user but in the interests of the organisation controlling the cryptographic keys determining what the system can be used for.

Version 3 of the GNU General Public License is currently in what is expected to be the final stages of revision. This software license is expected to require that any software licensed under its terms include any cryptographic keys required to enable its users to exercise rights in practice as users of free software as defined by the Free Software Foundation.

These rights include the ability to use and obtain source code, to study the software, and to redistribute it including in modified form. All software (including many significant programming libraries and tools) which is controlled by the Free Software Foundation will use this license, and any other programs derived from GPLv3 code and distributed will need to be licensed under the same license.

DRM - Digital Rights or Restrictions Management

The DRM term (however you pronounce it) has been given to a collection of technologies intended to enable copyright owners to enforce various terms and conditions. It is alternately known as "Digital Restrictions Management" to emphasise the nuisance very frequently caused when it makes it more difficult or impossible for computer users to carry out the legitimate and non copyright-infringing tasks they intend.

The terms and conditions imposed often go well beyond what case law and other law on copyright provides for the content being protected. Some implementations of DRM technology can be relatively easily overcome, so become a minor nuisance or of no effect to those using unencumbered software, but a significant nuisance to those using DRM encumbered software. For example it was easily possible for me to use conventional edit->copy and edit->paste keys using a free-software PDF viewer (Evince) in connection with text in a DRM restricted PDF where the same action is prevented when using closed-source Adobe Acrobat software.
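The PDF example can be understood with a rough sketch, on the assumption that the restriction takes the form of a permission flag stored in the document which each viewing program is expected to check. The flag value and function names below are hypothetical and are not taken from the PDF specification or from either viewer's source code.

```python
ALLOW_COPY = 0x10   # hypothetical "text may be copied" permission bit

def compliant_viewer_copy(text: str, permissions: int):
    """A viewer that honours the flag refuses to copy restricted text."""
    if permissions & ALLOW_COPY:
        return text
    return None

def indifferent_viewer_copy(text: str, permissions: int):
    """A viewer that has the text in memory anyway and ignores the flag."""
    return text

restricted = 0      # no permissions granted by the publisher
page_text = "Text from a restricted document"

honoured = compliant_viewer_copy(page_text, restricted)
ignored = indifferent_viewer_copy(page_text, restricted)
print(honoured, ignored)
```

Because the viewer must already hold the readable text in order to display it, a restriction of this kind depends entirely on the cooperation of the viewing software.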

Screenshot showing copy and paste of restricted PDF text using Evince

DRM and Windows Vista

Some light amusement from J.D. "Illiad" Frazer of

More advanced forms of DRM are included within Windows Vista. This sets various flags within content designated as "premium" or "commercial" e.g. a very high resolution movie, which results in data being communicated over the system bus and to display and output devices encrypted. Separate keys are used in connection with software drivers, hardware devices and content files. This design creates additional expense for hardware manufacturers which will either be passed on to consumers or will result in reduced profits or losses if consumers are unwilling to pay premium prices for products reported by other consumers as defective.

Once component vulnerabilities are identified, the expectation and threat is that this will lead to contractual and automatic revocation of the keys concerned, and with them the functionality of the components those keys control. This opens up interesting possibilities for denial of service attacks, e.g. through key revocations in response to DRM keys being discovered and distributed by system crackers. We can imagine competition, e.g. between graphics card manufacturers, creating incentives to engage in engineering activities (e.g. using subcontractors within jurisdictions where reverse engineering is legal) which result in the keys controlling the performance of competitors' products being revoked.

Below is a screenshot of a diagram taken from a Microsoft White Paper describing parts of the video premium content protection mechanism within Vista. This paper describes the view presented as simplified.

Vista video protection design image

Further Reading on TC and DRM