How do pirates pirate?
April 23, 2010 8:31 AM

When a software pirate cracks a program, what exactly does s/he do?

As a layperson, it's easy to understand some forms of piracy: building a torrent of MP3 files ripped from CD can be done with any music program and a torrent client, and discs without security measures can be ripped as .iso files and shared with the medium of choice.

Conversely, say that a new game comes out with a new form of DRM. Within hours, someone has cracked the security and is illegally distributing modified versions of the game that bypass the DRM. I don't understand how this is possible for closed-source/proprietary products. If the only pieces we have are various bits of compiled code, some in proprietary formats, how does someone explore those files to not only identify how the DRM works, but also modify the actual software to bypass those measures? If the DRM functions by contacting a separate server, how do pirates figure out how to spoof a validation message when the game company presumably does not publicize its DRM mechanisms?

Obviously, I don't want step-by-step information about how to perform illegal activity. I want help understanding concepts of software engineering. Are compiled objects much less opaque than I understand them to be? If we don't know what language the software was coded in or what compiler turned it into software, how do software engineers explore its internal workings? How do programmers modify that software when they don't have the source code?
posted by Lifeson to Computers & Internet (19 answers total) 22 users marked this as a favorite
Here's a simplified example.

Say you have code that is like:

if (! check_drm()) quit();

That gets compiled down to a short list of machine code instructions. One to call the check_drm function, several to do the if statement.

It's not easy, but you can find those statements in the compiled version of the code. Then you change the machine instructions to say if (false), so that the quit() command never happens. Making this change requires a hex editor and a fairly detailed knowledge of assembly language.
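Python bytecode is not native machine code, but it shows the same thing on a small scale: the compiled form of a one-line check still contains a call to the checking function followed by a conditional jump, and a standard tool can list both. A minimal sketch; `check_drm`, `quit_game` and `start` are hypothetical names standing in for the pseudocode above, and the exact opcode names differ between Python versions.

```python
import dis

def check_drm() -> bool:
    """Hypothetical stand-in for a DRM check; always fails here."""
    return False

def quit_game() -> None:
    raise SystemExit("DRM check failed")

def start() -> str:
    if not check_drm():
        quit_game()
    return "running"

# Human-readable listing of start()'s bytecode: a call to check_drm followed
# by a conditional jump that decides whether quit_game() is reached.
dis.dis(start)

# The same information as data: which globals it loads, and its jump opcodes.
instructions = list(dis.get_instructions(start))
called = [i.argval for i in instructions if i.opname == "LOAD_GLOBAL"]
jumps = [i.opname for i in instructions if "JUMP" in i.opname]
print(called, jumps)
```

Replacing that one conditional jump with an unconditional one is the bytecode counterpart of rewriting the test as if (false).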

Modern DRM then fights back against this kind of thing by doing lots of checks throughout the program, calling out to the internet to get an encryption key, and using other countermeasures. The details of how to break each DRM scheme are different, but that's the basis.
posted by cschneid at 8:41 AM on April 23, 2010

I'm at work and, therefore, am not about to Google this, but there are a lot of "white hat" (but are they really?) hacking sites that teach you how to make keygens, how to reverse engineer serial numbers, etc. Some careful searching can probably yield some useful information, and I'm sure there are MeFites that know more about this than I do.
posted by reductiondesign at 8:43 AM on April 23, 2010

At its simplest, a piece of software with copy protection has a segment of (executable) code that says "hey, check and see if this software is legit". By building a wrapper around the executable, you can have another program intercept this signal and give it the "all clear" sign. Or, alternately, you can just rip out the segment of offending code and have the function return the "all clear" sign. In the old days, pirates would distribute "patches", which would operate on the executable files to insert the modified code. Those operate (conceptually) just like legitimate patches / software updates you may be familiar with.
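The weakness here is structural: if the program acts on a single yes/no answer, whatever sits between the program and the checker controls that answer. A minimal sketch with hypothetical class names; a real wrapper operates on a compiled executable or library rather than on Python objects.

```python
class LicenseChecker:
    """Hypothetical copy-protection component that the program calls."""

    def is_legit(self) -> bool:
        return False  # pretend the real check fails


def run_program(checker: LicenseChecker) -> str:
    # The program only ever sees the answer returned by is_legit().
    return "running" if checker.is_legit() else "refused"


class InterceptingChecker(LicenseChecker):
    """Wrapper that sits between the program and the original check
    and always reports the "all clear"."""

    def __init__(self, wrapped: LicenseChecker) -> None:
        self.wrapped = wrapped  # kept only to show what is being wrapped

    def is_legit(self) -> bool:
        return True  # the wrapped check is never consulted


original = LicenseChecker()
print(run_program(original))                       # refused
print(run_program(InterceptingChecker(original)))  # running
```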
posted by QuantumMeruit at 8:43 AM on April 23, 2010

1. A debugger is a program that runs another program inside itself, and lets the user see the state of the victim program as each instruction in it is executed. Debuggers are not nefarious; they are a vital part of any software development environment.

2. Start the victim program in a debugger, and step through it (that is, execute single or small groups of instructions) until you find the place where DRM is being checked.

3. At that point, patch the program so the DRM checker answers OK to any check request.

4. This is war, so the actual find-and-fix is more complex. But roughly, for many years after programs started shipping with passwords, registration keys, etc., this is how it was done.
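Python's `sys.settrace` hook gives a small taste of step 2: it lets one piece of code watch another execute line by line and inspect its variables. A minimal sketch, assuming a hypothetical `check_key` function; a machine-level debugger shows instructions and registers rather than source lines and named variables.

```python
import sys

def check_key(key: str) -> bool:
    """Hypothetical registration check that we want to watch."""
    expected = "ABCD-1234"
    ok = key == expected
    return ok

trace_log = []

def tracer(frame, event, arg):
    # Only follow frames belonging to check_key; ignore everything else.
    if frame.f_code.co_name != "check_key":
        return None
    if event == "line":
        # Snapshot the line about to run and the local variables at that moment.
        trace_log.append((frame.f_lineno, dict(frame.f_locals)))
    return tracer  # keep receiving line events for this frame

sys.settrace(tracer)      # "attach" to code started from here on
check_key("WRONG-KEY")
sys.settrace(None)        # "detach"

for lineno, local_vars in trace_log:
    print(lineno, local_vars)
```

Because the comparison value sits in a local variable while the check runs, it shows up in the log, which is one reason checks that compute the correct answer in memory are easy to locate.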
posted by hexatron at 8:46 AM on April 23, 2010 [4 favorites]

I'm not too knowledgeable on the subject but this is what I understand peripherally from hanging out with shady people. I come from an OS X background so take what I say from that angle.

One of the first and most basic steps in cracking most (90%?) programs is to use another program to stop the target application from connecting to the internet and contacting its developer's server (referred to as "phoning home"). Once you can keep an application from "phoning home" you're in a much better position to crack that software. Now the software can't go back and check that the serials you are entering are valid (if that's how it does it), report your IP to the developer telling them you cracked their app, or cause any other unsavory situation such as a forced update by the developer that causes your crack to no longer work.

To answer the other part of your question, one of the reasons cracks become available so quickly and easily is that many developers' systems for checking that your software is legit are mind-blowingly dumb. For example, I've seen applications whose sole method of checking whether your version was registered or not was a simple, plain-text, user-editable file found within the application. It might try to check whether the serial was good at start-up by "phoning home", but because another program blocked it from connecting to the internet, it would fail and run as if it was registered. In a case like that, the solution is as simple as changing a file from reading "IsRegistered = False" to "IsRegistered = True". Many cracks are simply fancier versions of this kind of tactic: looking through what the program is doing, finding the piece of code that acts as "the switch" for having your program registered, and then flicking that switch.

Then there's the whole matter of keygens and serial numbers and I have no idea how that works and I'm excited to hopefully learn from your question.
posted by ejfox at 8:50 AM on April 23, 2010

Disclaimer: my days of, uh, "post-market value-added software modification" ended somewhere around puberty. For the sake of discussion, I will refer to people engaged in this practice as Engineering Investigators (EIs).

The methods used vary at least as widely as the DRM in question. If a hardware dongle is part of the DRM "solution," you can bet a logic analyzer (digital these days) would be used to get between the dongle and the PC to figure out what is going on. If the DRM involves connecting to a server, an EI would also want to get in the middle of the traffic, this time using a packet sniffer, like Wireshark (formerly Ethereal). Capturing Internet traffic can be fairly useless if the transmissions are properly encrypted, of course. "Proper" is the key word there.

Decompilers are used to take the compiled code back to something partially readable. If you are hardcore/oldschool, you can use a hex editor. You can use tools like Process Monitor to take a snapshot of the game starting up on a virgin system to see what little programs are invoked as the software "calls home" to the DRM servers to get permission from Mommy to play with you. Debuggers can let you see what is going on, at least a bit, under the hood.

Research is important. Papers are published on various DRM schemes and someone who is patient can get a feel for how the system works. Sometimes, knowing the theory can reveal a smaller solution space to explore than expected and you can brute force your way to the keys, like in the famous DeCSSing of DVDs.
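The "smaller solution space" point is about arithmetic: a key of n bits has 2^n possible values, so a short key can simply be tried exhaustively once you know a little of the plaintext. A minimal sketch with a hypothetical 16-bit key and a known file header; repeating-key XOR is used only to keep the code short, and it is far weaker than any real cipher (here the key could even be read off directly).

```python
from itertools import cycle
from typing import Optional

def xor_cipher(data: bytes, key: bytes) -> bytes:
    # Toy repeating-key XOR: encrypting and decrypting are the same operation.
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

def brute_force(ciphertext: bytes, known_prefix: bytes, key_bytes: int = 2) -> Optional[bytes]:
    # Try every possible key; a 16-bit key has only 2**16 = 65,536 candidates.
    for n in range(256 ** key_bytes):
        key = n.to_bytes(key_bytes, "big")
        if xor_cipher(ciphertext[: len(known_prefix)], key) == known_prefix:
            return key
    return None  # no key in the space matches the known prefix

secret_key = bytes([0x5A, 0xC3])  # hypothetical 16-bit key
ciphertext = xor_cipher(b"HEADER:level data follows", secret_key)

found = brute_force(ciphertext, b"HEADER:")
print(found == secret_key, xor_cipher(ciphertext, found))  # True b'HEADER:level data follows'
```

Each extra key bit doubles the search: 2^40 is about 1.1 trillion candidates, while 2^128 is far beyond any brute-force attempt.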

Never underestimate the importance of leaked early copies without all of the protection built in, especially software released to vendors. That's how EIs had Windows 7 opened up, at least for a while, before it even hit the shelves.

Hacking the Xbox: An Introduction to Reverse Engineering would be a good start to some of the hardware end of things.
posted by adipocere at 8:52 AM on April 23, 2010

Most of my experience with cracking is from years ago back when I was a shareware developer trying to make my apps less hackable, so I won't get into the low level details, but I can address some of the higher-level points.

Are compiled objects much less opaque than I understand them to be?

Basically, yes. Compiled programs, while not in a "human readable" format, still contain all of the logic of the program in a standard format. There's no way to really make code proprietary in a real sense, because at the end of the day the CPU on a random person's PC has to understand what it is supposed to be doing. And if the CPU understands what is going on, a cracker can understand as well. Once someone gets to that point, they can edit out or disable the parts of the program that are related to the copy protection. There are a lot of details about how the cracker gets that done that others have mentioned, but that's the basic underlying reason that nearly any scheme can be cracked.

Although there are some security schemes that can be successful (requiring license checks for online play for example), the only method that really works in general is locking down the hardware and controlling what kinds of things the user can do at the OS level. That's why, for example, crackers aren't able to defeat PS3 copy protection the same way they can with PC games. Really controlling the hardware is easier said than done though, since hardware modifications to defeat such schemes have been common ever since the schemes were introduced. And the protection on the software side still needs to be airtight as well, because often a method can be found to run arbitrary code on a locked down system through software exploits (such as the various jailbreaking hacks on the iPhone).

Then there's the whole matter of keygens and serial numbers and I have no idea how that works and I'm excited to hopefully learn from your question.

One of the easiest methods for creating a keygen is to simply extract the part of the target application's code that does the key check. For example, if running the program in a debugger shows that the code generates the expected key and compares it to the one entered by the user, you can just pull out that routine and put it in a small program that displays the generated key on the screen. That's why it's smarter to use an asymmetric key system, where the code that checks a key is different from the code that creates it.
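A minimal sketch of the difference, with hypothetical names. In the symmetric version the checker has to compute the correct key, so the checker itself is the keygen. In the asymmetric version the shipped program holds only public values and can verify a signature but not produce one. The asymmetric half is textbook RSA with deliberately tiny primes, so anyone could factor the modulus; it illustrates the shape of the idea and is not a usable scheme.

```python
import hashlib

# --- Symmetric (weak): the checker contains the generator ---
SECRET = b"hypothetical-vendor-secret"  # hypothetical; ships inside the program

def expected_key(name: str) -> str:
    # The check must run this, so extracting it from the binary yields a keygen.
    return hashlib.sha256(SECRET + name.encode()).hexdigest()[:16].upper()

def check_symmetric(name: str, key: str) -> bool:
    return key == expected_key(name)

# --- Asymmetric: textbook RSA with toy numbers (insecure, illustration only) ---
P, Q = 61, 53
N = P * Q                           # public modulus (3233), shipped in the program
E = 17                              # public exponent, shipped in the program
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent, kept by the vendor only

def digest(name: str) -> int:
    # Reduce a hash of the name into the range [0, N).
    return int.from_bytes(hashlib.sha256(name.encode()).digest(), "big") % N

def vendor_sign(name: str) -> int:
    # Only the vendor, who holds D, can run this.
    return pow(digest(name), D, N)

def check_asymmetric(name: str, signature: int) -> bool:
    # The shipped checker uses only the public values N and E.
    return pow(signature, E, N) == digest(name)

print(check_symmetric("alice", expected_key("alice")))    # True
print(check_asymmetric("alice", vendor_sign("alice")))    # True
```

With realistic key sizes, extracting `check_asymmetric` from the program gives an attacker N and E, which are not enough to compute new signatures.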
posted by burnmp3s at 9:03 AM on April 23, 2010

Are compiled objects much less opaque than I understand them to be? If we don't know what language the software was coded in or what compiler turned it into software, how do software engineers explore its internal workings? How do programmers modify that software when they don't have the source code?

To answer those questions:
1: Most software is distributed in the form of a low-level language that is common to a specific hardware platform and operating system.

2: Software exists to examine the structure of those programs, create logs of what they do as they run, and decompile/disassemble those files back into a higher-level language. From there, it's just a matter of looking for common or known DRM patterns.

3: If you understand the low-level language you can (painfully) make small edits to the executable file. Or if the DRM lives in its own library, you can just replace that library file.
posted by KirkJobSluder at 9:08 AM on April 23, 2010

If you don't mind information from about 1997, there's plenty on
posted by Mike1024 at 9:39 AM on April 23, 2010

In the old days of floppy disks on the Apple II, in addition to actually having to go in to the software and remove various kinds of checks, we, erm, I mean people had to deal with non-standard disk formats, where you had to use special software just to read the disk. The sectors could use different low-level formatting, the use of sectors could be eschewed entirely, the file system could be dispensed with, data could be written on half-tracks or quarter-tracks or even in a spiral. (The floppy controller in the Apple II was very minimalist, and functions that would be performed by hardware in many computers were instead performed by software, which meant the low-level disk format was basically entirely under programmer control.) One of the more interesting pieces of kit to get inside such programs was basically a software emulator for the entire computer (including the disk controller) that allowed you to step through the program instruction by instruction as it actually loaded. Of course, it was orders of magnitude slower to boot a disk that way, but it could give you vital clues as to how the disk was laid out. I would imagine today's crackers use virtual machines in much the same way.
posted by kindall at 10:42 AM on April 23, 2010

There are a variety of techniques, and it depends on the target. If the software uses well-known and previously cracked DRM techniques, the pirate may be able to use an already existing patch. Another technique, as above, is to track the program in a debugger and start patching out the hooks. Some programs are self-encrypting and will need to be decrypted before analysis, which is usually done in a debugger. Back in the old days, we also used boot tracing (stepping through the boot process in a debugger, patching code as you go) or memory dumping (forcing the computer to halt via an interrupt and then capturing all of memory to disk, then just loading it up and resuming).

It's variable. The important part is it is always doable and actually fun since it is like solving a difficult puzzle. That's why DRM is generally a waste of money.
posted by chairface at 12:51 PM on April 23, 2010

kindall, I might have you beat.

Back in the day when I had my Atari 800, one significant software publisher's method of copy protection was to check their legit 5.25" floppies for a couple of "bad sectors" which could not be read conventionally. If a disk failed the test because those sectors could be read, it would not boot.

Once we got our hands on a sector level disk editor, we simply did a sector copy of the disks, and then, set our sector editors to write specific necessary "bad" sectors to the disk. We'd put a loop of scotch tape on one corner of the copy floppy that stuck out of the drive door. Prior to re-writing those sectors, we'd pull down fairly hard on the tape loop, which would slow down the physical rotation of the disk enough to create a write error.

Instant bad sector and instant pirated copy.

I hope there's a statute of limitations on this stuff.
posted by imjustsaying at 4:55 PM on April 23, 2010 [1 favorite]

*adjusts walker*
Back in the days of the Commodore 64, there were a number of software disassemblers that would take the binary machine code and convert it to more readable assembler. This only worked if the programmers didn't obfuscate their code. Some disassemblers would even (luxury!) assign labels to often-used memory locations and routines. There were also a few of the hardware memory capture devices (e.g. The Final Cartridge) that chairface alluded to.

There were also the kinds of physical hacks that my friend did. Some software came on diskettes where the last few tracks were written off-centre from the rest of the disk. My friend would manually adjust the drive servo motors to an unaligned state to get the data from those special tracks, then rewrite the loader software to put this in memory and then jump to the position in the main program after the disk check. Hence his nickname, "The Unaligned Head."
posted by Hardcore Poser at 6:07 PM on April 23, 2010

In the beginning, there were no compilers. All software was written directly into machine code by programmers. Higher level languages and compilers just save time and make life easier. The CPU is not a magical device: it has a very well-defined list of instructions that it is able to perform and machine code is just a sequence of those instructions, coupled with data for it to operate on. In theory, given enough time and inclination, a human being with nothing more than a brain and a large enough pad of paper could 'execute' any program simply by stepping through each instruction and simulating what the CPU would do. This is obviously impractical for anything but the most trivial of programs, especially considering that modern computers have operating systems and aren't 'bare metal', meaning all programs make extensive use of system calls and system libraries which would also need 'head simulating'.
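To make "stepping through each instruction" concrete, here is a sketch of a made-up four-instruction machine and an interpreter that executes it the way the pad-of-paper human would: fetch an instruction, do what it says, move the program counter, repeat. The instruction set is hypothetical; a real CPU works the same way in principle, with far more instructions, registers and memory.

```python
def run(program, trace=False):
    """Execute a hypothetical toy instruction set one instruction at a time."""
    acc = 0  # single accumulator register
    pc = 0   # program counter: index of the next instruction
    while pc < len(program):
        op, arg = program[pc]
        if trace:
            print(f"pc={pc} acc={acc} {op} {arg}")
        pc += 1  # by default, continue with the next instruction
        if op == "LOAD":
            acc = arg
        elif op == "ADD":
            acc += arg
        elif op == "JNZ":  # jump to instruction `arg` if the accumulator is non-zero
            if acc != 0:
                pc = arg
        elif op == "HALT":
            break
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return acc

# Count down from 3 to 0: LOAD 3; ADD -1; JNZ back to the ADD; HALT
countdown = [("LOAD", 3), ("ADD", -1), ("JNZ", 1), ("HALT", None)]
print(run(countdown, trace=True))  # one trace line per step, then 0
```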

So given that it's a theoretical impossibility to prevent someone from being able to do this analysis, it really comes down to making it as hard as possible to pull off. This generally means taking measures to detect when your program is being run in a debugger and refusing to continue or otherwise altering behavior. Likewise on the other side of the fence the problem becomes making debuggers or virtual machines that don't make their presence known.
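A toy version of the detection side, assuming the "debugger" is one that hooks in through `sys.settrace`, as pdb traditionally does. Real anti-debugging code inspects operating system and CPU state instead, and, as noted above, a debugger built to hide its presence defeats a check like this easily.

```python
import sys

def debugger_attached() -> bool:
    # pdb and many Python debuggers install a hook via sys.settrace, so a
    # non-None trace function is a (weak) sign that something is watching.
    return sys.gettrace() is not None

def protected_action() -> str:
    if debugger_attached():
        return "refusing to run under a tracer"
    return "normal behavior"

print(protected_action())  # depends on whether a tracer is already active

previous = sys.gettrace()
sys.settrace(lambda frame, event, arg: None)  # stand-in for a debugger's hook
print(protected_action())  # refusing to run under a tracer
sys.settrace(previous)     # restore whatever was there before
```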
posted by Rhomboid at 7:04 PM on April 23, 2010

Some tutorials:

The basic idea with most apps is to try a bogus registration code, take note of the error message, open the app in a hex editor, search for the error message, and follow the tutorials for more info.
posted by hungrysquirrels at 8:43 PM on April 23, 2010

As a professional software engineer, I often work with flawed code from vendors. I'm confident all is well in the world when my code hands off to theirs, but somewhere along the line it goes horribly wrong. An important skill is to be able to cross that line: to step away from the soft, comforting realm of source-level debugging into the cold, harsh reality of assembly/machine language debugging. This lets me call up the vendor and report bugs in excruciating detail: not just that something's wrong, but where it's going wrong. The funny thing is that if you do this enough, you can learn to reinterpret code back into the high-level language that generated it.

This is the same thing, if you consider DRM to be wrong and assume correctly that the vendor doesn't want you to remove it.
posted by plinth at 4:14 AM on April 24, 2010

Lots of information here:
posted by turkeyphant at 12:35 PM on April 27, 2010


Are compiled objects much less opaque than I understand them to be?

No, they are actually quite opaque. In some cases, they are even purposely obfuscated by the compiler itself. However, you don't need the entire program. You just need to trick or bypass the very small part that is locking you out. If the only check is at the beginning, it is sometimes as simple as loading the startup executable into memory and moving the instruction pointer somewhere further down the line. You don't necessarily even have to know where you are, but if you land in a "good" block and execution continues, you are pretty much done, I would think.

If we don't know what language the software was coded in or what compiler turned it into software, how do software engineers explore its internal workings?

Binary <>
How do programmers modify that software when they don't have the source code?

They modify the executable and distribute a "patched" or "cracked" version of said file which bypasses the checks.


A debugger is a program...

Just a thing about wording here. The debuggers I think most of us would be familiar with are symbolic or source level debuggers, and generally target high level source code. Low-level debuggers target machine code, and provide you with pointers to the disassembled code. Often you would use a disassembler or a decompiler to go backwards, from machine code to assembly or even back to some semblance of the original source and reverse engineer it.

Debuggers are not nefarious

This is true, but we are not talking about the same kind of debuggers that are used in most software development. Decompiling copyrighted software without permission is illegal in most countries, and so is distributing pirated software.
posted by sophist at 11:19 AM on June 12, 2010

Looks like that got parsed wrong. Should be...

If we don't know what language the software was coded in or what compiler turned it into software, how do software engineers explore its internal workings?

Binary - Assembly - High Level Language. If you start off with a binary you might be able to get some part of the assembly reversed, and look at that. If you want to go higher, you would need to know what language it was written in.
posted by sophist at 11:21 AM on June 12, 2010
