Source Files?
January 15, 2004 6:20 AM
Why (in general) do you need source files? Why can't applications manipulate binary files directly? [more inside.]
Let's say you have an hour-long QuickTime video file, and for its entire duration, the date 2003 appears in the upper right corner. As far as I know there's no video manipulation software that will allow me to change that date to 2004 without rerendering the entire video.
Why is this so? I have a very simplistic grasp of binary, but presumably inside the quicktime file, there are a bunch of ones and zeros (encoding the colors of the pixels). Why can't software be written that will reach inside the file and change just those ones and zeros that encode colors for the pixels in the upper-right corner?
This MUST be possible in theory. It's all bits, and bits can be flipped. So what's the stumbling block? Is it just that binary code is too difficult for humans to understand, so writing such software would be monumentally difficult?
Incidentally, I'm aware that even if this were possible, it wouldn't be desirable in most circumstances. The "text" 2003 wouldn't be editable text, it would be pixels in the shape of letter forms. And it would also be overlaid on top of images, so even if you slapped a new bunch of pixel-text (2004) into the place where the old text used to be, the composite would look terrible. My question is more theoretical. And really, video is just an example.
The more general question is: why is it so rare to find programs that can manipulate binary files? If you want to change the wording of some text in a swf file, why do you have to go back to the source fla?
If the video file only contained the series of bits that needed to be fed into the video buffer to produce the moving image, then, yes, it would be fairly trivial to write a program that imprints a particular overlay onto the video image.
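To make the "fairly trivial" case concrete, here is a minimal sketch. It assumes a made-up raw format (one byte per grayscale pixel, frames stored back to back, no header, no compression); the sizes and the function name are invented for illustration and have nothing to do with QuickTime.

```python
# Hypothetical raw video: each frame is WIDTH * HEIGHT one-byte grayscale
# pixels, stored back to back with no header and no compression.
WIDTH, HEIGHT, FRAMES = 8, 4, 3

def stamp_corner(video, patch):
    """Overwrite the upper-right corner of every frame with `patch`, in place.

    `patch` is a list of equal-length byte strings, one per pixel row.
    """
    frame_size = WIDTH * HEIGHT
    patch_width = len(patch[0])
    for f in range(FRAMES):
        for row, pixels in enumerate(patch):
            # Byte offset of this patch row inside frame f.
            start = f * frame_size + row * WIDTH + (WIDTH - patch_width)
            video[start:start + patch_width] = pixels

video = bytearray(WIDTH * HEIGHT * FRAMES)        # an all-black video
stamp_corner(video, [b"\xff\xff", b"\xff\x00"])   # a tiny 2x2 "overlay"
```

Because every pixel sits at a position you can compute with plain arithmetic, the file never has to be decoded, and its size does not change.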
It's more complicated than that, though.
For one thing, let's say your "2003" text didn't have perfectly sharp edges (i.e. it's antialiased). Then you're going to need some sort of edge detection to make sure that you're changing video values within the boundaries of the text. You're also going to need some way to add in the background information so that pixels that were in the "2003" but were not in the "2004" could be built up by looking at what the background is, over time (this could get hairy).
The real problem, though, is encoding. Most files, especially video and audio files, use heavy-duty encoding to reduce the file size. For instance, a wav file may be 10 times larger than an mp3 file that contains similar content. This sort of encoding can't be easily changed without:
1. Decoding the encoded file (sometimes it would be possible to decode only part of the file, sometimes not).
2. Doing whatever manipulations you want to do.
3. Reencoding the file.
So you see, this sort of manipulation gets hard to do pretty quickly.
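The three steps above can be sketched with Python's standard zlib module. This is only an illustration: zlib is lossless general-purpose compression, not a video or audio codec, and the sample data and the `edit_compressed` helper are made up.

```python
import zlib

def edit_compressed(encoded, old, new):
    """Change bytes inside compressed data via decode -> edit -> re-encode."""
    raw = zlib.decompress(encoded)   # 1. decode the encoded data
    raw = raw.replace(old, new)      # 2. do the manipulation on the raw bytes
    return zlib.compress(raw)        # 3. re-encode

# Made-up "file": a small header followed by repetitive payload.
original = b"header|date=2003|" + b"pixel data " * 200
encoded = zlib.compress(original)

edited = edit_compressed(encoded, b"2003", b"2004")
```

With a lossy codec such as mp3, step 3 would also throw away a little more quality each time, which zlib does not do.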
On a side note, one of the reasons things like XML get so much hype is that XML data files are (usually) human readable. XML rocks for text and numerical data, but is not so good for image, sound or video.
posted by bshort at 6:40 AM on January 15, 2004
The more general question is: why is it so rare to find programs that can manipulate binary files? If you want to change the wording of some text in a swf file, why do you have to go back to the source fla?
I'm not too sure of swf, but for most binary software you can change some text using a 'hex editor'. I've seen some video games translated from Japanese to English by people without access to the source, by just manipulating the binary file.
The problem is the results might be unpredictable and introduce bugs and crashes. Could you edit some wording in a swf file? Probably. If it was your own program, it might be simpler to have the program grab its text from a human-readable text file; then you wouldn't have to recompile each time you change some text.
posted by bobo123 at 9:53 AM on January 15, 2004
Response by poster: Thanks for all your answers, folks. Bobo, I'd assume that the reason a hex editor would likely introduce bugs and crashes is that the user wouldn't know anything about the structure of the binary file he was editing. Most binary editors are presumably "hacking tools," not made by the vendors of the original software.
But here's a different scenario: how hard would it be for Macromedia to make a program that would let you edit a swf and, say, change some text? Macromedia created the swf format, so presumably they know what would be safe to tamper with and what wouldn't.
I'm imagining a tool that wouldn't show you hex or binary. It would have a GUI that would list all the text stored in the file and allow you to make changes.
I'm not sure I would actually have use for such a file. I'm just interested in how hard it is for the original programmers to reach into their own binary code.
posted by grumblebee at 12:22 PM on January 15, 2004
I think the problem is that, depending on how the video is encoded and compressed, the encoder might do things such as storing, for a particular frame, only what has changed since the last frame, in order to save space. Therefore if you change one frame, you need to change the previous frame and the next frame, and the ones previous and next to those, and so on. It can get pretty messy, so it's a lot easier to decode the video into a format that is easier to change and then reencode it. Sorry, this is just an educated guess based on some CS courses I've taken, but I think it's correct.
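A toy version of "only encode what has changed since the last frame" shows why a single edit does not stay local. This is a made-up scheme for illustration (the first frame stored whole, every later frame stored as per-pixel differences); real codecs are far more elaborate.

```python
def encode(frames):
    """Store frame 0 whole, then only per-pixel differences (mod 256)."""
    encoded = [list(frames[0])]
    for prev, cur in zip(frames, frames[1:]):
        encoded.append([(c - p) % 256 for p, c in zip(prev, cur)])
    return encoded

def decode(encoded):
    """Rebuild every frame by adding each delta to the frame before it."""
    frames = [list(encoded[0])]
    for delta in encoded[1:]:
        frames.append([(p + d) % 256 for p, d in zip(frames[-1], delta)])
    return frames

frames = [[10, 10, 10], [10, 20, 10], [10, 20, 30]]

# Naive edit: change pixel 0 in the stored data of frame 0 only.
encoded = encode(frames)
encoded[0][0] = 99
decoded = decode(encoded)    # pixel 0 is now 99 in every frame

# To change frame 0 alone, the following delta must be rewritten too.
fixed = encode(frames)
fixed[0][0] = 99
fixed[1][0] = (frames[1][0] - 99) % 256
```

In this toy scheme the damage runs forward from the edited frame, so a careful in-place edit has to patch the neighbouring data as well, which is roughly the messiness described above.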
posted by gyc at 12:56 PM on January 15, 2004
I'm not sure I would actually have use for such a file. I'm just interested in how hard it is for the original programmers to reach into their own binary code.
It wouldn't be hard for them to read any given data from an swf file, since they have the spec for the format. I'd guess that they probably already have some in-house functions that can translate human-readable data to and from swf files. This said, I agree that if they were going to make an "swf editor", they would most likely decode the entire file, allow the user to make changes, and then reencode everything to a new file. They probably would not edit the original file in-place. If you know enough about a binary file's format to fully decode it, editing it in-place usually isn't worth the extra trouble. As gyc and bshort note, the amount of extra trouble can become pretty large when you're talking about video files.
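As a rough sketch of the "decode the entire file, edit, reencode to a new file" approach, here is a made-up string-table format (a 2-byte count, then each string as a 2-byte length plus UTF-8 bytes). It is not the SWF format, and the function names are invented for the example.

```python
import struct

def decode_strings(blob):
    """Read the hypothetical format: count, then (length, bytes) per string."""
    (count,) = struct.unpack_from("<H", blob, 0)
    pos, out = 2, []
    for _ in range(count):
        (length,) = struct.unpack_from("<H", blob, pos)
        pos += 2
        out.append(blob[pos:pos + length].decode("utf-8"))
        pos += length
    return out

def encode_strings(strings):
    """Write the same hypothetical format back out from scratch."""
    parts = [struct.pack("<H", len(strings))]
    for s in strings:
        data = s.encode("utf-8")
        parts.append(struct.pack("<H", len(data)) + data)
    return b"".join(parts)

old_file = encode_strings(["Play", "Copyright 2003", "Quit"])

strings = decode_strings(old_file)      # decode everything
strings[1] = "Copyright 2003-2004"      # edit in a friendly representation
new_file = encode_strings(strings)      # re-encode to a new file
```

Since every length field is regenerated on the way out, the new text can be any size without special handling.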
By the way, I found a Flash Decompiler that you might like to play around with. I know that the swf thing is just an example, but it might be a fun way to get an idea of just how much a programmer can get out of a binary file, given some idea of the format.
posted by vorfeed at 3:08 PM on January 15, 2004
In general, a compiler translates source code into binary and in the process makes all sorts of optimizations which may render it unreadable and in fundamental ways different than the more readable source code.
Compilers optimize for the particular platform they target, and reverse-engineering the original algorithm from a long series of arithmetic operations is almost impossible.
That said, it depends on the degree of optimization. Data stored within the file (such as text, clips, etc.) is just that, data, and is easier to manipulate in a binary than code is.
posted by vacapinta at 3:14 PM on January 15, 2004
The real reasons programs are designed to have separate "source" files are twofold:
1) The source file can contain material that you would not necessarily want to distribute with the final version of the file for a number of reasons, e.g. performance, file size, etc. You might want to compress graphics for distribution, for example, but you sure wouldn't want to keep them in JPEG format while you're working on them, as each load/save cycle would reduce quality. And comments in a compiled program would be dead weight, since the computer ignores them.
2) Having separate source and object file formats provides a modicum of protection for digital content. People can't easily modify your SWF files to claim credit for them, for instance, or steal your proprietary Flash tricks for their own work.
posted by kindall at 3:50 PM on January 15, 2004
But here's a different scenario: how hard would it be for Macromedia to make a program that would let you edit a swf and, say, change some text. Macromedia created the swf format, so presumably they know what would be safe to tamper with and what wouldn't.
How hard it would be depends a lot on the specific structure of the binary file. bobo123 brought up some good points on that.
One specific example: consider a binary file that contains some text you want to change. The text is stored as a series of bytes. The program expects the section of data where your text lives to be packed in a certain way: each entry in a long list of text entries has a specific length, and yours starts exactly FOO bytes into the binary file. When some part of the program refers to that bit of text you want to change, it counts FOO bytes into the file and then reads BAR bytes, the exact length of the text in question.
Now, assume you go in and change the length of that specific bit of text. Say your new text is three bytes longer than the old text. Now every bit of data in the binary file after that is going to be three bytes out of place. Any attempt to read any of that data is going to lead to nasty corruption.
And the program might still be expecting the text you've changed to be its original length, which might lead to bad things happening when it tries to manipulate the text you changed.
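Here is a small sketch of that failure, using a made-up layout in which a header records an (offset, length) pair for each piece of text, playing the roles of FOO and BAR. The format and the helper names are invented for illustration.

```python
import struct

def build(entries):
    """Hypothetical file: count, then (offset, length) per entry, then bytes."""
    header_size = 2 + 4 * len(entries)
    table, body, offset = b"", b"", header_size
    for e in entries:
        table += struct.pack("<HH", offset, len(e))
        body += e
        offset += len(e)
    return struct.pack("<H", len(entries)) + table + body

def read(blob, index):
    """Do what the program does: seek to the stored offset, read the stored length."""
    offset, length = struct.unpack_from("<HH", blob, 2 + 4 * index)
    return blob[offset:offset + length]

blob = build([b"Hello", b"World", b"Bye"])

# Naive edit: splice in text three bytes longer, leaving the table untouched.
start, length = struct.unpack_from("<HH", blob, 2)
broken = blob[:start] + b"Hello!!!" + blob[start + length:]

print(read(broken, 0))   # b'Hello'  (stale length cuts off the new text)
print(read(broken, 1))   # b'!!!Wo'  (everything after is three bytes off)
print(read(broken, 2))   # b'rld'
```

A correct editor would have to rewrite every offset and length that follows the change, which in practice means regenerating the file.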
And maybe the text you changed is used not only as text for being displayed on screen, but also as a bit of unrelated data for some other portion of the program. Even if you replaced the text with some other text of the exact same length (avoiding some of the previous issues), you've fucked the program elsewhere.
And so on. Binary files are, generally, a tool for trading flexibility/accessibility for compactness/security/efficiency. With specific exceptions (raw binary files, such as uncompressed images or uncompressed audio), a binary file isn't made for editing so much as for just using as-is.
posted by cortex at 4:46 PM on January 15, 2004
All valid points above, but I think the most significant reason is just that if there is a source format for some file, most people are going to be working with that, and so the demand for an object-file-manipulating tool is small. The source format is designed to be manipulable and understandable, so it's almost always going to be easier to perform a given manipulation on the source than on the object.
With respect to Flash, I remember some parts of the SWF format from reading a reverse engineering project's notes, and it wouldn't be impossibly difficult to write something to tweak a Flash file. The most general solution is probably to write a decompiler, though.
With respect to video, I think mplayer can add overlays while 'displaying' into a file (I know it can do each operation independently, I assume it can combine them). You'd probably lose some quality by decoding and recoding the video, unless one or both of the formats were lossless, but you could add the overlay.
Some formats, like PDF, are designed to be easy to manipulate in certain ways.
In any case, any object file is designed to be read by some software somewhere (a video player, a Flash plugin, the OS that runs a program, etc.), and so that software, and that software's author(s), must understand the structure of the file.
posted by hattifattener at 12:50 AM on January 16, 2004
If a human is involved somewhere in this, then the data must be presented in a human-readable form. The designer decides whether that data comes from a source file or from a translated binary file.
posted by mischief at 6:37 AM on January 15, 2004