my new temporally-based compression scheme is unstoppable!
May 10, 2005 2:40 PM   Subscribe

This is one of those questions that will make me look very silly when it's answered, so I'll abandon any pretense of dignity straight off: with the ideas set forth in this comic about talking clip-art dinosaurs, why can't any arbitrarily huge amount of data be transmitted as a simple statement of a date and time?
posted by squidlarkin to Technology (31 answers total)
They can, but you would need as many bits to represent the date and time as you would the original data. For example, if you wanted to go with seconds since (say) the first day of 1 AD, you would have passed 63 billion only around the year 2000. (This is approximate since it doesn't take into account leap days.) 63,000,000,000 and "12:00 AM January 2000" take the same number of bits. (They also take close to the same number of characters, but this is incidental.)
posted by kindall at 2:46 PM on May 10, 2005

The logic suggested (but never quite elaborated) by that comic is that every piece of information essentially has a serial number assigned to it. If both parties already have the entire corpus of possible information, then yes, you could refer to the serial number for a given piece using the technique the dinosaur describes.

The bitch is getting all that information to both places.

If you want to look at it as a compression algorithm, then the algorithm itself is so hugely complex that it is bigger than all the information you would transmit to be decompressed by it.
posted by adamrice at 2:47 PM on May 10, 2005

Well, you may feel stupid asking the question, but I feel stupid reading it. I tend to be very literal, so it's probably my fault, not yours -- but I don't understand your question. Normally, I would just keep quiet. But on the offchance that others share my confusion (or can clarify your meaning for me), I'll dive in. (By the way, I'm pretty sure I understand binary coding and compression. It's the wording of the QUESTION that confuses me.)

You ask if data can be transmitted AS a date and a time. We transmit data AS something other than itself because transmitting it AS itself is difficult or impossible. (i.e. I send my thoughts to my friend AS email because it's impossible for me to send them as thoughts -- my friend isn't psychic.)

So we're talking about representation, right? Can one REPRESENT ideas AS something other than the actual ideas?

And the answer -- correctly given in the comic -- is that any ideas expressible in characters (i.e. text / numbers), and some other ideas (i.e. colors in an image) CAN be represented with just TWO symbols. Another common example, besides binary digits, is morse code. To translate any statement into morse code, you need only two symbols, a dot and a dash. You don't need a dot and a dash and a squiggle. We TEND to add more symbols, but that is just for shorthand: HELLO is shorter than .... . .-.. .-.. ---, but they are equally good representations.

In English, we represent dates/times in various ways, i.e. 3:30pm November 15, 1965. In that example, there are 18 discreet symbols. Since we only need TWO symbols to express anything, then of COURSE we can express anything as dates/times. So you can't mean "can we express anything using only the symbols we use to express dates and times in English."

But ANY way we express dates/times -- English, binary, Morse code -- will involve AT LEAST two symbols. Which is enough to express anything.

So what do you mean?
posted by grumblebee at 3:10 PM on May 10, 2005

This also would only work if you had instantaneous transfer or transfer with a known and stable delay.

Otherwise, you'd have to transmit the time/date stamp, which, as kindall has pointed out, would take the same number of bits as your target data. (Actually, it would take twice as many bits because you'd be sending two sets of date data to be subtracted from each other to reach the target binary).

Also, I think adamrice's explanation is missing the fact that the only information we're interested in is binary. You wouldn't need a decryption "book" listing code for every possible piece of information. If, for instance, your transferring a video file, that's what the codec/media-player is for.

On preview, I think grumblebee is also missing the subtraction thing, which might mean I'm wrong. I assumed the set-up was two pieces of small, meaningless data sent at precise times. Those times represent two binary numbers. By subtracting those two binary numbers we get a third binary number, which is the file actually being "transmitted." Obviously, the level of precision would need to be immense.
posted by nobody at 3:13 PM on May 10, 2005

This is an old idea that I saw presented as making a mark on the metal bar. Take the ratio of bar length and mark location, write it out as a decimal, and you've got a number that's as big as you please. I think the example talked about turning all the information in the world into a decimal, thus encoding human civilization in a single mark. But then you want to know how big the bar has to be to ensure an accurate reading of the mark.

If the information you're sending is 'fuzzy' then what the dinosaur is describing is just frequency modulation. In fact what the MSPaint dinos describe sounds exactly like how laserdiscs encode their video signal. Good for sending vast amounts of analog, not so great for sending bank transactions, unless you encode digital over the analog, with some kind of fictional frequency "modulater/demodulater" (scoffing snort).
posted by fleacircus at 3:14 PM on May 10, 2005

damn: "If for instance, you're transferring..."
posted by nobody at 3:14 PM on May 10, 2005

The logic suggested (but never quite elaborated) by that comic is that every piece of information essentially has a serial number assigned to it. If both parties already have the entire corpus of possible information, then yes, you could refer to the serial number for a given piece using the technique the dinosaur describes.

I don't think that's quite it. I think T-Rex's point is that every piece of information already has a serial number assigned to it. Just convert the html code for this web page, for instance, into binary, and you have its "serial number". Decoding is simple: it's just a binary-to-html conversion.

I think kindall has the answer: to represent an n-bit binary number would require a date with n bits of precision. You don't gain anything.

I think grumblebee is also missing the subtraction thing, which might mean I'm wrong.

Yeah, you don't need to transmit two dates. Both parties know the zero date, presumably to infinite precision. In fact, let's just call the zero date '0'.
posted by mr_roboto at 3:20 PM on May 10, 2005

I've heard this same idea expressed in another way: mailing someone a 1-inch ruler with a nick in it. Measure the fraction of the inch where the nick occurs very precisely and you get a long string of numbers: .3854750494850493804598340598 of an inch. Etc.

It's more of a neato little illustration and less of a practical reality. There's a small issue of the number of significant figures you can measure to. Still, this could come in handy someday for extremely low-bandwidth applications like interstellar communication.

Perhaps aliens are sending us START / STOP bits constantly and we just haven't recognized and decoded them.
posted by scarabic at 3:26 PM on May 10, 2005

If you have reliable high resolution timers it's not a bad idea. You need not represent the whole file as one number, you can represent it as a series of numbers. Say, for example, that you break the file down into, I don't know, 4 byte numbers. If you had a timer that had adequate reliability down to the microsecond (1 millionth of a second) then to transfer the largest possible number would take 4294 seconds. Whoops. OK, if you had a timer reliable to the femto-second you could transfer it in 0.0042 seconds. Not too bad.
Except this only allows you to send 928 bytes a second. Whoof. In comparison you can download, easily, 150,000 bytes/seconds with DSL.

To be clear, the idea espoused in the comic is that I send a "START" and a STOP command, nothing else. Really, just two transmissions of one bit each would be fine. You measure the time between when I say go and when I say stop, and the number of clock ticks in between those two represents the binary number sent. Let us pretend that you don't have to worry about network latency.

Anyway, if you have a REALLY REALLY good accurate clock this works fine but as demonstrated above you're going to have to be able to measure time in the, say, 1e-16 seconds or so range before it would be better than DSL
posted by RustyBrooks at 3:36 PM on May 10, 2005

This concept appears in Frederik Pohl's (I think) "The Gold at Starbow's End," where it's called Goedel-izing. There, the characters did it by describing a very big number in the simplest mathematical form, so BigDoc was 12.65 to the 1343rd power, divided by three, minus 12, ignore the decimal point, or something like that.
posted by ROU_Xenophobe at 4:31 PM on May 10, 2005

IANAAstrophysicist If you're looking at the start/stop strategy as a method for interstellar communications, especially if accuracy to the 1e-16 second is your goal, you're going to start running into problems with gravity's effect on time. Send a bit too near to a star or black hole, and you're inevitably going to have a delay.

I think.
posted by The White Hat at 4:34 PM on May 10, 2005

Experimental realization. This is a bread and butter method of fast particle counters/analyzers in high energy physics experiments.
posted by fatllama at 4:39 PM on May 10, 2005

Not to drift too much, but the previous entries reminded me another compression scheme that might be of interest to you- arithmetic encoding.

The basic principle behind arithmetic encoding is that a string can be stored as a precise fraction pointing to subinterval sequences for specific symbols. Instead of using a probability distribution function [P(message) = P(symbol 1) * P(symbol 2) * ... * P(symbol n)], it uses a cummulative density function [probabilities added together in increasing order].

Without getting into details, all that needs to be transmitted is the alphabet, the probability distribution model and the high precision fraction. A function of low and high end probability intervals is used to decode each symbol. Obviously it worked best (shorter binary numbers) when the probabilities of symbols is skewed. It's a very common coding scheme for video and images.
posted by gaelenh at 4:55 PM on May 10, 2005

I asked a similar question on years ago. Since Pi is a random string of numbers (up for debate, but let's assume it was) and it can be calculated out to any decimal place using existing accurate algorithms, why can't I simply refer to an index and a length to specify my file as found in Pi? For instance, if the message I want to transmit is "979323846" then I could save space by only telling you "13, 9" ... which you would already know means "Calculate digits of pi beginning at the 13th digit, then calculate the next eight, so you get a message of length 9"

That's convenient, and all, because it uses the "existing data store" that everyone has access to, and simply references it, as others have mentioned in their answers. The problem is, this only works in theory. If I wanted to "compress" a file using my Pi-index algorithm, it would take an extremely long time to even find my file's bit-pattern (as represented in base-10) in the digits of Pi, assuming it was even there at all.

There are other problems, too, I'm sure.
posted by odinsdream at 5:09 PM on May 10, 2005

I imagine that the pi-based scheme would generally generate "compressed" data that is much larger than the original.

For example, to encode "11", you'd need to encode 94, 2.
posted by neckro23 at 5:40 PM on May 10, 2005

At risk of a derail, (because this has already drifted so close to a question that has been nagging me forever)...

Could the digits of pi be used as a reliable source for encryption ala one-time pad, where the source is encrypted by adding the digits of pi beginning at a certain place to the plaintext, and then using a pre-arranged algorithm to know where in pi to begin, decrypt?
posted by BleachBypass at 5:51 PM on May 10, 2005

This page does a good job (at least I think so, but I'm not a mathematician) of explaining why it is not possible to compress random data and why ideas like Odinsdream's, when applied to random data will not work.

For fun, go here to search Pi for any number you wish.
posted by darksquirrel at 5:57 PM on May 10, 2005

Could the digits of pi be used as a reliable source for encryption ala one-time pad, where the source is encrypted by adding the digits of pi beginning at a certain place to the plaintext, and then using a pre-arranged algorithm to know where in pi to begin, decrypt?

This only works if you know more digits of pi (or can generate them more quickly) than other people. Unfortunately the digits of pi are a subject of mathematical interest, so we know a lot of them and have come up with relatively efficient algorithms for generating more. It is relatively easy to use the digits of pi as this sort of key if you know you're looking a key contained in the digits of pi.
posted by kindall at 6:20 PM on May 10, 2005

It is relatively easy to use the digits of pi as this sort of key if you know you're looking a key contained in the digits of pi.

It is relatively easy to crack such a message, I mean.
posted by kindall at 6:21 PM on May 10, 2005

BleachBypass, yes digits of pi can be used for encryption. I don't know of any situation where the digits are the soul source of encryption, but the randomness of pi is an effective tool in the field. Blowfish, an iterative block cipher similar to AES, uses hex values of pi digits to increase diffusion.
posted by gaelenh at 6:27 PM on May 10, 2005

Hey fleacircus - I got it.

(Then I snorted too.)
posted by attercoppe at 7:17 PM on May 10, 2005

What a far out question - and interesting responses.

My feeling is that the trouble with the "transmitting two numbers" theory is in significant digits. A number long enough to encode a serious amount of information would have to be measured in a large amount of very tiny units. A time difference in even seconds can't realistically hold much information, so your level of precision would have to be pretty high. And I bet it turns out that to transmit numbers of sufficient precision to describe something you end up transmitting about the same amount of data.
posted by 31d1 at 7:47 PM on May 10, 2005

In related numerical fun, certain numbers may be illegal. Famously, the DeCSS program is illegal due to misguided copyright law (IANAL) and, since it can be represented as a large (prime, in fact) number, it could be illegal to give that number to people.
posted by callmejay at 9:53 PM on May 10, 2005

Another difficulty in the "transmit two very precisely timed ticks" scheme for transmitting an arbitrarily large chunk of data in an finite time: even if you have an ideal, drift-free, infinite-resolution clock, it turns out that to prevent the 'ticks' from getting smeared out, you need an infinitely wide-bandwidth communications channel to start with. Continuing down this road leads to information theory.

Re the use of π in crypto: IIRC, it's sometimes used when the algorithm needs some random-looking constant, and in order to deflect speculation that the specific constant was carefully chosen to have some nefarious property, the digits of pi are used. I think this is the case with Blowfish.
posted by hattifattener at 11:25 PM on May 10, 2005

BleachBypass: one time, dude. In your example, pi is your pad.

In fact, this is even worse than reusing a onetime pad a couple of times - pi never changes.
posted by i_am_joe's_spleen at 1:00 AM on May 11, 2005

Well, now I feel thick. I read the cartoon twice, I read all the responses here... and I still have no idea what the hell is being said. Represent any data as a single statement of time... I mean, HUH? Sooo... if I want to represent the complete works of William Shakespeare as a statement of time, how the hell do I do that?

Is it just that I haven't had enough coffee today or is senility advancing apace?

I have an honours degree in astrophysics and an IQ of 170, btw. And I shall cling to those things as I drown in tears of self-loathing.
posted by Decani at 7:30 AM on May 11, 2005

if I want to represent the complete works of William Shakespeare as a statement of time, how the hell do I do that?

Well, let's take a simple example that can easily be ramped up to yours. If you want to transmit a single word, like "pie", in electronic form, you have to eventually get it to the point of 1's and 0's that electronic devices understand. This is like transmitting something in morse code, only using a variety of codes. The reason we use 1's and 0's is that they can be represented by electrical On or Off signals.

If we want to transmit someone a literal "1" or "0", that's easy, we just turn the switch on or off. If we wanted to send someone a "2", we have to get more creative. This is where base-2, or binary, notation comes in. We can convert the number in to binary form, and we get "0010." So, now, to send the number "2", we leave the switch off for two units of time, flip it on for the third unit of time, and flip it back off for the fourth unit of time.

Now, how do we transmit "pie" or "the complete works of william shakespeare" ? Well, first we have to agree on a method to map the set of letters in our message to numerical codes. Enter ASCII, which is a letter-number map that lots of people have agreed on. Using this, we convert "pie" in to the decimal equivalents: 112, 105, 101.

We need to go one more step, though, and put those numbers in to binary so that they can be transmitted with on-off signals: 1110000, 1101001, 1100101. Now we can flip the switch on and off according to that pattern, and if the receiving end knows ASCII and understands that each character should consist of eight bits, we're in business!

As far as representing "pie" as a number, that number would be
111000011010011100101, or 1848549 in decimal notation.
posted by odinsdream at 11:37 AM on May 11, 2005

Get a binary file of the complete works of Shakespeare. Convert that very large binary number into a very large decimal number. Use that number as a number of (milli-/micro-/nano-)seconds, then write out the result time.

The catch, as others have said, is in the significant digits.
posted by squidlarkin at 11:39 AM on May 11, 2005

...So I could transmit the word "pie" to you by using the clip-art-dinosaur-stopwatch method if I said "START" right now, then 21.395243055555555555555555555556 years from now, exactly, I said "STOP!"

You'd take the elapsed time in seconds, convert it to binary form, split it into groups of eight bits, then run it backwards using an ASCII table and get the word "pie".
posted by odinsdream at 11:39 AM on May 11, 2005

*note in my example above, the binary representations of the letters of "pie" should each have a preceeding 0 on them, to make them officially eight bits. My apologies for your and your family's suffering.
posted by odinsdream at 1:26 PM on May 11, 2005

This reminds me of a paper by a professor of mine on Probabilstic Packet Marking.

The idea is that given a network layout, a receiving computer can determine a path through the network if the routers along that path flip a single bit in the packet with some probability p based on their position in the network. Once the receiver has collected enough packets (2^n^ where n is the length of the encoded path-string) the probability of the bit being flipped in the collected packets encodes the path through the network.

Furthermore, for multiple paths, more bits are necessary (log(2k-1) where k is the number of paths through the network, in fact).

If that was confusing (and I'm pretty sure it was), try this, straight from the horse's mouth: (Danger: Computer Scientists and Other Probabilty Freaks only)
posted by ThePants at 10:01 AM on May 12, 2005

« Older HTML for Web page counter?   |   Getting started with Applescript Studio Newer »
This thread is closed to new comments.