Why do ordered lists in open source documentation start with zero?
April 2, 2004 12:34 PM

I've been reading a lot of open source software documentation lately, and I'm wondering why all the ordered lists start with zero? Here's an example. Is it a programming thing that counting is supposed to start at zero? An international thing (is it American to start counting 1, 2, 3?). I've seen at least three examples in the past week and I'm curious.
posted by mathowie to Computers & Internet (30 answers total)
 
Best answer: It's because in most computers, lists (arrays and such) naturally start at index 0. Since things are just laid out in memory, if you want to get at the first item of list l, you use the address of l plus 0. (Various languages, like Basic and Pascal, have, by default, made it so arrays are based on an index of 1.)
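
To make the "address of l plus 0" idea concrete, here is a minimal sketch in Python. It only simulates memory with a bytearray; the names `memory`, `base`, `ITEM_SIZE`, and `item_at` are made up for illustration, and real languages do this arithmetic for you behind the index operator.

```python
import struct

ITEM_SIZE = struct.calcsize("<i")  # 4 bytes per little-endian 32-bit int

# A hypothetical list l of three ints, packed back to back in pretend memory.
memory = bytearray(struct.pack("<3i", 10, 20, 30))
base = 0  # assumed address of l inside the pretend memory

def item_at(index):
    """Read the item stored at address base + index * ITEM_SIZE."""
    (value,) = struct.unpack_from("<i", memory, base + index * ITEM_SIZE)
    return value

first = item_at(0)   # base + 0: the first item
second = item_at(1)  # base + one item's width: the second item
```

With this layout, index 0 needs no adjustment at all, which is roughly why zero feels like the natural starting point at this level.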

Also, if I remember correctly, in several (some? most?) of Donald E. Knuth's papers/books, he numbers the chapters starting at 0, so it might have started with him. I'm sure he did it for the above reason. To answer your "international" question, I guess it's vaguely a Computer Science thing. :)
posted by skynxnex at 12:40 PM on April 2, 2004


Best answer: It surely must be a sort of in-joke for programmers, since only programmers would get the reference to array indexes, which always start at zero. More here.
posted by ericost at 12:42 PM on April 2, 2004


0 is the first "real" integer, yes?
posted by maniactown at 12:58 PM on April 2, 2004


yeah, it's just a geek thing, i think.

i don't imagine it's cultural - certainly not european or what i've seen of s america, and i guess that in cultural terms zero is a pretty new number anyway.

also, if there's a zeroth item it's often joking about a basic assumption.

(oh, and arrays don't always start at zero - fortran default is 1; '"real" integer' is not well defined, but zero is the first natural number)
posted by andrew cooke at 1:03 PM on April 2, 2004


More from the Jargon File.
posted by bshort at 1:11 PM on April 2, 2004


Along with the array index inside knowledge, the zeroth item is often used as a sort of preface, a "things you should know, do, or have prepared before you begin" step in the documented process.
posted by Danelope at 1:15 PM on April 2, 2004


Yeah, it's a programmer injoke of sorts, probably due in large part to Knuth. As andrew cooke says, some languages start arrays at 0, and others start at 1; some (like Perl) let you switch back and forth on the fly...

The reason that many languages start at 0 is just that it makes a lot of things slightly simpler. You spend a little less time adding and subtracting 1 from indices and counts and offsets and things.
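
A small sketch of the kind of arithmetic this refers to, written in Python as an illustration only. The grid width and function names are hypothetical; the point is just to compare the same calculation with and without the extra plus and minus 1.

```python
WIDTH = 4  # hypothetical grid width

def flat_index_zero_based(row, col):
    # Rows, columns and result all counted from 0: no correction terms.
    return row * WIDTH + col

def flat_index_one_based(row, col):
    # Rows, columns and result all counted from 1: subtract 1, then add it back.
    return (row - 1) * WIDTH + (col - 1) + 1

# Zero-based, half-open ranges: the length is simply stop - start.
items = list("abcdefgh")
start, stop = 2, 5
chunk = items[start:stop]
```

Both functions pick out the same cell; the one-based version just carries three extra adjustments that are easy to get wrong.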
posted by hattifattener at 1:21 PM on April 2, 2004


Response by poster: So for open source documentation -- which is usually scant -- they choose to number lists in a jokey fashion that only other software engineers would get.

And people wonder why regular folks don't use open source software.
posted by mathowie at 1:34 PM on April 2, 2004


Since when did "regular folks" read documentation?
posted by electro at 1:49 PM on April 2, 2004


I'm not sure it's always a joke. I have known programmers that actually do start counting at zero out of habit. Since most open-sourcers don't have the benefit of a tech writer or copy editor, the documentation tends to be poorly written from a "regular user" point of view. Ideally, they should probably run the documentation by someone who is not tech savvy and see if it's understandable to them.
posted by sixdifferentways at 1:51 PM on April 2, 2004


Since when did "regular folks" read documentation?

Since there's no way to install (or in fact use) open-source software without it.
posted by jjg at 1:56 PM on April 2, 2004


And people wonder why

(i've edited a dozen replies to this that either seemed as snitty as your comment or too pompous - i just want to say that not everyone who shares code cares whether "regular folks" use it or not. i put stuff out there in case people find it fun, is all, and worrying whether mathowie gets a joke or not isn't going to change whatever docs it gets)
posted by andrew cooke at 2:36 PM on April 2, 2004


Mathematicians have thought about this. First, there is a difference between ordinality and cardinality in counting. It is very confusing that, if you acknowledge the existence of zero, the number 1 is the first cardinal but second ordinal. See also here for an intuitive explanation.

This issue is addressed well by John Conway. He says you should start counting at zero, but then report the number that you didn't get to yet (see especially the last paragraph of the last link). Thus, from the mathematician's viewpoint, one should count the first chapter as the zeroth, but report it as the first (it makes sense if you think about it). However, this does not prevent people from reporting the first chapter as zero if they want to!
posted by jjray at 3:04 PM on April 2, 2004 [1 favorite]


In my CS classes in college, all the tests started with a 0th question. God, that pissed me off.
posted by rafter at 3:07 PM on April 2, 2004


Think of it as representing the offset from the beginning.
posted by pemulis at 3:46 PM on April 2, 2004


Too long among the ColdFusion code, mathowie, too long.
posted by yerfatma at 3:53 PM on April 2, 2004


Response by poster: i just want to say that not everyone who shares code cares whether "regular folks" use it or not.

I didn't mean to sound snitty, but if there's a significant segment of OSS writers saying stuff like "why don't people use our stuff? why isn't linux on the desktop of most users?" take a look at the paltry documentation and typical UI (look around for ESR's recent essay about this very thing where he admits most OSS is dreck to set up, look at, and use).

No, not every GPL and OSS app should come with a glitzy brochure and flash movie to show you exactly how to use it, but when stuff is being coded for regular folks and install instructions are an absolute requirement to get the software working, perhaps it isn't the best venue for sharing your programmer in-jokes like numbering from zero.
posted by mathowie at 4:13 PM on April 2, 2004


The Don Knuth thing with chapters starting at zero most likely is a joke, but serious, you know.

He pays people who find errors in his books $2.56, or one hexadecimal dollar. Lively fellow, that chap, with the geek humor.
posted by pissfactory at 4:26 PM on April 2, 2004


I would say that zero based indexing is less an in-joke than a sub-cultural idiom. In context, it makes perfect sense. Out of context, it's just a small bit of weirdness. It's not jargon in the sense that it's not used to exclude people or mark outsiders. I don't get why anyone would care.

On the other hand, I have vague memories of raging debates over zero based indexing among various language advocates so apparently indexing styles have more meaning than I recognize. Personally I've never understood why associative indexing isn't more popular.
posted by rdr at 4:43 PM on April 2, 2004


Bit of topic drift here, but this piece does a really good job of discussing issues with usability/documentation/OSS.
posted by willnot at 5:03 PM on April 2, 2004


Perhaps this isn't a helpful answer, but this quotation is too good not to share:

"Should array indices start at 0 or 1? My compromise of 0.5 was rejected without, I thought, proper consideration."
-- Stan Kelly-Bootle
posted by xil at 5:20 PM on April 2, 2004 [1 favorite]


why associative indexing isn't more popular

efficiency - it's typically O(log(n)) rather than O(1) because you can't just use the index as an offset into a chunk of memory. in a perfect world that should be a transparent optimization, but it's not an easy call to make at compile time (which arrays will be pretty much filled with a contiguous range of index values?) and runtime optimization hasn't got there yet (afaik).

(apologies if i'm stating the obvious here!)
posted by andrew cooke at 5:21 PM on April 2, 2004


why O(log(n))?

would it make more sense to be O(typeof(x) * n) ?

i am no computer scientist...just a programmer, heh. i don't fuck with low level stuff.

i also imagine that JIT compilers are probably pretty good with the associative indexing, when compared with the cost of compiling the byte code in the first place.
posted by taumeson at 5:26 PM on April 2, 2004


why don't we make that "sizeof(x)" instead of "typeof(x)"?

heh.
posted by taumeson at 5:32 PM on April 2, 2004


He pays people who find errors in his books $2.56, or one hexadecimal dollar.

That joke was better in the days when a dollar sign meant hexadecimal. When C came along suddenly it was 0x that meant hexadecimal, which irked the shit out of me because it's less efficient.
posted by kindall at 5:54 PM on April 2, 2004


taumeson - i was assuming that the keys can be ordered, so you don't have to look at each one (your O(n)), but can instead do a binary search (hence O(log(n))). however, i've been using functional languages too long, because my answer completely ignored hashtables, which will give you O(1), the same as arrays (but with more memory overhead and a bigger time constant).
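
A rough Python sketch of the two lookup strategies mentioned here, for comparison. The keys and values are invented for the example, `lookup_sorted` is a hypothetical helper built on the standard `bisect` module, and Python's built-in `dict` stands in for the hashtable.

```python
from bisect import bisect_left

# Hypothetical data: sorted keys, with values held in a parallel list.
keys = ["ant", "bee", "cat", "dog", "eel"]
values = [1, 2, 3, 4, 5]

def lookup_sorted(key):
    """Binary search over ordered keys: O(log n) comparisons."""
    i = bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return values[i]
    raise KeyError(key)

# A hash table: O(1) lookups on average, at the cost of extra memory.
table = dict(zip(keys, values))
```

Both `lookup_sorted("cat")` and `table["cat"]` find the same value; they differ only in how the cost grows as the number of keys grows.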
posted by andrew cooke at 6:57 PM on April 2, 2004


taumeson, I think andrew is using 'Big O' notation, but you think he's talking about an array called O that is indexed like this: O(i).

Very loosely, O(something) describes the efficiency of an algorithm or process. It relates the time taken to do something (in this case, look up an item in an array), compared to some characteristic size (in this case, the size of the array, n).

In a typical, indexed, flat array, the time taken to look something up is essentially independent of the size of the array: O(1). On the other hand, searching through an associative array by iterating through each entry is a more intensive process; if you double the size of the array, you double the search time: O(n).

andrew has described an associative array in which the key can be sorted, and hence we can apply a more intelligent searching mechanism, e.g. binary search, which improves the size-dependence to just O(log(n)).
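
One way to see the difference is to count the work directly. The following Python sketch is only an illustration (the function names and list sizes are arbitrary): it counts how many entries each strategy examines in the worst case when the array doubles in size.

```python
def linear_steps(sorted_items, target):
    """Count entries examined when scanning front to back: O(n)."""
    steps = 0
    for item in sorted_items:
        steps += 1
        if item == target:
            break
    return steps

def binary_steps(sorted_items, target):
    """Count probes made by a binary search: O(log n)."""
    lo, hi, steps = 0, len(sorted_items), 0
    while lo < hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            break
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return steps

small = list(range(1000))
large = list(range(2000))

# Looking for the last entry is the worst case for the linear scan.
linear_small = linear_steps(small, 999)
linear_large = linear_steps(large, 1999)
binary_small = binary_steps(small, 999)
binary_large = binary_steps(large, 1999)

# Plain indexing does the same single step regardless of size: O(1).
direct = large[1999]
```

Doubling the array doubles the linear count, while the binary search needs only about one more probe.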

Did that make sense, or have I just muddied the waters? (And apologies for going horrifically off-topic!)
posted by chrismear at 1:04 AM on April 3, 2004


So for open source documentation -- which is usually scant -- they choose to number lists in a jokey fashion that only other software engineers would get.

I'm not sure it's a joke. After eighteen years of programming, I find it pretty natural to start lists with zero, and I have a hard time remembering that non-programmers think that's weird. But it's common in programmer idiom to make jokes that are also completely serious, so I guess it could be intended both ways.
posted by Mars Saxman at 1:48 AM on April 3, 2004


I think the Jargon File entry on fencepost error gets across why people might want to start counting at zero. For an example, see the confusion about Y2K and the correct start of the new millennium. Conway's thoughts on this are also very helpful. Thanks, jjray!
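
For anyone who hasn't met the fencepost error, here is a minimal Python sketch of the classic version: a 100-foot fence with a post every 10 feet needs 11 posts, not 10. The function name is made up, and it assumes the fence length divides evenly by the spacing.

```python
def posts_needed(fence_length, spacing):
    """Posts for a straight fence: one per section, plus one to close the end.

    Assumes fence_length is an exact multiple of spacing.
    """
    sections = fence_length // spacing
    return sections + 1

# The same trap shows up when counting an inclusive range of numbers.
first, last = 3, 7
inclusive_count = last - first + 1             # 3..7 inclusive: easy to forget the +1
half_open_count = len(range(first, last + 1))  # half-open range: stop minus start
```

Counting from zero with half-open ranges doesn't remove the problem, but it tends to put the "+1" in fewer places.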

But I think the most likely answer to this question lies ultimately in how computer hardware necessarily realizes the concepts discussed above. That is, in its most basic computer hardware sense, an element in an ordered list is going to be represented by a memory location and an offset from that location. The maximum extent of the offset (the size of the list) would be declared by the programmer and the resulting chunk of memory set aside at compilation. You're always going to have the memory location + 0...that will be the first item in the list. The second would be memory location + 1. Once you've abstracted the language away from the hardware, you no longer are tied to thinking about an ordered list in this way. But since the whole enterprise of computer programming has been a continuous process of bootstrapping, aside from whatever practical use this paradigm still has (and it does), it's vestigial.

And, anyway, zero is really cool.
posted by Ethereal Bligh at 2:02 AM on April 3, 2004


Second what Ethereal Bligh said. The logic of having a zeroth element becomes a lot more apparent if you've done some C or assembly programming, because you're more likely to work directly with chunks of memory rather than dealing with the abstraction of an array.
posted by weston at 8:53 PM on April 3, 2004

