Learning about algorithms and data structures from a more humble POV
May 20, 2011 8:26 PM   Subscribe

What can a common PHP/MySQL-based CMS package teach me about data structures and algorithms?

I would like to start learning more about the "important bits" of data structures and algorithms.

I work with different content management systems all day. Sometimes I am up to my elbows in PHP.

What I'd like to do is start learning about the theory behind this type of code -- what sort of data structures, algorithms, and theories am I looking at -- rather than start with an abstract view that I keep getting from textbooks and online explanations.

Like, I look at a linked list, pointers, O(n log n), etc. in a textbook, and I feel like I'm coming unmoored from my current reality.

Thanks for any help!
posted by circular to Computers & Internet (7 answers total) 6 users marked this as a favorite
In php when you use an associative array ($foo["bar"] = "baz";) behind the scenes the language must provide a quick and efficient way to find the member named 'bar', because this construct is used so pervasively throughout the language that it can't be a slow. A naive implementation might just have the keys in a list, and you'd start at the beginning of the list and check each key until you find the one that matches, and if you don't then you know you're referencing a new slot that needs to be added. Hopefully you can see why that's doomed to failure, and why PHP internally uses some kind of hash function and possibly a sorted tree to make these operations fast. (You'd have to look into the PHP internals to find the details.)

Similarly when you ask MySQL to fetch rows from the users table where ID = 56, it would be insane if MySQL had to scan every row from the start (table scan) to find that one. Instead it uses an index, which is sorted by ID, so it can use a binary search to find the appropriate row. This is again an internal implementation detail, however. There is a terrific amount of this kind of thing going on under the covers, from your application to the C library it was written on top of to the operating system kernel that it runs under. I'm not sure how much code you will find at the actual CMS layer that would exhibit classic data structures, because once you have efficient associative arrays you tend to use that for most of your needs. For example, to de-dupe a list of words, you use an associative array and set each key to some constant value, then you extract all the keys of the array (array_keys()). Or to sort a list of words you use one of the language's built in sorting functions and you don't really care how that's implemented, but under the hood there's a quicksort algorithm going on.
posted by Rhomboid at 8:57 PM on May 20, 2011

It's been ages since I've coded up a linked list directly in C. The newer languages like Rhomboid says all have some form of hash or tree integrated and abstracted so you don't need to worry about the details of pointers. One ruby app that I worked on needed something faster and added judy arrays for performance. So in production code it would usually make sense to add a well debugged library rather than 'code your own'. Possibly some game engines need specialized efficiencies that require coding up custom structures.

Consider trying some problems at challenge sites like Project Euler in good raw C to try out a cool algorithm. Look at Skip Lists, supposed to be not too bad to code and brings up interesting arguments as to the benefits over trees.
posted by sammyo at 9:29 PM on May 20, 2011

Today's popular CMS's are probably not the best place to learn anything. In general, most CMS's I've looked at are big giant piles of doggy do code wise, and you'd potentially do yourself more harm that good by trying to learn from them. The best way to learn this stuff is by doing. Write your own code. Rewrite it so it's more efficient, cleaner, more managable, whatever. It's also easier to write your own code than it is to figure out someone else's (never mind way more fun). Read concise examples from books, the web, etc. Do not try and learn anything from Wordpress, Joomla, Drupal and the like. You'll just frustrate yourself, and won't find what you're looking for.

Also, PHP is a horrible language to learn good software engineering algorithms, data structures, etc - so much existing PHP is crap, fractured and inconsistant, and the really fun bits are encapsulated away anyway. Most high level languages today shield you from these types of things. As mentioned above, I highly doubt you'd find anything more than simple objects and associative arrays in your standard Wordpress/Joomla/Drupal/whatever-is-cool-today code bases.

Project Euler is a good place to start, but after the first few exercises it becomes more about mathematical algorithms that don't really translate to the real word (or at least not my real world). I can't think of any better places to go offhand right now though, but I know I've seen the question asked several times here, so search and I'm sure you will find tons to keep you busy and to teach you.

Alternatively, think about learning a good MVC framework, understand how a proper implementation can be very, very powerful, and how it works under the hood. That, IMHO, is much more relevant knowledge in todays high-level-language'd world, and can be just as interesting. There are many options available, from the simple & lightweight to the heavy duty huge mammoth beasts.
posted by cgg at 9:52 PM on May 20, 2011 [1 favorite]

What can a common PHP/MySQL-based CMS package teach me about data structures and algorithms?

What can today's modern supercars like the Veyron or McLaren F1 teach you about gas/fuel efficiencies or aerodynamic engineering?

You're doing it backwards.
posted by Civil_Disobedient at 5:17 AM on May 21, 2011 [2 favorites]

In general, most CMS's I've looked at are big giant piles of doggy do code wise, and you'd potentially do yourself more harm that good by trying to learn from them

Seconded. Most anything that gets turned into a general CMS started out with a specific purpose and then caught feature cancer where tiny little new feature requests get bolted on with hacks. I know because I've been both the victim or surgeon on more than one. Like C_D says, this is the wrong-way-round.
posted by yerfatma at 7:19 AM on May 21, 2011

Civil_Disobedient:"What can today's modern supercars like the Veyron or McLaren F1 teach you about gas/fuel efficiencies or aerodynamic engineering?"

Well, that's a bad analogy. The advanced things racing does in aerodynamcs and fuel efficiency often trickle down to consumer automotives. Turbo chargers for example, help recapture waste heat. Racing just trades those gains for higher acceleration and top speed. Learning from PHP based CMS packages is more like looking at a donkey cart for automotive inspiration.

When PHP was originally designed, it's author was unaware of many language design theories and tools. It was originally designed to show his resume and web traffic, and based on Perl. Neither of which suggest there's much data structures, algorithms, and theories to learn.

circular, given your request to learn from PHP/MySQL, my half-assed recomendation is to read PHP library routines and database code. These are the places where algorithmic complexity matters. It may very well require you to understand C, however. Which is the language of choice when performance matters. Just yesterday, I was asked to review a very old version of cracklib we have in production to determine what all constraints its enforcing. Part of it's goal is to check your password string against a dictionary of common words and variations thereof. This dictionary winds up being millions of words, so the file format structure was chosen to optimize lookup: a trie. From there the algorithm to load the file and check passwords is straightforward. I find the datastructure commonly drives the algorithm used.

That's why even the best PHP/MySQL CMS's aren't a great place to learn. They've largely outsourced the data structure to MySQL and the filesystem.
posted by pwnguin at 10:48 AM on May 21, 2011

As pwnguin says, reading PHP code isn't really a good place to start learning programming theory.

To learn to program, you have to read good code and write code. Your best approach so start writing code is to "scratch an itch". If there's something you want or need to do, sit down with Python or Ruby or even Rhino and figure out how to do it.

And find some great code to read. (PHP isn't conducive to writing great code).

There's a ton of resources out there for learning to program. You could start with related ask.meta posts, proggit, stackoverflow, the original Wiki, or even wikipedia. I usually suggest Software Carpentry to anyone expressing an interest in programming.
posted by and for no one at 11:54 AM on May 21, 2011

« Older Yar, this be a conundrum.   |   Life inside a hyperbolic plane? Newer »
This thread is closed to new comments.