Ordering MySQL queries by comma-separated strings?
April 9, 2008 9:50 PM   Subscribe

I'm trying to run a MySQL query that returns rows in order by a value stored in a string (but it will also include the non-matching strings). This is not as easy as I would hope. Code inside!

I work at an educational technology consulting company, and i'm making a script in PHP that allows me to track which updates i've completed to various courses. I have a database set up in the following manner:

id----lms----------------------url-------------------------year
1----string(1,5,8)-----------http://blahblah-----------2008
2----(1,4,6)------------------http://blahblah-----------2007
3----(4,8)--------------------http://blahblah-----------2007

Each of the items in the 'lms' section refers to a specific learning management system. I explode the string into an array by breaking it at the comma. The string is created via a series of checkboxes in my app that add a certain number depending on the lms (i.e. 1 = Blackboard, 2 = Plateau, 4 = Moodle).

I can run a MySQL query that sorts by year. It is the following:
$query = "SELECT * FROM courses ORDER BY year";

What I cannot figure out is how to order by which lms is listed in the database. For instance, I want to run a query that is ordered by 4 being in the lms column, so that I see lines 2 and 3 first, and then 1 afterward. Is this possible? Does this even make sense? Thanks.
posted by EduTek to Clothing, Beauty, & Fashion (15 answers total) 4 users marked this as a favorite
 
Response by poster: Sorry, wrong category. Should've been the next one down - Computers.
posted by EduTek at 9:52 PM on April 9, 2008


You shouldn't really be storing lists in a database field, generally. Your difficulties making a query are just one reason for this.

Based on your app's setup of checkboxes, I'm assuming you have a list of LMS's, let's say numbered 1 to 9... and they are either checked or unchecked.

It would make sense, then, to have an enumerated data type for each LMS..

LMS1 enum('Y','N')
... etc

Unless you have an unlimited number (or a very large number) of LMSs...

Then, you could simply order by LMS4, LMS6, LMS1.... or whatever you want to order by.

The above may or may not seem silly depending upon what you're trying to do. It's not 100% crystal clear from your post, so if anything above seems odd, perhaps you can clarify exactly what you're after.

Either way, I would highly recommend reading an article or two on beginners' database structure in mysql....
posted by twiggy at 10:16 PM on April 9, 2008


SELECT * , IF( lms REGEXP '[+]?4[+]?', 1, 2 ) AS sortorder
FROM courses
ORDER BY sortorder

The '[+]?4[+]?' regular expression means 'match 0 or more commas followed by a 4 followed by 0 or more commas', which should do the trick. If matches are found, the sortorder value is set to 1, if not it is set to 2. The results are then ordered by this value. You can then add any further order fields (year etc).

There may of course be more efficient ways to do it. In particular I would tend to use a new table to store the values from the 'lms' column.
posted by le morte de bea arthur at 10:18 PM on April 9, 2008 [1 favorite]


Paging orthogonality... our resident MySQL optimizeratingtionmachine...
posted by disillusioned at 10:20 PM on April 9, 2008 [1 favorite]


Best answer: Ok, think this through. You want to order, so that means you need an order by:

"SELECT * FROM courses ORDER BY ????";

You want to order by the existence of a '4' in lms, so you need a predicate that tests for a substring. Like is such a predicate:

"lms like '%4%'"

Put this together:

"SELECT * FROM courses ORDER BY ms like '%4%'"

This will actually sort rows where lms does not contain a '4' to the top, so order descending:

"SELECT * FROM courses ORDER BY ms like '%4%' desc"

and add back in your year order:

"SELECT * FROM courses ORDER BY ms like '%4%' desc, year asc"


Now, that will work, but your design isn't very good, because it's impossible for you to use your data without transformong it outside of the database. That's a big red falag: if you're storing stuff in a way it can't be used, it means your data model is poor.

Rather than storing lms as a string, and exploding the string outside the database, you should just store it in the database, one row per lms type in a related table. Then it would have been obvious how to do this.
posted by orthogonality at 10:24 PM on April 9, 2008 [1 favorite]


If you used:


ORDER BY CASE WHEN lms LIKE '%4%'
THEN CAST('1' AS VARCHAR)
ELSE CAST('2' AS VARCHAR)
END


This will create an ordering system putting anything with a 4 above anything not with a 4.

NB: I use, and did test with, T-SQL, so I'm hoping if this is not MySQL compatible, someone here can point it out.

Who knows, you might be able to get away without those annoying 'CAST' statements....
posted by pompomtom at 10:26 PM on April 9, 2008


Response by poster: @twiggy - I see your point, it would be much easier if I didn't set it up this way. I grabbed the code from a gradebook app that I wrote which did have an unlimited amount of assignments and grades, and this is why I ended up in this mess :)

@le morte de bea arthur, orthogonality, and pompomtom - Thanks for all of your suggestions, i'm giving them a shot right now. I completely agree with you regarding the structure of the DB. I was trying to make it easy on myself by using old code, which is now clearly spoiling my Wednesday night.

Thanks all!
posted by EduTek at 10:35 PM on April 9, 2008


Best answer: If you can redesign your tables, you might do something like:

url_values
----------------------
url_id:INT
url_value:VARCHAR
url_year:VARCHAR

lms_values
----------------------
lms_id:INT
url_id:INT
lms_value:INT


Example tables:

url_id----url_value-------------------url_year
==============================================
1---------http://blahblahx------------2008
2---------http://blahblahy------------2007
3---------http://blahblahz------------2007

lms_id----url_id----------------------lms_value
==============================================
1---------1---------------------------1
2---------1---------------------------5
3---------1---------------------------8
4---------2---------------------------1
5---------2---------------------------4
6---------2---------------------------6
7---------3---------------------------4
8---------3---------------------------8


This design lets you accomplish two things:

1. By JOIN'ing the two tables based on matching url_id's (which can help you build your 'lms' string) you gain direct control over the format of that lms string

2. You can SELECT on lms_values directly to get back url_id's that correspond to matches with lms_values of 4, etc., and you avoid the following:

a. You avoid the performance hit of a regular expression query

b. You avoid having to code for a specific regular expression that matches your format of the lms string (if you change the format of the lms string, you'll probably have little choice but to write and test a new regular expression)
posted by Blazecock Pileon at 10:47 PM on April 9, 2008 [1 favorite]


Response by poster: @Blazecock Pileon - I'll be taking your suggestions into account for this app in the future. I never used joins, so I guess it's time to learn. Thanks for taking the time to write that out!!
posted by EduTek at 10:51 PM on April 9, 2008


I never used joins, so I guess it's time to learn.

Joins sound scary, but all you have to do is picture a Venn diagram with one circle being one table, and the other circle the other table, and you're just looking at different pieces of the circles.

The area of overlap between the two is simply where the info you're looking for is the same value in both tables. That's called an "inner join".

There are a few other join types, and this link is a good diagrammatical overview of how they work. In the end, it's all just pieces of those circles.
posted by Blazecock Pileon at 11:01 PM on April 9, 2008 [2 favorites]


Ugh, friggin' firefox crashed when I was working on my answer. Ugh. (and I forgot to try the 'restore session' button when it came up.

Anyway, breifly, what you can do is this:


Select id,url,year,'0' as myorder
from courses where lms like '%4%'

union

Select id,url,year,'1' as myorder
from courses where lms not like '%4%'

order by myorder asc


Basically, you create a 'virtual' table by selecting each segment separately, then sort that 'virtual' table in your query. Of course, for each grouping, you're going to need a separate select statement, so it gets painful quickly.

If you had 3 groups you wanted ordered separately your selects would look like this


Select id,url,year,'0' as myorder from courses where lms like '%4%'
Select id,url,year,'1' as myorder from courses where lms like '%3' and not like '%4%'
Select id,url,year,'2' as myorder from courses where lms like '%2' and not like '%3%' and not like '%4%'


and so on.
posted by delmoi at 11:06 PM on April 9, 2008


As usual, you're getting excellent AskMe answers; this is a great place to ask about databases.

I was going to answer you, but the answers you marked best cover it very well. I just wanted to reinforce that they are, indeed, excellent advice.

You're having the problem because your database is poorly structured, but orthogonality's method will let you do what you need for now, and Blazecock's answer will help you restructure the data so that it works properly in the future.

If you want to get better at this, look up 'database normalization'. It sounds scarier than it actually is. Basically, it's a method of organizing your data so that it's broken into the correct discrete values, and you never repeat data, ever. Whenever you have a value that's repeated more than a couple of times, you make a lookup table for it. That's what Blazecock is showing you... making a lookup table of LMs, so you're just storing a small ID value, and then looking up the LM name when you need it. You save a lot of space, and avoid all kinds of possible data error issues. If "Joe's Training Facility" is listed only once in the entire database, with pointers in records associated with that facility, you automatically avoid problems like someone typing in "Joe's Training" or "Joe's Trainggi Facility", which look like different providers. Basically, you get rid of typos, because if you do make an entry mistake, it's in just one spot, and fixing that fixes every record in the database at once, and they all stay correctly pointed to the same actual provider.

It's a _tremendously_ better way to organize your data, and if you're going to continue working with databases very much, it's well worth spending a few days to wrap your head around how to do it properly. Once you understand normalization, you can approach complex data modeling tasks with total confidence.
posted by Malor at 5:19 AM on April 10, 2008


Plus, if you do this the Right Way with normalization, you won't get bitten by LIKE '%4%' matching '2,10,14,15' and regexps being slow.
posted by flabdablet at 8:33 AM on April 10, 2008


please note I'm pretty sure (haven't fully thought about any consequences) for your needs you can use FIND_IN_SET(4,lms) instead of LIKE '%4%' (as both should return TRUE or 1 when 4 is present in the field). FIND_IN_SET should be somewhat more efficient but probably won't really make a huge diff. on small databases. This works because the lms field is in a list fashion, but don't take the existence of this function to mean that it is OK to store data in that fashion. I've had to use these dozens of times on a legacy database where everything was in fields like this instead of being properly normalized.

Everybody's (well, everybody who's not had formal RDBS training) first data structure is totally wrong (and often, as was in my case, copied from someone else who did it totally wrong), and you find this out by having difficulty with these sorts of queries; here's your chance to learn how to do it right: read up on database normalization, read up on JOINs, read up on ids, KEYs, INDEXes, and GUIDs.

It'll probably take a couple applications before you see why these seemingly counter-intuitive guidelines for relational data make sense, but it's worth trying to learn now so you don't have to work with clunky old data structures (honestly, it's usually fastest to just toss them, re-import the data, and start over than to try and 'fix' them). I typically try to write my data structures BEFORE any programming takes place -- adding fields and whatnot mid-development process means you did not properly plan ahead and that you are likely teetering on the brink of a bad data backend and poorly architectured code.
posted by fishfucker at 11:22 AM on April 10, 2008


you won't get bitten by LIKE '%4%' matching '2,10,14,15'

ah yes, and this is why you want to use FIND_IN_SET instead. Forgot about this (Ran into this prob on that legacy database too).
posted by fishfucker at 11:23 AM on April 10, 2008


« Older What to get a geeky male for his 21st birthday?   |   Cocoa, MIDI, how to? Newer »
This thread is closed to new comments.