How to get mySQL to ignore "A, An, & The" when presenting alphabetized book titles
December 5, 2006 8:36 AM   Subscribe

mySQLfilter: How can I ignore "A, An, & The" at the beginning of book titles when sorting results alphabetically by title?

I have a database of several hundred book titles and I'd like to be able to list them alphabetically, but ignore any articles that may be at the front of title.

For example, consider the following 3 titles as they appear in the db:

"A Summer's Day"
"Beach Fun"
"Zebras Are Neat"

I'd like them to appear, in order, as:

"Beach Fun"
"A Summer's Day" (See? Ignoring the first article "A" and alphabetizing on the "S")
"Zebras Are Neat"

I'm assuming that there's some sort of conditional code I could use in mySQL to make this easier.

I also have (some) PHP skills, so writing a helper function or two isn't out of the question.
posted by Wild_Eep to Computers & Internet (10 answers total) 2 users marked this as a favorite
 
ORDER BY IF(LEFT(title,2)="A ",SUBSTRING(title FROM 3),IF(LEFT(title,3)="An ",SUBSTRING(title FROM 4),IF(LEFT(title,4)="The ",SUBSTRING(title FROM 5),title)));

Or something equally horrible. I'd create an extra column with this stored, however.
posted by cillit bang at 9:02 AM on December 5, 2006 [1 favorite]


Doing this real-time within your SELECT query is going to be very expensive. I'd add another field to your titles table called "SortTitle", and do your logic in PHP before insert:

if(substr($title, 0, 2) == "A ")
{
$title = substr($title,2,(strlen($title)-2))
}

... and so on. I'm sure there's some tighter function to strip out a series of possible starting articles, but that should do the trick.
posted by mkultra at 9:10 AM on December 5, 2006


I don't use MySQL but this should work, same basic concept as cillit's:

SELECT book_id, book_name FROM books ORDER BY TRIM( LEADING 'a ' FROM TRIM( LEADING 'an ' FROM TRIM( LEADING 'the ' FROM LOWER( book_name ) ) ) );
posted by nicwolff at 9:10 AM on December 5, 2006 [1 favorite]


I have solved this in the past by writing a quick PHP script to crawl through the records and rotate "A," "An" and "The" to the end of the name like such:

New Day at Midnight, A

Then I query normally. And I use a little function (I call it it "pivot") to check each title for ", A" or ", An" or ", The" at the end and, if found, pivot that suffix to the beginning and strip out the comma. It's significantly easier then doing it in MySQL.

From there on out, of course, you'll want to insert the data with the prefixes rotated around to become suffixes.
posted by waldo at 10:11 AM on December 5, 2006


In in-place-select examples are going to screw up if the title starts with a but not the article a, like "Albino Alligators". It would also mess up things like "thesaurus" because it starts with "the".

I'd advocate creating a new column with the "sort title" and making a good routine to populate this column. If you have an existing database you'll have to step through all the rows and populate this column.
posted by RustyBrooks at 10:14 AM on December 5, 2006


Seconding the "sort_title" column idea. You should do it at insertion time, too. Beware foreign languages.

Boot, Das
Cars
Fabuleux destin d'Amélie Poulain, Le
Hunt for Red October, The
posted by cmiller at 11:13 AM on December 5, 2006


Why not clean the data (articles at the end of the titles) before it gets into the database? You could clean the data that is already in the database once now and then just ensure new data going in has been cleaned before insertion.
posted by cmicali at 11:18 AM on December 5, 2006


Jeez, guys, he said how he'd like them to appear, and it's not with the articles moved to the end of the names - he just wants to sort them that way.

It's generally better to operate on data in its canonical form if possible, rather than introduce integrity issues by storing the same information in multiple formats.

Speed is hardly going to be an issue for MySQL running a simple text operation on a couple of hundred records.

And, RustyBrooks, see the spaces in 'a ', 'an ', 'the ' in cillit's and my SQL examples? They'll work fine.
posted by nicwolff at 12:50 PM on December 5, 2006


i've stored my data in the format in SQL sort format, and then make it pretty when I need to output it, as waldo suggests. Works fine, and you don't have to worry about keeping two columns (e.g. title and sort_title) updated.
posted by kamelhoecker at 2:48 PM on December 5, 2006


I vote for an "article" column which holds the "a", "an" and "the". That's how I've always done it.
posted by AmbroseChapel at 5:04 PM on December 5, 2006


« Older best charities for the homeless in NYC?   |   Going to Algiers, Algeria. What do I need to know? Newer »
This thread is closed to new comments.