Alphabetize an Excel column of titles while ignoring initial articles
September 2, 2019 7:58 PM Subscribe
I'm creating a database in the current version of Excel (working in MacOS so I don't have access to... Access). One of the columns consists of titles, some of which start with an article (a, an, the). I want these titles to appear as they're normally written, rather than placing initial articles at the end (e.g. Lord of the Rings, The). But when I sort this column A to Z, I want it to ignore those initial articles. Has this become possible in Excel since this previous question?
As an example, let's say I have the following titles in a column:
As an example, let's say I have the following titles in a column:
A City of SadnessWhen I sort the column A to Z I want it to ignore the initial articles so the column would appear thusly:
Flowers of Shanghai
The Assassin
The AssassinI'm happy to use the Mac app Numbers instead if that'll do the trick. I'd rather not do this in a table in Word since that gets a little unwieldy for my purposes.
A City of Sadness
Flowers of Shanghai
This uses Regular Expressions to split the the articles into a separate column. If you paste your titles column in the middle box at the bottom it will give you two new columns with the articles and title split when you paste into Excel.
posted by gregr at 8:08 AM on September 3, 2019
posted by gregr at 8:08 AM on September 3, 2019
Best answer: Same approach as brainmouse, with a shorter and more tidily extensible formula that does the same job:
You might be wondering why, given the existence of the case-insensitive SEARCH() function, this formula instead relies on an explicit translation to lowercase before a case-sensitive FIND(). I did it this way because as well as being case-insensitive, SEARCH() allows the use of wildcard * and ? characters inside the text to search for. This could cause spurious detection of "a" or "an" or "the" in the admittedly unlikely case that the first word of the title contained one of those characters. FIND() just does a simple text search, removing potential surprises. If you're unconcerned about that and would prefer a shorter formula:
posted by flabdablet at 8:52 AM on September 3, 2019 [2 favorites]
=IF(ISERR(FIND(LOWER(LEFT(A2,FIND(" ",A2))),"a an the ")),A2,REPLACE(A2,1,FIND(" ",A2),""))Translation: If there isn't a space in the title, or if everything up to and including the first space, after translation to lowercase, can't be found inside "a an the ", then just use the whole title in the hidden sort column. Otherwise, use a version with everything up to and including the first space deleted.
You might be wondering why, given the existence of the case-insensitive SEARCH() function, this formula instead relies on an explicit translation to lowercase before a case-sensitive FIND(). I did it this way because as well as being case-insensitive, SEARCH() allows the use of wildcard * and ? characters inside the text to search for. This could cause spurious detection of "a" or "an" or "the" in the admittedly unlikely case that the first word of the title contained one of those characters. FIND() just does a simple text search, removing potential surprises. If you're unconcerned about that and would prefer a shorter formula:
=IF(ISERR(SEARCH(LEFT(A2,FIND(" ",A2)),"a an the ")),A2,REPLACE(A2,1,FIND(" ",A2),""))
posted by flabdablet at 8:52 AM on September 3, 2019 [2 favorites]
Another advantage of a separate sort column just occurred to me: you're not actually forced to sort on algorithmic transformations of the visible titles.
If you have a peculiar title that you think ought to sort as if it were something else - for example, if the author decorated it liberally with rock and roll umlauts, or styled it with extra spaces between the letters, or put a dollar sign where an S ought to be, but you'd rather the cosmetic effects were ignored for sorting purposes - then all you need to do is fill in its sort column entry explicitly, overwriting the pre-filled formula.
posted by flabdablet at 9:33 AM on September 4, 2019
If you have a peculiar title that you think ought to sort as if it were something else - for example, if the author decorated it liberally with rock and roll umlauts, or styled it with extra spaces between the letters, or put a dollar sign where an S ought to be, but you'd rather the cosmetic effects were ignored for sorting purposes - then all you need to do is fill in its sort column entry explicitly, overwriting the pre-filled formula.
posted by flabdablet at 9:33 AM on September 4, 2019
This thread is closed to new comments.
If the titles are in Column A (assuming they start in A2), then put in B2:
=IF(LEFT(A2,2)="A ",RIGHT(A2,LEN(A2)-2),IF(LEFT(A2,3)="An ",RIGHT(A2,LEN(A2)-3),IF(LEFT(A2,4)="The ",RIGHT(A2,LEN(A2)-4),A2)))
That will remove any opening "A ", "An ", or "The " in the title (there are some other options to do a similar thing, but this works fine).
Fill down, and then sort by that column instead of column A. Then hide column B if you don't want to see it.
posted by brainmouse at 8:25 PM on September 2, 2019 [20 favorites]