When is a last name left lowercase?
January 21, 2015 4:07 PM
I'm writing a piece of software to go through a very large database and format last names in their correct casing (e.g., smith, SMITH, or sMith to Smith). However, I'm aware that some parts of names should be kept lowercase, but which ones?
Setting aside the issue of preferred capitalization later in a surname (such as Macintyre vs. MacIntyre or d'Azzo vs. D'Azzo), what parts of a surname should be left lowercase? So far, I'm looking at the following:
- von, van
- da, di, de
- van, der, ter
- al-, ibn
What other names should be considered?
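A minimal Python sketch of the approach described in the question, assuming the surname is stored in its own field and using only the particles listed above. The names `normalize_surname`, `LOWERCASE_PARTICLES`, and `LOWERCASE_HYPHEN_PREFIXES` are hypothetical. It capitalizes each word and keeps a listed particle lowercase unless it is the last word. It does not attempt the Mac/Mc or apostrophe cases the question sets aside.

```python
# Particles taken from the question's list: a deliberately incomplete,
# hypothetical starting point rather than an authoritative list.
LOWERCASE_PARTICLES = {"von", "van", "da", "di", "de", "der", "ter", "ibn"}
# Prefixes joined to the next part with a hyphen, as in "al-".
LOWERCASE_HYPHEN_PREFIXES = {"al"}


def normalize_surname(raw: str) -> str:
    """Capitalize each word of a surname, keeping listed particles lowercase."""
    words = raw.split()
    out = []
    for i, word in enumerate(words):
        lower = word.lower()
        # A particle stays lowercase only when something follows it, so a
        # surname recorded as just "DE" still comes out as "De".
        if lower in LOWERCASE_PARTICLES and i < len(words) - 1:
            out.append(lower)
            continue
        # Handle hyphenated parts separately: "al-rashid" -> "al-Rashid".
        parts = lower.split("-")
        cased = []
        for j, part in enumerate(parts):
            if part in LOWERCASE_HYPHEN_PREFIXES and j < len(parts) - 1:
                cased.append(part)
            else:
                cased.append(part[:1].upper() + part[1:])
        out.append("-".join(cased))
    return " ".join(out)
```

With this sketch, `normalize_surname("VAN DER BERG")` returns `"van der Berg"` and `normalize_surname("AL-RASHID")` returns `"al-Rashid"`. As the answers below point out, many families capitalize these particles, so any fixed list will be wrong for some people.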
I grew up around lots of people with Dutch ancestry and Van was capitalized more often than not. Capitalize all the letters!
posted by goodbyewaffles at 4:12 PM on January 21, 2015
Double-f last names are often left uncapitalized, as in Wodehouse.
posted by Trifling at 4:21 PM on January 21, 2015 [1 favorite]
There's really no way to consistently tell, unfortunately. My mother-in-law capitalizes her "Von", but her father would go back and forth. Her mother just dropped the "von" altogether to be free of the whole mess.
posted by Diagonalize at 4:59 PM on January 21, 2015
- al-, ibn
Along those lines, bint or binte (Arabic; 'daughter') is often (but not always) left uncapitalised.
posted by Ziggy500 at 5:14 PM on January 21, 2015
Please don't surrender to the ALL CAPS beast. It is so ugly and dehumanizing.
You will need to choose a style for each word part and stick with it, that's all. You will not be able to poll each person about his or her own personal spelling, but you can be consistent in your style without giving up the ship, which is what I think you are trying to do.
Personally, I prefer to have the little connector words in lower case because I think it expresses the history of the name. And Mc/Mac with an upper case letter following, for the same reason.
Add el, lo, la, and l' to your list, perhaps. I am not sure there are any names beginning quite like that, but they seem like good possibilities.
posted by SLC Mom at 5:41 PM on January 21, 2015
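If you adopt the house style described above (lowercase connectors, a capital after Mc/Mac), the Mc/Mac part might be sketched like this. `style_mc_mac` and the `MAC_NAMES` allowlist are hypothetical. The allowlist is there because a blanket "Mac" rule would mangle names such as Mack or Macy and override people who write Macintyre.

```python
# Hypothetical allowlist of "Mac" names to render as MacX. Applying the rule
# to every word starting with "mac" would break Mack, Macy, Machado, etc.
MAC_NAMES = {"macintyre", "macdonald", "macleod"}


def style_mc_mac(word: str) -> str:
    """Apply a Mc/Mac house style to a single word of a surname."""
    lower = word.lower()
    # "Mc" is treated as always followed by a capital letter.
    if lower.startswith("mc") and len(lower) > 2:
        return "Mc" + lower[2].upper() + lower[3:]
    # "Mac" only gets the inner capital for names on the allowlist.
    if lower in MAC_NAMES:
        return "Mac" + lower[3].upper() + lower[4:]
    # Otherwise fall back to capitalizing just the first letter.
    return lower[:1].upper() + lower[1:]
```

Under these assumptions `style_mc_mac("MCKENZIE")` gives `"McKenzie"` and `style_mc_mac("MACK")` gives `"Mack"`. The allowlist is still a guess about each family's preference.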
Lowercase y sometimes turns up in joined Spanish surnames, as in Maria Guzman y Gonzales.
posted by apparently at 6:12 PM on January 21, 2015 [1 favorite]
Best answer: Please don't surrender to the ALL CAPS beast. It is so ugly and dehumanizing.
It may sometimes be ugly. That's a question of taste, I suppose. But I don't think it's dehumanizing to make it clear what someone's name is. The purpose of all cap last names is to make it clear what the last name actually IS.
Write "Gabriel Garcia Marquez" and a bunch of dolts will file the book under M. Make it clear that it's Gabriel GARCIA MARQUEZ and it will be filed correctly under G. And maybe students will even get it right in their bibliographies. When my uncle was dying in the hospital, people frequently called him Mr. SecondLastName, which is completely wrong: the equivalent of being called "Mr. Marquez" (Mr. Garcia is correct in this case). In his not entirely lucid state, he never quite got that they were talking to/about him, because that's not his name. Had his surname been written in capitals, he might have been addressed correctly. At least they would have called him the equivalent of Mr. Garcia Marquez, which he would have recognized as speaking to him.
There are related issues with Asian names when it's customary to put surname first and first name second and where names can have more than one word. Some people stick with the lastname firstname way, some people adopt the Western firstname lastname. You can't always tell from looking at a name which is the surname. I know someone from Asia who uses surname firstname but goes by her surname. I had no idea until recently that I was referring to her in correspondence as Ms. Firstname.
So yeah, maybe it is ugly, but it is useful even beyond the problem of untangling correct casing in a database, and I don't think it dehumanizes people to help others get their names right.
posted by If only I had a penguin... at 6:37 PM on January 21, 2015 [10 favorites]
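The surname-in-capitals convention is easy to apply at display time if the surname is already stored as its own field, which is an assumption here. A minimal sketch; the `PersonName` record and `display_name` function are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class PersonName:
    # Hypothetical record layout: the surname is its own field, so no
    # guessing is needed about which words belong to it.
    given: str
    surname: str
    surname_first: bool = False  # for people who write the surname first


def display_name(name: PersonName) -> str:
    """Render a name with the surname in capitals so it is unambiguous."""
    surname = name.surname.upper()
    if name.surname_first:
        return f"{surname} {name.given}"
    return f"{name.given} {surname}"
```

For example, `display_name(PersonName("Gabriel", "Garcia Marquez"))` gives `"Gabriel GARCIA MARQUEZ"`. Keeping the surname separate, rather than recovering it from one full-name string, is what lets this work for multi-word and surname-first names while leaving the stored casing untouched.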
mac and nic (son of/daughter of) in Gaelic names are often in lower case.
The ff at the beginning of names like ffoulkes and ffinch is conventionally lower case. It's actually the Welsh digraph ff. I worked with a dude (name of ffrench) who made it his life's work to educate people on the correct way to represent his name.
posted by scruss at 6:42 PM on January 21, 2015
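Both conventions from this answer can be expressed as per-word rules. A sketch with hypothetical helpers `style_word` and `style_surname`; lowercasing a standalone mac/nic is a house-style choice here, not a universal rule, since plenty of people capitalize them.

```python
def style_word(word: str) -> str:
    """Case one word of a surname under two assumed conventions."""
    lower = word.lower()
    # Names such as ffoulkes, ffinch and ffrench keep the leading "ff"
    # (and the rest of the word) lowercase.
    if lower.startswith("ff"):
        return lower
    # Standalone Gaelic "mac"/"nic" (son of/daughter of), lowercased as a
    # house-style choice.
    if lower in {"mac", "nic"}:
        return lower
    return lower[:1].upper() + lower[1:]


def style_surname(surname: str) -> str:
    """Apply style_word to each space-separated word of a surname."""
    return " ".join(style_word(w) for w in surname.split())
```

With these rules `style_surname("FFRENCH")` gives `"ffrench"` and `style_surname("NIC GABHANN")` gives `"nic Gabhann"`.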
If your goal is to format the names in their correct casing, you'll need to ask individuals for their preferences. That's really the only way to get them right.
Capitalization aside, do you trust the spacing in your database (e.g. DeAnna vs. De Anna)?
posted by in278s at 7:11 PM on January 21, 2015 [1 favorite]
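Rather than trusting the spacing, one option is to flag entries that differ only in spaces, hyphens, or case and review them by hand. A sketch with a hypothetical helper `spacing_variants`:

```python
from collections import defaultdict


def spacing_variants(names: list[str]) -> dict[str, set[str]]:
    """Group names that differ only in spacing, hyphens or case."""
    groups: dict[str, set[str]] = defaultdict(set)
    for name in names:
        # Comparison key: case-folded, with spaces and hyphens removed.
        key = "".join(ch for ch in name.casefold() if ch not in " -")
        groups[key].add(name)
    # Keep only keys that were written more than one way.
    return {k: v for k, v in groups.items() if len(v) > 1}
```

For `["DeAnna", "De Anna", "Smith", "SMITH", "Jones"]` this returns two groups, one for the DeAnna spellings and one for the Smith spellings. These are candidates for review, not automatic merges; DeAnna and De Anna may each be correct for different people.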
So long as you're working on normalizing name data, Patrick McKenzie's classic Falsehoods Programmers Believe About Names is probably worth a skim. On a sufficiently large data set, you're likely to need to handle several of the cases he mentions. Good luck!
posted by SemiSophos at 8:04 PM on January 21, 2015 [8 favorites]
Yeah man put me in the column of "please make them all caps". I was named McSomething for a long time and seeing it as Mcsomething set my teeth on edge. I worked at a place where they couldn't amend my name badge to have my name spelled as McSomething so I walked around for 2+ years with a Mcsomething name tag, feeling like I was answering to a false name every time I showed it to someone. But there are definitely people who are perfectly happy to be Mcsomethings.
posted by town of cats at 9:45 PM on January 21, 2015
As much as I hate to see things in all-caps, I think I have to agree that it's the best option in this case.
When I see my name in all-caps, I assume it's the result of a system where that's all they can do. But when I see my name miscapitalized, I feel like someone (or some algorithm) thinks they know my name better than I do.
For me, my legal name is "easy" and I don't usually have it corrected, but my online name sometimes is. For people who have unusual legal names, it's a constant struggle, and I think having it formatted in a way that looks like a technological limitation would be less frustrating.
posted by duien at 10:15 PM on January 21, 2015
My first question is - where did the data originate? If the data entry was done by end users typing in their own name and other info - I'd simply keep it as is. Then you don't need to worry about accidentally upper- or lower-casing a letter that the user typed in differently. If they typed something wrong, put a mechanism in place for them to request a correction.
Even if it's not user-generated data...
...format last names in their correct casing...
As shown by several comments already, there's probably not going to be a universal "correct" casing, even for names of the same structure. Again, let people request a correction if something is wrong - or if you want to be a little more proactive, send out individual emails asking each person to verify if their name is spelled and capitalized correctly and to reply only with corrections.
This may not be feasible given the size of your particular dataset - which reinforces the notion that there just isn't a programmatic way to do this and get it right for everyone. You may not introduce as many errors as you'll fix, but the ones you do create will be the ones that people are sensitive about, rather than laughing off an all upper- or all lower-case name in the correspondence they receive.
posted by trivia genius at 12:40 AM on January 22, 2015 [1 favorite]
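One way to follow this advice while still fixing the worst records is to recase only entries that carry no casing information at all (entirely uppercase or entirely lowercase) and leave any mixed-case entry exactly as it was typed. A sketch with hypothetical helpers `needs_recasing` and `maybe_recase`; `recase` stands for whatever casing function you settle on:

```python
from typing import Callable


def needs_recasing(value: str) -> bool:
    """True if the value is all-caps or all-lowercase (no casing to keep)."""
    letters = [ch for ch in value if ch.isalpha()]
    if not letters:
        return False
    # Letters with no case (e.g. CJK) are neither upper nor lower, so
    # such values are left alone.
    return all(ch.isupper() for ch in letters) or all(ch.islower() for ch in letters)


def maybe_recase(value: str, recase: Callable[[str], str]) -> str:
    """Apply `recase` only to entries that carry no casing information."""
    return recase(value) if needs_recasing(value) else value
```

For example, `maybe_recase("SMITH", str.title)` gives `"Smith"`, while `maybe_recase("van Gogh", str.title)` returns `"van Gogh"` unchanged. The trade-off is that an entry like sMith is also left as typed, which fits the suggestion to let people request corrections.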
In Swedish there's af, in Spanish del, in German zu. There are others; here's a Wikipedia article on these so-called nobiliary particles.
posted by Talkie Toaster at 9:27 AM on January 22, 2015
Depending on their intended use, I would consider changing them to ALLCAPS. It's not unusual to style names with the surname (not anything else) in all caps in some contexts. Usually to make it easier to figure out which part of the name is the surname. If you have such a use in mind, give up on this hopeless task and just capitalize everything.
posted by If only I had a penguin... at 4:11 PM on January 21, 2015 [5 favorites]