More SQL help, please
July 15, 2008 12:25 PM   Subscribe

Another SQL question (*sigh*), but this time a bit more complicated.

There is a column in my database called 'Contact'. It is currently null for about 140k of the 148K entries. There are two other columns called 'Lastname' and 'Firstname' in my database, which are either populated with data or one or the other or both are null.

I want all entries in the column 'Contact' that are null to be a combination of the 'Firstname' and 'Lastname' columns by taking the value in 'Firstname' appending a space then appending the value in 'Lastname' to it, and sticking that into the 'Contact' column. Because some entries may have only a first name, only a last name or neither, I still want to stick with that logic, since I want the information in the Contact column, even if it's just copying a space back into it.

How can I go about doing this? I'm running SQL Server 2005 and have the management studio, FWIW.

BTW, I can do a select statement that gets the 140k entries in which the contact value is null, but I want to make sure that I overwrite those values only, not the other 8k or so entries whose contact values are not null.

Thanks to all in advance!
posted by omnipotentq to Computers & Internet (7 answers total)
 
Best answer: Paste these into your Query Analyzer:

UPDATE TABLE
Set Contact = Firstname + " " + Lastname
where contact is null and firstname is not null and lastname is not null

UPDATE TABLE
Set Contact = Lastname
where contact is null and firstname is null and lastname is not null


UPDATE TABLE
Set Contact = Firstname
where contact is null and firstname is not null and lastname is null

UPDATE TABLE
Set Contact = " "
where contact is null

Should do what you need. (Yeah, it can be done in one horribly complex query or four really simple queries...)

posted by arniec at 12:36 PM on July 15, 2008


Best answer: Oops, two things. First, I put <> after each line saying UPDATE TABLE, so be sure to include your tablename.

BTW, that last query is to just put a space in contact in case firstname and lastname are null. It will make the value of contact a space instead of null, which may be what you want. But all "Is Not Null" queries will then fail on Contact as it won't be null, it will be a space.

If you don't want just a space, then don't run the 4th UPDATE above
posted by arniec at 12:38 PM on July 15, 2008


I'm not sure about horribly complex query, but based on my knowledge of SQLServer (I do more Oracle normally), this should work fine:

update table_name
set contact = isnull(firstname, '') + ' ' + isnull(lastname, '')
where contact is null

This will leave an empty space before lastname in contact, but that seems to be what you want... If not, it does need to be more complex or go with the multiple updates that arniec has.
posted by skynxnex at 12:38 PM on July 15, 2008


I would recommend against leaving just a space in contact for the exact reason that arniec posted earlier. Couldn't you do something along the lines of

TRIM(firstname + ' ' + LASTNAME) That will give you a nice NULL instead of a space. Then, in future queries you can just select WHERE CONTACT IS NULL.

Additionally, you might want to accomplish the same effect using a view(or function) instead of DML Update - If any names change, you'd have to remember to change the name in both places - its not exactly good practice to store the reduntant data in multiple places, let alone in the same table.

assuming an oltp system, not a DW that's been denormalized for performance; leave the flames at home
posted by neilkod at 12:55 PM on July 15, 2008


Skynxnex has the answer, but you can also look up the COALESCE function too:

update table_name
set contact = coalesce(firstname, '') + ' ' + coalesce(lastname, '')
where contact is null


The nice thing about coalesce is that you can have a list of alternatives for a null value, rather than just one withnisnull.

Neilkod--your query would give a null if the first name was there, but the last name was null, as an example.
posted by jsmith77 at 2:16 PM on July 15, 2008


Response by poster: Thanks to all who replied; I had a tough time choosing the best answer. I suspect these will all work, but I'm going to try arniec's suggestion first. Again, thanks!
posted by omnipotentq at 4:07 PM on July 15, 2008


No, no, no.

Let's imagine that we had the perfect update statement to do exactly what you want.

We take a row where firstname = "John" and lastname = "Jones", and we'd make that row's contact = "John Jones".

Great right? No, not so great.

What happens when we insert a new person, say "Alice Addison"? Her contact remains null until you re-run your script.

But worse, what happens when we learn that John Jones goes by "Jonathan Jones"? His contact is then incorrect, until you run your update script again.

The problem is that you have attribute data, the Contact column, that is dependent on other attribute data. It's derived data. This violates normal form, because it's possible to get into an inconsistent state, as in the case when John's name changes to Jonathan. In that inconsistent state, it's no longer true that contact = firstname + space + lastname.

Now, you may say, "But that's not true, Contact really is not derived, sometines contact is not firstname + space + lastname!"

Ok, then for all the rows with null contacts, they really ought to be null because null is how we record in a database that we don't know a value. Null has several usual properties that make it a perfect "I don't know" value; among other things, it's not less than anyother value, it's not more than any other vakue, and it doesn't count when you count values.

Now you may come back to me, and say, "But I want something, so not-null value, so I can display a pseudo-contact when I don't have one!"

Ok, then create a view, and replace all null constacts with the pseudo-value. assuming your table is called "person":

select *, case when contact is null then concat( concat( firstname, " "), lastname) else contact as pseudo_contact from person;

Now the view is especially nice, because
* when you change John's first name to Jonathan, the psuedo_contact will change unless you really do know his contact, and have given it a non-null value, and then you'll see that instead of the default calculated pseudo-coontact.
* And when you insert Alice's data into the table, if you don't have a real contact for her, you'll get the calculated pseudo-contact.
* But for Elizabeth Emerson, whose contact is "Betty", you'll get "Betty".
* And you'll never have to run, or remember to run, or forget to run the update script,
* and you'll never have to worry that an incorrectly written update script will overwrite real contact data.

Now, we just have to get the view right. Of course, the view's calculation is the same as the calculation you'd use in an update statement, so if you want to ignore my advice and just run and update because it's simpler, you can do that too.

What we need to do in the calculation, of course, is make provision for null values in firstname and lastname. If we always add a space between then, we'll get pseudo_contacts that start with a space where the firstname is null, or end with a space where the last name is null, right. Well, actually, that's not true.

Above I mentioned that null has some special properties. Among these is that almost any operator or function given null as an argument will return null. This is good and by design, because null means "I don't know".

So 3 + null equals what? null.

That is, the sum of 3 and "some unknown quanity" is, necessarily, "some unknown quantity".

Similarly, catenating strings shoudl show this same property, so 'abc' + null should be, yes, null. That is, catenating 'abc' and "some unknown string" should give us "some unknown string", and we represent "some unknown string" with null.

Thus firstname + space + lastname will be null if either or both of firstname or lastname is null. (This can be changed, but shouldn't be, by setting an option in SQL Server 2005.)

So we need to gaurd against nulls in firstname and last name, and we need to add that space betwen them only if both are non-null.

How do we do this? Well, as I mentioned, most operators and functions, given a null argument, return null. But there's a special function, called
coalesce
that returns its first non-null argument. We'll use it like this:
coalesce(firstname, '')
. The second argument is the "empty string". It's not at all like null, because rather than meaning "I don't know" it means "a string with no characters in it, and thus a length of zero". Catenating ("adding") and empty string to a string results in the non-empty string.

So
coalesce(firstname, '')
means "give me the first name, unless it's null, otherwise give me the empty string.

So
coalesce( firstname, '' ) + ' ' + coalesce( lastname, '' )
will do almost what we want. It will give us firstname space lastname if firstname and lastname are both not null, space lastname if firstname is null, firstname space if lastname is null, and a space if both are null.

But we still want to get rid of the leading or trailing space, so, we';; need to chnage that.

One nice thing about colasece is that it can take as many arguments as we want to give it, and if nbone of them are non-null, it's return null. So Let's try this:
coalesce( firstname + ' ' + lastname, firstname, lastname)
Now what's that do?
Well, the first argument is
firstname + lastname
. If neither firstname not last name is null, coalesce will retuns that first argument, which is what we want: firstname space lastname. If either is null, then the expression firtname + ' ' + lastname must also be null, so
colaesce will look at its second argument, which is
firstname
. If that's not null, it means that lastname must be null, and coalesce will return just firstname, which is what we want if firstname is non-null but lastname is null.
But if firstname is null, coalesce will consider its third argument, which is
lastname
, and return that it it's not null.
If both firstname and lastname are null, coalesce has no further arguments to consider, and will return null. If you really wanted to (and you shouldn't) , you could give a fourth argument like 'Both firstname and lastname are null, so my psuedo-contact is null'.


So, to sum up, we create an expression to make a pseudo-contact:
coalesce( firstname + ' ' + lastname, firstname, lastname)
We use that to make a calculated column in a view. Now recall, we only want the pseudo-contact if the actual contact column is null, so we just make the actual contact the first argument of coalesce:
coalesce( contact,  firstname + ' ' + lastname, firstname, lastname)
Now it's just a list of decreasingly good alternatives, fed to colaesce which will select the first non-null of those decreasingly good alternatives.
Then we put that in a view:
create view person_pseudo_contact as select *, coalesce( contact,  firstname + ' ' + lastname, firstname, lastname) as pseudo_contact from person;
And finally, in our front end, we reference the view person_pseudo_contact instead of the table person.
posted by orthogonality at 5:15 PM on July 15, 2008 [5 favorites]


« Older Important book damaged, how can I fix?   |   Laid-back camera salesfolk in NYC? Newer »
This thread is closed to new comments.