Finding scraps of good data in a sea of irrelevancy
December 19, 2011 11:11 AM

I was recently placed in charge of a large database which is probably 90% irrelevant information. What should my plan of action be for cleaning it up?

I work for a nonprofit, and we use Raiser's Edge for funder info, grant info, and (historically) as just a giant Rolodex of everyone any of our employees had ever met. As a result, we have about 14,500 'constituents' listed in our database, most of whom are either irrelevant to our work or no longer in the positions we have listed. They are mostly not donors- some are employees of our donors, some are just random people, but I'd guess that maybe 1% of the people/orgs listed as constituents have actually directly given us money. There are issues like- every embassy of a certain country has its own listing, even though most of those embassies have never been cultivated for a grant. Stuff like that. It's kind of a giant clusterfuck.

I am (partly) the record-keeper for my department, and part of that job is 'maintain the database.' I do what I can within the records of the 50-odd donors we regularly work with. But beyond that, it's a massive jungle, and I'm not really sure how to tackle it. Adding new info seems pointless, like tossing it down a well- with all the bad info, what's the point of adding in good info?

My boss knows these problems exist and is supportive of fixing it. My whole department currently has limited training on Raiser's Edge, due to a tumultuous few years in which all the staff in our department turned over and we were severely understaffed. In the last couple months, we've gotten back on track, and just bought access to unlimited training on the database. So I will be more prepared to deal with this, from a technical standpoint, in a few months.

But I'm wondering- has anyone dealt with a problem like this before? My first task here was to clean up our grant file folders, and that wasn't too hard, but this is a much bigger problem to tackle. I have a good intuition for organization, but this- I don't even know where to start.
posted by showbiz_liz to Technology (15 answers total) 6 users marked this as a favorite
Are you looking for sort of a strategy, or the technical details for how you would prune data from raiser's edge?
posted by RustyBrooks at 11:18 AM on December 19, 2011

I was once in a soooooort of very similar position - having a huge database that was almost entirely useless - and the right path turned out to be using human knowledge to find pieces of it that were worth saving ("Oh, we definitely don't want to forget to talk to Bob") and then abandoning the rest. Basically, once you reach a certain level of irrelevancy/outdatedness, there's minimal difference in effort between basic maintenance (ie, keeping things up to date that may have changed) and building it from scratch all over again, and you avoid the enormous time/dollar cost of trying to update dead data.

That said, I was working with a private-industry system where no single data element was particularly important (eg, no equivalent of "we might lose data on a big donor") so I don't know how much that factors into things. I also had very, very little time/money/people to actually do the hypothetical updating, so variants of "actually go through it all and call people to update things" were simply not possible.
posted by Tomorrowful at 11:18 AM on December 19, 2011 [1 favorite]

Is it possible to start a second database, pull out the data you know is good, toss out the data you know is bad, then start asking around about the remainder? If 90% of your data is useless right now, this is no longer a maintenance issue: you're basically going to have to rebuild the database.
posted by Gilbert at 11:27 AM on December 19, 2011

Response by poster: Are you looking for sort of a strategy, or the technical details for how you would prune data from raiser's edge?

Either! We're still at the brainstorming stage when it comes to this issue...
posted by showbiz_liz at 11:32 AM on December 19, 2011

Looking just at what you have written here, I disagree that you have a problem. Databases are supposed to contain data. That is what they do. Containing a lot of data is not in itself a problem. Now, reading between the lines, you may have one or more of the following problems:
- Some of the data in the database is out of date
- It is hard to pull out certain kinds of commonly-used data (regular donors, occasional donors, prospective donors)
- It is not clear where to put new kinds of information into the database
And probably some other problems that you know about.

So the first thing I would do is identify what your actual problems are (ie, things your organization wants to do that interact with the database and are difficult to do), prioritize them, and come up with specific solutions for them (for instance, create views/filters on the full list of people that only show your regular donors). I would not advise throwing away the database and starting over until you have identified what you actually want to do with the database, or you are likely to just create another mess.
posted by inkyz at 11:36 AM on December 19, 2011

If there's only 50 or so records that you use on a regular basis, can you add a binary field ("current donors" or somesuch) and create a query to only show those? Unless you're having storage capacity or slow loading problems, I see no reason to actually delete all the other records. God help you if someone decided they needed them someday. (Caveat: I've never used Raiser's Edge.)
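A minimal sketch of the flag-plus-query idea, using Python's built-in sqlite3 module as a stand-in. The table and column names (`constituents`, `is_current_donor`) are assumptions for illustration only; Raiser's Edge has its own data model and query tools, which this does not try to reproduce.

```python
import sqlite3

# Hypothetical schema on a throwaway in-memory database, not Raiser's Edge.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE constituents ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " is_current_donor INTEGER NOT NULL DEFAULT 0)"  # 0 = no, 1 = yes
)
conn.executemany(
    "INSERT INTO constituents (name, is_current_donor) VALUES (?, ?)",
    [("Example Foundation", 1), ("Embassy A", 0), ("Embassy B", 0)],
)


def current_donors(connection):
    """Return names of flagged records; unflagged rows stay in the table."""
    rows = connection.execute(
        "SELECT name FROM constituents"
        " WHERE is_current_donor = 1 ORDER BY name"
    )
    return [name for (name,) in rows]
```

Nothing is deleted here: the other records remain available, they just drop out of the everyday view.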
posted by desjardins at 11:38 AM on December 19, 2011

Response by poster: Please don't do that. Step away from the technical solutions until you understand what you want to accomplish. Then worry about how to do it.

Don't decide to empty the database simply because it is easy.

Don't worry, I'm not going to do anything before I come up with a plan, with my bosses' input. I asked this question because tomorrow we are having a 'what to do about the database' meeting, and I thought I might get some extra food for thought here.
posted by showbiz_liz at 11:47 AM on December 19, 2011

I'm a database manager for a similar type of NGO database.

Although I agree that you shouldn't be too hasty about deleting records, I think you should give some thought to what is "important" to your organization. Names and addresses in and of themselves aren't very useful, especially if they are outdated, or if you don't know the context or purpose for which they were entered.

Here are criteria that I would consider:
1. Any record with a donation attached (hard credit or soft credit) would be kept.
2. I would run a query to see which records had been accessed or updated in the last three years (or whatever time period your team feels is relevant), and then consider keeping most of these.
3. Any record that has some degree of interaction (details about cultivation or interaction with the donor/prospect) is kept.
4. Any record without contact info (address, email, or phone) gets axed, unless there is some sort of interaction information that's important.
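The criteria above can be sketched as a small triage function. This is only an illustration: the `Record` fields are hypothetical names rather than Raiser's Edge fields, and the order in which the rules are checked is an assumption, since the list does not say which rule wins when two conflict. It labels records rather than deleting anything.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Record:
    # All field names are hypothetical, chosen for this sketch only.
    name: str
    has_gift: bool = False               # hard or soft credit attached
    has_interaction_notes: bool = False  # cultivation or contact history
    has_contact_info: bool = False       # any address, email, or phone
    last_touched: Optional[date] = None  # last access or update, if known


def triage(rec, today, window_years=3):
    """Label a record according to the criteria; never deletes anything."""
    cutoff = today - timedelta(days=365 * window_years)
    if rec.has_gift or rec.has_interaction_notes:
        return "keep"
    if not rec.has_contact_info:
        return "cull_candidate"
    if rec.last_touched is not None and rec.last_touched >= cutoff:
        return "probably_keep"
    return "undecided"
```

Running this over an export and counting the labels gives a rough sense of how much of the database each rule would touch before anyone commits to a cleanup.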

I think one technique you could consider is creating constituencies of "good records" and flagging them (or creating member types, or whatever RE uses). That would make it easier to pull groups of relevant records. These might be individual & major donors, foundations, corporations, donor-advised funds, board members, prospects... whatever affinity groups make sense for you. You might use this in the short term to create some order, while you determine what to do with the rest. See what gets used/accessed and what doesn't.
posted by kimdog at 11:53 AM on December 19, 2011 [1 favorite]

Looking just at what you have written here, I disagree that you have a problem. Databases are supposed to contain data. That is what they do. Containing a lot of data is not in itself a problem.

I was going to come in to write something similar to this. Reading between the lines, it seems like you are thinking it might be time to delete a lot of that data, but I'd urge you to reconsider. You can think of the data along two axes: dirty/clean and useful/not useful. The former refers to whether or not the data is actually correct (is the phone number correct? is the position title correct?); the latter refers to whether or not you have a current use for the data. Your goal should be to make dirty data clean by correcting it, and to determine the uses to which you put the data you have, making more of it useful (even if that doesn't mean getting more grants/donations from it). One way to make a greater percentage of your data clean is to excise dirty data that you do not currently have a use for, but the problem with that is if you do come up with a use for it in the future, you will no longer have the data to use. Better to develop better ways of searching what you have, by adding fields if need be, than to eliminate data that you could then not replace. Once you have better search implemented, clean the resulting subsets first.
posted by OmieWise at 11:55 AM on December 19, 2011

Don't throw away the old database, ever. Create a new one and populate it with good stuff.

Also, if there is access history (like timestamps) associated with the database, make sure to preserve it before making changes like adding fields, making copies, etc., so you know what was important before you started making changes. In fact, get a good backup before doing anything.

Records never accessed since creation are a prime candidate for sweeping under the carpet.
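A rough sketch of both points, assuming each record can be exported with `id`, `created`, and `last_accessed` values. Those field names are hypothetical, and whether Raiser's Edge exposes access timestamps in this form is not something this thread establishes.

```python
from datetime import datetime


def snapshot_access_history(records):
    """Copy each record's timestamps before any schema or data changes."""
    return {r["id"]: (r["created"], r["last_accessed"]) for r in records}


def never_accessed(history):
    """IDs whose last access is missing or no later than creation."""
    return sorted(
        rec_id
        for rec_id, (created, last_accessed) in history.items()
        if last_accessed is None or last_accessed <= created
    )
```

Taking the snapshot first matters because bulk edits may themselves update the timestamps and erase the signal.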
posted by epo at 12:25 PM on December 19, 2011

If they really all need to be thrown in the same bucket, I would add two things. First, a checkbox, boolean, true/false type flag denoting that they're actually a donor (or past donor, or whatever your criterion is for separating donors from a mere contact). Secondly, a field, preferably a date field, to record when the person was last validated/verified. Possibly thirdly, another flag denoting whether the record/user/contact is garbage and should be ignored. This is all just from a database perspective, nothing about NP specifically.
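Here is one way those three fields might work together, as a sketch only. The `Contact` class, its field names, and the one-year staleness threshold are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Contact:
    # Hypothetical fields mirroring the three suggestions above.
    name: str
    is_donor: bool = False
    last_verified: Optional[date] = None  # None means never verified
    is_garbage: bool = False


def needs_verification(contacts, today, max_age_days=365):
    """Non-garbage contacts that are unverified or stale, donors first."""
    stale = [
        c for c in contacts
        if not c.is_garbage
        and (c.last_verified is None
             or (today - c.last_verified).days > max_age_days)
    ]
    return sorted(stale, key=lambda c: (not c.is_donor, c.name))
```

The date field turns "is this record any good?" into a work queue: verify donors first, then everyone else as time allows.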
posted by rhizome at 12:31 PM on December 19, 2011 [1 favorite]

When approaching a database, most people look at the data they have, not the question they want to answer. Think about what you want from a database, think about cool things you'd like to do, like finding anybody who has access to the legislators on a particular committee, or people who would volunteer to help with the winter carnival. If you don't know what you want, get help from others who have experience.
posted by theora55 at 3:08 PM on December 19, 2011

I'd also suggest beginning by creating a backup, and then maintaining a working version.

I also think OmieWise's point that useful/not useful != clean/dirty is a very important one.

Also you have to distinguish between good/bad data, and good/bad metadata.

You have a lot of legacy issues here, related to why the various records were created in the first place, and why they were created in the format that they were. Are you able to talk to the previous database admins and find out?

It would be worth considering creating one or more new fields that can be added to each record, for instance (as has been suggested above) for current donors, etc.

Also as others have said, unless this is an issue with maxing out the performance of RE, I think it's actually easier to keep what you have (bits are free), avoid either weeding or crosswalking, and look at better ways to support useful queries.
posted by carter at 4:00 PM on December 19, 2011

agree with the advice so far, my $.02...

I do see the value in culling the WORKING database. Resources are usually at a premium, and even if your system doesn't struggle with the database size, your people who look through the database doing prospecting or whatever won't trust the data if 90% of it is a dead end. But I agree, don't throw anything away at this stage. Kimdog's idea of creating some criteria for what to leave in and what to leave out of a new and improved database is really great. Not only does it create some logical rules that you or your techs can use to strip away the chaff (in a database 2.0 that becomes the standard for the team, NOT to delete forever, at least for a year or two), it allows you to refine some ideas and present them in a non-technical way to non-technical people.

At some point, be prepared to patiently explain what can be culled based on the data you have and what can't, e.g.:

Non-Technical Guy: Can we keep everyone who has a net worth over $X? We need to call all of those, regardless of past history or lack thereof.

You: We don't have that data, so we have no way of knowing, other than guessing.

There is a temptation to start tossing and hope no one higher up thinks to ask those kinds of questions (kind of like throwing away stuff in the attic because your wife should never miss it). This is why a) you keep the old database and b) you establish those criteria.

That way, 6 months from now, when someone asks "Why isn't so-and-so in the database? I thought we included him..." you can say "He didn't fit the criteria, but we can add him if his record can be made to fit the criteria."

This is definitely measure twice and cut once kind of work. I once had a vendor who, like most vendors in my industry, has a policy of throwing away artwork not used in an order for more than 2 years. Fair enough, but he threw away my artwork because he went through and got rid of everything more than 2 years old. Despite the fact that I faithfully reordered the product every 6 months or so... so you have to set up your logical conditions carefully. Computers don't know "what you meant." Careful with those "OR" operators.
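The vendor's mistake is easy to reproduce. In this sketch the field names `created` and `last_ordered` are hypothetical; the point is only that "older than 2 years" and "unused for 2 years" are different filters that select different records.

```python
from datetime import date, timedelta


def stale_by_creation(items, today, years=2):
    """The mistaken rule: anything created more than `years` ago."""
    cutoff = today - timedelta(days=365 * years)
    return [i["name"] for i in items if i["created"] < cutoff]


def stale_by_last_use(items, today, years=2):
    """The stated policy: anything not ordered in the last `years`."""
    cutoff = today - timedelta(days=365 * years)
    return [i["name"] for i in items if i["last_ordered"] < cutoff]
```

Running both filters on the same data and comparing the results before deleting anything is a cheap way to catch this kind of slip.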
posted by randomkeystrike at 4:20 PM on December 19, 2011

I inherited a similar database (messy, redundant data; approximately 14,000 records) about 3 years ago. I'm currently in the middle of a yearlong cleanup migration project into a new system that is forcing me to consider how the data is structured and what gets kept.

Everyone who has mentioned creating a backup before making any huge changes is right on the nose. There's also something to be said for not deleting records unless you have cause. Just because someone hasn't been a donor in the past doesn't mean they aren't a potential prospect to be cultivated. Sometimes even noticing the date that a record was created in your system could be a useful bit of data if your organization ends up finding a reason to contact that person again.

There are a few basic things you can do to help clean up your data. Try running your address info against the NCOA database. Run a duplicate detection search to catch addresses that match multiple names, or names that match multiple addresses. Most appraisal districts have searchable online databases; since a vast majority of philanthropic prospects are homeowners, you can verify their address information that way even if they're not listed in the phonebook.
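For the duplicate search, a bare-bones sketch might look like the following. It assumes records exported as dictionaries with `name` and `address` keys (hypothetical names) and matches only after crude normalization. Real duplicate detection usually needs fuzzier matching than this.

```python
from collections import defaultdict


def normalize(text):
    """Lowercase, drop periods and commas, collapse whitespace."""
    return " ".join(text.lower().replace(".", "").replace(",", "").split())


def find_shared(records, key_field, value_field):
    """Map each normalized key to its values when it has more than one."""
    groups = defaultdict(set)
    for rec in records:
        groups[normalize(rec[key_field])].add(normalize(rec[value_field]))
    return {k: sorted(v) for k, v in groups.items() if len(v) > 1}
```

Calling `find_shared(records, "address", "name")` lists addresses shared by several names, and swapping the two field arguments lists names that appear at several addresses. Either result is a list of candidates for a human to review, not a list of confirmed duplicates.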

Start creating categories that you can use to sort your contacts, beyond simply donor/nondonor. It's useful to know who works in the press. Embassies and local cultural organizations should be kept informed of your programming- you never know if they'll find an overlap with their mission and interests and want to offer you assistance. Employees of donors often have similar interests to their employers (i.e. lawyers like to donate to the same organizations that the firm/partners do). Any vendors you've used are prospects for cash or in-kind donations.

Once you've got some categories set up, start applying them to the records in your system. Even if you just apply them to constituents you interact with starting now, the people and organizations you touch the most will be categorized the most quickly. You'll start to see patterns emerge and you can refine your criteria.

I understand the impulse to slash and burn, but a good CRM isn't data that you mine, it's data that you cultivate. A good CRM is a living entity and will NEVER BE FINISHED. Think of it as a process to be shepherded, not a project to be finished.
posted by Uncle Ira at 9:43 PM on December 19, 2011
