Is there a name for this practical data analysis concept?
June 2, 2011 7:31 AM

I'm creating a presentation for newcomers to my social science-y profession. I'm looking for a one or two word concept name for a basic tenet of data analysis, which is that you never reference or conduct any computer analysis on a production archival dataset that your organization owns. Instead, you make your own copy and analyze that copy. That way, if you write over the top of it or change values in it, the real data remains clean and extant. Any suggestions for a concept name? Yes, I realize that you can have read-only files that would obviate this need. But not all organizations can or do follow that useful practice.
posted by eaglehound to Computers & Internet (14 answers total) 1 user marked this as a favorite
 
"Avoid all opportunities to shoot yourself in the foot?"
posted by Alterscape at 7:37 AM on June 2, 2011


"For original research, make a copy!"
posted by Cold Lurkey at 7:37 AM on June 2, 2011


Best answer: This might fall under the general heading of data hygiene.
posted by carmicha at 7:38 AM on June 2, 2011 [1 favorite]


I think that what you're talking about could be referred to as "sandboxing," but if people don't get the concept, they're not going to get the term. Perhaps some aphorisms would help.

"Don't tinker with your engine while it's running"
"Don't fix your electric wiring when it's hot."
posted by adamrice at 7:39 AM on June 2, 2011


Best answer: When I worked with large databases, I would create a "working dataset" to tinker with, distinguishing it from the original master dataset.
posted by noonday at 7:51 AM on June 2, 2011


Anything like "snapshotting", "mirroring" or "cloning" would work.
posted by DWRoelands at 7:59 AM on June 2, 2011


Isn't this just another application of "back up your work?"
posted by Ignatius J. Reilly at 8:01 AM on June 2, 2011


Best answer: "Data hygiene" is definitely what I would call the practice, with "mirroring" or "cloning" being what you do in order to have good data hygiene. I'll also mention that in my discipline, maintaining a clean archival copy is explicitly required according to the NSF research ethics/best practices training I recently had to do -- don't know if the same is true for you, but might be worth mentioning to your trainees if it is.
posted by dorque at 8:01 AM on June 2, 2011 [1 favorite]


Agree with adamrice. An analogy about unplugging electronics before messing with them is something most everyone understands.
posted by davextreme at 8:02 AM on June 2, 2011


Best answer: The concept of revision control in the software development world (and, increasingly, document management in general) is pretty close, though you're not necessarily pushing your "changes" back.

I think adamrice's analogy misses the key point that you want to work with a copy, not simply in an offline state. I think a closer analogy might be that you go out of town and your friend is housesitting. You don't give him your key, you give him a copy because if he loses yours you're screwed.
posted by mkultra at 8:13 AM on June 2, 2011


How about "quarantining" or similar infectious disease analogy?
posted by Rumple at 8:15 AM on June 2, 2011


Don't cut out your eye to see into your head. Take an X-ray.
posted by BadgerDoctor at 8:57 AM on June 2, 2011


Response by poster: Thanks everybody for the quick and great analysis! I'm going with 'data hygiene' as the concept name, 'clone a working copy' as the recommended practice, and 'do not give your housekey to your housesitter, give them a working copy' as the reinforcing analogy.
posted by eaglehound at 9:07 AM on June 2, 2011
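
As a rough illustration of the "clone a working copy" practice settled on above, here is a minimal Python sketch. The function names `file_digest` and `clone_working_copy` and the idea of a separate working directory are assumptions made for illustration; nothing in the thread prescribes a particular tool or layout.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes.

    Reads the whole file into memory, which is fine for a sketch
    but may not suit very large datasets.
    """
    return hashlib.sha256(path.read_bytes()).hexdigest()


def clone_working_copy(master: Path, work_dir: Path) -> Path:
    """Copy the master dataset into work_dir and return the copy's path.

    work_dir is assumed to be a different directory from the one
    holding the master, so the copy can never overwrite the original.
    """
    work_dir.mkdir(parents=True, exist_ok=True)
    working = work_dir / master.name
    shutil.copy2(master, working)  # copies contents and file metadata
    # Confirm the copy is byte-for-byte identical before analysis starts.
    if file_digest(working) != file_digest(master):
        raise IOError("working copy does not match the master")
    return working
```

All analysis then reads and writes only the path returned by `clone_working_copy`; the master file is opened for reading once, to make the copy, and never for writing.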


I've used "snapshot" and "clean copy" for this concept, but my favourite term is "scratch monkey."
posted by bonehead at 9:11 AM on June 2, 2011


This thread is closed to new comments.