Optimizing Access database for search
July 13, 2009 1:21 PM   Subscribe

I am tasked with standardizing the nomenclature of a large MS Access database (about 8,000 records) that has been updated by many different people, and is searched regularly. Any ideas?

Basically, this is an inventory database of different items in a store room. There is a terminal where users search for these items to determine their location in the store room, and they also check them out, so that the administrator can see when more need to be ordered.

The "description" field has various descriptors, for instance one might read "BEARING 4 IN" or "BELT 12FT 2-1/2" X 100"WIDE 2PLY PVC WITH V GUIDES ON 99-1/4" CBS WITH APV PLASTIC LACING"

I need to optimize the database for searching. Unfortunately, there are numerous different standards such as "1 IN" or "1"" or "1 1/4 IN" or "1 1-4 IN". I need a way to standardize these entries so I can post guidlines, and make it easier to search.

Any ideas? I'm not sure I want to go through 8k entries by hand so I'm hoping there is another solution!

Also, keep in mind that those searching the database are not search proficient. They don't know how to use wildcards (despite many attempts to teach) and they will not automatically search for the terms most likely to return what they need.
posted by kraigory to Computers & Internet (8 answers total) 2 users marked this as a favorite
 
Best answer: First you'll want to analyze the description terms. Second, develop an understanding of who searches for what how often so you know what to optimize most.
For the set size you're referring to, and assuming you don't know perl, I'd recommend you export the data to Excel and do a word frequency distribution. Before doing that you may have to go through some pain separating the words so they can be individually analyzed, e.g., how often the word 'erasabale' comes up and where it occurs on average - the first or second or whatever word in the search term.
There are lots of ways to do this step, ranging from crafty VBA code to brute-force Data|Text to Columns manipulation and *lots* of copy|paste.
First you'll look for primary attributes - such as 'wrench' or 'shoe' - then secondary ones that the data will yield - color, size, material, whatever. Optimizing search by a primary term will yield good results quickly, and there are lots of other options from there, including but not limited to breaking the description field into primary,secondary, etc attribute columns.
posted by nj_subgenius at 2:05 PM on July 13, 2009


...oh yeah. The frequency distribution will also tell you whether it's worth the trouble ;-). The data may be so fouled up that it's a waste of time, and there's some comfort to knowing you're not going to be able to fix it by much...
posted by nj_subgenius at 2:08 PM on July 13, 2009


You could construct some test queries to return all records with target descriptors you'd like to change, to get some idea of the size of your problem. For example, if you plan to change every instance of double quotes to IN, find out how many descriptions contain double quotes; perhaps after doing that, you'll find that double quotes are by far the prevalent standard, and you'll want, instead, to change all instances of IN to double quotes.

You can use SQL statements to select and update Access databases, but the Jet database engine upon which Access is built is notoriously non-standard for SQL, and Access databases are pretty easily corrupted. So, if you're taking this route, build your SQL in Access's Query View, to minimize syntax problems. Make sure you have good, tested backups of the database, before doing any mass record updates, deletions or insertions. Compress the database after doing such, to force out any blank row records, and update any foreign key fields for referential integrity.
posted by paulsc at 2:10 PM on July 13, 2009


1) Do a group by, count query to determine all the different entries to figure out the distribution of terms

2) Run update queries to standardize, based on the output of #1

3) After everything is standardized, create a Lookup table (in the field settings for the table storing the non-standard values.

4) Design a search form using the drop-down values and another form to create new drop-down values (perhaps by a supervisor or something?)
posted by Pants! at 2:32 PM on July 13, 2009


The database should have been designed so that things such as measurements are populated by standardized tables.

I'm assuming, though, that building a new, normalized database from scratch is beyond the scope of this question...
posted by dfriedman at 2:48 PM on July 13, 2009


I'd probably run a few wildcard queries, like search for * in * and *' " '* to find the records I'm worried about, I'd probably create a new table with size choices, create a lookup field within the main table where the description originally appeared that looks at that table.

Export to Excel for a delightful round of find and replace then import back in.

The thing is that as long as people are allowed to enter free form data, they will. If it's not "in" or " " " or " inch " it'll be "inhc."

They'll drive you crazy, when the fix is to fix it once and restrict them from their free styling ways.
posted by A Terrible Llama at 4:16 PM on July 13, 2009


Plus "12in" versus "12 In" and "Inch" versus "pinch."

Another user here once described the horror at reviewing this as "stray data anxiety." I have that feeling just answering the question.
posted by A Terrible Llama at 4:18 PM on July 13, 2009


Response by poster: Awesome ideas- thanks! Much better than my "set aside a week and do it by hand" approach.
posted by kraigory at 6:42 AM on July 14, 2009


« Older Name change for a minor when contested by father?   |   How should my Georgetown-bound friend find a place... Newer »
This thread is closed to new comments.