How to fix this?
December 10, 2009 2:04 PM   Subscribe

I'm doing a project in Access and I need help conceptualizing how to get messy data into shape...

So I have some data in an Access table. I am well-versed in Access and am using it in this instance for its report-generating qualities and ease of use for the non-tech proficient. This is the only tool at my disposal.
The data came to me as an .xls with a lot of junk and stuff basically all over the place. I made a bunch of queries to get rid of the junk and re-organize (which was a lot easier to do in Access than Excel).

However this is where I'm stuck. This is what I have now:

Name | Test Date | Test data
Jane | --------------------------
------ | 3/10/09 | -----------
------- | ------------ | 1
------- | ------------ | 2
------- | ------------ | 3
John | ---------------------------
------ | 3/12/09 | -------------
------- | ------------ | 3
------- | ------------ | 1
------- | ------------ | 5
------ | 3/13/09 | -------------
------- | ------------ | 2
------- | ------------ | 5
------- | ------------ | 7

This is what I'd like to have:
Name | Test Date | Test data
Jane | 3/10/09 | 1
Jane | 3/10/09 | 2
Jane | 3/10/09 | 3
John | 3/12/09 | 3
John | 3/12/09 | 1
John | 3/12/09 | 5
John | 3/13/09 | 2
John | 3/13/09 | 5
John | 3/13/09 | 7

Some of the names have more than one test under them. How many or which names are impossible to predict. You need to SEE the data to do the updating.

I know ideally you would have Names and Test dates in their own table, but I'm trying to make this a one-table deal for ease of use by people who are not me who will be ok with importing and running the queries (which is basically one click using macros) but not much else.

I can make and run a bunch of queries and forms to do this, but I can't seem to figure out which ones.
posted by amethysts to Computers & Internet (10 answers total)
 
Can you export your data to Excel, clean it up, and then re-import it to Access? That's what I would do if I understand what you want.

I would create a set of 3 new columns in Excel just to the right of your 3 data columns. Then use IF statements to fill in cells that are blank with whatever is above them, and leave them alone if they aren't blank.

If your 3 columns are A,B and C in excel, I would create a D, E and F with the following formula: =IF(A2<>"",A2,A1). Then fill that cell down. This should "clean" your data so that each cell is filled with the appropriate value and not blank. Then you can export this back to Access and it should be relatively simple
posted by jckll at 2:50 PM on December 10, 2009


Somehow, your concept of "seeing the data" goes against the entire theory of relational databases. Quoting Wikipedia:
In relational model:

a relation is a data structure which consists of a heading and an unordered set of tuples which share the same type.
This explains why SQL will not get you anywhere with this data. To get your row ordering included in a rational model, you want to add the row number as an explicit column. Then, you can write queries to simulate the "copy down" that you want by looking up the value from the valid row with the highest row number <= the current row number.
Nthing that you're abusing a database here and be much better off doing this in a spreadsheet.
posted by themel at 3:09 PM on December 10, 2009


Yeah, you could add a row number field, and do something similar to my IF statement (but using CASE -- not sure if this works in Access) referencing the row # and the row #-1. But it's jury rigging to the max.
posted by jckll at 3:18 PM on December 10, 2009


Response by poster: Hmm. The two reasons I have it in Access as opposed to Excel is that it made it really easy to do the initial clean up, and it'll be really easy to create reports with (and easy for other people to run reports). I'm cringing at the idea of getting into Access, spitting it out again, doing some stuff manually and then loading it back in, or getting it into Access, spitting it out again, doing some stuff manually and then do all the reports in Excel. But if that's the only way to do it then that's the only way to do it.
There's like 1300 or so records in this table so the manual part was also what I was hoping to avoid.
posted by amethysts at 3:22 PM on December 10, 2009


Wait, is the data being continually updated in this fashion? I thought you just have this set of data and want to present it in one table so that people can query against it?
posted by jckll at 3:52 PM on December 10, 2009


Response by poster: Well, for now. But another set of data is going to come down from heaven in a few weeks. And probably will continue doing so.
posted by amethysts at 4:16 PM on December 10, 2009


Like I said, this is quite jury-rigged, but here goes:

ALTER TABLE [table]
ADD id_key PRIMARY KEY AUTOINCREMENT,
name1 VARCHAR(50),
date1 DATETIME,
score1 INT

UPDATE [table]
SET name1=b.name
FROM [table] a INNER JOIN [table] b
ON a.id_key=b.id_key+1

UPDATE [table]
SET date1=b.date
FROM [table] a INNER JOIN [table] b
ON a.id_key=b.id_key+1

UPDATE [table]
SET score1=b.score
FROM [table] a INNER JOIN [table] b
ON a.id_key=b.id_key+1

I think this will work, but I am not positive as I have never done something like this before. So pretty much off the top of my head. And I don't really use Access, so you may have to convert some of this into "Access SQL" because it has its own idiosyncrasies. For what it's worth, AUTOINCREMENT is Access--in MSSQL it would be IDENTITY.
posted by jckll at 4:32 PM on December 10, 2009


In MySQL:
mysql> describe test;
+-------+--------------+------+-----+---------------------+-------+
| Field | Type         | Null | Key | Default             | Extra |
+-------+--------------+------+-----+---------------------+-------+
| row   | int(11)      | YES  |     | NULL                |       |
| name  | varchar(100) | YES  |     | NULL                |       |
| date  | timestamp    | YES  |     | 0000-00-00 00:00:00 |       |
| data  | int(11)      | YES  |     | NULL                |       |
+-------+--------------+------+-----+---------------------+-------+
4 rows in set (0.00 sec)

mysql> select * from test;
+------+------+---------------------+------+
| row  | name | date                | data |
+------+------+---------------------+------+
|    1 | John |                NULL | NULL |
|    2 | NULL | 2009-11-30 00:00:00 | NULL |
|    3 | NULL |                NULL |    3 |
|    4 | NULL |                NULL |    4 |
|    5 | Jane |                NULL | NULL |
|    6 | NULL | 2009-12-01 00:00:00 | NULL |
|    7 | NULL |                NULL |    1 |
|    8 | NULL |                NULL |    2 |
|    9 | NULL |                NULL |    3 |
+------+------+---------------------+------+
9 rows in set (0.00 sec)

mysql> create view name_row as select t1.row as row, max(t2.row) as name_row from test t1, test t2 where not t2.name is null and t2.row < t1.row 
mysql> create view date_row as select t1.row as row, max(t2.row) as date_row from test t1, test t2 where not t2.date is null and t2.row < t1.row
mysql> select tn.name, td.date, t1.data from test t1,test td, test tn, date_row, name_row where tn.row = name_row.name_row and td.row = date_row.date_row and t1.row = date_row.row and t1.row = name_row.row and not t1.data is null;
+------+---------------------+------+
| name | date                | data |
+------+---------------------+------+
| John | 2009-11-30 00:00:00 |    3 |
| John | 2009-11-30 00:00:00 |    4 |
| Jane | 2009-12-01 00:00:00 |    1 |
| Jane | 2009-12-01 00:00:00 |    2 |
| Jane | 2009-12-01 00:00:00 |    3 |
+------+---------------------+------+
5 rows in set (0.00 sec)
I couldn't reduce it to anything less than five joins, maybe someone else can.
posted by themel at 4:58 PM on December 10, 2009


Response by poster: Whoa, this looks awesome, guys. I'm going to give it a shot tomorrow!
posted by amethysts at 5:41 PM on December 10, 2009


Response by poster: Well, I couldn't get any of these solutions to work but that's my fault as my SQL skills are not as good as they should be.

So what I did was I made a form that lets you see the data and update it in batches. The form has a query that only holds non-updated rows, which get cleared from the form as soon as they're updated. Textboxes at the top grab the name, date, and IDs of the first record and that's what the query uses. To update the name, there are three buttons that say "one test", "two tests", or "three tests" which will grab the name from the textbox and stick it into the 12, 24 or 36 rows underneath the row with the name. Then you hit a button to update the date for the 12 rows underneath the date. This was the best way I could think of to account for the randomness of the data and make sure that someone is verifying that the correct updates go to the correct rows. It's a little time consuming but pretty painless as i disabled the warnings.
Thanks for your help!
posted by amethysts at 8:42 AM on December 15, 2009


« Older Help me create a Christmas playlist...   |   How should I react when people visually... Newer »
This thread is closed to new comments.