mysql hope
August 14, 2006 12:40 PM Subscribe
MySQL refuses to index my table properly, because the data I'm looking for is only three letters long.
I wrote a little database postcode app, and wondered for ages why it wasn't working properly. Then I realised it would return results perfectly well when I searched for a four-character postcode, like BA16 -- it was just ignoring the three-character ones, like BA5. Is there any way to turn off (or work around) this idiotic MySQL behaviour?
If it matters, I'm using fulltext indexes with this SQL:
SELECT * FROM 'pc' WHERE MATCH (town,county,postcodes,area,county) AGAINST ('BA5').
I wrote a little database postcode app, and wondered for ages why it wasn't working properly. Then I realised it would return results perfectly well when I searched for a four-character postcode, like BA16 -- it was just ignoring the three-character ones, like BA5. Is there any way to turn off (or work around) this idiotic MySQL behaviour?
If it matters, I'm using fulltext indexes with this SQL:
SELECT * FROM 'pc' WHERE MATCH (town,county,postcodes,area,county) AGAINST ('BA5').
What are the results when you EXPLAIN that query, such that you're certain it's not using your index?
MySQL has a built-in optimizer that will examine the structure of a query and attempt to estimate the time saved by using an index. If it determines that it's better off ignoring the index and just going at it, it'll do just that.
How many different postcodes do you have, or what other information is contained in the field that you're using a FULLTEXT index for (which is somewhat unreliable with single strings)?
posted by disillusioned at 12:51 PM on August 14, 2006
MySQL has a built-in optimizer that will examine the structure of a query and attempt to estimate the time saved by using an index. If it determines that it's better off ignoring the index and just going at it, it'll do just that.
How many different postcodes do you have, or what other information is contained in the field that you're using a FULLTEXT index for (which is somewhat unreliable with single strings)?
posted by disillusioned at 12:51 PM on August 14, 2006
Response by poster: orthogonality: How would you go about changing that on a shared web hosting server? Would you ever be able to?
posted by reklaw at 12:56 PM on August 14, 2006
posted by reklaw at 12:56 PM on August 14, 2006
This doesn't seem like the sort of incidence that you'd want to use FULLTEXT. Just index the first 4 chars of postcode and use LIKE (or a LEFT() and =). It'll still use the indexes (although you can check this). The amount of postcodes in the United Kingdom is trivial. Do you need to search county, town, etc, for postcodes?
posted by wackybrit at 1:02 PM on August 14, 2006
posted by wackybrit at 1:02 PM on August 14, 2006
reklaw writes "orthogonality: How would you go about changing that on a shared web hosting server? Would you ever be able to?"
If you've got your own instance of the mysql server running on the shared host, you're good to go. If not, you're out of luck, because other users of the database won't want to make the min length 3.
But the real answer is the one wackybrit gave you; there's unlikely to be a good reason to be using fulltext to look for postal codes.
posted by orthogonality at 1:06 PM on August 14, 2006
If you've got your own instance of the mysql server running on the shared host, you're good to go. If not, you're out of luck, because other users of the database won't want to make the min length 3.
But the real answer is the one wackybrit gave you; there's unlikely to be a good reason to be using fulltext to look for postal codes.
posted by orthogonality at 1:06 PM on August 14, 2006
Response by poster: The reason I'm using fulltext is that I've got someone else's silly dirty database with comma-separated postcodes in each column, like this:
column one: name of place
column two: postcode, postcode, postcode, postcode
Daft, I know.
I think I do have my own instance of the mysql server running, though, although I've no idea how to edit the config. Do I have to mess around with a unix shell or something?
posted by reklaw at 1:13 PM on August 14, 2006
column one: name of place
column two: postcode, postcode, postcode, postcode
Daft, I know.
I think I do have my own instance of the mysql server running, though, although I've no idea how to edit the config. Do I have to mess around with a unix shell or something?
posted by reklaw at 1:13 PM on August 14, 2006
reklaw writes "I think I do have my own instance of the mysql server running, though, although I've no idea how to edit the config. Do I have to mess around with a unix shell or something?"
This way will take you to hell. You'd have to change the value, and then re-index. And make sure that anyone who indexes in the future with myisamchk passes the same min length parameter -- if they donm't, the query breaks. MySQL Full-Text Search is half-baked, and its re-indexing is pretty bad too. (It's often faster to drop indices than to repair them, as a projectI'm working on just completed for disillusioned demonstrates.)
Why not just fix the database? In the long run, this will be easier and more maintainable.
posted by orthogonality at 1:25 PM on August 14, 2006
This way will take you to hell. You'd have to change the value, and then re-index. And make sure that anyone who indexes in the future with myisamchk passes the same min length parameter -- if they donm't, the query breaks. MySQL Full-Text Search is half-baked, and its re-indexing is pretty bad too. (It's often faster to drop indices than to repair them, as a project
Why not just fix the database? In the long run, this will be easier and more maintainable.
posted by orthogonality at 1:25 PM on August 14, 2006
Response by poster: Note for future Google searchers etc.: I couldn't be bothered to fix this properly, so I just did a find-and-replace to make all my commas into "ZZZ" -- MySQL indexes "BA5ZZZ", and it's trivial to strip all occurances of ZZZ from the output. OMG HAX. Don't worry, this script is just for my own use anyway...
posted by reklaw at 3:09 PM on August 14, 2006
posted by reklaw at 3:09 PM on August 14, 2006
Oh dear god in heaven.
posted by orthogonality at 4:24 PM on August 14, 2006
posted by orthogonality at 4:24 PM on August 14, 2006
Now watch that kluge turn up in Microsoft Postcodes :-)
posted by flabdablet at 6:54 PM on August 14, 2006
posted by flabdablet at 6:54 PM on August 14, 2006
Orthogonality: in the interests of learning, what would be the right way to do this?
1) If the data is just simply (locality, csv_postcodes), create the table in the form of (unique_postcode, locality).
2) If it's any more complex e.g. actual address data, use linked tables to a (unique_postcode, locality) table and set indexes based on the most common query type.
3) Something else?
posted by Pinback at 7:15 PM on August 14, 2006
1) If the data is just simply (locality, csv_postcodes), create the table in the form of (unique_postcode, locality).
2) If it's any more complex e.g. actual address data, use linked tables to a (unique_postcode, locality) table and set indexes based on the most common query type.
3) Something else?
posted by Pinback at 7:15 PM on August 14, 2006
Pinback writes "in the interests of learning, what would be the right way to do this?"
Yeah, you pretty much have it. The "rule" is, every entity has a single row in some table.
An entity is any thing you want to represent in the database, possibly including relationships among other things, or even consisting of metadata. That's the essence of database normalization. If you don't normalize, you have duplicated entities, which means you can simultaneously represent one real thing in two different, contradictory states.
posted by orthogonality at 10:47 AM on August 15, 2006
Yeah, you pretty much have it. The "rule" is, every entity has a single row in some table.
An entity is any thing you want to represent in the database, possibly including relationships among other things, or even consisting of metadata. That's the essence of database normalization. If you don't normalize, you have duplicated entities, which means you can simultaneously represent one real thing in two different, contradictory states.
posted by orthogonality at 10:47 AM on August 15, 2006
Response by poster: Yeah, totally. The moral of the story is, whoever made the database originally doesn't know shit about databases. Comma-separated (or pipe-separated, or whatever) fields break the ENTIRE POINT of them.
I guess the truly-right thing to do would be to write a script that would take each postcode and make it its own row -- but that would, of course, require both writing the script to begin with, and then running it over the entire db. Still, I feel a little guilty about my kludge, even though I'm the only one to have to suffer with it.
posted by reklaw at 12:15 PM on August 15, 2006
I guess the truly-right thing to do would be to write a script that would take each postcode and make it its own row -- but that would, of course, require both writing the script to begin with, and then running it over the entire db. Still, I feel a little guilty about my kludge, even though I'm the only one to have to suffer with it.
posted by reklaw at 12:15 PM on August 15, 2006
Thanks, ortho - I knew reklaw's "solution" was horribly, horribly wrong, and was just curious to know if my thinking was on the right track (and to elicit a more generically useful explanation for future AskMeologists).
Thankfully, reklaw knows his solution is pretty nasty. But, as a non-programmer who's just good enough to create ugly 1000-line hacks in PHP or JavaScript, it's the sort of sideways solution to a problem that would cross my mind for ... oh, maybe 5 seconds? ... before I disgusted even myself ;-)
Though, I admit, I might use something like it if necessary as part of a quick-and-dirty to initially populate a new table...
posted by Pinback at 11:53 PM on August 15, 2006
Thankfully, reklaw knows his solution is pretty nasty. But, as a non-programmer who's just good enough to create ugly 1000-line hacks in PHP or JavaScript, it's the sort of sideways solution to a problem that would cross my mind for ... oh, maybe 5 seconds? ... before I disgusted even myself ;-)
Though, I admit, I might use something like it if necessary as part of a quick-and-dirty to initially populate a new table...
posted by Pinback at 11:53 PM on August 15, 2006
This thread is closed to new comments.
MySQL 5.1 Reference Manual :: 12.7 Full-Text Search Functions: MySQL 5.1 Reference Manual :: 12.7.5 Fine-Tuning MySQL Full-Text Search:
(The MySQL documention is your friend.)
posted by orthogonality at 12:50 PM on August 14, 2006