How can I quickly match one of 150 million names to a code?
September 3, 2009 10:10 AM
I have up to 150 million product names, and I need to match each of them to one or more codes. How can I do this very quickly and with minimal infrastructure?
Basically, up to 100,000 names will be checked against the master list of 150 million to see whether they are on it (they may not be). If a name is on the list, the system returns one or more codes. While I'm sure I can use a database for this, I'd love to hear suggestions for alternatives, such as a disk-backed HashMap or similar technology. A Java-based solution, or one with Java bindings, would be preferred. The system could possibly dedicate up to 1 GB of RAM to this lookup on a transient basis (though much less would be preferred).
The names and their mappings to codes are relatively static, changing at most once a month. I can pre-generate a digested format if the library requires one.
A name is a string of up to 255 characters; a code is a Java int.
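To put rough numbers on those sizes, here is a back-of-the-envelope sketch. It assumes one code per name, one byte per character on disk, and a hypothetical compact record of an 8-byte name hash plus a 4-byte code; the class and method names are made up for illustration.

```java
public class FootprintEstimate {
    // Worst case for the raw names alone: every name is the full 255 characters,
    // stored at one byte per character (an assumption).
    static long rawNameBytes(long names) {
        return names * 255L;
    }

    // Hypothetical compact record: 8-byte hash of the name + 4-byte int code.
    static long compactBytes(long names) {
        return names * (8L + 4L);
    }

    public static void main(String[] args) {
        long names = 150000000L;
        System.out.println(rawNameBytes(names)); // 38250000000 bytes
        System.out.println(compactBytes(names)); // 1800000000 bytes
    }
}
```

Under these assumptions even the compact form is larger than the 1 GB of RAM available, which is why I'm asking about something disk-backed rather than a plain in-memory map.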
This would need to run on a single system running 64-bit Java (JDK 1.5.0), and I'd rather the solution use in-process communication than network communication.
A solution using a single digested custom data file and a single Java jar library would be very nice.
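To make that concrete, here is a minimal sketch of the kind of thing I have in mind, not a finished design. It assumes the digested file holds fixed-width 12-byte records (a 64-bit FNV-1a hash of the name followed by the int code), sorted by hash, and that a lookup is a binary search over the file. `NameCodeIndex`, `build`, and `lookup` are hypothetical names. Two caveats: because only hashes are stored, two different names could in principle collide and produce a false match, and the in-memory sort in `build` is only for illustration, since the full 150 million records would need an external sort.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class NameCodeIndex {
    private static final int RECORD_BYTES = 12; // 8-byte hash + 4-byte code

    // 64-bit FNV-1a over the UTF-8 bytes of the name.
    static long hash(String name) throws IOException {
        byte[] bytes = name.getBytes("UTF-8");
        long h = 0xcbf29ce484222325L;
        for (int i = 0; i < bytes.length; i++) {
            h ^= (bytes[i] & 0xff);
            h *= 0x100000001b3L;
        }
        return h;
    }

    // Writes one record per (name, code) pair, sorted by hash (signed order).
    // A name with several codes simply appears in several pairs.
    static void build(File out, String[] names, int[] codes) throws IOException {
        long[][] recs = new long[names.length][];
        for (int i = 0; i < names.length; i++) {
            recs[i] = new long[] { hash(names[i]), codes[i] };
        }
        Arrays.sort(recs, new Comparator<long[]>() {
            public int compare(long[] a, long[] b) {
                return a[0] < b[0] ? -1 : (a[0] > b[0] ? 1 : 0);
            }
        });
        DataOutputStream dos = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(out)));
        try {
            for (int i = 0; i < recs.length; i++) {
                dos.writeLong(recs[i][0]);
                dos.writeInt((int) recs[i][1]);
            }
        } finally {
            dos.close();
        }
    }

    // Returns every code stored under the name's hash; empty if none.
    static List<Integer> lookup(RandomAccessFile raf, String name) throws IOException {
        long target = hash(name);
        long count = raf.length() / RECORD_BYTES;
        long lo = 0;
        long hi = count; // exclusive upper bound
        // Binary search for the first record whose hash is >= target.
        while (lo < hi) {
            long mid = (lo + hi) >>> 1;
            raf.seek(mid * RECORD_BYTES);
            if (raf.readLong() < target) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        // Collect the run of records that share the target hash.
        List<Integer> codes = new ArrayList<Integer>();
        for (long i = lo; i < count; i++) {
            raf.seek(i * RECORD_BYTES);
            if (raf.readLong() != target) {
                break;
            }
            codes.add(Integer.valueOf(raf.readInt()));
        }
        return codes;
    }
}
```

With this layout a lookup over 150 million records needs roughly 28 seeks and almost no heap, so the 100,000 checks would be dominated by disk access. I don't know whether that is fast enough in practice, or whether an existing library already does this better.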
Thanks in advance.
posted by bottlebrushtree to computers & internet (6 comments total)
posted by cmm at 10:19 AM on September 3