comparison engine for books
December 22, 2004 11:26 AM

I'd like to build (or have built for me) a comparison engine like FetchBook or BookkooB. Basically, a comparison engine that compares prices of books, and then if the user clicks over to buy the book from one of the affiliates, the program would append my account code, getting me a kickback on the referral.
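
A rough sketch of the referral step, in Python: rewrite the outbound link so it carries the account code. The query parameter name (`ref`), the code value, and the example URL are all assumptions; each affiliate program defines its own link format.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_affiliate_code(url: str, param: str, code: str) -> str:
    """Return `url` with `param=code` set in its query string.

    Any existing value for `param` is dropped first, so the link
    carries exactly one affiliate code.
    """
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    query.append((param, code))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For example, `add_affiliate_code("https://books.example.com/item?isbn=0306406152", "ref", "myaccount-20")` gives `https://books.example.com/item?isbn=0306406152&ref=myaccount-20`.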

I have no idea how to do this, or what it would cost to pay someone to do it. What would be the best way to figure this out? Are there any open source comparison engine scripts? If I had to pay someone to build something like this, what should I expect to pay? (I can do all the frontend work, just not the backend.)
posted by Alt F4 to Computers & Internet (19 answers total)
Why can't you use the existing ones? What is the functionality you're missing?
posted by u.n. owen at 11:28 AM on December 22, 2004


So you've decided to Make Money Fast [tm] and you want MetaFilter users to help you with the business plan, including finding code?
posted by WestCoaster at 11:55 AM on December 22, 2004


I don't think it's an abuse of AskMe to ask about available code resources. Pricing... maybe, if only because it's so tricky and will vary widely depending on who you talk to. You might get away with finding some sharp kid or Third-world shop that'll do it for you for peanuts, or you might pay thousands.
posted by weston at 12:15 PM on December 22, 2004


Why can't you use the existing ones? What is the functionality you're missing?

Specifically, I'd like for the site to find the prices, and to give it to the user in two columns: One that features big, chain stores that will probably have the best price; one that features smaller, indie stores and mom-and-pop ventures. I recognize that the affiliate programs are only going to work with the very biggest of the stores, but I want people to be able to support smaller shops. There's more to it, but that's one of the primary differentiators.
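
The two-column idea could be sketched like this in Python, assuming the prices have already been collected as (store, price) pairs. The store names and the set of "chain" stores are made up for illustration.

```python
from typing import Dict, List, Set, Tuple

Offer = Tuple[str, float]  # (store name, price)


def split_offers(offers: List[Offer], chain_stores: Set[str]) -> Dict[str, List[Offer]]:
    """Split offers into a chain-store column and an indie column.

    Each column is sorted cheapest first. Any store not listed in
    `chain_stores` is treated as an indie.
    """
    columns: Dict[str, List[Offer]] = {"chains": [], "indies": []}
    for store, price in sorted(offers, key=lambda offer: offer[1]):
        column = "chains" if store in chain_stores else "indies"
        columns[column].append((store, price))
    return columns
```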
posted by Alt F4 at 12:54 PM on December 22, 2004


Thanks for the idea! Now, where did I put that computer science degree?
posted by jon_kill at 1:09 PM on December 22, 2004


So ... no open source (or other) software that'll do this?

Thanks, weston, for the thoughts. I know the peanuts-to-pricey range applies to everything of this type, but I thought somebody here would have had experience with a solid, professional coder, and could say "this'd probably take 20 to 40 hours, and you'd expect to spend about $30/hr." Sorry for not clarifying that earlier. I'd prefer to try the coding out myself (learning new skills, more control over it, all that), but, again, I don't have API/XML/database scraping knowledge, which is why I'd prefer a program that I could work through.

I'll be checking this post a lot, so if anyone else has input, even days down the road, I'd be interested in hearing it.
posted by Alt F4 at 2:53 PM on December 22, 2004


If you were asking me to do it, I'd quote you $35 an hour, with the number of hours depending on the ease of access to the various stores' databases. Amazon's SOAP interface is pretty slick, but other stores would require screenscraping or special arrangements with them. So really think about whether or not you'd be able to offer anything that isbn.nu doesn't offer already.

I don't know of any existing software out in the open to do what you're asking.
posted by cmonkey at 3:12 PM on December 22, 2004


I'd never thought about how easy this kind of service is to run. You don't need to store anything on your server, except information about the sites and how to get their information, via API or scraping. Someone asks for something, you just go and make a few dozen requests, wait for the results, and display them. So the backend is tiny.
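
A minimal sketch of that "make the requests, wait, display" loop, in Python rather than Perl. The fetcher functions are hypothetical stand-ins: each would wrap one store's API call or scraper and should enforce its own network timeout. Nothing here talks to a real store.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List, Optional, Tuple

# A fetcher takes an ISBN and returns a price, or None if the store
# does not list the book. It may raise if the store is unreachable.
Fetcher = Callable[[str], Optional[float]]


def compare_prices(isbn: str, fetchers: Dict[str, Fetcher]) -> List[Tuple[str, float]]:
    """Query every store in parallel and return (store, price) pairs,
    cheapest first. Stores that fail or have no price are skipped."""
    results: List[Tuple[str, float]] = []
    with ThreadPoolExecutor(max_workers=max(1, len(fetchers))) as pool:
        futures = {pool.submit(fetch, isbn): store for store, fetch in fetchers.items()}
        for future in as_completed(futures):
            store = futures[future]
            try:
                price = future.result()
            except Exception:
                continue  # one broken store should not sink the whole page
            if price is not None:
                results.append((store, price))
    return sorted(results, key=lambda item: item[1])
```

Nothing is stored between requests, which matches the "backend is tiny" point: the only persistent data is the table of stores and how to query each one.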

So all you need is a server with bandwidth scaled to the traffic you expect and a reasonably fast CPU to do some basic text-processing quickly.

I'd say a decent programmer with CGI and Perl and a little spidering knowledge could whip up a working demo in a few days. There's nothing fancy involved, so you could do it yourself. Heck, I almost did it while I was typing this comment. ;-)

(Note: I may be over-optimistic. I tend to be that way for such things. I'm just trying to be helpful. But I'm serious that it should be quick and easy to get to the proof-of-concept phase. Then if 1,000 people per hour are trying to use it, you'll have problems, but they're the kind of problems you want to have, because that means that your idea is a success.)

(Oh, and you'll be taking care of the front-end, right, all that shiny HTML and colors and graphics stuff, right?)

(Also, I wonder what the legal aspects are. I mean, I don't, but someone probably should somewhere along the line.)
posted by Turtle at 4:06 PM on December 22, 2004


Have you checked out elance and Rent-a-coder?
posted by deshead at 5:44 PM on December 22, 2004


I know the peanuts-to-pricey range applies to everything of this type, but I thought somebody here would have had experience with a solid, professional coder, and could say "this'd probably take 20 to 40 hours, and you'd expect to spend about $30/hr." Sorry for not clarifying that earlier.

If I were to guess, I'd rough out the first draft at about 10 hours per store you want searched that has a nice web services interface, and 20-40 hours per store that doesn't have one and involves screen scraping. Add some other number of hours to tie it into a coherent whole. These numbers could possibly be reduced (perhaps by as much as half) if you find coders already familiar with the web services interfaces for the stores that have them, and who are already screen-scraping pros for the rest.
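
To turn that rule of thumb into a range, here is a small Python sketch. The per-store hours come from the estimate above; the store counts and the 20 integration hours in the example are invented, and applying the "experienced coder" discount only to the per-store hours is an assumption.

```python
from typing import Tuple


def estimate_hours(
    api_stores: int,
    scraped_stores: int,
    integration_hours: float = 0.0,
    experience_factor: float = 1.0,
) -> Tuple[float, float]:
    """Return a (low, high) estimate of development hours.

    10 hours per store with a web services interface, 20 to 40 hours
    per store that needs screen scraping. `experience_factor` scales
    the per-store work only (0.5 would halve it).
    """
    low = (10 * api_stores + 20 * scraped_stores) * experience_factor + integration_hours
    high = (10 * api_stores + 40 * scraped_stores) * experience_factor + integration_hours
    return low, high
```

With 2 API stores, 3 scraped stores and 20 hours of integration, `estimate_hours(2, 3, 20)` gives 100 to 160 hours. At the $35 an hour quoted earlier in the thread, that is $3,500 to $5,600.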
posted by weston at 9:57 PM on December 22, 2004


deshead - I knew of Rent-a-coder, but haven't used them before. Have you or anyone else you know used one of their contractors to good success? Elance is new to me, but I'll look into it.
posted by Alt F4 at 6:03 AM on December 23, 2004


here are some possible problems with the idea, just for completeness:

- for every request made to your site, you need to make N requests to other sites. so if you have 9 affiliates then the bandwidth is multiplied by a factor of 10.

- page scraping can be very fragile. this may mean regular errors and maintenance.

- it's not just page scraping/reading data that must be tailored to each site. you may also need affiliate-specific code for entering values in their search engine, etc. this may mean several page reads, even higher bandwidth use, even more fragile code, and even more maintenance.

- if the sites don't want to cooperate (and why should they? - the majority will be losing out to the cheapest) they can easily block you, since all requests come from your server.

- sites may automatically impose bandwidth limitations for serving to single addresses, auto-blacklisting you.

- how do you handle different editions (paperback v hardback, for example)?

- what about postage and packing calculations? these might depend on the number of items bought, the urgency of the delivery and the geographical destination.
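
one way to soften the blocking and bandwidth-limit points is to space out requests to each site. a minimal python sketch, assuming a single server process; the class name and the interval are hypothetical, and it does nothing about sites that simply refuse to cooperate.

```python
import time
from typing import Callable, Dict


class HostThrottle:
    """Enforce a minimum gap between requests to the same host."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Dict[str, float] = {}  # host -> time of last allowed request

    def wait(self, host: str) -> float:
        """Block until `host` may be contacted again; return seconds slept."""
        now = self._clock()
        last = self._last.get(host)
        delay = 0.0
        if last is not None:
            delay = max(0.0, self.min_interval - (now - last))
        if delay > 0:
            self._sleep(delay)
        self._last[host] = now + delay
        return delay
```

call `wait(host)` right before each outgoing request. caching recent results per isbn would cut the request count further, but that is a separate piece.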
posted by andrew cooke at 7:31 AM on December 23, 2004


andrew's got some very good points. The fragility of screen-scraping would be my biggest concern. I don't know about non-cooperation -- if sites are getting sales off of you, they probably won't care too much, or you can work something else out with them, and if not, you probably don't need to care as much. But yeah. While I admire Turtle's "get it going!" philosophy (and it's that kind of enthusiasm that does indeed get projects started, so hang on to it), it's important to recognize there are some things that are going to make doing this well problematic.
posted by weston at 8:50 AM on December 23, 2004


Great questions. A lot to think about. The only immediate answer I'd have to "how do you handle different editions (paperback v hardback, for example)?" would be to use ISBNs. But even there, I'm sure there are issues to contend with. Thanks for your input, all.
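
One of those issues is that stores write the same ISBN differently (with or without hyphens, lowercase "x"), so a sketch of normalizing and checking a 10-digit ISBN in Python may help. The check-digit rules are the standard ones; the function names are just illustrative.

```python
import re


def normalize_isbn10(raw: str) -> str:
    """Strip hyphens and spaces, uppercase the check character, and
    verify the ISBN-10 check digit. Raises ValueError if invalid."""
    cleaned = raw.replace("-", "").replace(" ", "").upper()
    if not re.fullmatch(r"[0-9]{9}[0-9X]", cleaned):
        raise ValueError(f"not a 10-character ISBN: {raw!r}")
    # Weights run 10, 9, ..., 1; a trailing X stands for the value 10.
    total = sum(
        (10 - position) * (10 if char == "X" else int(char))
        for position, char in enumerate(cleaned)
    )
    if total % 11 != 0:
        raise ValueError(f"bad ISBN-10 check digit: {raw!r}")
    return cleaned


def isbn10_to_isbn13(isbn10: str) -> str:
    """Convert a valid ISBN-10 to its 13-digit form (978 prefix)."""
    core = "978" + normalize_isbn10(isbn10)[:9]
    # Weights alternate 1, 3, 1, 3, ... across the first 12 digits.
    total = sum(int(char) * (3 if position % 2 else 1) for position, char in enumerate(core))
    return core + str((10 - total % 10) % 10)
```

Keying every lookup on the normalized ISBN keeps a paperback and a hardback apart, since each edition has its own number.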
posted by Alt F4 at 10:39 AM on December 23, 2004


isbns are certainly going to be part of the solution, but do you want them "obvious" to the user? when i last used services like this, at least one had a two stage process where the first step was to identify the isbn. it wasn't that pleasant to use.

a lot of the problems are solved if the info is available through web services, as turtle said way back. even more are solved if there's a standard api for all book sellers.

if you're asking around for prices from developers, i'd suggest you don't go with people who don't raise questions like those, unless they prove to you that there is a standard api web service (at least for final code - as turtle (again) said, a first mock-up is a different question altogether).
posted by andrew cooke at 11:17 AM on December 23, 2004


Good points everyone. Slight digression: it strikes me that the really low overhead architecture would go one step further: why not let each user transparently do the downloading from the X web sites and the scraping or whatever on his own machine?

Perhaps the easiest way to do that would be with a Java applet (yuk) (or Flash, ick, or some other RIA) (I'd be tempted to try in plain old Javascript, but I suspect there may be security difficulties accessing multiple domains from unsigned Javascript). Your site just provides the applet and the data that allows it to run. The user's machine does all the downloading and calculating. Your server barely registers the activity, and you sit back and collect the megabucks. Smooth! :-) (It's also a way of sidestepping andrew's issues 1, 4, and 5).

(By the way, that kind of client smarts is the shape of things to come, imho. Actually, it's been announced for as long as Java's been around, but it's really going to happen now. Who really wants to be locked in by some web site's idea of a "nice" user interface?)

As noted by andrew and weston, fragility of screen-scraping definitely means you need to plan for the ability to update the screen-scraping knowledge base. This wouldn't normally require advanced technical skills, just knowledge of HTML and of some basic "reverse template" language. It does suppose that the program provides a function that makes updating the scrape DB easy for some administrator/"knowledge-engineer" type. Which is extra work for the developer.
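
A minimal sketch of that idea in Python: keep the per-site scraping knowledge as data (here, one regular expression per store) so an administrator can fix a broken pattern without touching the program. The store names and HTML snippets are invented, and plain regexes stand in for whatever "reverse template" language is actually used.

```python
import re
from typing import Dict, Optional

# Hypothetical scrape "knowledge base": one pattern per store, each with
# a named group `price`. In practice this would live in a file or table
# that an administrator can edit when a site changes its layout.
SCRAPE_RULES: Dict[str, str] = {
    "example-books": r'<span class="price">\$(?P<price>\d+\.\d{2})</span>',
    "example-indie": r"Our price:\s*\$(?P<price>\d+\.\d{2})",
}


def extract_price(store: str, html: str, rules: Dict[str, str] = SCRAPE_RULES) -> Optional[float]:
    """Apply the store's pattern to a page and return the price.

    Returns None when there is no rule for the store or the pattern no
    longer matches, which is the signal that the rule needs updating.
    """
    pattern = rules.get(store)
    if pattern is None:
        return None
    match = re.search(pattern, html)
    return float(match.group("price")) if match else None
```

A `None` result for a store that used to work is worth logging, since it usually means the site's layout changed.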
posted by Turtle at 1:26 PM on December 23, 2004


MAB (the Mozilla Amazon Browser) shows how book-price browsing could run either on the client in a Rich Web App or on the server. Source code available. Mozilla/Firefox-only, however.
posted by Turtle at 6:50 PM on December 25, 2004


deshead - I knew of Rent-a-coder, but haven't used them before. Have you or anyone else you know used one of their contractors to good success?

(In case you're still checking) ya, I've had good experiences with both services, though I prefer Rent-a-coder. eLance caters a lot to off-shore development firms, and while their rates are good, I prefer the one-on-one interaction with a single developer (which seems easier to find on R-A-C).
posted by deshead at 8:05 AM on December 26, 2004


Thanks, everyone, for the suggestions. I do really appreciate it. The off-server applet sounds really interesting; hadn't thought of that before. Turtle and deshead - thanks for continuing to post to this. Good to know you've had good experiences with R-A-C, d.
posted by Alt F4 at 9:42 AM on December 26, 2004

