How do I forecast costs for database development?
June 12, 2007 3:12 PM

I am serving as the de-facto technical adviser to a family member looking at a potential startup. I believe the idea is not only viable but quite inspired. This individual is a middle-aged attorney with significant financial resources he is able to bring to bear.

I'm unable to divulge much about the nature of the project. However, it will require a custom (and continually customizable) relational database backend that will need to be able to scale to a potentially enormous size. This data will feed a Web-based front end that will need to be capable of real-time transaction processing.

I am looking for information on the costs of developing the database. I understand that complexity levels come into play here, and I think our data mix is very simple: names, places, addresses. The important thing is that it is "infinitely" scalable.

How do I go about finding the right person or team to put this together? How are jobs like this priced? Any personal relevant experience to share?

Thanks all!

(Also, we are leaning towards MySQL at the moment, FWIW).
posted by Roach to Computers & Internet (19 answers total)
 
Don't worry too much about scalability; it's the best kind of problem to have (too many customers!). Just have the tech guy do the groundwork in the design phase so that you can just keep throwing servers at the problem. That is really how the big guys deal with giant growth. It's not so much hard as something that just needs to be thought about. Until then, though, it really doesn't matter.
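
One common form of that design-phase groundwork is routing each record to a server by a stable key, so adding capacity means adding servers rather than rewriting the application. A minimal sketch in Python, assuming a hypothetical list of shard names and a numeric customer ID as the routing key; it is an illustration of the idea, not a complete sharding scheme.

```python
import hashlib

# Hypothetical shard names; in practice these would map to connection
# settings for separate database servers.
SHARDS = ["db0", "db1", "db2", "db3"]

def shard_for(customer_id: int, shards: list[str] = SHARDS) -> str:
    """Pick a shard for a customer using a stable hash of the routing key.

    hashlib is used instead of the built-in hash(), which is randomized
    per process for strings and so would not be stable across restarts.
    """
    digest = hashlib.sha1(str(customer_id).encode("utf-8")).digest()
    # Treat the first 8 bytes as an unsigned integer, then bucket by modulus.
    bucket = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[bucket]

if __name__ == "__main__":
    for cid in (1, 2, 3, 42):
        print(cid, shard_for(cid))
```

Plain modulo placement moves most keys when the number of shards changes; consistent hashing or an explicit lookup table are the usual ways around that.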

Design the database well, with proper normalization, and that will make your life easier in the long run.
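
For the names, places, addresses mix in the question, proper normalization mostly means storing each fact once and linking rows by key. A minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL; the table and column names are hypothetical.

```python
import sqlite3

def build_schema(conn: sqlite3.Connection) -> None:
    """Create a small normalized schema: people, places, and the addresses linking them."""
    conn.executescript(
        """
        CREATE TABLE person (
            id    INTEGER PRIMARY KEY,
            name  TEXT NOT NULL
        );
        CREATE TABLE place (
            id      INTEGER PRIMARY KEY,
            city    TEXT NOT NULL,
            region  TEXT NOT NULL,
            UNIQUE (city, region)          -- each city/region pair is stored once
        );
        CREATE TABLE address (
            id         INTEGER PRIMARY KEY,
            person_id  INTEGER NOT NULL REFERENCES person(id),
            place_id   INTEGER NOT NULL REFERENCES place(id),
            street     TEXT NOT NULL
        );
        """
    )

def addresses_in(conn: sqlite3.Connection, city: str) -> list:
    """Return (name, street) pairs for everyone with an address in the given city."""
    return conn.execute(
        """
        SELECT person.name, address.street
        FROM address
        JOIN person ON person.id = address.person_id
        JOIN place  ON place.id  = address.place_id
        WHERE place.city = ?
        ORDER BY person.name
        """,
        (city,),
    ).fetchall()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_schema(conn)
    conn.execute("INSERT INTO person (id, name) VALUES (1, 'Alice'), (2, 'Bob')")
    conn.execute("INSERT INTO place (id, city, region) VALUES (1, 'Springfield', 'IL')")
    conn.execute("INSERT INTO address (person_id, place_id, street) "
                 "VALUES (1, 1, '1 Main St'), (2, 1, '2 Oak Ave')")
    print(addresses_in(conn, "Springfield"))
```

A city's spelling lives in exactly one row, so correcting it is a single update rather than a hunt through every address.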
posted by cschneid at 3:38 PM on June 12, 2007


Best answer: Thoughts:

1. There ain't no such thing as "infinitely scalable". Good thing, too, because you can't scale for the infinite. I would recommend getting a handle on how many users you're going to have, say, six months and a year from now, and making sure you can handle them. Certainly, you want to make sure you aren't impeding scaling up to the billions of humans your optimistic business plan calls for, but really I would recommend against accommodating the entire human population with your first release. People hiring IT folks love to ask for infinitely scalable, portable, functional, etc. and the solution ends up over-engineered to hell. Focus your needs as best as you are able and then ensure these needs are covered.

2. Your bottleneck on a startup is almost always sales and/or attracting an audience. Usually these drag enough that the tech side has no trouble keeping up. There are stories of startups failing because they couldn't scale, but these are relatively rare. Unless your initial DBA or programmer makes egregious errors, you should be able to scale it up as you go.

3. Good DBAs are expensive. They are rare, and difficult to find. There are two major functions of a DBA--database development, and database setup and maintenance. Given the simple nature of your needs, a decent programmer will likely have the SQL chops to perform the development role, and you can hire a DBA when your database needs to be tuned, replicated, and so on and so forth.

4. Oracle, and its attendant DBAs, are on a-whole-nother tier. If you're on MS SQL Server, Postgres, or MySQL, a DBA can definitely help, but it can also be overkill.

5. One poorly-written query can undo all the performance gains your DBA patiently produced.
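
Point 5 is easy to demonstrate. A minimal sketch using Python's built-in sqlite3 module (MySQL has its own EXPLAIN with different output); the table, index, and data are hypothetical. Wrapping an indexed column in a function turns an index lookup into a scan of every row.

```python
import sqlite3

def query_plan(conn: sqlite3.Connection, sql: str, params=()) -> str:
    """Return SQLite's EXPLAIN QUERY PLAN detail text for a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row is the human-readable plan step.
    return " | ".join(row[-1] for row in rows)

def demo() -> tuple:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
    conn.execute("CREATE INDEX idx_person_email ON person (email)")
    conn.executemany(
        "INSERT INTO person (name, email) VALUES (?, ?)",
        [(f"user{i}", f"user{i}@example.com") for i in range(1000)],
    )
    # Compares the indexed column directly, so the index can be used.
    good = query_plan(conn, "SELECT name FROM person WHERE email = ?",
                      ("user7@example.com",))
    # Wraps the indexed column in a function, which the plain index cannot
    # serve, so every row has to be examined.
    bad = query_plan(conn, "SELECT name FROM person WHERE lower(email) = ?",
                     ("user7@example.com",))
    return good, bad

if __name__ == "__main__":
    good, bad = demo()
    print("good:", good)
    print("bad: ", bad)
```

The exact plan wording varies between SQLite versions, but the first plan reports a search using the index and the second a scan of the table.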

Having said all that, the data model is the biggest pain to change post-release, so be sure to get the model itself right. Can a DBA help with this? Sure, if he or she is good. Would it probably be wasted cash at this point? Yeah. I would recommend getting one superstar programmer you can trust. Let the programmer handle the database.
posted by Nahum Tate at 3:46 PM on June 12, 2007


I get the impression from your secrecy that this is a corporate project, probably one that actually has a "budget", rather than a bootstrapped or startup-type deal, in which case cschneid's advice about scalability could be dangerous.

What you need is someone (or some people) with extensive *experience* in putting databases together (not theoretical experience, but actual experience in dealing with machine-killing database crises). The best independent consultants I know of in this regard are at http://www.mysqlperformanceblog.com/. They can work online or onsite (I think?), and they totally know what they're talking about (I've been reading their blog for aeons now, and I maintain large, scaling databases myself).

The problem is that designing the database "well" could actually involve denormalization. There is a massive chasm between theoretically good and "actually good", which you only learn once you've had your fingers burnt once or twice (and you quickly learn scalability really does matter when you're dealing with, say, 10 gigabytes of indexes that are taking hours to rebuild ;-)).
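
As a hypothetical illustration of denormalization trading write cost for read speed: cache a per-person address count on the person row and keep it current with triggers, instead of counting address rows on every read. A minimal sketch using Python's sqlite3 as a stand-in for MySQL; the schema is made up.

```python
import sqlite3

SCHEMA = """
CREATE TABLE person (
    id             INTEGER PRIMARY KEY,
    name           TEXT NOT NULL,
    address_count  INTEGER NOT NULL DEFAULT 0   -- denormalized copy of a derivable fact
);
CREATE TABLE address (
    id         INTEGER PRIMARY KEY,
    person_id  INTEGER NOT NULL REFERENCES person(id),
    street     TEXT NOT NULL
);
-- Keep the cached count in step with the address table on every write.
CREATE TRIGGER address_added AFTER INSERT ON address
BEGIN
    UPDATE person SET address_count = address_count + 1 WHERE id = NEW.person_id;
END;
CREATE TRIGGER address_removed AFTER DELETE ON address
BEGIN
    UPDATE person SET address_count = address_count - 1 WHERE id = OLD.person_id;
END;
"""

def normalized_count(conn: sqlite3.Connection, person_id: int) -> int:
    """Derive the count at read time: always correct, but touches every matching row."""
    return conn.execute(
        "SELECT COUNT(*) FROM address WHERE person_id = ?", (person_id,)
    ).fetchone()[0]

def denormalized_count(conn: sqlite3.Connection, person_id: int) -> int:
    """Read the cached count: one row lookup, but every write now pays for a trigger."""
    return conn.execute(
        "SELECT address_count FROM person WHERE id = ?", (person_id,)
    ).fetchone()[0]
```

The two functions should always agree; the denormalized one is only "better" if reads vastly outnumber writes, which is exactly the kind of judgement that comes from measuring a real workload.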
posted by wackybrit at 3:50 PM on June 12, 2007


I think you really need a firm handle on the expected drivers of traffic and growth, both in terms of concurrency (# of simultaneous users) and in terms of data size.
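
Those two drivers can be turned into back-of-envelope numbers. A minimal sketch in Python; every input (starting users, monthly growth, rows per user, bytes per row, index overhead, request rate, request duration) is a hypothetical placeholder to replace with your own estimates. The concurrency estimate uses Little's law: requests in flight = arrival rate x average time per request.

```python
def projected_users(start_users: int, monthly_growth: float, months: int) -> float:
    """Compound growth: users after `months` months at a fixed monthly rate."""
    return start_users * (1 + monthly_growth) ** months

def storage_gb(users: float, rows_per_user: int, bytes_per_row: int,
               index_overhead: float = 1.0) -> float:
    """Data plus indexes, in decimal gigabytes (10**9 bytes).

    index_overhead is index size as a fraction of data size (1.0 means the
    indexes are as big as the data); this is an assumed figure that varies
    widely by schema.
    """
    data_bytes = users * rows_per_user * bytes_per_row
    return data_bytes * (1 + index_overhead) / 1e9

def concurrent_requests(requests_per_second: float, avg_seconds_per_request: float) -> float:
    """Little's law: average requests in flight = arrival rate x average time per request."""
    return requests_per_second * avg_seconds_per_request

if __name__ == "__main__":
    # All inputs below are placeholders, not recommendations.
    for months in (6, 12):
        users = projected_users(10_000, 0.20, months)
        print(f"{months:>2} months: {users:,.0f} users, "
              f"~{storage_gb(users, rows_per_user=50, bytes_per_row=500):.1f} GB")
    print("in flight at 200 req/s, 50 ms each:", concurrent_requests(200, 0.05))
```

The arithmetic is trivial; the value is in forcing each assumption to be written down where it can be argued about.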

You describe this as a startup. Taking that at face value, there are lots of things to worry about besides scalability. YouTube, which has done a pretty good job dealing with really remarkable growth, had a lot of issues to deal with after launch before scalability mattered.

Even if this is a startup that is looking to do some big deals with big companies that will drive a lot of traffic very quickly, you'll almost certainly need a working prototype first.
posted by Good Brain at 4:05 PM on June 12, 2007


The first thought that came to me is that if they're having to constantly customize the database itself (rather than the data, natch), that might be a red flag as far as db design goes. All databases grow and change, but a schema that has to be constantly customizable as a factor of the business plan reeks of bad design.
posted by rhizome at 4:09 PM on June 12, 2007


Response by poster: Thanks for the great answers so far.

By "customizable", what I really meant is that we have 3 phases for the project, and some of the features and functionality won't be implemented until the later phases. We just want to make sure the backend is built with that in mind.
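
One way to build the backend with later phases in mind is to express each phase's schema changes as numbered, additive migrations applied in order. A minimal sketch using Python's sqlite3 and SQLite's user_version pragma to record which migrations have run; with MySQL you would record the version somewhere else, such as a small table. The tables are hypothetical.

```python
import sqlite3

# Hypothetical migrations, one per project phase. Each step only adds
# structure, so code written for earlier phases keeps working.
MIGRATIONS = [
    # Phase 1: core entities.
    "CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);",
    # Phase 2: a new optional column (existing rows get NULL).
    "ALTER TABLE person ADD COLUMN email TEXT;",
    # Phase 3: a new table referencing the existing one.
    "CREATE TABLE note (id INTEGER PRIMARY KEY, "
    "person_id INTEGER NOT NULL REFERENCES person(id), body TEXT NOT NULL);",
]

def migrate(conn: sqlite3.Connection, target: int = len(MIGRATIONS)) -> int:
    """Apply any migrations not yet recorded in SQLite's user_version pragma."""
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    for version in range(current, target):
        conn.executescript(MIGRATIONS[version])
        # PRAGMA does not accept bound parameters, so the integer is formatted in.
        conn.execute(f"PRAGMA user_version = {version + 1}")
    conn.commit()
    return conn.execute("PRAGMA user_version").fetchone()[0]
```

Running migrate() again is harmless, which is what lets the same deployment step be used at every phase.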

Also, maybe this add-on is better suited for the Jobs page, but if any of you have any recommendations for developers, my email address is jfroach AT gmail.com.

Thanks again.
posted by Roach at 4:29 PM on June 12, 2007


Mefi's own orthogonality is a bit of an SQL expert (less the really hard stuff), though he's a bit tough to get hold of.

Good luck!
posted by disillusioned at 4:53 PM on June 12, 2007


I've typed, and deleted, an awful lot of stuff - I think a comprehensive answer would run to book size.

So instead I'll just say cschneid is right - the database design (the schema) should conform to all the standard rules. And the implementation will start off looking a lot like the schema. But as your load goes up and you start hitting bottlenecks, the implementation will move further and further away from this platonic ideal.

The Really Big Sites eventually outgrow (a) normalisation and (b) relational database systems. They all, eventually, seem to move towards a damn great array of [key,value] pairs, with a thin veneer of custom code on top implementing whatever RDBMS features they can't afford to jettison.
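
A toy version of that "[key,value] pairs with a thin veneer of custom code" shape, in Python. The dict stands in for a hypothetical distributed key-value store, and the hand-maintained city index is the kind of RDBMS feature the application ends up reimplementing; this is an illustration of the pattern, not any particular site's design.

```python
import json
from collections import defaultdict
from typing import Dict, List, Optional, Set

class KVPeople:
    """A [key, value] store plus a thin layer of custom indexing code."""

    def __init__(self) -> None:
        self._kv: Dict[str, str] = {}                       # key -> JSON blob
        self._by_city: Dict[str, Set[str]] = defaultdict(set)  # hand-built secondary index

    def put(self, person_id: str, record: dict) -> None:
        key = f"person:{person_id}"
        old = self._kv.get(key)
        if old is not None:
            # Remove the stale index entry before overwriting the value.
            self._by_city[json.loads(old)["city"]].discard(person_id)
        self._kv[key] = json.dumps(record)
        self._by_city[record["city"]].add(person_id)

    def get(self, person_id: str) -> Optional[dict]:
        raw = self._kv.get(f"person:{person_id}")
        return None if raw is None else json.loads(raw)

    def find_by_city(self, city: str) -> List[dict]:
        # The equivalent of "SELECT ... WHERE city = ?", answered from the custom index.
        return [self.get(pid) for pid in sorted(self._by_city.get(city, ()))]
```

Every feature added this way (indexes, constraints, transactions) is code the team now owns, which is why it is a late-stage move rather than a starting point.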

I would suggest employing a standard LAMP codemonkey to generate your initial version as cheaply as possible, then decide whether or not you want to invest more money. If you do, bring on more experienced people, including a decent project manager (not least because, if you're asking questions like this, I'm not sure you have the experience to evaluate potential new hires).

Your business plan should assume continual refactoring of both code and data as the business grows.

By the way, database design is easy; database tuning is hard. So the db work for the first version will cost nothing in comparison to the front-end coding.
posted by Leon at 5:16 PM on June 12, 2007


Definitely get a copy of Cal Henderson's book, Building Scalable Web Sites, and pay special attention to the bits about databases near the end. It's solid advice from someone with experience growing Flickr from nothing to what it is today. Don't get too caught up in it, though. Worry about building something that works first ("Not Everything Worth Doing Is Worth Doing Well"), and then scale it later. There are some things that you just can't know until after you've got traffic, and you might as well be scratching your head about growth while you've got ad money/subscriptions/pageviews rolling in.
posted by migurski at 5:40 PM on June 12, 2007


Best answer: disillusioned writes "Mefi's own orthogonality is a bit of an SQL expert,"

Thanks Chris!

The OP writes "I'm unable to divulge much about the nature of the project. However, it will require a custom (and continually customizable) relational database backed that will need to be able to scale to potentially enormous size. "

At this point, you need a couple of things: a good business plan (outside my area of expertise), and a good and thorough design document that describes the site's required (and desired) functionality. Concentrate on what it's supposed to do, not so much on how. I recommend you do this, and work out the kinks, before hiring anyone.

Then hire a project manager/architect to further refine the requirements.

A database is a (highly selective*) model of reality -- of some reality, in this case your site's "picture" of "the world". Done correctly, if you identify your requirements, you identify the entities you need to model, and those entities become the things your database holds. If done right, by the right people, the database "naturally" emerges from the requirements.

Implementation decisions (MySQL, LAMP, PHP or Java) should also come out of those requirements, not before.

(*A highly selective model: a geneticist's database might model humans, their lineages, their genetics, and phenotypical information like eye color, but probably not salaries; a genealogist's would model humans and lineages (familial relationships) but rarely genetics; a human resources database would model humans, salaries, and jobs, and some relationships (for insurance coverage) or racial categories (for EEO compliance), but not eye color or ancestry, etc. Proper database design is about asking "what is truth" for some particular "slice" of the world you need to model. Do this right, and tell the database the truth, and it'll tell you the truth. Do this wrong, or "lie" to the database, and it'll "lie" to you, or perform poorly.)

If you have more specific questions, feel free to email me.
posted by orthogonality at 5:47 PM on June 12, 2007


What Leon says!
posted by phrontist at 5:51 PM on June 12, 2007


Intelligent database design and optimization definitely help, but when you start getting lots of traffic, caching can eliminate many of the database calls entirely. Take a look at memcached.
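
The usual way memcached is put in front of a database is the cache-aside pattern: check the cache, fall back to the database on a miss, then store the result with an expiry. A minimal sketch in Python where a small in-process TTLCache class (hypothetical) stands in for memcached; a real memcached client also offers get/set with an expiry, but its exact API is not shown here.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple

class TTLCache:
    """In-process stand-in for memcached: get/set with an expiry time."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._data: Dict[str, Tuple[float, Any]] = {}
        self._clock = clock

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired entries behave like misses
            return None
        return value

    def set(self, key: str, value: Any, ttl_seconds: float) -> None:
        self._data[key] = (self._clock() + ttl_seconds, value)

def cached_lookup(cache: TTLCache, key: str, load: Callable[[], Any],
                  ttl_seconds: float = 60.0) -> Any:
    """Cache-aside read: try the cache, fall back to the loader, then populate the cache."""
    value = cache.get(key)
    if value is None:
        value = load()  # the expensive database query
        cache.set(key, value, ttl_seconds)
    return value
```

In this sketch a loader that returns None is never cached, and writes to the database must either delete or overwrite the cached key, or readers will see stale data until it expires.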
posted by bertrandom at 6:11 PM on June 12, 2007


Response by poster: Leon?
posted by Roach at 6:23 PM on June 12, 2007


Then hire a project manager/architect to further refine the requirements.

I agree up to this point but respectfully disagree about hiring any technical resources. The chances of you hiring someone from the very limited pool of tech people who can actually execute is extremely slim, since these people are generally not in the job market and won't leave their highly-paid consulting gigs or VC-funded startups. Even if one such person comes your way, how will you tell her apart from the other seemingly-qualified-but-not-really applicants?

Instead, get the plan and the design together and start networking your way towards VCs. I'm not a big fan of VC funding, but taking an idea and executing it is what they do. The good VCs have their own networks of known-good tech people who can take care of this stuff. Some companies are well-rounded enough to execute without help, but, to be blunt, you're asking strangers for help because you don't even know where to start. Take that as a sign and focus on your core competency rather than thinking about database implementations.
posted by backupjesus at 6:40 PM on June 12, 2007


Best answer: Me?

(The Scalable book is inappropriate here - too much detail for management, not enough detail for an engineer).

Implementation decisions should [come after hiring a project manager/architect] not before.

OK, that's a business decision, not a database decision, but I've been personally burnt by investing heavily in the first phase of an idea that turned out to be junk, so I'm strongly in favour of putting up as little money as possible at each stage.

Cost of hacking together a prototype in PHP/MySQL? $5K, and you've got something close to a working prototype to show for your money.

Cost of hiring a good systems architect and refining your project plan/functional spec? $30K, and you've got a PDF to show for your money.

Look up "bootstrapping" as it relates to startups. It's a good philosophy, IMO.
posted by Leon at 7:13 PM on June 12, 2007


As far as scaling goes, there is a lot of voodoo and just plain wrong-ness that gets spread around. Take a look at Jason Hoffman's Scale With Rails presentations from 2006 and 2007 (both PDFs) for a good approach to what the numbers really are for large sites. They deal quite a bit with non-SQL specific data, but they're fantastic as a reality check.

Everyone has covered my usual points but one: don't tie yourself to one platform too tightly. In this context, that means don't rely on features that only product X has unless you know you need them. Writing cross-platform, or at least easily portable, SQL is harder, but when you need to move it will be substantially easier. It's just good practice anyway. There may be an abstraction layer that can do this for you at the low end of the scaling pool.
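
A portability layer can start very small: keep every query behind one class and stick to standard SQL. A minimal sketch, assuming a PEP 249 (DB-API) driver module; the PersonRepository class and person table are hypothetical, and only sqlite3 is exercised here. The one driver difference it handles is placeholder syntax, read from the module's paramstyle attribute.

```python
import sqlite3
from types import ModuleType
from typing import List

# Placeholder syntax for the PEP 249 paramstyles this sketch supports.
_PLACEHOLDERS = {"qmark": "?", "format": "%s"}

class PersonRepository:
    """All SQL lives here, written in standard SQL, so changing databases
    means changing the driver module rather than every query in the app."""

    def __init__(self, conn, driver: ModuleType) -> None:
        self._conn = conn
        try:
            self._ph = _PLACEHOLDERS[driver.paramstyle]
        except KeyError:
            raise ValueError(f"unsupported paramstyle: {driver.paramstyle}") from None

    def add(self, name: str) -> None:
        cur = self._conn.cursor()
        cur.execute(f"INSERT INTO person (name) VALUES ({self._ph})", (name,))
        self._conn.commit()

    def names_starting_with(self, prefix: str) -> List[str]:
        cur = self._conn.cursor()
        # LIKE and ORDER BY are standard SQL; no vendor-specific functions.
        cur.execute(
            f"SELECT name FROM person WHERE name LIKE {self._ph} ORDER BY name",
            (prefix + "%",),
        )
        return [row[0] for row in cur.fetchall()]
```

Even standard operators differ in detail: LIKE is case-insensitive for ASCII in SQLite but case-sensitive in PostgreSQL, which is exactly the sort of thing worth testing on each target.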

Oh, and Leon speaks truth. You'll be changing everything as you scale, don't be afraid of that. Don't expect to have the One True Platform and keep it forever. The model is king, not the platform.
posted by Skorgu at 7:28 PM on June 12, 2007


Oh and if those PDF links disappear, my gmail is in my profile.
posted by Skorgu at 7:30 PM on June 12, 2007


Response by poster: Thanks, this has been great.

And, FWIW, I do have an MIS degree from a well-known four-year university in the US and 10 years of post-college experience in IT, but my career path veered towards networking/telecom immediately after college. I used to speak SQL, but I haven't done any database work since '98.

That said, I still feel like I speak the language of relational databases from a theoretical and conceptual level, just not at a brass-tacks level, especially at the scale we are considering.

Thanks again for all of the input. You guys are awesome.
posted by Roach at 7:37 PM on June 12, 2007


I'm going to repeat one of orthogonality's points for emphasis:

Implementation decisions (MySQL, LAMP, PHP or Java) should also come out of those requirements, not before.
posted by BrotherCaine at 4:36 AM on June 13, 2007
