Buried alive to pass the time.
July 16, 2005 3:56 PM

I've got a coding project, which will make me a bazillion dollars if I can pull it off. But I'm having trouble starting it, partially because I'm not a well-trained programmer, and partially because I'm unsure of whether to seek help.

To give you an idea of the scope of the project, I will need to integrate the following:
- RSS and web scraping
- database creation and heavy live usage.
- text analysis methods.
- numerical methods, particularly (very) large matrix computations
- once it's done, the fun of talking to very business-driven people

(God, that looks like I'm trying to remake Google.)

At this point, I'm really just trying to bootstrap this into a proof of concept, and I'd love to do it all alone in my spare time, with zero funding. Is that just unrealistic?

The further wrench in the works is this:
While I know how to do all these things conceptually, and have a good idea of the theoretical problems involved, I don't yet know a useful language. I'm a Matlab guru, but that won't help. So: will python do the job for this? [I know it has the modules to save me a lot of work, but: Should I worry about memory management for large computations? Should I worry about speed, stability?] Am I insane to try to develop a major project in a language I don't know?

The final, and biggest question is this: am I insane to try to manage a complex project in a language I don't know, with a focus that is slightly out of my field, in my spare time, alone? If the answer is yes - what should I look for in a collaborator? Given my weak programming, I worry that I'll lose control of the project once I explain the basic ideas to a reasonably intelligent programmer. Does that mean I should focus on getting them to sign an NDA and officially employ them, or should I employ a talented friend and trade some control for the trust of our friendship? Should I be programming at all?

Anyway, that's my rambling set of concerns about my project. Any thoughts on how you might go about this yourself?
Where would you ask for advice if you felt out of your depth on a project you loved?
posted by metaculpa to Technology (22 answers total)
Dude, you don't write something like this all at once. Write it in small chunks that talk to each other. The web scraping is a particularly easy piece and I'd recommend that you concentrate on that first. Then, work on the data access layer, then wire the two together, then you can start in on the business logic.
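
A rough sketch of what "small chunks that talk to each other" could look like, assuming Python and only the standard library. The names LinkScraper, LinkStore and run are made up for illustration, and sqlite3 is just a stand-in for whatever database you end up using:

```python
import sqlite3
from html.parser import HTMLParser


class LinkScraper(HTMLParser):
    """Piece 1 (hypothetical): collect (href, link text) pairs from a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an <a href=...> element.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def scrape_links(html):
    parser = LinkScraper()
    parser.feed(html)
    parser.close()
    return parser.links


class LinkStore:
    """Piece 2 (hypothetical): data access layer; nothing else touches SQL."""

    def __init__(self, path=":memory:"):
        # sqlite3 is an assumption here, chosen because it needs no server.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS links (href TEXT PRIMARY KEY, title TEXT)"
        )

    def save(self, links):
        self.conn.executemany(
            "INSERT OR REPLACE INTO links (href, title) VALUES (?, ?)", links
        )
        self.conn.commit()

    def count(self):
        return self.conn.execute("SELECT COUNT(*) FROM links").fetchone()[0]


def run(html, store):
    """Piece 3: wire the scraper and the store together."""
    links = scrape_links(html)
    store.save(links)
    return len(links)


if __name__ == "__main__":
    sample = '<p><a href="/a">First</a> and <a href="/b">Second</a></p>'
    store = LinkStore()
    print(run(sample, store), "links scraped,", store.count(), "stored")
```

Each piece can be tested on its own, and the business logic only ever needs to call run and the store's methods.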
posted by bshort at 4:10 PM on July 16, 2005

One option might be to outsource well-defined chunks via something like ScriptLance, and then you're the only one who knows how to put the puzzle together.

It sounds like the project is pretty serious though, so it's still going to be a substantial amount of effort for you. Also, optimizing systems that are heavily database-dependent can be a significant challenge all its own, one that can require a LOT of time and money.

If you have neither, your best bet might be to write up a business plan and take it to the money people before you've written a line of code. Of course, then you're giving up a lot of control right off the bat, but it takes money to make money.

Whatcha scrapin', anyway? ;)
posted by trevyn at 4:16 PM on July 16, 2005

And make sure it doesn't exist already. Sounds kind of like Findory?
posted by trevyn at 4:17 PM on July 16, 2005

The final, and biggest question is this: am I insane to try to manage a complex project in a language I don't know, with a focus that is slightly out of my field, in my spare time, alone?

It's certainly doable, and it's a great way to learn more about the language you intend to use. Like bshort said, do the easy bit first, and by the time you're done with that, you'll know the language well enough to start in on the more challenging bits.

Python or perl would both be fine for something like this.

Does that mean I should focus on getting them to sign an NDA and officially employ them, or should I employ a talented friend and trade some control for the trust of our friendship? Should I be programming at all?

If you hire someone else, make them sign an NDA (one that a competent lawyer has looked at closely, of course) and have them work as independent (or "1099") contractors so you don't have to deal with their taxes. Hiring someone else won't mean that you lose control of the software; if they don't follow your management/vision/whatever, find someone new. They're there to put their programming skills to work, not draw up the Product Requirement Document.

Speaking of a PRD, write one up. Make it very detailed, and make sure that you hire someone who knows how to read one, and someone who knows how to write the accompanying Engineering Requirements Document. This way you'll have an easier time transferring the knowledge from one employee to another.

Good luck!
posted by cmonkey at 4:33 PM on July 16, 2005

Oh, and even if you hire a friend, make them sign all the legal documents. A friendship can sour all too easily when money gets involved.
posted by cmonkey at 4:34 PM on July 16, 2005

I have the impression (from people I know who tried to start a startup) that the most important thing in a collaborator is someone you trust, get along with very well, and have a lot of confidence in. The people I know were all friends at the beginning, set up various legal agreements, and even then it didn't end all that well (the company failed - they had a finished product they couldn't/didn't sell, and the various people involved splintered into subgroups who didn't like each other any more). Of course one thing they did that seems to me to have been very unwise was invest substantial (to them) amounts of their own savings for servers/office space/etc.

Python is particularly well-suited for large matrix operations, given that it's well-used in the scientific community for such things. You probably want to check out scipy and numpy. As to memory management, if you are worried, you should probably try to write a simple proof-of-concept for whatever kind of calculation you are interested in first, and see what the memory requirements are, before prototyping anything else.
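
A minimal sketch of that kind of memory proof-of-concept, assuming a dense square matrix of 8-byte floats. It uses only the standard library (array and tracemalloc) instead of numpy so it runs anywhere; the function names are made up for illustration:

```python
import tracemalloc
from array import array


def measure_matrix_bytes(rows, cols, typecode="d"):
    """Allocate a rows x cols matrix as one flat array; return peak bytes traced."""
    tracemalloc.start()
    try:
        matrix = array(typecode, [0]) * (rows * cols)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    del matrix
    return peak


def extrapolate_bytes(measured_bytes, measured_n, target_n):
    """Scale a measurement on an n x n dense matrix up to target_n x target_n.

    Assumes memory grows with the number of elements, i.e. as n squared.
    """
    return measured_bytes * (target_n / measured_n) ** 2


if __name__ == "__main__":
    small_n = 500
    peak = measure_matrix_bytes(small_n, small_n)
    print("peak bytes at n=%d: %d" % (small_n, peak))
    print("estimate at n=10000: %.0f" % extrapolate_bytes(peak, small_n, 10_000))
```

Measure small, extrapolate, and only then decide whether the real thing fits in RAM or needs sparse storage or disk.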

In my experience, large-scale personal projects done in my spare time progress excruciatingly slowly. You might want to take on some smaller-scale project (or a small modular chunk, like bshort suggests) in your spare time first, and see how much you actually do and how hard it is to make progress. In particular, I find it very hard to make progress if I don't divide things up into chunks where I see visible progress on a fairly short timescale (~1 week).
posted by advil at 4:44 PM on July 16, 2005

I've had good luck with rentacoder.com, depending on how much you want to outsource. Also a lot of the people are in Eastern Europe and Asia and work cheap, so you may get a lot of it done with them (rentacoder has an NDA). Given the language barriers and their more limited resources, it will be less likely that a programmer in Slovenia will run off with your idea versus one in New Jersey.
posted by rolypolyman at 4:49 PM on July 16, 2005

I believe it was the founder of Atari, Mr. Bushnell, who said you should always have people MORE intelligent than you working for you: how will you ever promote yourself to a higher level if you can't train someone competent to do your own job? :)
posted by crewshell at 4:49 PM on July 16, 2005

i could do that :o)

seriously, you need a good programmer. that probably means paying them, but you might find a hopeful youth who'll do it for a fraction of the future proceeds. to be honest, you probably need to pay and offer future proceeds, since you can't offer much job security.

if you find the right person, what you describe is not much work, at least for proof of concept. worry about handling millions of users later - what you need now is something that will get you funding.

look for someone who can show you similar things they have done, and who can make something that looks pretty. don't underestimate looking pretty.

python would be fine for the initial version.

and you won't make loads of money. you do realise that, right?
posted by andrew cooke at 4:52 PM on July 16, 2005

For the screen scraping, I would recommend Perl, and the module WWW::Mechanize, which has lots of goodies for following links, form submissions, etc. The only thing it can't handle very well is JavaScript, as this requires a full browser type application.
posted by Dag Maggot at 4:56 PM on July 16, 2005

It's sad to say, but I have to recommend not using another programmer; this really doesn't sound too big, and you don't want to become dependent on them. Instead, take advantage of open-source stuff like Python, and the tools others have mentioned. I would think that personal mastery of the technical details would be helpful during the business stuff, and if this is something truly new, you'll want to oversee the inevitable compromises and tradeoffs of development yourself.

I'm no programming master, but was originally self taught because, like you, there was something I really wanted to build, and teaching myself to program was the easiest way. Two pieces of general advice: expect to throw almost all your code out and start over at least twice (this is normal and generally good, as long as you have time), and never underestimate the breadth of the open-source community... if there's any part of your project that's conceivably useful in other contexts, chances are that at least half a dozen open-source implementations are already available.
posted by gsteff at 7:32 PM on July 16, 2005

open source is great, but if you're building a commercial application, make sure you read the license agreements. if you do end up making a gazillion dollars, you don't want someone to come along and sue you for using their intellectual property.
posted by clarahamster at 7:41 PM on July 16, 2005

You mention not having much programming skill, so what do you have to bring to the project? I'd be wary of starting a project of this size and not having a good solid understanding of the basics. Yes there are lots of self taught programmers out there. Yes some are very good, but it takes time to get there, and most never do. Even godly coders have a lot of really stupid projects in their past, often a past that goes back to when they were 10, but hey.

How much money do you have saved up? How long can you last without money? Could you seriously work 40 hours a week at a job and 60+ on your private dream? I've never seen anyone able to keep that up for very long. While it's still not the dot-com days all over again, there is money out there for smart ideas, but you need to know how to find it. If you're planning on keeping your day job, well, expect it to be an awesome learning experience rather than a money maker.
posted by aspo at 7:49 PM on July 16, 2005

The single largest determining factor in whether or not this is flat-out insane is the relationship to risk:

1) If this is purely a self-starting project, where you see an unfulfilled need, and the client has very reasonable expectations (if they have any at all), then this could be an interesting effort. You can build it in pieces as you experiment and learn how to do things, and then turn it over once it's ready. Could be an awesome hobby, and if it doesn't work, who really cares?

2) If you're talking about putting yourself on the hook, contractually, to build something with fixed features, in a fixed timeframe, on a fixed budget--when you don't even know how to do it--then don't. Don't, don't, don't.

If it's anywhere in between, then be very careful to make sure you understand the risks involved, and have mitigated each one of them. If you can't, then don't do this. I really don't mean to discourage your ambition, but believe me--there is really nothing worse than being contractually committed to something you can't deliver. Be careful.
posted by LairBob at 7:53 PM on July 16, 2005 [1 favorite]

metaculpa: So: will python do the job for this? [I know it has the modules to save me a lot of work, but: Should I worry about memory management for large computations? Should I worry about speed, stability?] Am I insane to try to develop a major project in a language I don't know?

As a former Python programmer, I recommend Ruby, which is a more object-oriented language with better support for web technologies. Ruby has some high-quality numeric libraries, including NArray (similar to NumPy, but also provides vector and matrix subclasses).

Ruby is easy to get started with, and if you already know a little bit of programming, you can get productive within minutes. Why's (Poignant) Guide to Ruby is probably the best (and funniest) introduction to Ruby around (and the only programmer's guide that comes with an onion), and it'll have you yelling out "chunky bacon!" by the third chapter. Programming Ruby (affectionately known as the Pickaxe) from The Pragmatic Programmers is also a good start; the previous version (covering Ruby 1.6 only) is available online. And if you need help with something, the best solution is to join the Ruby users mailing list and/or join the #ruby-lang channel on the FreeNode IRC network.

Like Python, Ruby has the ability to call into native code through a C interface. The recommended practice is to write the code in Ruby first, then iron out the performance bottlenecks in C, C++ or any other natively compiled language you like. By the time you have written the code, you'll usually find that performance is acceptable; but in your case it might be necessary, either to achieve acceptable performance or to be able to fit your structures in memory.

On the other hand, since you say you're not a programmer, you might be overestimating your requirements. For example, Ruby's NArray is implemented in C and can work with very large multidimensional arrays; for example, I tried creating a 10,000 * 10,000 array of integers, which took about 1.5 seconds and consumed 400MB of memory (10,000 * 10,000 * 4 bytes per int = 400MB; that's zero overhead). Working with the array is fast: Setting all 100 million elements to a value took just 1.5 seconds, for example.
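
The 400MB figure is easy to sanity-check for other sizes before writing any real code. A small sketch of the arithmetic in Python (the function name is made up for illustration):

```python
def dense_array_bytes(rows, cols, bytes_per_item):
    """Raw storage for a dense rows x cols array, ignoring any container overhead."""
    return rows * cols * bytes_per_item


size = dense_array_bytes(10_000, 10_000, 4)
print(size)                    # 400000000 bytes
print(size / 10**6)            # 400.0 decimal megabytes
print(round(size / 2**20, 1))  # 381.5 mebibytes
```

The same matrix with 8-byte floats instead of 4-byte ints would need twice that, so the element type matters as much as the dimensions.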

Oh, and you're not insane. But if you want to actually complete the project, you will need some help.
posted by gentle at 8:00 PM on July 16, 2005

I'm going to echo and expound on one of the earlier comments. If you really think that your idea could generate a bajillion dollars, then I'd say that coding isn't necessarily the first thing you should do. The first thing you should do is document and protect the idea (patents or whatever else a good lawyer recommends). Then I'd recommend the following:
1) Build a business plan with a solid business case
2) Talk to Venture Capitalist types
3) ( this is where the miracle happens and you find someone that thinks your idea is interesting)
4) Get funding
Then you start a real business, you'd have the development arm working to code and deliver the prototype and/or product as the business development arm is out drumming up business.

Of course you can be coding to prove that your concept is realizable while doing all of the above. However, it's been my experience that successfully moving an idea to a successful product is less about creating the product, and more about selling it. It helps to actually have something to show, but it's not required, since people may be interested in the idea, rather than the product.
posted by forforf at 8:56 PM on July 16, 2005

I second forforf's comments. *Seriously* second them. If there really is big money in what you're doing, you'll want to protect the idea. The last thing you want is to be plodding along, learning as you go, and then 3 months down the line read about some company with resources announcing something similar. And if the idea is protected (assuming of course that that's possible), there's much less of a need to worry about NDAs and such.

And as trevyn said, do some research and make sure it hasn't been done. No use in reinventing the wheel.

Good luck!
posted by edjusted at 9:28 PM on July 16, 2005

forforf: 2) Talk to Venture Capitalist types

Probably not. At this stage, the founder has no capital, no product, nada -- just an idea. The VCs, if they are even interested, will demand a huge stake of the company in the shape of equity, in order to offset the risk of funding something from the ground up, and they will require a lot of control over your company.

They may like your idea so much that they decide it would be more profitable to rule out the founder. VCs don't sign NDAs; quite on the contrary, they will demand a rigorous "due diligence" process, sometimes as a cover to find out everything about you.

VCs routinely steal ideas; a good example is the Kozmo vs. Urban Fetch case back in the violent dot-com days:
    Integrity was impressed with Kozmo's business plan. So impressed, in fact, that it decided not to fund the company but back a start-up rival, Urban Fetch, instead. Kozmo subsequently sued Fetch for allegedly copying its model. Fetch counter-sued, arguing Kozmo's legal action had helped turn off the tap on further funding.
(I believe that it's worse than the article says: Urban Fetch wasn't an existing rival, but was started by Integrity itself.)

You will want to defer VC funding for as long as possible. When you have a product, sales, earnings, even real profit and capital, you are in a powerful position to negotiate the terms. Until you are at that stage, "friends, family and fools" are the way to go; angel investors are another source, though they, too, demand potentially crippling terms.

For more insight, see Joel Spolsky's Fixing Venture Capital, and An Engineer's View of Venture Capitalists.
posted by gentle at 10:22 PM on July 16, 2005

If you are enough of a programmer that you can figure out how to make "with input x, figure out what it means and return it" -- and you have the drive to get it done -- then hire an architect type.

In my experience, many of the top programmers are not as successful as they should be, because they struggle when not challenged. The chance to assist with the "tough" problems while avoiding what is drudgery to them can be a godsend.

If you can do a large part of the implementation work, the right person could steer you well through the process and cost far less than full outsourcing, while you gain great experience.
posted by SpookyFish at 10:51 PM on July 16, 2005

I wanted to point out that I also think gentle is right (but I am too!).
The difference boils down to your priorities. If you want control, then you'll have a harder time finding external funding, and will likely have to risk a large chunk of your own capital (either directly or as collateral for loans).
If you want to minimize personal financial risk, then the approach of protecting your idea and getting external funding may be better. Gentle is exactly right, though, in that if someone gives you money they will want some controls.
posted by forforf at 11:15 PM on July 16, 2005

I would recommend keeping it within a very small group until you have a working prototype. As far as languages, Python is an excellent choice, although for the large computations you will probably want to utilize straight C/C++. Thankfully Python integrates very well with this type of thing. As said before, develop a working prototype and worry about scaling it up later.
posted by sophist at 1:34 AM on July 17, 2005

I could see doing this project in Python easily. Here are some modules I think you would find useful:
  • RSS: Search the web and comp.lang.python for specialized modules, but if you don't find any, I think ElementTree/cElementTree is the easiest-to-use XML parsing module
  • web scraping: mechanize
  • database creation and heavy live usage: either PostgreSQL and psycopg or MySQL and MySQLdb
  • text analysis methods: you're on your own here, unless you want to describe these methods
  • numerical methods, particularly (very) large matrix computations: SciPy or numarray
  • once it's done, the fun of talking to very business-driven people: ReportLab to generate PDF reports
Initially, I'd do each one of these steps as a separate program. It will make it a lot easier to get accomplished.
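
As an illustration of one such separate program, here is a minimal sketch of the RSS step. It assumes a plain RSS 2.0 feed and uses xml.etree.ElementTree, which is the ElementTree API as shipped in the Python standard library. The sample feed and the function name are made up:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, inlined so the sketch runs without network access.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>First post</title><link>http://example.com/1</link></item>
    <item><title>Second post</title><link>http://example.com/2</link></item>
  </channel>
</rss>"""


def parse_rss_items(xml_text):
    """Return a list of (title, link) tuples, one per <item> in the feed."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        items.append((title.strip(), link.strip()))
    return items


if __name__ == "__main__":
    for title, link in parse_rss_items(SAMPLE_RSS):
        print(title, link)
```

Real feeds are messier (Atom, namespaces, broken XML), so treat this as a starting point for the easy case only.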

Remember that premature optimization is the root of all evil. But if after profiling your programs (with hotshot), you find you need more speed in those heavy analysis methods, you can always rewrite the most-used ones in Pyrex, which produces C code from a Python-like script.
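
A minimal sketch of the profile-first step. Note the swap: it uses cProfile and pstats, the profiler in current Python standard libraries, rather than hotshot, which is not available in Python 3. The slow_dot and analysis functions are hypothetical stand-ins for a heavy analysis routine:

```python
import cProfile
import io
import pstats


def slow_dot(a, b):
    """Deliberately naive dot product, standing in for a heavy analysis routine."""
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total


def analysis(n=200):
    """Hypothetical workload: call slow_dot n times on an n-element vector."""
    vec = [float(i) for i in range(n)]
    return sum(slow_dot(vec, vec) for _ in range(n))


def profile_report(func, top=5):
    """Run func under the profiler and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(top)
    return out.getvalue()


if __name__ == "__main__":
    print(profile_report(analysis))
```

Whatever shows up at the top of that report is the only code worth rewriting in something faster.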

Another nice thing about Python is that the developers write in English, and all the documentation is in English.
posted by grouse at 1:45 AM on July 17, 2005
