Help me with a couple of technical strategy questions as I build a web based service from scratch.
November 11, 2005 11:57 AM

I am building a web-based service and am about a third of the way through the project. Right now I am starting to think about how this app will scale if the user base expands into the hundreds (hundreds of sites, each getting its own unique traffic). Apologies in advance if my question is too conceptual or ambiguous.

The concept behind this service started with my own site. I built a site that I believe to be valuable/unique/fancy, and I built this initial site very stripped down and easy for me to edit. The postings are one long text file that I was doing file I/O on through PHP. The gallery was being built through mod_rewrite goodness based off of directories and files I uploaded through FTP.
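To make the flat-file setup concrete, here is a minimal sketch of reading one-post-per-line postings out of a single text file. The file name, tab-delimited format, and field order are my assumptions, not the asker's actual layout:

```php
<?php
// Sketch of a flat-file posting store (format is assumed:
// each line is "date <TAB> title <TAB> body").
function load_posts($path) {
    $posts = array();
    if (!is_readable($path)) {
        return $posts;
    }
    foreach (file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
        list($date, $title, $body) = explode("\t", $line, 3);
        $posts[] = array('date' => $date, 'title' => $title, 'body' => $body);
    }
    return $posts;
}
```

This works fine for one site, but every account re-parses its whole file on every request, which is part of what the question below is about.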

Because I had all this existing code, when I decided to start this service I built the site in a distributed fashion to re-use code and logic. The scripts (PHP) are centralized, but the data is scattered across each account in a combination of XML and flat text files.

So here is my question for y'all:
Should I bring all the data into MySQL so that all the data and structure live in one centralized spot?
Will it save me time in the long run when I start to squash bugs, add features, and work on version 2?
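For what the centralized version might look like, here is a hedged sketch: one posts table shared by all accounts, keyed by an account column. Table and column names are illustrative only; in practice the PDO handle would point at the MySQL database on Dreamhost:

```php
<?php
// Illustrative only: a single centralized table for every account's posts.
// Schema and names are assumptions, not the asker's actual design.
function create_posts_table(PDO $db) {
    $db->exec('CREATE TABLE IF NOT EXISTS posts (
        id INTEGER PRIMARY KEY,
        account VARCHAR(64) NOT NULL,
        posted_at VARCHAR(32) NOT NULL,
        title VARCHAR(255) NOT NULL,
        body TEXT NOT NULL
    )');
}

function add_post(PDO $db, $account, $date, $title, $body) {
    $stmt = $db->prepare('INSERT INTO posts (account, posted_at, title, body)
                          VALUES (?, ?, ?, ?)');
    $stmt->execute(array($account, $date, $title, $body));
}
```

With this layout, "all posts for account X" becomes one indexed SELECT instead of per-account file I/O, and schema changes happen in one place.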


Other Misc. Details
------------------------------------------------------------
- I am doing everything (building/designing) myself from scratch.
- I am using PHP 5 and MySQL hosted through Dreamhost.
- I have a decent handle on PHP and on normalizing databases. However, I have also had a couple of scary situations where overzealous joins really bogged things down.


Elements that Make Up Each User/Site
------------------------------------------------------------
- the public site (gallery, simple blog, other interactive areas)
- user settings (which skin, which sections)
- data (photos, posts, etc)
- a web-based login that will let them edit their settings and data
posted by rdurbin to Computers & Internet (2 answers total)
 
A database will help, of course, but I would suggest generalizing even that: write drivers for your access functions. Something like the Perl DBI interface may be enough for you, or you could go all the way and write a backend driver for DBI, one for flat text (although DBI can use DB_File et al.), etc. Then you can choose among many methods depending on how reusable/adaptable you need the code to be. Overengineering is a very real possibility here, of course.
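[A minimal sketch of that driver idea in PHP, since the site is PHP-based; the interface and class names here are illustrative, not from any existing library:]

```php
<?php
// Sketch of the "driver" idea: code against a small storage interface,
// with one backend per storage mechanism. All names are illustrative.
interface PostStore {
    public function all();               // return every post
    public function add($title, $body);  // append a post
}

// Flat-file backend, roughly matching the current setup
// (assumed format: "title <TAB> body" per line).
class FlatFilePostStore implements PostStore {
    private $path;
    public function __construct($path) { $this->path = $path; }
    public function all() {
        if (!is_readable($this->path)) return array();
        $posts = array();
        foreach (file($this->path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
            list($title, $body) = explode("\t", $line, 2);
            $posts[] = array('title' => $title, 'body' => $body);
        }
        return $posts;
    }
    public function add($title, $body) {
        file_put_contents($this->path, "$title\t$body\n", FILE_APPEND);
    }
}

// A MysqlPostStore could implement the same interface later;
// the calling code would never change.
```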
posted by kcm at 12:31 PM on November 11, 2005


It's very often a good idea to store data in a DB instead of flat files. Try to partition your data so that if the DB server gets overloaded you can add more servers painlessly. For example, for user data, find some unique identifier that is always part of the input to every request (e.g., user name) and use a hash of that string to determine which database server the request is sent to. A simplified way of doing this would be "all users whose names are in A-M are on server 1, and N-Z are on server 2."

Build a caching layer on top of the database (I think PHP has a module that helps with storing persistent data in memory, though its name escapes me) to reduce the load on the DB, and make this your primary scaling point. Design it so the cache can run on different boxes than the databases in case you need to scale them separately. Each site can have its own cache as well. Such a design would allow you to scale arbitrarily large, as long as your DB is well designed (no joins across many tables, for example).
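[The hash-based partitioning described here can be sketched in a few lines of PHP; the server list and function name are illustrative:]

```php
<?php
// Sketch: pick a database server from a hash of the user name.
// Server names are placeholders.
function shard_for_user($username, array $servers) {
    // crc32() gives a deterministic integer for the string;
    // abs() guards against negative values on 32-bit builds,
    // and the modulus maps the hash onto one server.
    $index = abs(crc32(strtolower($username))) % count($servers);
    return $servers[$index];
}
```

One caveat with plain modulus sharding: adding a server changes which shard most users map to, so data would have to be migrated; consistent hashing is the usual refinement if that matters.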
posted by jewzilla at 6:58 PM on November 11, 2005

