Synchronize databases....3...2...1... NOW!
April 3, 2008 6:50 PM   Subscribe

What's the best way to keep data one way synchronized between an internal Mircosoft SQL Server database and a web facing mySQL database?

So we have an internal system that's based on SQL Server. I've created numerous DTS packages for moving SQL Server data from the inside SQL Server to a SQL Server at our ISP with some complex transformations, but what I really need to do now is move data from our internal SQL Server to a mySQL database.

Here are some requirements and complications.

1. The internal SQL Server system doesn't have any obvious way to see what's changed in a days work.

2. The mySQL server should know when a row was last modified!

3. We can have as many supplemental tables on either database, but no changes can be made to the exisiting SQL Server tables.

4. Assume SQL Server 2000 and the mySQL version is rather up to date, but completely flexible on the mySQL side.

Are any of those database synchronization tools out there decent? The internal database is mission critical, so it's vital that it not be fouled up.
posted by advicepig to Computers & Internet (9 answers total) 4 users marked this as a favorite
Honestly, last time we faced this issue we ended up writing web services to push the data. We had an additional table on the MS-SQL side that described which data was pushed and when, and then we loaded it into MySQL along with any metadata we needed, usually a timestamp of when the dataset was queued up to push.

At first we were just caching it using memcached on the web-facing side, but then we went to actually keeping a copy of the database since reading in all the cached objects and grinding them for some statistical summaries the website had to do was getting to be a bit of a kludge.

It's kind of a kludgey fix, but there was no other good way to do it in our case since a lot of the data was in complex relational structures that needed to be pushed as a complete transaction as opposed to on a row-by-row basis.
posted by SpecialK at 7:50 PM on April 3, 2008

The easiest and safest way I've found to accomplish something with as much detail as you'd like is to build DTS packages to export as CSV or similar and import at the mySQL end. mySQL is so fast, it'd almost be quicker to replace entire tables (as long as there weren't millions of records).

Main advantage is that you have much more control over your data and can fine tune as necessary.
posted by snarkle at 7:50 PM on April 3, 2008

Oh, and how we queued the changes was we altered the sprocs on the MS-SQL side that entered the data into the tables in the first place to also queue up the row pkey and table id in the push-queue data, and then an automated process on the MySQL side would query the web service which would pull that chunk of data and any relationally associated data. The latter was a bit of black magic done by a C# coder I know.

You're probably going to end up with a separate storage silo on your MS-SQL box that maintains the state of the last synch with the MySQL database and acts as a local synchronization comparison. I have no idea how you'd go about doing this.
posted by SpecialK at 7:54 PM on April 3, 2008

This may help. Haven't had to do it myself yet, but the topic of the article is “How do I… Set up replication between Microsoft SQL Server 2000 and MySQL?”
posted by lordaych at 9:28 PM on April 3, 2008

Replication still doesn't solve his need to know when data has been updated in the MySQL side.
posted by SpecialK at 9:38 PM on April 3, 2008

? This is a one-way push from MSSQL to MySQL as I understand it.

I did skip absentmindedly over the concern The mySQL server should know when a row was last modified!

Sounds like a good application for triggers. This could get tedious depending on the number of tables involved, but a decent coder could probably write a script to build them and to create separate tracking tables on the SQL Server side for timestamping purposes.

The replication process could bring these tables over to MySQL as well.
posted by lordaych at 9:56 PM on April 3, 2008

I'm with lordaych on the "when a row was last updated" part. Treat that as data, not metadata, and store it in the table.
posted by mendel at 5:20 AM on April 4, 2008

Response by poster: OP here: Sadly since the internal database can't really get mucked with too much without losing our vendor's support, I don't think adding triggers will work. Maybe I'm just the sucker who's going to have to try those unreviewed apps out there and report back.

We've gotten by with drop and reload using dts packages for a while, but we'd really like to be able to show last modified data.

(Is it AskMeFi cheating to mark something a best answer for not misusing the term metadata?)
posted by advicepig at 6:42 AM on April 4, 2008

An alternative to triggers might be running a continuous trace (i.e. automate the functionality of the SQL Profiler tool). You could transfer and then parse this data on the other end using a number of means -- DB or filesystem.

See: Using the SQL Server Profiler
posted by lordaych at 1:01 PM on April 4, 2008

« Older A New View of 'Julius Caesar' for High Schoolers   |   Wind Instrument Try-Before-Buy? Newer »
This thread is closed to new comments.