How to remotely transfer inventory data?
April 18, 2005 9:46 PM

I'm the 'lone programmer' for a small business, making an app that will log when inventory is received, and also used, at a remote location, via serialized barcodes. What would be the best way to transfer this usage data to me, preferably daily?

I'm programming fully in Python with Boa Constructor. The remote computer will have net access, but who knows how reliable, and it's also in Mexico. I've been brainstorming all alone (poor me!) about how to do this, but I need advice from people who have some experience, i.e. you.

My ideas thus far:
Use the Twisted Framework (which looks awesomely awesome) to either:
-transfer data via Perspective Broker (an RPC framework) to our domain, although I'm not sure if the hosting company will let me run my server script.
-transfer data via FTP to our host.
-transfer data via SMTP to a mailbox.

Money is involved (indirectly) so reliability is required. The only data that needs to be sent is barcode numbers, along with how many were scanned. What would be the best way?
posted by Mach5 to Computers & Internet (17 answers total)
If your end is reliable, you could always have it send the items via RPC calls to you in set chunks. If the connection is up and down in a bad way, then FTP or SMTP are not necessarily reliable, but RPC calls could be re-requested. However, if there's even a modicum of reliability involved, using something like FTP or SMTP is certainly more desirable! (And far easier to implement)

Due to the sort of information you're sending, tarring and gzipping the plain text files together each day would probably end up with a minuscule file that could be sent in seconds. Furthermore, going out on port 25 or port 110 would get around most firewall scenarios that may be in place.
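A minimal sketch of that daily bundle, using only Python's standard `tarfile` module. The function names and the file names in the usage note are hypothetical, and the archive is built in memory so it can be handed to whichever transport ends up being chosen.

```python
import io
import tarfile


def bundle_logs(files):
    """Pack {filename: text} into one gzip-compressed tar archive, in memory."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name, text in sorted(files.items()):
            payload = text.encode("utf-8")
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def unbundle_logs(blob):
    """Inverse of bundle_logs, for the receiving side."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as archive:
        return {
            member.name: archive.extractfile(member).read().decode("utf-8")
            for member in archive.getmembers()
        }
```

Calling `bundle_logs({"received.csv": ..., "used.csv": ...})` once a day would give a single blob to send; how small it ends up depends on the data, so it is worth measuring on real scans.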
posted by wackybrit at 9:59 PM on April 18, 2005

I like the idea of FTP for its simplicity. A simple cron script could run every hour on the remote machine and place data into an FTP dropbox on the server. The client script would report any data that has not yet been sent; if the script is unable to connect for 2 hours, it may be dropping off three hours of data. The server can then poll this FTP dropbox, keeping things organized and watching for incoming data. I would look into using a checksum system to validate complete files, and keep logs of all data on both machines for periodic reconciliation. The asynchronous nature of this allows missed connections to be handled gracefully, by having the remote machine push data whenever it's connected and can send info.

However... with FTP alone, you will be sending plaintext passwords, and so another layer of security would be preferred. SSL, rotating passwords, and connections restricted by IP at either the FTP server or the router would be good steps to protect the server.

I could see a simple webservice handling this as well, a script that accepts a csv chunk of info (or an xml format) via a form post. This method is similar to a dropbox in that it allows the remote machine to periodically push data. It would be best to use a checksum system to validate the data, and restrict connections via IP as well. The remote machine would keep track of success codes from the webservice script, and only send new data periodically.
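The "keep track of success codes and only send new data" idea might look roughly like this. It is a sketch under assumptions: `Outbox` and its `send` callable are hypothetical names, `send` stands in for whatever actually performs the form post and returns an HTTP-style status code, and a real version would persist the pending list to disk so a reboot doesn't lose scans.

```python
class Outbox:
    """Holds scan records until the server acknowledges them."""

    def __init__(self, send):
        self.send = send          # callable(list_of_records) -> status code
        self.pending = []

    def add(self, barcode, count):
        self.pending.append((barcode, count))

    def flush(self):
        """Try to send everything pending; keep it all if the attempt fails."""
        if not self.pending:
            return True
        batch = list(self.pending)
        try:
            status = self.send(batch)
        except OSError:
            return False          # connection trouble: keep data for next cycle
        if status != 200:
            return False          # server refused or errored: keep data
        # Drop only what was actually sent; scans added meanwhile stay queued.
        del self.pending[:len(batch)]
        return True
```

A cron job or timer would call `flush()` periodically; anything that failed simply rides along with the next attempt.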

I am not a big fan of using SMTP mailboxes as a vehicle for moving data this way.
posted by tumble at 10:11 PM on April 18, 2005

HTTP is pretty much always the best way to transfer data. It's the One True Protocol. What you're describing seems like a classic case of a simple POST with some basic XML. For security, you get HTTP Digest for free. Also you get all the side benefits like common error codes (service unavailable, unauthorized) and protocol specification. There's no need to invent your own protocol over FTP or SMTP. Programmers need to stop inventing protocols. There's a severe overpopulation problem as is. HTTP is all you need.

As for reliability this depends on what you mean by reliability. Most of the time you can get away with failure and corrupt data as long as failures are easily detected. Hopefully you don't need real reliability and can get away with sender-fails. If you do need 100% reliability then things get pretty tricky. You should probably use a shared database. People have created things like HTTP-R but they're kludgy. I've rolled my own reliable HTTP protocols and it's not rocket science but it's a bit of work.
posted by nixerman at 10:57 PM on April 18, 2005

What about, instead of the remote app pushing data to your server, your server pulls it? It would be fairly trivial to either install a simple web server on the client's computer, or even to program a very trivial one in python. It doesn't even have to be a web server. The server on the client side could be something that your server connects to periodically, it streams some data, and you disconnect. An FTP server on your client's computer would be OK too although not ideal.

I don't know much about python, but other scripting languages I've used have had fairly easy access to encryptable, compressible channels either natively or through external libraries. These would be wise to use.
posted by RustyBrooks at 10:59 PM on April 18, 2005

I have to disagree with nixerman. HTTP is far from the ideal protocol. Its goodness rests pretty much solely in its ubiquity. There is no point in sending your data as XML. Send it in whatever format it's easiest for you to put it in, and as compact as possible. For sending a list of barcodes and the number of times they were scanned, XML would probably double or triple your data file size for no good reason. If you're concerned that you might not get the whole file, that's what checksums, headers, and footers are for.
posted by RustyBrooks at 11:02 PM on April 18, 2005

I'd say FTP, SCP, or HTTP; not SMTP.

Why's that? Let's say your server can't connect. With FTP, SCP, or HTTP, you know right away if it's failed or if there was an error on the far side.
With SMTP, you may as well be using pigeon packet protocol for all you know about your information reaching the destination reliably.

I wouldn't structure your data; XML is overkill. Make it easy on yourself, just do a comma or tab delimited text format. I'm going to disagree with RustyBrooks that HTTP isn't a great format to use; XML wouldn't be ideal, but HTTP is stateful in the sense that you know if the transaction succeeded or not (you can capture a 503 or a 200 OK but error code in the content field), tolerates a ton of tiny connections over a long period of time, performs well in low-reliability situations, can be set up with security and authentication, and is an ideal format to squirt a large number of tiny transactions down a shaky, iffy foreign line. My experience is that you're much more likely to have an issue with data integrity or scrambling on a line that could be mangling your packets with SCP or FTP (that being said, I'd still checksum your data somehow, since money's involved).

I don't know how to do it in Python, but I'd have a script on one end that would cache the scans and send a blip via HTTP every minute or two. I'd use comma-sep, and I'd checksum your data on the Mexico server, and use the checksum as a header and a trailer. Then re-checksum the data on the US side, and if the checksums all match, report a transaction is good; otherwise say "hold onto the data and try again next cycle." (With frequent updates you're usually pretty close to up-to-date, even if you skip a few cycles you're still pretty close to up-to-date, and it looks impressive to the bosses to see the numbers scrolling up while they watch.) If the connection times out, scrambles, or otherwise fails, hold onto the cached data, and continue to collect data, and then try again in another minute. Wash, rinse, repeat.
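The checksum-as-header-and-trailer scheme could be sketched in Python like this. SHA-256 and the exact line layout are assumptions made for illustration (the thread doesn't name a checksum), and `frame` / `unframe` are hypothetical names.

```python
import hashlib


def frame(rows):
    """Wrap CSV rows with the same SHA-256 digest as first and last line."""
    body = "".join(f"{barcode},{count}\n" for barcode, count in rows)
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return f"{digest}\n{body}{digest}\n"


def unframe(message):
    """Return the rows if both digests match the body, else None."""
    lines = message.split("\n")
    # A well-formed message ends with a newline, so the last element is "".
    if len(lines) < 3 or lines[-1] != "":
        return None
    header, trailer = lines[0], lines[-2]
    body = "".join(line + "\n" for line in lines[1:-2])
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    if not (header == trailer == digest):
        return None
    return [tuple(line.split(",")) for line in lines[1:-2]]
```

The sending side would call `frame()` on its cached scans; the receiving side answers "good" only when `unframe()` returns rows, and anything else means "hold onto the data and try again next cycle." Note that the rows come back as strings, so counts would need converting.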

Re: reliability ... See if you can get a backup connection for the machine ... for instance, have the primary connection be a network connection, and then if that fails X number of times, start updating via a PPP dialup internet connection every 15 minutes or so. On top of that, make sure the remote machine is running RAID, has redundant power supplies, and is plugged into something like an APC or a really good surge protector. Even if you do the code well, don't forget that you can still get taken out by the unreliability of the infrastructure or the hardware down yonder. (Redundancy is good. Redundancy is good. Redundancy is good.)
posted by SpecialK at 11:22 PM on April 18, 2005

"Send it in whatever format it's easiest for you to put it in, and as compact as possible."

On behalf of the future poor chump who inherits this system to maintain years later... I hate you.

Sure, XML may increase the size, but don't sacrifice simplicity and ease of maintenance without thinking about the consequences (and cost) down the track.
posted by bruceyeah at 11:36 PM on April 18, 2005

The protocol isn't important. I'd go with either XML over HTTP or a straight database sync, but use whatever you have libraries for and fits with your architecture.

As to your connection, when I've been in similar situations (monitoring apps sitting in remote locations, eg a couple of miles out to sea) I've gone with a GSM modem. You don't have to worry about cables getting pulled or the local ISP in that case - you just dial into the home office's network directly, for a few seconds (you're sending integers, which properly stored are going to be tiny), once a day. Mexico appears to have CDMA, TDMA and GSM to choose from.
posted by Leon at 12:32 AM on April 19, 2005

Since you want to do this on a regular schedule, I assume you will be checking for the results on a regular schedule? So if you don't receive them, you know something went wrong.

If this is the case, I'd stick with SMTP. The data size is small, someone is waiting for it, and SMTP will queue if it can't be sent.

Also, this allows you to easily change the To: address in the future if need be.

You could very easily have a daily and a weekly mail sent to you with a cron script. If you think the system will be offline a lot, have it send the past few days' worth of data every day for redundancy.
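For the mail route, a sketch of building such a message with Python's standard `email` package. The function name, the addresses in the usage note, and the three-day default are all assumptions; the message is only built here, not sent.

```python
from email.message import EmailMessage


def build_report(days, sender, recipient, keep=3):
    """Build one mail carrying the most recent `keep` days of scan data.

    `days` maps an ISO date string to that day's CSV text.
    """
    if not days:
        raise ValueError("no scan data to report")
    recent = sorted(days)[-keep:]
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = f"Inventory scans {recent[0]} to {recent[-1]}"
    message.set_content("Scan data attached, one file per day.")
    for day in recent:
        # Text attachments; overlapping days give the redundancy described above.
        message.add_attachment(
            days[day], subtype="csv", filename=f"scans-{day}.csv"
        )
    return message
```

The cron script could then hand the result to `smtplib.SMTP(...).send_message(message)`, with the mail host being whatever the remote site actually uses. Because consecutive mails overlap, the receiving side has to be prepared to see the same day more than once.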
posted by bh at 3:58 AM on April 19, 2005

Worrying about the size of your payload is pretty silly. This isn't 1995. Yes, XML will "bloat" the protocol and, who knows, you may end up transferring a whole 1 KB! What you get is something human readable, self-documenting and that plays well with many other tools and libraries out there (XSLT, XML Schema). Unless size really is an issue there's absolutely no reason to invent your own data format or use something like CSV.
posted by nixerman at 4:44 AM on April 19, 2005

HTTP for sure. ftp, scp, or rsync would transfer the data nicely, but then you need something additional to tell the other side when the transfer is complete. You probably don't want it picking up a partial file before the transfer is done.
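One common workaround for the partial-file problem with FTP is to upload under a temporary name and rename once the upload has finished. A sketch, assuming `ftp` is an already connected and logged-in `ftplib.FTP` object and that the receiving side ignores names ending in `.part` (both assumptions; `publish` is a hypothetical name):

```python
import io


def publish(ftp, name, data):
    """Upload `data` so the final name only ever refers to a complete file.

    `ftp` is expected to be a connected, logged-in ftplib.FTP object.
    """
    temporary = name + ".part"
    ftp.storbinary("STOR " + temporary, io.BytesIO(data))
    # The rename happens only after the upload finished without raising.
    ftp.rename(temporary, name)
```

Whether the rename is truly atomic depends on the FTP server and its filesystem, so this reduces the window for picking up a half-written file rather than guaranteeing it away; a checksum is still worth having.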

RPC would be good, but too complex for me.

SMTP would be nice and simple, but leaves you at the mercy of the mail servers involved.

So I would go with HTTP. Use a CGI or whatever, not just a webserver dumping stuff to a file. Use a sequence number for each record so you can detect any gaps or duplicates.
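A sketch of what the receiving script might do with those sequence numbers. It assumes each record carries an integer that the remote machine increments by one per record; the function name is hypothetical.

```python
def check_sequence(last_seen, incoming):
    """Compare incoming sequence numbers against the last one accepted.

    Returns (fresh, duplicates, missing): numbers to accept, numbers already
    received, and numbers that were skipped and should be requested again.
    """
    fresh, duplicates = [], []
    seen = set()
    for number in incoming:
        if number <= last_seen or number in seen:
            duplicates.append(number)
        else:
            seen.add(number)
            fresh.append(number)
    highest = max(fresh, default=last_seen)
    missing = [n for n in range(last_seen + 1, highest) if n not in seen]
    return sorted(fresh), duplicates, missing
```

For example, with record 4 as the last one accepted and records 3, 5, 6, 6, 9 arriving, this reports 5, 6 and 9 as fresh, 3 and the second 6 as duplicates, and 7 and 8 as missing.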

XML is more difficult for both humans and machines to parse than is csv or similar for data like this. Fine to use if you already have an XML parser around, but it wouldn't pay to go out of your way for it.
posted by sfenders at 6:32 AM on April 19, 2005

No kidding. XML "plays nice" with other tools, provided that they have some built in way to parse XML or access to such a mechanism. For many platforms this means rolling your own XML parser or trying to build in a third party library. In about 90% of the cases I've seen, XML is used to store what is essentially a 1 dimensional array. CSV is a perfectly self documenting format for this. The first line would be "UPC,NUMBER" indicating the values of each column, followed by a set of lines. The equivalent XML would be no more readable and probably less. To call the XML for such a thing easier to parse or handle is simply ludicrous. To claim that it would be difficult to maintain such a format in the future is frankly laughable.

If you're dealing with an unreliable connection, size of the file matters a lot. Frankly, you also don't know how many UPCs this guy wants to handle. If each transfer has, say, 100 UPCs then you're talking about, perhaps, a 1K CSV versus a 3K XML file. These are functionally equivalent given the setup required for transferring. If, however, you're talking about 100,000 then we're now talking about transferring 1MB or 3MB. There's a pretty big difference between the 2.
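The size argument is easy to check for a given data shape rather than guess at. A rough sketch using the standard library: the "UPC,NUMBER" header comes from the comment above, while the XML element names, the 12-digit codes and the counts are made up for illustration, so the ratio it prints says something about this layout only.

```python
import csv
import io
import xml.etree.ElementTree as ET


def as_csv(rows):
    """Serialize (upc, number) rows as CSV with a header line."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["UPC", "NUMBER"])
    writer.writerows(rows)
    return out.getvalue().encode("utf-8")


def as_xml(rows):
    """Serialize the same rows as XML (element names are hypothetical)."""
    root = ET.Element("scans")
    for upc, number in rows:
        scan = ET.SubElement(root, "scan")
        ET.SubElement(scan, "upc").text = upc
        ET.SubElement(scan, "number").text = str(number)
    return ET.tostring(root, encoding="utf-8")


# 100 made-up scans with 12-digit codes and single-digit counts.
rows = [(f"{n:012d}", n % 7 + 1) for n in range(100)]
csv_size, xml_size = len(as_csv(rows)), len(as_xml(rows))
print(csv_size, xml_size, round(xml_size / csv_size, 1))
```

Shorter tag names or attributes instead of child elements would narrow the gap, and compressing both would change the picture again, so it is worth running against real scan data before drawing conclusions.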

I'm currently working on a project that involves RPC over a fairly limited available bandwidth connection. The connection is a T1, but the line is also used to stream backup data from our production DB servers to our backup ones, so there is generally only about 100Kbits/s available at the top to make the RPC calls. Being smart with our data sets, and compressing them, saved us about a factor of 10 transfer wise, and man is there a difference between 5 seconds and 50.

My suggestion regarding having your server fetch from the client also has to do with one of the bitches of HTTP: you can retrieve from HTTP in binary but you can't really send. To send binary data you have to mime encode it which increases it by a factor of 3-4. So, more efficient to have your server fetch a nicely compressed file at given intervals than to have the client post a much larger file.

I think HTTP is OK and it probably is the best tradeoff of simplicity and functionality in this case. Sorry to be a hater but HTTP often frustrates me.
posted by RustyBrooks at 8:43 AM on April 19, 2005

you can retrieve from HTTP in binary but you can't really send.

I don't think that's true at all. According to my understanding of the RFC, the entity body is defined exactly the same way for responses as it is for requests, so you can post all the same types of data you can receive in a response. Maybe you're thinking of a particular webserver that doesn't support gzip encoding on POST requests or something?
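From the client side, at least, nothing stops a Python script from putting raw compressed bytes in a POST body. A sketch with the standard `urllib.request` module that builds the request without sending it; the function name and header choices are assumptions, and whether the receiving end decompresses automatically is a separate, server-specific question.

```python
import gzip
import urllib.request


def build_post(url, csv_text):
    """Build (but do not send) a POST whose body is gzip-compressed CSV."""
    body = gzip.compress(csv_text.encode("utf-8"))
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "text/csv; charset=utf-8",
            "Content-Encoding": "gzip",
        },
    )
```

Passing the result to `urllib.request.urlopen(request, timeout=...)` would send it. If the web server does not undo the encoding itself, the receiving script can call `gzip.decompress()` on the raw body.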

If you're dealing with an unreliable connection, size of the file matters a lot.

Yeah, it may be best to send it in smaller pieces if that's a concern.
posted by sfenders at 9:38 AM on April 19, 2005

Perhaps you're right about sending compressed data in a POST. I don't know that I've ever tried it. I'm pretty sure the server I use most often (AOLServer) doesn't support it but it pretty much doesn't support anything like that natively. I don't recall if Apache or IIS support it but obviously I've never heard of it, since I said the above (note: not saying they can't do it, just saying I've never heard of it).

Do you know of webservers that definitely *would* handle a POST that had the body listed as Content-Encoding: gzip ? I guess I always fall back on the mime because most webservers support it (and most browsers do also)
posted by RustyBrooks at 10:36 AM on April 19, 2005

mod_gzip does it for apache. You could also just use content-type gzip. Every server should at least be able to deal with application/octet-stream, though I wouldn't be too surprised if something called AOLServer is an exception.
posted by sfenders at 10:55 AM on April 19, 2005

Heh, AOLServer has little to do with actual AOL. I don't remember much about its history but it's an open source project these days. It's actually quite good, but it's not the swiss-army-knife kind of server that Apache is. AOLServer comes as a very bare-bones server but it's fairly easy to add functionality onto it. I should look into it to see if it supports application/octet-stream posting natively or not.
posted by RustyBrooks at 11:44 AM on April 19, 2005

Thanks for all your help, guys. To follow up, the number of scans per day is not going to be a whole lot as far as I know. I'm pretty sure I'm going with the HTTP POST method, but just plaintext. I've got another burning question to ask next time I can post, about Windows desktop (the literal desktop, i.e. wallpaper) programming. I wanna turn Electric Sheep into my wallpaper.
posted by Mach5 at 8:07 PM on April 19, 2005
