
Archival software naming ideas needed
January 20, 2013 6:16 PM

Could anybody suggest some name ideas for a program (UNIX command-line/cron job, not GUI) for progressively backing up/archiving files in a directory tree to archive volumes and sending them to some remote location (e.g., Amazon Glacier, or else a server reachable by ssh)?

This tool would be called with the path of a directory tree and would keep its configuration and metadata in a directory under that tree. When it runs, it walks the tree, finds unarchived (new or changed) files, groups them into volumes according to size limits in the tree's configuration, packages and post-processes the volumes (e.g., encrypts them), and then pushes them to their final destination. A list of volumes and the files each contains is kept in the metadata directory and can be uploaded to a remote location as well. It would be modular in design, with a number of configurable modules (storage back ends, encryption, schemes for naming volumes). It would be written in Python, for what it's worth.
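
To make the scan-and-group step concrete, here is a minimal sketch of how it might look. Everything in it is an assumption for illustration, not the actual tool: the function names, the `.archive` metadata directory name, detecting changes by comparing size and modification time, and the simple greedy packing.

```python
import os


def find_unarchived(root, archived, meta_dirname=".archive"):
    """Yield (relative_path, size, mtime) for files not recorded in `archived`.

    `archived` maps relative path -> (size, mtime) from earlier runs.
    Comparing size and mtime is an assumed change test, not the tool's own.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into the tool's own metadata directory
        # (the directory name here is hypothetical).
        dirnames[:] = [d for d in dirnames if d != meta_dirname]
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            rel = os.path.relpath(full, root)
            if archived.get(rel) != (st.st_size, int(st.st_mtime)):
                yield rel, st.st_size, int(st.st_mtime)


def group_into_volumes(files, max_bytes):
    """Greedily pack files, in order, into volumes of at most max_bytes.

    A single file larger than max_bytes gets a volume to itself.
    """
    volumes, current, current_size = [], [], 0
    for rel, size, _mtime in files:
        if current and current_size + size > max_bytes:
            volumes.append(current)
            current, current_size = [], 0
        current.append(rel)
        current_size += size
    if current:
        volumes.append(current)
    return volumes
```

A run would then look something like `group_into_volumes(find_unarchived(tree, index), limit)`, with each resulting list of paths handed to the packaging step and recorded in the index once its volume has been pushed.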

If you're reading this and thinking 'git' or 'subversion', not quite: this system does not implement retrieval of the volumes (that's a matter of downloading, decrypting, and untarring them, though a decrypt script would be provided as a convenience), and it deals in sequentially numbered volumes, which it assembles. The point is basically to be able to back up a large, gradually growing or changing tree (e.g., a music collection or photos) to Glacier or a similar service.

This is something I'm writing for my own use and intend to open-source when it's ready, but I need a name for it. Any suggestions?
posted by acb to Computers & Internet (22 answers total)
Tarsync? Pusher? Offset? Offsite? Scat (i.e., it poops in the woods)?
posted by gregglind at 6:35 PM on January 20, 2013

posted by odinsdream at 6:42 PM on January 20, 2013 [2 favorites]

Nth'ing rsync, that's what it's for.
posted by tylerkaraszewski at 6:44 PM on January 20, 2013

He's asking for something that acts as a complement to rsync, not for rsync itself. I'd think something like "cloudsync" or similar... although if you're targeting Glacier a more specialized name (and interface) may be desirable, given the necessity of dealing with byte offsets in archives if you're trying to pull down a specific file.
posted by sonic meat machine at 6:54 PM on January 20, 2013

Dump Up The Volume...
Cloud Of Holding...
car - Cloud ARchive (a la 'tar')...
Wherehouse... Wherehost...
posted by zengargoyle at 7:10 PM on January 20, 2013

It's not rsync; it does not copy the original files as is, but batches up new ones into archives and pushes them across (optionally encrypting them), making no assumption of random access on the other end (Glacier is an extreme example of this).
posted by acb at 7:11 PM on January 20, 2013

And “sync” is probably a word to avoid, given that it only works in one direction. Architecturally, there is no provision for reading the remote end.

The software also doesn't use byte offsets, rather sending discrete volumes one at a time. Each volume makes a new file on the remote host or a new volume in a Glacier vault (or whatever the back-end used is).
posted by acb at 7:14 PM on January 20, 2013

Push-only and cloud oriented?

Maybe... storm surge, monsoon, or maybe something like 'ice age' if it's only targeted at the slower backends like Amazon Glacier.
posted by Matt Oneiros at 7:41 PM on January 20, 2013

freezer (I like this one because waiting for thaw == slow recovery time)
cpush (cloud push)
posted by scose at 8:05 PM on January 20, 2013

Not to parade-rain, but this sounds almost exactly like duplicity - that may already do what you want. As for name suggestions:
evaporate - send to the cloud
packup - back up a pack at a time
entomb - has that feeling of permanence & weight
amber - preserving for the ages
pushup - just what it sounds like
posted by pocams at 8:26 PM on January 20, 2013 [1 favorite]

It seems like rsync could do that too, even without knowing what the old files were. For example, you could rsync to a local temp folder for the first batch and on subsequent occasions accomplish the discovery of changed/new files like so ...
(cd src && find . -newer .somedotfileyoutouchedlasttime -print) | rsync --files-from=- src/ temp/
and of course tar, gzip, gpg, and scp (or whatever) the temp folder to your long-term storage. If there are size limits somewhere, throw a split command into the mix.
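
Since the tool in question is meant to be written in Python, the same marker-file idea can be sketched there too. This is only an illustration under assumptions: the function names and the marker-file convention are made up, only the tar and gzip steps are shown (encryption and upload are left out), and the archive is assumed to be written somewhere outside `src` so it is not picked up on the next run.

```python
import os
import tarfile


def files_newer_than(src, marker):
    """Return paths under src (relative to src) modified after the marker file.

    If the marker does not exist yet, every file counts as new.
    """
    cutoff = os.stat(marker).st_mtime if os.path.exists(marker) else 0.0
    marker_abs = os.path.abspath(marker)
    newer = []
    for dirpath, _dirnames, filenames in os.walk(src):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.abspath(full) == marker_abs:
                continue  # the marker itself is never archived
            if os.stat(full).st_mtime > cutoff:
                newer.append(os.path.relpath(full, src))
    return sorted(newer)


def make_batch(src, marker, archive_path):
    """Write new or changed files to a gzipped tar, then touch the marker."""
    paths = files_newer_than(src, marker)
    if paths:
        with tarfile.open(archive_path, "w:gz") as tar:
            for rel in paths:
                tar.add(os.path.join(src, rel), arcname=rel)
    # Create the marker if needed and set its mtime to now.
    with open(marker, "a"):
        pass
    os.utime(marker, None)
    return paths
```

One caveat with this approach, in shell or Python: a file modified between the scan and the moment the marker is touched would be missed on the next run, so a more careful version would record the start time before scanning.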

So, I like all of scose's and pocams's suggestions, but I'd probably add ".sh" to the end or try to think of new distinguishing features that could be the basis for the name.
posted by Monsieur Caution at 8:45 PM on January 20, 2013

Also sounds like Cumulus [pdf] (also, sorry for the parade-raining)
posted by qxntpqbbbqxl at 8:49 PM on January 20, 2013

Filefish (they are real)
posted by carmicha at 9:06 PM on January 20, 2013

It makes sense to me that this isn't the same thing as rsync, though maybe it would utilize the same algorithms the way that git does. (But I can see that it isn't the same thing as git either.)

If the primary use case for it is going to be Glacier, how about calling it Ötzi?
posted by XMLicious at 9:30 PM on January 20, 2013 [2 favorites]

Shover robot
posted by flabdablet at 1:47 AM on January 21, 2013 [1 favorite]

How about "Data Echo", "Data Doppler", or "Spitting Disk Image"?
posted by Blazecock Pileon at 3:43 AM on January 21, 2013

I suggest "stratum," invoking the idea of layers of new data overlying the old.
posted by megatherium at 4:06 AM on January 21, 2013

How about "mammoth"? They're huge, and they migrate around glaciers!

(Also, even if you're not using rsync, you can still let your backend speak the rsync protocol! BackupPC does this, so clients can just use rsync, which they probably have installed anyhow.)
posted by vasi at 4:15 AM on January 21, 2013 [1 favorite]

(Also, even if you're not using rsync, you can still let your backend speak the rsync protocol! BackupPC does this, so clients can just use rsync, which they probably have installed anyhow.)

Currently it has backends which speak sftp (i.e., SSH file transfers) and Glacier. I'm thinking of using PyFilesystem to get a bunch of others (WebDAV, Amazon S3, Tahoe-LAFS, FTP) for free. rsync doesn't really fit the model, given that the tool doesn't keep local copies of the cooked archives (unless the endpoint is a local filesystem, which is also possible), so one wouldn't get anything over a dumb file transfer.
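
For what it's worth, the push-only model means a back end needs very little. A rough sketch of what the interface could look like follows; the class names, the `put` method, and the registry are all hypothetical rather than the tool's real API, and only a local-directory back end is shown because it needs nothing beyond the standard library.

```python
import os
import shutil
from abc import ABC, abstractmethod


class Backend(ABC):
    """Push-only storage back end: it can store a volume, never read one back."""

    @abstractmethod
    def put(self, local_path, remote_name):
        """Store the finished volume at local_path under remote_name."""


class LocalDirBackend(Backend):
    """Stores volumes in a directory on a local filesystem."""

    def __init__(self, dest_dir):
        self.dest_dir = dest_dir

    def put(self, local_path, remote_name):
        os.makedirs(self.dest_dir, exist_ok=True)
        dest = os.path.join(self.dest_dir, remote_name)
        # Volumes are written once; refuse to overwrite an existing one.
        if os.path.exists(dest):
            raise FileExistsError(dest)
        shutil.copyfile(local_path, dest)
        return dest


# Hypothetical registry mapping a configured back-end name to its class.
BACKENDS = {"local": LocalDirBackend}


def make_backend(kind, **options):
    """Instantiate the back end named in the tree's configuration."""
    return BACKENDS[kind](**options)
```

An sftp or Glacier back end would implement the same single `put` call, which is why a dumb file transfer is all the model requires.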
posted by acb at 4:31 AM on January 21, 2013

posted by mikeh at 8:05 AM on January 21, 2013 [1 favorite]

posted by the Real Dan at 11:56 AM on January 21, 2013

posted by hwestiii at 2:58 PM on January 21, 2013 [1 favorite]
