A million tiny pieces, on a USB drive.
December 3, 2009 9:52 AM

Transferring lots of files quickly to USB sticks?

So, I'm trying to transfer map tilesets to USB drives ('sticks') - which means 1-12 gigs of data, but organized as files that are all less than 1MB. Transferring this with cp to a USB drive would probably take a week, so I've been trying to create an image of the directory on my machine and then basically 'burn' it to the USB drive, with little success, since the Apple tools for this (hdiutil and asr) appear to be either junk or too Apple-specific to deal with FAT32 - asr definitely can't.

So, how do people do this? I assume that they do, right? Ideas? Thanks
posted by tmcw to Computers & Internet (26 answers total)
 
Do they need to exist as individual files on the USB stick? If not, use zip/tar.gz/rar/whatever to archive them into one file. Copying one big archive should go much quicker than copying the files individually.

or just recursively copy them with `cp -rf /my/files /my/usb/stick/.`

I'm assuming when you say "will take a week" you are talking about individual copy commands, not the data transfer itself. USB 2.0 is fast enough that 1-12 gigs of data won't take that long to copy. Minutes at the longest.
posted by mcstayinskool at 10:02 AM on December 3, 2009


Response by poster: @mcstayinskool: I need to get the files on the sticks as individual files - they'll be used there individually, so a zip/tar file won't cut it. I know the raw bandwidth is there, but, yeah, copying individual files kills it, so that's the problem I'm looking to solve.
posted by tmcw at 10:05 AM on December 3, 2009


Assuming all the tilesets are in a folder:
Open Disk Utility
Create an image from the folder (File -> New -> Disk Image from Folder, or Cmd+Shift+N)
After the image is created, go to the RESTORE tab
Choose the image just created for the source
Choose your USB drive as destination

Alternatively, you might be able to drag the folder containing the tilesets straight into the Source field (under the RESTORE tab in Disk Utility), skipping the intermediate step of creating an image.
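
If you'd rather do that from the command line, the rough equivalent is below (the folder, image name, and /Volumes/USBSTICK are just examples - and asr is HFS/HFS+-only, so this route won't give you a FAT32 stick):

  hdiutil create -srcfolder FolderOfTilesets TilesetImage.dmg
  sudo asr imagescan --source TilesetImage.dmg
  sudo asr restore --source TilesetImage.dmg --target /Volumes/USBSTICK --erase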
posted by now i'm piste at 10:21 AM on December 3, 2009


You could transfer the bundled files (in a zip or rar or whatever) and then un-zip them directly on the stick (keeping the directory structure).
posted by Xhris at 10:24 AM on December 3, 2009


Tar them, copy the tarball to the stick, untar it there! That'll be faster than zipping/raring and still gets rid of the individual-file overhead. You just need memory sticks 2x as big as what you're looking to store.
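
A sketch, assuming the stick mounts at /Volumes/STICK (that path and the names tiles/ and tiles.tar are just examples):

  tar cf tiles.tar tiles/
  cp tiles.tar /Volumes/STICK/
  cd /Volumes/STICK && tar xf tiles.tar && rm tiles.tar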
posted by soma lkzx at 10:27 AM on December 3, 2009


I don't think unzipping or untarring them directly on the stick will help. In fact I think it'd be slower, because of the overhead of reading the tar file back into main memory before writing each file back out.

Are you making multiple identical sticks? You could copy the files onto one stick the slow way, then dd that stick to a file on your hard drive (poor man's disk image creation technique) and dd that image onto the other sticks.
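
A sketch of that, with made-up disk numbers (check diskutil list, and be absolutely sure of the of= device before running dd; stick.img is just a scratch file on your hard drive):

  diskutil unmountDisk /dev/disk3
  sudo dd if=/dev/disk3 of=stick.img bs=1m
  # then for each additional stick (here it shows up as disk4):
  diskutil unmountDisk /dev/disk4
  sudo dd if=stick.img of=/dev/disk4 bs=1m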
posted by hattifattener at 10:33 AM on December 3, 2009


Response by poster: @now i'm piste: that's the workflow I've been trying, and it almost works, but it seems to exclusively produce HFS volumes, which won't work for my purposes. asr, the underlying tool, only supports HFS and HFS+.

Not sure if transferring and then untarring would give any performance benefit... wouldn't you just incur the same file operations at a different time? The overhead seems to be in creating individual files on the stick, and you'll still be doing that?
posted by tmcw at 10:35 AM on December 3, 2009


Is there a reason you can't accomplish this by dragging the file sub-tree from wherever it is now onto your device, in Finder? I just tried that with a 76 MB sub-tree and it took (very approximately) <30 seconds. Extrapolating 0.5 minutes/76MB to 12 GB, I get 76 minutes. Very crude estimate, but probably within 2:1, and in any case it will proceed unattended.
posted by TruncatedTiller at 10:38 AM on December 3, 2009


(Oops - typo. Make that 79 minutes.)
posted by TruncatedTiller at 10:47 AM on December 3, 2009


Hmmm, this series of steps worked for me, just now. It can probably be tidied up a bit.
  1. Create a disk image using hdiutil create -srcfolder FolderContainingStuff -fs MS-DOS DiskImageContainingStuff.dmg
  2. Attach that dmg as a device, using hdiutil attach -nomount DiskImageContainingStuff.dmg (make a note of the device name it prints, eg /dev/disk2)
  3. Insert blank usb drive; use either diskutil or Disk Utility to unmount it without ejecting it if it mounts automatically
  4. Copy the raw disk contents: dd if=/dev/disk2 of=/dev/disk3 (change disk names as appropriate; avoid accidentally wiping out your hard drive)
  5. Remove and re-insert to check that it worked
  6. Use hdiutil eject /dev/disk2 to "eject" the attached-but-not-mounted disk image
One big downside of this is that even if the USB device is 2GB, the filesystem on it might be much smaller, meaning the extra space won't be usable until you reformat the stick. You might be able to get around that by using something like -size 2g -type SPARSE for the create command, though that'll mean you'll end up writing the entire stick's size, even the blank parts. Check the hdiutil man page for ideas.
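
Putting steps 1 through 4 together in copy-paste form (disk numbers are examples, so check them with diskutil list first; diskutil unmountDisk is one way to do step 3, and the bs= is optional but the 512-byte default is painfully small):

  hdiutil create -srcfolder FolderContainingStuff -fs MS-DOS DiskImageContainingStuff.dmg
  hdiutil attach -nomount DiskImageContainingStuff.dmg     # note the device it prints, e.g. /dev/disk2
  diskutil unmountDisk /dev/disk3                          # the USB stick, if it auto-mounted
  sudo dd if=/dev/disk2 of=/dev/disk3 bs=1m                # triple-check of= before hitting enter
  hdiutil eject /dev/disk2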
posted by hattifattener at 11:03 AM on December 3, 2009


Response by poster: @TruncatedTiller: I'll try. This is a lot of files. It'll be more than 156,000 files, to be specific.
posted by tmcw at 11:04 AM on December 3, 2009


Yeah, this whole thread has me baffled. Is there a reason you can't just drag the directory from your master drive to the mounted usb device, per TT's advice?
posted by Aquaman at 11:36 AM on December 3, 2009


Response by poster: @Aquaman: File overhead.

Transferring a 990MB file wouldn't be a problem, but transferring 120,000 files that add up to 990MB will take over 7 hours. The parts are much, much, much, much greater than the whole, thanks to the per-file cost of creating each one on the stick. Given that I want to make maybe 8 USB drives from this data (and have 2 USB ports), that's 4 rounds of copying, and 4 * 7 = 28 hours isn't an acceptable timeframe.

(7 hours is Mac OS X's estimate after transferring for a few minutes. The estimate continues to increase and just hit 8 hours.)
posted by tmcw at 11:59 AM on December 3, 2009


The command line is a lot quicker than Finder.
posted by cogat at 1:30 PM on December 3, 2009


If you have a Windows machine, TeraCopy is quite smart about file copies; no Mac version though, sorry.
posted by defcom1 at 1:48 PM on December 3, 2009


Doing some more digging... USB drives really hate small files. Again a Windows solution: http://www.winimage.com/winimage.htm will let you inject files into a USB image, so that would theoretically do what you want, though I haven't tried it to see whether it's any faster at writing to the USB flash drive.
posted by defcom1 at 1:54 PM on December 3, 2009


You could give FastCopy a try. I use it to transfer files of around 7 GB to and from my external hard drive, and it goes faster than the normal copy and paste. You can set it to copy an entire directory, so this might work for you. The program itself is freeware and small enough to fit on a jump drive.
posted by Deflagro at 1:55 PM on December 3, 2009


Sorry, missed the part about it being on a Mac.
posted by Deflagro at 1:55 PM on December 3, 2009


Another Windows-based solution that could work for you is RichCopy. Be sure to check the "Serialize disk access" box to reduce the I/O overhead.

I use this all the time at work, ever since copying a huge hard drive through the Windows shell only got to about the 20% mark after a whole weekend of running; RichCopy handled the entire thing in a matter of hours.
posted by tellumo at 2:54 PM on December 3, 2009


Response by poster: @hattifattener: I think that method is working - it's been an hour and dd is chugging along, claiming to have copied 1.3 gigs. Hopefully the size bump doesn't mean it's just silently failing...

The trick for getting status from dd is useful here: ~$ sudo kill -s INFO 53962 (where 53962 is dd's PID; you can also just press Ctrl+T in the terminal dd is running in. On Linux, dd wants SIGUSR1 instead of SIGINFO.)
posted by tmcw at 3:49 PM on December 3, 2009


Chances are that OS X is disabling the write cache for removable devices like your USB drive. That's a performance killer when doing lots of smallish file manipulations, because of the overhead associated with constantly updating the filesystem metadata.

So you would probably see a significant performance boost for 'cp' if you could figure out how to turn on write caching.
posted by Galvatron at 5:21 PM on December 3, 2009


Response by poster: @hattifattener: I think it works! Thanks a ton! dd freaks me out, but it's exactly what I need for this scenario (I just need to check my typing a million times before pressing enter and minimize the terminal while it's working). I'm getting about 0.1MB/sec, which I think is decent. This is basically all about details, so here's another one: by default this will make a FAT16 partition. You can make a FAT32 partition with

sudo hdiutil create -srcfolder testdir -fs MS-DOS -fsargs "-F 32 -c 1" testdir.dmg

(I'm not sure the -c 1 is required here. FAT32 needs a minimum number of clusters, around 65,525, or the filesystem can't be created - I added a few extra files to testdir to make it ~250MB and it imaged correctly after that.)
posted by tmcw at 3:35 AM on December 4, 2009 [1 favorite]


0.1MB/sec is pretty rough. If you increase the blocksize of the dd transfer, you may see an improvement. "bs=32k" would be a reasonable option to pass on the dd command line.
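
E.g. (disk numbers are examples, as before):

  sudo dd if=/dev/disk2 of=/dev/disk3 bs=32k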
posted by Galvatron at 6:51 AM on December 4, 2009


Response by poster: @Galvatron: I'll try that this time. I've tried 1024k and 2048k and arrived at the exact same transfer rate as with no argument, so something else must be going on here... although the drive, the finished product, is finally coming out correct (albeit with some space not available and apparently not resizable)
posted by tmcw at 6:55 AM on December 4, 2009


Response by poster: bs=32k gives the same transfer rate currently.
posted by tmcw at 7:35 AM on December 4, 2009


Response by poster: As a followup: since Macs are BSD-based, not Linux, they have raw disk devices, which you probably don't know about because they're almost completely undocumented. It's just /dev/rdiskXsX instead of /dev/diskXsX, and it's much faster for dd.
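
For reference, the dd step using the raw devices would look something like this (disk numbers are examples; the same double-check-everything caveats apply):

  sudo dd if=/dev/rdisk2 of=/dev/rdisk3 bs=1m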

But, still, this is one of the hardest things I've ever had to do computing-wise, and probably the most tear-inducing. It is so difficult that it can take literally days to get a tiny result. And to make it worse, it seems like it would be so simple that the result is unimpressive to anyone unaware of the intricacies of the five or six incredibly difficult areas that you have to know about to get it to work.

Needless to say, I am back on this task at work.
posted by tmcw at 3:08 PM on February 2, 2010

