Extracting my life in photos - why does gzip hate me?
August 17, 2007 10:34 AM

I have 4GB of photos spanning the last 7 years of my life. They're sitting on my Linux webserver, and the other day I compressed the directory using the gzip command while I reinstalled my photo album software. When I extract the .gz file, I just get a single extensionless file instead of the directory structure that was there before. I have a bad feeling the directory structure wasn't preserved when the file was compressed. I haven't been able to get anything usable out of it, but I know there are photos in there, because when I run head on the file there's a clear file name in it. Is there anything I can do to get at least the files back? I don't care if I lose the directory structure at this point, just as long as I get the photos back.
posted by cubedweller to Computers & Internet (24 answers total)
 
Do you remember what command you used? I tried doing this myself and gzip gives an error if I try to compress a directory, even with -f.
posted by coined at 10:48 AM on August 17, 2007


I can't remember! Infuriating. I *think* it was gzip -r but it may have been tar. That said, it is a .gz file.
posted by cubedweller at 10:51 AM on August 17, 2007


So you have filename as the archive, not filename.gz?
posted by iamabot at 10:52 AM on August 17, 2007


Close - when I extract filename.gz I get a 4GB file called filename as opposed to a directory called filename.
posted by cubedweller at 10:54 AM on August 17, 2007


It's probably a .tar.gz file that's just been named .gz. Try "tar tf filename" (after you've un-gzipped it). If that lists a bunch of files, then you can use "tar xf filename" to extract them all into the current directory.
posted by hattifattener at 11:00 AM on August 17, 2007


First I'd urge you to make a copy of the .gz file so you don't accidentally destroy anything. But you think it may have been tar'ed somewhere along the line as well? Once you extract it, what happens if you try tar -tvf filename on the extracted file?
posted by tyllwin at 11:03 AM on August 17, 2007


I tried tar tf filename and got this:

tar: This does not look like a tar archive
tar: Skipping to next header
tar: Error exit delayed from previous errors
posted by cubedweller at 11:05 AM on August 17, 2007


Run 'file filename' to see what the file appears to be.
posted by marionnette en chaussette at 11:07 AM on August 17, 2007


'file filename' gives:
albums: JPEG image data, EXIF standard 0.73, 10752 x 2048

this gives me hope that the files are intact.
posted by cubedweller at 11:09 AM on August 17, 2007


How big is the .gz file (or the uncompressed version)?
posted by TravellingDen at 11:15 AM on August 17, 2007


Ok, that probably means that all of your photos are smushed together into one file. They likely are intact. You need to find some sort of utility that will go through that file and extract every JPEG.
posted by marionnette en chaussette at 11:16 AM on August 17, 2007


All I can think of is to burn it (the original .gz) to a DVD and see what WinRAR makes of it.

This is odd, it sounds as if you concatenated the photos into a single file while archiving it. If that's the case you'll need a smart file splitting utility (or maybe gzip has this built in somewhere?).
posted by IronLizard at 11:17 AM on August 17, 2007


Can you type 'history'? Perhaps you'll be lucky enough to have a history going back far enough to see what command you used to cat the files together.
posted by voxpop at 11:17 AM on August 17, 2007


gzip -r will traverse a directory structure and compress each file in-place. This means after it is finished, the files will be in their same locations, but will all be compressed and have a .gz extension tacked on.

Is there any chance the commands you used are still in your shell history?
posted by TravellingDen at 11:19 AM on August 17, 2007


given the output of file, it sounds like gzip just compressed all the files together as one stream. a programmer with a knowledge of the jpeg standard could probably figure out where each file starts and stops and split it up.

try this program on your extracted file:

jpegextractor

and please please please take tyllwin's advice and make a backup of the original 4 GB file before you do anything.
posted by AaRdVarK at 11:19 AM on August 17, 2007
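
For anyone curious what "figure out where each file starts and stops" involves, here is a minimal Python sketch. It is not how jpegextractor itself works (that tool's internals aren't described in this thread). It assumes the decompressed data fits in memory and that every photo is a well-formed JPEG, and find_jpeg_end and split_jpegs are made-up names. It walks the marker segments after each start-of-image marker, so thumbnails embedded in EXIF data are skipped instead of being mistaken for separate photos.

```python
def _skip_scan_data(data, pos):
    """Skip entropy-coded scan data; return the offset of the next real marker."""
    while True:
        i = data.find(b"\xff", pos)
        if i < 0 or i + 1 >= len(data):
            return None                      # ran off the end: truncated image
        nxt = data[i + 1]
        if nxt == 0x00 or 0xD0 <= nxt <= 0xD7:
            pos = i + 2                      # stuffed FF 00 or restart marker
        elif nxt == 0xFF:
            pos = i + 1                      # fill byte
        else:
            return i                         # a genuine marker


def find_jpeg_end(data, start):
    """Return the offset just past the EOI marker of the JPEG at `start`, or None."""
    pos = start + 2                          # skip SOI (FF D8)
    n = len(data)
    while pos + 2 <= n:
        if data[pos] != 0xFF:
            return None                      # not at a marker: malformed
        marker = data[pos + 1]
        if marker == 0xFF:                   # fill byte before a marker
            pos += 1
            continue
        if marker == 0xD9:                   # EOI: end of this image
            return pos + 2
        if marker == 0x01 or 0xD0 <= marker <= 0xD7:
            pos += 2                         # standalone markers carry no length
            continue
        if pos + 4 > n:
            return None
        seg_len = int.from_bytes(data[pos + 2:pos + 4], "big")
        if seg_len < 2:
            return None
        pos += 2 + seg_len                   # jump over the whole segment
        if marker == 0xDA:                   # SOS: compressed scan data follows
            pos = _skip_scan_data(data, pos)
            if pos is None:
                return None
    return None


def split_jpegs(data):
    """Return each complete JPEG found in `data`, in order."""
    images = []
    pos = 0
    while True:
        start = data.find(b"\xff\xd8\xff", pos)
        if start < 0:
            break
        end = find_jpeg_end(data, start)
        if end is None:
            pos = start + 2                  # false positive: keep searching
            continue
        images.append(data[start:end])
        pos = end
    return images
```

For a 4GB file you would want to memory-map the input (Python's mmap module) and write each image to disk as it is found, rather than collecting them in a list.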


Maybe this is what happened?
-c --stdout --to-stdout
Write output on standard output; keep original files unchanged. If there are several input files, the output consists of a sequence of independently compressed members. To obtain better compression, concatenate all input files before compressing them.
posted by IronLizard at 11:20 AM on August 17, 2007
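
If that is what happened, the "independently compressed members" part matters: each member is a complete gzip stream with its own header, and when gzip compresses a named file it stores the original file name in that header by default (the FNAME field in RFC 1952), as far as I know. So the members can, in principle, be pulled apart again without knowing anything about JPEG. A rough Python sketch, assuming the archive fits in memory and is not truncated; split_gzip_members and gzip_member_name are names I made up:

```python
import zlib


def gzip_member_name(member):
    """Return the file name stored in a gzip member header, or None if absent."""
    if member[:2] != b"\x1f\x8b":
        raise ValueError("not a gzip member")
    flags = member[3]
    pos = 10                                 # fixed part of the header
    if flags & 0x04:                         # FEXTRA: skip the extra field
        xlen = int.from_bytes(member[pos:pos + 2], "little")
        pos += 2 + xlen
    if flags & 0x08:                         # FNAME: zero-terminated Latin-1 name
        end = member.index(b"\x00", pos)
        return member[pos:end].decode("latin-1")
    return None


def split_gzip_members(blob):
    """Return a list of (stored_name, decompressed_bytes), one per gzip member."""
    members = []
    while blob:
        name = gzip_member_name(blob)
        d = zlib.decompressobj(wbits=31)     # 31 = expect a gzip wrapper
        payload = d.decompress(blob) + d.flush()
        if not d.eof:
            raise ValueError("truncated gzip member")
        members.append((name, payload))
        blob = d.unused_data                 # whatever follows this member
    return members
```

The header normally holds only the base name, not the path, so this would not bring the directory structure back, and same-named files from different subdirectories would need renaming before being written out. Re-slicing unused_data on every pass is also wasteful for thousands of members; a streaming version would be kinder to a 4GB archive.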


Err, oops?
posted by IronLizard at 11:20 AM on August 17, 2007


jpegextractor definitely seems like the thing to use.
posted by marionnette en chaussette at 11:21 AM on August 17, 2007


History shows a couple of commands that look like they could do it:

gzip -c * > albums.gz
gzip -rcvf --best albums > albums.gz
posted by cubedweller at 11:23 AM on August 17, 2007


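yeah, those -c's are the culprit. they send the compressed output of every file to one stream, so decompressing albums.gz gives you all the files run together with nothing marking where one ends and the next begins. unfortunately to undo it you will have to use a tool like jpegextractor, but it shouldn't be too difficult.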

in the future use:

tar czvf albums.tar.gz albums
posted by AaRdVarK at 11:29 AM on August 17, 2007 [1 favorite]
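
The same idea as the tar command above, as a hedged Python sketch using the standard tarfile module. tar records each file's relative path, which is what keeps the directory structure. archive_directory and list_archive are illustrative names, and the paths in the usage note are placeholders.

```python
import tarfile
from pathlib import Path


def archive_directory(src_dir, archive_path):
    """Write src_dir, recursively, into a gzip-compressed tar archive."""
    src = Path(src_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps paths relative to the directory's own name,
        # e.g. albums/2003/a.jpg rather than an absolute path.
        tar.add(str(src), arcname=src.name)


def list_archive(archive_path):
    """Return the member paths stored in the archive."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()
```

Calling archive_directory("albums", "albums.tar.gz") and then list_archive("albums.tar.gz") should show every file with its subdirectory, which is the check that would have caught this problem early.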


well, the most recent command is the one used to create albums.gz since > would overwrite existing albums.gz
posted by voxpop at 11:29 AM on August 17, 2007


jpegextractor it is - guys, THANKS. This is 7 years worth of photos that I managed to screw up so badly. I really appreciate your help.
posted by cubedweller at 11:32 AM on August 17, 2007


gzip -c * > albums.gz does indeed write all its input (every file in whatever the current directory was at the time) into one output stream, which you directed to albums.gz. But it doesn't delete the original files.

gzip -rcvf --best albums > albums.gz is similar, except it works on the directory (or file, but I presume directory) 'albums' in whatever the current directory was at the time and recursively compresses everything into one long stream, which you directed to albums.gz. If you ran them in this order, in the same directory, this command clobbered what was formerly in albums.gz. (But you probably ran the first command within the albums directory and the second command in the directory above it.)

It also shouldn't have deleted the source files. Did you delete them by hand in a step you didn't mention? Did the photo software installation delete the directory contents?

If you want to be belts and suspenders, shut down the machine and don't do anything more until you've made a bit-by-bit copy of the entire drive -- the original photo files may be recoverable, but this gets less and less likely the longer the machine's in use, potentially overwriting the disk space they're on.

But the existing advice is likely your best bet: back up albums.gz and give

zcat albums.gz | java jpegextractor -D /newemptydirectory

a shot.
posted by Zed_Lopez at 11:42 AM on August 17, 2007


This probably goes without saying, but I'll say it anyway: when making a backup of any kind, where the information is of extreme value to you, always move the backup somewhere and unpack it to make sure it's intact and as you expect it BEFORE deleting the originals.
posted by davejay at 12:00 PM on August 17, 2007
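
One way to do that check without unpacking by hand, as a sketch only: compare a checksum of every original file against the copy stored in the archive. It assumes a tar archive whose member paths start with the source directory's name (what tar czvf albums.tar.gz albums would produce when run from the parent directory); archive_matches_source is a hypothetical helper name.

```python
import hashlib
import tarfile
from pathlib import Path


def archive_matches_source(archive_path, src_dir):
    """Return True if the archive holds exactly the files under src_dir, byte for byte."""
    src = Path(src_dir)
    # Checksums of the originals, keyed by the path tar would store.
    expected = {
        f"{src.name}/{p.relative_to(src).as_posix()}":
            hashlib.sha256(p.read_bytes()).hexdigest()
        for p in src.rglob("*")
        if p.is_file()
    }
    # Checksums of the archived copies.
    found = {}
    with tarfile.open(archive_path, "r:*") as tar:
        for member in tar:
            if member.isfile():
                content = tar.extractfile(member).read()
                found[member.name] = hashlib.sha256(content).hexdigest()
    return expected == found
```

Only delete the originals once this returns True, and ideally after the archive has also been copied to a second machine or disk.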

