Where did that file go, exactly, and isn't there a 'backup'?
October 26, 2024 5:48 AM Subscribe
Back in the previous century, and likely into the first decade of this one, text files would 'automatically' back-up to a sort of limbo (purgatory) on your hard drive. And if you screwed up and accidentally erased a file, you could go digging around in the uncharted waters of your hard drive and likely find the copy from just before you stupidly deleted the file.
My question is two-fold, really:
1. Is that not how things are done anymore? Yet I'd swear that my "OpenOffice" automatically saves files I'm working on... but why can't I find them?
2. I've tried a couple of recovery programs with no luck. Plying them, I have the impression that how things were done, once upon a time, is no longer the way they are done. How are documents handled these days?
I know, I know, "Time Machine" would have solved that. Sigh.
Best answer: Also, maybe to better answer the core of your question: things are done differently now in the sense that a modern personal computer has a lot more going on at once, and software is much more liberal in writing to the disk. Installing that recovery software writes to the disk, and even just using your web browser generates kilobytes of data. As a result, old data is written over faster and recovery is harder.
posted by jy4m at 6:57 AM on October 26 [1 favorite]
Response by poster: Thanks! Not encouraging (sadly), but informative. I know there are layers to MacOS and I was hoping for the magic password that would open up the level of over-view where I could dig around in the 'past' and find the 'unnamed' file I stupidly erased (along with any memory of erasing it(!)). But if I understand what you are saying, this was given up in favor of this 'new' way of operating.
posted by From Bklyn at 8:47 AM on October 26
Best answer: this was given up in favor of this 'new' way of operating.
Not really? It was never standard for computers to keep a hidden log of every bit of data for the purposes of recovering deleted files. It just so happened by coincidence that the way storage was implemented, if you were quick and lucky you could maybe have a chance to fish the data out of the disk before it got overwritten.
That remains true, but the volume of data that a computer churns through these days means the data gets overwritten and becomes non-recoverable a bit quicker than before.
The way you've phrased your question makes it sound like there was a standard feature that kept a careful record of your data, and then someone decided to take that feature away, and that's not the case.
Think of it like trying to recover a physical item from the physical trash. If you're quick you can just grab it out of the trash can, but if the trash has been picked up and taken away by the trash collectors then your odds of recovering the item are slim. And the city has never promised to keep an archive of your trash on hand for recovery purposes.
We do have programs like Time Machine, which create backups that you can rummage around in, but that was never a standard, always-on part of the system. You had to turn it on for it to work.
posted by june_dodecahedron at 9:03 AM on October 26 [3 favorites]
Best answer: Back in the previous century, and likely into the first decade of this one, text files would 'automatically' back-up to a sort of limbo (purgatory) on your hard drive.
Yeah, no, not really. Writing out a disk copy of the file currently being edited for recovery purposes was always an application-specific feature, not a general pattern, and was never truly reliable. This behaviour was also intended to deal with the case where the application or OS crashed partway through an editing session, not to deal with users explicitly deleting the wrong file.
Periodical backups with versioning became a thing with Time Machine on Apple platforms, Volume Shadow Copies on Windows, and Dropbox for cloud-hosted backups. The ZFS and Btrfs filesystems for Unix and Linux (both introduced in this century, fwiw) can keep multiple versions of overwritten and/or deleted files available in the live filesystem (rather than in a backup of it) if periodical snapshotting is turned on.
Sometimes those periodical backups will catch one of the application-specific work-in-progress recovery files and make a versioned copy of that, but recovery of older versions of now-deleted files has never been a feature of the ordinary on-disk filesystems of any consumer-grade platform I'm aware of.
if you screwed up and accidentally erased a file, you could go digging around in the uncharted waters of your hard drive and likely find the copy from just before you stupidly deleted the file
That still works, and still requires the same kind of specialist recovery software to do that digging around, but because modern systems do indeed tend to have more stuff being sporadically written to disk in the background than earlier systems did, the chance of being able to recover a complete copy that hasn't been at least partially overwritten is lower than it once was.
I'd swear that my "OpenOffice" automatically saves files I'm working on... but why can't I find them?
OpenOffice (and its fork, LibreOffice) do indeed keep an on-disk copy of what you're editing as you edit it, and if you crash the application or the OS before quitting, both will offer to recover those documents the next time you start the application. But if you quit cleanly, OO assumes you know what you're doing and have finished your editing session, and deletes the recovery file.
The most reliable way I know of to get the kind of accident prevention you're asking about is to set up a free Dropbox account, do all your editing inside your local Dropbox folder, and get into the habit of periodically hitting Ctrl-S to overwrite the current file on disk.
Dropbox automatically propagates all changes made inside your local Dropbox folder to the backup it keeps for you on Dropbox's servers. Its web interface exposes access to multiple versions of overwritten or deleted files and folders, and this facility will work regardless of OS or application even if they don't natively implement anything like local filesystem snapshotting or OpenOffice's crash recovery files.
Google Drive and Microsoft OneDrive can do similar things but their local filesystem monitoring performances are both dogshit compared to Dropbox's.
posted by flabdablet at 9:18 AM on October 26 [3 favorites]
Best answer: The feature where every text editor keeps a second backup file forever is something I strongly associate with Linux and open source.
Complex editors like Audacity and OpenOffice make a backup. But, as others suggest, the backup is only kept in case of a full crash and is otherwise deleted during a clean exit.
The key thing is that the backup files were hidden dot files with tildes or .bak. You won't normally see them in Finder. Try hitting command+shift+period in the directory to see if any backup is still in the clutter.
I just tested with LibreOffice and I've got a .~lock.Budget.ods# in the mix, for example.
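If you'd rather check from a script than from Finder, here is a minimal sketch that lists hidden, tilde-suffixed and .bak files in one folder. The function name `find_leftover_files` is made up for illustration, and a hit is only a candidate: a hidden file is not necessarily a copy of your document.

```python
from pathlib import Path


def find_leftover_files(folder):
    """Return the names of hidden, tilde-suffixed or .bak files in one folder."""
    hits = []
    for entry in Path(folder).iterdir():
        name = entry.name
        # Dot files are hidden by default; "~" and ".bak" are common backup suffixes.
        if name.startswith(".") or name.endswith("~") or name.endswith(".bak"):
            hits.append(name)
    return sorted(hits)
```

Call it with the folder the document lived in, for example `find_leftover_files("/path/to/your/documents")`, where the path is a placeholder.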
posted by Snijglau at 10:29 AM on October 26 [1 favorite]
Have you ever set up Time Machine? If so, your Mac (likely) keeps local snapshots between the times you plug in a backup drive. You may be able to access those snapshots to find your file. Here’s more info.
This does require the file to have been saved at some point; it doesn’t apply if you lost your work prior to saving.
posted by bluloo at 11:02 AM on October 26 [1 favorite]
Incidentally, the way to use Dropbox versioning to recover a file that's been entirely deleted (as opposed to overwritten) is to look through older versions of the now-deleted file's containing folder until you find one from which the file you actually want had not yet been deleted.
This isn't immediately obvious but it makes sense once you realise that folders are themselves just files, the contents of which happen to be lists of references to other files, and that deleting a file involves altering the contents of its containing folder so as to remove the reference to the deleted file from that list.
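A toy model of that idea, as a sketch only: every name below is made up, and this is not how Dropbox actually stores anything. The folder is just a mapping from names to references, deleting a file edits only that mapping, and an older version of the folder still points at the content.

```python
# Toy storage: reference number -> file content.
storage = {1: "budget figures", 2: "holiday notes"}

# A folder is itself just data: a list of names and the references they point to.
folder = {"Budget.ods": 1, "Notes.txt": 2}

# An older version of the folder, as a versioning service might keep it.
older_folder_version = dict(folder)

# Deleting a file only alters the folder: the reference is removed from the list.
del folder["Notes.txt"]

# Recovery: look the name up in the older folder version instead.
recovered = storage[older_folder_version["Notes.txt"]]
```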
posted by flabdablet at 1:51 PM on October 26 [1 favorite]
Response by poster: This has been simultaneously informative, interesting and, sadly, sad - in that I guess that file really but really is gone.
And, I guess a notice that I should run 'Time Machine' quarterly or maybe monthly. (I get the sense of using DropBox (my partner/spouse loooooves it) but I can't get around their tediously persistent recommendations that I buy more space for the low, low price of whatever and the (to me) constant up-dating (which seems to suck all bandwidth and only occurs when least convenient). It just - I want less complexity.)
posted by From Bklyn at 1:03 AM on October 27
I just tested with LibreOffice and I've got a .~lock.Budget.ods# in the mix, for example.
To clarify, that's just an administrative file indicating that "Budget.ods" is being worked on, containing the date/time of when it was opened and on which system that was. It's nothing like an actual work-in-progress copy of the document.
On my system a couple of those are currently around; all of them are less than 100 bytes.
posted by Stoneshop at 1:44 AM on October 27 [1 favorite]
I can't get around their tediously persistent recommendations that I buy more space for the low, low price of whatever
I only see those when I log into Dropbox on the Web, not when I'm using the Dropbox client on my own computer. And because the only time I ever do log into Dropbox on the Web is when I've fucked something up and need to recover from that, I don't see them often enough to be a nuisance.
Generally when sites insist on bothering me with upsell nags, I just use uBlock Origin to hide those. Usually works pretty well, though I don't recall Dropbox having annoyed me enough to set one up for that. Or maybe I did, which is why I don't get annoyed now? Hard to tell.
and the (to me) constant up-dating (which seems to suck all bandwidth and only occurs when least convenient).
Are you talking about the Dropbox client downloading updates for itself, or about the online sync activity it performs to do its job? If it's the latter, the client has preferences settings that let you limit how much bandwidth it uses for upload or download or both.
Also, if the only thing you're using Dropbox for is to give you automatic versioned backups for documents whose sizes are typical of those you'd use OpenOffice to edit, as opposed to keeping the Dropbox folders on multiple computers in sync, you can just link each computer to its own free Dropbox account. This should keep sync traffic to a minimum, as long as you're not doing what OneDrive does and cloud-syncing every folder you have by default.
If you've got a NAS, most of those can be persuaded to run a Nextcloud server. Connect that to Nextcloud clients on your computers and you get Dropbox-like versioned automatic backups that run entirely inside your LAN so they don't use any Internet bandwidth, with no usage restrictions other than available NAS storage space and without upsell nags.
posted by flabdablet at 4:27 AM on October 27
text files would 'automatically' back-up to a sort of limbo (purgatory) on your hard drive. And if you screwed up and accidentally erased a file, you could go digging around in the uncharted waters of your hard drive and likely find the copy from just before you stupidly deleted the file.
I think I've finally understood what you're actually asking here, which isn't about periodic auto-saving of drafts, it's about what happens when you explicitly delete a file and then instantly regret doing that.
It was the word "back-up" that threw me, because the files that you were talking about that used to be recoverable after deletion (and still are, with the right tools, if you're lucky) are not and never have been any kind of backup. What they are is reconstructions of the original files, based on whatever can be found lying around in disk blocks that the filesystem now considers to be available for re-use.
All that typically gets scrubbed when you delete a disk file is the metadata that describes exactly where the file's contents got placed on the disk when it was originally saved. If you're not doing the deletion with a "secure erase" app that explicitly overwrites deleted files before asking the OS to delete them, the data from deleted files typically doesn't get explicitly destroyed; it just sticks around on the disk, wherever it happened to land, until some other operation happens to re-use those same disk blocks for something else.
And in fact even that degree of deletion doesn't happen immediately in most modern operating systems, which generally have a facility like a Trash or a Recycle Bin. Typically, all that asking for a file to be deleted does is move it, complete with all metadata, into some subfolder of Trash.
If you use Undo (for which the hotkey is ⌘-Z on Mac, Ctrl-Z on everything else) right after deleting a file that way, the OS can un-delete it by moving it out of Trash and back into whatever folder it originally came from. Or you can use the OS file browser to go spelunking through Trash by hand, select a deleted file from there, and choose something like Restore.
Again, this is not any kind of backup. No copy of your original file is ever made and nor were such copies made 25 years ago. All this stuff works now as it did then, just by moving the original file between folders. And in fact, under the hood, the actual file content never actually moves; it stays in whichever disk blocks it was originally written out to, and assorted metadata in other places gets updated to keep track of which blocks those are.
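Here is a rough sketch of that move-and-move-back behaviour, assuming an ordinary folder standing in for Trash. The helper names `move_to_trash` and `restore_from_trash` are hypothetical, and a real OS also has to handle name collisions and remember the original location itself.

```python
import shutil
from pathlib import Path


def move_to_trash(path, trash_dir):
    """'Delete' a file by moving it into a trash folder.

    Returns the trashed path and the folder the file originally lived in.
    """
    path = Path(path)
    trash_dir = Path(trash_dir)
    trash_dir.mkdir(parents=True, exist_ok=True)
    trashed = trash_dir / path.name
    shutil.move(str(path), str(trashed))  # no copy is kept anywhere else
    return trashed, path.parent


def restore_from_trash(trashed, original_folder):
    """Undo the 'delete' by moving the file back where it came from."""
    restored = Path(original_folder) / Path(trashed).name
    shutil.move(str(trashed), str(restored))
    return restored
```

At no point do two copies of the file exist, which is why none of this counts as a backup.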
It's only when you empty the Trash that all the files that had been moved into it actually get deleted, meaning that the disk blocks that their actual content occupies get returned to the free space list for later re-use. So if you have that "oh shit" moment very soon after emptying the Trash, that's the point at which you shut the machine down, pull its drive, stick it in a USB enclosure, hook it up to a different machine that's running PhotoRec and see what it can piece back together for you.
Because, again, you're not dealing with any kind of backup here, you're simply taking advantage of the fact that freed-up disk blocks will typically not get overwritten immediately and typically will retain their original contents in ways that might permit coherent reassembly as recognizable files. The longer you leave a machine running between "oh shit" and shutdown, the lower becomes the chance of a successful deleted-file recovery.
Which is as it ever was. What makes successful post-deletion recovery less common in 2024 than in 1999 is not so much a consequence of any technological policy change as of the increased social tendency to treat consumer-grade IT as reliable, inscrutable appliances powered by dark majicks beyond the ken of mere mortals.
We've allowed relentless marketing to lull us into an expectation that our pricey yet somehow indispensable little fondleslabs will simply not destroy our data, and to give up or outsource any responsibility for keeping our stuff backed up. So whenever a machine does lose something we stored on it, the usual responses are ¯\_(ツ)_/¯ and 😭.
posted by flabdablet at 6:22 AM on October 27 [3 favorites]
The feature where every text editor keeps a second backup file forever is something I strongly associate with Linux and open source.
That's an impression I don't share, by the way.
In LibreOffice and OpenOffice it's an option that you have to enable; it's off by default. Also, the temporary work files that they create while the file is being edited, and remove when the file is saved and closed, can in no way be called a backup. Same with vi/vim. Emacs does keep backup copies, with some caveats.
Programs that I use or come across that do actually make backup copies for the files being worked on are often CAD and graphic editors.
posted by Stoneshop at 6:33 AM on October 27 [1 favorite]
Best answer: Old text editors would move the original file you're changing to one prefixed with '.' and suffixed with a tilde '~' and write a new one with the original filename -- not all recent editors keep this behaviour.
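A minimal sketch of that save routine, following the naming described above (a leading '.' and a trailing '~'). Real editors differ in the exact naming and in when they make the copy, and `backup_name` and `save_with_backup` are made-up names.

```python
import os


def backup_name(path):
    """Build the hidden backup name: '.' + original name + '~', in the same folder."""
    folder, name = os.path.split(path)
    return os.path.join(folder, "." + name + "~")


def save_with_backup(path, new_text):
    """Move the existing file to its backup name, then write the new content."""
    if os.path.exists(path):
        # os.replace silently overwrites any older backup of the same name.
        os.replace(path, backup_name(path))
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(new_text)
```

Only one previous version survives this way: each save replaces the earlier backup.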
posted by k3ninho at 6:38 AM on October 27 [1 favorite]
Best answer: When thinking about these things, it helps to have some kind of mental model of what's going on under the hood.
Computer filesystems are built on top of underlying storage drives that present to the operating system as very large but fixed-sized arrays of numbered fixed-sized storage blocks. The traditional size for these blocks is 512 bytes each, but for the last decade or so there's been a general move toward 4096-byte blocks as drive capacities and storage densities have increased, so I'll assume 4096 bytes as the size of a block in the explanation that follows.
A byte is eight bits and a bit can have one of exactly two values, so you can think of a 4096-byte block as akin to a page of graph paper with a 128x256 grid of tiny little squares ruled up on it, each of which can be either blank (0) or coloured in (1). A typical two terabyte (two trillion bytes) drive, then, would be like a notebook with 488,281,250 of those pages, numbered from 0 through 488,281,249. Industry jargon for the page numbers is Logical Block Addresses or LBAs.
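The arithmetic behind the notebook analogy can be checked in a few lines. The constant names are only for illustration, and "two terabytes" is taken as exactly two trillion bytes, as above.

```python
BLOCK_BYTES = 4096          # assumed block size
BITS_PER_BYTE = 8
DRIVE_BYTES = 2 * 10**12    # two trillion bytes

bits_per_block = BLOCK_BYTES * BITS_PER_BYTE   # tiny squares per page
grid_squares = 128 * 256                       # the graph-paper grid
block_count = DRIVE_BYTES // BLOCK_BYTES       # pages in the notebook
last_lba = block_count - 1                     # pages are numbered from 0

print(bits_per_block == grid_squares)  # True: both are 32768
print(block_count, last_lba)           # 488281250 488281249
```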
Drives don't give you access to individual squares for colouring in or rubbing out: they work a whole page at a time. One block is the smallest quantity of data that a drive can transfer in a single operation. If you want to alter less than all of the data in a block, you need to read that whole block into memory, make whatever changes you want to the memory copy, then write the whole block out to the drive again at the same LBA you read it from.
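That read-modify-write cycle can be sketched with a pretend drive held in memory. `ToyDrive` and `patch_bytes` are hypothetical names for illustration, not any real driver interface.

```python
BLOCK_SIZE = 4096


class ToyDrive:
    """A pretend drive: a fixed number of fixed-size blocks, all zeroes at first."""

    def __init__(self, block_count):
        self.block_count = block_count
        self._data = bytearray(block_count * BLOCK_SIZE)

    def read_block(self, lba):
        start = lba * BLOCK_SIZE
        return bytes(self._data[start:start + BLOCK_SIZE])

    def write_block(self, lba, block):
        if len(block) != BLOCK_SIZE:
            raise ValueError("a drive transfers whole blocks only")
        start = lba * BLOCK_SIZE
        self._data[start:start + BLOCK_SIZE] = block


def patch_bytes(drive, lba, offset, new_bytes):
    """Change a few bytes inside one block: read it, edit in memory, write it back."""
    block = bytearray(drive.read_block(lba))         # 1. read the whole block
    block[offset:offset + len(new_bytes)] = new_bytes  # 2. alter the memory copy
    drive.write_block(lba, bytes(block))             # 3. write it back to the same LBA
```

A patch that would run past the end of the block makes the memory copy longer than one block, so `write_block` rejects it.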
Drives also don't let you rip pages out, or stick new ones in, or move them about: the number of pages is fixed, as are their page numbers. As it comes from the factory, every tiny square on every notebook page is blank: all bits in blocks at all LBAs are zeroes. But the drive lets you replace the existing colourings-in pattern on any page with a pattern of your own, and until you replace that same page again, it will remain there ready for retrieval exactly as it was last written out.
The key thing to understand here is that although the amount of information you can write to a drive is essentially arbitrary, the amount of information that's actually stored on it in any given moment is always the same: it's the drive's entire capacity (2TB for our example drive).
What varies as you use a drive is the specific patterns stored within each block and which blocks hold stuff you care about. If I say that I've got 1.5TB of movies stored on my 2TB drive, what I actually mean is that of the 2TB of information that that drive cannot help but store, 1.5TB of that is stuff I actually care about (movies) and 0.5TB is stuff I don't (possibly mostly still zeroes written out during manufacture).
Drives have no notion of files, filenames, folders, document formats, icons or any of that stuff. All they can do is read or write a range of blocks starting from some particular LBA. In order to implement all those nice user-friendly things, there needs to exist some convention for interpreting the patterns of colourings-in found on specific pages of that great big notebook. Some of those patterns need to be interpreted as names of files, their sizes, modification dates, access permissions and so forth, some need to be interpreted as actual file content, and - critically - some need to be interpreted as lists of LBAs identifying the disk blocks within which all that other stuff is kept. There also needs to exist some kind of list that identifies free blocks, those containing only data that the user doesn't care about and wouldn't be upset to see overwritten with something else.
Many such conventions have been designed, and they go by the general name of filesystems. Specific filesystems you might have heard of include FAT, FAT32, exFAT, NTFS, Ext2, Ext3, Ext4, MFS, HFS, HFS+, APFS and so forth. There isn't a "standard" one; FAT32 is probably the most widely supported but it has severe design limitations that make it unsuitable for a lot of modern use cases.
When you "format" a drive with a given filesystem, what you're actually doing is starting from the assumption that none of its blocks hold information that anybody cares about, and writing out a data pattern starting from some well-known LBA, consisting of that filesystem's representation of an empty folder with no files inside plus its list of free LBAs. When you delete a file, you're destroying the list of LBAs that identifies the blocks to which that specific file's contents were written, and returning all those LBAs to the filesystem's global free-blocks list.
What filesystem drivers typically don't do when deleting a file is make any changes to the blocks that contain its actual content. There is typically no need to; simply accounting for those blocks as Nobody Cares What's In These Now is good enough. So the definition of the file as a file is actually gone, even though it's quite fair to describe its content as existing in limbo or purgatory. But the blocks in which that content remains stored are the opposite of a backup, given how likely they are to get overwritten at any time.
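Here is a deliberately tiny sketch of that bookkeeping. `ToyFS` is hypothetical and far simpler than any real filesystem (the directory and free list live in Python objects rather than in blocks on the drive), but it shows formatting, deleting, and the later overwrite.

```python
BLOCK_BYTES = 4096

class ToyFS:
    def __init__(self, block_count):
        # "Formatting": an empty directory plus a free list covering every block.
        self.blocks = [bytes(BLOCK_BYTES) for _ in range(block_count)]
        self.directory = {}                    # filename -> (list of LBAs, size)
        self.free = list(range(block_count))

    def write_file(self, name, content):
        lbas = []
        for i in range(0, len(content), BLOCK_BYTES):
            lba = self.free.pop(0)             # take the lowest-numbered free block
            chunk = content[i:i + BLOCK_BYTES]
            self.blocks[lba] = chunk.ljust(BLOCK_BYTES, b"\0")
            lbas.append(lba)
        self.directory[name] = (lbas, len(content))

    def delete_file(self, name):
        # Forget the file's block list and hand its blocks back to the free list.
        lbas, _size = self.directory.pop(name)
        self.free = sorted(self.free + lbas)
        # Note what is missing here: nothing touches self.blocks.

fs = ToyFS(block_count=16)
fs.write_file("novel.txt", b"It was a dark and stormy night")
fs.delete_file("novel.txt")
print("novel.txt" in fs.directory)   # False: the file, as a file, is gone
print(fs.blocks[0][:30])             # b'It was a dark and stormy night'

fs.write_file("cat.jpg", b"\xff\xd8\xff" + b"x" * 100)
print(fs.blocks[0][:3])              # b'\xff\xd8\xff': the old content is overwritten
```

The last two lines are the whole problem in miniature: the very next file written landed on the block the deleted one used to occupy.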
What data recovery tools like PhotoRec or Klennet Recovery do is walk through an entire drive in sequential LBA order, paying no mind to filesystem metadata structures found along the way (or at most treating those as hints), instead attempting to recognize left-over cut-loose file content blocks based on the characteristic patterns found inside them. This is helped by the general preference that most filesystems have for laying out the data blocks for any given file in sequential LBA order for performance reasons. So if the data recovery tool recognizes e.g. the header block of a photo file in JPEG format, it will often find the rest of that file in the blocks whose LBAs immediately follow that of the header, or can at least glean a scant handful of multiple-block sequential runs to stitch back together.
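The core idea can be sketched in a few lines. This is not how PhotoRec or any real tool is implemented; it assumes the file starts on a block boundary, is stored contiguously, and ends at the first end-of-image marker after the header (real JPEGs can contain embedded thumbnails with markers of their own). `carve_jpegs` is a made-up name.

```python
BLOCK_BYTES = 4096
JPEG_START = b"\xff\xd8\xff"   # bytes that every JPEG file begins with
JPEG_END = b"\xff\xd9"         # JPEG end-of-image marker

def carve_jpegs(image):
    """Return candidate JPEG byte strings found in a raw disk image."""
    found = []
    # Walk the image one block at a time, in LBA order, ignoring any metadata.
    for start in range(0, len(image), BLOCK_BYTES):
        if image[start:start + 3] != JPEG_START:
            continue
        end = image.find(JPEG_END, start + 3)
        if end != -1:
            found.append(image[start:end + 2])
    return found

# Build a four-block fake "drive" with a pretend JPEG left behind in block 2.
fake_jpeg = JPEG_START + b"pretend pixel data" + JPEG_END
scratch = bytearray(BLOCK_BYTES * 4)
scratch[BLOCK_BYTES * 2:BLOCK_BYTES * 2 + len(fake_jpeg)] = fake_jpeg
image = bytes(scratch)

print(carve_jpegs(image) == [fake_jpeg])   # True
```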
Data recovery tools need to understand the internal formats of a heap of different file types, and a fair bit about how a heap of different filesystems prefer to organize their on-disk layouts, and frankly I've always been amazed that those tools work at all let alone as reliably as they do. They're definitely a last-ditch measure and absolutely not a substitute for backups.
posted by flabdablet at 10:35 AM on October 27 [2 favorites]
So, 'tis better to invest in a solution that provides the service you want than trust to an unsupported side effect.
A My Passport drive and Western Digital software used to do exactly what you need - keep an archive of every version ever saved. However, WD dropped the software and now uses the pricey Acronis software. Ugh.
Looking around, I see there are some not-so-pricey backup programs that will scan a folder for changes every 10 or 15 minutes and save them without overwriting anything. I haven't used one, though, so I can't make a suggestion.
posted by SemiSalt at 11:22 AM on October 27 [1 favorite]
According to the settings on my Mac, Time Machine "keeps local snapshots and hourly backups for the past 24 hours, daily backups for the past month and weekly backups for all previous months. The oldest backups and any local snapshots are deleted as space is needed."
That is far more than just quarterly or monthly and it happens automatically in the background as long as the backup drive is connected. This makes it really good for those "oops" moments since you can get back to the most recent version of just one hour ago if needed.
Highly recommend.
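To make that schedule concrete, here is a rough sketch of the thinning rule as quoted. It is only my reading of the settings text, not Apple's actual logic; `thin` is a made-up name, "a month" is assumed to mean 30 days, and "weekly" is assumed to mean one per ISO calendar week.

```python
from datetime import datetime, timedelta

def thin(snapshots, now):
    """Pick which snapshot times to keep under the quoted schedule."""
    keep = []
    seen_days = set()
    seen_weeks = set()
    for t in sorted(snapshots, reverse=True):    # newest first
        age = now - t
        if age <= timedelta(hours=24):
            keep.append(t)                       # keep every hourly backup
        elif age <= timedelta(days=30):
            if t.date() not in seen_days:        # newest backup of each day
                seen_days.add(t.date())
                keep.append(t)
        else:
            week = t.isocalendar()[:2]           # (ISO year, ISO week number)
            if week not in seen_weeks:           # newest backup of each week
                seen_weeks.add(week)
                keep.append(t)
    return sorted(keep)

now = datetime(2024, 10, 28, 12, 0)
hourly = [now - timedelta(hours=h) for h in range(72)]   # three days of hourlies
kept = thin(hourly, now)
print(len(hourly), "->", len(kept))
```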
posted by metahawk at 12:18 PM on October 28
"you can get back to the most recent version of just one hour ago if needed"
Subtle correction: you can retrieve the version you last saved at most an hour ago. If you're sitting in a long edit session using an application that doesn't automatically save the current draft back to a recovery file, and you're not in the habit of hitting Ctrl-S every so often to save it back explicitly, then there will be nothing new on the local disk for Time Machine to take its periodic snapshot of.
The reason I prefer Dropbox and similar networked sync tools to Time Machine or Volume Shadow Copy snapshots for Oh Shit recovery is that their backup operations are triggered not periodically, but instantly on detecting any change to the local filesystem. You create a recoverable version in your online Dropbox every time you hit Ctrl-S when working with whatever editor, and this works regardless of application software or local operating system because Dropbox is cross-platform.
This is a distinction without a difference when the Oh Shit moment is the kind caused by explicitly deleting the wrong file while not editing it, but it creates incentives that stand in stark contrast to those that exist while editing stuff with no form of ongoing background versioning in place.
Without background versioning, the natural tendency is to avoid hitting Ctrl-S until you're relatively happy with the current round of edits, lest by doing so you overwrite and lose an earlier version that was actually better. And sure, you can work around that by periodically using Save As instead of Save and doing the versioning by hand inside the filename, or even by manually cloning and renaming the most recent copy before you even start your editing session, but not many people actually bother doing either of those things because they break editing flow.
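For what it's worth, the clone-and-rename workaround is easy to script. A minimal sketch, assuming a timestamp in the filename is an acceptable version label; `snapshot` is a hypothetical helper, not part of any editor or backup tool.

```python
import shutil
import tempfile
import time
from pathlib import Path

def snapshot(path, stamp=None):
    """Copy a file alongside itself with a version label in the name."""
    src = Path(path)
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    dest = src.with_name(f"{src.stem}.{stamp}{src.suffix}")
    if dest.exists():
        raise FileExistsError(dest)     # never clobber an earlier version
    shutil.copy2(src, dest)             # copy2 also preserves timestamps
    return dest

# Demo in a throwaway folder so nothing real gets touched.
with tempfile.TemporaryDirectory() as tmp:
    draft = Path(tmp) / "draft.txt"
    draft.write_text("first attempt")
    kept = snapshot(draft, stamp="before-edit")
    draft.write_text("second attempt")   # the "Ctrl-S" that overwrites
    print(kept.name)                     # draft.before-edit.txt
    print(kept.read_text())              # first attempt
```

It works, but you still have to remember to run it, which is exactly the flow-breaking problem.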
With background versioning in place, you can happily hit Ctrl-S every time your typing fingers hit a natural pause and be confident that you won't lose a thing. And if the versioning is being done by a change-triggered file sync tool rather than a time-triggered snapshotter, then between that and the local editor's Undo it becomes quite hard to Oh Shit yourself by more than a minute or two.
posted by flabdablet at 11:44 PM on October 30 [1 favorite]
So the gist of it is that if you saved the file at one point and you only deleted it very recently, a file recovery program could find it. But if you didn't actually save it, or it's been a while since it was deleted, the odds aren't good.
posted by jy4m at 6:48 AM on October 26 [1 favorite]