How do I version-track source code with accurate file timestamps?
December 20, 2020 11:08 PM

I'd like to store (in a modern version control system) a variety of mostly text-based files that were last modified decades ago in an MS-DOS environment, preserving each file's original modification time until (and unless) it is updated by a commit, and tracking it normally from then on.

So far, their host filesystems have managed to preserve file modification times and I would like these to continue to transfer over (and be updated with each commit from now on) to anyone who checks out a working copy of the files using whatever version control system I should be using.

But it appears from Googling that the two I'm familiar with don't support this: git doesn't, and svn does, but only once you actually have commits -- it won't preserve the original file timestamps when "importing" a folder like this...

Can anyone point me in the right direction? Thanks~!
posted by fvox13 to Computers & Internet (10 answers total)
 
Yeah, I think version control systems made for source code generally don't do this because it would break build systems.

Most build systems try hard to only rebuild components that have changed since the last time they were built. This is generally done by checking if the output file has an older timestamp than the input file.
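To make that check concrete, here is a minimal sketch of a make-style staleness test. It is not how any particular build tool is implemented, and `needs_rebuild` is a hypothetical name used only for illustration.

```python
import os

def needs_rebuild(source, output):
    """Make-style staleness test: rebuild if the output is missing
    or has an older modification time than the source."""
    if not os.path.exists(output):
        return True
    return os.path.getmtime(output) < os.path.getmtime(source)
```

If a checkout stamped an edited source file with a decades-old mtime, an output built last week would look newer than its source, and a check like this would wrongly skip the rebuild.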

All "general purpose" version control systems that I know of are really designed for software development. You can use them for other stuff, but they're really made for software dev. It's possible something has an esoteric option to do this, but I think it'll basically never be the default, and rarely even be supported, since, for most of the use cases the developers encounter, it'll just cause a pile of problems.
posted by aubilenon at 11:48 PM on December 20, 2020


Oh, most version control systems do allow you to run scripts (hooks) on checkin and checkout, so it may be possible to write scripts that gather timestamps into a file before committing and include that file in the commit, and then, after a checkout or update, set the timestamps back to the stored values.
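A rough sketch of what such a pair of scripts could look like, assuming a JSON manifest stored alongside the files. The manifest format and the names `save_mtimes` and `restore_mtimes` are hypothetical. In Git, for instance, the first could be called from a pre-commit hook and the second from a post-checkout hook.

```python
import json
import os

def save_mtimes(root, manifest):
    """Write {relative path: mtime in seconds} for every file under root."""
    times = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip the repository's own bookkeeping directory.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.abspath(path) == os.path.abspath(manifest):
                continue  # do not record the manifest itself
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            times[rel] = os.stat(path).st_mtime
    with open(manifest, "w") as fh:
        json.dump(times, fh, indent=1, sort_keys=True)

def restore_mtimes(root, manifest):
    """Set each listed file's mtime (and atime) back to the stored value."""
    with open(manifest) as fh:
        times = json.load(fh)
    for rel, mtime in times.items():
        path = os.path.join(root, *rel.split("/"))
        if os.path.exists(path):
            os.utime(path, (mtime, mtime))
```

The manifest then has to be committed along with the files it describes, so anyone checking out a working copy gets both.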
posted by aubilenon at 11:52 PM on December 20, 2020


What operating system do you want to use the files on? File timestamps are kind of a nightmare across the board, but support varies widely between operating systems (and, in the case of Linux, between different filesystems).
posted by wesleyac at 12:55 AM on December 21, 2020 [1 favorite]


When I want to do things that well-respected off-the-shelf packages designed to work in that domain don't do, my first reaction is to check whether I'm doing something the hard way and have a good long think about why I want to do whatever it is.

Assuming that I could in fact find sound reasons to deal with this much admin smell in this instance, I'd work around the system's lack of explicit file metadata preservation support using hooked scripts that maintain an extra metadata file as aubilenon suggests.
posted by flabdablet at 3:21 AM on December 21, 2020 [2 favorites]


We ran into this problem with a Git project that's preserving the ITS operating system, and ended up using a separate metadata file with a list of timestamps that get applied when building it - see this issue.
posted by offog at 3:36 AM on December 21, 2020 [5 favorites]


Check in each file with a git author date and commit date corresponding to its modification time, then run a script (which you’ll need to create) after checkout to set the file modification time to the git author date or commit date?
That will get you your historical record, and continue to allow you to list by file modification time even with new commits.
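A minimal sketch of both halves, assuming it is run from inside the repository with `git` on the PATH. The helper names are hypothetical. `GIT_AUTHOR_DATE` and `GIT_COMMITTER_DATE` are environment variables that Git reads when creating a commit, and `%at` is the `git log` format placeholder for the author date as a Unix timestamp.

```python
import os
import subprocess

def date_env(path):
    """Build an environment that dates the next commit to path's mtime."""
    # Git's internal date form: "<unix seconds> <UTC offset>".
    stamp = f"{int(os.stat(path).st_mtime)} +0000"
    env = dict(os.environ)
    env["GIT_AUTHOR_DATE"] = stamp
    env["GIT_COMMITTER_DATE"] = stamp
    return env

def commit_with_mtime(path):
    """Commit one file with author and committer dates equal to its mtime."""
    subprocess.run(["git", "add", "--", path], check=True)
    subprocess.run(
        ["git", "commit", "-m", f"Import {path}", "--", path],
        check=True,
        env=date_env(path),
    )

def restore_mtime(path):
    """After checkout, set path's mtime to the author date of its last commit."""
    out = subprocess.run(
        ["git", "log", "-1", "--format=%at", "--", path],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    if out:  # empty when the file has no history yet
        os.utime(path, (int(out), int(out)))
```

Committing one file per commit is slow for a large import, but it keeps each file's date separate in the history.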
posted by dttocs at 3:41 AM on December 21, 2020 [1 favorite]


I don't think the question is about the version control but the filesystem underneath.

Migrating to a POSIX-compatible filesystem from MS's FAT/FAT32 should preserve modified times. Be careful not to confuse 'accessed time' (atime) with 'modified time' (mtime) and/or 'created time' (ctime) on the filesystem you move to, and check the options given to rsync (or whatever tool you use) when copying into the new filesystem.
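With rsync, modification times are kept only when `-t`/`--times` is given (it is included in `-a`). As a sketch of the same idea in Python, `shutil.copy2` tries to carry the mtime across where plain `shutil.copy` does not. The helper name below is hypothetical, and the 2-second tolerance is there because FAT stores modification times in 2-second steps.

```python
import os
import shutil

def copy_preserving_mtime(src, dst):
    """Copy src to dst with metadata, then report whether the mtime survived."""
    shutil.copy2(src, dst)  # copies data and tries to preserve metadata such as mtime
    drift = abs(os.stat(src).st_mtime - os.stat(dst).st_mtime)
    return drift <= 2.0  # allow for FAT's 2-second timestamp resolution
```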

(Edit, I'd have sworn that git uses the files' mtime and commit message in the address it gives each commit.)
posted by k3ninho at 3:54 AM on December 21, 2020 [2 favorites]


I think (like dttocs) I would commit each file to git with its last modified time as the commit time, and not worry about actual file modification times after that. I feel like I've seen code preservation projects take this approach in the past, but can't remember which ones.

If that sounds ok, you could use this script (via this SO question) to do the import -- something along the lines of perl file-fast-export.pl source/** | git fast-import in a new git repo.

And then if you do need the files back on disk with correct modification times for whatever reason, it looks like someone's figured out a bit of perl to do that.
posted by john hadron collider at 7:28 AM on December 21, 2020 [1 favorite]


Perforce does this with the +m filetype, which you can set for your entire project. Perforce is used a lot in my industry (games) because it has good features for large archival data files. But it has various other issues that may not make it a good fit (expensive for large teams, weird branching).
posted by JZig at 7:55 AM on December 21, 2020 [1 favorite]


"be careful not to confuse 'accessed time' (atime) with 'modified time' (mtime) and/or 'created time' (ctime) for the filesystem you move to"

Careful. Linux 'ctime' is 'change time'; unlike Windows, unix-like filesystems haven't traditionally had any notion of file creation time. (Though some filesystems do support it and the new 'statx' system call does provide a way to access it, as 'btime' (for birth time).)

Linux ctime differs from mtime in that it's updated on changes to metadata (like attributes), not just data. And also it's not supposed to be modified by users; you can set mtime however you'd like, but you can't change ctime through normal operating system interfaces. Instead, ctime is managed only by the filesystem itself, making it a more reliable measure of when a file's changed.
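A small sketch of that difference, assuming Linux (on Windows, Python's `st_ctime` has historically reported creation time instead). It backdates a scratch file's mtime to a 1985 timestamp with `os.utime` and returns both values. The function name is hypothetical.

```python
import os
import tempfile

def backdate_and_stat(old=500000000):
    """Backdate a scratch file's mtime, then return (mtime, ctime)."""
    with tempfile.NamedTemporaryFile(delete=False) as fh:
        path = fh.name
    try:
        os.utime(path, (old, old))  # users may set atime and mtime freely
        st = os.stat(path)
        # mtime now reads back as `old`; ctime was set by the filesystem
        # when the metadata changed, so it stays at roughly "now".
        return st.st_mtime, st.st_ctime
    finally:
        os.remove(path)
```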

Anyway, I'd nth the recommendation of keeping timestamp information in a separate file.
posted by bfields at 11:38 AM on December 21, 2020 [4 favorites]

