Resources on how to break down inherited code?
September 15, 2019 9:57 PM

In a new job, I have inherited a large number of scripts (coded in R) which produce Excel files for various purposes. Understanding what any bit of it does is fine - it's well commented and usually isn't doing anything esoteric - but I'm finding it hard to get a "big picture" of what hundreds of lines of code are doing. Can you recommend any guides on how to break down and document something without getting overwhelmed?
posted by solarion to Work & Money (5 answers total) 8 users marked this as a favorite
 
Working Effectively with Legacy Code by Michael Feathers
posted by matildaben at 10:04 PM on September 15 [4 favorites]


In my experience legacy code will break down on its own plenty!

No but seriously, when I had to do this I just kept a text file where I'd write overview-type descriptions of high-level behaviors. Stuff like: "If feature X is enabled (off by default. Research question: does anyone at all use this?), module X uses Y to do Z"
posted by aubilenon at 10:56 PM on September 15


Pick a few scripts and build a test harness around them, TDD-style. You have some known inputs and known (correct?) outputs, so you should be able to fire off a test-everything run and get back OK.
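A minimal sketch of that known-input, known-output check in base R, assuming the script's core logic can be called as a function. build_report, the sample data, and the temp-file path are all made up for illustration; a real setup would more likely compare against a saved copy of the actual Excel output or use a testing package.

```r
# Hypothetical stand-in for one inherited script's core logic:
# takes a data frame and returns the table that would go to Excel.
build_report <- function(sales) {
  totals <- aggregate(amount ~ region, data = sales, FUN = sum)
  totals[order(totals$region), ]
}

# A small known input (invented data, for illustration only).
known_input <- data.frame(
  region = c("north", "south", "north"),
  amount = c(10, 5, 2.5)
)

# Save the current output once as the reference ("golden") copy.
golden_path <- tempfile(fileext = ".rds")
saveRDS(build_report(known_input), golden_path)

# The test-everything check: rerun and compare against the saved copy.
check_report <- function() {
  isTRUE(all.equal(build_report(known_input), readRDS(golden_path)))
}

cat(if (check_report()) "OK\n" else "FAILED\n")
```

The point is only that the reference output gets saved before any refactoring starts, so every later change can be checked against it.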

Then start refactoring the scripts. If you notice the same set of lines in multiple scripts, or find some chunk of lines that does one particular thing, wrap it up into a function and replace the original lines with a function call. Test again to ensure OK.
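Roughly what that looks like in R, as a sketch. The cleanup lines and the name clean_table are invented for illustration, not taken from anyone's actual scripts.

```r
# Before: the same cleanup lines pasted into several scripts (hypothetical).
#   names(df) <- tolower(names(df))
#   names(df) <- gsub("[^a-z0-9]+", "_", names(df))
#   df <- df[!duplicated(df), ]

# After: one named function that every script calls instead.
clean_table <- function(df) {
  names(df) <- tolower(names(df))
  names(df) <- gsub("[^a-z0-9]+", "_", names(df))
  df[!duplicated(df), , drop = FALSE]
}

# Invented sample data with messy column names and one duplicate row.
raw <- data.frame(
  `Order ID` = c(1, 1, 2),
  `Unit Price` = c(9.5, 9.5, 3),
  check.names = FALSE
)
cleaned <- clean_table(raw)
print(cleaned)
```

Each script then swaps its pasted lines for a single clean_table(df) call, and the function gets its own small test.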

It's helpful for testing to turn scripts into Modules (or Modulinos). Most modern scripting languages let a file be both a module (loadable by other code) and a standalone script (when run directly). This makes testing easier: in addition to testing each script as a whole, you can now test the individual functions of the modules you're making. Keep testing to ensure OK.
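For R specifically, one common way to get the script-or-module behavior is to check sys.nframe() at the bottom of the file. A sketch, with the file name report_totals.R and both function names made up for illustration:

```r
#!/usr/bin/env Rscript
# report_totals.R (hypothetical file name)

# Functions only; nothing here runs on load, so tests can source() this file.
total_amount <- function(amounts) {
  sum(amounts, na.rm = TRUE)
}

# Entry point for command-line use; args default to the Rscript arguments.
main <- function(args = commandArgs(trailingOnly = TRUE)) {
  total <- total_amount(as.numeric(args))
  cat("Total:", total, "\n")
  invisible(total)
}

# sys.nframe() is 0 only at top level, e.g. when run via Rscript.
# Under source() the file is evaluated inside a call, so main() is skipped.
if (sys.nframe() == 0L) {
  main()
}
```

Running Rscript report_totals.R 1 2 3 executes main(), while source("report_totals.R") from a test file just defines the functions so they can be called one at a time.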

By the time you've refactored a few scripts, you may have made some generic modules that almost all of the scripts share (like argument parsing, reading input files, writing output files). Or some modules that hold certain functions that are used in many scripts.

You'll end up with a bunch of modules like:

MyCompany::Util
MyCompany::HardMath
MyCompany::EasyMath
MyCompany::Script::Script1
MyCompany::Script::Script2

And a bunch of plain-old-wrapper-scripts like:

#!/usr/bin/env Rscript
# Load the module that defines run() (hypothetical path), then call it.
source("MyCompany/Script/Script2.R")
run()

But all of those modules will be pretty simple collections and chains of functions that you've of course documented along the way.

Refactoring in this way makes it easier to use other tools (maybe your editor) that can generate call-graphs. Or just a tag-list of functions that you can jump around between.

Well, at least that's what I do when inheriting a bunch of scripts that sorta all fall into the same domain of operation. Your scripts may already be about as minimal as possible. But even a good 100-line script can benefit from chunking up bits into functions, so you have one section of 10 function calls and 10 functions of 10 lines apiece that each do a well-named thing.
posted by zengargoyle at 3:11 AM on September 16 [6 favorites]


Seconding 'take on some refactoring'. If the structure is amenable, I find sequence diagrams invaluable.
posted by j_curiouser at 4:03 AM on September 16 [3 favorites]


Since your current situation is R-specific, you might get some value playing with the packages described in this paper (note: I haven't used them myself).
posted by shelbaroo at 6:31 PM on September 18

