

Good habits for the highly effective computer scientist
August 31, 2012 3:08 PM

I'm a physics graduate student and my PhD work is going to involve a fair amount of programming. Aside from the coding part, what habits and skills do I need to develop to become a competent programmer?

I am currently a graduate student working toward my physics PhD. I have played around with coding since I was in middle school, but never anything too serious, and have taken a few classes, but never done anything really intense. However, my research path has taken me toward the computational end of things, and while I am confident in my ability to understand algorithms and translate them into code, there's a lot of quick and dirty/poorly commented/amateurish programming in the academic community, and I don't want to be part of the problem. So what are some good resources that go beyond just the coding side of becoming good at programming? What are the best practices? How do I learn the skills that make me more a professional programmer than an amateur hack?

Some of the key things I know to be important are:

Commenting/Documentation
I do try to write down as many relevant details as I can about how the code works when coding up an algorithm, but it sometimes is all over the place. What are some systems for deciding what to note, and when and where to note it? Are there any places where commenting on a certain aspect of a program is a necessity for another person's understanding that may not be obvious to me when I am writing the code? What sort of records are good to keep outside of comments in the code?

Testing
While I do come up with test problems to make sure my code is running correctly, I feel like I could save much time by having a plan for setting up tests in advance and automating them. What are some common strategies or techniques for doing this? I know bigger companies have software testing packages, but with my resources I'd probably be writing my own scripts for doing automated testing.

Code Organization
While I do know the basics of object oriented programming, and try to keep from having copy-pasted code that does similar things, I'm still at the point where my specific choices for which objects and routines to group together, naming conventions, etc. are made up on the spot and hardly consistent. Are there general guidelines or frameworks out there that I can work off of to develop my own, consistent style?

Version Control
I'm probably only going to be working on software by myself, so this isn't as big of an issue, but it would be good to have a system for keeping track of changes, and to know what to do/expect if I do end up working on a project with more than a couple of people.

If people have good online resources to share, I would really appreciate that. Books are good, too, though as a grad student I don't have a whole lot of extra spending money, so a few choice works would be better than a broad array of options. Personal experience and advice are also great. I know it's a lot easier to make good habits than to break bad ones, so I'd be grateful for help getting started on the right foot.
posted by Zalzidrax to Computers & Internet (21 answers total) 42 users marked this as a favorite
 
Software Carpentry covers the topics you mention from the point of view (common to many starting PhD students) of people who are largely self-taught programmers, who may even know some of the principles of algorithms and computer science, but often don't know modern best practices for testing, modularization, documentation, version control, etc. They have lots of webcasts, but if you prefer to read, the old version of their course is still online.

p.s. even if you're working alone, version control is a big issue.
posted by caek at 3:33 PM on August 31, 2012 [7 favorites]


What language are you using?

For all aspects: I would try to keep it simple and avoid worrying about it — you're going to make mistakes, but I think it's generally easier to make those mistakes and then fix them than it is to predict the mistakes ahead of time. (Perhaps you like reading a lot of theory beforehand and you can apply it effectively — that doesn't really work for me.)

Version control

You need version control, even if just for yourself. Get used to committing frequently, because eventually it will save you.

I like bzr, but the most popular version control system at the moment is git (you may have heard of GitHub?). You will probably want to find a tutorial to get started. That said, don't worry about the fancy features; start simple!

Testing

Some people like testing more than others. Depending on what language you're writing in, there are probably a number of testing frameworks and methodologies available to you (more if you're working in something like Python or Ruby, less if you're working in something like C).

Again, try to keep it simple and low friction at first — you want to spend your time using it, rather than thinking about using it.

Commenting/documentation

There are various theories, but I try to organize things into comprehensible chunks (a function does just one thing) and comment on each chunk if it's not obvious what that chunk does (from its name, say).

I find that revisiting old code can be really helpful in understanding what types of things need comments and documentation.

Code organization

This is really language-dependent and style-dependent. You'll need to figure out what works for you.

Generally, I pack things into the simplest structure I can. So, I don't create a class when a function will do. I don't always create classes for my data structures — often (in Ruby or Javascript) I have anonymous objects that hold my information for internal processing.



I realize I'm saying "try it and figure it out yourself", but I think what makes a programmer good is experience and focus on being a better programmer. So, jump in and do it, and as long as you keep trying to get better, you will.
posted by danielparks at 3:44 PM on August 31, 2012 [4 favorites]


Honestly: if you comment, version, distribute, bug fix and don't use FORTRAN in academia you're already on the side of the angels.
posted by cromagnon at 4:05 PM on August 31, 2012 [8 favorites]


You may want to check out the Top 10 books for coders from Stack Exchange.

In particular you may want to have a look at the Pragmatic Programmer listed in that list.

For testing, check out xUnit where x is j/a/c or whatever for your language.

Also have a look at the Joel Test. You can ignore the last 3 bits but good software should be reasonably close for the other bits.
posted by sien at 4:07 PM on August 31, 2012


You may actually want to look into Haskell as a physics language. You don't get stuck with preamble when you're dealing with collections (such as declaring int i to increment for loops, setting up the branch, etc.). I know that C has slowed me down back when I was studying physics because I would occasionally screw up on a multidimensional array index and it would be difficult to track down. Functional programming languages just make life simpler and it's fairly intuitive to understand coming from a physics background.
posted by DetriusXii at 4:24 PM on August 31, 2012


Read Clean Code and Pro Git, and you're pretty much set on all these questions.

Do use an actual unit test framework--there will be a light & simple one for the language of your preference.

And google up that language's most popular style guide / coding standard (typically a very short document explaining common conventions) and stick to it.

You don't mention whether you have experience with object-oriented design patterns, but I'm assuming your few classes covered them.
posted by Monsieur Caution at 4:26 PM on August 31, 2012 [2 favorites]


I have read a bunch of coding style and architecture and design pattern and testing books, and I worked in the software industry for a couple of years, and, at the beginning of my PhD, I was really, really careful about stuff like this. But now, four years in, besides using source control religiously, I just have uncommented code, two-letter variable names, and a spaghetti mess.

This is out of necessity, because doing everything right is too damn slow.

Academic programming is often experimental programming, that is, endless tweaking. Building up infrastructure only slows you down, because it's all going to change in an hour or a month, anyway.

You need just enough structure so you can pick up where you left off (an hour or six months later) and so you don't code yourself into an unfixable mess as you start building up functions that use functions that use functions. This is definitely an art and reading those books helps, but it's a waaaaaaay different style than industry programming. Think: naming files correctly, using a directory structure that makes sense, knowing what your entry points are, and, if you have things that talk to each other asynchronously, know where to find the entry points to all those things.

Granted the above applies to scripting languages like Matlab and Python. If you do need to use something like C++, which I did for a year, you do need a lot of the industry guidelines, because slow is better than fucked.

You also need to be more careful if you're writing a library that other people will use in the future, or you're working with more than one person. It's also an issue if you really are building some sort of stable tool that you really don't think will change much and then you plan to use that tool as a foundation for something else. Your research might be different than mine. But I have to hack around a lot because the problem is very ill-defined.

Last but not least, I desperately wanted to use objects in the beginning because it does reduce cognitive load so much. But, for me, even object-oriented programming in C++ was computationally too slow. I had to go back to scripty imperative programming in order to get the results I wanted in under 3000 years. Again, YMMV.
posted by zeek321 at 4:49 PM on August 31, 2012 [8 favorites]


Sounds familiar. I learned programming as a physics student, and after I decided a physics PhD wasn't for me, became a programmer full time. I have some regrets on that score.

Anyhow, you're already doing better than most of the physics students and faculty (especially the faculty) that I encountered. I remember with especial fondness one postdoc who had taken my carefully thought out and modular C source files, concatenated everything into one monstrous blob, FORTRAN-style, randomly tweaked things until the compiler errors ceased, and gave it back to me to debug. Good times.

So, lesson one would be to take some time to learn the best practices of whatever language you're using. See the comments above on style guides, etc. On the other hand, don't let your initial lack of experience in this regard paralyze you. Dive in, but be prepared to revise (or "refactor" as the programmers say) as better approaches come to you.

My only other advice is to second the above on version control software. This shouldn't even be a debate. You simply must use some sort of version control, even if you're solo all the time. It will help you keep your source from getting cluttered with commented-out crap, and can give you some additional freedom to experiment. Also, you'll want to collaborate at some point, and familiarity with VC concepts will be helpful. The above suggestions are good (particularly the git suggestion) -- you might also check out mercurial.
posted by lex mercatoria at 4:59 PM on August 31, 2012 [1 favorite]


If you're going to be doing more calculation than simulation, you need to understand the weirdities of floating point arithmetic and how errors can creep in despite perfect programming logic. Read some books on scientific programming.
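
A minimal Python sketch of the kind of surprise meant here (the numbers are purely illustrative, not taken from any particular physics code): decimal fractions are not stored exactly, naive accumulation drifts, and subtracting nearly equal numbers throws away digits.

```python
import math

# 0.1 and 0.2 have no exact binary representation, so the sum is slightly off.
a = 0.1 + 0.2
exact_equal = (a == 0.3)              # False on IEEE 754 doubles
close_enough = math.isclose(a, 0.3)   # compare with a tolerance instead

# Adding many small terms one by one accumulates rounding error.
terms = [0.1] * 10
naive_sum = 0.0
for t in terms:
    naive_sum += t

# math.fsum keeps track of the bits lost in each addition.
careful_sum = math.fsum(terms)

# Subtracting nearly equal numbers loses significant digits (cancellation).
x = 1e-8
lossy = (1.0 + x) - 1.0               # not exactly 1e-8 any more

print(exact_equal, close_enough, naive_sum == 1.0, careful_sum == 1.0, lossy == x)
```

The same effects show up in FORTRAN and C, since they come from the floating point format rather than from the language.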
posted by Obscure Reference at 5:15 PM on August 31, 2012 [2 favorites]


I've used subversion and git, and I actually like subversion better when it's just me. Less typing when committing, unless I missed something. git/mercurial--distributed source control--are newer and sexier, though.
posted by zeek321 at 5:19 PM on August 31, 2012


All very helpful answers. Especially the bit about style guides. I've just glanced over the FORTRAN one and it's been useful already.

For a bit more background, I'm currently adding on to a previous FORTRAN program to do some analysis of plasma simulations. So it may well end up being used by other people. I do expect to be running into C/C++ in the near future, too.

So, lesson one would be to take some time to learn the best practices of whatever language you're using. See the comments above on style guides, etc. On the other hand, don't let your initial lack of experience in this regard paralyze you. Dive in, but be prepared to revise (or "refactor" as the programmers say) as better approaches come to you.

That's kind of what prompted the question, actually. I finally got around to sorting out a big structural problem that was due to my inexperience with FORTRAN arrays - and realized the code was starting to get out of hand, and now was a good time to sort things out and do them (more) right.
posted by Zalzidrax at 5:47 PM on August 31, 2012


I'm really happy to see Software Carpentry as the very first answer here. Go through that site. Really, really, really.
posted by dmd at 6:00 PM on August 31, 2012


I'd take a look at the Test Anything Protocol (Wikipedia page) for testing, especially since you're using FORTRAN which may not have a native testing framework. TAP started out in Perl testing, but is really just a simple standard test program output format like:
ok 1
ok 2
not ok 3 - diagnostic message here
So if you write a little program that exercises your library and just prints out messages in the TAP format you can hand them off to a testing harness in some other framework in Perl/Python/Ruby or whatever that will give you nicer information. (like: 100 tests run, failing tests: 98, 99. and diagnostic messages for the failing tests) Which is just a step above just printing out pass/fail or whatever else you might come up with for testing FORTRAN code.
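
As a rough illustration of how little is needed, here is a sketch of a TAP producer in Python; the same print statements translate directly to FORTRAN `write` statements. The function under test (`kinetic_energy`) and the helper (`tap_lines`) are made-up names for this example. The leading `1..N` line is the TAP "plan", which tells the harness how many tests to expect.

```python
def kinetic_energy(mass, velocity):
    """Hypothetical function under test: classical kinetic energy."""
    return 0.5 * mass * velocity ** 2


def tap_lines(checks):
    """Turn (description, passed) pairs into TAP output lines."""
    lines = [f"1..{len(checks)}"]  # the plan: how many tests to expect
    for number, (description, passed) in enumerate(checks, start=1):
        status = "ok" if passed else "not ok"
        lines.append(f"{status} {number} - {description}")
    return lines


checks = [
    ("zero velocity gives zero energy", kinetic_energy(2.0, 0.0) == 0.0),
    ("unit mass at 2 m/s gives 2 J", kinetic_energy(1.0, 2.0) == 2.0),
    ("energy is positive for negative velocity", kinetic_energy(1.0, -3.0) > 0.0),
]

output = tap_lines(checks)
print("\n".join(output))
```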

If you haven't gotten to it yet, set up a Makefile with some sort of `make test` and/or `make test_verbose` target so you can check your changes easily.
posted by zengargoyle at 7:01 PM on August 31, 2012


Version Control
I'm probably only going to be working on software by myself, so this isn't as big of an issue


Yes it is. Learn git.
posted by flabdablet at 7:38 PM on August 31, 2012 [4 favorites]


What flabdablet said. Use version control. Use version control.
posted by blob at 8:06 PM on August 31, 2012 [1 favorite]


Nobody said code reviews yet? Code reviews! Find at least one other person who is knowledgeable in your language, and sit down together on a regular basis to critique each other's code.
posted by parrot_person at 3:02 AM on September 1, 2012


Testing and code organisation are closely related. If your code is well organised, automated testing should be easy, and if you have good tests, rewriting your code becomes less error prone.

If you have short functions with understandable input/output and no side effects, then you can write unit tests to test it independently of the rest of the code.
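
For instance, a sketch using Python's built-in `unittest` module. The function `trapezoid_integral` is a made-up example, not something from the question; the point is only that a function with clear inputs and outputs can be checked in isolation.

```python
import math
import unittest


def trapezoid_integral(f, a, b, n):
    """Integrate f from a to b using n trapezoids. No side effects."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


class TrapezoidIntegralTest(unittest.TestCase):
    def test_linear_function_is_exact(self):
        # The trapezoid rule is exact for straight lines.
        result = trapezoid_integral(lambda x: 2 * x, 0.0, 1.0, 4)
        self.assertAlmostEqual(result, 1.0)

    def test_sine_over_half_period(self):
        # The integral of sin on [0, pi] is 2; allow for discretization error.
        result = trapezoid_integral(math.sin, 0.0, math.pi, 1000)
        self.assertAlmostEqual(result, 2.0, places=5)
```

Saved in a file (say `test_integral.py`, a hypothetical name), this can be run with `python -m unittest test_integral.py`.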

The usual guidelines for OO are "high cohesion" (https://en.wikipedia.org/wiki/High_cohesion) and "low coupling" (https://en.wikipedia.org/wiki/Coupling_(computer_science)). These concepts apply equally well to functions and packages in imperative languages, and help you write modular code that is easier to work with.

If you are heavily commenting your code then you should ask yourself if the time would be better spent making the code itself easier to understand. How and when to comment is down to preference though; I find it useful for there to be comments above every method/class/file explaining what it is for.
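
To illustrate the trade-off, a small Python sketch (both function names are invented for this example): the first version explains itself through its names and a short header comment, while the second needs an inline comment just to be readable.

```python
BOLTZMANN_CONSTANT = 1.380649e-23  # J/K, exact by SI definition


def thermal_energy(temperature_kelvin):
    """Return the characteristic thermal energy k_B * T in joules.

    Used to compare particle energies against the background temperature.
    """
    return BOLTZMANN_CONSTANT * temperature_kelvin


# Harder to follow: the same calculation, but the comment does all the work.
def te(t):
    return 1.380649e-23 * t  # k_B * T, where t is in kelvin


print(thermal_energy(300.0) == te(300.0))
```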

If anything, fast changing, experimental software makes these things even more important. The last thing you want is to end up with a mess of spaghetti code, where you're too afraid to change anything in case it breaks something else.
posted by Unexpected Indent at 3:29 AM on September 1, 2012 [1 favorite]


Use GitHub for versioning/hosting/backups. Since you're a student, you can get two free years of their lower-tier paid account [micro], meaning that you will have pretty-looking version control, and your project can be one of three private repos.

Use ReadTheDocs for hosting documentation. They have some useful suggestions in the "Getting Started" section. This is a common tool used by Real Programmers in order to help other people use and understand their work.

Have fun!
posted by oceanjesse at 9:14 AM on September 1, 2012


Oh and for the sake of your sanity, use a nice text editor like Sublime when writing your code.
posted by oceanjesse at 9:15 AM on September 1, 2012


Learn how to use a performance profiler for your language. A theoretical understanding of computational complexity (as applied to your language) is well and good, but given that you already have most of your input, it's faster and better to just pull out a representative sample of your data and see which functions your program is spending the most time in. (We all know that IO is slow, but it really hits home when you see a graph of where all that time is going.)
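
In Python, for example, the standard library's `cProfile` and `pstats` modules do this; compiled languages have their own tools. A minimal sketch, where `analysis` and `slow_sum_of_squares` are hypothetical stand-ins for your real code:

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Deliberately naive loop, standing in for a real hot spot."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def analysis():
    """Hypothetical entry point that the profiler will measure."""
    return [slow_sum_of_squares(20_000) for _ in range(20)]


profiler = cProfile.Profile()
profiler.enable()
analysis()
profiler.disable()

# Sort by cumulative time and show the five most expensive entries.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

The report lists each function with its call count and time spent, which is usually enough to tell you where to look first.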
posted by anaelith at 3:38 PM on September 1, 2012


Git is very useful even if you're developing by yourself. Let's say you're going to write a bunch of code and you aren't sure if it'll work. With git, you can just:

> git branch experimental_feature
> git checkout experimental_feature


…do a lot of work on your experimental feature:

> edit important.fortran
> edit very_important.fortran
> edit other_important_file.fortran
> git commit -am "Try experimental feature"


…and break it:

> ./runapp.fortran

ERROR: Your experiment was a giant failure! You
have to start all over again!


But they are WRONG. You DON'T have to start over. With the power of Git, you can just:

> git checkout master

And you're back to where you started!

If you push to GitHub, you are also safe from losing your work if your dog eats your hard drive.
posted by yaymukund at 5:27 PM on September 1, 2012 [2 favorites]


This thread is closed to new comments.