Best language for highschool bioinformatics course?
June 25, 2009 11:22 AM

I'm teaching a course on bioinformatic programming for high schoolers. Which language should I teach it in? And do you have any nifty ideas for easy projects that fall under the bioinformatics header?

It's a six-week, twice-weekly internship for highschoolers with absolutely no programming experience. The catch is, it has to be a bioinformatics course. What language would be best? I think the two contenders are Perl and Python, though I'm open to other options.

Possible Perl Advantages:

- I know Perl, so I won't need to pick up another language and I'd be better at debugging it.
- If the kids go on to AP Comp Sci or a 101 course in college, the language is often Java. Both Perl and Java use C-style syntax.
- The bioinformatics online support for Perl seems better than for Python.
- Long-string manipulation and speed seem much better in Perl.
- Though complex Perl is scary-looking, well-written basic Perl doesn't seem that intimidating.

Possible Python Advantages:

- Less steep learning curve.
- Possibly more resources for beginner programmers.
- The whitespace-is-important thing and there-is-only-one-best-way-of-doing-something thing may be better for teaching good programming techniques.

Also, any basic bioinformatics projects you can think of are much appreciated. The goal is really to teach programming, but the internship requires that it be done through bioinformatic applications.
posted by bergeycm to Computers & Internet (20 answers total) 7 users marked this as a favorite
Ruby is also a contender.

In the grand scheme of things I think C# in Visual Studio Express is the best environment to become ensconced in going forward.

Alternatively, you could run your programming in a simulator environment, either Android or iPhone.

As for projects, from wikipedia:
  • data mining
  • machine learning algorithms
  • visualization
  • sequence alignment
  • gene finding . . .
  • modeling of evolution

posted by @troy at 11:31 AM on June 25, 2009

Best answer: Python, without a doubt. They'll spend less time hunting for mismatched parentheses and more time getting stuff done. I say this as someone who used to hack Perl, but has been loving life after switching to Ruby. What Python and Ruby share is an emphasis on human-readable code that is easy to understand. This will make a world of difference to new programmers.

Absolutely use Biopython to enable quick development by these kids. By using those libraries, you'll be enabling them to do cool things quickly, without having to write all of the code themselves. A few searches will come up with a bunch of tutorials and recipes that may be helpful to you.

One activity suggestion off the top of my head: Give them two sequences, one with a single SNP, and have them align the sequences to find the mutation, then figure out which residue of the protein will be altered. Finally, have them query and retrieve abstracts from NCBI to find out what the protein does, and what disease it might cause in the patient. Bringing it all the way to the disease level makes it real for the kids - they're solving a problem instead of just munging data.
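
A minimal sketch of the first half of that activity, in plain Python with no Biopython dependency. It assumes the two sequences are already the same length and in frame (so no real alignment step is needed), and the sequences, function names, and codon-table construction below are illustrative choices, not anything from a particular course or library:

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # codons starting with T
    "LLLLPPPPHHQQRRRR"  # codons starting with C
    "IIIMTTTTNNKKSSRR"  # codons starting with A
    "VVVVAAAADDEEGGGG"  # codons starting with G
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def find_snps(reference, variant):
    """Return the 0-based positions where two equal-length sequences differ."""
    if len(reference) != len(variant):
        raise ValueError("this sketch assumes sequences of equal length")
    return [i for i, (r, v) in enumerate(zip(reference, variant)) if r != v]


def residue_change(reference, variant, position):
    """Return (codon number, reference residue, variant residue) for a SNP.

    Assumes the reading frame starts at the first base of the sequence.
    """
    start = position - position % 3
    ref_codon = reference[start:start + 3]
    var_codon = variant[start:start + 3]
    return start // 3 + 1, CODON_TABLE[ref_codon], CODON_TABLE[var_codon]


# Toy data: a short made-up coding sequence and a copy with one base changed.
reference = "ATGGTGCATCTGACTCCTGAGGAGAAG"
variant = "ATGGTGCATCTGACTCCTGTGGAGAAG"

for pos in find_snps(reference, variant):
    codon, before, after = residue_change(reference, variant, pos)
    print(f"base {pos + 1}: codon {codon} changes {before} -> {after}")
```

With real data the students would swap the equal-length comparison for an actual alignment, which is where a library earns its keep.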

Some visualization stuff might be nice too. Have them output PDB files and then look at the structures using VMD.
posted by chrisamiller at 11:38 AM on June 25, 2009 [3 favorites]

No offense @troy, but if the object is to have these kids learn bioinformatics, C#/Visual Studio Express is about the worst platform imaginable, because virtually no bioinformaticians use it or write tools in it.

Yes, there is a BioRuby, and while I love writing Ruby, the other implementations (BioPerl and BioPython) are much more complete, in my experience.
posted by chrisamiller at 11:41 AM on June 25, 2009

Best answer: I've always thought that javascript would be a good language for an intro to programming course because in a browser the feedback can be so much more visual than console output. Especially if you use SVG or VML (or a library like dojox.gfx that abstracts them.) And you've got all of the UI components of HTML readily available so the student can rig up buttons that fire off little snippets of code or get input from form fields, which is far more convenient than console prompts.

Plus, a browser is probably already installed, not only on the lab computers wherever you'll be teaching but on every computer every student has access to, Mac or PC or otherwise.

Obviously since there isn't going to be any real bioinformatics software written in javascript it's got the same disadvantage as @troy's suggestion of C# but I would argue that simply learning the principles of programming in an engaging manner ought to be a priority and those principles will easily transfer to something like Perl or Python if they start doing "real" development.

The OpenScience Project may be a good resource for bioinformatics stuff.
posted by XMLicious at 11:51 AM on June 25, 2009

Best answer: Modeling an evolutionary process is probably the best idea for HSers. The underlying idea is simple, and it gets to many core programming tasks. Once you have the basics, you can do all kinds of fun things with it. Set up some kind of rule for mating compatibility and adaptation, and you get speciation. Set up a disease "gene", run it forwards, sample, then do an easy association analysis; either case control or transmission disequilibrium. Show what the effect of even small gene flow is on population divergence. Make ARGs or other cool coalescent things. Get the distribution of popular population genetics tests (Tajima's D, fixation index).
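
As a rough idea of how small the starting point can be, here is a sketch of one common textbook model, Wright-Fisher genetic drift at a single two-allele locus. The choice of model, the function name, and the parameters are assumptions made for illustration; selection, mating rules, and gene flow would all be layered on top of a loop like this one:

```python
import random


def wright_fisher(pop_size, start_freq, generations, seed=None):
    """Simulate drift of one allele's frequency in a diploid population.

    Each generation, every one of the 2 * pop_size gene copies is drawn
    at random from the previous generation's allele frequency.
    Returns the list of frequencies, starting with generation 0.
    """
    rng = random.Random(seed)
    copies = 2 * pop_size
    count = round(start_freq * copies)
    history = [count / copies]
    for _ in range(generations):
        freq = count / copies
        # Binomial sampling done the slow, readable way.
        count = sum(1 for _ in range(copies) if rng.random() < freq)
        history.append(count / copies)
    return history


# Small populations drift quickly; try changing pop_size and rerunning.
for generation, freq in enumerate(wright_fisher(20, 0.5, 10, seed=1)):
    print(generation, round(freq, 3))
```

Running it several times with different seeds and population sizes shows how often the allele is lost or fixed, which leads naturally into the sampling and divergence ideas above.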

Perl has too many idiosyncrasies and things that are Wrong but kinda work or break in subtle ways. Do it in Python or C. Do you want to deal with kids who google "how to do basic thing X in perl" and get ten different ways, some of which you've never seen?
posted by a robot made out of meat at 11:52 AM on June 25, 2009

Best answer: As much as I love Perl, I have to advocate Python here. It's just as good a language for string manipulation as Perl---in fact, its syntax and standard library for strings are much more intuitive than Perl's.

Perl's syntax seems more similar to Java's than Python's does, but the similarity is only superficial. The idioms and patterns that your students will use when programming in Python will be much more similar to those in Java, if only because they're both deeply object-oriented. (Just the simple fact that strings are objects in Python, and use object-oriented syntax, will make understanding Java easier for your students, when/if it becomes necessary.)
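
To make the strings-are-objects point concrete, here is a short sketch using only built-in `str` methods on a made-up DNA string. The helper names `gc_content` and `reverse_complement` are hypothetical, chosen for this example:

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_content(dna):
    """Fraction of bases that are G or C (0.0 for an empty string)."""
    dna = dna.upper()
    if not dna:
        return 0.0
    return (dna.count("G") + dna.count("C")) / len(dna)


def reverse_complement(dna):
    """Complement every base, then reverse the string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


seq = "atgcgtacgttag"              # made-up sequence
print(seq.upper())                 # methods are called on the string itself
print(seq.upper().find("CGT"))     # 0-based position of the first match
print(gc_content(seq))
print(reverse_complement(seq))
```

Every operation is a method call on the string object, which is the same shape students would later see with `String` in Java.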

I'm an educator, and having taught both Perl and Python (sometimes side-by-side), I will report that students almost universally have a much easier time learning Python. This applies to both beginners and experienced programmers: Perl is great for what it is, but it's loaded with weirdnesses that are difficult for newcomers to grasp, and out-of-step with experienced programmers' expectations.
posted by aparrish at 11:59 AM on June 25, 2009

but if the object is to have these kids learn bioinformatics, C#/Visual Studio Express is about the worst platform imaginable, because virtually no bioinformaticians use it or write tools in it.

The content should be independent of the tools of the day, unless this is a job-skills induction class for present professionals.

The current direction of the present programming paradigm is to string tools together via data interchange. This is rather advanced for beginners but starting out I'd go for the least weirdness in the tooling.

I've always thought that javascript would be a good language

ActionScript, perhaps. JS in eg. Firefox w/ Firebug is doable for pros, but there be a lot of weirdness there. Coming from decades of C/C++ and some PHP exposure, JS was a total blast for me. It's got good bones but is still user-hostile in certain areas.
posted by @troy at 12:03 PM on June 25, 2009

Best answer: Oh, Python. Again and again, Python. Not without its quirks. Against Perl, however, it stands as a calm, clear-eyed person who might check the doorknob twice upon leaving the house as compared to a guy who won't leave it, not because he has to slither atop newspapers stacked five feet high, but because it's Tuesday and he knows the mailman will be lingering after work to drain lymph fluids to lubricate his imp factory at the castle.

However, do not under any circumstances allow the students to glimpse the Lutz Learning Python book. I selected it as my first Python book, given that, hey, it's O'Reilly, how can I go wrong? I found the book actually detrimental to my understanding, to the point where I avoided taking the Python plunge for about a year. I've programmed, though with no great depth, in a nice little sample of languages, for many years, so perhaps I was somewhat shielded from the effect.

A book so bad, I won't even use it as a reference book.
posted by adipocere at 12:21 PM on June 25, 2009 [3 favorites]

Absolutely Python. I know a grad student who, like you, preferred Perl, but was forced to learn Python as a teaching assistant for a bioinformatics class. He is now considering how to change all of his own work into Python.

And yes, the Lutz book is not very good.
posted by grouse at 12:32 PM on June 25, 2009

Python. It'll give them a sense of the subject and there's a real possibility they'll get something semi-cool to run even though they've never programmed before and only have 6 weeks to learn. Ah, success. That's the feeling you want them to leave with!
posted by everythings_interrelated at 12:32 PM on June 25, 2009

I should also say that I have almost nine years of experience in bioinformatics, and chrisamiller is no slouch either. @troy is wrong; you should use the best tool for the job and not try to shoehorn it into a more difficult environment because of a mistaken ideal that the content of the course is independent of the language. It's not.
posted by grouse at 12:36 PM on June 25, 2009

The content should be independent of the tools of the day, unless this is a job-skills induction class for present professionals.

That depends: is this a class that's focused on teaching kids how to program, or is it a class on the programming aspects of bioinformatics? If it's the latter, it will be much more useful for the kids to focus on BioPerl or BioPython, because most bioinformatics tools are in one or the other, and kids will immediately be able to take a look at the code for current tools and create easily compatible modules or programs or whatever. If they spend those 6 weeks learning C# or Ruby, they may have a better and more general grasp of programming, but they won't really be able to look at, modify, or add to existing bioinformatics programs. This is a brief (6-week, twice-weekly) intro class for kids with no experience, and so something focused on exposing them to real life problems and the sorts of tools in use now would almost certainly be more appropriate.

If you're specifically going to be dredging the NCBI database or dealing with huge globs of sequencing data, Perl has the advantage. For many of the other sorts of apps mentioned by a robot made out of meat, Python is probably better and less... quirky.
posted by ubersturm at 12:44 PM on June 25, 2009

Best answer: Sequence alignment is a good one (I've taught it to high school students and non-science-major college students) because the Needleman–Wunsch algorithm allows them to start with something graphical, so they'll have something visceral to model. Then I'd use the sequence alignments to recapitulate the Mitochondrial Eve dataset, so they can map human evolution and migration. Everyone can appreciate that.
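
For anyone who hasn't seen it, a bare-bones sketch of Needleman–Wunsch global alignment follows. The scoring values (match +1, mismatch -1, gap -1) and the function name are assumptions for illustration, and the traceback simply returns one optimal alignment when several tie:

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align strings a and b; return (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1

    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def pair(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the grid: each cell comes from the diagonal, from above, or from the left.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner to recover one alignment.
    top, bottom = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(top)), "".join(reversed(bottom))


total, line1, line2 = needleman_wunsch("ACGT", "AGT")
print(total)
print(line1)
print(line2)
```

The `score` grid is the same table students would fill in by hand on paper, so the graphical exercise and the code line up cell for cell.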

Also, to swim against the tide here (computer people get so over-the-top contentious about languages, it's as tired as the Mac vs PC debate) I think the BioPerl package is a lot better developed than the Biopython version, mostly because it's been around longer. That may not matter for a bunch of high school kids (Python is a little easier to learn) but I've found that for complex tasks, BioPerl has already written a lot of the necessary code for you.
posted by overhauser at 12:48 PM on June 25, 2009

The content should be independent of the tools of the day, unless this is a job-skills induction class for present professionals.

That's a nice thought, but this is a six week course, designed for people with no programming experience. Maybe you or I could take what we learned in C# and translate it to Ruby or Java or Fortran, but these are kids without any CS background.

Yes, they need to walk out of the class with some very basic programming knowledge, but more importantly, they need to walk away with some real world results and examples that they can use to inform future study. Recreating string comparison libraries is pointless, boring work. As the old adage goes, let them stand on the shoulders of giants by leveraging the many resources that exist. Then you'll be able to have them do much more exciting projects, and ultimately, leave with a better conceptual understanding of what it is that bioinformatics is about.
posted by chrisamiller at 1:00 PM on June 25, 2009

If you're going to teach Smith-Waterman or Needleman-Wunsch, I will humbly suggest my own sequence alignment spreadsheet GALAXI (self-link) as a fun demo and teaching tool.
posted by grouse at 1:01 PM on June 25, 2009

Best answer: Having been both a high school teacher and a computer instructor at the corporate level, I think that some of you guys may be overestimating what kids are going to be able to learn in twelve days.

Also, bergeycm, I would suggest that whatever language you go with you account for the amount of time it will take you to get the lab set up - installing the environment and any development tools or libraries but also any materials needed for your curriculum like data sets or example files. And also remember that if you're going to have them do any homework at home you'll need to provide instructions on how to set everything up at home.

You might want to consider setting everything up within something like the Bochs emulator so that it's simply a matter of copying a folder's worth of files and maybe making a single desktop shortcut.
posted by XMLicious at 1:03 PM on June 25, 2009

Response by poster: Python's sounding more and more like the way to go. Thanks a ton for all the project ideas.

XMLicious, we're lending each kid a netbook already up and running with ubuntu and whatever language tools we decide to go with. On the days we don't meet, there will be assignments and two times during the day for e-mail contact (status updates, debugging help, etc).

We have access to a bunch of phylogenetic apps, which coupled with the modules from biopython should let them do some pretty interesting things, despite the limited time frame. If they walk out with a glimpse of real world bioinformatic applications or with an interest in learning more about programming, I'll consider it a success.
posted by bergeycm at 1:23 PM on June 25, 2009

Just finished a two week Bioinf seminar - We did Perl over two days, and most of the biologists were completely stumped by most of the basic CS concepts like data types and casting. I think that Python would have made all of this a lot more comprehensible.

Also agreeing with @troy that what most people seem to do is string tools together. If you've got the bandwidth, why not teach people how to use the web-based/GPL tools?
posted by Orb2069 at 6:01 PM on June 25, 2009

Python's interactive console (or even better, ipython) is vastly better than anything I have seen for Perl. I think this is a strong argument for choosing it for an introductory programming course.
posted by PueExMachina at 9:24 PM on June 25, 2009

I'm going to go against the flow here and say Perl:

1. You already know it - this is a big deal
2. BioPerl is, IMHO, far more advanced than Bio(Python|Ruby|Java)
3. It is used much more (again, in my experience) than the others for Real Life Bioinformatics(TM)

There's a more nebulous philosophical point about how Perl has traditionally been used as a kind of glue to stick other programs together into pipelines - which is exactly what a lot of bioinformatics work consists of.

I agree with others re. overestimating what you might be able to accomplish in 12 days. I run an annual beginning-Perl course in my department for a mixture of MSc students, PhD students and postdocs (10 weeks @ 1 day per week) and some of the stuff mentioned is optimistic. Last year our final exercise was to write a program that calculates GC skew for mitochondrial genes, making heavy use of BioPerl. MefiMail me if you'd like more details.
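
For reference, GC skew is usually defined as (G - C) / (G + C). A minimal sketch of the calculation, written in plain Python rather than BioPerl and not taken from the course exercise; the function names and the sliding-window helper are assumptions for illustration:

```python
def gc_skew(seq):
    """Return (G - C) / (G + C), or 0.0 if the sequence has no G or C."""
    seq = seq.upper()
    g = seq.count("G")
    c = seq.count("C")
    if g + c == 0:
        return 0.0
    return (g - c) / (g + c)


def windowed_gc_skew(seq, window, step):
    """Return (start, skew) pairs for each full window along the sequence."""
    return [
        (start, gc_skew(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Made-up sequence, just to show the shape of the output.
for start, skew in windowed_gc_skew("GGGCATATCCCG", 4, 4):
    print(start, skew)
```

Reading real gene sequences from a file is the part where a Bio* library would replace most of the hand-written code.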
posted by primer_dimer at 2:47 AM on June 26, 2009
