, a bioinformatics primer. Would you suggest working out each solution from first principles, or do you think it makes more sense to look up existing algorithms and implement those?
One great feature of Rosalind problems is that your program has to find the solution in under 5 minutes. I am writing Matlab code, because Matlab is what I know. I realize that Matlab has shortcomings, speed among them. I can imagine going back and rewriting most of this code in C or using it as a way to learn Python/NumPy.
If I approach a problem "cold" and without foreknowledge, I usually write the "naïve" algorithm. In the case of problems where the naïve approach takes O(n²) or even O(n³) operations, this usually forces me to look for places to cut the fat and write more efficient code. So that's good.
On the other hand, the majority of these problems are very basic: find the longest common substring of a collection of strings, count the number of subtrees given a list of edges, that sort of thing. Very efficient solutions to these have existed for decades. Sometimes I'll look at the Rosalind forums and see that user ABC has used approach XYZ and his solution runs in 5 milliseconds. Sometimes this is prefaced with "I looked on Wikipedia and found XYZ algorithm…"
I don't know what makes the most sense for me to do, as a non-computer science, non-bioinformatics person. Doing the problems from scratch teaches me about complexity, but I feel like I'm missing important algorithm theory this way. I know this is impossible to answer as stated, but do most CS students already know about these basic algorithms and then just go and apply them as needed? How much knowledge of specific algorithms does a typical CS undergrad have? How would you suggest I approach these problems to reap the biggest benefit
?
posted by michaelh at 7:57 PM on January 9