SVM, HMM, and wavelets in bioinformatics
November 24, 2007 8:49 AM

I'd like to learn about SVM, HMM and wavelet algorithms and how they are used in bioinformatics. I have a somewhat rudimentary understanding of the underlying mathematics from Wikipedia and a couple of other sources, but I find that a concrete demonstration of their application fleshes it out. Are there texts, software packages (preferably free) or papers you'd recommend that: 1) include good starter problem sets; or 2) demonstrate how these techniques are used in research applications (beyond "we used xyz here")? Thanks!
posted by Blazecock Pileon to Science & Nature (7 answers total)
 
Best answer: I really don't have much idea what you are talking about, but googling R and bioinformatics brought up a short course as its first link. R is a free statistical software package.
posted by Eringatang at 9:12 AM on November 24, 2007


Best answer: I highly recommend Biological sequence analysis by Durbin and co-workers which is the book on HMMs in bioinformatics. I thought of a few HMM papers but I doubt they will be helpful until after you read the book, which covers many applications as well.

The Eponine transcription start site finder uses a Relevance Vector Machine, which is similar to an SVM. Serafim Batzoglou does lots of SVM applications in bioinformatics.

"I really don't have much idea what you are talking about"

Then please don't answer the question. That doesn't really help.

posted by grouse at 9:41 AM on November 24, 2007


R is always helpful.
posted by Eringatang at 9:50 AM on November 24, 2007


Best answer: I don't have specific references on papers, but I can talk a little bit about software.

R has the Bioconductor set of packages, which are very useful for working with biological data, as well as the e1071 package, which implements an SVM. If you have access to Matlab, there are very good HMM and SVM toolboxes for Matlab (in my opinion, much better than R's). Most SVM implementations use the libsvm or svmlight libraries to handle the heavy lifting, so if you feel like going straight to the source, you can take a look at those packages. The R and Matlab toolboxes are typically wrappers around them.
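To make the SVM idea concrete before reaching for one of those libraries, here is a minimal sketch in plain Python. It is not how libsvm or svmlight work internally (they solve the dual problem and support kernels); this swaps in a simple linear SVM trained by subgradient descent on the regularized hinge loss. The toy sequences, the base-composition features, and the names composition, train_linear_svm and predict are all assumptions made up for illustration.

```python
from collections import Counter

def composition(seq):
    """Feature vector: fraction of A, C, G and T in a DNA string."""
    counts = Counter(seq)
    return [counts[base] / len(seq) for base in "ACGT"]

def train_linear_svm(samples, labels, lam=0.01, eta=0.1, epochs=200):
    """Subgradient descent on the L2-regularized hinge loss (labels are +1/-1)."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # The regularizer shrinks the weights on every step.
            w = [wi * (1 - eta * lam) for wi in w]
            # Only points inside the margin push the boundary.
            if margin < 1:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
                b += eta * y
    return w, b

def predict(w, b, seq):
    """Return +1 (GC-rich class) or -1 (AT-rich class)."""
    score = sum(wi * xi for wi, xi in zip(w, composition(seq))) + b
    return 1 if score >= 0 else -1

# Hypothetical toy data: +1 = GC-rich, -1 = AT-rich.
TRAIN = [
    ("GCGCGGCC", 1), ("ATATTAAT", -1),
    ("GGCCGCGA", 1), ("TTAAATAC", -1),
    ("CCGGCGTC", 1), ("AATTTAGA", -1),
]
w, b = train_linear_svm([composition(s) for s, _ in TRAIN],
                        [y for _, y in TRAIN])
```

Calling predict(w, b, "GGGCCCGC") should put a new GC-rich sequence in the +1 class. For real data you would want a proper library with kernels, cross-validation and feature scaling; the point here is only to show what "maximize the margin" looks like as an update rule.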

A good canonical application of HMMs to biological data is gene finding, for which there should be good tutorials online. Look for one that makes sense to you.
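As a taste of what such a tutorial builds up to, here is a minimal sketch of Viterbi decoding for a two-state HMM in plain Python. It is far simpler than a real gene finder: the two states (H for GC-rich, L for AT-rich) and every probability below are made-up assumptions for illustration, not values from any published model.

```python
import math

# Hypothetical two-state model: H = GC-rich region, L = AT-rich region.
STATES = ("H", "L")
START = {"H": 0.5, "L": 0.5}
TRANS = {"H": {"H": 0.9, "L": 0.1},
         "L": {"H": 0.1, "L": 0.9}}
EMIT = {"H": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
        "L": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}}

def viterbi(seq):
    """Most probable state path for a non-empty DNA string (log space)."""
    # Scores for the first symbol.
    scores = [{s: math.log(START[s]) + math.log(EMIT[s][seq[0]])
               for s in STATES}]
    backpointers = []
    for symbol in seq[1:]:
        column, pointers = {}, {}
        for s in STATES:
            # Best previous state to arrive in s from.
            best = max(STATES,
                       key=lambda p: scores[-1][p] + math.log(TRANS[p][s]))
            column[s] = (scores[-1][best] + math.log(TRANS[best][s])
                         + math.log(EMIT[s][symbol]))
            pointers[s] = best
        scores.append(column)
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[-1][s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))
```

With these numbers, viterbi("GCGCGCGCATATATAT") labels the first eight bases H and the last eight L. A gene finder uses the same decoding step with many more states (exons, introns, splice sites, and so on) and parameters estimated from annotated sequence.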
posted by bsdfish at 10:17 AM on November 24, 2007


Response by poster: This is incredibly helpful. I am grateful to you all.
posted by Blazecock Pileon at 3:09 PM on November 24, 2007


My work is in signal processing and AI for acoustics, not bioinformatics, but let me also chime in with a vote for Matlab with its toolboxes for work in this area. I can personally recommend the toolboxes for wavelets and HMM, and I would imagine there are also good toolboxes out there for SVM. Also, don't forget that many researchers write their own toolboxes, so you'll sometimes find some really good stuff out there that is better than the inbuilt Matlab tools (which in many cases are already very good and well developed!)...

In terms of authors, I've been out of the field too long to remember many names, but I do remember there was a really good set of lectures on wavelets by a woman called Daubechies (first name Ingrid). For HMM and SVM, try looking at the literature for speech recognition systems, as HMMs are used extensively in the area and I seem to remember that SVMs were taking off in that area as I started to move out of it...

(As you travel along, if you do need more advice, please feel free to send me an e-mail or a mefimail! I'd love an excuse to get talking about the area again, since my recent research has moved towards education and it can all be a bit qualitative for my liking sometimes! :))
posted by ranglin at 12:12 AM on November 25, 2007


I use wavelet analysis extensively in my research, which is in climate science and hydrology. I went through a period of "wtf" research as well, and found this site quite helpful:
http://users.rowan.edu/~polikar/WAVELETS/WTtutorial.html

I do it all in MATLAB 7.0 (make sure it's up to date or it won't run some of the packages), and use additional software created by the gods of wavelets in my field, Torrence & Compo. You can download it for free at http://atoc.colorado.edu/research/wavelets/software.html
It also comes with a sample dataset to play around with (although it is sea surface temp, not genes or whatever).

Finally, I use additional bits to do cross-wavelet and coherence analysis (using 2 time series), written by A. Grinsted, and located at http://www.pol.ac.uk/home/research/waveletcoherence/
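If you want a feel for what a wavelet decomposition does before installing anything, here is a minimal sketch in plain Python. Note that it is not the method used by the packages above, which compute a continuous wavelet transform in MATLAB; this swaps in the much simpler discrete Haar transform, in its unnormalized averaging form, and the function names are made up for illustration.

```python
def haar_step(signal):
    """One level: pairwise averages (approximation) and half-differences (detail)."""
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) / 2 for a, b in pairs]
    detail = [(a - b) / 2 for a, b in pairs]
    return approx, detail

def haar_decompose(signal):
    """Full decomposition; the signal length must be a power of two."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    approx, details = list(signal), []
    while len(approx) > 1:
        approx, detail = haar_step(approx)
        details.insert(0, detail)  # keep the coarsest level first
    return approx, details

def haar_reconstruct(approx, details):
    """Invert haar_decompose, going from the coarsest level to the finest."""
    signal = list(approx)
    for detail in details:
        signal = [v for s, d in zip(signal, detail) for v in (s + d, s - d)]
    return signal
```

For the signal [4, 6, 10, 12], haar_decompose returns the overall mean [8.0] plus the details [[-3.0], [-1.0, -1.0]]: the coarse level says the second half is higher than the first, and the fine level says each pair rises by the same small amount. That "what changes, and at which scale" view is the same idea the continuous transform gives you, just with smoother wavelets and many more scales.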

Hope that helps!
posted by hybridvigor at 2:07 PM on November 28, 2007

