Friday, January 4, 2013

Machine learning tools for biomedical data?

I got this nice email today from Xinghua Lou, a computational data scientist at Sloan-Kettering/Cornell Medical:

Hi Danny,
My name is Xinghua Lou and I follow your blog and watched your GraphLab demo at NIPS (particularly the interactive topic model demo, cool).
I am working for Sloan-Kettering/Cornell Med and we need machine learning for mining massive biomedical datasets (genomics, medical records, etc). I am making a survey of large scale machine learning toolboxes/frameworks and hope to have some suggestion from you.
I know your GraphLab, along with other toolboxes such as Wabbit from Yahoo, Shogun from MPI Germany, and recently Google's large deep networks - feels like the beginning of Skynet :). I am wondering if I have missed any other important tools. If so, please let me know.
Thanks a lot for your consideration. Wish you a nice weekend.
Best regards,
Xinghua Lou

Additionally, a couple of weeks ago I had a very interesting conversation with a great guy (with the longest title I have ever found on LinkedIn!): Alon Ben Ari, Director of Regional Anesthesia VA Puget Sound Health Care System at University of Washington at VA Medical Center Seattle. VA hospitals are also looking into large scale data analytics and trying to find the right ML tools to deploy.

My guess is that many other medical institutions are right now looking for tools for large scale data analytics. Why don't we set up a (voluntary) task force to identify common biomedical problems, the existing machine learning tools, and the future tools that are still missing? Anyone who is interested is welcome to contact me.

Here is a list of some useful ML tools:
Definitely VW (Vowpal Wabbit) is a great tool for regression/classification. Liblinear is also a great tool. Our GraphLab subproject GraphChi has been gaining a lot of popularity recently.

And here are two additional tools, thanks to my mega collaborator Justin Yan: sofia-ml is a great tool for combined regression and ranking. svm-perf is a good SVM solver that can directly optimize ROC and precision/recall measures.
