News Archives

[Colloquium] Predicate Invention and Transfer Learning

February 18, 2010

Watch Colloquium: 

Quicktime file (392 MB)
AVI file (401 MB)

  • Date: Thursday, February 18th, 2010 
  • Time: 11 am to 12:15 pm
  • Place: Mechanical Engineering 218

Jesse Davis
Department of Computer Science and Engineering
University of Washington

Abstract: Machine learning has become an essential tool for analyzing biological and clinical data, but significant technical hurdles prevent it from fulfilling its promise. Standard algorithms make three key assumptions: the training data consist of independent examples, each example is described by a pre-defined set of attributes, and the training and test instances come from the same distribution. Biomedical domains, however, consist of complex, interrelated, structured data, such as patient clinical histories, molecular structures and protein-protein interaction information, which do not fit these assumptions. The representation chosen to store the data often does not explicitly encode all the features and relations needed to build an accurate model. For example, when analyzing a mammogram, a radiologist records many properties of each abnormality, but does not explicitly encode how quickly a mass grows, which is a crucial indicator of malignancy. In the first part of this talk, I will focus on the concrete task of predicting whether an abnormality on a mammogram is malignant. I will describe an approach I developed for automatically discovering unseen features and relations from data, which has advanced the state of the art for machine classification of abnormalities on a mammogram. It outperforms both previous machine learning approaches and radiologists.

Bio: Jesse Davis is a post-doctoral researcher at the University of Washington. He received his Ph.D. in computer science from the University of Wisconsin-Madison in 2007 and a B.A. in computer science from Williams College in 2002. His research interests include machine learning, statistical relational learning, transfer learning, inductive logic programming and data mining for biomedical domains.