Peter Carbonetto, Ph.D.
Computational Biologist
DNA Science Division
Ancestry.com

pcarbo -at- uchicago -dot- edu

I'm a computational biologist and member of the DNA Science Division at Ancestry.com. Previously, I was a postdoc and HFSP fellow working with Matthew Stephens and Abraham Palmer in the Department of Human Genetics at the University of Chicago. Prior to that, I was a Ph.D. student with Nando de Freitas in the Laboratory for Computational Intelligence at UBC. My work is focused on developing new quantitative approaches to advance our understanding of ourselves through variation in our genes and DNA.

  


Mailing address:
153 Townsend St, Ste 800 / San Francisco, CA 94107


Publications

Clarissa Parker*, Peter Carbonetto*, Greta Sokoloff, Yeonhee Park, Mark Abney and Abraham Palmer. High-resolution genetic mapping of complex traits from a combined analysis of F2 and advanced intercross mice. Genetics, volume 198, pages 103-116, September 2014. (* indicates shared first authorship)

Peter Carbonetto, Riyan Cheng, Joseph Gyekis, Clarissa Parker, David Blizard, Abraham Palmer and Arimantas Lionikas. Discovery and refinement of muscle weight QTLs in B6 x D2 advanced intercross mice. Physiological Genomics, volume 46, pages 571-582, August 2014.

Peter Carbonetto and Matthew Stephens. Integrated enrichment analysis of variants and pathways in genome-wide association studies indicates central role for IL-2 signaling genes in type 1 diabetes, and cytokine signaling genes in Crohn's disease. PLoS Genetics, volume 9, October 2013. Pubmed | HFSP article | code

Xiang Zhou, Peter Carbonetto and Matthew Stephens. Polygenic modeling with Bayesian sparse linear mixed models. PLoS Genetics, volume 9, February 2013.

Peter Carbonetto and Matthew Stephens. Scalable variational inference for Bayesian variable selection in regression, and its accuracy in genetic association studies. Bayesian Analysis, volume 7, March 2012, pages 73-108. code

Matthew Hoffman, Peter Carbonetto, Nando de Freitas and Arnaud Doucet. Inference strategies for solving SMDPs. NIPS Workshop on Probabilistic Approaches for Robotics and Control, December 2009.

Peter Carbonetto, Matthew King and Firas Hamze. A stochastic approximation method for inference in probabilistic graphical models. Neural Information Processing Systems 23, December 2009.

Peter Carbonetto, Mark Schmidt and Nando de Freitas. An interior-point stochastic approximation method and an L1-regularized delta rule. Neural Information Processing Systems 22, December 2008. (Note: the proof of asymptotic convergence published as an appendix to the original paper has a major flaw; whether the method converges remains an open question.) slides | code

Peter Carbonetto, Gyuri Dorkò, Cordelia Schmid, Hendrik Kück and Nando de Freitas. Learning to recognize objects with little supervision. International Journal of Computer Vision, volume 77, May 2008, pages 219-237.

Peter Carbonetto and Nando de Freitas. Conditional mean field. Neural Information Processing Systems 19, December 2006, pages 201-208.

Peter Carbonetto, Jacek Kisynski, Nando de Freitas and David Poole. Nonparametric Bayesian Logic. 21st Conference on Uncertainty in Artificial Intelligence, July 2005, pages 85-93. This revision corrects a mistake in Fig. 5.

Peter Carbonetto, Gyuri Dorkò and Cordelia Schmid. Bayesian learning for weakly supervised object classification. Technical Report, INRIA Rhône-Alpes, July 2004.

Peter Carbonetto, Nando de Freitas and Kobus Barnard. A Statistical Model for General Contextual Object Recognition. 8th European Conference on Computer Vision, May 2004, part I, pages 350-362 (see Note 1).

Hendrik Kück, Peter Carbonetto and Nando de Freitas. A Constrained Semi-Supervised Learning Approach to Data Association. 8th European Conference on Computer Vision, May 2004, part III, pages 1-12 (see Note 1).

Peter Carbonetto and Nando de Freitas. Why can't José read? The problem of learning semantic associations in a robot environment. Human Language Technology Conference Workshop on Learning Word Meaning from Non-Linguistic Data, June 2003.

Peter Carbonetto, Nando de Freitas, Paul Gustafson and Natalie Thompson. Bayesian feature weighting for unsupervised learning, with application to object recognition. Workshop on Artificial Intelligence and Statistics, January 2003.

Theses

New probabilistic inference algorithms that harness the strengths of variational and Monte Carlo methods. Ph.D. thesis, University of British Columbia, August 2009.

Unsupervised Statistical Models for General Object Recognition. Masters thesis, University of British Columbia, August 2003.

Code

Variational inference for Bayesian variable selection in MATLAB and R. Companion code to my Bayesian Analysis (2012) paper. Includes routines for computing variational estimates of posterior statistics, and demonstrates how to run the full variational inference procedure for Bayesian variable selection in linear and logistic regression.

MATLAB code for on-line L1 regularization. Companion code to my research paper appearing at the 2008 NIPS conference (see the Data section below for the data). Includes MATLAB functions for learning linear regressors and classifiers subject to L1 regularization, which acts as a form of feature selection. L1-regularized linear regression is also known in the statistics community as the LASSO. The software package includes implementations of both batch learning and on-line learning, in which the model parameters are rapidly adjusted at each iteration by looking at only a single training example. This software is licensed under the CC-GNU GPL version 2.0 or later.
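To give a rough feel for on-line L1-regularized regression, here is a minimal Python sketch. It is not the interior-point stochastic approximation method from the paper, and it is not taken from the MATLAB package; it swaps in a simpler, well-known alternative (a stochastic gradient step followed by soft-thresholding). The function names and the default values of `lam` and `step` are assumptions made up for this illustration.

```python
def soft_threshold(w, t):
    """Shrink w toward zero by t; values within [-t, t] become exactly zero."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def online_lasso(examples, num_features, lam=0.1, step=0.05):
    """On-line L1-regularized least squares (illustrative sketch only).

    Each update looks at a single training example (x, y): take a gradient
    step on the squared error for that example, then soft-threshold every
    weight. The thresholding is what drives irrelevant weights to zero.
    """
    w = [0.0] * num_features
    for x, y in examples:
        # Prediction error on this one example.
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        # Gradient step followed by the L1 shrinkage step.
        w = [soft_threshold(wi - step * err * xi, step * lam)
             for wi, xi in zip(w, x)]
    return w


# Hypothetical toy stream: only the first feature carries signal.
stream = [([1.0, 0.0], 2.0)] * 200
weights = online_lasso(stream, num_features=2)
```

In this toy stream the second feature never receives a nonzero gradient, so its weight stays at exactly zero, while the first weight settles slightly below 2 because of the L1 penalty.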

Semi-supervised classification using a Bayesian kernel machine and data association constraints. Matlab implementation of the MCMC algorithms for simulating the Bayesian data association models described in the ECCV 2004 paper and the INRIA tech report (the data association model with hard group constraints), and Learning to classify individuals based on group statistics by Kuck and de Freitas (data association with group statistics). For a much more stable implementation in C, go here.

Gaussian belief propagation. Matlab code for running belief propagation on Gaussian Markov random fields.
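As an illustration of the idea, here is a minimal Python sketch of Gaussian belief propagation. It assumes the model is written in information form, p(x) proportional to exp(-x'Ax/2 + b'x), with a symmetric precision matrix A, and it uses synchronous message updates. The function name `gaussian_bp` and this parameterization are assumptions for the example; the Matlab code may be organized differently.

```python
def gaussian_bp(A, b, num_iters=20):
    """Gaussian belief propagation on a pairwise Gaussian MRF (sketch).

    A is a symmetric precision matrix (list of lists) and b the potential
    vector. Each message i -> j carries a precision term P and a potential
    term h. On a tree-structured graph the returned means and variances
    are exact once the messages have converged.
    """
    n = len(b)
    nbrs = [[j for j in range(n) if j != i and A[i][j] != 0.0]
            for i in range(n)]
    P = {(i, j): 0.0 for i in range(n) for j in nbrs[i]}
    h = {(i, j): 0.0 for i in range(n) for j in nbrs[i]}
    for _ in range(num_iters):
        new_P, new_h = {}, {}
        for (i, j) in P:
            # Combine node i's own terms with messages from all
            # neighbors of i except the recipient j.
            prec = A[i][i] + sum(P[(k, i)] for k in nbrs[i] if k != j)
            pot = b[i] + sum(h[(k, i)] for k in nbrs[i] if k != j)
            new_P[(i, j)] = -A[i][j] ** 2 / prec
            new_h[(i, j)] = -A[i][j] * pot / prec
        P, h = new_P, new_h
    means, variances = [], []
    for i in range(n):
        prec = A[i][i] + sum(P[(k, i)] for k in nbrs[i])
        pot = b[i] + sum(h[(k, i)] for k in nbrs[i])
        means.append(pot / prec)
        variances.append(1.0 / prec)
    return means, variances


# Hypothetical 3-node chain used only to exercise the function.
A = [[2.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 2.0]]
b = [1.0, 0.0, 1.0]
means, variances = gaussian_bp(A, b)
```

Because the chain is a tree, the means should agree with the solution of the linear system Ax = b, and the variances with the diagonal of the inverse of A, up to floating-point error.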

Image Translation. Matlab package for generic object recognition using statistical translation models. See my Masters thesis for more information.

Feature Weighting using Shrinkage Priors. Matlab code for running EM on a mixture of Gaussians with Bayesian feature weighting priors. Used for the paper Bayesian feature weighting for unsupervised learning.

Multiple dispatch. An implementation of multiple dispatch in Java using the ELIDE framework. See here for the project report.


Data

TREC2005. Spam filtering data in MATLAB format. Used to evaluate my on-line logistic regression learning algorithm in the paper An interior-point stochastic approximation method and an L1-regularized delta rule. This data set was originally created by Gordon Cormack and Thomas Lynam as part of the 2005 TREC Spam Filter Evaluation Tool Kit, and contains data from 92,189 emails. The open source software SpamBayes was used to extract features from the emails. By downloading and using this data, you accept the terms of agreement for use of the 2005 TREC public spam corpus.

Corel. Object recognition data used for my Masters thesis and the paper A Statistical Model for General Contextual Object Recognition. Contains manual segmentations for evaluation and extracted features. The Image Translation package contains code for reading the data into Matlab.

Robomedia. Object recognition data used for the Why can't José read? paper. Contains manual segmentations for evaluation and extracted features. The Image Translation package contains code for reading the data into Matlab.

Face detection. Training data for robust object detection using the AdaBoost algorithm, as formalized by Viola and Jones. Includes Matlab code for reading the data. The project report is available here.


Other work

MATLAB interface for PARDISO. PARDISO is a publicly available software library for solving large, sparse linear systems. It is particularly useful as a subroutine for interior-point methods. I designed a small interface so that the PARDISO solver is easily incorporated into your MATLAB programs.

MATLAB class for limited-memory BFGS. This little MATLAB class I wrote encapsulates all the functionality of limited-memory quasi-Newton methods. It is particularly well-suited for solving constrained optimization problems; I illustrate how it is used within a primal-dual interior-point method for solving a constrained optimization problem that arises in maximum likelihood estimation. See here for more details on installing and using this software.

Intuition behind primal-dual interior-point methods for linear and quadratic programming. I'm quite aware of the fact that there are probably a hundred textbooks published every year that contain an introduction to linear programming, and there are many introductory presentations on interior-point methods. But I find they are all lacking in providing the key intuition. So I've written a short 7-page document which I'm confident fills a tiny bit of the void.

MATLAB code for solving constrained, convex programs. I wrote a simple, easy-to-use MATLAB function for minimizing a convex objective subject to convex inequality constraints. It uses a primal-dual interior-point method with a suitable merit function for ensuring global convergence (which is useful when it is not desirable to compute the Newton step using the full Hessian of the objective).

MATLAB code for second-order cone programming. I also implemented a simple primal-dual interior-point method in MATLAB for solving second-order cone programs. At each iteration, the solver follows the Newton search direction and makes sure that the iterates remain feasible (they satisfy all the inequality constraints).
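One common way to keep iterates feasible is to backtrack along the search direction until the new point lies strictly inside the cone. The Python sketch below illustrates only that step, for a single second-order cone constraint ||u|| < t. It is an assumption about how such a safeguard could look, not a description of the MATLAB solver, and the names `in_cone` and `feasible_step` are hypothetical.

```python
import math


def in_cone(u, t):
    """True if (u, t) lies strictly inside the second-order cone ||u|| < t."""
    return math.sqrt(sum(ui * ui for ui in u)) < t


def feasible_step(u, t, du, dt, shrink=0.5, max_halvings=50):
    """Largest tried step size alpha that keeps the iterate feasible.

    Starts from the full step (alpha = 1) along the search direction
    (du, dt) and multiplies alpha by `shrink` until the trial point
    (u + alpha*du, t + alpha*dt) is strictly inside the cone.
    Returns 0.0 if no tried step size is feasible.
    """
    alpha = 1.0
    for _ in range(max_halvings):
        u_new = [ui + alpha * dui for ui, dui in zip(u, du)]
        if in_cone(u_new, t + alpha * dt):
            return alpha
        alpha *= shrink
    return 0.0


# Hypothetical example: the full step would leave the cone, so it is cut back.
alpha = feasible_step([0.0, 0.0], 1.0, [3.0, 0.0], 0.0)
```

A full interior-point method would typically combine such a step-size rule with a merit function or a decrease condition; this sketch covers feasibility only.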

MATLAB interface for IPOPT. IPOPT is a fantastic, new open source software package written in C++ for solving optimization problems with nonlinear objectives and subject to nonlinear constraints. IPOPT is short for Interior Point Optimizer. I've developed an interface so that IPOPT can be easily called from the MATLAB programming environment. You can download the current version of IPOPT from the project website.

Notes on probabilistic decoding of parity check matrices. A review of the basic concepts behind low-density parity check codes, and how to come up with a simple and reasonable method for probabilistic decoding. Assumes some familiarity with concepts from statistical machine learning and optimization.

A MATLAB interface for L-BFGS-B, a solver for bound-constrained nonlinear optimization problems that uses quasi-Newton updates with a limited-memory approximation to the Hessian.

A non-rigorous derivation of a variational upper bound on the log-partition function in eight parts. This is a brief exposé of Martin Wainwright's derivation of a convex alternative to generalized belief propagation (resulting in the so-called tree-reweighted belief propagation algorithm). The intent is to present the main mathematical steps in the derivation while keeping the presentation as "light" as possible.

Installing IPOPT on Mac OS X. Some of my experiences.

Creating, compiling and linking MATLAB executables (MEX files). A tutorial.

A Lesson in measure theory and change of variables. A technical note illustrating and explaining the subtleties in deriving a correct kernel for the snooker move used in population Monte Carlo.

Project webpage for Learning to recognize objects with little supervision.

How to partition and format an external hard drive for Mac OS X.

Note

1 © Springer-Verlag. Published in the Springer-Verlag Lecture Notes in Computer Science series.

Murals
18th c. murals from the Monastère franciscain de Cimiez, Nice.