Lars Kotthoff

larsko@cs.ubc.ca

BETA lab
Department of Computer Science
University of British Columbia


News

Our paper on Auto-WEKA has been accepted for publication in the Machine Learning Open Source Software track of JMLR.

I will visit the University of Wyoming and give an invited talk.

I will visit Portland State University and give an invited talk there.

I will be attending the mlr workshop 2017.

I will visit Leiden University and give an invited talk.

Our book on the results of the ICON project is now available!

I gave an invited talk on Algorithm Selection and Configuration at 1QBit (PDF slides).

Our paper on reinforcement learning for online algorithm selection has been awarded Best Paper Award at the Computational Intelligence and Data Mining workshop at the ITAT conference! Well done Hans!

Our paper on mlr has been accepted for publication in the Machine Learning Open Source Software track of JMLR.

I am supervising Science without Borders intern Lucas Juliano's work on Auto-WEKA.

I have been awarded (with Bernd Bischl and Joaquin Vanschoren) a Microsoft Azure research award worth $20,000.

We have just released version 2.0 of the Auto-WEKA software.

I am attending Dagstuhl Seminar 16412 on Automated Algorithm Selection and Configuration.

I am attending the mlr hackathon and workshop in Palermo, Italy.

Our paper on reinforcement learning for online algorithm selection has been accepted at the Computational Intelligence and Data Mining workshop at the ITAT conference.

I gave an invited talk on Algorithm Selection and Configuration at University College London (PDF slides).

I was visiting Mark Harman's group CREST, working with Yue Jia on deep parameter optimization.

I am an invited speaker at the ACP Summer School 2016 in Cork, Ireland.

I am organizing the LCI Forum seminar series.

I am supervising Science without Borders intern Oto Alves' work on Auto-WEKA.

Our paper on ASlib has been accepted for publication in the AIJ.

I am organizing (with Frank Hutter and Joaquin Vanschoren) the AutoML 2016 workshop at ICML 2016.

I am supervising (with Bernd Bischl) Mason Gallo for our mlr Google Summer of Code 2016 project on visualizing hyperparameter tuning traces.

Our paper on portfolios for subgraph isomorphism algorithms has been accepted at LION 10.

I am working on automatic parameter configuration for decentralized energy systems with visiting student Hannes Schwarz.

Our work on automatically tuning JavaScript compilers has been accepted as a poster presentation at CGO 2016.

Our paper on using the Shapley value to analyze algorithm contributions has been accepted at AAAI 2016.

I am on the organizing committee for the OpenML workshop 2016.

I am teaching CPSC 221 in winter term 2, 2015.

I visited Bernd Bischl, mostly to hack on mlr.

I attended the COSEAL workshop 2015 (PDF slides).

I organized the ICON challenge on Algorithm Selection. The official results are described here with an (unofficial) update here.

Our paper on electoral district modelling with constraints has been accepted at Modref 2015 (PDF slides).

Our paper on trajectory mining with constraints has been accepted at CP 2015 (PDF slides).

I was presenting (with Pavel Brazdil, Christophe Giraud-Carrier, and Joaquin Vanschoren) a tutorial at ECMLPKDD 2015 (PDF slides).

I was co-organizing (with Pavel Brazdil, Christophe Giraud-Carrier, and Joaquin Vanschoren) the MetaSel 2015 workshop at ECMLPKDD 2015.

I am supervising (with Julia Schiffner) Zachary Jones for our mlr Google Summer of Code 2015 project on visualization.

Awards

I have given invited talks at IJCAI 11, the Oxford configuration workshop 2012, the Summer School on Experimental Methodology in Computational Science Research, MetaSel 2014, the ICON summer school, Dagstuhl seminar 14411, Dagstuhl seminar 16412, the University of Bologna, and University College London.

I have secured grants from AIJ to support several workshops I helped to organize.

I have secured research grants from Amazon Web Services and Rackspace totalling approximately $23,000. In addition, I have been awarded research grants from Microsoft Azure worth $80,000.

I have secured a travel grant worth €1,500 from Enterprise Ireland and a "New Foundations" award from the Irish Research Council.

I have won an EPSRC Doctoral Prize fellowship.

My collaborators and I have won the Best Paper Award at the Computational Intelligence and Data Mining workshop at the ITAT conference 2016.

I have won the Best Student Paper Prize at the Symposium on Combinatorial Search 2011.

Recent and selected publications

Lars Kotthoff, Chris Thornton, Holger H. Hoos, Frank Hutter, Kevin Leyton-Brown
Auto-WEKA 2.0: Automatic model selection and hyperparameter optimization in WEKA
Journal of Machine Learning Research, 18(25):1–5, 2017
PDF, bibtex, abstract
WEKA is a widely used, open-source machine learning platform. Due to its intuitive interface, it is particularly popular with novice users. However, such users often find it hard to identify the best approach for their particular dataset among the many available. We describe the new version of Auto-WEKA, a system designed to help such users by automatically searching through the joint space of WEKA's learning algorithms and their respective hyperparameter settings to maximize performance, using a state-of-the-art Bayesian optimization method. Our new package is tightly integrated with WEKA, making it just as accessible to end users as any other learning algorithm.
Hans Degroote, Bernd Bischl, Lars Kotthoff, Patrick De Causmaecker
Reinforcement Learning for Automatic Online Algorithm Selection - an Empirical Study
ITAT Computational Intelligence and Data Mining workshop, Tatranské Matliare, High Tatras, Slovakia, September 2016
preprint PDF, bibtex, abstract
In this paper a reinforcement learning methodology for automatic online algorithm selection is introduced and empirically tested. It is applicable to automatic algorithm selection methods that predict the performance of each available algorithm and then pick the best one. The experiments confirm the usefulness of the methodology: using online data results in better performance. As in many online learning settings, an exploration vs. exploitation trade-off (synonymously, a learning vs. earning trade-off) is incurred. Empirically investigating the quality of classic solution strategies for handling this trade-off in the automatic online algorithm selection setting is the secondary goal of this paper. The automatic online algorithm selection problem can be modelled as a contextual multi-armed bandit problem. Two classic strategies for solving this problem are tested in the context of automatic online algorithm selection: epsilon-greedy and lower confidence bound. The experiments show that a simple purely exploitative greedy strategy outperforms strategies explicitly performing exploration.
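As a toy sketch of the epsilon-greedy strategy the paper evaluates (hypothetical runtime data and simplified bookkeeping, not the paper's implementation — minimizing observed mean runtime stands in for maximizing performance):

```python
import random

def epsilon_greedy_select(mean_runtimes, epsilon=0.1):
    """Explore a random algorithm with probability epsilon, else exploit."""
    if random.random() < epsilon:
        return random.randrange(len(mean_runtimes))
    return min(range(len(mean_runtimes)), key=lambda i: mean_runtimes[i])

def run_online_selection(observed_runtimes, epsilon=0.1):
    """Simulate online algorithm selection over a stream of instances.

    observed_runtimes[t][i] is the (hypothetical) runtime of algorithm i
    on instance t; only the chosen algorithm's runtime is revealed,
    which is what makes the problem a bandit problem.
    """
    n_algos = len(observed_runtimes[0])
    totals = [0.0] * n_algos   # accumulated runtime per algorithm
    counts = [0] * n_algos     # number of times each algorithm was chosen
    history = []
    for instance in observed_runtimes:
        means = [totals[i] / counts[i] if counts[i] else 0.0
                 for i in range(n_algos)]
        choice = epsilon_greedy_select(means, epsilon)
        runtime = instance[choice]   # feedback only for the chosen algorithm
        totals[choice] += runtime
        counts[choice] += 1
        history.append((choice, runtime))
    return history
```

With epsilon = 0 this degenerates to the purely exploitative greedy strategy the paper finds surprisingly competitive; untried algorithms start with an optimistic mean of zero, so each still gets sampled at least once.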
Bernd Bischl, Michel Lang, Lars Kotthoff, Julia Schiffner, Jakob Richter, Erich Studerus, Giuseppe Casalicchio, Zachary M. Jones
mlr: Machine Learning in R
Journal of Machine Learning Research, 2016
preprint PDF, bibtex, abstract
The mlr package provides a generic, object-oriented, and extensible framework for classification, regression, survival analysis and clustering for the R language. It provides a unified interface to more than 160 basic learners and includes meta-algorithms and model selection techniques to improve and extend the functionality of basic learners with, e.g., hyperparameter tuning, feature selection, and ensemble construction. Parallel high-performance computing is natively supported. The package targets practitioners who want to quickly apply machine learning algorithms, as well as researchers who want to implement, benchmark, and compare their new methods in a structured environment.
Bernd Bischl, Pascal Kerschke, Lars Kotthoff, Marius Lindauer, Yuri Malitsky, Alexandre Fréchette, Holger Hoos, Frank Hutter, Kevin Leyton-Brown, Kevin Tierney, Joaquin Vanschoren
ASlib: A Benchmark Library for Algorithm Selection
Artificial Intelligence Journal, 2016 (in press)
preprint PDF, bibtex, abstract
The task of algorithm selection involves choosing an algorithm from a set of algorithms on a per-instance basis in order to exploit the varying performance of algorithms over a set of instances. The algorithm selection problem is attracting increasing attention from researchers and practitioners in AI. Years of fruitful applications in a number of domains have resulted in a large amount of data, but the community lacks a standard format or repository for this data. This situation makes it difficult to share and compare different approaches effectively, as is done in other, more established fields. It also unnecessarily hinders new researchers who want to work in this area.
To address this problem, we introduce a standardized format for representing algorithm selection scenarios and a repository that contains a growing number of data sets from the literature. Our format has been designed to be able to express a wide variety of different scenarios. To demonstrate the breadth and power of our platform, we describe a study that builds and evaluates algorithm selection models through a common interface. The results display the potential of algorithm selection to achieve significant performance improvements across a broad range of problems and algorithms.
Lars Kotthoff, Ciaran McCreesh, Christine Solnon
Portfolios of Subgraph Isomorphism Algorithms
Learning and Intelligent OptimizatioN 10, Ischia, Italy, June 2016
preprint PDF, bibtex, abstract
Subgraph isomorphism is a computationally challenging problem with important practical applications, for example in computer vision, biochemistry, and model checking. There are a number of state-of-the-art algorithms for solving the problem, each of which has its own performance characteristics. As with many other hard problems, the single best choice of algorithm overall is rarely the best algorithm on an instance-by-instance basis. We develop an algorithm selection approach which leverages novel features to characterise subgraph isomorphism problems and dynamically decides which algorithm to use on a per-instance basis. We demonstrate substantial performance improvements on a large set of hard benchmark problems. In addition, we show how algorithm selection models can be leveraged to gain new insights into what affects the performance of an algorithm.
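The per-instance selection idea can be sketched minimally as a nearest-neighbour selector over instance features (made-up feature vectors and runtimes; the paper's actual features and models are far more sophisticated):

```python
def select_algorithm(features, training_data):
    """Pick the algorithm that was fastest on the most similar training instance.

    training_data: list of (feature_vector, runtimes_per_algorithm) pairs.
    All data here is illustrative, not from the paper.
    """
    def sq_dist(a, b):
        # squared Euclidean distance in feature space
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # find the training instance closest to the new instance ...
    _, runtimes = min(training_data, key=lambda fr: sq_dist(fr[0], features))
    # ... and return the index of the algorithm that was fastest on it
    return min(range(len(runtimes)), key=lambda i: runtimes[i])
```

Real selectors typically train regression or classification models on the features instead of a single nearest neighbour, but the interface is the same: features in, algorithm choice out.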
Chris Fawcett, Lars Kotthoff, Holger Hoos
Hot-Rodding the Browser Engine: Automatic Configuration of Javascript Compilers
International Symposium on Code Generation and Optimization, Barcelona, Spain, March 2016 (poster presentation)
poster (PDF), abstract
Software systems in many application areas offer to the user a multitude of parameters, switches and other customisation hooks. Humans tend to have difficulties determining the best configurations of these parameters for particular applications. Modern optimising compilers are an example of such software systems; they have hundreds of configurable parameters which need to be tuned for optimal performance, but which interact in unpredictable ways and are often left at their default values.
In this work, we automatically determine compiler parameter settings that result in optimised performance for particular applications. Specifically, we apply a state-of-the-art automated parameter configuration procedure based on sequential, model-based, gradient-free optimisation techniques to two prominent JavaScript compilers and demonstrate that runtime performance improvements, more than 35% in some cases, can be achieved over the default parameter settings on a diverse set of benchmarks. We investigate the performance of this approach applied to entire benchmark suites, as well as to individual benchmarks, showing that it is possible to obtain significant performance gains as the considered tasks get more specific. We also present an analysis of which compiler optimisation parameters (or parameter categories) are most responsible for the increased performance in each of our considered cases.
Alexandre Fréchette, Lars Kotthoff, Tomasz Michalak, Talal Rahwan, Holger H. Hoos, Kevin Leyton-Brown
Using the Shapley Value to Analyze Algorithm Portfolios
30th AAAI Conference on Artificial Intelligence, Phoenix, USA, February 2016
preprint PDF, bibtex, abstract
Algorithms for NP-complete problems often have different strengths and weaknesses, and thus algorithm portfolios often outperform individual algorithms. It is surprisingly difficult to quantify a component algorithm's contribution to such a portfolio. Reporting a component's standalone performance wrongly rewards near-clones while penalizing algorithms that have small but distinct areas of strength. Measuring a component's marginal contribution to an existing portfolio is better, but penalizes sets of strongly correlated algorithms, thereby obscuring situations in which it is essential to have at least one algorithm from such a set. This paper argues for analyzing component algorithm contributions via a measure drawn from coalitional game theory — the Shapley value — and yields insight into a research community's progress over time. We conclude with an application of the analysis we advocate to SAT competitions, yielding novel insights into the behaviour of algorithm portfolios, their components, and the state of SAT solving technology.
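The Shapley value analysis the paper advocates can be sketched for a toy portfolio by exact enumeration over player orderings (hypothetical runtime matrix and a solved-instances value function; exact enumeration is only feasible for very small portfolios):

```python
from itertools import permutations

def shapley_values(runtimes, timeout):
    """Exact Shapley values for a small algorithm portfolio.

    runtimes[a][i]: runtime of algorithm a on instance i (hypothetical data).
    Coalition value: number of instances solved within the timeout by the
    coalition's virtual best solver.
    """
    n = len(runtimes)
    n_instances = len(runtimes[0])

    def value(coalition):
        if not coalition:
            return 0
        return sum(1 for i in range(n_instances)
                   if min(runtimes[a][i] for a in coalition) <= timeout)

    # Shapley value = average marginal contribution over all join orders
    shapley = [0.0] * n
    perms = list(permutations(range(n)))
    for perm in perms:
        coalition = []
        for a in perm:
            before = value(coalition)
            coalition.append(a)
            shapley[a] += value(coalition) - before
    return [s / len(perms) for s in shapley]
```

This reproduces the qualitative point of the paper: two near-clones split the credit for the instances they both solve, while an algorithm with a small but unique area of strength keeps full credit for it.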
Lars Kotthoff, Barry O'Sullivan, S. S. Ravi, Ian Davidson
Complex Clustering Using Constraint Programming: Modelling Electoral Map Creation
ModRef 2015: The Fourteenth International Workshop on Constraint Modelling and Reformulation at the 21st International Conference on Principles and Practice of Constraint Programming, Cork, Ireland, August 2015
preprint PDF, bibtex, abstract
Traditional clustering is limited to a single collection of objects, described by a set of features under simple objectives and constraints. Though this setting can scale to huge data sets, many real world problems do not fit it. Consider the problem motivating this work: creating electoral district maps. Not only are two sets of objects (electoral districts and elected officials) separately clustered simultaneously under complex constraints, the clusters must be matched and it is required to find a global optimum. Existing formulations of clustering such as those using procedural languages or convex programming cannot handle such complex settings. In this paper we explore clustering this complex setting using constraint programming. We implement our methods in the Numberjack language and make use of large-scale solvers such as Gurobi which exploit multi-core architectures.
Lars Kotthoff, Mirco Nanni, Riccardo Guidotti, Barry O'Sullivan
Find Your Way Back: Mobility Profile Mining with Constraints
21st International Conference on Principles and Practice of Constraint Programming, Cork, Ireland, August 2015
preprint PDF, bibtex, abstract
Mobility profile mining is a data mining task that can be formulated as clustering over movement trajectory data. The main challenge is to separate the signal from the noise, i.e. one-off trips. We show that standard data mining approaches suffer the important drawback that they cannot take the symmetry of non-noise trajectories into account. That is, if a trajectory has a symmetric equivalent that covers the same trip in the reverse direction, it should become more likely that neither of them is labelled as noise. We present a constraint model that takes this knowledge into account to produce better clusters. We show the efficacy of our approach on real-world data that was previously processed using standard data mining techniques.
Lars Kotthoff, Pascal Kerschke, Holger Hoos, Heike Trautmann
Improving the State of the Art in Inexact TSP Solving using Per-Instance Algorithm Selection
Learning and Intelligent OptimizatioN 9, Lille, France, January 2015
preprint PDF, bibtex, abstract
We investigate per-instance algorithm selection techniques for solving the Travelling Salesman Problem (TSP), based on the two state-of-the-art inexact TSP solvers, LKH and EAX. Our comprehensive experiments demonstrate that the solvers exhibit complementary performance across a diverse set of instances, and the potential for improving the state of the art by selecting between them is significant. Using TSP features from the literature as well as a set of novel features, we show that we can capitalise on this potential by building an efficient selector that achieves significant performance improvements in practice. Our selectors represent a significant improvement in the state-of-the-art in inexact TSP solving, and hence in the ability to find optimal solutions (without proof of optimality) for challenging TSP instances in practice.
Ian P. Gent, Lars Kotthoff
Recomputation.org: Experience of Its First Year and Lessons Learned
Recomputability 2014, London, UK, December 2014
preprint PDF, bibtex, abstract
We founded recomputation.org about 18 months ago as we write. The site is intended to serve as a repository for computational experiments, embodied in virtual machines so that they can be recomputed at will by other researchers. We reflect in this paper on those aspects of recomputation.org that have worked well, those that have worked less well, and to what extent our views have changed on reproducibility in computational science.
Bilal Hussain, Ian P. Gent, Christopher Jefferson, Lars Kotthoff, Ian Miguel, Glenna F. Nightingale, Peter Nightingale
Discriminating Instance Generation for Automated Constraint Model Selection
20th International Conference on Principles and Practice of Constraint Programming, Lyon, France, September 2014
preprint PDF, bibtex, abstract
One approach to automated constraint modelling is to generate, and then select from, a set of candidate models. This method is used by the automated modelling system CONJURE. To select a preferred model or set of models for a problem class from the candidates, CONJURE uses a set of training instances drawn from the target class. It is important that the training instances are discriminating. If all models solve a given instance in a trivial amount of time, or if no models solve it in the time available, then the instance is not useful for model selection. This paper addresses the task of generating small sets of discriminating training instances automatically. The instance space is determined by the parameters of the associated problem class. We develop a number of methods of finding parameter configurations that give discriminating training instances, some of them leveraging existing parameter-tuning techniques. Our experimental results confirm the success of our approach in reducing a large set of input models to a small set that we can expect to perform well for the given problem class.
Peter George Johnson, Tina Balke, Lars Kotthoff
Integrating optimisation and agent-based modelling
28th European Conference on Modelling & Simulation, Brescia, Italy, May 2014
preprint PDF, bibtex, abstract
A key strength of agent-based modelling is the ability to explore the upward-causation of micro-dynamics on the macro-level behaviour of a system. However, in policy contexts, it is also important to be able to represent downward-causation from the macro and meso-levels to the micro, and to represent decision-making at the macro level (i.e., by governments) in a sensible way. Though we cannot model political processes easily, we can try to optimise decisions given certain stated goals (e.g., minimum cost, or maximum impact). Optimisation offers one potential method to model macro-level decisions in this way. This paper presents the implementation of an integration of optimisation with agent-based modelling for the example of an auction scenario of government support for the installation of photovoltaic solar panels by households. Auction type scenarios of this kind, in which large groups of individuals or organisations make bids for subsidies or contracts from government, are common in many policy domains.
Barry Hurley, Lars Kotthoff, Yuri Malitsky, Barry O'Sullivan
Proteus: A Hierarchical Portfolio of Solvers and Transformations
Eleventh International Conference on Integration of Artificial Intelligence (AI) and Operations Research (OR) Techniques in Constraint Programming, Cork, Ireland, May 2014
preprint PDF, bibtex, abstract
In recent years, portfolio approaches to solving SAT problems and CSPs have become increasingly common. There are also a number of different encodings for representing CSPs as SAT instances. In this paper, we leverage advances in both SAT and CSP solving to present a novel hierarchical portfolio-based approach to CSP solving, which we call Proteus, that does not rely purely on CSP solvers. Instead, it may decide that it is best to encode a CSP problem instance into SAT, selecting an appropriate encoding and a corresponding SAT solver. Our experimental evaluation uses an instance of Proteus that involved four CSP solvers, three SAT encodings, and six SAT solvers, evaluated on the most challenging problem instances from the CSP solver competitions, involving global and intensional constraints. We demonstrate that Proteus achieves significant performance improvements by exploiting alternative viewpoints and solvers for combinatorial problem solving.
Lars Kotthoff
Algorithm Selection for Combinatorial Search Problems: A Survey
AI Magazine, 2014
preprint PDF, bibtex, abstract
The Algorithm Selection Problem is concerned with selecting the best algorithm to solve a given problem instance on a case-by-case basis. It has become especially relevant in the last decade, with researchers increasingly investigating how to identify the most suitable existing algorithm for solving a problem instance instead of developing new algorithms. This survey presents an overview of this work focusing on the contributions made in the area of combinatorial search problems, where algorithm selection techniques have achieved significant performance improvements. We unify and organise the vast literature according to criteria that determine algorithm selection systems in practice. The comprehensive classification of approaches identifies and analyses the different directions from which algorithm selection has been approached. This paper contrasts and compares different methods for solving the problem as well as ways of using these solutions.
Lars Kotthoff, Ian Gent, Ian Miguel
An Evaluation of Machine Learning in Algorithm Selection for Search Problems
AI Communications, 2012
preprint PDF, bibtex, abstract
Machine learning is an established method of selecting algorithms to solve hard search problems. Despite this, to date no systematic comparison and evaluation of the different techniques has been performed and the performance of existing systems has not been critically compared with other approaches. We compare the performance of a large number of different machine learning techniques from different machine learning methodologies on five data sets of hard algorithm selection problems from the literature. In addition to well-established approaches, for the first time we also apply statistical relational learning to this problem. We demonstrate that there is significant scope for improvement both compared with existing systems and in general. To guide practitioners, we close by giving clear recommendations as to which machine learning techniques are likely to achieve good performance in the context of algorithm selection problems. In particular, we show that linear regression and alternating decision trees have a very high probability of achieving better performance than always selecting the single best algorithm.
Lars Kotthoff
Hybrid Regression-Classification Models for Algorithm Selection
20th European Conference on Artificial Intelligence, Montpellier, France, August 2012
preprint PDF, bibtex, abstract
Many state-of-the-art Algorithm Selection systems use Machine Learning either to predict the run time or a similar performance measure of each of a set of algorithms and choose the algorithm with the best predicted performance, or to predict the best algorithm directly. We present a technique based on the well-established Machine Learning technique of stacking that combines the two approaches into a new hybrid approach and predicts the best algorithm based on predicted run times. We demonstrate significant performance improvements of up to a factor of six compared to the previous state of the art. Our approach is widely applicable and does not place any restrictions on the performance measure used, the way to predict it or the Machine Learning used to predict the best algorithm. We investigate different ways of deriving new Machine Learning features from the predicted performance measures and evaluate their effectiveness in increasing performance further. We use five different regression algorithms for performance prediction on five data sets from the literature and present strong empirical evidence that shows the effectiveness of our approach.
Dharini Balasubramaniam, Christopher Jefferson, Lars Kotthoff, Ian Miguel, Peter Nightingale
An Automated Approach to Generating Efficient Constraint Solvers
34th International Conference on Software Engineering, Zurich, Switzerland, June 2012
preprint PDF, bibtex, abstract
Combinatorial problems appear in numerous settings, from timetabling to industrial design. Constraint solving aims to find solutions to such problems efficiently and automatically. Current constraint solvers are monolithic in design, accepting a broad range of problems. The cost of this convenience is a complex architecture, inhibiting efficiency, extensibility and scalability. Solver components are also tightly coupled with complex restrictions on their configuration, making automated generation of solvers difficult.
We describe a novel, automated, model-driven approach to generating efficient solvers tailored to individual problems and present some results from applying the approach. The main contribution of this work is a solver generation framework called Dominion, which analyses a problem and, based on its characteristics, generates a solver using components chosen from a library. The key benefit of this approach is the ability to solve larger and more difficult problems as a result of applying finer-grained optimisations and using specialised techniques as required.
Lars Kotthoff, Ian Miguel, Peter Nightingale
Ensemble classification for constraint solver configuration
16th International Conference on Principles and Practice of Constraint Programming, St Andrews, Scotland, September 2010
preprint PDF, bibtex, abstract
The automatic tuning of the parameters of algorithms and automatic selection of algorithms has received a lot of attention recently. One possible approach is the use of machine learning techniques to learn classifiers which, given the characteristics of a particular problem, make a decision as to which algorithm or what parameters to use. Little research has been done into which machine learning algorithms are suitable and the impact of picking the "right" over the "wrong" technique. This paper investigates the differences in performance of several techniques on different data sets. It furthermore provides evidence that by using a meta-technique which combines several machine learning algorithms, we can avoid the problem of having to pick the "best" one and still achieve good performance.
Ian Gent, Chris Jefferson, Lars Kotthoff, Ian Miguel, Neil Moore, Peter Nightingale, Karen Petrie
Learning When to Use Lazy Learning in Constraint Solving
19th European Conference on Artificial Intelligence, Lisbon, Portugal, August 2010
preprint PDF, bibtex, abstract
Learning in the context of constraint solving is a technique by which previously unknown constraints are uncovered during search and used to speed up subsequent search. Recently, lazy learning, similar to a successful idea from satisfiability modulo theories solvers, has been shown to be an effective means of incorporating constraint learning into a solver. Although a powerful technique to reduce search in some circumstances, lazy learning introduces a substantial overhead, which can outweigh its benefits. Hence, it is desirable to know beforehand whether or not it is expected to be useful. We approach this problem using machine learning (ML). We show that, in the context of a large benchmark set, standard ML approaches can be used to learn a simple, cheap classifier which performs well in identifying instances on which lazy learning should or should not be used. Furthermore, we demonstrate significant performance improvements of a system using our classifier and the lazy learning and standard constraint solvers over a standard solver. Through rigorous cross-validation across the different problem classes in our benchmark set, we show the general applicability of our learned classifier.

Software

I am maintainer of the FSelector R package.

I am author and maintainer of LLAMA, an R package to simplify common algorithm selection tasks such as training a classifier as a portfolio selector.

I am one of the core contributors to the mlr R package.

I am leading the Auto-WEKA project, which brings automated machine learning to WEKA.

Other

If I'm not in the office, it's possible that you can find me in the jungle of Belize excavating and/or mapping Maya ruins. Check out the interactive map. Failing that, I might be making fancy drinks while fighting spider monkeys. I also like taking pictures.