Hydra: Algorithm Configuration for Portfolio-based selection

Bioinformatics, and Empirical & Theoretical Algorithmics Laboratory (β-Lab)
Department of Computer Science
The University of British Columbia

 

Quickstart Guide to Hydra

(Adapted from the SMAC quickstart guide)
Hydra tries to find a portfolio of parameter settings that optimizes the performance of portfolio-based algorithm selection. It builds the portfolio greedily. In every iteration of Hydra, SMAC searches for a parameter setting that, when added to your current portfolio, most improves the portfolio's performance according to some metric you want to minimize (such as runtime, error, or cost).
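Here is a minimal Ruby sketch of the greedy idea. It uses Ruby because the example wrapper is a Ruby script. The sketch assumes a small, fixed list of candidate configurations whose per-instance runtimes are already known. The names cfg_a, cfg_b and cfg_c, and all of the numbers, are hypothetical. A portfolio is scored by the mean runtime of its virtual best solver (VBS). Real Hydra does not scan a fixed list: it runs SMAC in each iteration to search the full configuration space.

```ruby
# Hypothetical per-instance runtimes (seconds) of three candidate
# configurations on three training instances. These numbers are invented
# for illustration; Hydra would obtain them by running the target algorithm.
RUNTIMES = {
  "cfg_a" => [1.0, 9.0, 5.0],
  "cfg_b" => [8.0, 1.0, 6.0],
  "cfg_c" => [4.0, 3.0, 1.0]
}.freeze

# Mean runtime of the virtual best solver (VBS) over a portfolio: on each
# instance, the portfolio is credited with its fastest configuration.
def vbs_mean(portfolio, runtimes)
  n = runtimes[portfolio.first].size
  per_instance = (0...n).map { |i| portfolio.map { |c| runtimes[c][i] }.min }
  per_instance.sum / n
end

# Greedy construction: each iteration adds the candidate whose inclusion
# yields the lowest VBS mean runtime. (Hydra searches the whole configuration
# space with SMAC at this step; here we only scan a fixed candidate list.)
def greedy_portfolio(runtimes, iterations)
  portfolio = []
  iterations.times do
    remaining = runtimes.keys - portfolio
    break if remaining.empty?
    portfolio << remaining.min_by { |c| vbs_mean(portfolio + [c], runtimes) }
  end
  portfolio
end

portfolio = greedy_portfolio(RUNTIMES, 2)
puts portfolio.inspect   # prints ["cfg_c", "cfg_a"]
```

On its own, cfg_a is no better than cfg_b (both average 5.0 s). However, cfg_a complements cfg_c, so the second greedy step picks it.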

Contents

Setting up SMAC scenario
Running Hydra with SMAC scenario
Distributed Execution

Setting up SMAC scenario

To use SMAC for this purpose, you need to tell it about your algorithm's parameters and how to evaluate your algorithm's performance. This section shows how to do this. First, make sure you have the Hydra distribution, then execute:

unzip Hydra-1.0-development-{commit}-dist.zip
cd Hydra-1.0-development-{commit}-dist

{commit} is the identifier of the commit the distribution was built from (for example, 5013393).

Here, we'll walk through a basic example with the saps solver. To run the example saps SMAC scenario, execute the following command from the root folder:

cd example_scenarios/saps
../../bin/smac --scenario saps-scenario.txt

The saps-scenario.txt file contains the following options:
  
algo = ruby ./saps_wrapper.rb
execdir = ./
deterministic = 0
run_obj = runtime
overall_obj = mean
algo-cutoff-time=20
wallclock-limit=50
rungroup=hydra_testing
paramfile = ./saps-params.pcs
instance_file = ./instances-train.txt
test_instance_file = ./instances-test.txt
  
algo specifies the wrapper script, which takes an instance and a configuration and evaluates the algorithm on them; see the Wrapper section for details.
paramfile specifies the saps parameter configuration space (.pcs) file. Detailed information about how to specify parameters of the various types (continuous, integer, categorical, conditional, etc.) can be found in Section 4.4 of the SMAC manual.
deterministic = 0 specifies that the target algorithm is nondeterministic. SMAC will then execute promising configurations multiple times, using different seeds for the algorithm's pseudo-random number generator.
run_obj, overall_obj and algo-cutoff-time specify that we aim to minimize mean runtime, with a maximal per-run cutoff of 20 seconds (this is a toy example).
wallclock-limit sets a wall-clock timeout of 50 seconds for SMAC; in practice, we often set this to 172800 seconds (2 days).
instance_file and test_instance_file specify the training and test sets of SAT instances, respectively.
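The scenario file is a plain list of options, one per line, written as key = value. The spaces around = are optional, as the algo-cutoff-time line shows. The following Ruby sketch reads such a file into a hash. Note that parse_scenario is a hypothetical helper for illustration and is not part of SMAC or Hydra. It also ignores the option aliasing and validation that SMAC itself performs.

```ruby
# Parses "key = value" lines as in saps-scenario.txt. A simplified sketch:
# SMAC's real option handling (aliases, validation, defaults) is richer.
def parse_scenario(text)
  text.each_line.each_with_object({}) do |raw, opts|
    key, value = raw.strip.split("=", 2).map(&:strip)
    next if key.nil? || key.empty? || value.nil? # skip blank or malformed lines
    opts[key] = value
  end
end

SCENARIO = <<~TXT
  algo = ruby ./saps_wrapper.rb
  execdir = ./
  deterministic = 0
  run_obj = runtime
  overall_obj = mean
  algo-cutoff-time=20
  wallclock-limit=50
  rungroup=hydra_testing
  paramfile = ./saps-params.pcs
  instance_file = ./instances-train.txt
  test_instance_file = ./instances-test.txt
TXT

opts = parse_scenario(SCENARIO)
puts opts["algo"]                  # prints ruby ./saps_wrapper.rb
puts opts["algo-cutoff-time"].to_f # prints 20.0
```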

Instances

Below is the ./instances-train.txt file.

./instances/train/SWlin2006.10286.cnf
./instances/train/SWlin2006.19724.cnf
./instances/train/SWlin2006.2705.cnf
./instances/train/SWlin2006.4072.cnf
./instances/train/SWlin2006.8287.cnf

Each line in this file contains a string specifying an instance. In each algorithm run it performs, SMAC passes one of these strings to the wrapper. Note that an instance does not have to be an actual file. For example, in cross-validation the instance file could simply list the number i on line i, and the wrapper could use that information to evaluate only cross-validation fold i in a single algorithm run. We included only 5 instances in this example set to keep the download small. Normally we would use a much larger set to avoid over-tuning to a particular set of instances. Below is the ./instances-test.txt file, which specifies a disjoint test set of instances for offline validation after SMAC finishes:

./instances/test/SWlin2006.11218.cnf
./instances/test/SWlin2006.12713.cnf
./instances/test/SWlin2006.18830.cnf
./instances/test/SWlin2006.4166.cnf
./instances/test/SWlin2006.4435.cnf
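Because the test set should be disjoint from the training set, a quick sanity check before launching SMAC can save a wasted run. This Ruby sketch reads both lists and compares entries by file name. That way, the same instance stored under both train/ and test/ would still be caught. Instance strings that are not file paths, such as fold numbers, are compared as-is. Note that read_instances and overlapping_instances are hypothetical helpers.

```ruby
# Instance files list one instance string per line; blank lines are ignored.
def read_instances(text)
  text.lines.map(&:strip).reject(&:empty?)
end

# Returns instance names present in both lists, compared by file name so that
# the same instance stored under train/ and test/ is still detected.
def overlapping_instances(train, test)
  train.map { |path| File.basename(path) } & test.map { |path| File.basename(path) }
end

TRAIN = read_instances(<<~TXT)
  ./instances/train/SWlin2006.10286.cnf
  ./instances/train/SWlin2006.19724.cnf
  ./instances/train/SWlin2006.2705.cnf
  ./instances/train/SWlin2006.4072.cnf
  ./instances/train/SWlin2006.8287.cnf
TXT

TEST = read_instances(<<~TXT)
  ./instances/test/SWlin2006.11218.cnf
  ./instances/test/SWlin2006.12713.cnf
  ./instances/test/SWlin2006.18830.cnf
  ./instances/test/SWlin2006.4166.cnf
  ./instances/test/SWlin2006.4435.cnf
TXT

overlap = overlapping_instances(TRAIN, TEST)
puts overlap.empty? ? "training and test sets are disjoint" : "overlap: #{overlap.join(', ')}"
```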

Parameters

SMAC natively handles a range of parameter types, including:
- categorical parameters (such as choices between different heuristics),
- continuous parameters (such as scaling factors),
- integer parameters (such as step sizes), and
- conditional parameters (parameters that are only relevant depending on the values of other "parent" parameters).

The --param-file option (paramfile in the scenario file) specifies the .pcs (*p*arameter *c*onfiguration *s*pace) file. This file lists the algorithm's parameters, their domains, and their default values, one per line. Here, we have four continuous parameters, specified using the format <parameter_name> real [<lower bound>, <upper bound>] [<default>]. The optional trailing log keyword, used here for alpha, tells SMAC to search that parameter on a logarithmic scale:
  
alpha real [1, 1.4] [1.189] log
rho real [0, 1] [0.5]
ps real [0, 0.2] [0.1]
wp real [0, 0.06] [0.03]
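Each real-valued line in the .pcs file follows the pattern name real [lower, upper] [default]. It may be followed by log (as on alpha), which in SMAC's pcs format marks a parameter to be searched on a logarithmic scale. The Ruby sketch below parses just this one line type and checks that each default lies within its bounds. Note that parse_real_param is a hypothetical helper. Integer, categorical and conditional lines are deliberately not handled.

```ruby
# Matches one real-valued parameter line: name real [lower, upper] [default] [log]
REAL_PARAM = /\A(\S+)\s+real\s+\[([^,\]]+),([^\]]+)\]\s+\[([^\]]+)\](\s+log)?\z/

RealParam = Struct.new(:name, :lower, :upper, :default, :log_scale)

def parse_real_param(line)
  m = REAL_PARAM.match(line.strip)
  raise ArgumentError, "not a real-parameter line: #{line.inspect}" unless m
  # String#to_f ignores the leading space left after the comma.
  lower, upper, default = m[2].to_f, m[3].to_f, m[4].to_f
  unless lower <= default && default <= upper
    raise ArgumentError, "default of #{m[1]} lies outside [#{lower}, #{upper}]"
  end
  RealParam.new(m[1], lower, upper, default, !m[5].nil?)
end

PCS = <<~TXT
  alpha real [1, 1.4] [1.189] log
  rho real [0, 1] [0.5]
  ps real [0, 0.2] [0.1]
  wp real [0, 0.06] [0.03]
TXT

params = PCS.lines.map { |line| parse_real_param(line) }
params.each do |prm|
  scale = prm.log_scale ? " (log scale)" : ""
  puts "#{prm.name}: [#{prm.lower}, #{prm.upper}], default #{prm.default}#{scale}"
end
```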
  

Wrapper

The final piece of this configuration scenario is the wrapper around saps. The line algo = ruby ./saps_wrapper.rb specifies the *wrapper* that SMAC executes, using a prespecified syntax, to evaluate the algorithm being optimized.
This wrapper script takes an instance and an instantiation of the parameters as input, runs the algorithm with these parameters, and reports how well it did. Since every algorithm has its own input and output format, the wrapper acts as a mediator between the algorithm and SMAC. SMAC executes the wrapper through a command-line call. For example, to evaluate saps at (alpha, ps, rho, wp) = (1.189, 0.1, 0.5, 0.03), SMAC would make the equivalent of the following call:

ruby ./saps_wrapper.rb ./instances/train/SWlin2006.19724.cnf 0 20.0 2147483647 397015 -alpha '1.189' -ps '0.1' -rho '0.5' -wp '0.03'

which yields the following result:

Result for SMAC: SUCCESS, 0.03, 36354, 0, 397015

When you run the above example, SMAC aims to find values of the parameters (alpha, ps, rho, wp) that minimize the runtime of the saps solver, starting from the default values specified in the .pcs file. Whenever it finds a better configuration (a new incumbent), it outputs it, along with a sample call to the wrapper that can be executed on the command line:

[INFO ] Sample call for new incumbent config 1 (internal ID: 0x000D):
cd ./saps/.; ruby ./saps_wrapper.rb ./instances/train/SWlin2006.19724.cnf 0 20.0 2147483647 397015 -alpha '1.189' -ps '0.1' -rho '0.5' -wp '0.03'

As requested, SMAC terminates once its 50-second wall-clock limit has been reached and outputs the best configuration it found:

SMAC has finished. Reason: wall-clock time limit (52.349 s) has been reached.
Total number of runs performed: 9, total configurations tried: 4.
Total CPU time used: 34 s, total wallclock time used: 52 s.
SMAC's final incumbent: config 3 (internal ID: 0x7555), with estimated mean runtime: 3.0900000000000003, based on 4 run(s) on 4 training instance(s).
Sample call for this final incumbent:
cd /Users/Chris/Documents/Summer-2014/Hydra/build/install/Hydra/example_scenarios/saps/.; ruby ./saps_wrapper.rb ./instances/train/SWlin2006.2705.cnf 0 20.0 2147483647 6540762 -alpha '1.2031003770758288' -ps '0.13798053069675698' -rho '0.6859183221146163' -wp '0.04875576413277028'

Now, we give full details for the wrapper inputs and outputs. SMAC calls wrappers through a command line call as follows (for more information see Section 5.1.1 of the manual):

<algo> <instance> <instance_specifics> <runtime cutoff> <runlength> <seed> <solver parameters>

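The fields are as follows:
- <instance> is one of the strings from the instance file.
- <instance_specifics> is additional information about the instance (0 if there is none).
- <runtime cutoff> is the per-run time limit in seconds (the algo-cutoff-time from the scenario).
- <runlength> is a limit on run length (here 2147483647, the largest 32-bit integer, i.e. effectively no limit).
- <seed> is the random seed for the run.
- <solver parameters> are the parameter settings, given as -name 'value' pairs.

Thus, SMAC would, for example, call our saps wrapper like this: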

ruby ./saps_wrapper.rb ./instances/train/SWlin2006.19724.cnf 0 20.0 2147483647 397015 -alpha '1.189' -ps '0.1' -rho '0.5' -wp '0.03'

which yields the following result:

Result for SMAC: SUCCESS, 0.03, 36354, 0, 397015
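A wrapper's first job is to unpack this argument vector. The Ruby sketch below shows one way to do it. It assumes the positional layout given above, with solver parameters passed as -name value pairs. Note that parse_wrapper_args and WrapperCall are hypothetical names and are not taken from saps_wrapper.rb.

```ruby
require "shellwords"

# Fields of one wrapper invocation, in the order SMAC passes them.
WrapperCall = Struct.new(:instance, :instance_specifics, :cutoff, :runlength, :seed, :params)

# Splits an argument vector (ARGV in a real wrapper) into its fields.
def parse_wrapper_args(argv)
  raise ArgumentError, "expected at least 5 positional arguments" if argv.size < 5
  instance, specifics, cutoff, runlength, seed, *rest = argv
  raise ArgumentError, "parameters must come in -name value pairs" if rest.size.odd?
  params = rest.each_slice(2).to_h { |flag, value| [flag.delete_prefix("-"), value] }
  WrapperCall.new(instance, specifics, Float(cutoff), Integer(runlength), Integer(seed), params)
end

# The shell strips the quotes around each value before the wrapper sees it;
# Shellwords.split mimics that here.
call = "./instances/train/SWlin2006.19724.cnf 0 20.0 2147483647 397015 " \
       "-alpha '1.189' -ps '0.1' -rho '0.5' -wp '0.03'"
args = parse_wrapper_args(Shellwords.split(call))
puts args.cutoff           # prints 20.0
puts args.params["alpha"]  # prints 1.189
```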

More generally, SMAC expects wrapper output in the following format (for more details see Section 5.1.2 of the manual):

Result for SMAC: <status>, <runtime>, <runlength>, <quality>, <seed>

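The fields are as follows:
- <status> reports the outcome of the run (SUCCESS in the example above; see the manual for the full list of allowed values).
- <runtime> is the runtime of the run in seconds.
- <runlength> and <quality> report the run length and the solution quality.
- <seed> echoes the seed SMAC passed in.

To help you write a robust wrapper quickly, we provide a generic wrapper that you can easily adapt to wrap your own algorithm.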
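On the output side, the wrapper only needs to print one line in this format. The sketch below builds that line and, for testing your own wrapper, parses it back. Note that smac_result_line and parse_result_line are hypothetical helpers. The regular expression is our own strict check, not SMAC's actual parser.

```ruby
# Builds the single line SMAC reads from the wrapper's standard output.
def smac_result_line(status:, runtime:, runlength:, quality:, seed:)
  "Result for SMAC: #{status}, #{runtime}, #{runlength}, #{quality}, #{seed}"
end

RESULT_LINE = /\AResult for SMAC:\s*(\w+),\s*([^,]+),\s*([^,]+),\s*([^,]+),\s*(\S+)\z/

# Parses such a line back into its fields; returns nil if it does not match.
def parse_result_line(line)
  m = RESULT_LINE.match(line.strip)
  return nil unless m
  { status: m[1], runtime: m[2].to_f, runlength: m[3].to_f,
    quality: m[4].to_f, seed: Integer(m[5]) }
end

line = smac_result_line(status: "SUCCESS", runtime: 0.03, runlength: 36354, quality: 0, seed: 397015)
puts line # prints Result for SMAC: SUCCESS, 0.03, 36354, 0, 397015
```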

Running Hydra with SMAC scenario

If you have reached this step and understand how to create a SMAC scenario, we recommend that you create your own wrapper, .pcs file and scenario file for your application. Test them with SMAC before moving on to Hydra. Once SMAC runs correctly on your scenario, moving on to Hydra should be simple. You can test your SMAC scenario from the root folder with:

./bin/smac --scenario <path-to-your-scenario-file>

Below, we walk through a Hydra execution using the above SMAC saps scenario.

Basic execution

This section walks through the execution of a very simple example. To execute the example locally, run the following from the root folder:

> cd example_scenarios/saps
> ./run_saps-local.sh

The ./run_saps-local.sh script contains the following call:

../../bin/hydra --num-iterations 4 --num-smac-runs 2 --num-configs-per-iter 1 --rungroup Hydra_Saps --num-run 1 --smacOptions ./saps-scenario.txt --smac-execution-options ./smac-execution-options-local.txt

This call string corresponds to a Hydra run with 4 iterations (--num-iterations) that uses the SMAC scenario file ./saps-scenario.txt (--smacOptions). In each iteration, 2 SMAC executions (--num-smac-runs) are run and 1 incumbent (--num-configs-per-iter) is added to the portfolio. The --smac-execution-options option must point to a file containing options that control how the SMAC runs are executed. To run locally, all the file needs to contain is:

tae=SMAC
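In the local setup, every SMAC execution runs sequentially, so the options above roughly determine how long the example takes. The short Ruby calculation below assumes that each SMAC run uses its full 50-second wallclock-limit, as the logs below suggest. It ignores test evaluation and Hydra's own overhead, so treat the result as a rough estimate only.

```ruby
# Values from run_saps-local.sh and saps-scenario.txt.
iterations       = 4  # --num-iterations
smac_runs        = 2  # --num-smac-runs
configs_per_iter = 1  # --num-configs-per-iter
wallclock_limit  = 50 # wallclock-limit, in seconds, per SMAC run

total_smac_runs = iterations * smac_runs            # SMAC executions overall
portfolio_size  = iterations * configs_per_iter     # configurations in the final portfolio
smac_seconds    = total_smac_runs * wallclock_limit # if every run uses its full limit

puts "#{total_smac_runs} SMAC runs, #{portfolio_size} portfolio configurations, " \
     "about #{smac_seconds} s of SMAC time"
```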

See the distributed execution section for instructions on how to distribute your SMAC executions.

Here we describe the output. After the java call string, you will see the following printed to stdout:

[INFO ] Running Hydra for 4 iterations
[INFO ] **************Hydra iteration: 0***************
[INFO ] Generating SMAC runs
[INFO ] Executing 2 SMAC runs

Next, the 2 algorithm configuration (SMAC) runs are executed sequentially. Each one searches for a good configuration to add to the portfolio. For each SMAC run, you will see logging to stdout similar to:

SMAC started at: 18-Jun-2015 10:16:24 PM. Minimizing mean runtime.
[INFO ] First incumbent: config 1 (internal ID: 0xA31E1), with mean runtime: 0.03; estimate based on 1 runs.
[INFO ] Sample call for new incumbent config 1 (internal ID: 0xA31E1):
......
......
......
[INFO ] SMAC Execution Completed because: wall-clock time limit (53.79 s) has been reached.


After the SMAC executions complete, the best 1 of the 2 resulting configurations, according to training performance, is added to the portfolio. You can choose how many SMAC runs to perform and how many configurations to add to the portfolio in each iteration with the --num-smac-runs and --num-configs-per-iter command-line parameters.

[INFO ] SMAC runs complete. Processing run results...
[INFO ] Finding best 1 incumbents out 2 SMAC runs
[INFO ] Incumbents sorted by training performance on runs executed by SMAC...
[INFO ] Performance: 0.06941176470588237, Configuration: -alpha '1.3570229625905657' -ps '0.08955468111361481' -rho '0.4549265007328307' -wp '0.00844758766969784'
[INFO ] Performance: 3.311111111111112, Configuration: -alpha '1.189' -ps '0.1' -rho '0.5' -wp '0.03'
[INFO ] Found best incumbents.
[INFO ] Setting Best Single configuration. Only occurs in first Hydra iteration
[INFO ] Best single configuration: -alpha '1.3570229625905657' -ps '0.08955468111361481' -rho '0.4549265007328307' -wp '0.00844758766969784'
[INFO ] Adding new incumbent configurations...
[INFO ] Adding configuration: -alpha '1.3570229625905657' -ps '0.08955468111361481' -rho '0.4549265007328307' -wp '0.00844758766969784'
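The selection step in this log amounts to sorting the SMAC incumbents by training performance and keeping the top --num-configs-per-iter of them. Here is a minimal Ruby sketch of that step. It uses the two incumbents and performance values printed above. Note that Incumbent and best_incumbents are hypothetical names.

```ruby
# One incumbent per SMAC run, with its training performance (mean runtime in
# seconds, so lower is better). Values copied from the log lines above.
Incumbent = Struct.new(:performance, :config)

# Keeps the k incumbents with the best (lowest) training performance.
def best_incumbents(incumbents, k)
  incumbents.sort_by(&:performance).first(k)
end

runs = [
  Incumbent.new(3.311111111111112, "-alpha '1.189' -ps '0.1' -rho '0.5' -wp '0.03'"),
  Incumbent.new(0.06941176470588237,
                "-alpha '1.3570229625905657' -ps '0.08955468111361481' " \
                "-rho '0.4549265007328307' -wp '0.00844758766969784'")
]

best_incumbents(runs, 1).each { |inc| puts "Adding configuration: #{inc.config}" }
```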

Next, the portfolio's test performance is calculated on the provided set of test instances. You can evaluate portfolio performance either as the virtual best solver (VBS) over the portfolio or with the zilla algorithm selector. The portfolio's test performance is printed once all test instances have been evaluated.

[INFO ] Getting portfolio performance
[INFO ] Test performance of hydra after iteration 0: 0.148

After the new configurations are added to the portfolio, the next iteration begins in the same fashion. The difference is that algorithm performance now represents the VBS of all configurations contained in the portfolio.
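One way to measure performance as the VBS of the portfolio is to judge a candidate configuration by the portfolio it would produce. On each instance, the portfolio uses whichever is faster: the candidate or the current VBS. The Ruby sketch below assumes exactly this per-instance minimum rule, with made-up runtimes. This is an assumption on our part, not a description of Hydra's code. The sketch shows why the rule can favor a specialist that looks poor on its own.

```ruby
# Per-instance runtime of the current portfolio's VBS: the fastest portfolio
# member on each instance. Rows are configurations, columns are instances.
def portfolio_vbs(portfolio_runtimes)
  portfolio_runtimes.transpose.map(&:min)
end

# Mean runtime of the portfolio extended with `candidate`: on each instance the
# better of the candidate and the current VBS counts, so the candidate is only
# credited where it beats every configuration already in the portfolio.
def score_with_portfolio(candidate, vbs)
  candidate.zip(vbs).map { |c, v| [c, v].min }.sum / candidate.size
end

# Hypothetical runtimes (seconds) on three instances.
current    = [[1.0, 9.0, 5.0]] # a portfolio holding one configuration
generalist = [2.0, 2.0, 2.0]   # decent everywhere (mean 2.0 on its own)
specialist = [20.0, 0.5, 1.0]  # poor alone (mean about 7.17), strong where the portfolio is weak

vbs = portfolio_vbs(current)
puts score_with_portfolio(generalist, vbs).round(3) # prints 1.667
puts score_with_portfolio(specialist, vbs).round(3) # prints 0.833
```

On its own, the generalist beats the specialist. Added to this portfolio, however, the specialist yields the lower mean runtime, which is exactly the kind of complementary configuration a portfolio benefits from.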

[INFO ] **************Hydra iteration: 1***************
......
......
......

After 4 iterations, Hydra terminates and prints the final portfolio and its test performance:

[INFO ] Writing current portfolio to: /Users/Chris/Documents/software/releases/ca/ubc/cs/beta/Hydra/1.0-development-5013393/Hydra-1.0-development-5013393-dist/example_scenarios/saps/hydra-output/Hydra_Saps/Hydra-1/Portfolios/portfolio-3.txt
[INFO ] Best single configuration: -alpha '1.3570229625905657' -ps '0.08955468111361481' -rho '0.4549265007328307' -wp '0.00844758766969784'
[INFO ] Final portfolio configurations
[INFO ] 1. Portfolio Configuration :  -alpha '1.3570229625905657' -ps '0.08955468111361481' -rho '0.4549265007328307' -wp '0.00844758766969784'
[INFO ] 2. Portfolio Configuration :  -alpha '1.1223601730778767' -ps '0.07888475911986773' -rho '0.8047314559241363' -wp '0.00628190839060591'
[INFO ] 3. Portfolio Configuration :  -alpha '1.3778947812738795' -ps '0.14691893797877198' -rho '0.29807307383168247' -wp '0.03949665635243472'
[INFO ] 4. Portfolio Configuration :  -alpha '1.260589798073366' -ps '0.03020086893063898' -rho '0.23304825431044507' -wp '0.0010791984714169822'
[INFO ] Hydra performance by iteration: {0=0.148, 1=0.036000000000000004, 2=0.024, 3=0.022}

Distributed Execution

To distribute its SMAC executions, Hydra can store SMAC runs in a database. The database acts as a queue: jobs on a cluster grab SMAC runs from it and execute them, which distributes the computation across the cluster. Once you have set up your database, you can execute an example distributed Hydra run on a TORQUE cluster with the following sequence of commands:

cd ./example_scenarios/saps
./run_saps-database.sh --pool hydra_test 

...dispatches a Hydra run that stores its SMAC runs in the database.

cd ../../scripts
./dispatch_workers --wall-time 2:00:00:00 --num-workers 25 --pool hydra_test

...dispatches 25 workers (using the ./launchworker script) onto a TORQUE cluster for 2 days. You may have to adapt this script to your computational environment. (Note: make sure your workers run in your working directory.) Hydra and the worker jobs *must* use the same --pool parameter.
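The database-backed setup is an instance of the familiar work-queue pattern. As a rough single-machine analogy, the Ruby sketch below replaces the database with an in-memory Queue and the TORQUE jobs with threads. The run identifiers and worker count are hypothetical. Here, sharing one queue object plays the role that a shared --pool name plays for Hydra and its workers.

```ruby
# In-memory stand-in for the database: Hydra enqueues pending SMAC runs and
# workers keep taking the next run until the queue is drained.
pending = Queue.new
8.times { |i| pending << "smac-run-#{i}" } # hypothetical run identifiers
pending.close                              # no further runs will be enqueued

finished = Queue.new
workers = Array.new(3) do |worker_id|
  Thread.new do
    # Queue#pop blocks while the queue is empty but open, and returns nil
    # once it is closed and empty, which ends this worker's loop.
    while (run = pending.pop)
      finished << [worker_id, run] # a real worker would execute the SMAC run here
    end
  end
end
workers.each(&:join)

completed = Array.new(finished.size) { finished.pop }
puts "completed #{completed.size} runs"
```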

For detailed information on setting up your database, see Section 2.8 of manual.pdf in the ./doc directory.

The key differences between the local example and this distributed example are: