Hydra: Algorithm Configuration for Portfolio-based selection
Bioinformatics, and Empirical & Theoretical Algorithmics Laboratory (ß-Lab)
(Adapted from the SMAC quickstart guide)
Hydra tries to find a portfolio of parameter settings that optimizes the performance of portfolio-based algorithm selection. Hydra greedily adds to the portfolio the new parameter settings that most improve the portfolio's performance. In every iteration of Hydra, SMAC is used to look for a parameter setting that minimizes some performance metric (such as runtime, error, or cost) when added to your portfolio.
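The greedy loop described above can be sketched in Ruby as follows. This is a simplified, hypothetical illustration, not Hydra's implementation: instead of running SMAC in each iteration, it assumes a fixed pool of candidate configurations whose per-instance runtimes are already known, and it scores a portfolio by the mean runtime of its virtual best solver (VBS). The names vbs_mean and greedy_portfolio and all runtimes are made up.

```ruby
# Hypothetical sketch of Hydra-style greedy portfolio construction.
# Assumption: candidates come from a fixed pool with known per-instance
# runtimes; real Hydra runs SMAC in each iteration instead.

# Mean VBS runtime: on each instance, take the fastest configuration in the
# portfolio, then average these best runtimes over all instances.
def vbs_mean(portfolio, runtimes)
  num_instances = runtimes.values.first.length
  best = (0...num_instances).map { |i| portfolio.map { |cfg| runtimes[cfg][i] }.min }
  best.sum / num_instances.to_f
end

# In each iteration, add the candidate that yields the lowest VBS runtime
# when added to the current portfolio (ties go to the first candidate).
def greedy_portfolio(runtimes, iterations)
  portfolio = []
  iterations.times do
    remaining = runtimes.keys - portfolio
    break if remaining.empty?
    portfolio << remaining.min_by { |cfg| vbs_mean(portfolio + [cfg], runtimes) }
  end
  portfolio
end

# Made-up runtimes (in seconds) of three hypothetical configurations on three instances.
runtimes = {
  "cfg_a" => [1.0, 10.0, 10.0],
  "cfg_b" => [10.0, 1.0, 10.0],
  "cfg_c" => [4.0, 4.0, 4.0]
}

portfolio = greedy_portfolio(runtimes, 2)
puts portfolio.inspect               # ["cfg_c", "cfg_a"]
puts vbs_mean(portfolio, runtimes)   # 3.0
```

With these made-up runtimes, the all-rounder cfg_c is chosen first (it is the best single configuration), and the second pick is a specialist that complements it on the instance where it is much faster.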
Unzip the distribution and change into its directory:
unzip Hydra-1.0-development-{commit}-dist.zip
cd Hydra-1.0-development-{commit}-dist
where {commit} is an identifier of the latest commit (for example, 5013393).
To run SMAC on the example SAPS scenario:
cd example_scenarios/saps
../../bin/smac --scenario saps-scenario.txt
The saps-scenario.txt file contains the following options:
algo = ruby ./saps_wrapper.rb
execdir = ./
deterministic = 0
run_obj = runtime
overall_obj = mean
algo-cutoff-time=20
wallclock-limit=50
rungroup=hydra_testing
paramfile = ./saps-params.pcs
instance_file = ./instances-train.txt
test_instance_file = ./instances-test.txt
algo: specifies the script that takes an instance and a configuration and evaluates the algorithm. See the wrapper section for details.
paramfile: specifies the SAPS parameter configuration space file. Detailed information about how to specify parameters of the various types (continuous, integer, categorical, conditional, etc.) can be found in Section 4.4 of the SMAC manual.
deterministic: a value of 0 specifies that the target algorithm is nondeterministic; SMAC will then execute promising configurations multiple times, using different seeds for a pseudo-random number generator.
run_obj, overall_obj, algo-cutoff-time: specify that we are aiming to minimize mean runtime, with a maximal per-run cutoff of 20 seconds (this is a toy example).
wallclock-limit: sets a wall-clock timeout of 50 seconds for SMAC; in practice, we often set this to 172800 seconds (2 days).
instance_file, test_instance_file: specify the training and test sets of SAT instances, respectively.
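If you want to inspect a scenario file programmatically (for example, to check the cutoff before launching runs), a minimal Ruby sketch could read it into a Hash. The helper parse_scenario is hypothetical and assumes the simple one-option-per-line key = value layout shown above, with blank lines and lines starting with # ignored; consult the SMAC manual for the full scenario syntax.

```ruby
# Hypothetical helper: read a SMAC scenario file into a Hash.
# Assumption: one "key = value" or "key=value" option per line; blank lines
# and lines starting with '#' are skipped.
def parse_scenario(text)
  text.each_line.with_object({}) do |line, opts|
    line = line.strip
    next if line.empty? || line.start_with?("#")
    key, value = line.split("=", 2).map(&:strip)  # split on the first '=' only
    opts[key] = value
  end
end

scenario = <<~TXT
  algo = ruby ./saps_wrapper.rb
  run_obj = runtime
  algo-cutoff-time=20
  wallclock-limit=50
TXT

opts = parse_scenario(scenario)
puts opts["algo"]                       # ruby ./saps_wrapper.rb
puts Integer(opts["algo-cutoff-time"])  # 20
```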
Below is the ./instances-train.txt file:
./instances/train/SWlin2006.10286.cnf
./instances/train/SWlin2006.19724.cnf
./instances/train/SWlin2006.2705.cnf
./instances/train/SWlin2006.4072.cnf
./instances/train/SWlin2006.8287.cnf
Each line in this file contains a string specifying an instance; in each algorithm run, SMAC passes one of these strings to the wrapper. Note that an instance here does not have to be an actual file. For example, in cross-validation, the instance file could simply list the number i on line i, and the wrapper could use that information to evaluate only cross-validation fold i in a single algorithm run.
Note that we included only 5 instances in this example set to keep the download size small; normally the set would be much larger to avoid over-tuning to this particular set of instances.
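The cross-validation idea above could look like this inside a wrapper. This is a hypothetical sketch: the helper names read_instances and fold_items, the 1-based fold numbering, and the round-robin assignment of data items to folds are all assumptions, not part of SMAC or Hydra.

```ruby
# Hypothetical sketch of cross-validation via the instance file.
# Assumptions: the instance file lists fold numbers 1..k (one per line), and
# data items are assigned to folds round-robin by their position.

# Read an instance file: one instance string per non-empty line.
def read_instances(text)
  text.each_line.map(&:strip).reject(&:empty?)
end

# Items in fold i (1-based) out of k folds.
def fold_items(items, k, i)
  items.each_with_index.select { |_item, idx| idx % k == i - 1 }.map(&:first)
end

instance_strings = read_instances("1\n2\n3\n")  # instance file for k = 3 folds
data = (1..7).to_a                              # made-up data items
fold = Integer(instance_strings[1])             # SMAC would pass "2" as the instance
puts fold_items(data, instance_strings.length, fold).inspect  # [2, 5]
```

The wrapper would then evaluate the algorithm only on the items of that fold and report the result for this single run.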
Below is the ./instances-test.txt file, which specifies a disjoint test set of instances for offline validation after SMAC finishes:
./instances/test/SWlin2006.11218.cnf
./instances/test/SWlin2006.12713.cnf
./instances/test/SWlin2006.18830.cnf
./instances/test/SWlin2006.4166.cnf
./instances/test/SWlin2006.4435.cnf
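The ./saps-params.pcs file defines the parameter configuration space; each line gives a parameter's name, type, range, default value and, optionally, log to search the parameter on a log scale:
alpha real [1, 1.4] [1.189] log
rho real [0, 1] [0.5]
ps real [0, 0.2] [0.1]
wp real [0, 0.06] [0.03]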
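As a hedged illustration of this format, the following hypothetical Ruby sketch parses only real-valued lines like the ones above and checks that each default lies within its range. It does not cover the other parameter types or conditionals; Section 4.4 of the SMAC manual describes the complete syntax.

```ruby
# Hypothetical parser for real-valued pcs lines of the form
#   name real [min, max] [default] [log]
# Other parameter types, conditionals, and forbidden clauses are not handled.
Param = Struct.new(:name, :min, :max, :default, :log)

PCS_REAL = /\A(\S+)\s+real\s+\[([^,\]]+),([^\]]+)\]\s+\[([^\]]+)\](\s+log)?\z/

def parse_real_param(line)
  m = PCS_REAL.match(line.strip) or raise ArgumentError, "unsupported line: #{line.strip}"
  param = Param.new(m[1], Float(m[2].strip), Float(m[3].strip), Float(m[4].strip), !m[5].nil?)
  unless param.min <= param.default && param.default <= param.max
    raise ArgumentError, "default of #{param.name} lies outside its range"
  end
  param
end

pcs = <<~PCS
  alpha real [1, 1.4] [1.189] log
  rho real [0, 1] [0.5]
  ps real [0, 0.2] [0.1]
  wp real [0, 0.06] [0.03]
PCS

params = pcs.each_line.map { |line| parse_real_param(line) }
params.each do |param|
  scale = param.log ? ", log scale" : ""
  puts "#{param.name}: range [#{param.min}, #{param.max}], default #{param.default}#{scale}"
end
```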
algo = ruby ./saps_wrapper.rb
specifies the *wrapper* that SMAC executes with a prespecified syntax in order to evaluate the algorithm to be optimized. For example, a call to the wrapper looks like this:
ruby ./saps_wrapper.rb ./instances/train/SWlin2006.19724.cnf 0 20.0 2147483647 397015 -alpha '1.189' -ps '0.1' -rho '0.5' -wp '0.03'
which yields the following result:
Result for SMAC: SUCCESS, 0.03, 36354, 0, 397015
When you run the above example, SMAC aims to find parameter settings (alpha, ps, rho, wp) that minimize the runtime of the SAPS solver, starting from the default values specified in the pcs file. Whenever it finds a new, better configuration (a new incumbent), it outputs it along with a sample call string for the wrapper that can be executed on the command line:
[INFO ] Sample call for new incumbent config 1 (internal ID: 0x000D):
cd ./saps/.; ruby ./saps_wrapper.rb ./instances/train/SWlin2006.19724.cnf 0 20.0 2147483647 397015 -alpha '1.189' -ps '0.1' -rho '0.5' -wp '0.03'
As requested, SMAC will terminate after 50 seconds and output the final best solution it found.
SMAC has finished. Reason: wall-clock time limit (52.349 s) has been reached.
Total number of runs performed: 9, total configurations tried: 4.
Total CPU time used: 34 s, total wallclock time used: 52 s.
SMAC's final incumbent: config 3 (internal ID: 0x7555), with estimated mean runtime: 3.0900000000000003, based on 4 run(s) on 4 training instance(s).
Sample call for this final incumbent:
cd /Users/Chris/Documents/Summer-2014/Hydra/build/install/Hydra/example_scenarios/saps/.; ruby ./saps_wrapper.rb ./instances/train/SWlin2006.2705.cnf 0 20.0 2147483647 6540762 -alpha '1.2031003770758288' -ps '0.13798053069675698' -rho '0.6859183221146163' -wp '0.04875576413277028'
Now, we give full details for the wrapper inputs and outputs. SMAC calls wrappers through a command line call as follows (for more information see Section 5.1.1 of the manual):
<algo> <instance> <instance_specifics> <runtime cutoff> <runlength> <seed> <solver parameters>
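For example, in the call below, <instance> is ./instances/train/SWlin2006.19724.cnf, <instance_specifics> is 0, <runtime cutoff> is 20.0 seconds, <runlength> is 2147483647, and <seed> is 397015; the remaining arguments are the solver parameters: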
ruby ./saps_wrapper.rb ./instances/train/SWlin2006.19724.cnf 0 20.0 2147483647 397015 -alpha '1.189' -ps '0.1' -rho '0.5' -wp '0.03'
which yields the following result:
Result for SMAC: SUCCESS, 0.03, 36354, 0, 397015
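A wrapper must unpack these positional arguments. The following hypothetical Ruby sketch (not the actual saps_wrapper.rb) splits them into named fields; WrapperCall and parse_wrapper_args are made-up names. As an assumption, it also strips surrounding single quotes from parameter values in case they are passed through literally rather than removed by a shell.

```ruby
# Hypothetical sketch of unpacking SMAC's wrapper call:
#   <instance> <instance_specifics> <runtime cutoff> <runlength> <seed> <solver parameters>
# (the <algo> part is the wrapper itself, so it does not appear in ARGV).
WrapperCall = Struct.new(:instance, :instance_specifics, :cutoff, :runlength, :seed, :params)

def parse_wrapper_args(argv)
  raise ArgumentError, "expected at least 5 arguments" if argv.length < 5
  instance, specifics, cutoff, runlength, seed, *rest = argv
  raise ArgumentError, "parameters must come in -name value pairs" if rest.length.odd?
  params = rest.each_slice(2).to_h do |flag, value|
    raise ArgumentError, "unexpected argument #{flag}" unless flag.start_with?("-")
    # Drop surrounding single quotes in case they were not removed by a shell.
    [flag.delete_prefix("-"), value.delete_prefix("'").delete_suffix("'")]
  end
  WrapperCall.new(instance, specifics, Float(cutoff), Integer(runlength), Integer(seed), params)
end

# In a real wrapper this would be parse_wrapper_args(ARGV).
call = parse_wrapper_args(%w[./instances/train/SWlin2006.19724.cnf 0 20.0 2147483647 397015
                             -alpha 1.189 -ps 0.1 -rho 0.5 -wp 0.03])
puts call.cutoff            # 20.0
puts call.params["alpha"]   # 1.189
```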
More generally, SMAC expects wrapper output in the following format (for more details see Section 5.1.2 of the manual):
Result for SMAC: <status>, <runtime>, <runlength>, <quality>, <seed>
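In the example above, <status> is SUCCESS, <runtime> is 0.03 seconds, <runlength> is 36354, <quality> is 0, and <seed> echoes the seed 397015 that was passed to the wrapper. Once your wrapper produces output in this format, you can run SMAC on your own scenario with:
./bin/smac --scenario <path-to-your-scenario-file>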
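On the output side, a wrapper only has to print one line in the result format described above. The hypothetical helpers below (format_result and parse_result are not part of SMAC or Hydra) format such a line and parse it back, which can be handy for testing a wrapper before handing it to SMAC.

```ruby
# Hypothetical helpers for the wrapper's result line:
#   Result for SMAC: <status>, <runtime>, <runlength>, <quality>, <seed>
def format_result(status, runtime, runlength, quality, seed)
  "Result for SMAC: #{status}, #{runtime}, #{runlength}, #{quality}, #{seed}"
end

RESULT_LINE = /\AResult for SMAC:\s*(.*)\z/

# Parse such a line back into its five fields (e.g. to sanity-check a wrapper).
def parse_result(line)
  m = RESULT_LINE.match(line.strip) or raise ArgumentError, "not a result line"
  fields = m[1].split(",").map(&:strip)
  raise ArgumentError, "expected 5 fields" unless fields.length == 5
  status, runtime, runlength, quality, seed = fields
  { status: status, runtime: Float(runtime), runlength: Float(runlength),
    quality: Float(quality), seed: Integer(seed) }
end

line = format_result("SUCCESS", 0.03, 36354, 0, 397015)
puts line                          # Result for SMAC: SUCCESS, 0.03, 36354, 0, 397015
puts parse_result(line)[:runtime]  # 0.03
```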
Below, we walk through a Hydra execution using the above SMAC SAPS scenario.
> cd example_scenarios/saps
> ./run_saps-local.sh
The ./run_saps-local.sh script contains:
./../bin/hydra --num-iterations 4 --num-smac-runs 2 --num-configs-per-iter 1 --rungroup Hydra_Saps --num-run 1 --smacOptions ./saps-scenario.txt --smac-execution-options ./smac-execution-options-local.txt
This call string corresponds to a Hydra run with 4 iterations (--num-iterations), using the SMAC scenario file ./saps-scenario.txt (--smacOptions). In each iteration, 2 SMAC executions (--num-smac-runs) are run and 1 incumbent (--num-configs-per-iter) is added to the portfolio. The --smac-execution-options parameter must be set to a file containing options that specify how the SMAC runs are executed. To run locally, all the file needs to contain is:
tae=SMAC
See the distributed execution section for instructions on how to distribute your SMAC executions.
[INFO ] Running Hydra for 4 iterations
[INFO ] **************Hydra iteration: 0***************
[INFO ] Generating SMAC runs
[INFO ] Executing 2 SMAC runs
Next, 2 runs of the algorithm configurator (SMAC) are executed sequentially, each searching for a configuration to add to the portfolio. For each SMAC run, you will see logging to stdout similar to:
SMAC started at: 18-Jun-2015 10:16:24 PM. Minimizing mean runtime.
[INFO ] First incumbent: config 1 (internal ID: 0xA31E1), with mean runtime: 0.03; estimate based on 1 runs.
[INFO ] Sample call for new incumbent config 1 (internal ID: 0xA31E1):
......
......
......
[INFO ] SMAC Execution Completed because: wall-clock time limit (53.79 s) has been reached.
After the SMAC executions complete, the best 1 of the 2 resulting configurations is added to the portfolio, based on training performance. You can choose how many SMAC runs to perform per iteration (--num-smac-runs) and how many configurations to add to the portfolio (--num-configs-per-iter) with command-line parameters.
[INFO ] SMAC runs complete. Processing run results...
[INFO ] Finding best 1 incumbents out 2 SMAC runs
[INFO ] Incumbents sorted by training performance on runs executed by SMAC...
[INFO ] Performance: 0.06941176470588237, Configuration: -alpha '1.3570229625905657' -ps '0.08955468111361481' -rho '0.4549265007328307' -wp '0.00844758766969784'
[INFO ] Performance: 3.311111111111112, Configuration: -alpha '1.189' -ps '0.1' -rho '0.5' -wp '0.03'
[INFO ] Found best incumbents.
[INFO ] Setting Best Single configuration. Only occurs in first Hydra iteration
[INFO ] Best single configuration: -alpha '1.3570229625905657' -ps '0.08955468111361481' -rho '0.4549265007328307' -wp '0.00844758766969784'
[INFO ] Adding new incumbent configurations...
[INFO ] Adding configuration: -alpha '1.3570229625905657' -ps '0.08955468111361481' -rho '0.4549265007328307' -wp '0.00844758766969784'
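The selection step in this log amounts to sorting incumbents by training performance and keeping the best ones. A minimal hypothetical sketch, assuming lower values are better (as for runtime) and using the two performances from the log above; Incumbent and best_incumbents are made-up names, and k plays the role of --num-configs-per-iter.

```ruby
# Hypothetical sketch: keep the k incumbents with the best (lowest)
# training performance.
Incumbent = Struct.new(:performance, :config)

def best_incumbents(incumbents, k)
  incumbents.sort_by(&:performance).first(k)
end

runs = [
  Incumbent.new(3.311111111111112, "-alpha '1.189' -ps '0.1' -rho '0.5' -wp '0.03'"),
  Incumbent.new(0.06941176470588237,
                "-alpha '1.3570229625905657' -ps '0.08955468111361481' " \
                "-rho '0.4549265007328307' -wp '0.00844758766969784'")
]

best_incumbents(runs, 1).each { |inc| puts inc.config }
```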
The test performance of the portfolio is then calculated on the provided set of test instances. You can use either the VBS (virtual best solver) or zilla to evaluate portfolio performance. The test performance of the portfolio is printed once the test instances have been solved:
[INFO ] Getting portfolio performance
[INFO ] Test performance of hydra after iteration 0: 0.148
After the new configurations are added to the portfolio, the next iteration begins in the same fashion, except that a configuration's performance is now measured as the VBS performance of all the configurations in the portfolio together with that configuration.
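To make the VBS idea concrete: on each instance, a portfolio is credited with the best runtime of any of its configurations, so a new candidate only helps on instances where it beats the current portfolio. The Ruby sketch below uses made-up runtimes on three instances; vbs_runtime is a hypothetical helper, and Hydra's actual scoring (for example, with zilla) may differ.

```ruby
# Hypothetical sketch of VBS (virtual best solver) scoring.
# Each row holds one configuration's runtimes on the same instances; the VBS
# takes the fastest configuration per instance and averages over instances.
def vbs_runtime(rows)
  best = rows.transpose.map(&:min)
  best.sum / best.length.to_f
end

portfolio = [[1.0, 20.0, 3.0]]   # made-up runtimes of the current portfolio
candidate = [5.0, 2.0, 3.0]      # made-up runtimes of a new candidate configuration

puts vbs_runtime(portfolio)                 # 8.0
puts vbs_runtime(portfolio + [candidate])   # 2.0
```

Although the candidate alone has a mean runtime of about 3.33, adding it cuts the portfolio's mean VBS runtime from 8.0 to 2.0 because it is much faster on the second instance.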
[INFO ] **************Hydra iteration: 1***************
......
......
......
After 4 iterations, Hydra terminates and prints the final portfolio and its test performance:
[INFO ] Writing current portfolio to: /Users/Chris/Documents/software/releases/ca/ubc/cs/beta/Hydra/1.0-development-5013393/Hydra-1.0-development-5013393-dist/example_scenarios/saps/hydra-output/Hydra_Saps/Hydra-1/Portfolios/portfolio-3.txt
[INFO ] Best single configuration: -alpha '1.3570229625905657' -ps '0.08955468111361481' -rho '0.4549265007328307' -wp '0.00844758766969784'
[INFO ] Final portfolio configurations
[INFO ] 1. Portfolio Configuration : -alpha '1.3570229625905657' -ps '0.08955468111361481' -rho '0.4549265007328307' -wp '0.00844758766969784'
[INFO ] 2. Portfolio Configuration : -alpha '1.1223601730778767' -ps '0.07888475911986773' -rho '0.8047314559241363' -wp '0.00628190839060591'
[INFO ] 3. Portfolio Configuration : -alpha '1.3778947812738795' -ps '0.14691893797877198' -rho '0.29807307383168247' -wp '0.03949665635243472'
[INFO ] 4. Portfolio Configuration : -alpha '1.260589798073366' -ps '0.03020086893063898' -rho '0.23304825431044507' -wp '0.0010791984714169822'
[INFO ] Hydra performance by iteration:
{0=0.148, 1=0.036000000000000004, 2=0.024, 3=0.022}
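The per-iteration map printed above can be turned into relative improvements with a few lines of Ruby. parse_iteration_map is a hypothetical helper that assumes the exact {iteration=value, ...} layout of this log line.

```ruby
# Hypothetical helper: turn the "Hydra performance by iteration" map into a
# Hash and compare each iteration with iteration 0 (lower runtime is better).
def parse_iteration_map(text)
  text.strip.delete_prefix("{").delete_suffix("}").split(",").to_h do |pair|
    iter, perf = pair.split("=").map(&:strip)
    [Integer(iter), Float(perf)]
  end
end

perf = parse_iteration_map("{0=0.148, 1=0.036000000000000004, 2=0.024, 3=0.022}")
perf.each do |iter, value|
  printf("iteration %d: %.3f (%.1fx lower than iteration 0)\n", iter, value, perf[0] / value)
end
```

For this run, the final test performance (0.022) is roughly 6.7 times lower than after iteration 0 (0.148).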
Hydra can also distribute its SMAC executions to workers that coordinate through a MySQL database:
cd ./example_scenarios/saps
./run_saps-database.sh --pool hydra_test
...dispatches a Hydra run that stores SMAC runs in a database.
cd ../../scripts
./dispatch_workers --wall-time 2:00:00:00 --num-workers 25 --pool hydra_test
...dispatches 25 workers (with the ./launchworker script) onto a TORQUE cluster for 2 days. You may have to change this script to adapt it to your computational environment. (Note: make sure your worker is running in your working directory.) Both Hydra and the worker jobs *must* have the same --pool parameter. For more details, see manual.pdf, found within the ./doc directory.
To use database-backed execution in your own runs, change tae=SMAC to tae=MYSQLDB in the SMAC execution options file, add --mysqldbtae-pool to the ./hydra call string, and pass the --pool parameter to the ./bin/mysql-worker executable. Both *must* be set to the same string so the worker knows where to find information to run SMAC.