Difference: Benchmarking (1 vs. 10)

Revision 10 - 2013-08-13 - geschd

Line: 1 to 1
 
META TOPICPARENT name="EmpiricalAlgorithmics"

Benchmarking for AC on the Cloud

Line: 47 to 47
 
  • the micro instance variation is consistent with Amazon's description: "Micro instances (t1.micro) provide a small amount of consistent CPU resources and allow you to increase CPU capacity in short bursts when additional cycles are available."
  • Variation between instances of the same type is also small (COV < 6%). This contradicts what some papers claimed; however, there was also very little variation in the underlying CPU types
  • CPUs mostly were Xeon E5-2650 with only a few Xeon E5645
Changed:
<
<
  • the region/availability zone must be fairly homogenous
>
>
  • the region/availability zone must be fairly homogeneous
 
Added:
>
>

Spear Run Experiment

Another small experiment to test performance variability. This one focused on variation over short periods of time by repeatedly launching many small Spear runs and checking the stability of their runtimes.
  • I tested four instance types: T1.micro, M1.small, M1.medium, M1.xlarge
  • four instances of each type were tested
  • each instance ran 80 problems from SW_verification/Wine through Spear
  • each problem was run 30 times consecutively to measure runtime variation
  • The median completion time per problem was 1.7s (wallclock), so the experiment measured variation at a small time scale
  • Both wall clock and CPU times were taken
The experiment results: SpearResultsWallclock.xls, SpearResultsCPUTime.xls
  • CoVs were calculated for all 80 problems on each instance as a measure of the performance variation over the course of the 30 runs (a minimal timing/CoV sketch follows this list)
  • wallclock and CPU time results were very similar
  • Wallclock CoV (avg, max):
    • t1.micro : 64.0%, 145.2%
    • m1.small : 2.7%, 15.4%
    • m1.medium : 1.0%, 11.7%
    • m1.xlarge : 0.9%, 4.1%
  • Micro instances are very unstable, which is to be expected
  • other instance types seem to be fairly stable over time
  • results were only measured over the course of a few hours in a single region; a busier time or region might show worse effects
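For reference, a minimal sketch of this kind of measurement is given below, assuming Python 3 on a Linux instance. The solver invocation, problem file, and run count are placeholders rather than the actual experiment setup; the sketch simply runs a command repeatedly, records wallclock and child CPU time, and reports the CoV (standard deviation over mean) across the runs.

import resource
import statistics
import subprocess
import time

CMD = ["./spear", "--dimacs", "problem.cnf"]  # hypothetical solver invocation
RUNS = 30                                     # matches the 30 consecutive runs above

wallclock, cputime = [], []
for _ in range(RUNS):
    cpu_before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(CMD, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    wallclock.append(time.perf_counter() - start)
    cpu_after = resource.getrusage(resource.RUSAGE_CHILDREN)
    cputime.append((cpu_after.ru_utime + cpu_after.ru_stime)
                   - (cpu_before.ru_utime + cpu_before.ru_stime))

def cov(xs):
    # coefficient of variation: standard deviation relative to the mean
    return statistics.stdev(xs) / statistics.mean(xs)

print("wallclock CoV: %.1f%%" % (100 * cov(wallclock)))
print("CPU time CoV:  %.1f%%" % (100 * cov(cputime)))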
 

Related Work

Closely Related

Changed:
<
<
Runtime Measurements in the Cloud: Observing, Analyzing, and Reducing Variance
>
>
Runtime Measurements in the Cloud: Observing, Analysing, and Reducing Variance
 
  • Run a variety of benchmarks (CPU, Memory, Disk, Network) on a large number of small and large EC2 instances over the course of a month
Changed:
<
<
  • Results are stratified corresponding to the two CPU types: Xeon and Opertron
  • Xeon backed instances perform about twice as well as Opertron instances
>
>
  • Results are stratified corresponding to the two CPU types: Xeon and Opteron
  • Xeon backed instances perform about twice as well as Opteron instances
 
  • Distribution of processor types varies by availability zone
EC2 Performance Analysis for Resource Provisioning of Service-Oriented Applications
  • Over a long time (hours) individual instances are stable in performance
  • Over a short time (minutes), instances can have sharp dips in performance
  • Between instances of the same type, average performance can vary by a factor of 4
Exploiting Hardware Heterogeneity within the Same Instance Type of Amazon EC2
Changed:
<
<
  • Measures the distribution of processor types, both Xeon and Opertron and broken down into specific models
>
>
  • Measures the distribution of processor types, both Xeon and Opteron and broken down into specific models
 
  • Benchmarks the performance of each processor type
  • Checks CPU information located in '/proc/cpuinfo'; the VM hypervisor does not modify this
  • Outlines simple cost analysis of seeking out better performing instances
Line: 94 to 113
 -- Main.geschd - 05 Aug 2013

META FILEATTACHMENT attr="h" comment="" date="1376095762" name="UnixBenchResults.xls" path="UnixBenchResults.xls" size="22016" user="geschd" version="1.1"
Added:
>
>
META FILEATTACHMENT attr="h" comment="" date="1376434749" name="SpearResultsCPUTime.xls" path="SpearResultsCPUTime.xls" size="1013248" user="geschd" version="1.1"
META FILEATTACHMENT attr="h" comment="" date="1376434775" name="SpearResultsWallclock.xls" path="SpearResultsWallclock.xls" size="1034240" user="geschd" version="1.1"

Revision 9 - 2013-08-13 - geschd

Line: 1 to 1
 
META TOPICPARENT name="EmpiricalAlgorithmics"

Benchmarking for AC on the Cloud

Line: 35 to 35
 

Experiments

Unix Bench Experiment

Changed:
<
<
This small experiment was conducted to attempt to reproduce variability results found in other papers. It used the !UnixBench benchmark to assess the performance of an instance. UnixBench compiles scores obtained from many other benchmarks testing all aspects of a system and presents it as a single index score.
>
>
This small experiment was conducted to attempt to reproduce variability results found in other papers. It used the UnixBench benchmark to assess the performance of an instance. UnixBench compiles scores obtained from many other benchmarks testing all aspects of a system and presents it as a single index score.
 
  • I tested four instance types: T1.micro, M1.small, M1.medium, M1.xlarge
  • M1.xlarge instances have 4 cores. UnixBench was run twice on these instances: once using 1 core and once using all 4
  • four instances of each type were tested to check for homogeneity within instance types

Revision 8 - 2013-08-09 - geschd

Line: 1 to 1
 
META TOPICPARENT name="EmpiricalAlgorithmics"

Benchmarking for AC on the Cloud

Added:
>
>
 

Goals

Added:
>
>
We wish to perform algorithm configuration in a cloud setting. To do this effectively, we need to be able to deal with some issues found in cloud computing: the heterogeneity and variability of virtual machines. We want to be able to:
  • determine when a run is unreliable due to VM fluctuations
  • determine when an instance is no longer reliable as a whole so we can release it
  • determine how instances differ so as to properly weight and compare runs from different instances
  • perform all measurements and diagnostics with as little overhead as possible
Overall, we hope to:
  • devise a method to run algorithm configuration in the cloud to obtain results that are consistent with what we would obtain from running on a local cluster
 

Ideas

Added:
>
>
  • Look at the instance's CPU info and make some adjustment based on pre-existing/collected data for that CPU type (a minimal sketch follows this list)
    • seems likely to provide some benefit; measurements from past research indicate that CPU type closely corresponds to performance
    • may or may not account for all inter-instance heterogeneity, but will not account for temporal variation
  • Run short algorithm runs multiple times to build confidence in the result
    • One paper found that temporal variation was not significant over long periods of time, but over the short term it could be more of a factor
    • The shorter the run, the more redundancy we use
  • Run a large system benchmark (UnixBench?) at the launch of the instance to assess its performance
    • may or may not provide an accurate assessment of how the instance will perform on the target algorithm
    • if instances are consistent throughout their lifetime, one accurate measure at launch could be sufficient
  • Run multiple small benchmarks throughout the lifetime of an instance
    • if instances are inconsistent throughout their lifetimes, this method could adjust as it goes
    • could detect if variations are too high for our purposes and terminate an instance
  • Constantly run some form of monitoring program in the background alongside the target algorithm runs
    • could provide very fine grained data on the variations of the system
    • seems likely to cause its own interference by running alongside the target algorithm
  • A benchmark should preferably be very similar to the actual program we wish to run. However, this may not be possible
    • we can't make any assumptions about the target algorithm because we want to be able to configure any possible algorithm
    • the only thing consistent across all cases is the configuration process itself, and we don't care about benchmarking or stabilizing the configurator
    • we don't know what instructions and operations we will be executing, and some machines are better at some things than others (I/O, CPU, disk), so we can't necessarily use a single benchmark score to scale our results
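A minimal sketch of the CPU-info idea above, assuming Linux and Python 3: read the model name from /proc/cpuinfo (which, per the related work below, the hypervisor does not modify) and look up a performance factor collected offline. The scaling factors here are illustrative placeholders, not measured values; the model substrings come from the CPUs observed in the experiments below.

def cpu_model():
    # /proc/cpuinfo reports the physical CPU model even inside the VM
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("model name"):
                return line.split(":", 1)[1].strip()
    return "unknown"

# hypothetical per-model runtime scaling factors, relative to a reference CPU;
# real values would come from the pre-existing/collected data mentioned above
SCALE = {
    "E5-2650": 1.00,  # placeholder factor for the common CPU seen in our runs
    "E5645": 0.85,    # placeholder factor for the less common CPU
}

model = cpu_model()
factor = next((f for key, f in SCALE.items() if key in model), 1.0)
print("CPU model:", model)
print("illustrative runtime scaling factor:", factor)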
 

Experiments

Added:
>
>

Unix Bench Experiment

This small experiment was conducted to attempt to reproduce variability results found in other papers. It used the !UnixBench benchmark to assess the performance of an instance. UnixBench compiles scores obtained from many other benchmarks testing all aspects of a system and presents it as a single index score.
  • I tested four instance types: T1.micro, M1.small, M1.medium, M1.xlarge
  • M1.xlarge instances have 4 cores. UnixBench was run twice on these instances: once using 1 core and once using all 4
  • four instances of each type were tested to check for homogeneity within instance types
  • each instance ran the benchmark 4 times to test for variation over time (a small driver sketch appears after the results below)
  • The entire process was repeated a second time with 16 new instances for additional data and to see if variance occurred between days
The experiment results: UnixBenchResults.xls
  • Instances perform reliably over time with little (COV < 4%) variation except for micro instances
  • micro instances vary considerably (COV ~78%), starting strong on the first run and dropping on subsequent runs
  • the micro instance variation is consistent with Amazon's description: "Micro instances (t1.micro) provide a small amount of consistent CPU resources and allow you to increase CPU capacity in short bursts when additional cycles are available."
  • Variation between instances of the same type is also small (COV < 6%). This contradicts what some papers claimed; however, there was also very little variation in the underlying CPU types
  • CPUs mostly were Xeon E5-2650 with only a few Xeon E5645
  • the region/availability zone must be fairly homogenous
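A minimal driver in the spirit of this experiment might look like the sketch below, assuming Python 3 and a byte-unixbench checkout in ./UnixBench whose ./Run script prints a "System Benchmarks Index Score" line; the path, run count, and output parsing are assumptions, not the exact harness used here.

import re
import statistics
import subprocess

RUNS = 4          # matches the 4 benchmark runs per instance above
scores = []
for _ in range(RUNS):
    # run the full UnixBench suite and capture its text report
    out = subprocess.run(["./Run"], cwd="UnixBench",
                         capture_output=True, text=True).stdout
    m = re.search(r"System Benchmarks Index Score\s+([0-9.]+)", out)
    if m:
        scores.append(float(m.group(1)))

if not scores:
    raise SystemExit("no index scores parsed; check the UnixBench output format")

mean = statistics.mean(scores)
cov = statistics.stdev(scores) / mean if len(scores) > 1 else 0.0
print("index scores:", scores)
print("mean %.1f, CoV %.1f%%" % (mean, 100 * cov))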
 

Related Work

Closely Related

Runtime Measurements in the Cloud: Observing, Analyzing, and Reducing Variance
Line: 47 to 93
  -- Main.geschd - 05 Aug 2013
Added:
>
>
META FILEATTACHMENT attr="h" comment="" date="1376095762" name="UnixBenchResults.xls" path="UnixBenchResults.xls" size="22016" user="geschd" version="1.1"

Revision 7 - 2013-08-08 - geschd

Line: 1 to 1
 
META TOPICPARENT name="EmpiricalAlgorithmics"

Benchmarking for AC on the Cloud

Goals

Line: 42 to 42
 
  • Evaluated EC2 and related services, seeming to focus on data transfers between EC2 and S3
  • S3 delivers up to 5 times better performance to EC2 than to outside locations, but its performance can vary significantly in all cases
Exploring the Performance Fluctuations of HPC Workloads on Clouds
Changed:
<
<
  • Measures the variability in runtimes of running various solvers on EC2 and futureGrid
>
>
  • Measures the variability in runtimes of running various solvers on EC2 and FutureGrid
 
  • Runtime fluctuation increases when more cores are used for a solver

-- Main.geschd - 05 Aug 2013

Revision 6 - 2013-08-08 - geschd

Line: 1 to 1
 
META TOPICPARENT name="EmpiricalAlgorithmics"

Benchmarking for AC on the Cloud

Changed:
<
<

Papers

>
>

Goals

Ideas

Experiments

Related Work

 

Closely Related

Runtime Measurements in the Cloud: Observing, Analyzing, and Reducing Variance
  • Run a variety of benchmarks (CPU, Memory, Disk, Network) on a large number of small and large EC2 instances over the course of a month

Revision 5 - 2013-08-08 - geschd

Line: 1 to 1
 
META TOPICPARENT name="EmpiricalAlgorithmics"

Benchmarking for AC on the Cloud

Papers

Line: 33 to 33
 
  • Evaluates the EC2 cloud through a series of benchmarks to determine its suitability to scientific computing
  • Concluded that its performance and reliability were low compared to a dedicated cluster and thus less desirable
Resource Provisioning of Web Applications in Heterogeneous Clouds
Changed:
<
<
  • test
>
>
  • Measures heterogeneity of EC2 instances in terms of CPU and memory performance and proposes provisioning by matching tasks to the instances that suit them best
  • Focuses on web services and maintaining Service Level Objectives
 An Evaluation of Amazon's Grid Computing Services: EC2, S3 and SQS
Changed:
<
<
  • test
>
>
  • Evaluated EC2 and related services, seeming to focus on data transfers between EC2 and S3
  • S3 delivers up to 5 times better performance to EC2 than to outside locations, but its performance can vary significantly in all cases
 Exploring the Performance Fluctuations of HPC Workloads on Clouds
Changed:
<
<
  • test
>
>
  • Measures the variability in runtimes of running various solvers on EC2 and futureGrid
  • Runtime fluctuation increases when more cores are used for a solver
  -- Main.geschd - 05 Aug 2013

Revision 4 - 2013-08-07 - geschd

Line: 1 to 1
 
META TOPICPARENT name="EmpiricalAlgorithmics"

Benchmarking for AC on the Cloud

Papers

Closely Related

Runtime Measurements in the Cloud: Observing, Analyzing, and Reducing Variance
Changed:
<
<
  • test
>
>
  • Run a variety of benchmarks (CPU, Memory, Disk, Network) on a large number of small and large EC2 instances over the course of a month
  • Results are stratified corresponding to the two CPU types: Xeon and Opertron
  • Xeon backed instances perform about twice as well as Opertron instances
  • Distribution of processor types varies by availability zone
 EC2 Performance Analysis for Resource Provisioning of Service-Oriented Applications
Changed:
<
<
  • test
>
>
  • Over a long time (hours) individual instances are stable in performance
  • Over a short time (minutes), instances can have sharp dips in performance
  • Between instances of the same type, average performance can vary by a factor of 4
Exploiting Hardware Heterogeneity within the Same Instance Type of Amazon EC2
  • Measures the distribution of processor types, both Xeon and Opertron and broken down into specific models
  • Benchmarks the performance of each processor type
  • Checks CPU information located in '/proc/cpuinfo'; the VM hypervisor does not modify this
  • Outlines simple cost analysis of seeking out better performing instances
More for Your Money: Exploiting Performance Heterogeneity in Public Clouds
  • Examines EC2 variation on three levels: inter-architecture, intra-architecture, and temporal
  • measures significant variation in each case
  • proposes both black-box (measured performance) and grey-box (based on knowledge of processor distributions) placement methods
  • evaluates placement methods in simulations and on EC2
 

Somewhat Related

How is the Weather tomorrow? Towards a Benchmark for the Cloud
Changed:
<
<
  • test
>
>
  • Discusses cloud services in general and what they offer
  • Lists important metrics to consider in cloud computing and how to go about writing a cloud-wide benchmark to measure these
 Benchmarking in the Cloud: What it Should, Can, and Cannot Be
Added:
>
>
  • Provides a general overview of benchmarks and what makes a good one
  • Lists challenges to consider when developing a benchmark for the cloud; focuses on benchmarking the entire cloud rather than individual nodes
A Performance Analysis of EC2 Cloud Computing Services for Scientific Computing
  • Evaluates the EC2 cloud through a series of benchmarks to determine its suitability to scientific computing
  • Concluded that its performance and reliability were low compared to a dedicated cluster and thus less desirable
Resource Provisioning of Web Applications in Heterogeneous Clouds
 
  • test
Changed:
<
<
A Performance Analysis of EC2 Cloud Computing Services for Scientific Computing
>
>
An Evaluation of Amazon's Grid Computing Services: EC2, S3 and SQS
  • test
Exploring the Performance Fluctuations of HPC Workloads on Clouds
 
  • test
Deleted:
<
<
  -- Main.geschd - 05 Aug 2013

Revision 2 - 2013-08-05 - geschd

Line: 1 to 1
 
META TOPICPARENT name="EmpiricalAlgorithmics"

Benchmarking for AC on the Cloud

Papers

Line: 9 to 9
 
  • test
Benchmarking in the Cloud: What it Should, Can, and Cannot Be
  • test
Added:
>
>
A Performance Analysis of EC2 Cloud Computing Services for Scientific Computing
  -- Main.geschd - 05 Aug 2013

Revision 1 - 2013-08-05 - geschd

Line: 1 to 1
Added:
>
>
META TOPICPARENT name="EmpiricalAlgorithmics"

Benchmarking for AC on the Cloud

Papers

Closely Related

EC2 Performance Analysis for Resource Provisioning of Service-Oriented Applications

Somewhat Related

How is the Weather tomorrow? Towards a Benchmark for the Cloud
  • test
Benchmarking in the Cloud: What it Should, Can, and Cannot Be
  • test

-- Main.geschd - 05 Aug 2013

 