Difference: Benchmarking (1 vs. 10)

Revision 10 - 2013-08-13 - geschd

Line: 1 to 1
 
META TOPICPARENT name="EmpiricalAlgorithmics"

Benchmarking for AC on the Cloud

Line: 47 to 47
 
  • the micro instance variation is consistent with Amazon's description: "Micro instances (t1.micro) provide a small amount of consistent CPU resources and allow you to increase CPU capacity in short bursts when additional cycles are available."
  • Variation between instances of the same type is also small (COV < 6%). This contradicts what some papers claimed; however, there was also very little variation in the underlying CPU types
  • CPUs mostly were Xeon E5-2650 with only a few Xeon E5645
Changed:
<
<
  • the region/availability zone must be fairly homogenous
>
>
  • the region/availability zone must be fairly homogeneous
 
Added:
>
>

Spear Run Experiment

Another small experiment to test performance variability. This one focused on variation over short periods of time by repeatedly launching many small Spear runs and checking the stability of their runtimes.
  • I tested four instance types: T1.micro, M1.small, M1.medium, M1.xlarge
  • four instances of each type were tested
  • each instance ran 80 problems from SW_verification/Wine through Spear
  • each problem was run 30 times consecutively to measure runtime variation
  • The median completion time per problem was 1.7s (wallclock), so the experiment measured variation at a small time scale
  • Both wall clock and CPU times were taken
The experiment results: SpearResultsWallclock.xls, SpearResultsCPUTime.xls
  • CoVs were calculated for all 80 problems on each instance as a measure of the performance variation over the course of the 30 runs (a minimal timing/CoV sketch follows this list)
  • wallclock and CPU time results were very similar
  • Wallclock CoV (avg, max):
    • t1.micro : 64.0%, 145.2%
    • m1.small : 2.7%, 15.4%
    • m1.medium : 1.0%, 11.7%
    • m1.xlarge : 0.9%, 4.1%
  • Micro instances are very unstable, which is to be expected
  • other instance types seem to be fairly stable over time
  • results were only measured over the course of a few hours in a single region; a busier time or region might show worse effects
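For reference, a minimal sketch of this kind of measurement is given below, assuming Python 3 on a Linux instance. The solver invocation, problem file, and run count are placeholders rather than the actual experiment setup; the sketch simply runs a command repeatedly, records wallclock and child CPU time, and reports the CoV (standard deviation over mean) across the runs.

import resource
import statistics
import subprocess
import time

CMD = ["./spear", "--dimacs", "problem.cnf"]  # hypothetical solver invocation
RUNS = 30                                     # matches the 30 consecutive runs above

wallclock, cputime = [], []
for _ in range(RUNS):
    cpu_before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(CMD, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    wallclock.append(time.perf_counter() - start)
    cpu_after = resource.getrusage(resource.RUSAGE_CHILDREN)
    cputime.append((cpu_after.ru_utime + cpu_after.ru_stime)
                   - (cpu_before.ru_utime + cpu_before.ru_stime))

def cov(xs):
    # coefficient of variation: standard deviation relative to the mean
    return statistics.stdev(xs) / statistics.mean(xs)

print("wallclock CoV: %.1f%%" % (100 * cov(wallclock)))
print("CPU time CoV:  %.1f%%" % (100 * cov(cputime)))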
 

Related Work

Closely Related

Changed:
<
<
Runtime Measurements in the Cloud: Observing, Analyzing, and Reducing Variance
>
>
Runtime Measurements in the Cloud: Observing, Analysing, and Reducing Variance
 
  • Run a variety of benchmarks (CPU, Memory, Disk, Network) on a large number of small and large EC2 instances over the course of a month
Changed:
<
<
  • Results are stratified corresponding to the two CPU types: Xeon and Opertron
  • Xeon backed instances perform about twice as well as Opertron instances
>
>
  • Results are stratified corresponding to the two CPU types: Xeon and Opteron
  • Xeon backed instances perform about twice as well as Opteron instances
 
  • Distribution of processor types varies by availability zone
EC2 Performance Analysis for Resource Provisioning of Service-Oriented Applications
  • Over a long time (hours) individual instances are stable in performance
  • Over a short time (minutes), instances can have sharp dips in performance
  • Between instances of the same type, average performance can vary by a factor of 4
Exploiting Hardware Heterogeneity within the Same Instance Type of Amazon EC2
Changed:
<
<
  • Measures the distribution of processor types, both Xeon and Opertron and broken down into specific models
>
>
  • Measures the distribution of processor types, both Xeon and Opteron and broken down into specific models
 
  • Benchmarks the performance of each processor type
  • Checks CPU information located in '/proc/cpuinfo'; the VM hypervisor does not modify this
  • Outlines simple cost analysis of seeking out better performing instances
Line: 94 to 113
 -- Main.geschd - 05 Aug 2013

META FILEATTACHMENT attr="h" comment="" date="1376095762" name="UnixBenchResults.xls" path="UnixBenchResults.xls" size="22016" user="geschd" version="1.1"
Added:
>
>
META FILEATTACHMENT attr="h" comment="" date="1376434749" name="SpearResultsCPUTime.xls" path="SpearResultsCPUTime.xls" size="1013248" user="geschd" version="1.1"
META FILEATTACHMENT attr="h" comment="" date="1376434775" name="SpearResultsWallclock.xls" path="SpearResultsWallclock.xls" size="1034240" user="geschd" version="1.1"

Revision 9 - 2013-08-13 - geschd

Line: 1 to 1
 
META TOPICPARENT name="EmpiricalAlgorithmics"

Benchmarking for AC on the Cloud

Line: 35 to 35
 

Experiments

Unix Bench Experiment

Changed:
<
<
This small experiment was conducted to attempt to reproduce variability results found in other papers. It used the !UnixBench benchmark to assess the performance of an instance. UnixBench compiles scores obtained from many other benchmarks testing all aspects of a system and presents it as a single index score.
>
>
This small experiment was conducted to attempt to reproduce variability results found in other papers. It used the UnixBench benchmark to assess the performance of an instance. UnixBench compiles scores obtained from many other benchmarks testing all aspects of a system and presents it as a single index score.
 
  • I tested four instance types: T1.micro, M1.small, M1.medium, M1.xlarge
  • M1.xlarge instances have 4 cores. UnixBench was run twice on these instances: once using 1 core and once using all 4
  • four instances of each type were tested to check for homogeneity within instance types

Revision 8 - 2013-08-09 - geschd

Line: 1 to 1
 
META TOPICPARENT name="EmpiricalAlgorithmics"

Benchmarking for AC on the Cloud

Added:
>
>
 

Goals

Added:
>
>
We wish to perform algorithm configuration in a cloud setting. To do this effectively, we need to be able to deal with some issues found in cloud computing: the heterogeneity and variability of virtual machines. We want to be able to:
  • determine when a run is unreliable due to VM fluctuations
  • determine when an instance is no longer reliable as a whole so we can release it
  • determine how instances differ so as to properly weight and compare runs from different instances
  • perform all measurements and diagnostics with as little overhead as possible
Overall, we hope to:
  • devise a method to run algorithm configuration in the cloud to obtain results that are consistent with what we would obtain from running on a local cluster
 

Ideas

Added:
>
>
  • Look at the instance's CPU info and make some adjustment based on pre-existing/collected data for that CPU type (a minimal sketch follows this list)
    • seems likely to provide some benefit; measurements from past research indicate that CPU type closely corresponds to performance
    • may or may not account for all inter-instance heterogeneity, but will not account for temporal variation
  • Run short algorithm runs multiple times to build confidence in the result
    • One paper found that temporal variation was not significant over long periods of time, but over the short term it could be more of a factor
    • The shorter the run, the more redundancy we use
  • Run a large system benchmark (UnixBench?) at the launch of the instance to assess its performance
    • may or may not provide an accurate assessment of how the instance will perform on the target algorithm
    • if instances are consistent throughout their lifetime, one accurate measure at launch could be sufficient
  • Run multiple small benchmarks throughout the lifetime of an instance
    • if instances are inconsistent throughout their lifetimes, this method could adjust as it goes
    • could detect if variations are too high for our purposes and terminate an instance
  • Constantly run some form of monitoring program in the background alongside the target algorithm runs
    • could provide very fine grained data on the variations of the system
    • seems likely to cause its own interference by running alongside the target algorithm
  • A benchmark should preferably be very similar to the actual program we wish to run. However, this may not be possible
    • we can't make any assumptions about the target algorithm because we want to be able to configure any possible algorithm
    • the only thing consistent across all cases is the configuration process itself, and we don't care about benchmarking or stabilizing the configurator
    • we don't know what instructions and operations we will be executing, and some machines are better at some things than others (I/O, CPU, disk), so we can't necessarily use a single benchmark score to scale our results
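A minimal sketch of the CPU-info idea above, assuming Linux and Python 3: read the model name from /proc/cpuinfo (which, per the related work below, the hypervisor does not modify) and look up a performance factor collected offline. The scaling factors here are illustrative placeholders, not measured values; the model substrings come from the CPUs observed in the experiments below.

def cpu_model():
    # /proc/cpuinfo reports the physical CPU model even inside the VM
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("model name"):
                return line.split(":", 1)[1].strip()
    return "unknown"

# hypothetical per-model runtime scaling factors, relative to a reference CPU;
# real values would come from the pre-existing/collected data mentioned above
SCALE = {
    "E5-2650": 1.00,  # placeholder factor for the common CPU seen in our runs
    "E5645": 0.85,    # placeholder factor for the less common CPU
}

model = cpu_model()
factor = next((f for key, f in SCALE.items() if key in model), 1.0)
print("CPU model:", model)
print("illustrative runtime scaling factor:", factor)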
 

Experiments

Added:
>
>

Unix Bench Experiment

This small experiment was conducted to attempt to reproduce variability results found in other papers. It used the !UnixBench benchmark to assess the performance of an instance. UnixBench compiles scores obtained from many other benchmarks testing all aspects of a system and presents it as a single index score.
  • I tested four instance types: T1.micro, M1.small, M1.medium, M1.xlarge
  • M1.xlarge instances have 4 cores. UnixBench was run twice on these instances: once using 1 core and once using all 4
  • four instances of each type were tested to check for homogeneity within instance types
  • each instance ran the benchmark 4 times to test for variation over time (a small driver sketch appears after the results below)
  • The entire process was repeated a second time with 16 new instances for additional data and to see if variance occurred between days
The experiment results: UnixBenchResults.xls
  • Instances perform reliably over time with little (COV < 4%) variation except for micro instances
  • micro instances vary considerably (COV ~78%), starting strong on the first run and dropping on subsequent runs
  • the micro instance variation is consistent with Amazon's description: "Micro instances (t1.micro) provide a small amount of consistent CPU resources and allow you to increase CPU capacity in short bursts when additional cycles are available."
  • Variation between instances of the same type is also small (COV < 6%). This contradicts what some papers claimed; however, there was also very little variation in the underlying CPU types
  • CPUs mostly were Xeon E5-2650 with only a few Xeon E5645
  • the region/availability zone must be fairly homogenous
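A minimal driver in the spirit of this experiment might look like the sketch below, assuming Python 3 and a byte-unixbench checkout in ./UnixBench whose ./Run script prints a "System Benchmarks Index Score" line; the path, run count, and output parsing are assumptions, not the exact harness used here.

import re
import statistics
import subprocess

RUNS = 4          # matches the 4 benchmark runs per instance above
scores = []
for _ in range(RUNS):
    # run the full UnixBench suite and capture its text report
    out = subprocess.run(["./Run"], cwd="UnixBench",
                         capture_output=True, text=True).stdout
    m = re.search(r"System Benchmarks Index Score\s+([0-9.]+)", out)
    if m:
        scores.append(float(m.group(1)))

if not scores:
    raise SystemExit("no index scores parsed; check the UnixBench output format")

mean = statistics.mean(scores)
cov = statistics.stdev(scores) / mean if len(scores) > 1 else 0.0
print("index scores:", scores)
print("mean %.1f, CoV %.1f%%" % (mean, 100 * cov))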
 

Related Work

Closely Related

Runtime Measurements in the Cloud: Observing, Analyzing, and Reducing Variance
Line: 47 to 93
  -- Main.geschd - 05 Aug 2013
Added:
>
>
META FILEATTACHMENT attr="h" comment="" date="1376095762" name="UnixBenchResults.xls" path="UnixBenchResults.xls" size="22016" user="geschd" version="1.1"

Revision 7 - 2013-08-08 - geschd

Line: 1 to 1
 
META TOPICPARENT name="EmpiricalAlgorithmics"

Benchmarking for AC on the Cloud

Goals

Line: 42 to 42
 
  • Evaluated EC2 and related services, seeming to focus on data transfers between EC2 and S3
  • S3 delivers up to 5 times better performance to EC2 than to outside locations, but its performance can vary significantly in all cases
Exploring the Performance Fluctuations of HPC Workloads on Clouds
Changed:
<
<
  • Measures the variability in runtimes of running various solvers on EC2 and futureGrid
>
>
  • Measures the variability in runtimes of running various solvers on EC2 and FutureGrid
 
  • Runtime fluctuation increases when more cores are used for a solver

-- Main.geschd - 05 Aug 2013

Revision 6 - 2013-08-08 - geschd

Line: 1 to 1
 
META TOPICPARENT name="EmpiricalAlgorithmics"

Benchmarking for AC on the Cloud

Changed:
<
<

Papers

>
>

Goals

Ideas

Experiments

Related Work

 

Closely Related

Runtime Measurements in the Cloud: Observing, Analyzing, and Reducing Variance
  • Run a variety of benchmarks (CPU, Memory, Disk, Network) on a large number of small and large EC2 instances over the course of a month

Revision 5 - 2013-08-08 - geschd

Line: 1 to 1
 
META TOPICPARENT name="EmpiricalAlgorithmics"

Benchmarking for AC on the Cloud

Papers

Line: 33 to 33
 
  • Evaluates the EC2 cloud through a series of benchmarks to determine its suitability to scientific computing
  • Concluded that its performance and reliability were low compared to a dedicated cluster and thus less desirable
Resource Provisioning of Web Applications in Heterogeneous Clouds
Changed:
<
<
  • test
>
>
  • Measures heterogeneity of EC2 instances in terms of CPU and memory performance and proposes provisioning by matching tasks to the instances that suit them best
  • Focuses on web services and maintaining Service Level Objectives
 An Evaluation of Amazon's Grid Computing Services: EC2, S3 and SQS
Changed:
<
<
  • test
>
>
  • Evaluated EC2 and related services, seeming to focus on data transfers between EC2 and S3
  • S3 delivers up to 5 times better performance to EC2 than to outside locations, but its performance can vary significantly in all cases
 Exploring the Performance Fluctuations of HPC Workloads on Clouds
Changed:
<
<
  • test
>
>
  • Measures the variability in runtimes of running various solvers on EC2 and futureGrid
  • Runtime fluctuation increases when more cores are used for a solver
  -- Main.geschd - 05 Aug 2013

Revision 4 - 2013-08-07 - geschd

Line: 1 to 1
 
META TOPICPARENT name="EmpiricalAlgorithmics"

Benchmarking for AC on the Cloud

Papers

Closely Related

Runtime Measurements in the Cloud: Observing, Analyzing, and Reducing Variance
Changed:
<
<
  • test
>
>
  • Run a variety of benchmarks (CPU, Memory, Disk, Network) on a large number of small and large EC2 instances over the course of a month
  • Results are stratified corresponding to the two CPU types: Xeon and Opertron
  • Xeon backed instances perform about twice as well as Opertron instances
  • Distribution of processor types varies by availability zone
 EC2 Performance Analysis for Resource Provisioning of Service-Oriented Applications
Changed:
<
<
  • test
>
>
  • Over a long time (hours) individual instances are stable in performance
  • Over a short time (minutes), instances can have sharp dips in performance
  • Between instances of the same type, average performance can vary by a factor of 4
Exploiting Hardware Heterogeneity within the Same Instance Type of Amazon EC2
  • Measures the distribution of processor types, both Xeon and Opertron and broken down into specific models
  • Benchmarks the performance of each processor type
  • Checks CPU information located in '/proc/cpuinfo'; the VM hypervisor does not modify this
  • Outlines simple cost analysis of seeking out better performing instances
More for Your Money: Exploiting Performance Heterogeneity in Public Clouds
  • Examines EC2 variation on three levels: inter-architecture, intra-architecture, and temporal
  • measures significant variation in each case
  • proposes both black-box (measured performance) and grey-box (based on knowledge of processor distributions) placement methods
  • evaluates placement methods in simulations and on EC2
 

Somewhat Related

How is the Weather tomorrow? Towards a Benchmark for the Cloud
Changed:
<
<
  • test
>
>
  • Discusses cloud services in general and what they offer
  • Lists important metrics to consider in cloud computing and how to go about writing a cloud-wide benchmark to measure these
 Benchmarking in the Cloud: What it Should, Can, and Cannot Be
Added:
>
>
  • Provides a general overview of benchmarks and what makes a good one
  • Lists challenges to consider when developing a benchmark for the cloud; focuses on benchmarking the entire cloud rather than individual nodes
A Performance Analysis of EC2 Cloud Computing Services for Scientific Computing
  • Evaluates the EC2 cloud through a series of benchmarks to determine its suitability to scientific computing
  • Concluded that its performance and reliability were low compared to a dedicated cluster and thus less desirable
Resource Provisioning of Web Applications in Heterogeneous Clouds
 
  • test
Changed:
<
<
A Performance Analysis of EC2 Cloud Computing Services for Scientific Computing
>
>
An Evaluation of Amazon's Grid Computing Services: EC2, S3 and SQS
  • test
Exploring the Performance Fluctuations of HPC Workloads on Clouds
 
  • test
Deleted:
<
<
  -- Main.geschd - 05 Aug 2013

Revision 2 - 2013-08-05 - geschd

Line: 1 to 1
 
META TOPICPARENT name="EmpiricalAlgorithmics"

Benchmarking for AC on the Cloud

Papers

Line: 9 to 9
 
  • test
Benchmarking in the Cloud: What it Should, Can, and Cannot Be
  • test
Added:
>
>
A Performance Analysis of EC2 Cloud Computing Services for Scientific Computing
  -- Main.geschd - 05 Aug 2013

Revision 1 - 2013-08-05 - geschd

Line: 1 to 1
Added:
>
>
META TOPICPARENT name="EmpiricalAlgorithmics"

Benchmarking for AC on the Cloud

Papers

Closely Related

EC2 Performance Analysis for Resource Provisioning of Service-Oriented Applications

Somewhat Related

How is the Weather tomorrow? Towards a Benchmark for the Cloud
  • test
Benchmarking in the Cloud: What it Should, Can, and Cannot Be
  • test

-- Main.geschd - 05 Aug 2013

 