WestGrid - quick user guide
Scheduler (Fairshare & RAC)
The WestGrid job scheduler is a priority queue with a backfill mechanism. The scheduler dispatches the highest-priority job in the "eligible jobs" queue if there are sufficient resources for it to run. If there are insufficient resources to dispatch the highest-priority job, the scheduler will find the next-highest-priority job whose execution will not overlap with the approximate* earliest start of the original job (*since jobs can finish before their time cutoff, the scheduler uses an upper bound of the earliest start time for a job). For example, if the highest-priority job needs 100 cores that will free up within at most 5 hours, a 1-core job with a 3-hour walltime can be backfilled, since it finishes before that upper bound. A job's priority is a weighted sum of processor-equivalent hours, discounted over a 10-day time period.
Requested Resources
The resource that affects dispatching is processor-equivalent hours. Processor-equivalent hours refers to the number of processors your job takes away from the pool of resources. For small-memory jobs, processor-equivalent hours are the same as processor hours; for high-memory jobs, however, the memory left on a node becomes insufficient for the remaining processors to be utilized. The QDR nodes have 24 GB for 12 processors, i.e. 2 GB per processor. If you use X GB of memory, you will be counted as using max(# processors requested, X/2) processors.
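As a worked illustration of this accounting (a hypothetical job; the 2 GB-per-core figure is the QDR value quoted above):

# Hypothetical 4-core job requesting 16 GB on a QDR node (2 GB per core).
# processor-equivalents = max(processors requested, memory in GB / 2)
procs=4
mem_gb=16
mem_equiv=$(( mem_gb / 2 ))
echo $(( procs > mem_equiv ? procs : mem_equiv ))   # prints 8

The 16 GB reservation idles the memory of 4 extra cores, so the job is charged as 8 processor-equivalents even though it runs on only 4.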
Fairshare (& RAC)
A user's (or account's) fairshare value is a weighted average of cluster usage in a set of disjoint time windows. For example, Orcinus and Glacier use 7 time windows that each last 36 hours, with the weights decaying for older windows (this is the 10-day discounting mentioned above; 7 x 36 hours is roughly 10 days).
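A hypothetical worked example (the decay weights here are invented placeholders, not Orcinus' published settings): suppose a user had 10% of cluster usage in the current window, 30% in each of the two previous windows, and 0% elsewhere, with each older window weighted half as much as the next most recent one. Their fairshare value would be the weighted average:

# Placeholder weights 1, .5, .25, ... for the 7 windows (illustrative only).
echo "(1*10 + .5*30 + .25*30) / (1 + .5 + .25 + .125 + .0625 + .03125 + .015625)" | bc -l
# prints ~16.38, i.e. a fairshare value of about 16.4%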
One important note: a user's/account's cluster usage in a time window is a % of the total usage of the cluster in that window, NOT a % of the available resources. So if the cluster is used by only 1 user in a time window, they are treated as having 100% usage for that window regardless of how many nodes they actually use. In practice this is rarely a concern, since Orcinus typically has very few idle cores.

The fairshare value is used in conjunction with an account's fairshare target. Without a RAC this is set to about 2% of a cluster. With a RAC it is set to whatever was awarded (e.g. a 200-core RAC on Orcinus, which has roughly 10,000 processors, would give a target of ~2%). The fairshare component of a priority is then:
(FS Weight) * [ (FS User Weight) * ((FS User Target) - (FS User Value))
              + (FS Account Weight) * ((FS Account Target) - (FS Account Value)) ]
If no account target applies, this reduces to:

(FS Weight) * (FS User Weight) * ((FS User Target) - (FS User Value))
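As a hypothetical numerical example (the weight values 100 and 10 are invented for illustration; the target and usage figures mirror the fairshare file sample shown later):

# Invented scheduler weights; WestGrid's real settings are not shown here.
# User target 7.5%, current fairshare value 11.46% (over target):
echo "100 * 10 * (7.5 - 11.46)" | bc -l   # prints -3960.00

Being above target yields a negative fairshare contribution (lower priority); being below target yields a boost.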
The last important note is that fairshare values are specific to individual clusters on WestGrid; i.e. using Orcinus heavily will not affect your (or your group's) priority on other WestGrid clusters.

Priority of jobs within a single user's queue

Using showq -u (username) you can look at your queue. Below is an example queueing state:

6111297  v7q8  Running  1  3:04:25  Thu Jan 14 17:46:04
6111294  v7q8  Running  1  3:04:25  Thu Jan 14 17:46:04
6111295  v7q8  Running  1  3:04:25  Thu Jan 14 17:46:04
6111293  v7q8  Running  1  3:04:25  Thu Jan 14 17:46:04
6111296  v7q8  Running  1  3:04:25  Thu Jan 14 17:46:04

5 active jobs    5 of 9616 processors in use by local jobs (0.5%)
                 919 of 931 nodes active (98.71%)

eligible jobs----------------------
JOBID     USERNAME  STATE  PROCS  WCLIMIT  QUEUETIME
6111302   v7q8      Idle   1      6:00:00  Tue Jan 12 13:10:49
6111299   v7q8      Idle   1      6:00:00  Tue Jan 12 13:10:49
6111300   v7q8      Idle   1      6:00:00  Tue Jan 12 13:10:49
6111298   v7q8      Idle   1      6:00:00  Tue Jan 12 13:10:49
6111301   v7q8      Idle   1      6:00:00  Tue Jan 12 13:10:49

5 eligible jobs

blocked jobs-----------------------
JOBID     USERNAME  STATE  PROCS  WCLIMIT  QUEUETIME
6111303   v7q8      Idle   1      6:00:00  Tue Jan 12 13:10:49
6111304   v7q8      Idle   1      6:00:00  Tue Jan 12 13:10:49
6111305   v7q8      Idle   1      6:00:00  Tue Jan 12 13:10:49
6111306   v7q8      Idle   1      6:00:00  Tue Jan 12 13:10:49
6111307   v7q8      Idle   1      6:00:00  Tue Jan 12 13:10:49

5 blocked jobs

The jobs with "eligible" status are the only jobs the dispatch system considers for allocation; the dispatch system does not see your "blocked" jobs until they are upgraded to "eligible" status. The dispatcher takes the highest-priority job among all eligible jobs and dispatches it if there are sufficient resources. As described earlier, other lower-priority jobs may only be dispatched if their walltimes do not exceed the minimum start time of the higher-priority jobs (i.e. the minimum remaining runtime of currently running jobs that would free enough space for the higher-priority job). This means you must wait for high-resource "eligible" jobs to be dispatched before lower-resource "blocked" jobs can be.
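To inspect the priority values behind this ordering, the Moab client tools (assuming they are installed on the login nodes, which may vary by cluster) can help:

showq -i -u v7q8   # list idle/eligible jobs for user v7q8 with their current priority
mdiag -p           # show the per-component breakdown of job priorities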
Priority of jobs within a single user's queueUsing qstat -u (username), you can look at your queue. Below is an example queueing state:6111297 v7q8 Running 1 3:04:25 Thu Jan 14 17:46:04 6111294 v7q8 Running 1 3:04:25 Thu Jan 14 17:46:04 6111295 v7q8 Running 1 3:04:25 Thu Jan 14 17:46:04 6111293 v7q8 Running 1 3:04:25 Thu Jan 14 17:46:04 6111296 v7q8 Running 1 3:04:25 Thu Jan 14 17:46:04 5 active jobs 5 of 9616 processors in use by local jobs (0.5%) 919 of 931 nodes active (98.71%) eligible jobs---------------------- JOBID USERNAME STATE PROCS WCLIMIT QUEUETIME 6111302 v7q8 Idle 1 6:00:00 Tue Jan 12 13:10:49 6111299 v7q8 Idle 1 6:00:00 Tue Jan 12 13:10:49 6111300 v7q8 Idle 1 6:00:00 Tue Jan 12 13:10:49 6111298 v7q8 Idle 1 6:00:00 Tue Jan 12 13:10:49 6111301 v7q8 Idle 1 6:00:00 Tue Jan 12 13:10:49 5 eligible jobs blocked jobs----------------------- JOBID USERNAME STATE PROCS WCLIMIT QUEUETIME 6111303 v7q8 Idle 1 6:00:00 Tue Jan 12 13:10:49 6111304 v7q8 Idle 1 6:00:00 Tue Jan 12 13:10:49 6111305 v7q8 Idle 1 6:00:00 Tue Jan 12 13:10:49 6111306 v7q8 Idle 1 6:00:00 Tue Jan 12 13:10:49 6111307 v7q8 Idle 1 6:00:00 Tue Jan 12 13:10:49 5 blocked jobsThe jobs with "eligible" status are the only jobs which the dispatch system considers for allocation. The dispatch system doesn't know about your "blocked" jobs until they are upgraded to have "eligible" job status. The dispatcher will take the highest priority job among all eligible jobs and dispatch if there are sufficient resources. As described earlier, other lower priority jobs may ony be dispatched if their walltimes do not exceed the minimum startime of higher priority jobs (minimum of runtime of currentlty running jobs that would free enough space for higher priority job). This means that the user must wait for high resource "eligible" jobs to be dispatched before lower resource "blocked" jobs can be. Tracking Dispatching and PriorityTo see the usage of your group or individual account, first navigate to:cd /global/system/info/
Tracking Dispatching and Priority

To see the usage of your group or individual account, first navigate to:

cd /global/system/info/

Within the ./fair_share subdirectory there are files with "fair share" information for every day in the current month, plus further subdirectories containing files going back 5 years.
Each file contains the % usage info for all accounts and users according to the time-window weighting scheme described above. To find your own entries, grep the file for your account and user.
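A minimal sketch of such a grep (gdx-911 and v7q8 are this guide's example account prefix and user; the per-day file naming is site-specific, so the wildcard is an assumption):

cd /global/system/info/fair_share
grep -d skip -E 'gdx-911|v7q8' *   # -d skip: silently ignore subdirectories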
The output will look something like this:

FSInterval     %      Target       0      1      2      3      4      5      6
------------- ACCT -------------
gdx-911-ae   11.46    7.50+    12.62  15.47  12.12   7.61  10.05  11.89   8.78
gdx-911-aa*   2.11    0.50-     1.40   3.28   3.90   3.33   1.27   0.00
------- USER -------------
v7q8          3.90   -------    4.58   5.81   4.49   1.78   3.75   4.16   1.71

Account gdx-911-ae has 11.46% processor-equivalent usage of Orcinus, weighted over the seven 36-hour time windows. The dispatch system's target usage for account gdx-911-ae is 7.50+ (at least 7.5) percent of Orcinus in processor-equivalents. As of January 2016, Orcinus has roughly 10,000 processors, so a 7.5% allocation means the account should have ~750 processors allocated to it at any one time.

The ./stats subdirectory contains files with detailed usage information for all accounts. If you grep one for your accounts, it will display usage like so:
            |--------- Active ------|---------------------------- Completed -------------------------------|
acct         Jobs  Procs  ProcHours  Jobs    %  PHReq     %  PHDed    %  FSTgt  AvgXF  AvgQH
gdx-911-ae    107    481    13981.3   373  8.9  2.35K  0.16  8.95K  2.05  2.00+   14.5   70.2
gdx-911-aa     73    292    27376.5    -0 -0.0 ------ -0.00  7.65K  1.75  2.00    -0.0   -0.0

The number of jobs, processors (Procs), and processor-equivalent hours (ProcHours) are shown for active jobs. The "Completed" section shows job statistics for completed jobs aggregated over the whole year. PHReq corresponds to processor hours requested; PHDed presumably corresponds to processor hours dedicated to the account.
Disk Quota

Your disk quotas are based on the number of files, and not just the amount of disk space you use.
To check your quota on Orcinus, type the following from a directory within your filesystem (replace v7q8 with your username). The files columns of the output report your file-count usage alongside the block usage:

lfs quota -u v7q8 ./
A handy short script to see the number of files in your directories recursively is shown below. The script counts the entries in every subdirectory and displays the 50 directories containing the most files.
# Walk every directory on this filesystem (-xdev: do not cross mount points),
# count the entries in each, then show the 50 directories with the most files.
find / -xdev -type d -print0 |
while IFS= read -rd '' dir; do
    echo "$(find "$dir" -maxdepth 1 -print0 | grep -zc .) $dir"
done | sort -rn | head -50
When managing disk usage, it may be useful to identify the most space-intensive locations in the file system. The command below sorts the folders in the current directory by disk usage and prints the 10 largest. It could take substantial time to execute if there are a large number of files in the subdirectories.
# -s: summarize each entry; -c: grand total; -h: human-readable sizes;
# .[!.]* includes hidden files and directories alongside regular ones.
du -sch .[!.]* * | sort -h -r | head -n 10