geopmpy(7) – global extensible open power manager python package
Description
An extension to the Global Extensible Open Power Manager (GEOPM), the geopmpy package provides several command line tools and other infrastructure modules for both setting up and launching a job utilizing GEOPM and post-processing the profiling data that is output from GEOPM. Presently the following command line tools are provided:
geopmlaunch
This script will invoke the GEOPM launcher for either the ALPS or SLURM resource managers. See geopmlaunch(1) for more information.
In addition, one infrastructure module is provided:
io.py
This module provides tools for parsing and encapsulating report and trace data
into either simple structures or pandas.DataFrame objects. It can be used to parse
any number of files, and houses structures that can be queried for said data.
This module also houses certain analysis functions in the Trace class for
extracting specific data. See the in-file docstrings for more information.
Installation
Source Builds
Building GEOPM with the build instructions posted on the GitHub site will put
the Python scripts in either the system path for Python, or in a subdirectory
of the "--prefix" path. See the Environment section for more
information on how to set up this configuration.
Via PyPI
The geopmpy package can be installed via pip from PyPI with:
sudo pip install geopmpy
Or, for an individual user install (not system wide):
pip install --user geopmpy
Environment
A note on PYTHONPATH and PATH:
If you are installing GEOPM into the system’s default paths for Python, then
there is nothing to be done here. Otherwise, if you pass --prefix=<PREFIX_PATH>
when you run configure, you must set PYTHONPATH to the location of the built
site-packages directory. For example, the PYTHONPATH for a Python 3.6 build may
look like:
export PYTHONPATH=<PREFIX_PATH>/lib/python3.6/site-packages:${PYTHONPATH}
You must also prepend the installed bin directory to your PATH variable:
export PATH=<PREFIX_PATH>/bin:${PATH}
It is recommended to do this in your login script (e.g. .bashrc).
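One way to check that the PYTHONPATH setting described above took effect is to ask Python where it would import a package from. The following is a minimal sketch using only the standard library; `locate_package` is a hypothetical helper name and is not part of geopmpy.

```python
import importlib.util


def locate_package(name):
    """Return the file a top-level package would be imported from.

    Returns None when the package cannot be found on the current
    module search path (which includes PYTHONPATH entries).
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # With PYTHONPATH set as described above, this is expected to print a
    # path under <PREFIX_PATH>; it prints None if geopmpy is not found.
    print(locate_package("geopmpy"))
```

If the printed path is not under the expected prefix, another installation of the package is probably earlier on the search path.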
Programming Interface
geopmpy.agent
- class geopmpy.agent.AgentConf(path, agent='monitor', options={})
Bases:
object
The GEOPM agent configuration parameters.
This class contains all the parameters necessary to run the GEOPM agent with a workload.
- path
The output path for this configuration file.
- options
A dict of the options for this agent.
- get_agent()
- get_path()
- write()
Write the current config to a file.
- geopmpy.agent.enforce_policy()
Enforce a static implementation of the agent’s policy. The agent and the policy are chosen based on the GEOPM environment variables and configuration files.
- geopmpy.agent.names()
Get the list of all available agents.
- geopmpy.agent.policy_json(agent_name, policy_values)
Create a JSON policy for the given agent.
This can be written to a file to control the agent statically.
- geopmpy.agent.policy_names(agent_name)
Get the names of the policies for a given agent.
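The relationship between policy names and policy values described above can be pictured as pairing each name with a value and serializing the result. The sketch below only illustrates that idea with the standard library; it is not the implementation of geopmpy.agent.policy_json(), and the function name `sketch_policy_json` and the policy name `EXAMPLE_LIMIT` are hypothetical.

```python
import json


def sketch_policy_json(policy_names, policy_values):
    """Pair policy names with values and return a JSON string.

    Illustration only: the real geopmpy.agent functions obtain names and
    formatting from the GEOPM library, which this sketch does not use.
    """
    if len(policy_values) > len(policy_names):
        raise ValueError("more policy values than policy names")
    # zip() stops at the shorter sequence, so trailing names without
    # values are simply left out of the policy.
    policy = dict(zip(policy_names, policy_values))
    return json.dumps(policy)


if __name__ == "__main__":
    # Hypothetical policy name, used only for demonstration.
    print(sketch_policy_json(["EXAMPLE_LIMIT"], [150.0]))
```

The resulting string can be written to a file with ordinary file I/O, which corresponds to the static control use case mentioned above.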
geopmpy.io
GEOPM IO - Helper module for parsing/processing report and trace files.
- class geopmpy.io.AppOutput(traces=None, dir_name='.', verbose=False, do_cache=True)
Bases:
object
The container class for all trace related data.
This class holds the relevant objects for parsing and indexing all data that is output from GEOPM. This object can be created with a trace glob string that will be used to search dir_name for the relevant files. If files are found, their data will be parsed into objects for easy data access. Additionally, a pandas.DataFrame is constructed containing all of the trace data. These DataFrames are indexed based on the version of GEOPM found in the files, the profile name, agent name, and the number of times that particular configuration has been seen by the parser (i.e. experiment iteration).
- trace_glob
The string pattern to use to search for trace files.
- dir_name
The directory path to use when searching for files.
- verbose
A bool to control whether verbose output is printed to stdout.
- add_trace_df(tt, traces_df_list)
Adds a trace DataFrame to the tracking list.
The tracking list is used to create the combined DataFrame once all traces are parsed.
- Parameters:
tt – The Trace object used to extract the Trace DataFrame. This DataFrame will be indexed and added to the tracking list.
- get_trace_data(node_name=None)
- get_trace_df()
Getter for the combined DataFrame of all trace files parsed.
This DataFrame contains all data parsed, and has a complex MultiIndex for accessing the unique data from each individual trace. For more information on this index, see the IndexTracker docstring.
- Returns:
Contains all parsed data.
- Return type:
- parse_traces(trace_paths, verbose)
- remove_files()
Deletes all files currently tracked by this object.
- class geopmpy.io.BenchConf(path)
Bases:
object
The application configuration parameters.
Used to hold the config data for the integration test application. This application allows for varying combinations of regions (compute, IO, or network bound), complexity, desired execution count, and amount of imbalance between nodes during execution.
- path
The output path for this configuration file.
- append_imbalance(hostname, imbalance)
Appends imbalance to the config for a particular node.
- Parameters:
hostname – The name of the node.
imbalance – The amount of imbalance to apply to the node. This is specified by a float in the range [0,1]. For example, specifying a value of 0.25 means that this node will spend 25% more time executing the work than a node would by default. Nodes not specified with imbalance configurations will perform normally.
- append_region(name, big_o)
Appends a region to the internal list.
- Parameters:
name – The string representation of the region.
big_o – The desired complexity of the region. This affects compute, IO, or network complexity depending on the type of region requested.
- get_exec_args()
- get_exec_path()
- get_path()
- set_loop_count(loop_count)
- write()
Write the current config to a file.
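The imbalance value described under append_imbalance() is a fractional increase in work time. As a worked example of that arithmetic, the sketch below computes the expected time for an imbalanced node; `imbalanced_runtime` is a hypothetical helper and is not part of geopmpy.

```python
def imbalanced_runtime(base_seconds, imbalance):
    """Scale a baseline work time by an imbalance fraction in [0, 1].

    An imbalance of 0.25 means 25% more time than the default.
    """
    if not 0.0 <= imbalance <= 1.0:
        raise ValueError("imbalance must be in the range [0, 1]")
    return base_seconds * (1.0 + imbalance)


if __name__ == "__main__":
    # A node that would normally need 8 seconds, with 25% imbalance.
    print(imbalanced_runtime(8.0, 0.25))
```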
- class geopmpy.io.IndexTracker
Bases:
object
Tracks and uniquely identifies experiment configurations for pandas.DataFrame indexing.
This object’s purpose is to examine parsed data for reports or traces and determine if a particular experiment configuration has already been tracked. A user may run the same configuration repeatedly in order to prove that results are repeatable and are not outliers. Since the same configuration is used many times, it must be tracked and counted to ensure that the unique data for each run can be extracted later.
The parsed data is used to extract the following fields to build the tracking index tuple:
(<GEOPM_VERSION>, <PROFILE_NAME>, <AGENT_NAME>, <NODE_NAME>)
If the tuple is not contained in the _run_outputs dict, it is inserted with a value of 1. The value is incremented if the tuple is currently in the _run_outputs dict. This value is used to uniquely identify a particular set of parsed data when the MultiIndex is created.
- get_multiindex(run_output)
Returns a MultiIndex from this run_output. Used in pandas.DataFrame construction.
This will add the current run_output to the list of tracked data, and return a unique multiindex tuple to identify this data in a DataFrame.
For Trace objects, the integer index of the DataFrame is appended to the tuple.
- Parameters:
run_output – The Trace object to produce an index tuple for.
- Returns:
The unique index to identify this data object.
- Return type:
- reset()
Clears the internal tracking dictionary.
Since only one type of data (reports OR traces) can be tracked at once, this is necessary to reset the object’s state so a new type of data can be tracked.
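The counting scheme that IndexTracker uses for repeated configurations can be sketched with a plain dictionary. The class below is a simplified illustration, not the geopmpy implementation: `SketchIndexTracker` and its `track()` method are hypothetical names, and appending the count to the tuple is an assumption about how the count makes each run unique.

```python
class SketchIndexTracker:
    """Count how many times each experiment configuration has been seen."""

    def __init__(self):
        # Maps (version, profile, agent, node) -> number of times seen.
        self._run_outputs = {}

    def track(self, version, profile_name, agent_name, node_name):
        """Record one run and return a tuple that is unique to that run."""
        key = (version, profile_name, agent_name, node_name)
        # Insert with a value of 1 on first sight, otherwise increment.
        self._run_outputs[key] = self._run_outputs.get(key, 0) + 1
        # Assumption: the count is appended so repeated runs stay distinct.
        return key + (self._run_outputs[key],)

    def reset(self):
        """Clear the internal tracking dictionary."""
        self._run_outputs.clear()


if __name__ == "__main__":
    tracker = SketchIndexTracker()
    print(tracker.track("1.0.0", "my_profile", "monitor", "node01"))
    print(tracker.track("1.0.0", "my_profile", "monitor", "node01"))
```

Running the same configuration twice yields two distinct tuples, which is what allows each iteration to be extracted separately later.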
- class geopmpy.io.RawReport(path)
Bases:
object
- agent_host_additions(host_name)
- dump_json(path)
- figure_of_merit()
- get_field(raw_data, key, units='')
- host_names()
- meta_data()
- raw_epoch(host_name)
- raw_region(host_name, region_name)
- raw_report()
- raw_totals(host_name)
- raw_unmarked(host_name)
- region_names(host_name)
- total_runtime()
- class geopmpy.io.RawReportCollection(report_paths, dir_name='.', dir_cache=None, verbose=True, do_cache=True)
Bases:
object
Used to group together a collection of related geopmpy.io.RawReport objects.
- static fixup_metadata(metadata, df)
- get_app_df()
- get_df()
- get_df_filtered(columns)
- get_epoch_df()
- get_unmarked_df()
- load_reports(reports, dir_name, dir_cache, verbose, do_cache)
- static make_h5_name(paths, outdir)
- parse_reports(report_paths, verbose)
- remove_cache()
- class geopmpy.io.Trace(trace_path, use_agent=True)
Bases:
object
Creates a pandas.DataFrame comprised of the trace file data.
This object will parse both the header and the CSV data in a trace file. The header identifies the unique configuration for this file, which is used for later indexing purposes.
Even though __getattr__() and __getitem__() allow this object to effectively be treated like a DataFrame, you must use get_df() if you’re building a list of DataFrames to pass to pandas.concat(). Using the raw object in a list and calling concat will cause an error.
- trace_path
The path to the trace file to parse.
- static diff_df(trace_df, column_regex, epoch=True)
Diff the DataFrame.
Since the counters in the trace files are monotonically increasing, a diff must be performed to extract the useful data.
- Parameters:
trace_df – The MultiIndexed pandas.DataFrame created by the AppOutput class.
column_regex – A string representing the regex search pattern for the column names to diff.
epoch – A flag to set whether or not to focus solely on epoch regions.
- Returns:
With the diffed columns specified by 'column_regex', and an 'elapsed_time' column.
- Return type:
- get_agent()
- get_df()
- static get_median_df(trace_df, column_regex, config)
Extract the median experiment iteration.
For each experiment iteration, this logic calculates the sum of elapsed times across all nodes in that iteration. It then extracts the DataFrame for the iteration that is closest to the median. For input DataFrames with a single iteration, the single iteration is returned.
- Parameters:
trace_df – The MultiIndexed pandas.DataFrame created by the AppOutput class.
column_regex – A string representing the regex search pattern for the column names to diff.
config – The TraceConfig object being used presently.
- Returns:
Containing a single experiment iteration.
- Return type:
- get_node_name()
- get_profile_name()
- get_start_time()
- get_version()
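The two analysis steps described for diff_df() and get_median_df() can be illustrated without pandas. The sketch below differences a monotonically increasing counter and then picks the iteration whose summed elapsed time is closest to the median. It is a simplified, plain-Python illustration rather than the geopmpy code, which operates on MultiIndexed DataFrames; the function names and sample numbers are hypothetical.

```python
import statistics


def diff_counter(samples):
    """Return the increments between consecutive counter samples."""
    return [later - earlier for earlier, later in zip(samples, samples[1:])]


def closest_to_median(node_times_by_iteration):
    """Return the iteration whose total elapsed time is nearest the median.

    node_times_by_iteration maps an iteration id to a list of per-node
    elapsed times for that iteration.
    """
    # Sum the elapsed time over all nodes within each iteration.
    totals = {it: sum(times) for it, times in node_times_by_iteration.items()}
    median = statistics.median(totals.values())
    # Pick the iteration with the smallest distance to the median.
    return min(totals, key=lambda it: abs(totals[it] - median))


if __name__ == "__main__":
    # Hypothetical cumulative counter readings from a trace column.
    print(diff_counter([100, 130, 190, 200]))
    # Hypothetical per-node elapsed times for three iterations.
    print(closest_to_median({1: [5.0, 5.0], 2: [6.0, 6.5], 3: [5.5, 5.5]}))
```

With pandas, the differencing step is typically done with DataFrame.diff() on the selected columns; the plain lists here only keep the example dependency-free.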
geopmpy.launcher
This module provides a way to launch MPI applications using the
GEOPM runtime by wrapping the call to the system MPI application
launcher. The module currently supports wrapping the SLURM srun
command, the ALPS aprun command, and the mpiexec command provided by
Open MPI and Intel MPI. The primary use of this module
is through the geopmlaunch(1) command line executable which calls
the geopmpy.launcher.main() function. See the geopmlaunch(1) man
page for details about the command line interface.
- class geopmpy.launcher.AprunLauncher(argv, num_rank=None, num_node=None, cpu_per_rank=None, timeout=None, time_limit=None, job_name=None, node_list=None, exclude_list=None, host_file=None, partition=None, reservation=None, quiet=None, do_affinity=None, bootstrap=None)
Bases:
Launcher
- affinity_option(is_geopmctl)
Returns the --cpu-binding option for aprun.
- exclude_list_option()
Returns a list containing the -E option for aprun.
- host_file_option()
Returns a list containing the -l option for aprun.
- launcher_command()
Returns 'aprun', the name of the ALPS MPI job launch application.
- node_list_option()
Returns a list containing the -L option for aprun.
- num_node_option(is_geopmctl)
Returns a list containing the -N option for aprun. Must be combined with the -n option to determine the number of nodes.
- parse_launcher_argv()
Parse the subset of aprun command line arguments used or manipulated by GEOPM.
- preload_option()
- quiet_option()
Returns a list containing the -q option for aprun.
- time_limit_option()
Returns a list containing the -t option for aprun.
- class geopmpy.launcher.Config(argv)
Bases:
object
GEOPM configuration object. Used to interpret command line arguments to set GEOPM related environment variables.
- environ()
Dictionary describing the environment variables controlled by the configuration object.
- get_ctl()
Returns the geopm control method.
- get_policy()
Returns the geopm policy file/key.
- get_preload()
Returns True/False if the geopm preload option was specified or not.
- set_omp_num_threads(omp_num_threads)
Control the
OMP_NUM_THREADS
environment variable.
- unparsed()
All command line arguments except those used to configure GEOPM.
- class geopmpy.launcher.Factory
Bases:
object
- create(argv, num_rank=None, num_node=None, cpu_per_rank=None, timeout=None, time_limit=None, job_name=None, node_list=None, exclude_list=None, host_file=None, partition=None, reservation=None, quiet=None, bootstrap=None)
- get_launcher_names()
- class geopmpy.launcher.IMPIExecLauncher(argv, num_rank=None, num_node=None, cpu_per_rank=None, timeout=None, time_limit=None, job_name=None, node_list=None, exclude_list=None, host_file=None, partition=None, reservation=None, quiet=None, do_affinity=None, bootstrap=None)
Bases:
Launcher
Launcher derived object for use with the Intel(R) MPI Library job launch application mpiexec.hydra.
- affinity_option(is_geopmctl)
Returns a list containing the command line options specifying the CPU affinity for each MPI process on a compute node.
- bootstrap_option()
Returns a list containing the command line options specifying the bootstrap option for wiring-up the compute nodes.
- exclude_list_option()
Returns a list containing the command line options specifying the compute nodes to exclude from execution.
- job_name_option()
Returns a list containing the command line options specifying the name associated with the job for scheduler tracking purposes.
- launcher_command()
Returns 'mpiexec.hydra', the name of the Intel MPI Library job launch application.
- node_list_option()
Returns a list containing the -w option for srun.
- num_node_option(is_geopmctl)
Returns a list containing the command line options specifying the number of compute nodes.
- parse_launcher_argv()
Parse the subset of mpiexec.hydra command line arguments used or manipulated by GEOPM.
- preload_option()
- time_limit_option()
Returns a list containing the command line options specifying the maximum time that a job is allowed to run.
- timeout_option()
Returns a list containing the command line options specifying the length of time to wait for a job to start before aborting.
- class geopmpy.launcher.Launcher(argv, num_rank=None, num_node=None, cpu_per_rank=None, timeout=None, time_limit=None, job_name=None, node_list=None, exclude_list=None, host_file=None, partition=None, reservation=None, quiet=None, do_affinity=None, bootstrap=None)
Bases:
object
Abstract base class for MPI job launch application abstraction. Defines common methods used by all Launcher objects.
- affinity_list(is_geopmctl)
Returns CPU affinity prescription as a list of integer sets. The list is over MPI ranks on a node from lowest to highest rank. The process for each MPI rank is restricted to the Linux CPUs enumerated in the set. The output from this function is used by the derived class’s affinity_option() method to set CPU affinities.
- affinity_option(is_geopmctl)
Returns a list containing the command line options specifying the CPU affinity for each MPI process on a compute node.
- bootstrap_option()
Returns a list containing the command line options specifying the bootstrap option for wiring-up the compute nodes.
- environ()
Returns the modified environment dictionary updated with GEOPM specific values.
- exclude_list_option()
Returns a list containing the command line options specifying the compute nodes to exclude from execution.
- get_alloc_nodes()
Returns a list of the names of compute nodes that have been reserved by a scheduler for current job context.
- get_idle_nodes()
Returns a list of the names of compute nodes that are currently available to run jobs.
- host_file_option()
Returns a list containing the command line options specifying the file containing the names of the compute nodes for the job to run on.
- init_governor()
Queries the compute nodes to determine the current CPU frequency governor.
- init_topo()
Determine the topology of the compute nodes that the job will be launched on. This is used to inform CPU affinity assignment.
- int_handler(signum, frame)
This interface enables specialized signal handling. If not overridden by derived class, then the default signal handler is used.
- job_name_option()
Returns a list containing the command line options specifying the name associated with the job for scheduler tracking purposes.
- launcher_argv(is_geopmctl)
Returns a list of command line options for underlying job launch application that reflect the state of the Launcher object.
- launcher_command()
Returns the name/path to the job launch application.
- node_list_delim()
Returns the delimiter that is to be used when constructing a node list string given a list of node names.
- node_list_option()
Returns a list containing the command line options specifying the names of the compute nodes for the job to run on.
- num_node_option(is_geopmctl)
Returns a list containing the command line options specifying the number of compute nodes.
- num_rank_option(is_geopmctl)
Returns a list containing the -n option which is defined in the MPI standard for all job launch applications to specify the number of MPI processes or “ranks”.
- parse_launcher_argv()
Parse command line options accepted by the underlying job launch application.
- partition_option()
Returns a list containing the command line options specifying the compute node partition for the job to run on.
- performance_governor_option()
Returns a list containing the command line options specifying that the Linux power governor should be set to performance.
- preload_option()
- quiet_option()
Returns a list containing the job launch option to suppress any end-of-job status messages that may interfere with parsing of stdout.
- reservation_option()
- run(stdout=sys.stdout, stderr=sys.stderr)
Execute the command given to constructor with modified command line options and environment variables.
- run_compute_cmd(argv, num_node=None)
Run a command on the compute nodes.
- time_limit_option()
Returns a list containing the command line options specifying the maximum time that a job is allowed to run.
- timeout_option()
Returns a list containing the command line options specifying the length of time to wait for a job to start before aborting.
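The data shape returned by affinity_list(), a list of integer sets ordered from lowest to highest MPI rank on a node, can be illustrated with a naive even split of CPUs. This is only a sketch of the return format: the partitioning rule below is an assumption made for illustration and is not the algorithm GEOPM uses, and `sketch_affinity_list` is a hypothetical name.

```python
def sketch_affinity_list(num_cpu, num_rank_per_node):
    """Return one set of Linux CPU indices per MPI rank on a node.

    Assumption for illustration: CPUs are split into equal contiguous
    blocks, and any remainder CPUs are left unassigned.
    """
    if num_rank_per_node <= 0 or num_cpu < num_rank_per_node:
        raise ValueError("need at least one CPU per rank")
    cpu_per_rank = num_cpu // num_rank_per_node
    return [
        set(range(rank * cpu_per_rank, (rank + 1) * cpu_per_rank))
        for rank in range(num_rank_per_node)
    ]


if __name__ == "__main__":
    # Two ranks on a hypothetical 8-CPU node.
    print(sketch_affinity_list(8, 2))
```

A derived class's affinity_option() would then translate such a list into the binding syntax of its specific launcher.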
- class geopmpy.launcher.OMPIExecLauncher(argv, num_rank=None, num_node=None, cpu_per_rank=None, timeout=None, time_limit=None, job_name=None, node_list=None, exclude_list=None, host_file=None, partition=None, reservation=None, quiet=None, do_affinity=None, bootstrap=None)
Bases:
Launcher
Launcher derived object for use with Open MPI project launch application mpiexec.
- affinity_option(is_geopmctl)
Returns a list containing the command line options specifying the CPU affinity for each MPI process on a compute node.
- exclude_list_option()
Returns a list containing the command line options specifying the compute nodes to exclude from execution.
- launcher_argv(is_geopmctl)
Returns a list of command line options for underlying job launch application that reflect the state of the Launcher object.
- launcher_command()
Returns 'mpiexec', the name of the Open MPI project job launch application.
- node_list_delim()
Returns the delimiter that is to be used when constructing a node list string given a list of node names.
- node_list_option()
Returns a list containing the --host option for mpiexec.
- num_node_option(is_geopmctl)
Returns a list containing the command line options specifying the number of compute nodes.
- parse_host_file(file_name)
- parse_launcher_argv()
Parse the subset of mpiexec command line arguments used or manipulated by GEOPM.
- preload_option()
- run(stdout=sys.stdout, stderr=sys.stderr)
Pass through to Launcher run.
- class geopmpy.launcher.PALSLauncher(argv, num_rank=None, num_node=None, cpu_per_rank=None, timeout=None, time_limit=None, job_name=None, node_list=None, exclude_list=None, host_file=None, partition=None, reservation=None, quiet=None, do_affinity=None, bootstrap=None)
Bases:
IMPIExecLauncher
IMPIExecLauncher derived object for use with the PALS MPI Library job launch application mpiexec.
- affinity_option(is_geopmctl)
Returns a list containing the --cpu-bind option for mpiexec.
- launcher_command()
Returns 'mpiexec', the name of the PALS MPI Library job launch application.
- num_node_option(is_geopmctl)
Returns a list containing the command line options specifying the number of compute nodes.
- parse_launcher_argv()
Parse the subset of mpiexec command line arguments used or manipulated by GEOPM.
- preload_option()
- exception geopmpy.launcher.PassThroughError
Bases:
Exception
Exception raised when geopm is not to be used.
- class geopmpy.launcher.SrunLauncher(argv, num_rank=None, num_node=None, cpu_per_rank=None, timeout=None, time_limit=None, job_name=None, node_list=None, exclude_list=None, host_file=None, partition=None, reservation=None, quiet=None, do_affinity=None, bootstrap=None)
Bases:
Launcher
Launcher derived object for use with the SLURM job launch application srun.
- affinity_option(is_geopmctl)
Returns a list containing the --cpu_bind option for srun. If the mpibind plugin is supported, it is explicitly disabled so it does not interfere with affinitization. If the cpu_bind plugin is not detected, an exception is raised.
- exclude_list_option()
Returns a list containing the -x option for srun.
- int_handler(signum, frame)
This is necessary to prevent the script from dying on the first CTRL-C press. SLURM requires 2 SIGINT signals to abort the job.
- job_name_option()
Returns a list containing the -J option for srun.
- launcher_argv(is_geopmctl)
Returns a list of command line options for underlying job launch application that reflect the state of the Launcher object.
- launcher_command()
Returns 'srun', the name of the SLURM MPI job launch application.
- node_list_option()
Returns a list containing the -w option for srun.
- num_node_option(is_geopmctl)
Returns a list containing the -N option for srun.
- parse_launcher_argv()
Parse the subset of srun command line arguments used or manipulated by GEOPM.
- partition_option()
Returns a list containing the command line options specifying the compute node partition for the job to run on.
- performance_governor_option()
Returns a list containing the command line options specifying that the Linux power governor should be set to performance.
- preload_option()
- reservation_option()
- time_limit_option()
Returns a list containing the -t option for srun.
- timeout_option()
Returns a list containing the -I option for srun.
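The int_handler() note above says the wrapper must survive the first CTRL-C so that SLURM can receive two SIGINT signals. One plausible way to achieve this, shown here purely as an assumption and not as the geopmpy implementation, is a handler that forwards each SIGINT to the child launcher process instead of exiting. `ForwardSigint` and `install` are hypothetical names.

```python
import signal


class ForwardSigint:
    """Forward SIGINT to a child process instead of exiting the wrapper."""

    def __init__(self, process):
        # `process` is any object with a send_signal() method,
        # for example a subprocess.Popen instance.
        self.process = process
        self.num_forwarded = 0

    def __call__(self, signum, frame):
        # Pass the interrupt on to the child and keep running, so a
        # second CTRL-C can also reach the child.
        self.process.send_signal(signal.SIGINT)
        self.num_forwarded += 1


def install(handler):
    """Register the handler for SIGINT and return the previous handler.

    signal.signal() must be called from the main thread.
    """
    return signal.signal(signal.SIGINT, handler)
```

In use, the handler would be installed just before starting the child process and the previous handler restored once the child exits.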
- class geopmpy.launcher.SrunTOSSLauncher(argv, num_rank=None, num_node=None, cpu_per_rank=None, timeout=None, time_limit=None, job_name=None, node_list=None, exclude_list=None, host_file=None, partition=None, reservation=None, quiet=None, do_affinity=None, bootstrap=None)
Bases:
SrunLauncher
Launcher derived object for use with systems using TOSS and the mpibind plugin from LLNL.
- affinity_option(is_geopmctl)
Returns the mpibind option used with SLURM on TOSS.
- geopmpy.launcher.int_ceil_div(aa, bb)
Shortcut for the ceiling of the ratio of two integers.
- geopmpy.launcher.main()
Main routine used by the geopmlaunch wrapper executable. This function creates a launcher from the factory and calls the geopmpy.launcher.Launcher.run() method. If help was requested on the command line, then help from the underlying application launcher is printed and the help for the GEOPM extensions is appended. Returns -1 and prints an error message if an error occurs. If the GEOPM_DEBUG environment variable is set and an error occurs, a complete stack trace will be printed.
- geopmpy.launcher.range_str(values)
Take an iterable object containing integers and return a string of comma separated values and ranges given by a dash. Example:
>>> geopmpy.launcher.range_str({1, 2, 3, 5, 7, 9, 10})
'1-3,5,7,9-10'
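The behavior documented for int_ceil_div() and range_str() can be reproduced in a few lines. The sketch below is an independent reimplementation written from the descriptions above, not a copy of the geopmpy source; the `sketch_` prefixes mark the names as hypothetical.

```python
def sketch_int_ceil_div(aa, bb):
    """Return the ceiling of aa / bb using only integer arithmetic."""
    # Floor division of the negated numerator rounds toward negative
    # infinity, so negating the result gives the ceiling.
    return -(-aa // bb)


def sketch_range_str(values):
    """Format integers as comma separated values and dash ranges."""
    ordered = sorted(set(values))
    parts = []
    idx = 0
    while idx < len(ordered):
        start = end = ordered[idx]
        # Extend the current run while the next value is consecutive.
        while idx + 1 < len(ordered) and ordered[idx + 1] == end + 1:
            idx += 1
            end = ordered[idx]
        if start == end:
            parts.append(str(start))
        else:
            parts.append("{}-{}".format(start, end))
        idx += 1
    return ",".join(parts)


if __name__ == "__main__":
    print(sketch_int_ceil_div(7, 2))
    print(sketch_range_str({1, 2, 3, 5, 7, 9, 10}))
```

For the input set used in the documented example, the sketch produces the same string, '1-3,5,7,9-10'.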
geopmpy.policy_store
- geopmpy.policy_store.connect(database_path)
Connect to the database at the given location. Creates a new database if one does not yet exist at the given location.
- Parameters:
database_path (str) – Path to the database.
- geopmpy.policy_store.disconnect()
Disconnect the associated database. No-op if the database has already been disconnected.
- geopmpy.policy_store.get_best(agent_name, profile_name)
Get the best known policy for a given agent/profile pair. If no best has been recorded, the default for the agent is returned.
- geopmpy.policy_store.set_best(agent_name, profile_name, policy)
Set the record for the best policy for a profile with an agent.
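The connect, set_best, get_best, disconnect workflow described above, including the fallback to an agent default, can be sketched with the standard sqlite3 module. Everything here is an assumption made for illustration: the storage backend, the table layout, the `SketchPolicyStore` class, and the way defaults are supplied are all hypothetical and do not describe the geopmpy.policy_store internals.

```python
import json
import sqlite3


class SketchPolicyStore:
    """Keep the best known policy per (agent, profile) pair."""

    def __init__(self, database_path, agent_defaults):
        # Connecting creates the database if it does not yet exist.
        self._conn = sqlite3.connect(database_path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS best ("
            "agent TEXT, profile TEXT, policy TEXT, "
            "PRIMARY KEY (agent, profile))"
        )
        # Hypothetical mapping of agent name -> default policy values.
        self._defaults = agent_defaults

    def set_best(self, agent_name, profile_name, policy):
        """Record the best policy, replacing any earlier record."""
        self._conn.execute(
            "INSERT OR REPLACE INTO best VALUES (?, ?, ?)",
            (agent_name, profile_name, json.dumps(policy)),
        )
        self._conn.commit()

    def get_best(self, agent_name, profile_name):
        """Return the recorded best policy, or the agent default."""
        row = self._conn.execute(
            "SELECT policy FROM best WHERE agent = ? AND profile = ?",
            (agent_name, profile_name),
        ).fetchone()
        if row is None:
            return self._defaults[agent_name]
        return json.loads(row[0])

    def disconnect(self):
        """Close the database; a no-op if already disconnected."""
        if self._conn is not None:
            self._conn.close()
            self._conn = None
```

Passing ":memory:" as the path gives a throwaway database, which is convenient for experimenting with the lookup and fallback behavior.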
Troubleshooting
If you have an existing clone of the GEOPM GitHub repo and are experiencing
a pkg_resources.DistributionNotFound error when attempting to run the Python
scripts, please remove the VERSION file at the root of your repo and re-run
autogen.sh.
The VERSION file will be removed if the dist-clean Makefile target is invoked.
This is also remedied by rerunning autogen.sh.