geopmpy(7) – global extensible open power manager python package

Description

An extension to the Global Extensible Open Power Manager (GEOPM), the geopmpy package provides several command line tools and other infrastructure modules for both setting up and launching a job utilizing GEOPM and post-processing the profiling data that is output from GEOPM. Presently the following command line tools are provided:

geopmlaunch

This script will invoke the GEOPM launcher for either the ALPS or SLURM resource managers. See geopmlaunch(1) for more information.

In addition, one infrastructure module is provided:

io.py

This module provides tools for parsing and encapsulating report and trace data into either simple structures or pandas.DataFrames. It can be used to parse any number of files, and houses structures that can be queried for said data. This module also houses certain analysis functions in the Trace class for extracting specific data. See the in-file docstrings for more info.

Installation

Source Builds

Building GEOPM with the build instructions posted on the GitHub site will put the Python scripts in either the system path for Python, or in a subdirectory of the "--prefix" path. See the Environment section for more information on how to set up this configuration.

Via PyPI

The geopmpy package can be installed via pip from PyPI with:

sudo pip install geopmpy

Or, for an individual user install (not system-wide):

pip install --user geopmpy

Environment

A note on PYTHONPATH and PATH:

If you are installing GEOPM into the system’s default Python paths, no further setup is needed. Otherwise, if you pass --prefix=<PREFIX_PATH> to configure, you must add the built site-packages directory to your PYTHONPATH. For example, the PYTHONPATH for a Python 3.6 build may look like:

export PYTHONPATH=<PREFIX_PATH>/lib/python3.6/site-packages:${PYTHONPATH}

You must also prepend the installed bin directory to your PATH variable:

export PATH=<PREFIX_PATH>/bin:${PATH}

It is recommended to do this in your login script (e.g. .bashrc).
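
To check that a prefix install is visible to Python, the directory layout described above can be computed and compared against the interpreter's search path. The following is a minimal sketch: the helper names and the /opt/geopm prefix are illustrative and not part of geopmpy, and some systems may use lib64 instead of lib.

```python
import sys


def site_packages_path(prefix, version_info=sys.version_info):
    """Build <prefix>/lib/pythonX.Y/site-packages for the given Python version."""
    major, minor = version_info[0], version_info[1]
    return f"{prefix}/lib/python{major}.{minor}/site-packages"


def is_on_python_path(prefix, search_path=None):
    """Return True if the prefix's site-packages directory is in the search path."""
    if search_path is None:
        search_path = sys.path
    return site_packages_path(prefix) in search_path


# Hypothetical prefix; substitute the value passed to --prefix.
print(site_packages_path("/opt/geopm", (3, 6)))
```

If is_on_python_path() returns False for your prefix, the PYTHONPATH export has most likely not been applied in the current shell.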

Programming Interface

geopmpy.agent

class geopmpy.agent.AgentConf(path, agent='monitor', options={})

Bases: object

The GEOPM agent configuration parameters.

This class contains all the parameters necessary to run the GEOPM agent with a workload.

path

The output path for this configuration file.

options

A dict of the options for this agent.

get_agent()
get_path()
write()

Write the current config to a file.

geopmpy.agent.enforce_policy()

Enforce a static implementation of the agent’s policy. The agent and the policy are chosen based on the GEOPM environment variables and configuration files.

geopmpy.agent.names()

Get the list of all available agents.

Returns:

List of all agent names.

Return type:

list[str]

geopmpy.agent.policy_json(agent_name, policy_values)

Create a JSON policy for the given agent.

This can be written to a file to control the agent statically.

Parameters:
  • agent_name (str) – Name of agent type.

  • policy_values (list[float]) – Values to use for each respective policy field.

Returns:

JSON str containing a valid policy using the given values.

Return type:

str
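
The pairing of policy values with policy fields can be illustrated without GEOPM installed. This sketch assumes the policy is a flat JSON object that maps each field name to its value; the function name and the field names are hypothetical, and the exact output of geopmpy.agent.policy_json() may differ.

```python
import json


def policy_json_sketch(policy_names, policy_values):
    """Pair each policy name with its value and serialize the mapping as JSON."""
    if len(policy_values) > len(policy_names):
        raise ValueError("more values than policy fields")
    return json.dumps(dict(zip(policy_names, policy_values)))


# Hypothetical field names; a real agent reports its own via policy_names().
names = ["FIELD_A", "FIELD_B"]
print(policy_json_sketch(names, [200.0, 1.5]))
```

The resulting string could then be written to a file and used as a static policy, as described above.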

geopmpy.agent.policy_names(agent_name)

Get the names of the policies for a given agent.

Parameters:

agent_name (str) – Name of agent type.

Returns:

Policy names required for the agent configuration.

Return type:

list[str]

geopmpy.agent.sample_names(agent_name)

Get all samples produced by the given agent.

Parameters:

agent_name (str) – Name of agent type.

Returns:

List of sample names.

Return type:

list[str]

geopmpy.io

GEOPM IO - Helper module for parsing/processing report and trace files.

class geopmpy.io.AppOutput(traces=None, dir_name='.', verbose=False, do_cache=True)

Bases: object

The container class for all trace related data.

This class holds the relevant objects for parsing and indexing all data that is output from GEOPM. This object can be created with a trace glob string that will be used to search dir_name for the relevant files. If files are found, their data will be parsed into objects for easy data access. Additionally, a pandas.DataFrame is constructed containing all of the trace data. These DataFrames are indexed based on the version of GEOPM found in the files, the profile name, agent name, and the number of times that particular configuration has been seen by the parser (i.e. experiment iteration).

trace_glob

The string pattern to use to search for trace files.

dir_name

The directory path to use when searching for files.

verbose

A bool to control whether verbose output is printed to stdout.

add_trace_df(tt, traces_df_list)

Adds a trace DataFrame to the tracking list.

The trace tracking list is used to create the combined DataFrame once all traces are parsed.

Parameters:

tt – The Trace object used to extract the Trace DataFrame. This DataFrame will be indexed and added to the tracking list.

get_trace_data(node_name=None)
get_trace_df()

Getter for the combined DataFrame of all trace files parsed.

This DataFrame contains all data parsed, and has a complex MultiIndex for accessing the unique data from each individual trace. For more information on this index, see the IndexTracker docstring.

Returns:

A DataFrame containing all parsed data.

Return type:

pandas.DataFrame

parse_traces(trace_paths, verbose)
remove_files()

Deletes all files currently tracked by this object.

class geopmpy.io.BenchConf(path)

Bases: object

The application configuration parameters.

Used to hold the config data for the integration test application. This application allows for varying combinations of regions (compute, IO, or network bound), complexity, desired execution count, and amount of imbalance between nodes during execution.

path

The output path for this configuration file.

append_imbalance(hostname, imbalance)

Appends imbalance to the config for a particular node.

Parameters:
  • hostname – The name of the node.

  • imbalance – The amount of imbalance to apply to the node. This is specified by a float in the range [0,1]. For example, specifying a value of 0.25 means that this node will spend 25% more time executing the work than a node would by default. Nodes not specified with imbalance configurations will perform normally.
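
The meaning of the imbalance value can be shown with a short calculation. This is a sketch of the arithmetic only; the function name is hypothetical and the code is not taken from BenchConf.

```python
def imbalanced_runtime(base_seconds, imbalance):
    """Scale a node's nominal work time by an imbalance fraction in [0, 1]."""
    if not 0.0 <= imbalance <= 1.0:
        raise ValueError("imbalance must be in the range [0, 1]")
    return base_seconds * (1.0 + imbalance)


# A node with imbalance 0.25 spends 25% more time on the same work.
print(imbalanced_runtime(10.0, 0.25))
```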

append_region(name, big_o)

Appends a region to the internal list.

Parameters:
  • name – The string representation of the region.

  • big_o – The desired complexity of the region. This affects compute, IO, or network complexity depending on the type of region requested.

get_exec_args()
get_exec_path()
get_path()
set_loop_count(loop_count)
write()

Write the current config to a file.

class geopmpy.io.IndexTracker

Bases: object

Tracks and uniquely identifies experiment configurations for pandas.DataFrame indexing.

This object’s purpose is to examine parsed data for reports or traces and determine if a particular experiment configuration has already been tracked. A user may run the same configuration repeatedly in order to prove that results are repeatable and are not outliers. Since the same configuration is used many times, it must be tracked and counted to ensure that the unique data for each run can be extracted later.

The parsed data is used to extract the following fields to build the tracking index tuple:

(<GEOPM_VERSION>, <PROFILE_NAME>, <AGENT_NAME>, <NODE_NAME>)

If the tuple is not contained in the _run_outputs dict, it is inserted with a value of 1. The value is incremented if the tuple is currently in the _run_outputs dict. This value is used to uniquely identify a particular set of parsed data when the MultiIndex is created.
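
The counting scheme described above can be sketched in a few lines. The class below is a simplified stand-in, not the IndexTracker implementation: it takes the four fields directly instead of extracting them from parsed data, and the method name track() is hypothetical.

```python
class IndexTrackerSketch:
    """Count how many times each experiment configuration has been seen."""

    def __init__(self):
        self._run_outputs = {}

    def track(self, version, profile_name, agent_name, node_name):
        """Record one run and return the tuple extended with its iteration."""
        key = (version, profile_name, agent_name, node_name)
        # Insert with a value of 1 on first sight, otherwise increment.
        self._run_outputs[key] = self._run_outputs.get(key, 0) + 1
        return key + (self._run_outputs[key],)

    def reset(self):
        """Clear the internal tracking dictionary."""
        self._run_outputs.clear()


tracker = IndexTrackerSketch()
print(tracker.track("1.0", "my_profile", "monitor", "node0"))
print(tracker.track("1.0", "my_profile", "monitor", "node0"))
```

The second call returns the same configuration tuple with an iteration count of 2, which is what lets repeated runs be told apart later.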

get_multiindex(run_output)

Returns a MultiIndex from this run_output. Used in pandas.DataFrame construction.

This will add the current run_output to the list of tracked data, and return a unique MultiIndex tuple to identify this data in a DataFrame.

For Trace objects, the integer index of the DataFrame is appended to the tuple.

Parameters:

run_output – The Trace object to produce an index tuple for.

Returns:

The unique index to identify this data object.

Return type:

pandas.MultiIndex

reset()

Clears the internal tracking dictionary.

Since only one type of data (reports OR traces) can be tracked at once, this is necessary to reset the object’s state so a new type of data can be tracked.

class geopmpy.io.RawReport(path)

Bases: object

agent_host_additions(host_name)
dump_json(path)
figure_of_merit()
get_field(raw_data, key, units='')
host_names()
meta_data()
raw_epoch(host_name)
raw_region(host_name, region_name)
raw_report()
raw_totals(host_name)
raw_unmarked(host_name)
region_names(host_name)
total_runtime()

class geopmpy.io.RawReportCollection(report_paths, dir_name='.', dir_cache=None, verbose=True, do_cache=True)

Bases: object

Used to group together a collection of related geopmpy.io.RawReports.

static fixup_metadata(metadata, df)
get_app_df()
get_df()
get_df_filtered(columns)
get_epoch_df()
get_unmarked_df()
load_reports(reports, dir_name, dir_cache, verbose, do_cache)
static make_h5_name(paths, outdir)
parse_reports(report_paths, verbose)
remove_cache()

class geopmpy.io.Trace(trace_path, use_agent=True)

Bases: object

Creates a pandas.DataFrame comprised of the trace file data.

This object will parse both the header and the CSV data in a trace file. The header identifies the uniquely-identifying configuration for this file which is used for later indexing purposes.

Even though __getattr__() and __getitem__() allow this object to effectively be treated like a DataFrame, you must use get_df() if you’re building a list of DataFrames to pass to pandas.concat(). Using the raw object in a list and calling concat will cause an error.

trace_path

The path to the trace file to parse.

static diff_df(trace_df, column_regex, epoch=True)

Diff the DataFrame.

Since the counters in the trace files are monotonically increasing, a diff must be performed to extract the useful data.

Parameters:
  • trace_df – The MultiIndexed pandas.DataFrame created by the AppOutput class.

  • column_regex – A string representing the regex search pattern for the column names to diff.

  • epoch – A flag to set whether or not to focus solely on epoch regions.

Returns:

A DataFrame with the diffed columns specified by 'column_regex', and an 'elapsed_time' column.

Return type:

pandas.DataFrame
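
The reason for diffing can be seen on a handful of samples. The sketch below uses plain Python dictionaries instead of a pandas.DataFrame, and the column names 'time' and 'energy' are hypothetical; it shows the idea behind diff_df(), not its implementation.

```python
def diff_rows(rows, columns):
    """Turn monotonically increasing counters into per-interval deltas."""
    result = []
    for prev, curr in zip(rows, rows[1:]):
        # Subtract consecutive samples for each requested counter column.
        delta = {name: curr[name] - prev[name] for name in columns}
        delta["elapsed_time"] = curr["time"] - prev["time"]
        result.append(delta)
    return result


samples = [
    {"time": 0, "energy": 100},
    {"time": 1, "energy": 150},
    {"time": 3, "energy": 210},
]
print(diff_rows(samples, ["energy"]))
```

Each output row reports how much a counter grew over an interval together with the length of that interval, so rates such as average power can be derived from it.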

get_agent()
get_df()
static get_median_df(trace_df, column_regex, config)

Extract the median experiment iteration.

This logic calculates the sum of elapsed times for all of the experiment iterations for all nodes in that iteration. It then extracts the DataFrame for the iteration that is closest to the median. For input DataFrames with a single iteration, the single iteration is returned.

Parameters:
  • trace_df – The MultiIndexed pandas.DataFrame created by the AppOutput class.

  • column_regex – A string representing the regex search pattern for the column names to diff.

  • config – The TraceConfig object being used presently.

Returns:

A DataFrame containing a single experiment iteration.

Return type:

pandas.DataFrame
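
The selection rule can be sketched with the standard library. The input layout here (a dict mapping an iteration number to the elapsed times of its nodes) and the function name are assumptions made for illustration; get_median_df() itself operates on the MultiIndexed DataFrame.

```python
import statistics


def median_iteration(elapsed_by_iteration):
    """Return the iteration whose summed elapsed time is closest to the median."""
    totals = {it: sum(times) for it, times in elapsed_by_iteration.items()}
    median = statistics.median(totals.values())
    # Ties are broken in favor of the lowest iteration number.
    return min(sorted(totals), key=lambda it: abs(totals[it] - median))


runs = {1: [5.0, 5.0], 2: [6.0, 6.0], 3: [15.0, 15.0]}
print(median_iteration(runs))
```

With a single iteration the median equals that iteration's total, so it is returned unchanged, matching the behavior described above.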

get_node_name()
get_profile_name()
get_start_time()
get_version()

geopmpy.launcher

This module provides a way to launch MPI applications using the GEOPM runtime by wrapping the call to the system MPI application launcher. The module currently supports wrapping the SLURM srun command, the ALPS aprun command, and mpiexec command provided by Open MPI and Intel MPI. The primary use of this module is through the geopmlaunch(1) command line executable which calls the geopmpy.launcher.main() function. See the geopmlaunch(1) man page for details about the command line interface.

class geopmpy.launcher.AprunLauncher(argv, num_rank=None, num_node=None, cpu_per_rank=None, timeout=None, time_limit=None, job_name=None, node_list=None, exclude_list=None, host_file=None, partition=None, reservation=None, quiet=None, do_affinity=None, bootstrap=None)

Bases: Launcher

affinity_option(is_geopmctl)

Returns the --cpu-binding option for aprun.

exclude_list_option()

Returns a list containing the -E option for aprun.

host_file_option()

Returns a list containing the -l option for aprun.

launcher_command()

Returns 'aprun', the name of the ALPS MPI job launch application.

node_list_option()

Returns a list containing the -L option for aprun.

num_node_option(is_geopmctl)

Returns a list containing the -N option for aprun. Must be combined with the -n option to determine the number of nodes.

parse_launcher_argv()

Parse the subset of aprun command line arguments used or manipulated by GEOPM.

preload_option()
quiet_option()

Returns a list containing the -q option for aprun.

time_limit_option()

Returns a list containing the -t option for aprun.

class geopmpy.launcher.Config(argv)

Bases: object

GEOPM configuration object. Used to interpret command line arguments to set GEOPM related environment variables.

environ()

Dictionary describing the environment variables controlled by the configuration object.

get_ctl()

Returns the geopm control method.

get_policy()

Returns the geopm policy file/key.

get_preload()

Returns True if the geopm preload option was specified, otherwise False.

set_omp_num_threads(omp_num_threads)

Control the OMP_NUM_THREADS environment variable.

unparsed()

All command line arguments except those used to configure GEOPM.

class geopmpy.launcher.Factory

Bases: object

create(argv, num_rank=None, num_node=None, cpu_per_rank=None, timeout=None, time_limit=None, job_name=None, node_list=None, exclude_list=None, host_file=None, partition=None, reservation=None, quiet=None, bootstrap=None)
get_launcher_names()

class geopmpy.launcher.IMPIExecLauncher(argv, num_rank=None, num_node=None, cpu_per_rank=None, timeout=None, time_limit=None, job_name=None, node_list=None, exclude_list=None, host_file=None, partition=None, reservation=None, quiet=None, do_affinity=None, bootstrap=None)

Bases: Launcher

Launcher derived object for use with the Intel(R) MPI Library job launch application mpiexec.hydra.

affinity_option(is_geopmctl)

Returns a list containing the command line options specifying the CPU affinity for each MPI process on a compute node.

bootstrap_option()

Returns a list containing the command line options specifying the bootstrap option for wiring-up the compute nodes.

exclude_list_option()

Returns a list containing the command line options specifying the compute nodes to exclude from execution.

job_name_option()

Returns a list containing the command line options specifying the name associated with the job for scheduler tracking purposes.

launcher_command()

Returns 'mpiexec.hydra', the name of the Intel MPI Library job launch application.

node_list_option()

Returns a list containing the command line options specifying the names of the compute nodes for the job to run on.

num_node_option(is_geopmctl)

Returns a list containing the command line options specifying the number of compute nodes.

parse_launcher_argv()

Parse the subset of mpiexec.hydra command line arguments used or manipulated by GEOPM.

preload_option()
time_limit_option()

Returns a list containing the command line options specifying the maximum time that a job is allowed to run.

timeout_option()

Returns a list containing the command line options specifying the length of time to wait for a job to start before aborting.

class geopmpy.launcher.Launcher(argv, num_rank=None, num_node=None, cpu_per_rank=None, timeout=None, time_limit=None, job_name=None, node_list=None, exclude_list=None, host_file=None, partition=None, reservation=None, quiet=None, do_affinity=None, bootstrap=None)

Bases: object

Abstract base class for MPI job launch application abstraction. Defines common methods used by all Launcher objects.

affinity_list(is_geopmctl)

Returns CPU affinity prescription as a list of integer sets. The list is over MPI ranks on a node from lowest to highest rank. The process for each MPI rank is restricted to the Linux CPUs enumerated in the set. The output from this function is used by the derived class’s affinity_option() method to set CPU affinities.
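
The shape of the returned value can be illustrated with a deliberately simple assignment. This sketch gives each rank a contiguous block of Linux CPUs; it ignores hyperthreads, sockets, and any CPUs reserved for the GEOPM controller, all of which the real method may take into account. The function and parameter names are hypothetical.

```python
def affinity_list_sketch(num_rank_per_node, cpu_per_rank, first_cpu=0):
    """Return one set of CPU indices per rank, ordered from lowest rank up."""
    result = []
    for rank in range(num_rank_per_node):
        start = first_cpu + rank * cpu_per_rank
        result.append(set(range(start, start + cpu_per_rank)))
    return result


# Two ranks per node with two CPUs each, leaving CPU 0 unused.
print(affinity_list_sketch(2, 2, first_cpu=1))
```

A derived class would then translate each set into the mask or list syntax that its job launch application expects.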

affinity_option(is_geopmctl)

Returns a list containing the command line options specifying the CPU affinity for each MPI process on a compute node.

bootstrap_option()

Returns a list containing the command line options specifying the bootstrap option for wiring-up the compute nodes.

environ()

Returns the modified environment dictionary updated with GEOPM specific values.

exclude_list_option()

Returns a list containing the command line options specifying the compute nodes to exclude from execution.

get_alloc_nodes()

Returns a list of the names of compute nodes that have been reserved by a scheduler for the current job context.

get_idle_nodes()

Returns a list of the names of compute nodes that are currently available to run jobs.

host_file_option()

Returns a list containing the command line options specifying the file containing the names of the compute nodes for the job to run on.

init_governor()

Queries the compute nodes to determine the current CPU frequency governor.

init_topo()

Determine the topology of the compute nodes that the job will be launched on. This is used to inform CPU affinity assignment.

int_handler(signum, frame)

This interface enables specialized signal handling. If not overridden by derived class, then the default signal handler is used.

job_name_option()

Returns a list containing the command line options specifying the name associated with the job for scheduler tracking purposes.

launcher_argv(is_geopmctl)

Returns a list of command line options for the underlying job launch application that reflect the state of the Launcher object.

launcher_command()

Returns the name/path to the job launch application.

node_list_delim()

Returns the delimiter that is to be used when constructing a node list string given a list of node names.

node_list_option()

Returns a list containing the command line options specifying the names of the compute nodes for the job to run on.

num_node_option(is_geopmctl)

Returns a list containing the command line options specifying the number of compute nodes.

num_rank_option(is_geopmctl)

Returns a list containing the -n option which is defined in the MPI standard for all job launch applications to specify the number of MPI processes or “ranks”.

parse_launcher_argv()

Parse command line options accepted by the underlying job launch application.

partition_option()

Returns a list containing the command line options specifying the compute node partition for the job to run on.

performance_governor_option()

Returns a list containing the command line options specifying that the Linux power governor should be set to performance.

preload_option()
quiet_option()

Returns a list containing the job launch option to suppress any end-of-job status messages that may interfere with parsing of stdout.

reservation_option()
run(stdout=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, stderr=<_io.TextIOWrapper name='<stderr>' mode='w' encoding='utf-8'>)

Execute the command given to constructor with modified command line options and environment variables.

Parameters:
  • stdout (IO) – Destination for standard output

  • stderr (IO) – Destination for standard error

run_compute_cmd(argv, num_node=None)

Run a command on the compute nodes.

time_limit_option()

Returns a list containing the command line options specifying the maximum time that a job is allowed to run.

timeout_option()

Returns a list containing the command line options specifying the length of time to wait for a job to start before aborting.

class geopmpy.launcher.OMPIExecLauncher(argv, num_rank=None, num_node=None, cpu_per_rank=None, timeout=None, time_limit=None, job_name=None, node_list=None, exclude_list=None, host_file=None, partition=None, reservation=None, quiet=None, do_affinity=None, bootstrap=None)

Bases: Launcher

Launcher derived object for use with Open MPI project launch application mpiexec.

affinity_option(is_geopmctl)

Returns a list containing the command line options specifying the CPU affinity for each MPI process on a compute node.

exclude_list_option()

Returns a list containing the command line options specifying the compute nodes to exclude from execution.

launcher_argv(is_geopmctl)

Returns a list of command line options for the underlying job launch application that reflect the state of the Launcher object.

launcher_command()

Returns 'mpiexec', the name of the Open MPI project job launch application.

node_list_delim()

Returns the delimiter that is to be used when constructing a node list string given a list of node names.

node_list_option()

Returns a list containing the --host option for mpiexec.

num_node_option(is_geopmctl)

Returns a list containing the command line options specifying the number of compute nodes.

parse_host_file(file_name)
parse_launcher_argv()

Parse the subset of mpiexec command line arguments used or manipulated by GEOPM.

preload_option()
run(stdout=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, stderr=<_io.TextIOWrapper name='<stderr>' mode='w' encoding='utf-8'>)

Pass through to Launcher run.

class geopmpy.launcher.PALSLauncher(argv, num_rank=None, num_node=None, cpu_per_rank=None, timeout=None, time_limit=None, job_name=None, node_list=None, exclude_list=None, host_file=None, partition=None, reservation=None, quiet=None, do_affinity=None, bootstrap=None)

Bases: IMPIExecLauncher

IMPIExecLauncher derived object for use with the PALS MPI Library job launch application mpiexec.

affinity_option(is_geopmctl)

Returns a list containing the --cpu-bind option for mpiexec.

launcher_command()

Returns 'mpiexec', the name of the PALS MPI Library job launch application.

num_node_option(is_geopmctl)

Returns a list containing the command line options specifying the number of compute nodes.

parse_launcher_argv()

Parse the subset of mpiexec command line arguments used or manipulated by GEOPM.

preload_option()

exception geopmpy.launcher.PassThroughError

Bases: Exception

Exception raised when geopm is not to be used.

class geopmpy.launcher.SrunLauncher(argv, num_rank=None, num_node=None, cpu_per_rank=None, timeout=None, time_limit=None, job_name=None, node_list=None, exclude_list=None, host_file=None, partition=None, reservation=None, quiet=None, do_affinity=None, bootstrap=None)

Bases: Launcher

Launcher derived object for use with the SLURM job launch application srun.

affinity_option(is_geopmctl)

Returns a list containing the --cpu_bind option for srun. If the mpibind plugin is supported, it is explicitly disabled so it does not interfere with affinitization. If the cpu_bind plugin is not detected, an exception is raised.

exclude_list_option()

Returns a list containing the -x option for srun.

int_handler(signum, frame)

This is necessary to prevent the script from dying on the first CTRL-C press. SLURM requires 2 SIGINT signals to abort the job.

job_name_option()

Returns a list containing the -J option for srun.

launcher_argv(is_geopmctl)

Returns a list of command line options for the underlying job launch application that reflect the state of the Launcher object.

launcher_command()

Returns 'srun', the name of the SLURM MPI job launch application.

node_list_option()

Returns a list containing the -w option for srun.

num_node_option(is_geopmctl)

Returns a list containing the -N option for srun.

parse_launcher_argv()

Parse the subset of srun command line arguments used or manipulated by GEOPM.

partition_option()

Returns a list containing the command line options specifying the compute node partition for the job to run on.

performance_governor_option()

Returns a list containing the command line options specifying that the Linux power governor should be set to performance.

preload_option()
reservation_option()
time_limit_option()

Returns a list containing the -t option for srun.

timeout_option()

Returns a list containing the -I option for srun.
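
Because each *_option() method returns a list, a full command line can be assembled by concatenation. The sketch below uses only the srun flags named in this section; the function name, the option ordering, and the omission of affinity and GEOPM-specific options are simplifying assumptions rather than a description of SrunLauncher.

```python
def srun_argv_sketch(num_node=None, num_rank=None, job_name=None,
                     node_list=None, exclude_list=None, time_limit=None):
    """Concatenate per-option lists into an srun command line."""
    argv = ["srun"]
    options = [
        ("-N", num_node),      # number of compute nodes
        ("-n", num_rank),      # number of MPI ranks
        ("-J", job_name),      # job name
        ("-w", node_list),     # nodes to run on
        ("-x", exclude_list),  # nodes to exclude
        ("-t", time_limit),    # maximum run time
    ]
    for flag, value in options:
        # An unset option contributes an empty list, i.e. nothing.
        if value is not None:
            argv += [flag, str(value)]
    return argv


print(srun_argv_sketch(num_node=2, num_rank=8, job_name="demo"))
```

Returning lists rather than strings keeps each argument separate, which avoids shell quoting problems when the command is eventually executed.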

class geopmpy.launcher.SrunTOSSLauncher(argv, num_rank=None, num_node=None, cpu_per_rank=None, timeout=None, time_limit=None, job_name=None, node_list=None, exclude_list=None, host_file=None, partition=None, reservation=None, quiet=None, do_affinity=None, bootstrap=None)

Bases: SrunLauncher

Launcher derived object for use with systems using TOSS and the mpibind plugin from LLNL.

affinity_option(is_geopmctl)

Returns the mpibind option used with SLURM on TOSS.

geopmpy.launcher.int_ceil_div(aa, bb)

Shortcut for the ceiling of the ratio of two integers.
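
A typical use of such a helper is working out how many nodes are needed for a given number of ranks. The following is one common integer-only way to compute the ceiling; it is a sketch and not necessarily how int_ceil_div() is written.

```python
def int_ceil_div_sketch(aa, bb):
    """Ceiling of aa / bb using only integer arithmetic."""
    # Floor division of the negated numerator rounds toward the ceiling.
    return -(-aa // bb)


# Ten ranks at four ranks per node need three nodes.
print(int_ceil_div_sketch(10, 4))
```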

geopmpy.launcher.main()

Main routine used by the geopmlaunch wrapper executable. This function creates a launcher from the factory and calls the geopmpy.launcher.Launcher.run() method. If help was requested on the command line, then help from the underlying application launcher is printed and the help for the GEOPM extensions is appended. Returns -1 and prints an error message if an error occurs. If the GEOPM_DEBUG environment variable is set and an error occurs, a complete stack trace will be printed.

geopmpy.launcher.range_str(values)

Take an iterable object containing integers and return a string of comma separated values and ranges given by a dash. Example:

>>> geopmpy.launcher.range_str({1, 2, 3, 5, 7, 9, 10})
'1-3,5,7,9-10'
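
The formatting rule can be reproduced without GEOPM installed. The function below is an independent sketch that produces the same string for the example above; it is not the library's implementation, and its handling of empty input is an assumption.

```python
def range_str_sketch(values):
    """Collapse integers into comma separated values and dash ranges."""
    ordered = sorted(set(values))
    if not ordered:
        return ""
    parts = []
    start = prev = ordered[0]
    for value in ordered[1:]:
        if value == prev + 1:
            # Still inside a consecutive run; extend it.
            prev = value
            continue
        parts.append(str(start) if start == prev else f"{start}-{prev}")
        start = prev = value
    parts.append(str(start) if start == prev else f"{start}-{prev}")
    return ",".join(parts)


print(range_str_sketch({1, 2, 3, 5, 7, 9, 10}))
```

Strings of this form are a compact way to express CPU lists such as those produced for affinity settings.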

geopmpy.policy_store

geopmpy.policy_store.connect(database_path)

Connect to the database at the given location. Creates a new database if one does not yet exist at the given location.

Parameters:

database_path (str) – Path to the database.

geopmpy.policy_store.disconnect()

Disconnect the associated database. No-op if the database has already been disconnected.

geopmpy.policy_store.get_best(agent_name, profile_name)

Get the best known policy for a given agent/profile pair. If no best has been recorded, the default for the agent is returned.

Parameters:
  • agent_name (str) – Name of the agent.

  • profile_name (str) – Name of the profile.

Returns:

Best known policy for the profile and agent.

Return type:

list[float]

geopmpy.policy_store.set_best(agent_name, profile_name, policy)

Set the record for the best policy for a profile with an agent.

Parameters:
  • agent_name (str) – Name of the agent.

  • profile_name (str) – Name of the profile.

  • policy (list[float]) – New policy to use.

geopmpy.policy_store.set_default(agent_name, policy)

Set the default policy to use with an agent.

Parameters:
  • agent_name (str) – Name of the agent.

  • policy (list[float]) – Default policy to use with the agent.
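
The fallback from a best policy to an agent default can be sketched with a small stand-in store. This example uses Python's sqlite3 module and a made-up two-table schema purely for illustration; the storage backend and schema of geopmpy.policy_store are not described here, and the class name is hypothetical.

```python
import json
import sqlite3


class PolicyStoreSketch:
    """Stand-in store: best policy per agent/profile with a per-agent default."""

    def __init__(self, database_path):
        # sqlite3 creates the database file if it does not exist yet.
        self._conn = sqlite3.connect(database_path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS best "
            "(agent TEXT, profile TEXT, policy TEXT, "
            "PRIMARY KEY (agent, profile))")
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS defaults "
            "(agent TEXT PRIMARY KEY, policy TEXT)")

    def set_default(self, agent_name, policy):
        self._conn.execute("INSERT OR REPLACE INTO defaults VALUES (?, ?)",
                           (agent_name, json.dumps(policy)))
        self._conn.commit()

    def set_best(self, agent_name, profile_name, policy):
        self._conn.execute("INSERT OR REPLACE INTO best VALUES (?, ?, ?)",
                           (agent_name, profile_name, json.dumps(policy)))
        self._conn.commit()

    def get_best(self, agent_name, profile_name):
        row = self._conn.execute(
            "SELECT policy FROM best WHERE agent = ? AND profile = ?",
            (agent_name, profile_name)).fetchone()
        if row is None:
            # No best recorded for this pair; fall back to the agent default.
            row = self._conn.execute(
                "SELECT policy FROM defaults WHERE agent = ?",
                (agent_name,)).fetchone()
        if row is None:
            raise KeyError(f"no policy recorded for agent {agent_name!r}")
        return json.loads(row[0])

    def disconnect(self):
        self._conn.close()


store = PolicyStoreSketch(":memory:")
store.set_default("agent_x", [1.0])
print(store.get_best("agent_x", "profile_a"))
store.disconnect()
```

Raising KeyError when neither a best policy nor a default exists is a choice made for this sketch; the behavior of the real module in that case is not specified above.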

Troubleshooting

If you have an existing clone of the GEOPM GitHub repo and are experiencing a pkg_resources.DistributionNotFound error when attempting to run the Python scripts, please remove the VERSION file at the root of your repo and re-run autogen.sh.

The version file will be removed if the dist-clean Makefile target is invoked. This is also remedied by rerunning autogen.sh.

See Also

geopm(7), geopmlaunch(1)