geopm(7) – Global Extensible Open Power Manager

Description

The Global Extensible Open Power Manager (GEOPM) is a framework for exploring power and energy optimizations on heterogeneous platforms. This manual page outlines the key tools and interfaces used to configure and use the GEOPM software. The remainder of this description summarizes the features covered by the sections of this manual page.

The GEOPM HPC runtime monitors platform-level metrics (e.g., CPU power, average core frequency, elapsed time) while an application executes on the platform. GEOPM provides Job Launch tools to execute an application with the GEOPM HPC runtime and to read the platform-level metrics. GEOPM can also report metrics with respect to application-specific regions or checkpoints by utilizing optional Application Profiling interfaces. Several Analysis Tools provide the means to directly query and modify platform state.

The GEOPM HPC runtime interacts with a C++ framework to enable custom power management algorithms (called agents) that utilize the analysis tools. Some Built-In Agents are available by default. Additional agents and platform input/output interfaces (called IOGroups) can be added to the GEOPM HPC runtime by creating a Plugin Extension.

Although GEOPM supports manual instrumentation through the Application Profiling interfaces, some automatic instrumentation is also supported. Integration With PMPI enables GEOPM to automatically account for time spent inside MPI function calls by default. Integration With OMPT enables GEOPM to account for time spent in individual OpenMP parallel regions.

Job Launch

The geopmlaunch(1) script is the recommended method for launching the GEOPM HPC runtime. Unless modified by command-line arguments or environment variables, GEOPM will create a geopm.report file with a summary of metrics from application execution. See geopm_report(7) for documentation about the output file format.

If geopmlaunch(1) does not provide an application launcher supported by your system, please make a change request at the GitHub issues page to support the job launch method used on your system:

https://github.com/geopm/geopm/issues

Also, consider porting your job launch command into the geopmpy.launcher module and submitting a change request as described in the Contributor Guide.

If the job launch application is not supported by geopmpy.launcher, the recommended method is to use the environment variables described in this man page, including the GEOPM_CTL environment variable. If you use the application launch method, you must also launch the geopmctl(1) application in parallel with the application you wish to run with GEOPM.

There are legacy methods for launching the runtime programmatically. These are documented in geopm_ctl(3), but are deprecated as an application-facing interface because their use within an application is incompatible with the GEOPM launcher script.

Application Profiling

GEOPM provides application profiling interfaces for the C and Fortran programming languages, documented in geopm_prof(3) and geopm_fortran(3), respectively. These interfaces enable an application to inform GEOPM of key monitoring events, such as entry or exit from regions of interest, entry to a new iteration of a key looping construct, and hints about the nature of the active region of code (e.g., whether the code is expected to be compute-bound, network-bound, or something else).
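As a minimal sketch of the C interface (assuming the header and hint names documented by geopm_prof(3) for recent GEOPM releases; do_compute() is a hypothetical application kernel), a region can be registered, marked on entry and exit, and associated with a loop iteration (epoch) as follows:

    #include <stdint.h>
    #include <geopm_prof.h>   /* profiling interface, see geopm_prof(3) */
    #include <geopm_hint.h>   /* region hint constants; header name assumed */

    static void do_compute(void)
    {
        /* hypothetical compute-bound application kernel */
    }

    int main(void)
    {
        uint64_t region_id = 0;
        /* Register a named region with a compute-bound hint. */
        geopm_prof_region("my_compute_loop", GEOPM_REGION_HINT_COMPUTE,
                          &region_id);
        for (int iter = 0; iter < 100; ++iter) {
            geopm_prof_epoch();          /* mark a new outer-loop iteration */
            geopm_prof_enter(region_id); /* entry into the region of interest */
            do_compute();
            geopm_prof_exit(region_id);  /* exit from the region of interest */
        }
        return 0;
    }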

The GEOPM HPC runtime profiles applications while running either as a separate process or thread within the launched application, or as the standalone geopmctl(1) application. C interfaces to drive the GEOPM controller are documented in geopm_ctl(3).

Analysis Tools

GEOPM may also be used as a tooling interface for system analysis.

The geopmread(1) application reports the current values of platform signals at varying levels of scope (domains). The geopmwrite(1) application enables modification of platform controls at varying domains. Information about signals and controls is documented in geopm_pio(7). Programmatic interfaces for read and write operations are available through geopm_pio(3).

The types of domains and their relationships with each other can be programmatically queried through geopm_topo(3).
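For example, a minimal sketch combining the two C interfaces (the signal name CPU_FREQUENCY_STATUS is only an example; use geopmread(1) to list the signals available on your platform) might read one signal for each processor package:

    #include <stdio.h>
    #include <geopm_topo.h>   /* domain query interface, see geopm_topo(3) */
    #include <geopm_pio.h>    /* signal/control interface, see geopm_pio(3) */

    int main(void)
    {
        /* Query how many processor packages the platform topology reports. */
        int num_package = geopm_topo_num_domain(GEOPM_DOMAIN_PACKAGE);
        for (int pkg = 0; pkg < num_package; ++pkg) {
            double freq = 0.0;
            /* Read one signal per package domain. */
            int err = geopm_pio_read_signal("CPU_FREQUENCY_STATUS",
                                            GEOPM_DOMAIN_PACKAGE, pkg, &freq);
            if (!err) {
                printf("package %d frequency: %g Hz\n", pkg, freq);
            }
        }
        return 0;
    }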

GEOPM comes bundled with a synthetic benchmark application geopmbench(1), which can be used as an application workload for basic analysis and to experiment with the impact that signals and controls have on applications under GEOPM.

Built-In Agents

GEOPM comes packaged with several built-in power management algorithms (agents), including the monitoring-only agent described in geopm_agent_monitor(7) and the agents described in geopm_agent_frequency_map(7), geopm_agent_ffnet(7), geopm_agent_gpu_activity(7), geopm_agent_power_balancer(7), and geopm_agent_power_governor(7).

Use the geopmagent(1) application or the geopm_agent(3) C interface to query agent information and create static policies.
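For example, a short sketch using the C interface (assuming the power_governor agent is installed; the function names follow geopm_agent(3)) can list the policy fields an agent accepts:

    #include <stdio.h>
    #include <geopm_agent.h>  /* agent query interface, see geopm_agent(3) */

    int main(void)
    {
        const char *agent = "power_governor";  /* example agent name */
        int num_policy = 0;
        if (!geopm_agent_num_policy(agent, &num_policy)) {
            for (int idx = 0; idx < num_policy; ++idx) {
                char name[256];
                /* Print the name of each field in the agent's policy. */
                if (!geopm_agent_policy_name(agent, idx, sizeof(name), name)) {
                    printf("policy field %d: %s\n", idx, name);
                }
            }
        }
        return 0;
    }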

Plugin Extension

If you wish to monitor or control platform interfaces (IOGroups) that are not part of the core GEOPM distribution, or if you wish to execute GEOPM agents that are not part of the core distribution, then you can extend GEOPM with additional IOGroup and agent plugins.

Agents and IOGroups are defined as C++ classes, documented in geopm::Agent(3) and geopm::IOGroup(3), respectively. Both can be registered with GEOPM through the geopm::PluginFactory(3) interface. The geopm::PlatformIO(3) interface provides a channel through which agents and GEOPM tools can interact with IOGroups.
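As a sketch of the registration mechanism (following the pattern used in the GEOPM plugin tutorials; make_example_iogroup() and the "EXAMPLE" plugin name are hypothetical, and the IOGroup subclass definition is elided), a plugin shared object can register an IOGroup when the library is loaded:

    #include <memory>
    #include <geopm/IOGroup.hpp>        // geopm::IOGroup, geopm::iogroup_factory()
    #include <geopm/PluginFactory.hpp>  // geopm::PluginFactory

    // Constructs an object of a class derived from geopm::IOGroup; the class
    // definition is elided here, see geopm::IOGroup(3) for the required methods.
    std::unique_ptr<geopm::IOGroup> make_example_iogroup(void);

    // Runs when the shared object is loaded from a directory listed in the
    // GEOPM_PLUGIN_PATH environment variable.
    static void __attribute__((constructor)) example_iogroup_load(void)
    {
        geopm::iogroup_factory().register_plugin("EXAMPLE", make_example_iogroup);
    }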

Integration With PMPI

Linking to libgeopm defines symbols that intercept calls to the MPI interface through PMPI. This is enabled by default and can be disabled with the configure-time option --disable-mpi. See the LD_DYNAMIC_WEAK environment variable description below for the runtime requirements of the PMPI design. Note that when the GEOPM PMPI interposition is in use, other profilers that rely on the same method will conflict with it. The GEOPM runtime can create an application performance profile report and a trace of the application runtime; in this way, GEOPM serves the role of an application profiler in addition to managing power resources. Report and trace generation are controlled by the GEOPM_REPORT and GEOPM_TRACE environment variables; see their descriptions below.

Integration With OMPT

Unless the GEOPM runtime is configured to disable OpenMP, the library is compiled against the OpenMP runtime. If the OpenMP implementation that GEOPM is compiled against supports the OMPT callbacks, then GEOPM will use the OMPT callbacks to wrap OpenMP parallel regions with calls to geopm_prof_enter() and geopm_prof_exit(). In this way, any OpenMP parallel region not contained within another application-defined region will be reported to the GEOPM runtime. Such regions appear in the report with a name beginning with "[OMPT]" that references the object file and the function containing the OpenMP parallel region, e.g.:

[OMPT]geopmbench:geopm::StreamModelRegion::run()

To explicitly enable this feature, pass the --enable-ompt flag at GEOPM configure time. If the default OpenMP runtime does not support the OMPT callbacks, this will build and install an LLVM OpenMP runtime configured to support OMPT. Note that your compiler must be compatible with the LLVM OpenMP ABI to extend it in this way.

This feature can also be enabled on a per-run basis by setting the GEOPM_OMPT_ENABLE environment variable, or by using the --geopm-ompt-enable option in geopmlaunch(1).

Choosing An Agent And Policy

The Agent determines the optimization algorithm performed by the runtime and can be specified with the --geopm-agent option of the launcher. If not specified, the agent described in geopm_agent_monitor(7) is used by default; it only collects runtime statistics, which are summarized in the report.

The constraints for the Agent algorithm are determined by the policy. The policy can be provided as a file, through the --geopm-policy option for the launcher. Policy files can be generated with the geopmagent(1) tool. The values of the policy will be printed in the header of the report.

If GEOPM has been configured with --enable-beta, policies can also be set through the endpoint, which should be manipulated by a system administrator through an authority such as the resource manager. Use of the endpoint is described in geopm_endpoint(3). In this scenario, users launching GEOPM may not be required, or may not be allowed, to specify the Agent or policy if it has been set through the default environment as described in the Environment section below. If not specified in the default environment, the location of the endpoint should be provided through --geopm-endpoint; this option supersedes the use of --geopm-policy. When GEOPM receives the policy through the endpoint, the report will contain "DYNAMIC" as the value of the policy. The specific values received over time can be viewed through use of the optional trace file enabled by --geopm-trace-endpoint-policy.

Refer to geopm::Agent(3) and the individual agent man pages for more details on the behavior of the agents and their policies. See geopmlaunch(1) for more details on the --geopm-agent, --geopm-policy, --geopm-endpoint, and --geopm-trace-endpoint-policy options.

Interpreting The Report

If the GEOPM_REPORT environment variable is set, a report will be generated. One report file is generated for each run. The format of the report, the data contained in it, and the controller's sampling are described in geopm_report(7).

Interpreting The Trace

If the GEOPM_TRACE environment variable is set (see below), a trace file with time-ordered information about the application runtime is generated. A separate trace file is generated for each compute node, and each file is a pipe-delimited (the | character) ASCII table. The file begins with a header marked by lines that start with the # character. The header contains information about the GEOPM version, the job start time, the profile name (job description), and the agent used during the run.

The first row following the header gives a description of each field. A simple method for selecting fields from the trace file is the awk command. For example, the following prints a subset of the fields (fields 1, 2, and 11) from a trace file named "geopm.trace-host0":

$ grep -v '^#' geopm.trace-host0 | awk -F\| '{print $1, $2, $11}'

Environment

When using the launcher wrapper script geopmlaunch(1), the interface to the GEOPM runtime is controlled by the launcher command line options. The launcher script sets the environment variables described in this section according to the options specified on the command line. Direct use of these environment variables is only recommended when launching the GEOPM runtime without geopmlaunch(1). If launching the GEOPM controller in application mode without geopmlaunch(1), the environment variables documented below must be set to the same values in the contexts where geopmctl(1) and the compute application are executed.

In addition to the environment, there are two node-local configuration files that will impact the way GEOPM behaves. The location of these files can be configured at compile time, but the default locations are:

/etc/geopm/environment-default.json
/etc/geopm/environment-override.json

The geopmadmin(1) tool can be used to display the location of these files for your installation of GEOPM or to check the validity of the system configuration. These files contain JSON objects that map GEOPM environment variables to default or override values. The environment-default.json file will determine default values for the GEOPM runtime in the case where the values are not set in the calling environment. The environment-override.json file will enforce that any GEOPM process running on the compute node will use the values specified regardless of the values set in the calling environment.
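For example, a hypothetical override file forcing every GEOPM process on the node to use the monitor agent might contain:

    {
        "GEOPM_AGENT": "monitor"
    }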

GEOPM Environment Variables

GEOPM_NUM_PROC

The number of processes to be tracked and profiled by the controller on each compute node. The controller will wait until this number of processes request profiling before starting the control loop, and subsequent requests for profiling will be ignored. If not set, the controller will wait until at least one process has registered for profiling before beginning the control loop. The default value for GEOPM_NUM_PROC is one, except when using the geopmlaunch CLI: the geopmlaunch tool infers this parameter from the values passed to the underlying launch command, so the user does not have to set it explicitly.

GEOPM_PROGRAM_FILTER

A required, comma-separated list of the program invocation names of the processes that are to be profiled and tracked by the controller. See the --geopm-program-filter option description in geopmlaunch(1) for details.

GEOPM_REPORT

The path to which a GEOPM report file is saved. See the --geopm-report option description in geopmlaunch(1) for more details.

GEOPM_REPORT_SIGNALS

Additional signals that are included in a GEOPM report. See the --geopm-report-signals option description in geopmlaunch(1) for more details.

GEOPM_TRACE

The path and base name to which each per-host GEOPM trace file is saved. See the --geopm-trace option description in geopmlaunch(1) for more details.

GEOPM_TRACE_SIGNALS

Additional signals that are included in a GEOPM trace. See the --geopm-trace-signals option description in geopmlaunch(1) for more details.

GEOPM_TRACE_PROFILE

The path and base name to which each per-host GEOPM profile trace file is saved. See the --geopm-trace-profile option description in geopmlaunch(1) for more details.

GEOPM_TRACE_ENDPOINT_POLICY

The path to which an endpoint policy trace file is generated. See the --geopm-trace-endpoint-policy option description in geopmlaunch(1) for more details.

GEOPM_PROFILE

The name of the profile written in the GEOPM report file. See the --geopm-profile option description in geopmlaunch(1) for more details.

GEOPM_CTL

The type of GEOPM controller to use. See the --geopm-ctl option description in geopmlaunch(1) for more details.

GEOPM_AGENT

The type of agent to run in the GEOPM HPC runtime. See the --geopm-agent option description in geopmlaunch(1) for more details.

GEOPM_POLICY

The path to the GEOPM policy JSON file to use for the selected agent. See the --geopm-policy option description in geopmlaunch(1) for more details.

GEOPM_ENDPOINT

The prefix for shared memory keys used by the GEOPM endpoint. See the --geopm-endpoint option description in geopmlaunch(1) for more details.

GEOPM_TIMEOUT

The number of seconds that the application will wait for the GEOPM controller to connect over shared memory before timing out. See the --geopm-timeout option description in geopmlaunch(1) for more details.

GEOPM_PLUGIN_PATH

The colon-separated list of search paths for GEOPM plugins. See the --geopm-plugin-path option description in geopmlaunch(1) for more details.

GEOPM_DEBUG_ATTACH

The MPI rank that will wait in MPI_Init() for a debugger to attach. See the --geopm-debug-attach option description in geopmlaunch(1) for more details.

GEOPM_DISABLE_HYPERTHREADS

Set to any value to prevent the launcher from pinning to multiple hyperthreads per CPU core. See the --geopm-hyperthreads-disable option description in geopmlaunch(1) for more details.

GEOPM_OMPT_ENABLE

Set to any value to enable OpenMP region detection as described in Integration With OMPT. See the --geopm-ompt-enable option description in geopmlaunch(1) for more details.

GEOPM_INIT_CONTROL

The path to the control initialization file. See the --geopm-init-control option description in geopmlaunch(1) for more details.

GEOPM_PERIOD

The control loop period in seconds; if not specified, the period is determined by the Agent. See the --geopm-period option description in geopmlaunch(1) for details.

GEOPM_MSR_CONFIG_PATH

The colon-separated list of search paths for additional MSR definitions. See geopm_pio_msr(7) for more details.

GEOPM_CTL_LOCAL

Disable communication between controllers running on different compute nodes and produce one report file per host. Enabled by default when MPI is not compiled into the GEOPM Runtime. See the --geopm-ctl-local option description in geopmlaunch(1) for details.

Other Environment Variables

LD_DYNAMIC_WEAK

When dynamically linking an application to libgeopm for any features supported by the PMPI profiling of the MPI runtime, it may be necessary to set the LD_DYNAMIC_WEAK environment variable at runtime, as documented in the ld.so(8) man page. If care is taken when linking to place the libgeopm library before the library providing the weak MPI symbols, e.g., "-lgeopm -lmpi", link-order precedence will enforce the required override of the MPI interface symbols, and the LD_DYNAMIC_WEAK environment variable is not required at runtime.

Misc

geopmadmin(1)

Configure and check system-wide GEOPM settings

geopm_error(3)

Error code descriptions

geopm_version(3)

GEOPM library version

geopm_sched(3)

Interface with Linux scheduler

geopm_time(3)

Time-related helper functions

geopm_hash(3)

Numerical encoding helper functions

See Also

geopmpy(7), geopmdpy(7), geopm_agent_frequency_map(7), geopm_agent_ffnet(7), geopm_agent_monitor(7), geopm_agent_gpu_activity(7), geopm_agent_power_balancer(7), geopm_agent_power_governor(7), geopm_pio(7), geopm_pio_const_config(7), geopm_pio_cnl(7), geopm_pio_cpuinfo(7), geopm_pio_dcgm(7), geopm_pio_levelzero(7), geopm_pio_msr(7), geopm_pio_nvml(7), geopm_pio_sst(7), geopm_pio_time(7), geopm_report(7), geopm_agent(3), geopm_ctl(3), geopm_error(3), geopm_field(3), geopm_fortran(3), geopm_hash(3), geopm_policystore(3), geopm_pio(3), geopm_prof(3), geopm_sched(3), geopm_time(3), geopm_version(3), geopm::Agent(3), geopm::Agg(3), geopm::CircularBuffer(3), geopm::CpuinfoIOGroup(3), geopm::Exception(3), geopm::Helper(3), geopm::IOGroup(3), geopm::MSRIO(3), geopm::MSRIOGroup(3), geopm::PlatformIO(3), geopm::PlatformTopo(3), geopm::PluginFactory(3), geopm::PowerBalancer(3), geopm::PowerGovernor(3), geopm::ProfileIOGroup(3), geopm::SampleAggregator(3), geopm::SharedMemory(3), geopm::TimeIOGroup(3), geopmadmin(1), geopmagent(1), geopmbench(1), geopmctl(1), geopmlaunch(1), geopmread(1), geopmwrite(1), geopmaccess(1), geopmexporter(1), geopmsession(1), ld.so(8)