User Guide for GEOPM Runtime
The GEOPM Runtime is software designed to enhance energy efficiency of applications through active hardware configuration. See Getting Started Guide for information on how to begin using the GEOPM Runtime.
User Model
The architecture is designed to provide a secure infrastructure to support a wide range of tuning algorithms while solving three related challenges:
1. Measuring Performance
For hardware tuning algorithms, the first challenge is to generate a reliable estimate of application performance. In energy efficiency terms, performance measurements are ratios of performance to power, for example, “perf per watt”. Therefore, any dynamic hardware power control tuning that aims for energy efficiency must formulate an estimate of application performance. Without specific application feedback on the critical path, these performance estimates might prove inaccurate, causing hardware tuning algorithms to disrupt application performance and elongate run times, resulting in higher energy costs per unit of work than without the adaptive algorithm.
2. Hardware Configuration
The second challenge arises from allowing hardware control algorithms to be influenced by unprivileged user input (such as application feedback), which presents a security risk. Without adequate checks, this can lead to privilege escalation, denial of service, and degraded quality of service for other users of the system. The GEOPM Service was created specifically to mitigate these issues.
3. Advanced Data Analysis
The third challenge is to provide a software development platform suitable for control algorithms that can be securely deployed and use high-level languages for data analysis. To effectively enhance energy efficiency for certain applications, substantial software dependencies may be required. Control algorithms might lean on optimization software like machine learning packages or other numerical packages. The application being optimized could include millions of lines of code, and there may be significant coupling between the application and the control algorithm. Limiting the privileges of the process running the control algorithm significantly reduces software security audit requirements.
Introduction
The GEOPM Runtime creates a bridge between GEOPM’s application instrumentation interfaces and platform monitoring/control interfaces.
By default, the GEOPM Runtime presents relationships between application instrumentation and platform-monitoring interfaces in a report. For more complex interactions, such as dynamic control of platform settings, different GEOPM agents can be utilized. For more information on user-facing GEOPM Runtime launch options, please refer to the geopmlaunch(1) documentation.
GEOPM agents can exploit this hierarchical control system (a tree of Agents spanning the compute nodes) to optimize various objective functions. Examples include maximizing application performance within a power limit (such as the GEOPM power_balancer agent) or decreasing energy consumption with minimal impact on application performance. The root of the control hierarchy can communicate with the system resource manager to extend the hierarchy beyond the individual MPI application, thus facilitating resource management across multiple MPI jobs and multiple users of the system.
The GEOPM Runtime package includes the libgeopm shared object library. GEOPM comes with numerous command-line tools, each with a dedicated manual page. The geopmlaunch(1) command-line tool launches an MPI application while enabling the GEOPM runtime to create a GEOPM Controller thread on each compute node. The Controller loads plugins and runs the Agent algorithm to manage the compute application. The geopmlaunch(1) command is provided by the geopmpy Python package that is part of the GEOPM installation. For more documentation and links, please visit the GEOPM overview man page.
GEOPM Runtime offers several built-in algorithms, each incorporated within an “Agent” implementing the geopm::Agent(3) class interface. Developers can expand these algorithm features by creating an Agent plugin. An implementation of this class can be dynamically loaded at runtime by the GEOPM Controller. The Agent class determines what data is collected, how control decisions are made, and how messages are exchanged between Agents in the compute nodes’ tree hierarchy. The GEOPM Service package, which resides in the service directory of the GEOPM repository, provides the PlatformIO interface which abstracts reading signals and writing controls from the Agent within a compute node. This allows Agent implementations to be ported to various hardware platforms without modification.
The libgeopm library can be called indirectly or directly within MPI applications, enabling application feedback to aid control decisions. Indirect calls are made through GEOPM’s integration with MPI and OpenMP via their profiling interfaces. Direct calls are made through the geopm_prof(3) or geopm_fortran(3) interfaces. Marking up the compute application with profiling information through these interfaces allows the application to be more tightly integrated with the GEOPM runtime and controlled more accurately.
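A minimal sketch of the direct markup path using the geopm_prof(3) C interface is shown below; the header names, region name, and loop body are illustrative assumptions, and error checking of the returned codes is omitted for brevity:

#include <geopm_prof.h>   /* assumed header declaring the geopm_prof(3) interface */
#include <geopm_hint.h>   /* assumed header declaring the region hint constants */
#include <stdint.h>

void run_solver(int num_step)
{
    uint64_t region_id = 0;
    /* Register a named region once; the hint marks it as compute bound. */
    geopm_prof_region("solver_loop", GEOPM_REGION_HINT_COMPUTE, &region_id);
    for (int step = 0; step < num_step; ++step) {
        geopm_prof_enter(region_id);   /* start of the instrumented region */
        /* ... application compute kernel goes here ... */
        geopm_prof_exit(region_id);    /* end of the instrumented region */
        geopm_prof_epoch();            /* mark one pass of the outer loop */
    }
}

Markup of this kind lets the reports and any active Agent attribute hardware measurements to the named region rather than to the application as a whole.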
Build Requirements
When building the GEOPM Runtime from source, additional requirements must be met. Users who are not interested in building the GEOPM Runtime can ignore these requirements. Alternatively, particular GEOPM Runtime features enabled by these requirements can be skipped by passing the corresponding disable flag on the configure command line.
The GEOPM Runtime provides optional support for the Message Passing Interface (MPI), version 2.2 or later. Building the Runtime with MPI support will add MPI-related region information to the reports as well as enable Agents that leverage the hierarchical communications tree (just the power_balancer at the time of this writing). If building for an HPC system, target the desired site-specific MPI implementation. Otherwise, the Intel MPI implementation, the OpenHPC or Spack packaging systems, or the OpenMPI binaries distributed with most major Linux distributions satisfy this requirement. For RHEL and SLES Linux, the requirement can be met by installing the openmpi-devel package, version 1.7 or later, and on Ubuntu by installing libopenmpi-dev.
Install all requirements on RHEL or CentOS
yum install openmpi-devel elfutils libelf-devel
Install all requirements on SUSE-based distributions
zypper install openmpi-devel elfutils libelf-devel
Install all requirements on Ubuntu (as of 18.04.3 LTS)
apt install libtool automake libopenmpi-dev build-essential gfortran libelf-dev python libsqlite3-dev
Requirements that can be avoided by removing features with a configure option:
Remove MPI compiler requirement
--disable-mpi
Remove Fortran compiler requirement
--disable-fortran
Remove elfutils library requirement
--disable-ompt
For details on how to use non-standard install locations for build requirements see:
./configure --help
This provides options of the form --with-<feature>, such as --with-mpi-bin, to be used for this purpose.
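For example, a build that points at a site-specific MPI installation and skips the Fortran interface might be configured as follows; the install prefix and MPI path are placeholders:
./configure --prefix=$HOME/build/geopm --with-mpi-bin=/opt/mpi/bin --disable-fortran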
Building the GEOPM Runtime
The recommended way to build the GEOPM Runtime is to follow the “developer build process” referenced in the developer guide. This enables use of the GEOPM Service and also provides the latest developments in the GEOPM repository.
Run Requirements
Beyond the GEOPM Service, the GEOPM Runtime requires several additional features at the time of use. Users uninterested in running the GEOPM Runtime can ignore these requirements.
BIOS Configuration
If power governing or power balancing is the intended use case for the GEOPM deployment, an additional requirement is that the BIOS is configured to support RAPL control. To check for BIOS support, execute the following on a compute node:
./integration/tutorial/admin/00_test_prereqs.sh
If the script output includes:
WARNING: The lock bit for the PKG_POWER_LIMIT MSR is set. The power_balancer
and power_governor agents will not function properly until this is cleared.
Please enable RAPL in your BIOS; if no such option exists, contact your BIOS vendor to obtain a BIOS that supports RAPL.
For additional information, please contact the GEOPM team.
Linux Power Management
It’s important to note that other Linux mechanisms for CPU power management may interfere with the performance optimization objectives of GEOPM Agents. To achieve optimal performance when deploying a GEOPM Agent that controls CPU frequency or power limits, it is recommended that the userspace generic scaling governor is selected while the GEOPM Agent is active. If userspace is not available on your system, it may be preferred to select the performance governor while the GEOPM Agent is active.
For more information, see the Linux Kernel documentation on generic scaling governors.
Using Slurm to control the Linux CPU governor
When the userspace or performance governor is selected, the driver will not interfere with GEOPM. On SLURM-based systems, the GEOPM launch wrapper will attempt to set the scaling governor to performance automatically, eliminating the need to set the governor manually. On older versions of SLURM, the desired governors must be listed explicitly in /etc/slurm.conf. Specifically, SLURM 15.x requires the following option:
CpuFreqGovernors=OnDemand,Performance
For more on SLURM configuration, please see the slurm.conf manual. On non-SLURM systems, the scaling governor should still be set manually through some other mechanism to ensure proper GEOPM behavior. The following command will set the governor to userspace:
echo userspace | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
Launching the GEOPM Runtime
GEOPM Application Launch Wrapper
The GEOPM Runtime package installs the geopmlaunch command. This command is a wrapper for MPI launch commands such as srun, aprun, and mpiexec.hydra, where the wrapper script enables the GEOPM runtime. The geopmlaunch command supports the same command-line interface as the underlying launch command, while extending the interface with GEOPM-specific options. The geopmlaunch wrapper launches the primary compute application and the GEOPM control thread on each compute node. This wrapper is documented in the geopmlaunch(1) man page.
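As a sketch, a SLURM-based launch that requests a report might look like the following; the node and rank counts, report path, and application name are placeholders:
geopmlaunch srun -N 2 -n 64 --geopm-report=my_report ./my_app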
If your system’s launch mechanism is not supported, then options to the GEOPM runtime must be passed through environment variables, and some features of the geopmlaunch command (such as process CPU affinity management) will not be available. Please consult the geopm(7) man page for documentation of the environment variables used by the GEOPM runtime that would otherwise be controlled by the wrapper script, and see Profiling Applications without geopmlaunch for details.
CPU Affinity Requirements
When using the geopmlaunch wrapper, the user may optionally provide the --geopm-affinity-enable command-line option (see geopmlaunch(1)). This restricts process migration, which enables hardware metrics to be measured more accurately on a per-application-region basis.
When the GEOPM control thread connects to the application, it will automatically affinitize itself to the highest-indexed core not used by the application, provided the application is not affinitized to a CPU on every core. If the application is using all cores of the system, the GEOPM control thread will be pinned to the highest logical CPU.
Resource Manager Integration
The GEOPM Runtime package can be integrated with a compute cluster resource manager by modifying the resource manager daemon that runs on the cluster compute nodes. An integration example with the SLURM resource manager through a SPANK plugin is available in the geopm-slurm git repository and follows the process described below.
To integrate, the daemon makes two libgeopmd.so function calls before allocating resources to the user (prologue) and one function call after the resources are released (epilogue). In the prologue, the daemon calls:
geopm_pio_save_control()
This function records all controllable GEOPM values into memory (refer to geopm_pio(3)). The next function called in the prologue is:
geopm_agent_enforce_policy()
As detailed in geopm_agent(3), this function enforces a pre-set policy, such as a power cap or a CPU frequency limit, by making a one-time hardware setting adjustment. In the epilogue, the daemon calls:
geopm_pio_restore_control()
This restores all GEOPM platform controls to the state captured during the prologue.
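A minimal C sketch of this prologue/epilogue sequence is shown below; the hook names are hypothetical, and the header names are assumed to be the geopm_pio.h and geopm_agent.h headers shipped with libgeopmd:

#include <stdio.h>
#include <geopm_pio.h>    /* assumed header for geopm_pio(3) */
#include <geopm_agent.h>  /* assumed header for geopm_agent(3) */

/* Hypothetical hook run by the resource manager daemon before
   handing the compute node to the user's job. */
int job_prologue(void)
{
    int err = geopm_pio_save_control();      /* record current control values */
    if (!err) {
        err = geopm_agent_enforce_policy();  /* apply the configured static policy */
    }
    if (err) {
        fprintf(stderr, "Warning: GEOPM prologue failed with error %d\n", err);
    }
    return err;
}

/* Hypothetical hook run after the job's resources are released. */
int job_epilogue(void)
{
    int err = geopm_pio_restore_control();   /* restore the saved control values */
    if (err) {
        fprintf(stderr, "Warning: GEOPM epilogue failed with error %d\n", err);
    }
    return err;
}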
The policy setup in the prologue relies on two configuration files:
/etc/geopm/environment-default.json
/etc/geopm/environment-override.json
These files contain JSON objects that map GEOPM environment variables to their respective values. The default configuration provides values for any GEOPM variable that is not set in the calling environment. The override configuration enforces values, overriding those specified in the calling environment. A comprehensive list of GEOPM environment variables is available in the geopm(7) man page. The two primary environment variables that geopm_agent_enforce_policy() uses are GEOPM_AGENT and GEOPM_POLICY. It’s important to note that /etc should be mounted on a node-local file system, meaning the GEOPM configuration files typically become part of the compute node’s boot image. The GEOPM_POLICY value points to another JSON file, possibly located on a shared file system, that dictates the enforced values (such as the power cap in Watts or the CPU frequency in Hz).
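As an illustrative sketch, a default configuration that selects the monitor agent only when the calling environment has not chosen one could be installed at /etc/geopm/environment-default.json; the agent choice here is only an example:
{"GEOPM_AGENT": "monitor"}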
For GEOPM’s integration as the universal power management solution for a cluster, it is usual for a single agent algorithm with one policy to be applied across all compute nodes within a partition. The choice of agent rests upon the site’s needs. For instance, if the aim is to keep the average CPU power draw of each node below a specific cap, the power_balancer agent is ideal. However, if the goal is to limit application CPU frequencies with exceptions for specific high-priority processes, the frequency_map agent is the best fit. Sites can also deploy a custom agent plugin. In every scenario, invoking geopm_agent_enforce_policy() before releasing compute resources to the user ensures the enforcement of static limits that impact all user applications. For dynamic runtime features, users must launch their MPI application with the geopmlaunch(1) tool.
To illustrate, if a system administrator wants to use the power_balancer agent, the process would involve setting a static power cap for applications not launched with geopmlaunch, while optimizing power caps for performance when geopmlaunch is in use. The administrator would install the following JSON object in the compute node’s boot image at /etc/geopm/environment-override.json:
{"GEOPM_AGENT": "power_balancer",
"GEOPM_POLICY": "/shared_fs/config/geopm_power_balancer.json"}
The controlling value, CPU_POWER_LIMIT, is defined in a separate “geopm_power_balancer.json” file that could reside on a shared file system. This file can be generated using the geopmagent(1) tool. Placing the policy file on a shared file system allows the limit to be modified without changing the compute node boot image. Changing the policy value affects all new GEOPM processes but leaves running GEOPM processes untouched.
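As a sketch, the policy file could be generated with the geopmagent(1) tool; the 250 Watt limit and output path here are only illustrative values:
geopmagent -a power_balancer -p 250 > /shared_fs/config/geopm_power_balancer.json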