geopm_pio(7) – GEOPM PlatformIO interface
Description
The PlatformIO class provides a high-level interface for signals (system monitors) and controls (system settings). PlatformIO offers a large number of built-in signals and controls, including hardware metrics, hardware settings, and signals derived from application behavior. Application behavior is tracked by GEOPM’s integration with MPI and OpenMP, and also by application use of the geopm_prof(3) mark-up interface. In addition to the built-in features, PlatformIO can be extended through the geopm::IOGroup(3) plugin interface to provide arbitrary signals and controls.
A domain is a discrete component within a compute node where a signal or control is applicable. For more information about the geopm_domain_e enum and the hierarchical platform description, see geopm_topo(3).
A signal represents any measurement in SI units that can be sampled, or any unit-free integer that can be read. A control represents a request for a hardware domain to operate such that a related signal measured from that domain will track the request. For example, the user can set CPU_POWER_LIMIT_CONTROL in units of watts, and the related signal, CPU_POWER, will remain below the limit when averaged over the time window set by CPU_POWER_TIME_WINDOW_CONTROL. Similarly, the user can set CPU_FREQUENCY_MAX_CONTROL in hertz, and the related signal, CPU_FREQUENCY_STATUS, will show the CPU operating at the value set.
See the geopmread(1) and geopmwrite(1) tools for command-line interaction with the PlatformIO interface.
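Because a control is a request that a related signal should track, a requested value is most useful when it lands on a setting the hardware can honor. The Python sketch below clamps a requested CPU_FREQUENCY_MAX_CONTROL value to the range given by CPU_FREQUENCY_MIN_AVAIL and CPU_FREQUENCY_MAX_AVAIL, then rounds it down onto a grid spaced by CPU_FREQUENCY_STEP. It assumes the available settings are evenly spaced starting at the minimum; the helper name snap_frequency is hypothetical (not part of GEOPM), and the numbers in the usage line are illustrative rather than read from hardware.

```python
import math


def snap_frequency(request_hz, min_hz, max_hz, step_hz):
    """Return the highest grid setting that does not exceed the clamped request.

    Hypothetical helper: assumes valid settings are min_hz + k * step_hz
    for integer k >= 0, and never returns more than max_hz.
    """
    if step_hz <= 0:
        raise ValueError("step_hz must be positive")
    # Clamp the request into the available range first.
    clamped = min(max(request_hz, min_hz), max_hz)
    # Round down to min_hz + k * step_hz; the small tolerance absorbs
    # floating-point error when the request already sits on the grid.
    steps = math.floor((clamped - min_hz) / step_hz + 1e-9)
    return min_hz + steps * step_hz


# Illustrative values only: a 1.0-3.7 GHz range with 100 MHz steps.
print(snap_frequency(2.45e9, 1.0e9, 3.7e9, 1.0e8))  # prints 2400000000.0
```

Rounding down rather than to the nearest step keeps the result at or below the request, which matches the intent of a maximum-frequency control.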
Aliasing Signals And Controls
There are two classes of signal and control names: “low level” and “high level”. All IOGroups are expected to provide low level signals and controls with names that are prefixed with the IOGroup name and two colons; e.g., the MSRIOGroup provides the MSR::PKG_ENERGY_STATUS:ENERGY signal. If the signal or control may be supported on more than one platform, the implementation should be aliased to a high level name. This high level name enables the signal or control to be supported by more than one IOGroup, since different platforms load different sets of IOGroups. The MSRIOGroup aliases the above signal to the high level CPU_ENERGY signal, which can be used on any platform that provides it to measure the current CPU energy value. Agents are encouraged to request high level signals and controls to make their implementation more portable. The high level signals and controls supported by built-in IOGroup classes are listed below.
See the SEARCH AND LOAD ORDER section of geopm::PluginFactory(3) for information about how the GEOPM_PLUGIN_PATH environment variable is used to select which IOGroup implementation is used when more than one provides the same high level signal or control.
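The naming convention above can be checked mechanically: a low level name carries an IOGroup prefix followed by two colons, while a high level alias has no such prefix. The Python sketch below splits a name on the first "::". The function name split_pio_name is hypothetical and relies only on the naming convention; it does not determine which IOGroup actually serves a high level alias, since that depends on the plugin load order described in geopm::PluginFactory(3).

```python
def split_pio_name(name):
    """Split a PlatformIO signal or control name into (iogroup, rest).

    Hypothetical helper based only on the naming convention: low level
    names look like "IOGROUP::NAME", high level aliases contain no "::".
    Returns (None, name) for a high level alias.
    """
    prefix, sep, rest = name.partition("::")
    if not sep:
        return None, name
    return prefix, rest


for name in ("MSR::PKG_ENERGY_STATUS:ENERGY", "CPU_ENERGY", "MSR::PERF_CTL:FREQ"):
    iogroup, rest = split_pio_name(name)
    kind = "low level" if iogroup else "high level"
    print(f"{name}: {kind}, iogroup={iogroup}, name={rest}")
```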
Signal names that end in # (for example, raw MSR values) are 64-bit integers encoded to be stored as doubles. When accessing these integer signals, the return value of read_signal() or sample() should not be used directly as a double precision number. To decode the 64-bit integer from the double, use geopm_signal_to_field() described in geopm_hash(3). The geopm::MSRIOGroup(3) also provides raw MSR field signals that are encoded in this way.
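The encoding can be illustrated in Python. This sketch assumes the encoding is a plain bitwise reinterpretation of the 64 bits, as a C union of a double and a uint64_t would give; signal_to_field and field_to_signal below are illustrative stand-ins, not GEOPM's Python bindings. It also shows why the double must not be used as a number: the integer 42 stored this way reads back as a tiny subnormal double, not 42.0.

```python
import struct


def field_to_signal(field):
    """Encode a 64-bit unsigned integer as a double with the same bit pattern."""
    return struct.unpack("=d", struct.pack("=Q", field))[0]


def signal_to_field(signal):
    """Decode a double back into the 64-bit unsigned integer it carries."""
    return struct.unpack("=Q", struct.pack("=d", signal))[0]


raw = 42
encoded = field_to_signal(raw)
print(encoded)                   # a tiny subnormal number, not 42.0
print(signal_to_field(encoded))  # 42
```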
Descriptions Of High Level Aliases
BOARD_ENERGY
Total energy measured on the server’s board. See geopm_pio_cnl(7) and geopm_pio_msr(7) for signal availability requirements. On systems that support both IOGroups, the CNL implementation will be used. The MSR::BOARD_ENERGY alias may be used to access the MSRIOGroup version on those systems.
BOARD_POWER
Power measured on the server’s board. See geopm_pio_cnl(7) and geopm_pio_msr(7) for signal availability requirements. On systems that support both IOGroups, the CNL implementation will be used. The MSR::BOARD_POWER alias may be used to access the MSRIOGroup version on those systems.
BOARD_POWER_LIMIT_CONTROL
The average board power usage limit over the time window specified in BOARD_POWER_TIME_WINDOW_CONTROL.
BOARD_POWER_TIME_WINDOW_CONTROL
The time window associated with BOARD_POWER_LIMIT_CONTROL.
CPU_CORE_TEMPERATURE
CPU core temperature, in degrees Celsius.
CPU_CYCLES_REFERENCE
Number of cycles during which the logical processor is neither halted nor in a stop-clock state. The count rate is fixed at the CPU_TIMESTAMP_COUNTER rate.
CPU_CYCLES_THREAD
Number of cycles during which the logical processor is not halted. The count rate may change with the core frequency.
CPU_ENERGY
An increasing meter of energy consumed by the package over time. It will reset periodically due to roll-over.
CPU_FREQUENCY_MAX_CONTROL
Target maximum operating frequency of the CPU based on the control register.
CPU_FREQUENCY_MIN_AVAIL
Minimum achievable processor frequency on the system.
CPU_FREQUENCY_MAX_AVAIL
Maximum achievable processor frequency on the system.
CPU_FREQUENCY_MIN_CONTROL
Target minimum operating frequency of the CPU based on the control register.
CPU_FREQUENCY_STATUS
The current operating frequency of the CPU.
CPU_FREQUENCY_STEP
Step size between processor frequency settings.
CPU_FREQUENCY_STICKER
Processor base frequency.
CPU_INSTRUCTIONS_RETIRED
Number of instructions retired.
CPU_PACKAGE_TEMPERATURE
CPU package temperature, in degrees Celsius.
CPU_POWER_LIMIT_CONTROL
The average power usage limit over the time window specified in CPU_POWER_TIME_WINDOW_CONTROL.
CPU_POWER_TIME_WINDOW_CONTROL
The time window associated with CPU_POWER_LIMIT_CONTROL (power limit 1).
CPU_POWER_MAX_AVAIL
The maximum power limit based on the electrical specification.
CPU_POWER_MIN_AVAIL
The minimum power limit based on the electrical specification.
CPU_POWER_LIMIT_DEFAULT
Maximum power to stay within the thermal limits based on the design (TDP).
CPU_POWER
Total power aggregated over the processor package.
CPU_TIMESTAMP_COUNTER
An always running, monotonically increasing counter that is incremented at a constant rate. For use as a wall clock timer.
CPU_UNCORE_FREQUENCY_STATUS
Target operating frequency of the uncore.
CPU_UNCORE_FREQUENCY_MAX_CONTROL
Control that limits the maximum frequency of the uncore.
CPU_UNCORE_FREQUENCY_MIN_CONTROL
Control that limits the minimum frequency of the uncore.
DRAM_ENERGY
An increasing meter of energy consumed by the DRAM over time. It will reset periodically due to roll-over.
DRAM_POWER
Total power aggregated over the DRAM DIMMs associated with a NUMA node.
EPOCH_COUNT
Number of completed executions of an epoch. Prior to the first call by the application to geopm_prof_epoch(), the signal returns -1. With each call to geopm_prof_epoch(), the count increases by one.
GPU_CORE_ACTIVITY
GPU compute core activity expressed as a ratio of cycles.
GPU_CORE_FREQUENCY_MAX_AVAIL
Maximum supported GPU core frequency over the specified domain.
GPU_CORE_FREQUENCY_MIN_AVAIL
Minimum supported GPU core frequency over the specified domain.
GPU_CORE_FREQUENCY_STEP
Step size between GPU frequency settings.
GPU_CORE_FREQUENCY_MAX_CONTROL
Control that limits the maximum GPU core frequency.
GPU_CORE_FREQUENCY_MIN_CONTROL
Control that limits the minimum GPU core frequency.
GPU_CORE_FREQUENCY_STATUS
Average achieved GPU core frequency over the specified domain.
GPU_ENERGY
Total energy aggregated over the GPU package.
GPU_POWER_LIMIT_CONTROL
Average GPU power usage limit.
GPU_POWER
Total power aggregated over the GPU package.
GPU_TEMPERATURE
Average GPU temperature in degrees Celsius.
GPU_UNCORE_ACTIVITY
GPU memory access activity expressed as a ratio of cycles.
GPU_UTILIZATION
Average GPU utilization expressed as a ratio of cycles.
REGION_HASH
The hash of the region of code (see geopm_prof(3)) currently being run by all ranks, otherwise GEOPM_REGION_HASH_UNMARKED.
REGION_HINT
The region hint (see geopm_prof(3)) associated with the currently running region. For any interval when all ranks are within an MPI function inside of a user defined region, the hint will change from the hint associated with the user defined region to GEOPM_REGION_HINT_NETWORK. If the user defined region was defined with GEOPM_REGION_HINT_NETWORK and there is an interval within the region when all ranks are within an MPI function, GEOPM will not attribute the time spent within the MPI function as MPI time in the report files. That time will instead be attributed to the region as a whole.
REGION_PROGRESS
Minimum, over all ranks, of the reported progress through the current region.
REGION_RUNTIME
Maximum, over all ranks, of the last recorded runtime for the current region.
TIME
Time elapsed since the beginning of execution.
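Several of the aliases above are counters or meters that are most useful as differences between two samples. The Python sketch below derives average power from CPU_ENERGY (or DRAM_ENERGY) and TIME, an approximate active frequency from CPU_CYCLES_THREAD and CPU_CYCLES_REFERENCE, and instructions per cycle from CPU_INSTRUCTIONS_RETIRED. The Sample container and all function names are hypothetical. The sketch assumes energy in joules and time in seconds (SI units, per the definition of a signal); it treats a decrease in the energy meter as a roll-over and returns None rather than guessing the wrap value; and when converting the cycle ratio to hertz it assumes the timestamp counter rate equals CPU_FREQUENCY_STICKER, which is often the case on Intel x86 processors but is not guaranteed by this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    """Hypothetical snapshot of a few PlatformIO signals for one domain."""
    time_s: float            # TIME, seconds
    energy_j: float          # CPU_ENERGY or DRAM_ENERGY, joules
    cycles_thread: float     # CPU_CYCLES_THREAD
    cycles_reference: float  # CPU_CYCLES_REFERENCE
    instructions: float      # CPU_INSTRUCTIONS_RETIRED


def average_power_w(first: Sample, second: Sample) -> Optional[float]:
    """Average power between two samples, or None if it cannot be trusted."""
    dt = second.time_s - first.time_s
    de = second.energy_j - first.energy_j
    if dt <= 0 or de < 0:
        # A negative energy delta is treated as a meter roll-over; the wrap
        # value is not documented here, so the interval is skipped.
        return None
    return de / dt


def average_active_frequency_hz(first: Sample, second: Sample,
                                sticker_hz: float) -> Optional[float]:
    """Approximate frequency while not halted (assumes TSC rate == sticker)."""
    d_ref = second.cycles_reference - first.cycles_reference
    if d_ref <= 0:
        return None
    d_thread = second.cycles_thread - first.cycles_thread
    return sticker_hz * d_thread / d_ref


def instructions_per_cycle(first: Sample, second: Sample) -> Optional[float]:
    """Instructions retired per unhalted thread cycle over the interval."""
    d_cycles = second.cycles_thread - first.cycles_thread
    if d_cycles <= 0:
        return None
    return (second.instructions - first.instructions) / d_cycles


# Made-up numbers, chosen only to keep the arithmetic easy to follow.
a = Sample(time_s=10.0, energy_j=1000.0, cycles_thread=0.0,
           cycles_reference=0.0, instructions=0.0)
b = Sample(time_s=12.0, energy_j=1300.0, cycles_thread=6.0e9,
           cycles_reference=4.0e9, instructions=9.0e9)
print(average_power_w(a, b))                     # 150.0
print(average_active_frequency_hz(a, b, 2.0e9))  # 3000000000.0
print(instructions_per_cycle(a, b))              # 1.5
```

Returning None for a suspect interval, instead of raising, lets a sampling loop simply drop that interval and continue with the next pair of samples.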
Low Level Signals and Controls
The high level alias signals and controls defined in this man page may be supported by one or more IOGroups. These IOGroups also provide signals and controls that extend the capabilities described in this page. These are called low level signals and controls, and their names are prefixed by the name of the IOGroup that provides them. For example, the MSRIOGroup provides the MSR::PERF_CTL:FREQ low level control, which is the underlying implementation of the high level alias CPU_FREQUENCY_MAX_CONTROL on x86 platforms when HWP (hardware-controlled performance states) is disabled. Some low level signals and controls do not have high level aliases associated with them. To learn about these low level signals and controls, consult the chapter 7 man page for each IOGroup as linked below.
Environment
There are environment variables that can be used to disable performance features of GEOPM. The main purpose of these environment variables is to enable easy measurement of the impact of these features in performance testing.
GEOPM_DISABLE_MSR_SAFE
When this environment variable is set, the msr-safe driver interfaces will not be used even if they are present and accessible.
GEOPM_DISABLE_IO_URING
When this environment variable is set, io_uring asynchronous kernel file I/O will not be used, even if the kernel supports this feature and io_uring support is enabled in the build of libgeopmd.so.
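Since these variables exist to measure the impact of a feature, a comparison can be scripted by timing the same command under two environments. The Python sketch below is one way to do that. It assumes that the variable only needs to be present (the value "1" is arbitrary). The helper names env_with_feature_disabled, time_command and compare are hypothetical, not part of GEOPM.

```python
import os
import subprocess
import time


def env_with_feature_disabled(variable, base=None):
    """Copy an environment and set one GEOPM_DISABLE_* variable in it."""
    env = dict(os.environ if base is None else base)
    # Assumption: the variable only needs to be present; "1" is arbitrary.
    env[variable] = "1"
    return env


def time_command(cmd, env):
    """Run cmd to completion under env and return elapsed wall time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, env=env, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


def compare(cmd, variable):
    """Time cmd with the current environment and with the feature disabled."""
    default_s = time_command(cmd, dict(os.environ))
    disabled_s = time_command(cmd, env_with_feature_disabled(variable))
    return default_s, disabled_s
```

For example, compare(["geopmread", "CPU_POWER", "package", "0"], "GEOPM_DISABLE_MSR_SAFE") would time one geopmread call with and without the msr-safe driver, on a system where GEOPM is installed. A single short command is dominated by process start-up, so a meaningful measurement would repeat each run many times.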