geopm_pio(7) – GEOPM PlatformIO interface

Description

The PlatformIO class provides a high-level interface for signals (system monitors) and controls (system settings). There are a large number of built-in signals and controls. These built-in signals and controls include hardware metrics, hardware settings, and signals derived from application behavior. Application behavior is tracked by GEOPM’s integration with MPI and OpenMP and also by application use of the geopm_prof(3) mark-up interface. In addition to the built-in features, PlatformIO can be extended through the geopm::IOGroup(3) plugin interface to provide arbitrary signals and controls.

A domain is a discrete component within a compute node where a signal or control is applicable. For more information about the geopm_domain_e enum and the hierarchical platform description see geopm_topo(3). A signal represents any measurement in SI units that can be sampled or any unit-free integer that can be read. A control represents a request for a hardware domain to operate such that a related signal measured from the hardware domain will track the request. For example, the user can set a CPU_POWER_LIMIT_CONTROL in units of watts and the related signal, CPU_POWER, will remain below the limit. Similarly the user can set a CPU_FREQUENCY_MAX_CONTROL in hertz and the related signal, CPU_FREQUENCY_STATUS will show the CPU operating at the value set.

See the geopmread(1) and geopmwrite(1) tools for command-line interaction with the PlatformIO interface.

Aliasing Signals And Controls

There are two classes of signals and control names: “low level” and “high level”. All IOGroup‘s are expected to provide low level signals and controls with names that are prefixed with the IOGroup name and two colons, e.g. the MSRIOGroup provides the MSR::PKG_ENERGY_STATUS:ENERGY signal. If the signal or control may be supported on more than one platform, the implementation should be aliased to a high level name. This high level name enables the signal or control to be supported by more than one IOGroup, and different platforms will support the loading different sets of IOGroups. The MSRIOGroup aliases the above signal to the high level CPU_ENERGY signal which can be used on any platform to measure the current CPU energy value. Agents are encouraged to request high level signals and controls to make the implementation more portable. The high level signals and controls supported by built-in IOGroup classes are listed below. See geopm::PluginFactory(3) section on SEARCH AND LOAD ORDER for information about how the GEOPM_PLUGIN_PATH environment variable is used to select which IOGroup implementation is used in the case where more than one provides the same high level signal or control.

Signal names that end in # (for example, raw MSR values) are 64-bit integers encoded to be stored as doubles. When accessing these integer signals, the return value of read_signal() or sample() should not be used directly as a double precision number. To decode the 64-bit integer from the double use geopm_signal_to_field() described in geopm_hash(3). The geopm::MSRIOGroup(3) also provides raw MSR field signals that are encoded in this way.

Descriptions Of High Level Aliases

BOARD_ENERGY

Total energy measured on the server’s board. See geopm_pio_cnl(7) and geopm_pio_msr(7) for signal availability requirements. On systems that support both IOGroups the CNL alias will be used. The MSR::BOARD_ENERGY alias may be used to access the MSRIOGroup version on those systems.

BOARD_POWER

Power measured on the server’s board. See geopm_pio_cnl(7) and geopm_pio_msr(7) for signal availability requirements. On systems that support both IOGroups the CNL alias will be used. The MSR::BOARD_POWER alias may be used to access the MSRIOGroup version on those systems.

BOARD_POWER_LIMIT_CONTROL

The average board power usage limit over the time window specified in BOARD_POWER_TIME_WINDOW_CONTROL.

BOARD_POWER_TIME_WINDOW_CONTROL

The time window associated with BOARD_POWER_LIMIT_CONTROL.

CPU_CORE_TEMPERATURE

CPU core temperature, in degrees Celsius.

CPU_CYCLES_REFERENCE

The count of the number of cycles while the logical processor is not in a halt state and not in a stop-clock state. The count rate is fixed at the TIMESTAMP_COUNT rate.

CPU_CYCLES_THREAD

The count of the number of cycles while the logical processor is not in a halt state. The count rate may change based on core frequency.

CPU_ENERGY

An increasing meter of energy consumed by the package over time. It will reset periodically due to roll-over.

CPU_FREQUENCY_MAX_CONTROL

Target maximum operating frequency of the CPU based on the control register.

CPU_FREQUENCY_MIN_AVAIL

Minimum achievable processor frequency on the system.

CPU_FREQUENCY_MAX_AVAIL

Maximum achievable processor frequency on the system.

CPU_FREQUENCY_MIN_CONTROL

Target minimum operating frequency of the CPU based on the control register.

CPU_FREQUENCY_STATUS

The current operating frequency of the CPU.

CPU_FREQUENCY_STEP

Step size between processor frequency settings.

CPU_FREQUENCY_STICKER

Processor base frequency.

CPU_INSTRUCTIONS_RETIRED

The count of the number of instructions executed.

CPU_PACKAGE_TEMPERATURE

CPU package temperature, in degrees Celsius.

CPU_POWER_LIMIT_CONTROL

The average power usage limit over the time window specified in PL1_TIME_WINDOW.

CPU_POWER_TIME_WINDOW_CONTROL

The time window associated with power limit 1.

CPU_POWER_MAX_AVAIL

The maximum power limit based on the electrical specification.

CPU_POWER_MIN_AVAIL

The minimum power limit based on the electrical specification.

CPU_POWER_LIMIT_DEFAULT

Maximum power to stay within the thermal limits based on the design (TDP).

CPU_POWER

Total power aggregated over the processor package.

CPU_TIMESTAMP_COUNTER

An always running, monotonically increasing counter that is incremented at a constant rate. For use as a wall clock timer.

CPU_UNCORE_FREQUENCY_STATUS

Target operating frequency of the uncore.

CPU_UNCORE_FREQUENCY_MAX_CONTROL

Control that limits the maximum frequency of the uncore.

CPU_UNCORE_FREQUENCY_MIN_CONTROL

Control that limits the minimum frequency of the uncore.

DRAM_ENERGY

An increasing meter of energy consumed by the DRAM over time. It will reset periodically due to roll-over.

DRAM_POWER

Total power aggregated over the DRAM DIMMs associated with a NUMA node.

EPOCH_COUNT

Number of completed executions of an epoch. Prior to the first call by the application to geopm_prof_epoch() the signal returns as -1. With each call to geopm_prof_epoch() the count increases by one.

GPU_CORE_ACTIVITY

GPU compute core activity expressed as a ratio of cycles.

GPU_CORE_FREQUENCY_MAX_AVAIL

Maximum supported GPU core frequency over the specified domain.

GPU_CORE_FREQUENCY_MIN_AVAIL

Minimum supported GPU core frequency over the specified domain.

GPU_CORE_FREQUENCY_STEP

Step size between GPU frequency settings.

GPU_CORE_FREQUENCY_MAX_CONTROL

Control that limits the maximum GPU core frequency.

GPU_CORE_FREQUENCY_MIN_CONTROL

Control that limits the minimum GPU core frequency.

GPU_CORE_FREQUENCY_STATUS

Average achieved GPU core frequency over the specified domain.

GPU_ENERGY

Total energy aggregated over the GPU package.

GPU_POWER_LIMIT_CONTROL

Average GPU power usage limit.

GPU_POWER

Total power aggregated over the GPU package.

GPU_TEMPERATURE

Average GPU temperature in degrees Celsius.

GPU_UNCORE_ACTIVITY

GPU memory access activity expressed as a ratio of cycles.

GPU_UTILIZATION

Average GPU utilization expressed as a ratio of cycles.

REGION_HASH

The hash of the region of code (see geopm_prof(3)) currently being run by all ranks, otherwise GEOPM_REGION_HASH_UNMARKED.

REGION_HINT

The region hint (see geopm_prof(3)) associated with the currently running region. For any interval when all ranks are within an MPI function inside of a user defined region, the hint will change from the hint associated with the user defined region to GEOPM_REGION_HINT_NETWORK. If the user defined region was defined with GEOPM_REGION_HINT_NETWORK and there is an interval within the region when all ranks are within an MPI function, GEOPM will not attribute the time spent within the MPI function as MPI time in the report files. It will be instead attributed to the time spent in the region as a whole.

REGION_PROGRESS

Minimum per-rank reported progress through the current region.

REGION_RUNTIME

Maximum per-rank of the last recorded runtime for the current region.

TIME

Time elapsed since the beginning of execution.

Low Level Signals and Controls

The high level alias signals and controls defined in this man page may be supported by one or more IOGroups. These IOGroups also provide signals and controls which extend the capabilities described in this page. These signals and controls are described as “low level signals and controls”, and these have names that are prefixed by the IOGroup name that provides it. For example, the MSRIOGroup provides the MSR::PERF_CTL:FREQ low level control. This is the underlying implementation for the high level alias CPU_FREQUENCY_MAX_CONTROL on x86 platforms when HWP is disabled. Some low level signals and controls do not have high level aliases associated with them. To learn about these low level signals and controls please consult the chapter 7 man page for each IOGroup as linked below.

Environment

There are environment variables that can be used to disable performance features of GEOPM. The main purpose of these environment variables is to enable easy measurement of the impact of these features in performance testing.

GEOPM_DISABLE_MSR_SAFE

When this environment variable is set, the msr-safe driver interfaces will not be used even if they are present and accessible.

GEOPM_DISABLE_IO_URING

When this environment variable is set, the io-uring asynchronous kernel file I/O will not be used even if the kernel supports this feature and the io-uring feature is enabled in the build of libgeopmd.so.

See Also

geopm(7), geopm_prof(3), geopm_topo(3),