geopm_pio_levelzero(7) – IOGroup providing signals and controls for Intel GPUs
Description
The LevelZeroIOGroup implements the geopm::IOGroup(3) interface to provide hardware signals and controls for Intel GPUs.
Requirements
To use the GEOPM LevelZero signals and controls, GEOPM must be compiled against the oneAPI LevelZero libraries and run on a system with discrete GPUs supported by LevelZero. To compile against the oneAPI LevelZero libraries, geopm must be configured with the --enable-levelzero flag. The optional --with-levelzero flag may be used to specify the path to the required libraries. In addition, the user must export ZES_ENABLE_SYSMAN=1, as specified by the Intel oneAPI Level Zero Sysman documentation. See the Sysman specification for more information on related environment variables and their usage.
Because signals and controls are exposed through the Sysman API, they are affected by Sysman environment variables. Please review oneAPI LevelZero Sysman Environment Variables and oneAPI LevelZero Core Programming Guide Environment Variables.
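The runtime requirement above can be illustrated with a small Python sketch that copies the current environment, sets ZES_ENABLE_SYSMAN=1, and assembles a geopmread(1) command line in its SIGNAL_NAME DOMAIN_TYPE DOMAIN_INDEX form. The helper names levelzero_env and geopmread_command are hypothetical. The sketch only builds the command rather than running it, since running it requires a GEOPM build configured with --enable-levelzero.

```python
import os
from typing import Dict, List, Optional


def levelzero_env(base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of an environment (default: os.environ) with Sysman enabled."""
    env = dict(os.environ if base is None else base)
    # Required by the Intel oneAPI Level Zero Sysman documentation.
    env["ZES_ENABLE_SYSMAN"] = "1"
    return env


def geopmread_command(signal: str, domain: str, index: int) -> List[str]:
    """Build a geopmread argument vector: SIGNAL_NAME DOMAIN_TYPE DOMAIN_INDEX."""
    return ["geopmread", signal, domain, str(index)]


if __name__ == "__main__":
    env = levelzero_env()
    cmd = geopmread_command("LEVELZERO::GPU_CORE_FREQUENCY_STATUS", "gpu_chip", 0)
    print(" ".join(cmd), "| ZES_ENABLE_SYSMAN =", env["ZES_ENABLE_SYSMAN"])
    # On a suitable system the command could then be launched with
    # subprocess.run(cmd, env=env, check=True).
```

Copying the environment instead of modifying os.environ keeps the setting local to the launched process.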
Note on RAS Signals
Monitoring RAS counters has a high overhead (0.5 seconds per read). Consequently, any error encountered while monitoring these signals (e.g., due to unsupported firmware) is not reported until the user actually attempts to read one of them.
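To give a sense of scale, the sketch below enumerates the seven correctable and seven uncorrectable RAS signals listed under Signals and estimates how long one pass over all of them would take. It assumes that the quoted 0.5 seconds applies to every signal on every gpu_chip and that reads happen one after another; the names are hypothetical.

```python
from typing import List

# RAS counter categories, as listed in the Signals section of this page.
RAS_CATEGORIES = [
    "RESET_COUNT", "PROGRAMMING_ERRCOUNT", "DRIVER_ERRCOUNT",
    "COMPUTE_ERRCOUNT", "NONCOMPUTE_ERRCOUNT", "CACHE_ERRCOUNT",
    "DISPLAY_ERRCOUNT",
]
SECONDS_PER_RAS_READ = 0.5  # per-read overhead quoted above (assumed per signal per chip)


def ras_signal_names() -> List[str]:
    """Expand each category into its CORRECTABLE and UNCORRECTABLE signal."""
    return [
        f"LEVELZERO::GPU_RAS_{cat}_{kind}"
        for cat in RAS_CATEGORIES
        for kind in ("CORRECTABLE", "UNCORRECTABLE")
    ]


def estimated_ras_read_seconds(num_gpu_chips: int) -> float:
    """Estimate the time to read every RAS signal once on every gpu_chip, serially."""
    return len(ras_signal_names()) * num_gpu_chips * SECONDS_PER_RAS_READ
```

Under these assumptions, a single pass over one gpu_chip costs 14 x 0.5 = 7 seconds, which is why RAS signals are poorly suited to a fast control loop.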
Signals
LEVELZERO::GPU_CORE_FREQUENCY_STATUS
The current frequency of the GPU Compute Hardware.
Aggregation: average
Domain: gpu_chip
Format: double
Unit: hertz
LEVELZERO::GPU_CORE_FREQUENCY_EFFICIENT
The efficient minimum frequency of the GPU Compute Hardware.
Aggregation: average
Domain: gpu_chip
Format: double
Unit: hertz
LEVELZERO::GPU_CORE_FREQUENCY_MAX_AVAIL
The maximum supported frequency of the GPU Compute Hardware.
Aggregation: expect_same
Domain: gpu_chip
Format: double
Unit: hertz
LEVELZERO::GPU_CORE_FREQUENCY_MIN_AVAIL
The minimum supported frequency of the GPU Compute Hardware.
Aggregation: expect_same
Domain: gpu_chip
Format: double
Unit: hertz
LEVELZERO::GPU_CORE_TEMPERATURE_MAXIMUM
The maximum measured temperature across all sensors in the GPU accelerator.
Aggregation: max
Domain: gpu_chip
Format: double
Unit: celsius
LEVELZERO::GPU_MEMORY_TEMPERATURE_MAXIMUM
The maximum measured temperature across all sensors in the GPU memory.
Aggregation: max
Domain: gpu_chip
Format: double
Unit: celsius
LEVELZERO::GPU_CORE_FREQUENCY_STEP
The GPU Compute Hardware frequency step size in hertz. If the step size is variable, the average step size is provided.
Aggregation: expect_same
Domain: gpu
Format: double
Unit: hertz
LEVELZERO::GPU_ENERGY
GPU energy in joules.
Aggregation: sum
Domain: gpu
Format: double
Unit: joules
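When a signal is requested at a domain that contains several of its native domains (for example, a gpu_chip signal read at the gpu or board domain), the Aggregation field names the function used to combine the values. A minimal Python sketch of the aggregation types used so far follows. Treating a disagreement under expect_same as NaN is our assumption about GEOPM's behavior rather than something this page states, and the function names are hypothetical.

```python
import math
from typing import Sequence


def agg_sum(values: Sequence[float]) -> float:
    """sum: total across domains (e.g. energy of all chips)."""
    return math.fsum(values)


def agg_average(values: Sequence[float]) -> float:
    """average: arithmetic mean; values must be non-empty."""
    return math.fsum(values) / len(values)


def agg_max(values: Sequence[float]) -> float:
    """max: largest value (e.g. hottest sensor); values must be non-empty."""
    return max(values)


def agg_expect_same(values: Sequence[float]) -> float:
    """expect_same: the common value, or NaN if the inputs disagree (assumed)."""
    first = values[0]
    return first if all(v == first for v in values) else math.nan
```

For example, GPU_ENERGY read at the board domain would sum the per-GPU energies, while GPU_CORE_FREQUENCY_MAX_AVAIL read at the gpu domain is expected to be identical on every chip.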
LEVELZERO::GPU_CORE_ENERGY
GPU Compute Hardware chip energy in joules.
Aggregation: sum
Domain: gpu_chip on systems with multiple chips per GPU; gpu on systems with a single chip per GPU
Format: double
Unit: joules
LEVELZERO::GPU_CORE_ENERGY_TIMESTAMP
Timestamp, in seconds, of the GPU Compute Hardware energy read. The value is cached when LEVELZERO::GPU_CORE_ENERGY is read.
Aggregation: sum
Domain: gpu_chip on systems with multiple chips per GPU; gpu on systems with a single chip per GPU
Format: double
Unit: seconds
LEVELZERO::GPU_ENERGY_TIMESTAMP
Timestamp, in seconds, of the GPU energy read.
Aggregation: sum
Domain: gpu
Format: double
Unit: seconds
LEVELZERO::GPU_CORE_PERFORMANCE_FACTOR
Performance Factor of the GPU Compute Hardware domain. Expresses a trade-off between the energy provided to the GPU compute hardware and to the supporting units. A value of 1 indicates a compute-focused energy trade-off; a value of 0 indicates a memory-focused energy trade-off. The default value is 0.5.
Aggregation: average
Domain: gpu_chip on systems with multiple chips per GPU; gpu on systems with a single chip per GPU
Format: double
Unit: none
LEVELZERO::GPU_UNCORE_FREQUENCY_STATUS
The current frequency of the GPU Memory Hardware.
Aggregation: average
Domain: gpu_chip
Format: double
Unit: hertz
LEVELZERO::GPU_UNCORE_FREQUENCY_MAX_AVAIL
The maximum supported frequency of the GPU Memory Hardware.
Aggregation: expect_same
Domain: gpu_chip
Format: double
Unit: hertz
LEVELZERO::GPU_UNCORE_FREQUENCY_MIN_AVAIL
The minimum supported frequency of the GPU Memory Hardware.
Aggregation: expect_same
Domain: gpu_chip
Format: double
Unit: hertz
LEVELZERO::GPU_POWER_LIMIT_DEFAULT
Default power limit of the GPU in watts.
Aggregation: sum
Domain: gpu
Format: double
Unit: watts
LEVELZERO::GPU_POWER_LIMIT_MIN_AVAIL
The minimum supported power limit in watts.
Aggregation: sum
Domain: gpu
Format: double
Unit: watts
LEVELZERO::GPU_POWER_LIMIT_MAX_AVAIL
The maximum supported power limit in watts.
Aggregation: sum
Domain: gpu
Format: double
Unit: watts
LEVELZERO::GPU_RAS_RESET_COUNT_CORRECTABLE
The number of correctable accelerator engine resets by the driver.
Aggregation: sum
Domain: gpu_chip
Format: double
Unit: none
LEVELZERO::GPU_RAS_PROGRAMMING_ERRCOUNT_CORRECTABLE
The number of correctable hardware exceptions generated by the way workloads have programmed the hardware.
Aggregation: sum
Domain: gpu_chip
Format: double
Unit: none
LEVELZERO::GPU_RAS_DRIVER_ERRCOUNT_CORRECTABLE
The number of correctable low-level driver communication errors.
Aggregation: sum
Domain: gpu_chip
Format: double
Unit: none
LEVELZERO::GPU_RAS_COMPUTE_ERRCOUNT_CORRECTABLE
The number of correctable errors in the compute accelerator hardware.
Aggregation: sum
Domain: gpu_chip
Format: double
Unit: none
LEVELZERO::GPU_RAS_NONCOMPUTE_ERRCOUNT_CORRECTABLE
The number of correctable errors in the fixed-function accelerator hardware.
Aggregation: sum
Domain: gpu_chip
Format: double
Unit: none
LEVELZERO::GPU_RAS_CACHE_ERRCOUNT_CORRECTABLE
The number of correctable errors in caches (L1/L3/register file/shared local memory/sampler).
Aggregation: sum
Domain: gpu_chip
Format: double
Unit: none
LEVELZERO::GPU_RAS_DISPLAY_ERRCOUNT_CORRECTABLE
The number of correctable errors in the display.
Aggregation: sum
Domain: gpu_chip
Format: double
Unit: none
LEVELZERO::GPU_RAS_RESET_COUNT_UNCORRECTABLE
The number of uncorrectable accelerator engine resets by the driver.
Aggregation: sum
Domain: gpu_chip
Format: double
Unit: none
LEVELZERO::GPU_RAS_PROGRAMMING_ERRCOUNT_UNCORRECTABLE
The number of uncorrectable hardware exceptions generated by the way workloads have programmed the hardware.
Aggregation: sum
Domain: gpu_chip
Format: double
Unit: none
LEVELZERO::GPU_RAS_DRIVER_ERRCOUNT_UNCORRECTABLE
The number of uncorrectable low-level driver communication errors.
Aggregation: sum
Domain: gpu_chip
Format: double
Unit: none
LEVELZERO::GPU_RAS_COMPUTE_ERRCOUNT_UNCORRECTABLE
The number of uncorrectable errors in the compute accelerator hardware.
Aggregation: sum
Domain: gpu_chip
Format: double
Unit: none
LEVELZERO::GPU_RAS_NONCOMPUTE_ERRCOUNT_UNCORRECTABLE
The number of uncorrectable errors in the fixed-function accelerator hardware.
Aggregation: sum
Domain: gpu_chip
Format: double
Unit: none
LEVELZERO::GPU_RAS_CACHE_ERRCOUNT_UNCORRECTABLE
The number of uncorrectable errors in caches (L1/L3/register file/shared local memory/sampler).
Aggregation: sum
Domain: gpu_chip
Format: double
Unit: none
LEVELZERO::GPU_RAS_DISPLAY_ERRCOUNT_UNCORRECTABLE
The number of uncorrectable errors in the display.
Aggregation: sum
Domain: gpu_chip
Format: double
Unit: none
LEVELZERO::GPU_ACTIVE_TIME
Time that this resource is actively running a workload, in unspecified units. See the Intel oneAPI Level Zero Sysman documentation for more info.
Aggregation: sum
Domain: gpu_chip
Format: double
Unit: none
LEVELZERO::GPU_ACTIVE_TIME_TIMESTAMP
The timestamp for the LEVELZERO::GPU_ACTIVE_TIME read, in unspecified units. See the Intel oneAPI Level Zero Sysman documentation for more info.
Aggregation: sum
Domain: gpu_chip
Format: double
Unit: none
LEVELZERO::GPU_CORE_ACTIVE_TIME
Time that the GPU compute engines (EUs) are actively running a workload, in unspecified units. See the Intel oneAPI Level Zero Sysman documentation for more info.
Aggregation: sum
Domain: gpu_chip
Format: double
Unit: none
LEVELZERO::GPU_CORE_ACTIVE_TIME_TIMESTAMP
The timestamp for the LEVELZERO::GPU_CORE_ACTIVE_TIME signal read, in unspecified units. See the Intel oneAPI Level Zero Sysman documentation for more info.
Aggregation: sum
Domain: gpu_chip
Format: double
Unit: none
LEVELZERO::GPU_UNCORE_ACTIVE_TIME
Time that the GPU copy engines are actively running a workload, in unspecified units. See the Intel oneAPI Level Zero Sysman documentation for more info.
Aggregation: sum
Domain: gpu_chip
Format: double
Unit: none
LEVELZERO::GPU_UNCORE_ACTIVE_TIME_TIMESTAMP
The timestamp for the LEVELZERO::GPU_UNCORE_ACTIVE_TIME signal read, in unspecified units. See the Intel oneAPI Level Zero Sysman documentation for more info.
Aggregation: sum
Domain: gpu_chip
Format: double
Unit: none
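Each ACTIVE_TIME signal has a matching _TIMESTAMP signal, so a utilization fraction can be derived from two paired samples even though the units are unspecified. As we understand the Sysman engine documentation, utilization is the change in active time divided by the change in timestamp. The sketch assumes both counters are monotonic and share the same units; ActiveTimeSample and utilization are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ActiveTimeSample:
    """One paired read of an ACTIVE_TIME signal and its _TIMESTAMP signal."""
    active_time: float  # e.g. LEVELZERO::GPU_CORE_ACTIVE_TIME
    timestamp: float    # e.g. LEVELZERO::GPU_CORE_ACTIVE_TIME_TIMESTAMP


def utilization(s1: ActiveTimeSample, s2: ActiveTimeSample) -> float:
    """Fraction of the interval between s1 and s2 during which the engines were active.

    The unspecified units cancel, provided both counters use the same units.
    """
    elapsed = s2.timestamp - s1.timestamp
    if elapsed <= 0:
        raise ValueError("timestamps must increase between samples")
    return (s2.active_time - s1.active_time) / elapsed
```

Pairing each active-time value with the timestamp taken at the same read avoids mixing in the caller's own clock, whose units may differ.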
LEVELZERO::GPU_POWER
Average GPU power over 40 ms (via geopmread) or 8 control loop iterations. Derivative signal based on LEVELZERO::GPU_ENERGY.
Aggregation: average
Domain: gpu
Format: double
Unit: watts
LEVELZERO::GPU_CORE_POWER
Average GPU Compute Hardware power over 40 ms (via geopmread) or 8 control loop iterations. Derivative signal based on LEVELZERO::GPU_CORE_ENERGY.
Aggregation: average
Domain: gpu_chip
Format: double
Unit: watts
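GPU_POWER and GPU_CORE_POWER are derived from energy samples. One common way to compute such a derivative is a least-squares slope of energy against time over a short window. The sketch below assumes a window of the 8 most recent (timestamp, energy) pairs, matching the 8 iterations quoted above; GEOPM's internal method may differ, and DerivativeSketch is a hypothetical name.

```python
from collections import deque
from typing import Deque, Optional, Tuple


class DerivativeSketch:
    """Least-squares slope of (timestamp, energy) pairs over a sliding window.

    Fed with LEVELZERO::GPU_ENERGY and LEVELZERO::GPU_ENERGY_TIMESTAMP
    samples, the slope is an average power estimate in watts (J/s).
    """

    def __init__(self, window: int = 8) -> None:
        # Only the most recent `window` samples are kept.
        self.samples: Deque[Tuple[float, float]] = deque(maxlen=window)

    def update(self, timestamp: float, energy: float) -> Optional[float]:
        """Add one sample and return the current slope, if defined."""
        self.samples.append((timestamp, energy))
        n = len(self.samples)
        if n < 2:
            return None  # a slope needs at least two points
        mean_t = sum(t for t, _ in self.samples) / n
        mean_e = sum(e for _, e in self.samples) / n
        cov = sum((t - mean_t) * (e - mean_e) for t, e in self.samples)
        var = sum((t - mean_t) ** 2 for t, _ in self.samples)
        return cov / var if var > 0 else None


if __name__ == "__main__":
    deriv = DerivativeSketch()
    # Synthetic samples 5 ms apart with 1.5 J consumed per sample (300 W).
    for i in range(10):
        power = deriv.update(0.005 * i, 1000.0 + 1.5 * i)
    print(power)
```

A regression over several samples is less sensitive to jitter in any single energy read than a two-point difference, at the cost of reacting more slowly to power changes.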
LEVELZERO::GPU_UTILIZATION
Utilization of all GPU engines. Level Zero logical engines may map to the same hardware, which in some cases results in a reduced signal range (i.e., narrower than 0 to 1). See the LevelZero Sysman Engine documentation for more info.
Aggregation: average
Domain: gpu
Format: double
Unit: none
LEVELZERO::GPU_CORE_UTILIZATION
Utilization of the GPU Compute Engines (EUs). Level Zero logical engines may map to the same hardware, which in some cases results in a reduced signal range (i.e., narrower than 0 to 1). See the LevelZero Sysman Engine documentation for more info.
Aggregation: average
Domain: gpu_chip
Format: double
Unit: none
LEVELZERO::GPU_UNCORE_UTILIZATION
Utilization of the GPU Copy Engines. Level Zero logical engines may map to the same hardware, which in some cases results in a reduced signal range (i.e., narrower than 0 to 1). See the LevelZero Sysman Engine documentation for more info.
Aggregation: average
Domain: gpu_chip
Format: double
Unit: none
LEVELZERO::GPU_CORE_THROTTLE_REASONS
GPU Compute Hardware throttle reasons, reported as a bit field. See the oneAPI Level Zero Sysman specification for decoding.
Aggregation: integer_bitwise_or
Domain: gpu_chip
Format: integer
Unit: none
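GPU_CORE_THROTTLE_REASONS is an integer bit field, and integer_bitwise_or aggregation means a reason is reported for a GPU if any of its chips reports it. The sketch below decodes the bits using the flag layout of zes_freq_throttle_reason_flags_t as we recall it from the Sysman specification; verify the bit positions against the zes_api.h shipped with your Level Zero version. GEOPM returns signal values as doubles, so they are converted with int() first. The enum and function names are hypothetical.

```python
from enum import IntFlag
from functools import reduce
from operator import or_
from typing import Iterable, List


class ThrottleReason(IntFlag):
    """Assumed bit layout of zes_freq_throttle_reason_flags_t; verify locally."""
    AVE_PWR_CAP = 1 << 0
    BURST_PWR_CAP = 1 << 1
    CURRENT_LIMIT = 1 << 2
    THERMAL_LIMIT = 1 << 3
    PSU_ALERT = 1 << 4
    SW_RANGE = 1 << 5
    HW_RANGE = 1 << 6


def aggregate_bitwise_or(chip_values: Iterable[int]) -> int:
    """integer_bitwise_or: a bit is set if any chip sets it (0 for no chips)."""
    return reduce(or_, chip_values, 0)


def decode(value: int) -> List[str]:
    """Names of the known throttle reasons whose bits are set in value."""
    return [reason.name for reason in ThrottleReason if value & reason]
```

For example, if one chip reports an average power cap (bit 0) and another a thermal limit (bit 3), the GPU-level value is 9 and decodes to both reasons.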
Controls
Every control is exposed as a signal with the same name. The relevant signal aggregation information is provided below.
LEVELZERO::GPU_CORE_FREQUENCY_MIN_CONTROL
Sets the minimum frequency request for the GPU Compute Hardware.
Aggregation: expect_same
Domain: gpu_chip
Format: double
Unit: hertz
LEVELZERO::GPU_CORE_FREQUENCY_MAX_CONTROL
Sets the maximum frequency request for the GPU Compute Hardware.
Aggregation: expect_same
Domain: gpu_chip
Format: double
Unit: hertz
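When choosing values for the MIN and MAX frequency controls, a caller may want to keep requests inside the range reported by GPU_CORE_FREQUENCY_MIN_AVAIL and GPU_CORE_FREQUENCY_MAX_AVAIL and aligned to GPU_CORE_FREQUENCY_STEP. The hypothetical helper below assumes that valid frequencies are MIN_AVAIL plus a whole number of steps. Because GPU_CORE_FREQUENCY_STEP may be an average when the step size varies, the result is only an approximation, and this page does not say whether GEOPM or the driver rounds requests itself.

```python
def snap_frequency(request_hz: float, min_avail_hz: float,
                   max_avail_hz: float, step_hz: float) -> float:
    """Clamp a request to [MIN_AVAIL, MAX_AVAIL] and round to the nearest step.

    Assumes valid settings are min_avail_hz + k * step_hz for integer k.
    """
    if step_hz <= 0:
        raise ValueError("step must be positive")
    clamped = min(max(request_hz, min_avail_hz), max_avail_hz)
    steps = round((clamped - min_avail_hz) / step_hz)
    # Rounding up could overshoot the maximum, so clamp once more.
    return min(min_avail_hz + steps * step_hz, max_avail_hz)
```

With a 300 MHz to 1.6 GHz range and a 50 MHz step, a request for 1.234 GHz becomes 1.25 GHz, and a request above the range becomes 1.6 GHz.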
LEVELZERO::GPU_CORE_PERFORMANCE_FACTOR_CONTROL
Performance Factor of the GPU Compute Hardware domain. Expresses a trade-off between the energy provided to the GPU compute hardware and to the supporting units. A value of 1 indicates a compute-focused energy trade-off; a value of 0 indicates a memory-focused energy trade-off. The default value is 0.5.
Aggregation: average
Domain: gpu_chip
Format: double
Unit: none
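Because a write to the performance factor may not be granted, a caller can write the control and then read the signal back. In the sketch below, the read and write callables are hypothetical stand-ins for whichever GEOPM interface is in use, taking (name, domain type, domain index[, setting]) arguments; the range check mirrors the 0 to 1 scale described above.

```python
from typing import Callable

# Hypothetical stand-ins for a GEOPM read/write interface.
ReadFn = Callable[[str, str, int], float]
WriteFn = Callable[[str, str, int, float], None]


def set_performance_factor(write: WriteFn, read: ReadFn,
                           chip: int, factor: float) -> float:
    """Request a performance factor on one gpu_chip and return the value read back.

    The caller can compare the result with `factor` to detect a request
    that was not granted.
    """
    if not 0.0 <= factor <= 1.0:
        # 0 is memory-focused, 1 is compute-focused; reject anything else.
        raise ValueError("performance factor must be in [0, 1]")
    write("LEVELZERO::GPU_CORE_PERFORMANCE_FACTOR_CONTROL", "gpu_chip", chip, factor)
    # The write may not be granted, so confirm the setting through the signal.
    return read("LEVELZERO::GPU_CORE_PERFORMANCE_FACTOR", "gpu_chip", chip)
```

Returning the read-back value instead of the requested one keeps any later logic based on the setting that is actually in effect.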
Aliases
This IOGroup provides the following high-level aliases:
Signal Aliases
GPU_ENERGY
Maps to LEVELZERO::GPU_ENERGY.
GPU_POWER
Maps to LEVELZERO::GPU_POWER.
GPU_CORE_ENERGY
Maps to LEVELZERO::GPU_CORE_ENERGY.
GPU_CORE_POWER
Maps to LEVELZERO::GPU_CORE_POWER.
GPU_UTILIZATION
Maps to LEVELZERO::GPU_UTILIZATION.
GPU_CORE_ACTIVITY
Maps to LEVELZERO::GPU_CORE_UTILIZATION.
GPU_UNCORE_ACTIVITY
Maps to LEVELZERO::GPU_UNCORE_UTILIZATION.
GPU_CORE_FREQUENCY_STATUS
Maps to LEVELZERO::GPU_CORE_FREQUENCY_STATUS.
GPU_CORE_FREQUENCY_MIN_AVAIL
Maps to LEVELZERO::GPU_CORE_FREQUENCY_MIN_AVAIL.
GPU_CORE_FREQUENCY_MAX_AVAIL
Maps to LEVELZERO::GPU_CORE_FREQUENCY_MAX_AVAIL.
GPU_CORE_FREQUENCY_MIN_CONTROL
Maps to LEVELZERO::GPU_CORE_FREQUENCY_MIN_CONTROL.
GPU_CORE_FREQUENCY_MAX_CONTROL
Maps to LEVELZERO::GPU_CORE_FREQUENCY_MAX_CONTROL.
GPU_CORE_FREQUENCY_STEP
Maps to LEVELZERO::GPU_CORE_FREQUENCY_STEP.
LEVELZERO::GPU_CORE_PERFORMANCE_FACTOR_CONTROL
Maps to LEVELZERO::GPU_CORE_PERFORMANCE_FACTOR. Writes to the performance factor may not be granted; to confirm the actual control setting, read the signal.
Control Aliases
GPU_CORE_FREQUENCY_MAX_CONTROL
Maps to LEVELZERO::GPU_CORE_FREQUENCY_MAX_CONTROL.
GPU_CORE_FREQUENCY_MIN_CONTROL
Maps to LEVELZERO::GPU_CORE_FREQUENCY_MIN_CONTROL.
See Also
oneAPI LevelZero Sysman, geopm(7), geopm::IOGroup(3), geopmwrite(1), geopmread(1)