Getting Started
The GEOPM project has a two-tier software structure: the GEOPM Service and the GEOPM Runtime. The GEOPM Service provides a secure userspace interface for accessing hardware telemetry and configuration. The GEOPM Runtime lets end users profile their applications for more detailed data analysis, and optionally apply active hardware configuration algorithms to improve energy efficiency.
For in-depth information see: User Guide for GEOPM Service or User Guide for GEOPM Runtime.
Install GEOPM
There are two recommended ways to install the GEOPM software: one is to install the pre-built packages provided for various Linux distributions while the other is to build from source code using Spack.
Packages for Linux Distributions
Pre-built binaries of the GEOPM Service and Runtime are available for download using the openSUSE Build Service for RPM-based Linux distributions and through Launchpad for Debian-based distributions. For information on how to configure those repositories with your system package manager or directly download the binaries see: Installation
Building with Spack
Note
Recipes for the v3.1 release of GEOPM are currently work-in-progress.
For users that leverage Spack to distribute software, recipes to build geopm-service and geopm-runtime are included in the Spack v0.22.0 release. These recipes currently allow for building both the v3.0.1 release of GEOPM and our main development branch.
For deploying GEOPM's layers to a compute image in an HPC system context (i.e. PXE booted via warewulf or similar), a typical configuration would be to have a system install of the service RPMs baked into the compute image, and use spack to install geopm-runtime. This is required as the GEOPM Service will be launched via systemd, and thus must run against the system-installed Python runtime.
For GEOPM v3.0.1, system install geopm-service, geopm-service-devel, libgeopmd2, and python3-geopmdpy.
For GEOPM v3.1, system install geopm-service, geopm-service-doc, geopm-service-devel, libgeopmd2, libgeopmd-doc, python3-geopmdpy, and python3-geopmdpy-doc.
In order to build with spack this way, geopm-service must be configured as an external package in ~/.spack/packages.yaml:
packages:
  geopm-service:
    externals:
    - spec: "geopm-service@3.0.1"
      prefix: /usr
    - spec: "geopm-service@3.1.0"
      prefix: /usr
    - spec: "geopm-service@develop"
      prefix: /usr
    version:
    - 3.0.1
    - 3.1.0
    - develop
    buildable: False
Afterwards, geopm-runtime can be installed normally with spack install geopm-runtime.
Admin Configuration
After the Service has been installed, it must be configured properly before non-root users will be able to leverage it.
To grant permissions to all non-root users to be able to use all of the features provided by the Service, execute the following commands:
$ sudo geopmaccess -a | sudo geopmaccess -w
$ sudo geopmaccess -a -c | sudo geopmaccess -w -c
These commands will create access lists in the system location that the Service will use to determine user privilege.
An administrator may use the --log (-l) option of geopmaccess to restrict an access list to the set of values that have been used since last restart by piping the output into geopmaccess -w:
$ sudo geopmaccess -l | sudo geopmaccess -w
$ sudo geopmaccess -l -c | sudo geopmaccess -w -c
More information on access list configuration can be found on the following pages: Service Administrators and geopmaccess(1) - Access management for the GEOPM Service.
Platform Topology
We refer to the different hardware layers within a system as domains. GEOPM has support for the following domains:
Board
Package
Core (physical)
CPU (Linux logical)
Memory
Package Integrated Memory
NIC
Package Integrated NIC
GPU
Package Integrated GPU
GPU Chip
For more information on the domain types, see: Domain Types.
Code Examples
All of the code examples require linking against libgeopmd for C/C++. The Python examples require that your PYTHONPATH contains the geopmdpy module and that libgeopmd is available in your LD_LIBRARY_PATH.
The following examples leverage geopmread or geopmwrite for command-line usage, and the C, C++, and Python APIs of PlatformTopo for the platform topology.
# Print all domains:
$ geopmread --domain
# OR
$ geopmwrite --domain
board 1
package 2
core 104
cpu 208
memory 2
package_integrated_memory 2
nic 0
package_integrated_nic 0
gpu 6
package_integrated_gpu 0
gpu_chip 12
// Query the number of physical cores in the system
#include <stdio.h>
#include <geopm_topo.h>
int main (int argc, char** argv)
{
    int num_cores = geopm_topo_num_domain(GEOPM_DOMAIN_CORE);
    printf("Num cores = %d\n", num_cores);
    return 0;
}
// Query the number of physical cores in the system
#include <iostream>
#include <geopm/PlatformTopo.hpp>
int main (int argc, char** argv)
{
    int num_cores = geopm::platform_topo().num_domain(GEOPM_DOMAIN_CORE);
    std::cout << "Num cores = " << num_cores << std::endl;
    return 0;
}
# Query the number of physical cores in the system
import geopmdpy.topo as topo
num_core = topo.num_domain(topo.DOMAIN_CORE)
print(f'Num cores = {num_core}')
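The per-domain counts printed by geopmread --domain are also easy to post-process in a script. The following is a minimal sketch, not part of GEOPM: it assumes the two-column text layout shown above, and the helper name parse_domain_counts is hypothetical. It parses the sample output and derives the number of logical CPUs per physical core.

```python
# Sample text copied from the `geopmread --domain` output shown above.
DOMAIN_OUTPUT = """\
board 1
package 2
core 104
cpu 208
memory 2
package_integrated_memory 2
nic 0
package_integrated_nic 0
gpu 6
package_integrated_gpu 0
gpu_chip 12
"""


def parse_domain_counts(text):
    """Map each domain name to its count (hypothetical helper, not a GEOPM API)."""
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        # Keep only well-formed "<domain> <count>" lines.
        if len(fields) == 2:
            counts[fields[0]] = int(fields[1])
    return counts


counts = parse_domain_counts(DOMAIN_OUTPUT)
cpus_per_core = counts["cpu"] // counts["core"]
print(f"Logical CPUs per physical core = {cpus_per_core}")
```

On a live system, the same counts are available directly from topo.num_domain() as in the Python example above, without any text parsing.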
Reading Telemetry
We refer to any bit of telemetry that can be read with the Service as a signal. Each signal has a native domain. For example, the native domain of the current operating frequency of the CPU (i.e. CPU_FREQUENCY_STATUS or MSR::PERF_STATUS:FREQ) is the CPU domain. Any signal can be aggregated to any domain that is more coarse than its native domain; in our example, CPU frequency can be aggregated to the package or board domains since they are more coarse than the CPU domain.
The following examples make use of geopmread for the command-line and the C, C++, and Python APIs for PlatformIO in their respective languages.
Listing All Available Signals
$ geopmread
Listing Signal Information
Note
Some telemetry fields have a "high level" alias that can be used in place of the "low level" name. In this case, CPU_FREQUENCY_STATUS is an alias for MSR::PERF_STATUS:FREQ. When using geopmread -i to query for information about a signal, the native domain and aggregation type are only listed for the "low level" name. For more information on names, see: Breaking Down Signal/Control Names.
$ geopmread -i CPU_FREQUENCY_STATUS
CPU_FREQUENCY_STATUS:
description: The current operating frequency of the CPU.
iogroup: MSR
alias_for: MSR::PERF_STATUS:FREQ
$ geopmread -i MSR::PERF_STATUS:FREQ
MSR::PERF_STATUS:FREQ:
description: The current operating frequency of the CPU.
units: hertz
aggregation: average
domain: cpu
iogroup: MSRIOGroup
Reading Signals
# Read the current CPU frequency for cpu 0
$ geopmread CPU_FREQUENCY_STATUS cpu 0
// Read the current CPU frequency for cpu 0
#include <limits.h>
#include <stdio.h>
#include <geopm_topo.h>
#include <geopm_pio.h>
#include <geopm_error.h>
int main (int argc, char** argv)
{
    double curr_frequency = 0;
    char err_msg[GEOPM_MESSAGE_MAX];
    int err = geopm_pio_read_signal("CPU_FREQUENCY_STATUS",
                                    GEOPM_DOMAIN_CPU,
                                    0,
                                    &curr_frequency);
    if (err != 0) {
        geopm_error_message(err, err_msg, GEOPM_MESSAGE_MAX);
        printf("Err msg = %s\n", err_msg);
    }
    printf("Current CPU frequency for cpu 0 = %f\n", curr_frequency);
    return 0;
}
// Read the current CPU frequency for cpu 0
#include <iostream>
#include <geopm/PlatformIO.hpp>
#include <geopm/PlatformTopo.hpp>
int main (int argc, char** argv)
{
    double curr_frequency =
        geopm::platform_io().read_signal("CPU_FREQUENCY_STATUS",
                                         GEOPM_DOMAIN_CPU, 0);
    std::cout << "Current CPU frequency for cpu 0 = "
              << curr_frequency << std::endl;
    return 0;
}
# Read the current CPU frequency for cpu 0
import geopmdpy.topo as topo
import geopmdpy.pio as pio
curr_frequency = pio.read_signal('CPU_FREQUENCY_STATUS', topo.DOMAIN_CPU, 0)
print(f'Current CPU frequency for cpu 0 = {curr_frequency}')
Understanding Aggregation
The telemetry that is output from geopmread or the APIs will automatically be aggregated based on the requested domain and the aggregation type.
Using CPU_FREQUENCY_STATUS as an example, the output in Listing Signal Information shows the native domain as cpu and the aggregation type as average. Notice the topology diagram shows that CPUs are contained within cores, cores within packages, and packages within the board. When a CPU_FREQUENCY_STATUS request is made at the core domain, GEOPM reads and averages the frequencies of all CPUs linked to that core. If the request is at the package domain, it aggregates the frequencies of all CPUs across every core in that package and provides the average. This methodology escalates up to the broadest domain, the board domain. Thus, to obtain the average frequency spanning all packages, cores, and CPUs in the system, one would issue a geopmread at the board domain.
On the other hand, using CPU_ENERGY as an example, the output in Listing Signal Information shows the native domain as cpu and the aggregation type as sum. When a CPU_ENERGY request is made at the core domain, GEOPM sums the energy consumed by all CPUs linked to that core. If the request is at the package domain, it sums the energy consumed by all CPUs across every core in that package and provides the total. This methodology escalates up to the broadest domain, the board domain. Thus, to obtain the total energy consumed by all packages, cores, and CPUs in the system, one would issue a geopmread at the board domain.
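The two aggregation behaviors described above can be illustrated without any hardware. The following is a minimal sketch in plain Python, not GEOPM code: the toy topology (2 packages, 2 cores per package, 2 CPUs per core), the readings, and the aggregate helper are all made-up assumptions for illustration.

```python
from statistics import mean

# Hypothetical per-CPU readings keyed by (package, core, cpu) index.
# Frequencies are in hertz and energies in joules; all values are made up.
cpu_frequency = {
    (0, 0, 0): 2.0e9, (0, 0, 1): 2.2e9,
    (0, 1, 2): 2.4e9, (0, 1, 3): 2.6e9,
    (1, 2, 4): 3.0e9, (1, 2, 5): 3.0e9,
    (1, 3, 6): 3.2e9, (1, 3, 7): 3.4e9,
}
cpu_energy = {key: 5.0 for key in cpu_frequency}

# The two aggregation types discussed in the text.
AGGREGATORS = {"average": mean, "sum": sum}


def aggregate(values, agg_type, package=None, core=None):
    """Combine the native-domain values contained in the requested domain.

    Leaving both package and core as None selects the whole board.
    """
    selected = [
        value
        for (pkg, cor, _cpu), value in values.items()
        if (package is None or pkg == package) and (core is None or cor == core)
    ]
    return AGGREGATORS[agg_type](selected)


core0_freq = aggregate(cpu_frequency, "average", core=0)
package0_freq = aggregate(cpu_frequency, "average", package=0)
board_freq = aggregate(cpu_frequency, "average")
package0_energy = aggregate(cpu_energy, "sum", package=0)
board_energy = aggregate(cpu_energy, "sum")
print(core0_freq, package0_freq, board_freq, package0_energy, board_energy)
```

The same request at a coarser domain simply widens the set of native-domain values that are combined; only the combining function differs between the average and sum aggregation types.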
For more information about aggregation types, see: geopm::Agg(3) - data aggregation functions.
Reading Multiple Signals
To fetch platform telemetry and output it to the console or a file:
From the command-line: Use geopmsession. Its input arguments are similar to geopmread, but are taken from standard input rather than the command-line.
From code: Utilize the batch read API.
$ echo -e 'TIME board 0\nCPU_FREQUENCY_STATUS package 0' | geopmsession
// Read multiple signals using batch interface
#include <limits.h>
#include <stdio.h>
#include <geopm_topo.h>
#include <geopm_pio.h>
#include <geopm_error.h>
int read_signals ()
{
    int time_idx, freq_idx, err;
    double time_value, freq_value;
    time_idx = geopm_pio_push_signal("TIME", GEOPM_DOMAIN_BOARD, 0);
    if (time_idx < 0) {
        // geopm_pio_push_signal will return a negative value when something went wrong
        return time_idx;
    }
    freq_idx = geopm_pio_push_signal("CPU_FREQUENCY_STATUS", GEOPM_DOMAIN_PACKAGE, 0);
    if (freq_idx < 0) {
        return freq_idx;
    }
    err = geopm_pio_read_batch();
    if (err < 0) {
        return err;
    }
    err = geopm_pio_sample(time_idx, &time_value);
    if (err < 0) {
        return err;
    }
    err = geopm_pio_sample(freq_idx, &freq_value);
    if (err < 0) {
        return err;
    }
    printf("Elapsed time = %f\n", time_value);
    printf("Current CPU frequency for package 0 = %f\n", freq_value);
    return 0;
}
int main (int argc, char** argv)
{
    char err_msg[GEOPM_MESSAGE_MAX];
    int err = read_signals();
    if (err < 0) {
        geopm_error_message(err, err_msg, GEOPM_MESSAGE_MAX);
        fprintf(stderr, "Err msg = %s\n", err_msg);
    }
    return 0;
}
// Read multiple signals using batch interface
#include <limits.h>
#include <iostream>
#include <geopm/PlatformIO.hpp>
#include <geopm/PlatformTopo.hpp>
int main (int argc, char** argv)
{
    geopm::PlatformIO &pio = geopm::platform_io();
    int time_idx, freq_idx;
    double time_value, freq_value;
    time_idx = pio.push_signal("TIME", GEOPM_DOMAIN_BOARD, 0);
    freq_idx = pio.push_signal("CPU_FREQUENCY_STATUS", GEOPM_DOMAIN_PACKAGE, 0);
    pio.read_batch();
    time_value = pio.sample(time_idx);
    freq_value = pio.sample(freq_idx);
    std::cout << "Elapsed time = "
              << time_value << std::endl;
    std::cout << "Current CPU frequency for package 0 = "
              << freq_value << std::endl;
    return 0;
}
# Read multiple signals using batch read
import geopmdpy.topo as topo
import geopmdpy.pio as pio
time_idx = pio.push_signal('TIME', topo.DOMAIN_BOARD, 0)
freq_idx = pio.push_signal('CPU_FREQUENCY_STATUS', topo.DOMAIN_PACKAGE, 0)
pio.read_batch()
print(f"Elapsed time = {pio.sample(time_idx)}")
print(f"Current CPU frequency for package 0 = {pio.sample(freq_idx)}")
For more information on geopmsession see: geopmsession(1) - Command line interface for the GEOPM service batch read features.
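Since geopmsession takes its requests from standard input, one request per line, the input text can be generated from a list of requests rather than typed by hand. The sketch below is only an illustration under that assumption; the helper name format_requests is hypothetical and not part of GEOPM.

```python
def format_requests(requests):
    """Join (signal, domain, index) tuples into one request per line.

    Hypothetical helper: it mirrors the "NAME domain index" layout used in
    the echo command above.
    """
    return "\n".join(f"{name} {domain} {idx}" for name, domain, idx in requests)


requests = [
    ("TIME", "board", 0),
    ("CPU_FREQUENCY_STATUS", "package", 0),
]
stdin_text = format_requests(requests)
print(stdin_text)
```

On a system with GEOPM installed, the resulting text could be handed to the tool with the standard library, for example subprocess.run(["geopmsession"], input=stdin_text, text=True).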
Capturing Telemetry Over Time
geopmsession can also capture data over time with the -p and -t options. This behavior is easily implemented in code along with the batch read interface.
# Read 2 signals for 10 seconds, sampling once a second:
$ echo -e 'TIME board 0\nCPU_FREQUENCY_STATUS package 0' | geopmsession -p 1.0 -t 10.0
// Read multiple signals for 10 seconds using batch read, sampling once a second
#include <limits.h>
#include <stdio.h>
#include <unistd.h>
#include <geopm_topo.h>
#include <geopm_pio.h>
#include <geopm_error.h>
int read_signals ()
{
    int time_idx, freq_idx, err;
    double time_value, freq_value;
    int ii;
    time_idx = geopm_pio_push_signal("TIME", GEOPM_DOMAIN_BOARD, 0);
    if (time_idx < 0) {
        // geopm_pio_push_signal will return a negative value when something went wrong
        return time_idx;
    }
    freq_idx = geopm_pio_push_signal("CPU_FREQUENCY_STATUS", GEOPM_DOMAIN_PACKAGE, 0);
    if (freq_idx < 0) {
        return freq_idx;
    }
    printf("time,frequency\n");
    for (ii = 0; ii < 10; ii++) {
        err = geopm_pio_read_batch();
        if (err < 0) {
            return err;
        }
        err = geopm_pio_sample(time_idx, &time_value);
        if (err < 0) {
            return err;
        }
        err = geopm_pio_sample(freq_idx, &freq_value);
        if (err < 0) {
            return err;
        }
        printf("%f,%f\n", time_value, freq_value);
        sleep(1);
    }
    return 0;
}
int main (int argc, char** argv)
{
    char err_msg[GEOPM_MESSAGE_MAX];
    int err = read_signals();
    if (err < 0) {
        geopm_error_message(err, err_msg, GEOPM_MESSAGE_MAX);
        fprintf(stderr, "Err msg = %s\n", err_msg);
    }
    return 0;
}
// Read multiple signals for 10 seconds using batch read, sampling once a second
#include <limits.h>
#include <unistd.h>
#include <iostream>
#include <geopm/PlatformIO.hpp>
#include <geopm/PlatformTopo.hpp>
int main (int argc, char** argv)
{
    geopm::PlatformIO &pio = geopm::platform_io();
    int time_idx, freq_idx;
    double time_value, freq_value;
    time_idx = pio.push_signal("TIME",
                               GEOPM_DOMAIN_BOARD,
                               0);
    freq_idx = pio.push_signal("CPU_FREQUENCY_STATUS",
                               GEOPM_DOMAIN_PACKAGE,
                               0);
    std::cout << "time,frequency" << std::endl;
    for (int ii = 0; ii < 10; ii++) {
        pio.read_batch();
        time_value = pio.sample(time_idx);
        freq_value = pio.sample(freq_idx);
        std::cout << time_value << "," << freq_value << std::endl;
        sleep(1);
    }
    return 0;
}
# Read multiple signals for 10 seconds using batch read, sampling once a second
import geopmdpy.topo as topo
import geopmdpy.pio as pio
import time
time_idx = pio.push_signal('TIME', topo.DOMAIN_BOARD, 0)
freq_idx = pio.push_signal('CPU_FREQUENCY_STATUS', topo.DOMAIN_PACKAGE, 0)
print("time,frequency")
for ii in range(10):
    pio.read_batch()
    print(f"{pio.sample(time_idx)},{pio.sample(freq_idx)}")
    time.sleep(1)
Again, for more information on geopmsession see geopmsession(1) - Command line interface for the GEOPM service batch read features.
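The code examples above print their samples as "time,frequency" rows, which can be summarized with a few lines of standard-library Python. This is a minimal sketch under that assumption: the sample rows are made up, and the summarize helper is hypothetical rather than a GEOPM API.

```python
import csv
import io
from statistics import mean

# Made-up rows in the "time,frequency" layout printed by the examples above.
SAMPLE_OUTPUT = """\
time,frequency
0.000000,2100000000.000000
1.000000,2300000000.000000
2.000000,2500000000.000000
"""


def summarize(text):
    """Return the sample count, covered duration, and mean frequency."""
    rows = list(csv.DictReader(io.StringIO(text)))
    times = [float(row["time"]) for row in rows]
    freqs = [float(row["frequency"]) for row in rows]
    return {
        "samples": len(rows),
        "duration": times[-1] - times[0],
        "mean_frequency": mean(freqs),
    }


summary = summarize(SAMPLE_OUTPUT)
print(summary)
```

In practice the text would come from a file holding the redirected output of one of the sampling programs instead of the embedded string.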
Enact Hardware-based Settings
We refer to any hardware setting that can be manipulated through the Service as a control. Like signals, each control has a native domain. Any control can be disaggregated from a coarse domain (e.g., board) to its native domain. See Understanding Disaggregation for more information.
The following examples make use of geopmwrite for the command-line and the C, C++, and Python APIs for PlatformIO to enact hardware controls in their respective languages.
Listing All Available Controls
$ geopmwrite
Listing Control Information
$ geopmwrite -i CPU_FREQUENCY_MAX_CONTROL
CPU_FREQUENCY_MAX_CONTROL:
Target operating frequency of the CPU based on the control register.
# To include the aggregation type, use geopmread:
$ geopmread -i CPU_FREQUENCY_MAX_CONTROL
CPU_FREQUENCY_MAX_CONTROL:
description: Target operating frequency of the CPU based on the control register. Note: when querying at a higher domain, if NaN is returned, query at its native domain.
alias_for: MSR::PERF_CTL:FREQ
units: hertz
aggregation: expect_same
domain: core
iogroup: MSRIOGroup
Writing Controls
# Set the maximum CPU frequency for core 0 to 3.0 GHz
$ geopmwrite CPU_FREQUENCY_MAX_CONTROL core 0 3.0e9
// Set the maximum CPU frequency for core 0 to 3.0 GHz
#include <limits.h>
#include <stdio.h>
#include <geopm_topo.h>
#include <geopm_pio.h>
#include <geopm_error.h>
int main (int argc, char** argv)
{
    char err_msg[GEOPM_MESSAGE_MAX];
    int err = geopm_pio_write_control("CPU_FREQUENCY_MAX_CONTROL",
                                      GEOPM_DOMAIN_CORE,
                                      0,
                                      3.0e9);
    if (err != 0) {
        geopm_error_message(err, err_msg, GEOPM_MESSAGE_MAX);
        printf("Err msg = %s\n", err_msg);
    }
    return 0;
}
// Set the maximum CPU frequency for core 0 to 3.0 GHz
#include <iostream>
#include <geopm/PlatformIO.hpp>
#include <geopm/PlatformTopo.hpp>
int main (int argc, char** argv)
{
    geopm::platform_io().write_control("CPU_FREQUENCY_MAX_CONTROL",
                                       GEOPM_DOMAIN_CORE, 0,
                                       3.0e9);
    return 0;
}
# Set the maximum CPU frequency for core 0 to 3.0 GHz
import geopmdpy.topo as topo
import geopmdpy.pio as pio
pio.write_control('CPU_FREQUENCY_MAX_CONTROL', topo.DOMAIN_CORE, 0, 3.0e9)
Note
To determine the initial value of any control, use geopmread or the corresponding PlatformIO APIs at the desired domain. E.g.:
$ geopmread CPU_FREQUENCY_MAX_CONTROL core 0
Understanding Disaggregation
Just as signals can be aggregated to a more coarse domain from their native one, controls can be disaggregated from a coarse domain to their native domain. This happens automatically with geopmwrite and the corresponding APIs.
Using CPU_FREQUENCY_MAX_CONTROL as an example, the output in Listing Control Information shows the native domain as core. To write the same value to all the cores in a package, simply issue the request at the package domain, and the CPU_FREQUENCY_MAX_CONTROL of all cores in that package will be written. Likewise, to write the same value to all cores in all packages, issue the request at the board domain.
To understand the method of disaggregation for a specific control, you must examine its aggregation type. For instance, CPU_FREQUENCY_MAX_CONTROL has an aggregation type labeled expect_same. When setting this control at a domain level coarser than its native domain, all native domains inherit the same value as the coarser domain. This consistent distribution applies to all aggregation types, with the exception of sum; controls that use sum aggregation will have the requested value distributed evenly across the native domain. Taking MSR::PKG_POWER_LIMIT:PL1_POWER_LIMIT as an example, it has the following information:
$ geopmread -i MSR::PKG_POWER_LIMIT:PL1_POWER_LIMIT
MSR::PKG_POWER_LIMIT:PL1_POWER_LIMIT:
description: The average power usage limit over the time window specified in PL1_TIME_WINDOW.
units: watts
aggregation: sum
domain: package
iogroup: MSRIOGroup
Since the package domain is contained within the board domain, writing this control at the board domain will evenly distribute the requested value over all the packages in the system. This means that on a system with two packages, requesting a 200 W power limit at the board domain will result in each package receiving a limit of 100 W.
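The disaggregation rule described above reduces to a small piece of arithmetic. The following sketch is plain Python written for illustration, not GEOPM's implementation; the function name disaggregate and its parameters are hypothetical.

```python
def disaggregate(value, num_native, agg_type):
    """Return the value written to each native domain for a coarse-domain request.

    Controls with "sum" aggregation split the request evenly; every other
    aggregation type passes the requested value through unchanged.
    """
    if num_native <= 0:
        raise ValueError("num_native must be positive")
    share = value / num_native if agg_type == "sum" else value
    return [share] * num_native


# A 200 W board-level power limit on a system with two packages.
package_limits = disaggregate(200.0, 2, "sum")
# A 3.0 GHz package-level frequency limit over four cores (made-up core count).
core_limits = disaggregate(3.0e9, 4, "expect_same")
print(package_limits, core_limits)
```

The first call reproduces the 100 W per package result from the text; the second shows that an expect_same control hands the full requested value to every native domain.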
Measure Performance
The GEOPM Runtime offers capabilities for collecting telemetry throughout an application's execution. If you want to measure a particular segment of an application, you can annotate the application code using GEOPM markup.
To integrate the Runtime with an application, you have two options:
geopmlaunch: Ideal for MPI-enabled applications. Simply launch the application using this method.
Manual Setup: This involves configuring the necessary environment settings and directly invoking geopmctl.
geopmlaunch will bring up the Runtime alongside your application using one of three launch methods: process, pthread, or application. The process launch method will attempt to launch the main entity of the Runtime, the Controller, as an extra rank in the MPI gang. The application launch method (default when unspecified) will launch the Controller as a separate application (useful for non-MPI applications). For more information, see the --geopm-ctl option description.
Using geopmlaunch with MPI Applications
# Run with 1 OpenMP thread per rank, and 2 ranks
# SLURM example
$ OMP_NUM_THREADS=1 geopmlaunch srun -N 1 -n 2 --geopm-preload -- ./mpi_application
# PALS example
$ OMP_NUM_THREADS=1 geopmlaunch pals -ppn 2 -n 2 --geopm-preload -- ./mpi_application
When the run has concluded, there will be an output file from the Runtime called geopm.report in the current working directory. This report file contains a summary of hardware telemetry over the course of the run. Time-series data is also available through the use of the --geopm-trace option to geopmlaunch. For more information about geopmlaunch see: geopmlaunch(1) - application launch wrapper. For more information about the reports, see: geopm_report(7) - GEOPM summary report file.
Profiling Applications without geopmlaunch
The geopmlaunch(1) command may not be best suited for your needs if you are running a non-MPI application, or if you are running an MPI application but the launch command is embedded in scripts that are difficult to modify. Instead of using geopmlaunch(1), the user may use the geopmctl(1) application in conjunction with environment variables that control the GEOPM Runtime behavior.
In this simple example we run the sleep(1) command for 10 seconds and monitor the system during its execution. Rather than using the geopmlaunch tool as in the above example, we will run the geopmctl command in the background while the application of interest is executing. The geopmctl MPI application should be launched with one process per compute node when executing the runtime on multiple nodes. There are five requirements to enable the GEOPM controller process to connect to the application process and generate a report:
Both the geopmctl process and the application process must have the GEOPM_PROFILE environment variable set to the same value, or both environments may leave this variable unset.
The application process must have LD_PRELOAD=libgeopm.so.2 set in the environment, or the application binary must be linked directly to libgeopm.so.2 at compile time.
The GEOPM_REPORT environment variable must be set in the environment of the geopmctl process.
The GEOPM_PROGRAM_FILTER environment variable is required and explicitly lists the program invocation names of any process to be profiled. All other programs will not be affected by LD_PRELOAD of libgeopm.so. For this reason a user will typically set these two environment variables together. This is especially important when profiling programs within a bash script.
The GEOPM_NUM_PROCESS variable must be set in the geopmctl environment if there is more than one process to be tracked on each compute node.
In addition to generating a report in YAML format, the example below showcases three optional features of the GEOPM Runtime:
CSV Trace File: By setting the GEOPM_TRACE environment variable, you can generate a trace file in CSV format.
Sampling Period Adjustment: The GEOPM_PERIOD environment variable allows you to modify the controller's sampling period. For instance, setting it to 200 milliseconds, up from the default 5 milliseconds, results in approximately 50 rows of samples in the trace file (calculated as five samples per second over ten seconds).
Disable Network Use: The GEOPM_CTL_LOCAL environment variable may be set to disable all inter-node communication between the controllers on each node, so that a unique report file is generated for each host node on which the application processes are launched.
These three options together will inform the GEOPM runtime controller (geopmctl) to profile the sleep utility and generate a CSV trace file with approximately 50 rows of samples (five per second for ten seconds). In the provided example, the awk command extracts specific columns: time since application start (column 1), CPU energy (column 6), and CPU power (column 8).
$ GEOPM_PROFILE=sleep-ten \
GEOPM_REPORT=sleep-ten.yaml \
GEOPM_CTL_LOCAL=true \
GEOPM_TRACE=sleep-ten.csv \
GEOPM_PERIOD=0.2 \
geopmctl &
$ GEOPM_PROFILE=sleep-ten \
GEOPM_PROGRAM_FILTER=sleep \
LD_PRELOAD=libgeopm.so.2 \
sleep 10
$ cat sleep-ten.yaml-$(hostname)
$ awk -F\| '{print $1, $6, $8}' sleep-ten.csv-$(hostname) | less
For the full listing of the environment variables accepted by the GEOPM runtime, please refer to the GEOPM Environment Variables section of the GEOPM documentation.
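The column selection performed by the awk command can also be done in Python, which is convenient when the values feed further analysis. The sketch below is an assumption-laden illustration: the embedded sample rows and column names are placeholders rather than real trace contents, the helper select_columns is hypothetical, and skipping lines that begin with '#' is an assumption about how header comments appear in the trace file.

```python
import csv

# Made-up, pipe-delimited rows standing in for sleep-ten.csv-$(hostname).
# The column names c1..c8 are placeholders, not the real trace header.
SAMPLE_TRACE = """\
# hypothetical header comment
c1|c2|c3|c4|c5|c6|c7|c8
0.2|a|b|c|d|100.0|e|50.0
0.4|a|b|c|d|110.0|e|50.0
"""


def select_columns(text, columns=(1, 6, 8)):
    """Return the requested 1-based columns from each non-comment row."""
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("#")]
    reader = csv.reader(lines, delimiter="|")
    return [[row[col - 1] for col in columns] for row in reader]


selected = select_columns(SAMPLE_TRACE)
for row in selected:
    print(" ".join(row))

# Expected number of trace rows for a 10 second run sampled every 0.2 seconds.
expected_rows = round(10.0 / 0.2)
print(f"Expected rows = {expected_rows}")
```

The final calculation restates the estimate from the text: a 0.2 second period over a ten second run gives roughly 50 samples.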
Profiling Specific Parts of an Application
The Runtime supports the automatic profiling of various application regions through several methods:
Annotation with GEOPM Profiling APIs
MPI Autodetection via PMPI
OpenMP Autodetection via OMPT
OpenCL Autodetection (WIP)
The GEOPM Profiling API enables users to annotate specific sections of the target application for profiling. Each section that is annotated will show up as a separate Region in the report output files from the runtime. An example app could be annotated as follows:
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include <mpi.h>
#include <geopm_prof.h>
#include <geopm_hint.h>
int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);
    // Application setup...
    // (iterations and the call_*() functions are provided by the application)
    // Create a GEOPM region ID for later tracking
    uint64_t region_1, region_2;
    geopm_prof_region("interesting_kernel",
                      GEOPM_REGION_HINT_COMPUTE,
                      &region_1);
    geopm_prof_region("synchronize_results",
                      GEOPM_REGION_HINT_NETWORK,
                      &region_2);
    // Begin execution loop...
    for (int ii = 0; ii < iterations; ii++) {
        // Marker to capture behavior of all regions
        geopm_prof_epoch();
        geopm_prof_enter(region_1);
        call_interesting_kernel();
        geopm_prof_exit(region_1);
        geopm_prof_enter(region_2);
        call_synchronize_results();
        geopm_prof_exit(region_2);
    }
    MPI_Finalize();
    return 0;
}
For more examples on how to profile applications, see the tutorials section of our GitHub repository.
Advanced Topics
Breaking Down Signal/Control Names
Signal and control names in GEOPM are categorized into two types: low-level and high-level.
Low-Level Names: These are prefixed with the IOGroup name followed by two colons. For instance, MSR::PERF_CTL:FREQ is a low-level name.
High-Level Names (Aliases): These are user-friendly alternatives to commonly used or multi-IOGroup-supported names. For example:
Alias CPU_FREQUENCY_STATUS corresponds to MSR::PERF_STATUS:FREQ.
Alias CPU_FREQUENCY_MAX_CONTROL is linked to MSR::PERF_CTL:FREQ.
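The naming convention above can be checked mechanically: a low-level name contains the "::" separator after its IOGroup prefix, while an alias does not. The helper below is a hypothetical sketch for illustration and is not part of the GEOPM APIs.

```python
def split_name(name):
    """Split a signal or control name into (iogroup, field).

    Aliases carry no IOGroup prefix, so their first element is None.
    """
    iogroup, separator, field = name.partition("::")
    if not separator:
        return None, name
    return iogroup, field


low_level = split_name("MSR::PERF_STATUS:FREQ")
alias = split_name("CPU_FREQUENCY_STATUS")
print(low_level)
print(alias)
```

Note that only the double colon separates the IOGroup from the rest of the name; the single colon inside PERF_STATUS:FREQ is left untouched.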
When using geopmread or geopmwrite to display available signals and controls, aliases are presented first. These command-line tools also help decipher what each alias represents. For instance:
$ geopmread -i CPU_FREQUENCY_STATUS
CPU_FREQUENCY_STATUS:
description: The current operating frequency of the CPU.
iogroup: MSR
alias_for: MSR::PERF_STATUS:FREQ
For more information about the currently supported aliases and IOGroups, see: Aliasing Signals And Controls.
Using the Programmable Counters
The programmable counters available on various CPUs can be read with geopmread from the command-line and through the use of the InitControl API using the Runtime.
First, determine the event code for your desired performance metric. E.g. for Skylake Server, the event names and corresponding codes are listed here. The following example programs the counter to track LONGEST_LAT_CACHE.MISS on CPU 0:
$ export EVENTCODE=0x2E
$ export UMASK=0x41
# Configure which event to monitor, and under which scope
$ geopmwrite MSR::IA32_PERFEVTSEL0:EVENT_SELECT cpu 0 ${EVENTCODE}
$ geopmwrite MSR::IA32_PERFEVTSEL0:UMASK cpu 0 ${UMASK}
$ geopmwrite MSR::IA32_PERFEVTSEL0:USR cpu 0 1 # Enable user scope for events
$ geopmwrite MSR::IA32_PERFEVTSEL0:OS cpu 0 1 # Enable OS scope for events
# Turn on the counter
$ geopmwrite MSR::IA32_PERFEVTSEL0:EN cpu 0 1
$ geopmwrite MSR::PERF_GLOBAL_CTRL:EN_PMC0 cpu 0 1
# Read the counter. Repeat this read operation after a test scenario.
$ geopmread MSR::IA32_PMC0:PERFCTR cpu 0
To accomplish this with the Runtime, leverage the geopm-init-control feature along with the geopm-report-signals and/or geopm-trace-signals options to geopmlaunch. First, create a file in your current working directory with the following contents:
# LONGEST_LAT_CACHE.MISS: EVENT_CODE = 0x2E | UMASK = 0x41
MSR::IA32_PERFEVTSEL0:EVENT_SELECT package 0 0x2E
MSR::IA32_PERFEVTSEL0:UMASK package 0 0x41
MSR::IA32_PERFEVTSEL0:USR package 0 1
MSR::IA32_PERFEVTSEL0:OS package 0 1
MSR::IA32_PERFEVTSEL0:EN package 0 1
MSR::PERF_GLOBAL_CTRL:EN_PMC0 package 0 1
Name the file accordingly (e.g. enable_cache_misses). This configuration will program and enable all the counters on all of the CPUs on the first package. Use the file with geopmlaunch and add the desired counter to the reports and/or traces:
$ OMP_NUM_THREADS=1 geopmlaunch srun -N 1 -n 2 --geopm-preload \
--geopm-init-control=enable_cache_misses \
--geopm-report-signals=MSR::IA32_PMC0:PERFCTR@package \
-- ./mpi_application
As configured above, the report data associated with each region will include the counter data summarized per package.
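When several events need to be tried, the init-control file contents shown earlier can be generated rather than edited by hand. The following sketch only reproduces the six lines from the example above for a given event code and unit mask; the function name init_control_lines and its parameters are hypothetical, and it deliberately covers counter 0 only, as in the text.

```python
def init_control_lines(event_select, umask, domain="package", idx=0):
    """Build init-control lines that program and enable counter 0.

    Hypothetical helper: the control names are copied from the example
    file above and are not generalized to other counters.
    """
    settings = [
        ("MSR::IA32_PERFEVTSEL0:EVENT_SELECT", f"0x{event_select:02X}"),
        ("MSR::IA32_PERFEVTSEL0:UMASK", f"0x{umask:02X}"),
        ("MSR::IA32_PERFEVTSEL0:USR", "1"),  # count events in user scope
        ("MSR::IA32_PERFEVTSEL0:OS", "1"),   # count events in OS scope
        ("MSR::IA32_PERFEVTSEL0:EN", "1"),   # enable the event select register
        ("MSR::PERF_GLOBAL_CTRL:EN_PMC0", "1"),  # turn on the counter
    ]
    return [f"{name} {domain} {idx} {value}" for name, value in settings]


# LONGEST_LAT_CACHE.MISS: EVENT_CODE = 0x2E | UMASK = 0x41
lines = init_control_lines(0x2E, 0x41)
print("\n".join(lines))
```

Writing the joined lines to a file such as enable_cache_misses yields the input expected by the --geopm-init-control option in the command above.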