geopmsession(1) – sample platform information over time
Synopsis
usage: geopmsession [-h] [-v] [-t TIME] [-p PERIOD] [--pid PID]
[--print-header | -n] [-d DELIMITER]
[-r REPORT_OUT] [-o TRACE_OUT] [-f REPORT_FORMAT]
[-s REPORT_SAMPLES] [-i CONFIG_PATH] [-a]
[--daemon DAEMON_PID_FILE | --enable-mpi]
[-- LAUNCH ...]
Read a signal
echo "SIGNAL_NAME DOMAIN DOMAIN_IDX" | geopmsession
Read a signal at a specific period for a specific timeout
geopmsession -p PERIOD_IN_SECONDS -t TIMEOUT_IN_SECONDS
geopmsession --period PERIOD_IN_SECONDS --time TIMEOUT_IN_SECONDS
Read a set of signals
echo -e 'TIME board 0\nCPU_FREQUENCY_STATUS package 0' | geopmsession -n
Get Help or Version
geopmsession -h
geopmsession --help
Description
Command line interface for the GEOPM service batch read features. The input to
the command line tool has one request per line. A request for reading is made up
of three strings separated by white space. The first string is the signal name,
the second string is the domain name, and the third string is the domain index.
Provide the “*” character as the second string to request the native domain
for the signal. Provide the “*” character as the third string to request
all indices of the domain that are available on the system.
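The request format described above can be generated programmatically. The following is a minimal sketch, not part of GEOPM: the helper names `format_request` and `format_requests` are hypothetical, and the only behavior assumed is the three white-space-separated fields and the “*” wildcard described in this section.

```python
# Hypothetical helpers (not part of geopmdpy): build geopmsession read requests.
def format_request(signal, domain="*", index="*"):
    """Return 'SIGNAL DOMAIN INDEX' for one geopmsession read request."""
    fields = [str(signal), str(domain), str(index)]
    for field in fields:
        # Fields are separated by white space, so none may be empty or contain any.
        if not field or any(ch.isspace() for ch in field):
            raise ValueError(f"invalid request field: {field!r}")
    return " ".join(fields)


def format_requests(requests):
    """Join several (signal, domain, index) tuples, one request per line."""
    return "".join(format_request(*req) + "\n" for req in requests)


config = format_requests([
    ("TIME", "board", 0),
    ("CPU_POWER", "*", "*"),  # native domain, every index
])
print(config, end="")
```

The resulting text can be piped to geopmsession on standard input or saved to a file for use with the -i option.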
A descriptive header is written first, unless the -n option is specified, in
which case the header is omitted. The output from reading values is printed
subsequently in CSV format. By default, only one line of CSV will be generated.
Use -p to create a CSV with multiple rows providing a time series of
measurements.
Options
- -h, --help
Print help message and exit.
- -v, --version
Print version and exit.
- -t TIME, --time TIME
Total run time of the session to be opened in seconds.
- -p PERIOD, --period PERIOD
In a read mode session, read all values periodically with the specified period in seconds. Default: 0.1 seconds.
- --pid PID
Stop the session when the given process PID ends.
- --print-header
Deprecated. Printing the header is now the default; see --no-header.
- -n, --no-header
Do not print the CSV header before printing any sampled values.
- -d, --delimiter
Delimiter used to separate values in CSV output. Default: “,”.
- -r, --report-out
Output summary statistics into a yaml file. Note if --report-out=- is specified, the report will output to stdout. When used with the --enable-mpi option, reports from all hosts will be combined using the --- document separator, and the output is written (to stdout or to a file) solely by the MPI process “rank 0”.
- -o, --trace-out
Output trace data into a CSV file. Note if --trace-out=- is specified, the trace will output to stdout, which is also the default behavior. To avoid gathering trace data, set this parameter to /dev/null. When used with the --enable-mpi option, the hostname is appended to trace file names using the - separator. Writing the trace output to stdout is not possible when specifying --enable-mpi; doing so results in an error.
- -f, --report-format
Generate reports in the specified format, either “csv” or “yaml”. Default: “yaml”.
- -s, --report-samples
Create reports each time the specified number of periods have elapsed. When in YAML format, the reports are YAML documents separated with the document separator string: "---". When in CSV format, each report is one line of the CSV output.
- -i, --signal-config
Input file containing GEOPM signal requests, specify “-” to use standard input which is also the default.
- --daemon
Run geopmsession as a background daemon. The daemon PID is written to the specified file after startup and the invoking command returns immediately once the session is ready. A signal configuration file must be provided (standard input is not supported when using this option). Using --enable-mpi is not allowed when specifying this option; consider using --append-hostname instead.
- -a, --append-hostname
Append the local hostname to report and trace file paths when they are regular files. This keeps per-host outputs unique on shared filesystems.
- --enable-mpi
Gather reports over MPI and write to a single file. Append hostname to trace output file if specified (trace output to stdout not permitted). Requires mpi4py module.
Launch Option
The geopmsession tool may be used to launch and monitor a subprocess, terminating
the session when the process exits. To use this feature, provide a command after
a double dash (--). For example:
$ echo TIME board 0 | geopmsession -p 1 -- sleep 5
"TIME"
0.001785301
1.008901696
2.008939004
3.009074361
4.009136714
5.009231953
6.009308985
This will launch sleep 5 as a subprocess and monitor the TIME signal until
the process exits. Using the --pid and the launch option at the same
time is forbidden.
If the geopmsession process receives a SIGTERM or SIGINT, the signal is forwarded to the subprocess and all of its children, followed by SIGKILL after 1 second. If the geopmsession command fails due to an unhandled exception, the first signal sent is SIGINT.
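The shutdown sequence described above (forward a signal to the subprocess and its children, then send SIGKILL after 1 second) can be illustrated with the Python standard library. This is only a sketch of the general technique on a POSIX system, not the geopmsession implementation; the function name `stop_process_group` and the use of a new session to form the process group are assumptions made for the example.

```python
import os
import signal
import subprocess
import sys


def stop_process_group(proc, first_signal=signal.SIGTERM, grace=1.0):
    """Signal the child's process group, then SIGKILL it after `grace` seconds."""
    try:
        os.killpg(proc.pid, first_signal)
    except ProcessLookupError:
        return proc.wait()  # the group is already gone
    try:
        proc.wait(timeout=grace)  # give the child time to exit cleanly
    except subprocess.TimeoutExpired:
        pass
    try:
        os.killpg(proc.pid, signal.SIGKILL)  # reap any stragglers in the group
    except ProcessLookupError:
        pass
    return proc.wait()


# start_new_session=True makes the child a process group leader, so its
# process group ID equals its PID and killpg() also reaches its children.
child = subprocess.Popen(
    [sys.executable, "-c", "import time; time.sleep(30)"],
    start_new_session=True,
)
returncode = stop_process_group(child, signal.SIGTERM)
print(returncode)  # negative signal number when the child died from a signal
```

Passing `signal.SIGINT` as `first_signal` mirrors the exception case described above.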
If using --enable-mpi with the launch option, note that each MPI rank will
launch its own subprocess, which may not be the intended behavior for MPI
applications.
Agent Support
The geopmsession tool supports Python agent plugins that can customize session
behavior. An agent is a Python class derived from the Agent base class in
geopmdpy.session. Agents can add custom command-line arguments, override the
default signal configuration, and provide additional trace columns. The agent
can also implement custom logic for each sampling period including the ability
to modify control knobs dynamically. The agent implementation uses the main()
function from the geopmdpy.session module as the entry point.
Examples
Some examples of how to use the geopmsession command line tool are
provided.
Reading a signal
The input to the command line tool has one request per line. A
request for reading is made up of three strings separated by white
space. The first string is the signal name, the second string is the
domain name, and the third string is the domain index. An asterisk *
in place of the domain name will evaluate to the native domain of the signal.
An asterisk * in place of the domain index will result in the signal being
read for all available domain indices on the system for the specified domain type.
An example where the entire THERM_STATUS model specific register is read from
core zero:
$ echo "MSR::THERM_STATUS# core 0" | geopmsession -n
0x0000000088430800
This will execute one read of the signal.
A couple of examples reading CPU_POWER using *:
$ echo "CPU_POWER * 1" | geopmsession -n
62.95697469659621
$ echo "CPU_POWER * *" | geopmsession -n
69.16105633883998,74.9419453995438
Reading a signal periodically
Both a polling period and timeout must be specified. The polling period must be shorter than the timeout specified.
A 100ms polling period with a 300ms timeout is shown below:
$ echo 'MSR::THERM_STATUS# core 0' | geopmsession -p 0.1 -t 0.3 -n
0x0000000088350800
0x0000000088350800
0x0000000088350800
0x0000000088360800
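When geopmsession is driven from a script, the period and timeout constraint above can be checked before the tool is started. The sketch below is an illustration under stated assumptions: `build_command` is a hypothetical helper, it uses only the -p, -t and -n options documented on this page, and the read is attempted only if a geopmsession executable is found on the PATH.

```python
import shutil
import subprocess


def build_command(period, timeout, header=True):
    """Assemble a geopmsession argument list from the options documented above."""
    if not 0 < period < timeout:
        raise ValueError("period must be positive and shorter than the timeout")
    cmd = ["geopmsession", "-p", str(period), "-t", str(timeout)]
    if not header:
        cmd.append("-n")  # suppress the CSV header
    return cmd


cmd = build_command(0.1, 0.3, header=False)
print(cmd)

# Only attempt the read when the tool is installed on this machine.
if shutil.which("geopmsession"):
    result = subprocess.run(cmd, input="MSR::THERM_STATUS# core 0\n",
                            capture_output=True, text=True, check=False)
    print(result.stdout, end="")
```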
Reading a set of signals
Multiple signals may be specified by separating them with a newline.
$ printf 'TIME board 0\nCPU_FREQUENCY_STATUS package *\nCPU_ENERGY package *\n' > session.config
$ geopmsession -i session.config
"TIME","CPU_FREQUENCY_STATUS-package-0","CPU_FREQUENCY_STATUS-package-1","CPU_ENERGY-package-0","CPU_ENERGY-package-1"
0.658525605,1000000000,1105000000,51985.06903076172,216490.1282958984
Signals may be specified in a separate file using the -i option.
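Because the trace is ordinary CSV with a quoted header row, it can be loaded with the Python standard library. The sketch below parses the output shown above; `read_trace` is a hypothetical helper, and it assumes every column holds a decimal number, which does not hold for raw register reads such as MSR::THERM_STATUS# that print hexadecimal values.

```python
import csv
import io

# Trace output copied from the example above.
trace = (
    '"TIME","CPU_FREQUENCY_STATUS-package-0","CPU_FREQUENCY_STATUS-package-1",'
    '"CPU_ENERGY-package-0","CPU_ENERGY-package-1"\n'
    '0.658525605,1000000000,1105000000,51985.06903076172,216490.1282958984\n'
)


def read_trace(text, delimiter=","):
    """Return a list of {column name: float} dicts, one per sampled row."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [{name: float(value) for name, value in row.items()} for row in reader]


rows = read_trace(trace)
print(rows[0]["CPU_FREQUENCY_STATUS-package-1"])
```

If a different delimiter was selected with -d, pass the same character as the `delimiter` argument.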
Reading a set of signals and getting summary statistics
Summary statistics may be output to stdout by setting --report-out=-.
Otherwise, the statistics will be output to the specified file path. If
unspecified, no statistics will be gathered.
The resulting report will be in yaml by default. To output as a csv, use the
-f csv option. Hostname and sample information will be output at the top.
Summary statistics (count/first/last/min/max/mean/std) will be output
for each of the specified signals at the specified domains/domain indices.
$ printf 'TIME board 0\nCPU_POWER board 0\nCPU_FREQUENCY_STATUS board 0\n' |\
geopmsession -t 10 -p 0.005 --report-out=- --trace-out=/dev/null
An example yaml report is shown below:
host: "cluster-node-11"
sample-time-first: "2025-05-16T09:08:53.160991796-0700"
sample-time-total: 10.0013
sample-count: 2001
sample-period-mean: 0.00500067
sample-period-std: 0.000494084
metrics:
TIME:
count: 2001
first: 1.13225
last: 11.1336
min: 1.13225
max: 11.1336
mean: 6.13339
std: 2.88899
CPU_POWER:
count: 2001
first: 81.3372
last: 114.556
min: 81.3372
max: 145.638
mean: 117.871
std: 7.55413
CPU_FREQUENCY_STATUS:
count: 2001
first: 1.0775e+09
last: 1.0375e+09
min: 1e+09
max: 1.3325e+09
mean: 1.07005e+09
std: 3.80748e+07
The same report rendered into csv format:
"host","sample-time-first","sample-time-total","sample-count","sample-period-mean","sample-period-std","CPU_FREQUENCY_STATUS-count","CPU_FREQUENCY_STATUS-first","CPU_FREQUENCY_STATUS-last","CPU_FREQUENCY_STATUS-min","CPU_FREQUENCY_STATUS-max","CPU_FREQUENCY_STATUS-mean","CPU_FREQUENCY_STATUS-std","CPU_POWER-count","CPU_POWER-first","CPU_POWER-last","CPU_POWER-min","CPU_POWER-max","CPU_POWER-mean","CPU_POWER-std","TIME-count","TIME-first","TIME-last","TIME-min","TIME-max","TIME-mean","TIME-std"
"cluster-node-11","2025-05-16T09:09:16.559516035-0700",10.000596301,2001,0.0050002981505,0.0004777162996813195,2001,1085000000.0,1172500000.0,1000000000.0,1282500000.0,1071716016.9915042,38699360.87433021,2001,134.5205438626066,132.45502415239116,104.55274444442776,148.3458434284852,119.53508979088977,5.258855212642797,2001,1.574881395,11.575477696,1.574881395,11.575477696,6.576009187958503,2.888976620038315
Launching a process and monitoring signals
$ echo TIME board 0 | geopmsession -p 0.5 -- sleep 5
"TIME"
0.005026064
0.513985594
1.013916775
1.513919136
2.013917347
2.513900635
3.0139143
3.513901374
4.013907436
4.513900239
5.013895239
5.513876408
This launches sleep 5 and monitors the TIME signal until it detects that the
sleep command has exited.
Reading signals during a job execution
Signals can be read and summary statistics gathered during job execution using
the launch or --pid option. If both the --pid and -t options are used,
geopmsession will end when either the process ends or the specified time
elapses, whichever comes first. Below is an example gathering CPU_POWER
while running sleep.
$ echo "CPU_POWER package 0" | geopmsession -p 1 -- sleep 5
"CPU_POWER-package-0"
62.54857725485511
50.5686841290089
57.63407322142274
61.1508168939381
60.71903156269835
59.50978494274343
58.25752970206295
$ sleep 5 & apppid=$!; echo "CPU_POWER package 0" | geopmsession --pid $apppid -p 1
[1] 3100339
"CPU_POWER-package-0"
30.59618629083503
40.73598509986576
38.79754643472621
39.32981634681103
38.70242242812028
[1]+ Done sleep 5
An example gathering summary statistics while executing a job:
$ echo "CPU_POWER package 0" | geopmsession -p 1 -r - -- sleep 5
"CPU_POWER-package-0"
39.15323231457158
39.9597792259686
40.38713156571379
40.3391948981358
40.10173665761857
39.93951912781292
39.83074165703577
host: "cluster-node-11"
sample-time-first: "2025-05-16T09:19:42.253776983-0700"
sample-time-total: 6.00905
sample-count: 7
sample-period-mean: 1.00151
sample-period-std: 0.00372384
metrics:
CPU_POWER-package-0:
count: 7
first: 39.1532
last: 39.8307
min: 39.1532
max: 40.3871
mean: 39.9588
std: 0.411159
Note that the samples are output followed by summary statistics. To output the
sample trace to a file, use -o [filename]. To output the summary statistics
report to a file, use -r [filename]. To suppress the trace, set the output
parameter to -o /dev/null. Reports will not output if -r is not specified.
Using the -s [REPORT_SAMPLES] option will generate statistics after the
specified number of samples. In default yaml format, sets of statistics will
be separated by "---". In csv format, each set of statistics will be output as
a row.
Example:
$ sleep 5 & apppid=$!; echo "CPU_POWER package 0" |\
geopmsession --pid $apppid -p 0.1 -r - -o /dev/null -s 10
In the yaml output below, note that each report is appended, separated by "---".
host: "cluster-node-11"
sample-time-first: "2025-05-16T09:21:12.854647716-0700"
sample-time-total: 1.0005
sample-count: 11
sample-period-mean: 0.10005
sample-period-std: 0.00017891
metrics:
CPU_POWER-package-0:
count: 11
first: 26.1087
last: 41.1297
min: 26.1087
max: 42.36
mean: 37.5301
std: 4.85142
---
host: "cluster-node-11"
sample-time-first: "2025-05-16T09:21:13.955151931-0700"
sample-time-total: 0.899985
sample-count: 10
sample-period-mean: 0.0999983
sample-period-std: 7.34092e-05
metrics:
CPU_POWER-package-0:
count: 10
first: 39.0531
last: 39.4301
min: 35.197
max: 40.6032
mean: 38.1963
std: 2.09879
---
host: "cluster-node-11"
sample-time-first: "2025-05-16T09:21:14.955165125-0700"
sample-time-total: 0.899977
sample-count: 10
sample-period-mean: 0.0999974
sample-period-std: 5.01953e-05
metrics:
CPU_POWER-package-0:
count: 10
first: 37.8657
last: 41.0475
min: 36.256
max: 44.8203
mean: 40.6383
std: 3.36432
---
host: "cluster-node-11"
sample-time-first: "2025-05-16T09:21:15.955163708-0700"
sample-time-total: 0.600004
sample-count: 7
sample-period-mean: 0.100001
sample-period-std: 0.000118453
metrics:
CPU_POWER-package-0:
count: 7
first: 38.3654
last: 39.4388
min: 34.8214
max: 39.4388
mean: 36.8331
std: 1.82103
The sample csv output below shows each set of statistics on a new row:
$ sleep 5 & apppid=$!; echo "CPU_POWER package 0" |\
geopmsession --pid $apppid -p 0.1 -r - -o /dev/null -s 10 -f csv
"host","sample-time-first","sample-time-total","sample-count","sample-period-mean","sample-period-std","CPU_POWER-package-0-count","CPU_POWER-package-0-first","CPU_POWER-package-0-last","CPU_POWER-package-0-min","CPU_POWER-package-0-max","CPU_POWER-package-0-mean","CPU_POWER-package-0-std"
"cluster-node-11","2025-05-16T09:23:33.620790309-0700",1.000574921,11,0.10005749209999999,0.0001964757920469247,11,39.44969148140212,41.9486178974159,34.84014539393113,41.9486178974159,36.49265394626341,2.465552738326359
"cluster-node-11","2025-05-16T09:23:34.721404331-0700",0.899903938,10,0.09998932644444444,3.917289581063831e-05,10,43.77618495751115,39.447376090215,35.07951019074365,44.37725410303373,39.60711256469672,3.6046116965994814
"cluster-node-11","2025-05-16T09:23:35.721373075-0700",0.8999950860000001,10,0.09999945400000002,0.00010700698475670486,10,40.55431660299817,39.64840048680215,34.56523283529788,40.81876798613059,38.042165265899925,2.384395315318058
"cluster-node-11","2025-05-16T09:23:36.721240770-0700",0.6000795879999998,7,0.10001326466666664,7.467743569208487e-05,7,40.97335095740563,34.912491389496914,34.912491389496914,41.4496234018482,38.75680603826903,2.716795748688028
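Periodic csv reports are convenient to post-process because each row is self-describing. As a rough sketch, the code below combines the per-report means into a single mean weighted by each report's sample count, using the header and the first two rows of the output above. The function name `overall_mean` is hypothetical, and the column naming (`<metric>-count`, `<metric>-mean`) is taken from the example header rather than from a format specification.

```python
import csv
import io

METRIC = "CPU_POWER-package-0"

# Header and first two report rows copied from the example above.
report_csv = "\n".join([
    '"host","sample-time-first","sample-time-total","sample-count",'
    '"sample-period-mean","sample-period-std",'
    '"CPU_POWER-package-0-count","CPU_POWER-package-0-first",'
    '"CPU_POWER-package-0-last","CPU_POWER-package-0-min",'
    '"CPU_POWER-package-0-max","CPU_POWER-package-0-mean",'
    '"CPU_POWER-package-0-std"',
    '"cluster-node-11","2025-05-16T09:23:33.620790309-0700",1.000574921,11,'
    '0.10005749209999999,0.0001964757920469247,11,39.44969148140212,'
    '41.9486178974159,34.84014539393113,41.9486178974159,36.49265394626341,'
    '2.465552738326359',
    '"cluster-node-11","2025-05-16T09:23:34.721404331-0700",0.899903938,10,'
    '0.09998932644444444,3.917289581063831e-05,10,43.77618495751115,'
    '39.447376090215,35.07951019074365,44.37725410303373,39.60711256469672,'
    '3.6046116965994814',
])


def overall_mean(text, metric):
    """Combine per-report means into one mean weighted by sample count."""
    total = 0
    weighted_sum = 0.0
    for row in csv.DictReader(io.StringIO(text)):
        count = int(row[f"{metric}-count"])
        total += count
        weighted_sum += count * float(row[f"{metric}-mean"])
    return total, weighted_sum / total


total, mean = overall_mean(report_csv, METRIC)
print(total, mean)
```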
Gathering Reports using MPI
The --enable-mpi command line option can be used to aggregate reports using
an MPI communicator. This can be helpful when running sessions on more than one
compute node in an MPI enabled environment. The user must install the optional
mpi4py package to use the --enable-mpi command line option. This can
be done using the OS package manager or PyPI. When running in this way the
geopmsession command line tool must be launched with an MPI launch wrapper
like mpiexec or mpirun. The user should run this command specifying one
geopmsession process per compute node. When using this option, trace output
to stdout is disabled. The aggregated report is created by the “rank 0” process
of the geopmsession MPI communicator.
$ printf "TIME board 0\nCPU_POWER board 0\nCPU_FREQUENCY_STATUS board 0" |\
srun -n 2 -N 2 geopmsession -t 10 -p 0.005 -r- -o /dev/null --enable-mpi
An example report is shown below:
host: "cluster-node-11"
sample-time-first: "2025-05-16T09:26:37.114177564-0700"
sample-time-total: 10.0011
sample-count: 2001
sample-period-mean: 0.00500056
sample-period-std: 0.000493951
metrics:
TIME:
count: 2001
first: 0.668767
last: 10.6699
min: 0.668767
max: 10.6699
mean: 5.66989
std: 2.88899
CPU_POWER:
count: 2001
first: 84.186
last: 117.827
min: 77.7429
max: 141.491
mean: 118.953
std: 7.03585
CPU_FREQUENCY_STATUS:
count: 2001
first: 1.04375e+09
last: 1.035e+09
min: 1e+09
max: 1.25625e+09
mean: 1.07255e+09
std: 3.91874e+07
---
host: "cluster-node-12"
sample-time-first: "2025-05-16T09:26:37.112549059-0700"
sample-time-total: 10.0012
sample-count: 2001
sample-period-mean: 0.00500062
sample-period-std: 0.000465038
metrics:
TIME:
count: 2001
first: 1.10684
last: 11.1081
min: 1.10684
max: 11.1081
mean: 6.10793
std: 2.88901
CPU_POWER:
count: 2001
first: 76.1151
last: 118.133
min: 76.1151
max: 142.527
mean: 119.894
std: 7.05624
CPU_FREQUENCY_STATUS:
count: 2001
first: 1.045e+09
last: 1.045e+09
min: 1e+09
max: 1.28e+09
mean: 1.07346e+09
std: 4.24776e+07
Writing a Custom Agent
The geopmsession command line tool supports Python agent plugins that can
customize session behavior. This example shows a simple agent that monitors the
CPU_POWER signal at high or low resolution.
#!/usr/bin/env python
# File: cpu_power_agent.py

from geopmdpy.session import main
from geopmdpy.session import Agent


class CPUPowerAgent(Agent):
    """Agent for monitoring CPU power.

    The CPUPowerAgent provides a --hi-res option to read CPU power at the
    finest granularity available. This allows users to measure CPU power from
    all domains and indices. By default, CPU power is sampled at the board
    domain.

    Command-line options:
        --hi-res    Measure at finest granularity (all domains/indices).

    Example:
        python3 cpu_power_agent.py --hi-res -- sleep 5
    """

    def __init__(self):
        self._hi_res = False

    def help(self):
        return 'Measure CPU_POWER as default signal configuration'

    def update_parser(self, parser):
        parser.add_argument('--hi-res', action='store_true',
                            help='Measure power at finest granularity (all domains/indices)')
        return parser

    def update_args(self, args):
        self._hi_res = args.hi_res
        return args

    def signal_config_override(self):
        if self._hi_res:
            return "CPU_POWER * *"
        else:
            return "CPU_POWER board 0"


if __name__ == '__main__':
    main(CPUPowerAgent())