From: Stephen Hemminger <stephen@networkplumber.org>
To: Bruce Richardson <bruce.richardson@intel.com>
Cc: dev@dpdk.org
Subject: Re: [PATCH v4 0/7] Add script for real-time telemetry monitoring
Date: Tue, 31 Mar 2026 11:10:25 -0700 [thread overview]
Message-ID: <20260331111025.2e0b46ce@phoenix.local> (raw)
In-Reply-To: <20260205150230.123076-1-bruce.richardson@intel.com>
On Thu, 5 Feb 2026 15:02:23 +0000
Bruce Richardson <bruce.richardson@intel.com> wrote:
> TL;DR
> ------
>
> For a quick demo, apply patches, run e.g. testpmd and then in a separate
> terminal run:
>
> ./usertools/dpdk-telemetry-watcher.py -d1T eth.tx
>
> Output, updated once per second, will be traffic rate per port e.g.:
>
> Connected to application: "dpdk-testpmd"
> Time /ethdev/stats,0.opackets /ethdev/stats,1.opackets Total
> 16:29:12 5,213,119 5,214,304 10,427,423
>
>
> Fuller details
> --------------
>
> While we have the dpdk-telemetry.py CLI app for interactive querying of
> telemetry on the commandline, and a telemetry exporter script for
> sending telemetry to external tools for real-time monitoring, we don't
> have an app that can print real-time stats for DPDK apps on the
> terminal. This patchset adds such a script, developed with the help of
> Github copilot to fill a need that I found in my testing. Submitting it
> here in the hopes that others find it of use.
>
> The script acts as a wrapper around the existing dpdk-telemetry.py
> script, and pipes the commands to that script and reads the responses,
> querying it once per second. It takes a number of flag parameters, such
> as the ones above:
> - "-d" for delta values, i.e. PPS rather than total packets
> - "-1" for single-line output, i.e. no scrolling up the screen
> - "-T" to display a total column
>
> Other flag parameters can be seen by looking at the help output.
>
> Beyond the flags, the script also takes a number of positional
> parameters, which refer to specific stats to display. These stats must
> be numeric values, and should take the form of the telemetry command to
> send, followed by a "." and the stat within the result which is to be
> tracked. As above, a stat would be e.g. "/ethdev/stats,0.opackets",
> where we send "/ethdev/stats,0" to telemetry and extract the "opackets"
> part of the result.
>
> However, specifying individual stats can be awkward, so some shortcuts
> are provided too for the common case of monitoring ethernet ports. Any
> positional arg starting with "eth" will be replaced by the set of
> equivalent values for each port, e.g. "eth.imissed" will track the
> imissed value on all ports in use in the app. The ipackets and opackets
> values, as common metrics, are also available as shortened values as
> just "rx" and "tx", so in the example above, "eth.tx" means to track the
> opackets stat for every ethdev port.
>
> Finally, the script also has reconnection support so you can leave it
> running while you start and stop your application in another terminal.
> The watcher will try and reconnect to a running instance every second.
>
> v4:
> - Updated docs following AI review
> - Converted one missed f-string to regular string
>
> v3:
> Updated following AI review
> - removed unnecessary f-string
> - added documnentation in guides/tools
> - added release note entry
>
> v2:
> - improve reconnection handling, eliminating some crashes seen in testing.
>
> Bruce Richardson (7):
> usertools: add new script to monitor telemetry on terminal
> usertools/telemetry-watcher: add displaying stats
> usertools/telemetry-watcher: add delta and timeout opts
> usertools/telemetry-watcher: add total and one-line opts
> usertools/telemetry-watcher: add thousands separator
> usertools/telemetry-watcher: add eth name shortcuts
> usertools/telemetry-watcher: support reconnection
>
> doc/guides/rel_notes/release_26_03.rst | 7 +
> doc/guides/tools/index.rst | 1 +
> doc/guides/tools/telemetrywatcher.rst | 184 +++++++++++
> usertools/dpdk-telemetry-watcher.py | 435 +++++++++++++++++++++++++
> usertools/meson.build | 1 +
> 5 files changed, 628 insertions(+)
> create mode 100644 doc/guides/tools/telemetrywatcher.rst
> create mode 100755 usertools/dpdk-telemetry-watcher.py
>
> --
> 2.51.0
>
This didn't get merged so will need to be rebased.
You may want to address these AI review comments.
Review of [PATCH v4 1-7/7] usertools: dpdk-telemetry-watcher
============================================================
Nice tool — having a continuous monitoring wrapper around
dpdk-telemetry.py is a practical addition. Patches 1-6 are
clean and well-structured. Patch 7 (reconnection support)
has several correctness issues described below.
Patch 7/7: usertools/telemetry-watcher: support reconnection
------------------------------------------------------------
Error: monitor_stats `continue` on failed query causes
IndexError on next delta iteration.
When `query_telemetry` returns (process, None), the code
does `continue`, skipping `current_values.append(...)`.
At the end of the loop body, `prev_values = current_values`
stores a shorter list. On the next iteration,
`prev_values[i]` raises IndexError for the missing indices.
Suggested fix: when data is None, append prev_values[i]
(or 0) as the current_value so the list length is preserved:
process, data = query_telemetry(process, command)
if not data:
current_values.append(prev_values[i] if i < len(prev_values) else 0)
row += "N/A".rjust(25)
continue
Error: BrokenPipeError not handled in query_telemetry.
When the DPDK application dies, the subprocess's stdin pipe
breaks. The initial `process.stdin.write()` / `.flush()`
before the reconnection loop will raise BrokenPipeError
instead of returning an empty readline(). The reconnection
logic never triggers.
Suggested fix: wrap the write+flush+readline in a
try/except (BrokenPipeError, OSError) and treat it the
same as an empty response — fall into the reconnection
loop.
Warning: old subprocess not cleaned up on reconnection.
In query_telemetry, when readline() returns empty and
reconnection begins, the old process object is replaced
without calling process.terminate() or process.wait().
The dead subprocess accumulates as a zombie. Similarly,
create_telemetry_process now calls print_connected_app
which can fail and return None, leaking the just-created
Popen object.
Suggested fix: add a small helper to clean up a process
(terminate, close pipes, wait), and call it before setting
process = None in the reconnection path. In
create_telemetry_process, if print_connected_app fails,
terminate the process before returning None.
Warning: expand_shortcuts and validate_stats lose the
reconnected process handle.
Both functions update their local `process` variable via
query_telemetry's return value, but neither returns the
(possibly new) process to the caller. If a reconnection
happens during shortcut expansion or validation, the
caller in monitor_stats still holds the old dead process
object.
Suggested fix: have expand_shortcuts and validate_stats
return the process alongside their current return values,
or restructure so monitor_stats passes process by
reference (e.g., as a mutable container).
Info: recursive call between create_telemetry_process and
print_connected_app.
create_telemetry_process calls print_connected_app, which
calls query_telemetry, which on disconnect calls
create_telemetry_process again. This indirect recursion
works in practice (Python has a high default recursion
limit and the retry loop in query_telemetry breaks the
chain), but it is fragile and hard to follow. Consider
separating the "connect" step from the "verify connection"
step to avoid the recursive dependency.
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
next prev parent reply other threads:[~2026-03-31 18:10 UTC|newest]
Thread overview: 49+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-12-10 16:55 [RFC PATCH 0/7] Add script for real-time telemetry monitoring Bruce Richardson
2025-12-10 16:55 ` [RFC PATCH 1/7] usertools: add new script to monitor telemetry on terminal Bruce Richardson
2025-12-10 16:55 ` [RFC PATCH 2/7] usertools/telemetry-watcher: add displaying stats Bruce Richardson
2025-12-10 16:55 ` [RFC PATCH 3/7] usertools/telemetry-watcher: add delta and timeout opts Bruce Richardson
2025-12-10 16:55 ` [RFC PATCH 4/7] usertools/telemetry-watcher: add total and one-line opts Bruce Richardson
2025-12-10 16:55 ` [RFC PATCH 5/7] usertools/telemetry-watcher: add thousands separator Bruce Richardson
2025-12-10 16:55 ` [RFC PATCH 6/7] usertools/telemetry-watcher: add eth name shortcuts Bruce Richardson
2025-12-10 16:55 ` [RFC PATCH 7/7] usertools/telemetry-watcher: support reconnection Bruce Richardson
2025-12-11 1:09 ` [RFC PATCH 0/7] Add script for real-time telemetry monitoring Stephen Hemminger
2025-12-11 9:10 ` Bruce Richardson
2025-12-12 5:32 ` Stephen Hemminger
2025-12-12 17:52 ` Bruce Richardson
2025-12-12 23:23 ` Stephen Hemminger
2026-01-05 17:55 ` [PATCH v2 " Bruce Richardson
2026-01-05 17:55 ` [PATCH v2 1/7] usertools: add new script to monitor telemetry on terminal Bruce Richardson
2026-01-05 17:56 ` [PATCH v2 2/7] usertools/telemetry-watcher: add displaying stats Bruce Richardson
2026-01-05 17:56 ` [PATCH v2 3/7] usertools/telemetry-watcher: add delta and timeout opts Bruce Richardson
2026-01-05 17:56 ` [PATCH v2 4/7] usertools/telemetry-watcher: add total and one-line opts Bruce Richardson
2026-01-05 17:56 ` [PATCH v2 5/7] usertools/telemetry-watcher: add thousands separator Bruce Richardson
2026-01-05 17:56 ` [PATCH v2 6/7] usertools/telemetry-watcher: add eth name shortcuts Bruce Richardson
2026-01-05 17:56 ` [PATCH v2 7/7] usertools/telemetry-watcher: support reconnection Bruce Richardson
2026-01-14 2:00 ` [PATCH v2 0/7] Add script for real-time telemetry monitoring Stephen Hemminger
2026-01-15 19:03 ` [PATCH v3 " Bruce Richardson
2026-01-15 19:03 ` [PATCH v3 1/7] usertools: add new script to monitor telemetry on terminal Bruce Richardson
2026-01-15 19:03 ` [PATCH v3 2/7] usertools/telemetry-watcher: add displaying stats Bruce Richardson
2026-01-15 19:03 ` [PATCH v3 3/7] usertools/telemetry-watcher: add delta and timeout opts Bruce Richardson
2026-01-15 19:03 ` [PATCH v3 4/7] usertools/telemetry-watcher: add total and one-line opts Bruce Richardson
2026-01-15 19:03 ` [PATCH v3 5/7] usertools/telemetry-watcher: add thousands separator Bruce Richardson
2026-01-15 19:03 ` [PATCH v3 6/7] usertools/telemetry-watcher: add eth name shortcuts Bruce Richardson
2026-01-15 19:03 ` [PATCH v3 7/7] usertools/telemetry-watcher: support reconnection Bruce Richardson
2026-01-16 6:53 ` [PATCH v3 0/7] Add script for real-time telemetry monitoring Stephen Hemminger
2026-01-16 6:57 ` Stephen Hemminger
2026-02-05 15:02 ` [PATCH v4 " Bruce Richardson
2026-02-05 15:02 ` [PATCH v4 1/7] usertools: add new script to monitor telemetry on terminal Bruce Richardson
2026-02-05 15:02 ` [PATCH v4 2/7] usertools/telemetry-watcher: add displaying stats Bruce Richardson
2026-02-05 15:02 ` [PATCH v4 3/7] usertools/telemetry-watcher: add delta and timeout opts Bruce Richardson
2026-02-05 15:02 ` [PATCH v4 4/7] usertools/telemetry-watcher: add total and one-line opts Bruce Richardson
2026-02-05 15:02 ` [PATCH v4 5/7] usertools/telemetry-watcher: add thousands separator Bruce Richardson
2026-02-05 15:02 ` [PATCH v4 6/7] usertools/telemetry-watcher: add eth name shortcuts Bruce Richardson
2026-02-05 15:02 ` [PATCH v4 7/7] usertools/telemetry-watcher: support reconnection Bruce Richardson
2026-03-31 18:10 ` Stephen Hemminger [this message]
2026-05-05 10:08 ` [PATCH v5 0/7] Add script for real-time telemetry monitoring Bruce Richardson
2026-05-05 10:08 ` [PATCH v5 1/7] usertools: add new script to monitor telemetry on terminal Bruce Richardson
2026-05-05 10:08 ` [PATCH v5 2/7] usertools/telemetry-watcher: add displaying stats Bruce Richardson
2026-05-05 10:08 ` [PATCH v5 3/7] usertools/telemetry-watcher: add delta and timeout opts Bruce Richardson
2026-05-05 10:08 ` [PATCH v5 4/7] usertools/telemetry-watcher: add total and one-line opts Bruce Richardson
2026-05-05 10:08 ` [PATCH v5 5/7] usertools/telemetry-watcher: add thousands separator Bruce Richardson
2026-05-05 10:08 ` [PATCH v5 6/7] usertools/telemetry-watcher: add eth name shortcuts Bruce Richardson
2026-05-05 10:08 ` [PATCH v5 7/7] usertools/telemetry-watcher: support reconnection Bruce Richardson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260331111025.2e0b46ce@phoenix.local \
--to=stephen@networkplumber.org \
--cc=bruce.richardson@intel.com \
--cc=dev@dpdk.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.