public inbox for dev@dpdk.org
 help / color / mirror / Atom feed
From: Stephen Hemminger <stephen@networkplumber.org>
To: Bruce Richardson <bruce.richardson@intel.com>
Cc: dev@dpdk.org
Subject: Re: [PATCH v4 0/7] Add script for real-time telemetry monitoring
Date: Tue, 31 Mar 2026 11:10:25 -0700	[thread overview]
Message-ID: <20260331111025.2e0b46ce@phoenix.local> (raw)
In-Reply-To: <20260205150230.123076-1-bruce.richardson@intel.com>

On Thu,  5 Feb 2026 15:02:23 +0000
Bruce Richardson <bruce.richardson@intel.com> wrote:

> TL;DR
> ------
> 
> For a  quick demo, apply patches, run e.g. testpmd and then in a separate
> terminal run:
> 
>   ./usertools/dpdk-telemetry-watcher.py -d1T eth.tx
> 
> Output, updated once per second, will be traffic rate per port e.g.:
> 
> Connected to application: "dpdk-testpmd"
> Time       /ethdev/stats,0.opackets /ethdev/stats,1.opackets        Total
> 16:29:12                  5,213,119                5,214,304   10,427,423
> 
> 
> Fuller details
> --------------
> 
> While we have the dpdk-telemetry.py CLI app for interactive querying of
> telemetry on the commandline, and a telemetry exporter script for
> sending telemetry to external tools for real-time monitoring, we don't
> have an app that can print real-time stats for DPDK apps on the
> terminal. This patchset adds such a script, developed with the help of
> Github copilot to fill a need that I found in my testing. Submitting it
> here in the hopes that others find it of use.
> 
> The script acts as a wrapper around the existing dpdk-telemetry.py
> script, and pipes the commands to that script and reads the responses,
> querying it once per second. It takes a number of flag parameters, such
> as the ones above:
>  - "-d" for delta values, i.e. PPS rather than total packets
>  - "-1" for single-line output, i.e. no scrolling up the screen
>  - "-T" to display a total column
> 
> Other flag parameters can be seen by looking at the help output.
> 
> Beyond the flags, the script also takes a number of positional
> parameters, which refer to specific stats to display. These stats must
> be numeric values, and should take the form of the telemetry command to
> send, followed by a "." and the stat within the result which is to be
> tracked. As above, a stat would be e.g. "/ethdev/stats,0.opackets",
> where we send "/ethdev/stats,0" to telemetry and extract the "opackets"
> part of the result.
> 
> However, specifying individual stats can be awkward, so some shortcuts
> are provided too for the common case of monitoring ethernet ports. Any
> positional arg starting with "eth" will be replaced by the set of
> equivalent values for each port, e.g. "eth.imissed" will track the
> imissed value on all ports in use in the app. The ipackets and opackets
> values, as common metrics, are also available as shortened values as
> just "rx" and "tx", so in the example above, "eth.tx" means to track the
> opackets stat for every ethdev port.
> 
> Finally, the script also has reconnection support so you can leave it
> running while you start and stop your application in another terminal.
> The watcher will try and reconnect to a running instance every second.
> 
> v4:
> - Updated docs following AI review
> - Converted one missed f-string to regular string
> 
> v3:
> Updated following AI review
> - removed unnecessary f-string
> - added documnentation in guides/tools
> - added release note entry
> 
> v2:
> - improve reconnection handling, eliminating some crashes seen in testing.
> 
> Bruce Richardson (7):
>   usertools: add new script to monitor telemetry on terminal
>   usertools/telemetry-watcher: add displaying stats
>   usertools/telemetry-watcher: add delta and timeout opts
>   usertools/telemetry-watcher: add total and one-line opts
>   usertools/telemetry-watcher: add thousands separator
>   usertools/telemetry-watcher: add eth name shortcuts
>   usertools/telemetry-watcher: support reconnection
> 
>  doc/guides/rel_notes/release_26_03.rst |   7 +
>  doc/guides/tools/index.rst             |   1 +
>  doc/guides/tools/telemetrywatcher.rst  | 184 +++++++++++
>  usertools/dpdk-telemetry-watcher.py    | 435 +++++++++++++++++++++++++
>  usertools/meson.build                  |   1 +
>  5 files changed, 628 insertions(+)
>  create mode 100644 doc/guides/tools/telemetrywatcher.rst
>  create mode 100755 usertools/dpdk-telemetry-watcher.py
> 
> --
> 2.51.0
> 

This didn't get merged so will need to be rebased.
You may want to address these AI review comments.

Review of [PATCH v4 1-7/7] usertools: dpdk-telemetry-watcher
============================================================

Nice tool — having a continuous monitoring wrapper around
dpdk-telemetry.py is a practical addition. Patches 1-6 are
clean and well-structured. Patch 7 (reconnection support)
has several correctness issues described below.


Patch 7/7: usertools/telemetry-watcher: support reconnection
------------------------------------------------------------

Error: monitor_stats `continue` on failed query causes
  IndexError on next delta iteration.

  When `query_telemetry` returns (process, None), the code
  does `continue`, skipping `current_values.append(...)`.
  At the end of the loop body, `prev_values = current_values`
  stores a shorter list. On the next iteration,
  `prev_values[i]` raises IndexError for the missing indices.

  Suggested fix: when data is None, append prev_values[i]
  (or 0) as the current_value so the list length is preserved:

    process, data = query_telemetry(process, command)
    if not data:
        current_values.append(prev_values[i] if i < len(prev_values) else 0)
        row += "N/A".rjust(25)
        continue


Error: BrokenPipeError not handled in query_telemetry.

  When the DPDK application dies, the subprocess's stdin pipe
  breaks. The initial `process.stdin.write()` / `.flush()`
  before the reconnection loop will raise BrokenPipeError
  instead of returning an empty readline(). The reconnection
  logic never triggers.

  Suggested fix: wrap the write+flush+readline in a
  try/except (BrokenPipeError, OSError) and treat it the
  same as an empty response — fall into the reconnection
  loop.


Warning: old subprocess not cleaned up on reconnection.

  In query_telemetry, when readline() returns empty and
  reconnection begins, the old process object is replaced
  without calling process.terminate() or process.wait().
  The dead subprocess accumulates as a zombie. Similarly,
  create_telemetry_process now calls print_connected_app
  which can fail and return None, leaking the just-created
  Popen object.

  Suggested fix: add a small helper to clean up a process
  (terminate, close pipes, wait), and call it before setting
  process = None in the reconnection path. In
  create_telemetry_process, if print_connected_app fails,
  terminate the process before returning None.


Warning: expand_shortcuts and validate_stats lose the
  reconnected process handle.

  Both functions update their local `process` variable via
  query_telemetry's return value, but neither returns the
  (possibly new) process to the caller. If a reconnection
  happens during shortcut expansion or validation, the
  caller in monitor_stats still holds the old dead process
  object.

  Suggested fix: have expand_shortcuts and validate_stats
  return the process alongside their current return values,
  or restructure so monitor_stats passes process by
  reference (e.g., as a mutable container).


Info: recursive call between create_telemetry_process and
  print_connected_app.

  create_telemetry_process calls print_connected_app, which
  calls query_telemetry, which on disconnect calls
  create_telemetry_process again. This indirect recursion
  works in practice (Python has a high default recursion
  limit and the retry loop in query_telemetry breaks the
  chain), but it is fragile and hard to follow. Consider
  separating the "connect" step from the "verify connection"
  step to avoid the recursive dependency.


Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>

      parent reply	other threads:[~2026-03-31 18:10 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-12-10 16:55 [RFC PATCH 0/7] Add script for real-time telemetry monitoring Bruce Richardson
2025-12-10 16:55 ` [RFC PATCH 1/7] usertools: add new script to monitor telemetry on terminal Bruce Richardson
2025-12-10 16:55 ` [RFC PATCH 2/7] usertools/telemetry-watcher: add displaying stats Bruce Richardson
2025-12-10 16:55 ` [RFC PATCH 3/7] usertools/telemetry-watcher: add delta and timeout opts Bruce Richardson
2025-12-10 16:55 ` [RFC PATCH 4/7] usertools/telemetry-watcher: add total and one-line opts Bruce Richardson
2025-12-10 16:55 ` [RFC PATCH 5/7] usertools/telemetry-watcher: add thousands separator Bruce Richardson
2025-12-10 16:55 ` [RFC PATCH 6/7] usertools/telemetry-watcher: add eth name shortcuts Bruce Richardson
2025-12-10 16:55 ` [RFC PATCH 7/7] usertools/telemetry-watcher: support reconnection Bruce Richardson
2025-12-11  1:09 ` [RFC PATCH 0/7] Add script for real-time telemetry monitoring Stephen Hemminger
2025-12-11  9:10   ` Bruce Richardson
2025-12-12  5:32 ` Stephen Hemminger
2025-12-12 17:52   ` Bruce Richardson
2025-12-12 23:23     ` Stephen Hemminger
2026-01-05 17:55 ` [PATCH v2 " Bruce Richardson
2026-01-05 17:55   ` [PATCH v2 1/7] usertools: add new script to monitor telemetry on terminal Bruce Richardson
2026-01-05 17:56   ` [PATCH v2 2/7] usertools/telemetry-watcher: add displaying stats Bruce Richardson
2026-01-05 17:56   ` [PATCH v2 3/7] usertools/telemetry-watcher: add delta and timeout opts Bruce Richardson
2026-01-05 17:56   ` [PATCH v2 4/7] usertools/telemetry-watcher: add total and one-line opts Bruce Richardson
2026-01-05 17:56   ` [PATCH v2 5/7] usertools/telemetry-watcher: add thousands separator Bruce Richardson
2026-01-05 17:56   ` [PATCH v2 6/7] usertools/telemetry-watcher: add eth name shortcuts Bruce Richardson
2026-01-05 17:56   ` [PATCH v2 7/7] usertools/telemetry-watcher: support reconnection Bruce Richardson
2026-01-14  2:00   ` [PATCH v2 0/7] Add script for real-time telemetry monitoring Stephen Hemminger
2026-01-15 19:03 ` [PATCH v3 " Bruce Richardson
2026-01-15 19:03   ` [PATCH v3 1/7] usertools: add new script to monitor telemetry on terminal Bruce Richardson
2026-01-15 19:03   ` [PATCH v3 2/7] usertools/telemetry-watcher: add displaying stats Bruce Richardson
2026-01-15 19:03   ` [PATCH v3 3/7] usertools/telemetry-watcher: add delta and timeout opts Bruce Richardson
2026-01-15 19:03   ` [PATCH v3 4/7] usertools/telemetry-watcher: add total and one-line opts Bruce Richardson
2026-01-15 19:03   ` [PATCH v3 5/7] usertools/telemetry-watcher: add thousands separator Bruce Richardson
2026-01-15 19:03   ` [PATCH v3 6/7] usertools/telemetry-watcher: add eth name shortcuts Bruce Richardson
2026-01-15 19:03   ` [PATCH v3 7/7] usertools/telemetry-watcher: support reconnection Bruce Richardson
2026-01-16  6:53   ` [PATCH v3 0/7] Add script for real-time telemetry monitoring Stephen Hemminger
2026-01-16  6:57   ` Stephen Hemminger
2026-02-05 15:02 ` [PATCH v4 " Bruce Richardson
2026-02-05 15:02   ` [PATCH v4 1/7] usertools: add new script to monitor telemetry on terminal Bruce Richardson
2026-02-05 15:02   ` [PATCH v4 2/7] usertools/telemetry-watcher: add displaying stats Bruce Richardson
2026-02-05 15:02   ` [PATCH v4 3/7] usertools/telemetry-watcher: add delta and timeout opts Bruce Richardson
2026-02-05 15:02   ` [PATCH v4 4/7] usertools/telemetry-watcher: add total and one-line opts Bruce Richardson
2026-02-05 15:02   ` [PATCH v4 5/7] usertools/telemetry-watcher: add thousands separator Bruce Richardson
2026-02-05 15:02   ` [PATCH v4 6/7] usertools/telemetry-watcher: add eth name shortcuts Bruce Richardson
2026-02-05 15:02   ` [PATCH v4 7/7] usertools/telemetry-watcher: support reconnection Bruce Richardson
2026-03-31 18:10   ` Stephen Hemminger [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260331111025.2e0b46ce@phoenix.local \
    --to=stephen@networkplumber.org \
    --cc=bruce.richardson@intel.com \
    --cc=dev@dpdk.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox