* [RFC 0/4] alternative capture mechanism
@ 2026-06-09 21:02 Stephen Hemminger
2026-06-09 21:02 ` [RFC 1/4] telemetry: allow commands to receive file descriptors Stephen Hemminger
` (3 more replies)
0 siblings, 4 replies; 5+ messages in thread
From: Stephen Hemminger @ 2026-06-09 21:02 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger
This is an RFC for an alternative way to capture packets from a DPDK
application. I gave a brief demo of a similar mechanism at the DPDK summit,
but this version is more complete. Capture runs in the primary process and
is driven entirely over telemetry; no secondary process is involved.
A client asks the application to start capturing and passes it a file
descriptor to write to. The application writes pcapng to that descriptor.
A Wireshark extcap script is the intended front end, but the control path
is just telemetry and the output is just a pipe, so other front ends are
possible.
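Since the output is an ordinary pcapng stream on a pipe, a front end only
needs to read blocks from its end of the pipe. The following is a minimal
sketch of that idea, not part of the series: it reads and validates the
leading Section Header Block. The helper names (read_exact,
read_section_header, demo) are hypothetical, and the demo writes a
hand-built 28-byte header into a local pipe to stand in for the DPDK
application.

```python
import os
import struct

SHB_BLOCK_TYPE = b"\x0a\x0d\x0d\x0a"   # Section Header Block type (byte palindrome)
BYTE_ORDER_MAGIC = 0x1A2B3C4D          # tells the reader the writer's endianness
MIN_SHB_LENGTH = 28                    # SHB with no options


def read_exact(fd, n):
    """Read exactly n bytes from fd; raise EOFError if the writer closed early."""
    chunks = []
    while n > 0:
        chunk = os.read(fd, n)
        if not chunk:
            raise EOFError("writer closed the pipe")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def read_section_header(fd):
    """Consume the first pcapng block and return (byte_order, total_length)."""
    header = read_exact(fd, 12)
    if header[0:4] != SHB_BLOCK_TYPE:
        raise ValueError("stream does not start with a Section Header Block")

    # The magic is stored in the writer's byte order; try both.
    for order in ("<", ">"):
        if struct.unpack(order + "I", header[8:12])[0] == BYTE_ORDER_MAGIC:
            break
    else:
        raise ValueError("bad byte-order magic")

    total_length = struct.unpack(order + "I", header[4:8])[0]
    if total_length < MIN_SHB_LENGTH or total_length % 4:
        raise ValueError("bad block length")

    # Skip the rest of the block (version, section length, options, trailer).
    read_exact(fd, total_length - 12)
    return order, total_length


def demo():
    """Stand-in for the application: write a minimal SHB into a pipe and parse it."""
    shb = struct.pack("<4sIIHHqI", SHB_BLOCK_TYPE, 28, BYTE_ORDER_MAGIC, 1, 0, -1, 28)
    r, w = os.pipe()
    try:
        os.write(w, shb)
        os.close(w)
        return read_section_header(r)
    finally:
        os.close(r)


if __name__ == "__main__":
    print(demo())
```

A real front end would keep reading blocks after the header and hand them
to its consumer; Wireshark's extcap interface does this itself once it is
given the FIFO.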
1/4 telemetry: let a command receive file descriptors from the client
2/4 capture: the library
3/4 test: functional test
4/4 app: the Wireshark extcap script and its documentation
Setup and usage are in doc/guides/tools/wireshark_extcap.rst.
Primary process only for now; secondary-process capture is possible as
a follow-on. Posting as RFC to get feedback on the approach.
The extcap script is dual licensed (BSD-3-Clause OR GPL-2.0-or-later) as
it may be more useful in the Wireshark tree.
Stephen Hemminger (4):
telemetry: allow commands to receive file descriptors
capture: infrastructure wireshark packet capture
test: add test for capture hooks
usertools/dpdk-wireshark-extcap.py: script for external capture
MAINTAINERS | 4 +
app/test/meson.build | 1 +
app/test/test_capture.c | 365 +++++++++++
doc/guides/rel_notes/release_26_07.rst | 12 +
doc/guides/tools/index.rst | 1 +
doc/guides/tools/wireshark_extcap.rst | 155 +++++
lib/capture/capture.c | 821 +++++++++++++++++++++++++
lib/capture/capture_impl.h | 56 ++
lib/capture/filter.c | 108 ++++
lib/capture/meson.build | 19 +
lib/meson.build | 1 +
lib/telemetry/rte_telemetry.h | 66 ++
lib/telemetry/telemetry.c | 115 +++-
usertools/dpdk-wireshark-extcap.py | 274 +++++++++
14 files changed, 1986 insertions(+), 12 deletions(-)
create mode 100644 app/test/test_capture.c
create mode 100644 doc/guides/tools/wireshark_extcap.rst
create mode 100644 lib/capture/capture.c
create mode 100644 lib/capture/capture_impl.h
create mode 100644 lib/capture/filter.c
create mode 100644 lib/capture/meson.build
create mode 100755 usertools/dpdk-wireshark-extcap.py
--
2.53.0
* [RFC 1/4] telemetry: allow commands to receive file descriptors
2026-06-09 21:02 [RFC 0/4] alternative capture mechanism Stephen Hemminger
@ 2026-06-09 21:02 ` Stephen Hemminger
2026-06-09 21:02 ` [RFC 2/4] capture: infrastructure wireshark packet capture Stephen Hemminger
` (2 subsequent siblings)
3 siblings, 0 replies; 5+ messages in thread
From: Stephen Hemminger @ 2026-06-09 21:02 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger, Bruce Richardson
Add rte_telemetry_register_cmd_fd_arg() to register a command whose
callback also receives file descriptors passed by the client as
SCM_RIGHTS ancillary data. The callback owns the descriptors and must
close them.
This lets a client open a file itself and hand the descriptor to the
primary process, so DPDK never opens the path. That avoids path and
permission problems and works across container filesystem namespaces.
Existing commands and clients are unaffected. If a client passes a file
descriptor to a command that did not register for one, the descriptor is
closed and the connection is dropped.
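For readers unfamiliar with SCM_RIGHTS, the client side of this exchange
can be sketched in Python as below. This is an illustration under stated
assumptions, not code from the series: a socketpair stands in for the
telemetry socket, "/example/cmd" is a hypothetical command name, and
serve_one only imitates what an fd-aware callback would do (take
ownership of the descriptor, write to it, close it). It needs Python 3.9
or later for socket.send_fds() and socket.recv_fds().

```python
import os
import socket

MAX_FDS = 8  # mirrors the per-command limit used in this patch


def send_command_with_fd(sock, command, fd):
    """Send the command text and attach one descriptor as SCM_RIGHTS data."""
    socket.send_fds(sock, [command.encode()], [fd])


def serve_one(sock):
    """Imitate the receiving side: accept exactly one descriptor and own it."""
    data, fds, flags, _addr = socket.recv_fds(sock, 1024, MAX_FDS)
    truncated = bool(flags & socket.MSG_CTRUNC)
    if truncated or len(fds) != 1:
        # Reject, but never leak descriptors that did arrive.
        for fd in fds:
            os.close(fd)
        return None
    out = fds[0]
    try:
        os.write(out, b"captured:" + data)
    finally:
        os.close(out)  # the receiver owns the descriptor and must close it
    return data.decode()


def demo():
    """Pass the write end of a pipe across a socketpair and read the result."""
    client, server = socket.socketpair(socket.AF_UNIX, socket.SOCK_SEQPACKET)
    r, w = os.pipe()
    try:
        send_command_with_fd(client, "/example/cmd,0", w)
        os.close(w)  # the receiver now holds its own copy
        cmd = serve_one(server)
        payload = os.read(r, 1024)
    finally:
        os.close(r)
        client.close()
        server.close()
    return cmd, payload


if __name__ == "__main__":
    print(demo())
```

The sender closes its copy of the write end right after sending, so the
reader sees end-of-file as soon as the receiving side closes its copy.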
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
doc/guides/rel_notes/release_26_07.rst | 5 ++
lib/telemetry/rte_telemetry.h | 66 ++++++++++++++
lib/telemetry/telemetry.c | 115 ++++++++++++++++++++++---
3 files changed, 174 insertions(+), 12 deletions(-)
diff --git a/doc/guides/rel_notes/release_26_07.rst b/doc/guides/rel_notes/release_26_07.rst
index b5285af5fe..d7a2df88c1 100644
--- a/doc/guides/rel_notes/release_26_07.rst
+++ b/doc/guides/rel_notes/release_26_07.rst
@@ -141,6 +141,11 @@ New Features
Added AGENTS.md file for AI review
and supporting scripts to review patches and documentation.
+* **Added telemetry support for passing file descriptors.**
+
+ Add experimental telemetry callback ``rte_telemetry_register_cmd_fd_arg()``
+ to allow command to receive file descriptors passed by client.
+
Removed Items
-------------
diff --git a/lib/telemetry/rte_telemetry.h b/lib/telemetry/rte_telemetry.h
index 0a58e518f7..3e32d2902b 100644
--- a/lib/telemetry/rte_telemetry.h
+++ b/lib/telemetry/rte_telemetry.h
@@ -325,6 +325,37 @@ typedef int (*telemetry_cb)(const char *cmd, const char *params,
typedef int (*telemetry_arg_cb)(const char *cmd, const char *params, void *arg,
struct rte_tel_data *info);
+/**
+ * This telemetry callback is used when registering a telemetry command with
+ * rte_telemetry_register_cmd_fd_arg().
+ *
+ * It behaves like telemetry_arg_cb, but additionally receives any file
+ * descriptors the client passed alongside the command as SCM_RIGHTS ancillary
+ * data. The callback takes ownership of these descriptors and is responsible
+ * for closing them.
+ *
+ * @param cmd
+ * The cmd that was requested by the client.
+ * @param params
+ * Contains data required by the callback function.
+ * @param arg
+ * The opaque value that was passed to rte_telemetry_register_cmd_fd_arg().
+ * @param fds
+ * Array of file descriptors received from the client. May be NULL when
+ * n_fds is zero.
+ * @param n_fds
+ * Number of file descriptors in the fds array.
+ * @param info
+ * The information to be returned to the caller.
+ *
+ * @return
+ * Length of buffer used on success.
+ * @return
+ * Negative integer on error.
+ */
+typedef int (*telemetry_fd_cb)(const char *cmd, const char *params, void *arg,
+ const int *fds, unsigned int n_fds, struct rte_tel_data *info);
+
/**
* Used when registering a command and callback function with telemetry.
*
@@ -368,6 +399,41 @@ __rte_experimental
int
rte_telemetry_register_cmd_arg(const char *cmd, telemetry_arg_cb fn, void *arg, const char *help);
+/**
+ * Register a command and a file-descriptor-aware callback with telemetry.
+ *
+ * The callback is invoked like rte_telemetry_register_cmd_arg(), but also
+ * receives any file descriptors the client passed alongside the command as
+ * SCM_RIGHTS ancillary data. This lets a client open a file (for example a
+ * capture output file) itself and hand the descriptor to the DPDK process,
+ * which never opens the path - avoiding path and permission concerns and
+ * working across container filesystem namespaces.
+ *
+ * Descriptors sent to a command registered with rte_telemetry_register_cmd()
+ * or rte_telemetry_register_cmd_arg() are rejected and the connection is
+ * closed.
+ *
+ * @param cmd
+ * The command to register with telemetry.
+ * @param fn
+ * Callback function to be called when the command is requested.
+ * @param arg
+ * An opaque value that will be passed to the callback function.
+ * @param help
+ * Help text for the command.
+ *
+ * @return
+ * 0 on success.
+ * @return
+ * -EINVAL for invalid parameters failure.
+ * @return
+ * -ENOMEM for mem allocation failure.
+ */
+__rte_experimental
+int
+rte_telemetry_register_cmd_fd_arg(const char *cmd, telemetry_fd_cb fn, void *arg,
+ const char *help);
+
/**
* @internal
* Free a container that has memory allocated.
diff --git a/lib/telemetry/telemetry.c b/lib/telemetry/telemetry.c
index b109d076d4..30d3ae3a13 100644
--- a/lib/telemetry/telemetry.c
+++ b/lib/telemetry/telemetry.c
@@ -29,6 +29,8 @@
#define MAX_CMD_LEN 56
#define MAX_OUTPUT_LEN (1024 * 16)
#define MAX_CONNECTIONS 10
+/* Maximum number of file descriptors a client may pass with one command. */
+#define MAX_FDS 8
#ifndef RTE_EXEC_ENV_WINDOWS
static void *
@@ -39,6 +41,7 @@ struct cmd_callback {
char cmd[MAX_CMD_LEN];
telemetry_cb fn;
telemetry_arg_cb fn_arg;
+ telemetry_fd_cb fn_fd;
void *arg;
char help[RTE_TEL_MAX_STRING_LEN];
};
@@ -72,15 +75,15 @@ static RTE_ATOMIC(uint16_t) v2_clients;
#endif /* !RTE_EXEC_ENV_WINDOWS */
static int
-register_cmd(const char *cmd, const char *help,
- telemetry_cb fn, telemetry_arg_cb fn_arg, void *arg)
+register_cmd(const char *cmd, const char *help, telemetry_cb fn,
+ telemetry_arg_cb fn_arg, telemetry_fd_cb fn_fd, void *arg)
{
struct cmd_callback *new_callbacks;
const char *cmdp = cmd;
int i = 0;
- if (strlen(cmd) >= MAX_CMD_LEN || (fn == NULL && fn_arg == NULL) || cmd[0] != '/'
- || strlen(help) >= RTE_TEL_MAX_STRING_LEN)
+ if (strlen(cmd) >= MAX_CMD_LEN || (fn == NULL && fn_arg == NULL && fn_fd == NULL)
+ || cmd[0] != '/' || strlen(help) >= RTE_TEL_MAX_STRING_LEN)
return -EINVAL;
while (*cmdp != '\0') {
@@ -107,6 +110,7 @@ register_cmd(const char *cmd, const char *help,
strlcpy(callbacks[i].cmd, cmd, MAX_CMD_LEN);
callbacks[i].fn = fn;
callbacks[i].fn_arg = fn_arg;
+ callbacks[i].fn_fd = fn_fd;
callbacks[i].arg = arg;
strlcpy(callbacks[i].help, help, RTE_TEL_MAX_STRING_LEN);
num_callbacks++;
@@ -119,14 +123,22 @@ RTE_EXPORT_SYMBOL(rte_telemetry_register_cmd)
int
rte_telemetry_register_cmd(const char *cmd, telemetry_cb fn, const char *help)
{
- return register_cmd(cmd, help, fn, NULL, NULL);
+ return register_cmd(cmd, help, fn, NULL, NULL, NULL);
}
RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_telemetry_register_cmd_arg, 24.11)
int
rte_telemetry_register_cmd_arg(const char *cmd, telemetry_arg_cb fn, void *arg, const char *help)
{
- return register_cmd(cmd, help, NULL, fn, arg);
+ return register_cmd(cmd, help, NULL, fn, NULL, arg);
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_telemetry_register_cmd_fd_arg, 26.07)
+int
+rte_telemetry_register_cmd_fd_arg(const char *cmd, telemetry_fd_cb fn, void *arg,
+ const char *help)
+{
+ return register_cmd(cmd, help, NULL, NULL, fn, arg);
}
#ifndef RTE_EXEC_ENV_WINDOWS
@@ -368,13 +380,70 @@ output_json(const char *cmd, const struct rte_tel_data *d, int s)
TMTY_LOG_LINE(ERR, "Error writing to socket: %s", strerror(errno));
}
+/*
+ * Receive a command and any file descriptors the client passed alongside it
+ * as SCM_RIGHTS ancillary data. The payload length is returned (0 if the
+ * client sent an empty message or closed the connection, negative on error).
+ * Descriptors that arrive are returned in fds[]/n_fds and are owned by the
+ * caller. MSG_CTRUNC means more descriptors were sent than the control buffer
+ * could hold; *ctrunc is set so the caller can reject the command, but the
+ * descriptors that did fit are still returned so they can be closed rather
+ * than leaked.
+ */
+static int
+recv_with_fds(int s, char *buf, size_t buf_len, int *fds, unsigned int *n_fds,
+ bool *ctrunc)
+{
+ char cmsgbuf[CMSG_SPACE(sizeof(int) * MAX_FDS)];
+ struct iovec iov = { .iov_base = buf, .iov_len = buf_len };
+ struct msghdr msg = {
+ .msg_iov = &iov,
+ .msg_iovlen = 1,
+ .msg_control = cmsgbuf,
+ .msg_controllen = sizeof(cmsgbuf),
+ };
+ struct cmsghdr *cmsg;
+ int bytes;
+
+ *n_fds = 0;
+ *ctrunc = false;
+
+ bytes = recvmsg(s, &msg, 0);
+ if (bytes < 0)
+ return bytes;
+
+ if (msg.msg_flags & MSG_CTRUNC)
+ *ctrunc = true;
+
+ for (cmsg = CMSG_FIRSTHDR(&msg); cmsg != NULL; cmsg = CMSG_NXTHDR(&msg, cmsg)) {
+ if (cmsg->cmsg_level != SOL_SOCKET || cmsg->cmsg_type != SCM_RIGHTS)
+ continue;
+ *n_fds = (cmsg->cmsg_len - CMSG_LEN(0)) / sizeof(int);
+ memcpy(fds, CMSG_DATA(cmsg), *n_fds * sizeof(int));
+ break;
+ }
+ return bytes;
+}
+
static void
-perform_command(const struct cmd_callback *cb, const char *cmd, const char *param, int s)
+close_fds(const int *fds, unsigned int n_fds)
+{
+ unsigned int i;
+
+ for (i = 0; i < n_fds; i++)
+ close(fds[i]);
+}
+
+static void
+perform_command(const struct cmd_callback *cb, const char *cmd, const char *param,
+ const int *fds, unsigned int n_fds, int s)
{
struct rte_tel_data data = {0};
int ret;
- if (cb->fn_arg != NULL)
+ if (cb->fn_fd != NULL)
+ ret = cb->fn_fd(cmd, param, cb->arg, fds, n_fds, &data);
+ else if (cb->fn_arg != NULL)
ret = cb->fn_arg(cmd, param, cb->arg, &data);
else
ret = cb->fn(cmd, param, &data);
@@ -412,8 +481,11 @@ client_handler(void *sock_id)
}
/* receive data is not null terminated */
- int bytes = read(s, buffer, sizeof(buffer) - 1);
- while (bytes > 0) {
+ int fds[MAX_FDS];
+ unsigned int n_fds = 0;
+ bool ctrunc = false;
+ int bytes = recv_with_fds(s, buffer, sizeof(buffer) - 1, fds, &n_fds, &ctrunc);
+ while (bytes > 0 || (bytes == 0 && n_fds > 0)) {
buffer[bytes] = 0;
const char *cmd = strtok(buffer, ",");
const char *param = strtok(NULL, "\0");
@@ -429,9 +501,28 @@ client_handler(void *sock_id)
}
rte_spinlock_unlock(&callback_sl);
}
- perform_command(&cb, cmd, param, s);
- bytes = read(s, buffer, sizeof(buffer) - 1);
+ /*
+ * File descriptors go only to a command that registered to
+ * receive them. A command that did not, or a truncated control
+ * message, is a client error: close the descriptors and drop the
+ * connection rather than silently discarding them.
+ */
+ if (n_fds > 0 && (cb.fn_fd == NULL || ctrunc)) {
+ TMTY_LOG_LINE(ERR,
+ "Closing connection: %u file descriptor(s) passed to '%s'%s",
+ n_fds, cmd ? cmd : "(none)",
+ ctrunc ? " (truncated)" : " which does not accept them");
+ close_fds(fds, n_fds);
+ break;
+ }
+
+ /* an fd-aware callback takes ownership of the descriptors */
+ perform_command(&cb, cmd, param, fds, n_fds, s);
+
+ n_fds = 0;
+ ctrunc = false;
+ bytes = recv_with_fds(s, buffer, sizeof(buffer) - 1, fds, &n_fds, &ctrunc);
}
exit:
close(s);
--
2.53.0
* [RFC 2/4] capture: infrastructure wireshark packet capture
2026-06-09 21:02 [RFC 0/4] alternative capture mechanism Stephen Hemminger
2026-06-09 21:02 ` [RFC 1/4] telemetry: allow commands to receive file descriptors Stephen Hemminger
@ 2026-06-09 21:02 ` Stephen Hemminger
2026-06-09 21:02 ` [RFC 3/4] test: add test for capture hooks Stephen Hemminger
2026-06-09 21:02 ` [RFC 4/4] usertools/dpdk-wireshark-extcap.py: script for external capture Stephen Hemminger
3 siblings, 0 replies; 5+ messages in thread
From: Stephen Hemminger @ 2026-06-09 21:02 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger, Thomas Monjalon, Reshma Pattan,
Anatoly Burakov
Add a capture library that extends telemetry with packet capture.
It is intended to be used with a front-end script that provides
external packet capture (extcap) for Wireshark.
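The start request takes its parameters as "<port>,snaplen=<n>,filter=<str>"
after the command name. A front end might assemble that string as in the
sketch below. The function name and the command path used in the example
are hypothetical. Because the parser in this patch splits the parameter
string on commas, the sketch conservatively rejects a filter expression
that contains one.

```python
def build_capture_request(command, port_id, snaplen=None, filter_expr=None):
    """Build '<command>,<port>[,snaplen=<n>][,filter=<expr>]' for the telemetry socket."""
    if not command.startswith("/"):
        raise ValueError("telemetry commands start with '/'")
    if port_id < 0:
        raise ValueError("port id must be non-negative")

    parts = [command, str(port_id)]

    if snaplen is not None:
        if snaplen < 0:
            raise ValueError("snaplen must be non-negative")
        parts.append(f"snaplen={snaplen}")  # snaplen=0 means capture whole packets

    if filter_expr is not None:
        # The receiving parser tokenizes on ',' so a comma here would be
        # treated as the start of another parameter.
        if not filter_expr or "," in filter_expr:
            raise ValueError("filter must be non-empty and contain no comma")
        parts.append(f"filter={filter_expr}")

    return ",".join(parts)


if __name__ == "__main__":
    # "/example/capture/start" is a placeholder, not the command name from the patch.
    print(build_capture_request("/example/capture/start", 0,
                                snaplen=128, filter_expr="tcp port 80"))
```

The resulting string is what would be sent together with the output
descriptor in a single message.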
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
MAINTAINERS | 1 +
doc/guides/rel_notes/release_26_07.rst | 7 +
lib/capture/capture.c | 821 +++++++++++++++++++++++++
lib/capture/capture_impl.h | 56 ++
lib/capture/filter.c | 108 ++++
lib/capture/meson.build | 19 +
lib/meson.build | 1 +
7 files changed, 1013 insertions(+)
create mode 100644 lib/capture/capture.c
create mode 100644 lib/capture/capture_impl.h
create mode 100644 lib/capture/filter.c
create mode 100644 lib/capture/meson.build
diff --git a/MAINTAINERS b/MAINTAINERS
index 4a68a19b32..dd359d956e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1723,6 +1723,7 @@ F: doc/guides/sample_app_ug/qos_scheduler.rst
Packet capture
M: Reshma Pattan <reshma.pattan@intel.com>
M: Stephen Hemminger <stephen@networkplumber.org>
+F: lib/capture/
F: lib/pdump/
F: doc/guides/prog_guide/pdump_lib.rst
F: app/test/test_pdump.*
diff --git a/doc/guides/rel_notes/release_26_07.rst b/doc/guides/rel_notes/release_26_07.rst
index d7a2df88c1..309a6078bd 100644
--- a/doc/guides/rel_notes/release_26_07.rst
+++ b/doc/guides/rel_notes/release_26_07.rst
@@ -146,6 +146,13 @@ New Features
Add experimental telemetry callback ``rte_telemetry_register_cmd_fd_arg()``
to allow command to receive file descriptors passed by client.
+* **Added packet capture library.**
+
+ Added a new ``capture`` library which provides a mechanism via telemetry
+ interface for capturing packets to a file descriptor. This mechanism
+ is used by the new ``dpdk-wireshark-extcap.py`` script which provides
+ seamless integration with Wireshark.
+
Removed Items
-------------
diff --git a/lib/capture/capture.c b/lib/capture/capture.c
new file mode 100644
index 0000000000..a837c377fc
--- /dev/null
+++ b/lib/capture/capture.c
@@ -0,0 +1,821 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2026 Stephen Hemminger
+ */
+
+#include <ctype.h>
+#include <errno.h>
+#include <pthread.h>
+#include <poll.h>
+#include <signal.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <time.h>
+#include <sys/queue.h>
+#include <sys/utsname.h>
+#include <net/if.h>
+#include <unistd.h>
+
+#include <rte_branch_prediction.h>
+#include <rte_common.h>
+#include <rte_debug.h>
+#include <rte_ethdev.h>
+#include <rte_log.h>
+#include <rte_malloc.h>
+#include <rte_memory.h>
+#include <rte_mempool.h>
+#include <rte_mbuf.h>
+#include <rte_pcapng.h>
+#include <rte_pause.h>
+#include <rte_ring.h>
+#include <rte_spinlock.h>
+#include <rte_stdatomic.h>
+#include <rte_string_fns.h>
+#include <rte_telemetry.h>
+#include <rte_version.h>
+
+#include "capture_impl.h"
+
+#ifndef DLT_EN10MB
+#define DLT_EN10MB 1
+#endif
+
+RTE_LOG_REGISTER_DEFAULT(rte_capture_logtype, NOTICE);
+
+/*
+ * List of active captures.
+ *
+ * This is a control-plane only structure: it is created, walked and torn down
+ * from the telemetry handler thread and from the per-capture drain threads,
+ * never from the dataplane. A plain spinlock is therefore enough; the EAL
+ * shared tailq (rte_tailq) is not used because captures are not visible to
+ * secondary processes in this design.
+ */
+TAILQ_HEAD(capture_list, capture);
+static struct capture_list capture_list = TAILQ_HEAD_INITIALIZER(capture_list);
+static rte_spinlock_t capture_lock = RTE_SPINLOCK_INITIALIZER;
+
+#define DEFAULT_SNAPLEN 262144u /* from tcpdump et al. */
+#define CAPTURE_BURST_SIZE 32u
+#define MBUF_POOL_CACHE_SIZE 32
+#define CAPTURE_RING_SIZE 256
+#define CAPTURE_POOL_SIZE 1024
+#define SLEEP_THRESHOLD 100
+#define SLEEP_US 100
+
+/* Parameter values: only used on stack inside parsing */
+struct capture_config {
+ uint16_t port_id;
+ uint32_t snaplen;
+ const char *filter_str;
+};
+
+/*
+ * Data used by callback
+ * This is per-queue to avoid cache thrashing
+ */
+struct __rte_cache_aligned capture_rxtx_cb {
+ RTE_ATOMIC(uint32_t) use_count;
+ const struct rte_eth_rxtx_callback *cb;
+
+ struct capture_stats {
+ RTE_ATOMIC(uint64_t) accepted; /**< Number of packets accepted by filter. */
+ RTE_ATOMIC(uint64_t) filtered; /**< Number of packets rejected by filter. */
+ RTE_ATOMIC(uint64_t) nombuf; /**< Number of mbuf allocation failures. */
+ RTE_ATOMIC(uint64_t) ringfull; /**< Number of missed packets due to ring full. */
+ } stats;
+};
+
+/*
+ * Per-capture instance state.
+ */
+struct capture {
+ TAILQ_ENTRY(capture) next; /* links into capture_list */
+ unsigned int idx;
+ RTE_ATOMIC(bool) running;
+ int fd; /* file descriptor of FIFO */
+ struct rte_capture_filter *filter;
+ struct rte_ring *ring; /* ring from dataplane to capture thread */
+ struct rte_mempool *mp; /* mempool for capture mbufs */
+
+ uint32_t snaplen; /* amount of data to copy */
+ uint16_t port_id;
+ uint16_t tx_queues;
+ uint16_t rx_queues;
+
+ /* per-queue data sized to max(tx_queues, rx_queues) */
+ struct capture_cbs {
+ struct capture_rxtx_cb tx_cb;
+ struct capture_rxtx_cb rx_cb;
+ } cbs[];
+};
+
+/* Wait for callbacks to be idle before free */
+static void
+capture_cb_wait(struct capture_rxtx_cb *cbs)
+{
+ /* wait until use_count is even (not in use) */
+ RTE_WAIT_UNTIL_MASKED(&cbs->use_count, 1, ==, 0, rte_memory_order_acquire);
+}
+
+/* Hold a reference to callback while active */
+static inline __rte_hot void
+capture_cb_hold(struct capture_rxtx_cb *cbs)
+{
+ rte_atomic_fetch_add_explicit(&cbs->use_count, 1, rte_memory_order_acquire);
+}
+
+/* Drop reference to callback when done */
+static inline __rte_hot void
+capture_cb_release(struct capture_rxtx_cb *cbs)
+{
+ rte_atomic_fetch_sub_explicit(&cbs->use_count, 1, rte_memory_order_release);
+}
+
+/* Clean up callbacks */
+static void __rte_cold
+capture_cb_cleanup(struct capture *cap)
+{
+
+ for (unsigned int q = 0; q < cap->tx_queues; q++) {
+ struct capture_rxtx_cb *tx_cb = &cap->cbs[q].tx_cb;
+ if (tx_cb->cb) {
+ rte_eth_remove_tx_callback(cap->port_id, q, tx_cb->cb);
+ capture_cb_wait(tx_cb);
+ tx_cb->cb = NULL;
+ }
+ }
+
+ for (unsigned int q = 0; q < cap->rx_queues; q++) {
+ struct capture_rxtx_cb *rx_cb = &cap->cbs[q].rx_cb;
+ if (rx_cb->cb) {
+ rte_eth_remove_rx_callback(cap->port_id, q, rx_cb->cb);
+ capture_cb_wait(rx_cb);
+ rx_cb->cb = NULL;
+ }
+ }
+}
+
+/* Create a clone of mbuf to be placed into ring. */
+static inline __rte_hot void
+capture_copy_burst(uint16_t port_id, uint16_t queue_id,
+ enum rte_pcapng_direction direction,
+ struct rte_mbuf **pkts, unsigned int nb_pkts,
+ const struct capture *cap,
+ struct capture_stats *stats)
+{
+ unsigned int i, ring_enq, d_pkts = 0;
+ struct rte_mbuf *dup_bufs[CAPTURE_BURST_SIZE]; /* duplicated packets */
+ struct rte_ring *ring = cap->ring;
+ struct rte_mempool *mp = cap->mp;
+ uint32_t snaplen = cap->snaplen;
+ struct rte_mbuf *p;
+
+ RTE_ASSERT(nb_pkts <= CAPTURE_BURST_SIZE);
+
+ for (i = 0; i < nb_pkts; i++) {
+ /*
+ * This uses the same BPF return value convention as socket filters and pcap_offline_filter:
+ * if the program returns zero, the packet does not match the filter and is ignored.
+ */
+ if (cap->filter) {
+ uint64_t rc = __rte_capture_filter(cap->filter, pkts[i]);
+ if (rc == 0) {
+ rte_atomic_fetch_add_explicit(&stats->filtered, 1,
+ rte_memory_order_relaxed);
+ continue;
+ }
+ }
+
+ p = rte_pcapng_copy(port_id, queue_id, pkts[i], mp,
+ snaplen, direction, NULL);
+
+ if (unlikely(p == NULL))
+ rte_atomic_fetch_add_explicit(&stats->nombuf, 1,
+ rte_memory_order_relaxed);
+ else
+ dup_bufs[d_pkts++] = p;
+ }
+
+ if (d_pkts == 0)
+ return;
+
+ rte_atomic_fetch_add_explicit(&stats->accepted, d_pkts, rte_memory_order_relaxed);
+
+ ring_enq = rte_ring_enqueue_burst(ring, (void *)&dup_bufs[0], d_pkts, NULL);
+ if (unlikely(ring_enq < d_pkts)) {
+ unsigned int drops = d_pkts - ring_enq;
+
+ rte_atomic_fetch_add_explicit(&stats->ringfull, drops, rte_memory_order_relaxed);
+ rte_pktmbuf_free_bulk(&dup_bufs[ring_enq], drops);
+ }
+}
+
+/* Split the packets into chunks of at most CAPTURE_BURST_SIZE and copy each chunk. */
+static __rte_hot inline void
+capture_copy(uint16_t port_id, uint16_t queue_id,
+ enum rte_pcapng_direction direction,
+ struct rte_mbuf **pkts, uint16_t nb_pkts,
+ const struct capture *cap,
+ struct capture_stats *stats)
+{
+ unsigned int offs = 0;
+
+ do {
+ unsigned int n = RTE_MIN(nb_pkts - offs, CAPTURE_BURST_SIZE);
+
+ capture_copy_burst(port_id, queue_id, direction, &pkts[offs], n, cap, stats);
+ offs += n;
+ } while (offs < nb_pkts);
+}
+
+static __rte_hot uint16_t
+capture_rx(uint16_t port, uint16_t queue,
+ struct rte_mbuf **pkts, uint16_t nb_pkts,
+ uint16_t max_pkts __rte_unused, void *user_params)
+{
+ struct capture *cap = user_params;
+ struct capture_rxtx_cb *cbs = &cap->cbs[queue].rx_cb;
+
+ capture_cb_hold(cbs);
+ capture_copy(port, queue, RTE_PCAPNG_DIRECTION_IN, pkts, nb_pkts, cap, &cbs->stats);
+ capture_cb_release(cbs);
+
+ return nb_pkts;
+}
+
+static __rte_hot uint16_t
+capture_tx(uint16_t port, uint16_t queue,
+ struct rte_mbuf **pkts, uint16_t nb_pkts, void *user_params)
+{
+ struct capture *capture = user_params;
+ struct capture_rxtx_cb *cbs = &capture->cbs[queue].tx_cb;
+
+ capture_cb_hold(cbs);
+ capture_copy(port, queue, RTE_PCAPNG_DIRECTION_OUT, pkts, nb_pkts, capture, &cbs->stats);
+ capture_cb_release(cbs);
+
+ return nb_pkts;
+}
+
+/*
+ * Break the comma separated parameter string into tokens
+ * and fill in the capture config structure.
+ *
+ * Does not use rte_kvargs because that would mangle [] etc in filter expression.
+ */
+static __rte_cold int
+parse_params(char *str, struct capture_config *cfg)
+{
+ uint32_t snaplen = DEFAULT_SNAPLEN;
+
+ char *args[4];
+ int nargs = rte_strsplit(str, strlen(str), args, RTE_DIM(args), ',');
+ /* Need at least the port id */
+ if (nargs < 1) {
+ CAPTURE_LOG(ERR, "missing parameters '%s'", str);
+ return -1;
+ }
+
+ /* Parse port id (required) */
+ char *endp;
+ errno = 0;
+ unsigned long port_id = strtoul(args[0], &endp, 10);
+ if (errno != 0 || port_id >= RTE_MAX_ETHPORTS) {
+ CAPTURE_LOG(ERR, "invalid port_id=%s", args[0]);
+ return -1;
+ }
+ if (*endp != '\0') {
+ CAPTURE_LOG(ERR, "garbage after port_id value");
+ return -1;
+ }
+
+ /* parse remainder as name=value parameters */
+ for (int i = 1; i < nargs; i++) {
+ char *key = args[i];
+
+ /* split at the = */
+ char *eq = strchr(args[i], '=');
+
+ /* all current options require argument after = */
+ if (eq == NULL || eq[1] == '\0') {
+ CAPTURE_LOG(ERR, "missing value for '%s'", key);
+ return -1;
+ }
+ *eq = '\0';
+ char *value = eq + 1;
+
+ if (strcmp(key, "filter") == 0) {
+ cfg->filter_str = value;
+ } else if (strcmp(key, "snaplen") == 0) {
+ errno = 0;
+ unsigned long len = strtoul(value, &endp, 10);
+ if (errno != 0 || *endp != '\0' || len >= UINT32_MAX) {
+ CAPTURE_LOG(ERR, "invalid snaplen '%s'", value);
+ return -1;
+ }
+ snaplen = len;
+ } else {
+ CAPTURE_LOG(ERR, "unknown parameter '%s'", key);
+ return -1;
+ }
+ }
+
+ cfg->port_id = port_id;
+
+ /*
+ * Default is 256K from tcpdump legacy
+ * using snaplen=0 means everything.
+ */
+ cfg->snaplen = snaplen > 0 ? snaplen : UINT32_MAX;
+
+ return 0;
+}
+
+/*
+ * Open pcapng handle.
+ * Look up OS name and add DPDK version.
+ */
+static __rte_cold rte_pcapng_t *
+capture_pcapng_open(int fd, uint16_t port_id, const char *filter)
+{
+ rte_pcapng_t *pcapng = NULL;
+ char port_name[RTE_ETH_NAME_MAX_LEN];
+ char ifname[IFNAMSIZ];
+ char *ifdescr = NULL;
+ struct utsname uts;
+ char *osname = NULL;
+
+ /* OS name is optional, just keep going if not found */
+ if (uname(&uts) == 0 &&
+ asprintf(&osname, "%s %s", uts.sysname, uts.release) < 0)
+ osname = NULL;
+
+ /* add DPDK internal name */
+ if (rte_eth_dev_get_name_by_port(port_id, port_name) != 0) {
+ CAPTURE_LOG(NOTICE, "Could not find port name for %u", port_id);
+ goto close_fd;
+ }
+
+ /* match name convention used by dpdk-wireshark-extcap.py */
+ snprintf(ifname, sizeof(ifname), "dpdk:%u", port_id);
+ if (asprintf(&ifdescr, "DPDK %s (port %u)", port_name, port_id) < 0)
+ ifdescr = NULL;
+
+ pcapng = rte_pcapng_fdopen(fd, osname, NULL, rte_version(), NULL);
+ if (pcapng == NULL) {
+ CAPTURE_LOG(ERR, "Add section block failed");
+ goto close_fd;
+ }
+
+ if (rte_pcapng_add_interface(pcapng, port_id, DLT_EN10MB, ifname, ifdescr, filter) < 0) {
+ CAPTURE_LOG(ERR, "Add interface for port %u:%s failed", port_id, ifname);
+ rte_pcapng_close(pcapng); /* closes fd */
+ pcapng = NULL;
+ }
+ goto cleanup;
+
+close_fd:
+ close(fd);
+cleanup:
+ free(osname);
+ free(ifdescr);
+ return pcapng;
+}
+
+static __rte_cold void
+capture_link(struct capture *cap)
+{
+ rte_spinlock_lock(&capture_lock);
+ TAILQ_INSERT_TAIL(&capture_list, cap, next);
+ rte_spinlock_unlock(&capture_lock);
+}
+
+static __rte_cold void
+capture_unlink(struct capture *cap)
+{
+ rte_spinlock_lock(&capture_lock);
+ TAILQ_REMOVE(&capture_list, cap, next);
+ rte_spinlock_unlock(&capture_lock);
+}
+
+static __rte_cold void
+capture_free(struct capture *cap)
+{
+ if (cap == NULL)
+ return;
+
+ __rte_capture_filter_free(cap->filter);
+ rte_ring_free(cap->ring);
+ rte_mempool_free(cap->mp);
+ rte_free(cap);
+}
+
+/* Generate unique id for naming and telemetry */
+static unsigned int
+get_unique_id(void)
+{
+ static RTE_ATOMIC(unsigned int) capture_instance;
+
+ return rte_atomic_fetch_add_explicit(&capture_instance, 1, rte_memory_order_relaxed);
+}
+
+/*
+ * Convert configuration into running state
+ */
+static struct capture *
+capture_alloc(const struct capture_config *cfg, int fd,
+ const struct rte_eth_dev_info *dev_info,
+ int socket_id)
+{
+ struct capture *cap;
+ char ring_name[RTE_RING_NAMESIZE];
+ uint16_t mbuf_size;
+ uint16_t num_queues = RTE_MAX(dev_info->nb_tx_queues, dev_info->nb_rx_queues);
+ size_t cb_size = sizeof(*cap) + num_queues * sizeof(cap->cbs[0]);
+
+ cap = rte_zmalloc_socket("capture", cb_size, RTE_CACHE_LINE_SIZE, socket_id);
+ if (cap == NULL) {
+ CAPTURE_LOG(ERR, "Could not allocate capture struct");
+ goto err_close_fd;
+ }
+
+ cap->idx = get_unique_id();
+
+ snprintf(ring_name, sizeof(ring_name), "capture-%u", cap->idx);
+ cap->ring = rte_ring_create(ring_name, CAPTURE_RING_SIZE, socket_id, 0);
+ if (cap->ring == NULL) {
+ CAPTURE_LOG(ERR, "Could not create ring");
+ goto err_close_fd;
+ }
+
+ /*
+ * If snapshot length is smaller than one mbuf segment then pool
+ * element size can be reduced; otherwise can just use the default
+ * and let rte_pktmbuf_copy handle multiple segments.
+ */
+ if (cfg->snaplen < RTE_MBUF_DEFAULT_BUF_SIZE)
+ mbuf_size = rte_pcapng_mbuf_size(cfg->snaplen);
+ else
+ mbuf_size = RTE_MBUF_DEFAULT_BUF_SIZE;
+
+ cap->mp = rte_pktmbuf_pool_create_by_ops(ring_name, CAPTURE_POOL_SIZE,
+ MBUF_POOL_CACHE_SIZE, 0, mbuf_size,
+ socket_id, "ring_mp_mc");
+ if (cap->mp == NULL) {
+ CAPTURE_LOG(ERR, "Could not create mempool");
+ goto err_close_fd;
+ }
+
+ if (cfg->filter_str) {
+ cap->filter = __rte_capture_filter_create(cfg->filter_str);
+ if (cap->filter == NULL) {
+ CAPTURE_LOG(ERR, "Could not compile filter: %s", cfg->filter_str);
+ goto err_close_fd;
+ }
+ }
+
+ cap->fd = fd;
+ cap->port_id = cfg->port_id;
+ rte_atomic_store_explicit(&cap->running, true, rte_memory_order_relaxed);
+ cap->snaplen = cfg->snaplen;
+ cap->tx_queues = dev_info->nb_tx_queues;
+ cap->rx_queues = dev_info->nb_rx_queues;
+
+ for (unsigned int q = 0; q < cap->tx_queues; q++) {
+ struct capture_rxtx_cb *tx_cb = &cap->cbs[q].tx_cb;
+ tx_cb->cb = rte_eth_add_tx_callback(cfg->port_id, q, capture_tx, cap);
+ if (tx_cb->cb == NULL)
+ CAPTURE_LOG(ERR, "Register tx callback for %u:%u failed",
+ cfg->port_id, q);
+ }
+
+ for (unsigned int q = 0; q < cap->rx_queues; q++) {
+ struct capture_rxtx_cb *rx_cb = &cap->cbs[q].rx_cb;
+ rx_cb->cb = rte_eth_add_rx_callback(cfg->port_id, q, capture_rx, cap);
+ if (rx_cb->cb == NULL)
+ CAPTURE_LOG(ERR, "Register rx callback for %u:%u failed",
+ cfg->port_id, q);
+ }
+
+ return cap;
+
+err_close_fd:
+ close(fd);
+ capture_free(cap);
+ return NULL;
+}
+
+/*
+ * The capture thread that moves packets from ring into the FIFO
+ */
+static void *
+capture_thread(void *arg)
+{
+ struct capture *cap = arg;
+ unsigned int empty_count = 0;
+
+ CAPTURE_LOG(INFO, "capture thread starting");
+
+ /* This thread wants to detect when FIFO gets closed */
+ sigset_t set;
+ sigemptyset(&set);
+ sigaddset(&set, SIGPIPE);
+ pthread_sigmask(SIG_BLOCK, &set, NULL);
+
+ rte_pcapng_t *pcapng = capture_pcapng_open(cap->fd, cap->port_id,
+ __rte_capture_filter_string(cap->filter));
+ if (pcapng == NULL)
+ goto error;
+
+ while (rte_atomic_load_explicit(&cap->running, rte_memory_order_relaxed)) {
+ unsigned int avail, n;
+ struct rte_mbuf *pkts[CAPTURE_BURST_SIZE];
+
+ n = rte_ring_sc_dequeue_burst(cap->ring, (void **) pkts, CAPTURE_BURST_SIZE, &avail);
+
+ /*
+ * If the ring is empty, apply simple heuristic to keep this
+ * thread from fully consuming the CPU.
+ */
+ if (n == 0) {
+ /* repeat a few times before waiting */
+ if (empty_count < SLEEP_THRESHOLD) {
+ ++empty_count;
+ } else {
+ struct pollfd pfd = { .fd = cap->fd };
+ struct timespec ts = { .tv_nsec = SLEEP_US * 1000 };
+
+ if (ppoll(&pfd, 1, &ts, NULL) > 0 &&
+ (pfd.revents & (POLLERR | POLLHUP | POLLNVAL))) {
+ CAPTURE_LOG(NOTICE, "fifo reader closed");
+ break; /* reader is gone */
+ }
+ }
+ continue;
+ }
+
+ /* If this drained the ring count it as first emptying */
+ empty_count = (avail == 0);
+
+ if (unlikely(rte_pcapng_write_packets(pcapng, pkts, n) < 0)) {
+ CAPTURE_LOG(NOTICE, "write to fifo failed: %s", strerror(errno));
+ break;
+ }
+ }
+
+ rte_atomic_store_explicit(&cap->running, false, rte_memory_order_relaxed);
+
+ /* Capture exiting */
+ CAPTURE_LOG(INFO, "capture thread stopping");
+ rte_pcapng_close(pcapng);
+
+error:
+
+ capture_cb_cleanup(cap);
+ capture_unlink(cap);
+ capture_free(cap);
+
+ return NULL;
+}
+
+/*
+ * Callback handler for telemetry library to start capture.
+ *
+ * Need to handle: <iface>,snaplen=<n>,filter=<str>
+ */
+static int
+capture_start_req(const char *cmd, const char *params, void *arg __rte_unused,
+ const int *fds, unsigned int n_fds, struct rte_tel_data *d)
+{
+ struct capture *cap = NULL;
+ struct capture_config cfg = { };
+ struct rte_eth_dev_info dev_info;
+
+ CAPTURE_LOG(DEBUG, "telemetry: %s %s", cmd, params);
+
+ if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
+ CAPTURE_LOG(ERR, "capture can only be started from primary");
+ goto error;
+ }
+
+ if (params == NULL || !isdigit((unsigned char)*params))
+ goto error;
+
+ /* Note: params is const so need non-const copy for parsing */
+ if (parse_params(strdupa(params), &cfg) < 0)
+ goto error;
+
+ /* Need one fd for output */
+ if (n_fds != 1) {
+ if (n_fds == 0)
+ CAPTURE_LOG(ERR, "missing output fd");
+ else
+ CAPTURE_LOG(ERR, "too many fds");
+ goto error;
+ }
+
+ /* Lookup number of queues etc, also validates port_id */
+ if (rte_eth_dev_info_get(cfg.port_id, &dev_info) < 0) {
+ CAPTURE_LOG(ERR, "can not get info for port %u", cfg.port_id);
+ goto error;
+ }
+
+ int socket_id = rte_eth_dev_socket_id(cfg.port_id);
+ if (socket_id < 0) {
+ CAPTURE_LOG(NOTICE, "could not determine socket for port %u", cfg.port_id);
+ socket_id = SOCKET_ID_ANY;
+ }
+
+ cap = capture_alloc(&cfg, fds[0], &dev_info, socket_id);
+ if (cap == NULL)
+ return -1; /* fd already closed by capture_alloc */
+
+ /*
+ * Publish into the active list before starting the drain thread so the
+ * thread is guaranteed to find itself there when it removes itself on
+ * exit (it may exit immediately, e.g. if the FIFO reader is already
+ * gone). On thread-create failure we undo the insertion here.
+ */
+ unsigned int idx = cap->idx;
+ capture_link(cap);
+
+ /*
+ * Make a new thread to do the capture work
+ * Thread will inherit affinity from the telemetry handler that calls us
+ */
+ pthread_t thread_id;
+ int rc = pthread_create(&thread_id, NULL, capture_thread, cap);
+ if (rc != 0) {
+ CAPTURE_LOG(ERR, "Capture thread start failed: %s", strerror(rc));
+ close(cap->fd);
+ capture_unlink(cap);
+ capture_cb_cleanup(cap);
+ capture_free(cap);
+ return -1;
+ }
+
+ /* Nothing will be waiting for this thread. */
+ pthread_detach(thread_id);
+
+ /* Return id back for later use. */
+ rte_tel_data_start_dict(d);
+ rte_tel_data_add_dict_uint(d, "id", idx);
+ rte_tel_data_add_dict_string(d, "status", "running");
+ return 0;
+
+error:
+ for (unsigned int i = 0; i < n_fds; i++)
+ close(fds[i]);
+ return -1;
+}
+
+
+
+/* Telemetry: stop active capture. */
+static int
+capture_stop_req(const char *cmd, const char *params, struct rte_tel_data *d)
+{
+
+ CAPTURE_LOG(DEBUG, "telemetry %s %s", cmd, params);
+
+ if (params == NULL || *params == '\0')
+ return -EINVAL;
+
+ errno = 0;
+ char *endp;
+ unsigned long idx = strtoul(params, &endp, 10);
+ if (errno != 0 || *endp != '\0')
+ return -EINVAL;
+
+ rte_spinlock_lock(&capture_lock);
+ struct capture *cap;
+ TAILQ_FOREACH(cap, &capture_list, next) {
+ if (cap->idx == idx)
+ break;
+ }
+ if (cap == NULL) {
+ CAPTURE_LOG(ERR, "Capture index %lu not found", idx);
+ rte_spinlock_unlock(&capture_lock);
+ return -ENOENT;
+ }
+ rte_atomic_store_explicit(&cap->running, false, rte_memory_order_relaxed);
+ rte_spinlock_unlock(&capture_lock);
+ rte_tel_data_start_dict(d);
+ rte_tel_data_add_dict_string(d, "status", "stopped");
+ return 0;
+}
+
+/* Telemetry: list the ids of all active captures. */
+static int
+capture_list_req(const char *cmd, const char *params,
+ struct rte_tel_data *d)
+{
+ struct capture *cap;
+
+ CAPTURE_LOG(DEBUG, "telemetry %s %s", cmd, params);
+ rte_tel_data_start_array(d, RTE_TEL_UINT_VAL);
+
+ rte_spinlock_lock(&capture_lock);
+ TAILQ_FOREACH(cap, &capture_list, next)
+ rte_tel_data_add_array_uint(d, cap->idx);
+ rte_spinlock_unlock(&capture_lock);
+
+ return 0;
+}
+
+/* Aggregate per-queue counters of a capture instance. */
+struct capture_total {
+ uint64_t accepted;
+ uint64_t filtered;
+ uint64_t nombuf;
+ uint64_t ringfull;
+};
+
+static void
+capture_sum_one(struct capture_total *t, const struct capture_stats *s)
+{
+ t->accepted += rte_atomic_load_explicit(&s->accepted, rte_memory_order_relaxed);
+ t->filtered += rte_atomic_load_explicit(&s->filtered, rte_memory_order_relaxed);
+ t->nombuf += rte_atomic_load_explicit(&s->nombuf, rte_memory_order_relaxed);
+ t->ringfull += rte_atomic_load_explicit(&s->ringfull, rte_memory_order_relaxed);
+}
+
+/* Sum the rx and tx counters across all queues. Caller holds capture_lock. */
+static void
+capture_sum_stats(const struct capture *cap, struct capture_total *t)
+{
+ *t = (struct capture_total){ };
+
+ for (unsigned int q = 0; q < cap->rx_queues; q++)
+ capture_sum_one(t, &cap->cbs[q].rx_cb.stats);
+ for (unsigned int q = 0; q < cap->tx_queues; q++)
+ capture_sum_one(t, &cap->cbs[q].tx_cb.stats);
+}
+
+/* Telemetry: report configuration and counters for one capture. */
+static int
+capture_stats_req(const char *cmd, const char *params,
+ struct rte_tel_data *d)
+{
+ struct capture *cap;
+ struct capture_total t;
+ char *endp;
+
+ CAPTURE_LOG(DEBUG, "telemetry %s %s", cmd, params);
+ if (params == NULL || *params == '\0')
+ return -EINVAL;
+
+ errno = 0;
+ unsigned long idx = strtoul(params, &endp, 10);
+ if (errno != 0 || *endp != '\0')
+ return -EINVAL;
+
+ /* Find the instance and snapshot what we need while holding the lock. */
+ rte_spinlock_lock(&capture_lock);
+ TAILQ_FOREACH(cap, &capture_list, next) {
+ if (cap->idx == idx)
+ break;
+ }
+ if (cap == NULL) {
+ CAPTURE_LOG(ERR, "Capture index %lu not found", idx);
+ rte_spinlock_unlock(&capture_lock);
+ return -ENOENT;
+ }
+
+ rte_tel_data_start_dict(d);
+ rte_tel_data_add_dict_uint(d, "port_id", cap->port_id);
+ if (cap->filter)
+ rte_tel_data_add_dict_string(d, "filter",
+ __rte_capture_filter_string(cap->filter));
+ rte_tel_data_add_dict_int(d, "running",
+ rte_atomic_load_explicit(&cap->running,
+ rte_memory_order_relaxed));
+ rte_tel_data_add_dict_uint(d, "snaplen", cap->snaplen);
+ rte_tel_data_add_dict_uint(d, "rx_queues", cap->rx_queues);
+ rte_tel_data_add_dict_uint(d, "tx_queues", cap->tx_queues);
+ capture_sum_stats(cap, &t);
+ rte_spinlock_unlock(&capture_lock);
+
+ rte_tel_data_add_dict_uint(d, "accepted", t.accepted);
+ rte_tel_data_add_dict_uint(d, "filtered", t.filtered);
+ rte_tel_data_add_dict_uint(d, "nombuf", t.nombuf);
+ rte_tel_data_add_dict_uint(d, "ringfull", t.ringfull);
+
+ return 0;
+}
+
+RTE_INIT(capture_telemetry)
+{
+ rte_telemetry_register_cmd("/ethdev/capture/list", capture_list_req,
+ "List ids of active captures. Takes no parameters.");
+ rte_telemetry_register_cmd("/ethdev/capture/stats", capture_stats_req,
+ "Report configuration and counters for a capture. Parameters: id");
+ rte_telemetry_register_cmd_fd_arg("/ethdev/capture/start", capture_start_req, NULL,
+ "Start capture. "
+ "Parameters: port_id,snaplen=N(optional),filter=string(optional)");
+ rte_telemetry_register_cmd("/ethdev/capture/stop", capture_stop_req,
+ "Stop an active capture. Parameters: id");
+}
diff --git a/lib/capture/capture_impl.h b/lib/capture/capture_impl.h
new file mode 100644
index 0000000000..adee734b6c
--- /dev/null
+++ b/lib/capture/capture_impl.h
@@ -0,0 +1,56 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2026 Stephen Hemminger
+ */
+#ifndef CAPTURE_IMPL_H
+#define CAPTURE_IMPL_H
+
+#define RTE_LOGTYPE_CAPTURE rte_capture_logtype
+extern int rte_capture_logtype;
+#define CAPTURE_LOG(level, ...) \
+ RTE_LOG_LINE_PREFIX(level, CAPTURE, "%s(): ", __func__, __VA_ARGS__)
+
+struct rte_capture_filter;
+
+#ifdef RTE_HAS_LIBPCAP
+struct rte_capture_filter *__rte_capture_filter_create(const char *str);
+const char *__rte_capture_filter_string(struct rte_capture_filter *filter);
+void __rte_capture_filter_free(struct rte_capture_filter *filter);
+uint64_t __rte_capture_filter(const struct rte_capture_filter *filter, struct rte_mbuf *mb);
+
+#else /* !RTE_HAS_LIBPCAP */
+
+/* Stub version if pcap is not available */
+static inline struct rte_capture_filter *
+__rte_capture_filter_create(const char *str)
+{
+ RTE_SET_USED(str);
+ return NULL; /* not supported */
+}
+
+static inline const char *
+__rte_capture_filter_string(struct rte_capture_filter *filter)
+{
+ RTE_SET_USED(filter);
+ return NULL;
+}
+
+static inline void
+__rte_capture_filter_free(struct rte_capture_filter *filter)
+{
+ RTE_SET_USED(filter);
+}
+
+/*
+ * This will be zero if the packet doesn't match the filter and non-zero if
+ * the packet matches the filter.
+ */
+static inline uint64_t
+__rte_capture_filter(const struct rte_capture_filter *filter, struct rte_mbuf *mb)
+{
+ RTE_SET_USED(filter);
+ RTE_SET_USED(mb);
+ return 1;
+}
+
+#endif /* !RTE_HAS_LIBPCAP */
+#endif /* CAPTURE_IMPL_H */
diff --git a/lib/capture/filter.c b/lib/capture/filter.c
new file mode 100644
index 0000000000..ecb5e8a765
--- /dev/null
+++ b/lib/capture/filter.c
@@ -0,0 +1,108 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2026 Stephen Hemminger
+ */
+
+#include <stdint.h>
+#include <stdio.h>
+#include <string.h>
+
+#include <pcap/pcap.h>
+
+#include <rte_bpf.h>
+#include <rte_errno.h>
+#include <rte_log.h>
+#include <rte_malloc.h>
+#include <rte_mbuf.h>
+
+#include "capture_impl.h"
+
+struct rte_capture_filter {
+ struct rte_bpf *bpf;
+ struct rte_bpf_jit jit;
+ char expr[]; /* original filter text */
+};
+
+/*
+ * Convert text string into an eBPF program
+ */
+struct rte_capture_filter *
+__rte_capture_filter_create(const char *filter)
+{
+ struct rte_capture_filter *flt = NULL;
+ struct rte_bpf_prm *prm = NULL;
+
+ /* libpcap needs a handle */
+ pcap_t *pcap = pcap_open_dead(DLT_EN10MB, UINT16_MAX);
+ if (!pcap) {
+ CAPTURE_LOG(ERR, "pcap: can not open handle");
+ return NULL;
+ }
+
+ flt = rte_zmalloc("capture_filter", sizeof(*flt) + strlen(filter) + 1, 0);
+ if (flt == NULL) {
+ CAPTURE_LOG(ERR, "capture filter alloc failed");
+ goto error;
+ }
+
+ /* convert string to cBPF program */
+ struct bpf_program bf;
+ if (pcap_compile(pcap, &bf, filter, 1, PCAP_NETMASK_UNKNOWN) != 0) {
+ CAPTURE_LOG(ERR, "pcap: can not compile filter: %s",
+ pcap_geterr(pcap));
+ goto error;
+ }
+ strcpy(flt->expr, filter);
+
+ /* convert cBPF to eBPF */
+ prm = rte_bpf_convert(&bf);
+ pcap_freecode(&bf); /* drop the cBPF program */
+
+ if (prm == NULL) {
+ CAPTURE_LOG(ERR, "BPF convert failed: %s(%d)",
+ rte_strerror(rte_errno), rte_errno);
+ goto error;
+ }
+
+ flt->bpf = rte_bpf_load(prm);
+ if (flt->bpf == NULL) {
+ CAPTURE_LOG(ERR, "BPF load failed: %s(%d)",
+ rte_strerror(rte_errno), rte_errno);
+ goto error;
+ }
+
+ rte_bpf_get_jit(flt->bpf, &flt->jit);
+ if (flt->jit.func == NULL)
+ CAPTURE_LOG(NOTICE, "No JIT available for filter");
+
+ pcap_close(pcap);
+ rte_free(prm);
+ return flt;
+
+error:
+ pcap_close(pcap);
+ rte_free(prm);
+ rte_free(flt);
+ return NULL;
+}
+
+const char *__rte_capture_filter_string(struct rte_capture_filter *filter)
+{
+ return filter ? filter->expr : NULL;
+}
+
+void __rte_capture_filter_free(struct rte_capture_filter *filter)
+{
+ if (filter == NULL)
+ return;
+
+ rte_bpf_destroy(filter->bpf);
+ rte_free(filter);
+}
+
+uint64_t __rte_capture_filter(const struct rte_capture_filter *filter, struct rte_mbuf *mb)
+{
+ if (filter->jit.func)
+ return filter->jit.func(mb);
+ else
+ return rte_bpf_exec(filter->bpf, mb);
+}
diff --git a/lib/capture/meson.build b/lib/capture/meson.build
new file mode 100644
index 0000000000..4dbe0d1a78
--- /dev/null
+++ b/lib/capture/meson.build
@@ -0,0 +1,19 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2026 Stephen Hemminger
+
+if is_windows
+ build = false
+ reason = 'not supported on Windows'
+ subdir_done()
+endif
+
+sources = files('capture.c')
+
+deps += ['ethdev', 'pcapng', 'bpf']
+
+if dpdk_conf.has('RTE_HAS_LIBPCAP')
+ sources += files('filter.c')
+ ext_deps += pcap_dep
+else
+ warning('libpcap is missing, capture filtering will be disabled')
+endif
diff --git a/lib/meson.build b/lib/meson.build
index af5c160cb8..6d9992f61f 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -49,6 +49,7 @@ libraries = [
'lpm',
'member',
'pcapng',
+ 'capture', # depends on pcapng and bpf
'power',
'rawdev',
'regexdev',
--
2.53.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [RFC 3/4] test: add test for capture hooks
2026-06-09 21:02 [RFC 0/4] alternative capture mechanism Stephen Hemminger
2026-06-09 21:02 ` [RFC 1/4] telemetry: allow commands to receive file descriptors Stephen Hemminger
2026-06-09 21:02 ` [RFC 2/4] capture: infrastructure wireshark packet capture Stephen Hemminger
@ 2026-06-09 21:02 ` Stephen Hemminger
2026-06-09 21:02 ` [RFC 4/4] usertools/dpdk-wireshark-extcap.py: script for external capture Stephen Hemminger
3 siblings, 0 replies; 5+ messages in thread
From: Stephen Hemminger @ 2026-06-09 21:02 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger, Thomas Monjalon, Reshma Pattan
Provide tests to exercise telemetry based packet capture.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
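Reviewer note, not part of the commit: the test only compares the first four
bytes of the stream against the Section Header Block type. A stricter check
could also validate the byte-order magic that follows. The sketch below is
illustrative only; `parse_shb` is a hypothetical helper name, and the field
layout is taken from the pcapng format (block type, block total length,
byte-order magic 0x1A2B3C4D, major and minor version).

```python
import struct

SHB_TYPE_BYTES = b"\x0a\x0d\x0d\x0a"   # same bytes as pcapng_shb_magic in the test
BYTE_ORDER_MAGIC = 0x1A2B3C4D


def parse_shb(data):
    """Return (endianness, major, minor) for a pcapng Section Header Block.

    Hypothetical helper for illustration; raises ValueError on bad input.
    """
    if len(data) < 16:
        raise ValueError("need at least 16 bytes of section header")
    # The block type is a byte palindrome, so it reads the same in either order.
    if data[:4] != SHB_TYPE_BYTES:
        raise ValueError("not a pcapng section header block")
    # Bytes 4..7 are the block total length; the magic starts at offset 8.
    for endian in ("<", ">"):
        magic, major, minor = struct.unpack_from(endian + "IHH", data, 8)
        if magic == BYTE_ORDER_MAGIC:
            return ("little" if endian == "<" else "big", major, minor)
    raise ValueError("bad byte-order magic")
```

The same logic would be a few lines of C in test_capture.c if the stricter
check is wanted there.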
MAINTAINERS | 1 +
app/test/meson.build | 1 +
app/test/test_capture.c | 365 ++++++++++++++++++++++++++++++++++++++++
3 files changed, 367 insertions(+)
create mode 100644 app/test/test_capture.c
diff --git a/MAINTAINERS b/MAINTAINERS
index dd359d956e..ff5f31c770 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1724,6 +1724,7 @@ Packet capture
M: Reshma Pattan <reshma.pattan@intel.com>
M: Stephen Hemminger <stephen@networkplumber.org>
F: lib/capture/
+F: app/test/test_capture.c
F: lib/pdump/
F: doc/guides/prog_guide/pdump_lib.rst
F: app/test/test_pdump.*
diff --git a/app/test/meson.build b/app/test/meson.build
index 61024125a7..e1806ec4ca 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -137,6 +137,7 @@ source_file_deps = {
'test_net_ip6.c': ['net'],
'test_pcapng.c': ['net_null', 'net', 'ethdev', 'pcapng', 'bus_vdev'],
'test_pdcp.c': ['eventdev', 'pdcp', 'net', 'timer', 'security'],
+ 'test_capture.c': ['net_ring', 'net', 'ethdev', 'bus_vdev', 'telemetry'],
'test_pdump.c': ['pdump'] + sample_packet_forward_deps,
'test_per_lcore.c': [],
'test_pflock.c': [],
diff --git a/app/test/test_capture.c b/app/test/test_capture.c
new file mode 100644
index 0000000000..ac4dfc43c9
--- /dev/null
+++ b/app/test/test_capture.c
@@ -0,0 +1,365 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2026 Stephen Hemminger
+ */
+
+/*
+ * Functional test for the capture library.
+ *
+ * The capture library has no public C API: it is driven entirely through the
+ * telemetry socket, and the pcapng output is delivered over a file descriptor
+ * passed to the primary process with SCM_RIGHTS. This test therefore behaves
+ * like an external capture tool. It:
+ *
+ * 1. builds a virtual ethdev backed by rings (net_ring), like test_pdump.c;
+ * 2. connects to this process's own telemetry socket;
+ * 3. starts a capture, passing the write end of a pipe as the output fd;
+ * 4. injects packets through the port and checks that
+ * - a pcapng stream appears on the pipe,
+ * - /ethdev/capture/list reports the capture,
+ * - /ethdev/capture/stats reports the expected accepted count;
+ * 5. closes the read end and checks the capture tears itself down and
+ * disappears from /ethdev/capture/list.
+ *
+ * The test is skipped (not failed) if telemetry is not enabled or the ring
+ * driver is not available.
+ */
+
+#include <ctype.h>
+#include <errno.h>
+#include <inttypes.h>
+#include <signal.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+
+#include <sys/select.h>
+#include <sys/socket.h>
+#include <sys/un.h>
+
+#include <rte_cycles.h>
+#include <rte_eal.h>
+#include <rte_ethdev.h>
+#include <rte_eth_ring.h>
+#include <rte_mbuf.h>
+#include <rte_ring.h>
+
+#include "test.h"
+
+#define TELEMETRY_VERSION "v2"
+#define CAPTURE_START "/ethdev/capture/start"
+#define CAPTURE_LIST "/ethdev/capture/list"
+#define CAPTURE_STATS "/ethdev/capture/stats"
+
+#define RING_SIZE 256
+#define NB_MBUFS 1024
+#define MBUF_CACHE 32
+#define NB_PKTS 32
+#define PKT_LEN 64
+#define REPLY_LEN 16384
+
+/* pcapng Section Header Block type, byte-order independent on disk. */
+static const uint8_t pcapng_shb_magic[4] = { 0x0a, 0x0d, 0x0d, 0x0a };
+
+static struct rte_mempool *test_mp;
+static struct rte_ring *rx_ring, *tx_ring;
+static uint16_t test_port = RTE_MAX_ETHPORTS;
+
+/* --- telemetry client helpers ------------------------------------------ */
+
+/* Connect to this process's telemetry socket; -1 (and skip) if unavailable. */
+static int
+tel_connect(void)
+{
+ struct sockaddr_un addr = { .sun_family = AF_UNIX };
+ char buf[REPLY_LEN];
+ int s;
+
+ snprintf(addr.sun_path, sizeof(addr.sun_path), "%s/dpdk_telemetry.%s",
+ rte_eal_get_runtime_dir(), TELEMETRY_VERSION);
+
+ s = socket(AF_UNIX, SOCK_SEQPACKET, 0);
+ if (s < 0)
+ return -1;
+
+ if (connect(s, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
+ close(s);
+ return -1;
+ }
+
+ /* Server greets with an info message; consume it. */
+ if (recv(s, buf, sizeof(buf), 0) <= 0) {
+ close(s);
+ return -1;
+ }
+ return s;
+}
+
+/* Send a command (no fd) and read the reply. */
+static int
+tel_cmd(int s, const char *cmd, char *reply, size_t reply_sz)
+{
+ ssize_t n;
+
+ if (send(s, cmd, strlen(cmd), 0) < 0)
+ return -1;
+ n = recv(s, reply, reply_sz - 1, 0);
+ if (n < 0)
+ return -1;
+ reply[n] = '\0';
+ return 0;
+}
+
+/* Send a command passing one fd as SCM_RIGHTS, discard the reply. */
+static int
+tel_cmd_fd(int s, const char *cmd, int fd)
+{
+ char cbuf[CMSG_SPACE(sizeof(int))] = { 0 };
+ char reply[REPLY_LEN];
+ struct iovec iov = { .iov_base = (void *)(uintptr_t)cmd, .iov_len = strlen(cmd) };
+ struct msghdr msg = {
+ .msg_iov = &iov,
+ .msg_iovlen = 1,
+ .msg_control = cbuf,
+ .msg_controllen = sizeof(cbuf),
+ };
+ struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
+
+ cmsg->cmsg_level = SOL_SOCKET;
+ cmsg->cmsg_type = SCM_RIGHTS;
+ cmsg->cmsg_len = CMSG_LEN(sizeof(int));
+ memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
+
+ if (sendmsg(s, &msg, 0) < 0)
+ return -1;
+ if (recv(s, reply, sizeof(reply), 0) < 0)
+ return -1;
+ return 0;
+}
+
+/* Minimal JSON scanning: find "key" and read the unsigned number after it. */
+static int
+json_uint(const char *s, const char *key, uint64_t *out)
+{
+ const char *p = strstr(s, key);
+
+ if (p == NULL)
+ return -1;
+ for (p += strlen(key); *p != '\0' && !isdigit((unsigned char)*p); p++)
+ ;
+ if (*p == '\0')
+ return -1;
+ *out = strtoull(p, NULL, 10);
+ return 0;
+}
+
+/* Read the first element of the array in a list reply; -1 if empty/absent. */
+static int
+json_first_array_uint(const char *s, uint64_t *out)
+{
+ const char *p = strchr(s, '[');
+
+ if (p == NULL)
+ return -1;
+ for (p++; *p == ' '; p++)
+ ;
+ if (*p == ']' || !isdigit((unsigned char)*p))
+ return -1;
+ *out = strtoull(p, NULL, 10);
+ return 0;
+}
+
+/* --- packet injection --------------------------------------------------- */
+
+/* Push up to NB_PKTS minimal packets through the port's Rx path. */
+static int
+inject_rx(unsigned int count)
+{
+ struct rte_mbuf *bufs[NB_PKTS];
+ uint16_t got;
+
+ if (count > NB_PKTS)
+ count = NB_PKTS;
+
+ for (unsigned int i = 0; i < count; i++) {
+ struct rte_mbuf *m = rte_pktmbuf_alloc(test_mp);
+
+ if (m == NULL) {
+ rte_pktmbuf_free_bulk(bufs, i);
+ return -1;
+ }
+ m->pkt_len = m->data_len = PKT_LEN;
+ memset(rte_pktmbuf_mtod(m, void *), 0, PKT_LEN);
+ bufs[i] = m;
+ }
+
+ if (rte_ring_enqueue_bulk(rx_ring, (void **)bufs, count, NULL) != count) {
+ rte_pktmbuf_free_bulk(bufs, count);
+ return -1;
+ }
+
+ /* Pulling from the port runs the capture Rx callback on each packet. */
+ got = rte_eth_rx_burst(test_port, 0, bufs, count);
+ rte_pktmbuf_free_bulk(bufs, got);
+ return 0;
+}
+
+/* --- fixture ------------------------------------------------------------ */
+
+static int
+build_port(void)
+{
+ struct rte_eth_conf conf = { 0 };
+ int ret;
+
+ test_mp = rte_pktmbuf_pool_create("capture_test_mp", NB_MBUFS, MBUF_CACHE,
+ 0, RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());
+ if (test_mp == NULL)
+ return -1;
+
+ rx_ring = rte_ring_create("capture_test_rx", RING_SIZE, rte_socket_id(),
+ RING_F_SP_ENQ | RING_F_SC_DEQ);
+ tx_ring = rte_ring_create("capture_test_tx", RING_SIZE, rte_socket_id(),
+ RING_F_SP_ENQ | RING_F_SC_DEQ);
+ if (rx_ring == NULL || tx_ring == NULL)
+ return -1;
+
+ ret = rte_eth_from_rings("net_capture_test", &rx_ring, 1, &tx_ring, 1, rte_socket_id());
+ if (ret < 0)
+ return -1;
+ test_port = ret;
+
+ if (rte_eth_dev_configure(test_port, 1, 1, &conf) < 0)
+ return -1;
+ if (rte_eth_rx_queue_setup(test_port, 0, RING_SIZE, rte_socket_id(), NULL, test_mp) < 0)
+ return -1;
+ if (rte_eth_tx_queue_setup(test_port, 0, RING_SIZE, rte_socket_id(), NULL) < 0)
+ return -1;
+ if (rte_eth_dev_start(test_port) < 0)
+ return -1;
+
+ return 0;
+}
+
+static void
+teardown_port(void)
+{
+ if (test_port != RTE_MAX_ETHPORTS) {
+ rte_eth_dev_stop(test_port);
+ rte_eth_dev_close(test_port);
+ test_port = RTE_MAX_ETHPORTS;
+ }
+ rte_ring_free(rx_ring);
+ rte_ring_free(tx_ring);
+ rte_mempool_free(test_mp);
+ rx_ring = tx_ring = NULL;
+ test_mp = NULL;
+}
+
+/* --- the test ----------------------------------------------------------- */
+
+static int
+test_capture(void)
+{
+ char cmd[128], reply[REPLY_LEN], pcapng[REPLY_LEN];
+ int sock = -1, pipefd[2] = { -1, -1 };
+ int ret = TEST_FAILED;
+ uint64_t id, accepted;
+ struct timeval tv;
+ fd_set rfds;
+ ssize_t n;
+
+ /* The drain thread writes to the pipe; a closed reader must give EPIPE,
+ * not a fatal SIGPIPE. The capture thread blocks SIGPIPE for itself, but
+ * ignore it process-wide here too so the test cannot be killed by it.
+ */
+ signal(SIGPIPE, SIG_IGN);
+
+ sock = tel_connect();
+ if (sock < 0) {
+ printf("telemetry socket not available, skipping\n");
+ return TEST_SKIPPED;
+ }
+
+ if (build_port() < 0) {
+ printf("could not build ring-backed test port, skipping\n");
+ ret = TEST_SKIPPED;
+ goto out;
+ }
+
+ if (pipe(pipefd) < 0)
+ goto out;
+
+ /* Start the capture, handing it the write end of the pipe. */
+ snprintf(cmd, sizeof(cmd), "%s,%u", CAPTURE_START, test_port);
+ TEST_ASSERT_SUCCESS(tel_cmd_fd(sock, cmd, pipefd[1]),
+ "capture start command failed");
+
+ /* The library now holds its own dup of the write end; drop ours so the
+ * capture sees a hangup once we close the read end below.
+ */
+ close(pipefd[1]);
+ pipefd[1] = -1;
+
+ /* Inject traffic. Rx callbacks run synchronously inside rx_burst, so the
+ * accepted counter is up to date as soon as this returns.
+ */
+ TEST_ASSERT_SUCCESS(inject_rx(NB_PKTS), "packet injection failed");
+
+ /* A pcapng stream (at least the section header) must appear. */
+ FD_ZERO(&rfds);
+ FD_SET(pipefd[0], &rfds);
+ tv = (struct timeval){ .tv_sec = 2 };
+ TEST_ASSERT(select(pipefd[0] + 1, &rfds, NULL, NULL, &tv) > 0,
+ "no pcapng output within timeout");
+ n = read(pipefd[0], pcapng, sizeof(pcapng));
+ TEST_ASSERT(n >= 4, "short pcapng read (%zd)", n);
+ TEST_ASSERT(memcmp(pcapng, pcapng_shb_magic, sizeof(pcapng_shb_magic)) == 0,
+ "output does not start with a pcapng section header block");
+
+ /* The capture must show up in the list. */
+ TEST_ASSERT_SUCCESS(tel_cmd(sock, CAPTURE_LIST, reply, sizeof(reply)),
+ "capture list command failed");
+ TEST_ASSERT_SUCCESS(json_first_array_uint(reply, &id),
+ "no capture id in list reply: %s", reply);
+
+ /* Stats must report exactly the packets we injected. */
+ snprintf(cmd, sizeof(cmd), "%s,%" PRIu64, CAPTURE_STATS, id);
+ TEST_ASSERT_SUCCESS(tel_cmd(sock, cmd, reply, sizeof(reply)),
+ "capture stats command failed");
+ TEST_ASSERT_SUCCESS(json_uint(reply, "\"accepted\"", &accepted),
+ "no accepted counter in stats reply: %s", reply);
+ TEST_ASSERT_EQUAL(accepted, (uint64_t)NB_PKTS,
+ "accepted %" PRIu64 " != %d", accepted, NB_PKTS);
+
+ /* Close the reader: the capture should tear itself down. The drain
+ * thread notices on its next write or idle poll; nudge it with traffic.
+ */
+ close(pipefd[0]);
+ pipefd[0] = -1;
+ inject_rx(NB_PKTS);
+
+ for (int i = 0; i < 200; i++) { /* up to ~2s */
+ TEST_ASSERT_SUCCESS(tel_cmd(sock, CAPTURE_LIST, reply, sizeof(reply)),
+ "capture list command failed");
+ if (json_first_array_uint(reply, &id) < 0) {
+ ret = TEST_SUCCESS;
+ goto out;
+ }
+ rte_delay_ms(10);
+ }
+ printf("capture did not tear down after reader closed: %s\n", reply);
+
+out:
+ if (pipefd[0] >= 0)
+ close(pipefd[0]);
+ if (pipefd[1] >= 0)
+ close(pipefd[1]);
+ if (sock >= 0)
+ close(sock);
+ teardown_port();
+ return ret;
+}
+
+REGISTER_TEST_COMMAND(capture_autotest, test_capture);
--
2.53.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [RFC 4/4] usertools/dpdk-wireshark-extcap.py: script for external capture
2026-06-09 21:02 [RFC 0/4] alternative capture mechanism Stephen Hemminger
` (2 preceding siblings ...)
2026-06-09 21:02 ` [RFC 3/4] test: add test for capture hooks Stephen Hemminger
@ 2026-06-09 21:02 ` Stephen Hemminger
3 siblings, 0 replies; 5+ messages in thread
From: Stephen Hemminger @ 2026-06-09 21:02 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger, Thomas Monjalon, Reshma Pattan, Robin Jarry
Provide a glue script that Wireshark can use to access
telemetry-based packet capture. It is dual licensed because
it may be desirable to put this in the Wireshark repository.
See https://www.wireshark.org/docs/man-pages/extcap.html
Also add MAINTAINERS and release note.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
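Reviewer note, not part of the commit: the core of the plugin is handing a
file descriptor to the application as SCM_RIGHTS ancillary data alongside the
`/ethdev/capture/start,<port>` command. The sketch below shows that hand-off
in isolation. It is a minimal illustration under stated assumptions, not the
script's actual code: a local `socketpair()` stands in for the telemetry
socket (which the test opens as SOCK_SEQPACKET), `send_start`, `serve_one`
and `demo` are hypothetical names, and the bytes written are a placeholder
rather than a real pcapng block. It needs Python 3.9 or later for
`socket.send_fds` and `socket.recv_fds`.

```python
import os
import socket


def send_start(sock, port_id, out_fd):
    """Client side: send the start command with out_fd attached via SCM_RIGHTS."""
    cmd = f"/ethdev/capture/start,{port_id}".encode()
    socket.send_fds(sock, [cmd], [out_fd])


def serve_one(sock):
    """Stand-in for the application: receive the command and write to the fd."""
    data, fds, _flags, _addr = socket.recv_fds(sock, 1024, 1)
    with os.fdopen(fds[0], "wb") as out:
        out.write(b"\x0a\x0d\x0d\x0a")  # placeholder, not a complete pcapng block
    return data.decode()


def demo():
    """Pass the write end of a pipe across a socketpair and read what comes back."""
    client, server = socket.socketpair()   # assumption: stands in for telemetry
    rd, wr = os.pipe()
    try:
        send_start(client, 0, wr)
        os.close(wr)                        # receiver now owns its own copy
        cmd = serve_one(server)
        payload = os.read(rd, 16)
    finally:
        os.close(rd)
        client.close()
        server.close()
    return cmd, payload
```

Closing the sender's copy of the write end right after the send mirrors what
the functional test does, so that closing the read end later is seen as a
hangup by the writer.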
MAINTAINERS | 2 +
doc/guides/tools/index.rst | 1 +
doc/guides/tools/wireshark_extcap.rst | 155 +++++++++++++++
usertools/dpdk-wireshark-extcap.py | 274 ++++++++++++++++++++++++++
4 files changed, 432 insertions(+)
create mode 100644 doc/guides/tools/wireshark_extcap.rst
create mode 100755 usertools/dpdk-wireshark-extcap.py
diff --git a/MAINTAINERS b/MAINTAINERS
index ff5f31c770..7cb8782910 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1725,6 +1725,8 @@ M: Reshma Pattan <reshma.pattan@intel.com>
M: Stephen Hemminger <stephen@networkplumber.org>
F: lib/capture/
F: app/test/test_capture.c
+F: usertools/dpdk-wireshark-extcap.py
+F: doc/guides/tools/wireshark_extcap.rst
F: lib/pdump/
F: doc/guides/prog_guide/pdump_lib.rst
F: app/test/test_pdump.*
diff --git a/doc/guides/tools/index.rst b/doc/guides/tools/index.rst
index 8ec429ec53..580c7d28b1 100644
--- a/doc/guides/tools/index.rst
+++ b/doc/guides/tools/index.rst
@@ -14,6 +14,7 @@ DPDK Tools User Guides
pmdinfo
dumpcap
pdump
+ wireshark_extcap
dmaperf
flow-perf
securityperf
diff --git a/doc/guides/tools/wireshark_extcap.rst b/doc/guides/tools/wireshark_extcap.rst
new file mode 100644
index 0000000000..fae39fd393
--- /dev/null
+++ b/doc/guides/tools/wireshark_extcap.rst
@@ -0,0 +1,155 @@
+.. SPDX-License-Identifier: BSD-3-Clause
+ Copyright(c) 2026 Stephen Hemminger
+
+Wireshark Extcap Plugin
+=======================
+
+The ``dpdk-wireshark-extcap.py`` script is an external capture (extcap)
+plugin that lets Wireshark capture live traffic from the Ethernet ports of a
+running DPDK application. Each DPDK port appears as a capture interface in the
+Wireshark interface list, alongside the host's own network interfaces.
+
+The plugin does not attach to the DPDK application as a secondary process and
+never touches packet data itself. It connects to the application's telemetry
+socket, asks it to start capturing, and hands Wireshark's capture pipe to the
+application over that socket. The DPDK capture library writes pcapng packets
+directly into the pipe; the plugin only sets the capture up and tears it down
+when Wireshark closes the pipe.
+
+
+Requirements
+------------
+
+* A DPDK application built with the capture library and with telemetry
+ enabled. Telemetry is enabled by default.
+
+* Wireshark with extcap support.
+
+* The plugin, and therefore Wireshark, must run as the same user as the DPDK
+ application. See `Permissions`_.
+
+
+Installation
+------------
+
+For Wireshark to discover the plugin it must be present in an extcap
+directory. The configured locations are listed in Wireshark under
+*Help > About Wireshark > Folders*. Copy or symbolically link the script into
+the personal extcap directory, for example::
+
+ ln -s $RTE_SDK/usertools/dpdk-wireshark-extcap.py \
+ ~/.local/lib/wireshark/extcap/
+
+The DPDK ports then appear in the interface list the next time the capture
+options dialog is opened.
+
+
+Usage
+-----
+
+In normal use the plugin is not run by hand; Wireshark invokes it. The ports
+of a running DPDK application appear in the interface list as
+``DPDK <name> (port <N>)``, where ``<name>`` is the device name reported by
+the application, such as ``net_tap0``. Selecting a port and starting the
+capture is all that is required.
+
+The plugin can also be run directly, which is useful for confirming that a
+DPDK application is reachable::
+
+ $ usertools/dpdk-wireshark-extcap.py --extcap-interfaces
+ extcap {version=0.1}{display=DPDK telemetry capture}
+ interface {value=dpdk:0}{display=DPDK net_tap0 (port 0)}
+
+
+Capture options
+---------------
+
+The following options are offered in the Wireshark capture options dialog for
+a DPDK interface:
+
+Snapshot length
+ Number of bytes captured from each packet. ``0`` captures the whole
+ packet. The default is 262144.
+
+Capture filter
+ A libpcap filter expression, applied by the DPDK application to the
+ captured traffic.
+
+
+Permissions
+-----------
+
+The DPDK runtime directory is created mode ``0700``, so only the user that
+started the DPDK application can reach its telemetry socket. Wireshark, and
+the plugin it launches, must run as that same user. When run as a different
+user, the interface list is simply empty; running the plugin directly with
+``--extcap-interfaces`` prints a diagnostic to standard error explaining the
+permission failure.
+
+No privilege beyond access to the telemetry socket is required: if you can
+run ``dpdk-dumpcap`` against an application, you can capture from it with this
+plugin.
+
+
+Selecting a DPDK application
+----------------------------
+
+A host usually runs a single DPDK application, started with the default
+file-prefix, and no configuration is needed: its ports appear automatically.
+
+Running several DPDK applications on one host is uncommon. Each primary
+process needs its own dedicated cores, memory, and network ports, so it is
+generally done only on large hosts deliberately partitioned for the purpose.
+In that case each application is started with a distinct ``--file-prefix`` so
+that its runtime state is kept separate.
+
+Each file-prefix is an independent namespace, much like a network namespace.
+The plugin operates within exactly one of them at a time and lists only the
+ports of the application using that prefix. The prefix is selected by the
+``DPDK_EXTCAP_FILE_PREFIX`` environment variable, which corresponds to the EAL
+``--file-prefix`` option and defaults to ``rte`` (the EAL default). It must be
+present in the environment that Wireshark inherits, so it has to be set before
+Wireshark is launched, not from within the capture dialog::
+
+ DPDK_EXTCAP_FILE_PREFIX=myapp wireshark
+
+The prefix cannot be chosen per capture from the Wireshark GUI, by design.
+Wireshark builds the interface list once, before any interface or its options
+are selected, so the prefix must be known at enumeration time. It is also
+deliberately not a per-interface option: the device names in the list are
+resolved against one application, and a per-capture override would let the
+name shown disagree with the port actually captured.
+
+
+Environment variables
+----------------------
+
+``DPDK_EXTCAP_FILE_PREFIX``
+ Selects which DPDK application, by EAL file-prefix, the plugin operates
+ on. Defaults to ``rte``. See `Selecting a DPDK application`_.
+
+``DPDK_EXTCAP_PATH``
+ Overrides the base DPDK runtime directory that holds the per-prefix
+ subdirectories. Use it when the runtime directory is in a non-standard
+ location. It composes with ``DPDK_EXTCAP_FILE_PREFIX``: this variable
+ gives the base directory, the prefix selects the subdirectory within it.
+
+
+Troubleshooting
+---------------
+
+The DPDK ports do not appear in Wireshark
+ Confirm the application is running and was built with the capture library
+ and telemetry. Confirm Wireshark runs as the same user as the application;
+ see `Permissions`_. If the application was started with a non-default
+ ``--file-prefix``, set ``DPDK_EXTCAP_FILE_PREFIX`` to match before
+ launching Wireshark; see `Selecting a DPDK application`_.
+
+ Running the plugin directly with ``--extcap-interfaces`` prints
+ diagnostics to standard error that the Wireshark GUI does not surface.
+
+A port is listed as ``portN`` instead of a device name
+ The port was reported by the application, but its details could not be
+ read, usually because the application stopped between listing and naming
+ its ports. A capture started against it will fail; restart the
+ application.
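
The way the two environment variables documented above compose can be
sketched as follows. This is only an illustration that mirrors dpdk_dir()
and socket_path() in the script below; the helper name
resolve_socket_path and its env/euid parameters are hypothetical and exist
only so the lookup can be exercised without touching the real environment.

```python
import os

TELEMETRY_SOCKET = "dpdk_telemetry.v2"  # socket name used by the script
DEFAULT_PREFIX = "rte"                  # EAL default file-prefix


def resolve_socket_path(env, euid):
    """Hypothetical helper: build the telemetry socket path from an
    environment mapping and an effective uid."""
    base = env.get("DPDK_EXTCAP_PATH")
    if not base:
        # Same fallback order as dpdk_dir(): /var/run for root, otherwise
        # XDG_RUNTIME_DIR, otherwise /tmp; then the "dpdk" subdirectory.
        runtime = "/var/run" if euid == 0 else env.get("XDG_RUNTIME_DIR", "/tmp")
        base = os.path.join(runtime, "dpdk")
    # The prefix selects the per-application subdirectory within the base.
    prefix = env.get("DPDK_EXTCAP_FILE_PREFIX", DEFAULT_PREFIX)
    return os.path.join(base, prefix, TELEMETRY_SOCKET)


if __name__ == "__main__":
    print(resolve_socket_path({}, 1000))
    print(resolve_socket_path({"DPDK_EXTCAP_FILE_PREFIX": "myapp"}, 0))
    print(resolve_socket_path({"DPDK_EXTCAP_PATH": "/srv/dpdk"}, 1000))
```

The override replaces only the base directory, so a prefix set alongside it
still selects the subdirectory.
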
diff --git a/usertools/dpdk-wireshark-extcap.py b/usertools/dpdk-wireshark-extcap.py
new file mode 100755
index 0000000000..2d710bdf5c
--- /dev/null
+++ b/usertools/dpdk-wireshark-extcap.py
@@ -0,0 +1,274 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0-or-later
+# Copyright(c) 2026 Stephen Hemminger
+
+"""
+Wireshark extcap plugin for live capture from DPDK ethdev ports.
+
+Capture path: this plugin opens the FIFO that Wireshark hands it, then passes
+that file descriptor to the DPDK primary process over the telemetry socket
+(via SCM_RIGHTS). The DPDK 'capture' library writes pcapng straight into the
+FIFO; this plugin never touches packet data. Teardown is implicit: when
+Wireshark closes the read end, both the DPDK writer and this plugin see the
+hangup.
+
+Interface values are encoded as 'dpdk:<port>'. The DPDK file-prefix is
+ambient, not part of the interface value: it comes from
+DPDK_EXTCAP_FILE_PREFIX (default 'rte') in the environment Wireshark inherits,
+so one invocation is scoped to a single primary like a namespace. See
+doc/guides/tools/wireshark_extcap.rst for the rationale and the multi-prefix
+case.
+"""
+
+import argparse
+import array
+import json
+import os
+import select
+import signal
+import socket
+import sys
+
+EXTCAP_VERSION = "0.1"
+TELEMETRY_SOCKET = "dpdk_telemetry.v2"
+CAPTURE_CMD = "/ethdev/capture/start"
+ETHDEV_LIST = "/ethdev/list"
+ETHDEV_INFO = "/ethdev/info"
+DEFAULT_SNAPLEN = 262144
+DEFAULT_PREFIX = "rte"  # EAL HUGEFILE_PREFIX_DEFAULT
+DLT_EN10MB = 1
+
+
+# --- DPDK runtime directory / socket discovery ---------------------------
+
+
+def dpdk_dir():
+    """Directory holding the per-file-prefix runtime subdirectories."""
+    override = os.environ.get("DPDK_EXTCAP_PATH")
+    if override:
+        return override
+    if os.geteuid() == 0:
+        base = "/var/run"
+    else:
+        base = os.environ.get("XDG_RUNTIME_DIR", "/tmp")
+    return os.path.join(base, "dpdk")
+
+
+def file_prefix():
+    """The EAL file-prefix to operate on; see the module docstring."""
+    return os.environ.get("DPDK_EXTCAP_FILE_PREFIX", DEFAULT_PREFIX)
+
+
+def socket_path():
+    return os.path.join(dpdk_dir(), file_prefix(), TELEMETRY_SOCKET)
+
+
+# --- Telemetry transport -------------------------------------------------
+
+
+class Telemetry:
+    """Minimal client for the DPDK v2 telemetry socket (SOCK_SEQPACKET)."""
+
+    def __init__(self, path):
+        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_SEQPACKET)
+        self.sock.connect(path)
+        info = json.loads(self.sock.recv(1024).decode())
+        self.max_output_len = info.get("max_output_len", 16384)
+        self.pid = info.get("pid")
+        self.version = info.get("version")
+
+    def command(self, cmd, fds=None):
+        """Send a command, optionally with file descriptors as ancillary data.
+
+        Returns the decoded JSON reply, or None if the peer sent nothing.
+        """
+        if fds:
+            fd_arr = array.array("i", fds)
+            self.sock.sendmsg(
+                [cmd.encode()], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fd_arr)]
+            )
+        else:
+            self.sock.send(cmd.encode())
+
+        reply = self.sock.recv(self.max_output_len)
+        if not reply:
+            return None
+        return json.loads(reply.decode())
+
+    def close(self):
+        self.sock.close()
+
+
+# --- extcap query operations --------------------------------------------
+
+
+def port_name(tel, port):
+    """Device name for a port via /ethdev/info, or 'port<N>' if unreadable."""
+    try:
+        reply = tel.command(f"{ETHDEV_INFO},{port}")
+    except OSError:
+        reply = None
+    info = (reply or {}).get(ETHDEV_INFO) or {}
+    return info.get("name") or f"port{port}"
+
+
+def cmd_interfaces():
+    print(f"extcap {{version={EXTCAP_VERSION}}}{{display=DPDK telemetry capture}}")
+    path = socket_path()
+    try:
+        tel = Telemetry(path)
+    except FileNotFoundError:
+        # No telemetry socket -> no DPDK primary with this file-prefix.
+        return
+    except PermissionError:
+        # The runtime dir is mode 0700; a different user cannot traverse it.
+        sys.stderr.write(
+            f"cannot access {path}: permission denied. The DPDK runtime "
+            "directory is created mode 0700, so capture must run as the same "
+            "user as the DPDK application (or set DPDK_EXTCAP_PATH / "
+            "DPDK_EXTCAP_FILE_PREFIX).\n"
+        )
+        return
+
+    # One connection for the whole enumeration: list the ports, then name
+    # each over the same socket (each telemetry connection costs the primary
+    # a handler thread).
+    try:
+        reply = tel.command(ETHDEV_LIST)
+        ports = (reply or {}).get(ETHDEV_LIST) or []
+        for port in ports:
+            name = port_name(tel, port)
+            print(
+                f"interface {{value=dpdk:{port}}}"
+                f"{{display=DPDK {name} (port {port})}}"
+            )
+    except OSError as e:
+        sys.stderr.write(f"cannot query {path}: {e}\n")
+    finally:
+        tel.close()
+
+
+def cmd_dlts(_iface):
+    print(f"dlt {{number={DLT_EN10MB}}}{{name=EN10MB}}{{display=Ethernet}}")
+
+
+def cmd_config(_iface):
+    print(
+        f"arg {{number=0}}{{call=--snaplen}}{{display=Snapshot length}}"
+        f"{{tooltip=Bytes captured per packet (0 = whole packet)}}"
+        f"{{type=integer}}{{range=0,{DEFAULT_SNAPLEN}}}"
+        f"{{default={DEFAULT_SNAPLEN}}}{{group=Capture}}"
+    )
+
+
+# --- capture -------------------------------------------------------------
+
+
+def parse_iface(iface):
+    """Return the port number from a 'dpdk:<port>' interface value."""
+    scheme, sep, port = iface.partition(":")
+    if scheme != "dpdk" or not sep:
+        raise SystemExit(f"unsupported interface '{iface}'")
+    try:
+        return int(port)
+    except ValueError:
+        raise SystemExit(f"malformed interface '{iface}'")
+
+
+def wait_for_stop(fifo_fd):
+    """Block until Wireshark stops us: either it closes the FIFO read end
+    (POLLERR on our write fd) or it sends SIGINT/SIGTERM."""
+    rd, wr = os.pipe()
+    os.set_blocking(wr, False)
+    signal.set_wakeup_fd(wr)
+    for sig in (signal.SIGINT, signal.SIGTERM):
+        signal.signal(sig, lambda *_: None)
+
+    poller = select.poll()
+    poller.register(fifo_fd, select.POLLERR)
+    poller.register(rd, select.POLLIN)
+    poller.poll()
+
+    signal.set_wakeup_fd(-1)
+    os.close(rd)
+    os.close(wr)
+
+
+def cmd_capture(iface, fifo, snaplen, cfilter):
+    port = parse_iface(iface)
+    path = socket_path()
+
+    # Open the FIFO Wireshark created; this blocks until it has the read end.
+    fifo_fd = os.open(fifo, os.O_WRONLY)
+
+    try:
+        tel = Telemetry(path)
+    except OSError as e:
+        os.close(fifo_fd)
+        raise SystemExit(f"cannot connect to DPDK telemetry at {path}: {e}")
+
+    params = [str(port)]
+    if snaplen is not None:
+        params.append(f"snaplen={snaplen}")
+    if cfilter:
+        params.append(f"filter={cfilter}")
+    cmd = CAPTURE_CMD + "," + ",".join(params)
+
+    try:
+        tel.command(cmd, fds=[fifo_fd])
+    except OSError as e:
+        os.close(fifo_fd)
+        tel.close()
+        raise SystemExit(f"capture start failed: {e}")
+
+    # DPDK now holds its own dup of the FIFO write end. We keep ours only as a
+    # hangup sentinel: when Wireshark closes the read end we get POLLERR, the
+    # same event that stops the DPDK-side writer.
+    wait_for_stop(fifo_fd)
+
+    os.close(fifo_fd)
+    tel.close()
+
+
+# --- entry point ---------------------------------------------------------
+
+
+def main():
+    p = argparse.ArgumentParser(
+        prog="dpdk-wireshark-extcap.py",
+        allow_abbrev=False,
+        description="Wireshark extcap plugin for live packet capture from the "
+        "Ethernet ports of a running DPDK application. Normally "
+        "invoked by Wireshark; see the DPDK Wireshark extcap guide.",
+    )
+    p.add_argument("--version", action="version", version=f"%(prog)s {EXTCAP_VERSION}")
+
+    p.add_argument("--extcap-interfaces", action="store_true")
+    p.add_argument("--extcap-dlts", action="store_true")
+    p.add_argument("--extcap-config", action="store_true")
+    p.add_argument("--capture", action="store_true")
+    p.add_argument("--extcap-interface")
+    p.add_argument("--fifo")
+    p.add_argument("--extcap-capture-filter")
+    p.add_argument("--extcap-version")
+    p.add_argument("--snaplen", type=int)
+    args, _ = p.parse_known_args()
+
+    if args.extcap_interfaces:
+        cmd_interfaces()
+    elif args.extcap_dlts:
+        cmd_dlts(args.extcap_interface)
+    elif args.extcap_config:
+        cmd_config(args.extcap_interface)
+    elif args.capture:
+        if not args.extcap_interface or not args.fifo:
+            raise SystemExit("--capture requires --extcap-interface and --fifo")
+        cmd_capture(
+            args.extcap_interface, args.fifo, args.snaplen, args.extcap_capture_filter
+        )
+    else:
+        raise SystemExit("no extcap operation specified")
+
+
+if __name__ == "__main__":
+    main()
--
2.53.0
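
For readers unfamiliar with the capture path described in the script's
docstring, here is a minimal, self-contained sketch of the two mechanisms
it relies on: passing a descriptor as SCM_RIGHTS ancillary data over a
SOCK_SEQPACKET socket, and noticing on the write end that the reader has
gone away. It is not the capture library's code. A socketpair stands in for
the telemetry socket, an anonymous pipe stands in for the Wireshark FIFO,
and the helper names (send_with_fd, recv_with_fd, demo) are hypothetical.

```python
import array
import os
import select
import socket


def send_with_fd(sock, payload, fd):
    """Send payload with one descriptor attached as SCM_RIGHTS data."""
    fds = array.array("i", [fd])
    sock.sendmsg([payload], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)])


def recv_with_fd(sock, bufsize=1024):
    """Receive one message and the first descriptor passed with it, if any."""
    fds = array.array("i")
    data, ancdata, _flags, _addr = sock.recvmsg(
        bufsize, socket.CMSG_LEN(fds.itemsize)
    )
    for level, ctype, cdata in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Keep only whole integers; the kernel may pad the buffer.
            fds.frombytes(cdata[: len(cdata) - (len(cdata) % fds.itemsize)])
    return data, (fds[0] if fds else None)


def demo():
    # Stand-ins: socketpair for the telemetry socket, pipe for the FIFO.
    client, server = socket.socketpair(socket.AF_UNIX, socket.SOCK_SEQPACKET)
    rd, wr = os.pipe()
    with client, server:
        # "Plugin" side: send the command along with the write end.
        send_with_fd(client, b"/ethdev/capture/start,0", wr)

        # "Application" side: receives its own copy of the write end.
        cmd, passed = recv_with_fd(server)
        os.write(passed, b"pcapng bytes")
        os.close(passed)

        # "Wireshark" side: read what was written, then close the read end.
        got = os.read(rd, 64)
        os.close(rd)

        # The sender's retained write end now reports the hangup.
        poller = select.poll()
        poller.register(wr, select.POLLERR)
        events = poller.poll(1000)
        os.close(wr)
    hangup = any(ev & (select.POLLERR | select.POLLHUP) for _fd, ev in events)
    return cmd, got, hangup


if __name__ == "__main__":
    print(demo())
```

The sender keeps its own copy of the write end after passing it, which is
what lets it act as a hangup sentinel. The sketch assumes Linux pipe
semantics for the final poll.
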