* [PATCH v3 1/2] net/af-xdp: Support pinned map path for AF_XDP sockets
@ 2025-06-04 11:29 Daniel Borkmann via
2025-06-04 11:29 ` [PATCH v3 2/2] net/af-xdp: Fix up cleanup path upon failure in queue creation Daniel Borkmann via
2025-06-17 11:59 ` [PATCH v3 1/2] net/af-xdp: Support pinned map path for AF_XDP sockets Ilya Maximets
0 siblings, 2 replies; 8+ messages in thread
From: Daniel Borkmann via @ 2025-06-04 11:29 UTC (permalink / raw)
To: qemu-devel; +Cc: daniel, Ilya Maximets, Jason Wang, Anton Protopopov
Extend 'inhibit=on' setting with the option to specify a pinned XSK map
path along with a starting index (default 0) to push the created XSK
sockets into. Example usage:
# ./build/qemu-system-x86_64 [...] \
-netdev af-xdp,ifname=enp2s0f0np0,id=net0,mode=native,queues=2,start-queue=14,inhibit=on,map-path=/sys/fs/bpf/xsks_map,map-start-index=14 \
-device virtio-net-pci,netdev=net0 [...]
This is useful for the case where an existing XDP program with XSK map
is present on the AF_XDP supported phys device and the XSK map is not
yet populated. For example, the former could have been pre-loaded onto
the netdevice by a control plane, which later launches qemu to populate
it with XSK sockets.
Normally, the main idea behind 'inhibit=on' is that the qemu instance
doesn't need to have a lot of privileges to use the pre-loaded program
and the pre-created sockets, but this mentioned use-case here is different
where qemu still needs privileges to create the sockets.
The 'map-start-index' parameter is optional and defaults to 0. It allows
flexible placement of the XSK sockets, and is up to the user to specify
when the XDP program with XSK map was already preloaded. In the simplest
case the queue-to-map-slot mapping is just 1:1 based on ctx->rx_queue_index
but the user might as well have a different scheme (or smaller map size,
e.g. ctx->rx_queue_index % max_size) to push the inbound traffic to one
of the XSK sockets.
Note that the bpf_xdp_query_id() is now only tested for 'inhibit=off'
since only in the latter case the libxdp takes care of installing the
XDP program which was installed based on the s->xdp_flags pointing to
either driver or skb mode. For 'inhibit=on' we don't make any assumptions
and neither go down the path of probing all possible options in which
way the user installed the XDP program.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Ilya Maximets <i.maximets@ovn.org>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Anton Protopopov <aspsk@isovalent.com>
---
net/af-xdp.c | 80 +++++++++++++++++++++++++++++++++++++++++++++----
qapi/net.json | 29 +++++++++++-------
qemu-options.hx | 23 ++++++++++++--
3 files changed, 114 insertions(+), 18 deletions(-)
diff --git a/net/af-xdp.c b/net/af-xdp.c
index 01c5fb914e..b83d9bc47f 100644
--- a/net/af-xdp.c
+++ b/net/af-xdp.c
@@ -52,6 +52,10 @@ typedef struct AFXDPState {
uint32_t n_queues;
uint32_t xdp_flags;
bool inhibit;
+
+ char *map_path;
+ int map_fd;
+ uint32_t map_start_index;
} AFXDPState;
#define AF_XDP_BATCH_SIZE 64
@@ -261,6 +265,7 @@ static void af_xdp_send(void *opaque)
static void af_xdp_cleanup(NetClientState *nc)
{
AFXDPState *s = DO_UPCAST(AFXDPState, nc, nc);
+ int idx;
qemu_purge_queued_packets(nc);
@@ -282,6 +287,18 @@ static void af_xdp_cleanup(NetClientState *nc)
"af-xdp: unable to remove XDP program from '%s', ifindex: %d\n",
s->ifname, s->ifindex);
}
+
+ if (s->map_fd >= 0) {
+ idx = nc->queue_index + s->map_start_index;
+ if (bpf_map_delete_elem(s->map_fd, &idx)) {
+ fprintf(stderr, "af-xdp: unable to remove AF_XDP socket from map %s\n",
+ s->map_path);
+ }
+ close(s->map_fd);
+ s->map_fd = -1;
+ }
+ g_free(s->map_path);
+ s->map_path = NULL;
}
static int af_xdp_umem_create(AFXDPState *s, int sock_fd, Error **errp)
@@ -345,7 +362,6 @@ static int af_xdp_socket_create(AFXDPState *s,
};
int queue_id, error = 0;
- s->inhibit = opts->has_inhibit && opts->inhibit;
if (s->inhibit) {
cfg.libxdp_flags |= XSK_LIBXDP_FLAGS__INHIBIT_PROG_LOAD;
}
@@ -396,6 +412,34 @@ static int af_xdp_socket_create(AFXDPState *s,
return 0;
}
+static int af_xdp_update_xsk_map(AFXDPState *s, Error **errp)
+{
+ int xsk_fd, idx, error = 0;
+
+ if (!s->map_path)
+ return 0;
+
+ s->map_fd = bpf_obj_get(s->map_path);
+ if (s->map_fd < 0) {
+ error = errno;
+ } else {
+ xsk_fd = xsk_socket__fd(s->xsk);
+ idx = s->nc.queue_index + s->map_start_index;
+ if (bpf_map_update_elem(s->map_fd, &idx, &xsk_fd, 0)) {
+ error = errno;
+ }
+ }
+
+ if (error) {
+ error_setg_errno(errp, error,
+ "failed to insert AF_XDP socket into map %s",
+ s->map_path);
+ return -1;
+ }
+
+ return 0;
+}
+
/* NetClientInfo methods. */
static NetClientInfo net_af_xdp_info = {
.type = NET_CLIENT_DRIVER_AF_XDP,
@@ -450,6 +494,7 @@ int net_init_af_xdp(const Netdev *netdev,
int64_t i, queues;
Error *err = NULL;
AFXDPState *s;
+ bool inhibit;
ifindex = if_nametoindex(opts->ifname);
if (!ifindex) {
@@ -465,8 +510,21 @@ int net_init_af_xdp(const Netdev *netdev,
return -1;
}
- if ((opts->has_inhibit && opts->inhibit) != !!opts->sock_fds) {
- error_setg(errp, "'inhibit=on' requires 'sock-fds' and vice versa");
+ inhibit = opts->has_inhibit && opts->inhibit;
+ if (inhibit && !opts->sock_fds && !opts->map_path) {
+ error_setg(errp, "'inhibit=on' requires 'sock-fds' or 'map-path'");
+ return -1;
+ }
+ if (!inhibit && (opts->sock_fds || opts->map_path)) {
+ error_setg(errp, "'sock-fds' and 'map-path' require 'inhibit=on'");
+ return -1;
+ }
+ if (opts->sock_fds && opts->map_path) {
+ error_setg(errp, "'sock-fds' and 'map-path' are mutually exclusive");
+ return -1;
+ }
+ if (!opts->map_path && opts->has_map_start_index) {
+ error_setg(errp, "'map-start-index' requires 'map-path'");
return -1;
}
@@ -492,8 +550,18 @@ int net_init_af_xdp(const Netdev *netdev,
s->ifindex = ifindex;
s->n_queues = queues;
- if (af_xdp_umem_create(s, sock_fds ? sock_fds[i] : -1, errp)
- || af_xdp_socket_create(s, opts, errp)) {
+ s->inhibit = inhibit;
+
+ s->map_path = g_strdup(opts->map_path);
+ s->map_fd = -1;
+ s->map_start_index = 0;
+ if (opts->has_map_start_index && opts->map_start_index > 0) {
+ s->map_start_index = opts->map_start_index;
+ }
+
+ if (af_xdp_umem_create(s, sock_fds ? sock_fds[i] : -1, errp) ||
+ af_xdp_socket_create(s, opts, errp) ||
+ af_xdp_update_xsk_map(s, errp)) {
/* Make sure the XDP program will be removed. */
s->n_queues = i;
error_propagate(errp, err);
@@ -501,7 +569,7 @@ int net_init_af_xdp(const Netdev *netdev,
}
}
- if (nc0) {
+ if (nc0 && !inhibit) {
s = DO_UPCAST(AFXDPState, nc, nc0);
if (bpf_xdp_query_id(s->ifindex, s->xdp_flags, &prog_id) || !prog_id) {
error_setg_errno(errp, errno,
diff --git a/qapi/net.json b/qapi/net.json
index 310cc4fd19..4ebc9580d8 100644
--- a/qapi/net.json
+++ b/qapi/net.json
@@ -454,25 +454,34 @@
# (default: 0).
#
# @inhibit: Don't load a default XDP program, use one already loaded
-# to the interface (default: false). Requires @sock-fds.
+# to the interface (default: false). Requires @sock-fds or @map-path.
#
# @sock-fds: A colon (:) separated list of file descriptors for
# already open but not bound AF_XDP sockets in the queue order.
# One fd per queue. These descriptors should already be added
-# into XDP socket map for corresponding queues. Requires
-# @inhibit.
+# into XDP socket map for corresponding queues. @sock-fds and
+# @map-path are mutually exclusive. Requires @inhibit.
+#
+# @map-path: The path to a pinned xsk map to push file descriptors
+# for bound AF_XDP sockets into. @map-path and @sock-fds are
+# mutually exclusive. Requires @inhibit. (Since 10.1)
+#
+# @map-start-index: Use @map-path to insert xsk sockets starting from
+# this index number (default: 0). Requires @map-path. (Since 10.1)
#
# Since: 8.2
##
{ 'struct': 'NetdevAFXDPOptions',
'data': {
- 'ifname': 'str',
- '*mode': 'AFXDPMode',
- '*force-copy': 'bool',
- '*queues': 'int',
- '*start-queue': 'int',
- '*inhibit': 'bool',
- '*sock-fds': 'str' },
+ 'ifname': 'str',
+ '*mode': 'AFXDPMode',
+ '*force-copy': 'bool',
+ '*queues': 'int',
+ '*start-queue': 'int',
+ '*inhibit': 'bool',
+ '*sock-fds': 'str',
+ '*map-path': 'str',
+ '*map-start-index': 'int' },
'if': 'CONFIG_AF_XDP' }
##
diff --git a/qemu-options.hx b/qemu-options.hx
index 7eb8e02b4b..a2eb34f483 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2909,6 +2909,7 @@ DEF("netdev", HAS_ARG, QEMU_OPTION_netdev,
#ifdef CONFIG_AF_XDP
"-netdev af-xdp,id=str,ifname=name[,mode=native|skb][,force-copy=on|off]\n"
" [,queues=n][,start-queue=m][,inhibit=on|off][,sock-fds=x:y:...:z]\n"
+ " [,map-path=/path/to/socket/map][,map-start-index=i]\n"
" attach to the existing network interface 'name' with AF_XDP socket\n"
" use 'mode=MODE' to specify an XDP program attach mode\n"
" use 'force-copy=on|off' to force XDP copy mode even if device supports zero-copy (default: off)\n"
@@ -2916,6 +2917,8 @@ DEF("netdev", HAS_ARG, QEMU_OPTION_netdev,
" with inhibit=on,\n"
" use 'sock-fds' to provide file descriptors for already open AF_XDP sockets\n"
" added to a socket map in XDP program. One socket per queue.\n"
+ " use 'map-path' to provide the socket map location to populate AF_XDP sockets with\n"
+ " beginning from the 'map-start-index' specified index (default: 0) (Since 10.1)\n"
" use 'queues=n' to specify how many queues of a multiqueue interface should be used\n"
" use 'start-queue=m' to specify the first queue that should be used\n"
#endif
@@ -3610,7 +3613,7 @@ SRST
# launch QEMU instance
|qemu_system| linux.img -nic vde,sock=/tmp/myswitch
-``-netdev af-xdp,id=str,ifname=name[,mode=native|skb][,force-copy=on|off][,queues=n][,start-queue=m][,inhibit=on|off][,sock-fds=x:y:...:z]``
+``-netdev af-xdp,id=str,ifname=name[,mode=native|skb][,force-copy=on|off][,queues=n][,start-queue=m][,inhibit=on|off][,sock-fds=x:y:...:z][,map-path=/path/to/socket/map][,map-start-index=i]``
Configure AF_XDP backend to connect to a network interface 'name'
using AF_XDP socket. A specific program attach mode for a default
XDP program can be forced with 'mode', defaults to best-effort,
@@ -3650,7 +3653,8 @@ SRST
-netdev af-xdp,id=n1,ifname=eth0,queues=1,start-queue=1
XDP program can also be loaded externally. In this case 'inhibit' option
- should be set to 'on' and 'sock-fds' provided with file descriptors for
+ should be set to 'on'. Either 'sock-fds' or 'map-path' can be used with
+ 'inhibit' enabled. 'sock-fds' can be provided with file descriptors for
already open but not bound XDP sockets already added to a socket map for
corresponding queues. One socket per queue.
@@ -3659,6 +3663,21 @@ SRST
|qemu_system| linux.img -device virtio-net-pci,netdev=n1 \\
-netdev af-xdp,id=n1,ifname=eth0,queues=3,inhibit=on,sock-fds=15:16:17
+ For the 'inhibit' option set to 'on' used together with 'map-path' it is
+ expected that the XDP program with the socket map is already loaded on
+ the networking device and the map pinned into BPF file system. The path
+ to the pinned map is then passed to QEMU which then creates the file
+ descriptors and inserts them into the existing socket map.
+
+ .. parsed-literal::
+
+ |qemu_system| linux.img -device virtio-net-pci,netdev=n1 \\
+ -netdev af-xdp,id=n1,ifname=eth0,queues=2,inhibit=on,map-path=/sys/fs/bpf/xsks_map
+
+ Additionally, 'map-start-index' can be used to specify the start offset
+ for insertion into the socket map. The combination of 'map-path' and
+ 'sock-fds' together is not supported.
+
``-netdev vhost-user,chardev=id[,vhostforce=on|off][,queues=n]``
Establish a vhost-user netdev, backed by a chardev id. The chardev
should be a unix domain socket backed one. The vhost-user uses a
--
2.43.0
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH v3 2/2] net/af-xdp: Fix up cleanup path upon failure in queue creation
2025-06-04 11:29 [PATCH v3 1/2] net/af-xdp: Support pinned map path for AF_XDP sockets Daniel Borkmann via
@ 2025-06-04 11:29 ` Daniel Borkmann via
2025-06-17 11:59 ` Ilya Maximets
2025-06-17 11:59 ` [PATCH v3 1/2] net/af-xdp: Support pinned map path for AF_XDP sockets Ilya Maximets
1 sibling, 1 reply; 8+ messages in thread
From: Daniel Borkmann via @ 2025-06-04 11:29 UTC (permalink / raw)
To: qemu-devel; +Cc: daniel, Ilya Maximets, Jason Wang, Anton Protopopov
While testing, it turned out that upon error in the queue creation loop,
we never trigger the af_xdp_cleanup() handler. This is because we pass
errp instead of a local err pointer into the various AF_XDP setup functions
instead of a scheme like:
bool fn(..., Error **errp)
{
Error *err = NULL;
foo(arg, &err);
if (err) {
handle the error...
error_propagate(errp, err);
return false;
}
...
}
With a conversion into the above format, the af_xdp_cleanup() handler is
called as expected. Also, making sure the XDP program will be removed does
require to set s->n_queues to i + 1 since the test is nc->queue_index ==
s->n_queues - 1, where nc->queue_index was set to i earlier. With both
fixed the cleanup triggers as expected. Note the error_propagate() handles
a NULL err internally.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Ilya Maximets <i.maximets@ovn.org>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Anton Protopopov <aspsk@isovalent.com>
---
net/af-xdp.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/net/af-xdp.c b/net/af-xdp.c
index b83d9bc47f..5d9857fdd8 100644
--- a/net/af-xdp.c
+++ b/net/af-xdp.c
@@ -559,12 +559,11 @@ int net_init_af_xdp(const Netdev *netdev,
s->map_start_index = opts->map_start_index;
}
- if (af_xdp_umem_create(s, sock_fds ? sock_fds[i] : -1, errp) ||
- af_xdp_socket_create(s, opts, errp) ||
- af_xdp_update_xsk_map(s, errp)) {
+ if (af_xdp_umem_create(s, sock_fds ? sock_fds[i] : -1, &err) ||
+ af_xdp_socket_create(s, opts, &err) ||
+ af_xdp_update_xsk_map(s, &err)) {
/* Make sure the XDP program will be removed. */
- s->n_queues = i;
- error_propagate(errp, err);
+ s->n_queues = i + 1;
goto err;
}
}
@@ -586,6 +585,7 @@ int net_init_af_xdp(const Netdev *netdev,
err:
if (nc0) {
qemu_del_net_client(nc0);
+ error_propagate(errp, err);
}
return -1;
--
2.43.0
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH v3 1/2] net/af-xdp: Support pinned map path for AF_XDP sockets
2025-06-04 11:29 [PATCH v3 1/2] net/af-xdp: Support pinned map path for AF_XDP sockets Daniel Borkmann via
2025-06-04 11:29 ` [PATCH v3 2/2] net/af-xdp: Fix up cleanup path upon failure in queue creation Daniel Borkmann via
@ 2025-06-17 11:59 ` Ilya Maximets
2025-06-17 12:43 ` Daniel Borkmann via
1 sibling, 1 reply; 8+ messages in thread
From: Ilya Maximets @ 2025-06-17 11:59 UTC (permalink / raw)
To: Daniel Borkmann, qemu-devel; +Cc: i.maximets, Jason Wang, Anton Protopopov
On 6/4/25 1:29 PM, Daniel Borkmann wrote:
> Extend 'inhibit=on' setting with the option to specify a pinned XSK map
> path along with a starting index (default 0) to push the created XSK
> sockets into. Example usage:
>
> # ./build/qemu-system-x86_64 [...] \
> -netdev af-xdp,ifname=enp2s0f0np0,id=net0,mode=native,queues=2,start-queue=14,inhibit=on,map-path=/sys/fs/bpf/xsks_map,map-start-index=14 \
> -device virtio-net-pci,netdev=net0 [...]
>
> This is useful for the case where an existing XDP program with XSK map
> is present on the AF_XDP supported phys device and the XSK map is not
> yet populated. For example, the former could have been pre-loaded onto
> the netdevice by a control plane, which later launches qemu to populate
> it with XSK sockets.
>
> Normally, the main idea behind 'inhibit=on' is that the qemu instance
> doesn't need to have a lot of privileges to use the pre-loaded program
> and the pre-created sockets, but this mentioned use-case here is different
> where qemu still needs privileges to create the sockets.
>
> The 'map-start-index' parameter is optional and defaults to 0. It allows
> flexible placement of the XSK sockets, and is up to the user to specify
> when the XDP program with XSK map was already preloaded. In the simplest
> case the queue-to-map-slot mapping is just 1:1 based on ctx->rx_queue_index
> but the user might as well have a different scheme (or smaller map size,
> e.g. ctx->rx_queue_index % max_size) to push the inbound traffic to one
> of the XSK sockets.
>
> Note that the bpf_xdp_query_id() is now only tested for 'inhibit=off'
> since only in the latter case the libxdp takes care of installing the
> XDP program which was installed based on the s->xdp_flags pointing to
> either driver or skb mode. For 'inhibit=on' we don't make any assumptions
> and neither go down the path of probing all possible options in which
> way the user installed the XDP program.
>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Ilya Maximets <i.maximets@ovn.org>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: Anton Protopopov <aspsk@isovalent.com>
> ---
> net/af-xdp.c | 80 +++++++++++++++++++++++++++++++++++++++++++++----
> qapi/net.json | 29 +++++++++++-------
> qemu-options.hx | 23 ++++++++++++--
> 3 files changed, 114 insertions(+), 18 deletions(-)
Hi, Daniel. Thanks for v3!
checkpatch complains about 2 issues in the patch - a missing braces and
onle too long line that can be wrapped.
There also seems to be some issue on a cleanup path that you attempt to
fix in the second patch, so I reply about it there.
Best regards, Ilya Maximets.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v3 2/2] net/af-xdp: Fix up cleanup path upon failure in queue creation
2025-06-04 11:29 ` [PATCH v3 2/2] net/af-xdp: Fix up cleanup path upon failure in queue creation Daniel Borkmann via
@ 2025-06-17 11:59 ` Ilya Maximets
2025-06-17 13:03 ` Daniel Borkmann via
0 siblings, 1 reply; 8+ messages in thread
From: Ilya Maximets @ 2025-06-17 11:59 UTC (permalink / raw)
To: Daniel Borkmann, qemu-devel; +Cc: i.maximets, Jason Wang, Anton Protopopov
On 6/4/25 1:29 PM, Daniel Borkmann wrote:
> While testing, it turned out that upon error in the queue creation loop,
> we never trigger the af_xdp_cleanup() handler. This is because we pass
> errp instead of a local err pointer into the various AF_XDP setup functions
> instead of a scheme like:
>
> bool fn(..., Error **errp)
> {
> Error *err = NULL;
>
> foo(arg, &err);
> if (err) {
> handle the error...
> error_propagate(errp, err);
> return false;
> }
> ...
> }
>
> With a conversion into the above format, the af_xdp_cleanup() handler is
> called as expected.
How exactly this prevents calling the cleanup function? I don't see the
errp being checked anywhere in the qemu_del_net_client() path.
Could you provide a more detailed call sequence description where the cleanup
is not called?
I agree thought that the local err variable is actually unused. We should
be able to just remove it and remove the error_propagate() call as well.
> Also, making sure the XDP program will be removed does
> require to set s->n_queues to i + 1 since the test is nc->queue_index ==
> s->n_queues - 1, where nc->queue_index was set to i earlier.
The idea behind 'i' instead of 'i + 1' was that if either af_xdp_umem_create()
or af_xdp_socket_create() fails, we do not have xdp_flags initialized on the
last queue. And without it we can't remove the program, so we remove it while
destroying the last actually configured queue. And this is OK, because the
failed queue was not added to the program, and if the af_xdp_socket_create()
fails for the very first queue, then we don't have a program loaded at all.
With the new changes in this patch set, we have an extra function that can fail,
which is a new af_xdp_update_xsk_map(), and only if this one fails, we need to
remove the program while cleaning up the current failed queue, since it was
already created and xdp_flags are available.
If we get this patch as-is and the af_xdp_socket_create() fails, we will not
remove the program, AFAICT.
Or am I missing something?
Best regards, Ilya Maximets.
> With both
> fixed the cleanup triggers as expected. Note the error_propagate() handles
> a NULL err internally.
>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Ilya Maximets <i.maximets@ovn.org>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: Anton Protopopov <aspsk@isovalent.com>
> ---
> net/af-xdp.c | 10 +++++-----
> 1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/net/af-xdp.c b/net/af-xdp.c
> index b83d9bc47f..5d9857fdd8 100644
> --- a/net/af-xdp.c
> +++ b/net/af-xdp.c
> @@ -559,12 +559,11 @@ int net_init_af_xdp(const Netdev *netdev,
> s->map_start_index = opts->map_start_index;
> }
>
> - if (af_xdp_umem_create(s, sock_fds ? sock_fds[i] : -1, errp) ||
> - af_xdp_socket_create(s, opts, errp) ||
> - af_xdp_update_xsk_map(s, errp)) {
> + if (af_xdp_umem_create(s, sock_fds ? sock_fds[i] : -1, &err) ||
> + af_xdp_socket_create(s, opts, &err) ||
> + af_xdp_update_xsk_map(s, &err)) {
> /* Make sure the XDP program will be removed. */
> - s->n_queues = i;
> - error_propagate(errp, err);
> + s->n_queues = i + 1;
> goto err;
> }
> }
> @@ -586,6 +585,7 @@ int net_init_af_xdp(const Netdev *netdev,
> err:
> if (nc0) {
> qemu_del_net_client(nc0);
> + error_propagate(errp, err);
> }
>
> return -1;
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v3 1/2] net/af-xdp: Support pinned map path for AF_XDP sockets
2025-06-17 11:59 ` [PATCH v3 1/2] net/af-xdp: Support pinned map path for AF_XDP sockets Ilya Maximets
@ 2025-06-17 12:43 ` Daniel Borkmann via
0 siblings, 0 replies; 8+ messages in thread
From: Daniel Borkmann via @ 2025-06-17 12:43 UTC (permalink / raw)
To: Ilya Maximets, qemu-devel; +Cc: Jason Wang, Anton Protopopov
On 6/17/25 1:59 PM, Ilya Maximets wrote:
> On 6/4/25 1:29 PM, Daniel Borkmann wrote:
>> Extend 'inhibit=on' setting with the option to specify a pinned XSK map
>> path along with a starting index (default 0) to push the created XSK
>> sockets into. Example usage:
>>
>> # ./build/qemu-system-x86_64 [...] \
>> -netdev af-xdp,ifname=enp2s0f0np0,id=net0,mode=native,queues=2,start-queue=14,inhibit=on,map-path=/sys/fs/bpf/xsks_map,map-start-index=14 \
>> -device virtio-net-pci,netdev=net0 [...]
>>
>> This is useful for the case where an existing XDP program with XSK map
>> is present on the AF_XDP supported phys device and the XSK map is not
>> yet populated. For example, the former could have been pre-loaded onto
>> the netdevice by a control plane, which later launches qemu to populate
>> it with XSK sockets.
>>
>> Normally, the main idea behind 'inhibit=on' is that the qemu instance
>> doesn't need to have a lot of privileges to use the pre-loaded program
>> and the pre-created sockets, but this mentioned use-case here is different
>> where qemu still needs privileges to create the sockets.
>>
>> The 'map-start-index' parameter is optional and defaults to 0. It allows
>> flexible placement of the XSK sockets, and is up to the user to specify
>> when the XDP program with XSK map was already preloaded. In the simplest
>> case the queue-to-map-slot mapping is just 1:1 based on ctx->rx_queue_index
>> but the user might as well have a different scheme (or smaller map size,
>> e.g. ctx->rx_queue_index % max_size) to push the inbound traffic to one
>> of the XSK sockets.
>>
>> Note that the bpf_xdp_query_id() is now only tested for 'inhibit=off'
>> since only in the latter case the libxdp takes care of installing the
>> XDP program which was installed based on the s->xdp_flags pointing to
>> either driver or skb mode. For 'inhibit=on' we don't make any assumptions
>> and neither go down the path of probing all possible options in which
>> way the user installed the XDP program.
>>
>> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
>> Cc: Ilya Maximets <i.maximets@ovn.org>
>> Cc: Jason Wang <jasowang@redhat.com>
>> Cc: Anton Protopopov <aspsk@isovalent.com>
>> ---
>> net/af-xdp.c | 80 +++++++++++++++++++++++++++++++++++++++++++++----
>> qapi/net.json | 29 +++++++++++-------
>> qemu-options.hx | 23 ++++++++++++--
>> 3 files changed, 114 insertions(+), 18 deletions(-)
>
> Hi, Daniel. Thanks for v3!
>
> checkpatch complains about 2 issues in the patch - a missing braces and
> onle too long line that can be wrapped.
Thanks for review, I wasn't aware of this, will fix these two in v4!
> There also seems to be some issue on a cleanup path that you attempt to
> fix in the second patch, so I reply about it there.
>
> Best regards, Ilya Maximets.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v3 2/2] net/af-xdp: Fix up cleanup path upon failure in queue creation
2025-06-17 11:59 ` Ilya Maximets
@ 2025-06-17 13:03 ` Daniel Borkmann via
2025-06-17 21:41 ` Ilya Maximets
0 siblings, 1 reply; 8+ messages in thread
From: Daniel Borkmann via @ 2025-06-17 13:03 UTC (permalink / raw)
To: Ilya Maximets, qemu-devel; +Cc: Jason Wang, Anton Protopopov
On 6/17/25 1:59 PM, Ilya Maximets wrote:
> On 6/4/25 1:29 PM, Daniel Borkmann wrote:
>> While testing, it turned out that upon error in the queue creation loop,
>> we never trigger the af_xdp_cleanup() handler. This is because we pass
>> errp instead of a local err pointer into the various AF_XDP setup functions
>> instead of a scheme like:
>>
>> bool fn(..., Error **errp)
>> {
>> Error *err = NULL;
>>
>> foo(arg, &err);
>> if (err) {
>> handle the error...
>> error_propagate(errp, err);
>> return false;
>> }
>> ...
>> }
>>
>> With a conversion into the above format, the af_xdp_cleanup() handler is
>> called as expected.
>
> How exactly this prevents calling the cleanup function? I don't see the
> errp being checked anywhere in the qemu_del_net_client() path.
>
> Could you provide a more detailed call sequence description where the cleanup
> is not called?
>
> I agree thought that the local err variable is actually unused. We should
> be able to just remove it and remove the error_propagate() call as well.
Ok, I basically manually injected an error in af_xdp_{umem_create,socket_create,
update_xsk_map} and noticed that in fact none of the af_xdp_cleanup() callback
was called and qemu was exiting right away.
From reading up on the qemu error handling patterns that should be used via
include/qapi/error.h I noticed that this was due to passing in errp directly
rather than a local error variable as done in many other places in qemu code.
>> Also, making sure the XDP program will be removed does
>> require to set s->n_queues to i + 1 since the test is nc->queue_index ==
>> s->n_queues - 1, where nc->queue_index was set to i earlier.
>
> The idea behind 'i' instead of 'i + 1' was that if either af_xdp_umem_create()
> or af_xdp_socket_create() fails, we do not have xdp_flags initialized on the
> last queue. And without it we can't remove the program, so we remove it while
> destroying the last actually configured queue. And this is OK, because the
> failed queue was not added to the program, and if the af_xdp_socket_create()
> fails for the very first queue, then we don't have a program loaded at all.
>
> With the new changes in this patch set, we have an extra function that can fail,
> which is a new af_xdp_update_xsk_map(), and only if this one fails, we need to
> remove the program while cleaning up the current failed queue, since it was
> already created and xdp_flags are available.
>
> If we get this patch as-is and the af_xdp_socket_create() fails, we will not
> remove the program, AFAICT.
I'll double check this concern and see if it can be solved (iirc we do test for
s->xdp_flags in the cleanup callback).. in case of xsk map, it should not detach
anything from an XDP program PoV (given inhibit) but rather it should remove
prior installed xsk sockets from the xsk map to not leave them around.
Best,
Daniel
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v3 2/2] net/af-xdp: Fix up cleanup path upon failure in queue creation
2025-06-17 13:03 ` Daniel Borkmann via
@ 2025-06-17 21:41 ` Ilya Maximets
2025-06-18 8:00 ` Ilya Maximets
0 siblings, 1 reply; 8+ messages in thread
From: Ilya Maximets @ 2025-06-17 21:41 UTC (permalink / raw)
To: Daniel Borkmann, qemu-devel; +Cc: i.maximets, Jason Wang, Anton Protopopov
On 6/17/25 3:03 PM, Daniel Borkmann wrote:
> On 6/17/25 1:59 PM, Ilya Maximets wrote:
>> On 6/4/25 1:29 PM, Daniel Borkmann wrote:
>>> While testing, it turned out that upon error in the queue creation loop,
>>> we never trigger the af_xdp_cleanup() handler. This is because we pass
>>> errp instead of a local err pointer into the various AF_XDP setup functions
>>> instead of a scheme like:
>>>
>>> bool fn(..., Error **errp)
>>> {
>>> Error *err = NULL;
>>>
>>> foo(arg, &err);
>>> if (err) {
>>> handle the error...
>>> error_propagate(errp, err);
>>> return false;
>>> }
>>> ...
>>> }
>>>
>>> With a conversion into the above format, the af_xdp_cleanup() handler is
>>> called as expected.
>>
>> How exactly this prevents calling the cleanup function? I don't see the
>> errp being checked anywhere in the qemu_del_net_client() path.
>>
>> Could you provide a more detailed call sequence description where the cleanup
>> is not called?
>>
>> I agree thought that the local err variable is actually unused. We should
>> be able to just remove it and remove the error_propagate() call as well.
>
> Ok, I basically manually injected an error in af_xdp_{umem_create,socket_create,
> update_xsk_map} and noticed that in fact none of the af_xdp_cleanup() callback
> was called and qemu was exiting right away.
>
> From reading up on the qemu error handling patterns that should be used via
> include/qapi/error.h I noticed that this was due to passing in errp directly
> rather than a local error variable as done in many other places in qemu code.
Hmm, you're right. I can reproduce this issue.
I think, this fix should be a first patch of the set and it should have
a Fixes tag on it, so it can be backported, if necessary.
>
>>> Also, making sure the XDP program will be removed does
>>> require to set s->n_queues to i + 1 since the test is nc->queue_index ==
>>> s->n_queues - 1, where nc->queue_index was set to i earlier.
>>
>> The idea behind 'i' instead of 'i + 1' was that if either af_xdp_umem_create()
>> or af_xdp_socket_create() fails, we do not have xdp_flags initialized on the
>> last queue. And without it we can't remove the program, so we remove it while
>> destroying the last actually configured queue. And this is OK, because the
>> failed queue was not added to the program, and if the af_xdp_socket_create()
>> fails for the very first queue, then we don't have a program loaded at all.
>>
>> With the new changes in this patch set, we have an extra function that can fail,
>> which is a new af_xdp_update_xsk_map(), and only if this one fails, we need to
>> remove the program while cleaning up the current failed queue, since it was
>> already created and xdp_flags are available.
>>
>> If we get this patch as-is and the af_xdp_socket_create() fails, we will not
>> remove the program, AFAICT.
>
> I'll double check this concern and see if it can be solved (iirc we do test for
> s->xdp_flags in the cleanup callback)..
Yes, we check 'nc->queue_index == s->n_queues - 1 && s->xdp_flags'. And if the
last queue doesn't have xdp_flags (because it wasn't fully created), then the
program will not be detached.
But it seems that code is broken anyway even on the current main, because we're
setting n_queues into the last queue that never has xdp_flags on failure.
To fix that we need to do something like:
uint32_t xdp_flags = 0;
...
for each queue {
if (failed) {
s->n_queues = i + 1;
s->xdp_flags = xdp_flags;
}
xdp_flags = s->xdp_flags;
}
This way we'll have xdp_flags set for the last queue and have a proper queue number
and will be able to call bpf_xdp_detach().
One thing I do not understand is why the program is actually getting removed even
if we do not call bpf_xdp_detach()... We're not calling this function pretty much
at all, and yet, when qemu fails, there is no program left behind... It looks like
the program gets automatically detached when we call xsk_socket__delete() for the
last successfully configured queue. Which is strange, I don't remember it doing
this before.
> in case of xsk map, it should not detach
> anything from an XDP program PoV (given inhibit) but rather it should remove
> prior installed xsk sockets from the xsk map to not leave them around.
>
> Best,
> Daniel
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v3 2/2] net/af-xdp: Fix up cleanup path upon failure in queue creation
2025-06-17 21:41 ` Ilya Maximets
@ 2025-06-18 8:00 ` Ilya Maximets
0 siblings, 0 replies; 8+ messages in thread
From: Ilya Maximets @ 2025-06-18 8:00 UTC (permalink / raw)
To: Daniel Borkmann, qemu-devel; +Cc: i.maximets, Jason Wang, Anton Protopopov
On 6/17/25 11:41 PM, Ilya Maximets wrote:
> On 6/17/25 3:03 PM, Daniel Borkmann wrote:
>> On 6/17/25 1:59 PM, Ilya Maximets wrote:
>>> On 6/4/25 1:29 PM, Daniel Borkmann wrote:
>>>> While testing, it turned out that upon error in the queue creation loop,
>>>> we never trigger the af_xdp_cleanup() handler. This is because we pass
>>>> errp instead of a local err pointer into the various AF_XDP setup functions
>>>> instead of a scheme like:
>>>>
>>>> bool fn(..., Error **errp)
>>>> {
>>>> Error *err = NULL;
>>>>
>>>> foo(arg, &err);
>>>> if (err) {
>>>> handle the error...
>>>> error_propagate(errp, err);
>>>> return false;
>>>> }
>>>> ...
>>>> }
>>>>
>>>> With a conversion into the above format, the af_xdp_cleanup() handler is
>>>> called as expected.
>>>
>>> How exactly this prevents calling the cleanup function? I don't see the
>>> errp being checked anywhere in the qemu_del_net_client() path.
>>>
>>> Could you provide a more detailed call sequence description where the cleanup
>>> is not called?
>>>
>>> I agree thought that the local err variable is actually unused. We should
>>> be able to just remove it and remove the error_propagate() call as well.
>>
>> Ok, I basically manually injected an error in af_xdp_{umem_create,socket_create,
>> update_xsk_map} and noticed that in fact none of the af_xdp_cleanup() callback
>> was called and qemu was exiting right away.
>>
>> From reading up on the qemu error handling patterns that should be used via
>> include/qapi/error.h I noticed that this was due to passing in errp directly
>> rather than a local error variable as done in many other places in qemu code.
>
> Hmm, you're right. I can reproduce this issue.
>
> I think, this fix should be a first patch of the set and it should have
> a Fixes tag on it, so it can be backported, if necessary.
>
>>
>>>> Also, making sure the XDP program will be removed does
>>>> require to set s->n_queues to i + 1 since the test is nc->queue_index ==
>>>> s->n_queues - 1, where nc->queue_index was set to i earlier.
>>>
>>> The idea behind 'i' instead of 'i + 1' was that if either af_xdp_umem_create()
>>> or af_xdp_socket_create() fails, we do not have xdp_flags initialized on the
>>> last queue. And without it we can't remove the program, so we remove it while
>>> destroying the last actually configured queue. And this is OK, because the
>>> failed queue was not added to the program, and if the af_xdp_socket_create()
>>> fails for the very first queue, then we don't have a program loaded at all.
>>>
>>> With the new changes in this patch set, we have an extra function that can fail,
>>> which is a new af_xdp_update_xsk_map(), and only if this one fails, we need to
>>> remove the program while cleaning up the current failed queue, since it was
>>> already created and xdp_flags are available.
>>>
>>> If we get this patch as-is and the af_xdp_socket_create() fails, we will not
>>> remove the program, AFAICT.
>>
>> I'll double check this concern and see if it can be solved (iirc we do test for
>> s->xdp_flags in the cleanup callback)..
>
> Yes, we check 'nc->queue_index == s->n_queues - 1 && s->xdp_flags'. And if the
> last queue doesn't have xdp_flags (because it wasn't fully created), then the
> program will not be detached.
>
> But it seems that code is broken anyway even on the current main, because we're
> setting n_queues into the last queue that never has xdp_flags on failure.
>
> To fix that we need to do something like:
>
> uint32_t xdp_flags = 0;
>
> ...
> for each queue {
> if (failed) {
> s->n_queues = i + 1;
> s->xdp_flags = xdp_flags;
> }
> xdp_flags = s->xdp_flags;
> }
>
> This way we'll have xdp_flags set for the last queue and have a proper queue number
> and will be able to call bpf_xdp_detach().
>
> One thing I do not understand is why the program is actually getting removed even
> if we do not call bpf_xdp_detach()... We're not calling this function pretty much
> at all, and yet, when qemu fails, there is no program left behind... It looks like
> the program gets automatically detached when we call xsk_socket__delete() for the
> last successfully configured queue. Which is strange, I don't remember it doing
> this before.
OK, I figured this one out. This is a "new" behavior in libxdp 1.3.0:
https://github.com/xdp-project/xdp-tools/commit/38c2914988fd5c1ef65f2381fc8af9f3e8404e2b
We require libxdp >= 1.4.0 for QEMU to build though, so we may probably just drop
the bpf_xdp_detach() call together with the n_queues hack. We're not loading the
program from QEMU side, so not removing it manually makes sense. Should be able
to drop the n_queues field from the AFXDPState as well.
>
>> in case of xsk map, it should not detach
>> anything from an XDP program PoV (given inhibit) but rather it should remove
>> prior installed xsk sockets from the xsk map to not leave them around.
>>
>> Best,
>> Daniel
>
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2025-06-18 8:01 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-04 11:29 [PATCH v3 1/2] net/af-xdp: Support pinned map path for AF_XDP sockets Daniel Borkmann via
2025-06-04 11:29 ` [PATCH v3 2/2] net/af-xdp: Fix up cleanup path upon failure in queue creation Daniel Borkmann via
2025-06-17 11:59 ` Ilya Maximets
2025-06-17 13:03 ` Daniel Borkmann via
2025-06-17 21:41 ` Ilya Maximets
2025-06-18 8:00 ` Ilya Maximets
2025-06-17 11:59 ` [PATCH v3 1/2] net/af-xdp: Support pinned map path for AF_XDP sockets Ilya Maximets
2025-06-17 12:43 ` Daniel Borkmann via
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).