* [PULL V2 00/16] Net patches
From: Jason Wang @ 2025-07-15 4:35 UTC
To: qemu-devel; +Cc: Jason Wang
The following changes since commit 9a4e273ddec3927920c5958d2226c6b38b543336:
Merge tag 'pull-tcg-20250711' of https://gitlab.com/rth7680/qemu into staging (2025-07-13 01:46:04 -0400)
are available in the Git repository at:
https://github.com/jasowang/qemu.git net-pull-request
for you to fetch changes up to e53d9ec7ccc2dbb9378353fe2a89ebdca5cd7015:
net/af-xdp: Support pinned map path for AF_XDP sockets (2025-07-15 10:26:55 +0800)
----------------------------------------------------------------
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEIV1G9IJGaJ7HfzVi7wSWWzmNYhEFAmh11cgACgkQ7wSWWzmN
YhGZKAf+PZ3ZnOoHXd5z8hA5d9Xf+U/01YyPN+Q0NPLWVXhYZBeNhhYEnZwGeSwS
n0YFTLiYIrcaSrt74QtBvUVCX7KoILRnzgoLquUnFBlI0BrR5pFKB70gHmLU3Dxw
xOdxtIm/chfiicE39ziTfO28Cv0N1k9NCHsuMsydbhQL8kc/aRaMofizO8MjPLbr
J8hf8N7jivh8fzH3F5vyglaNl2ijSkPm+XDQYAb04laGfdsIlYkmB7lB/17def2a
S9gur484x5w+Yb2LNdyq/3IPzDqzlNbRGVcfTZS8FIc65R+5idIN+7lKHCffURrr
W8zWFy1wA54hJoTxAq0nsf1TSvc9UA==
=DiBC
-----END PGP SIGNATURE-----
Changes since V1:
- add AF_XDP enhancement series
----------------------------------------------------------------
Akihiko Odaki (1):
      virtio-net: Add queues for RSS during migration
Anastasia Belova (1):
      net: fix buffer overflow in af_xdp_umem_create()
Daniel Borkmann (3):
      net/af-xdp: Remove XDP program cleanup logic
      net/af-xdp: Fix up cleanup path upon failure in queue creation
      net/af-xdp: Support pinned map path for AF_XDP sockets
Laurent Vivier (11):
      net: Refactor stream logic for reuse in '-net passt'
      net: Define net_client_set_link()
      vhost_net: Rename vhost_set_vring_enable() for clarity
      net: Add get_vhost_net callback to NetClientInfo
      net: Consolidate vhost feature bits into vhost_net structure
      net: Add get_acked_features callback to VhostNetOptions
      net: Add save_acked_features callback to vhost_net
      net: Allow network backends to advertise max TX queue size
      net: Add is_vhost_user flag to vhost_net struct
      net: Add passt network backend
      net/passt: Implement vhost-user backend support
docs/system/devices/net.rst | 50 ++-
hmp-commands.hx | 3 +
hw/net/vhost_net-stub.c | 3 +-
hw/net/vhost_net.c | 145 ++------
hw/net/virtio-net.c | 47 +--
hw/virtio/virtio.c | 14 +-
include/hw/virtio/vhost.h | 5 +
include/hw/virtio/virtio.h | 10 +-
include/net/net.h | 3 +
include/net/tap.h | 3 -
include/net/vhost-user.h | 19 --
include/net/vhost-vdpa.h | 4 -
include/net/vhost_net.h | 10 +-
meson.build | 6 +
meson_options.txt | 2 +
net/af-xdp.c | 99 ++++--
net/clients.h | 4 +
net/hub.c | 3 +
net/meson.build | 6 +-
net/net.c | 36 +-
net/passt.c | 753 ++++++++++++++++++++++++++++++++++++++++++
net/stream.c | 282 ++++------------
net/stream_data.c | 193 +++++++++++
net/stream_data.h | 31 ++
net/tap-win32.c | 5 -
net/tap.c | 43 ++-
net/vhost-user-stub.c | 1 -
net/vhost-user.c | 60 +++-
net/vhost-vdpa.c | 11 +-
qapi/net.json | 147 ++++++++-
qemu-options.hx | 176 +++++++++-
scripts/meson-buildoptions.sh | 3 +
32 files changed, 1703 insertions(+), 474 deletions(-)
delete mode 100644 include/net/vhost-user.h
create mode 100644 net/passt.c
create mode 100644 net/stream_data.c
create mode 100644 net/stream_data.h
* [PULL V2 01/16] net: fix buffer overflow in af_xdp_umem_create()
From: Jason Wang @ 2025-07-15 4:35 UTC
To: qemu-devel; +Cc: Anastasia Belova, qemu-stable, Ilya Maximets, Jason Wang
From: Anastasia Belova <nabelova31@gmail.com>
s->pool has n_descs elements, so the maximum value of i should be
n_descs - 1. Fix the upper bound of the fill loop accordingly.
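For readers outside the thread, the off-by-one can be illustrated with a
minimal standalone sketch. This is not the QEMU code: fill_pool() and
FRAME_SIZE are hypothetical names used only for illustration, with
FRAME_SIZE standing in for XSK_UMEM__DEFAULT_FRAME_SIZE (its value here is
an assumption).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for XSK_UMEM__DEFAULT_FRAME_SIZE (value assumed). */
#define FRAME_SIZE 4096

/*
 * Allocate a pool of n_descs frame offsets and fill it from the last
 * valid index (n_descs - 1) down to 0. Starting the loop at n_descs
 * instead would write one element past the end of the allocation.
 */
static uint64_t *fill_pool(int64_t n_descs)
{
    uint64_t *pool;
    int64_t i;

    assert(n_descs > 0);
    pool = malloc(sizeof(*pool) * (size_t)n_descs);
    if (!pool) {
        return NULL;
    }
    for (i = n_descs - 1; i >= 0; i--) {
        pool[i] = (uint64_t)i * FRAME_SIZE;
    }
    return pool;
}
```

The caller owns the returned array and is expected to free() it.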
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: cb039ef3d9 ("net: add initial support for AF_XDP network backend")
Cc: qemu-stable@nongnu.org
Reviewed-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Anastasia Belova <nabelova31@gmail.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
net/af-xdp.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/af-xdp.c b/net/af-xdp.c
index 01c5fb914e..d022534d76 100644
--- a/net/af-xdp.c
+++ b/net/af-xdp.c
@@ -323,7 +323,7 @@ static int af_xdp_umem_create(AFXDPState *s, int sock_fd, Error **errp)
s->pool = g_new(uint64_t, n_descs);
/* Fill the pool in the opposite order, because it's a LIFO queue. */
- for (i = n_descs; i >= 0; i--) {
+ for (i = n_descs - 1; i >= 0; i--) {
s->pool[i] = i * XSK_UMEM__DEFAULT_FRAME_SIZE;
}
s->n_pool = n_descs;
--
2.42.0
* [PULL V2 02/16] virtio-net: Add queues for RSS during migration
From: Jason Wang @ 2025-07-15 4:35 UTC
To: qemu-devel; +Cc: Akihiko Odaki, Lei Yang, qemu-stable, Jason Wang
From: Akihiko Odaki <akihiko.odaki@daynix.com>
virtio_net_pre_load_queues() inspects vdev->guest_features to tell if
VIRTIO_NET_F_RSS or VIRTIO_NET_F_MQ is enabled and to infer the required
number of queues. This works for VIRTIO_NET_F_MQ but not for
VIRTIO_NET_F_RSS, because only the lowest 32 bits of vdev->guest_features
are set at that point, and VIRTIO_NET_F_RSS uses bit 60 while
VIRTIO_NET_F_MQ uses bit 22.
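The effect of having only the low 32 bits available can be sketched as
follows. The bit positions are the ones quoted above; has_feature() and
low32_only() are hypothetical helpers written for this illustration and
are not QEMU functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as stated in the commit message. */
#define VIRTIO_NET_F_MQ  22
#define VIRTIO_NET_F_RSS 60

/* Hypothetical helper: test a single bit in a 64-bit feature word. */
static bool has_feature(uint64_t features, unsigned int bit)
{
    return (features & (1ULL << bit)) != 0;
}

/*
 * Hypothetical helper: model a point in loading where only the low
 * 32 bits of the feature word have been restored.
 */
static uint64_t low32_only(uint64_t features)
{
    return features & 0xffffffffULL;
}
```

With both bits negotiated, has_feature(low32_only(f), VIRTIO_NET_F_MQ)
still holds, while the same check for VIRTIO_NET_F_RSS does not, which is
why the old inference picked the wrong queue count for RSS-only guests.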
Instead of inferring the required number of queues from
vdev->guest_features, use the number loaded from the vm state. This
change also has the nice side effect of removing a duplicate peer queue
pair change by bypassing virtio_net_set_multiqueue().
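A rough sketch of the resulting ordering, under simplifying assumptions:
the loader validates the queue count read from the stream, then hands it
to the device hook before any per-queue state is read. struct toy_dev and
the toy_* functions are hypothetical and only mirror the shape of the
change; the "pairs * 2 + 1" formula matches the one used in the patch
(rx and tx per pair plus one control queue).

```c
#include <stdint.h>

/* Hypothetical device model that tracks only how many queues exist. */
struct toy_dev {
    uint32_t num_queues;
};

/* rx + tx per queue pair, plus one control queue. */
static uint32_t queues_for_pairs(uint32_t pairs)
{
    return pairs * 2 + 1;
}

/* Hook: resize the device to the count taken from the migration stream. */
static int toy_pre_load_queues(struct toy_dev *dev, uint32_t n)
{
    dev->num_queues = n;
    return 0;
}

/*
 * Loader: reject an out-of-range count first, then call the hook with
 * the loaded number instead of letting the hook guess from feature bits.
 */
static int toy_load(struct toy_dev *dev, uint32_t num, uint32_t max)
{
    if (num > max) {
        return -1;
    }
    return toy_pre_load_queues(dev, num);
}
```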
Also update the comment in include/hw/virtio/virtio.h to prevent a
future implementation of pre_load_queues() from accidentally referring
to fields that are still being loaded during migration.
Fixes: 8c49756825da ("virtio-net: Add only one queue pair when realizing")
Tested-by: Lei Yang <leiyang@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
hw/net/virtio-net.c | 11 ++++-------
hw/virtio/virtio.c | 14 +++++++-------
include/hw/virtio/virtio.h | 10 ++++++++--
3 files changed, 19 insertions(+), 16 deletions(-)
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index eb93607b8c..351377c025 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -3022,11 +3022,10 @@ static void virtio_net_del_queue(VirtIONet *n, int index)
virtio_del_queue(vdev, index * 2 + 1);
}
-static void virtio_net_change_num_queue_pairs(VirtIONet *n, int new_max_queue_pairs)
+static void virtio_net_change_num_queues(VirtIONet *n, int new_num_queues)
{
VirtIODevice *vdev = VIRTIO_DEVICE(n);
int old_num_queues = virtio_get_num_queues(vdev);
- int new_num_queues = new_max_queue_pairs * 2 + 1;
int i;
assert(old_num_queues >= 3);
@@ -3062,16 +3061,14 @@ static void virtio_net_set_multiqueue(VirtIONet *n, int multiqueue)
int max = multiqueue ? n->max_queue_pairs : 1;
n->multiqueue = multiqueue;
- virtio_net_change_num_queue_pairs(n, max);
+ virtio_net_change_num_queues(n, max * 2 + 1);
virtio_net_set_queue_pairs(n);
}
-static int virtio_net_pre_load_queues(VirtIODevice *vdev)
+static int virtio_net_pre_load_queues(VirtIODevice *vdev, uint32_t n)
{
- virtio_net_set_multiqueue(VIRTIO_NET(vdev),
- virtio_has_feature(vdev->guest_features, VIRTIO_NET_F_RSS) ||
- virtio_has_feature(vdev->guest_features, VIRTIO_NET_F_MQ));
+ virtio_net_change_num_queues(VIRTIO_NET(vdev), n);
return 0;
}
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 82a285a31d..7e38b1ca97 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -3270,13 +3270,6 @@ virtio_load(VirtIODevice *vdev, QEMUFile *f, int version_id)
config_len--;
}
- if (vdc->pre_load_queues) {
- ret = vdc->pre_load_queues(vdev);
- if (ret) {
- return ret;
- }
- }
-
num = qemu_get_be32(f);
if (num > VIRTIO_QUEUE_MAX) {
@@ -3284,6 +3277,13 @@ virtio_load(VirtIODevice *vdev, QEMUFile *f, int version_id)
return -1;
}
+ if (vdc->pre_load_queues) {
+ ret = vdc->pre_load_queues(vdev, num);
+ if (ret) {
+ return ret;
+ }
+ }
+
for (i = 0; i < num; i++) {
vdev->vq[i].vring.num = qemu_get_be32(f);
if (k->has_variable_vring_alignment) {
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 214d4a77e9..c594764f23 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -210,8 +210,14 @@ struct VirtioDeviceClass {
void (*guest_notifier_mask)(VirtIODevice *vdev, int n, bool mask);
int (*start_ioeventfd)(VirtIODevice *vdev);
void (*stop_ioeventfd)(VirtIODevice *vdev);
- /* Called before loading queues. Useful to add queues before loading. */
- int (*pre_load_queues)(VirtIODevice *vdev);
+ /*
+ * Called before loading queues.
+ * If the number of queues change at runtime, use @n to know the
+ * number and add or remove queues accordingly.
+ * Note that this function is called in the middle of loading vmsd;
+ * no assumption should be made on states being loaded from vmsd.
+ */
+ int (*pre_load_queues)(VirtIODevice *vdev, uint32_t n);
/* Saving and loading of a device; trying to deprecate save/load
* use vmsd for new devices.
*/
--
2.42.0
* [PULL V2 03/16] net: Refactor stream logic for reuse in '-net passt'
From: Jason Wang @ 2025-07-15 4:35 UTC
To: qemu-devel; +Cc: Laurent Vivier, Jason Wang
From: Laurent Vivier <lvivier@redhat.com>
To prepare for the implementation of '-net passt', this patch moves
the generic stream handling functions from net/stream.c into new
net/stream_data.c and net/stream_data.h files.
This refactoring introduces a NetStreamData struct that encapsulates
the generic fields and logic previously in NetStreamState. The
NetStreamState now embeds NetStreamData and delegates the core
stream operations to the new generic functions.
To maintain flexibility for different users of this generic code,
callbacks for send and listen operations are now passed via
function pointers within the NetStreamData struct. This allows
callers to provide their own specific implementations while reusing
the common connection and data transfer logic.
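The callback arrangement described above can be sketched in isolation as
follows. This is a simplified illustration, not the QEMU code: struct
stream_core, struct toy_backend and their functions are hypothetical
names, and the real NetStreamData carries channels, watch tags and a
listener in addition to the send and listen pointers.

```c
#include <stddef.h>

struct stream_core;

/* Callback type supplied by whichever backend embeds the core. */
typedef int (*stream_send_fn)(struct stream_core *core,
                              const char *buf, size_t len);

/* Generic part: shared state plus the backend-provided callback. */
struct stream_core {
    stream_send_fn send;
    size_t bytes_seen;
};

/* Generic path: update shared state, then delegate to the backend. */
static int stream_core_deliver(struct stream_core *core,
                               const char *buf, size_t len)
{
    core->bytes_seen += len;
    return core->send ? core->send(core, buf, len) : -1;
}

/* A backend embeds the core as its first member and adds its own state. */
struct toy_backend {
    struct stream_core core;
    int packets;
};

static int toy_backend_send(struct stream_core *core,
                            const char *buf, size_t len)
{
    /* Valid because core is the first member of struct toy_backend. */
    struct toy_backend *b = (struct toy_backend *)core;

    (void)buf;
    (void)len;
    b->packets++;
    return 0;
}
```

A second backend can reuse stream_core_deliver() unchanged by installing
its own function in the send field.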
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
net/meson.build | 3 +-
net/stream.c | 282 +++++++++++-----------------------------------
net/stream_data.c | 193 +++++++++++++++++++++++++++++++
net/stream_data.h | 31 +++++
4 files changed, 290 insertions(+), 219 deletions(-)
create mode 100644 net/stream_data.c
create mode 100644 net/stream_data.h
diff --git a/net/meson.build b/net/meson.build
index bb97b4dcbe..bb3c011e5a 100644
--- a/net/meson.build
+++ b/net/meson.build
@@ -1,6 +1,7 @@
system_ss.add(files(
'announce.c',
'checksum.c',
+ 'dgram.c',
'dump.c',
'eth.c',
'filter-buffer.c',
@@ -12,7 +13,7 @@ system_ss.add(files(
'queue.c',
'socket.c',
'stream.c',
- 'dgram.c',
+ 'stream_data.c',
'util.c',
))
diff --git a/net/stream.c b/net/stream.c
index 6152d2a05e..d893f02cab 100644
--- a/net/stream.c
+++ b/net/stream.c
@@ -27,173 +27,50 @@
#include "net/net.h"
#include "clients.h"
-#include "monitor/monitor.h"
#include "qapi/error.h"
-#include "qemu/error-report.h"
-#include "qemu/option.h"
-#include "qemu/sockets.h"
-#include "qemu/iov.h"
-#include "qemu/main-loop.h"
-#include "qemu/cutils.h"
-#include "io/channel.h"
-#include "io/channel-socket.h"
#include "io/net-listener.h"
#include "qapi/qapi-events-net.h"
#include "qapi/qapi-visit-sockets.h"
#include "qapi/clone-visitor.h"
+#include "stream_data.h"
+
typedef struct NetStreamState {
- NetClientState nc;
- QIOChannel *listen_ioc;
- QIONetListener *listener;
- QIOChannel *ioc;
- guint ioc_read_tag;
- guint ioc_write_tag;
- SocketReadState rs;
- unsigned int send_index; /* number of bytes sent*/
+ NetStreamData data;
uint32_t reconnect_ms;
guint timer_tag;
SocketAddress *addr;
} NetStreamState;
-static void net_stream_listen(QIONetListener *listener,
- QIOChannelSocket *cioc,
- void *opaque);
static void net_stream_arm_reconnect(NetStreamState *s);
-static gboolean net_stream_writable(QIOChannel *ioc,
- GIOCondition condition,
- gpointer data)
-{
- NetStreamState *s = data;
-
- s->ioc_write_tag = 0;
-
- qemu_flush_queued_packets(&s->nc);
-
- return G_SOURCE_REMOVE;
-}
-
static ssize_t net_stream_receive(NetClientState *nc, const uint8_t *buf,
size_t size)
{
- NetStreamState *s = DO_UPCAST(NetStreamState, nc, nc);
- uint32_t len = htonl(size);
- struct iovec iov[] = {
- {
- .iov_base = &len,
- .iov_len = sizeof(len),
- }, {
- .iov_base = (void *)buf,
- .iov_len = size,
- },
- };
- struct iovec local_iov[2];
- unsigned int nlocal_iov;
- size_t remaining;
- ssize_t ret;
-
- remaining = iov_size(iov, 2) - s->send_index;
- nlocal_iov = iov_copy(local_iov, 2, iov, 2, s->send_index, remaining);
- ret = qio_channel_writev(s->ioc, local_iov, nlocal_iov, NULL);
- if (ret == QIO_CHANNEL_ERR_BLOCK) {
- ret = 0; /* handled further down */
- }
- if (ret == -1) {
- s->send_index = 0;
- return -errno;
- }
- if (ret < (ssize_t)remaining) {
- s->send_index += ret;
- s->ioc_write_tag = qio_channel_add_watch(s->ioc, G_IO_OUT,
- net_stream_writable, s, NULL);
- return 0;
- }
- s->send_index = 0;
- return size;
-}
-
-static gboolean net_stream_send(QIOChannel *ioc,
- GIOCondition condition,
- gpointer data);
-
-static void net_stream_send_completed(NetClientState *nc, ssize_t len)
-{
- NetStreamState *s = DO_UPCAST(NetStreamState, nc, nc);
-
- if (!s->ioc_read_tag) {
- s->ioc_read_tag = qio_channel_add_watch(s->ioc, G_IO_IN,
- net_stream_send, s, NULL);
- }
-}
+ NetStreamData *d = DO_UPCAST(NetStreamData, nc, nc);
-static void net_stream_rs_finalize(SocketReadState *rs)
-{
- NetStreamState *s = container_of(rs, NetStreamState, rs);
-
- if (qemu_send_packet_async(&s->nc, rs->buf,
- rs->packet_len,
- net_stream_send_completed) == 0) {
- if (s->ioc_read_tag) {
- g_source_remove(s->ioc_read_tag);
- s->ioc_read_tag = 0;
- }
- }
+ return net_stream_data_receive(d, buf, size);
}
static gboolean net_stream_send(QIOChannel *ioc,
GIOCondition condition,
gpointer data)
{
- NetStreamState *s = data;
- int size;
- int ret;
- QEMU_UNINITIALIZED char buf1[NET_BUFSIZE];
- const char *buf;
-
- size = qio_channel_read(s->ioc, buf1, sizeof(buf1), NULL);
- if (size < 0) {
- if (errno != EWOULDBLOCK) {
- goto eoc;
- }
- } else if (size == 0) {
- /* end of connection */
- eoc:
- s->ioc_read_tag = 0;
- if (s->ioc_write_tag) {
- g_source_remove(s->ioc_write_tag);
- s->ioc_write_tag = 0;
- }
- if (s->listener) {
- qemu_set_info_str(&s->nc, "listening");
- qio_net_listener_set_client_func(s->listener, net_stream_listen,
- s, NULL);
- }
- object_unref(OBJECT(s->ioc));
- s->ioc = NULL;
-
- net_socket_rs_init(&s->rs, net_stream_rs_finalize, false);
- s->nc.link_down = true;
+ if (net_stream_data_send(ioc, condition, data) == G_SOURCE_REMOVE) {
+ NetStreamState *s = DO_UPCAST(NetStreamState, data, data);
- qapi_event_send_netdev_stream_disconnected(s->nc.name);
+ qapi_event_send_netdev_stream_disconnected(s->data.nc.name);
net_stream_arm_reconnect(s);
return G_SOURCE_REMOVE;
}
- buf = buf1;
-
- ret = net_fill_rstate(&s->rs, (const uint8_t *)buf, size);
-
- if (ret == -1) {
- goto eoc;
- }
return G_SOURCE_CONTINUE;
}
static void net_stream_cleanup(NetClientState *nc)
{
- NetStreamState *s = DO_UPCAST(NetStreamState, nc, nc);
+ NetStreamState *s = DO_UPCAST(NetStreamState, data.nc, nc);
if (s->timer_tag) {
g_source_remove(s->timer_tag);
s->timer_tag = 0;
@@ -202,28 +79,28 @@ static void net_stream_cleanup(NetClientState *nc)
qapi_free_SocketAddress(s->addr);
s->addr = NULL;
}
- if (s->ioc) {
- if (QIO_CHANNEL_SOCKET(s->ioc)->fd != -1) {
- if (s->ioc_read_tag) {
- g_source_remove(s->ioc_read_tag);
- s->ioc_read_tag = 0;
+ if (s->data.ioc) {
+ if (QIO_CHANNEL_SOCKET(s->data.ioc)->fd != -1) {
+ if (s->data.ioc_read_tag) {
+ g_source_remove(s->data.ioc_read_tag);
+ s->data.ioc_read_tag = 0;
}
- if (s->ioc_write_tag) {
- g_source_remove(s->ioc_write_tag);
- s->ioc_write_tag = 0;
+ if (s->data.ioc_write_tag) {
+ g_source_remove(s->data.ioc_write_tag);
+ s->data.ioc_write_tag = 0;
}
}
- object_unref(OBJECT(s->ioc));
- s->ioc = NULL;
+ object_unref(OBJECT(s->data.ioc));
+ s->data.ioc = NULL;
}
- if (s->listen_ioc) {
- if (s->listener) {
- qio_net_listener_disconnect(s->listener);
- object_unref(OBJECT(s->listener));
- s->listener = NULL;
+ if (s->data.listen_ioc) {
+ if (s->data.listener) {
+ qio_net_listener_disconnect(s->data.listener);
+ object_unref(OBJECT(s->data.listener));
+ s->data.listener = NULL;
}
- object_unref(OBJECT(s->listen_ioc));
- s->listen_ioc = NULL;
+ object_unref(OBJECT(s->data.listen_ioc));
+ s->data.listen_ioc = NULL;
}
}
@@ -235,23 +112,13 @@ static NetClientInfo net_stream_info = {
};
static void net_stream_listen(QIONetListener *listener,
- QIOChannelSocket *cioc,
- void *opaque)
+ QIOChannelSocket *cioc, gpointer data)
{
- NetStreamState *s = opaque;
+ NetStreamData *d = data;
SocketAddress *addr;
char *uri;
- object_ref(OBJECT(cioc));
-
- qio_net_listener_set_client_func(s->listener, NULL, s, NULL);
-
- s->ioc = QIO_CHANNEL(cioc);
- qio_channel_set_name(s->ioc, "stream-server");
- s->nc.link_down = false;
-
- s->ioc_read_tag = qio_channel_add_watch(s->ioc, G_IO_IN, net_stream_send,
- s, NULL);
+ net_stream_data_listen(listener, cioc, data);
if (cioc->localAddr.ss_family == AF_UNIX) {
addr = qio_channel_socket_get_local_address(cioc, NULL);
@@ -260,22 +127,22 @@ static void net_stream_listen(QIONetListener *listener,
}
g_assert(addr != NULL);
uri = socket_uri(addr);
- qemu_set_info_str(&s->nc, "%s", uri);
+ qemu_set_info_str(&d->nc, "%s", uri);
g_free(uri);
- qapi_event_send_netdev_stream_connected(s->nc.name, addr);
+ qapi_event_send_netdev_stream_connected(d->nc.name, addr);
qapi_free_SocketAddress(addr);
}
static void net_stream_server_listening(QIOTask *task, gpointer opaque)
{
- NetStreamState *s = opaque;
- QIOChannelSocket *listen_sioc = QIO_CHANNEL_SOCKET(s->listen_ioc);
+ NetStreamData *d = opaque;
+ QIOChannelSocket *listen_sioc = QIO_CHANNEL_SOCKET(d->listen_ioc);
SocketAddress *addr;
int ret;
Error *err = NULL;
if (qio_task_propagate_error(task, &err)) {
- qemu_set_info_str(&s->nc, "error: %s", error_get_pretty(err));
+ qemu_set_info_str(&d->nc, "error: %s", error_get_pretty(err));
error_free(err);
return;
}
@@ -284,20 +151,21 @@ static void net_stream_server_listening(QIOTask *task, gpointer opaque)
g_assert(addr != NULL);
ret = qemu_socket_try_set_nonblock(listen_sioc->fd);
if (addr->type == SOCKET_ADDRESS_TYPE_FD && ret < 0) {
- qemu_set_info_str(&s->nc, "can't use file descriptor %s (errno %d)",
+ qemu_set_info_str(&d->nc, "can't use file descriptor %s (errno %d)",
addr->u.fd.str, -ret);
return;
}
g_assert(ret == 0);
qapi_free_SocketAddress(addr);
- s->nc.link_down = true;
- s->listener = qio_net_listener_new();
+ d->nc.link_down = true;
+ d->listener = qio_net_listener_new();
- qemu_set_info_str(&s->nc, "listening");
- net_socket_rs_init(&s->rs, net_stream_rs_finalize, false);
- qio_net_listener_set_client_func(s->listener, net_stream_listen, s, NULL);
- qio_net_listener_add(s->listener, listen_sioc);
+ qemu_set_info_str(&d->nc, "listening");
+ net_socket_rs_init(&d->rs, net_stream_data_rs_finalize, false);
+ qio_net_listener_set_client_func(d->listener, d->listen, d,
+ NULL);
+ qio_net_listener_add(d->listener, listen_sioc);
}
static int net_stream_server_init(NetClientState *peer,
@@ -307,16 +175,18 @@ static int net_stream_server_init(NetClientState *peer,
Error **errp)
{
NetClientState *nc;
- NetStreamState *s;
+ NetStreamData *d;
QIOChannelSocket *listen_sioc = qio_channel_socket_new();
nc = qemu_new_net_client(&net_stream_info, peer, model, name);
- s = DO_UPCAST(NetStreamState, nc, nc);
- qemu_set_info_str(&s->nc, "initializing");
+ d = DO_UPCAST(NetStreamData, nc, nc);
+ d->send = net_stream_send;
+ d->listen = net_stream_listen;
+ qemu_set_info_str(&d->nc, "initializing");
- s->listen_ioc = QIO_CHANNEL(listen_sioc);
+ d->listen_ioc = QIO_CHANNEL(listen_sioc);
qio_channel_socket_listen_async(listen_sioc, addr, 0,
- net_stream_server_listening, s,
+ net_stream_server_listening, d,
NULL, NULL);
return 0;
@@ -325,49 +195,23 @@ static int net_stream_server_init(NetClientState *peer,
static void net_stream_client_connected(QIOTask *task, gpointer opaque)
{
NetStreamState *s = opaque;
- QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(s->ioc);
+ NetStreamData *d = &s->data;
+ QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(d->ioc);
SocketAddress *addr;
gchar *uri;
- int ret;
- Error *err = NULL;
- if (qio_task_propagate_error(task, &err)) {
- qemu_set_info_str(&s->nc, "error: %s", error_get_pretty(err));
- error_free(err);
- goto error;
+ if (net_stream_data_client_connected(task, d) == -1) {
+ net_stream_arm_reconnect(s);
+ return;
}
addr = qio_channel_socket_get_remote_address(sioc, NULL);
g_assert(addr != NULL);
uri = socket_uri(addr);
- qemu_set_info_str(&s->nc, "%s", uri);
+ qemu_set_info_str(&d->nc, "%s", uri);
g_free(uri);
-
- ret = qemu_socket_try_set_nonblock(sioc->fd);
- if (addr->type == SOCKET_ADDRESS_TYPE_FD && ret < 0) {
- qemu_set_info_str(&s->nc, "can't use file descriptor %s (errno %d)",
- addr->u.fd.str, -ret);
- qapi_free_SocketAddress(addr);
- goto error;
- }
- g_assert(ret == 0);
-
- net_socket_rs_init(&s->rs, net_stream_rs_finalize, false);
-
- /* Disable Nagle algorithm on TCP sockets to reduce latency */
- qio_channel_set_delay(s->ioc, false);
-
- s->ioc_read_tag = qio_channel_add_watch(s->ioc, G_IO_IN, net_stream_send,
- s, NULL);
- s->nc.link_down = false;
- qapi_event_send_netdev_stream_connected(s->nc.name, addr);
+ qapi_event_send_netdev_stream_connected(d->nc.name, addr);
qapi_free_SocketAddress(addr);
-
- return;
-error:
- object_unref(OBJECT(s->ioc));
- s->ioc = NULL;
- net_stream_arm_reconnect(s);
}
static gboolean net_stream_reconnect(gpointer data)
@@ -378,7 +222,7 @@ static gboolean net_stream_reconnect(gpointer data)
s->timer_tag = 0;
sioc = qio_channel_socket_new();
- s->ioc = QIO_CHANNEL(sioc);
+ s->data.ioc = QIO_CHANNEL(sioc);
qio_channel_socket_connect_async(sioc, s->addr,
net_stream_client_connected, s,
NULL, NULL);
@@ -388,7 +232,7 @@ static gboolean net_stream_reconnect(gpointer data)
static void net_stream_arm_reconnect(NetStreamState *s)
{
if (s->reconnect_ms && s->timer_tag == 0) {
- qemu_set_info_str(&s->nc, "connecting");
+ qemu_set_info_str(&s->data.nc, "connecting");
s->timer_tag = g_timeout_add(s->reconnect_ms, net_stream_reconnect, s);
}
}
@@ -405,11 +249,13 @@ static int net_stream_client_init(NetClientState *peer,
QIOChannelSocket *sioc = qio_channel_socket_new();
nc = qemu_new_net_client(&net_stream_info, peer, model, name);
- s = DO_UPCAST(NetStreamState, nc, nc);
- qemu_set_info_str(&s->nc, "connecting");
+ s = DO_UPCAST(NetStreamState, data.nc, nc);
+ qemu_set_info_str(&s->data.nc, "connecting");
- s->ioc = QIO_CHANNEL(sioc);
- s->nc.link_down = true;
+ s->data.ioc = QIO_CHANNEL(sioc);
+ s->data.nc.link_down = true;
+ s->data.send = net_stream_send;
+ s->data.listen = net_stream_listen;
s->reconnect_ms = reconnect_ms;
if (reconnect_ms) {
diff --git a/net/stream_data.c b/net/stream_data.c
new file mode 100644
index 0000000000..5af27e0d1d
--- /dev/null
+++ b/net/stream_data.c
@@ -0,0 +1,193 @@
+/*
+ * net stream generic functions
+ *
+ * Copyright Red Hat
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/iov.h"
+#include "qapi/error.h"
+#include "net/net.h"
+#include "io/channel.h"
+#include "io/net-listener.h"
+
+#include "stream_data.h"
+
+static gboolean net_stream_data_writable(QIOChannel *ioc,
+ GIOCondition condition, gpointer data)
+{
+ NetStreamData *d = data;
+
+ d->ioc_write_tag = 0;
+
+ qemu_flush_queued_packets(&d->nc);
+
+ return G_SOURCE_REMOVE;
+}
+
+ssize_t net_stream_data_receive(NetStreamData *d, const uint8_t *buf,
+ size_t size)
+{
+ uint32_t len = htonl(size);
+ struct iovec iov[] = {
+ {
+ .iov_base = &len,
+ .iov_len = sizeof(len),
+ }, {
+ .iov_base = (void *)buf,
+ .iov_len = size,
+ },
+ };
+ struct iovec local_iov[2];
+ unsigned int nlocal_iov;
+ size_t remaining;
+ ssize_t ret;
+
+ remaining = iov_size(iov, 2) - d->send_index;
+ nlocal_iov = iov_copy(local_iov, 2, iov, 2, d->send_index, remaining);
+ ret = qio_channel_writev(d->ioc, local_iov, nlocal_iov, NULL);
+ if (ret == QIO_CHANNEL_ERR_BLOCK) {
+ ret = 0; /* handled further down */
+ }
+ if (ret == -1) {
+ d->send_index = 0;
+ return -errno;
+ }
+ if (ret < (ssize_t)remaining) {
+ d->send_index += ret;
+ d->ioc_write_tag = qio_channel_add_watch(d->ioc, G_IO_OUT,
+ net_stream_data_writable, d,
+ NULL);
+ return 0;
+ }
+ d->send_index = 0;
+ return size;
+}
+
+static void net_stream_data_send_completed(NetClientState *nc, ssize_t len)
+{
+ NetStreamData *d = DO_UPCAST(NetStreamData, nc, nc);
+
+ if (!d->ioc_read_tag) {
+ d->ioc_read_tag = qio_channel_add_watch(d->ioc, G_IO_IN, d->send, d,
+ NULL);
+ }
+}
+
+void net_stream_data_rs_finalize(SocketReadState *rs)
+{
+ NetStreamData *d = container_of(rs, NetStreamData, rs);
+
+ if (qemu_send_packet_async(&d->nc, rs->buf,
+ rs->packet_len,
+ net_stream_data_send_completed) == 0) {
+ if (d->ioc_read_tag) {
+ g_source_remove(d->ioc_read_tag);
+ d->ioc_read_tag = 0;
+ }
+ }
+}
+
+gboolean net_stream_data_send(QIOChannel *ioc, GIOCondition condition,
+ NetStreamData *d)
+{
+ int size;
+ int ret;
+ QEMU_UNINITIALIZED char buf1[NET_BUFSIZE];
+ const char *buf;
+
+ size = qio_channel_read(d->ioc, buf1, sizeof(buf1), NULL);
+ if (size < 0) {
+ if (errno != EWOULDBLOCK) {
+ goto eoc;
+ }
+ } else if (size == 0) {
+ /* end of connection */
+ eoc:
+ d->ioc_read_tag = 0;
+ if (d->ioc_write_tag) {
+ g_source_remove(d->ioc_write_tag);
+ d->ioc_write_tag = 0;
+ }
+ if (d->listener) {
+ qemu_set_info_str(&d->nc, "listening");
+ qio_net_listener_set_client_func(d->listener,
+ d->listen, d, NULL);
+ }
+ object_unref(OBJECT(d->ioc));
+ d->ioc = NULL;
+
+ net_socket_rs_init(&d->rs, net_stream_data_rs_finalize, false);
+ d->nc.link_down = true;
+
+ return G_SOURCE_REMOVE;
+ }
+ buf = buf1;
+
+ ret = net_fill_rstate(&d->rs, (const uint8_t *)buf, size);
+
+ if (ret == -1) {
+ goto eoc;
+ }
+
+ return G_SOURCE_CONTINUE;
+}
+
+void net_stream_data_listen(QIONetListener *listener,
+ QIOChannelSocket *cioc,
+ NetStreamData *d)
+{
+ object_ref(OBJECT(cioc));
+
+ qio_net_listener_set_client_func(d->listener, NULL, d, NULL);
+
+ d->ioc = QIO_CHANNEL(cioc);
+ qio_channel_set_name(d->ioc, "stream-server");
+ d->nc.link_down = false;
+
+ d->ioc_read_tag = qio_channel_add_watch(d->ioc, G_IO_IN, d->send, d, NULL);
+}
+
+int net_stream_data_client_connected(QIOTask *task, NetStreamData *d)
+{
+ QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(d->ioc);
+ SocketAddress *addr;
+ int ret;
+ Error *err = NULL;
+
+ if (qio_task_propagate_error(task, &err)) {
+ qemu_set_info_str(&d->nc, "error: %s", error_get_pretty(err));
+ error_free(err);
+ goto error;
+ }
+
+ addr = qio_channel_socket_get_remote_address(sioc, NULL);
+ g_assert(addr != NULL);
+
+ ret = qemu_socket_try_set_nonblock(sioc->fd);
+ if (addr->type == SOCKET_ADDRESS_TYPE_FD && ret < 0) {
+ qemu_set_info_str(&d->nc, "can't use file descriptor %s (errno %d)",
+ addr->u.fd.str, -ret);
+ qapi_free_SocketAddress(addr);
+ goto error;
+ }
+ g_assert(ret == 0);
+ qapi_free_SocketAddress(addr);
+
+ net_socket_rs_init(&d->rs, net_stream_data_rs_finalize, false);
+
+ /* Disable Nagle algorithm on TCP sockets to reduce latency */
+ qio_channel_set_delay(d->ioc, false);
+
+ d->ioc_read_tag = qio_channel_add_watch(d->ioc, G_IO_IN, d->send, d, NULL);
+ d->nc.link_down = false;
+
+ return 0;
+error:
+ object_unref(OBJECT(d->ioc));
+ d->ioc = NULL;
+
+ return -1;
+}
diff --git a/net/stream_data.h b/net/stream_data.h
new file mode 100644
index 0000000000..b868625665
--- /dev/null
+++ b/net/stream_data.h
@@ -0,0 +1,31 @@
+/*
+ * net stream generic functions
+ *
+ * Copyright Red Hat
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+typedef struct NetStreamData {
+ NetClientState nc;
+ QIOChannel *ioc;
+ guint ioc_read_tag;
+ guint ioc_write_tag;
+ SocketReadState rs;
+ unsigned int send_index; /* number of bytes sent*/
+ QIOChannelFunc send;
+ /* server data */
+ QIOChannel *listen_ioc;
+ QIONetListener *listener;
+ QIONetListenerClientFunc listen;
+} NetStreamData;
+
+ssize_t net_stream_data_receive(NetStreamData *d, const uint8_t *buf,
+ size_t size);
+void net_stream_data_rs_finalize(SocketReadState *rs);
+gboolean net_stream_data_send(QIOChannel *ioc, GIOCondition condition,
+ NetStreamData *d);
+int net_stream_data_client_connected(QIOTask *task, NetStreamData *d);
+void net_stream_data_listen(QIONetListener *listener,
+ QIOChannelSocket *cioc,
+ NetStreamData *d);
--
2.42.0
* [PULL V2 04/16] net: Define net_client_set_link()
From: Jason Wang @ 2025-07-15 4:35 UTC
To: qemu-devel; +Cc: Laurent Vivier, Jason Wang
From: Laurent Vivier <lvivier@redhat.com>
The code to set the link status is currently located in
qmp_set_link(). This function identifies the device by name,
searches for the corresponding NetClientState, and then updates
the link status.
In some parts of the code, such as vhost-user.c, the NetClientState
pointers are already available. Calling qmp_set_link() from these
locations leads to a redundant search for the clients.
This patch refactors the logic by introducing a new function,
net_client_set_link(), which accepts a NetClientState array
directly. qmp_set_link() is simplified to be a wrapper that
performs the client search and then calls the new function.
The vhost-user implementation is updated to use net_client_set_link()
directly, thereby eliminating the unnecessary client lookup.
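The split between a by-name wrapper and a core function that takes the
clients directly can be sketched like this. struct toy_client and both
functions are hypothetical and only mirror the shape of the refactoring;
the real net_client_set_link() does more than flip a flag.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical client record; the real NetClientState is much larger. */
struct toy_client {
    const char *name;
    bool link_down;
};

/* Core function: operates on clients the caller already holds. */
static void toy_set_link(struct toy_client **clients, int n, bool up)
{
    for (int i = 0; i < n; i++) {
        clients[i]->link_down = !up;
    }
}

/*
 * Wrapper: performs the by-name lookup, then delegates to the core
 * function. Returns the number of matching clients, 0 if none matched.
 */
static int toy_set_link_by_name(struct toy_client *all, int total,
                                const char *name, bool up)
{
    struct toy_client *found[8];
    int n = 0;

    for (int i = 0; i < total && n < 8; i++) {
        if (strcmp(all[i].name, name) == 0) {
            found[n++] = &all[i];
        }
    }
    if (n > 0) {
        toy_set_link(found, n, up);
    }
    return n;
}
```

A caller that already has its client array, as vhost-user.c does in the
patch, would call the core function and skip the lookup entirely.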
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
include/net/net.h | 1 +
net/net.c | 32 ++++++++++++++++++++------------
net/vhost-user.c | 4 ++--
3 files changed, 23 insertions(+), 14 deletions(-)
diff --git a/include/net/net.h b/include/net/net.h
index cdd5b109b0..ac59b593ba 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -298,6 +298,7 @@ void net_client_parse(QemuOptsList *opts_list, const char *optstr);
void show_netdevs(void);
void net_init_clients(void);
void net_check_clients(void);
+void net_client_set_link(NetClientState **ncs, int queues, bool up);
void net_cleanup(void);
void hmp_host_net_add(Monitor *mon, const QDict *qdict);
void hmp_host_net_remove(Monitor *mon, const QDict *qdict);
diff --git a/net/net.c b/net/net.c
index 39d6f28158..cfa2d8e958 100644
--- a/net/net.c
+++ b/net/net.c
@@ -1601,21 +1601,11 @@ void colo_notify_filters_event(int event, Error **errp)
}
}
-void qmp_set_link(const char *name, bool up, Error **errp)
+void net_client_set_link(NetClientState **ncs, int queues, bool up)
{
- NetClientState *ncs[MAX_QUEUE_NUM];
NetClientState *nc;
- int queues, i;
-
- queues = qemu_find_net_clients_except(name, ncs,
- NET_CLIENT_DRIVER__MAX,
- MAX_QUEUE_NUM);
+ int i;
- if (queues == 0) {
- error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND,
- "Device '%s' not found", name);
- return;
- }
nc = ncs[0];
for (i = 0; i < queues; i++) {
@@ -1646,6 +1636,24 @@ void qmp_set_link(const char *name, bool up, Error **errp)
}
}
+void qmp_set_link(const char *name, bool up, Error **errp)
+{
+ NetClientState *ncs[MAX_QUEUE_NUM];
+ int queues;
+
+ queues = qemu_find_net_clients_except(name, ncs,
+ NET_CLIENT_DRIVER__MAX,
+ MAX_QUEUE_NUM);
+
+ if (queues == 0) {
+ error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND,
+ "Device '%s' not found", name);
+ return;
+ }
+
+ net_client_set_link(ncs, queues, up);
+}
+
static void net_vm_change_state_handler(void *opaque, bool running,
RunState state)
{
diff --git a/net/vhost-user.c b/net/vhost-user.c
index 0b235e50c6..10ac8dc0b3 100644
--- a/net/vhost-user.c
+++ b/net/vhost-user.c
@@ -264,7 +264,7 @@ static void chr_closed_bh(void *opaque)
vhost_user_save_acked_features(ncs[i]);
}
- qmp_set_link(name, false, &err);
+ net_client_set_link(ncs, queues, false);
qemu_chr_fe_set_handlers(&s->chr, NULL, NULL, net_vhost_user_event,
NULL, opaque, NULL, true);
@@ -300,7 +300,7 @@ static void net_vhost_user_event(void *opaque, QEMUChrEvent event)
}
s->watch = qemu_chr_fe_add_watch(&s->chr, G_IO_HUP,
net_vhost_user_watch, s);
- qmp_set_link(name, true, &err);
+ net_client_set_link(ncs, queues, true);
s->started = true;
qapi_event_send_netdev_vhost_user_connected(name, chr->label);
break;
--
2.42.0
* [PULL V2 05/16] vhost_net: Rename vhost_set_vring_enable() for clarity
2025-07-15 4:35 [PULL V2 00/16] Net patches Jason Wang
` (3 preceding siblings ...)
2025-07-15 4:35 ` [PULL V2 04/16] net: Define net_client_set_link() Jason Wang
@ 2025-07-15 4:35 ` Jason Wang
2025-07-15 4:35 ` [PULL V2 06/16] net: Add get_vhost_net callback to NetClientInfo Jason Wang
` (11 subsequent siblings)
16 siblings, 0 replies; 21+ messages in thread
From: Jason Wang @ 2025-07-15 4:35 UTC (permalink / raw)
To: qemu-devel; +Cc: Laurent Vivier, Jason Wang
From: Laurent Vivier <lvivier@redhat.com>
This is a cosmetic change with no functional impact.
The function vhost_set_vring_enable() is specific to vhost_net and
is used outside of vhost_net.c (specifically, in
hw/net/virtio-net.c). To prevent confusion with other similarly named
vhost functions, such as the one found in cryptodev-vhost.c, it has
been renamed to vhost_net_set_vring_enable(). This clarifies that the
function belongs to the vhost_net module.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
hw/net/vhost_net-stub.c | 2 +-
hw/net/vhost_net.c | 4 ++--
hw/net/virtio-net.c | 4 ++--
include/net/vhost_net.h | 2 +-
4 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/hw/net/vhost_net-stub.c b/hw/net/vhost_net-stub.c
index 72df6d757e..7bed0bf92b 100644
--- a/hw/net/vhost_net-stub.c
+++ b/hw/net/vhost_net-stub.c
@@ -101,7 +101,7 @@ VHostNetState *get_vhost_net(NetClientState *nc)
return 0;
}
-int vhost_set_vring_enable(NetClientState *nc, int enable)
+int vhost_net_set_vring_enable(NetClientState *nc, int enable)
{
return 0;
}
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 891f235a0a..cb87056397 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -551,7 +551,7 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
if (peer->vring_enable) {
/* restore vring enable state */
- r = vhost_set_vring_enable(peer, peer->vring_enable);
+ r = vhost_net_set_vring_enable(peer, peer->vring_enable);
if (r < 0) {
goto err_guest_notifiers;
@@ -686,7 +686,7 @@ VHostNetState *get_vhost_net(NetClientState *nc)
return vhost_net;
}
-int vhost_set_vring_enable(NetClientState *nc, int enable)
+int vhost_net_set_vring_enable(NetClientState *nc, int enable)
{
VHostNetState *net = get_vhost_net(nc);
const VhostOps *vhost_ops = net->dev.vhost_ops;
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 351377c025..e3400f18c8 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -697,7 +697,7 @@ static int peer_attach(VirtIONet *n, int index)
}
if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_USER) {
- vhost_set_vring_enable(nc->peer, 1);
+ vhost_net_set_vring_enable(nc->peer, 1);
}
if (nc->peer->info->type != NET_CLIENT_DRIVER_TAP) {
@@ -720,7 +720,7 @@ static int peer_detach(VirtIONet *n, int index)
}
if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_USER) {
- vhost_set_vring_enable(nc->peer, 0);
+ vhost_net_set_vring_enable(nc->peer, 0);
}
if (nc->peer->info->type != NET_CLIENT_DRIVER_TAP) {
diff --git a/include/net/vhost_net.h b/include/net/vhost_net.h
index c6a5361a2a..0f40049f34 100644
--- a/include/net/vhost_net.h
+++ b/include/net/vhost_net.h
@@ -41,7 +41,7 @@ void vhost_net_config_mask(VHostNetState *net, VirtIODevice *dev, bool mask);
int vhost_net_notify_migration_done(VHostNetState *net, char* mac_addr);
VHostNetState *get_vhost_net(NetClientState *nc);
-int vhost_set_vring_enable(NetClientState * nc, int enable);
+int vhost_net_set_vring_enable(NetClientState *nc, int enable);
uint64_t vhost_net_get_acked_features(VHostNetState *net);
--
2.42.0
* [PULL V2 06/16] net: Add get_vhost_net callback to NetClientInfo
2025-07-15 4:35 [PULL V2 00/16] Net patches Jason Wang
` (4 preceding siblings ...)
2025-07-15 4:35 ` [PULL V2 05/16] vhost_net: Rename vhost_set_vring_enable() for clarity Jason Wang
@ 2025-07-15 4:35 ` Jason Wang
2025-07-15 4:35 ` [PULL V2 07/16] net: Consolidate vhost feature bits into vhost_net structure Jason Wang
` (10 subsequent siblings)
16 siblings, 0 replies; 21+ messages in thread
From: Jason Wang @ 2025-07-15 4:35 UTC (permalink / raw)
To: qemu-devel; +Cc: Laurent Vivier, Jason Wang
From: Laurent Vivier <lvivier@redhat.com>
The get_vhost_net() function previously contained a large switch
statement to find the VHostNetState pointer based on the net
client's type. This created a tight coupling, requiring the generic
vhost layer to be aware of every specific backend that supported
vhost, such as tap, vhost-user, and vhost-vdpa.
This approach is not scalable and requires modifying a central function
for any new backend. It also forced each backend to expose its internal
getter function in a public header file.
This patch refactors the logic by adding a get_vhost_net
function pointer to the NetClientInfo struct. The central
get_vhost_net() function is now a simple, generic dispatcher that
invokes the callback provided by the net client.
Each backend now implements its own private getter and registers it in
its NetClientInfo.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
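Note (illustration only, not part of the patch): the dispatch pattern described above can be sketched in isolation as follows. The types and functions (demo_vhost, DemoClientInfo, DemoClient, demo_get_vhost()) are hypothetical stand-ins and do not mirror the QEMU definitions; the point is only that the generic layer needs no knowledge of backend types.

```c
#include <assert.h>
#include <stddef.h>

struct demo_vhost { int id; };   /* stand-in for struct vhost_net */
struct DemoClient;

/* Function type for the optional per-backend getter. */
typedef struct demo_vhost *(DemoGetVhost)(struct DemoClient *nc);

typedef struct DemoClientInfo {
    const char *type_name;
    DemoGetVhost *get_vhost;     /* NULL if the backend never uses vhost */
} DemoClientInfo;

typedef struct DemoClient {
    const DemoClientInfo *info;
    struct demo_vhost *vhost;    /* backend-private state in real code */
} DemoClient;

/* Private getter owned by one backend; may legitimately return NULL. */
static struct demo_vhost *demo_tap_get_vhost(DemoClient *nc)
{
    return nc->vhost;
}

static const DemoClientInfo demo_tap_info = { "tap", demo_tap_get_vhost };
static const DemoClientInfo demo_socket_info = { "socket", NULL };

/* Generic dispatcher: no switch on the backend type. */
static struct demo_vhost *demo_get_vhost(DemoClient *nc)
{
    if (!nc) {
        return NULL;
    }
    assert(nc->info != NULL);
    if (nc->info->get_vhost) {
        return nc->info->get_vhost(nc);
    }
    return NULL;
}
```

Adding a new backend in this scheme means filling in one more info struct; demo_get_vhost() itself stays unchanged.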
hw/net/vhost_net.c | 31 ++++---------------------------
include/net/net.h | 2 ++
include/net/tap.h | 3 ---
include/net/vhost-user.h | 1 -
include/net/vhost-vdpa.h | 2 --
net/tap-win32.c | 5 -----
net/tap.c | 20 +++++++++++++-------
net/vhost-user.c | 3 ++-
net/vhost-vdpa.c | 4 +++-
9 files changed, 24 insertions(+), 47 deletions(-)
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index cb87056397..db8b97b753 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -649,41 +649,18 @@ void vhost_net_config_mask(VHostNetState *net, VirtIODevice *dev, bool mask)
{
vhost_config_mask(&net->dev, dev, mask);
}
+
VHostNetState *get_vhost_net(NetClientState *nc)
{
- VHostNetState *vhost_net = 0;
-
if (!nc) {
return 0;
}
- switch (nc->info->type) {
- case NET_CLIENT_DRIVER_TAP:
- vhost_net = tap_get_vhost_net(nc);
- /*
- * tap_get_vhost_net() can return NULL if a tap net-device backend is
- * created with 'vhost=off' option, 'vhostforce=off' or no vhost or
- * vhostforce or vhostfd options at all. Please see net_init_tap_one().
- * Hence, we omit the assertion here.
- */
- break;
-#ifdef CONFIG_VHOST_NET_USER
- case NET_CLIENT_DRIVER_VHOST_USER:
- vhost_net = vhost_user_get_vhost_net(nc);
- assert(vhost_net);
- break;
-#endif
-#ifdef CONFIG_VHOST_NET_VDPA
- case NET_CLIENT_DRIVER_VHOST_VDPA:
- vhost_net = vhost_vdpa_get_vhost_net(nc);
- assert(vhost_net);
- break;
-#endif
- default:
- break;
+ if (nc->info->get_vhost_net) {
+ return nc->info->get_vhost_net(nc);
}
- return vhost_net;
+ return NULL;
}
int vhost_net_set_vring_enable(NetClientState *nc, int enable)
diff --git a/include/net/net.h b/include/net/net.h
index ac59b593ba..e67b375626 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -67,6 +67,7 @@ typedef void (SocketReadStateFinalize)(SocketReadState *rs);
typedef void (NetAnnounce)(NetClientState *);
typedef bool (SetSteeringEBPF)(NetClientState *, int);
typedef bool (NetCheckPeerType)(NetClientState *, ObjectClass *, Error **);
+typedef struct vhost_net *(GetVHostNet)(NetClientState *nc);
typedef struct NetClientInfo {
NetClientDriver type;
@@ -92,6 +93,7 @@ typedef struct NetClientInfo {
NetAnnounce *announce;
SetSteeringEBPF *set_steering_ebpf;
NetCheckPeerType *check_peer_type;
+ GetVHostNet *get_vhost_net;
} NetClientInfo;
struct NetClientState {
diff --git a/include/net/tap.h b/include/net/tap.h
index 5d585515f9..6f34f13eae 100644
--- a/include/net/tap.h
+++ b/include/net/tap.h
@@ -33,7 +33,4 @@ int tap_disable(NetClientState *nc);
int tap_get_fd(NetClientState *nc);
-struct vhost_net;
-struct vhost_net *tap_get_vhost_net(NetClientState *nc);
-
#endif /* QEMU_NET_TAP_H */
diff --git a/include/net/vhost-user.h b/include/net/vhost-user.h
index 35bf619709..0b233a2673 100644
--- a/include/net/vhost-user.h
+++ b/include/net/vhost-user.h
@@ -12,7 +12,6 @@
#define VHOST_USER_H
struct vhost_net;
-struct vhost_net *vhost_user_get_vhost_net(NetClientState *nc);
uint64_t vhost_user_get_acked_features(NetClientState *nc);
void vhost_user_save_acked_features(NetClientState *nc);
diff --git a/include/net/vhost-vdpa.h b/include/net/vhost-vdpa.h
index b81f9a6f2a..916ead3793 100644
--- a/include/net/vhost-vdpa.h
+++ b/include/net/vhost-vdpa.h
@@ -14,8 +14,6 @@
#define TYPE_VHOST_VDPA "vhost-vdpa"
-struct vhost_net *vhost_vdpa_get_vhost_net(NetClientState *nc);
-
extern const int vdpa_feature_bits[];
#endif /* VHOST_VDPA_H */
diff --git a/net/tap-win32.c b/net/tap-win32.c
index 671dee970f..38baf90e0b 100644
--- a/net/tap-win32.c
+++ b/net/tap-win32.c
@@ -704,11 +704,6 @@ static void tap_win32_send(void *opaque)
}
}
-struct vhost_net *tap_get_vhost_net(NetClientState *nc)
-{
- return NULL;
-}
-
static NetClientInfo net_tap_win32_info = {
.type = NET_CLIENT_DRIVER_TAP,
.size = sizeof(TAPState),
diff --git a/net/tap.c b/net/tap.c
index ae1c7e3983..4beba6d7a7 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -329,6 +329,18 @@ int tap_get_fd(NetClientState *nc)
return s->fd;
}
+/*
+ * tap_get_vhost_net() can return NULL if a tap net-device backend is
+ * created with 'vhost=off' option, 'vhostforce=off' or no vhost or
+ * vhostforce or vhostfd options at all. Please see net_init_tap_one().
+ */
+static VHostNetState *tap_get_vhost_net(NetClientState *nc)
+{
+ TAPState *s = DO_UPCAST(TAPState, nc, nc);
+ assert(nc->info->type == NET_CLIENT_DRIVER_TAP);
+ return s->vhost_net;
+}
+
/* fd support */
static NetClientInfo net_tap_info = {
@@ -347,6 +359,7 @@ static NetClientInfo net_tap_info = {
.set_vnet_le = tap_set_vnet_le,
.set_vnet_be = tap_set_vnet_be,
.set_steering_ebpf = tap_set_steering_ebpf,
+ .get_vhost_net = tap_get_vhost_net,
};
static TAPState *net_tap_fd_init(NetClientState *peer,
@@ -980,13 +993,6 @@ free_fail:
return 0;
}
-VHostNetState *tap_get_vhost_net(NetClientState *nc)
-{
- TAPState *s = DO_UPCAST(TAPState, nc, nc);
- assert(nc->info->type == NET_CLIENT_DRIVER_TAP);
- return s->vhost_net;
-}
-
int tap_enable(NetClientState *nc)
{
TAPState *s = DO_UPCAST(TAPState, nc, nc);
diff --git a/net/vhost-user.c b/net/vhost-user.c
index 10ac8dc0b3..b7bf0d2042 100644
--- a/net/vhost-user.c
+++ b/net/vhost-user.c
@@ -32,7 +32,7 @@ typedef struct NetVhostUserState {
bool started;
} NetVhostUserState;
-VHostNetState *vhost_user_get_vhost_net(NetClientState *nc)
+static struct vhost_net *vhost_user_get_vhost_net(NetClientState *nc)
{
NetVhostUserState *s = DO_UPCAST(NetVhostUserState, nc, nc);
assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_USER);
@@ -231,6 +231,7 @@ static NetClientInfo net_vhost_user_info = {
.set_vnet_be = vhost_user_set_vnet_endianness,
.set_vnet_le = vhost_user_set_vnet_endianness,
.check_peer_type = vhost_user_check_peer_type,
+ .get_vhost_net = vhost_user_get_vhost_net,
};
static gboolean net_vhost_user_watch(void *do_not_use, GIOCondition cond,
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 58d738945d..0b86c917ed 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -132,7 +132,7 @@ static const uint64_t vdpa_svq_device_features =
#define VHOST_VDPA_NET_CVQ_ASID 1
-VHostNetState *vhost_vdpa_get_vhost_net(NetClientState *nc)
+static struct vhost_net *vhost_vdpa_get_vhost_net(NetClientState *nc)
{
VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
@@ -432,6 +432,7 @@ static NetClientInfo net_vhost_vdpa_info = {
.set_vnet_le = vhost_vdpa_set_vnet_le,
.check_peer_type = vhost_vdpa_check_peer_type,
.set_steering_ebpf = vhost_vdpa_set_steering_ebpf,
+ .get_vhost_net = vhost_vdpa_get_vhost_net,
};
static int64_t vhost_vdpa_get_vring_group(int device_fd, unsigned vq_index,
@@ -1287,6 +1288,7 @@ static NetClientInfo net_vhost_vdpa_cvq_info = {
.has_ufo = vhost_vdpa_has_ufo,
.check_peer_type = vhost_vdpa_check_peer_type,
.set_steering_ebpf = vhost_vdpa_set_steering_ebpf,
+ .get_vhost_net = vhost_vdpa_get_vhost_net,
};
/*
--
2.42.0
* [PULL V2 07/16] net: Consolidate vhost feature bits into vhost_net structure
2025-07-15 4:35 [PULL V2 00/16] Net patches Jason Wang
` (5 preceding siblings ...)
2025-07-15 4:35 ` [PULL V2 06/16] net: Add get_vhost_net callback to NetClientInfo Jason Wang
@ 2025-07-15 4:35 ` Jason Wang
2025-07-15 4:35 ` [PULL V2 08/16] net: Add get_acked_features callback to VhostNetOptions Jason Wang
` (9 subsequent siblings)
16 siblings, 0 replies; 21+ messages in thread
From: Jason Wang @ 2025-07-15 4:35 UTC (permalink / raw)
To: qemu-devel; +Cc: Laurent Vivier, Jason Wang
From: Laurent Vivier <lvivier@redhat.com>
Previously, the vhost_net_get_feature_bits() function in
hw/net/vhost_net.c used a large switch statement to determine
the appropriate feature bits based on the NetClientDriver type.
This created unnecessary coupling between the generic vhost layer
and specific network backends (like TAP, vhost-user, and
vhost-vdpa).
This patch removes that switch. Each backend now defines its own
feature-bit array and passes it through VhostNetOptions;
vhost_net_init() stores the pointer in the vhost_net structure.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
hw/net/vhost_net.c | 90 ++-------------------------------------
include/hw/virtio/vhost.h | 1 +
include/net/vhost-vdpa.h | 2 -
include/net/vhost_net.h | 1 +
net/tap.c | 19 +++++++++
net/vhost-user.c | 43 +++++++++++++++++++
net/vhost-vdpa.c | 3 +-
7 files changed, 69 insertions(+), 90 deletions(-)
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index db8b97b753..787c769ccc 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -36,94 +36,9 @@
#include "hw/virtio/virtio-bus.h"
#include "linux-headers/linux/vhost.h"
-
-/* Features supported by host kernel. */
-static const int kernel_feature_bits[] = {
- VIRTIO_F_NOTIFY_ON_EMPTY,
- VIRTIO_RING_F_INDIRECT_DESC,
- VIRTIO_RING_F_EVENT_IDX,
- VIRTIO_NET_F_MRG_RXBUF,
- VIRTIO_F_VERSION_1,
- VIRTIO_NET_F_MTU,
- VIRTIO_F_IOMMU_PLATFORM,
- VIRTIO_F_RING_PACKED,
- VIRTIO_F_RING_RESET,
- VIRTIO_F_IN_ORDER,
- VIRTIO_F_NOTIFICATION_DATA,
- VIRTIO_NET_F_RSC_EXT,
- VIRTIO_NET_F_HASH_REPORT,
- VHOST_INVALID_FEATURE_BIT
-};
-
-/* Features supported by others. */
-static const int user_feature_bits[] = {
- VIRTIO_F_NOTIFY_ON_EMPTY,
- VIRTIO_F_NOTIFICATION_DATA,
- VIRTIO_RING_F_INDIRECT_DESC,
- VIRTIO_RING_F_EVENT_IDX,
-
- VIRTIO_F_ANY_LAYOUT,
- VIRTIO_F_VERSION_1,
- VIRTIO_NET_F_CSUM,
- VIRTIO_NET_F_GUEST_CSUM,
- VIRTIO_NET_F_GSO,
- VIRTIO_NET_F_GUEST_TSO4,
- VIRTIO_NET_F_GUEST_TSO6,
- VIRTIO_NET_F_GUEST_ECN,
- VIRTIO_NET_F_GUEST_UFO,
- VIRTIO_NET_F_HOST_TSO4,
- VIRTIO_NET_F_HOST_TSO6,
- VIRTIO_NET_F_HOST_ECN,
- VIRTIO_NET_F_HOST_UFO,
- VIRTIO_NET_F_MRG_RXBUF,
- VIRTIO_NET_F_MTU,
- VIRTIO_F_IOMMU_PLATFORM,
- VIRTIO_F_RING_PACKED,
- VIRTIO_F_RING_RESET,
- VIRTIO_F_IN_ORDER,
- VIRTIO_NET_F_RSS,
- VIRTIO_NET_F_RSC_EXT,
- VIRTIO_NET_F_HASH_REPORT,
- VIRTIO_NET_F_GUEST_USO4,
- VIRTIO_NET_F_GUEST_USO6,
- VIRTIO_NET_F_HOST_USO,
-
- /* This bit implies RARP isn't sent by QEMU out of band */
- VIRTIO_NET_F_GUEST_ANNOUNCE,
-
- VIRTIO_NET_F_MQ,
-
- VHOST_INVALID_FEATURE_BIT
-};
-
-static const int *vhost_net_get_feature_bits(struct vhost_net *net)
-{
- const int *feature_bits = 0;
-
- switch (net->nc->info->type) {
- case NET_CLIENT_DRIVER_TAP:
- feature_bits = kernel_feature_bits;
- break;
- case NET_CLIENT_DRIVER_VHOST_USER:
- feature_bits = user_feature_bits;
- break;
-#ifdef CONFIG_VHOST_NET_VDPA
- case NET_CLIENT_DRIVER_VHOST_VDPA:
- feature_bits = vdpa_feature_bits;
- break;
-#endif
- default:
- error_report("Feature bits not defined for this type: %d",
- net->nc->info->type);
- break;
- }
-
- return feature_bits;
-}
-
uint64_t vhost_net_get_features(struct vhost_net *net, uint64_t features)
{
- return vhost_get_features(&net->dev, vhost_net_get_feature_bits(net),
+ return vhost_get_features(&net->dev, net->feature_bits,
features);
}
int vhost_net_get_config(struct vhost_net *net, uint8_t *config,
@@ -140,7 +55,7 @@ int vhost_net_set_config(struct vhost_net *net, const uint8_t *data,
void vhost_net_ack_features(struct vhost_net *net, uint64_t features)
{
net->dev.acked_features = net->dev.backend_features;
- vhost_ack_features(&net->dev, vhost_net_get_feature_bits(net), features);
+ vhost_ack_features(&net->dev, net->feature_bits, features);
}
uint64_t vhost_net_get_max_queues(VHostNetState *net)
@@ -329,6 +244,7 @@ struct vhost_net *vhost_net_init(VhostNetOptions *options)
}
net->nc = options->net_backend;
net->dev.nvqs = options->nvqs;
+ net->feature_bits = options->feature_bits;
net->dev.max_queues = 1;
net->dev.vqs = net->vqs;
diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index 38800a7156..6a75fdc021 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -143,6 +143,7 @@ struct vhost_net {
struct vhost_dev dev;
struct vhost_virtqueue vqs[2];
int backend;
+ const int *feature_bits;
NetClientState *nc;
};
diff --git a/include/net/vhost-vdpa.h b/include/net/vhost-vdpa.h
index 916ead3793..f8d7d6c904 100644
--- a/include/net/vhost-vdpa.h
+++ b/include/net/vhost-vdpa.h
@@ -14,6 +14,4 @@
#define TYPE_VHOST_VDPA "vhost-vdpa"
-extern const int vdpa_feature_bits[];
-
#endif /* VHOST_VDPA_H */
diff --git a/include/net/vhost_net.h b/include/net/vhost_net.h
index 0f40049f34..fbed37385a 100644
--- a/include/net/vhost_net.h
+++ b/include/net/vhost_net.h
@@ -12,6 +12,7 @@ typedef struct VhostNetOptions {
NetClientState *net_backend;
uint32_t busyloop_timeout;
unsigned int nvqs;
+ const int *feature_bits;
void *opaque;
} VhostNetOptions;
diff --git a/net/tap.c b/net/tap.c
index 4beba6d7a7..a33eb23212 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -42,11 +42,29 @@
#include "qemu/error-report.h"
#include "qemu/main-loop.h"
#include "qemu/sockets.h"
+#include "hw/virtio/vhost.h"
#include "net/tap.h"
#include "net/vhost_net.h"
+static const int kernel_feature_bits[] = {
+ VIRTIO_F_NOTIFY_ON_EMPTY,
+ VIRTIO_RING_F_INDIRECT_DESC,
+ VIRTIO_RING_F_EVENT_IDX,
+ VIRTIO_NET_F_MRG_RXBUF,
+ VIRTIO_F_VERSION_1,
+ VIRTIO_NET_F_MTU,
+ VIRTIO_F_IOMMU_PLATFORM,
+ VIRTIO_F_RING_PACKED,
+ VIRTIO_F_RING_RESET,
+ VIRTIO_F_IN_ORDER,
+ VIRTIO_F_NOTIFICATION_DATA,
+ VIRTIO_NET_F_RSC_EXT,
+ VIRTIO_NET_F_HASH_REPORT,
+ VHOST_INVALID_FEATURE_BIT
+};
+
typedef struct TAPState {
NetClientState nc;
int fd;
@@ -725,6 +743,7 @@ static void net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
}
options.opaque = (void *)(uintptr_t)vhostfd;
options.nvqs = 2;
+ options.feature_bits = kernel_feature_bits;
s->vhost_net = vhost_net_init(&options);
if (!s->vhost_net) {
diff --git a/net/vhost-user.c b/net/vhost-user.c
index b7bf0d2042..bc8e82a092 100644
--- a/net/vhost-user.c
+++ b/net/vhost-user.c
@@ -12,7 +12,9 @@
#include "clients.h"
#include "net/vhost_net.h"
#include "net/vhost-user.h"
+#include "hw/virtio/vhost.h"
#include "hw/virtio/vhost-user.h"
+#include "standard-headers/linux/virtio_net.h"
#include "chardev/char-fe.h"
#include "qapi/error.h"
#include "qapi/qapi-commands-net.h"
@@ -22,6 +24,46 @@
#include "qemu/option.h"
#include "trace.h"
+static const int user_feature_bits[] = {
+ VIRTIO_F_NOTIFY_ON_EMPTY,
+ VIRTIO_F_NOTIFICATION_DATA,
+ VIRTIO_RING_F_INDIRECT_DESC,
+ VIRTIO_RING_F_EVENT_IDX,
+
+ VIRTIO_F_ANY_LAYOUT,
+ VIRTIO_F_VERSION_1,
+ VIRTIO_NET_F_CSUM,
+ VIRTIO_NET_F_GUEST_CSUM,
+ VIRTIO_NET_F_GSO,
+ VIRTIO_NET_F_GUEST_TSO4,
+ VIRTIO_NET_F_GUEST_TSO6,
+ VIRTIO_NET_F_GUEST_ECN,
+ VIRTIO_NET_F_GUEST_UFO,
+ VIRTIO_NET_F_HOST_TSO4,
+ VIRTIO_NET_F_HOST_TSO6,
+ VIRTIO_NET_F_HOST_ECN,
+ VIRTIO_NET_F_HOST_UFO,
+ VIRTIO_NET_F_MRG_RXBUF,
+ VIRTIO_NET_F_MTU,
+ VIRTIO_F_IOMMU_PLATFORM,
+ VIRTIO_F_RING_PACKED,
+ VIRTIO_F_RING_RESET,
+ VIRTIO_F_IN_ORDER,
+ VIRTIO_NET_F_RSS,
+ VIRTIO_NET_F_RSC_EXT,
+ VIRTIO_NET_F_HASH_REPORT,
+ VIRTIO_NET_F_GUEST_USO4,
+ VIRTIO_NET_F_GUEST_USO6,
+ VIRTIO_NET_F_HOST_USO,
+
+ /* This bit implies RARP isn't sent by QEMU out of band */
+ VIRTIO_NET_F_GUEST_ANNOUNCE,
+
+ VIRTIO_NET_F_MQ,
+
+ VHOST_INVALID_FEATURE_BIT
+};
+
typedef struct NetVhostUserState {
NetClientState nc;
CharBackend chr; /* only queue index 0 */
@@ -96,6 +138,7 @@ static int vhost_user_start(int queues, NetClientState *ncs[],
options.opaque = be;
options.busyloop_timeout = 0;
options.nvqs = 2;
+ options.feature_bits = user_feature_bits;
net = vhost_net_init(&options);
if (!net) {
error_report("failed to init vhost_net for queue %d", i);
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 0b86c917ed..cbbea0eb71 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -55,7 +55,7 @@ typedef struct VhostVDPAState {
* with the exception of VHOST_INVALID_FEATURE_BIT,
* which should always be the last entry.
*/
-const int vdpa_feature_bits[] = {
+static const int vdpa_feature_bits[] = {
VIRTIO_F_ANY_LAYOUT,
VIRTIO_F_IOMMU_PLATFORM,
VIRTIO_F_NOTIFY_ON_EMPTY,
@@ -201,6 +201,7 @@ static int vhost_vdpa_add(NetClientState *ncs, void *be,
options.opaque = be;
options.busyloop_timeout = 0;
options.nvqs = nvqs;
+ options.feature_bits = vdpa_feature_bits;
net = vhost_net_init(&options);
if (!net) {
--
2.42.0
* [PULL V2 08/16] net: Add get_acked_features callback to VhostNetOptions
2025-07-15 4:35 [PULL V2 00/16] Net patches Jason Wang
` (6 preceding siblings ...)
2025-07-15 4:35 ` [PULL V2 07/16] net: Consolidate vhost feature bits into vhost_net structure Jason Wang
@ 2025-07-15 4:35 ` Jason Wang
2025-07-15 4:35 ` [PULL V2 09/16] net: Add save_acked_features callback to vhost_net Jason Wang
` (8 subsequent siblings)
16 siblings, 0 replies; 21+ messages in thread
From: Jason Wang @ 2025-07-15 4:35 UTC (permalink / raw)
To: qemu-devel; +Cc: Laurent Vivier, Jason Wang
From: Laurent Vivier <lvivier@redhat.com>
This patch continues the effort to decouple the generic vhost layer
from specific network backend implementations.
Previously, the vhost_net initialization code contained a hardcoded
check for the vhost-user client type to retrieve its acked features
by calling vhost_user_get_acked_features(). This exposed an
internal vhost-user function in a public header and coupled the two
modules.
This patch adds a get_acked_features callback to VhostNetOptions.
The vhost-user backend provides the callback, and its getter function
is now static. The call site in vhost_net.c invokes the callback when
one is set, removing the type check and the direct dependency.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
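Note (illustration only, not part of the patch): the optional-callback check described above, reduced to a standalone sketch. DemoOptions, DemoUserState, demo_user_get_acked() and demo_init_features() are hypothetical names, and the void pointer backend is a simplification of the NetClientState argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t (DemoGetAcked)(void *backend);

typedef struct DemoOptions {
    void *backend;
    DemoGetAcked *get_acked_features;  /* NULL: nothing saved to restore */
} DemoOptions;

/* Backend-private state holding features acked in an earlier session. */
typedef struct DemoUserState {
    uint64_t acked_features;
} DemoUserState;

static uint64_t demo_user_get_acked(void *backend)
{
    return ((DemoUserState *)backend)->acked_features;
}

/*
 * Pick the initial feature set. Fails if the backend no longer offers
 * a feature that was acked before; *out is left untouched in that case.
 */
static bool demo_init_features(const DemoOptions *options,
                               uint64_t dev_features, uint64_t *out)
{
    uint64_t features = 0;

    assert(options != NULL && out != NULL);
    if (options->get_acked_features) {
        features = options->get_acked_features(options->backend);
        if (~dev_features & features) {
            return false;
        }
    }
    *out = features;
    return true;
}
```

Backends with nothing to restore simply leave the callback NULL, which is what tap and vhost-vdpa do in the diff below.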
hw/net/vhost_net.c | 6 ++----
include/net/vhost-user.h | 2 --
include/net/vhost_net.h | 3 +++
net/tap.c | 1 +
net/vhost-user.c | 4 +++-
net/vhost-vdpa.c | 1 +
6 files changed, 10 insertions(+), 7 deletions(-)
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 787c769ccc..fb169af0e8 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -288,9 +288,8 @@ struct vhost_net *vhost_net_init(VhostNetOptions *options)
}
/* Set sane init value. Override when guest acks. */
-#ifdef CONFIG_VHOST_NET_USER
- if (net->nc->info->type == NET_CLIENT_DRIVER_VHOST_USER) {
- features = vhost_user_get_acked_features(net->nc);
+ if (options->get_acked_features) {
+ features = options->get_acked_features(net->nc);
if (~net->dev.features & features) {
fprintf(stderr, "vhost lacks feature mask 0x%" PRIx64
" for backend\n",
@@ -298,7 +297,6 @@ struct vhost_net *vhost_net_init(VhostNetOptions *options)
goto fail;
}
}
-#endif
vhost_net_ack_features(net, features);
diff --git a/include/net/vhost-user.h b/include/net/vhost-user.h
index 0b233a2673..a4d0ce4b8d 100644
--- a/include/net/vhost-user.h
+++ b/include/net/vhost-user.h
@@ -11,8 +11,6 @@
#ifndef VHOST_USER_H
#define VHOST_USER_H
-struct vhost_net;
-uint64_t vhost_user_get_acked_features(NetClientState *nc);
void vhost_user_save_acked_features(NetClientState *nc);
#endif /* VHOST_USER_H */
diff --git a/include/net/vhost_net.h b/include/net/vhost_net.h
index fbed37385a..a8d281c8f7 100644
--- a/include/net/vhost_net.h
+++ b/include/net/vhost_net.h
@@ -7,12 +7,15 @@
struct vhost_net;
typedef struct vhost_net VHostNetState;
+typedef uint64_t (GetAckedFeatures)(NetClientState *nc);
+
typedef struct VhostNetOptions {
VhostBackendType backend_type;
NetClientState *net_backend;
uint32_t busyloop_timeout;
unsigned int nvqs;
const int *feature_bits;
+ GetAckedFeatures *get_acked_features;
void *opaque;
} VhostNetOptions;
diff --git a/net/tap.c b/net/tap.c
index a33eb23212..acd77f816f 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -744,6 +744,7 @@ static void net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
options.opaque = (void *)(uintptr_t)vhostfd;
options.nvqs = 2;
options.feature_bits = kernel_feature_bits;
+ options.get_acked_features = NULL;
s->vhost_net = vhost_net_init(&options);
if (!s->vhost_net) {
diff --git a/net/vhost-user.c b/net/vhost-user.c
index bc8e82a092..93b413b49f 100644
--- a/net/vhost-user.c
+++ b/net/vhost-user.c
@@ -81,7 +81,7 @@ static struct vhost_net *vhost_user_get_vhost_net(NetClientState *nc)
return s->vhost_net;
}
-uint64_t vhost_user_get_acked_features(NetClientState *nc)
+static uint64_t vhost_user_get_acked_features(NetClientState *nc)
{
NetVhostUserState *s = DO_UPCAST(NetVhostUserState, nc, nc);
assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_USER);
@@ -139,6 +139,8 @@ static int vhost_user_start(int queues, NetClientState *ncs[],
options.busyloop_timeout = 0;
options.nvqs = 2;
options.feature_bits = user_feature_bits;
+ options.get_acked_features = vhost_user_get_acked_features;
+
net = vhost_net_init(&options);
if (!net) {
error_report("failed to init vhost_net for queue %d", i);
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index cbbea0eb71..a3980d1fb5 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -202,6 +202,7 @@ static int vhost_vdpa_add(NetClientState *ncs, void *be,
options.busyloop_timeout = 0;
options.nvqs = nvqs;
options.feature_bits = vdpa_feature_bits;
+ options.get_acked_features = NULL;
net = vhost_net_init(&options);
if (!net) {
--
2.42.0
* [PULL V2 09/16] net: Add save_acked_features callback to vhost_net
2025-07-15 4:35 [PULL V2 00/16] Net patches Jason Wang
` (7 preceding siblings ...)
2025-07-15 4:35 ` [PULL V2 08/16] net: Add get_acked_features callback to VhostNetOptions Jason Wang
@ 2025-07-15 4:35 ` Jason Wang
2025-07-15 4:35 ` [PULL V2 10/16] net: Allow network backends to advertise max TX queue size Jason Wang
` (7 subsequent siblings)
16 siblings, 0 replies; 21+ messages in thread
From: Jason Wang @ 2025-07-15 4:35 UTC (permalink / raw)
To: qemu-devel; +Cc: Laurent Vivier, Jason Wang
From: Laurent Vivier <lvivier@redhat.com>
This commit introduces a save_acked_features function pointer to
vhost_net and converts vhost_net_save_acked_features() into a generic
dispatcher.
The vhost-user backend provides the callback, making its function static.
With this change, no other module has a direct dependency on the
vhost-user implementation.
This cleanup allows for the complete removal of the net/vhost-user.h
header file.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
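Note (illustration only, not part of the patch): a sketch of a dispatcher that guards both the per-device object and the optional callback stored in it. DemoNet, DemoBackend, demo_user_save_acked() and demo_save_acked_features() are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct DemoBackend;
typedef void (DemoSaveAcked)(struct DemoBackend *be);

typedef struct DemoNet {
    uint64_t acked_features;
    DemoSaveAcked *save_acked_features;  /* copied from options at init */
} DemoNet;

typedef struct DemoBackend {
    DemoNet *net;                /* NULL when vhost is not in use */
    uint64_t saved_features;     /* survives a backend reconnect */
} DemoBackend;

/* Backend-private implementation; only this backend knows its state. */
static void demo_user_save_acked(DemoBackend *be)
{
    assert(be->net != NULL);
    be->saved_features = be->net->acked_features;
}

/* Generic dispatcher: a no-op unless a device and a callback exist. */
static void demo_save_acked_features(DemoBackend *be)
{
    DemoNet *net = be ? be->net : NULL;

    if (net && net->save_acked_features) {
        net->save_acked_features(be);
    }
}
```

With both guards in place the caller can invoke the dispatcher unconditionally, which is what lets the type check and the header dependency go away.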
hw/net/vhost_net-stub.c | 1 -
hw/net/vhost_net.c | 10 +++++-----
include/hw/virtio/vhost.h | 2 ++
include/net/vhost-user.h | 16 ----------------
include/net/vhost_net.h | 2 ++
net/tap.c | 1 +
net/vhost-user-stub.c | 1 -
net/vhost-user.c | 4 ++--
net/vhost-vdpa.c | 1 +
9 files changed, 13 insertions(+), 25 deletions(-)
delete mode 100644 include/net/vhost-user.h
diff --git a/hw/net/vhost_net-stub.c b/hw/net/vhost_net-stub.c
index 7bed0bf92b..7d49f82906 100644
--- a/hw/net/vhost_net-stub.c
+++ b/hw/net/vhost_net-stub.c
@@ -13,7 +13,6 @@
#include "qemu/osdep.h"
#include "net/net.h"
#include "net/tap.h"
-#include "net/vhost-user.h"
#include "hw/virtio/virtio-net.h"
#include "net/vhost_net.h"
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index fb169af0e8..976d2b315a 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -16,7 +16,6 @@
#include "qemu/osdep.h"
#include "net/net.h"
#include "net/tap.h"
-#include "net/vhost-user.h"
#include "net/vhost-vdpa.h"
#include "standard-headers/linux/vhost_types.h"
@@ -70,11 +69,11 @@ uint64_t vhost_net_get_acked_features(VHostNetState *net)
void vhost_net_save_acked_features(NetClientState *nc)
{
-#ifdef CONFIG_VHOST_NET_USER
- if (nc->info->type == NET_CLIENT_DRIVER_VHOST_USER) {
- vhost_user_save_acked_features(nc);
+ struct vhost_net *net = get_vhost_net(nc);
+
+ if (net && net->save_acked_features) {
+ net->save_acked_features(nc);
}
-#endif
}
static void vhost_net_disable_notifiers_nvhosts(VirtIODevice *dev,
@@ -245,6 +244,7 @@ struct vhost_net *vhost_net_init(VhostNetOptions *options)
net->nc = options->net_backend;
net->dev.nvqs = options->nvqs;
net->feature_bits = options->feature_bits;
+ net->save_acked_features = options->save_acked_features;
net->dev.max_queues = 1;
net->dev.vqs = net->vqs;
diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index 6a75fdc021..b0830bac79 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -1,6 +1,7 @@
#ifndef VHOST_H
#define VHOST_H
+#include "net/vhost_net.h"
#include "hw/virtio/vhost-backend.h"
#include "hw/virtio/virtio.h"
#include "system/memory.h"
@@ -144,6 +145,7 @@ struct vhost_net {
struct vhost_virtqueue vqs[2];
int backend;
const int *feature_bits;
+ SaveAcketFeatures *save_acked_features;
NetClientState *nc;
};
diff --git a/include/net/vhost-user.h b/include/net/vhost-user.h
deleted file mode 100644
index a4d0ce4b8d..0000000000
--- a/include/net/vhost-user.h
+++ /dev/null
@@ -1,16 +0,0 @@
-/*
- * vhost-user.h
- *
- * Copyright (c) 2013 Virtual Open Systems Sarl.
- *
- * This work is licensed under the terms of the GNU GPL, version 2 or later.
- * See the COPYING file in the top-level directory.
- *
- */
-
-#ifndef VHOST_USER_H
-#define VHOST_USER_H
-
-void vhost_user_save_acked_features(NetClientState *nc);
-
-#endif /* VHOST_USER_H */
diff --git a/include/net/vhost_net.h b/include/net/vhost_net.h
index a8d281c8f7..eb26ed9bdc 100644
--- a/include/net/vhost_net.h
+++ b/include/net/vhost_net.h
@@ -8,6 +8,7 @@ struct vhost_net;
typedef struct vhost_net VHostNetState;
typedef uint64_t (GetAckedFeatures)(NetClientState *nc);
+typedef void (SaveAcketFeatures)(NetClientState *nc);
typedef struct VhostNetOptions {
VhostBackendType backend_type;
@@ -16,6 +17,7 @@ typedef struct VhostNetOptions {
unsigned int nvqs;
const int *feature_bits;
GetAckedFeatures *get_acked_features;
+ SaveAcketFeatures *save_acked_features;
void *opaque;
} VhostNetOptions;
diff --git a/net/tap.c b/net/tap.c
index acd77f816f..79fa02a65c 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -745,6 +745,7 @@ static void net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
options.nvqs = 2;
options.feature_bits = kernel_feature_bits;
options.get_acked_features = NULL;
+ options.save_acked_features = NULL;
s->vhost_net = vhost_net_init(&options);
if (!s->vhost_net) {
diff --git a/net/vhost-user-stub.c b/net/vhost-user-stub.c
index 52ab4e13f1..283dee87db 100644
--- a/net/vhost-user-stub.c
+++ b/net/vhost-user-stub.c
@@ -11,7 +11,6 @@
#include "qemu/osdep.h"
#include "clients.h"
#include "net/vhost_net.h"
-#include "net/vhost-user.h"
#include "qemu/error-report.h"
#include "qapi/error.h"
diff --git a/net/vhost-user.c b/net/vhost-user.c
index 93b413b49f..8a3df27b02 100644
--- a/net/vhost-user.c
+++ b/net/vhost-user.c
@@ -11,7 +11,6 @@
#include "qemu/osdep.h"
#include "clients.h"
#include "net/vhost_net.h"
-#include "net/vhost-user.h"
#include "hw/virtio/vhost.h"
#include "hw/virtio/vhost-user.h"
#include "standard-headers/linux/virtio_net.h"
@@ -88,7 +87,7 @@ static uint64_t vhost_user_get_acked_features(NetClientState *nc)
return s->acked_features;
}
-void vhost_user_save_acked_features(NetClientState *nc)
+static void vhost_user_save_acked_features(NetClientState *nc)
{
NetVhostUserState *s;
@@ -140,6 +139,7 @@ static int vhost_user_start(int queues, NetClientState *ncs[],
options.nvqs = 2;
options.feature_bits = user_feature_bits;
options.get_acked_features = vhost_user_get_acked_features;
+ options.save_acked_features = vhost_user_save_acked_features;
net = vhost_net_init(&options);
if (!net) {
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index a3980d1fb5..c63225d3d2 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -203,6 +203,7 @@ static int vhost_vdpa_add(NetClientState *ncs, void *be,
options.nvqs = nvqs;
options.feature_bits = vdpa_feature_bits;
options.get_acked_features = NULL;
+ options.save_acked_features = NULL;
net = vhost_net_init(&options);
if (!net) {
--
2.42.0
* [PULL V2 10/16] net: Allow network backends to advertise max TX queue size
2025-07-15 4:35 [PULL V2 00/16] Net patches Jason Wang
` (8 preceding siblings ...)
2025-07-15 4:35 ` [PULL V2 09/16] net: Add save_acked_features callback to vhost_net Jason Wang
@ 2025-07-15 4:35 ` Jason Wang
2025-07-15 4:35 ` [PULL V2 11/16] net: Add is_vhost_user flag to vhost_net struct Jason Wang
` (6 subsequent siblings)
16 siblings, 0 replies; 21+ messages in thread
From: Jason Wang @ 2025-07-15 4:35 UTC (permalink / raw)
To: qemu-devel; +Cc: Laurent Vivier, Jason Wang
From: Laurent Vivier <lvivier@redhat.com>
This commit refactors how the maximum transmit queue size for
virtio-net devices is determined, making the mechanism more generic
and extensible.
Previously, virtio_net_max_tx_queue_size() contained hardcoded
checks for specific network backend types (vhost-user and
vhost-vdpa) to determine their supported maximum queue size. This
created direct dependencies and would require modifications for
every new backend that supports variable queue sizes.
To improve flexibility, a new max_tx_queue_size field is added
to the vhost_net structure. This allows each network backend
to advertise its supported maximum transmit queue size directly.
The virtio_net_max_tx_queue_size() function now retrieves the max
TX queue size from the vhost_net struct, if available and set.
Otherwise, it defaults to VIRTIO_NET_TX_QUEUE_DEFAULT_SIZE.
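The lookup described above can be sketched in isolation. This is a minimal
illustration, not the QEMU code: the sketch_ names are hypothetical, and the
two constants are assumed stand-ins for VIRTIO_NET_TX_QUEUE_DEFAULT_SIZE and
VIRTQUEUE_MAX_SIZE. A value of 0 is treated as "not advertised", which is
what the tap backend sets in this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed stand-ins for the QEMU constants; the values are illustrative. */
#define SKETCH_TX_QUEUE_DEFAULT_SIZE 256
#define SKETCH_VIRTQUEUE_MAX_SIZE    1024

/* Hypothetical reduction of struct vhost_net to the one field used here. */
struct sketch_vhost_net {
    int max_tx_queue_size; /* 0 means the backend advertised no limit */
};

/*
 * Return the advertised maximum TX queue size, falling back to the
 * default when there is no vhost_net or when the field is unset.
 */
static int sketch_max_tx_queue_size(const struct sketch_vhost_net *net)
{
    /* A negative size would be a backend bug, not a valid "unset". */
    assert(net == NULL || net->max_tx_queue_size >= 0);

    if (net == NULL || net->max_tx_queue_size == 0) {
        return SKETCH_TX_QUEUE_DEFAULT_SIZE;
    }
    return net->max_tx_queue_size;
}
```

With this shape, a new backend only has to fill in the field at init time;
the virtio-net side needs no per-backend case.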
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
hw/net/vhost_net.c | 1 +
hw/net/virtio-net.c | 24 ++++++++++++------------
include/hw/virtio/vhost.h | 1 +
include/net/vhost_net.h | 1 +
net/tap.c | 1 +
net/vhost-user.c | 1 +
net/vhost-vdpa.c | 1 +
7 files changed, 18 insertions(+), 12 deletions(-)
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 976d2b315a..74d2e3ed90 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -245,6 +245,7 @@ struct vhost_net *vhost_net_init(VhostNetOptions *options)
net->dev.nvqs = options->nvqs;
net->feature_bits = options->feature_bits;
net->save_acked_features = options->save_acked_features;
+ net->max_tx_queue_size = options->max_tx_queue_size;
net->dev.max_queues = 1;
net->dev.vqs = net->vqs;
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index e3400f18c8..39fc280839 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -670,22 +670,22 @@ static void virtio_net_set_mrg_rx_bufs(VirtIONet *n, int mergeable_rx_bufs,
static int virtio_net_max_tx_queue_size(VirtIONet *n)
{
NetClientState *peer = n->nic_conf.peers.ncs[0];
+ struct vhost_net *net;
- /*
- * Backends other than vhost-user or vhost-vdpa don't support max queue
- * size.
- */
if (!peer) {
- return VIRTIO_NET_TX_QUEUE_DEFAULT_SIZE;
+ goto default_value;
}
- switch(peer->info->type) {
- case NET_CLIENT_DRIVER_VHOST_USER:
- case NET_CLIENT_DRIVER_VHOST_VDPA:
- return VIRTQUEUE_MAX_SIZE;
- default:
- return VIRTIO_NET_TX_QUEUE_DEFAULT_SIZE;
- };
+ net = get_vhost_net(peer);
+
+ if (!net || !net->max_tx_queue_size) {
+ goto default_value;
+ }
+
+ return net->max_tx_queue_size;
+
+default_value:
+ return VIRTIO_NET_TX_QUEUE_DEFAULT_SIZE;
}
static int peer_attach(VirtIONet *n, int index)
diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index b0830bac79..a62992c819 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -145,6 +145,7 @@ struct vhost_net {
struct vhost_virtqueue vqs[2];
int backend;
const int *feature_bits;
+ int max_tx_queue_size;
SaveAcketFeatures *save_acked_features;
NetClientState *nc;
};
diff --git a/include/net/vhost_net.h b/include/net/vhost_net.h
index eb26ed9bdc..8f4fddfb69 100644
--- a/include/net/vhost_net.h
+++ b/include/net/vhost_net.h
@@ -16,6 +16,7 @@ typedef struct VhostNetOptions {
uint32_t busyloop_timeout;
unsigned int nvqs;
const int *feature_bits;
+ int max_tx_queue_size;
GetAckedFeatures *get_acked_features;
SaveAcketFeatures *save_acked_features;
void *opaque;
diff --git a/net/tap.c b/net/tap.c
index 79fa02a65c..2f0cb55c9a 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -746,6 +746,7 @@ static void net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
options.feature_bits = kernel_feature_bits;
options.get_acked_features = NULL;
options.save_acked_features = NULL;
+ options.max_tx_queue_size = 0;
s->vhost_net = vhost_net_init(&options);
if (!s->vhost_net) {
diff --git a/net/vhost-user.c b/net/vhost-user.c
index 8a3df27b02..bf892915de 100644
--- a/net/vhost-user.c
+++ b/net/vhost-user.c
@@ -138,6 +138,7 @@ static int vhost_user_start(int queues, NetClientState *ncs[],
options.busyloop_timeout = 0;
options.nvqs = 2;
options.feature_bits = user_feature_bits;
+ options.max_tx_queue_size = VIRTQUEUE_MAX_SIZE;
options.get_acked_features = vhost_user_get_acked_features;
options.save_acked_features = vhost_user_save_acked_features;
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index c63225d3d2..353392b3d7 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -204,6 +204,7 @@ static int vhost_vdpa_add(NetClientState *ncs, void *be,
options.feature_bits = vdpa_feature_bits;
options.get_acked_features = NULL;
options.save_acked_features = NULL;
+ options.max_tx_queue_size = VIRTQUEUE_MAX_SIZE;
net = vhost_net_init(&options);
if (!net) {
--
2.42.0
* [PULL V2 11/16] net: Add is_vhost_user flag to vhost_net struct
2025-07-15 4:35 [PULL V2 00/16] Net patches Jason Wang
` (9 preceding siblings ...)
2025-07-15 4:35 ` [PULL V2 10/16] net: Allow network backends to advertise max TX queue size Jason Wang
@ 2025-07-15 4:35 ` Jason Wang
2025-07-15 4:35 ` [PULL V2 12/16] net: Add passt network backend Jason Wang
` (5 subsequent siblings)
16 siblings, 0 replies; 21+ messages in thread
From: Jason Wang @ 2025-07-15 4:35 UTC (permalink / raw)
To: qemu-devel; +Cc: Laurent Vivier, Jason Wang
From: Laurent Vivier <lvivier@redhat.com>
Introduce a boolean is_vhost_user field to the vhost_net
structure. This flag is initialized during vhost_net_init based
on whether the backend is vhost-user.
This refactoring simplifies checks for vhost-user specific behavior,
replacing direct comparisons of 'net->nc->info->type' with the new
flag. It improves readability and encapsulates the backend type
information directly within the vhost_net instance.
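As a rough sketch of the resulting check, assuming hypothetical sketch_
types in place of NetClientState and struct vhost_net: the caller asks the
peer for its vhost_net (which may be absent) and then tests the flag,
instead of comparing the client driver type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reduction of struct vhost_net to the new flag. */
struct sketch_vhost_net {
    bool is_vhost_user;
};

/* Hypothetical peer; vhost_net is NULL for backends without vhost. */
struct sketch_peer {
    struct sketch_vhost_net *vhost_net;
};

/*
 * Mirrors the shape of the check in peer_attach()/peer_detach(): the
 * vring enable call is needed only when a vhost_net exists and it is
 * flagged as vhost-user.
 */
static bool sketch_needs_vring_enable(const struct sketch_peer *peer)
{
    /* The real callers return early when there is no peer at all. */
    assert(peer != NULL);

    return peer->vhost_net != NULL && peer->vhost_net->is_vhost_user;
}
```

The NULL test matters because get_vhost_net() can return NULL for a peer
that has no vhost_net, a case the old type comparison never had to handle.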
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
hw/net/vhost_net.c | 3 ++-
hw/net/virtio-net.c | 8 ++++++--
include/hw/virtio/vhost.h | 1 +
include/net/vhost_net.h | 1 +
net/tap.c | 1 +
net/vhost-user.c | 1 +
net/vhost-vdpa.c | 1 +
7 files changed, 13 insertions(+), 3 deletions(-)
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 74d2e3ed90..540492b37d 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -246,6 +246,7 @@ struct vhost_net *vhost_net_init(VhostNetOptions *options)
net->feature_bits = options->feature_bits;
net->save_acked_features = options->save_acked_features;
net->max_tx_queue_size = options->max_tx_queue_size;
+ net->is_vhost_user = options->is_vhost_user;
net->dev.max_queues = 1;
net->dev.vqs = net->vqs;
@@ -440,7 +441,7 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
* because vhost user doesn't interrupt masking/unmasking
* properly.
*/
- if (net->nc->info->type == NET_CLIENT_DRIVER_VHOST_USER) {
+ if (net->is_vhost_user) {
dev->use_guest_notifier_mask = false;
}
}
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 39fc280839..00df5fd6cd 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -691,12 +691,14 @@ default_value:
static int peer_attach(VirtIONet *n, int index)
{
NetClientState *nc = qemu_get_subqueue(n->nic, index);
+ struct vhost_net *net;
if (!nc->peer) {
return 0;
}
- if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_USER) {
+ net = get_vhost_net(nc->peer);
+ if (net && net->is_vhost_user) {
vhost_net_set_vring_enable(nc->peer, 1);
}
@@ -714,12 +716,14 @@ static int peer_attach(VirtIONet *n, int index)
static int peer_detach(VirtIONet *n, int index)
{
NetClientState *nc = qemu_get_subqueue(n->nic, index);
+ struct vhost_net *net;
if (!nc->peer) {
return 0;
}
- if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_USER) {
+ net = get_vhost_net(nc->peer);
+ if (net && net->is_vhost_user) {
vhost_net_set_vring_enable(nc->peer, 0);
}
diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index a62992c819..f178cf9e1d 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -147,6 +147,7 @@ struct vhost_net {
const int *feature_bits;
int max_tx_queue_size;
SaveAcketFeatures *save_acked_features;
+ bool is_vhost_user;
NetClientState *nc;
};
diff --git a/include/net/vhost_net.h b/include/net/vhost_net.h
index 8f4fddfb69..879781dad7 100644
--- a/include/net/vhost_net.h
+++ b/include/net/vhost_net.h
@@ -17,6 +17,7 @@ typedef struct VhostNetOptions {
unsigned int nvqs;
const int *feature_bits;
int max_tx_queue_size;
+ bool is_vhost_user;
GetAckedFeatures *get_acked_features;
SaveAcketFeatures *save_acked_features;
void *opaque;
diff --git a/net/tap.c b/net/tap.c
index 2f0cb55c9a..23536c09b4 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -747,6 +747,7 @@ static void net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
options.get_acked_features = NULL;
options.save_acked_features = NULL;
options.max_tx_queue_size = 0;
+ options.is_vhost_user = false;
s->vhost_net = vhost_net_init(&options);
if (!s->vhost_net) {
diff --git a/net/vhost-user.c b/net/vhost-user.c
index bf892915de..1c3b8b36f3 100644
--- a/net/vhost-user.c
+++ b/net/vhost-user.c
@@ -141,6 +141,7 @@ static int vhost_user_start(int queues, NetClientState *ncs[],
options.max_tx_queue_size = VIRTQUEUE_MAX_SIZE;
options.get_acked_features = vhost_user_get_acked_features;
options.save_acked_features = vhost_user_save_acked_features;
+ options.is_vhost_user = true;
net = vhost_net_init(&options);
if (!net) {
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 353392b3d7..943e9c585c 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -205,6 +205,7 @@ static int vhost_vdpa_add(NetClientState *ncs, void *be,
options.get_acked_features = NULL;
options.save_acked_features = NULL;
options.max_tx_queue_size = VIRTQUEUE_MAX_SIZE;
+ options.is_vhost_user = false;
net = vhost_net_init(&options);
if (!net) {
--
2.42.0
* [PULL V2 12/16] net: Add passt network backend
2025-07-15 4:35 [PULL V2 00/16] Net patches Jason Wang
` (10 preceding siblings ...)
2025-07-15 4:35 ` [PULL V2 11/16] net: Add is_vhost_user flag to vhost_net struct Jason Wang
@ 2025-07-15 4:35 ` Jason Wang
2025-07-17 9:28 ` Peter Maydell
2025-07-15 4:35 ` [PULL V2 13/16] net/passt: Implement vhost-user backend support Jason Wang
` (4 subsequent siblings)
16 siblings, 1 reply; 21+ messages in thread
From: Jason Wang @ 2025-07-15 4:35 UTC (permalink / raw)
To: qemu-devel; +Cc: Laurent Vivier, Jason Wang
From: Laurent Vivier <lvivier@redhat.com>
This commit introduces support for passt as a new network backend.
passt is an unprivileged, user-mode networking solution that provides
connectivity for virtual machines by launching an external helper process.
The implementation reuses the generic stream data handling logic. It
launches the passt binary using GSubprocess, passing it a file
descriptor from a socketpair() for communication. QEMU connects to
the other end of the socket pair to establish the network data stream.
The PID of the passt daemon is tracked via a temporary file to
ensure it is terminated when QEMU exits.
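The descriptor handoff can be illustrated without GLib. The sketch below
deliberately swaps GSubprocess for plain fork()/execvp(), so it is not how
the patch does it; sketch_spawn_with_fd3() is a hypothetical helper. It
shows only the core idea: create a socketpair, place one end on fd 3 in the
child (matching the "--fd 3" argument), and keep the other end in the
parent.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: run argv[0] with one end of a socketpair on fd 3.
 * Returns the child's PID and stores the parent's end in *parent_fd,
 * or returns -1 if the socketpair or the fork fails.
 */
static pid_t sketch_spawn_with_fd3(char *const argv[], int *parent_fd)
{
    int sv[2];
    pid_t pid;

    assert(argv != NULL && argv[0] != NULL && parent_fd != NULL);

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1) {
        return -1;
    }

    pid = fork();
    if (pid == -1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: drop the parent's end first, then move ours to fd 3. */
        close(sv[0]);
        if (sv[1] != 3) {
            if (dup2(sv[1], 3) == -1) {
                _exit(127);
            }
            close(sv[1]);
        }
        execvp(argv[0], argv);
        _exit(127); /* reached only if exec failed */
    }

    /* Parent: keep sv[0]; the other end now belongs to the child. */
    close(sv[1]);
    *parent_fd = sv[0];
    return pid;
}
```

Closing the parent's end before the dup2() covers the case where the
socketpair itself happened to land on fd 3. The PID returned here is the
direct child; the patch instead reads the PID from a pidfile, since the
process it launches exits after starting and the surviving passt process
is a different one.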
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
docs/system/devices/net.rst | 40 +++-
hmp-commands.hx | 3 +
meson.build | 6 +
meson_options.txt | 2 +
net/clients.h | 4 +
net/hub.c | 3 +
net/meson.build | 3 +
net/net.c | 4 +
net/passt.c | 407 ++++++++++++++++++++++++++++++++++
qapi/net.json | 115 ++++++++++
qemu-options.hx | 145 +++++++++++-
scripts/meson-buildoptions.sh | 3 +
12 files changed, 731 insertions(+), 4 deletions(-)
create mode 100644 net/passt.c
diff --git a/docs/system/devices/net.rst b/docs/system/devices/net.rst
index a3efbdcabd..c586ee0f40 100644
--- a/docs/system/devices/net.rst
+++ b/docs/system/devices/net.rst
@@ -85,13 +85,49 @@ passt doesn't require any capability or privilege. passt has
better performance than ``-net user``, full IPv6 support and better security
as it's a daemon that is not executed in QEMU context.
-passt can be connected to QEMU either by using a socket
-(``-netdev stream``) or using the vhost-user interface (``-netdev vhost-user``).
+passt_ can be used in the same way as the user backend (using ``-net passt``,
+``-netdev passt`` or ``-nic passt``) or it can be launched manually and
+connected to QEMU either by using a socket (``-netdev stream``) or by using
+the vhost-user interface (``-netdev vhost-user``).
+
+Using ``-netdev stream`` or ``-netdev vhost-user`` will allow the user to
+enable functionalities not available through the passt backend interface
+(like migration).
+
See `passt(1)`_ for more details on passt.
.. _passt: https://passt.top/
.. _passt(1): https://passt.top/builds/latest/web/passt.1.html
+To use the passt backend interface
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+There is no need to start the daemon as QEMU will do it for you.
+
+passt is started in the socket-based mode.
+
+.. parsed-literal::
+ |qemu_system| [...OPTIONS...] -nic passt
+
+ (qemu) info network
+ e1000e.0: index=0,type=nic,model=e1000e,macaddr=52:54:00:12:34:56
+ \ #net071: index=0,type=passt,stream,connected to pid 24846
+
+.. parsed-literal::
+ |qemu_system| [...OPTIONS...] -net nic -net passt,tcp-ports=10001,udp-ports=10001
+
+ (qemu) info network
+ hub 0
+ \ hub0port1: #net136: index=0,type=passt,stream,connected to pid 25204
+ \ hub0port0: e1000e.0: index=0,type=nic,model=e1000e,macaddr=52:54:00:12:34:56
+
+.. parsed-literal::
+ |qemu_system| [...OPTIONS...] -netdev passt,id=netdev0 -device virtio-net,mac=9a:2b:2c:2d:2e:2f,id=virtio0,netdev=netdev0
+
+ (qemu) info network
+ virtio0: index=0,type=nic,model=virtio-net-pci,macaddr=9a:2b:2c:2d:2e:2f
+ \ netdev0: index=0,type=passt,stream,connected to pid 25428
+
To use socket based passt interface:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 06746f0afc..d0e4f35a30 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1287,6 +1287,9 @@ ERST
.name = "netdev_add",
.args_type = "netdev:O",
.params = "[user|tap|socket|stream|dgram|vde|bridge|hubport|netmap|vhost-user"
+#ifdef CONFIG_PASST
+ "|passt"
+#endif
#ifdef CONFIG_AF_XDP
"|af-xdp"
#endif
diff --git a/meson.build b/meson.build
index b5f74aa37a..2adb22f198 100644
--- a/meson.build
+++ b/meson.build
@@ -1285,6 +1285,10 @@ if not get_option('slirp').auto() or have_system
endif
endif
+enable_passt = get_option('passt') \
+ .require(host_os == 'linux', error_message: 'passt is supported only on Linux') \
+ .allowed()
+
vde = not_found
if not get_option('vde').auto() or have_system or have_tools
vde = cc.find_library('vdeplug', has_headers: ['libvdeplug.h'],
@@ -2538,6 +2542,7 @@ if seccomp.found()
config_host_data.set('CONFIG_SECCOMP_SYSRAWRC', seccomp_has_sysrawrc)
endif
config_host_data.set('CONFIG_PIXMAN', pixman.found())
+config_host_data.set('CONFIG_PASST', enable_passt)
config_host_data.set('CONFIG_SLIRP', slirp.found())
config_host_data.set('CONFIG_SNAPPY', snappy.found())
config_host_data.set('CONFIG_SOLARIS', host_os == 'sunos')
@@ -4926,6 +4931,7 @@ if host_os == 'darwin'
summary_info += {'vmnet.framework support': vmnet}
endif
summary_info += {'AF_XDP support': libxdp}
+summary_info += {'passt support': enable_passt}
summary_info += {'slirp support': slirp}
summary_info += {'vde support': vde}
summary_info += {'netmap support': have_netmap}
diff --git a/meson_options.txt b/meson_options.txt
index a442be2995..3146eec194 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -234,6 +234,8 @@ option('pixman', type : 'feature', value : 'auto',
description: 'pixman support')
option('slirp', type: 'feature', value: 'auto',
description: 'libslirp user mode network backend support')
+option('passt', type: 'feature', value: 'auto',
+ description: 'passt network backend support')
option('vde', type : 'feature', value : 'auto',
description: 'vde network backend support')
option('vmnet', type : 'feature', value : 'auto',
diff --git a/net/clients.h b/net/clients.h
index be53794582..e786ab4203 100644
--- a/net/clients.h
+++ b/net/clients.h
@@ -29,6 +29,10 @@
int net_init_dump(const Netdev *netdev, const char *name,
NetClientState *peer, Error **errp);
+#ifdef CONFIG_PASST
+int net_init_passt(const Netdev *netdev, const char *name,
+ NetClientState *peer, Error **errp);
+#endif
#ifdef CONFIG_SLIRP
int net_init_slirp(const Netdev *netdev, const char *name,
NetClientState *peer, Error **errp);
diff --git a/net/hub.c b/net/hub.c
index cba20ebd87..e3b58b1c4f 100644
--- a/net/hub.c
+++ b/net/hub.c
@@ -285,6 +285,9 @@ void net_hub_check_clients(void)
case NET_CLIENT_DRIVER_NIC:
has_nic = 1;
break;
+#ifdef CONFIG_PASST
+ case NET_CLIENT_DRIVER_PASST:
+#endif
case NET_CLIENT_DRIVER_USER:
case NET_CLIENT_DRIVER_TAP:
case NET_CLIENT_DRIVER_SOCKET:
diff --git a/net/meson.build b/net/meson.build
index bb3c011e5a..da6ea635e9 100644
--- a/net/meson.build
+++ b/net/meson.build
@@ -34,6 +34,9 @@ system_ss.add(when: 'CONFIG_TCG', if_true: files('filter-replay.c'))
if have_l2tpv3
system_ss.add(files('l2tpv3.c'))
endif
+if enable_passt
+ system_ss.add(files('passt.c'))
+endif
system_ss.add(when: slirp, if_true: files('slirp.c'))
system_ss.add(when: vde, if_true: files('vde.c'))
if have_netmap
diff --git a/net/net.c b/net/net.c
index cfa2d8e958..90f69fdf39 100644
--- a/net/net.c
+++ b/net/net.c
@@ -1248,6 +1248,9 @@ static int (* const net_client_init_fun[NET_CLIENT_DRIVER__MAX])(
const char *name,
NetClientState *peer, Error **errp) = {
[NET_CLIENT_DRIVER_NIC] = net_init_nic,
+#ifdef CONFIG_PASST
+ [NET_CLIENT_DRIVER_PASST] = net_init_passt,
+#endif
#ifdef CONFIG_SLIRP
[NET_CLIENT_DRIVER_USER] = net_init_slirp,
#endif
@@ -1353,6 +1356,7 @@ void show_netdevs(void)
"dgram",
"hubport",
"tap",
+ "passt",
#ifdef CONFIG_SLIRP
"user",
#endif
diff --git a/net/passt.c b/net/passt.c
new file mode 100644
index 0000000000..0a4a1ba6aa
--- /dev/null
+++ b/net/passt.c
@@ -0,0 +1,407 @@
+/*
+ * passt network backend
+ *
+ * Copyright Red Hat
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+#include "qemu/osdep.h"
+#include <glib/gstdio.h>
+#include <gio/gio.h>
+#include "net/net.h"
+#include "clients.h"
+#include "qapi/error.h"
+#include "io/net-listener.h"
+#include "stream_data.h"
+
+typedef struct NetPasstState {
+ NetStreamData data;
+ GPtrArray *args;
+ gchar *pidfile;
+ pid_t pid;
+} NetPasstState;
+
+static int net_passt_stream_start(NetPasstState *s, Error **errp);
+
+static void net_passt_cleanup(NetClientState *nc)
+{
+ NetPasstState *s = DO_UPCAST(NetPasstState, data.nc, nc);
+
+ kill(s->pid, SIGTERM);
+ g_remove(s->pidfile);
+ g_free(s->pidfile);
+ g_ptr_array_free(s->args, TRUE);
+}
+
+static ssize_t net_passt_receive(NetClientState *nc, const uint8_t *buf,
+ size_t size)
+{
+ NetStreamData *d = DO_UPCAST(NetStreamData, nc, nc);
+
+ return net_stream_data_receive(d, buf, size);
+}
+
+static gboolean net_passt_send(QIOChannel *ioc, GIOCondition condition,
+ gpointer data)
+{
+ if (net_stream_data_send(ioc, condition, data) == G_SOURCE_REMOVE) {
+ NetPasstState *s = DO_UPCAST(NetPasstState, data, data);
+ Error *error = NULL;
+
+ /* we need to restart passt */
+ kill(s->pid, SIGTERM);
+ if (net_passt_stream_start(s, &error) == -1) {
+ error_report_err(error);
+ }
+
+ return G_SOURCE_REMOVE;
+ }
+
+ return G_SOURCE_CONTINUE;
+}
+
+static NetClientInfo net_passt_info = {
+ .type = NET_CLIENT_DRIVER_PASST,
+ .size = sizeof(NetPasstState),
+ .receive = net_passt_receive,
+ .cleanup = net_passt_cleanup,
+};
+
+static void net_passt_client_connected(QIOTask *task, gpointer opaque)
+{
+ NetPasstState *s = opaque;
+
+ if (net_stream_data_client_connected(task, &s->data) == 0) {
+ qemu_set_info_str(&s->data.nc, "stream,connected to pid %d", s->pid);
+ }
+}
+
+static int net_passt_start_daemon(NetPasstState *s, int sock, Error **errp)
+{
+ g_autoptr(GSubprocess) daemon = NULL;
+ g_autofree gchar *contents = NULL;
+ g_autoptr(GError) error = NULL;
+ GSubprocessLauncher *launcher;
+
+ qemu_set_info_str(&s->data.nc, "launching passt");
+
+ launcher = g_subprocess_launcher_new(G_SUBPROCESS_FLAGS_NONE);
+ g_subprocess_launcher_take_fd(launcher, sock, 3);
+
+ daemon = g_subprocess_launcher_spawnv(launcher,
+ (const gchar *const *)s->args->pdata,
+ &error);
+ g_object_unref(launcher);
+
+ if (!daemon) {
+ error_setg(errp, "Error creating daemon: %s", error->message);
+ return -1;
+ }
+
+ if (!g_subprocess_wait(daemon, NULL, &error)) {
+ error_setg(errp, "Error waiting for daemon: %s", error->message);
+ return -1;
+ }
+
+ if (g_subprocess_get_if_exited(daemon) &&
+ g_subprocess_get_exit_status(daemon)) {
+ return -1;
+ }
+
+ if (!g_file_get_contents(s->pidfile, &contents, NULL, &error)) {
+ error_setg(errp, "Cannot read passt pid: %s", error->message);
+ return -1;
+ }
+
+ s->pid = (pid_t)g_ascii_strtoll(contents, NULL, 10);
+ if (s->pid <= 0) {
+ error_setg(errp, "File '%s' did not contain a valid PID.", s->pidfile);
+ return -1;
+ }
+
+ return 0;
+}
+
+static int net_passt_stream_start(NetPasstState *s, Error **errp)
+{
+ QIOChannelSocket *sioc;
+ SocketAddress *addr;
+ int sv[2];
+
+ if (socketpair(PF_UNIX, SOCK_STREAM, 0, sv) == -1) {
+ error_setg_errno(errp, errno, "socketpair() failed");
+ return -1;
+ }
+
+ /* connect to passt */
+ qemu_set_info_str(&s->data.nc, "connecting to passt");
+
+ /* create socket channel */
+ sioc = qio_channel_socket_new();
+ s->data.ioc = QIO_CHANNEL(sioc);
+ s->data.nc.link_down = true;
+ s->data.send = net_passt_send;
+
+ addr = g_new0(SocketAddress, 1);
+ addr->type = SOCKET_ADDRESS_TYPE_FD;
+ addr->u.fd.str = g_strdup_printf("%d", sv[0]);
+
+ qio_channel_socket_connect_async(sioc, addr,
+ net_passt_client_connected, s,
+ NULL, NULL);
+
+ qapi_free_SocketAddress(addr);
+
+ /* start passt */
+ if (net_passt_start_daemon(s, sv[1], errp) == -1) {
+ close(sv[0]);
+ close(sv[1]);
+ return -1;
+ }
+ close(sv[1]);
+
+ return 0;
+}
+
+static GPtrArray *net_passt_decode_args(const NetDevPasstOptions *passt,
+ gchar *pidfile, Error **errp)
+{
+ GPtrArray *args = g_ptr_array_new_with_free_func(g_free);
+
+ if (passt->path) {
+ g_ptr_array_add(args, g_strdup(passt->path));
+ } else {
+ g_ptr_array_add(args, g_strdup("passt"));
+ }
+
+ /* by default, be quiet */
+ if (!passt->has_quiet || passt->quiet) {
+ g_ptr_array_add(args, g_strdup("--quiet"));
+ }
+
+ if (passt->has_mtu) {
+ g_ptr_array_add(args, g_strdup("--mtu"));
+ g_ptr_array_add(args, g_strdup_printf("%"PRId64, passt->mtu));
+ }
+
+ if (passt->address) {
+ g_ptr_array_add(args, g_strdup("--address"));
+ g_ptr_array_add(args, g_strdup(passt->address));
+ }
+
+ if (passt->netmask) {
+ g_ptr_array_add(args, g_strdup("--netmask"));
+ g_ptr_array_add(args, g_strdup(passt->netmask));
+ }
+
+ if (passt->mac) {
+ g_ptr_array_add(args, g_strdup("--mac-addr"));
+ g_ptr_array_add(args, g_strdup(passt->mac));
+ }
+
+ if (passt->gateway) {
+ g_ptr_array_add(args, g_strdup("--gateway"));
+ g_ptr_array_add(args, g_strdup(passt->gateway));
+ }
+
+ if (passt->interface) {
+ g_ptr_array_add(args, g_strdup("--interface"));
+ g_ptr_array_add(args, g_strdup(passt->interface));
+ }
+
+ if (passt->outbound) {
+ g_ptr_array_add(args, g_strdup("--outbound"));
+ g_ptr_array_add(args, g_strdup(passt->outbound));
+ }
+
+ if (passt->outbound_if4) {
+ g_ptr_array_add(args, g_strdup("--outbound-if4"));
+ g_ptr_array_add(args, g_strdup(passt->outbound_if4));
+ }
+
+ if (passt->outbound_if6) {
+ g_ptr_array_add(args, g_strdup("--outbound-if6"));
+ g_ptr_array_add(args, g_strdup(passt->outbound_if6));
+ }
+
+ if (passt->dns) {
+ g_ptr_array_add(args, g_strdup("--dns"));
+ g_ptr_array_add(args, g_strdup(passt->dns));
+ }
+ if (passt->fqdn) {
+ g_ptr_array_add(args, g_strdup("--fqdn"));
+ g_ptr_array_add(args, g_strdup(passt->fqdn));
+ }
+
+ if (passt->has_dhcp_dns && !passt->dhcp_dns) {
+ g_ptr_array_add(args, g_strdup("--no-dhcp-dns"));
+ }
+
+ if (passt->has_dhcp_search && !passt->dhcp_search) {
+ g_ptr_array_add(args, g_strdup("--no-dhcp-search"));
+ }
+
+ if (passt->map_host_loopback) {
+ g_ptr_array_add(args, g_strdup("--map-host-loopback"));
+ g_ptr_array_add(args, g_strdup(passt->map_host_loopback));
+ }
+
+ if (passt->map_guest_addr) {
+ g_ptr_array_add(args, g_strdup("--map-guest-addr"));
+ g_ptr_array_add(args, g_strdup(passt->map_guest_addr));
+ }
+
+ if (passt->dns_forward) {
+ g_ptr_array_add(args, g_strdup("--dns-forward"));
+ g_ptr_array_add(args, g_strdup(passt->dns_forward));
+ }
+
+ if (passt->dns_host) {
+ g_ptr_array_add(args, g_strdup("--dns-host"));
+ g_ptr_array_add(args, g_strdup(passt->dns_host));
+ }
+
+ if (passt->has_tcp && !passt->tcp) {
+ g_ptr_array_add(args, g_strdup("--no-tcp"));
+ }
+
+ if (passt->has_udp && !passt->udp) {
+ g_ptr_array_add(args, g_strdup("--no-udp"));
+ }
+
+ if (passt->has_icmp && !passt->icmp) {
+ g_ptr_array_add(args, g_strdup("--no-icmp"));
+ }
+
+ if (passt->has_dhcp && !passt->dhcp) {
+ g_ptr_array_add(args, g_strdup("--no-dhcp"));
+ }
+
+ if (passt->has_ndp && !passt->ndp) {
+ g_ptr_array_add(args, g_strdup("--no-ndp"));
+ }
+ if (passt->has_dhcpv6 && !passt->dhcpv6) {
+ g_ptr_array_add(args, g_strdup("--no-dhcpv6"));
+ }
+
+ if (passt->has_ra && !passt->ra) {
+ g_ptr_array_add(args, g_strdup("--no-ra"));
+ }
+
+ if (passt->has_freebind && passt->freebind) {
+ g_ptr_array_add(args, g_strdup("--freebind"));
+ }
+
+ if (passt->has_ipv4 && !passt->ipv4) {
+ g_ptr_array_add(args, g_strdup("--ipv6-only"));
+ }
+
+ if (passt->has_ipv6 && !passt->ipv6) {
+ g_ptr_array_add(args, g_strdup("--ipv4-only"));
+ }
+
+ if (passt->has_search && passt->search) {
+ const StringList *list = passt->search;
+ GString *domains = g_string_new(list->value->str);
+
+ list = list->next;
+ while (list) {
+ g_string_append(domains, " ");
+ g_string_append(domains, list->value->str);
+ list = list->next;
+ }
+
+ g_ptr_array_add(args, g_strdup("--search"));
+ g_ptr_array_add(args, g_string_free(domains, FALSE));
+ }
+
+ if (passt->has_tcp_ports && passt->tcp_ports) {
+ const StringList *list = passt->tcp_ports;
+ GString *tcp_ports = g_string_new(list->value->str);
+
+ list = list->next;
+ while (list) {
+ g_string_append(tcp_ports, ",");
+ g_string_append(tcp_ports, list->value->str);
+ list = list->next;
+ }
+
+ g_ptr_array_add(args, g_strdup("--tcp-ports"));
+ g_ptr_array_add(args, g_string_free(tcp_ports, FALSE));
+ }
+
+ if (passt->has_udp_ports && passt->udp_ports) {
+ const StringList *list = passt->udp_ports;
+ GString *udp_ports = g_string_new(list->value->str);
+
+ list = list->next;
+ while (list) {
+ g_string_append(udp_ports, ",");
+ g_string_append(udp_ports, list->value->str);
+ list = list->next;
+ }
+
+ g_ptr_array_add(args, g_strdup("--udp-ports"));
+ g_ptr_array_add(args, g_string_free(udp_ports, FALSE));
+ }
+
+ if (passt->has_param && passt->param) {
+ const StringList *list = passt->param;
+
+ while (list) {
+ g_ptr_array_add(args, g_strdup(list->value->str));
+ list = list->next;
+ }
+ }
+
+ /* provide a pid file to be able to kill passt on exit */
+ g_ptr_array_add(args, g_strdup("--pid"));
+ g_ptr_array_add(args, g_strdup(pidfile));
+
+ /* g_subprocess_launcher_take_fd() will set the socket on fd 3 */
+ g_ptr_array_add(args, g_strdup("--fd"));
+ g_ptr_array_add(args, g_strdup("3"));
+
+ g_ptr_array_add(args, NULL);
+
+ return args;
+}
+
+int net_init_passt(const Netdev *netdev, const char *name,
+ NetClientState *peer, Error **errp)
+{
+ g_autoptr(GError) error = NULL;
+ NetClientState *nc;
+ NetPasstState *s;
+ GPtrArray *args;
+ gchar *pidfile;
+ int pidfd;
+
+ assert(netdev->type == NET_CLIENT_DRIVER_PASST);
+
+ pidfd = g_file_open_tmp("passt-XXXXXX.pid", &pidfile, &error);
+ if (pidfd == -1) {
+ error_setg(errp, "Failed to create temporary file: %s", error->message);
+ return -1;
+ }
+ close(pidfd);
+
+ args = net_passt_decode_args(&netdev->u.passt, pidfile, errp);
+ if (args == NULL) {
+ g_free(pidfile);
+ return -1;
+ }
+
+ nc = qemu_new_net_client(&net_passt_info, peer, "passt", name);
+ s = DO_UPCAST(NetPasstState, data.nc, nc);
+
+ s->args = args;
+ s->pidfile = pidfile;
+
+ if (net_passt_stream_start(s, errp) == -1) {
+ qemu_del_net_client(nc);
+ return -1;
+ }
+
+ return 0;
+}
diff --git a/qapi/net.json b/qapi/net.json
index 97ea183981..24999f6752 100644
--- a/qapi/net.json
+++ b/qapi/net.json
@@ -112,6 +112,116 @@
'data': {
'str': 'str' } }
+##
+# @NetDevPasstOptions:
+#
+# Unprivileged user-mode network connectivity using passt
+#
+# @path: Filename of the passt program to run (by default 'passt', and use PATH)
+#
+# @quiet: don't print informational messages (default, passed as '--quiet')
+#
+# @mtu: assign MTU via DHCP/NDP
+#
+# @address: IPv4 or IPv6 address
+#
+# @netmask: IPv4 mask
+#
+# @mac: source MAC address
+#
+# @gateway: IPv4 or IPv6 address as gateway
+#
+# @interface: interface for addresses and routes
+#
+# @outbound: bind to address as outbound source
+#
+# @outbound-if4: bind to outbound interface for IPv4
+#
+# @outbound-if6: bind to outbound interface for IPv6
+#
+# @dns: IPv4 or IPv6 address as DNS
+#
+# @search: search domains
+#
+# @fqdn: FQDN to configure client with
+#
+# @dhcp-dns: enable/disable DNS list in DHCP/DHCPv6/NDP
+#
+# @dhcp-search: enable/disable list in DHCP/DHCPv6/NDP
+#
+# @map-host-loopback: address to refer to host
+#
+# @map-guest-addr: addr to translate to guest's address
+#
+# @dns-forward: forward DNS queries sent to
+#
+# @dns-host: host nameserver to direct queries to
+#
+# @tcp: enable/disable TCP
+#
+# @udp: enable/disable UDP
+#
+# @icmp: enable/disable ICMP
+#
+# @dhcp: enable/disable DHCP
+#
+# @ndp: enable/disable NDP
+#
+# @dhcpv6: enable/disable DHCPv6
+#
+# @ra: enable/disable route advertisements
+#
+# @freebind: bind to any address for forwarding
+#
+# @ipv4: enable/disable IPv4
+#
+# @ipv6: enable/disable IPv6
+#
+# @tcp-ports: TCP ports to forward
+#
+# @udp-ports: UDP ports to forward
+#
+# @param: parameter to pass to passt command
+#
+# Since: 10.1
+##
+{ 'struct': 'NetDevPasstOptions',
+ 'data': {
+ '*path': 'str',
+ '*quiet': 'bool',
+ '*mtu': 'int',
+ '*address': 'str',
+ '*netmask': 'str',
+ '*mac': 'str',
+ '*gateway': 'str',
+ '*interface': 'str',
+ '*outbound': 'str',
+ '*outbound-if4': 'str',
+ '*outbound-if6': 'str',
+ '*dns': 'str',
+ '*search': ['String'],
+ '*fqdn': 'str',
+ '*dhcp-dns': 'bool',
+ '*dhcp-search': 'bool',
+ '*map-host-loopback': 'str',
+ '*map-guest-addr': 'str',
+ '*dns-forward': 'str',
+ '*dns-host': 'str',
+ '*tcp': 'bool',
+ '*udp': 'bool',
+ '*icmp': 'bool',
+ '*dhcp': 'bool',
+ '*ndp': 'bool',
+ '*dhcpv6': 'bool',
+ '*ra': 'bool',
+ '*freebind': 'bool',
+ '*ipv4': 'bool',
+ '*ipv6': 'bool',
+ '*tcp-ports': ['String'],
+ '*udp-ports': ['String'],
+ '*param': ['String'] },
+ 'if': 'CONFIG_PASST' }
+
##
# @NetdevUserOptions:
#
@@ -729,12 +839,15 @@
#
# @af-xdp: since 8.2
#
+# @passt: since 10.1
+#
# Since: 2.7
##
{ 'enum': 'NetClientDriver',
'data': [ 'none', 'nic', 'user', 'tap', 'l2tpv3', 'socket', 'stream',
'dgram', 'vde', 'bridge', 'hubport', 'netmap', 'vhost-user',
'vhost-vdpa',
+ { 'name': 'passt', 'if': 'CONFIG_PASST' },
{ 'name': 'af-xdp', 'if': 'CONFIG_AF_XDP' },
{ 'name': 'vmnet-host', 'if': 'CONFIG_VMNET' },
{ 'name': 'vmnet-shared', 'if': 'CONFIG_VMNET' },
@@ -756,6 +869,8 @@
'discriminator': 'type',
'data': {
'nic': 'NetLegacyNicOptions',
+ 'passt': { 'type': 'NetDevPasstOptions',
+ 'if': 'CONFIG_PASST' },
'user': 'NetdevUserOptions',
'tap': 'NetdevTapOptions',
'l2tpv3': 'NetdevL2TPv3Options',
diff --git a/qemu-options.hx b/qemu-options.hx
index 1f862b19a6..e8252cd5e8 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2796,6 +2796,24 @@ DEFHEADING()
DEFHEADING(Network options:)
DEF("netdev", HAS_ARG, QEMU_OPTION_netdev,
+#ifdef CONFIG_PASST
+ "-netdev passt,id=str[,path=file][,quiet=on|off]\n"
+ "[,mtu=mtu][,address=addr][,netmask=mask][,mac=addr][,gateway=addr]\n"
+ " [,interface=name][,outbound=address][,outbound-if4=name]\n"
+ " [,outbound-if6=name][,dns=addr][,search=list][,fqdn=name]\n"
+ " [,dhcp-dns=on|off][,dhcp-search=on|off][,map-host-loopback=addr]\n"
+ " [,map-guest-addr=addr][,dns-forward=addr][,dns-host=addr]\n"
+ " [,tcp=on|off][,udp=on|off][,icmp=on|off][,dhcp=on|off]\n"
+ " [,ndp=on|off][,dhcpv6=on|off][,ra=on|off][,freebind=on|off]\n"
+ " [,ipv4=on|off][,ipv6=on|off][,tcp-ports=spec][,udp-ports=spec]\n"
+ " [,param=list]\n"
+ " configure a passt network backend with ID 'str'\n"
+ " if 'path' is not provided 'passt' will be started according to PATH\n"
+ " by default, informational message of passt are not displayed (quiet=on)\n"
+ " to display this message, use 'quiet=off'\n"
+ " for details on other options, refer to passt(1)\n"
+ " 'param' allows to pass any option defined by passt(1)\n"
+#endif
#ifdef CONFIG_SLIRP
"-netdev user,id=str[,ipv4=on|off][,net=addr[/mask]][,host=addr]\n"
" [,ipv6=on|off][,ipv6-net=addr[/int]][,ipv6-host=addr]\n"
@@ -2952,6 +2970,9 @@ DEF("netdev", HAS_ARG, QEMU_OPTION_netdev,
" configure a hub port on the hub with ID 'n'\n", QEMU_ARCH_ALL)
DEF("nic", HAS_ARG, QEMU_OPTION_nic,
"-nic [tap|bridge|"
+#ifdef CONFIG_PASST
+ "passt|"
+#endif
#ifdef CONFIG_SLIRP
"user|"
#endif
@@ -2984,6 +3005,9 @@ DEF("net", HAS_ARG, QEMU_OPTION_net,
" configure or create an on-board (or machine default) NIC and\n"
" connect it to hub 0 (please use -nic unless you need a hub)\n"
"-net ["
+#ifdef CONFIG_PASST
+ "passt|"
+#endif
#ifdef CONFIG_SLIRP
"user|"
#endif
@@ -3005,7 +3029,7 @@ DEF("net", HAS_ARG, QEMU_OPTION_net,
" old way to initialize a host network interface\n"
" (use the -netdev option if possible instead)\n", QEMU_ARCH_ALL)
SRST
-``-nic [tap|bridge|user|l2tpv3|vde|netmap|af-xdp|vhost-user|socket][,...][,mac=macaddr][,model=mn]``
+``-nic [tap|passt|bridge|user|l2tpv3|vde|netmap|af-xdp|vhost-user|socket][,...][,mac=macaddr][,model=mn]``
This option is a shortcut for configuring both the on-board
(default) guest NIC hardware and the host network backend in one go.
The host backend options are the same as with the corresponding
@@ -3027,6 +3051,123 @@ SRST
network backend) which is activated if no other networking options
are provided.
+``-netdev passt,id=str[,option][,...]``
+ Configure a passt network backend which requires no administrator
+ privilege to run. Valid options are:
+
+ ``id=id``
+ Assign symbolic name for use in monitor commands.
+
+ ``path=file``
+ Filename of the passt program to run. If it is not provided,
+ passt command will be started with the help of the PATH environment
+ variable.
+
+ ``quiet=on|off``
+ By default, ``quiet=on`` to disable informational message from
+ passt. ``quiet=on`` is passed as ``--quiet`` to passt.
+
+ ``@mtu``
+ Assign MTU via DHCP/NDP
+
+ ``address``
+ IPv4 or IPv6 address
+
+ ``netmask``
+ IPv4 mask
+
+ ``mac``
+ source MAC address
+
+ ``gateway``
+ IPv4 or IPv6 address as gateway
+
+ ``interface``
+ Interface for addresses and routes
+
+ ``outbound``
+ Bind to address as outbound source
+
+ ``outbound-if4``
+ Bind to outbound interface for IPv4
+
+ ``outbound-if6``
+ Bind to outbound interface for IPv6
+
+ ``dns``
+ IPv4 or IPv6 address as DNS
+
+ ``search``
+ Search domains
+
+ ``fqdn``
+ FQDN to configure client with
+
+ ``dhcp-dns``
+ Enable/disable DNS list in DHCP/DHCPv6/NDP
+
+ ``dhcp-search``
+ Enable/disable list in DHCP/DHCPv6/NDP
+
+ ``map-host-loopback``
+ Address to refer to host
+
+ ``map-guest-addr``
+ Addr to translate to guest's address
+
+ ``dns-forward``
+ Forward DNS queries sent to
+
+ ``dns-host``
+ Host nameserver to direct queries to
+
+ ``tcp``
+ Enable/disable TCP
+
+ ``udp``
+ Enable/disable UDP
+
+ ``icmp``
+ Enable/disable ICMP
+
+ ``dhcp``
+ Enable/disable DHCP
+
+ ``ndp``
+ Enable/disable NDP
+
+ ``dhcpv6``
+ Enable/disable DHCPv6
+
+ ``ra``
+ Enable/disable route advertisements
+
+ ``freebind``
+ Bind to any address for forwarding
+
+ ``ipv4``
+ Enable/disable IPv4
+
+ ``ipv6``
+ Enable/disable IPv6
+
+ ``tcp-ports``
+ TCP ports to forward
+
+ ``udp-ports``
+ UDP ports to forward
+
+ ``param=string``
+ ``string`` will be passed to passt as a command line parameter;
+ the ``param`` parameter can occur multiple times to
+ pass multiple parameters to passt.
+
+ For instance, to pass ``--trace --log=trace.log``:
+
+ .. parsed-literal::
+
+ |qemu_system| -nic passt,param=--trace,param=--log=trace.log
+
``-netdev user,id=id[,option][,option][,...]``
Configure user mode host network backend which requires no
administrator privilege to run. Valid options are:
@@ -3711,7 +3852,7 @@ SRST
Use ``-net nic,model=help`` for a list of available devices for your
target.
-``-net user|tap|bridge|socket|l2tpv3|vde[,...][,name=name]``
+``-net user|passt|tap|bridge|socket|l2tpv3|vde[,...][,name=name]``
Configure a host network backend (with the options corresponding to
the same ``-netdev`` option) and connect it to the emulated hub 0
(the default hub). Use name to specify the name of the hub port.
diff --git a/scripts/meson-buildoptions.sh b/scripts/meson-buildoptions.sh
index 73e0770f42..bb3e34d852 100644
--- a/scripts/meson-buildoptions.sh
+++ b/scripts/meson-buildoptions.sh
@@ -162,6 +162,7 @@ meson_options_help() {
printf "%s\n" ' oss OSS sound support'
printf "%s\n" ' pa PulseAudio sound support'
printf "%s\n" ' parallels parallels image format support'
+ printf "%s\n" ' passt passt network backend support'
printf "%s\n" ' pipewire PipeWire sound support'
printf "%s\n" ' pixman pixman support'
printf "%s\n" ' plugins TCG plugins via shared library loading'
@@ -422,6 +423,8 @@ _meson_option_parse() {
--disable-pa) printf "%s" -Dpa=disabled ;;
--enable-parallels) printf "%s" -Dparallels=enabled ;;
--disable-parallels) printf "%s" -Dparallels=disabled ;;
+ --enable-passt) printf "%s" -Dpasst=enabled ;;
+ --disable-passt) printf "%s" -Dpasst=disabled ;;
--enable-pipewire) printf "%s" -Dpipewire=enabled ;;
--disable-pipewire) printf "%s" -Dpipewire=disabled ;;
--enable-pixman) printf "%s" -Dpixman=enabled ;;
--
2.42.0
* [PULL V2 13/16] net/passt: Implement vhost-user backend support
2025-07-15 4:35 [PULL V2 00/16] Net patches Jason Wang
` (11 preceding siblings ...)
2025-07-15 4:35 ` [PULL V2 12/16] net: Add passt network backend Jason Wang
@ 2025-07-15 4:35 ` Jason Wang
2025-07-17 9:32 ` Peter Maydell
2025-07-15 4:35 ` [PULL V2 14/16] net/af-xdp: Remove XDP program cleanup logic Jason Wang
` (3 subsequent siblings)
16 siblings, 1 reply; 21+ messages in thread
From: Jason Wang @ 2025-07-15 4:35 UTC (permalink / raw)
To: qemu-devel; +Cc: Laurent Vivier, Jason Wang
From: Laurent Vivier <lvivier@redhat.com>
This commit adds support for the vhost-user interface to the passt
network backend, enabling high-performance, accelerated networking for
guests using passt.
The passt backend can now operate in a vhost-user mode, where it
communicates with the guest's virtio-net device over a socket pair
using the vhost-user protocol. This offloads the datapath from the
main QEMU loop, significantly improving network performance.
When the vhost-user=on option is used with -netdev passt, the new
vhost initialization path is taken instead of the standard
stream-based connection.
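
As a rough illustration of the "socket pair" part only, here is a minimal
sketch (not the QEMU code). The helper name socketpair_roundtrip() is
hypothetical, and the vhost-user protocol itself is not shown; the sketch
just assumes a connected AF_UNIX stream pair where one end stays with QEMU
and the other is handed to passt:

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of QEMU: create a connected AF_UNIX
 * stream socket pair, send msg on one end and read it back on the other.
 * Returns the number of bytes received, or -1 on error.
 */
ssize_t socketpair_roundtrip(const char *msg, char *buf, size_t buflen)
{
    int sv[2];
    ssize_t n = -1;

    assert(msg != NULL && buf != NULL);

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1) {
        return -1;
    }

    /* sv[0] plays the role of QEMU's end, sv[1] the end given to passt */
    if (write(sv[0], msg, strlen(msg)) == (ssize_t)strlen(msg)) {
        n = read(sv[1], buf, buflen);
    }

    close(sv[0]);
    close(sv[1]);
    return n;
}
```

In the patch itself, the QEMU end is wrapped in a socket chardev and the
other end is passed to the passt process when the daemon is started.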
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
docs/system/devices/net.rst | 12 +-
net/passt.c | 346 ++++++++++++++++++++++++++++++++++++
qapi/net.json | 3 +
qemu-options.hx | 10 +-
4 files changed, 369 insertions(+), 2 deletions(-)
diff --git a/docs/system/devices/net.rst b/docs/system/devices/net.rst
index c586ee0f40..4d787c3aeb 100644
--- a/docs/system/devices/net.rst
+++ b/docs/system/devices/net.rst
@@ -104,7 +104,7 @@ To use the passt backend interface
There is no need to start the daemon as QEMU will do it for you.
-passt is started in the socket-based mode.
+By default, passt will be started in the socket-based mode.
.. parsed-literal::
|qemu_system| [...OPTIONS...] -nic passt
@@ -128,6 +128,16 @@ passt is started in the socket-based mode.
virtio0: index=0,type=nic,model=virtio-net-pci,macaddr=9a:2b:2c:2d:2e:2f
\ netdev0: index=0,type=passt,stream,connected to pid 25428
+To use the vhost-based interface, add the ``vhost-user=on`` parameter and
+select the virtio-net device:
+
+.. parsed-literal::
+ |qemu_system| [...OPTIONS...] -nic passt,model=virtio,vhost-user=on
+
+ (qemu) info network
+ virtio-net-pci.0: index=0,type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:56
+ \ #net006: index=0,type=passt,vhost-user,connected to pid 25731
+
To use socket based passt interface:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
diff --git a/net/passt.c b/net/passt.c
index 0a4a1ba6aa..6f616ba3c2 100644
--- a/net/passt.c
+++ b/net/passt.c
@@ -7,18 +7,75 @@
*/
#include "qemu/osdep.h"
#include <glib/gstdio.h>
+#include "qemu/error-report.h"
#include <gio/gio.h>
#include "net/net.h"
#include "clients.h"
#include "qapi/error.h"
#include "io/net-listener.h"
+#include "chardev/char-fe.h"
+#include "net/vhost_net.h"
+#include "hw/virtio/vhost.h"
+#include "hw/virtio/vhost-user.h"
+#include "standard-headers/linux/virtio_net.h"
#include "stream_data.h"
+#ifdef CONFIG_VHOST_USER
+static const int user_feature_bits[] = {
+ VIRTIO_F_NOTIFY_ON_EMPTY,
+ VIRTIO_F_NOTIFICATION_DATA,
+ VIRTIO_RING_F_INDIRECT_DESC,
+ VIRTIO_RING_F_EVENT_IDX,
+
+ VIRTIO_F_ANY_LAYOUT,
+ VIRTIO_F_VERSION_1,
+ VIRTIO_NET_F_CSUM,
+ VIRTIO_NET_F_GUEST_CSUM,
+ VIRTIO_NET_F_GSO,
+ VIRTIO_NET_F_GUEST_TSO4,
+ VIRTIO_NET_F_GUEST_TSO6,
+ VIRTIO_NET_F_GUEST_ECN,
+ VIRTIO_NET_F_GUEST_UFO,
+ VIRTIO_NET_F_HOST_TSO4,
+ VIRTIO_NET_F_HOST_TSO6,
+ VIRTIO_NET_F_HOST_ECN,
+ VIRTIO_NET_F_HOST_UFO,
+ VIRTIO_NET_F_MRG_RXBUF,
+ VIRTIO_NET_F_MTU,
+ VIRTIO_F_IOMMU_PLATFORM,
+ VIRTIO_F_RING_PACKED,
+ VIRTIO_F_RING_RESET,
+ VIRTIO_F_IN_ORDER,
+ VIRTIO_NET_F_RSS,
+ VIRTIO_NET_F_RSC_EXT,
+ VIRTIO_NET_F_HASH_REPORT,
+ VIRTIO_NET_F_GUEST_USO4,
+ VIRTIO_NET_F_GUEST_USO6,
+ VIRTIO_NET_F_HOST_USO,
+
+ /* This bit implies RARP isn't sent by QEMU out of band */
+ VIRTIO_NET_F_GUEST_ANNOUNCE,
+
+ VIRTIO_NET_F_MQ,
+
+ VHOST_INVALID_FEATURE_BIT
+};
+#endif
+
typedef struct NetPasstState {
NetStreamData data;
GPtrArray *args;
gchar *pidfile;
pid_t pid;
+#ifdef CONFIG_VHOST_USER
+ /* vhost user */
+ VhostUserState *vhost_user;
+ VHostNetState *vhost_net;
+ CharBackend vhost_chr;
+ guint vhost_watch;
+ uint64_t acked_features;
+ bool started;
+#endif
} NetPasstState;
static int net_passt_stream_start(NetPasstState *s, Error **errp);
@@ -27,6 +84,24 @@ static void net_passt_cleanup(NetClientState *nc)
{
NetPasstState *s = DO_UPCAST(NetPasstState, data.nc, nc);
+#ifdef CONFIG_VHOST_USER
+ if (s->vhost_net) {
+ vhost_net_cleanup(s->vhost_net);
+ g_free(s->vhost_net);
+ s->vhost_net = NULL;
+ }
+ if (s->vhost_watch) {
+ g_source_remove(s->vhost_watch);
+ s->vhost_watch = 0;
+ }
+ qemu_chr_fe_deinit(&s->vhost_chr, true);
+ if (s->vhost_user) {
+ vhost_user_cleanup(s->vhost_user);
+ g_free(s->vhost_user);
+ s->vhost_user = NULL;
+ }
+#endif
+
kill(s->pid, SIGTERM);
g_remove(s->pidfile);
g_free(s->pidfile);
@@ -60,11 +135,98 @@ static gboolean net_passt_send(QIOChannel *ioc, GIOCondition condition,
return G_SOURCE_CONTINUE;
}
+#ifdef CONFIG_VHOST_USER
+static int passt_set_vnet_endianness(NetClientState *nc, bool enable)
+{
+ assert(nc->info->type == NET_CLIENT_DRIVER_PASST);
+
+ return 0;
+}
+
+static bool passt_has_vnet_hdr(NetClientState *nc)
+{
+ NetPasstState *s = DO_UPCAST(NetPasstState, data.nc, nc);
+
+ assert(nc->info->type == NET_CLIENT_DRIVER_PASST);
+
+ return s->vhost_user != NULL;
+}
+
+static bool passt_has_ufo(NetClientState *nc)
+{
+ NetPasstState *s = DO_UPCAST(NetPasstState, data.nc, nc);
+
+ assert(nc->info->type == NET_CLIENT_DRIVER_PASST);
+
+ return s->vhost_user != NULL;
+}
+
+static bool passt_check_peer_type(NetClientState *nc, ObjectClass *oc,
+ Error **errp)
+{
+ NetPasstState *s = DO_UPCAST(NetPasstState, data.nc, nc);
+ const char *driver = object_class_get_name(oc);
+
+ assert(nc->info->type == NET_CLIENT_DRIVER_PASST);
+
+ if (s->vhost_user == NULL) {
+ return true;
+ }
+
+ if (!g_str_has_prefix(driver, "virtio-net-")) {
+ error_setg(errp, "vhost-user requires frontend driver virtio-net-*");
+ return false;
+ }
+
+ return true;
+}
+
+static struct vhost_net *passt_get_vhost_net(NetClientState *nc)
+{
+ NetPasstState *s = DO_UPCAST(NetPasstState, data.nc, nc);
+
+ assert(nc->info->type == NET_CLIENT_DRIVER_PASST);
+
+ return s->vhost_net;
+}
+
+static uint64_t passt_get_acked_features(NetClientState *nc)
+{
+ NetPasstState *s = DO_UPCAST(NetPasstState, data.nc, nc);
+
+ assert(nc->info->type == NET_CLIENT_DRIVER_PASST);
+
+ return s->acked_features;
+}
+
+static void passt_save_acked_features(NetClientState *nc)
+{
+ NetPasstState *s = DO_UPCAST(NetPasstState, data.nc, nc);
+
+ assert(nc->info->type == NET_CLIENT_DRIVER_PASST);
+
+ if (s->vhost_net) {
+ uint64_t features = vhost_net_get_acked_features(s->vhost_net);
+ if (features) {
+ s->acked_features = features;
+ }
+ }
+}
+#endif
+
static NetClientInfo net_passt_info = {
.type = NET_CLIENT_DRIVER_PASST,
.size = sizeof(NetPasstState),
.receive = net_passt_receive,
.cleanup = net_passt_cleanup,
+#ifdef CONFIG_VHOST_USER
+ .has_vnet_hdr = passt_has_vnet_hdr,
+ .has_ufo = passt_has_ufo,
+ .set_vnet_be = passt_set_vnet_endianness,
+ .set_vnet_le = passt_set_vnet_endianness,
+ .check_peer_type = passt_check_peer_type,
+ .get_vhost_net = passt_get_vhost_net,
+#endif
};
static void net_passt_client_connected(QIOTask *task, gpointer opaque)
@@ -163,6 +325,177 @@ static int net_passt_stream_start(NetPasstState *s, Error **errp)
return 0;
}
+#ifdef CONFIG_VHOST_USER
+static gboolean passt_vhost_user_watch(void *do_not_use, GIOCondition cond,
+ void *opaque)
+{
+ NetPasstState *s = opaque;
+
+ qemu_chr_fe_disconnect(&s->vhost_chr);
+
+ return G_SOURCE_CONTINUE;
+}
+
+static void passt_vhost_user_event(void *opaque, QEMUChrEvent event);
+
+static void chr_closed_bh(void *opaque)
+{
+ NetPasstState *s = opaque;
+
+ passt_save_acked_features(&s->data.nc);
+
+ net_client_set_link(&(NetClientState *){ &s->data.nc }, 1, false);
+
+ qemu_chr_fe_set_handlers(&s->vhost_chr, NULL, NULL, passt_vhost_user_event,
+ NULL, s, NULL, true);
+}
+
+static void passt_vhost_user_stop(NetPasstState *s)
+{
+ passt_save_acked_features(&s->data.nc);
+ vhost_net_cleanup(s->vhost_net);
+}
+
+static int passt_vhost_user_start(NetPasstState *s, VhostUserState *be)
+{
+ struct vhost_net *net = NULL;
+ VhostNetOptions options;
+
+ options.backend_type = VHOST_BACKEND_TYPE_USER;
+ options.net_backend = &s->data.nc;
+ options.opaque = be;
+ options.busyloop_timeout = 0;
+ options.nvqs = 2;
+ options.feature_bits = user_feature_bits;
+ options.max_tx_queue_size = VIRTQUEUE_MAX_SIZE;
+ options.get_acked_features = passt_get_acked_features;
+ options.save_acked_features = passt_save_acked_features;
+ options.is_vhost_user = true;
+
+ net = vhost_net_init(&options);
+ if (!net) {
+ error_report("failed to init passt vhost_net");
+ goto err;
+ }
+
+ if (s->vhost_net) {
+ vhost_net_cleanup(s->vhost_net);
+ g_free(s->vhost_net);
+ }
+ s->vhost_net = net;
+
+ return 0;
+err:
+ if (net) {
+ vhost_net_cleanup(net);
+ g_free(net);
+ }
+ passt_vhost_user_stop(s);
+ return -1;
+}
+
+static void passt_vhost_user_event(void *opaque, QEMUChrEvent event)
+{
+ NetPasstState *s = opaque;
+ Error *err = NULL;
+
+ switch (event) {
+ case CHR_EVENT_OPENED:
+ if (passt_vhost_user_start(s, s->vhost_user) < 0) {
+ qemu_chr_fe_disconnect(&s->vhost_chr);
+ return;
+ }
+ s->vhost_watch = qemu_chr_fe_add_watch(&s->vhost_chr, G_IO_HUP,
+ passt_vhost_user_watch, s);
+ net_client_set_link(&(NetClientState *){ &s->data.nc }, 1, true);
+ s->started = true;
+ break;
+ case CHR_EVENT_CLOSED:
+ if (s->vhost_watch) {
+ AioContext *ctx = qemu_get_current_aio_context();
+
+ g_source_remove(s->vhost_watch);
+ s->vhost_watch = 0;
+ qemu_chr_fe_set_handlers(&s->vhost_chr, NULL, NULL, NULL, NULL,
+ NULL, NULL, false);
+
+ aio_bh_schedule_oneshot(ctx, chr_closed_bh, s);
+ }
+ break;
+ case CHR_EVENT_BREAK:
+ case CHR_EVENT_MUX_IN:
+ case CHR_EVENT_MUX_OUT:
+ /* Ignore */
+ break;
+ }
+
+ if (err) {
+ error_report_err(err);
+ }
+}
+
+static int net_passt_vhost_user_init(NetPasstState *s, Error **errp)
+{
+ Chardev *chr;
+ int sv[2];
+
+ if (socketpair(PF_UNIX, SOCK_STREAM, 0, sv) == -1) {
+ error_setg_errno(errp, errno, "socketpair() failed");
+ return -1;
+ }
+
+ /* connect to passt */
+ qemu_set_info_str(&s->data.nc, "connecting to passt");
+
+ /* create chardev */
+
+ chr = CHARDEV(object_new(TYPE_CHARDEV_SOCKET));
+ if (!chr || qemu_chr_add_client(chr, sv[0]) == -1) {
+ object_unref(OBJECT(chr));
+ error_setg(errp, "Failed to make socket chardev");
+ goto err;
+ }
+
+ s->vhost_user = g_new0(struct VhostUserState, 1);
+ if (!qemu_chr_fe_init(&s->vhost_chr, chr, errp) ||
+ !vhost_user_init(s->vhost_user, &s->vhost_chr, errp)) {
+ goto err;
+ }
+
+ /* start passt */
+ if (net_passt_start_daemon(s, sv[1], errp) == -1) {
+ goto err;
+ }
+
+ do {
+ if (qemu_chr_fe_wait_connected(&s->vhost_chr, errp) < 0) {
+ goto err;
+ }
+
+ qemu_chr_fe_set_handlers(&s->vhost_chr, NULL, NULL,
+ passt_vhost_user_event, NULL, s, NULL,
+ true);
+ } while (!s->started);
+
+ qemu_set_info_str(&s->data.nc, "vhost-user,connected to pid %d", s->pid);
+
+ close(sv[1]);
+ return 0;
+err:
+ close(sv[0]);
+ close(sv[1]);
+
+ return -1;
+}
+#else
+static int net_passt_vhost_user_init(NetPasstState *s, Error **errp)
+{
+ error_setg(errp, "vhost-user support has not been built");
+
+ return -1;
+}
+#endif
+
static GPtrArray *net_passt_decode_args(const NetDevPasstOptions *passt,
gchar *pidfile, Error **errp)
{
@@ -174,6 +507,10 @@ static GPtrArray *net_passt_decode_args(const NetDevPasstOptions *passt,
g_ptr_array_add(args, g_strdup("passt"));
}
+ if (passt->has_vhost_user && passt->vhost_user) {
+ g_ptr_array_add(args, g_strdup("--vhost-user"));
+ }
+
/* by default, be quiet */
if (!passt->has_quiet || passt->quiet) {
g_ptr_array_add(args, g_strdup("--quiet"));
@@ -398,6 +735,15 @@ int net_init_passt(const Netdev *netdev, const char *name,
s->args = args;
s->pidfile = pidfile;
+ if (netdev->u.passt.has_vhost_user && netdev->u.passt.vhost_user) {
+ if (net_passt_vhost_user_init(s, errp) == -1) {
+ qemu_del_net_client(nc);
+ return -1;
+ }
+
+ return 0;
+ }
+
if (net_passt_stream_start(s, errp) == -1) {
qemu_del_net_client(nc);
return -1;
diff --git a/qapi/net.json b/qapi/net.json
index 24999f6752..0f766041a3 100644
--- a/qapi/net.json
+++ b/qapi/net.json
@@ -121,6 +121,8 @@
#
# @quiet: don't print informational messages (default, passed as '--quiet')
#
+# @vhost-user: enable vhost-user
+#
# @mtu: assign MTU via DHCP/NDP
#
# @address: IPv4 or IPv6 address
@@ -189,6 +191,7 @@
'data': {
'*path': 'str',
'*quiet': 'bool',
+ '*vhost-user': 'bool',
'*mtu': 'int',
'*address': 'str',
'*netmask': 'str',
diff --git a/qemu-options.hx b/qemu-options.hx
index e8252cd5e8..a3c066c678 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2797,7 +2797,7 @@ DEFHEADING(Network options:)
DEF("netdev", HAS_ARG, QEMU_OPTION_netdev,
#ifdef CONFIG_PASST
- "-netdev passt,id=str[,path=file][,quiet=on|off]\n"
+ "-netdev passt,id=str[,path=file][,quiet=on|off][,vhost-user=on|off]\n"
"[,mtu=mtu][,address=addr][,netmask=mask][,mac=addr][,gateway=addr]\n"
" [,interface=name][,outbound=address][,outbound-if4=name]\n"
" [,outbound-if6=name][,dns=addr][,search=list][,fqdn=name]\n"
@@ -2811,6 +2811,8 @@ DEF("netdev", HAS_ARG, QEMU_OPTION_netdev,
" if 'path' is not provided 'passt' will be started according to PATH\n"
" by default, informational message of passt are not displayed (quiet=on)\n"
" to display this message, use 'quiet=off'\n"
+ " by default, passt will be started in socket-based mode, to enable vhost-mode,\n"
+ " use 'vhost-user=on'\n"
" for details on other options, refer to passt(1)\n"
" 'param' allows to pass any option defined by passt(1)\n"
#endif
@@ -3067,6 +3069,12 @@ SRST
By default, ``quiet=on`` to disable informational message from
passt. ``quiet=on`` is passed as ``--quiet`` to passt.
+ ``vhost-user=on|off``
+ By default, ``vhost-user=off`` and QEMU uses the stream network
+ backend to communicate with passt. If ``vhost-user=on``, passt is
+ started with ``--vhost-user`` and QEMU uses the vhost-user network
+ backend to communicate with passt.
+
``@mtu``
Assign MTU via DHCP/NDP
--
2.42.0
* [PULL V2 14/16] net/af-xdp: Remove XDP program cleanup logic
2025-07-15 4:35 [PULL V2 00/16] Net patches Jason Wang
` (12 preceding siblings ...)
2025-07-15 4:35 ` [PULL V2 13/16] net/passt: Implement vhost-user backend support Jason Wang
@ 2025-07-15 4:35 ` Jason Wang
2025-07-15 4:35 ` [PULL V2 15/16] net/af-xdp: Fix up cleanup path upon failure in queue creation Jason Wang
` (2 subsequent siblings)
16 siblings, 0 replies; 21+ messages in thread
From: Jason Wang @ 2025-07-15 4:35 UTC (permalink / raw)
To: qemu-devel; +Cc: Daniel Borkmann, Ilya Maximets, Jason Wang, Anton Protopopov
From: Daniel Borkmann <daniel@iogearbox.net>
There are two issues with the XDP program removal in af_xdp_cleanup():
1) Starting from libxdp 1.3.0 [0] the XDP program gets automatically
detached when we call xsk_socket__delete() for the last successfully
configured queue. libxdp internally keeps track of that. For QEMU
we require libxdp >= 1.4.0. Given that QEMU is not loading the program,
let's not attempt to remove it either and delegate this to libxdp instead.
2) The removal logic is incorrect anyway because we are setting n_queues
into the last queue that never has xdp_flags on failure, so the logic
is always skipped since the non-zero test for s->xdp_flags in
af_xdp_cleanup() fails.
Fixes: cb039ef3d9e3 ("net: add initial support for AF_XDP network backend")
Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Ilya Maximets <i.maximets@ovn.org>
Cc: Ilya Maximets <i.maximets@ovn.org>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Anton Protopopov <aspsk@isovalent.com>
Link: https://github.com/xdp-project/xdp-tools/commit/38c2914988fd5c1ef65f2381fc8af9f3e8404e2b [0]
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
net/af-xdp.c | 12 ------------
1 file changed, 12 deletions(-)
diff --git a/net/af-xdp.c b/net/af-xdp.c
index d022534d76..3d3268e18b 100644
--- a/net/af-xdp.c
+++ b/net/af-xdp.c
@@ -49,7 +49,6 @@ typedef struct AFXDPState {
char *buffer;
struct xsk_umem *umem;
- uint32_t n_queues;
uint32_t xdp_flags;
bool inhibit;
} AFXDPState;
@@ -274,14 +273,6 @@ static void af_xdp_cleanup(NetClientState *nc)
s->umem = NULL;
qemu_vfree(s->buffer);
s->buffer = NULL;
-
- /* Remove the program if it's the last open queue. */
- if (!s->inhibit && nc->queue_index == s->n_queues - 1 && s->xdp_flags
- && bpf_xdp_detach(s->ifindex, s->xdp_flags, NULL) != 0) {
- fprintf(stderr,
- "af-xdp: unable to remove XDP program from '%s', ifindex: %d\n",
- s->ifname, s->ifindex);
- }
}
static int af_xdp_umem_create(AFXDPState *s, int sock_fd, Error **errp)
@@ -490,12 +481,9 @@ int net_init_af_xdp(const Netdev *netdev,
pstrcpy(s->ifname, sizeof(s->ifname), opts->ifname);
s->ifindex = ifindex;
- s->n_queues = queues;
if (af_xdp_umem_create(s, sock_fds ? sock_fds[i] : -1, errp)
|| af_xdp_socket_create(s, opts, errp)) {
- /* Make sure the XDP program will be removed. */
- s->n_queues = i;
error_propagate(errp, err);
goto err;
}
--
2.42.0
* [PULL V2 15/16] net/af-xdp: Fix up cleanup path upon failure in queue creation
2025-07-15 4:35 [PULL V2 00/16] Net patches Jason Wang
` (13 preceding siblings ...)
2025-07-15 4:35 ` [PULL V2 14/16] net/af-xdp: Remove XDP program cleanup logic Jason Wang
@ 2025-07-15 4:35 ` Jason Wang
2025-07-15 4:35 ` [PULL V2 16/16] net/af-xdp: Support pinned map path for AF_XDP sockets Jason Wang
2025-07-16 12:39 ` [PULL V2 00/16] Net patches Stefan Hajnoczi
16 siblings, 0 replies; 21+ messages in thread
From: Jason Wang @ 2025-07-15 4:35 UTC (permalink / raw)
To: qemu-devel; +Cc: Daniel Borkmann, Ilya Maximets, Jason Wang, Anton Protopopov
From: Daniel Borkmann <daniel@iogearbox.net>
While testing, it turned out that upon error in the queue creation loop,
we never trigger the af_xdp_cleanup() handler. This is because we pass
errp directly into the various AF_XDP setup functions rather than a local
err pointer, as in a scheme like:
bool fn(..., Error **errp)
{
Error *err = NULL;
foo(arg, &err);
if (err) {
handle the error...
error_propagate(errp, err);
return false;
}
...
}
The same is true for the attachment probing with bpf_xdp_query_id(). With a
conversion into the above format, the af_xdp_cleanup() handler is called as
expected. Note that error_propagate() handles a NULL err internally.
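
The local-err scheme can be sketched in self-contained C as follows. This
is only an illustration under stated assumptions: the Error type and the
sketch_*() functions are hypothetical stand-ins, not QEMU's Error API, and
cleanup_calls merely stands in for the qemu_del_net_client() call on the
error path:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for QEMU's Error; hypothetical, only for this sketch. */
typedef struct Error { const char *msg; } Error;

static Error setup_failed = { "setup failed" };

/* Simplified stand-in: hand a non-NULL err to the caller, ignore NULL. */
void sketch_error_propagate(Error **dst, Error *err)
{
    if (err && dst) {
        *dst = err;
    }
}

/* A setup step that reports failure through errp. */
int sketch_setup(bool fail, Error **errp)
{
    assert(errp != NULL);
    if (fail) {
        *errp = &setup_failed;
        return -1;
    }
    return 0;
}

int cleanup_calls;

/* Collect the error locally, run cleanup, then propagate it once. */
int sketch_init(bool fail, Error **errp)
{
    Error *err = NULL;

    if (sketch_setup(fail, &err)) {
        goto err;
    }
    return 0;

err:
    cleanup_calls++;                    /* stands in for the cleanup call */
    sketch_error_propagate(errp, err);  /* single hand-off to the caller */
    return -1;
}
```

The point of the shape is that the caller's errp is written exactly once,
after cleanup has run, and only from the local err.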
Fixes: cb039ef3d9e3 ("net: add initial support for AF_XDP network backend")
Reviewed-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Ilya Maximets <i.maximets@ovn.org>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Anton Protopopov <aspsk@isovalent.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
net/af-xdp.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/net/af-xdp.c b/net/af-xdp.c
index 3d3268e18b..cbe1cae700 100644
--- a/net/af-xdp.c
+++ b/net/af-xdp.c
@@ -482,9 +482,8 @@ int net_init_af_xdp(const Netdev *netdev,
pstrcpy(s->ifname, sizeof(s->ifname), opts->ifname);
s->ifindex = ifindex;
- if (af_xdp_umem_create(s, sock_fds ? sock_fds[i] : -1, errp)
- || af_xdp_socket_create(s, opts, errp)) {
- error_propagate(errp, err);
+ if (af_xdp_umem_create(s, sock_fds ? sock_fds[i] : -1, &err) ||
+ af_xdp_socket_create(s, opts, &err)) {
goto err;
}
}
@@ -492,7 +491,7 @@ int net_init_af_xdp(const Netdev *netdev,
if (nc0) {
s = DO_UPCAST(AFXDPState, nc, nc0);
if (bpf_xdp_query_id(s->ifindex, s->xdp_flags, &prog_id) || !prog_id) {
- error_setg_errno(errp, errno,
+ error_setg_errno(&err, errno,
"no XDP program loaded on '%s', ifindex: %d",
s->ifname, s->ifindex);
goto err;
@@ -506,6 +505,7 @@ int net_init_af_xdp(const Netdev *netdev,
err:
if (nc0) {
qemu_del_net_client(nc0);
+ error_propagate(errp, err);
}
return -1;
--
2.42.0
* [PULL V2 16/16] net/af-xdp: Support pinned map path for AF_XDP sockets
2025-07-15 4:35 [PULL V2 00/16] Net patches Jason Wang
` (14 preceding siblings ...)
2025-07-15 4:35 ` [PULL V2 15/16] net/af-xdp: Fix up cleanup path upon failure in queue creation Jason Wang
@ 2025-07-15 4:35 ` Jason Wang
2025-07-16 12:39 ` [PULL V2 00/16] Net patches Stefan Hajnoczi
16 siblings, 0 replies; 21+ messages in thread
From: Jason Wang @ 2025-07-15 4:35 UTC (permalink / raw)
To: qemu-devel; +Cc: Daniel Borkmann, Ilya Maximets, Jason Wang, Anton Protopopov
From: Daniel Borkmann <daniel@iogearbox.net>
Extend the 'inhibit=on' setting with an option to specify a pinned XSK map
path along with a starting index (default 0) to push the created XSK
sockets into. Example usage:
# ./build/qemu-system-x86_64 [...] \
-netdev af-xdp,ifname=enp2s0f0np0,id=net0,mode=native,queues=2,start-queue=14,inhibit=on,map-path=/sys/fs/bpf/xsks_map,map-start-index=14 \
-device virtio-net-pci,netdev=net0 [...]
This is useful for the case where an existing XDP program with XSK map
is present on the AF_XDP supported phys device and the XSK map is not
yet populated. For example, the former could have been pre-loaded onto
the netdevice by a control plane, which later launches QEMU to populate
it with XSK sockets.
Normally, the main idea behind 'inhibit=on' is that the QEMU instance
doesn't need to have a lot of privileges to use the pre-loaded program
and the pre-created sockets. The use case described here is different:
QEMU still needs privileges to create the sockets.
The 'map-start-index' parameter is optional and defaults to 0. It allows
flexible placement of the XSK sockets, and is up to the user to specify
when the XDP program with XSK map was already preloaded. In the simplest
case the queue-to-map-slot mapping is just 1:1 based on ctx->rx_queue_index
but the user might as well have a different scheme (or smaller map size,
e.g. ctx->rx_queue_index % max_size) to push the inbound traffic to one
of the XSK sockets.
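
The slot arithmetic described above can be sketched as below. The function
names are hypothetical and chosen for illustration; the first mirrors the
queue_index + map_start_index computation used in this patch, the second
is just one possible lookup key an XDP program could use with a smaller map:

```c
#include <assert.h>
#include <stdint.h>

/* Map slot the socket of a given QEMU queue is written into. */
uint32_t xsk_map_slot(uint32_t queue_index, uint32_t map_start_index)
{
    return queue_index + map_start_index;
}

/* One possible XDP-side key when the map is smaller than the queue range. */
uint32_t xsk_lookup_key_mod(uint32_t rx_queue_index, uint32_t max_size)
{
    assert(max_size > 0);
    return rx_queue_index % max_size;
}
```

With the example command line above (queues=2, map-start-index=14), the two
sockets land in slots 14 and 15, which matches a 1:1 lookup on
ctx->rx_queue_index for hardware queues 14 and 15.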
Note that bpf_xdp_query_id() is now only tested for 'inhibit=off',
since only in that case does libxdp take care of installing the XDP
program, based on s->xdp_flags pointing to either driver or skb mode.
For 'inhibit=on' we make no assumptions and do not go down the path of
probing all the possible ways in which the user could have installed
the XDP program.
Reviewed-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Ilya Maximets <i.maximets@ovn.org>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Anton Protopopov <aspsk@isovalent.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
net/af-xdp.c | 83 ++++++++++++++++++++++++++++++++++++++++++++++---
qapi/net.json | 29 +++++++++++------
qemu-options.hx | 23 ++++++++++++--
3 files changed, 118 insertions(+), 17 deletions(-)
diff --git a/net/af-xdp.c b/net/af-xdp.c
index cbe1cae700..14f302ea21 100644
--- a/net/af-xdp.c
+++ b/net/af-xdp.c
@@ -51,6 +51,10 @@ typedef struct AFXDPState {
uint32_t xdp_flags;
bool inhibit;
+
+ char *map_path;
+ int map_fd;
+ uint32_t map_start_index;
} AFXDPState;
#define AF_XDP_BATCH_SIZE 64
@@ -260,6 +264,7 @@ static void af_xdp_send(void *opaque)
static void af_xdp_cleanup(NetClientState *nc)
{
AFXDPState *s = DO_UPCAST(AFXDPState, nc, nc);
+ int idx;
qemu_purge_queued_packets(nc);
@@ -273,6 +278,18 @@ static void af_xdp_cleanup(NetClientState *nc)
s->umem = NULL;
qemu_vfree(s->buffer);
s->buffer = NULL;
+
+ if (s->map_fd >= 0) {
+ idx = nc->queue_index + s->map_start_index;
+ if (bpf_map_delete_elem(s->map_fd, &idx)) {
+ fprintf(stderr, "af-xdp: unable to remove AF_XDP socket from map"
+ " %s\n", s->map_path);
+ }
+ close(s->map_fd);
+ s->map_fd = -1;
+ }
+ g_free(s->map_path);
+ s->map_path = NULL;
}
static int af_xdp_umem_create(AFXDPState *s, int sock_fd, Error **errp)
@@ -336,7 +353,6 @@ static int af_xdp_socket_create(AFXDPState *s,
};
int queue_id, error = 0;
- s->inhibit = opts->has_inhibit && opts->inhibit;
if (s->inhibit) {
cfg.libxdp_flags |= XSK_LIBXDP_FLAGS__INHIBIT_PROG_LOAD;
}
@@ -387,6 +403,35 @@ static int af_xdp_socket_create(AFXDPState *s,
return 0;
}
+static int af_xdp_update_xsk_map(AFXDPState *s, Error **errp)
+{
+ int xsk_fd, idx, error = 0;
+
+ if (!s->map_path) {
+ return 0;
+ }
+
+ s->map_fd = bpf_obj_get(s->map_path);
+ if (s->map_fd < 0) {
+ error = errno;
+ } else {
+ xsk_fd = xsk_socket__fd(s->xsk);
+ idx = s->nc.queue_index + s->map_start_index;
+ if (bpf_map_update_elem(s->map_fd, &idx, &xsk_fd, 0)) {
+ error = errno;
+ }
+ }
+
+ if (error) {
+ error_setg_errno(errp, error,
+ "failed to insert AF_XDP socket into map %s",
+ s->map_path);
+ return -1;
+ }
+
+ return 0;
+}
+
/* NetClientInfo methods. */
static NetClientInfo net_af_xdp_info = {
.type = NET_CLIENT_DRIVER_AF_XDP,
@@ -435,12 +480,14 @@ int net_init_af_xdp(const Netdev *netdev,
{
const NetdevAFXDPOptions *opts = &netdev->u.af_xdp;
NetClientState *nc, *nc0 = NULL;
+ int32_t map_start_index;
unsigned int ifindex;
uint32_t prog_id = 0;
g_autofree int *sock_fds = NULL;
int64_t i, queues;
Error *err = NULL;
AFXDPState *s;
+ bool inhibit;
ifindex = if_nametoindex(opts->ifname);
if (!ifindex) {
@@ -456,8 +503,28 @@ int net_init_af_xdp(const Netdev *netdev,
return -1;
}
- if ((opts->has_inhibit && opts->inhibit) != !!opts->sock_fds) {
- error_setg(errp, "'inhibit=on' requires 'sock-fds' and vice versa");
+ inhibit = opts->has_inhibit && opts->inhibit;
+ if (inhibit && !opts->sock_fds && !opts->map_path) {
+ error_setg(errp, "'inhibit=on' requires 'sock-fds' or 'map-path'");
+ return -1;
+ }
+ if (!inhibit && (opts->sock_fds || opts->map_path)) {
+ error_setg(errp, "'sock-fds' and 'map-path' require 'inhibit=on'");
+ return -1;
+ }
+ if (opts->sock_fds && opts->map_path) {
+ error_setg(errp, "'sock-fds' and 'map-path' are mutually exclusive");
+ return -1;
+ }
+ if (!opts->map_path && opts->has_map_start_index) {
+ error_setg(errp, "'map-start-index' requires 'map-path'");
+ return -1;
+ }
+
+ map_start_index = opts->has_map_start_index ? opts->map_start_index : 0;
+ if (map_start_index < 0) {
+ error_setg(errp, "'map-start-index' cannot be negative (%d)",
+ map_start_index);
return -1;
}
@@ -481,14 +548,20 @@ int net_init_af_xdp(const Netdev *netdev,
pstrcpy(s->ifname, sizeof(s->ifname), opts->ifname);
s->ifindex = ifindex;
+ s->inhibit = inhibit;
+
+ s->map_path = g_strdup(opts->map_path);
+ s->map_start_index = map_start_index;
+ s->map_fd = -1;
if (af_xdp_umem_create(s, sock_fds ? sock_fds[i] : -1, &err) ||
- af_xdp_socket_create(s, opts, &err)) {
+ af_xdp_socket_create(s, opts, &err) ||
+ af_xdp_update_xsk_map(s, &err)) {
goto err;
}
}
- if (nc0) {
+ if (nc0 && !inhibit) {
s = DO_UPCAST(AFXDPState, nc, nc0);
if (bpf_xdp_query_id(s->ifindex, s->xdp_flags, &prog_id) || !prog_id) {
error_setg_errno(&err, errno,
diff --git a/qapi/net.json b/qapi/net.json
index 0f766041a3..1f40bf46bb 100644
--- a/qapi/net.json
+++ b/qapi/net.json
@@ -567,25 +567,34 @@
# (default: 0).
#
# @inhibit: Don't load a default XDP program, use one already loaded
-# to the interface (default: false). Requires @sock-fds.
+# to the interface (default: false). Requires @sock-fds or @map-path.
#
# @sock-fds: A colon (:) separated list of file descriptors for
# already open but not bound AF_XDP sockets in the queue order.
# One fd per queue. These descriptors should already be added
-# into XDP socket map for corresponding queues. Requires
-# @inhibit.
+# into XDP socket map for corresponding queues. @sock-fds and
+# @map-path are mutually exclusive. Requires @inhibit.
+#
+# @map-path: The path to a pinned xsk map to push file descriptors
+# for bound AF_XDP sockets into. @map-path and @sock-fds are
+# mutually exclusive. Requires @inhibit. (Since 10.1)
+#
+# @map-start-index: Use @map-path to insert xsk sockets starting from
+# this index number (default: 0). Requires @map-path. (Since 10.1)
#
# Since: 8.2
##
{ 'struct': 'NetdevAFXDPOptions',
'data': {
- 'ifname': 'str',
- '*mode': 'AFXDPMode',
- '*force-copy': 'bool',
- '*queues': 'int',
- '*start-queue': 'int',
- '*inhibit': 'bool',
- '*sock-fds': 'str' },
+ 'ifname': 'str',
+ '*mode': 'AFXDPMode',
+ '*force-copy': 'bool',
+ '*queues': 'int',
+ '*start-queue': 'int',
+ '*inhibit': 'bool',
+ '*sock-fds': 'str',
+ '*map-path': 'str',
+ '*map-start-index': 'int32' },
'if': 'CONFIG_AF_XDP' }
##
diff --git a/qemu-options.hx b/qemu-options.hx
index a3c066c678..bf19987cb0 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2929,6 +2929,7 @@ DEF("netdev", HAS_ARG, QEMU_OPTION_netdev,
#ifdef CONFIG_AF_XDP
"-netdev af-xdp,id=str,ifname=name[,mode=native|skb][,force-copy=on|off]\n"
" [,queues=n][,start-queue=m][,inhibit=on|off][,sock-fds=x:y:...:z]\n"
+ " [,map-path=/path/to/socket/map][,map-start-index=i]\n"
" attach to the existing network interface 'name' with AF_XDP socket\n"
" use 'mode=MODE' to specify an XDP program attach mode\n"
" use 'force-copy=on|off' to force XDP copy mode even if device supports zero-copy (default: off)\n"
@@ -2936,6 +2937,8 @@ DEF("netdev", HAS_ARG, QEMU_OPTION_netdev,
" with inhibit=on,\n"
" use 'sock-fds' to provide file descriptors for already open AF_XDP sockets\n"
" added to a socket map in XDP program. One socket per queue.\n"
+ " use 'map-path' to provide the socket map location to populate AF_XDP sockets with,\n"
+ " and use 'map-start-index' to specify the starting index for the map (default: 0) (Since 10.1)\n"
" use 'queues=n' to specify how many queues of a multiqueue interface should be used\n"
" use 'start-queue=m' to specify the first queue that should be used\n"
#endif
@@ -3759,7 +3762,7 @@ SRST
# launch QEMU instance
|qemu_system| linux.img -nic vde,sock=/tmp/myswitch
-``-netdev af-xdp,id=str,ifname=name[,mode=native|skb][,force-copy=on|off][,queues=n][,start-queue=m][,inhibit=on|off][,sock-fds=x:y:...:z]``
+``-netdev af-xdp,id=str,ifname=name[,mode=native|skb][,force-copy=on|off][,queues=n][,start-queue=m][,inhibit=on|off][,sock-fds=x:y:...:z][,map-path=/path/to/socket/map][,map-start-index=i]``
Configure AF_XDP backend to connect to a network interface 'name'
using AF_XDP socket. A specific program attach mode for a default
XDP program can be forced with 'mode', defaults to best-effort,
@@ -3799,7 +3802,8 @@ SRST
-netdev af-xdp,id=n1,ifname=eth0,queues=1,start-queue=1
XDP program can also be loaded externally. In this case 'inhibit' option
- should be set to 'on' and 'sock-fds' provided with file descriptors for
+ should be set to 'on'. Either 'sock-fds' or 'map-path' can be used with
+ 'inhibit' enabled. 'sock-fds' can be provided with file descriptors for
already open but not bound XDP sockets already added to a socket map for
corresponding queues. One socket per queue.
@@ -3808,6 +3812,21 @@ SRST
|qemu_system| linux.img -device virtio-net-pci,netdev=n1 \\
-netdev af-xdp,id=n1,ifname=eth0,queues=3,inhibit=on,sock-fds=15:16:17
+ For the 'inhibit' option set to 'on' used together with 'map-path' it is
+ expected that the XDP program with the socket map is already loaded on
+ the networking device and the map pinned into BPF file system. The path
+ to the pinned map is then passed to QEMU which then creates the file
+ descriptors and inserts them into the existing socket map.
+
+ .. parsed-literal::
+
+ |qemu_system| linux.img -device virtio-net-pci,netdev=n1 \\
+ -netdev af-xdp,id=n1,ifname=eth0,queues=2,inhibit=on,map-path=/sys/fs/bpf/xsks_map
+
+ Additionally, 'map-start-index' can be used to specify the start offset
+ for insertion into the socket map. The combination of 'map-path' and
+ 'sock-fds' together is not supported.
+
``-netdev vhost-user,chardev=id[,vhostforce=on|off][,queues=n]``
Establish a vhost-user netdev, backed by a chardev id. The chardev
should be a unix domain socket backed one. The vhost-user uses a
--
2.42.0
* Re: [PULL V2 00/16] Net patches
2025-07-15 4:35 [PULL V2 00/16] Net patches Jason Wang
` (15 preceding siblings ...)
2025-07-15 4:35 ` [PULL V2 16/16] net/af-xdp: Support pinned map path for AF_XDP sockets Jason Wang
@ 2025-07-16 12:39 ` Stefan Hajnoczi
16 siblings, 0 replies; 21+ messages in thread
From: Stefan Hajnoczi @ 2025-07-16 12:39 UTC (permalink / raw)
To: Jason Wang; +Cc: qemu-devel, Jason Wang
Applied, thanks.
Please update the changelog at https://wiki.qemu.org/ChangeLog/10.1 for any user-visible changes.
* Re: [PULL V2 12/16] net: Add passt network backend
2025-07-15 4:35 ` [PULL V2 12/16] net: Add passt network backend Jason Wang
@ 2025-07-17 9:28 ` Peter Maydell
2025-07-17 11:12 ` Laurent Vivier
0 siblings, 1 reply; 21+ messages in thread
From: Peter Maydell @ 2025-07-17 9:28 UTC (permalink / raw)
To: Jason Wang; +Cc: qemu-devel, Laurent Vivier
On Tue, 15 Jul 2025 at 05:42, Jason Wang <jasowang@redhat.com> wrote:
>
> From: Laurent Vivier <lvivier@redhat.com>
>
> This commit introduces support for passt as a new network backend.
> passt is an unprivileged, user-mode networking solution that provides
> connectivity for virtual machines by launching an external helper process.
>
> The implementation reuses the generic stream data handling logic. It
> launches the passt binary using GSubprocess, passing it a file
> descriptor from a socketpair() for communication. QEMU connects to
> the other end of the socket pair to establish the network data stream.
>
> The PID of the passt daemon is tracked via a temporary file to
> ensure it is terminated when QEMU exits.
Hi; Coverity points out some potential issues with this code:
> +static void net_passt_cleanup(NetClientState *nc)
> +{
> + NetPasstState *s = DO_UPCAST(NetPasstState, data.nc, nc);
> +
> + kill(s->pid, SIGTERM);
CID 1612369: we don't check the return value from kill().
> + g_remove(s->pidfile);
> + g_free(s->pidfile);
> + g_ptr_array_free(s->args, TRUE);
> +}
> +
> +static ssize_t net_passt_receive(NetClientState *nc, const uint8_t *buf,
> + size_t size)
> +{
> + NetStreamData *d = DO_UPCAST(NetStreamData, nc, nc);
> +
> + return net_stream_data_receive(d, buf, size);
> +}
> +
> +static gboolean net_passt_send(QIOChannel *ioc, GIOCondition condition,
> + gpointer data)
> +{
> + if (net_stream_data_send(ioc, condition, data) == G_SOURCE_REMOVE) {
> + NetPasstState *s = DO_UPCAST(NetPasstState, data, data);
> + Error *error;
CID 1612368: you forgot to initialize error to NULL.
> +
> + /* we need to restart passt */
> + kill(s->pid, SIGTERM);
Another kill() without checking for failure.
> + if (net_passt_stream_start(s, &error) == -1) {
> + error_report_err(error);
> + }
> +
> + return G_SOURCE_REMOVE;
> + }
> +
> + return G_SOURCE_CONTINUE;
> +}
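The uninitialized 'Error *error' flagged above matters because an Error **
out-parameter is expected to point at a NULL pointer before the callee sets
it. The following is a minimal sketch of that convention using a miniature
stand-in API; MiniError, mini_error_set(), mini_error_free() and mini_start()
are all hypothetical names and are not QEMU's error API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical miniature of an error object passed via Error **. */
typedef struct MiniError {
    char *msg;
} MiniError;

static void mini_error_set(MiniError **errp, const char *msg)
{
    MiniError *e;
    size_t n = strlen(msg) + 1;

    if (!errp) {
        return;                 /* caller does not want the error */
    }
    /* The slot must be clean; stack garbage here would be a bug. */
    assert(*errp == NULL);

    e = malloc(sizeof(*e));
    if (!e) {
        abort();
    }
    e->msg = malloc(n);
    if (!e->msg) {
        abort();
    }
    memcpy(e->msg, msg, n);
    *errp = e;
}

static void mini_error_free(MiniError *e)
{
    if (e) {
        free(e->msg);
        free(e);
    }
}

/* Stand-in for a start routine that may fail and report why. */
static int mini_start(int should_fail, MiniError **errp)
{
    if (should_fail) {
        mini_error_set(errp, "start failed");
        return -1;
    }
    return 0;
}
```

A caller following this convention declares 'MiniError *err = NULL;' before
passing '&err'; in the quoted code the analogous change would be
'Error *error = NULL;'.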
thanks
-- PMM
* Re: [PULL V2 13/16] net/passt: Implement vhost-user backend support
2025-07-15 4:35 ` [PULL V2 13/16] net/passt: Implement vhost-user backend support Jason Wang
@ 2025-07-17 9:32 ` Peter Maydell
0 siblings, 0 replies; 21+ messages in thread
From: Peter Maydell @ 2025-07-17 9:32 UTC (permalink / raw)
To: Jason Wang; +Cc: qemu-devel, Laurent Vivier
On Tue, 15 Jul 2025 at 05:37, Jason Wang <jasowang@redhat.com> wrote:
>
> From: Laurent Vivier <lvivier@redhat.com>
>
> This commit adds support for the vhost-user interface to the passt
> network backend, enabling high-performance, accelerated networking for
> guests using passt.
>
> The passt backend can now operate in a vhost-user mode, where it
> communicates with the guest's virtio-net device over a socket pair
> using the vhost-user protocol. This offloads the datapath from the
> main QEMU loop, significantly improving network performance.
>
> When the vhost-user=on option is used with -netdev passt, the new
> vhost initialization path is taken instead of the standard
> stream-based connection.
>
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
Another couple of coverity issues:
> +static int passt_vhost_user_start(NetPasstState *s, VhostUserState *be)
> +{
> + struct vhost_net *net = NULL;
> + VhostNetOptions options;
> +
> + options.backend_type = VHOST_BACKEND_TYPE_USER;
> + options.net_backend = &s->data.nc;
> + options.opaque = be;
> + options.busyloop_timeout = 0;
> + options.nvqs = 2;
> + options.feature_bits = user_feature_bits;
> + options.max_tx_queue_size = VIRTQUEUE_MAX_SIZE;
> + options.get_acked_features = passt_get_acked_features;
> + options.save_acked_features = passt_save_acked_features;
> + options.is_vhost_user = true;
> +
> + net = vhost_net_init(&options);
> + if (!net) {
> + error_report("failed to init passt vhost_net");
> + goto err;
> + }
> +
> + if (s->vhost_net) {
> + vhost_net_cleanup(s->vhost_net);
> + g_free(s->vhost_net);
> + }
> + s->vhost_net = net;
> +
> + return 0;
> +err:
> + if (net) {
There is no path of code execution which can get here with
net not being NULL, so this code in the if() is dead. CID 1612371.
> + vhost_net_cleanup(net);
> + g_free(net);
> + }
> + passt_vhost_user_stop(s);
> + return -1;
> +}
> +
> +static void passt_vhost_user_event(void *opaque, QEMUChrEvent event)
> +{
> + NetPasstState *s = opaque;
> + Error *err = NULL;
We declare err here...
> +
> + switch (event) {
> + case CHR_EVENT_OPENED:
> + if (passt_vhost_user_start(s, s->vhost_user) < 0) {
> + qemu_chr_fe_disconnect(&s->vhost_chr);
> + return;
> + }
> + s->vhost_watch = qemu_chr_fe_add_watch(&s->vhost_chr, G_IO_HUP,
> + passt_vhost_user_watch, s);
> + net_client_set_link(&(NetClientState *){ &s->data.nc }, 1, true);
> + s->started = true;
> + break;
> + case CHR_EVENT_CLOSED:
> + if (s->vhost_watch) {
> + AioContext *ctx = qemu_get_current_aio_context();
> +
> + g_source_remove(s->vhost_watch);
> + s->vhost_watch = 0;
> + qemu_chr_fe_set_handlers(&s->vhost_chr, NULL, NULL, NULL, NULL,
> + NULL, NULL, false);
> +
> + aio_bh_schedule_oneshot(ctx, chr_closed_bh, s);
> + }
> + break;
> + case CHR_EVENT_BREAK:
> + case CHR_EVENT_MUX_IN:
> + case CHR_EVENT_MUX_OUT:
> + /* Ignore */
> + break;
> + }
...but we never use it in any of the event handling code...
> +
> + if (err) {
> + error_report_err(err);
...so this if() block is dead code. CID 1612375.
> + }
> +}
thanks
-- PMM
* Re: [PULL V2 12/16] net: Add passt network backend
2025-07-17 9:28 ` Peter Maydell
@ 2025-07-17 11:12 ` Laurent Vivier
0 siblings, 0 replies; 21+ messages in thread
From: Laurent Vivier @ 2025-07-17 11:12 UTC (permalink / raw)
To: Peter Maydell, Jason Wang; +Cc: qemu-devel
On 17/07/2025 11:28, Peter Maydell wrote:
> On Tue, 15 Jul 2025 at 05:42, Jason Wang <jasowang@redhat.com> wrote:
>>
>> From: Laurent Vivier <lvivier@redhat.com>
>>
>> This commit introduces support for passt as a new network backend.
>> passt is an unprivileged, user-mode networking solution that provides
>> connectivity for virtual machines by launching an external helper process.
>>
>> The implementation reuses the generic stream data handling logic. It
>> launches the passt binary using GSubprocess, passing it a file
>> descriptor from a socketpair() for communication. QEMU connects to
>> the other end of the socket pair to establish the network data stream.
>>
>> The PID of the passt daemon is tracked via a temporary file to
>> ensure it is terminated when QEMU exits.
>
> Hi; Coverity points out some potential issues with this code:
>
>> +static void net_passt_cleanup(NetClientState *nc)
>> +{
>> + NetPasstState *s = DO_UPCAST(NetPasstState, data.nc, nc);
>> +
>> + kill(s->pid, SIGTERM);
>
> CID 1612369: we don't check the return value from kill().
Do we want to check it, or is "(void)kill()" enough to fix this?
Thanks,
Laurent
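For comparison with the "(void)kill()" option, checking the return value could
look roughly like the sketch below. It is only one possible shape, not a
proposed fix: terminate_helper() is a hypothetical name, and treating ESRCH as
success (the process has already exited) is an assumption about what the
cleanup path would want.

```c
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical helper: send SIGTERM to a child and report failures.
 * Returns 0 on success or if the process is already gone, and a
 * negative errno value otherwise.
 */
static int terminate_helper(pid_t pid)
{
    int err;

    if (pid <= 0) {
        /* pid 0 or negative would signal a whole process group. */
        return -EINVAL;
    }
    if (kill(pid, SIGTERM) == 0) {
        return 0;
    }
    err = errno;
    if (err == ESRCH) {
        return 0;               /* nothing left to terminate */
    }
    fprintf(stderr, "kill(%ld) failed: %s\n", (long)pid, strerror(err));
    return -err;
}
```

The extra 'pid <= 0' guard is a design choice of this sketch: it avoids
signalling a process group by accident if the stored PID was never set.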
Thread overview: 21+ messages
2025-07-15 4:35 [PULL V2 00/16] Net patches Jason Wang
2025-07-15 4:35 ` [PULL V2 01/16] net: fix buffer overflow in af_xdp_umem_create() Jason Wang
2025-07-15 4:35 ` [PULL V2 02/16] virtio-net: Add queues for RSS during migration Jason Wang
2025-07-15 4:35 ` [PULL V2 03/16] net: Refactor stream logic for reuse in '-net passt' Jason Wang
2025-07-15 4:35 ` [PULL V2 04/16] net: Define net_client_set_link() Jason Wang
2025-07-15 4:35 ` [PULL V2 05/16] vhost_net: Rename vhost_set_vring_enable() for clarity Jason Wang
2025-07-15 4:35 ` [PULL V2 06/16] net: Add get_vhost_net callback to NetClientInfo Jason Wang
2025-07-15 4:35 ` [PULL V2 07/16] net: Consolidate vhost feature bits into vhost_net structure Jason Wang
2025-07-15 4:35 ` [PULL V2 08/16] net: Add get_acked_features callback to VhostNetOptions Jason Wang
2025-07-15 4:35 ` [PULL V2 09/16] net: Add save_acked_features callback to vhost_net Jason Wang
2025-07-15 4:35 ` [PULL V2 10/16] net: Allow network backends to advertise max TX queue size Jason Wang
2025-07-15 4:35 ` [PULL V2 11/16] net: Add is_vhost_user flag to vhost_net struct Jason Wang
2025-07-15 4:35 ` [PULL V2 12/16] net: Add passt network backend Jason Wang
2025-07-17 9:28 ` Peter Maydell
2025-07-17 11:12 ` Laurent Vivier
2025-07-15 4:35 ` [PULL V2 13/16] net/passt: Implement vhost-user backend support Jason Wang
2025-07-17 9:32 ` Peter Maydell
2025-07-15 4:35 ` [PULL V2 14/16] net/af-xdp: Remove XDP program cleanup logic Jason Wang
2025-07-15 4:35 ` [PULL V2 15/16] net/af-xdp: Fix up cleanup path upon failure in queue creation Jason Wang
2025-07-15 4:35 ` [PULL V2 16/16] net/af-xdp: Support pinned map path for AF_XDP sockets Jason Wang
2025-07-16 12:39 ` [PULL V2 00/16] Net patches Stefan Hajnoczi