[PATCH RFC 00/16] virtio: introduce support for GSO over UDP tunnel

* [PATCH RFC 00/16] virtio: introduce support for GSO over UDP tunnel
@ 2025-05-21 11:33 Paolo Abeni
  2025-05-21 11:33 ` [PATCH RFC 01/16] linux-headers: Update to Linux v6.15-rc net-next Paolo Abeni
                   ` (17 more replies)
  0 siblings, 18 replies; 42+ messages in thread
From: Paolo Abeni @ 2025-05-21 11:33 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost,
	Dmitry Fleytman, Akihiko Odaki, Jason Wang, Sriram Yagnaraman,
	Michael S. Tsirkin, Stefano Garzarella, Peter Xu, Fabiano Rosas,
	Cornelia Huck, Luigi Rizzo, Giuseppe Lettieri, Vincenzo Maffione,
	Eric Blake, Markus Armbruster

Some virtualized deployments use UDP tunnels pervasively and are
negatively impacted by the lack of GSO support for this kind of traffic
in the virtual NIC driver.

The virtio_net specification recently introduced support for GSO over
UDP tunnel; this series updates the virtio implementation to support
that feature.

One of the reasons for the RFC tag is that the kernel-side
implementation has just been shared upstream and is not merged yet, but
there are also other relevant reasons, see below.

Currently, the kernel virtio support limits the feature space to 64 bits,
while the virtio specification allows for a larger number of features.
Specifically, the GSO-over-UDP-tunnel-related virtio features use bits
65-68 (see the virtio_net.h update in patch 1); the larger part of this
series (patches 2-11) actually deals with the extended feature space.

I tried to minimize the otherwise very large code churn by limiting the
extended features support to architectures with native 128-bit integer
support and by introducing the extended feature space only in the
virtio/vhost core and in the relevant device driver.
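
As an illustration only (this is not code from the series), a feature
space wider than 64 bits could be handled roughly as sketched below on a
compiler with native 128-bit integers. The type name features128_t and
all helper names are hypothetical; only the four bit numbers are taken
from the virtio_net.h update in patch 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#ifndef __SIZEOF_INT128__
#error "this sketch assumes a compiler with native __int128 support"
#endif

/* Hypothetical type name: a feature bitmap wide enough for bits 64..127. */
typedef unsigned __int128 features128_t;

/* Bit numbers as defined by the virtio_net.h hunk in patch 1. */
#define VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO      65
#define VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM 66
#define VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO       67
#define VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO_CSUM  68

/* Build a mask with a single feature bit set (hypothetical helper). */
static inline features128_t feature_bit(unsigned int bit)
{
    assert(bit < 128);
    return (features128_t)1 << bit;
}

/* Test whether a feature bit is set in the bitmap (hypothetical helper). */
static inline bool has_feature(features128_t features, unsigned int bit)
{
    return (features & feature_bit(bit)) != 0;
}

/*
 * Split the bitmap into two 64-bit halves, e.g. to hand them to an
 * interface that only moves 64 bits at a time (hypothetical helpers).
 */
static inline uint64_t features_lo(features128_t features)
{
    return (uint64_t)features;
}

static inline uint64_t features_hi(features128_t features)
{
    return (uint64_t)(features >> 64);
}
```

With such a representation, a plain uint64_t feature word would simply
lose any bit above 63, which is why the tunnel features cannot be
expressed without widening the type.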

The actual offload implementation is in patches 12-16 and boils down to
propagating the new offload to the tun devices and the vhost backend.

Tested with basic stream transfers across all the possible permutations
of host kernel/qemu/guest kernel with/without GSO over UDP tunnel
support, and across snapshot creation and restore.

Notably, this series does not (yet) include any additional tests.
Guidance on that matter would be much appreciated, and any feedback on
the feature-extension strategy would be more than welcome!

Paolo Abeni (16):
  linux-headers: Update to Linux v6.15-rc net-next
  migration: introduce support for 128 bit int state.
  virtio: introduce extended features type
  virtio: serialize extended features state
  qmp: update virtio features map to support extended features
  virtio: add support for negotiating extended features.
  virtio-pci: implement support for extended features.
  vhost: add support for negotiating extended features.
  vhost-backend: implement extended features support.
  vhost-net: implement extended features support.
  qdev-properties: add property for extended virtio features
  virtio-net: implement extended features support.
  net: implement tunnel probing
  net: bundle all offloads in a single struct
  net: implement tnl feature offloading
  net: make vhost-net aware of GSO over UDP tunnel hdr layout

 hw/core/qdev-properties.c                     |  46 +++++
 hw/net/e1000e_core.c                          |   5 +-
 hw/net/igb_core.c                             |   5 +-
 hw/net/vhost_net-stub.c                       |   7 +-
 hw/net/vhost_net.c                            |  35 ++--
 hw/net/virtio-net.c                           | 195 +++++++++++++-----
 hw/net/vmxnet3.c                              |  13 +-
 hw/virtio/vhost-backend.c                     |  59 +++++-
 hw/virtio/vhost.c                             |  58 ++++--
 hw/virtio/virtio-bus.c                        |  15 +-
 hw/virtio/virtio-hmp-cmds.c                   |   3 +-
 hw/virtio/virtio-pci.c                        |  19 +-
 hw/virtio/virtio-qmp.c                        |  28 ++-
 hw/virtio/virtio-qmp.h                        |   3 +-
 hw/virtio/virtio.c                            | 103 ++++++++-
 include/hw/qdev-properties.h                  |  13 ++
 include/hw/virtio/vhost-backend.h             |  10 +
 include/hw/virtio/vhost.h                     |  13 +-
 include/hw/virtio/virtio-features.h           |  90 ++++++++
 include/hw/virtio/virtio-net.h                |   2 +-
 include/hw/virtio/virtio-pci.h                |   2 +-
 include/hw/virtio/virtio.h                    |  17 +-
 include/migration/qemu-file-types.h           |  15 ++
 include/migration/vmstate.h                   |  11 +
 include/net/net.h                             |  20 +-
 include/net/vhost_net.h                       |   8 +-
 include/standard-headers/asm-x86/setup_data.h |   4 +-
 include/standard-headers/drm/drm_fourcc.h     |  41 ++++
 include/standard-headers/linux/const.h        |   2 +-
 include/standard-headers/linux/ethtool.h      | 156 ++++++++------
 include/standard-headers/linux/fuse.h         |  12 +-
 include/standard-headers/linux/pci_regs.h     |  13 +-
 include/standard-headers/linux/virtio_net.h   |  46 +++++
 include/standard-headers/linux/virtio_pci.h   |   1 +
 include/standard-headers/linux/virtio_snd.h   |   2 +-
 linux-headers/asm-arm64/kvm.h                 |  11 +
 linux-headers/asm-arm64/unistd_64.h           |   1 +
 linux-headers/asm-generic/mman-common.h       |   1 +
 linux-headers/asm-generic/unistd.h            |   4 +-
 linux-headers/asm-loongarch/unistd_64.h       |   1 +
 linux-headers/asm-mips/unistd_n32.h           |   1 +
 linux-headers/asm-mips/unistd_n64.h           |   1 +
 linux-headers/asm-mips/unistd_o32.h           |   1 +
 linux-headers/asm-powerpc/unistd_32.h         |   1 +
 linux-headers/asm-powerpc/unistd_64.h         |   1 +
 linux-headers/asm-riscv/kvm.h                 |   2 +
 linux-headers/asm-riscv/unistd_32.h           |   1 +
 linux-headers/asm-riscv/unistd_64.h           |   1 +
 linux-headers/asm-s390/unistd_32.h            |   1 +
 linux-headers/asm-s390/unistd_64.h            |   1 +
 linux-headers/asm-x86/kvm.h                   |   3 +
 linux-headers/asm-x86/unistd_32.h             |   1 +
 linux-headers/asm-x86/unistd_64.h             |   1 +
 linux-headers/asm-x86/unistd_x32.h            |   1 +
 linux-headers/linux/bits.h                    |   8 +-
 linux-headers/linux/const.h                   |   2 +-
 linux-headers/linux/iommufd.h                 | 129 +++++++++++-
 linux-headers/linux/kvm.h                     |   1 +
 linux-headers/linux/psp-sev.h                 |  21 +-
 linux-headers/linux/stddef.h                  |   2 +
 linux-headers/linux/vfio.h                    |  30 ++-
 linux-headers/linux/vhost.h                   |  12 +-
 migration/qemu-file.c                         |  16 ++
 migration/vmstate-types.c                     |  25 +++
 net/net.c                                     |  21 +-
 net/netmap.c                                  |   3 +-
 net/tap-bsd.c                                 |   8 +-
 net/tap-linux.c                               |  46 ++++-
 net/tap-solaris.c                             |   9 +-
 net/tap-stub.c                                |   8 +-
 net/tap.c                                     |  19 +-
 net/tap_int.h                                 |   5 +-
 qapi/virtio.json                              |   8 +-
 73 files changed, 1209 insertions(+), 271 deletions(-)
 create mode 100644 include/hw/virtio/virtio-features.h

-- 
2.49.0


* [PATCH RFC 01/16] linux-headers: Update to Linux v6.15-rc net-next
  2025-05-21 11:33 [PATCH RFC 00/16] virtio: introduce support for GSO over UDP tunnel Paolo Abeni
@ 2025-05-21 11:33 ` Paolo Abeni
  2025-05-23  9:50   ` Akihiko Odaki
  2025-05-21 11:33 ` [PATCH RFC 02/16] migration: introduce support for 128 bit int state Paolo Abeni
                   ` (16 subsequent siblings)
  17 siblings, 1 reply; 42+ messages in thread
From: Paolo Abeni @ 2025-05-21 11:33 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost,
	Dmitry Fleytman, Akihiko Odaki, Jason Wang, Sriram Yagnaraman,
	Michael S. Tsirkin, Stefano Garzarella, Peter Xu, Fabiano Rosas,
	Cornelia Huck, Luigi Rizzo, Giuseppe Lettieri, Vincenzo Maffione,
	Eric Blake, Markus Armbruster

Update the headers to include the virtio GSO-over-UDP-tunnel features.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
The relevant bits are not yet merged upstream; I will update this
patch after the merge.
---
 include/standard-headers/asm-x86/setup_data.h |   4 +-
 include/standard-headers/drm/drm_fourcc.h     |  41 +++++
 include/standard-headers/linux/const.h        |   2 +-
 include/standard-headers/linux/ethtool.h      | 156 ++++++++++--------
 include/standard-headers/linux/fuse.h         |  12 +-
 include/standard-headers/linux/pci_regs.h     |  13 +-
 include/standard-headers/linux/virtio_net.h   |  46 ++++++
 include/standard-headers/linux/virtio_pci.h   |   1 +
 include/standard-headers/linux/virtio_snd.h   |   2 +-
 linux-headers/asm-arm64/kvm.h                 |  11 ++
 linux-headers/asm-arm64/unistd_64.h           |   1 +
 linux-headers/asm-generic/mman-common.h       |   1 +
 linux-headers/asm-generic/unistd.h            |   4 +-
 linux-headers/asm-loongarch/unistd_64.h       |   1 +
 linux-headers/asm-mips/unistd_n32.h           |   1 +
 linux-headers/asm-mips/unistd_n64.h           |   1 +
 linux-headers/asm-mips/unistd_o32.h           |   1 +
 linux-headers/asm-powerpc/unistd_32.h         |   1 +
 linux-headers/asm-powerpc/unistd_64.h         |   1 +
 linux-headers/asm-riscv/kvm.h                 |   2 +
 linux-headers/asm-riscv/unistd_32.h           |   1 +
 linux-headers/asm-riscv/unistd_64.h           |   1 +
 linux-headers/asm-s390/unistd_32.h            |   1 +
 linux-headers/asm-s390/unistd_64.h            |   1 +
 linux-headers/asm-x86/kvm.h                   |   3 +
 linux-headers/asm-x86/unistd_32.h             |   1 +
 linux-headers/asm-x86/unistd_64.h             |   1 +
 linux-headers/asm-x86/unistd_x32.h            |   1 +
 linux-headers/linux/bits.h                    |   8 +-
 linux-headers/linux/const.h                   |   2 +-
 linux-headers/linux/iommufd.h                 | 129 ++++++++++++++-
 linux-headers/linux/kvm.h                     |   1 +
 linux-headers/linux/psp-sev.h                 |  21 ++-
 linux-headers/linux/stddef.h                  |   2 +
 linux-headers/linux/vfio.h                    |  30 ++--
 linux-headers/linux/vhost.h                   |  12 +-
 36 files changed, 414 insertions(+), 103 deletions(-)

diff --git a/include/standard-headers/asm-x86/setup_data.h b/include/standard-headers/asm-x86/setup_data.h
index 09355f54c5..a483d72f42 100644
--- a/include/standard-headers/asm-x86/setup_data.h
+++ b/include/standard-headers/asm-x86/setup_data.h
@@ -18,7 +18,7 @@
 #define SETUP_INDIRECT			(1<<31)
 #define SETUP_TYPE_MAX			(SETUP_ENUM_MAX | SETUP_INDIRECT)
 
-#ifndef __ASSEMBLY__
+#ifndef __ASSEMBLER__
 
 #include "standard-headers/linux/types.h"
 
@@ -78,6 +78,6 @@ struct ima_setup_data {
 	uint64_t size;
 } QEMU_PACKED;
 
-#endif /* __ASSEMBLY__ */
+#endif /* __ASSEMBLER__ */
 
 #endif /* _ASM_X86_SETUP_DATA_H */
diff --git a/include/standard-headers/drm/drm_fourcc.h b/include/standard-headers/drm/drm_fourcc.h
index 708647776f..a8b759dcbc 100644
--- a/include/standard-headers/drm/drm_fourcc.h
+++ b/include/standard-headers/drm/drm_fourcc.h
@@ -420,6 +420,7 @@ extern "C" {
 #define DRM_FORMAT_MOD_VENDOR_ARM     0x08
 #define DRM_FORMAT_MOD_VENDOR_ALLWINNER 0x09
 #define DRM_FORMAT_MOD_VENDOR_AMLOGIC 0x0a
+#define DRM_FORMAT_MOD_VENDOR_MTK     0x0b
 
 /* add more to the end as needed */
 
@@ -1452,6 +1453,46 @@ drm_fourcc_canonicalize_nvidia_format_mod(uint64_t modifier)
  */
 #define AMLOGIC_FBC_OPTION_MEM_SAVING		(1ULL << 0)
 
+/* MediaTek modifiers
+ * Bits  Parameter                Notes
+ * ----- ------------------------ ---------------------------------------------
+ *   7: 0 TILE LAYOUT              Values are MTK_FMT_MOD_TILE_*
+ *  15: 8 COMPRESSION              Values are MTK_FMT_MOD_COMPRESS_*
+ *  23:16 10 BIT LAYOUT            Values are MTK_FMT_MOD_10BIT_LAYOUT_*
+ *
+ */
+
+#define DRM_FORMAT_MOD_MTK(__flags)		fourcc_mod_code(MTK, __flags)
+
+/*
+ * MediaTek Tiled Modifier
+ * The lowest 8 bits of the modifier is used to specify the tiling
+ * layout. Only the 16L_32S tiling is used for now, but we define an
+ * "untiled" version and leave room for future expansion.
+ */
+#define MTK_FMT_MOD_TILE_MASK     0xf
+#define MTK_FMT_MOD_TILE_NONE     0x0
+#define MTK_FMT_MOD_TILE_16L32S   0x1
+
+/*
+ * Bits 8-15 specify compression options
+ */
+#define MTK_FMT_MOD_COMPRESS_MASK (0xf << 8)
+#define MTK_FMT_MOD_COMPRESS_NONE (0x0 << 8)
+#define MTK_FMT_MOD_COMPRESS_V1   (0x1 << 8)
+
+/*
+ * Bits 16-23 specify how the bits of 10 bit formats are
+ * stored out in memory
+ */
+#define MTK_FMT_MOD_10BIT_LAYOUT_MASK      (0xf << 16)
+#define MTK_FMT_MOD_10BIT_LAYOUT_PACKED    (0x0 << 16)
+#define MTK_FMT_MOD_10BIT_LAYOUT_LSBTILED  (0x1 << 16)
+#define MTK_FMT_MOD_10BIT_LAYOUT_LSBRASTER (0x2 << 16)
+
+/* alias for the most common tiling format */
+#define DRM_FORMAT_MOD_MTK_16L_32S_TILE  DRM_FORMAT_MOD_MTK(MTK_FMT_MOD_TILE_16L32S)
+
 /*
  * AMD modifiers
  *
diff --git a/include/standard-headers/linux/const.h b/include/standard-headers/linux/const.h
index 2122610de7..95ede23342 100644
--- a/include/standard-headers/linux/const.h
+++ b/include/standard-headers/linux/const.h
@@ -33,7 +33,7 @@
  * Missing __asm__ support
  *
  * __BIT128() would not work in the __asm__ code, as it shifts an
- * 'unsigned __init128' data type as direct representation of
+ * 'unsigned __int128' data type as direct representation of
  * 128 bit constants is not supported in the gcc compiler, as
  * they get silently truncated.
  *
diff --git a/include/standard-headers/linux/ethtool.h b/include/standard-headers/linux/ethtool.h
index e83382531c..cef0d207a6 100644
--- a/include/standard-headers/linux/ethtool.h
+++ b/include/standard-headers/linux/ethtool.h
@@ -2059,6 +2059,24 @@ enum ethtool_link_mode_bit_indices {
 	ETHTOOL_LINK_MODE_10baseT1S_Half_BIT		 = 100,
 	ETHTOOL_LINK_MODE_10baseT1S_P2MP_Half_BIT	 = 101,
 	ETHTOOL_LINK_MODE_10baseT1BRR_Full_BIT		 = 102,
+	ETHTOOL_LINK_MODE_200000baseCR_Full_BIT		 = 103,
+	ETHTOOL_LINK_MODE_200000baseKR_Full_BIT		 = 104,
+	ETHTOOL_LINK_MODE_200000baseDR_Full_BIT		 = 105,
+	ETHTOOL_LINK_MODE_200000baseDR_2_Full_BIT	 = 106,
+	ETHTOOL_LINK_MODE_200000baseSR_Full_BIT		 = 107,
+	ETHTOOL_LINK_MODE_200000baseVR_Full_BIT		 = 108,
+	ETHTOOL_LINK_MODE_400000baseCR2_Full_BIT	 = 109,
+	ETHTOOL_LINK_MODE_400000baseKR2_Full_BIT	 = 110,
+	ETHTOOL_LINK_MODE_400000baseDR2_Full_BIT	 = 111,
+	ETHTOOL_LINK_MODE_400000baseDR2_2_Full_BIT	 = 112,
+	ETHTOOL_LINK_MODE_400000baseSR2_Full_BIT	 = 113,
+	ETHTOOL_LINK_MODE_400000baseVR2_Full_BIT	 = 114,
+	ETHTOOL_LINK_MODE_800000baseCR4_Full_BIT	 = 115,
+	ETHTOOL_LINK_MODE_800000baseKR4_Full_BIT	 = 116,
+	ETHTOOL_LINK_MODE_800000baseDR4_Full_BIT	 = 117,
+	ETHTOOL_LINK_MODE_800000baseDR4_2_Full_BIT	 = 118,
+	ETHTOOL_LINK_MODE_800000baseSR4_Full_BIT	 = 119,
+	ETHTOOL_LINK_MODE_800000baseVR4_Full_BIT	 = 120,
 
 	/* must be last entry */
 	__ETHTOOL_LINK_MODE_MASK_NBITS
@@ -2271,73 +2289,81 @@ static inline int ethtool_validate_duplex(uint8_t duplex)
  * be exploited to reduce the RSS queue spread.
  */
 #define	RXH_XFRM_SYM_XOR	(1 << 0)
+/* Similar to SYM_XOR, except that one copy of the XOR'ed fields is replaced by
+ * an OR of the same fields
+ */
+#define	RXH_XFRM_SYM_OR_XOR	(1 << 1)
 #define	RXH_XFRM_NO_CHANGE	0xff
 
-/* L2-L4 network traffic flow types */
-#define	TCP_V4_FLOW	0x01	/* hash or spec (tcp_ip4_spec) */
-#define	UDP_V4_FLOW	0x02	/* hash or spec (udp_ip4_spec) */
-#define	SCTP_V4_FLOW	0x03	/* hash or spec (sctp_ip4_spec) */
-#define	AH_ESP_V4_FLOW	0x04	/* hash only */
-#define	TCP_V6_FLOW	0x05	/* hash or spec (tcp_ip6_spec; nfc only) */
-#define	UDP_V6_FLOW	0x06	/* hash or spec (udp_ip6_spec; nfc only) */
-#define	SCTP_V6_FLOW	0x07	/* hash or spec (sctp_ip6_spec; nfc only) */
-#define	AH_ESP_V6_FLOW	0x08	/* hash only */
-#define	AH_V4_FLOW	0x09	/* hash or spec (ah_ip4_spec) */
-#define	ESP_V4_FLOW	0x0a	/* hash or spec (esp_ip4_spec) */
-#define	AH_V6_FLOW	0x0b	/* hash or spec (ah_ip6_spec; nfc only) */
-#define	ESP_V6_FLOW	0x0c	/* hash or spec (esp_ip6_spec; nfc only) */
-#define	IPV4_USER_FLOW	0x0d	/* spec only (usr_ip4_spec) */
-#define	IP_USER_FLOW	IPV4_USER_FLOW
-#define	IPV6_USER_FLOW	0x0e	/* spec only (usr_ip6_spec; nfc only) */
-#define	IPV4_FLOW	0x10	/* hash only */
-#define	IPV6_FLOW	0x11	/* hash only */
-#define	ETHER_FLOW	0x12	/* spec only (ether_spec) */
-
-/* Used for GTP-U IPv4 and IPv6.
- * The format of GTP packets only includes
- * elements such as TEID and GTP version.
- * It is primarily intended for data communication of the UE.
- */
-#define GTPU_V4_FLOW 0x13	/* hash only */
-#define GTPU_V6_FLOW 0x14	/* hash only */
-
-/* Use for GTP-C IPv4 and v6.
- * The format of these GTP packets does not include TEID.
- * Primarily expected to be used for communication
- * to create sessions for UE data communication,
- * commonly referred to as CSR (Create Session Request).
- */
-#define GTPC_V4_FLOW 0x15	/* hash only */
-#define GTPC_V6_FLOW 0x16	/* hash only */
-
-/* Use for GTP-C IPv4 and v6.
- * Unlike GTPC_V4_FLOW, the format of these GTP packets includes TEID.
- * After session creation, it becomes this packet.
- * This is mainly used for requests to realize UE handover.
- */
-#define GTPC_TEID_V4_FLOW 0x17	/* hash only */
-#define GTPC_TEID_V6_FLOW 0x18	/* hash only */
-
-/* Use for GTP-U and extended headers for the PSC (PDU Session Container).
- * The format of these GTP packets includes TEID and QFI.
- * In 5G communication using UPF (User Plane Function),
- * data communication with this extended header is performed.
- */
-#define GTPU_EH_V4_FLOW 0x19	/* hash only */
-#define GTPU_EH_V6_FLOW 0x1a	/* hash only */
-
-/* Use for GTP-U IPv4 and v6 PSC (PDU Session Container) extended headers.
- * This differs from GTPU_EH_V(4|6)_FLOW in that it is distinguished by
- * UL/DL included in the PSC.
- * There are differences in the data included based on Downlink/Uplink,
- * and can be used to distinguish packets.
- * The functions described so far are useful when you want to
- * handle communication from the mobile network in UPF, PGW, etc.
- */
-#define GTPU_UL_V4_FLOW 0x1b	/* hash only */
-#define GTPU_UL_V6_FLOW 0x1c	/* hash only */
-#define GTPU_DL_V4_FLOW 0x1d	/* hash only */
-#define GTPU_DL_V6_FLOW 0x1e	/* hash only */
+enum {
+	/* L2-L4 network traffic flow types */
+	TCP_V4_FLOW	= 0x01,	/* hash or spec (tcp_ip4_spec) */
+	UDP_V4_FLOW	= 0x02,	/* hash or spec (udp_ip4_spec) */
+	SCTP_V4_FLOW	= 0x03,	/* hash or spec (sctp_ip4_spec) */
+	AH_ESP_V4_FLOW	= 0x04,	/* hash only */
+	TCP_V6_FLOW	= 0x05,	/* hash or spec (tcp_ip6_spec; nfc only) */
+	UDP_V6_FLOW	= 0x06,	/* hash or spec (udp_ip6_spec; nfc only) */
+	SCTP_V6_FLOW	= 0x07,	/* hash or spec (sctp_ip6_spec; nfc only) */
+	AH_ESP_V6_FLOW	= 0x08,	/* hash only */
+	AH_V4_FLOW	= 0x09,	/* hash or spec (ah_ip4_spec) */
+	ESP_V4_FLOW	= 0x0a,	/* hash or spec (esp_ip4_spec) */
+	AH_V6_FLOW	= 0x0b,	/* hash or spec (ah_ip6_spec; nfc only) */
+	ESP_V6_FLOW	= 0x0c,	/* hash or spec (esp_ip6_spec; nfc only) */
+	IPV4_USER_FLOW	= 0x0d,	/* spec only (usr_ip4_spec) */
+	IP_USER_FLOW	= IPV4_USER_FLOW,
+	IPV6_USER_FLOW	= 0x0e, /* spec only (usr_ip6_spec; nfc only) */
+	IPV4_FLOW	= 0x10, /* hash only */
+	IPV6_FLOW	= 0x11, /* hash only */
+	ETHER_FLOW	= 0x12, /* spec only (ether_spec) */
+
+	/* Used for GTP-U IPv4 and IPv6.
+	 * The format of GTP packets only includes
+	 * elements such as TEID and GTP version.
+	 * It is primarily intended for data communication of the UE.
+	 */
+	GTPU_V4_FLOW	= 0x13,	/* hash only */
+	GTPU_V6_FLOW	= 0x14,	/* hash only */
+
+	/* Use for GTP-C IPv4 and v6.
+	 * The format of these GTP packets does not include TEID.
+	 * Primarily expected to be used for communication
+	 * to create sessions for UE data communication,
+	 * commonly referred to as CSR (Create Session Request).
+	 */
+	GTPC_V4_FLOW	= 0x15,	/* hash only */
+	GTPC_V6_FLOW	= 0x16,	/* hash only */
+
+	/* Use for GTP-C IPv4 and v6.
+	 * Unlike GTPC_V4_FLOW, the format of these GTP packets includes TEID.
+	 * After session creation, it becomes this packet.
+	 * This is mainly used for requests to realize UE handover.
+	 */
+	GTPC_TEID_V4_FLOW	= 0x17,	/* hash only */
+	GTPC_TEID_V6_FLOW	= 0x18,	/* hash only */
+
+	/* Use for GTP-U and extended headers for the PSC (PDU Session Container).
+	 * The format of these GTP packets includes TEID and QFI.
+	 * In 5G communication using UPF (User Plane Function),
+	 * data communication with this extended header is performed.
+	 */
+	GTPU_EH_V4_FLOW	= 0x19,	/* hash only */
+	GTPU_EH_V6_FLOW	= 0x1a,	/* hash only */
+
+	/* Use for GTP-U IPv4 and v6 PSC (PDU Session Container) extended headers.
+	 * This differs from GTPU_EH_V(4|6)_FLOW in that it is distinguished by
+	 * UL/DL included in the PSC.
+	 * There are differences in the data included based on Downlink/Uplink,
+	 * and can be used to distinguish packets.
+	 * The functions described so far are useful when you want to
+	 * handle communication from the mobile network in UPF, PGW, etc.
+	 */
+	GTPU_UL_V4_FLOW	= 0x1b,	/* hash only */
+	GTPU_UL_V6_FLOW	= 0x1c,	/* hash only */
+	GTPU_DL_V4_FLOW	= 0x1d,	/* hash only */
+	GTPU_DL_V6_FLOW	= 0x1e,	/* hash only */
+
+	__FLOW_TYPE_COUNT,
+};
 
 /* Flag to enable additional fields in struct ethtool_rx_flow_spec */
 #define	FLOW_EXT	0x80000000
diff --git a/include/standard-headers/linux/fuse.h b/include/standard-headers/linux/fuse.h
index d303effb2a..a2b5815d89 100644
--- a/include/standard-headers/linux/fuse.h
+++ b/include/standard-headers/linux/fuse.h
@@ -229,6 +229,9 @@
  *    - FUSE_URING_IN_OUT_HEADER_SZ
  *    - FUSE_URING_OP_IN_OUT_SZ
  *    - enum fuse_uring_cmd
+ *
+ *  7.43
+ *  - add FUSE_REQUEST_TIMEOUT
  */
 
 #ifndef _LINUX_FUSE_H
@@ -260,7 +263,7 @@
 #define FUSE_KERNEL_VERSION 7
 
 /** Minor version number of this interface */
-#define FUSE_KERNEL_MINOR_VERSION 42
+#define FUSE_KERNEL_MINOR_VERSION 43
 
 /** The node ID of the root inode */
 #define FUSE_ROOT_ID 1
@@ -431,6 +434,8 @@ struct fuse_file_lock {
  *		    of the request ID indicates resend requests
  * FUSE_ALLOW_IDMAP: allow creation of idmapped mounts
  * FUSE_OVER_IO_URING: Indicate that client supports io-uring
+ * FUSE_REQUEST_TIMEOUT: kernel supports timing out requests.
+ *			 init_out.request_timeout contains the timeout (in secs)
  */
 #define FUSE_ASYNC_READ		(1 << 0)
 #define FUSE_POSIX_LOCKS	(1 << 1)
@@ -473,11 +478,11 @@ struct fuse_file_lock {
 #define FUSE_PASSTHROUGH	(1ULL << 37)
 #define FUSE_NO_EXPORT_SUPPORT	(1ULL << 38)
 #define FUSE_HAS_RESEND		(1ULL << 39)
-
 /* Obsolete alias for FUSE_DIRECT_IO_ALLOW_MMAP */
 #define FUSE_DIRECT_IO_RELAX	FUSE_DIRECT_IO_ALLOW_MMAP
 #define FUSE_ALLOW_IDMAP	(1ULL << 40)
 #define FUSE_OVER_IO_URING	(1ULL << 41)
+#define FUSE_REQUEST_TIMEOUT	(1ULL << 42)
 
 /**
  * CUSE INIT request/reply flags
@@ -905,7 +910,8 @@ struct fuse_init_out {
 	uint16_t	map_alignment;
 	uint32_t	flags2;
 	uint32_t	max_stack_depth;
-	uint32_t	unused[6];
+	uint16_t	request_timeout;
+	uint16_t	unused[11];
 };
 
 #define CUSE_INIT_INFO_MAX 4096
diff --git a/include/standard-headers/linux/pci_regs.h b/include/standard-headers/linux/pci_regs.h
index 3445c4970e..ba326710f9 100644
--- a/include/standard-headers/linux/pci_regs.h
+++ b/include/standard-headers/linux/pci_regs.h
@@ -486,6 +486,7 @@
 #define   PCI_EXP_TYPE_RC_EC	   0xa	/* Root Complex Event Collector */
 #define  PCI_EXP_FLAGS_SLOT	0x0100	/* Slot implemented */
 #define  PCI_EXP_FLAGS_IRQ	0x3e00	/* Interrupt message number */
+#define  PCI_EXP_FLAGS_FLIT	0x8000	/* Flit Mode Supported */
 #define PCI_EXP_DEVCAP		0x04	/* Device capabilities */
 #define  PCI_EXP_DEVCAP_PAYLOAD	0x00000007 /* Max_Payload_Size */
 #define  PCI_EXP_DEVCAP_PHANTOM	0x00000018 /* Phantom functions */
@@ -795,6 +796,8 @@
 #define  PCI_ERR_CAP_ECRC_CHKC		0x00000080 /* ECRC Check Capable */
 #define  PCI_ERR_CAP_ECRC_CHKE		0x00000100 /* ECRC Check Enable */
 #define  PCI_ERR_CAP_PREFIX_LOG_PRESENT	0x00000800 /* TLP Prefix Log Present */
+#define  PCI_ERR_CAP_TLP_LOG_FLIT	0x00040000 /* TLP was logged in Flit Mode */
+#define  PCI_ERR_CAP_TLP_LOG_SIZE	0x00f80000 /* Logged TLP Size (only in Flit mode) */
 #define PCI_ERR_HEADER_LOG	0x1c	/* Header Log Register (16 bytes) */
 #define PCI_ERR_ROOT_COMMAND	0x2c	/* Root Error Command */
 #define  PCI_ERR_ROOT_CMD_COR_EN	0x00000001 /* Correctable Err Reporting Enable */
@@ -1013,7 +1016,7 @@
 
 /* Resizable BARs */
 #define PCI_REBAR_CAP		4	/* capability register */
-#define  PCI_REBAR_CAP_SIZES		0x00FFFFF0  /* supported BAR sizes */
+#define  PCI_REBAR_CAP_SIZES		0xFFFFFFF0  /* supported BAR sizes */
 #define PCI_REBAR_CTRL		8	/* control register */
 #define  PCI_REBAR_CTRL_BAR_IDX		0x00000007  /* BAR index */
 #define  PCI_REBAR_CTRL_NBAR_MASK	0x000000E0  /* # of resizable BARs */
@@ -1061,8 +1064,9 @@
 #define  PCI_EXP_DPC_CAP_RP_EXT		0x0020	/* Root Port Extensions */
 #define  PCI_EXP_DPC_CAP_POISONED_TLP	0x0040	/* Poisoned TLP Egress Blocking Supported */
 #define  PCI_EXP_DPC_CAP_SW_TRIGGER	0x0080	/* Software Triggering Supported */
-#define  PCI_EXP_DPC_RP_PIO_LOG_SIZE	0x0F00	/* RP PIO Log Size */
+#define  PCI_EXP_DPC_RP_PIO_LOG_SIZE	0x0F00	/* RP PIO Log Size [3:0] */
 #define  PCI_EXP_DPC_CAP_DL_ACTIVE	0x1000	/* ERR_COR signal on DL_Active supported */
+#define  PCI_EXP_DPC_RP_PIO_LOG_SIZE4	0x2000	/* RP PIO Log Size [4] */
 
 #define PCI_EXP_DPC_CTL			0x06	/* DPC control */
 #define  PCI_EXP_DPC_CTL_EN_FATAL	0x0001	/* Enable trigger on ERR_FATAL message */
@@ -1205,9 +1209,12 @@
 #define PCI_DOE_DATA_OBJECT_DISC_REQ_3_INDEX		0x000000ff
 #define PCI_DOE_DATA_OBJECT_DISC_REQ_3_VER		0x0000ff00
 #define PCI_DOE_DATA_OBJECT_DISC_RSP_3_VID		0x0000ffff
-#define PCI_DOE_DATA_OBJECT_DISC_RSP_3_PROTOCOL		0x00ff0000
+#define PCI_DOE_DATA_OBJECT_DISC_RSP_3_TYPE		0x00ff0000
 #define PCI_DOE_DATA_OBJECT_DISC_RSP_3_NEXT_INDEX	0xff000000
 
+/* Deprecated old name, replaced with PCI_DOE_DATA_OBJECT_DISC_RSP_3_TYPE */
+#define PCI_DOE_DATA_OBJECT_DISC_RSP_3_PROTOCOL		PCI_DOE_DATA_OBJECT_DISC_RSP_3_TYPE
+
 /* Compute Express Link (CXL r3.1, sec 8.1.5) */
 #define PCI_DVSEC_CXL_PORT				3
 #define PCI_DVSEC_CXL_PORT_CTL				0x0c
diff --git a/include/standard-headers/linux/virtio_net.h b/include/standard-headers/linux/virtio_net.h
index fc594fe5fc..4ddefe25d6 100644
--- a/include/standard-headers/linux/virtio_net.h
+++ b/include/standard-headers/linux/virtio_net.h
@@ -70,6 +70,28 @@
 					 * with the same MAC.
 					 */
 #define VIRTIO_NET_F_SPEED_DUPLEX 63	/* Device set linkspeed and duplex */
+#define VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO 65 /* Driver can receive
+					      * GSO-over-UDP-tunnel packets
+					      */
+#define VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM 66 /* Driver handles
+						   * GSO-over-UDP-tunnel
+						   * packets with partial csum
+						   * for the outer header
+						   */
+#define VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO 67 /* Device can receive
+					     * GSO-over-UDP-tunnel packets
+					     */
+#define VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO_CSUM 68 /* Device handles
+						  * GSO-over-UDP-tunnel
+						  * packets with partial csum
+						  * for the outer header
+						  */
+
+/* Offloads bits corresponding to VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO{,_CSUM}
+ * features
+ */
+#define VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_MAPPED	46
+#define VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM_MAPPED	47
 
 #ifndef VIRTIO_NET_NO_LEGACY
 #define VIRTIO_NET_F_GSO	6	/* Host handles pkts w/ any GSO type */
@@ -131,12 +153,17 @@ struct virtio_net_hdr_v1 {
 #define VIRTIO_NET_HDR_F_NEEDS_CSUM	1	/* Use csum_start, csum_offset */
 #define VIRTIO_NET_HDR_F_DATA_VALID	2	/* Csum is valid */
 #define VIRTIO_NET_HDR_F_RSC_INFO	4	/* rsc info in csum_ fields */
+#define VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM 8	/* UDP tunnel requires csum offload */
 	uint8_t flags;
 #define VIRTIO_NET_HDR_GSO_NONE		0	/* Not a GSO frame */
 #define VIRTIO_NET_HDR_GSO_TCPV4	1	/* GSO frame, IPv4 TCP (TSO) */
 #define VIRTIO_NET_HDR_GSO_UDP		3	/* GSO frame, IPv4 UDP (UFO) */
 #define VIRTIO_NET_HDR_GSO_TCPV6	4	/* GSO frame, IPv6 TCP */
 #define VIRTIO_NET_HDR_GSO_UDP_L4	5	/* GSO frame, IPv4& IPv6 UDP (USO) */
+#define VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 0x20 /* UDP over IPv4 tunnel present */
+#define VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 0x40 /* UDP over IPv6 tunnel present */
+#define VIRTIO_NET_HDR_GSO_UDP_TUNNEL (VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 | \
+				       VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6)
 #define VIRTIO_NET_HDR_GSO_ECN		0x80	/* TCP has ECN set */
 	uint8_t gso_type;
 	__virtio16 hdr_len;	/* Ethernet + IP + tcp/udp hdrs */
@@ -181,6 +208,12 @@ struct virtio_net_hdr_v1_hash {
 	uint16_t padding;
 };
 
+/* This header after hashing information */
+struct virtio_net_hdr_tunnel {
+	__virtio16 outer_th_offset;
+	__virtio16 inner_nh_offset;
+};
+
 #ifndef VIRTIO_NET_NO_LEGACY
 /* This header comes first in the scatter-gather list.
  * For legacy virtio, if VIRTIO_F_ANY_LAYOUT is not negotiated, it must
@@ -327,6 +360,19 @@ struct virtio_net_rss_config {
 	uint8_t hash_key_data[/* hash_key_length */];
 };
 
+struct virtio_net_rss_config_hdr {
+	uint32_t hash_types;
+	uint16_t indirection_table_mask;
+	uint16_t unclassified_queue;
+	uint16_t indirection_table[/* 1 + indirection_table_mask */];
+};
+
+struct virtio_net_rss_config_trailer {
+	uint16_t max_tx_vq;
+	uint8_t hash_key_length;
+	uint8_t hash_key_data[/* hash_key_length */];
+};
+
  #define VIRTIO_NET_CTRL_MQ_RSS_CONFIG          1
 
 /*
diff --git a/include/standard-headers/linux/virtio_pci.h b/include/standard-headers/linux/virtio_pci.h
index 91fec6f502..09e964e6ee 100644
--- a/include/standard-headers/linux/virtio_pci.h
+++ b/include/standard-headers/linux/virtio_pci.h
@@ -246,6 +246,7 @@ struct virtio_pci_cfg_cap {
 #define VIRTIO_ADMIN_CMD_LIST_USE	0x1
 
 /* Admin command group type. */
+#define VIRTIO_ADMIN_GROUP_TYPE_SELF	0x0
 #define VIRTIO_ADMIN_GROUP_TYPE_SRIOV	0x1
 
 /* Transitional device admin command. */
diff --git a/include/standard-headers/linux/virtio_snd.h b/include/standard-headers/linux/virtio_snd.h
index 860f12e0a4..160d57899f 100644
--- a/include/standard-headers/linux/virtio_snd.h
+++ b/include/standard-headers/linux/virtio_snd.h
@@ -25,7 +25,7 @@ struct virtio_snd_config {
 	uint32_t streams;
 	/* # of available channel maps */
 	uint32_t chmaps;
-	/* # of available control elements */
+	/* # of available control elements (if VIRTIO_SND_F_CTLS) */
 	uint32_t controls;
 };
 
diff --git a/linux-headers/asm-arm64/kvm.h b/linux-headers/asm-arm64/kvm.h
index ec1e82bdc8..4e6aff08df 100644
--- a/linux-headers/asm-arm64/kvm.h
+++ b/linux-headers/asm-arm64/kvm.h
@@ -105,6 +105,7 @@ struct kvm_regs {
 #define KVM_ARM_VCPU_PTRAUTH_ADDRESS	5 /* VCPU uses address authentication */
 #define KVM_ARM_VCPU_PTRAUTH_GENERIC	6 /* VCPU uses generic authentication */
 #define KVM_ARM_VCPU_HAS_EL2		7 /* Support nested virtualization */
+#define KVM_ARM_VCPU_HAS_EL2_E2H0	8 /* Limit NV support to E2H RES0 */
 
 struct kvm_vcpu_init {
 	__u32 target;
@@ -365,6 +366,7 @@ enum {
 	KVM_REG_ARM_STD_HYP_BIT_PV_TIME	= 0,
 };
 
+/* Vendor hyper call function numbers 0-63 */
 #define KVM_REG_ARM_VENDOR_HYP_BMAP		KVM_REG_ARM_FW_FEAT_BMAP_REG(2)
 
 enum {
@@ -372,6 +374,14 @@ enum {
 	KVM_REG_ARM_VENDOR_HYP_BIT_PTP		= 1,
 };
 
+/* Vendor hyper call function numbers 64-127 */
+#define KVM_REG_ARM_VENDOR_HYP_BMAP_2		KVM_REG_ARM_FW_FEAT_BMAP_REG(3)
+
+enum {
+	KVM_REG_ARM_VENDOR_HYP_BIT_DISCOVER_IMPL_VER	= 0,
+	KVM_REG_ARM_VENDOR_HYP_BIT_DISCOVER_IMPL_CPUS	= 1,
+};
+
 /* Device Control API on vm fd */
 #define KVM_ARM_VM_SMCCC_CTRL		0
 #define   KVM_ARM_VM_SMCCC_FILTER	0
@@ -394,6 +404,7 @@ enum {
 #define KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS 6
 #define KVM_DEV_ARM_VGIC_GRP_LEVEL_INFO  7
 #define KVM_DEV_ARM_VGIC_GRP_ITS_REGS 8
+#define KVM_DEV_ARM_VGIC_GRP_MAINT_IRQ  9
 #define KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_SHIFT	10
 #define KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_MASK \
 			(0x3fffffULL << KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_SHIFT)
diff --git a/linux-headers/asm-arm64/unistd_64.h b/linux-headers/asm-arm64/unistd_64.h
index d4e90fff76..ee9aaebdf3 100644
--- a/linux-headers/asm-arm64/unistd_64.h
+++ b/linux-headers/asm-arm64/unistd_64.h
@@ -323,6 +323,7 @@
 #define __NR_getxattrat 464
 #define __NR_listxattrat 465
 #define __NR_removexattrat 466
+#define __NR_open_tree_attr 467
 
 
 #endif /* _ASM_UNISTD_64_H */
diff --git a/linux-headers/asm-generic/mman-common.h b/linux-headers/asm-generic/mman-common.h
index 1ea2c4c33b..ef1c27fa3c 100644
--- a/linux-headers/asm-generic/mman-common.h
+++ b/linux-headers/asm-generic/mman-common.h
@@ -85,6 +85,7 @@
 /* compatibility flags */
 #define MAP_FILE	0
 
+#define PKEY_UNRESTRICTED	0x0
 #define PKEY_DISABLE_ACCESS	0x1
 #define PKEY_DISABLE_WRITE	0x2
 #define PKEY_ACCESS_MASK	(PKEY_DISABLE_ACCESS |\
diff --git a/linux-headers/asm-generic/unistd.h b/linux-headers/asm-generic/unistd.h
index 88dc393c2b..2892a45023 100644
--- a/linux-headers/asm-generic/unistd.h
+++ b/linux-headers/asm-generic/unistd.h
@@ -849,9 +849,11 @@ __SYSCALL(__NR_getxattrat, sys_getxattrat)
 __SYSCALL(__NR_listxattrat, sys_listxattrat)
 #define __NR_removexattrat 466
 __SYSCALL(__NR_removexattrat, sys_removexattrat)
+#define __NR_open_tree_attr 467
+__SYSCALL(__NR_open_tree_attr, sys_open_tree_attr)
 
 #undef __NR_syscalls
-#define __NR_syscalls 467
+#define __NR_syscalls 468
 
 /*
  * 32 bit systems traditionally used different
diff --git a/linux-headers/asm-loongarch/unistd_64.h b/linux-headers/asm-loongarch/unistd_64.h
index 23fb96a8a7..50d22df8f7 100644
--- a/linux-headers/asm-loongarch/unistd_64.h
+++ b/linux-headers/asm-loongarch/unistd_64.h
@@ -319,6 +319,7 @@
 #define __NR_getxattrat 464
 #define __NR_listxattrat 465
 #define __NR_removexattrat 466
+#define __NR_open_tree_attr 467
 
 
 #endif /* _ASM_UNISTD_64_H */
diff --git a/linux-headers/asm-mips/unistd_n32.h b/linux-headers/asm-mips/unistd_n32.h
index 9a75719644..bdcc2f460b 100644
--- a/linux-headers/asm-mips/unistd_n32.h
+++ b/linux-headers/asm-mips/unistd_n32.h
@@ -395,5 +395,6 @@
 #define __NR_getxattrat (__NR_Linux + 464)
 #define __NR_listxattrat (__NR_Linux + 465)
 #define __NR_removexattrat (__NR_Linux + 466)
+#define __NR_open_tree_attr (__NR_Linux + 467)
 
 #endif /* _ASM_UNISTD_N32_H */
diff --git a/linux-headers/asm-mips/unistd_n64.h b/linux-headers/asm-mips/unistd_n64.h
index 7086783b0c..3b6b0193b6 100644
--- a/linux-headers/asm-mips/unistd_n64.h
+++ b/linux-headers/asm-mips/unistd_n64.h
@@ -371,5 +371,6 @@
 #define __NR_getxattrat (__NR_Linux + 464)
 #define __NR_listxattrat (__NR_Linux + 465)
 #define __NR_removexattrat (__NR_Linux + 466)
+#define __NR_open_tree_attr (__NR_Linux + 467)
 
 #endif /* _ASM_UNISTD_N64_H */
diff --git a/linux-headers/asm-mips/unistd_o32.h b/linux-headers/asm-mips/unistd_o32.h
index b3825823e4..4609a4b4d3 100644
--- a/linux-headers/asm-mips/unistd_o32.h
+++ b/linux-headers/asm-mips/unistd_o32.h
@@ -441,5 +441,6 @@
 #define __NR_getxattrat (__NR_Linux + 464)
 #define __NR_listxattrat (__NR_Linux + 465)
 #define __NR_removexattrat (__NR_Linux + 466)
+#define __NR_open_tree_attr (__NR_Linux + 467)
 
 #endif /* _ASM_UNISTD_O32_H */
diff --git a/linux-headers/asm-powerpc/unistd_32.h b/linux-headers/asm-powerpc/unistd_32.h
index 38ee4dc35d..5d38a427e0 100644
--- a/linux-headers/asm-powerpc/unistd_32.h
+++ b/linux-headers/asm-powerpc/unistd_32.h
@@ -448,6 +448,7 @@
 #define __NR_getxattrat 464
 #define __NR_listxattrat 465
 #define __NR_removexattrat 466
+#define __NR_open_tree_attr 467
 
 
 #endif /* _ASM_UNISTD_32_H */
diff --git a/linux-headers/asm-powerpc/unistd_64.h b/linux-headers/asm-powerpc/unistd_64.h
index 5e5f156834..860a488e4d 100644
--- a/linux-headers/asm-powerpc/unistd_64.h
+++ b/linux-headers/asm-powerpc/unistd_64.h
@@ -420,6 +420,7 @@
 #define __NR_getxattrat 464
 #define __NR_listxattrat 465
 #define __NR_removexattrat 466
+#define __NR_open_tree_attr 467
 
 
 #endif /* _ASM_UNISTD_64_H */
diff --git a/linux-headers/asm-riscv/kvm.h b/linux-headers/asm-riscv/kvm.h
index f06bc5efcd..5f59fd226c 100644
--- a/linux-headers/asm-riscv/kvm.h
+++ b/linux-headers/asm-riscv/kvm.h
@@ -182,6 +182,8 @@ enum KVM_RISCV_ISA_EXT_ID {
 	KVM_RISCV_ISA_EXT_SVVPTC,
 	KVM_RISCV_ISA_EXT_ZABHA,
 	KVM_RISCV_ISA_EXT_ZICCRSE,
+	KVM_RISCV_ISA_EXT_ZAAMO,
+	KVM_RISCV_ISA_EXT_ZALRSC,
 	KVM_RISCV_ISA_EXT_MAX,
 };
 
diff --git a/linux-headers/asm-riscv/unistd_32.h b/linux-headers/asm-riscv/unistd_32.h
index 74f6127aed..a5e769f1d9 100644
--- a/linux-headers/asm-riscv/unistd_32.h
+++ b/linux-headers/asm-riscv/unistd_32.h
@@ -314,6 +314,7 @@
 #define __NR_getxattrat 464
 #define __NR_listxattrat 465
 #define __NR_removexattrat 466
+#define __NR_open_tree_attr 467
 
 
 #endif /* _ASM_UNISTD_32_H */
diff --git a/linux-headers/asm-riscv/unistd_64.h b/linux-headers/asm-riscv/unistd_64.h
index bb6a15a2ec..8df4d64841 100644
--- a/linux-headers/asm-riscv/unistd_64.h
+++ b/linux-headers/asm-riscv/unistd_64.h
@@ -324,6 +324,7 @@
 #define __NR_getxattrat 464
 #define __NR_listxattrat 465
 #define __NR_removexattrat 466
+#define __NR_open_tree_attr 467
 
 
 #endif /* _ASM_UNISTD_64_H */
diff --git a/linux-headers/asm-s390/unistd_32.h b/linux-headers/asm-s390/unistd_32.h
index 620201cb36..85eedbd18e 100644
--- a/linux-headers/asm-s390/unistd_32.h
+++ b/linux-headers/asm-s390/unistd_32.h
@@ -439,5 +439,6 @@
 #define __NR_getxattrat 464
 #define __NR_listxattrat 465
 #define __NR_removexattrat 466
+#define __NR_open_tree_attr 467
 
 #endif /* _ASM_S390_UNISTD_32_H */
diff --git a/linux-headers/asm-s390/unistd_64.h b/linux-headers/asm-s390/unistd_64.h
index e7e4a10aaf..c03b1b9701 100644
--- a/linux-headers/asm-s390/unistd_64.h
+++ b/linux-headers/asm-s390/unistd_64.h
@@ -387,5 +387,6 @@
 #define __NR_getxattrat 464
 #define __NR_listxattrat 465
 #define __NR_removexattrat 466
+#define __NR_open_tree_attr 467
 
 #endif /* _ASM_S390_UNISTD_64_H */
diff --git a/linux-headers/asm-x86/kvm.h b/linux-headers/asm-x86/kvm.h
index 86f2c34e7a..dc591fb17e 100644
--- a/linux-headers/asm-x86/kvm.h
+++ b/linux-headers/asm-x86/kvm.h
@@ -557,6 +557,9 @@ struct kvm_x86_mce {
 #define KVM_XEN_HVM_CONFIG_PVCLOCK_TSC_UNSTABLE	(1 << 7)
 #define KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA	(1 << 8)
 
+#define KVM_XEN_MSR_MIN_INDEX			0x40000000u
+#define KVM_XEN_MSR_MAX_INDEX			0x4fffffffu
+
 struct kvm_xen_hvm_config {
 	__u32 flags;
 	__u32 msr;
diff --git a/linux-headers/asm-x86/unistd_32.h b/linux-headers/asm-x86/unistd_32.h
index a2eb492a75..491d6b4eb6 100644
--- a/linux-headers/asm-x86/unistd_32.h
+++ b/linux-headers/asm-x86/unistd_32.h
@@ -457,6 +457,7 @@
 #define __NR_getxattrat 464
 #define __NR_listxattrat 465
 #define __NR_removexattrat 466
+#define __NR_open_tree_attr 467
 
 
 #endif /* _ASM_UNISTD_32_H */
diff --git a/linux-headers/asm-x86/unistd_64.h b/linux-headers/asm-x86/unistd_64.h
index 2f5fc400f5..7cf88bf9bd 100644
--- a/linux-headers/asm-x86/unistd_64.h
+++ b/linux-headers/asm-x86/unistd_64.h
@@ -380,6 +380,7 @@
 #define __NR_getxattrat 464
 #define __NR_listxattrat 465
 #define __NR_removexattrat 466
+#define __NR_open_tree_attr 467
 
 
 #endif /* _ASM_UNISTD_64_H */
diff --git a/linux-headers/asm-x86/unistd_x32.h b/linux-headers/asm-x86/unistd_x32.h
index fecd832e7f..82959111e6 100644
--- a/linux-headers/asm-x86/unistd_x32.h
+++ b/linux-headers/asm-x86/unistd_x32.h
@@ -333,6 +333,7 @@
 #define __NR_getxattrat (__X32_SYSCALL_BIT + 464)
 #define __NR_listxattrat (__X32_SYSCALL_BIT + 465)
 #define __NR_removexattrat (__X32_SYSCALL_BIT + 466)
+#define __NR_open_tree_attr (__X32_SYSCALL_BIT + 467)
 #define __NR_rt_sigaction (__X32_SYSCALL_BIT + 512)
 #define __NR_rt_sigreturn (__X32_SYSCALL_BIT + 513)
 #define __NR_ioctl (__X32_SYSCALL_BIT + 514)
diff --git a/linux-headers/linux/bits.h b/linux-headers/linux/bits.h
index c0d00c0a98..58596d18f4 100644
--- a/linux-headers/linux/bits.h
+++ b/linux-headers/linux/bits.h
@@ -4,13 +4,9 @@
 #ifndef _LINUX_BITS_H
 #define _LINUX_BITS_H
 
-#define __GENMASK(h, l) \
-        (((~_UL(0)) - (_UL(1) << (l)) + 1) & \
-         (~_UL(0) >> (__BITS_PER_LONG - 1 - (h))))
+#define __GENMASK(h, l) (((~_UL(0)) << (l)) & (~_UL(0) >> (BITS_PER_LONG - 1 - (h))))
 
-#define __GENMASK_ULL(h, l) \
-        (((~_ULL(0)) - (_ULL(1) << (l)) + 1) & \
-         (~_ULL(0) >> (__BITS_PER_LONG_LONG - 1 - (h))))
+#define __GENMASK_ULL(h, l) (((~_ULL(0)) << (l)) & (~_ULL(0) >> (BITS_PER_LONG_LONG - 1 - (h))))
 
 #define __GENMASK_U128(h, l) \
 	((_BIT128((h)) << 1) - (_BIT128(l)))
diff --git a/linux-headers/linux/const.h b/linux-headers/linux/const.h
index 2122610de7..95ede23342 100644
--- a/linux-headers/linux/const.h
+++ b/linux-headers/linux/const.h
@@ -33,7 +33,7 @@
  * Missing __asm__ support
  *
  * __BIT128() would not work in the __asm__ code, as it shifts an
- * 'unsigned __init128' data type as direct representation of
+ * 'unsigned __int128' data type as direct representation of
  * 128 bit constants is not supported in the gcc compiler, as
  * they get silently truncated.
  *
diff --git a/linux-headers/linux/iommufd.h b/linux-headers/linux/iommufd.h
index ccbdca5e11..cb0f7d6b4d 100644
--- a/linux-headers/linux/iommufd.h
+++ b/linux-headers/linux/iommufd.h
@@ -55,6 +55,7 @@ enum {
 	IOMMUFD_CMD_VIOMMU_ALLOC = 0x90,
 	IOMMUFD_CMD_VDEVICE_ALLOC = 0x91,
 	IOMMUFD_CMD_IOAS_CHANGE_PROCESS = 0x92,
+	IOMMUFD_CMD_VEVENTQ_ALLOC = 0x93,
 };
 
 /**
@@ -392,6 +393,9 @@ struct iommu_vfio_ioas {
  *                          Any domain attached to the non-PASID part of the
  *                          device must also be flagged, otherwise attaching a
  *                          PASID will blocked.
+ *                          For the user that wants to attach PASID, ioas is
+ *                          not recommended for both the non-PASID part
+ *                          and PASID part of the device.
  *                          If IOMMU does not support PASID it will return
  *                          error (-EOPNOTSUPP).
  */
@@ -608,9 +612,17 @@ enum iommu_hw_info_type {
  *                                   IOMMU_HWPT_GET_DIRTY_BITMAP
  *                                   IOMMU_HWPT_SET_DIRTY_TRACKING
  *
+ * @IOMMU_HW_CAP_PCI_PASID_EXEC: Execute Permission Supported, user ignores it
+ *                               when the struct
+ *                               iommu_hw_info::out_max_pasid_log2 is zero.
+ * @IOMMU_HW_CAP_PCI_PASID_PRIV: Privileged Mode Supported, user ignores it
+ *                               when the struct
+ *                               iommu_hw_info::out_max_pasid_log2 is zero.
  */
 enum iommufd_hw_capabilities {
 	IOMMU_HW_CAP_DIRTY_TRACKING = 1 << 0,
+	IOMMU_HW_CAP_PCI_PASID_EXEC = 1 << 1,
+	IOMMU_HW_CAP_PCI_PASID_PRIV = 1 << 2,
 };
 
 /**
@@ -626,6 +638,9 @@ enum iommufd_hw_capabilities {
  *                 iommu_hw_info_type.
  * @out_capabilities: Output the generic iommu capability info type as defined
  *                    in the enum iommu_hw_capabilities.
+ * @out_max_pasid_log2: Output the width of PASIDs. 0 means no PASID support.
+ *                      PCI devices turn to out_capabilities to check if the
+ *                      specific capabilities is supported or not.
  * @__reserved: Must be 0
  *
  * Query an iommu type specific hardware information data from an iommu behind
@@ -649,7 +664,8 @@ struct iommu_hw_info {
 	__u32 data_len;
 	__aligned_u64 data_uptr;
 	__u32 out_data_type;
-	__u32 __reserved;
+	__u8 out_max_pasid_log2;
+	__u8 __reserved[3];
 	__aligned_u64 out_capabilities;
 };
 #define IOMMU_GET_HW_INFO _IO(IOMMUFD_TYPE, IOMMUFD_CMD_GET_HW_INFO)
@@ -1014,4 +1030,115 @@ struct iommu_ioas_change_process {
 #define IOMMU_IOAS_CHANGE_PROCESS \
 	_IO(IOMMUFD_TYPE, IOMMUFD_CMD_IOAS_CHANGE_PROCESS)
 
+/**
+ * enum iommu_veventq_flag - flag for struct iommufd_vevent_header
+ * @IOMMU_VEVENTQ_FLAG_LOST_EVENTS: vEVENTQ has lost vEVENTs
+ */
+enum iommu_veventq_flag {
+	IOMMU_VEVENTQ_FLAG_LOST_EVENTS = (1U << 0),
+};
+
+/**
+ * struct iommufd_vevent_header - Virtual Event Header for a vEVENTQ Status
+ * @flags: Combination of enum iommu_veventq_flag
+ * @sequence: The sequence index of a vEVENT in the vEVENTQ, with a range of
+ *            [0, INT_MAX] where the following index of INT_MAX is 0
+ *
+ * Each iommufd_vevent_header reports a sequence index of the following vEVENT:
+ *
+ * +----------------------+-------+----------------------+-------+---+-------+
+ * | header0 {sequence=0} | data0 | header1 {sequence=1} | data1 |...| dataN |
+ * +----------------------+-------+----------------------+-------+---+-------+
+ *
+ * And this sequence index is expected to be monotonic to the sequence index of
+ * the previous vEVENT. If two adjacent sequence indexes has a delta larger than
+ * 1, it means that delta - 1 number of vEVENTs has lost, e.g. two lost vEVENTs:
+ *
+ * +-----+----------------------+-------+----------------------+-------+-----+
+ * | ... | header3 {sequence=3} | data3 | header6 {sequence=6} | data6 | ... |
+ * +-----+----------------------+-------+----------------------+-------+-----+
+ *
+ * If a vEVENT lost at the tail of the vEVENTQ and there is no following vEVENT
+ * providing the next sequence index, an IOMMU_VEVENTQ_FLAG_LOST_EVENTS header
+ * would be added to the tail, and no data would follow this header:
+ *
+ * +--+----------------------+-------+-----------------------------------------+
+ * |..| header3 {sequence=3} | data3 | header4 {flags=LOST_EVENTS, sequence=4} |
+ * +--+----------------------+-------+-----------------------------------------+
+ */
+struct iommufd_vevent_header {
+	__u32 flags;
+	__u32 sequence;
+};
+
+/**
+ * enum iommu_veventq_type - Virtual Event Queue Type
+ * @IOMMU_VEVENTQ_TYPE_DEFAULT: Reserved for future use
+ * @IOMMU_VEVENTQ_TYPE_ARM_SMMUV3: ARM SMMUv3 Virtual Event Queue
+ */
+enum iommu_veventq_type {
+	IOMMU_VEVENTQ_TYPE_DEFAULT = 0,
+	IOMMU_VEVENTQ_TYPE_ARM_SMMUV3 = 1,
+};
+
+/**
+ * struct iommu_vevent_arm_smmuv3 - ARM SMMUv3 Virtual Event
+ *                                  (IOMMU_VEVENTQ_TYPE_ARM_SMMUV3)
+ * @evt: 256-bit ARM SMMUv3 Event record, little-endian.
+ *       Reported event records: (Refer to "7.3 Event records" in SMMUv3 HW Spec)
+ *       - 0x04 C_BAD_STE
+ *       - 0x06 F_STREAM_DISABLED
+ *       - 0x08 C_BAD_SUBSTREAMID
+ *       - 0x0a C_BAD_CD
+ *       - 0x10 F_TRANSLATION
+ *       - 0x11 F_ADDR_SIZE
+ *       - 0x12 F_ACCESS
+ *       - 0x13 F_PERMISSION
+ *
+ * StreamID field reports a virtual device ID. To receive a virtual event for a
+ * device, a vDEVICE must be allocated via IOMMU_VDEVICE_ALLOC.
+ */
+struct iommu_vevent_arm_smmuv3 {
+	__aligned_le64 evt[4];
+};
+
+/**
+ * struct iommu_veventq_alloc - ioctl(IOMMU_VEVENTQ_ALLOC)
+ * @size: sizeof(struct iommu_veventq_alloc)
+ * @flags: Must be 0
+ * @viommu_id: virtual IOMMU ID to associate the vEVENTQ with
+ * @type: Type of the vEVENTQ. Must be defined in enum iommu_veventq_type
+ * @veventq_depth: Maximum number of events in the vEVENTQ
+ * @out_veventq_id: The ID of the new vEVENTQ
+ * @out_veventq_fd: The fd of the new vEVENTQ. User space must close the
+ *                  successfully returned fd after using it
+ * @__reserved: Must be 0
+ *
+ * Explicitly allocate a virtual event queue interface for a vIOMMU. A vIOMMU
+ * can have multiple FDs for different types, but is confined to one per @type.
+ * User space should open the @out_veventq_fd to read vEVENTs out of a vEVENTQ,
+ * if there are vEVENTs available. A vEVENTQ will lose events due to overflow,
+ * if the number of the vEVENTs hits @veventq_depth.
+ *
+ * Each vEVENT in a vEVENTQ encloses a struct iommufd_vevent_header followed by
+ * a type-specific data structure, in a normal case:
+ *
+ * +-+---------+-------+---------+-------+-----+---------+-------+-+
+ * | | header0 | data0 | header1 | data1 | ... | headerN | dataN | |
+ * +-+---------+-------+---------+-------+-----+---------+-------+-+
+ *
+ * unless a tailing IOMMU_VEVENTQ_FLAG_LOST_EVENTS header is logged (refer to
+ * struct iommufd_vevent_header).
+ */
+struct iommu_veventq_alloc {
+	__u32 size;
+	__u32 flags;
+	__u32 viommu_id;
+	__u32 type;
+	__u32 veventq_depth;
+	__u32 out_veventq_id;
+	__u32 out_veventq_fd;
+	__u32 __reserved;
+};
+#define IOMMU_VEVENTQ_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VEVENTQ_ALLOC)
 #endif
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 27181b3dd8..e5f3e8b5a0 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -921,6 +921,7 @@ struct kvm_enable_cap {
 #define KVM_CAP_PRE_FAULT_MEMORY 236
 #define KVM_CAP_X86_APIC_BUS_CYCLES_NS 237
 #define KVM_CAP_X86_GUEST_MODE 238
+#define KVM_CAP_ARM_WRITABLE_IMP_ID_REGS 239
 
 struct kvm_irq_routing_irqchip {
 	__u32 irqchip;
diff --git a/linux-headers/linux/psp-sev.h b/linux-headers/linux/psp-sev.h
index 17bf191573..113c4ceb78 100644
--- a/linux-headers/linux/psp-sev.h
+++ b/linux-headers/linux/psp-sev.h
@@ -73,13 +73,20 @@ typedef enum {
 	SEV_RET_INVALID_PARAM,
 	SEV_RET_RESOURCE_LIMIT,
 	SEV_RET_SECURE_DATA_INVALID,
-	SEV_RET_INVALID_KEY = 0x27,
-	SEV_RET_INVALID_PAGE_SIZE,
-	SEV_RET_INVALID_PAGE_STATE,
-	SEV_RET_INVALID_MDATA_ENTRY,
-	SEV_RET_INVALID_PAGE_OWNER,
-	SEV_RET_INVALID_PAGE_AEAD_OFLOW,
-	SEV_RET_RMP_INIT_REQUIRED,
+	SEV_RET_INVALID_PAGE_SIZE          = 0x0019,
+	SEV_RET_INVALID_PAGE_STATE         = 0x001A,
+	SEV_RET_INVALID_MDATA_ENTRY        = 0x001B,
+	SEV_RET_INVALID_PAGE_OWNER         = 0x001C,
+	SEV_RET_AEAD_OFLOW                 = 0x001D,
+	SEV_RET_EXIT_RING_BUFFER           = 0x001F,
+	SEV_RET_RMP_INIT_REQUIRED          = 0x0020,
+	SEV_RET_BAD_SVN                    = 0x0021,
+	SEV_RET_BAD_VERSION                = 0x0022,
+	SEV_RET_SHUTDOWN_REQUIRED          = 0x0023,
+	SEV_RET_UPDATE_FAILED              = 0x0024,
+	SEV_RET_RESTORE_REQUIRED           = 0x0025,
+	SEV_RET_RMP_INITIALIZATION_FAILED  = 0x0026,
+	SEV_RET_INVALID_KEY                = 0x0027,
 	SEV_RET_MAX,
 } sev_ret_code;
 
diff --git a/linux-headers/linux/stddef.h b/linux-headers/linux/stddef.h
index e1416f7937..e1fcfcf3b3 100644
--- a/linux-headers/linux/stddef.h
+++ b/linux-headers/linux/stddef.h
@@ -70,4 +70,6 @@
 #define __counted_by_be(m)
 #endif
 
+#define __kernel_nonstring
+
 #endif /* _LINUX_STDDEF_H */
diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
index 1b5e254d6a..79bf8c0cc5 100644
--- a/linux-headers/linux/vfio.h
+++ b/linux-headers/linux/vfio.h
@@ -671,6 +671,7 @@ enum {
  */
 enum {
 	VFIO_AP_REQ_IRQ_INDEX,
+	VFIO_AP_CFG_CHG_IRQ_INDEX,
 	VFIO_AP_NUM_IRQS
 };
 
@@ -931,29 +932,34 @@ struct vfio_device_bind_iommufd {
  * VFIO_DEVICE_ATTACH_IOMMUFD_PT - _IOW(VFIO_TYPE, VFIO_BASE + 19,
  *					struct vfio_device_attach_iommufd_pt)
  * @argsz:	User filled size of this data.
- * @flags:	Must be 0.
+ * @flags:	Flags for attach.
  * @pt_id:	Input the target id which can represent an ioas or a hwpt
  *		allocated via iommufd subsystem.
  *		Output the input ioas id or the attached hwpt id which could
  *		be the specified hwpt itself or a hwpt automatically created
  *		for the specified ioas by kernel during the attachment.
+ * @pasid:	The pasid to be attached, only meaningful when
+ *		VFIO_DEVICE_ATTACH_PASID is set in @flags
  *
  * Associate the device with an address space within the bound iommufd.
  * Undo by VFIO_DEVICE_DETACH_IOMMUFD_PT or device fd close.  This is only
  * allowed on cdev fds.
  *
- * If a vfio device is currently attached to a valid hw_pagetable, without doing
- * a VFIO_DEVICE_DETACH_IOMMUFD_PT, a second VFIO_DEVICE_ATTACH_IOMMUFD_PT ioctl
- * passing in another hw_pagetable (hwpt) id is allowed. This action, also known
- * as a hw_pagetable replacement, will replace the device's currently attached
- * hw_pagetable with a new hw_pagetable corresponding to the given pt_id.
+ * If a vfio device or a pasid of this device is currently attached to a valid
+ * hw_pagetable (hwpt), without doing a VFIO_DEVICE_DETACH_IOMMUFD_PT, a second
+ * VFIO_DEVICE_ATTACH_IOMMUFD_PT ioctl passing in another hwpt id is allowed.
+ * This action, also known as a hw_pagetable replacement, will replace the
+ * currently attached hwpt of the device or the pasid of this device with a new
+ * hwpt corresponding to the given pt_id.
  *
  * Return: 0 on success, -errno on failure.
  */
 struct vfio_device_attach_iommufd_pt {
 	__u32	argsz;
 	__u32	flags;
+#define VFIO_DEVICE_ATTACH_PASID	(1 << 0)
 	__u32	pt_id;
+	__u32	pasid;
 };
 
 #define VFIO_DEVICE_ATTACH_IOMMUFD_PT		_IO(VFIO_TYPE, VFIO_BASE + 19)
@@ -962,17 +968,21 @@ struct vfio_device_attach_iommufd_pt {
  * VFIO_DEVICE_DETACH_IOMMUFD_PT - _IOW(VFIO_TYPE, VFIO_BASE + 20,
  *					struct vfio_device_detach_iommufd_pt)
  * @argsz:	User filled size of this data.
- * @flags:	Must be 0.
+ * @flags:	Flags for detach.
+ * @pasid:	The pasid to be detached, only meaningful when
+ *		VFIO_DEVICE_DETACH_PASID is set in @flags
  *
- * Remove the association of the device and its current associated address
- * space.  After it, the device should be in a blocking DMA state.  This is only
- * allowed on cdev fds.
+ * Remove the association of the device or a pasid of the device and its current
+ * associated address space.  After it, the device or the pasid should be in a
+ * blocking DMA state.  This is only allowed on cdev fds.
  *
  * Return: 0 on success, -errno on failure.
  */
 struct vfio_device_detach_iommufd_pt {
 	__u32	argsz;
 	__u32	flags;
+#define VFIO_DEVICE_DETACH_PASID	(1 << 0)
+	__u32	pasid;
 };
 
 #define VFIO_DEVICE_DETACH_IOMMUFD_PT		_IO(VFIO_TYPE, VFIO_BASE + 20)
diff --git a/linux-headers/linux/vhost.h b/linux-headers/linux/vhost.h
index b95dd84eef..328e81badf 100644
--- a/linux-headers/linux/vhost.h
+++ b/linux-headers/linux/vhost.h
@@ -28,10 +28,10 @@
 
 /* Set current process as the (exclusive) owner of this file descriptor.  This
  * must be called before any other vhost command.  Further calls to
- * VHOST_OWNER_SET fail until VHOST_OWNER_RESET is called. */
+ * VHOST_SET_OWNER fail until VHOST_RESET_OWNER is called. */
 #define VHOST_SET_OWNER _IO(VHOST_VIRTIO, 0x01)
 /* Give up ownership, and reset the device to default values.
- * Allows subsequent call to VHOST_OWNER_SET to succeed. */
+ * Allows subsequent call to VHOST_SET_OWNER to succeed. */
 #define VHOST_RESET_OWNER _IO(VHOST_VIRTIO, 0x02)
 
 /* Set up/modify memory layout */
@@ -235,4 +235,12 @@
  */
 #define VHOST_VDPA_GET_VRING_SIZE	_IOWR(VHOST_VIRTIO, 0x82,	\
 					      struct vhost_vring_state)
+
+/* Extended features manipulation
+ */
+#ifdef __SIZEOF_INT128__
+#define VHOST_GET_FEATURES_EX  _IOR(VHOST_VIRTIO, 0x83, __u128)
+#define VHOST_SET_FEATURES_EX  _IOW(VHOST_VIRTIO, 0x83, __u128)
+#endif
+
 #endif
-- 
2.49.0




* [PATCH RFC 02/16] migration: introduce support for 128 bit int state.
  2025-05-21 11:33 [PATCH RFC 00/16] virtio: introduce support for GSO over UDP tunnel Paolo Abeni
  2025-05-21 11:33 ` [PATCH RFC 01/16] linux-headers: Update to Linux v6.15-rc net-next Paolo Abeni
@ 2025-05-21 11:33 ` Paolo Abeni
  2025-05-21 11:33 ` [PATCH RFC 03/16] virtio: introduce extended features type Paolo Abeni
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 42+ messages in thread
From: Paolo Abeni @ 2025-05-21 11:33 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost,
	Dmitry Fleytman, Akihiko Odaki, Jason Wang, Sriram Yagnaraman,
	Michael S. Tsirkin, Stefano Garzarella, Peter Xu, Fabiano Rosas,
	Cornelia Huck, Luigi Rizzo, Giuseppe Lettieri, Vincenzo Maffione,
	Eric Blake, Markus Armbruster

The virtio specification allows for up to 128 bits for the
device features. Soon we are going to use some of the 'extended'
feature bits (above 64) for the virtio net driver.

For platforms natively supporting 128-bit integers, introduce a 128-bit
integer state.
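
The wire layout implied by this description can be sketched outside of
QEMU as follows. This is a minimal, hedged illustration rather than the
code in the patch: put_be64(), get_be64(), put_be128(), get_be128() and
be128_selftest() are hypothetical helpers that operate on a plain byte
buffer instead of a QEMUFile, and a compiler providing __uint128_t (GCC
or Clang on a 64-bit target) is assumed. The 128-bit value is emitted as
two big-endian 64-bit halves, most significant half first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store a 64-bit value big-endian into buf[0..7]. */
static void put_be64(uint8_t *buf, uint64_t v)
{
    for (size_t i = 0; i < 8; i++) {
        buf[i] = (uint8_t)(v >> (56 - 8 * i));
    }
}

/* Hypothetical helper: load a big-endian 64-bit value from buf[0..7]. */
static uint64_t get_be64(const uint8_t *buf)
{
    uint64_t v = 0;

    for (size_t i = 0; i < 8; i++) {
        v = (v << 8) | buf[i];
    }
    return v;
}

/* High half first, then low half: 16 bytes, big-endian overall. */
static void put_be128(uint8_t *buf, __uint128_t v)
{
    put_be64(buf, (uint64_t)(v >> 64));
    put_be64(buf + 8, (uint64_t)v);
}

/* Inverse of put_be128(): rebuild the value from the two halves. */
static __uint128_t get_be128(const uint8_t *buf)
{
    __uint128_t v = (__uint128_t)get_be64(buf) << 64;

    v |= get_be64(buf + 8);
    return v;
}

/* Round-trip self-check: serialize a value, then parse it back. */
static void be128_selftest(void)
{
    uint8_t buf[16];
    __uint128_t v = ((__uint128_t)0x0102030405060708ULL << 64) |
                    0x090a0b0c0d0e0f10ULL;

    put_be128(buf, v);
    assert(buf[0] == 0x01 && buf[15] == 0x10);
    assert(get_be128(buf) == v);
}
```

Writing the high half first keeps the byte stream identical on little-
and big-endian hosts, which is what a migration stream needs; the cast
to uint64_t on the low half only makes the truncation explicit.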

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 include/migration/qemu-file-types.h | 15 +++++++++++++++
 include/migration/vmstate.h         | 11 +++++++++++
 migration/qemu-file.c               | 16 ++++++++++++++++
 migration/vmstate-types.c           | 25 +++++++++++++++++++++++++
 4 files changed, 67 insertions(+)

diff --git a/include/migration/qemu-file-types.h b/include/migration/qemu-file-types.h
index adec5abc07..094ace5bb2 100644
--- a/include/migration/qemu-file-types.h
+++ b/include/migration/qemu-file-types.h
@@ -92,6 +92,21 @@ static inline void qemu_get_8s(QEMUFile *f, uint8_t *pv)
     *pv = qemu_get_byte(f);
 }
 
+#ifdef CONFIG_INT128
+void qemu_put_be128(QEMUFile *f, __uint128_t v);
+__uint128_t qemu_get_be128(QEMUFile *f);
+
+static inline void qemu_put_be128s(QEMUFile *f, const __uint128_t *pv)
+{
+    qemu_put_be128(f, *pv);
+}
+
+static inline void qemu_get_be128s(QEMUFile *f, __uint128_t *pv)
+{
+    *pv = qemu_get_be128(f);
+}
+#endif
+
 /* Signed versions for type safety */
 static inline void qemu_put_sbe16(QEMUFile *f, int v)
 {
diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index a1dfab4460..9695d4ba06 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -903,6 +903,17 @@ extern const VMStateInfo vmstate_info_qlist;
 #define VMSTATE_UINT64_V(_f, _s, _v)                                  \
     VMSTATE_SINGLE(_f, _s, _v, vmstate_info_uint64, uint64_t)
 
+#ifdef CONFIG_INT128
+#define VMSTATE_UINT128_V(_f, _s, _v)                                 \
+    VMSTATE_SINGLE(_f, _s, _v, vmstate_info_uint128, __uint128_t)
+#define VMSTATE_UINT128(_f, _s)                                       \
+    VMSTATE_UINT128_V(_f, _s, 0)
+#define VMSTATE_UINT128_TEST(_f, _s, _t)                              \
+    VMSTATE_SINGLE_TEST(_f, _s, _t, 0, vmstate_info_uint128, __uint128_t)
+
+extern const VMStateInfo vmstate_info_uint128;
+#endif
+
 #define VMSTATE_FD_V(_f, _s, _v)                                  \
     VMSTATE_SINGLE(_f, _s, _v, vmstate_info_fd, int32_t)
 
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index b6ac190034..3dc7645d3e 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -825,6 +825,22 @@ void qemu_put_be64(QEMUFile *f, uint64_t v)
     qemu_put_be32(f, v);
 }
 
+#ifdef CONFIG_INT128
+void qemu_put_be128(QEMUFile *f, __uint128_t v)
+{
+    qemu_put_be64(f, v >> 64);
+    qemu_put_be64(f, v);
+}
+
+__uint128_t qemu_get_be128(QEMUFile *f)
+{
+    __uint128_t v;
+    v = (__uint128_t)qemu_get_be64(f) << 64;
+    v |= qemu_get_be64(f);
+    return v;
+}
+#endif
+
 unsigned int qemu_get_be16(QEMUFile *f)
 {
     unsigned int v;
diff --git a/migration/vmstate-types.c b/migration/vmstate-types.c
index 741a588b7e..120ea1f9cd 100644
--- a/migration/vmstate-types.c
+++ b/migration/vmstate-types.c
@@ -315,6 +315,31 @@ const VMStateInfo vmstate_info_uint64 = {
     .put  = put_uint64,
 };
 
+/* 128 bit unsigned int */
+#ifdef CONFIG_INT128
+static int get_uint128(QEMUFile *f, void *pv, size_t size,
+                       const VMStateField *field)
+{
+    __uint128_t *v = pv;
+    qemu_get_be128s(f, v);
+    return 0;
+}
+
+static int put_uint128(QEMUFile *f, void *pv, size_t size,
+                       const VMStateField *field, JSONWriter *vmdesc)
+{
+    __uint128_t *v = pv;
+    qemu_put_be128s(f, v);
+    return 0;
+}
+
+const VMStateInfo vmstate_info_uint128 = {
+    .name = "uint128",
+    .get  = get_uint128,
+    .put  = put_uint128,
+};
+#endif
+
 /* File descriptor communicated via SCM_RIGHTS */
 
 static int get_fd(QEMUFile *f, void *pv, size_t size,
-- 
2.49.0




* [PATCH RFC 03/16] virtio: introduce extended features type
  2025-05-21 11:33 [PATCH RFC 00/16] virtio: introduce support for GSO over UDP tunnel Paolo Abeni
  2025-05-21 11:33 ` [PATCH RFC 01/16] linux-headers: Update to Linux v6.15-rc net-next Paolo Abeni
  2025-05-21 11:33 ` [PATCH RFC 02/16] migration: introduce support for 128 bit int state Paolo Abeni
@ 2025-05-21 11:33 ` Paolo Abeni
  2025-05-21 11:33 ` [PATCH RFC 04/16] virtio: serialize extended features state Paolo Abeni
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 42+ messages in thread
From: Paolo Abeni @ 2025-05-21 11:33 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost,
	Dmitry Fleytman, Akihiko Odaki, Jason Wang, Sriram Yagnaraman,
	Michael S. Tsirkin, Stefano Garzarella, Peter Xu, Fabiano Rosas,
	Cornelia Huck, Luigi Rizzo, Giuseppe Lettieri, Vincenzo Maffione,
	Eric Blake, Markus Armbruster

The virtio specification allows for up to 128 bits for the
device features. Soon we are going to use some of the 'extended'
feature bits (above 64) for the virtio net driver.

Introduce a specific type to represent the virtio features bitmask.
On platforms where 128-bit integers are available, use such a wide int
for the features bitmask; otherwise keep the current u64.

Most drivers will keep using only the 64-bit features space; use a union
to allow them to access the lower part of the extended space without any
per-driver change, but let the features field initializers set the
extended space.
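
The union layout described above can be sketched as follows. This is an
illustrative approximation, not the header added by the patch: struct
demo_dev, its host_features* members and the demo_*() helpers are
hypothetical names, __uint128_t support is assumed (GCC or Clang on a
64-bit target), and the byte-order test relies on the compiler-provided
__BYTE_ORDER__ macro instead of QEMU's HOST_BIG_ENDIAN. Legacy code
keeps reading and writing the 64-bit member, while extended-aware code
uses the 128-bit view of the same storage.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device state: one storage area, two views. */
struct demo_dev {
    union {
        struct {
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
            uint64_t host_features_hi;  /* bits 64..127 */
            uint64_t host_features;     /* bits 0..63 */
#else
            uint64_t host_features;     /* bits 0..63 */
            uint64_t host_features_hi;  /* bits 64..127 */
#endif
        };
        __uint128_t host_features_ex;   /* all 128 bits */
    };
};

/* Set one feature bit anywhere in the 128-bit space. */
static inline void demo_add_feature_ex(__uint128_t *features,
                                       unsigned int fbit)
{
    assert(fbit < 128);
    *features |= (__uint128_t)1 << fbit;
}

/* Test one feature bit anywhere in the 128-bit space. */
static inline bool demo_has_feature_ex(__uint128_t features,
                                       unsigned int fbit)
{
    assert(fbit < 128);
    return (features >> fbit) & 1;
}
```

With this layout, setting bit 65 through host_features_ex leaves
host_features untouched and shows up as bit 1 of host_features_hi, so
drivers that only know about the low 64 bits need no change.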

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 hw/net/virtio-net.c                 |  2 +-
 hw/virtio/virtio-bus.c              |  4 +-
 hw/virtio/virtio.c                  |  4 +-
 include/hw/virtio/virtio-features.h | 90 +++++++++++++++++++++++++++++
 include/hw/virtio/virtio.h          |  9 +--
 5 files changed, 100 insertions(+), 9 deletions(-)
 create mode 100644 include/hw/virtio/virtio-features.h

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 2de037c273..9f500c64e7 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -799,7 +799,7 @@ static uint64_t virtio_net_get_features(VirtIODevice *vdev, uint64_t features,
         virtio_clear_feature(&features, VIRTIO_NET_F_RSS);
     }
     features = vhost_net_get_features(get_vhost_net(nc->peer), features);
-    vdev->backend_features = features;
+    vdev->backend_features_ex = features;
 
     if (n->mtu_bypass_backend &&
             (n->host_features & 1ULL << VIRTIO_NET_F_MTU)) {
diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c
index 11adfbf3ab..9b84ead831 100644
--- a/hw/virtio/virtio-bus.c
+++ b/hw/virtio/virtio-bus.c
@@ -63,8 +63,8 @@ void virtio_bus_device_plugged(VirtIODevice *vdev, Error **errp)
 
     /* Get the features of the plugged device. */
     assert(vdc->get_features != NULL);
-    vdev->host_features = vdc->get_features(vdev, vdev->host_features,
-                                            &local_err);
+    vdev->host_features_ex = vdc->get_features(vdev, vdev->host_features,
+                                               &local_err);
     if (local_err) {
         error_propagate(errp, local_err);
         return;
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 480c2e5036..701f59884d 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -2346,7 +2346,7 @@ void virtio_reset(void *opaque)
     vdev->start_on_kick = false;
     vdev->started = false;
     vdev->broken = false;
-    vdev->guest_features = 0;
+    vdev->guest_features_ex = 0;
     vdev->queue_sel = 0;
     vdev->status = 0;
     vdev->disabled = false;
@@ -3239,7 +3239,7 @@ virtio_load(VirtIODevice *vdev, QEMUFile *f, int version_id)
      * Note: devices should always test host features in future - don't create
      * new dependencies like this.
      */
-    vdev->guest_features = features;
+    vdev->guest_features_ex = features;
 
     config_len = qemu_get_be32(f);
 
diff --git a/include/hw/virtio/virtio-features.h b/include/hw/virtio/virtio-features.h
new file mode 100644
index 0000000000..a0a115cd66
--- /dev/null
+++ b/include/hw/virtio/virtio-features.h
@@ -0,0 +1,90 @@
+/*
+ * Virtio features helpers
+ *
+ * Copyright 2025 Red Hat, Inc.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef _QEMU_VIRTIO_FEATURES_H
+#define _QEMU_VIRTIO_FEATURES_H
+
+#define VIRTIO_FEATURES_FMT             "%016"PRIx64"%016"PRIx64
+
+#ifdef CONFIG_INT128
+#define VIRTIO_BIT(b)              ((__uint128_t)1 << (b))
+#define VIRTIO_FEATURES_WORDS      4
+#define VIRTIO_FEATURES_HI(f)      ((uint64_t)((f) >> 64))
+#define VIRTIO_FEATURES_LOW(f)     ((uint64_t)(f))
+
+typedef __uint128_t virtio_features_t;
+
+#if HOST_BIG_ENDIAN
+#define DECLARE_FEATURES(name)      \
+    union {                         \
+        struct {                    \
+            uint64_t name##_hi;     \
+            uint64_t name;          \
+        };                          \
+        __uint128_t  name##_ex;     \
+    }
+#else
+#define DECLARE_FEATURES(name)      \
+    union {                         \
+        struct {                    \
+            uint64_t name;          \
+            uint64_t name##_hi;     \
+        };                          \
+        __uint128_t  name##_ex;     \
+    }
+#endif
+
+static inline void virtio_add_feature_ex(__uint128_t *features,
+                                         unsigned int fbit)
+{
+    assert(fbit < 128);
+    *features |= VIRTIO_BIT(fbit);
+}
+
+static inline void virtio_clear_feature_ex(__uint128_t *features,
+                                           unsigned int fbit)
+{
+    assert(fbit < 128);
+    *features &= ~VIRTIO_BIT(fbit);
+}
+
+static inline bool virtio_has_feature_ex(__uint128_t features,
+                                         unsigned int fbit)
+{
+    assert(fbit < 128);
+    return !!(features & VIRTIO_BIT(fbit));
+}
+
+#else /* !CONFIG_INT128 */
+
+#define VIRTIO_BIT(b)              (1ULL << (b))
+#define VIRTIO_FEATURES_WORDS      2
+#define VIRTIO_FEATURES_HI(f)      ((uint64_t)0)
+#define VIRTIO_FEATURES_LOW(f)     ((uint64_t)(f))
+
+typedef uint64_t virtio_features_t;
+
+/*
+ * Without 128-bit support, 'features_ex' is just an alias for the 64-bit
+ * variable. This helps avoid conditionals in the core virtio code that
+ * manipulates the features.
+ */
+#define DECLARE_FEATURES(name)      \
+    union {                         \
+        uint64_t name;              \
+        uint64_t name##_ex;         \
+    }
+
+#define virtio_clear_feature_ex virtio_clear_feature
+#define virtio_add_feature_ex virtio_add_feature
+#define virtio_has_feature_ex virtio_has_feature
+
+#endif
+
+#endif
+
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 7e0c471ea4..82ff6c1630 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -16,6 +16,7 @@
 
 #include "system/memory.h"
 #include "hw/qdev-core.h"
+#include "hw/virtio/virtio-features.h"
 #include "net/net.h"
 #include "migration/vmstate.h"
 #include "qemu/event_notifier.h"
@@ -121,9 +122,9 @@ struct VirtIODevice
      * backend (e.g. vhost) and could potentially be a subset of the
      * total feature set offered by QEMU.
      */
-    uint64_t host_features;
-    uint64_t guest_features;
-    uint64_t backend_features;
+    DECLARE_FEATURES(host_features);
+    DECLARE_FEATURES(guest_features);
+    DECLARE_FEATURES(backend_features);
 
     size_t config_len;
     void *config;
@@ -195,7 +196,7 @@ struct VirtioDeviceClass {
      * that are only exposed on the legacy interface but not
      * the modern one.
      */
-    uint64_t legacy_features;
+    virtio_features_t legacy_features;
     /* Test and clear event pending status.
      * Should be called after unmask to avoid losing events.
      * If backend does not support masking,
-- 
2.49.0
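
The aliasing that DECLARE_FEATURES() relies on can be illustrated with a
small standalone sketch. It is not taken from the patch: the demo_* names
are hypothetical, and it assumes a GCC/Clang 64-bit target providing
__uint128_t, anonymous struct/union members and union-based type punning.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for DECLARE_FEATURES(): 'name' overlays the low
 * 64 bits of 'name_ex' and 'name_hi' overlays the high 64 bits, on either
 * byte order. Assumes __uint128_t is available.
 */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define DEMO_DECLARE_FEATURES(name) \
    union {                         \
        struct {                    \
            uint64_t name##_hi;     \
            uint64_t name;          \
        };                          \
        __uint128_t name##_ex;      \
    }
#else
#define DEMO_DECLARE_FEATURES(name) \
    union {                         \
        struct {                    \
            uint64_t name;          \
            uint64_t name##_hi;     \
        };                          \
        __uint128_t name##_ex;      \
    }
#endif

struct demo_dev {
    DEMO_DECLARE_FEATURES(host_features);
};

/* Write through the 128-bit view, read back through the legacy 64-bit one. */
static uint64_t demo_low_after_ex_write(__uint128_t value)
{
    struct demo_dev d;

    d.host_features_ex = value;
    return d.host_features;
}

/* Same write, but read back the upper half. */
static uint64_t demo_hi_after_ex_write(__uint128_t value)
{
    struct demo_dev d;

    d.host_features_ex = value;
    return d.host_features_hi;
}
```

Existing code that only touches the 64-bit field keeps working unchanged,
while converted code can use the _ex view to reach the upper bits.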




* [PATCH RFC 04/16] virtio: serialize extended features state
  2025-05-21 11:33 [PATCH RFC 00/16] virtio: introduce support for GSO over UDP tunnel Paolo Abeni
                   ` (2 preceding siblings ...)
  2025-05-21 11:33 ` [PATCH RFC 03/16] virtio: introduce extended features type Paolo Abeni
@ 2025-05-21 11:33 ` Paolo Abeni
  2025-05-21 11:33 ` [PATCH RFC 05/16] qmp: update virtio features map to support extended features Paolo Abeni
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 42+ messages in thread
From: Paolo Abeni @ 2025-05-21 11:33 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost,
	Dmitry Fleytman, Akihiko Odaki, Jason Wang, Sriram Yagnaraman,
	Michael S. Tsirkin, Stefano Garzarella, Peter Xu, Fabiano Rosas,
	Cornelia Huck, Luigi Rizzo, Giuseppe Lettieri, Vincenzo Maffione,
	Eric Blake, Markus Armbruster

If the host supports 128-bit-wide features and the driver uses any of
them, serialize the full feature range, leveraging the newly introduced
128-bit integer helpers.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 hw/virtio/virtio.c | 76 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 75 insertions(+), 1 deletion(-)
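
As a rough illustration of the "needed" predicate and of the hi/low split
used when reporting features, here is a minimal standalone sketch. The
demo_* names are hypothetical and the code only assumes a compiler that
provides __uint128_t; it is not the QEMU implementation.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Two zero-padded 64-bit hex words: high half first, then low half. */
#define DEMO_FEATURES_FMT "%016" PRIx64 "%016" PRIx64

/* True when any feature bit above bit 63 is offered. */
static bool demo_ext_features_needed(__uint128_t host_features)
{
    return (host_features >> 64) != 0;
}

/* Render a 128-bit feature set as 32 hex digits into buf. */
static int demo_format_features(char *buf, size_t len, __uint128_t f)
{
    return snprintf(buf, len, DEMO_FEATURES_FMT,
                    (uint64_t)(f >> 64), (uint64_t)f);
}
```

printf() has no portable conversion for 128-bit integers, hence the split
into two 64-bit words.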

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 701f59884d..ef15a1835e 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -2982,6 +2982,26 @@ static const VMStateDescription vmstate_virtio_disabled = {
     }
 };
 
+#ifdef CONFIG_INT128
+static bool virtio_128bit_features_needed(void *opaque)
+{
+    VirtIODevice *vdev = opaque;
+
+    return (vdev->host_features_ex >> 64) != 0;
+}
+
+static const VMStateDescription vmstate_virtio_128bit_features = {
+    .name = "virtio/128bit_features",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = &virtio_128bit_features_needed,
+    .fields = (const VMStateField[]) {
+        VMSTATE_UINT128(guest_features_ex, VirtIODevice),
+        VMSTATE_END_OF_LIST()
+    }
+};
+#endif
+
 static const VMStateDescription vmstate_virtio = {
     .name = "virtio",
     .version_id = 1,
@@ -2991,6 +3011,9 @@ static const VMStateDescription vmstate_virtio = {
     },
     .subsections = (const VMStateDescription * const []) {
         &vmstate_virtio_device_endian,
+#ifdef CONFIG_INT128
+        &vmstate_virtio_128bit_features,
+#endif
         &vmstate_virtio_64bit_features,
         &vmstate_virtio_virtqueues,
         &vmstate_virtio_ringsize,
@@ -3087,7 +3110,8 @@ const VMStateInfo  virtio_vmstate_info = {
     .put = virtio_device_put,
 };
 
-static int virtio_set_features_nocheck(VirtIODevice *vdev, uint64_t val)
+static int virtio_set_features_nocheck(VirtIODevice *vdev,
+                                       virtio_features_t val)
 {
     VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(vdev);
     bool bad = (val & ~(vdev->host_features)) != 0;
@@ -3133,6 +3157,42 @@ virtio_set_features_nocheck_maybe_co(VirtIODevice *vdev, uint64_t val)
     }
 }
 
+#ifdef CONFIG_INT128
+typedef struct VirtioSetFeaturesExNocheckData {
+    Coroutine *co;
+    VirtIODevice *vdev;
+    __uint128_t val;
+    int ret;
+} VirtioSetFeaturesExNocheckData;
+
+static void virtio_set_features_ex_nocheck_bh(void *opaque)
+{
+    VirtioSetFeaturesExNocheckData *data = opaque;
+
+    data->ret = virtio_set_features_nocheck(data->vdev, data->val);
+    aio_co_wake(data->co);
+}
+
+static int coroutine_mixed_fn
+virtio_set_features_ex_nocheck_maybe_co(VirtIODevice *vdev, __uint128_t val)
+{
+    if (qemu_in_coroutine()) {
+        VirtioSetFeaturesExNocheckData data = {
+            .co = qemu_coroutine_self(),
+            .vdev = vdev,
+            .val = val,
+        };
+        aio_bh_schedule_oneshot(qemu_get_current_aio_context(),
+                                virtio_set_features_ex_nocheck_bh, &data);
+        qemu_coroutine_yield();
+        return data.ret;
+    } else {
+        return virtio_set_features_nocheck(vdev, val);
+    }
+}
+
+#endif
+
 int virtio_set_features(VirtIODevice *vdev, uint64_t val)
 {
     int ret;
@@ -3318,6 +3378,20 @@ virtio_load(VirtIODevice *vdev, QEMUFile *f, int version_id)
         vdev->device_endian = virtio_default_endian();
     }
 
+#ifdef CONFIG_INT128
+    if (virtio_128bit_features_needed(vdev)) {
+        __uint128_t features128 = vdev->guest_features_ex;
+        if (virtio_set_features_ex_nocheck_maybe_co(vdev, features128) < 0) {
+            error_report("Features 0x" VIRTIO_FEATURES_FMT " unsupported. "
+                         "Allowed features: 0x" VIRTIO_FEATURES_FMT,
+                         VIRTIO_FEATURES_HI(features128),
+                         VIRTIO_FEATURES_LOW(features128),
+                         VIRTIO_FEATURES_HI(vdev->host_features_ex),
+                         VIRTIO_FEATURES_LOW(vdev->host_features_ex));
+            return -1;
+        }
+    } else
+#endif
     if (virtio_64bit_features_needed(vdev)) {
         /*
          * Subsection load filled vdev->guest_features.  Run them
-- 
2.49.0




* [PATCH RFC 05/16] qmp: update virtio features map to support extended features
  2025-05-21 11:33 [PATCH RFC 00/16] virtio: introduce support for GSO over UDP tunnel Paolo Abeni
                   ` (3 preceding siblings ...)
  2025-05-21 11:33 ` [PATCH RFC 04/16] virtio: serialize extended features state Paolo Abeni
@ 2025-05-21 11:33 ` Paolo Abeni
  2025-05-21 11:34 ` [PATCH RFC 06/16] virtio: add support for negotiating " Paolo Abeni
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 42+ messages in thread
From: Paolo Abeni @ 2025-05-21 11:33 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost,
	Dmitry Fleytman, Akihiko Odaki, Jason Wang, Sriram Yagnaraman,
	Michael S. Tsirkin, Stefano Garzarella, Peter Xu, Fabiano Rosas,
	Cornelia Huck, Luigi Rizzo, Giuseppe Lettieri, Vincenzo Maffione,
	Eric Blake, Markus Armbruster

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 hw/virtio/virtio-hmp-cmds.c |  3 ++-
 hw/virtio/virtio-qmp.c      | 28 ++++++++++++++++++++++------
 hw/virtio/virtio-qmp.h      |  3 ++-
 qapi/virtio.json            |  8 ++++++--
 4 files changed, 32 insertions(+), 10 deletions(-)

diff --git a/hw/virtio/virtio-hmp-cmds.c b/hw/virtio/virtio-hmp-cmds.c
index 7d8677bcf0..04c7fae1c8 100644
--- a/hw/virtio/virtio-hmp-cmds.c
+++ b/hw/virtio/virtio-hmp-cmds.c
@@ -74,7 +74,8 @@ static void hmp_virtio_dump_features(Monitor *mon,
     }
 
     if (features->has_unknown_dev_features) {
-        monitor_printf(mon, "  unknown-features(0x%016"PRIx64")\n",
+        monitor_printf(mon, "  unknown-features(0x%016"PRIx64"%016"PRIx64")\n",
+                       features->unknown_dev_features_hi,
                        features->unknown_dev_features);
     }
 }
diff --git a/hw/virtio/virtio-qmp.c b/hw/virtio/virtio-qmp.c
index 3b6377cf0d..d172a6e9f9 100644
--- a/hw/virtio/virtio-qmp.c
+++ b/hw/virtio/virtio-qmp.c
@@ -325,6 +325,20 @@ static const qmp_virtio_feature_map_t virtio_net_feature_map[] = {
     FEATURE_ENTRY(VHOST_USER_F_PROTOCOL_FEATURES, \
             "VHOST_USER_F_PROTOCOL_FEATURES: Vhost-user protocol features "
             "negotiation supported"),
+    FEATURE_ENTRY(VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO, \
+            "VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO: Driver can receive GSO over "
+            "UDP tunnel packets"),
+    FEATURE_ENTRY(VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM, \
+            "VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM: Driver can receive GSO "
+            "over UDP tunnel packets requiring checksum offload for the "
+            "outer header"),
+    FEATURE_ENTRY(VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO, \
+            "VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO: Device can receive GSO over "
+            "UDP tunnel packets"),
+    FEATURE_ENTRY(VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO_CSUM, \
+            "VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO_CSUM: Device can receive GSO "
+            "over UDP tunnel packets requiring checksum offload for the "
+            "outer header"),
     { -1, "" }
 };
 #endif
@@ -496,7 +510,7 @@ static const qmp_virtio_feature_map_t virtio_gpio_feature_map[] = {
                 bit = map[i].virtio_bit;                 \
             }                                            \
             else {                                       \
-                bit = 1ULL << map[i].virtio_bit;         \
+                bit = VIRTIO_BIT(map[i].virtio_bit);     \
             }                                            \
             if ((bitmap & bit) == 0) {                   \
                 continue;                                \
@@ -545,10 +559,11 @@ VhostDeviceProtocols *qmp_decode_protocols(uint64_t bitmap)
     return vhu_protocols;
 }
 
-VirtioDeviceFeatures *qmp_decode_features(uint16_t device_id, uint64_t bitmap)
+VirtioDeviceFeatures *qmp_decode_features(uint16_t device_id,
+                                          virtio_features_t bitmap)
 {
     VirtioDeviceFeatures *features;
-    uint64_t bit;
+    virtio_features_t bit;
     int i;
 
     features = g_new0(VirtioDeviceFeatures, 1);
@@ -683,6 +698,7 @@ VirtioDeviceFeatures *qmp_decode_features(uint16_t device_id, uint64_t bitmap)
     features->has_unknown_dev_features = bitmap != 0;
     if (features->has_unknown_dev_features) {
         features->unknown_dev_features = bitmap;
+        features->unknown_dev_features_hi = bitmap >> 64;
     }
 
     return features;
@@ -743,11 +759,11 @@ VirtioStatus *qmp_x_query_virtio_status(const char *path, Error **errp)
     status->device_id = vdev->device_id;
     status->vhost_started = vdev->vhost_started;
     status->guest_features = qmp_decode_features(vdev->device_id,
-                                                 vdev->guest_features);
+                                                 vdev->guest_features_ex);
     status->host_features = qmp_decode_features(vdev->device_id,
-                                                vdev->host_features);
+                                                vdev->host_features_ex);
     status->backend_features = qmp_decode_features(vdev->device_id,
-                                                   vdev->backend_features);
+                                                   vdev->backend_features_ex);
 
     switch (vdev->device_endian) {
     case VIRTIO_DEVICE_ENDIAN_LITTLE:
diff --git a/hw/virtio/virtio-qmp.h b/hw/virtio/virtio-qmp.h
index 245a446a56..b64899f04a 100644
--- a/hw/virtio/virtio-qmp.h
+++ b/hw/virtio/virtio-qmp.h
@@ -18,6 +18,7 @@
 VirtIODevice *qmp_find_virtio_device(const char *path);
 VirtioDeviceStatus *qmp_decode_status(uint8_t bitmap);
 VhostDeviceProtocols *qmp_decode_protocols(uint64_t bitmap);
-VirtioDeviceFeatures *qmp_decode_features(uint16_t device_id, uint64_t bitmap);
+VirtioDeviceFeatures *qmp_decode_features(uint16_t device_id,
+                                          virtio_features_t bitmap);
 
 #endif
diff --git a/qapi/virtio.json b/qapi/virtio.json
index d351d2166e..2fde8ed753 100644
--- a/qapi/virtio.json
+++ b/qapi/virtio.json
@@ -488,14 +488,18 @@
 #     unique features)
 #
 # @unknown-dev-features: Virtio device features bitmap that have not
-#     been decoded
+#     been decoded (lower 64 bits)
+#
+# @unknown-dev-features-hi: Virtio device features bitmap that have not
+#     been decoded (upper 64 bits)
 #
 # Since: 7.2
 ##
 { 'struct': 'VirtioDeviceFeatures',
   'data': { 'transports': [ 'str' ],
             '*dev-features': [ 'str' ],
-            '*unknown-dev-features': 'uint64' } }
+            '*unknown-dev-features': 'uint64',
+            '*unknown-dev-features-hi': 'uint64' } }
 
 ##
 # @VirtQueueStatus:
-- 
2.49.0




* [PATCH RFC 06/16] virtio: add support for negotiating extended features.
  2025-05-21 11:33 [PATCH RFC 00/16] virtio: introduce support for GSO over UDP tunnel Paolo Abeni
                   ` (4 preceding siblings ...)
  2025-05-21 11:33 ` [PATCH RFC 05/16] qmp: update virtio features map to support extended features Paolo Abeni
@ 2025-05-21 11:34 ` Paolo Abeni
  2025-05-21 11:34 ` [PATCH RFC 07/16] virtio-pci: implement support for " Paolo Abeni
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 42+ messages in thread
From: Paolo Abeni @ 2025-05-21 11:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost,
	Dmitry Fleytman, Akihiko Odaki, Jason Wang, Sriram Yagnaraman,
	Michael S. Tsirkin, Stefano Garzarella, Peter Xu, Fabiano Rosas,
	Cornelia Huck, Luigi Rizzo, Giuseppe Lettieri, Vincenzo Maffione,
	Eric Blake, Markus Armbruster

The virtio specification allows for up to 128 bits for the
device features. Soon we are going to use some of the 'extended'
feature bits (bits 64 and above) for the virtio net driver.

Add support for extended feature negotiation on a per-device
basis. Devices willing to negotiate extended features need to
implement a new pair of feature getter/setter callbacks; the
core will conditionally use them instead of the basic ones.

Note that 'bad_features' does not need to be extended, as it is
bound to the 64-bit limit.

No functional changes intended for hosts without 128-bit support.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 hw/virtio/virtio-bus.c     | 15 ++++++++++++---
 hw/virtio/virtio.c         | 23 +++++++++++++++++------
 include/hw/virtio/virtio.h |  8 +++++++-
 3 files changed, 36 insertions(+), 10 deletions(-)
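
The dispatch between the extended and the basic setter can be sketched in
isolation as follows. This is only an illustration under stated
assumptions: the demo_* types and functions are hypothetical stand-ins for
the device class callbacks, and __uint128_t is assumed to be available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef __uint128_t demo_features_t;

struct demo_dev;

/* Hypothetical device class: either callback may be NULL. */
struct demo_class {
    void (*set_features_ex)(struct demo_dev *dev, demo_features_t val);
    void (*set_features)(struct demo_dev *dev, uint64_t val);
};

struct demo_dev {
    const struct demo_class *klass;
    demo_features_t host_features;   /* what the device offers */
    demo_features_t guest_features;  /* what was finally accepted */
};

/* Example extended setter that accepts anything; used only for the demo. */
static void demo_noop_set_features_ex(struct demo_dev *dev,
                                      demo_features_t val)
{
    (void)dev;
    (void)val;
}

/*
 * Mask the requested bits with the offered ones, then hand them to the
 * extended setter when present; otherwise truncate to 64 bits and fall
 * back to the basic setter. Returns -1 if unsupported bits were requested.
 */
static int demo_set_features(struct demo_dev *dev, demo_features_t val)
{
    const struct demo_class *k = dev->klass;
    int bad = (val & ~dev->host_features) != 0;

    val &= dev->host_features;
    if (k->set_features_ex) {
        k->set_features_ex(dev, val);
    } else {
        val = (uint64_t)val;
        if (k->set_features) {
            k->set_features(dev, (uint64_t)val);
        }
    }
    dev->guest_features = val;
    return bad ? -1 : 0;
}
```

A device that only provides the basic setter never sees, and never
records, any bit above 63, which is what keeps unconverted devices
unaffected.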

diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c
index 9b84ead831..40948fca39 100644
--- a/hw/virtio/virtio-bus.c
+++ b/hw/virtio/virtio-bus.c
@@ -62,9 +62,18 @@ void virtio_bus_device_plugged(VirtIODevice *vdev, Error **errp)
     }
 
     /* Get the features of the plugged device. */
-    assert(vdc->get_features != NULL);
-    vdev->host_features_ex = vdc->get_features(vdev, vdev->host_features,
-                                               &local_err);
+#ifdef CONFIG_INT128
+    if (vdc->get_features_ex) {
+        vdev->host_features_ex = vdc->get_features_ex(vdev,
+                                                      vdev->host_features_ex,
+                                                      &local_err);
+    } else
+#endif
+    {
+        assert(vdc->get_features != NULL);
+        vdev->host_features_ex = vdc->get_features(vdev, vdev->host_features,
+                                                   &local_err);
+    }
     if (local_err) {
         error_propagate(errp, local_err);
         return;
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index ef15a1835e..90822e54f8 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -3114,13 +3114,24 @@ static int virtio_set_features_nocheck(VirtIODevice *vdev,
                                        virtio_features_t val)
 {
     VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(vdev);
-    bool bad = (val & ~(vdev->host_features)) != 0;
+    bool bad = (val & ~(vdev->host_features_ex)) != 0;
 
-    val &= vdev->host_features;
-    if (k->set_features) {
-        k->set_features(vdev, val);
+    val &= vdev->host_features_ex;
+#ifdef CONFIG_INT128
+    if (!k->set_features_ex) {
+        val = (uint64_t)val;
+    }
+
+    if (k->set_features_ex) {
+        k->set_features_ex(vdev, val);
+    } else
+#endif
+    {
+        if (k->set_features) {
+            k->set_features(vdev, val);
+        }
     }
-    vdev->guest_features = val;
+    vdev->guest_features_ex = val;
     return bad ? -1 : 0;
 }
 
@@ -3193,7 +3204,7 @@ virtio_set_features_ex_nocheck_maybe_co(VirtIODevice *vdev, __uint128_t val)
 
 #endif
 
-int virtio_set_features(VirtIODevice *vdev, uint64_t val)
+int virtio_set_features(VirtIODevice *vdev, virtio_features_t val)
 {
     int ret;
     /*
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 82ff6c1630..e98fd76e7f 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -178,6 +178,12 @@ struct VirtioDeviceClass {
     /* This is what a VirtioDevice must implement */
     DeviceRealize realize;
     DeviceUnrealize unrealize;
+#ifdef CONFIG_INT128
+    virtio_features_t (*get_features_ex)(VirtIODevice *vdev,
+                                         virtio_features_t requested_features,
+                                         Error **errp);
+    void (*set_features_ex)(VirtIODevice *vdev, virtio_features_t val);
+#endif
     uint64_t (*get_features)(VirtIODevice *vdev,
                              uint64_t requested_features,
                              Error **errp);
@@ -366,7 +372,7 @@ void virtio_reset(void *opaque);
 void virtio_queue_reset(VirtIODevice *vdev, uint32_t queue_index);
 void virtio_queue_enable(VirtIODevice *vdev, uint32_t queue_index);
 void virtio_update_irq(VirtIODevice *vdev);
-int virtio_set_features(VirtIODevice *vdev, uint64_t val);
+int virtio_set_features(VirtIODevice *vdev, virtio_features_t val);
 
 /* Base devices.  */
 typedef struct VirtIOBlkConf VirtIOBlkConf;
-- 
2.49.0




* [PATCH RFC 07/16] virtio-pci: implement support for extended features.
  2025-05-21 11:33 [PATCH RFC 00/16] virtio: introduce support for GSO over UDP tunnel Paolo Abeni
                   ` (5 preceding siblings ...)
  2025-05-21 11:34 ` [PATCH RFC 06/16] virtio: add support for negotiating " Paolo Abeni
@ 2025-05-21 11:34 ` Paolo Abeni
  2025-05-23  7:23   ` Akihiko Odaki
  2025-05-21 11:34 ` [PATCH RFC 08/16] vhost: add support for negotiating " Paolo Abeni
                   ` (10 subsequent siblings)
  17 siblings, 1 reply; 42+ messages in thread
From: Paolo Abeni @ 2025-05-21 11:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost,
	Dmitry Fleytman, Akihiko Odaki, Jason Wang, Sriram Yagnaraman,
	Michael S. Tsirkin, Stefano Garzarella, Peter Xu, Fabiano Rosas,
	Cornelia Huck, Luigi Rizzo, Giuseppe Lettieri, Vincenzo Maffione,
	Eric Blake, Markus Armbruster

Allow the common read/write operations to access the whole
available feature space.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 hw/virtio/virtio-pci.c         | 19 +++++++++++++------
 include/hw/virtio/virtio-pci.h |  2 +-
 2 files changed, 14 insertions(+), 7 deletions(-)
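
The word-based access to the feature space can be sketched as below: the
driver selects one 32-bit window at a time, and the full value is rebuilt
from the per-window words. The demo_* names are hypothetical, the word
count of 4 mirrors the 128-bit case, and __uint128_t is assumed to be
available.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_FEATURES_WORDS 4   /* 4 x 32 bits = 128 feature bits */

/* Rebuild the full feature set from the 32-bit words written so far. */
static __uint128_t demo_join_words(const uint32_t words[DEMO_FEATURES_WORDS])
{
    __uint128_t features = 0;
    int i;

    for (i = 0; i < DEMO_FEATURES_WORDS; i++) {
        /* Widen before shifting, or bits beyond the word would be lost. */
        features |= (__uint128_t)words[i] << (i * 32);
    }
    return features;
}

/* Return the 32-bit window picked by 'select', or 0 when out of range. */
static uint32_t demo_select_word(__uint128_t features, uint32_t select)
{
    if (select >= DEMO_FEATURES_WORDS) {
        return 0;
    }
    return (uint32_t)(features >> (32 * select));
}
```

Selectors 2 and 3 are the ones that expose feature bits 64 to 127.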

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 0fa8fe4955..7815ef2d9b 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -123,7 +123,8 @@ static const VMStateDescription vmstate_virtio_pci_modern_state_sub = {
     .fields = (const VMStateField[]) {
         VMSTATE_UINT32(dfselect, VirtIOPCIProxy),
         VMSTATE_UINT32(gfselect, VirtIOPCIProxy),
-        VMSTATE_UINT32_ARRAY(guest_features, VirtIOPCIProxy, 2),
+        VMSTATE_UINT32_ARRAY(guest_features, VirtIOPCIProxy,
+                             VIRTIO_FEATURES_WORDS),
         VMSTATE_STRUCT_ARRAY(vqs, VirtIOPCIProxy, VIRTIO_QUEUE_MAX, 0,
                              vmstate_virtio_pci_modern_queue_state,
                              VirtIOPCIQueue),
@@ -1490,10 +1491,10 @@ static uint64_t virtio_pci_common_read(void *opaque, hwaddr addr,
         val = proxy->dfselect;
         break;
     case VIRTIO_PCI_COMMON_DF:
-        if (proxy->dfselect <= 1) {
+        if (proxy->dfselect < VIRTIO_FEATURES_WORDS) {
             VirtioDeviceClass *vdc = VIRTIO_DEVICE_GET_CLASS(vdev);
 
-            val = (vdev->host_features & ~vdc->legacy_features) >>
+            val = (vdev->host_features_ex & ~vdc->legacy_features) >>
                 (32 * proxy->dfselect);
         }
         break;
@@ -1585,10 +1586,16 @@ static void virtio_pci_common_write(void *opaque, hwaddr addr,
         break;
     case VIRTIO_PCI_COMMON_GF:
         if (proxy->gfselect < ARRAY_SIZE(proxy->guest_features)) {
+            virtio_features_t features = 0;
+            int i;
+
             proxy->guest_features[proxy->gfselect] = val;
-            virtio_set_features(vdev,
-                                (((uint64_t)proxy->guest_features[1]) << 32) |
-                                proxy->guest_features[0]);
+            for (i = 0; i < VIRTIO_FEATURES_WORDS; ++i) {
+                virtio_features_t cur = proxy->guest_features[i];
+
+                features |= cur << (i * 32);
+            }
+            virtio_set_features(vdev, features);
         }
         break;
     case VIRTIO_PCI_COMMON_MSIX:
diff --git a/include/hw/virtio/virtio-pci.h b/include/hw/virtio/virtio-pci.h
index 31ec144509..c20b289e64 100644
--- a/include/hw/virtio/virtio-pci.h
+++ b/include/hw/virtio/virtio-pci.h
@@ -165,7 +165,7 @@ struct VirtIOPCIProxy {
     uint32_t nvectors;
     uint32_t dfselect;
     uint32_t gfselect;
-    uint32_t guest_features[2];
+    uint32_t guest_features[VIRTIO_FEATURES_WORDS];
     VirtIOPCIQueue vqs[VIRTIO_QUEUE_MAX];
 
     VirtIOIRQFD *vector_irqfd;
-- 
2.49.0




* [PATCH RFC 08/16] vhost: add support for negotiating extended features.
  2025-05-21 11:33 [PATCH RFC 00/16] virtio: introduce support for GSO over UDP tunnel Paolo Abeni
                   ` (6 preceding siblings ...)
  2025-05-21 11:34 ` [PATCH RFC 07/16] virtio-pci: implement support for " Paolo Abeni
@ 2025-05-21 11:34 ` Paolo Abeni
  2025-05-21 11:34 ` [PATCH RFC 09/16] vhost-backend: implement extended features support Paolo Abeni
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 42+ messages in thread
From: Paolo Abeni @ 2025-05-21 11:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost,
	Dmitry Fleytman, Akihiko Odaki, Jason Wang, Sriram Yagnaraman,
	Michael S. Tsirkin, Stefano Garzarella, Peter Xu, Fabiano Rosas,
	Cornelia Huck, Luigi Rizzo, Giuseppe Lettieri, Vincenzo Maffione,
	Eric Blake, Markus Armbruster

Similar to the virtio infra, the vhost core maintains the feature
status in the widest format available and allows the devices
to implement extended versions of the getter/setter.

Some care is needed for feature bit manipulation: when clearing
a bit with 'and not' bitwise operations, the bit mask must be
extended to the feature format, or all the highest bits will be
unintentionally cleared.

Note that 'protocol_features' is not extended: it is only
used by vhost-user, and the latter device is not going to implement
extended features soon.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 hw/virtio/vhost.c                 | 58 ++++++++++++++++++++++++-------
 include/hw/virtio/vhost-backend.h | 10 ++++++
 include/hw/virtio/vhost.h         | 13 +++----
 3 files changed, 62 insertions(+), 19 deletions(-)
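
The 'and not' pitfall mentioned above can be reproduced with a minimal
standalone sketch. The demo_* helpers are hypothetical and assume a
compiler providing __uint128_t; 'bit' is assumed to be below 64 in the
narrow variant.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Buggy variant: the mask is built as a 64-bit value, so after the
 * complement it is zero-extended to 128 bits and the 'and' wipes every
 * bit above 63 together with the intended one.
 */
static __uint128_t demo_clear_narrow(__uint128_t features, unsigned int bit)
{
    return features & ~(1ULL << bit);
}

/* Correct variant: widen first, then complement, so high bits survive. */
static __uint128_t demo_clear_wide(__uint128_t features, unsigned int bit)
{
    return features & ~((__uint128_t)1 << bit);
}
```

This is why the masks in the diff below are built with VIRTIO_BIT() rather
than with open-coded 1ULL shifts.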

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 4cae7c1664..20592473f3 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -985,20 +985,34 @@ static int vhost_virtqueue_set_addr(struct vhost_dev *dev,
 static int vhost_dev_set_features(struct vhost_dev *dev,
                                   bool enable_log)
 {
-    uint64_t features = dev->acked_features;
+    virtio_features_t features = dev->acked_features;
     int r;
     if (enable_log) {
-        features |= 0x1ULL << VHOST_F_LOG_ALL;
+        features |= VIRTIO_BIT(VHOST_F_LOG_ALL);
     }
     if (!vhost_dev_has_iommu(dev)) {
-        features &= ~(0x1ULL << VIRTIO_F_IOMMU_PLATFORM);
+        features &= ~VIRTIO_BIT(VIRTIO_F_IOMMU_PLATFORM);
     }
     if (dev->vhost_ops->vhost_force_iommu) {
         if (dev->vhost_ops->vhost_force_iommu(dev) == true) {
-            features |= 0x1ULL << VIRTIO_F_IOMMU_PLATFORM;
+            features |= VIRTIO_BIT(VIRTIO_F_IOMMU_PLATFORM);
        }
     }
-    r = dev->vhost_ops->vhost_set_features(dev, features);
+
+#ifdef CONFIG_INT128
+    if ((features >> 64) && !dev->vhost_ops->vhost_set_features_ex) {
+        r = -EINVAL;
+        VHOST_OPS_DEBUG(r, "extended features without device support");
+        goto out;
+    }
+
+    if (dev->vhost_ops->vhost_set_features_ex) {
+        r = dev->vhost_ops->vhost_set_features_ex(dev, features);
+    } else
+#endif
+    {
+        r = dev->vhost_ops->vhost_set_features(dev, features);
+    }
     if (r < 0) {
         VHOST_OPS_DEBUG(r, "vhost_set_features failed");
         goto out;
@@ -1505,12 +1519,29 @@ static void vhost_virtqueue_cleanup(struct vhost_virtqueue *vq)
     }
 }
 
+static int vhost_dev_get_features(struct vhost_dev *hdev,
+                                  virtio_features_t *features)
+{
+    uint64_t features64;
+    int r;
+
+#ifdef CONFIG_INT128
+    if (hdev->vhost_ops->vhost_get_features_ex) {
+        return hdev->vhost_ops->vhost_get_features_ex(hdev, features);
+    }
+#endif
+
+    r = hdev->vhost_ops->vhost_get_features(hdev, &features64);
+    *features = features64;
+    return r;
+}
+
 int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
                    VhostBackendType backend_type, uint32_t busyloop_timeout,
                    Error **errp)
 {
     unsigned int used, reserved, limit;
-    uint64_t features;
+    virtio_features_t features;
     int i, r, n_initialized_vqs = 0;
 
     hdev->vdev = NULL;
@@ -1530,7 +1561,7 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
         goto fail;
     }
 
-    r = hdev->vhost_ops->vhost_get_features(hdev, &features);
+    r = vhost_dev_get_features(hdev, &features);
     if (r < 0) {
         error_setg_errno(errp, -r, "vhost_get_features failed");
         goto fail;
@@ -1591,7 +1622,7 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
     };
 
     if (hdev->migration_blocker == NULL) {
-        if (!(hdev->features & (0x1ULL << VHOST_F_LOG_ALL))) {
+        if (!(hdev->features & VIRTIO_BIT(VHOST_F_LOG_ALL))) {
             error_setg(&hdev->migration_blocker,
                        "Migration disabled: vhost lacks VHOST_F_LOG_ALL feature.");
         } else if (vhost_dev_log_is_shared(hdev) && !qemu_memfd_alloc_check()) {
@@ -1860,12 +1891,13 @@ static void vhost_start_config_intr(struct vhost_dev *dev)
     }
 }
 
-uint64_t vhost_get_features(struct vhost_dev *hdev, const int *feature_bits,
-                            uint64_t features)
+virtio_features_t vhost_get_features(struct vhost_dev *hdev,
+                                     const int *feature_bits,
+                                     virtio_features_t features)
 {
     const int *bit = feature_bits;
     while (*bit != VHOST_INVALID_FEATURE_BIT) {
-        uint64_t bit_mask = (1ULL << *bit);
+        virtio_features_t bit_mask = VIRTIO_BIT(*bit);
         if (!(hdev->features & bit_mask)) {
             features &= ~bit_mask;
         }
@@ -1875,11 +1907,11 @@ uint64_t vhost_get_features(struct vhost_dev *hdev, const int *feature_bits,
 }
 
 void vhost_ack_features(struct vhost_dev *hdev, const int *feature_bits,
-                        uint64_t features)
+                        virtio_features_t features)
 {
     const int *bit = feature_bits;
     while (*bit != VHOST_INVALID_FEATURE_BIT) {
-        uint64_t bit_mask = (1ULL << *bit);
+        virtio_features_t bit_mask = VIRTIO_BIT(*bit);
         if (features & bit_mask) {
             hdev->acked_features |= bit_mask;
         }
diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
index d6df209a2f..de9bcaf95f 100644
--- a/include/hw/virtio/vhost-backend.h
+++ b/include/hw/virtio/vhost-backend.h
@@ -95,6 +95,12 @@ typedef int (*vhost_new_worker_op)(struct vhost_dev *dev,
                                    struct vhost_worker_state *worker);
 typedef int (*vhost_free_worker_op)(struct vhost_dev *dev,
                                     struct vhost_worker_state *worker);
+#ifdef CONFIG_INT128
+typedef int (*vhost_set_features_ex_op)(struct vhost_dev *dev,
+                                        __uint128_t features);
+typedef int (*vhost_get_features_ex_op)(struct vhost_dev *dev,
+                                        __uint128_t *features);
+#endif
 typedef int (*vhost_set_features_op)(struct vhost_dev *dev,
                                      uint64_t features);
 typedef int (*vhost_get_features_op)(struct vhost_dev *dev,
@@ -186,6 +192,10 @@ typedef struct VhostOps {
     vhost_free_worker_op vhost_free_worker;
     vhost_get_vring_worker_op vhost_get_vring_worker;
     vhost_attach_vring_worker_op vhost_attach_vring_worker;
+#ifdef CONFIG_INT128
+    vhost_set_features_ex_op vhost_set_features_ex;
+    vhost_get_features_ex_op vhost_get_features_ex;
+#endif
     vhost_set_features_op vhost_set_features;
     vhost_get_features_op vhost_get_features;
     vhost_set_backend_cap_op vhost_set_backend_cap;
diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index bb4b58e115..ea5ad117c5 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -106,9 +106,9 @@ struct vhost_dev {
      * future use should be discouraged and the variable retired as
      * its easy to confuse with the VirtIO backend_features.
      */
-    uint64_t features;
-    uint64_t acked_features;
-    uint64_t backend_features;
+    virtio_features_t features;
+    virtio_features_t acked_features;
+    virtio_features_t backend_features;
 
     /**
      * @protocol_features: is the vhost-user only feature set by
@@ -308,8 +308,9 @@ void vhost_virtqueue_mask(struct vhost_dev *hdev, VirtIODevice *vdev, int n,
  * is supported by the vhost backend (hdev->features), the supported
  * feature_bits and the requested feature set.
  */
-uint64_t vhost_get_features(struct vhost_dev *hdev, const int *feature_bits,
-                            uint64_t features);
+virtio_features_t vhost_get_features(struct vhost_dev *hdev,
+                                     const int *feature_bits,
+                                     virtio_features_t features);
 
 /**
  * vhost_ack_features() - set vhost acked_features
@@ -321,7 +322,7 @@ uint64_t vhost_get_features(struct vhost_dev *hdev, const int *feature_bits,
  * the backends advertised features and the supported feature_bits.
  */
 void vhost_ack_features(struct vhost_dev *hdev, const int *feature_bits,
-                        uint64_t features);
+                        virtio_features_t features);
 unsigned int vhost_get_max_memslots(void);
 unsigned int vhost_get_free_memslots(void);
 
-- 
2.49.0




* [PATCH RFC 09/16] vhost-backend: implement extended features support.
  2025-05-21 11:33 [PATCH RFC 00/16] virtio: introduce support for GSO over UDP tunnel Paolo Abeni
                   ` (7 preceding siblings ...)
  2025-05-21 11:34 ` [PATCH RFC 08/16] vhost: add support for negotiating " Paolo Abeni
@ 2025-05-21 11:34 ` Paolo Abeni
  2025-05-21 11:34 ` [PATCH RFC 10/16] vhost-net: " Paolo Abeni
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 42+ messages in thread
From: Paolo Abeni @ 2025-05-21 11:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost,
	Dmitry Fleytman, Akihiko Odaki, Jason Wang, Sriram Yagnaraman,
	Michael S. Tsirkin, Stefano Garzarella, Peter Xu, Fabiano Rosas,
	Cornelia Huck, Luigi Rizzo, Giuseppe Lettieri, Vincenzo Maffione,
	Eric Blake, Markus Armbruster

Leverage the kernel's extended features manipulation ioctls, if
available, and fall back to the old ops otherwise. Error out when setting
extended features if kernel support is not available.

Note that extended support for get/set backend features is not needed,
as the only feature that can be changed belongs to the 64-bit range.
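
The fallback described above can be sketched in isolation as follows. This is
a minimal sketch, not the patch itself: struct fake_backend and the two
fake_set_features*() helpers are hypothetical stand-ins for the vhost device
and the kernel ioctls, and unsigned __int128 is assumed to be available
(a GCC/Clang extension).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* 128-bit feature set; unsigned __int128 is a GCC/Clang extension. */
typedef unsigned __int128 features128_t;

/* Hypothetical stand-in for a vhost device; not a QEMU type. */
struct fake_backend {
    int has_ex_call;        /* non-zero if the extended call is supported */
    features128_t stored;   /* features accepted by the backend */
};

/* Mock of the extended call: fails when the backend lacks support. */
static int fake_set_features_ex(struct fake_backend *b, features128_t f)
{
    if (!b->has_ex_call) {
        return -EINVAL;     /* the error code is not assumed to be ENOTTY */
    }
    b->stored = f;
    return 0;
}

/* Mock of the legacy 64-bit call: always available. */
static int fake_set_features(struct fake_backend *b, uint64_t f)
{
    b->stored = f;
    return 0;
}

/* Try the extended call first, then fall back to the 64-bit one. */
static int set_features_with_fallback(struct fake_backend *b, features128_t f)
{
    assert(b != NULL);

    if (fake_set_features_ex(b, f) == 0) {
        return 0;
    }
    if (f >> 64) {
        /* High bits cannot be expressed through the legacy call. */
        return -EINVAL;
    }
    return fake_set_features(b, (uint64_t)f);
}
```

The sketch retries on any failure of the extended call rather than on a
specific errno, and only then checks whether the high 64 bits are set.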

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 hw/virtio/vhost-backend.c | 59 +++++++++++++++++++++++++++++++++++----
 1 file changed, 53 insertions(+), 6 deletions(-)

diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c
index 833804dd40..a5e28e15ee 100644
--- a/hw/virtio/vhost-backend.c
+++ b/hw/virtio/vhost-backend.c
@@ -182,12 +182,6 @@ static int vhost_kernel_get_vring_worker(struct vhost_dev *dev,
     return vhost_kernel_call(dev, VHOST_GET_VRING_WORKER, worker);
 }
 
-static int vhost_kernel_set_features(struct vhost_dev *dev,
-                                     uint64_t features)
-{
-    return vhost_kernel_call(dev, VHOST_SET_FEATURES, &features);
-}
-
 static int vhost_kernel_set_backend_cap(struct vhost_dev *dev)
 {
     uint64_t features;
@@ -210,11 +204,59 @@ static int vhost_kernel_set_backend_cap(struct vhost_dev *dev)
     return 0;
 }
 
+#ifdef CONFIG_INT128
+static int vhost_kernel_set_features_ex(struct vhost_dev *dev,
+                                        virtio_features_t features)
+{
+    uint64_t features64;
+    int r;
+
+    /*
+     * Can't check for ENOTTY, as the kernel for unknown ioctls interprets
+     * the argument as a virtio queue id and most likely errors out validating
+     * such id, instead of reporting an unknown operation.
+     */
+    r = vhost_kernel_call(dev, VHOST_SET_FEATURES_EX, &features);
+    if (!r) {
+        return 0;
+    }
+
+    if (features >> 64) {
+        error_report("Trying to set extended features without kernel support");
+        return -EINVAL;
+    }
+    features64 = (uint64_t)features;
+    return vhost_kernel_call(dev, VHOST_SET_FEATURES, &features64);
+}
+
+static int vhost_kernel_get_features_ex(struct vhost_dev *dev,
+                                        virtio_features_t *features)
+{
+    uint64_t features64;
+    int r;
+
+    r = vhost_kernel_call(dev, VHOST_GET_FEATURES_EX, features);
+    if (!r) {
+        return 0;
+    }
+
+    r = vhost_kernel_call(dev, VHOST_GET_FEATURES, &features64);
+    *features = r ? 0 : features64;
+    return r;
+}
+#else
+static int vhost_kernel_set_features(struct vhost_dev *dev,
+                                     uint64_t features)
+{
+    return vhost_kernel_call(dev, VHOST_SET_FEATURES, &features);
+}
+
 static int vhost_kernel_get_features(struct vhost_dev *dev,
                                      uint64_t *features)
 {
     return vhost_kernel_call(dev, VHOST_GET_FEATURES, features);
 }
+#endif
 
 static int vhost_kernel_set_owner(struct vhost_dev *dev)
 {
@@ -341,8 +383,13 @@ const VhostOps kernel_ops = {
         .vhost_attach_vring_worker = vhost_kernel_attach_vring_worker,
         .vhost_new_worker = vhost_kernel_new_worker,
         .vhost_free_worker = vhost_kernel_free_worker,
+#ifdef CONFIG_INT128
+        .vhost_set_features_ex = vhost_kernel_set_features_ex,
+        .vhost_get_features_ex = vhost_kernel_get_features_ex,
+#else
         .vhost_set_features = vhost_kernel_set_features,
         .vhost_get_features = vhost_kernel_get_features,
+#endif
         .vhost_set_backend_cap = vhost_kernel_set_backend_cap,
         .vhost_set_owner = vhost_kernel_set_owner,
         .vhost_get_vq_index = vhost_kernel_get_vq_index,
-- 
2.49.0




* [PATCH RFC 10/16] vhost-net: implement extended features support.
  2025-05-21 11:33 [PATCH RFC 00/16] virtio: introduce support for GSO over UDP tunnel Paolo Abeni
                   ` (8 preceding siblings ...)
  2025-05-21 11:34 ` [PATCH RFC 09/16] vhost-backend: implement extended features support Paolo Abeni
@ 2025-05-21 11:34 ` Paolo Abeni
  2025-05-21 11:34 ` [PATCH RFC 11/16] qdev-properties: add property for extended virtio features Paolo Abeni
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 42+ messages in thread
From: Paolo Abeni @ 2025-05-21 11:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost,
	Dmitry Fleytman, Akihiko Odaki, Jason Wang, Sriram Yagnaraman,
	Michael S. Tsirkin, Stefano Garzarella, Peter Xu, Fabiano Rosas,
	Cornelia Huck, Luigi Rizzo, Giuseppe Lettieri, Vincenzo Maffione,
	Eric Blake, Markus Armbruster

Update the features manipulation helpers to cope with the
extended features, adjust the relevant format strings accordingly
and always use the virtio features type for bitmask manipulation.
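
The format string adjustment can be illustrated with a small sketch. Standard
printf has no conversion for 128-bit integers, so the value is printed as two
64-bit halves. The FEATURES128_* names below are hypothetical stand-ins: the
series' own VIRTIO_FEATURES_FMT, VIRTIO_FEATURES_HI and VIRTIO_FEATURES_LOW
are defined in an earlier patch and may differ. unsigned __int128 is assumed
to be available (a GCC/Clang extension).

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

typedef unsigned __int128 features128_t;

/* Illustrative macros only; not the definitions used by the series. */
#define FEATURES128_FMT    "%016" PRIx64 "%016" PRIx64
#define FEATURES128_HI(f)  ((uint64_t)((f) >> 64))
#define FEATURES128_LOW(f) ((uint64_t)(f))

/* Format a 128-bit feature mask as "0x" followed by 32 hex digits. */
static int format_features(char *buf, size_t len, features128_t f)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "0x" FEATURES128_FMT,
                    FEATURES128_HI(f), FEATURES128_LOW(f));
}
```

Zero-padding the low half to 16 digits matters: without it, a mask with a
small low word would print as an ambiguous, shorter number.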

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 hw/net/vhost_net-stub.c |  7 ++++---
 hw/net/vhost_net.c      | 31 ++++++++++++++++++-------------
 include/net/vhost_net.h |  8 +++++---
 3 files changed, 27 insertions(+), 19 deletions(-)

diff --git a/hw/net/vhost_net-stub.c b/hw/net/vhost_net-stub.c
index 72df6d757e..3997b3b814 100644
--- a/hw/net/vhost_net-stub.c
+++ b/hw/net/vhost_net-stub.c
@@ -47,7 +47,8 @@ void vhost_net_cleanup(struct vhost_net *net)
 {
 }
 
-uint64_t vhost_net_get_features(struct vhost_net *net, uint64_t features)
+virtio_features_t vhost_net_get_features(struct vhost_net *net,
+                                         virtio_features_t features)
 {
     return features;
 }
@@ -63,11 +64,11 @@ int vhost_net_set_config(struct vhost_net *net, const uint8_t *data,
     return 0;
 }
 
-void vhost_net_ack_features(struct vhost_net *net, uint64_t features)
+void vhost_net_ack_features(struct vhost_net *net, virtio_features_t features)
 {
 }
 
-uint64_t vhost_net_get_acked_features(VHostNetState *net)
+virtio_features_t vhost_net_get_acked_features(VHostNetState *net)
 {
     return 0;
 }
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 891f235a0a..58d7619fc8 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -121,7 +121,8 @@ static const int *vhost_net_get_feature_bits(struct vhost_net *net)
     return feature_bits;
 }
 
-uint64_t vhost_net_get_features(struct vhost_net *net, uint64_t features)
+virtio_features_t vhost_net_get_features(struct vhost_net *net,
+                                         virtio_features_t features)
 {
     return vhost_get_features(&net->dev, vhost_net_get_feature_bits(net),
             features);
@@ -137,7 +138,7 @@ int vhost_net_set_config(struct vhost_net *net, const uint8_t *data,
     return vhost_dev_set_config(&net->dev, data, offset, size, flags);
 }
 
-void vhost_net_ack_features(struct vhost_net *net, uint64_t features)
+void vhost_net_ack_features(struct vhost_net *net, virtio_features_t features)
 {
     net->dev.acked_features = net->dev.backend_features;
     vhost_ack_features(&net->dev, vhost_net_get_feature_bits(net), features);
@@ -148,7 +149,7 @@ uint64_t vhost_net_get_max_queues(VHostNetState *net)
     return net->dev.max_queues;
 }
 
-uint64_t vhost_net_get_acked_features(VHostNetState *net)
+virtio_features_t vhost_net_get_acked_features(VHostNetState *net)
 {
     return net->dev.acked_features;
 }
@@ -317,10 +318,11 @@ static int vhost_net_get_fd(NetClientState *backend)
 
 struct vhost_net *vhost_net_init(VhostNetOptions *options)
 {
+    virtio_features_t missing_features;
     int r;
     bool backend_kernel = options->backend_type == VHOST_BACKEND_TYPE_KERNEL;
     struct vhost_net *net = g_new0(struct vhost_net, 1);
-    uint64_t features = 0;
+    virtio_features_t features = 0;
     Error *local_err = NULL;
 
     if (!options->net_backend) {
@@ -361,12 +363,14 @@ struct vhost_net *vhost_net_init(VhostNetOptions *options)
     if (backend_kernel) {
         if (!qemu_has_vnet_hdr_len(options->net_backend,
                                sizeof(struct virtio_net_hdr_mrg_rxbuf))) {
-            net->dev.features &= ~(1ULL << VIRTIO_NET_F_MRG_RXBUF);
+            net->dev.features &= ~VIRTIO_BIT(VIRTIO_NET_F_MRG_RXBUF);
         }
-        if (~net->dev.features & net->dev.backend_features) {
-            fprintf(stderr, "vhost lacks feature mask 0x%" PRIx64
-                   " for backend\n",
-                   (uint64_t)(~net->dev.features & net->dev.backend_features));
+
+        missing_features = ~net->dev.features & net->dev.backend_features;
+        if (missing_features) {
+            fprintf(stderr, "vhost lacks feature mask 0x" VIRTIO_FEATURES_FMT
+                   " for backend\n", VIRTIO_FEATURES_HI(missing_features),
+                   VIRTIO_FEATURES_LOW(missing_features));
             goto fail;
         }
     }
@@ -375,10 +379,11 @@ struct vhost_net *vhost_net_init(VhostNetOptions *options)
 #ifdef CONFIG_VHOST_NET_USER
     if (net->nc->info->type == NET_CLIENT_DRIVER_VHOST_USER) {
         features = vhost_user_get_acked_features(net->nc);
-        if (~net->dev.features & features) {
-            fprintf(stderr, "vhost lacks feature mask 0x%" PRIx64
-                    " for backend\n",
-                    (uint64_t)(~net->dev.features & features));
+        missing_features = ~net->dev.features & features;
+        if (missing_features) {
+            fprintf(stderr, "vhost lacks feature mask 0x" VIRTIO_FEATURES_FMT
+                    " for backend\n", VIRTIO_FEATURES_HI(missing_features),
+                    VIRTIO_FEATURES_LOW(missing_features));
             goto fail;
         }
     }
diff --git a/include/net/vhost_net.h b/include/net/vhost_net.h
index c6a5361a2a..d7d733b7ad 100644
--- a/include/net/vhost_net.h
+++ b/include/net/vhost_net.h
@@ -2,6 +2,7 @@
 #define VHOST_NET_H
 
 #include "net/net.h"
+#include "hw/virtio/virtio-features.h"
 #include "hw/virtio/vhost-backend.h"
 
 struct vhost_net;
@@ -25,8 +26,9 @@ void vhost_net_stop(VirtIODevice *dev, NetClientState *ncs,
 
 void vhost_net_cleanup(VHostNetState *net);
 
-uint64_t vhost_net_get_features(VHostNetState *net, uint64_t features);
-void vhost_net_ack_features(VHostNetState *net, uint64_t features);
+virtio_features_t vhost_net_get_features(VHostNetState *net,
+                                         virtio_features_t features);
+void vhost_net_ack_features(VHostNetState *net, virtio_features_t features);
 
 int vhost_net_get_config(struct vhost_net *net,  uint8_t *config,
                          uint32_t config_len);
@@ -43,7 +45,7 @@ VHostNetState *get_vhost_net(NetClientState *nc);
 
 int vhost_set_vring_enable(NetClientState * nc, int enable);
 
-uint64_t vhost_net_get_acked_features(VHostNetState *net);
+virtio_features_t vhost_net_get_acked_features(VHostNetState *net);
 
 int vhost_net_set_mtu(struct vhost_net *net, uint16_t mtu);
 
-- 
2.49.0




* [PATCH RFC 11/16] qdev-properties: add property for extended virtio features
  2025-05-21 11:33 [PATCH RFC 00/16] virtio: introduce support for GSO over UDP tunnel Paolo Abeni
                   ` (9 preceding siblings ...)
  2025-05-21 11:34 ` [PATCH RFC 10/16] vhost-net: " Paolo Abeni
@ 2025-05-21 11:34 ` Paolo Abeni
  2025-05-21 11:34 ` [PATCH RFC 12/16] virtio-net: implement extended features support Paolo Abeni
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 42+ messages in thread
From: Paolo Abeni @ 2025-05-21 11:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost,
	Dmitry Fleytman, Akihiko Odaki, Jason Wang, Sriram Yagnaraman,
	Michael S. Tsirkin, Stefano Garzarella, Peter Xu, Fabiano Rosas,
	Cornelia Huck, Luigi Rizzo, Giuseppe Lettieri, Vincenzo Maffione,
	Eric Blake, Markus Armbruster

Virtio features extend beyond the 64-bit space, and GSO over
UDP tunnel support is going to use some bits in the extended
space.

Introduce a new Property type to handle the extended features
defined in the previous patch.
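
What such a property has to do per bit can be sketched as below. This is a
standalone illustration, not the QEMU Property API: feature_bit_set() and
feature_bit_get() are hypothetical helpers, and unsigned __int128 is assumed
to be available (a GCC/Clang extension).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef unsigned __int128 features128_t;

/* Set or clear a single bit in a 128-bit feature mask. */
static void feature_bit_set(features128_t *f, unsigned bitnr, bool val)
{
    features128_t mask;

    assert(bitnr < 128);
    mask = (features128_t)1 << bitnr;
    if (val) {
        *f |= mask;
    } else {
        *f &= ~mask;
    }
}

/* Read a single bit back as a boolean. */
static bool feature_bit_get(features128_t f, unsigned bitnr)
{
    assert(bitnr < 128);
    return (f >> bitnr) & 1;
}
```

The sketch uses an unsigned 128-bit type so that bit 127 can be set without
shifting into a sign bit.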

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 hw/core/qdev-properties.c    | 46 ++++++++++++++++++++++++++++++++++++
 include/hw/qdev-properties.h | 13 ++++++++++
 2 files changed, 59 insertions(+)

diff --git a/hw/core/qdev-properties.c b/hw/core/qdev-properties.c
index 147b3ffd16..2a0182479c 100644
--- a/hw/core/qdev-properties.c
+++ b/hw/core/qdev-properties.c
@@ -225,6 +225,52 @@ const PropertyInfo qdev_prop_bit64 = {
     .set_default_value = set_default_value_bool,
 };
 
+/* Bit virtio features __int128_t */
+#ifdef CONFIG_INT128
+static void vf_prop_set(Object *obj, const Property *props, bool val)
+{
+    __int128_t *vf = object_field_prop_ptr(obj, props);
+    assert(props->info == &qdev_prop_bitvf);
+    if (val) {
+        *vf |= (__int128_t)1 << props->bitnr;
+    } else {
+        *vf &= ~((__int128_t)1 << props->bitnr);
+    }
+}
+
+static void prop_get_bitvf(Object *obj, Visitor *v, const char *name,
+                           void *opaque, Error **errp)
+{
+    const Property *prop = opaque;
+    __int128_t *vf = object_field_prop_ptr(obj, prop);
+    bool value;
+
+    assert(prop->info == &qdev_prop_bitvf);
+    value = *vf & ((__int128_t)1 << prop->bitnr);
+    visit_type_bool(v, name, &value, errp);
+}
+
+static void prop_set_bitvf(Object *obj, Visitor *v, const char *name,
+                           void *opaque, Error **errp)
+{
+    const Property *prop = opaque;
+    bool value;
+
+    if (!visit_type_bool(v, name, &value, errp)) {
+        return;
+    }
+    vf_prop_set(obj, prop, value);
+}
+
+const PropertyInfo qdev_prop_bitvf = {
+    .type  = "bool",
+    .description = "on/off",
+    .get   = prop_get_bitvf,
+    .set   = prop_set_bitvf,
+    .set_default_value = set_default_value_bool,
+};
+#endif
+
 /* --- bool --- */
 
 static void get_bool(Object *obj, Visitor *v, const char *name, void *opaque,
diff --git a/include/hw/qdev-properties.h b/include/hw/qdev-properties.h
index 2c99856caa..7760dd6dbd 100644
--- a/include/hw/qdev-properties.h
+++ b/include/hw/qdev-properties.h
@@ -100,6 +100,19 @@ extern const PropertyInfo qdev_prop_link;
                 .set_default = true,                              \
                 .defval.u  = (bool)_defval)
 
+#ifdef CONFIG_INT128
+extern const PropertyInfo qdev_prop_bitvf;
+
+#define DEFINE_PROP_BITVF(_name, _state, _field, _bit, _defval)   \
+    DEFINE_PROP(_name, _state, _field, qdev_prop_bitvf,           \
+                 virtio_features_t,                               \
+                .bitnr    = (_bit),                               \
+                .set_default = true,                              \
+                .defval.u  = (bool)_defval)
+#else
+#define qdev_prop_bitvf qdev_prop_bit64
+#endif
+
 #define DEFINE_PROP_BOOL(_name, _state, _field, _defval)     \
     DEFINE_PROP(_name, _state, _field, qdev_prop_bool, bool, \
                 .set_default = true,                         \
-- 
2.49.0




* [PATCH RFC 12/16] virtio-net: implement extended features support.
  2025-05-21 11:33 [PATCH RFC 00/16] virtio: introduce support for GSO over UDP tunnel Paolo Abeni
                   ` (10 preceding siblings ...)
  2025-05-21 11:34 ` [PATCH RFC 11/16] qdev-properties: add property for extended virtio features Paolo Abeni
@ 2025-05-21 11:34 ` Paolo Abeni
  2025-05-23  8:09   ` Akihiko Odaki
  2025-05-21 11:34 ` [PATCH RFC 13/16] net: implement tunnel probing Paolo Abeni
                   ` (5 subsequent siblings)
  17 siblings, 1 reply; 42+ messages in thread
From: Paolo Abeni @ 2025-05-21 11:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost,
	Dmitry Fleytman, Akihiko Odaki, Jason Wang, Sriram Yagnaraman,
	Michael S. Tsirkin, Stefano Garzarella, Peter Xu, Fabiano Rosas,
	Cornelia Huck, Luigi Rizzo, Giuseppe Lettieri, Vincenzo Maffione,
	Eric Blake, Markus Armbruster

Use the extended types and helpers to manipulate the virtio_net
features.

Note that offloads are still 64 bits wide, as per the specification,
and extended offloads will be mapped into that range.
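
The mapping can be sketched standalone as follows. The constants mirror the
VIRTIO_OFFLOAD_MAP_* and VIRTIO_FEATURES_MAP_MIN values introduced by this
patch (feature bits 65-68 land on offload bits 46-49); the shortened names
and the features_to_offloads() helper are illustrative only, and
unsigned __int128 is assumed to be available (a GCC/Clang extension).

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned __int128 features128_t;

#define OFFLOAD_MAP_MIN    46
#define OFFLOAD_MAP_LENGTH 4
#define FEATURES_MAP_MIN   65
#define O2F_DELTA          (FEATURES_MAP_MIN - OFFLOAD_MAP_MIN)
/* Bits 46..49 of the 64-bit offload word. */
#define OFFLOAD_MAP \
    (((UINT64_C(1) << OFFLOAD_MAP_LENGTH) - 1) << OFFLOAD_MAP_MIN)

/* Fold extended feature bits 65..68 into offload bits 46..49. */
static uint64_t features_to_offloads(features128_t features)
{
    /* Keep the low 64 bits, minus the window reserved for the mapping. */
    uint64_t low = (uint64_t)features & ~OFFLOAD_MAP;
    /* Shift the extended bits down by the delta and keep only the window. */
    uint64_t mapped = (uint64_t)(features >> O2F_DELTA) & OFFLOAD_MAP;

    assert((low & mapped) == 0);
    return low | mapped;
}
```

Any feature bit that already sits in the 46-49 window of the low word is
dropped, so only the extended bits can populate that window.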

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 hw/net/virtio-net.c            | 87 +++++++++++++++++++++-------------
 include/hw/virtio/virtio-net.h |  2 +-
 2 files changed, 55 insertions(+), 34 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 9f500c64e7..193469fc27 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -90,6 +90,17 @@
                                          VIRTIO_NET_RSS_HASH_TYPE_TCP_EX | \
                                          VIRTIO_NET_RSS_HASH_TYPE_UDP_EX)
 
+#define VIRTIO_OFFLOAD_MAP_MIN    46
+#define VIRTIO_OFFLOAD_MAP_LENGTH 4
+#define VIRTIO_OFFLOAD_MAP        MAKE_64BIT_MASK(VIRTIO_OFFLOAD_MAP_MIN, \
+                                                VIRTIO_OFFLOAD_MAP_LENGTH)
+#define VIRTIO_FEATURES_MAP_MIN   65
+#define VIRTIO_O2F_DELTA          (VIRTIO_FEATURES_MAP_MIN - \
+                                   VIRTIO_OFFLOAD_MAP_MIN)
+
+#define VIRTIO_FEATURE_TO_OFFLOAD(fbit)  ((fbit) >= 64 ? \
+                                          (fbit) - VIRTIO_O2F_DELTA : (fbit))
+
 static const VirtIOFeature feature_sizes[] = {
     {.flags = 1ULL << VIRTIO_NET_F_MAC,
      .end = endof(struct virtio_net_config, mac)},
@@ -751,44 +762,45 @@ static void virtio_net_set_queue_pairs(VirtIONet *n)
 
 static void virtio_net_set_multiqueue(VirtIONet *n, int multiqueue);
 
-static uint64_t virtio_net_get_features(VirtIODevice *vdev, uint64_t features,
-                                        Error **errp)
+static virtio_features_t virtio_net_get_features(VirtIODevice *vdev,
+                                                 virtio_features_t features,
+                                                 Error **errp)
 {
     VirtIONet *n = VIRTIO_NET(vdev);
     NetClientState *nc = qemu_get_queue(n->nic);
 
     /* Firstly sync all virtio-net possible supported features */
-    features |= n->host_features;
+    features |= n->host_features_ex;
 
-    virtio_add_feature(&features, VIRTIO_NET_F_MAC);
+    virtio_add_feature_ex(&features, VIRTIO_NET_F_MAC);
 
     if (!peer_has_vnet_hdr(n)) {
-        virtio_clear_feature(&features, VIRTIO_NET_F_CSUM);
-        virtio_clear_feature(&features, VIRTIO_NET_F_HOST_TSO4);
-        virtio_clear_feature(&features, VIRTIO_NET_F_HOST_TSO6);
-        virtio_clear_feature(&features, VIRTIO_NET_F_HOST_ECN);
+        virtio_clear_feature_ex(&features, VIRTIO_NET_F_CSUM);
+        virtio_clear_feature_ex(&features, VIRTIO_NET_F_HOST_TSO4);
+        virtio_clear_feature_ex(&features, VIRTIO_NET_F_HOST_TSO6);
+        virtio_clear_feature_ex(&features, VIRTIO_NET_F_HOST_ECN);
 
-        virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_CSUM);
-        virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_TSO4);
-        virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_TSO6);
-        virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_ECN);
+        virtio_clear_feature_ex(&features, VIRTIO_NET_F_GUEST_CSUM);
+        virtio_clear_feature_ex(&features, VIRTIO_NET_F_GUEST_TSO4);
+        virtio_clear_feature_ex(&features, VIRTIO_NET_F_GUEST_TSO6);
+        virtio_clear_feature_ex(&features, VIRTIO_NET_F_GUEST_ECN);
 
-        virtio_clear_feature(&features, VIRTIO_NET_F_HOST_USO);
-        virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_USO4);
-        virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_USO6);
+        virtio_clear_feature_ex(&features, VIRTIO_NET_F_HOST_USO);
+        virtio_clear_feature_ex(&features, VIRTIO_NET_F_GUEST_USO4);
+        virtio_clear_feature_ex(&features, VIRTIO_NET_F_GUEST_USO6);
 
-        virtio_clear_feature(&features, VIRTIO_NET_F_HASH_REPORT);
+        virtio_clear_feature_ex(&features, VIRTIO_NET_F_HASH_REPORT);
     }
 
     if (!peer_has_vnet_hdr(n) || !peer_has_ufo(n)) {
-        virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_UFO);
-        virtio_clear_feature(&features, VIRTIO_NET_F_HOST_UFO);
+        virtio_clear_feature_ex(&features, VIRTIO_NET_F_GUEST_UFO);
+        virtio_clear_feature_ex(&features, VIRTIO_NET_F_HOST_UFO);
     }
 
     if (!peer_has_uso(n)) {
-        virtio_clear_feature(&features, VIRTIO_NET_F_HOST_USO);
-        virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_USO4);
-        virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_USO6);
+        virtio_clear_feature_ex(&features, VIRTIO_NET_F_HOST_USO);
+        virtio_clear_feature_ex(&features, VIRTIO_NET_F_GUEST_USO4);
+        virtio_clear_feature_ex(&features, VIRTIO_NET_F_GUEST_USO6);
     }
 
     if (!get_vhost_net(nc->peer)) {
@@ -796,7 +808,7 @@ static uint64_t virtio_net_get_features(VirtIODevice *vdev, uint64_t features,
     }
 
     if (!ebpf_rss_is_loaded(&n->ebpf_rss)) {
-        virtio_clear_feature(&features, VIRTIO_NET_F_RSS);
+        virtio_clear_feature_ex(&features, VIRTIO_NET_F_RSS);
     }
     features = vhost_net_get_features(get_vhost_net(nc->peer), features);
     vdev->backend_features_ex = features;
@@ -818,7 +830,7 @@ static uint64_t virtio_net_get_features(VirtIODevice *vdev, uint64_t features,
      * support it.
      */
     if (!virtio_has_feature(vdev->backend_features, VIRTIO_NET_F_CTRL_VQ)) {
-        virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_ANNOUNCE);
+        virtio_clear_feature_ex(&features, VIRTIO_NET_F_GUEST_ANNOUNCE);
     }
 
     return features;
@@ -851,9 +863,16 @@ static void virtio_net_apply_guest_offloads(VirtIONet *n)
             !!(n->curr_guest_offloads & (1ULL << VIRTIO_NET_F_GUEST_USO6)));
 }
 
-static uint64_t virtio_net_guest_offloads_by_features(uint64_t features)
+static uint64_t virtio_net_features_to_offload(virtio_features_t features)
+{
+    return (features & ~VIRTIO_OFFLOAD_MAP) |
+           ((features >> VIRTIO_O2F_DELTA) & VIRTIO_OFFLOAD_MAP);
+}
+
+static uint64_t
+virtio_net_guest_offloads_by_features(virtio_features_t features)
 {
-    static const uint64_t guest_offloads_mask =
+    static const virtio_features_t guest_offloads_mask =
         (1ULL << VIRTIO_NET_F_GUEST_CSUM) |
         (1ULL << VIRTIO_NET_F_GUEST_TSO4) |
         (1ULL << VIRTIO_NET_F_GUEST_TSO6) |
@@ -862,13 +881,13 @@ static uint64_t virtio_net_guest_offloads_by_features(uint64_t features)
         (1ULL << VIRTIO_NET_F_GUEST_USO4) |
         (1ULL << VIRTIO_NET_F_GUEST_USO6);
 
-    return guest_offloads_mask & features;
+    return guest_offloads_mask & virtio_net_features_to_offload(features);
 }
 
 uint64_t virtio_net_supported_guest_offloads(const VirtIONet *n)
 {
     VirtIODevice *vdev = VIRTIO_DEVICE(n);
-    return virtio_net_guest_offloads_by_features(vdev->guest_features);
+    return virtio_net_guest_offloads_by_features(vdev->guest_features_ex);
 }
 
 typedef struct {
@@ -947,7 +966,8 @@ static void failover_add_primary(VirtIONet *n, Error **errp)
     error_propagate(errp, err);
 }
 
-static void virtio_net_set_features(VirtIODevice *vdev, uint64_t features)
+static void virtio_net_set_features(VirtIODevice *vdev,
+                                    virtio_features_t features)
 {
     VirtIONet *n = VIRTIO_NET(vdev);
     Error *err = NULL;
@@ -955,7 +975,7 @@ static void virtio_net_set_features(VirtIODevice *vdev, uint64_t features)
 
     if (n->mtu_bypass_backend &&
             !virtio_has_feature(vdev->backend_features, VIRTIO_NET_F_MTU)) {
-        features &= ~(1ULL << VIRTIO_NET_F_MTU);
+        features &= ~VIRTIO_BIT(VIRTIO_NET_F_MTU);
     }
 
     virtio_net_set_multiqueue(n,
@@ -1962,10 +1982,11 @@ static ssize_t virtio_net_receive_rcu(NetClientState *nc, const uint8_t *buf,
                 virtio_error(vdev, "virtio-net unexpected empty queue: "
                              "i %zd mergeable %d offset %zd, size %zd, "
                              "guest hdr len %zd, host hdr len %zd "
-                             "guest features 0x%" PRIx64,
+                             "guest features 0x" VIRTIO_FEATURES_FMT,
                              i, n->mergeable_rx_bufs, offset, size,
                              n->guest_hdr_len, n->host_hdr_len,
-                             vdev->guest_features);
+                             VIRTIO_FEATURES_HI(vdev->guest_features_ex),
+                             VIRTIO_FEATURES_LOW(vdev->guest_features_ex));
             }
             err = -1;
             goto err;
@@ -4146,8 +4167,8 @@ static void virtio_net_class_init(ObjectClass *klass, const void *data)
     vdc->unrealize = virtio_net_device_unrealize;
     vdc->get_config = virtio_net_get_config;
     vdc->set_config = virtio_net_set_config;
-    vdc->get_features = virtio_net_get_features;
-    vdc->set_features = virtio_net_set_features;
+    vdc->get_features_ex = virtio_net_get_features;
+    vdc->set_features_ex = virtio_net_set_features;
     vdc->bad_features = virtio_net_bad_features;
     vdc->reset = virtio_net_reset;
     vdc->queue_reset = virtio_net_queue_reset;
diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
index b9ea9e824e..5ccdbeb253 100644
--- a/include/hw/virtio/virtio-net.h
+++ b/include/hw/virtio/virtio-net.h
@@ -178,7 +178,7 @@ struct VirtIONet {
     uint32_t has_vnet_hdr;
     size_t host_hdr_len;
     size_t guest_hdr_len;
-    uint64_t host_features;
+    DECLARE_FEATURES(host_features);
     uint32_t rsc_timeout;
     uint8_t rsc4_enabled;
     uint8_t rsc6_enabled;
-- 
2.49.0




* [PATCH RFC 13/16] net: implement tunnel probing
  2025-05-21 11:33 [PATCH RFC 00/16] virtio: introduce support for GSO over UDP tunnel Paolo Abeni
                   ` (11 preceding siblings ...)
  2025-05-21 11:34 ` [PATCH RFC 12/16] virtio-net: implement extended features support Paolo Abeni
@ 2025-05-21 11:34 ` Paolo Abeni
  2025-05-23  7:39   ` Akihiko Odaki
  2025-05-21 11:34 ` [PATCH RFC 14/16] net: bundle all offloads in a single struct Paolo Abeni
                   ` (4 subsequent siblings)
  17 siblings, 1 reply; 42+ messages in thread
From: Paolo Abeni @ 2025-05-21 11:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost,
	Dmitry Fleytman, Akihiko Odaki, Jason Wang, Sriram Yagnaraman,
	Michael S. Tsirkin, Stefano Garzarella, Peter Xu, Fabiano Rosas,
	Cornelia Huck, Luigi Rizzo, Giuseppe Lettieri, Vincenzo Maffione,
	Eric Blake, Markus Armbruster

Tap devices support GSO over UDP tunnel offload. Probe for this
feature in the same manner as for other offloads.

GSO over UDP tunnel needs to be enabled in addition to a "plain"
offload (TSO or USO).

No need to check separately for the outer header checksum offload:
the kernel is going to support either both of them or neither.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 hw/net/virtio-net.c | 39 +++++++++++++++++++++++++++++++++++++++
 include/net/net.h   |  3 +++
 net/net.c           |  9 +++++++++
 net/tap-bsd.c       |  5 +++++
 net/tap-linux.c     | 19 +++++++++++++++++++
 net/tap-solaris.c   |  5 +++++
 net/tap-stub.c      |  5 +++++
 net/tap.c           | 11 +++++++++++
 net/tap_int.h       |  1 +
 9 files changed, 97 insertions(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 193469fc27..05cf23700f 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -646,6 +646,15 @@ static int peer_has_uso(VirtIONet *n)
     return qemu_has_uso(qemu_get_queue(n->nic)->peer);
 }
 
+static int peer_has_tunnel(VirtIONet *n)
+{
+    if (!peer_has_vnet_hdr(n)) {
+        return 0;
+    }
+
+    return qemu_has_tunnel(qemu_get_queue(n->nic)->peer);
+}
+
 static void virtio_net_set_mrg_rx_bufs(VirtIONet *n, int mergeable_rx_bufs,
                                        int version_1, int hash_report)
 {
@@ -789,6 +798,15 @@ static virtio_features_t virtio_net_get_features(VirtIODevice *vdev,
         virtio_clear_feature_ex(&features, VIRTIO_NET_F_GUEST_USO4);
         virtio_clear_feature_ex(&features, VIRTIO_NET_F_GUEST_USO6);
 
+#ifdef CONFIG_INT128
+        virtio_clear_feature_ex(&features, VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO);
+        virtio_clear_feature_ex(&features, VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO);
+        virtio_clear_feature_ex(&features,
+                                VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM);
+        virtio_clear_feature_ex(&features,
+                                VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO_CSUM);
+#endif
+
         virtio_clear_feature_ex(&features, VIRTIO_NET_F_HASH_REPORT);
     }
 
@@ -803,6 +821,17 @@ static virtio_features_t virtio_net_get_features(VirtIODevice *vdev,
         virtio_clear_feature_ex(&features, VIRTIO_NET_F_GUEST_USO6);
     }
 
+#ifdef CONFIG_INT128
+    if (!peer_has_tunnel(n)) {
+        virtio_clear_feature_ex(&features, VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO);
+        virtio_clear_feature_ex(&features, VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO);
+        virtio_clear_feature_ex(&features,
+                                VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM);
+        virtio_clear_feature_ex(&features,
+                                VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO_CSUM);
+    }
+#endif
+
     if (!get_vhost_net(nc->peer)) {
         return features;
     }
@@ -4153,6 +4182,16 @@ static const Property virtio_net_properties[] = {
                       VIRTIO_NET_F_GUEST_USO6, true),
     DEFINE_PROP_BIT64("host_uso", VirtIONet, host_features,
                       VIRTIO_NET_F_HOST_USO, true),
+#ifdef CONFIG_INT128
+    DEFINE_PROP_BITVF("host_tunnel", VirtIONet, host_features_ex,
+                      VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO, true),
+    DEFINE_PROP_BITVF("host_tunnel_csum", VirtIONet, host_features_ex,
+                      VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO_CSUM, true),
+    DEFINE_PROP_BITVF("guest_tunnel", VirtIONet, host_features_ex,
+                      VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO, true),
+    DEFINE_PROP_BITVF("guest_tunnel_csum", VirtIONet, host_features_ex,
+                      VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM, true),
+#endif
 };
 
 static void virtio_net_class_init(ObjectClass *klass, const void *data)
diff --git a/include/net/net.h b/include/net/net.h
index cdd5b109b0..391d983e49 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -55,6 +55,7 @@ typedef void (NetClientDestructor)(NetClientState *);
 typedef RxFilterInfo *(QueryRxFilter)(NetClientState *);
 typedef bool (HasUfo)(NetClientState *);
 typedef bool (HasUso)(NetClientState *);
+typedef bool (HasTunnel)(NetClientState *);
 typedef bool (HasVnetHdr)(NetClientState *);
 typedef bool (HasVnetHdrLen)(NetClientState *, int);
 typedef void (SetOffload)(NetClientState *, int, int, int, int, int, int, int);
@@ -83,6 +84,7 @@ typedef struct NetClientInfo {
     NetPoll *poll;
     HasUfo *has_ufo;
     HasUso *has_uso;
+    HasTunnel *has_tunnel;
     HasVnetHdr *has_vnet_hdr;
     HasVnetHdrLen *has_vnet_hdr_len;
     SetOffload *set_offload;
@@ -183,6 +185,7 @@ void qemu_set_info_str(NetClientState *nc,
 void qemu_format_nic_info_str(NetClientState *nc, uint8_t macaddr[6]);
 bool qemu_has_ufo(NetClientState *nc);
 bool qemu_has_uso(NetClientState *nc);
+bool qemu_has_tunnel(NetClientState *nc);
 bool qemu_has_vnet_hdr(NetClientState *nc);
 bool qemu_has_vnet_hdr_len(NetClientState *nc, int len);
 void qemu_set_offload(NetClientState *nc, int csum, int tso4, int tso6,
diff --git a/net/net.c b/net/net.c
index 39d6f28158..9c83d3b137 100644
--- a/net/net.c
+++ b/net/net.c
@@ -522,6 +522,15 @@ bool qemu_has_uso(NetClientState *nc)
     return nc->info->has_uso(nc);
 }
 
+bool qemu_has_tunnel(NetClientState *nc)
+{
+    if (!nc || !nc->info->has_tunnel) {
+        return false;
+    }
+
+    return nc->info->has_tunnel(nc);
+}
+
 bool qemu_has_vnet_hdr(NetClientState *nc)
 {
     if (!nc || !nc->info->has_vnet_hdr) {
diff --git a/net/tap-bsd.c b/net/tap-bsd.c
index b4c84441ba..3f01c8921e 100644
--- a/net/tap-bsd.c
+++ b/net/tap-bsd.c
@@ -217,6 +217,11 @@ int tap_probe_has_uso(int fd)
     return 0;
 }
 
+int tap_probe_has_tunnel(int fd)
+{
+    return 0;
+}
+
 void tap_fd_set_vnet_hdr_len(int fd, int len)
 {
 }
diff --git a/net/tap-linux.c b/net/tap-linux.c
index 22ec2f45d2..2df601551e 100644
--- a/net/tap-linux.c
+++ b/net/tap-linux.c
@@ -37,6 +37,14 @@
 
 #define PATH_NET_TUN "/dev/net/tun"
 
+#ifndef TUN_F_UDP_TUNNEL_GSO
+#define TUN_F_UDP_TUNNEL_GSO       0x080
+#endif
+
+#ifndef TUN_F_UDP_TUNNEL_GSO_CSUM
+#define TUN_F_UDP_TUNNEL_GSO_CSUM  0x100
+#endif
+
 int tap_open(char *ifname, int ifname_size, int *vnet_hdr,
              int vnet_hdr_required, int mq_required, Error **errp)
 {
@@ -196,6 +204,17 @@ int tap_probe_has_uso(int fd)
     return 1;
 }
 
+int tap_probe_has_tunnel(int fd)
+{
+    unsigned offload;
+
+    offload = TUN_F_CSUM | TUN_F_TSO4 | TUN_F_UDP_TUNNEL_GSO;
+    if (ioctl(fd, TUNSETOFFLOAD, offload) < 0) {
+        return 0;
+    }
+    return 1;
+}
+
 void tap_fd_set_vnet_hdr_len(int fd, int len)
 {
     if (ioctl(fd, TUNSETVNETHDRSZ, &len) == -1) {
diff --git a/net/tap-solaris.c b/net/tap-solaris.c
index 51b7830bef..b1aa40d46b 100644
--- a/net/tap-solaris.c
+++ b/net/tap-solaris.c
@@ -221,6 +221,11 @@ int tap_probe_has_uso(int fd)
     return 0;
 }
 
+int tap_probe_has_tunnel(int fd)
+{
+    return 0;
+}
+
 void tap_fd_set_vnet_hdr_len(int fd, int len)
 {
 }
diff --git a/net/tap-stub.c b/net/tap-stub.c
index 38673434cb..5f57d6baac 100644
--- a/net/tap-stub.c
+++ b/net/tap-stub.c
@@ -52,6 +52,11 @@ int tap_probe_has_uso(int fd)
     return 0;
 }
 
+int tap_probe_has_tunnel(int fd)
+{
+    return 0;
+}
+
 void tap_fd_set_vnet_hdr_len(int fd, int len)
 {
 }
diff --git a/net/tap.c b/net/tap.c
index ae1c7e3983..f6e8cd5f1c 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -58,6 +58,7 @@ typedef struct TAPState {
     bool using_vnet_hdr;
     bool has_ufo;
     bool has_uso;
+    bool has_tunnel;
     bool enabled;
     VHostNetState *vhost_net;
     unsigned host_vnet_hdr_len;
@@ -223,6 +224,14 @@ static bool tap_has_uso(NetClientState *nc)
     return s->has_uso;
 }
 
+static bool tap_has_tunnel(NetClientState *nc)
+{
+    TAPState *s = DO_UPCAST(TAPState, nc, nc);
+
+    assert(nc->info->type == NET_CLIENT_DRIVER_TAP);
+    return s->has_tunnel;
+}
+
 static bool tap_has_vnet_hdr(NetClientState *nc)
 {
     TAPState *s = DO_UPCAST(TAPState, nc, nc);
@@ -340,6 +349,7 @@ static NetClientInfo net_tap_info = {
     .cleanup = tap_cleanup,
     .has_ufo = tap_has_ufo,
     .has_uso = tap_has_uso,
+    .has_tunnel = tap_has_tunnel,
     .has_vnet_hdr = tap_has_vnet_hdr,
     .has_vnet_hdr_len = tap_has_vnet_hdr_len,
     .set_offload = tap_set_offload,
@@ -367,6 +377,7 @@ static TAPState *net_tap_fd_init(NetClientState *peer,
     s->using_vnet_hdr = false;
     s->has_ufo = tap_probe_has_ufo(s->fd);
     s->has_uso = tap_probe_has_uso(s->fd);
+    s->has_tunnel = tap_probe_has_tunnel(s->fd);
     s->enabled = true;
     tap_set_offload(&s->nc, 0, 0, 0, 0, 0, 0, 0);
     /*
diff --git a/net/tap_int.h b/net/tap_int.h
index 8857ff299d..2a8aa3632f 100644
--- a/net/tap_int.h
+++ b/net/tap_int.h
@@ -37,6 +37,7 @@ void tap_set_sndbuf(int fd, const NetdevTapOptions *tap, Error **errp);
 int tap_probe_vnet_hdr(int fd, Error **errp);
 int tap_probe_has_ufo(int fd);
 int tap_probe_has_uso(int fd);
+int tap_probe_has_tunnel(int fd);
 void tap_fd_set_offload(int fd, int csum, int tso4, int tso6, int ecn, int ufo,
                         int uso4, int uso6);
 void tap_fd_set_vnet_hdr_len(int fd, int len);
-- 
2.49.0
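
For readers skimming the thread, the capability query added in patch 13 follows the same optional-callback pattern as has_ufo/has_uso. A minimal standalone sketch is below; the sk_* names are hypothetical simplifications and not the QEMU types, and the probe result is assumed to have been cached at init time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for NetClientInfo/NetClientState. */
struct sk_client;

struct sk_client_info {
    bool (*has_tunnel)(struct sk_client *c);   /* optional: may be NULL */
};

struct sk_client {
    const struct sk_client_info *info;
    bool probed_tunnel;                        /* result cached at init time */
};

/* Backend callback: report the capability that was probed earlier. */
bool sk_tap_has_tunnel(struct sk_client *c)
{
    assert(c != NULL);
    return c->probed_tunnel;
}

/* Generic query: a missing peer or a missing callback means "unsupported". */
bool sk_client_has_tunnel(struct sk_client *c)
{
    if (!c || !c->info->has_tunnel) {
        return false;
    }
    return c->info->has_tunnel(c);
}
```

Backends that do not set the callback need no changes and simply report the feature as unavailable.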




* [PATCH RFC 14/16] net: bundle all offloads in a single struct
  2025-05-21 11:33 [PATCH RFC 00/16] virtio: introduce support for GSO over UDP tunnel Paolo Abeni
                   ` (12 preceding siblings ...)
  2025-05-21 11:34 ` [PATCH RFC 13/16] net: implement tunnel probing Paolo Abeni
@ 2025-05-21 11:34 ` Paolo Abeni
  2025-05-23  7:45   ` Akihiko Odaki
  2025-05-21 11:34 ` [PATCH RFC 15/16] net: implement tnl feature offloading Paolo Abeni
                   ` (3 subsequent siblings)
  17 siblings, 1 reply; 42+ messages in thread
From: Paolo Abeni @ 2025-05-21 11:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost,
	Dmitry Fleytman, Akihiko Odaki, Jason Wang, Sriram Yagnaraman,
	Michael S. Tsirkin, Stefano Garzarella, Peter Xu, Fabiano Rosas,
	Cornelia Huck, Luigi Rizzo, Giuseppe Lettieri, Vincenzo Maffione,
	Eric Blake, Markus Armbruster

The set_offload() argument list is already pretty long, and we are
soon going to introduce several additional offloads.

Replace the offload arguments with a single struct and update
all the relevant call-sites.

No functional changes intended.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 hw/net/e1000e_core.c |  5 +++--
 hw/net/igb_core.c    |  5 +++--
 hw/net/virtio-net.c  | 19 +++++++++++--------
 hw/net/vmxnet3.c     | 13 +++++--------
 include/net/net.h    | 15 ++++++++++++---
 net/net.c            |  5 ++---
 net/netmap.c         |  3 +--
 net/tap-bsd.c        |  3 +--
 net/tap-linux.c      | 21 ++++++++++++---------
 net/tap-solaris.c    |  4 ++--
 net/tap-stub.c       |  3 +--
 net/tap.c            |  8 ++++----
 net/tap_int.h        |  4 ++--
 13 files changed, 59 insertions(+), 49 deletions(-)

diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
index 2413858790..ec90869e56 100644
--- a/hw/net/e1000e_core.c
+++ b/hw/net/e1000e_core.c
@@ -2827,8 +2827,9 @@ e1000e_update_rx_offloads(E1000ECore *core)
     trace_e1000e_rx_set_cso(cso_state);
 
     if (core->has_vnet) {
-        qemu_set_offload(qemu_get_queue(core->owner_nic)->peer,
-                         cso_state, 0, 0, 0, 0, 0, 0);
+        struct NetOffloads ol = {.csum = cso_state };
+
+        qemu_set_offload(qemu_get_queue(core->owner_nic)->peer, &ol);
     }
 }
 
diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c
index 39e3ce1c8f..e940d3a8e2 100644
--- a/hw/net/igb_core.c
+++ b/hw/net/igb_core.c
@@ -3058,8 +3058,9 @@ igb_update_rx_offloads(IGBCore *core)
     trace_e1000e_rx_set_cso(cso_state);
 
     if (core->has_vnet) {
-        qemu_set_offload(qemu_get_queue(core->owner_nic)->peer,
-                         cso_state, 0, 0, 0, 0, 0, 0);
+        struct NetOffloads ol = {.csum = cso_state };
+
+        qemu_set_offload(qemu_get_queue(core->owner_nic)->peer, &ol);
     }
 }
 
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 05cf23700f..881877086e 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -882,14 +882,17 @@ static uint64_t virtio_net_bad_features(VirtIODevice *vdev)
 
 static void virtio_net_apply_guest_offloads(VirtIONet *n)
 {
-    qemu_set_offload(qemu_get_queue(n->nic)->peer,
-            !!(n->curr_guest_offloads & (1ULL << VIRTIO_NET_F_GUEST_CSUM)),
-            !!(n->curr_guest_offloads & (1ULL << VIRTIO_NET_F_GUEST_TSO4)),
-            !!(n->curr_guest_offloads & (1ULL << VIRTIO_NET_F_GUEST_TSO6)),
-            !!(n->curr_guest_offloads & (1ULL << VIRTIO_NET_F_GUEST_ECN)),
-            !!(n->curr_guest_offloads & (1ULL << VIRTIO_NET_F_GUEST_UFO)),
-            !!(n->curr_guest_offloads & (1ULL << VIRTIO_NET_F_GUEST_USO4)),
-            !!(n->curr_guest_offloads & (1ULL << VIRTIO_NET_F_GUEST_USO6)));
+    NetOffloads ol = {
+       .csum = !!(n->curr_guest_offloads & (1ULL << VIRTIO_NET_F_GUEST_CSUM)),
+       .tso4 = !!(n->curr_guest_offloads & (1ULL << VIRTIO_NET_F_GUEST_TSO4)),
+       .tso6 = !!(n->curr_guest_offloads & (1ULL << VIRTIO_NET_F_GUEST_TSO6)),
+       .ecn  = !!(n->curr_guest_offloads & (1ULL << VIRTIO_NET_F_GUEST_ECN)),
+       .ufo  = !!(n->curr_guest_offloads & (1ULL << VIRTIO_NET_F_GUEST_UFO)),
+       .uso4 = !!(n->curr_guest_offloads & (1ULL << VIRTIO_NET_F_GUEST_USO4)),
+       .uso6 = !!(n->curr_guest_offloads & (1ULL << VIRTIO_NET_F_GUEST_USO6)),
+    };
+
+    qemu_set_offload(qemu_get_queue(n->nic)->peer, &ol);
 }
 
 static uint64_t virtio_net_features_to_offload(virtio_features_t features)
diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
index 83d942af17..dbacb4aa18 100644
--- a/hw/net/vmxnet3.c
+++ b/hw/net/vmxnet3.c
@@ -1334,14 +1334,11 @@ static void vmxnet3_update_features(VMXNET3State *s)
               s->lro_supported, rxcso_supported,
               s->rx_vlan_stripping);
     if (s->peer_has_vhdr) {
-        qemu_set_offload(qemu_get_queue(s->nic)->peer,
-                         rxcso_supported,
-                         s->lro_supported,
-                         s->lro_supported,
-                         0,
-                         0,
-                         0,
-                         0);
+        struct NetOffloads ol = { .csum = rxcso_supported,
+                                  .tso4 = s->lro_supported,
+                                  .tso6 = s->lro_supported };
+
+        qemu_set_offload(qemu_get_queue(s->nic)->peer, &ol);
     }
 }
 
diff --git a/include/net/net.h b/include/net/net.h
index 391d983e49..c71d7c6074 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -35,6 +35,16 @@ typedef struct NICConf {
     int32_t bootindex;
 } NICConf;
 
+typedef struct NetOffloads {
+    bool csum;
+    bool tso4;
+    bool tso6;
+    bool ecn;
+    bool ufo;
+    bool uso4;
+    bool uso6;
+} NetOffloads;
+
 #define DEFINE_NIC_PROPERTIES(_state, _conf)                            \
     DEFINE_PROP_MACADDR("mac",   _state, _conf.macaddr),                \
     DEFINE_PROP_NETDEV("netdev", _state, _conf.peers)
@@ -58,7 +68,7 @@ typedef bool (HasUso)(NetClientState *);
 typedef bool (HasTunnel)(NetClientState *);
 typedef bool (HasVnetHdr)(NetClientState *);
 typedef bool (HasVnetHdrLen)(NetClientState *, int);
-typedef void (SetOffload)(NetClientState *, int, int, int, int, int, int, int);
+typedef void (SetOffload)(NetClientState *, const NetOffloads *);
 typedef int (GetVnetHdrLen)(NetClientState *);
 typedef void (SetVnetHdrLen)(NetClientState *, int);
 typedef int (SetVnetLE)(NetClientState *, bool);
@@ -188,8 +198,7 @@ bool qemu_has_uso(NetClientState *nc);
 bool qemu_has_tunnel(NetClientState *nc);
 bool qemu_has_vnet_hdr(NetClientState *nc);
 bool qemu_has_vnet_hdr_len(NetClientState *nc, int len);
-void qemu_set_offload(NetClientState *nc, int csum, int tso4, int tso6,
-                      int ecn, int ufo, int uso4, int uso6);
+void qemu_set_offload(NetClientState *nc, const NetOffloads *ol);
 int qemu_get_vnet_hdr_len(NetClientState *nc);
 void qemu_set_vnet_hdr_len(NetClientState *nc, int len);
 int qemu_set_vnet_le(NetClientState *nc, bool is_le);
diff --git a/net/net.c b/net/net.c
index 9c83d3b137..5a2f00c108 100644
--- a/net/net.c
+++ b/net/net.c
@@ -549,14 +549,13 @@ bool qemu_has_vnet_hdr_len(NetClientState *nc, int len)
     return nc->info->has_vnet_hdr_len(nc, len);
 }
 
-void qemu_set_offload(NetClientState *nc, int csum, int tso4, int tso6,
-                          int ecn, int ufo, int uso4, int uso6)
+void qemu_set_offload(NetClientState *nc, const NetOffloads *ol)
 {
     if (!nc || !nc->info->set_offload) {
         return;
     }
 
-    nc->info->set_offload(nc, csum, tso4, tso6, ecn, ufo, uso4, uso6);
+    nc->info->set_offload(nc, ol);
 }
 
 int qemu_get_vnet_hdr_len(NetClientState *nc)
diff --git a/net/netmap.c b/net/netmap.c
index 297510e190..6cd8f2bdc5 100644
--- a/net/netmap.c
+++ b/net/netmap.c
@@ -366,8 +366,7 @@ static void netmap_set_vnet_hdr_len(NetClientState *nc, int len)
     }
 }
 
-static void netmap_set_offload(NetClientState *nc, int csum, int tso4, int tso6,
-                               int ecn, int ufo, int uso4, int uso6)
+static void netmap_set_offload(NetClientState *nc, const NetOffloads *ol)
 {
     NetmapState *s = DO_UPCAST(NetmapState, nc, nc);
 
diff --git a/net/tap-bsd.c b/net/tap-bsd.c
index 3f01c8921e..e7de0672f4 100644
--- a/net/tap-bsd.c
+++ b/net/tap-bsd.c
@@ -236,8 +236,7 @@ int tap_fd_set_vnet_be(int fd, int is_be)
     return -EINVAL;
 }
 
-void tap_fd_set_offload(int fd, int csum, int tso4,
-                        int tso6, int ecn, int ufo, int uso4, int uso6)
+void tap_fd_set_offload(int fd, const NetOffloads *ol)
 {
 }
 
diff --git a/net/tap-linux.c b/net/tap-linux.c
index 2df601551e..aa5f3a6e22 100644
--- a/net/tap-linux.c
+++ b/net/tap-linux.c
@@ -258,8 +258,7 @@ int tap_fd_set_vnet_be(int fd, int is_be)
     abort();
 }
 
-void tap_fd_set_offload(int fd, int csum, int tso4,
-                        int tso6, int ecn, int ufo, int uso4, int uso6)
+void tap_fd_set_offload(int fd, const NetOffloads *ol)
 {
     unsigned int offload = 0;
 
@@ -268,20 +267,24 @@ void tap_fd_set_offload(int fd, int csum, int tso4,
         return;
     }
 
-    if (csum) {
+    if (ol->csum) {
         offload |= TUN_F_CSUM;
-        if (tso4)
+        if (ol->tso4) {
             offload |= TUN_F_TSO4;
-        if (tso6)
+        }
+        if (ol->tso6) {
             offload |= TUN_F_TSO6;
-        if ((tso4 || tso6) && ecn)
+        }
+        if ((ol->tso4 || ol->tso6) && ol->ecn) {
             offload |= TUN_F_TSO_ECN;
-        if (ufo)
+        }
+        if (ol->ufo) {
             offload |= TUN_F_UFO;
-        if (uso4) {
+        }
+        if (ol->uso4) {
             offload |= TUN_F_USO4;
         }
-        if (uso6) {
+        if (ol->uso6) {
             offload |= TUN_F_USO6;
         }
     }
diff --git a/net/tap-solaris.c b/net/tap-solaris.c
index b1aa40d46b..ac09ae03c0 100644
--- a/net/tap-solaris.c
+++ b/net/tap-solaris.c
@@ -27,6 +27,7 @@
 #include "tap_int.h"
 #include "qemu/ctype.h"
 #include "qemu/cutils.h"
+#include "net/net.h"
 
 #include <sys/ethernet.h>
 #include <sys/sockio.h>
@@ -240,8 +241,7 @@ int tap_fd_set_vnet_be(int fd, int is_be)
     return -EINVAL;
 }
 
-void tap_fd_set_offload(int fd, int csum, int tso4,
-                        int tso6, int ecn, int ufo, int uso4, int uso6)
+void tap_fd_set_offload(int fd, const NetOffloads *ol)
 {
 }
 
diff --git a/net/tap-stub.c b/net/tap-stub.c
index 5f57d6baac..66abbbc392 100644
--- a/net/tap-stub.c
+++ b/net/tap-stub.c
@@ -71,8 +71,7 @@ int tap_fd_set_vnet_be(int fd, int is_be)
     return -EINVAL;
 }
 
-void tap_fd_set_offload(int fd, int csum, int tso4,
-                        int tso6, int ecn, int ufo, int uso4, int uso6)
+void tap_fd_set_offload(int fd, const NetOffloads *ol)
 {
 }
 
diff --git a/net/tap.c b/net/tap.c
index f6e8cd5f1c..c7612fb91b 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -271,15 +271,14 @@ static int tap_set_vnet_be(NetClientState *nc, bool is_be)
     return tap_fd_set_vnet_be(s->fd, is_be);
 }
 
-static void tap_set_offload(NetClientState *nc, int csum, int tso4,
-                     int tso6, int ecn, int ufo, int uso4, int uso6)
+static void tap_set_offload(NetClientState *nc, const NetOffloads *ol)
 {
     TAPState *s = DO_UPCAST(TAPState, nc, nc);
     if (s->fd < 0) {
         return;
     }
 
-    tap_fd_set_offload(s->fd, csum, tso4, tso6, ecn, ufo, uso4, uso6);
+    tap_fd_set_offload(s->fd, ol);
 }
 
 static void tap_exit_notify(Notifier *notifier, void *data)
@@ -365,6 +364,7 @@ static TAPState *net_tap_fd_init(NetClientState *peer,
                                  int fd,
                                  int vnet_hdr)
 {
+    NetOffloads ol = {};
     NetClientState *nc;
     TAPState *s;
 
@@ -379,7 +379,7 @@ static TAPState *net_tap_fd_init(NetClientState *peer,
     s->has_uso = tap_probe_has_uso(s->fd);
     s->has_tunnel = tap_probe_has_tunnel(s->fd);
     s->enabled = true;
-    tap_set_offload(&s->nc, 0, 0, 0, 0, 0, 0, 0);
+    tap_set_offload(&s->nc, &ol);
     /*
      * Make sure host header length is set correctly in tap:
      * it might have been modified by another instance of qemu.
diff --git a/net/tap_int.h b/net/tap_int.h
index 2a8aa3632f..327d10f68b 100644
--- a/net/tap_int.h
+++ b/net/tap_int.h
@@ -27,6 +27,7 @@
 #define NET_TAP_INT_H
 
 #include "qapi/qapi-types-net.h"
+#include "net/net.h"
 
 int tap_open(char *ifname, int ifname_size, int *vnet_hdr,
              int vnet_hdr_required, int mq_required, Error **errp);
@@ -38,8 +39,7 @@ int tap_probe_vnet_hdr(int fd, Error **errp);
 int tap_probe_has_ufo(int fd);
 int tap_probe_has_uso(int fd);
 int tap_probe_has_tunnel(int fd);
-void tap_fd_set_offload(int fd, int csum, int tso4, int tso6, int ecn, int ufo,
-                        int uso4, int uso6);
+void tap_fd_set_offload(int fd, const NetOffloads *ol);
 void tap_fd_set_vnet_hdr_len(int fd, int len);
 int tap_fd_set_vnet_le(int fd, int vnet_is_le);
 int tap_fd_set_vnet_be(int fd, int vnet_is_be);
-- 
2.49.0




* [PATCH RFC 15/16] net: implement tnl feature offloading
  2025-05-21 11:33 [PATCH RFC 00/16] virtio: introduce support for GSO over UDP tunnel Paolo Abeni
                   ` (13 preceding siblings ...)
  2025-05-21 11:34 ` [PATCH RFC 14/16] net: bundle all offloads in a single struct Paolo Abeni
@ 2025-05-21 11:34 ` Paolo Abeni
  2025-05-23  8:16   ` Akihiko Odaki
  2025-05-21 11:34 ` [PATCH RFC 16/16] net: make vhost-net aware of GSO over UDP tunnel hdr layout Paolo Abeni
                   ` (2 subsequent siblings)
  17 siblings, 1 reply; 42+ messages in thread
From: Paolo Abeni @ 2025-05-21 11:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost,
	Dmitry Fleytman, Akihiko Odaki, Jason Wang, Sriram Yagnaraman,
	Michael S. Tsirkin, Stefano Garzarella, Peter Xu, Fabiano Rosas,
	Cornelia Huck, Luigi Rizzo, Giuseppe Lettieri, Vincenzo Maffione,
	Eric Blake, Markus Armbruster

When any host or guest GSO over UDP tunnel offload is enabled, the
virtio net header includes the additional tunnel-related fields;
update the header size accordingly.

Push the GSO over UDP tunnel offloads all the way down to the tap
device by extending the newly introduced NetOffloads struct, and
finally enable the associated features.

As per the virtio specification, to convert feature bits to offload
bits, map the extended features into the reserved range.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 hw/net/virtio-net.c | 48 ++++++++++++++++++++++++++++++++++++++++-----
 include/net/net.h   |  2 ++
 net/net.c           |  7 ++++++-
 net/tap-linux.c     |  6 ++++++
 4 files changed, 57 insertions(+), 6 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 881877086e..758ceaffba 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -101,6 +101,27 @@
 #define VIRTIO_FEATURE_TO_OFFLOAD(fbit)  (fbit >= 64 ? \
                                           fbit - VIRTIO_O2F_DELTA : fbit)
 
+#ifdef CONFIG_INT128
+#define VIRTIO_NET_O_GUEST_UDP_TUNNEL_GSO \
+    VIRTIO_FEATURE_TO_OFFLOAD(VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO)
+#define VIRTIO_NET_O_GUEST_UDP_TUNNEL_GSO_CSUM \
+    VIRTIO_FEATURE_TO_OFFLOAD(VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM)
+
+static bool virtio_has_tnl_hdr(virtio_features_t features)
+{
+    return virtio_has_feature_ex(features, VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO) |
+           virtio_has_feature_ex(features, VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO);
+}
+
+#else
+
+static bool virtio_has_tnl_hdr(virtio_features_t features)
+{
+    return false;
+}
+
+#endif
+
 static const VirtIOFeature feature_sizes[] = {
     {.flags = 1ULL << VIRTIO_NET_F_MAC,
      .end = endof(struct virtio_net_config, mac)},
@@ -656,7 +677,8 @@ static int peer_has_tunnel(VirtIONet *n)
 }
 
 static void virtio_net_set_mrg_rx_bufs(VirtIONet *n, int mergeable_rx_bufs,
-                                       int version_1, int hash_report)
+                                       int version_1, int hash_report,
+                                       int tnl)
 {
     int i;
     NetClientState *nc;
@@ -674,6 +696,9 @@ static void virtio_net_set_mrg_rx_bufs(VirtIONet *n, int mergeable_rx_bufs,
             sizeof(struct virtio_net_hdr);
         n->rss_data.populate_hash = false;
     }
+    if (tnl) {
+        n->guest_hdr_len += sizeof(struct virtio_net_hdr_tunnel);
+    }
 
     for (i = 0; i < n->max_queue_pairs; i++) {
         nc = qemu_get_subqueue(n->nic, i);
@@ -890,6 +915,12 @@ static void virtio_net_apply_guest_offloads(VirtIONet *n)
        .ufo  = !!(n->curr_guest_offloads & (1ULL << VIRTIO_NET_F_GUEST_UFO)),
        .uso4 = !!(n->curr_guest_offloads & (1ULL << VIRTIO_NET_F_GUEST_USO4)),
        .uso6 = !!(n->curr_guest_offloads & (1ULL << VIRTIO_NET_F_GUEST_USO6)),
+#ifdef CONFIG_INT128
+       .tnl  = !!(n->curr_guest_offloads &
+                  (1ULL << VIRTIO_NET_O_GUEST_UDP_TUNNEL_GSO)),
+       .tnl_csum = !!(n->curr_guest_offloads &
+                      (1ULL << VIRTIO_NET_O_GUEST_UDP_TUNNEL_GSO_CSUM)),
+#endif
     };
 
     qemu_set_offload(qemu_get_queue(n->nic)->peer, &ol);
@@ -911,7 +942,12 @@ virtio_net_guest_offloads_by_features(virtio_features_t features)
         (1ULL << VIRTIO_NET_F_GUEST_ECN)  |
         (1ULL << VIRTIO_NET_F_GUEST_UFO)  |
         (1ULL << VIRTIO_NET_F_GUEST_USO4) |
-        (1ULL << VIRTIO_NET_F_GUEST_USO6);
+        (1ULL << VIRTIO_NET_F_GUEST_USO6)
+#ifdef CONFIG_INT128
+        | (1ULL << VIRTIO_NET_O_GUEST_UDP_TUNNEL_GSO)
+        | (1ULL << VIRTIO_NET_O_GUEST_UDP_TUNNEL_GSO_CSUM)
+#endif
+        ;
 
     return guest_offloads_mask & virtio_net_features_to_offload(features);
 }
@@ -1020,7 +1056,8 @@ static void virtio_net_set_features(VirtIODevice *vdev,
                                virtio_has_feature(features,
                                                   VIRTIO_F_VERSION_1),
                                virtio_has_feature(features,
-                                                  VIRTIO_NET_F_HASH_REPORT));
+                                                  VIRTIO_NET_F_HASH_REPORT),
+                               virtio_has_tnl_hdr(features));
 
     n->rsc4_enabled = virtio_has_feature(features, VIRTIO_NET_F_RSC_EXT) &&
         virtio_has_feature(features, VIRTIO_NET_F_GUEST_TSO4);
@@ -3139,7 +3176,8 @@ static int virtio_net_post_load_device(void *opaque, int version_id)
                                virtio_vdev_has_feature(vdev,
                                                        VIRTIO_F_VERSION_1),
                                virtio_vdev_has_feature(vdev,
-                                                       VIRTIO_NET_F_HASH_REPORT));
+                                                       VIRTIO_NET_F_HASH_REPORT),
+                               virtio_has_tnl_hdr(vdev->guest_features));
 
     /* MAC_TABLE_ENTRIES may be different from the saved image */
     if (n->mac_table.in_use > MAC_TABLE_ENTRIES) {
@@ -3946,7 +3984,7 @@ static void virtio_net_device_realize(DeviceState *dev, Error **errp)
 
     n->vqs[0].tx_waiting = 0;
     n->tx_burst = n->net_conf.txburst;
-    virtio_net_set_mrg_rx_bufs(n, 0, 0, 0);
+    virtio_net_set_mrg_rx_bufs(n, 0, 0, 0, 0);
     n->promisc = 1; /* for compatibility */
 
     n->mac_table.macs = g_malloc0(MAC_TABLE_ENTRIES * ETH_ALEN);
diff --git a/include/net/net.h b/include/net/net.h
index c71d7c6074..5049d293f2 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -43,6 +43,8 @@ typedef struct NetOffloads {
     bool ufo;
     bool uso4;
     bool uso6;
+    bool tnl;
+    bool tnl_csum;
 } NetOffloads;
 
 #define DEFINE_NIC_PROPERTIES(_state, _conf)                            \
diff --git a/net/net.c b/net/net.c
index 5a2f00c108..bd41229407 100644
--- a/net/net.c
+++ b/net/net.c
@@ -569,13 +569,18 @@ int qemu_get_vnet_hdr_len(NetClientState *nc)
 
 void qemu_set_vnet_hdr_len(NetClientState *nc, int len)
 {
+    int len_tnl = len - sizeof(struct virtio_net_hdr_tunnel);
+
     if (!nc || !nc->info->set_vnet_hdr_len) {
         return;
     }
 
     assert(len == sizeof(struct virtio_net_hdr_mrg_rxbuf) ||
+           len_tnl == sizeof(struct virtio_net_hdr_mrg_rxbuf) ||
            len == sizeof(struct virtio_net_hdr) ||
-           len == sizeof(struct virtio_net_hdr_v1_hash));
+           len_tnl == sizeof(struct virtio_net_hdr) ||
+           len == sizeof(struct virtio_net_hdr_v1_hash) ||
+           len_tnl == sizeof(struct virtio_net_hdr_v1_hash));
 
     nc->vnet_hdr_len = len;
     nc->info->set_vnet_hdr_len(nc, len);
diff --git a/net/tap-linux.c b/net/tap-linux.c
index aa5f3a6e22..b7662ece63 100644
--- a/net/tap-linux.c
+++ b/net/tap-linux.c
@@ -287,6 +287,12 @@ void tap_fd_set_offload(int fd, const NetOffloads *ol)
         if (ol->uso6) {
             offload |= TUN_F_USO6;
         }
+        if ((ol->tso4 || ol->tso6 || ol->uso4 || ol->uso6) && ol->tnl) {
+            offload |= TUN_F_UDP_TUNNEL_GSO;
+        }
+        if ((offload & TUN_F_UDP_TUNNEL_GSO) && ol->tnl_csum) {
+            offload |= TUN_F_UDP_TUNNEL_GSO_CSUM;
+        }
     }
 
     if (ioctl(fd, TUNSETOFFLOAD, offload) != 0) {
-- 
2.49.0




* [PATCH RFC 16/16] net: make vhost-net aware of GSO over UDP tunnel hdr layout
  2025-05-21 11:33 [PATCH RFC 00/16] virtio: introduce support for GSO over UDP tunnel Paolo Abeni
                   ` (14 preceding siblings ...)
  2025-05-21 11:34 ` [PATCH RFC 15/16] net: implement tnl feature offloading Paolo Abeni
@ 2025-05-21 11:34 ` Paolo Abeni
  2025-05-23  8:22   ` Akihiko Odaki
  2025-05-23  7:19 ` [PATCH RFC 00/16] virtio: introduce support for GSO over UDP tunnel Akihiko Odaki
  2025-06-17 15:01 ` Paolo Abeni
  17 siblings, 1 reply; 42+ messages in thread
From: Paolo Abeni @ 2025-05-21 11:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost,
	Dmitry Fleytman, Akihiko Odaki, Jason Wang, Sriram Yagnaraman,
	Michael S. Tsirkin, Stefano Garzarella, Peter Xu, Fabiano Rosas,
	Cornelia Huck, Luigi Rizzo, Giuseppe Lettieri, Vincenzo Maffione,
	Eric Blake, Markus Armbruster

When the GSO over UDP tunnel offload is enabled, the virtio net
header includes additional fields to support that offload.

The vhost backend must be aware of the exact header layout to
copy it correctly. The tunnel-related fields are present if either
the guest or the host negotiated any UDP tunnel related feature:
add them to the host kernel supported features list, to allow QEMU
to transfer the needed information to that backend.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 hw/net/vhost_net.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 58d7619fc8..c8e02d1732 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -52,6 +52,10 @@ static const int kernel_feature_bits[] = {
     VIRTIO_F_NOTIFICATION_DATA,
     VIRTIO_NET_F_RSC_EXT,
     VIRTIO_NET_F_HASH_REPORT,
+#ifdef CONFIG_INT128
+    VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO,
+    VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO,
+#endif
     VHOST_INVALID_FEATURE_BIT
 };
 
-- 
2.49.0




* Re: [PATCH RFC 00/16] virtio: introduce support for GSO over UDP tunnel
  2025-05-21 11:33 [PATCH RFC 00/16] virtio: introduce support for GSO over UDP tunnel Paolo Abeni
                   ` (15 preceding siblings ...)
  2025-05-21 11:34 ` [PATCH RFC 16/16] net: make vhost-net aware of GSO over UDP tunnel hdr layout Paolo Abeni
@ 2025-05-23  7:19 ` Akihiko Odaki
  2025-05-23  9:43   ` Paolo Abeni
  2025-06-17 15:01 ` Paolo Abeni
  17 siblings, 1 reply; 42+ messages in thread
From: Akihiko Odaki @ 2025-05-23  7:19 UTC (permalink / raw)
  To: Paolo Abeni, qemu-devel
  Cc: Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost,
	Dmitry Fleytman, Jason Wang, Sriram Yagnaraman,
	Michael S. Tsirkin, Stefano Garzarella, Peter Xu, Fabiano Rosas,
	Cornelia Huck, Luigi Rizzo, Giuseppe Lettieri, Vincenzo Maffione,
	Eric Blake, Markus Armbruster

On 2025/05/21 20:33, Paolo Abeni wrote:
> Some virtualized deployments use UDP tunnel pervasively and are impacted
> negatively by the lack of GSO support for such kind of traffic in the
> virtual NIC driver.
> 
> The virtio_net specification recently introduced support for GSO over
> UDP tunnel, this series updates the virtio implementation to support
> such a feature.
> 
> One of the reasons for the RFC tag is that the kernel-side
> implementation has just been shared upstream and is not merged yet, but
> there are also other relevant reasons, see below.
> 
> Currently, the kernel virtio support limits the feature space to 64 bits,
> while the virtio specification allows for a larger number of features.
> Specifically, the GSO-over-UDP-tunnel-related virtio features use bits
> 65-69; the larger part of this series (patches 2-11) actually deals with
> the extended feature space.
> 
> I tried to minimize the otherwise very large code churn by limiting the
> extended features support to arches with native 128 integer support and
> introducing the extended features space support only in virtio/vhost
> core and in the relevant device driver.

What about adding another 64-bit integer to hold the high bits? That 
would make the 128-bit integer type for VMState and properties, as well 
as the CONFIG_INT128 checks, unnecessary.
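
A minimal sketch of that alternative, assuming a plain two-word layout; the struct and helper names below are hypothetical and not part of the series:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 128-bit feature set built from two 64-bit words. */
struct sk_features {
    uint64_t lo;   /* feature bits 0..63 */
    uint64_t hi;   /* feature bits 64..127 */
};

void sk_features_set(struct sk_features *f, unsigned bit)
{
    assert(bit < 128);
    if (bit < 64) {
        f->lo |= UINT64_C(1) << bit;
    } else {
        f->hi |= UINT64_C(1) << (bit - 64);
    }
}

void sk_features_clear(struct sk_features *f, unsigned bit)
{
    assert(bit < 128);
    if (bit < 64) {
        f->lo &= ~(UINT64_C(1) << bit);
    } else {
        f->hi &= ~(UINT64_C(1) << (bit - 64));
    }
}

bool sk_features_has(const struct sk_features *f, unsigned bit)
{
    assert(bit < 128);
    if (bit < 64) {
        return (f->lo >> bit) & 1;
    }
    return (f->hi >> (bit - 64)) & 1;
}
```

Each word can then be handled as an ordinary uint64_t wherever the state needs to be saved or exposed, on any host, regardless of native 128-bit integer support.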

> 
> The actual offload implementation is in patches 12-16 and boils down to
> propagating the new offload to the tun devices and the vhost backend.
> 
> Tested with basic stream transfer with all the possible permutations of
> host kernel/qemu/guest kernel with/without GSO over UDP tunnel support
> and vs snapshots creation and restore.
> 
> Notably this does not include (yet) any additional tests. Some guidance
> on such matter would be really appreciated, and any feedback about the
> features extension strategy would be more than welcome!

My proposal to add a feature to tap devices[1] simply omitted tests, and
I wrote simple testing scripts for my personal use. Unfortunately, there
is currently no testing code that covers tap devices, and I think adding
some would take significant effort.

[1] https://patchew.org/QEMU/20250313-hash-v4-0-c75c494b495e@daynix.com/

> 
> Paolo Abeni (16):
>    linux-headers: Update to Linux v6.15-rc net-next
>    migration: introduce support for 128 bit int state.
>    virtio: introduce extended features type
>    virtio: serialize extended features state
>    qmp: update virtio features map to support extended features
>    virtio: add support for negotiating extended features.
>    virtio-pci: implement support for extended features.
>    vhost: add support for negotiating extended features.
>    vhost-backend: implement extended features support.
>    vhost-net: implement extended features support.
>    qdev-properties: add property for extended virtio features
>    virtio-net: implement extended features support.
>    net: implement tunnel probing
>    net: bundle all offloads in a single struct
>    net: implement tnl feature offloading
>    net: make vhost-net aware of GSO over UDP tunnel hdr layout
> 
>   hw/core/qdev-properties.c                     |  46 +++++
>   hw/net/e1000e_core.c                          |   5 +-
>   hw/net/igb_core.c                             |   5 +-
>   hw/net/vhost_net-stub.c                       |   7 +-
>   hw/net/vhost_net.c                            |  35 ++--
>   hw/net/virtio-net.c                           | 195 +++++++++++++-----
>   hw/net/vmxnet3.c                              |  13 +-
>   hw/virtio/vhost-backend.c                     |  59 +++++-
>   hw/virtio/vhost.c                             |  58 ++++--
>   hw/virtio/virtio-bus.c                        |  15 +-
>   hw/virtio/virtio-hmp-cmds.c                   |   3 +-
>   hw/virtio/virtio-pci.c                        |  19 +-
>   hw/virtio/virtio-qmp.c                        |  28 ++-
>   hw/virtio/virtio-qmp.h                        |   3 +-
>   hw/virtio/virtio.c                            | 103 ++++++++-
>   include/hw/qdev-properties.h                  |  13 ++
>   include/hw/virtio/vhost-backend.h             |  10 +
>   include/hw/virtio/vhost.h                     |  13 +-
>   include/hw/virtio/virtio-features.h           |  90 ++++++++
>   include/hw/virtio/virtio-net.h                |   2 +-
>   include/hw/virtio/virtio-pci.h                |   2 +-
>   include/hw/virtio/virtio.h                    |  17 +-
>   include/migration/qemu-file-types.h           |  15 ++
>   include/migration/vmstate.h                   |  11 +
>   include/net/net.h                             |  20 +-
>   include/net/vhost_net.h                       |   8 +-
>   include/standard-headers/asm-x86/setup_data.h |   4 +-
>   include/standard-headers/drm/drm_fourcc.h     |  41 ++++
>   include/standard-headers/linux/const.h        |   2 +-
>   include/standard-headers/linux/ethtool.h      | 156 ++++++++------
>   include/standard-headers/linux/fuse.h         |  12 +-
>   include/standard-headers/linux/pci_regs.h     |  13 +-
>   include/standard-headers/linux/virtio_net.h   |  46 +++++
>   include/standard-headers/linux/virtio_pci.h   |   1 +
>   include/standard-headers/linux/virtio_snd.h   |   2 +-
>   linux-headers/asm-arm64/kvm.h                 |  11 +
>   linux-headers/asm-arm64/unistd_64.h           |   1 +
>   linux-headers/asm-generic/mman-common.h       |   1 +
>   linux-headers/asm-generic/unistd.h            |   4 +-
>   linux-headers/asm-loongarch/unistd_64.h       |   1 +
>   linux-headers/asm-mips/unistd_n32.h           |   1 +
>   linux-headers/asm-mips/unistd_n64.h           |   1 +
>   linux-headers/asm-mips/unistd_o32.h           |   1 +
>   linux-headers/asm-powerpc/unistd_32.h         |   1 +
>   linux-headers/asm-powerpc/unistd_64.h         |   1 +
>   linux-headers/asm-riscv/kvm.h                 |   2 +
>   linux-headers/asm-riscv/unistd_32.h           |   1 +
>   linux-headers/asm-riscv/unistd_64.h           |   1 +
>   linux-headers/asm-s390/unistd_32.h            |   1 +
>   linux-headers/asm-s390/unistd_64.h            |   1 +
>   linux-headers/asm-x86/kvm.h                   |   3 +
>   linux-headers/asm-x86/unistd_32.h             |   1 +
>   linux-headers/asm-x86/unistd_64.h             |   1 +
>   linux-headers/asm-x86/unistd_x32.h            |   1 +
>   linux-headers/linux/bits.h                    |   8 +-
>   linux-headers/linux/const.h                   |   2 +-
>   linux-headers/linux/iommufd.h                 | 129 +++++++++++-
>   linux-headers/linux/kvm.h                     |   1 +
>   linux-headers/linux/psp-sev.h                 |  21 +-
>   linux-headers/linux/stddef.h                  |   2 +
>   linux-headers/linux/vfio.h                    |  30 ++-
>   linux-headers/linux/vhost.h                   |  12 +-
>   migration/qemu-file.c                         |  16 ++
>   migration/vmstate-types.c                     |  25 +++
>   net/net.c                                     |  21 +-
>   net/netmap.c                                  |   3 +-
>   net/tap-bsd.c                                 |   8 +-
>   net/tap-linux.c                               |  46 ++++-
>   net/tap-solaris.c                             |   9 +-
>   net/tap-stub.c                                |   8 +-
>   net/tap.c                                     |  19 +-
>   net/tap_int.h                                 |   5 +-
>   qapi/virtio.json                              |   8 +-
>   73 files changed, 1209 insertions(+), 271 deletions(-)
>   create mode 100644 include/hw/virtio/virtio-features.h
> 




* Re: [PATCH RFC 07/16] virtio-pci: implement support for extended features.
  2025-05-21 11:34 ` [PATCH RFC 07/16] virtio-pci: implement support for " Paolo Abeni
@ 2025-05-23  7:23   ` Akihiko Odaki
  2025-05-23  9:52     ` Paolo Abeni
  0 siblings, 1 reply; 42+ messages in thread
From: Akihiko Odaki @ 2025-05-23  7:23 UTC (permalink / raw)
  To: Paolo Abeni, qemu-devel
  Cc: Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost,
	Dmitry Fleytman, Jason Wang, Sriram Yagnaraman,
	Michael S. Tsirkin, Stefano Garzarella, Peter Xu, Fabiano Rosas,
	Cornelia Huck, Luigi Rizzo, Giuseppe Lettieri, Vincenzo Maffione,
	Eric Blake, Markus Armbruster

Having a period in the subject is unusual.

On 2025/05/21 20:34, Paolo Abeni wrote:
> Allow the common read/write operation to access all the
> available features space.
> 
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
>   hw/virtio/virtio-pci.c         | 19 +++++++++++++------
>   include/hw/virtio/virtio-pci.h |  2 +-
>   2 files changed, 14 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> index 0fa8fe4955..7815ef2d9b 100644
> --- a/hw/virtio/virtio-pci.c
> +++ b/hw/virtio/virtio-pci.c
> @@ -123,7 +123,8 @@ static const VMStateDescription vmstate_virtio_pci_modern_state_sub = {
>       .fields = (const VMStateField[]) {
>           VMSTATE_UINT32(dfselect, VirtIOPCIProxy),
>           VMSTATE_UINT32(gfselect, VirtIOPCIProxy),
> -        VMSTATE_UINT32_ARRAY(guest_features, VirtIOPCIProxy, 2),
> +        VMSTATE_UINT32_ARRAY(guest_features, VirtIOPCIProxy,
> +                             VIRTIO_FEATURES_WORDS),

Modifying existing fields breaks migration across versions. Please refer 
to docs/devel/migration/main.rst for details.
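
One way to stay compatible is to keep the existing two-word field as it is
and to carry the additional words in an optional part that is only sent
when they are non-zero, which is roughly what VMState subsections with a
.needed callback provide. The toy model below does not use the QEMU API;
the function names and the word counts are assumptions made purely for
illustration:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LEGACY_WORDS  2   /* words present in the existing stream format */
#define FEATURE_WORDS 4   /* assumed total once extended features exist */

/* The optional part is needed only if some high word is non-zero. */
static bool high_words_needed(const uint32_t *gf)
{
    for (size_t i = LEGACY_WORDS; i < FEATURE_WORDS; i++) {
        if (gf[i]) {
            return true;
        }
    }
    return false;
}

/*
 * Copy the words to send into out and return how many were emitted.
 * The first LEGACY_WORDS words are always sent unchanged, so a stream
 * without extended features is identical to the old format.
 */
static size_t save_guest_features(const uint32_t *gf, uint32_t *out)
{
    size_t n = LEGACY_WORDS;

    memcpy(out, gf, LEGACY_WORDS * sizeof(*gf));
    if (high_words_needed(gf)) {
        memcpy(out + n, gf + LEGACY_WORDS,
               (FEATURE_WORDS - LEGACY_WORDS) * sizeof(*gf));
        n = FEATURE_WORDS;
    }
    return n;
}
```

In this model a guest that negotiated none of the new bits produces the
same two words as before, so only guests actually using the extended
features would be unable to migrate to an older version.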

>           VMSTATE_STRUCT_ARRAY(vqs, VirtIOPCIProxy, VIRTIO_QUEUE_MAX, 0,
>                                vmstate_virtio_pci_modern_queue_state,
>                                VirtIOPCIQueue),
> @@ -1490,10 +1491,10 @@ static uint64_t virtio_pci_common_read(void *opaque, hwaddr addr,
>           val = proxy->dfselect;
>           break;
>       case VIRTIO_PCI_COMMON_DF:
> -        if (proxy->dfselect <= 1) {
> +        if (proxy->dfselect < VIRTIO_FEATURES_WORDS) {
>               VirtioDeviceClass *vdc = VIRTIO_DEVICE_GET_CLASS(vdev);
>   
> -            val = (vdev->host_features & ~vdc->legacy_features) >>
> +            val = (vdev->host_features_ex & ~vdc->legacy_features) >>
>                   (32 * proxy->dfselect);
>           }
>           break;
> @@ -1585,10 +1586,16 @@ static void virtio_pci_common_write(void *opaque, hwaddr addr,
>           break;
>       case VIRTIO_PCI_COMMON_GF:
>           if (proxy->gfselect < ARRAY_SIZE(proxy->guest_features)) {
> +            virtio_features_t features = 0;
> +            int i;
> +
>               proxy->guest_features[proxy->gfselect] = val;
> -            virtio_set_features(vdev,
> -                                (((uint64_t)proxy->guest_features[1]) << 32) |
> -                                proxy->guest_features[0]);
> +            for (i = 0; i < VIRTIO_FEATURES_WORDS; ++i) {
> +                virtio_features_t cur = proxy->guest_features[i];
> +
> +                features |= cur << (i * 32);
> +            }
> +            virtio_set_features(vdev, features);
>           }
>           break;
>       case VIRTIO_PCI_COMMON_MSIX:
> diff --git a/include/hw/virtio/virtio-pci.h b/include/hw/virtio/virtio-pci.h
> index 31ec144509..c20b289e64 100644
> --- a/include/hw/virtio/virtio-pci.h
> +++ b/include/hw/virtio/virtio-pci.h
> @@ -165,7 +165,7 @@ struct VirtIOPCIProxy {
>       uint32_t nvectors;
>       uint32_t dfselect;
>       uint32_t gfselect;
> -    uint32_t guest_features[2];
> +    uint32_t guest_features[VIRTIO_FEATURES_WORDS];
>       VirtIOPCIQueue vqs[VIRTIO_QUEUE_MAX];
>   
>       VirtIOIRQFD *vector_irqfd;




* Re: [PATCH RFC 13/16] net: implement tunnel probing
  2025-05-21 11:34 ` [PATCH RFC 13/16] net: implement tunnel probing Paolo Abeni
@ 2025-05-23  7:39   ` Akihiko Odaki
  2025-05-23 10:24     ` Paolo Abeni
  0 siblings, 1 reply; 42+ messages in thread
From: Akihiko Odaki @ 2025-05-23  7:39 UTC (permalink / raw)
  To: Paolo Abeni, qemu-devel
  Cc: Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost,
	Dmitry Fleytman, Jason Wang, Sriram Yagnaraman,
	Michael S. Tsirkin, Stefano Garzarella, Peter Xu, Fabiano Rosas,
	Cornelia Huck, Luigi Rizzo, Giuseppe Lettieri, Vincenzo Maffione,
	Eric Blake, Markus Armbruster

On 2025/05/21 20:34, Paolo Abeni wrote:
> Tap devices support GSO over UDP tunnel offload. Probe for such
> feature in a similar manner to other offloads.
> 
> GSO over UDP tunnel needs to be enabled in addition to  a "plain"
> offload (TSO or USO).
> 
> No need to check separately for the outer header checksum offload:
> the kernel is going to support both of them or none.
> 
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
>   hw/net/virtio-net.c | 39 +++++++++++++++++++++++++++++++++++++++
>   include/net/net.h   |  3 +++
>   net/net.c           |  9 +++++++++
>   net/tap-bsd.c       |  5 +++++
>   net/tap-linux.c     | 19 +++++++++++++++++++
>   net/tap-solaris.c   |  5 +++++
>   net/tap-stub.c      |  5 +++++
>   net/tap.c           | 11 +++++++++++
>   net/tap_int.h       |  1 +
>   9 files changed, 97 insertions(+)
> 
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 193469fc27..05cf23700f 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -646,6 +646,15 @@ static int peer_has_uso(VirtIONet *n)
>       return qemu_has_uso(qemu_get_queue(n->nic)->peer);
>   }
>   
> +static int peer_has_tunnel(VirtIONet *n)

Let's make this return a bool. Preceding functions like
peer_has_vnet_hdr() return int, but I think it's better to follow the
convention more common across the codebase.

> +{
> +    if (!peer_has_vnet_hdr(n)) {
> +        return 0;
> +    }
> +
> +    return qemu_has_tunnel(qemu_get_queue(n->nic)->peer);
> +}
> +
>   static void virtio_net_set_mrg_rx_bufs(VirtIONet *n, int mergeable_rx_bufs,
>                                          int version_1, int hash_report)
>   {
> @@ -789,6 +798,15 @@ static virtio_features_t virtio_net_get_features(VirtIODevice *vdev,
>           virtio_clear_feature_ex(&features, VIRTIO_NET_F_GUEST_USO4);
>           virtio_clear_feature_ex(&features, VIRTIO_NET_F_GUEST_USO6);
>   
> +#ifdef CONFIG_INT128
> +        virtio_clear_feature_ex(&features, VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO);
> +        virtio_clear_feature_ex(&features, VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO);
> +        virtio_clear_feature_ex(&features,
> +                                VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM);
> +        virtio_clear_feature_ex(&features,
> +                                VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO_CSUM);
> +#endif
> +
>           virtio_clear_feature_ex(&features, VIRTIO_NET_F_HASH_REPORT);
>       }
>   
> @@ -803,6 +821,17 @@ static virtio_features_t virtio_net_get_features(VirtIODevice *vdev,
>           virtio_clear_feature_ex(&features, VIRTIO_NET_F_GUEST_USO6);
>       }
>   
> +#ifdef CONFIG_INT128
> +    if (!peer_has_tunnel(n)) {
> +        virtio_clear_feature_ex(&features, VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO);
> +        virtio_clear_feature_ex(&features, VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO);
> +        virtio_clear_feature_ex(&features,
> +                                VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM);
> +        virtio_clear_feature_ex(&features,
> +                                VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO_CSUM);
> +    }
> +#endif
> +
>       if (!get_vhost_net(nc->peer)) {
>           return features;
>       }
> @@ -4153,6 +4182,16 @@ static const Property virtio_net_properties[] = {
>                         VIRTIO_NET_F_GUEST_USO6, true),
>       DEFINE_PROP_BIT64("host_uso", VirtIONet, host_features,
>                         VIRTIO_NET_F_HOST_USO, true),
> +#ifdef CONFIG_INT128
> +    DEFINE_PROP_BITVF("host_tunnel", VirtIONet, host_features_ex,
> +                      VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO, true),
> +    DEFINE_PROP_BITVF("host_tunnel_csum", VirtIONet, host_features_ex,
> +                      VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO_CSUM, true),
> +    DEFINE_PROP_BITVF("guest_tunnel", VirtIONet, host_features_ex,
> +                      VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO, true),
> +    DEFINE_PROP_BITVF("guest_tunnel_csum", VirtIONet, host_features_ex,
> +                      VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM, true),
> +#endif
>   };
>   
>   static void virtio_net_class_init(ObjectClass *klass, const void *data)
> diff --git a/include/net/net.h b/include/net/net.h
> index cdd5b109b0..391d983e49 100644
> --- a/include/net/net.h
> +++ b/include/net/net.h
> @@ -55,6 +55,7 @@ typedef void (NetClientDestructor)(NetClientState *);
>   typedef RxFilterInfo *(QueryRxFilter)(NetClientState *);
>   typedef bool (HasUfo)(NetClientState *);
>   typedef bool (HasUso)(NetClientState *);
> +typedef bool (HasTunnel)(NetClientState *);
>   typedef bool (HasVnetHdr)(NetClientState *);
>   typedef bool (HasVnetHdrLen)(NetClientState *, int);
>   typedef void (SetOffload)(NetClientState *, int, int, int, int, int, int, int);
> @@ -83,6 +84,7 @@ typedef struct NetClientInfo {
>       NetPoll *poll;
>       HasUfo *has_ufo;
>       HasUso *has_uso;
> +    HasTunnel *has_tunnel;
>       HasVnetHdr *has_vnet_hdr;
>       HasVnetHdrLen *has_vnet_hdr_len;
>       SetOffload *set_offload;
> @@ -183,6 +185,7 @@ void qemu_set_info_str(NetClientState *nc,
>   void qemu_format_nic_info_str(NetClientState *nc, uint8_t macaddr[6]);
>   bool qemu_has_ufo(NetClientState *nc);
>   bool qemu_has_uso(NetClientState *nc);
> +bool qemu_has_tunnel(NetClientState *nc);
>   bool qemu_has_vnet_hdr(NetClientState *nc);
>   bool qemu_has_vnet_hdr_len(NetClientState *nc, int len);
>   void qemu_set_offload(NetClientState *nc, int csum, int tso4, int tso6,
> diff --git a/net/net.c b/net/net.c
> index 39d6f28158..9c83d3b137 100644
> --- a/net/net.c
> +++ b/net/net.c
> @@ -522,6 +522,15 @@ bool qemu_has_uso(NetClientState *nc)
>       return nc->info->has_uso(nc);
>   }
>   
> +bool qemu_has_tunnel(NetClientState *nc)
> +{
> +    if (!nc || !nc->info->has_tunnel) {
> +        return false;
> +    }
> +
> +    return nc->info->has_tunnel(nc);
> +}
> +
>   bool qemu_has_vnet_hdr(NetClientState *nc)
>   {
>       if (!nc || !nc->info->has_vnet_hdr) {
> diff --git a/net/tap-bsd.c b/net/tap-bsd.c
> index b4c84441ba..3f01c8921e 100644
> --- a/net/tap-bsd.c
> +++ b/net/tap-bsd.c
> @@ -217,6 +217,11 @@ int tap_probe_has_uso(int fd)
>       return 0;
>   }
>   
> +int tap_probe_has_tunnel(int fd)
> +{
> +    return 0;
> +}
> +
>   void tap_fd_set_vnet_hdr_len(int fd, int len)
>   {
>   }
> diff --git a/net/tap-linux.c b/net/tap-linux.c
> index 22ec2f45d2..2df601551e 100644
> --- a/net/tap-linux.c
> +++ b/net/tap-linux.c
> @@ -37,6 +37,14 @@
>   
>   #define PATH_NET_TUN "/dev/net/tun"
>   
> +#ifndef TUN_F_UDP_TUNNEL_GSO
> +#define TUN_F_UDP_TUNNEL_GSO       0x080
> +#endif
> +
> +#ifndef TUN_F_UDP_TUNNEL_GSO_CSUM
> +#define TUN_F_UDP_TUNNEL_GSO_CSUM  0x100
> +#endif
> +

These should be added to net/tap-linux.h, which contains other UAPI 
definitions.

But perhaps it may be better to refactor it to add the real header file 
using scripts/update-linux-headers.sh. Such a refactoring can be done 
before this series gets ready to merge and will make this series a bit 
smaller.

>   int tap_open(char *ifname, int ifname_size, int *vnet_hdr,
>                int vnet_hdr_required, int mq_required, Error **errp)
>   {
> @@ -196,6 +204,17 @@ int tap_probe_has_uso(int fd)
>       return 1;
>   }
>   
> +int tap_probe_has_tunnel(int fd)
> +{
> +    unsigned offload;
> +
> +    offload = TUN_F_CSUM | TUN_F_TSO4 | TUN_F_UDP_TUNNEL_GSO;
> +    if (ioctl(fd, TUNSETOFFLOAD, offload) < 0) {
> +        return 0;
> +    }
> +    return 1;
> +}
> +
>   void tap_fd_set_vnet_hdr_len(int fd, int len)
>   {
>       if (ioctl(fd, TUNSETVNETHDRSZ, &len) == -1) {
> diff --git a/net/tap-solaris.c b/net/tap-solaris.c
> index 51b7830bef..b1aa40d46b 100644
> --- a/net/tap-solaris.c
> +++ b/net/tap-solaris.c
> @@ -221,6 +221,11 @@ int tap_probe_has_uso(int fd)
>       return 0;
>   }
>   
> +int tap_probe_has_tunnel(int fd)
> +{
> +    return 0;
> +}
> +
>   void tap_fd_set_vnet_hdr_len(int fd, int len)
>   {
>   }
> diff --git a/net/tap-stub.c b/net/tap-stub.c
> index 38673434cb..5f57d6baac 100644
> --- a/net/tap-stub.c
> +++ b/net/tap-stub.c
> @@ -52,6 +52,11 @@ int tap_probe_has_uso(int fd)
>       return 0;
>   }
>   
> +int tap_probe_has_tunnel(int fd)
> +{
> +    return 0;
> +}
> +
>   void tap_fd_set_vnet_hdr_len(int fd, int len)
>   {
>   }
> diff --git a/net/tap.c b/net/tap.c
> index ae1c7e3983..f6e8cd5f1c 100644
> --- a/net/tap.c
> +++ b/net/tap.c
> @@ -58,6 +58,7 @@ typedef struct TAPState {
>       bool using_vnet_hdr;
>       bool has_ufo;
>       bool has_uso;
> +    bool has_tunnel;
>       bool enabled;
>       VHostNetState *vhost_net;
>       unsigned host_vnet_hdr_len;
> @@ -223,6 +224,14 @@ static bool tap_has_uso(NetClientState *nc)
>       return s->has_uso;
>   }
>   
> +static bool tap_has_tunnel(NetClientState *nc)
> +{
> +    TAPState *s = DO_UPCAST(TAPState, nc, nc);
> +
> +    assert(nc->info->type == NET_CLIENT_DRIVER_TAP);
> +    return s->has_tunnel;
> +}
> +
>   static bool tap_has_vnet_hdr(NetClientState *nc)
>   {
>       TAPState *s = DO_UPCAST(TAPState, nc, nc);
> @@ -340,6 +349,7 @@ static NetClientInfo net_tap_info = {
>       .cleanup = tap_cleanup,
>       .has_ufo = tap_has_ufo,
>       .has_uso = tap_has_uso,
> +    .has_tunnel = tap_has_tunnel,
>       .has_vnet_hdr = tap_has_vnet_hdr,
>       .has_vnet_hdr_len = tap_has_vnet_hdr_len,
>       .set_offload = tap_set_offload,
> @@ -367,6 +377,7 @@ static TAPState *net_tap_fd_init(NetClientState *peer,
>       s->using_vnet_hdr = false;
>       s->has_ufo = tap_probe_has_ufo(s->fd);
>       s->has_uso = tap_probe_has_uso(s->fd);
> +    s->has_tunnel = tap_probe_has_tunnel(s->fd);
>       s->enabled = true;
>       tap_set_offload(&s->nc, 0, 0, 0, 0, 0, 0, 0);
>       /*
> diff --git a/net/tap_int.h b/net/tap_int.h
> index 8857ff299d..2a8aa3632f 100644
> --- a/net/tap_int.h
> +++ b/net/tap_int.h
> @@ -37,6 +37,7 @@ void tap_set_sndbuf(int fd, const NetdevTapOptions *tap, Error **errp);
>   int tap_probe_vnet_hdr(int fd, Error **errp);
>   int tap_probe_has_ufo(int fd);
>   int tap_probe_has_uso(int fd);
> +int tap_probe_has_tunnel(int fd);
>   void tap_fd_set_offload(int fd, int csum, int tso4, int tso6, int ecn, int ufo,
>                           int uso4, int uso6);
>   void tap_fd_set_vnet_hdr_len(int fd, int len);




* Re: [PATCH RFC 14/16] net: bundle all offloads in a single struct
  2025-05-21 11:34 ` [PATCH RFC 14/16] net: bundle all offloads in a single struct Paolo Abeni
@ 2025-05-23  7:45   ` Akihiko Odaki
  0 siblings, 0 replies; 42+ messages in thread
From: Akihiko Odaki @ 2025-05-23  7:45 UTC (permalink / raw)
  To: Paolo Abeni, qemu-devel
  Cc: Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost,
	Dmitry Fleytman, Jason Wang, Sriram Yagnaraman,
	Michael S. Tsirkin, Stefano Garzarella, Peter Xu, Fabiano Rosas,
	Cornelia Huck, Luigi Rizzo, Giuseppe Lettieri, Vincenzo Maffione,
	Eric Blake, Markus Armbruster

On 2025/05/21 20:34, Paolo Abeni wrote:
> The set_offload() argument list is already pretty long and
> we are going to introduce soon a bunch of additional offloads.

I suggest posting this patch separately so that it can be merged
earlier. This series can refer to it using a Based-on: tag, which is
explained in docs/devel/submitting-a-patch.rst

> 
> Replace the offload arguments with a single struct and update
> all the relevant call-sites.
> 
> No functional changes intended.
> 
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
>   hw/net/e1000e_core.c |  5 +++--
>   hw/net/igb_core.c    |  5 +++--
>   hw/net/virtio-net.c  | 19 +++++++++++--------
>   hw/net/vmxnet3.c     | 13 +++++--------
>   include/net/net.h    | 15 ++++++++++++---
>   net/net.c            |  5 ++---
>   net/netmap.c         |  3 +--
>   net/tap-bsd.c        |  3 +--
>   net/tap-linux.c      | 21 ++++++++++++---------
>   net/tap-solaris.c    |  4 ++--
>   net/tap-stub.c       |  3 +--
>   net/tap.c            |  8 ++++----
>   net/tap_int.h        |  4 ++--
>   13 files changed, 59 insertions(+), 49 deletions(-)
> 
> diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
> index 2413858790..ec90869e56 100644
> --- a/hw/net/e1000e_core.c
> +++ b/hw/net/e1000e_core.c
> @@ -2827,8 +2827,9 @@ e1000e_update_rx_offloads(E1000ECore *core)
>       trace_e1000e_rx_set_cso(cso_state);
>   
>       if (core->has_vnet) {
> -        qemu_set_offload(qemu_get_queue(core->owner_nic)->peer,
> -                         cso_state, 0, 0, 0, 0, 0, 0);
> +        struct NetOffloads ol = {.csum = cso_state };

Please omit "struct".
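
Since the patch adds NetOffloads as a typedef, the typedef name alone is
enough. A small standalone sketch; the struct mirrors the one added to
include/net/net.h by this patch, while csum_only_offloads() is a
hypothetical helper used only for illustration:

```c
#include <stdbool.h>

/* Same fields as the NetOffloads typedef introduced by this patch. */
typedef struct NetOffloads {
    bool csum;
    bool tso4;
    bool tso6;
    bool ecn;
    bool ufo;
    bool uso4;
    bool uso6;
} NetOffloads;

/* Hypothetical helper: enable checksum offload only. */
static NetOffloads csum_only_offloads(bool cso_state)
{
    /* No "struct" keyword; unnamed members are zero-initialized. */
    NetOffloads ol = { .csum = cso_state };

    return ol;
}
```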

> +
> +        qemu_set_offload(qemu_get_queue(core->owner_nic)->peer, &ol);
>       }
>   }
>   
> diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c
> index 39e3ce1c8f..e940d3a8e2 100644
> --- a/hw/net/igb_core.c
> +++ b/hw/net/igb_core.c
> @@ -3058,8 +3058,9 @@ igb_update_rx_offloads(IGBCore *core)
>       trace_e1000e_rx_set_cso(cso_state);
>   
>       if (core->has_vnet) {
> -        qemu_set_offload(qemu_get_queue(core->owner_nic)->peer,
> -                         cso_state, 0, 0, 0, 0, 0, 0);
> +        struct NetOffloads ol = {.csum = cso_state };
> +
> +        qemu_set_offload(qemu_get_queue(core->owner_nic)->peer, &ol);
>       }
>   }
>   
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 05cf23700f..881877086e 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -882,14 +882,17 @@ static uint64_t virtio_net_bad_features(VirtIODevice *vdev)
>   
>   static void virtio_net_apply_guest_offloads(VirtIONet *n)
>   {
> -    qemu_set_offload(qemu_get_queue(n->nic)->peer,
> -            !!(n->curr_guest_offloads & (1ULL << VIRTIO_NET_F_GUEST_CSUM)),
> -            !!(n->curr_guest_offloads & (1ULL << VIRTIO_NET_F_GUEST_TSO4)),
> -            !!(n->curr_guest_offloads & (1ULL << VIRTIO_NET_F_GUEST_TSO6)),
> -            !!(n->curr_guest_offloads & (1ULL << VIRTIO_NET_F_GUEST_ECN)),
> -            !!(n->curr_guest_offloads & (1ULL << VIRTIO_NET_F_GUEST_UFO)),
> -            !!(n->curr_guest_offloads & (1ULL << VIRTIO_NET_F_GUEST_USO4)),
> -            !!(n->curr_guest_offloads & (1ULL << VIRTIO_NET_F_GUEST_USO6)));
> +    NetOffloads ol = {
> +       .csum = !!(n->curr_guest_offloads & (1ULL << VIRTIO_NET_F_GUEST_CSUM)),
> +       .tso4 = !!(n->curr_guest_offloads & (1ULL << VIRTIO_NET_F_GUEST_TSO4)),
> +       .tso6 = !!(n->curr_guest_offloads & (1ULL << VIRTIO_NET_F_GUEST_TSO6)),
> +       .ecn  = !!(n->curr_guest_offloads & (1ULL << VIRTIO_NET_F_GUEST_ECN)),
> +       .ufo  = !!(n->curr_guest_offloads & (1ULL << VIRTIO_NET_F_GUEST_UFO)),
> +       .uso4 = !!(n->curr_guest_offloads & (1ULL << VIRTIO_NET_F_GUEST_USO4)),
> +       .uso6 = !!(n->curr_guest_offloads & (1ULL << VIRTIO_NET_F_GUEST_USO6)),
> +    };
> +
> +    qemu_set_offload(qemu_get_queue(n->nic)->peer, &ol);
>   }
>   
>   static uint64_t virtio_net_features_to_offload(virtio_features_t features)
> diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
> index 83d942af17..dbacb4aa18 100644
> --- a/hw/net/vmxnet3.c
> +++ b/hw/net/vmxnet3.c
> @@ -1334,14 +1334,11 @@ static void vmxnet3_update_features(VMXNET3State *s)
>                 s->lro_supported, rxcso_supported,
>                 s->rx_vlan_stripping);
>       if (s->peer_has_vhdr) {
> -        qemu_set_offload(qemu_get_queue(s->nic)->peer,
> -                         rxcso_supported,
> -                         s->lro_supported,
> -                         s->lro_supported,
> -                         0,
> -                         0,
> -                         0,
> -                         0);
> +        struct NetOffloads ol = { .csum = rxcso_supported,
> +                                  .tso4 = s->lro_supported,
> +                                  .tso6 = s->lro_supported };
> +
> +        qemu_set_offload(qemu_get_queue(s->nic)->peer, &ol);
>       }
>   }
>   
> diff --git a/include/net/net.h b/include/net/net.h
> index 391d983e49..c71d7c6074 100644
> --- a/include/net/net.h
> +++ b/include/net/net.h
> @@ -35,6 +35,16 @@ typedef struct NICConf {
>       int32_t bootindex;
>   } NICConf;
>   
> +typedef struct NetOffloads {
> +    bool csum;
> +    bool tso4;
> +    bool tso6;
> +    bool ecn;
> +    bool ufo;
> +    bool uso4;
> +    bool uso6;
> +} NetOffloads;
> +
>   #define DEFINE_NIC_PROPERTIES(_state, _conf)                            \
>       DEFINE_PROP_MACADDR("mac",   _state, _conf.macaddr),                \
>       DEFINE_PROP_NETDEV("netdev", _state, _conf.peers)
> @@ -58,7 +68,7 @@ typedef bool (HasUso)(NetClientState *);
>   typedef bool (HasTunnel)(NetClientState *);
>   typedef bool (HasVnetHdr)(NetClientState *);
>   typedef bool (HasVnetHdrLen)(NetClientState *, int);
> -typedef void (SetOffload)(NetClientState *, int, int, int, int, int, int, int);
> +typedef void (SetOffload)(NetClientState *, const NetOffloads *);
>   typedef int (GetVnetHdrLen)(NetClientState *);
>   typedef void (SetVnetHdrLen)(NetClientState *, int);
>   typedef int (SetVnetLE)(NetClientState *, bool);
> @@ -188,8 +198,7 @@ bool qemu_has_uso(NetClientState *nc);
>   bool qemu_has_tunnel(NetClientState *nc);
>   bool qemu_has_vnet_hdr(NetClientState *nc);
>   bool qemu_has_vnet_hdr_len(NetClientState *nc, int len);
> -void qemu_set_offload(NetClientState *nc, int csum, int tso4, int tso6,
> -                      int ecn, int ufo, int uso4, int uso6);
> +void qemu_set_offload(NetClientState *nc, const NetOffloads *ol);
>   int qemu_get_vnet_hdr_len(NetClientState *nc);
>   void qemu_set_vnet_hdr_len(NetClientState *nc, int len);
>   int qemu_set_vnet_le(NetClientState *nc, bool is_le);
> diff --git a/net/net.c b/net/net.c
> index 9c83d3b137..5a2f00c108 100644
> --- a/net/net.c
> +++ b/net/net.c
> @@ -549,14 +549,13 @@ bool qemu_has_vnet_hdr_len(NetClientState *nc, int len)
>       return nc->info->has_vnet_hdr_len(nc, len);
>   }
>   
> -void qemu_set_offload(NetClientState *nc, int csum, int tso4, int tso6,
> -                          int ecn, int ufo, int uso4, int uso6)
> +void qemu_set_offload(NetClientState *nc, const NetOffloads *ol)
>   {
>       if (!nc || !nc->info->set_offload) {
>           return;
>       }
>   
> -    nc->info->set_offload(nc, csum, tso4, tso6, ecn, ufo, uso4, uso6);
> +    nc->info->set_offload(nc, ol);
>   }
>   
>   int qemu_get_vnet_hdr_len(NetClientState *nc)
> diff --git a/net/netmap.c b/net/netmap.c
> index 297510e190..6cd8f2bdc5 100644
> --- a/net/netmap.c
> +++ b/net/netmap.c
> @@ -366,8 +366,7 @@ static void netmap_set_vnet_hdr_len(NetClientState *nc, int len)
>       }
>   }
>   
> -static void netmap_set_offload(NetClientState *nc, int csum, int tso4, int tso6,
> -                               int ecn, int ufo, int uso4, int uso6)
> +static void netmap_set_offload(NetClientState *nc, const NetOffloads *ol)
>   {
>       NetmapState *s = DO_UPCAST(NetmapState, nc, nc);
>   
> diff --git a/net/tap-bsd.c b/net/tap-bsd.c
> index 3f01c8921e..e7de0672f4 100644
> --- a/net/tap-bsd.c
> +++ b/net/tap-bsd.c
> @@ -236,8 +236,7 @@ int tap_fd_set_vnet_be(int fd, int is_be)
>       return -EINVAL;
>   }
>   
> -void tap_fd_set_offload(int fd, int csum, int tso4,
> -                        int tso6, int ecn, int ufo, int uso4, int uso6)
> +void tap_fd_set_offload(int fd, const NetOffloads *ol)
>   {
>   }
>   
> diff --git a/net/tap-linux.c b/net/tap-linux.c
> index 2df601551e..aa5f3a6e22 100644
> --- a/net/tap-linux.c
> +++ b/net/tap-linux.c
> @@ -258,8 +258,7 @@ int tap_fd_set_vnet_be(int fd, int is_be)
>       abort();
>   }
>   
> -void tap_fd_set_offload(int fd, int csum, int tso4,
> -                        int tso6, int ecn, int ufo, int uso4, int uso6)
> +void tap_fd_set_offload(int fd, const NetOffloads *ol)
>   {
>       unsigned int offload = 0;
>   
> @@ -268,20 +267,24 @@ void tap_fd_set_offload(int fd, int csum, int tso4,
>           return;
>       }
>   
> -    if (csum) {
> +    if (ol->csum) {
>           offload |= TUN_F_CSUM;
> -        if (tso4)
> +        if (ol->tso4) {
>               offload |= TUN_F_TSO4;
> -        if (tso6)
> +        }
> +        if (ol->tso6) {
>               offload |= TUN_F_TSO6;
> -        if ((tso4 || tso6) && ecn)
> +        }
> +        if ((ol->tso4 || ol->tso6) && ol->ecn) {
>               offload |= TUN_F_TSO_ECN;
> -        if (ufo)
> +        }
> +        if (ol->ufo) {
>               offload |= TUN_F_UFO;
> -        if (uso4) {
> +        }
> +        if (ol->uso4) {
>               offload |= TUN_F_USO4;
>           }
> -        if (uso6) {
> +        if (ol->uso6) {
>               offload |= TUN_F_USO6;
>           }
>       }
> diff --git a/net/tap-solaris.c b/net/tap-solaris.c
> index b1aa40d46b..ac09ae03c0 100644
> --- a/net/tap-solaris.c
> +++ b/net/tap-solaris.c
> @@ -27,6 +27,7 @@
>   #include "tap_int.h"
>   #include "qemu/ctype.h"
>   #include "qemu/cutils.h"
> +#include "net/net.h"
>   
>   #include <sys/ethernet.h>
>   #include <sys/sockio.h>
> @@ -240,8 +241,7 @@ int tap_fd_set_vnet_be(int fd, int is_be)
>       return -EINVAL;
>   }
>   
> -void tap_fd_set_offload(int fd, int csum, int tso4,
> -                        int tso6, int ecn, int ufo, int uso4, int uso6)
> +void tap_fd_set_offload(int fd, const NetOffloads *ol)
>   {
>   }
>   
> diff --git a/net/tap-stub.c b/net/tap-stub.c
> index 5f57d6baac..66abbbc392 100644
> --- a/net/tap-stub.c
> +++ b/net/tap-stub.c
> @@ -71,8 +71,7 @@ int tap_fd_set_vnet_be(int fd, int is_be)
>       return -EINVAL;
>   }
>   
> -void tap_fd_set_offload(int fd, int csum, int tso4,
> -                        int tso6, int ecn, int ufo, int uso4, int uso6)
> +void tap_fd_set_offload(int fd, const NetOffloads *ol)
>   {
>   }
>   
> diff --git a/net/tap.c b/net/tap.c
> index f6e8cd5f1c..c7612fb91b 100644
> --- a/net/tap.c
> +++ b/net/tap.c
> @@ -271,15 +271,14 @@ static int tap_set_vnet_be(NetClientState *nc, bool is_be)
>       return tap_fd_set_vnet_be(s->fd, is_be);
>   }
>   
> -static void tap_set_offload(NetClientState *nc, int csum, int tso4,
> -                     int tso6, int ecn, int ufo, int uso4, int uso6)
> +static void tap_set_offload(NetClientState *nc, const NetOffloads *ol)
>   {
>       TAPState *s = DO_UPCAST(TAPState, nc, nc);
>       if (s->fd < 0) {
>           return;
>       }
>   
> -    tap_fd_set_offload(s->fd, csum, tso4, tso6, ecn, ufo, uso4, uso6);
> +    tap_fd_set_offload(s->fd, ol);
>   }
>   
>   static void tap_exit_notify(Notifier *notifier, void *data)
> @@ -365,6 +364,7 @@ static TAPState *net_tap_fd_init(NetClientState *peer,
>                                    int fd,
>                                    int vnet_hdr)
>   {
> +    NetOffloads ol = {};
>       NetClientState *nc;
>       TAPState *s;
>   
> @@ -379,7 +379,7 @@ static TAPState *net_tap_fd_init(NetClientState *peer,
>       s->has_uso = tap_probe_has_uso(s->fd);
>       s->has_tunnel = tap_probe_has_tunnel(s->fd);
>       s->enabled = true;
> -    tap_set_offload(&s->nc, 0, 0, 0, 0, 0, 0, 0);
> +    tap_set_offload(&s->nc, &ol);
>       /*
>        * Make sure host header length is set correctly in tap:
>        * it might have been modified by another instance of qemu.
> diff --git a/net/tap_int.h b/net/tap_int.h
> index 2a8aa3632f..327d10f68b 100644
> --- a/net/tap_int.h
> +++ b/net/tap_int.h
> @@ -27,6 +27,7 @@
>   #define NET_TAP_INT_H
>   
>   #include "qapi/qapi-types-net.h"
> +#include "net/net.h"
>   
>   int tap_open(char *ifname, int ifname_size, int *vnet_hdr,
>                int vnet_hdr_required, int mq_required, Error **errp);
> @@ -38,8 +39,7 @@ int tap_probe_vnet_hdr(int fd, Error **errp);
>   int tap_probe_has_ufo(int fd);
>   int tap_probe_has_uso(int fd);
>   int tap_probe_has_tunnel(int fd);
> -void tap_fd_set_offload(int fd, int csum, int tso4, int tso6, int ecn, int ufo,
> -                        int uso4, int uso6);
> +void tap_fd_set_offload(int fd, const NetOffloads *ol);
>   void tap_fd_set_vnet_hdr_len(int fd, int len);
>   int tap_fd_set_vnet_le(int fd, int vnet_is_le);
>   int tap_fd_set_vnet_be(int fd, int vnet_is_be);




* Re: [PATCH RFC 12/16] virtio-net: implement extended features support.
  2025-05-21 11:34 ` [PATCH RFC 12/16] virtio-net: implement extended features support Paolo Abeni
@ 2025-05-23  8:09   ` Akihiko Odaki
  2025-05-23 10:01     ` Paolo Abeni
  0 siblings, 1 reply; 42+ messages in thread
From: Akihiko Odaki @ 2025-05-23  8:09 UTC (permalink / raw)
  To: Paolo Abeni, qemu-devel
  Cc: Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost,
	Dmitry Fleytman, Jason Wang, Sriram Yagnaraman,
	Michael S. Tsirkin, Stefano Garzarella, Peter Xu, Fabiano Rosas,
	Cornelia Huck, Luigi Rizzo, Giuseppe Lettieri, Vincenzo Maffione,
	Eric Blake, Markus Armbruster

On 2025/05/21 20:34, Paolo Abeni wrote:
> Use the extended types and helpers to manipulate the virtio_net
> features.
> 
> Note that offloads are still 64bits wide, as per specification,
> and extended offloads will be mapped into such range.
> 
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
>   hw/net/virtio-net.c            | 87 +++++++++++++++++++++-------------
>   include/hw/virtio/virtio-net.h |  2 +-
>   2 files changed, 55 insertions(+), 34 deletions(-)
> 
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 9f500c64e7..193469fc27 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -90,6 +90,17 @@
>                                            VIRTIO_NET_RSS_HASH_TYPE_TCP_EX | \
>                                            VIRTIO_NET_RSS_HASH_TYPE_UDP_EX)
>   
> +#define VIRTIO_OFFLOAD_MAP_MIN    46
> +#define VIRTIO_OFFLOAD_MAP_LENGTH 4
> +#define VIRTIO_OFFLOAD_MAP        MAKE_64BIT_MASK(VIRTIO_OFFLOAD_MAP_MIN, \
> +                                                VIRTIO_OFFLOAD_MAP_LENGTH)
> +#define VIRTIO_FEATURES_MAP_MIN   65
> +#define VIRTIO_O2F_DELTA          (VIRTIO_FEATURES_MAP_MIN - \
> +                                   VIRTIO_OFFLOAD_MAP_MIN)
> +
> +#define VIRTIO_FEATURE_TO_OFFLOAD(fbit)  (fbit >= 64 ? \
> +                                          fbit - VIRTIO_O2F_DELTA : fbit)
> +

These are specific to virtio-net but look like they are common for 
virtio as the names don't contain "NET".

VIRTIO_FEATURES_MAP_MIN is also a bit confusing. It points to the least 
significant bit that refers to an offloading feature in the upper-half 
of the feature bits, but the name lacks the context.
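To make the mapping concrete, here is a minimal sketch, not part of the patch. The `VIRTIO_NET_`-prefixed macro names and the `feature_to_offload_bit()` helper are hypothetical; the numeric values (46, 4, 65) are the ones in the quoted hunk. The macro argument is also parenthesized here, which the original does not do.

```c
#include <assert.h>

/* Hypothetical renames carrying the "NET" context; values are from the patch. */
#define VIRTIO_NET_OFFLOAD_MAP_MIN     46  /* first offload bit of the reserved range */
#define VIRTIO_NET_OFFLOAD_MAP_LENGTH  4   /* number of offload bits in that range */
#define VIRTIO_NET_FEATURES_MAP_MIN    65  /* first extended feature bit mapped to an offload */
#define VIRTIO_NET_O2F_DELTA \
    (VIRTIO_NET_FEATURES_MAP_MIN - VIRTIO_NET_OFFLOAD_MAP_MIN)

/* Same logic as VIRTIO_FEATURE_TO_OFFLOAD(), with the argument parenthesized. */
#define VIRTIO_NET_FEATURE_TO_OFFLOAD(fbit) \
    ((fbit) >= 64 ? (fbit) - VIRTIO_NET_O2F_DELTA : (fbit))

/* Illustrative wrapper: feature bits below 64 are their own offload bit. */
static unsigned int feature_to_offload_bit(unsigned int fbit)
{
    assert(fbit < 128);
    return VIRTIO_NET_FEATURE_TO_OFFLOAD(fbit);
}
```

With these values, feature bits 65..68 land on offload bits 46..49, and any bit below 64 is returned unchanged.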

>   static const VirtIOFeature feature_sizes[] = {
>       {.flags = 1ULL << VIRTIO_NET_F_MAC,
>        .end = endof(struct virtio_net_config, mac)},
> @@ -751,44 +762,45 @@ static void virtio_net_set_queue_pairs(VirtIONet *n)
>   
>   static void virtio_net_set_multiqueue(VirtIONet *n, int multiqueue);
>   
> -static uint64_t virtio_net_get_features(VirtIODevice *vdev, uint64_t features,
> -                                        Error **errp)
> +static virtio_features_t virtio_net_get_features(VirtIODevice *vdev,
> +                                                 virtio_features_t features,
> +                                                 Error **errp)
>   {
>       VirtIONet *n = VIRTIO_NET(vdev);
>       NetClientState *nc = qemu_get_queue(n->nic);
>   
>       /* Firstly sync all virtio-net possible supported features */
> -    features |= n->host_features;
> +    features |= n->host_features_ex;
>   
> -    virtio_add_feature(&features, VIRTIO_NET_F_MAC);
> +    virtio_add_feature_ex(&features, VIRTIO_NET_F_MAC);
>   
>       if (!peer_has_vnet_hdr(n)) {
> -        virtio_clear_feature(&features, VIRTIO_NET_F_CSUM);
> -        virtio_clear_feature(&features, VIRTIO_NET_F_HOST_TSO4);
> -        virtio_clear_feature(&features, VIRTIO_NET_F_HOST_TSO6);
> -        virtio_clear_feature(&features, VIRTIO_NET_F_HOST_ECN);
> +        virtio_clear_feature_ex(&features, VIRTIO_NET_F_CSUM);
> +        virtio_clear_feature_ex(&features, VIRTIO_NET_F_HOST_TSO4);
> +        virtio_clear_feature_ex(&features, VIRTIO_NET_F_HOST_TSO6);
> +        virtio_clear_feature_ex(&features, VIRTIO_NET_F_HOST_ECN);
>   
> -        virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_CSUM);
> -        virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_TSO4);
> -        virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_TSO6);
> -        virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_ECN);
> +        virtio_clear_feature_ex(&features, VIRTIO_NET_F_GUEST_CSUM);
> +        virtio_clear_feature_ex(&features, VIRTIO_NET_F_GUEST_TSO4);
> +        virtio_clear_feature_ex(&features, VIRTIO_NET_F_GUEST_TSO6);
> +        virtio_clear_feature_ex(&features, VIRTIO_NET_F_GUEST_ECN);
>   
> -        virtio_clear_feature(&features, VIRTIO_NET_F_HOST_USO);
> -        virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_USO4);
> -        virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_USO6);
> +        virtio_clear_feature_ex(&features, VIRTIO_NET_F_HOST_USO);
> +        virtio_clear_feature_ex(&features, VIRTIO_NET_F_GUEST_USO4);
> +        virtio_clear_feature_ex(&features, VIRTIO_NET_F_GUEST_USO6);
>   
> -        virtio_clear_feature(&features, VIRTIO_NET_F_HASH_REPORT);
> +        virtio_clear_feature_ex(&features, VIRTIO_NET_F_HASH_REPORT);
>       }
>   
>       if (!peer_has_vnet_hdr(n) || !peer_has_ufo(n)) {
> -        virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_UFO);
> -        virtio_clear_feature(&features, VIRTIO_NET_F_HOST_UFO);
> +        virtio_clear_feature_ex(&features, VIRTIO_NET_F_GUEST_UFO);
> +        virtio_clear_feature_ex(&features, VIRTIO_NET_F_HOST_UFO);
>       }
>   
>       if (!peer_has_uso(n)) {
> -        virtio_clear_feature(&features, VIRTIO_NET_F_HOST_USO);
> -        virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_USO4);
> -        virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_USO6);
> +        virtio_clear_feature_ex(&features, VIRTIO_NET_F_HOST_USO);
> +        virtio_clear_feature_ex(&features, VIRTIO_NET_F_GUEST_USO4);
> +        virtio_clear_feature_ex(&features, VIRTIO_NET_F_GUEST_USO6);
>       }
>   
>       if (!get_vhost_net(nc->peer)) {
> @@ -796,7 +808,7 @@ static uint64_t virtio_net_get_features(VirtIODevice *vdev, uint64_t features,
>       }
>   
>       if (!ebpf_rss_is_loaded(&n->ebpf_rss)) {
> -        virtio_clear_feature(&features, VIRTIO_NET_F_RSS);
> +        virtio_clear_feature_ex(&features, VIRTIO_NET_F_RSS);
>       }
>       features = vhost_net_get_features(get_vhost_net(nc->peer), features);
>       vdev->backend_features_ex = features;
> @@ -818,7 +830,7 @@ static uint64_t virtio_net_get_features(VirtIODevice *vdev, uint64_t features,
>        * support it.
>        */
>       if (!virtio_has_feature(vdev->backend_features, VIRTIO_NET_F_CTRL_VQ)) {
> -        virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_ANNOUNCE);
> +        virtio_clear_feature_ex(&features, VIRTIO_NET_F_GUEST_ANNOUNCE);
>       }
>   
>       return features;
> @@ -851,9 +863,16 @@ static void virtio_net_apply_guest_offloads(VirtIONet *n)
>               !!(n->curr_guest_offloads & (1ULL << VIRTIO_NET_F_GUEST_USO6)));
>   }
>   
> -static uint64_t virtio_net_guest_offloads_by_features(uint64_t features)
> +static uint64_t virtio_net_features_to_offload(virtio_features_t features)
> +{
> +    return (features & ~VIRTIO_OFFLOAD_MAP) |
> +           ((features >> VIRTIO_O2F_DELTA) & VIRTIO_OFFLOAD_MAP);
> +}
> +
> +static uint64_t
> +virtio_net_guest_offloads_by_features(virtio_features_t features)
>   {
> -    static const uint64_t guest_offloads_mask =
> +    static const virtio_features_t guest_offloads_mask =
>           (1ULL << VIRTIO_NET_F_GUEST_CSUM) |
>           (1ULL << VIRTIO_NET_F_GUEST_TSO4) |
>           (1ULL << VIRTIO_NET_F_GUEST_TSO6) |
> @@ -862,13 +881,13 @@ static uint64_t virtio_net_guest_offloads_by_features(uint64_t features)
>           (1ULL << VIRTIO_NET_F_GUEST_USO4) |
>           (1ULL << VIRTIO_NET_F_GUEST_USO6);
>   
> -    return guest_offloads_mask & features;
> +    return guest_offloads_mask & virtio_net_features_to_offload(features);


How about:

static const virtio_features_t guest_offload_features_mask = ...
virtio_features_t masked_features = guest_offload_features_mask & features;

return masked_features | ((masked_features >> VIRTIO_FEATURES_MAP_MIN) 
<< VIRTIO_OFFLOAD_MAP_MIN);

This makes virtio_net_features_to_offload() unnecessary.
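A rough sketch of that suggestion, spelled out so the bit movement is visible. To keep it buildable without `__int128`, the 128-bit feature word is modelled as two 64-bit halves (`lo`, `hi`); that split, the `SK_` names and the function name are assumptions for illustration, and the tunnel feature bit numbers (65, 66) are taken from the later patch in this series.

```c
#include <assert.h>
#include <stdint.h>

/* virtio-net guest offload feature bits. */
#define SK_F_GUEST_CSUM                 1
#define SK_F_GUEST_TSO4                 7
#define SK_F_GUEST_TSO6                 8
#define SK_F_GUEST_ECN                  9
#define SK_F_GUEST_UFO                  10
#define SK_F_GUEST_USO4                 54
#define SK_F_GUEST_USO6                 55
#define SK_F_GUEST_UDP_TUNNEL_GSO       65
#define SK_F_GUEST_UDP_TUNNEL_GSO_CSUM  66

#define SK_FEATURES_MAP_MIN  65
#define SK_OFFLOAD_MAP_MIN   46
#define SK_OFFLOAD_MAP_LEN   4

/* Mask the features first, then move the high bits into the reserved range. */
static uint64_t sk_guest_offloads_by_features(uint64_t lo, uint64_t hi)
{
    static const uint64_t mask_lo =
        (1ULL << SK_F_GUEST_CSUM) | (1ULL << SK_F_GUEST_TSO4) |
        (1ULL << SK_F_GUEST_TSO6) | (1ULL << SK_F_GUEST_ECN)  |
        (1ULL << SK_F_GUEST_UFO)  | (1ULL << SK_F_GUEST_USO4) |
        (1ULL << SK_F_GUEST_USO6);
    static const uint64_t mask_hi =
        (1ULL << (SK_F_GUEST_UDP_TUNNEL_GSO - 64)) |
        (1ULL << (SK_F_GUEST_UDP_TUNNEL_GSO_CSUM - 64));
    uint64_t masked_lo = lo & mask_lo;
    uint64_t masked_hi = hi & mask_hi;

    /* The reserved offload range must not collide with a legacy offload bit. */
    assert((masked_lo &
            (((1ULL << SK_OFFLOAD_MAP_LEN) - 1) << SK_OFFLOAD_MAP_MIN)) == 0);

    /* Equivalent of (masked >> FEATURES_MAP_MIN) << OFFLOAD_MAP_MIN. */
    return masked_lo |
           ((masked_hi >> (SK_FEATURES_MAP_MIN - 64)) << SK_OFFLOAD_MAP_MIN);
}
```

Because the mask is applied before the shift, host-side tunnel bits (67 and up) never reach the offload word, so no separate features-to-offload helper is needed.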

>   }
>   
>   uint64_t virtio_net_supported_guest_offloads(const VirtIONet *n)
>   {
>       VirtIODevice *vdev = VIRTIO_DEVICE(n);
> -    return virtio_net_guest_offloads_by_features(vdev->guest_features);
> +    return virtio_net_guest_offloads_by_features(vdev->guest_features_ex);
>   }
>   
>   typedef struct {
> @@ -947,7 +966,8 @@ static void failover_add_primary(VirtIONet *n, Error **errp)
>       error_propagate(errp, err);
>   }
>   
> -static void virtio_net_set_features(VirtIODevice *vdev, uint64_t features)
> +static void virtio_net_set_features(VirtIODevice *vdev,
> +                                    virtio_features_t features)
>   {
>       VirtIONet *n = VIRTIO_NET(vdev);
>       Error *err = NULL;
> @@ -955,7 +975,7 @@ static void virtio_net_set_features(VirtIODevice *vdev, uint64_t features)
>   
>       if (n->mtu_bypass_backend &&
>               !virtio_has_feature(vdev->backend_features, VIRTIO_NET_F_MTU)) {
> -        features &= ~(1ULL << VIRTIO_NET_F_MTU);
> +        features &= ~VIRTIO_BIT(VIRTIO_NET_F_MTU);
>       }
>   
>       virtio_net_set_multiqueue(n,
> @@ -1962,10 +1982,11 @@ static ssize_t virtio_net_receive_rcu(NetClientState *nc, const uint8_t *buf,
>                   virtio_error(vdev, "virtio-net unexpected empty queue: "
>                                "i %zd mergeable %d offset %zd, size %zd, "
>                                "guest hdr len %zd, host hdr len %zd "
> -                             "guest features 0x%" PRIx64,
> +                             "guest features 0x" VIRTIO_FEATURES_FMT,
>                                i, n->mergeable_rx_bufs, offset, size,
>                                n->guest_hdr_len, n->host_hdr_len,
> -                             vdev->guest_features);
> +                             VIRTIO_FEATURES_HI(vdev->guest_features_ex),
> +                             VIRTIO_FEATURES_LOW(vdev->guest_features_ex));
>               }
>               err = -1;
>               goto err;
> @@ -4146,8 +4167,8 @@ static void virtio_net_class_init(ObjectClass *klass, const void *data)
>       vdc->unrealize = virtio_net_device_unrealize;
>       vdc->get_config = virtio_net_get_config;
>       vdc->set_config = virtio_net_set_config;
> -    vdc->get_features = virtio_net_get_features;
> -    vdc->set_features = virtio_net_set_features;
> +    vdc->get_features_ex = virtio_net_get_features;
> +    vdc->set_features_ex = virtio_net_set_features;
>       vdc->bad_features = virtio_net_bad_features;
>       vdc->reset = virtio_net_reset;
>       vdc->queue_reset = virtio_net_queue_reset;
> diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
> index b9ea9e824e..5ccdbeb253 100644
> --- a/include/hw/virtio/virtio-net.h
> +++ b/include/hw/virtio/virtio-net.h
> @@ -178,7 +178,7 @@ struct VirtIONet {
>       uint32_t has_vnet_hdr;
>       size_t host_hdr_len;
>       size_t guest_hdr_len;
> -    uint64_t host_features;
> +    DECLARE_FEATURES(host_features);
>       uint32_t rsc_timeout;
>       uint8_t rsc4_enabled;
>       uint8_t rsc6_enabled;




* Re: [PATCH RFC 15/16] net: implement tnl feature offloading
  2025-05-21 11:34 ` [PATCH RFC 15/16] net: implement tnl feature offloading Paolo Abeni
@ 2025-05-23  8:16   ` Akihiko Odaki
  2025-05-23 10:40     ` Paolo Abeni
  0 siblings, 1 reply; 42+ messages in thread
From: Akihiko Odaki @ 2025-05-23  8:16 UTC (permalink / raw)
  To: Paolo Abeni, qemu-devel
  Cc: Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost,
	Dmitry Fleytman, Jason Wang, Sriram Yagnaraman,
	Michael S. Tsirkin, Stefano Garzarella, Peter Xu, Fabiano Rosas,
	Cornelia Huck, Luigi Rizzo, Giuseppe Lettieri, Vincenzo Maffione,
	Eric Blake, Markus Armbruster

On 2025/05/21 20:34, Paolo Abeni wrote:
> When any host or guest GSO over UDP tunnel offload is enabled the
> virtio net header includes the additional tunnel-related fields,
> update the size accordingly.
> 
> Push the GSO over UDP tunnel offloads all the way down to the tap
> device extending the newly introduced NetFeatures struct, and
> eventually enable the associated features.
> 
> As per virtio specification, to convert features bit to offload bit,
> map the extended features into the reserved range.
> 
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
>   hw/net/virtio-net.c | 48 ++++++++++++++++++++++++++++++++++++++++-----
>   include/net/net.h   |  2 ++
>   net/net.c           |  7 ++++++-
>   net/tap-linux.c     |  6 ++++++
>   4 files changed, 57 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 881877086e..758ceaffba 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -101,6 +101,27 @@
>   #define VIRTIO_FEATURE_TO_OFFLOAD(fbit)  (fbit >= 64 ? \
>                                             fbit - VIRTIO_O2F_DELTA : fbit)
>   
> +#ifdef CONFIG_INT128
> +#define VIRTIO_NET_O_GUEST_UDP_TUNNEL_GSO \
> +    VIRTIO_FEATURE_TO_OFFLOAD(VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO)
> +#define VIRTIO_NET_O_GUEST_UDP_TUNNEL_GSO_CSUM \
> +    VIRTIO_FEATURE_TO_OFFLOAD(VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM)
> +
> +static bool virtio_has_tnl_hdr(virtio_features_t features)

"tnl" looks a bit cryptic to me and also inconsistent with everywhere 
else, which just calls it "tunnel".

> +{
> +    return virtio_has_feature_ex(features, VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO) |
> +           virtio_has_feature_ex(features, VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO);
> +}
> +
> +#else
> +
> +static bool virtio_has_tnl_hdr(virtio_features_t features)
> +{
> +    return false;
> +}
> +
> +#endif
> +
>   static const VirtIOFeature feature_sizes[] = {
>       {.flags = 1ULL << VIRTIO_NET_F_MAC,
>        .end = endof(struct virtio_net_config, mac)},
> @@ -656,7 +677,8 @@ static int peer_has_tunnel(VirtIONet *n)
>   }
>   
>   static void virtio_net_set_mrg_rx_bufs(VirtIONet *n, int mergeable_rx_bufs,
> -                                       int version_1, int hash_report)
> +                                       int version_1, int hash_report,
> +                                       int tnl)
>   {
>       int i;
>       NetClientState *nc;
> @@ -674,6 +696,9 @@ static void virtio_net_set_mrg_rx_bufs(VirtIONet *n, int mergeable_rx_bufs,
>               sizeof(struct virtio_net_hdr);
>           n->rss_data.populate_hash = false;
>       }
> +    if (tnl) {
> +        n->guest_hdr_len += sizeof(struct virtio_net_hdr_tunnel);
> +    }
>   
>       for (i = 0; i < n->max_queue_pairs; i++) {
>           nc = qemu_get_subqueue(n->nic, i);
> @@ -890,6 +915,12 @@ static void virtio_net_apply_guest_offloads(VirtIONet *n)
>          .ufo  = !!(n->curr_guest_offloads & (1ULL << VIRTIO_NET_F_GUEST_UFO)),
>          .uso4 = !!(n->curr_guest_offloads & (1ULL << VIRTIO_NET_F_GUEST_USO4)),
>          .uso6 = !!(n->curr_guest_offloads & (1ULL << VIRTIO_NET_F_GUEST_USO6)),
> +#ifdef CONFIG_INT128
> +       .tnl  = !!(n->curr_guest_offloads &
> +                  (1ULL << VIRTIO_NET_O_GUEST_UDP_TUNNEL_GSO)),
> +       .tnl_csum = !!(n->curr_guest_offloads &
> +                      (1ULL << VIRTIO_NET_O_GUEST_UDP_TUNNEL_GSO_CSUM)),

"[PATCH RFC 14/16] net: bundle all offloads in a single struct" added a 
struct for offloading, but how about passing n->curr_guest_offloads as 
is instead?

It loses some type safety and makes it prone to carrying unknown bits, but 
avoiding these duplicated bit operations may outweigh the downside.
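As a rough illustration of what passing the bitmask through could look like on the tap side, here is a sketch limited to the pre-existing offloads. The `SK_` names and `sk_tap_offload_flags()` are hypothetical; the bit numbers and flag values are local copies of the virtio-net and `linux/if_tun.h` definitions so that the snippet builds on its own. Rejecting unknown bits with an assert is just one possible way to handle the downside mentioned above.

```c
#include <assert.h>
#include <stdint.h>

/* virtio-net guest offload bits (same numbering as the feature bits). */
#define SK_GUEST_CSUM  1
#define SK_GUEST_TSO4  7
#define SK_GUEST_TSO6  8
#define SK_GUEST_ECN   9
#define SK_GUEST_UFO   10
#define SK_GUEST_USO4  54
#define SK_GUEST_USO6  55

/* Local copies of the TUN_F_* values from linux/if_tun.h. */
#define SK_TUN_F_CSUM     0x01u
#define SK_TUN_F_TSO4     0x02u
#define SK_TUN_F_TSO6     0x04u
#define SK_TUN_F_TSO_ECN  0x08u
#define SK_TUN_F_UFO      0x10u
#define SK_TUN_F_USO4     0x20u
#define SK_TUN_F_USO6     0x40u

#define SK_KNOWN_OFFLOADS \
    ((1ULL << SK_GUEST_CSUM) | (1ULL << SK_GUEST_TSO4) | \
     (1ULL << SK_GUEST_TSO6) | (1ULL << SK_GUEST_ECN)  | \
     (1ULL << SK_GUEST_UFO)  | (1ULL << SK_GUEST_USO4) | \
     (1ULL << SK_GUEST_USO6))

static int sk_has(uint64_t offloads, unsigned int bit)
{
    return (int)((offloads >> bit) & 1);
}

/* Translate the guest offload bitmask directly into TUNSETOFFLOAD flags. */
static unsigned int sk_tap_offload_flags(uint64_t offloads)
{
    unsigned int flags = 0;

    /* Without a struct, unknown bits have to be caught explicitly. */
    assert((offloads & ~SK_KNOWN_OFFLOADS) == 0);

    if (!sk_has(offloads, SK_GUEST_CSUM)) {
        return 0; /* every other offload depends on checksum offload */
    }
    flags |= SK_TUN_F_CSUM;
    if (sk_has(offloads, SK_GUEST_TSO4)) {
        flags |= SK_TUN_F_TSO4;
    }
    if (sk_has(offloads, SK_GUEST_TSO6)) {
        flags |= SK_TUN_F_TSO6;
    }
    if ((flags & (SK_TUN_F_TSO4 | SK_TUN_F_TSO6)) &&
        sk_has(offloads, SK_GUEST_ECN)) {
        flags |= SK_TUN_F_TSO_ECN;
    }
    if (sk_has(offloads, SK_GUEST_UFO)) {
        flags |= SK_TUN_F_UFO;
    }
    if (sk_has(offloads, SK_GUEST_USO4)) {
        flags |= SK_TUN_F_USO4;
    }
    if (sk_has(offloads, SK_GUEST_USO6)) {
        flags |= SK_TUN_F_USO6;
    }
    return flags;
}
```

The caller side then shrinks to handing over `n->curr_guest_offloads`, at the cost of every backend having to know the virtio-net bit layout.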

> +#endif
>       };
>   
>       qemu_set_offload(qemu_get_queue(n->nic)->peer, &ol);
> @@ -911,7 +942,12 @@ virtio_net_guest_offloads_by_features(virtio_features_t features)
>           (1ULL << VIRTIO_NET_F_GUEST_ECN)  |
>           (1ULL << VIRTIO_NET_F_GUEST_UFO)  |
>           (1ULL << VIRTIO_NET_F_GUEST_USO4) |
> -        (1ULL << VIRTIO_NET_F_GUEST_USO6);
> +        (1ULL << VIRTIO_NET_F_GUEST_USO6)
> +#ifdef CONFIG_INT128
> +        | (1ULL << VIRTIO_NET_O_GUEST_UDP_TUNNEL_GSO)
> +        | (1ULL << VIRTIO_NET_O_GUEST_UDP_TUNNEL_GSO_CSUM)
> +#endif
> +        ;
>   
>       return guest_offloads_mask & virtio_net_features_to_offload(features);
>   }
> @@ -1020,7 +1056,8 @@ static void virtio_net_set_features(VirtIODevice *vdev,
>                                  virtio_has_feature(features,
>                                                     VIRTIO_F_VERSION_1),
>                                  virtio_has_feature(features,
> -                                                  VIRTIO_NET_F_HASH_REPORT));
> +                                                  VIRTIO_NET_F_HASH_REPORT),
> +                               virtio_has_tnl_hdr(features));
>   
>       n->rsc4_enabled = virtio_has_feature(features, VIRTIO_NET_F_RSC_EXT) &&
>           virtio_has_feature(features, VIRTIO_NET_F_GUEST_TSO4);
> @@ -3139,7 +3176,8 @@ static int virtio_net_post_load_device(void *opaque, int version_id)
>                                  virtio_vdev_has_feature(vdev,
>                                                          VIRTIO_F_VERSION_1),
>                                  virtio_vdev_has_feature(vdev,
> -                                                       VIRTIO_NET_F_HASH_REPORT));
> +                                                       VIRTIO_NET_F_HASH_REPORT),
> +                               virtio_has_tnl_hdr(vdev->guest_features));
>   
>       /* MAC_TABLE_ENTRIES may be different from the saved image */
>       if (n->mac_table.in_use > MAC_TABLE_ENTRIES) {
> @@ -3946,7 +3984,7 @@ static void virtio_net_device_realize(DeviceState *dev, Error **errp)
>   
>       n->vqs[0].tx_waiting = 0;
>       n->tx_burst = n->net_conf.txburst;
> -    virtio_net_set_mrg_rx_bufs(n, 0, 0, 0);
> +    virtio_net_set_mrg_rx_bufs(n, 0, 0, 0, 0);
>       n->promisc = 1; /* for compatibility */
>   
>       n->mac_table.macs = g_malloc0(MAC_TABLE_ENTRIES * ETH_ALEN);
> diff --git a/include/net/net.h b/include/net/net.h
> index c71d7c6074..5049d293f2 100644
> --- a/include/net/net.h
> +++ b/include/net/net.h
> @@ -43,6 +43,8 @@ typedef struct NetOffloads {
>       bool ufo;
>       bool uso4;
>       bool uso6;
> +    bool tnl;
> +    bool tnl_csum;
>   } NetOffloads;
>   
>   #define DEFINE_NIC_PROPERTIES(_state, _conf)                            \
> diff --git a/net/net.c b/net/net.c
> index 5a2f00c108..bd41229407 100644
> --- a/net/net.c
> +++ b/net/net.c
> @@ -569,13 +569,18 @@ int qemu_get_vnet_hdr_len(NetClientState *nc)
>   
>   void qemu_set_vnet_hdr_len(NetClientState *nc, int len)
>   {
> +    int len_tnl = len - sizeof(struct virtio_net_hdr_tunnel);
> +
>       if (!nc || !nc->info->set_vnet_hdr_len) {
>           return;
>       }
>   
>       assert(len == sizeof(struct virtio_net_hdr_mrg_rxbuf) ||
> +           len_tnl == sizeof(struct virtio_net_hdr_mrg_rxbuf) ||
>              len == sizeof(struct virtio_net_hdr) ||
> -           len == sizeof(struct virtio_net_hdr_v1_hash));
> +           len_tnl == sizeof(struct virtio_net_hdr) ||
> +           len == sizeof(struct virtio_net_hdr_v1_hash) ||
> +           len_tnl == sizeof(struct virtio_net_hdr_v1_hash));
>   
>       nc->vnet_hdr_len = len;
>       nc->info->set_vnet_hdr_len(nc, len);
> diff --git a/net/tap-linux.c b/net/tap-linux.c
> index aa5f3a6e22..b7662ece63 100644
> --- a/net/tap-linux.c
> +++ b/net/tap-linux.c
> @@ -287,6 +287,12 @@ void tap_fd_set_offload(int fd, const NetOffloads *ol)
>           if (ol->uso6) {
>               offload |= TUN_F_USO6;
>           }
> +        if ((ol->tso4 || ol->tso6 || ol->uso4 || ol->uso6) && ol->tnl) {

Is it possible to have ol->tnl without TSO or USO? If so, is ignoring 
ol->tnl really what you want?
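To illustrate the case in question, the sketch below reproduces only the tunnel-related condition from the hunk and adds a helper that reports when a tunnel request would be silently dropped. `SketchOffloads`, both function names and the two `SK_TUN_F_*` values are placeholders for illustration, not the real definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for NetOffloads with the fields used by this series. */
typedef struct SketchOffloads {
    bool csum, tso4, tso6, ecn, ufo, uso4, uso6;
    bool tnl, tnl_csum;
} SketchOffloads;

/* Placeholder flag values, chosen only for this sketch. */
#define SK_TUN_F_UDP_TUNNEL_GSO       0x080u
#define SK_TUN_F_UDP_TUNNEL_GSO_CSUM  0x100u

/* Tunnel part of the flag computation, with the same gating as the patch. */
static unsigned int sk_tunnel_flags(const SketchOffloads *ol)
{
    unsigned int offload = 0;

    assert(ol != NULL);
    if (!ol->csum) {
        return 0; /* the tunnel checks sit inside the csum block */
    }
    if ((ol->tso4 || ol->tso6 || ol->uso4 || ol->uso6) && ol->tnl) {
        offload |= SK_TUN_F_UDP_TUNNEL_GSO;
    }
    if ((offload & SK_TUN_F_UDP_TUNNEL_GSO) && ol->tnl_csum) {
        offload |= SK_TUN_F_UDP_TUNNEL_GSO_CSUM;
    }
    return offload;
}

/* True when the caller asked for tunnel GSO but no flag would be set. */
static bool sk_tunnel_request_dropped(const SketchOffloads *ol)
{
    return ol->tnl && sk_tunnel_flags(ol) == 0;
}
```

With `.csum` and `.tnl` set but no TSO or USO, `sk_tunnel_request_dropped()` returns true, which is the situation the question above is about.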

> +            offload |= TUN_F_UDP_TUNNEL_GSO;
> +        }
> +        if ((offload & TUN_F_UDP_TUNNEL_GSO) && ol->tnl_csum) {
> +            offload |= TUN_F_UDP_TUNNEL_GSO_CSUM;
> +        }
>       }
>   
>       if (ioctl(fd, TUNSETOFFLOAD, offload) != 0) {




* Re: [PATCH RFC 16/16] net: make vhost-net aware of GSO over UDP tunnel hdr layout
  2025-05-21 11:34 ` [PATCH RFC 16/16] net: make vhost-net aware of GSO over UDP tunnel hdr layout Paolo Abeni
@ 2025-05-23  8:22   ` Akihiko Odaki
  2025-05-28  3:04     ` Lei Yang
  0 siblings, 1 reply; 42+ messages in thread
From: Akihiko Odaki @ 2025-05-23  8:22 UTC (permalink / raw)
  To: Paolo Abeni, qemu-devel
  Cc: Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost,
	Dmitry Fleytman, Jason Wang, Sriram Yagnaraman,
	Michael S. Tsirkin, Stefano Garzarella, Peter Xu, Fabiano Rosas,
	Cornelia Huck, Luigi Rizzo, Giuseppe Lettieri, Vincenzo Maffione,
	Eric Blake, Markus Armbruster

On 2025/05/21 20:34, Paolo Abeni wrote:
> When the GSO over UDP tunnel offload is enabled, the virtio net
> header includes additional fields to support such offload.
> 
> The vhost backend must be aware of the exact header layout, to
> copy it correctly. The tunnel-related field are present if either
> the guest or the host negotiated any UDP tunnel related feature:
> add them to host kernel supported features list, to allow qemu
> transder to such backend the needed information.

s/transder/transfer/

This patch should be squashed into the previous patch ("[PATCH RFC 
15/16] net: implement tnl feature offloading"), as QEMU with only the 
previous patch applied will incorrectly enable tunnel offloading even 
when vhost doesn't support it.

> 
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
>   hw/net/vhost_net.c | 4 ++++
>   1 file changed, 4 insertions(+)
> 
> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> index 58d7619fc8..c8e02d1732 100644
> --- a/hw/net/vhost_net.c
> +++ b/hw/net/vhost_net.c
> @@ -52,6 +52,10 @@ static const int kernel_feature_bits[] = {
>       VIRTIO_F_NOTIFICATION_DATA,
>       VIRTIO_NET_F_RSC_EXT,
>       VIRTIO_NET_F_HASH_REPORT,
> +#ifdef CONFIG_INT128
> +    VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO,
> +    VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO,
> +#endif
>       VHOST_INVALID_FEATURE_BIT
>   };
>   




* Re: [PATCH RFC 00/16] virtio: introduce support for GSO over UDP tunnel
  2025-05-23  7:19 ` [PATCH RFC 00/16] virtio: introduce support for GSO over UDP tunnel Akihiko Odaki
@ 2025-05-23  9:43   ` Paolo Abeni
  2025-05-23  9:48     ` Akihiko Odaki
  2025-06-21  6:39     ` Akihiko Odaki
  0 siblings, 2 replies; 42+ messages in thread
From: Paolo Abeni @ 2025-05-23  9:43 UTC (permalink / raw)
  To: Akihiko Odaki, qemu-devel
  Cc: Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost,
	Dmitry Fleytman, Jason Wang, Sriram Yagnaraman,
	Michael S. Tsirkin, Stefano Garzarella, Peter Xu, Fabiano Rosas,
	Cornelia Huck, Luigi Rizzo, Giuseppe Lettieri, Vincenzo Maffione,
	Eric Blake, Markus Armbruster

On 5/23/25 9:19 AM, Akihiko Odaki wrote:
> On 2025/05/21 20:33, Paolo Abeni wrote:
>> Some virtualized deployments use UDP tunnel pervasively and are impacted
>> negatively by the lack of GSO support for such kind of traffic in the
>> virtual NIC driver.
>>
>> The virtio_net specification recently introduced support for GSO over
>> UDP tunnel, this series updates the virtio implementation to support
>> such a feature.
>>
>> One of the reasons for the RFC tag is that the kernel-side
>> implementation has just been shared upstream and is not merged yet, but
>> there are also other relevant reasons, see below.
>>
>> Currently, the kernel virtio support limits the feature space to 64 bits,
>> while the virtio specification allows for a larger number of features.
>> Specifically, the GSO-over-UDP-tunnel-related virtio features use bits
>> 65-69; the larger part of this series (patches 2-11) actually deals with
>> the extended feature space.
>>
>> I tried to minimize the otherwise very large code churn by limiting the
>> extended features support to arches with native 128 integer support and
>> introducing the extended features space support only in virtio/vhost
>> core and in the relevant device driver.
> 
> What about adding another 64-bit integer to hold the high bits? It makes 
> adding the 128-bit integer type to VMState and properties and 
> CONFIG_INT128 checks unnecessary.

I did a few other implementation attempts before the current one. The
closest to the above proposal that I tried was to implement
virtio_features_t as a fixed-size array of u64.

A problem I found with that approach is that it requires a very large
code churn, as nearly every line touching a feature-related variable has
to be modified.

Let me think a little bit on this other option (I hope to avoid
discarding a lot of work here).

>> The actual offload implementation is in patches 12-16 and boils down to
>> propagating the new offload to the tun devices and the vhost backend.
>>
>> Tested with basic stream transfer with all the possible permutations of
>> host kernel/qemu/guest kernel with/without GSO over UDP tunnel support
>> and vs snapshots creation and restore.
>>
>> Notably this does not include (yet) any additional tests. Some guidance
>> on such matter would be really appreciated, and any feedback about the
>> features extension strategy would be more than welcome!
> 
> My proposal to add a feature to tap devices[1] simply omitted tests and 
> I wrote simple testing scripts for my personal usage. As you can see, 
> there is no testing code that covers tap devices, unfortunately, and I 
> think adding one takes significant effort.
> 
> [1] https://patchew.org/QEMU/20250313-hash-v4-0-c75c494b495e@daynix.com/

Thanks for the pointer

Paolo




* Re: [PATCH RFC 00/16] virtio: introduce support for GSO over UDP tunnel
  2025-05-23  9:43   ` Paolo Abeni
@ 2025-05-23  9:48     ` Akihiko Odaki
  2025-06-21  6:39     ` Akihiko Odaki
  1 sibling, 0 replies; 42+ messages in thread
From: Akihiko Odaki @ 2025-05-23  9:48 UTC (permalink / raw)
  To: Paolo Abeni, qemu-devel
  Cc: Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost,
	Dmitry Fleytman, Jason Wang, Sriram Yagnaraman,
	Michael S. Tsirkin, Stefano Garzarella, Peter Xu, Fabiano Rosas,
	Cornelia Huck, Luigi Rizzo, Giuseppe Lettieri, Vincenzo Maffione,
	Eric Blake, Markus Armbruster

On 2025/05/23 18:43, Paolo Abeni wrote:
> On 5/23/25 9:19 AM, Akihiko Odaki wrote:
>> On 2025/05/21 20:33, Paolo Abeni wrote:
>>> Some virtualized deployments use UDP tunnel pervasively and are impacted
>>> negatively by the lack of GSO support for such kind of traffic in the
>>> virtual NIC driver.
>>>
>>> The virtio_net specification recently introduced support for GSO over
>>> UDP tunnel, this series updates the virtio implementation to support
>>> such a feature.
>>>
>>> One of the reasons for the RFC tag is that the kernel-side
>>> implementation has just been shared upstream and is not merged yet, but
>>> there are also other relevant reasons, see below.
>>>
>>> Currently, the kernel virtio support limits the feature space to 64 bits,
>>> while the virtio specification allows for a larger number of features.
>>> Specifically, the GSO-over-UDP-tunnel-related virtio features use bits
>>> 65-69; the larger part of this series (patches 2-11) actually deals with
>>> the extended feature space.
>>>
>>> I tried to minimize the otherwise very large code churn by limiting the
>>> extended features support to arches with native 128 integer support and
>>> introducing the extended features space support only in virtio/vhost
>>> core and in the relevant device driver.
>>
>> What about adding another 64-bit integer to hold the high bits? It makes
>> adding the 128-bit integer type to VMState and properties and
>> CONFIG_INT128 checks unnecessary.
> 
> I did a few other implementation attempts before the current one. The
> closest to the above proposal that I tried was to implement
> virtio_features_t as a fixed-size array of u64.
> 
> A problem I found with that approach is that it requires a very large
> code churn, as nearly every line touching a feature-related variable has
> to be modified.

Using an array may be ideal in the end, but for now you may instead add 
a separate field for the upper-half. For example, you may change 
DECLARE_FEATURES() as follows:

#define DECLARE_FEATURES(name) \
     uint64_t name;             \
     uint64_t name##_hi;
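
A minimal sketch of how bit helpers might look on top of such a split. The `SketchDevice` struct and the `sketch_*` helpers are hypothetical and are not the `virtio_*_feature_ex()` helpers from the series; the macro is invoked without a trailing semicolon here because its body already ends with one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DECLARE_FEATURES(name) \
    uint64_t name;             \
    uint64_t name##_hi;

/* Hypothetical device state holding a 128-bit feature set as two halves. */
struct SketchDevice {
    DECLARE_FEATURES(host_features)
};

/* Test a feature bit in the range 0..127. */
static bool sketch_has_feature(uint64_t lo, uint64_t hi, unsigned int fbit)
{
    assert(fbit < 128);
    return fbit < 64 ? (lo >> fbit) & 1 : (hi >> (fbit - 64)) & 1;
}

/* Set a feature bit, picking the half it belongs to. */
static void sketch_add_feature(uint64_t *lo, uint64_t *hi, unsigned int fbit)
{
    assert(fbit < 128);
    if (fbit < 64) {
        *lo |= 1ULL << fbit;
    } else {
        *hi |= 1ULL << (fbit - 64);
    }
}

/* Clear a feature bit, picking the half it belongs to. */
static void sketch_clear_feature(uint64_t *lo, uint64_t *hi, unsigned int fbit)
{
    assert(fbit < 128);
    if (fbit < 64) {
        *lo &= ~(1ULL << fbit);
    } else {
        *hi &= ~(1ULL << (fbit - 64));
    }
}
```

Existing code that only touches bits below 64 can keep using the plain `name` field unchanged, which is what keeps the churn small compared with an array type.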

Regards,
Akihiko Odaki



* Re: [PATCH RFC 01/16] linux-headers: Update to Linux v6.15-rc net-next
  2025-05-21 11:33 ` [PATCH RFC 01/16] linux-headers: Update to Linux v6.15-rc net-next Paolo Abeni
@ 2025-05-23  9:50   ` Akihiko Odaki
  2025-05-23 10:06     ` Paolo Abeni
  0 siblings, 1 reply; 42+ messages in thread
From: Akihiko Odaki @ 2025-05-23  9:50 UTC (permalink / raw)
  To: Paolo Abeni, qemu-devel
  Cc: Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost,
	Dmitry Fleytman, Jason Wang, Sriram Yagnaraman,
	Michael S. Tsirkin, Stefano Garzarella, Peter Xu, Fabiano Rosas,
	Cornelia Huck, Luigi Rizzo, Giuseppe Lettieri, Vincenzo Maffione,
	Eric Blake, Markus Armbruster

On 2025/05/21 20:33, Paolo Abeni wrote:
> Update headers to include the virtio GSO over UDP tunnel features
> 
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> The relevant bits are not yet merged upstream, will update this
> patch after merge.
> ---
>   include/standard-headers/asm-x86/setup_data.h |   4 +-
>   include/standard-headers/drm/drm_fourcc.h     |  41 +++++
>   include/standard-headers/linux/const.h        |   2 +-
>   include/standard-headers/linux/ethtool.h      | 156 ++++++++++--------
>   include/standard-headers/linux/fuse.h         |  12 +-
>   include/standard-headers/linux/pci_regs.h     |  13 +-
>   include/standard-headers/linux/virtio_net.h   |  46 ++++++
>   include/standard-headers/linux/virtio_pci.h   |   1 +
>   include/standard-headers/linux/virtio_snd.h   |   2 +-
>   linux-headers/asm-arm64/kvm.h                 |  11 ++
>   linux-headers/asm-arm64/unistd_64.h           |   1 +
>   linux-headers/asm-generic/mman-common.h       |   1 +
>   linux-headers/asm-generic/unistd.h            |   4 +-
>   linux-headers/asm-loongarch/unistd_64.h       |   1 +
>   linux-headers/asm-mips/unistd_n32.h           |   1 +
>   linux-headers/asm-mips/unistd_n64.h           |   1 +
>   linux-headers/asm-mips/unistd_o32.h           |   1 +
>   linux-headers/asm-powerpc/unistd_32.h         |   1 +
>   linux-headers/asm-powerpc/unistd_64.h         |   1 +
>   linux-headers/asm-riscv/kvm.h                 |   2 +
>   linux-headers/asm-riscv/unistd_32.h           |   1 +
>   linux-headers/asm-riscv/unistd_64.h           |   1 +
>   linux-headers/asm-s390/unistd_32.h            |   1 +
>   linux-headers/asm-s390/unistd_64.h            |   1 +
>   linux-headers/asm-x86/kvm.h                   |   3 +
>   linux-headers/asm-x86/unistd_32.h             |   1 +
>   linux-headers/asm-x86/unistd_64.h             |   1 +
>   linux-headers/asm-x86/unistd_x32.h            |   1 +
>   linux-headers/linux/bits.h                    |   8 +-
>   linux-headers/linux/const.h                   |   2 +-
>   linux-headers/linux/iommufd.h                 | 129 ++++++++++++++-
>   linux-headers/linux/kvm.h                     |   1 +
>   linux-headers/linux/psp-sev.h                 |  21 ++-
>   linux-headers/linux/stddef.h                  |   2 +
>   linux-headers/linux/vfio.h                    |  30 ++--
>   linux-headers/linux/vhost.h                   |  12 +-
>   36 files changed, 414 insertions(+), 103 deletions(-)
> 
> diff --git a/include/standard-headers/asm-x86/setup_data.h b/include/standard-headers/asm-x86/setup_data.h
> index 09355f54c5..a483d72f42 100644
> --- a/include/standard-headers/asm-x86/setup_data.h
> +++ b/include/standard-headers/asm-x86/setup_data.h
> @@ -18,7 +18,7 @@
>   #define SETUP_INDIRECT			(1<<31)
>   #define SETUP_TYPE_MAX			(SETUP_ENUM_MAX | SETUP_INDIRECT)
>   
> -#ifndef __ASSEMBLY__
> +#ifndef __ASSEMBLER__
>   
>   #include "standard-headers/linux/types.h"
>   
> @@ -78,6 +78,6 @@ struct ima_setup_data {
>   	uint64_t size;
>   } QEMU_PACKED;
>   
> -#endif /* __ASSEMBLY__ */
> +#endif /* __ASSEMBLER__ */
>   
>   #endif /* _ASM_X86_SETUP_DATA_H */
> diff --git a/include/standard-headers/drm/drm_fourcc.h b/include/standard-headers/drm/drm_fourcc.h
> index 708647776f..a8b759dcbc 100644
> --- a/include/standard-headers/drm/drm_fourcc.h
> +++ b/include/standard-headers/drm/drm_fourcc.h
> @@ -420,6 +420,7 @@ extern "C" {
>   #define DRM_FORMAT_MOD_VENDOR_ARM     0x08
>   #define DRM_FORMAT_MOD_VENDOR_ALLWINNER 0x09
>   #define DRM_FORMAT_MOD_VENDOR_AMLOGIC 0x0a
> +#define DRM_FORMAT_MOD_VENDOR_MTK     0x0b
>   
>   /* add more to the end as needed */
>   
> @@ -1452,6 +1453,46 @@ drm_fourcc_canonicalize_nvidia_format_mod(uint64_t modifier)
>    */
>   #define AMLOGIC_FBC_OPTION_MEM_SAVING		(1ULL << 0)
>   
> +/* MediaTek modifiers
> + * Bits  Parameter                Notes
> + * ----- ------------------------ ---------------------------------------------
> + *   7: 0 TILE LAYOUT              Values are MTK_FMT_MOD_TILE_*
> + *  15: 8 COMPRESSION              Values are MTK_FMT_MOD_COMPRESS_*
> + *  23:16 10 BIT LAYOUT            Values are MTK_FMT_MOD_10BIT_LAYOUT_*
> + *
> + */
> +
> +#define DRM_FORMAT_MOD_MTK(__flags)		fourcc_mod_code(MTK, __flags)
> +
> +/*
> + * MediaTek Tiled Modifier
> + * The lowest 8 bits of the modifier is used to specify the tiling
> + * layout. Only the 16L_32S tiling is used for now, but we define an
> + * "untiled" version and leave room for future expansion.
> + */
> +#define MTK_FMT_MOD_TILE_MASK     0xf
> +#define MTK_FMT_MOD_TILE_NONE     0x0
> +#define MTK_FMT_MOD_TILE_16L32S   0x1
> +
> +/*
> + * Bits 8-15 specify compression options
> + */
> +#define MTK_FMT_MOD_COMPRESS_MASK (0xf << 8)
> +#define MTK_FMT_MOD_COMPRESS_NONE (0x0 << 8)
> +#define MTK_FMT_MOD_COMPRESS_V1   (0x1 << 8)
> +
> +/*
> + * Bits 16-23 specify how the bits of 10 bit formats are
> + * stored out in memory
> + */
> +#define MTK_FMT_MOD_10BIT_LAYOUT_MASK      (0xf << 16)
> +#define MTK_FMT_MOD_10BIT_LAYOUT_PACKED    (0x0 << 16)
> +#define MTK_FMT_MOD_10BIT_LAYOUT_LSBTILED  (0x1 << 16)
> +#define MTK_FMT_MOD_10BIT_LAYOUT_LSBRASTER (0x2 << 16)
> +
> +/* alias for the most common tiling format */
> +#define DRM_FORMAT_MOD_MTK_16L_32S_TILE  DRM_FORMAT_MOD_MTK(MTK_FMT_MOD_TILE_16L32S)
> +
>   /*
>    * AMD modifiers
>    *
> diff --git a/include/standard-headers/linux/const.h b/include/standard-headers/linux/const.h
> index 2122610de7..95ede23342 100644
> --- a/include/standard-headers/linux/const.h
> +++ b/include/standard-headers/linux/const.h
> @@ -33,7 +33,7 @@
>    * Missing __asm__ support
>    *
>    * __BIT128() would not work in the __asm__ code, as it shifts an
> - * 'unsigned __init128' data type as direct representation of
> + * 'unsigned __int128' data type as direct representation of
>    * 128 bit constants is not supported in the gcc compiler, as
>    * they get silently truncated.
>    *
> diff --git a/include/standard-headers/linux/ethtool.h b/include/standard-headers/linux/ethtool.h
> index e83382531c..cef0d207a6 100644
> --- a/include/standard-headers/linux/ethtool.h
> +++ b/include/standard-headers/linux/ethtool.h
> @@ -2059,6 +2059,24 @@ enum ethtool_link_mode_bit_indices {
>   	ETHTOOL_LINK_MODE_10baseT1S_Half_BIT		 = 100,
>   	ETHTOOL_LINK_MODE_10baseT1S_P2MP_Half_BIT	 = 101,
>   	ETHTOOL_LINK_MODE_10baseT1BRR_Full_BIT		 = 102,
> +	ETHTOOL_LINK_MODE_200000baseCR_Full_BIT		 = 103,
> +	ETHTOOL_LINK_MODE_200000baseKR_Full_BIT		 = 104,
> +	ETHTOOL_LINK_MODE_200000baseDR_Full_BIT		 = 105,
> +	ETHTOOL_LINK_MODE_200000baseDR_2_Full_BIT	 = 106,
> +	ETHTOOL_LINK_MODE_200000baseSR_Full_BIT		 = 107,
> +	ETHTOOL_LINK_MODE_200000baseVR_Full_BIT		 = 108,
> +	ETHTOOL_LINK_MODE_400000baseCR2_Full_BIT	 = 109,
> +	ETHTOOL_LINK_MODE_400000baseKR2_Full_BIT	 = 110,
> +	ETHTOOL_LINK_MODE_400000baseDR2_Full_BIT	 = 111,
> +	ETHTOOL_LINK_MODE_400000baseDR2_2_Full_BIT	 = 112,
> +	ETHTOOL_LINK_MODE_400000baseSR2_Full_BIT	 = 113,
> +	ETHTOOL_LINK_MODE_400000baseVR2_Full_BIT	 = 114,
> +	ETHTOOL_LINK_MODE_800000baseCR4_Full_BIT	 = 115,
> +	ETHTOOL_LINK_MODE_800000baseKR4_Full_BIT	 = 116,
> +	ETHTOOL_LINK_MODE_800000baseDR4_Full_BIT	 = 117,
> +	ETHTOOL_LINK_MODE_800000baseDR4_2_Full_BIT	 = 118,
> +	ETHTOOL_LINK_MODE_800000baseSR4_Full_BIT	 = 119,
> +	ETHTOOL_LINK_MODE_800000baseVR4_Full_BIT	 = 120,
>   
>   	/* must be last entry */
>   	__ETHTOOL_LINK_MODE_MASK_NBITS
> @@ -2271,73 +2289,81 @@ static inline int ethtool_validate_duplex(uint8_t duplex)
>    * be exploited to reduce the RSS queue spread.
>    */
>   #define	RXH_XFRM_SYM_XOR	(1 << 0)
> +/* Similar to SYM_XOR, except that one copy of the XOR'ed fields is replaced by
> + * an OR of the same fields
> + */
> +#define	RXH_XFRM_SYM_OR_XOR	(1 << 1)
>   #define	RXH_XFRM_NO_CHANGE	0xff
>   
> -/* L2-L4 network traffic flow types */
> -#define	TCP_V4_FLOW	0x01	/* hash or spec (tcp_ip4_spec) */
> -#define	UDP_V4_FLOW	0x02	/* hash or spec (udp_ip4_spec) */
> -#define	SCTP_V4_FLOW	0x03	/* hash or spec (sctp_ip4_spec) */
> -#define	AH_ESP_V4_FLOW	0x04	/* hash only */
> -#define	TCP_V6_FLOW	0x05	/* hash or spec (tcp_ip6_spec; nfc only) */
> -#define	UDP_V6_FLOW	0x06	/* hash or spec (udp_ip6_spec; nfc only) */
> -#define	SCTP_V6_FLOW	0x07	/* hash or spec (sctp_ip6_spec; nfc only) */
> -#define	AH_ESP_V6_FLOW	0x08	/* hash only */
> -#define	AH_V4_FLOW	0x09	/* hash or spec (ah_ip4_spec) */
> -#define	ESP_V4_FLOW	0x0a	/* hash or spec (esp_ip4_spec) */
> -#define	AH_V6_FLOW	0x0b	/* hash or spec (ah_ip6_spec; nfc only) */
> -#define	ESP_V6_FLOW	0x0c	/* hash or spec (esp_ip6_spec; nfc only) */
> -#define	IPV4_USER_FLOW	0x0d	/* spec only (usr_ip4_spec) */
> -#define	IP_USER_FLOW	IPV4_USER_FLOW
> -#define	IPV6_USER_FLOW	0x0e	/* spec only (usr_ip6_spec; nfc only) */
> -#define	IPV4_FLOW	0x10	/* hash only */
> -#define	IPV6_FLOW	0x11	/* hash only */
> -#define	ETHER_FLOW	0x12	/* spec only (ether_spec) */
> -
> -/* Used for GTP-U IPv4 and IPv6.
> - * The format of GTP packets only includes
> - * elements such as TEID and GTP version.
> - * It is primarily intended for data communication of the UE.
> - */
> -#define GTPU_V4_FLOW 0x13	/* hash only */
> -#define GTPU_V6_FLOW 0x14	/* hash only */
> -
> -/* Use for GTP-C IPv4 and v6.
> - * The format of these GTP packets does not include TEID.
> - * Primarily expected to be used for communication
> - * to create sessions for UE data communication,
> - * commonly referred to as CSR (Create Session Request).
> - */
> -#define GTPC_V4_FLOW 0x15	/* hash only */
> -#define GTPC_V6_FLOW 0x16	/* hash only */
> -
> -/* Use for GTP-C IPv4 and v6.
> - * Unlike GTPC_V4_FLOW, the format of these GTP packets includes TEID.
> - * After session creation, it becomes this packet.
> - * This is mainly used for requests to realize UE handover.
> - */
> -#define GTPC_TEID_V4_FLOW 0x17	/* hash only */
> -#define GTPC_TEID_V6_FLOW 0x18	/* hash only */
> -
> -/* Use for GTP-U and extended headers for the PSC (PDU Session Container).
> - * The format of these GTP packets includes TEID and QFI.
> - * In 5G communication using UPF (User Plane Function),
> - * data communication with this extended header is performed.
> - */
> -#define GTPU_EH_V4_FLOW 0x19	/* hash only */
> -#define GTPU_EH_V6_FLOW 0x1a	/* hash only */
> -
> -/* Use for GTP-U IPv4 and v6 PSC (PDU Session Container) extended headers.
> - * This differs from GTPU_EH_V(4|6)_FLOW in that it is distinguished by
> - * UL/DL included in the PSC.
> - * There are differences in the data included based on Downlink/Uplink,
> - * and can be used to distinguish packets.
> - * The functions described so far are useful when you want to
> - * handle communication from the mobile network in UPF, PGW, etc.
> - */
> -#define GTPU_UL_V4_FLOW 0x1b	/* hash only */
> -#define GTPU_UL_V6_FLOW 0x1c	/* hash only */
> -#define GTPU_DL_V4_FLOW 0x1d	/* hash only */
> -#define GTPU_DL_V6_FLOW 0x1e	/* hash only */
> +enum {
> +	/* L2-L4 network traffic flow types */
> +	TCP_V4_FLOW	= 0x01,	/* hash or spec (tcp_ip4_spec) */
> +	UDP_V4_FLOW	= 0x02,	/* hash or spec (udp_ip4_spec) */
> +	SCTP_V4_FLOW	= 0x03,	/* hash or spec (sctp_ip4_spec) */
> +	AH_ESP_V4_FLOW	= 0x04,	/* hash only */
> +	TCP_V6_FLOW	= 0x05,	/* hash or spec (tcp_ip6_spec; nfc only) */
> +	UDP_V6_FLOW	= 0x06,	/* hash or spec (udp_ip6_spec; nfc only) */
> +	SCTP_V6_FLOW	= 0x07,	/* hash or spec (sctp_ip6_spec; nfc only) */
> +	AH_ESP_V6_FLOW	= 0x08,	/* hash only */
> +	AH_V4_FLOW	= 0x09,	/* hash or spec (ah_ip4_spec) */
> +	ESP_V4_FLOW	= 0x0a,	/* hash or spec (esp_ip4_spec) */
> +	AH_V6_FLOW	= 0x0b,	/* hash or spec (ah_ip6_spec; nfc only) */
> +	ESP_V6_FLOW	= 0x0c,	/* hash or spec (esp_ip6_spec; nfc only) */
> +	IPV4_USER_FLOW	= 0x0d,	/* spec only (usr_ip4_spec) */
> +	IP_USER_FLOW	= IPV4_USER_FLOW,
> +	IPV6_USER_FLOW	= 0x0e, /* spec only (usr_ip6_spec; nfc only) */
> +	IPV4_FLOW	= 0x10, /* hash only */
> +	IPV6_FLOW	= 0x11, /* hash only */
> +	ETHER_FLOW	= 0x12, /* spec only (ether_spec) */
> +
> +	/* Used for GTP-U IPv4 and IPv6.
> +	 * The format of GTP packets only includes
> +	 * elements such as TEID and GTP version.
> +	 * It is primarily intended for data communication of the UE.
> +	 */
> +	GTPU_V4_FLOW	= 0x13,	/* hash only */
> +	GTPU_V6_FLOW	= 0x14,	/* hash only */
> +
> +	/* Use for GTP-C IPv4 and v6.
> +	 * The format of these GTP packets does not include TEID.
> +	 * Primarily expected to be used for communication
> +	 * to create sessions for UE data communication,
> +	 * commonly referred to as CSR (Create Session Request).
> +	 */
> +	GTPC_V4_FLOW	= 0x15,	/* hash only */
> +	GTPC_V6_FLOW	= 0x16,	/* hash only */
> +
> +	/* Use for GTP-C IPv4 and v6.
> +	 * Unlike GTPC_V4_FLOW, the format of these GTP packets includes TEID.
> +	 * After session creation, it becomes this packet.
> +	 * This is mainly used for requests to realize UE handover.
> +	 */
> +	GTPC_TEID_V4_FLOW	= 0x17,	/* hash only */
> +	GTPC_TEID_V6_FLOW	= 0x18,	/* hash only */
> +
> +	/* Use for GTP-U and extended headers for the PSC (PDU Session Container).
> +	 * The format of these GTP packets includes TEID and QFI.
> +	 * In 5G communication using UPF (User Plane Function),
> +	 * data communication with this extended header is performed.
> +	 */
> +	GTPU_EH_V4_FLOW	= 0x19,	/* hash only */
> +	GTPU_EH_V6_FLOW	= 0x1a,	/* hash only */
> +
> +	/* Use for GTP-U IPv4 and v6 PSC (PDU Session Container) extended headers.
> +	 * This differs from GTPU_EH_V(4|6)_FLOW in that it is distinguished by
> +	 * UL/DL included in the PSC.
> +	 * There are differences in the data included based on Downlink/Uplink,
> +	 * and can be used to distinguish packets.
> +	 * The functions described so far are useful when you want to
> +	 * handle communication from the mobile network in UPF, PGW, etc.
> +	 */
> +	GTPU_UL_V4_FLOW	= 0x1b,	/* hash only */
> +	GTPU_UL_V6_FLOW	= 0x1c,	/* hash only */
> +	GTPU_DL_V4_FLOW	= 0x1d,	/* hash only */
> +	GTPU_DL_V6_FLOW	= 0x1e,	/* hash only */
> +
> +	__FLOW_TYPE_COUNT,
> +};
>   
>   /* Flag to enable additional fields in struct ethtool_rx_flow_spec */
>   #define	FLOW_EXT	0x80000000
> diff --git a/include/standard-headers/linux/fuse.h b/include/standard-headers/linux/fuse.h
> index d303effb2a..a2b5815d89 100644
> --- a/include/standard-headers/linux/fuse.h
> +++ b/include/standard-headers/linux/fuse.h
> @@ -229,6 +229,9 @@
>    *    - FUSE_URING_IN_OUT_HEADER_SZ
>    *    - FUSE_URING_OP_IN_OUT_SZ
>    *    - enum fuse_uring_cmd
> + *
> + *  7.43
> + *  - add FUSE_REQUEST_TIMEOUT
>    */
>   
>   #ifndef _LINUX_FUSE_H
> @@ -260,7 +263,7 @@
>   #define FUSE_KERNEL_VERSION 7
>   
>   /** Minor version number of this interface */
> -#define FUSE_KERNEL_MINOR_VERSION 42
> +#define FUSE_KERNEL_MINOR_VERSION 43
>   
>   /** The node ID of the root inode */
>   #define FUSE_ROOT_ID 1
> @@ -431,6 +434,8 @@ struct fuse_file_lock {
>    *		    of the request ID indicates resend requests
>    * FUSE_ALLOW_IDMAP: allow creation of idmapped mounts
>    * FUSE_OVER_IO_URING: Indicate that client supports io-uring
> + * FUSE_REQUEST_TIMEOUT: kernel supports timing out requests.
> + *			 init_out.request_timeout contains the timeout (in secs)
>    */
>   #define FUSE_ASYNC_READ		(1 << 0)
>   #define FUSE_POSIX_LOCKS	(1 << 1)
> @@ -473,11 +478,11 @@ struct fuse_file_lock {
>   #define FUSE_PASSTHROUGH	(1ULL << 37)
>   #define FUSE_NO_EXPORT_SUPPORT	(1ULL << 38)
>   #define FUSE_HAS_RESEND		(1ULL << 39)
> -
>   /* Obsolete alias for FUSE_DIRECT_IO_ALLOW_MMAP */
>   #define FUSE_DIRECT_IO_RELAX	FUSE_DIRECT_IO_ALLOW_MMAP
>   #define FUSE_ALLOW_IDMAP	(1ULL << 40)
>   #define FUSE_OVER_IO_URING	(1ULL << 41)
> +#define FUSE_REQUEST_TIMEOUT	(1ULL << 42)
>   
>   /**
>    * CUSE INIT request/reply flags
> @@ -905,7 +910,8 @@ struct fuse_init_out {
>   	uint16_t	map_alignment;
>   	uint32_t	flags2;
>   	uint32_t	max_stack_depth;
> -	uint32_t	unused[6];
> +	uint16_t	request_timeout;
> +	uint16_t	unused[11];
>   };
>   
>   #define CUSE_INIT_INFO_MAX 4096
> diff --git a/include/standard-headers/linux/pci_regs.h b/include/standard-headers/linux/pci_regs.h
> index 3445c4970e..ba326710f9 100644
> --- a/include/standard-headers/linux/pci_regs.h
> +++ b/include/standard-headers/linux/pci_regs.h
> @@ -486,6 +486,7 @@
>   #define   PCI_EXP_TYPE_RC_EC	   0xa	/* Root Complex Event Collector */
>   #define  PCI_EXP_FLAGS_SLOT	0x0100	/* Slot implemented */
>   #define  PCI_EXP_FLAGS_IRQ	0x3e00	/* Interrupt message number */
> +#define  PCI_EXP_FLAGS_FLIT	0x8000	/* Flit Mode Supported */
>   #define PCI_EXP_DEVCAP		0x04	/* Device capabilities */
>   #define  PCI_EXP_DEVCAP_PAYLOAD	0x00000007 /* Max_Payload_Size */
>   #define  PCI_EXP_DEVCAP_PHANTOM	0x00000018 /* Phantom functions */
> @@ -795,6 +796,8 @@
>   #define  PCI_ERR_CAP_ECRC_CHKC		0x00000080 /* ECRC Check Capable */
>   #define  PCI_ERR_CAP_ECRC_CHKE		0x00000100 /* ECRC Check Enable */
>   #define  PCI_ERR_CAP_PREFIX_LOG_PRESENT	0x00000800 /* TLP Prefix Log Present */
> +#define  PCI_ERR_CAP_TLP_LOG_FLIT	0x00040000 /* TLP was logged in Flit Mode */
> +#define  PCI_ERR_CAP_TLP_LOG_SIZE	0x00f80000 /* Logged TLP Size (only in Flit mode) */
>   #define PCI_ERR_HEADER_LOG	0x1c	/* Header Log Register (16 bytes) */
>   #define PCI_ERR_ROOT_COMMAND	0x2c	/* Root Error Command */
>   #define  PCI_ERR_ROOT_CMD_COR_EN	0x00000001 /* Correctable Err Reporting Enable */
> @@ -1013,7 +1016,7 @@
>   
>   /* Resizable BARs */
>   #define PCI_REBAR_CAP		4	/* capability register */
> -#define  PCI_REBAR_CAP_SIZES		0x00FFFFF0  /* supported BAR sizes */
> +#define  PCI_REBAR_CAP_SIZES		0xFFFFFFF0  /* supported BAR sizes */
>   #define PCI_REBAR_CTRL		8	/* control register */
>   #define  PCI_REBAR_CTRL_BAR_IDX		0x00000007  /* BAR index */
>   #define  PCI_REBAR_CTRL_NBAR_MASK	0x000000E0  /* # of resizable BARs */
> @@ -1061,8 +1064,9 @@
>   #define  PCI_EXP_DPC_CAP_RP_EXT		0x0020	/* Root Port Extensions */
>   #define  PCI_EXP_DPC_CAP_POISONED_TLP	0x0040	/* Poisoned TLP Egress Blocking Supported */
>   #define  PCI_EXP_DPC_CAP_SW_TRIGGER	0x0080	/* Software Triggering Supported */
> -#define  PCI_EXP_DPC_RP_PIO_LOG_SIZE	0x0F00	/* RP PIO Log Size */
> +#define  PCI_EXP_DPC_RP_PIO_LOG_SIZE	0x0F00	/* RP PIO Log Size [3:0] */
>   #define  PCI_EXP_DPC_CAP_DL_ACTIVE	0x1000	/* ERR_COR signal on DL_Active supported */
> +#define  PCI_EXP_DPC_RP_PIO_LOG_SIZE4	0x2000	/* RP PIO Log Size [4] */
>   
>   #define PCI_EXP_DPC_CTL			0x06	/* DPC control */
>   #define  PCI_EXP_DPC_CTL_EN_FATAL	0x0001	/* Enable trigger on ERR_FATAL message */
> @@ -1205,9 +1209,12 @@
>   #define PCI_DOE_DATA_OBJECT_DISC_REQ_3_INDEX		0x000000ff
>   #define PCI_DOE_DATA_OBJECT_DISC_REQ_3_VER		0x0000ff00
>   #define PCI_DOE_DATA_OBJECT_DISC_RSP_3_VID		0x0000ffff
> -#define PCI_DOE_DATA_OBJECT_DISC_RSP_3_PROTOCOL		0x00ff0000
> +#define PCI_DOE_DATA_OBJECT_DISC_RSP_3_TYPE		0x00ff0000
>   #define PCI_DOE_DATA_OBJECT_DISC_RSP_3_NEXT_INDEX	0xff000000
>   
> +/* Deprecated old name, replaced with PCI_DOE_DATA_OBJECT_DISC_RSP_3_TYPE */
> +#define PCI_DOE_DATA_OBJECT_DISC_RSP_3_PROTOCOL		PCI_DOE_DATA_OBJECT_DISC_RSP_3_TYPE
> +
>   /* Compute Express Link (CXL r3.1, sec 8.1.5) */
>   #define PCI_DVSEC_CXL_PORT				3
>   #define PCI_DVSEC_CXL_PORT_CTL				0x0c
> diff --git a/include/standard-headers/linux/virtio_net.h b/include/standard-headers/linux/virtio_net.h
> index fc594fe5fc..4ddefe25d6 100644
> --- a/include/standard-headers/linux/virtio_net.h
> +++ b/include/standard-headers/linux/virtio_net.h
> @@ -70,6 +70,28 @@
>   					 * with the same MAC.
>   					 */
>   #define VIRTIO_NET_F_SPEED_DUPLEX 63	/* Device set linkspeed and duplex */
> +#define VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO 65 /* Driver can receive
> +					      * GSO-over-UDP-tunnel packets
> +					      */
> +#define VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM 66 /* Driver handles
> +						   * GSO-over-UDP-tunnel
> +						   * packets with partial csum
> +						   * for the outer header
> +						   */
> +#define VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO 67 /* Device can receive
> +					     * GSO-over-UDP-tunnel packets
> +					     */
> +#define VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO_CSUM 68 /* Device handles
> +						  * GSO-over-UDP-tunnel
> +						  * packets with partial csum
> +						  * for the outer header
> +						  */
> +
> +/* Offloads bits corresponding to VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO{,_CSUM}
> + * features
> + */
> +#define VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_MAPPED	46
> +#define VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM_MAPPED	47
>   
>   #ifndef VIRTIO_NET_NO_LEGACY
>   #define VIRTIO_NET_F_GSO	6	/* Host handles pkts w/ any GSO type */
> @@ -131,12 +153,17 @@ struct virtio_net_hdr_v1 {
>   #define VIRTIO_NET_HDR_F_NEEDS_CSUM	1	/* Use csum_start, csum_offset */
>   #define VIRTIO_NET_HDR_F_DATA_VALID	2	/* Csum is valid */
>   #define VIRTIO_NET_HDR_F_RSC_INFO	4	/* rsc info in csum_ fields */
> +#define VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM 8	/* UDP tunnel requires csum offload */
>   	uint8_t flags;
>   #define VIRTIO_NET_HDR_GSO_NONE		0	/* Not a GSO frame */
>   #define VIRTIO_NET_HDR_GSO_TCPV4	1	/* GSO frame, IPv4 TCP (TSO) */
>   #define VIRTIO_NET_HDR_GSO_UDP		3	/* GSO frame, IPv4 UDP (UFO) */
>   #define VIRTIO_NET_HDR_GSO_TCPV6	4	/* GSO frame, IPv6 TCP */
>   #define VIRTIO_NET_HDR_GSO_UDP_L4	5	/* GSO frame, IPv4& IPv6 UDP (USO) */
> +#define VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 0x20 /* UDP over IPv4 tunnel present */
> +#define VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 0x40 /* UDP over IPv6 tunnel present */
> +#define VIRTIO_NET_HDR_GSO_UDP_TUNNEL (VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 | \
> +				       VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6)
>   #define VIRTIO_NET_HDR_GSO_ECN		0x80	/* TCP has ECN set */
>   	uint8_t gso_type;
>   	__virtio16 hdr_len;	/* Ethernet + IP + tcp/udp hdrs */
> @@ -181,6 +208,12 @@ struct virtio_net_hdr_v1_hash {
>   	uint16_t padding;
>   };
>   
> +/* This header after hashing information */
> +struct virtio_net_hdr_tunnel {
> +	__virtio16 outer_th_offset;
> +	__virtio16 inner_nh_offset;

These should be __le16 (which scripts/update-linux-headers.sh converts
into uint16_t): they are specific to modern virtio, which always uses
little endian.
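
As a rough sketch of what always-little-endian fields mean for a consumer,
the fields can be decoded byte by byte, independent of host or guest
endianness. The names demo_tunnel_hdr and demo_le16_to_cpu() below are
hypothetical and exist only for this example:

```c
#include <stdint.h>

/* Hypothetical wire view of the two offsets, each stored as a
 * little-endian 16-bit value. */
struct demo_tunnel_hdr {
    uint8_t outer_th_offset[2];
    uint8_t inner_nh_offset[2];
};

/* Decode a little-endian 16-bit value regardless of host byte order. */
static inline uint16_t demo_le16_to_cpu(const uint8_t b[2])
{
    return (uint16_t)(b[0] | (b[1] << 8));
}
```

No legacy guest-endian handling is needed here, which is the difference
from a __virtio16 field.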

> +};
> +
>   #ifndef VIRTIO_NET_NO_LEGACY
>   /* This header comes first in the scatter-gather list.
>    * For legacy virtio, if VIRTIO_F_ANY_LAYOUT is not negotiated, it must
> @@ -327,6 +360,19 @@ struct virtio_net_rss_config {
>   	uint8_t hash_key_data[/* hash_key_length */];
>   };
>   
> +struct virtio_net_rss_config_hdr {
> +	uint32_t hash_types;
> +	uint16_t indirection_table_mask;
> +	uint16_t unclassified_queue;
> +	uint16_t indirection_table[/* 1 + indirection_table_mask */];
> +};
> +
> +struct virtio_net_rss_config_trailer {
> +	uint16_t max_tx_vq;
> +	uint8_t hash_key_length;
> +	uint8_t hash_key_data[/* hash_key_length */];
> +};
> +
>    #define VIRTIO_NET_CTRL_MQ_RSS_CONFIG          1
>   
>   /*
> diff --git a/include/standard-headers/linux/virtio_pci.h b/include/standard-headers/linux/virtio_pci.h
> index 91fec6f502..09e964e6ee 100644
> --- a/include/standard-headers/linux/virtio_pci.h
> +++ b/include/standard-headers/linux/virtio_pci.h
> @@ -246,6 +246,7 @@ struct virtio_pci_cfg_cap {
>   #define VIRTIO_ADMIN_CMD_LIST_USE	0x1
>   
>   /* Admin command group type. */
> +#define VIRTIO_ADMIN_GROUP_TYPE_SELF	0x0
>   #define VIRTIO_ADMIN_GROUP_TYPE_SRIOV	0x1
>   
>   /* Transitional device admin command. */
> diff --git a/include/standard-headers/linux/virtio_snd.h b/include/standard-headers/linux/virtio_snd.h
> index 860f12e0a4..160d57899f 100644
> --- a/include/standard-headers/linux/virtio_snd.h
> +++ b/include/standard-headers/linux/virtio_snd.h
> @@ -25,7 +25,7 @@ struct virtio_snd_config {
>   	uint32_t streams;
>   	/* # of available channel maps */
>   	uint32_t chmaps;
> -	/* # of available control elements */
> +	/* # of available control elements (if VIRTIO_SND_F_CTLS) */
>   	uint32_t controls;
>   };
>   
> diff --git a/linux-headers/asm-arm64/kvm.h b/linux-headers/asm-arm64/kvm.h
> index ec1e82bdc8..4e6aff08df 100644
> --- a/linux-headers/asm-arm64/kvm.h
> +++ b/linux-headers/asm-arm64/kvm.h
> @@ -105,6 +105,7 @@ struct kvm_regs {
>   #define KVM_ARM_VCPU_PTRAUTH_ADDRESS	5 /* VCPU uses address authentication */
>   #define KVM_ARM_VCPU_PTRAUTH_GENERIC	6 /* VCPU uses generic authentication */
>   #define KVM_ARM_VCPU_HAS_EL2		7 /* Support nested virtualization */
> +#define KVM_ARM_VCPU_HAS_EL2_E2H0	8 /* Limit NV support to E2H RES0 */
>   
>   struct kvm_vcpu_init {
>   	__u32 target;
> @@ -365,6 +366,7 @@ enum {
>   	KVM_REG_ARM_STD_HYP_BIT_PV_TIME	= 0,
>   };
>   
> +/* Vendor hyper call function numbers 0-63 */
>   #define KVM_REG_ARM_VENDOR_HYP_BMAP		KVM_REG_ARM_FW_FEAT_BMAP_REG(2)
>   
>   enum {
> @@ -372,6 +374,14 @@ enum {
>   	KVM_REG_ARM_VENDOR_HYP_BIT_PTP		= 1,
>   };
>   
> +/* Vendor hyper call function numbers 64-127 */
> +#define KVM_REG_ARM_VENDOR_HYP_BMAP_2		KVM_REG_ARM_FW_FEAT_BMAP_REG(3)
> +
> +enum {
> +	KVM_REG_ARM_VENDOR_HYP_BIT_DISCOVER_IMPL_VER	= 0,
> +	KVM_REG_ARM_VENDOR_HYP_BIT_DISCOVER_IMPL_CPUS	= 1,
> +};
> +
>   /* Device Control API on vm fd */
>   #define KVM_ARM_VM_SMCCC_CTRL		0
>   #define   KVM_ARM_VM_SMCCC_FILTER	0
> @@ -394,6 +404,7 @@ enum {
>   #define KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS 6
>   #define KVM_DEV_ARM_VGIC_GRP_LEVEL_INFO  7
>   #define KVM_DEV_ARM_VGIC_GRP_ITS_REGS 8
> +#define KVM_DEV_ARM_VGIC_GRP_MAINT_IRQ  9
>   #define KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_SHIFT	10
>   #define KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_MASK \
>   			(0x3fffffULL << KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_SHIFT)
> diff --git a/linux-headers/asm-arm64/unistd_64.h b/linux-headers/asm-arm64/unistd_64.h
> index d4e90fff76..ee9aaebdf3 100644
> --- a/linux-headers/asm-arm64/unistd_64.h
> +++ b/linux-headers/asm-arm64/unistd_64.h
> @@ -323,6 +323,7 @@
>   #define __NR_getxattrat 464
>   #define __NR_listxattrat 465
>   #define __NR_removexattrat 466
> +#define __NR_open_tree_attr 467
>   
>   
>   #endif /* _ASM_UNISTD_64_H */
> diff --git a/linux-headers/asm-generic/mman-common.h b/linux-headers/asm-generic/mman-common.h
> index 1ea2c4c33b..ef1c27fa3c 100644
> --- a/linux-headers/asm-generic/mman-common.h
> +++ b/linux-headers/asm-generic/mman-common.h
> @@ -85,6 +85,7 @@
>   /* compatibility flags */
>   #define MAP_FILE	0
>   
> +#define PKEY_UNRESTRICTED	0x0
>   #define PKEY_DISABLE_ACCESS	0x1
>   #define PKEY_DISABLE_WRITE	0x2
>   #define PKEY_ACCESS_MASK	(PKEY_DISABLE_ACCESS |\
> diff --git a/linux-headers/asm-generic/unistd.h b/linux-headers/asm-generic/unistd.h
> index 88dc393c2b..2892a45023 100644
> --- a/linux-headers/asm-generic/unistd.h
> +++ b/linux-headers/asm-generic/unistd.h
> @@ -849,9 +849,11 @@ __SYSCALL(__NR_getxattrat, sys_getxattrat)
>   __SYSCALL(__NR_listxattrat, sys_listxattrat)
>   #define __NR_removexattrat 466
>   __SYSCALL(__NR_removexattrat, sys_removexattrat)
> +#define __NR_open_tree_attr 467
> +__SYSCALL(__NR_open_tree_attr, sys_open_tree_attr)
>   
>   #undef __NR_syscalls
> -#define __NR_syscalls 467
> +#define __NR_syscalls 468
>   
>   /*
>    * 32 bit systems traditionally used different
> diff --git a/linux-headers/asm-loongarch/unistd_64.h b/linux-headers/asm-loongarch/unistd_64.h
> index 23fb96a8a7..50d22df8f7 100644
> --- a/linux-headers/asm-loongarch/unistd_64.h
> +++ b/linux-headers/asm-loongarch/unistd_64.h
> @@ -319,6 +319,7 @@
>   #define __NR_getxattrat 464
>   #define __NR_listxattrat 465
>   #define __NR_removexattrat 466
> +#define __NR_open_tree_attr 467
>   
>   
>   #endif /* _ASM_UNISTD_64_H */
> diff --git a/linux-headers/asm-mips/unistd_n32.h b/linux-headers/asm-mips/unistd_n32.h
> index 9a75719644..bdcc2f460b 100644
> --- a/linux-headers/asm-mips/unistd_n32.h
> +++ b/linux-headers/asm-mips/unistd_n32.h
> @@ -395,5 +395,6 @@
>   #define __NR_getxattrat (__NR_Linux + 464)
>   #define __NR_listxattrat (__NR_Linux + 465)
>   #define __NR_removexattrat (__NR_Linux + 466)
> +#define __NR_open_tree_attr (__NR_Linux + 467)
>   
>   #endif /* _ASM_UNISTD_N32_H */
> diff --git a/linux-headers/asm-mips/unistd_n64.h b/linux-headers/asm-mips/unistd_n64.h
> index 7086783b0c..3b6b0193b6 100644
> --- a/linux-headers/asm-mips/unistd_n64.h
> +++ b/linux-headers/asm-mips/unistd_n64.h
> @@ -371,5 +371,6 @@
>   #define __NR_getxattrat (__NR_Linux + 464)
>   #define __NR_listxattrat (__NR_Linux + 465)
>   #define __NR_removexattrat (__NR_Linux + 466)
> +#define __NR_open_tree_attr (__NR_Linux + 467)
>   
>   #endif /* _ASM_UNISTD_N64_H */
> diff --git a/linux-headers/asm-mips/unistd_o32.h b/linux-headers/asm-mips/unistd_o32.h
> index b3825823e4..4609a4b4d3 100644
> --- a/linux-headers/asm-mips/unistd_o32.h
> +++ b/linux-headers/asm-mips/unistd_o32.h
> @@ -441,5 +441,6 @@
>   #define __NR_getxattrat (__NR_Linux + 464)
>   #define __NR_listxattrat (__NR_Linux + 465)
>   #define __NR_removexattrat (__NR_Linux + 466)
> +#define __NR_open_tree_attr (__NR_Linux + 467)
>   
>   #endif /* _ASM_UNISTD_O32_H */
> diff --git a/linux-headers/asm-powerpc/unistd_32.h b/linux-headers/asm-powerpc/unistd_32.h
> index 38ee4dc35d..5d38a427e0 100644
> --- a/linux-headers/asm-powerpc/unistd_32.h
> +++ b/linux-headers/asm-powerpc/unistd_32.h
> @@ -448,6 +448,7 @@
>   #define __NR_getxattrat 464
>   #define __NR_listxattrat 465
>   #define __NR_removexattrat 466
> +#define __NR_open_tree_attr 467
>   
>   
>   #endif /* _ASM_UNISTD_32_H */
> diff --git a/linux-headers/asm-powerpc/unistd_64.h b/linux-headers/asm-powerpc/unistd_64.h
> index 5e5f156834..860a488e4d 100644
> --- a/linux-headers/asm-powerpc/unistd_64.h
> +++ b/linux-headers/asm-powerpc/unistd_64.h
> @@ -420,6 +420,7 @@
>   #define __NR_getxattrat 464
>   #define __NR_listxattrat 465
>   #define __NR_removexattrat 466
> +#define __NR_open_tree_attr 467
>   
>   
>   #endif /* _ASM_UNISTD_64_H */
> diff --git a/linux-headers/asm-riscv/kvm.h b/linux-headers/asm-riscv/kvm.h
> index f06bc5efcd..5f59fd226c 100644
> --- a/linux-headers/asm-riscv/kvm.h
> +++ b/linux-headers/asm-riscv/kvm.h
> @@ -182,6 +182,8 @@ enum KVM_RISCV_ISA_EXT_ID {
>   	KVM_RISCV_ISA_EXT_SVVPTC,
>   	KVM_RISCV_ISA_EXT_ZABHA,
>   	KVM_RISCV_ISA_EXT_ZICCRSE,
> +	KVM_RISCV_ISA_EXT_ZAAMO,
> +	KVM_RISCV_ISA_EXT_ZALRSC,
>   	KVM_RISCV_ISA_EXT_MAX,
>   };
>   
> diff --git a/linux-headers/asm-riscv/unistd_32.h b/linux-headers/asm-riscv/unistd_32.h
> index 74f6127aed..a5e769f1d9 100644
> --- a/linux-headers/asm-riscv/unistd_32.h
> +++ b/linux-headers/asm-riscv/unistd_32.h
> @@ -314,6 +314,7 @@
>   #define __NR_getxattrat 464
>   #define __NR_listxattrat 465
>   #define __NR_removexattrat 466
> +#define __NR_open_tree_attr 467
>   
>   
>   #endif /* _ASM_UNISTD_32_H */
> diff --git a/linux-headers/asm-riscv/unistd_64.h b/linux-headers/asm-riscv/unistd_64.h
> index bb6a15a2ec..8df4d64841 100644
> --- a/linux-headers/asm-riscv/unistd_64.h
> +++ b/linux-headers/asm-riscv/unistd_64.h
> @@ -324,6 +324,7 @@
>   #define __NR_getxattrat 464
>   #define __NR_listxattrat 465
>   #define __NR_removexattrat 466
> +#define __NR_open_tree_attr 467
>   
>   
>   #endif /* _ASM_UNISTD_64_H */
> diff --git a/linux-headers/asm-s390/unistd_32.h b/linux-headers/asm-s390/unistd_32.h
> index 620201cb36..85eedbd18e 100644
> --- a/linux-headers/asm-s390/unistd_32.h
> +++ b/linux-headers/asm-s390/unistd_32.h
> @@ -439,5 +439,6 @@
>   #define __NR_getxattrat 464
>   #define __NR_listxattrat 465
>   #define __NR_removexattrat 466
> +#define __NR_open_tree_attr 467
>   
>   #endif /* _ASM_S390_UNISTD_32_H */
> diff --git a/linux-headers/asm-s390/unistd_64.h b/linux-headers/asm-s390/unistd_64.h
> index e7e4a10aaf..c03b1b9701 100644
> --- a/linux-headers/asm-s390/unistd_64.h
> +++ b/linux-headers/asm-s390/unistd_64.h
> @@ -387,5 +387,6 @@
>   #define __NR_getxattrat 464
>   #define __NR_listxattrat 465
>   #define __NR_removexattrat 466
> +#define __NR_open_tree_attr 467
>   
>   #endif /* _ASM_S390_UNISTD_64_H */
> diff --git a/linux-headers/asm-x86/kvm.h b/linux-headers/asm-x86/kvm.h
> index 86f2c34e7a..dc591fb17e 100644
> --- a/linux-headers/asm-x86/kvm.h
> +++ b/linux-headers/asm-x86/kvm.h
> @@ -557,6 +557,9 @@ struct kvm_x86_mce {
>   #define KVM_XEN_HVM_CONFIG_PVCLOCK_TSC_UNSTABLE	(1 << 7)
>   #define KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA	(1 << 8)
>   
> +#define KVM_XEN_MSR_MIN_INDEX			0x40000000u
> +#define KVM_XEN_MSR_MAX_INDEX			0x4fffffffu
> +
>   struct kvm_xen_hvm_config {
>   	__u32 flags;
>   	__u32 msr;
> diff --git a/linux-headers/asm-x86/unistd_32.h b/linux-headers/asm-x86/unistd_32.h
> index a2eb492a75..491d6b4eb6 100644
> --- a/linux-headers/asm-x86/unistd_32.h
> +++ b/linux-headers/asm-x86/unistd_32.h
> @@ -457,6 +457,7 @@
>   #define __NR_getxattrat 464
>   #define __NR_listxattrat 465
>   #define __NR_removexattrat 466
> +#define __NR_open_tree_attr 467
>   
>   
>   #endif /* _ASM_UNISTD_32_H */
> diff --git a/linux-headers/asm-x86/unistd_64.h b/linux-headers/asm-x86/unistd_64.h
> index 2f5fc400f5..7cf88bf9bd 100644
> --- a/linux-headers/asm-x86/unistd_64.h
> +++ b/linux-headers/asm-x86/unistd_64.h
> @@ -380,6 +380,7 @@
>   #define __NR_getxattrat 464
>   #define __NR_listxattrat 465
>   #define __NR_removexattrat 466
> +#define __NR_open_tree_attr 467
>   
>   
>   #endif /* _ASM_UNISTD_64_H */
> diff --git a/linux-headers/asm-x86/unistd_x32.h b/linux-headers/asm-x86/unistd_x32.h
> index fecd832e7f..82959111e6 100644
> --- a/linux-headers/asm-x86/unistd_x32.h
> +++ b/linux-headers/asm-x86/unistd_x32.h
> @@ -333,6 +333,7 @@
>   #define __NR_getxattrat (__X32_SYSCALL_BIT + 464)
>   #define __NR_listxattrat (__X32_SYSCALL_BIT + 465)
>   #define __NR_removexattrat (__X32_SYSCALL_BIT + 466)
> +#define __NR_open_tree_attr (__X32_SYSCALL_BIT + 467)
>   #define __NR_rt_sigaction (__X32_SYSCALL_BIT + 512)
>   #define __NR_rt_sigreturn (__X32_SYSCALL_BIT + 513)
>   #define __NR_ioctl (__X32_SYSCALL_BIT + 514)
> diff --git a/linux-headers/linux/bits.h b/linux-headers/linux/bits.h
> index c0d00c0a98..58596d18f4 100644
> --- a/linux-headers/linux/bits.h
> +++ b/linux-headers/linux/bits.h
> @@ -4,13 +4,9 @@
>   #ifndef _LINUX_BITS_H
>   #define _LINUX_BITS_H
>   
> -#define __GENMASK(h, l) \
> -        (((~_UL(0)) - (_UL(1) << (l)) + 1) & \
> -         (~_UL(0) >> (__BITS_PER_LONG - 1 - (h))))
> +#define __GENMASK(h, l) (((~_UL(0)) << (l)) & (~_UL(0) >> (BITS_PER_LONG - 1 - (h))))
>   
> -#define __GENMASK_ULL(h, l) \
> -        (((~_ULL(0)) - (_ULL(1) << (l)) + 1) & \
> -         (~_ULL(0) >> (__BITS_PER_LONG_LONG - 1 - (h))))
> +#define __GENMASK_ULL(h, l) (((~_ULL(0)) << (l)) & (~_ULL(0) >> (BITS_PER_LONG_LONG - 1 - (h))))
>   
>   #define __GENMASK_U128(h, l) \
>   	((_BIT128((h)) << 1) - (_BIT128(l)))
> diff --git a/linux-headers/linux/const.h b/linux-headers/linux/const.h
> index 2122610de7..95ede23342 100644
> --- a/linux-headers/linux/const.h
> +++ b/linux-headers/linux/const.h
> @@ -33,7 +33,7 @@
>    * Missing __asm__ support
>    *
>    * __BIT128() would not work in the __asm__ code, as it shifts an
> - * 'unsigned __init128' data type as direct representation of
> + * 'unsigned __int128' data type as direct representation of
>    * 128 bit constants is not supported in the gcc compiler, as
>    * they get silently truncated.
>    *
> diff --git a/linux-headers/linux/iommufd.h b/linux-headers/linux/iommufd.h
> index ccbdca5e11..cb0f7d6b4d 100644
> --- a/linux-headers/linux/iommufd.h
> +++ b/linux-headers/linux/iommufd.h
> @@ -55,6 +55,7 @@ enum {
>   	IOMMUFD_CMD_VIOMMU_ALLOC = 0x90,
>   	IOMMUFD_CMD_VDEVICE_ALLOC = 0x91,
>   	IOMMUFD_CMD_IOAS_CHANGE_PROCESS = 0x92,
> +	IOMMUFD_CMD_VEVENTQ_ALLOC = 0x93,
>   };
>   
>   /**
> @@ -392,6 +393,9 @@ struct iommu_vfio_ioas {
>    *                          Any domain attached to the non-PASID part of the
>    *                          device must also be flagged, otherwise attaching a
>    *                          PASID will blocked.
> + *                          For the user that wants to attach PASID, ioas is
> + *                          not recommended for both the non-PASID part
> + *                          and PASID part of the device.
>    *                          If IOMMU does not support PASID it will return
>    *                          error (-EOPNOTSUPP).
>    */
> @@ -608,9 +612,17 @@ enum iommu_hw_info_type {
>    *                                   IOMMU_HWPT_GET_DIRTY_BITMAP
>    *                                   IOMMU_HWPT_SET_DIRTY_TRACKING
>    *
> + * @IOMMU_HW_CAP_PCI_PASID_EXEC: Execute Permission Supported, user ignores it
> + *                               when the struct
> + *                               iommu_hw_info::out_max_pasid_log2 is zero.
> + * @IOMMU_HW_CAP_PCI_PASID_PRIV: Privileged Mode Supported, user ignores it
> + *                               when the struct
> + *                               iommu_hw_info::out_max_pasid_log2 is zero.
>    */
>   enum iommufd_hw_capabilities {
>   	IOMMU_HW_CAP_DIRTY_TRACKING = 1 << 0,
> +	IOMMU_HW_CAP_PCI_PASID_EXEC = 1 << 1,
> +	IOMMU_HW_CAP_PCI_PASID_PRIV = 1 << 2,
>   };
>   
>   /**
> @@ -626,6 +638,9 @@ enum iommufd_hw_capabilities {
>    *                 iommu_hw_info_type.
>    * @out_capabilities: Output the generic iommu capability info type as defined
>    *                    in the enum iommu_hw_capabilities.
> + * @out_max_pasid_log2: Output the width of PASIDs. 0 means no PASID support.
> + *                      PCI devices turn to out_capabilities to check if the
> + *                      specific capabilities is supported or not.
>    * @__reserved: Must be 0
>    *
>    * Query an iommu type specific hardware information data from an iommu behind
> @@ -649,7 +664,8 @@ struct iommu_hw_info {
>   	__u32 data_len;
>   	__aligned_u64 data_uptr;
>   	__u32 out_data_type;
> -	__u32 __reserved;
> +	__u8 out_max_pasid_log2;
> +	__u8 __reserved[3];
>   	__aligned_u64 out_capabilities;
>   };
>   #define IOMMU_GET_HW_INFO _IO(IOMMUFD_TYPE, IOMMUFD_CMD_GET_HW_INFO)
> @@ -1014,4 +1030,115 @@ struct iommu_ioas_change_process {
>   #define IOMMU_IOAS_CHANGE_PROCESS \
>   	_IO(IOMMUFD_TYPE, IOMMUFD_CMD_IOAS_CHANGE_PROCESS)
>   
> +/**
> + * enum iommu_veventq_flag - flag for struct iommufd_vevent_header
> + * @IOMMU_VEVENTQ_FLAG_LOST_EVENTS: vEVENTQ has lost vEVENTs
> + */
> +enum iommu_veventq_flag {
> +	IOMMU_VEVENTQ_FLAG_LOST_EVENTS = (1U << 0),
> +};
> +
> +/**
> + * struct iommufd_vevent_header - Virtual Event Header for a vEVENTQ Status
> + * @flags: Combination of enum iommu_veventq_flag
> + * @sequence: The sequence index of a vEVENT in the vEVENTQ, with a range of
> + *            [0, INT_MAX] where the following index of INT_MAX is 0
> + *
> + * Each iommufd_vevent_header reports a sequence index of the following vEVENT:
> + *
> + * +----------------------+-------+----------------------+-------+---+-------+
> + * | header0 {sequence=0} | data0 | header1 {sequence=1} | data1 |...| dataN |
> + * +----------------------+-------+----------------------+-------+---+-------+
> + *
> + * And this sequence index is expected to be monotonic to the sequence index of
> + * the previous vEVENT. If two adjacent sequence indexes has a delta larger than
> + * 1, it means that delta - 1 number of vEVENTs has lost, e.g. two lost vEVENTs:
> + *
> + * +-----+----------------------+-------+----------------------+-------+-----+
> + * | ... | header3 {sequence=3} | data3 | header6 {sequence=6} | data6 | ... |
> + * +-----+----------------------+-------+----------------------+-------+-----+
> + *
> + * If a vEVENT lost at the tail of the vEVENTQ and there is no following vEVENT
> + * providing the next sequence index, an IOMMU_VEVENTQ_FLAG_LOST_EVENTS header
> + * would be added to the tail, and no data would follow this header:
> + *
> + * +--+----------------------+-------+-----------------------------------------+
> + * |..| header3 {sequence=3} | data3 | header4 {flags=LOST_EVENTS, sequence=4} |
> + * +--+----------------------+-------+-----------------------------------------+
> + */
> +struct iommufd_vevent_header {
> +	__u32 flags;
> +	__u32 sequence;
> +};
> +
> +/**
> + * enum iommu_veventq_type - Virtual Event Queue Type
> + * @IOMMU_VEVENTQ_TYPE_DEFAULT: Reserved for future use
> + * @IOMMU_VEVENTQ_TYPE_ARM_SMMUV3: ARM SMMUv3 Virtual Event Queue
> + */
> +enum iommu_veventq_type {
> +	IOMMU_VEVENTQ_TYPE_DEFAULT = 0,
> +	IOMMU_VEVENTQ_TYPE_ARM_SMMUV3 = 1,
> +};
> +
> +/**
> + * struct iommu_vevent_arm_smmuv3 - ARM SMMUv3 Virtual Event
> + *                                  (IOMMU_VEVENTQ_TYPE_ARM_SMMUV3)
> + * @evt: 256-bit ARM SMMUv3 Event record, little-endian.
> + *       Reported event records: (Refer to "7.3 Event records" in SMMUv3 HW Spec)
> + *       - 0x04 C_BAD_STE
> + *       - 0x06 F_STREAM_DISABLED
> + *       - 0x08 C_BAD_SUBSTREAMID
> + *       - 0x0a C_BAD_CD
> + *       - 0x10 F_TRANSLATION
> + *       - 0x11 F_ADDR_SIZE
> + *       - 0x12 F_ACCESS
> + *       - 0x13 F_PERMISSION
> + *
> + * StreamID field reports a virtual device ID. To receive a virtual event for a
> + * device, a vDEVICE must be allocated via IOMMU_VDEVICE_ALLOC.
> + */
> +struct iommu_vevent_arm_smmuv3 {
> +	__aligned_le64 evt[4];
> +};
> +
> +/**
> + * struct iommu_veventq_alloc - ioctl(IOMMU_VEVENTQ_ALLOC)
> + * @size: sizeof(struct iommu_veventq_alloc)
> + * @flags: Must be 0
> + * @viommu_id: virtual IOMMU ID to associate the vEVENTQ with
> + * @type: Type of the vEVENTQ. Must be defined in enum iommu_veventq_type
> + * @veventq_depth: Maximum number of events in the vEVENTQ
> + * @out_veventq_id: The ID of the new vEVENTQ
> + * @out_veventq_fd: The fd of the new vEVENTQ. User space must close the
> + *                  successfully returned fd after using it
> + * @__reserved: Must be 0
> + *
> + * Explicitly allocate a virtual event queue interface for a vIOMMU. A vIOMMU
> + * can have multiple FDs for different types, but is confined to one per @type.
> + * User space should open the @out_veventq_fd to read vEVENTs out of a vEVENTQ,
> + * if there are vEVENTs available. A vEVENTQ will lose events due to overflow,
> + * if the number of the vEVENTs hits @veventq_depth.
> + *
> + * Each vEVENT in a vEVENTQ encloses a struct iommufd_vevent_header followed by
> + * a type-specific data structure, in a normal case:
> + *
> + * +-+---------+-------+---------+-------+-----+---------+-------+-+
> + * | | header0 | data0 | header1 | data1 | ... | headerN | dataN | |
> + * +-+---------+-------+---------+-------+-----+---------+-------+-+
> + *
> + * unless a tailing IOMMU_VEVENTQ_FLAG_LOST_EVENTS header is logged (refer to
> + * struct iommufd_vevent_header).
> + */
> +struct iommu_veventq_alloc {
> +	__u32 size;
> +	__u32 flags;
> +	__u32 viommu_id;
> +	__u32 type;
> +	__u32 veventq_depth;
> +	__u32 out_veventq_id;
> +	__u32 out_veventq_fd;
> +	__u32 __reserved;
> +};
> +#define IOMMU_VEVENTQ_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VEVENTQ_ALLOC)
>   #endif
> diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
> index 27181b3dd8..e5f3e8b5a0 100644
> --- a/linux-headers/linux/kvm.h
> +++ b/linux-headers/linux/kvm.h
> @@ -921,6 +921,7 @@ struct kvm_enable_cap {
>   #define KVM_CAP_PRE_FAULT_MEMORY 236
>   #define KVM_CAP_X86_APIC_BUS_CYCLES_NS 237
>   #define KVM_CAP_X86_GUEST_MODE 238
> +#define KVM_CAP_ARM_WRITABLE_IMP_ID_REGS 239
>   
>   struct kvm_irq_routing_irqchip {
>   	__u32 irqchip;
> diff --git a/linux-headers/linux/psp-sev.h b/linux-headers/linux/psp-sev.h
> index 17bf191573..113c4ceb78 100644
> --- a/linux-headers/linux/psp-sev.h
> +++ b/linux-headers/linux/psp-sev.h
> @@ -73,13 +73,20 @@ typedef enum {
>   	SEV_RET_INVALID_PARAM,
>   	SEV_RET_RESOURCE_LIMIT,
>   	SEV_RET_SECURE_DATA_INVALID,
> -	SEV_RET_INVALID_KEY = 0x27,
> -	SEV_RET_INVALID_PAGE_SIZE,
> -	SEV_RET_INVALID_PAGE_STATE,
> -	SEV_RET_INVALID_MDATA_ENTRY,
> -	SEV_RET_INVALID_PAGE_OWNER,
> -	SEV_RET_INVALID_PAGE_AEAD_OFLOW,
> -	SEV_RET_RMP_INIT_REQUIRED,
> +	SEV_RET_INVALID_PAGE_SIZE          = 0x0019,
> +	SEV_RET_INVALID_PAGE_STATE         = 0x001A,
> +	SEV_RET_INVALID_MDATA_ENTRY        = 0x001B,
> +	SEV_RET_INVALID_PAGE_OWNER         = 0x001C,
> +	SEV_RET_AEAD_OFLOW                 = 0x001D,
> +	SEV_RET_EXIT_RING_BUFFER           = 0x001F,
> +	SEV_RET_RMP_INIT_REQUIRED          = 0x0020,
> +	SEV_RET_BAD_SVN                    = 0x0021,
> +	SEV_RET_BAD_VERSION                = 0x0022,
> +	SEV_RET_SHUTDOWN_REQUIRED          = 0x0023,
> +	SEV_RET_UPDATE_FAILED              = 0x0024,
> +	SEV_RET_RESTORE_REQUIRED           = 0x0025,
> +	SEV_RET_RMP_INITIALIZATION_FAILED  = 0x0026,
> +	SEV_RET_INVALID_KEY                = 0x0027,
>   	SEV_RET_MAX,
>   } sev_ret_code;
>   
> diff --git a/linux-headers/linux/stddef.h b/linux-headers/linux/stddef.h
> index e1416f7937..e1fcfcf3b3 100644
> --- a/linux-headers/linux/stddef.h
> +++ b/linux-headers/linux/stddef.h
> @@ -70,4 +70,6 @@
>   #define __counted_by_be(m)
>   #endif
>   
> +#define __kernel_nonstring
> +
>   #endif /* _LINUX_STDDEF_H */
> diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
> index 1b5e254d6a..79bf8c0cc5 100644
> --- a/linux-headers/linux/vfio.h
> +++ b/linux-headers/linux/vfio.h
> @@ -671,6 +671,7 @@ enum {
>    */
>   enum {
>   	VFIO_AP_REQ_IRQ_INDEX,
> +	VFIO_AP_CFG_CHG_IRQ_INDEX,
>   	VFIO_AP_NUM_IRQS
>   };
>   
> @@ -931,29 +932,34 @@ struct vfio_device_bind_iommufd {
>    * VFIO_DEVICE_ATTACH_IOMMUFD_PT - _IOW(VFIO_TYPE, VFIO_BASE + 19,
>    *					struct vfio_device_attach_iommufd_pt)
>    * @argsz:	User filled size of this data.
> - * @flags:	Must be 0.
> + * @flags:	Flags for attach.
>    * @pt_id:	Input the target id which can represent an ioas or a hwpt
>    *		allocated via iommufd subsystem.
>    *		Output the input ioas id or the attached hwpt id which could
>    *		be the specified hwpt itself or a hwpt automatically created
>    *		for the specified ioas by kernel during the attachment.
> + * @pasid:	The pasid to be attached, only meaningful when
> + *		VFIO_DEVICE_ATTACH_PASID is set in @flags
>    *
>    * Associate the device with an address space within the bound iommufd.
>    * Undo by VFIO_DEVICE_DETACH_IOMMUFD_PT or device fd close.  This is only
>    * allowed on cdev fds.
>    *
> - * If a vfio device is currently attached to a valid hw_pagetable, without doing
> - * a VFIO_DEVICE_DETACH_IOMMUFD_PT, a second VFIO_DEVICE_ATTACH_IOMMUFD_PT ioctl
> - * passing in another hw_pagetable (hwpt) id is allowed. This action, also known
> - * as a hw_pagetable replacement, will replace the device's currently attached
> - * hw_pagetable with a new hw_pagetable corresponding to the given pt_id.
> + * If a vfio device or a pasid of this device is currently attached to a valid
> + * hw_pagetable (hwpt), without doing a VFIO_DEVICE_DETACH_IOMMUFD_PT, a second
> + * VFIO_DEVICE_ATTACH_IOMMUFD_PT ioctl passing in another hwpt id is allowed.
> + * This action, also known as a hw_pagetable replacement, will replace the
> + * currently attached hwpt of the device or the pasid of this device with a new
> + * hwpt corresponding to the given pt_id.
>    *
>    * Return: 0 on success, -errno on failure.
>    */
>   struct vfio_device_attach_iommufd_pt {
>   	__u32	argsz;
>   	__u32	flags;
> +#define VFIO_DEVICE_ATTACH_PASID	(1 << 0)
>   	__u32	pt_id;
> +	__u32	pasid;
>   };
>   
>   #define VFIO_DEVICE_ATTACH_IOMMUFD_PT		_IO(VFIO_TYPE, VFIO_BASE + 19)
> @@ -962,17 +968,21 @@ struct vfio_device_attach_iommufd_pt {
>    * VFIO_DEVICE_DETACH_IOMMUFD_PT - _IOW(VFIO_TYPE, VFIO_BASE + 20,
>    *					struct vfio_device_detach_iommufd_pt)
>    * @argsz:	User filled size of this data.
> - * @flags:	Must be 0.
> + * @flags:	Flags for detach.
> + * @pasid:	The pasid to be detached, only meaningful when
> + *		VFIO_DEVICE_DETACH_PASID is set in @flags
>    *
> - * Remove the association of the device and its current associated address
> - * space.  After it, the device should be in a blocking DMA state.  This is only
> - * allowed on cdev fds.
> + * Remove the association of the device or a pasid of the device and its current
> + * associated address space.  After it, the device or the pasid should be in a
> + * blocking DMA state.  This is only allowed on cdev fds.
>    *
>    * Return: 0 on success, -errno on failure.
>    */
>   struct vfio_device_detach_iommufd_pt {
>   	__u32	argsz;
>   	__u32	flags;
> +#define VFIO_DEVICE_DETACH_PASID	(1 << 0)
> +	__u32	pasid;
>   };
>   
>   #define VFIO_DEVICE_DETACH_IOMMUFD_PT		_IO(VFIO_TYPE, VFIO_BASE + 20)
> diff --git a/linux-headers/linux/vhost.h b/linux-headers/linux/vhost.h
> index b95dd84eef..328e81badf 100644
> --- a/linux-headers/linux/vhost.h
> +++ b/linux-headers/linux/vhost.h
> @@ -28,10 +28,10 @@
>   
>   /* Set current process as the (exclusive) owner of this file descriptor.  This
>    * must be called before any other vhost command.  Further calls to
> - * VHOST_OWNER_SET fail until VHOST_OWNER_RESET is called. */
> + * VHOST_SET_OWNER fail until VHOST_RESET_OWNER is called. */
>   #define VHOST_SET_OWNER _IO(VHOST_VIRTIO, 0x01)
>   /* Give up ownership, and reset the device to default values.
> - * Allows subsequent call to VHOST_OWNER_SET to succeed. */
> + * Allows subsequent call to VHOST_SET_OWNER to succeed. */
>   #define VHOST_RESET_OWNER _IO(VHOST_VIRTIO, 0x02)
>   
>   /* Set up/modify memory layout */
> @@ -235,4 +235,12 @@
>    */
>   #define VHOST_VDPA_GET_VRING_SIZE	_IOWR(VHOST_VIRTIO, 0x82,	\
>   					      struct vhost_vring_state)
> +
> +/* Extended features manipulation
> + */
> +#ifdef __SIZEOF_INT128__
> +#define VHOST_GET_FEATURES_EX  _IOR(VHOST_VIRTIO, 0x83, __u128)
> +#define VHOST_SET_FEATURES_EX  _IOW(VHOST_VIRTIO, 0x83, __u128)

Suffixing names with _EX is a Windows convention, and it becomes a mess
when extending multiple times (e.g., VHOST_GET_FEATURES_EX_EX).

I suggest naming them VHOST_GET_FEATURES2 and VHOST_SET_FEATURES2, or
VHOST_GET_FEATURES128 and VHOST_SET_FEATURES128, for clarity.

include/uapi/asm-generic/ioctl.h says:
  * Encoding the size of the parameter structure in the ioctl request
  * is useful for catching programs compiled with old versions
  * and to avoid overwriting user space outside the user buffer area.

So perhaps the intended encoding for an extended ioctl is to keep the 
first and second argument and change only the third parameter. For example:

#define VHOST_GET_FEATURES128 _IOR(VHOST_VIRTIO, 0x00, __u128)
#define VHOST_SET_FEATURES128 _IOW(VHOST_VIRTIO, 0x00, __u128)
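
To illustrate why reusing command number 0x00 does not clash with the
existing 64-bit ioctl: _IOR()/_IOW() fold the argument size into the request
value, so the two requests differ even though type and number match. A
minimal sketch, assuming a Linux toolchain with __int128 support; the u128
typedef, the GET_FEATURES64/GET_FEATURES128 names and the helper are
illustrative only and are not taken from any header:

```c
#include <assert.h>
#include <stdint.h>
#include <linux/ioctl.h>   /* _IOR(), _IOC_NR(), _IOC_SIZE() */

#define VHOST_VIRTIO 0xAF  /* ioctl type used by linux/vhost.h */

typedef unsigned __int128 u128;  /* stand-in for the __u128 in the patch */

/* Same encoding as the existing VHOST_GET_FEATURES (64-bit argument). */
#define GET_FEATURES64  _IOR(VHOST_VIRTIO, 0x00, uint64_t)
/* Suggested 128-bit variant: same type and number, larger argument. */
#define GET_FEATURES128 _IOR(VHOST_VIRTIO, 0x00, u128)

/*
 * Returns 1 when both requests share a command number yet encode to
 * different request values because the argument size differs.
 */
static int requests_are_distinct(void)
{
    assert(_IOC_NR(GET_FEATURES64) == _IOC_NR(GET_FEATURES128));
    return GET_FEATURES64 != GET_FEATURES128 &&
           _IOC_SIZE(GET_FEATURES64) == 8 &&
           _IOC_SIZE(GET_FEATURES128) == 16;
}
```

An old kernel that only knows the 64-bit request would therefore see an
unknown ioctl value rather than a request it could misinterpret.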


> +#endif
> +
>   #endif




* Re: [PATCH RFC 07/16] virtio-pci: implement support for extended features.
  2025-05-23  7:23   ` Akihiko Odaki
@ 2025-05-23  9:52     ` Paolo Abeni
  0 siblings, 0 replies; 42+ messages in thread
From: Paolo Abeni @ 2025-05-23  9:52 UTC (permalink / raw)
  To: Akihiko Odaki, qemu-devel
  Cc: Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost,
	Dmitry Fleytman, Jason Wang, Sriram Yagnaraman,
	Michael S. Tsirkin, Stefano Garzarella, Peter Xu, Fabiano Rosas,
	Cornelia Huck, Luigi Rizzo, Giuseppe Lettieri, Vincenzo Maffione,
	Eric Blake, Markus Armbruster

On 5/23/25 9:23 AM, Akihiko Odaki wrote:
>> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
>> index 0fa8fe4955..7815ef2d9b 100644
>> --- a/hw/virtio/virtio-pci.c
>> +++ b/hw/virtio/virtio-pci.c
>> @@ -123,7 +123,8 @@ static const VMStateDescription vmstate_virtio_pci_modern_state_sub = {
>>       .fields = (const VMStateField[]) {
>>           VMSTATE_UINT32(dfselect, VirtIOPCIProxy),
>>           VMSTATE_UINT32(gfselect, VirtIOPCIProxy),
>> -        VMSTATE_UINT32_ARRAY(guest_features, VirtIOPCIProxy, 2),
>> +        VMSTATE_UINT32_ARRAY(guest_features, VirtIOPCIProxy,
>> +                             VIRTIO_FEATURES_WORDS),
> 
> Modifying existing fields breaks migration across versions. Please refer 
> to docs/devel/migration/main.rst for details.

Thanks for the pointer! I missed a lot of context. I guess I need some
trickery similar to the "virtio/64bit_features"/"virtio/128bit_features"
VMstate description.
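
For context, the usual idea behind such a subsection is a "needed"
predicate: the extra state only goes on the wire when it carries
information, so streams from guests that never negotiated the upper feature
bits stay readable by older QEMU versions. A standalone sketch of that
predicate, not code from the series; FEATURES_WORDS = 4 and the function
name are assumptions made for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LEGACY_WORDS   2  /* 2 x 32 bits: what the existing field migrates */
#define FEATURES_WORDS 4  /* assumed: 4 x 32 bits = 128 feature bits */

/*
 * Hypothetical ".needed"-style check: the extended subsection is required
 * only if some feature word beyond the legacy 64 bits is non-zero.
 */
static bool ext_features_needed(const uint32_t *words, size_t nwords)
{
    assert(nwords >= LEGACY_WORDS);
    for (size_t i = LEGACY_WORDS; i < nwords; i++) {
        if (words[i] != 0) {
            return true;
        }
    }
    return false;
}
```

With this shape the existing two-word field stays untouched and only the
upper words travel in the optional subsection.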

/P




* Re: [PATCH RFC 12/16] virtio-net: implement extended features support.
  2025-05-23  8:09   ` Akihiko Odaki
@ 2025-05-23 10:01     ` Paolo Abeni
  2025-05-23 10:14       ` Akihiko Odaki
  0 siblings, 1 reply; 42+ messages in thread
From: Paolo Abeni @ 2025-05-23 10:01 UTC (permalink / raw)
  To: Akihiko Odaki, qemu-devel
  Cc: Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost,
	Dmitry Fleytman, Jason Wang, Sriram Yagnaraman,
	Michael S. Tsirkin, Stefano Garzarella, Peter Xu, Fabiano Rosas,
	Cornelia Huck, Luigi Rizzo, Giuseppe Lettieri, Vincenzo Maffione,
	Eric Blake, Markus Armbruster

On 5/23/25 10:09 AM, Akihiko Odaki wrote:
> On 2025/05/21 20:34, Paolo Abeni wrote:
>> Use the extended types and helpers to manipulate the virtio_net
>> features.
>>
>> Note that offloads are still 64bits wide, as per specification,
>> and extended offloads will be mapped into such range.
>>
>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>> ---
>>   hw/net/virtio-net.c            | 87 +++++++++++++++++++++-------------
>>   include/hw/virtio/virtio-net.h |  2 +-
>>   2 files changed, 55 insertions(+), 34 deletions(-)
>>
>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>> index 9f500c64e7..193469fc27 100644
>> --- a/hw/net/virtio-net.c
>> +++ b/hw/net/virtio-net.c
>> @@ -90,6 +90,17 @@
>>                                            VIRTIO_NET_RSS_HASH_TYPE_TCP_EX | \
>>                                            VIRTIO_NET_RSS_HASH_TYPE_UDP_EX)
>>   
>> +#define VIRTIO_OFFLOAD_MAP_MIN    46
>> +#define VIRTIO_OFFLOAD_MAP_LENGTH 4
>> +#define VIRTIO_OFFLOAD_MAP        MAKE_64BIT_MASK(VIRTIO_OFFLOAD_MAP_MIN, \
>> +                                                VIRTIO_OFFLOAD_MAP_LENGTH)
>> +#define VIRTIO_FEATURES_MAP_MIN   65
>> +#define VIRTIO_O2F_DELTA          (VIRTIO_FEATURES_MAP_MIN - \
>> +                                   VIRTIO_OFFLOAD_MAP_MIN)
>> +
>> +#define VIRTIO_FEATURE_TO_OFFLOAD(fbit)  (fbit >= 64 ? \
>> +                                          fbit - VIRTIO_O2F_DELTA : fbit)
>> +
> 
> These are specific to virtio-net but look like they are common for 
> virtio as the names don't contain "NET".
> 
> VIRTIO_FEATURES_MAP_MIN is also a bit confusing. It points to the least 
> significant bit that refers to an offloading feature in the upper-half 
> of the feature bits, but the name lacks the context.

Uhmmm... putting the whole context in the macro name sounds very verbose
and/or hard. How about VIRTIO_NET_OFFLOAD_MAPPED_MIN?

> @@ -862,13 +881,13 @@ static uint64_t virtio_net_guest_offloads_by_features(uint64_t features)
>>           (1ULL << VIRTIO_NET_F_GUEST_USO4) |
>>           (1ULL << VIRTIO_NET_F_GUEST_USO6);
>>   
>> -    return guest_offloads_mask & features;
>> +    return guest_offloads_mask & virtio_net_features_to_offload(features);
> 
> 
> How about:
> 
> static const virtio_features_t guest_offload_features_mask = ...
> virtio_features_t masked_features = guest_offload_features_mask & features;
> 
> return masked_features | ((masked_features >> VIRTIO_FEATURES_MAP_MIN) 
> << VIRTIO_OFFLOAD_MAP_MIN);
> 
> This makes virtio_net_features_to_offload() unnecessary.

The above looks a little fragile: in the future, 'features' could have
some bit set in the mapped range that does not represent a guest offload,
so we need to explicitly mask such bits out before the first '&' operator.

If dropping the helper is preferred, it can still be dropped.
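
To make the "mapped range" concrete, here is a small sketch of the
bit-index mapping described by the quoted macros, with the constants copied
from the patch. The function name is hypothetical, and unlike the quoted
macro (which accepts any fbit >= 64) this version rejects extended bits
outside the mapped window:

```c
#include <assert.h>

#define OFFLOAD_MAP_MIN    46   /* first offload bit used for mapped features */
#define OFFLOAD_MAP_LENGTH 4    /* number of mapped bits */
#define FEATURES_MAP_MIN   65   /* first extended feature bit being mapped */
#define O2F_DELTA          (FEATURES_MAP_MIN - OFFLOAD_MAP_MIN)

/* Map a feature bit index to the offload bit index it occupies. */
static unsigned feature_bit_to_offload_bit(unsigned fbit)
{
    if (fbit < 64) {
        return fbit;            /* low feature bits map 1:1 */
    }
    /* Only the mapped window of extended bits has an offload slot. */
    assert(fbit >= FEATURES_MAP_MIN &&
           fbit < FEATURES_MAP_MIN + OFFLOAD_MAP_LENGTH);
    return fbit - O2F_DELTA;
}
```

Feature bit 46 and feature bit 65 both land on offload bit 46 here, which
is the collision that makes unmasked bits in the 46..49 window a problem.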

/P




* Re: [PATCH RFC 01/16] linux-headers: Update to Linux v6.15-rc net-next
  2025-05-23  9:50   ` Akihiko Odaki
@ 2025-05-23 10:06     ` Paolo Abeni
  0 siblings, 0 replies; 42+ messages in thread
From: Paolo Abeni @ 2025-05-23 10:06 UTC (permalink / raw)
  To: Akihiko Odaki, qemu-devel
  Cc: Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost,
	Dmitry Fleytman, Jason Wang, Sriram Yagnaraman,
	Michael S. Tsirkin, Stefano Garzarella, Peter Xu, Fabiano Rosas,
	Cornelia Huck, Luigi Rizzo, Giuseppe Lettieri, Vincenzo Maffione,
	Eric Blake, Markus Armbruster

On 5/23/25 11:50 AM, Akihiko Odaki wrote:
> On 2025/05/21 20:33, Paolo Abeni wrote:
>> @@ -235,4 +235,12 @@
>>    */
>>   #define VHOST_VDPA_GET_VRING_SIZE	_IOWR(VHOST_VIRTIO, 0x82,	\
>>   					      struct vhost_vring_state)
>> +
>> +/* Extended features manipulation
>> + */
>> +#ifdef __SIZEOF_INT128__
>> +#define VHOST_GET_FEATURES_EX  _IOR(VHOST_VIRTIO, 0x83, __u128)
>> +#define VHOST_SET_FEATURES_EX  _IOW(VHOST_VIRTIO, 0x83, __u128)
> 
> Suffixing names with _EX is a Windows convention, and it becomes a mess
> when extending multiple times (e.g., VHOST_GET_FEATURES_EX_EX).
> 
> I suggest naming them VHOST_GET_FEATURES2 and VHOST_SET_FEATURES2, or
> VHOST_GET_FEATURES128 and VHOST_SET_FEATURES128, for clarity.
> 
> include/uapi/asm-generic/ioctl.h says:
>   * Encoding the size of the parameter structure in the ioctl request
>   * is useful for catching programs compiled with old versions
>   * and to avoid overwriting user space outside the user buffer area.
> 
> So perhaps the intended encoding for an extended ioctl is to keep the 
> first and second argument and change only the third parameter. For example:
> 
> #define VHOST_GET_FEATURES128 _IOR(VHOST_VIRTIO, 0x00, __u128)
> #define VHOST_SET_FEATURES128 _IOW(VHOST_VIRTIO, 0x00, __u128)

Thanks, I like the latter form. I'll do that in the next revision.

BTW, the reference to that legacy OS really hurts here :-P

/P




* Re: [PATCH RFC 12/16] virtio-net: implement extended features support.
  2025-05-23 10:01     ` Paolo Abeni
@ 2025-05-23 10:14       ` Akihiko Odaki
  0 siblings, 0 replies; 42+ messages in thread
From: Akihiko Odaki @ 2025-05-23 10:14 UTC (permalink / raw)
  To: Paolo Abeni, qemu-devel
  Cc: Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost,
	Dmitry Fleytman, Jason Wang, Sriram Yagnaraman,
	Michael S. Tsirkin, Stefano Garzarella, Peter Xu, Fabiano Rosas,
	Cornelia Huck, Luigi Rizzo, Giuseppe Lettieri, Vincenzo Maffione,
	Eric Blake, Markus Armbruster

On 2025/05/23 19:01, Paolo Abeni wrote:
> On 5/23/25 10:09 AM, Akihiko Odaki wrote:
>> On 2025/05/21 20:34, Paolo Abeni wrote:
>>> Use the extended types and helpers to manipulate the virtio_net
>>> features.
>>>
>>> Note that offloads are still 64bits wide, as per specification,
>>> and extended offloads will be mapped into such range.
>>>
>>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>>> ---
>>>    hw/net/virtio-net.c            | 87 +++++++++++++++++++++-------------
>>>    include/hw/virtio/virtio-net.h |  2 +-
>>>    2 files changed, 55 insertions(+), 34 deletions(-)
>>>
>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>>> index 9f500c64e7..193469fc27 100644
>>> --- a/hw/net/virtio-net.c
>>> +++ b/hw/net/virtio-net.c
>>> @@ -90,6 +90,17 @@
>>>                                             VIRTIO_NET_RSS_HASH_TYPE_TCP_EX | \
>>>                                             VIRTIO_NET_RSS_HASH_TYPE_UDP_EX)
>>>    
>>> +#define VIRTIO_OFFLOAD_MAP_MIN    46
>>> +#define VIRTIO_OFFLOAD_MAP_LENGTH 4
>>> +#define VIRTIO_OFFLOAD_MAP        MAKE_64BIT_MASK(VIRTIO_OFFLOAD_MAP_MIN, \
>>> +                                                VIRTIO_OFFLOAD_MAP_LENGTH)
>>> +#define VIRTIO_FEATURES_MAP_MIN   65
>>> +#define VIRTIO_O2F_DELTA          (VIRTIO_FEATURES_MAP_MIN - \
>>> +                                   VIRTIO_OFFLOAD_MAP_MIN)
>>> +
>>> +#define VIRTIO_FEATURE_TO_OFFLOAD(fbit)  (fbit >= 64 ? \
>>> +                                          fbit - VIRTIO_O2F_DELTA : fbit)
>>> +
>>
>> These are specific to virtio-net but look like they are common for
>> virtio as the names don't contain "NET".
>>
>> VIRTIO_FEATURES_MAP_MIN is also a bit confusing. It points to the least
>> significant bit that refers to an offloading feature in the upper-half
>> of the feature bits, but the name lacks the context.
> 
> Uhmmm... putting the whole context in the macro name sounds very verbose
> and/or hard. How about VIRTIO_NET_OFFLOAD_MAPPED_MIN?

It looks like it represents a bit in the 64-bit mapping 
(VIRTIO_OFFLOAD_MAP) instead of a feature bit as it contains "MAPPED" 
while it doesn't contain "FEATURE".

Perhaps VIRTIO_OFFLOAD_MAP is the one that is confusing. As it is 
intended to be a compact 64-bit representation, how about:

VIRTIO_OFFLOAD_MAP -> VIRTIO_NET_OFFLOAD64
VIRTIO_FEATURE_TO_OFFLOAD -> VIRTIO_NET_FEATURE_TO_OFFLOAD64
VIRTIO_FEATURES_MAP_MIN -> VIRTIO_NET_OFFLOAD_FEATURE_MIN

> 
>> @@ -862,13 +881,13 @@ static uint64_t virtio_net_guest_offloads_by_features(uint64_t features)
>>>            (1ULL << VIRTIO_NET_F_GUEST_USO4) |
>>>            (1ULL << VIRTIO_NET_F_GUEST_USO6);
>>>    
>>> -    return guest_offloads_mask & features;
>>> +    return guest_offloads_mask & virtio_net_features_to_offload(features);
>>
>>
>> How about:
>>
>> static const virtio_features_t guest_offload_features_mask = ...
>> virtio_features_t masked_features = guest_offload_features_mask & features;
>>
>> return masked_features | ((masked_features >> VIRTIO_FEATURES_MAP_MIN)
>> << VIRTIO_OFFLOAD_MAP_MIN);
>>
>> This makes virtio_net_features_to_offload() unnecessary.
> 
> The above looks a little fragile, as (in future) 'features' could have
> some bit in the mapped range set (and not representing a guest offload):
> we need to explicitly mask such bits out before the first '&' operator.

My suggestion is to first mask out all feature bits that don't represent
guest offloading, so that masked_features contains only guest offload
features.
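
A standalone sketch of this mask-first approach, assuming a compiler with
__int128 support. features128_t stands in for virtio_features_t, and the
function name, FBIT() and MAPPED_WINDOW are hypothetical; the two constants
are copied from the quoted patch:

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned __int128 features128_t;  /* stand-in for virtio_features_t */

#define OFFLOAD_MAP_MIN   46
#define FEATURES_MAP_MIN  65

#define FBIT(n)        ((features128_t)1 << (n))
/* Low bits 46..49 are reserved as the landing zone for mapped features. */
#define MAPPED_WINDOW  ((features128_t)0xF << OFFLOAD_MAP_MIN)

/* Compress 128-bit features into the 64-bit guest offloads word. */
static uint64_t guest_offloads_by_features(features128_t features,
                                           features128_t offload_features_mask)
{
    /* The mask itself must not claim low bits inside the landing zone. */
    assert((offload_features_mask & MAPPED_WINDOW) == 0);

    /* Drop everything that is not a guest offload before any shifting. */
    features128_t masked = features & offload_features_mask;

    uint64_t low  = (uint64_t)masked;                        /* bits 0..63 */
    uint64_t high = (uint64_t)(masked >> FEATURES_MAP_MIN);  /* bits 65 and up */

    return low | (high << OFFLOAD_MAP_MIN);
}
```

Because the mask is applied first, a stray feature bit in 46..49 or an
unrelated extended bit never reaches the shifted result.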

Regards,
Akihiko Odaki



* Re: [PATCH RFC 13/16] net: implement tunnel probing
  2025-05-23  7:39   ` Akihiko Odaki
@ 2025-05-23 10:24     ` Paolo Abeni
  2025-05-23 10:32       ` Akihiko Odaki
  0 siblings, 1 reply; 42+ messages in thread
From: Paolo Abeni @ 2025-05-23 10:24 UTC (permalink / raw)
  To: Akihiko Odaki, qemu-devel
  Cc: Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost,
	Dmitry Fleytman, Jason Wang, Sriram Yagnaraman,
	Michael S. Tsirkin, Stefano Garzarella, Peter Xu, Fabiano Rosas,
	Cornelia Huck, Luigi Rizzo, Giuseppe Lettieri, Vincenzo Maffione,
	Eric Blake, Markus Armbruster

On 5/23/25 9:39 AM, Akihiko Odaki wrote:
>> diff --git a/net/tap-linux.c b/net/tap-linux.c
>> index 22ec2f45d2..2df601551e 100644
>> --- a/net/tap-linux.c
>> +++ b/net/tap-linux.c
>> @@ -37,6 +37,14 @@
>>   
>>   #define PATH_NET_TUN "/dev/net/tun"
>>   
>> +#ifndef TUN_F_UDP_TUNNEL_GSO
>> +#define TUN_F_UDP_TUNNEL_GSO       0x080
>> +#endif
>> +
>> +#ifndef TUN_F_UDP_TUNNEL_GSO_CSUM
>> +#define TUN_F_UDP_TUNNEL_GSO_CSUM  0x100
>> +#endif
>> +
> 
> These should be added to net/tap-linux.h, which contains other UAPI 
> definitions.
> 
> But perhaps it may be better to refactor it to add the real header file 
> using scripts/update-linux-headers.sh. Such a refactoring can be done 
> before this series gets ready to merge and will make this series a bit 
> smaller.

I may be missing something, but I don't think such a refactor would make
this series significantly smaller. Also, it looks quite orthogonal to me.

I propose to just move the above definitions to net/tap-linux.h, if possible.

Thanks,

Paolo




* Re: [PATCH RFC 13/16] net: implement tunnel probing
  2025-05-23 10:24     ` Paolo Abeni
@ 2025-05-23 10:32       ` Akihiko Odaki
  0 siblings, 0 replies; 42+ messages in thread
From: Akihiko Odaki @ 2025-05-23 10:32 UTC (permalink / raw)
  To: Paolo Abeni, qemu-devel
  Cc: Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost,
	Dmitry Fleytman, Jason Wang, Sriram Yagnaraman,
	Michael S. Tsirkin, Stefano Garzarella, Peter Xu, Fabiano Rosas,
	Cornelia Huck, Luigi Rizzo, Giuseppe Lettieri, Vincenzo Maffione,
	Eric Blake, Markus Armbruster

On 2025/05/23 19:24, Paolo Abeni wrote:
> On 5/23/25 9:39 AM, Akihiko Odaki wrote:
>>> diff --git a/net/tap-linux.c b/net/tap-linux.c
>>> index 22ec2f45d2..2df601551e 100644
>>> --- a/net/tap-linux.c
>>> +++ b/net/tap-linux.c
>>> @@ -37,6 +37,14 @@
>>>    
>>>    #define PATH_NET_TUN "/dev/net/tun"
>>>    
>>> +#ifndef TUN_F_UDP_TUNNEL_GSO
>>> +#define TUN_F_UDP_TUNNEL_GSO       0x080
>>> +#endif
>>> +
>>> +#ifndef TUN_F_UDP_TUNNEL_GSO_CSUM
>>> +#define TUN_F_UDP_TUNNEL_GSO_CSUM  0x100
>>> +#endif
>>> +
>>
>> These should be added to net/tap-linux.h, which contains other UAPI
>> definitions.
>>
>> But perhaps it may be better to refactor it to add the real header file
>> using scripts/update-linux-headers.sh. Such a refactoring can be done
>> before this series gets ready to merge and will make this series a bit
>> smaller.
> 
> I may be missing something, but I don't think such refactor will make
> this series relevantly smaller?!? Also, it looks something quite
> orthogonal to me.
> 
> I propose to just move the above definition in net/tap-linux.h, if possible.

You can get rid of these 8 lines and that's all. Just moving them to
net/tap-linux.h is also fine.

Regards,
Akihiko Odaki


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC 15/16] net: implement tnl feature offloading
  2025-05-23  8:16   ` Akihiko Odaki
@ 2025-05-23 10:40     ` Paolo Abeni
  2025-05-23 10:54       ` Akihiko Odaki
  2025-05-23 11:35       ` Akihiko Odaki
  0 siblings, 2 replies; 42+ messages in thread
From: Paolo Abeni @ 2025-05-23 10:40 UTC (permalink / raw)
  To: Akihiko Odaki, qemu-devel
  Cc: Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost,
	Dmitry Fleytman, Jason Wang, Sriram Yagnaraman,
	Michael S. Tsirkin, Stefano Garzarella, Peter Xu, Fabiano Rosas,
	Cornelia Huck, Luigi Rizzo, Giuseppe Lettieri, Vincenzo Maffione,
	Eric Blake, Markus Armbruster

On 5/23/25 10:16 AM, Akihiko Odaki wrote:
> On 2025/05/21 20:34, Paolo Abeni wrote:
>> @@ -890,6 +915,12 @@ static void virtio_net_apply_guest_offloads(VirtIONet *n)
>>          .ufo  = !!(n->curr_guest_offloads & (1ULL << VIRTIO_NET_F_GUEST_UFO)),
>>          .uso4 = !!(n->curr_guest_offloads & (1ULL << VIRTIO_NET_F_GUEST_USO4)),
>>          .uso6 = !!(n->curr_guest_offloads & (1ULL << VIRTIO_NET_F_GUEST_USO6)),
>> +#ifdef CONFIG_INT128
>> +       .tnl  = !!(n->curr_guest_offloads &
>> +                  (1ULL << VIRTIO_NET_O_GUEST_UDP_TUNNEL_GSO)),
>> +       .tnl_csum = !!(n->curr_guest_offloads &
>> +                      (1ULL << VIRTIO_NET_O_GUEST_UDP_TUNNEL_GSO_CSUM)),
> 
> "[PATCH RFC 14/16] net: bundle all offloads in a single struct" added a 
> struct for offloading, but how about passing n->curr_guest_offloads as 
> is instead?
> 
> It loses some type safety and makes it prone to have unknown bits, but 
> omitting duplicate these bit operations may outweigh the downside.

I *think* that one of the relevant points about the current interface is
that qemu_set_offload() abstracts from the virtio specifics, as it's
also used by other drivers. Forcing them to convert the to-be-configured
offloads to a virtio-specific bitmask sounds incorrect to me. Possibly I
misread your suggestion?

[...]
>> diff --git a/net/tap-linux.c b/net/tap-linux.c
>> index aa5f3a6e22..b7662ece63 100644
>> --- a/net/tap-linux.c
>> +++ b/net/tap-linux.c
>> @@ -287,6 +287,12 @@ void tap_fd_set_offload(int fd, const NetOffloads *ol)
>>           if (ol->uso6) {
>>               offload |= TUN_F_USO6;
>>           }
>> +        if ((ol->tso4 || ol->tso6 || ol->uso4 || ol->uso6) && ol->tnl) {
> 
> Is it possible to have ol->tnl without TSO or USO? If so, is ignoring 
> ol->tnl really what you want?

The virtio specification actually prevents setting the UDP-tunnel offload
without any other "inner" offload (TSO or USO), as it makes little to no
sense (the stack can't GSO/GRO the outer header without doing the same
for the inner).

Does the above make sense/answer your questions?

/P



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC 15/16] net: implement tnl feature offloading
  2025-05-23 10:40     ` Paolo Abeni
@ 2025-05-23 10:54       ` Akihiko Odaki
  2025-05-23 11:06         ` Paolo Abeni
  2025-05-23 11:35       ` Akihiko Odaki
  1 sibling, 1 reply; 42+ messages in thread
From: Akihiko Odaki @ 2025-05-23 10:54 UTC (permalink / raw)
  To: Paolo Abeni, qemu-devel
  Cc: Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost,
	Dmitry Fleytman, Jason Wang, Sriram Yagnaraman,
	Michael S. Tsirkin, Stefano Garzarella, Peter Xu, Fabiano Rosas,
	Cornelia Huck, Luigi Rizzo, Giuseppe Lettieri, Vincenzo Maffione,
	Eric Blake, Markus Armbruster

On 2025/05/23 19:40, Paolo Abeni wrote:
> On 5/23/25 10:16 AM, Akihiko Odaki wrote:
>> On 2025/05/21 20:34, Paolo Abeni wrote:
>>> @@ -890,6 +915,12 @@ static void virtio_net_apply_guest_offloads(VirtIONet *n)
>>>           .ufo  = !!(n->curr_guest_offloads & (1ULL << VIRTIO_NET_F_GUEST_UFO)),
>>>           .uso4 = !!(n->curr_guest_offloads & (1ULL << VIRTIO_NET_F_GUEST_USO4)),
>>>           .uso6 = !!(n->curr_guest_offloads & (1ULL << VIRTIO_NET_F_GUEST_USO6)),
>>> +#ifdef CONFIG_INT128
>>> +       .tnl  = !!(n->curr_guest_offloads &
>>> +                  (1ULL << VIRTIO_NET_O_GUEST_UDP_TUNNEL_GSO)),
>>> +       .tnl_csum = !!(n->curr_guest_offloads &
>>> +                      (1ULL << VIRTIO_NET_O_GUEST_UDP_TUNNEL_GSO_CSUM)),
>>
>> "[PATCH RFC 14/16] net: bundle all offloads in a single struct" added a
>> struct for offloading, but how about passing n->curr_guest_offloads as
>> is instead?
>>
>> It loses some type safety and makes it prone to have unknown bits, but
>> omitting duplicate these bit operations may outweigh the downside.
> 
> I *think* that one of the relevant point about the current interface is
> that qemu_set_offload() abstracts from the virtio specifics, as it's
> also used by other drivers. Forcing them to covert the to-be-configured
> offloads to a virtio specific bitmask sound incorrect to me. Possibly I
> misread your suggestion?
> 
> [...]
>>> diff --git a/net/tap-linux.c b/net/tap-linux.c
>>> index aa5f3a6e22..b7662ece63 100644
>>> --- a/net/tap-linux.c
>>> +++ b/net/tap-linux.c
>>> @@ -287,6 +287,12 @@ void tap_fd_set_offload(int fd, const NetOffloads *ol)
>>>            if (ol->uso6) {
>>>                offload |= TUN_F_USO6;
>>>            }
>>> +        if ((ol->tso4 || ol->tso6 || ol->uso4 || ol->uso6) && ol->tnl) {
>>
>> Is it possible to have ol->tnl without TSO or USO? If so, is ignoring
>> ol->tnl really what you want?
> 
> The virtio specifications actually prevent setting UDP-tunnel offload
> without any other "inner" offload (TSO or USO), as it makes little to no
> sense (the stack can't GSO/GRO the outer header without doing the same
> for the inner).
> 
> Does the above makes sense/answer your questions?

The code implies the following:
1a. ol->tnl may be true while TSO and USO are disabled.
2a. It is defined as no-op in such a case.

But the reality is as follows:
1b. ol->tnl being true while TSO and USO are disabled is an error.
2b. The consequence is undefined in such a case.

In that case, virtio_net_get_features() should report the error for 1b,
which will prevent the error condition from reaching
tap_fd_set_offload().

Making the error condition a no-op in tap_fd_set_offload() does not make
it (more) correct, as the consequence is undefined anyway (2b).
tap_fd_set_offload() may simply ignore the condition under the assumption
that it will never happen, or assert that assumption.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC 15/16] net: implement tnl feature offloading
  2025-05-23 10:54       ` Akihiko Odaki
@ 2025-05-23 11:06         ` Paolo Abeni
  0 siblings, 0 replies; 42+ messages in thread
From: Paolo Abeni @ 2025-05-23 11:06 UTC (permalink / raw)
  To: Akihiko Odaki, qemu-devel
  Cc: Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost,
	Dmitry Fleytman, Jason Wang, Sriram Yagnaraman,
	Michael S. Tsirkin, Stefano Garzarella, Peter Xu, Fabiano Rosas,
	Cornelia Huck, Luigi Rizzo, Giuseppe Lettieri, Vincenzo Maffione,
	Eric Blake, Markus Armbruster

On 5/23/25 12:54 PM, Akihiko Odaki wrote:
> On 2025/05/23 19:40, Paolo Abeni wrote:
>> On 5/23/25 10:16 AM, Akihiko Odaki wrote:
>>> On 2025/05/21 20:34, Paolo Abeni wrote:
>>>> @@ -890,6 +915,12 @@ static void virtio_net_apply_guest_offloads(VirtIONet *n)
>>>>           .ufo  = !!(n->curr_guest_offloads & (1ULL << VIRTIO_NET_F_GUEST_UFO)),
>>>>           .uso4 = !!(n->curr_guest_offloads & (1ULL << VIRTIO_NET_F_GUEST_USO4)),
>>>>           .uso6 = !!(n->curr_guest_offloads & (1ULL << VIRTIO_NET_F_GUEST_USO6)),
>>>> +#ifdef CONFIG_INT128
>>>> +       .tnl  = !!(n->curr_guest_offloads &
>>>> +                  (1ULL << VIRTIO_NET_O_GUEST_UDP_TUNNEL_GSO)),
>>>> +       .tnl_csum = !!(n->curr_guest_offloads &
>>>> +                      (1ULL << VIRTIO_NET_O_GUEST_UDP_TUNNEL_GSO_CSUM)),
>>>
>>> "[PATCH RFC 14/16] net: bundle all offloads in a single struct" added a
>>> struct for offloading, but how about passing n->curr_guest_offloads as
>>> is instead?
>>>
>>> It loses some type safety and makes it prone to have unknown bits, but
>>> omitting duplicate these bit operations may outweigh the downside.
>>
>> I *think* that one of the relevant point about the current interface is
>> that qemu_set_offload() abstracts from the virtio specifics, as it's
>> also used by other drivers. Forcing them to covert the to-be-configured
>> offloads to a virtio specific bitmask sound incorrect to me. Possibly I
>> misread your suggestion?
>>
>> [...]
>>>> diff --git a/net/tap-linux.c b/net/tap-linux.c
>>>> index aa5f3a6e22..b7662ece63 100644
>>>> --- a/net/tap-linux.c
>>>> +++ b/net/tap-linux.c
>>>> @@ -287,6 +287,12 @@ void tap_fd_set_offload(int fd, const NetOffloads *ol)
>>>>            if (ol->uso6) {
>>>>                offload |= TUN_F_USO6;
>>>>            }
>>>> +        if ((ol->tso4 || ol->tso6 || ol->uso4 || ol->uso6) && ol->tnl) {
>>>
>>> Is it possible to have ol->tnl without TSO or USO? If so, is ignoring
>>> ol->tnl really what you want?
>>
>> The virtio specifications actually prevent setting UDP-tunnel offload
>> without any other "inner" offload (TSO or USO), as it makes little to no
>> sense (the stack can't GSO/GRO the outer header without doing the same
>> for the inner).
>>
>> Does the above makes sense/answer your questions?
> 
> The code implies the following:
> 1a. ol->tnl may be true while TSO and USO are disabled.
> 2a. It is defined as no-op in such a case.
> 
> But the reality is as follows:
> 1b. ol->tnl being true while TSO and USO are disabled is an error.
> 2b. The consequence is undefined in such a case.
> 
> In that case, virtio_net_get_features() should report the error for 1b, 
> which will prevent the error condition from reaching to 
> tap_fd_set_offload().
> 
> Making the error condition no-op in tap_fd_set_offload() does not make 
> it (more) correct as the consequence is undefined anyway (2b). It may 
> simply ignore the condition under the assumption that it will never 
> happen or assert that assumption.

I see. I'll add the sanity check in virtio_net_get_features() and will
add an assert in the tap code.
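
The plan above (validate early, assert late) could look roughly like the
following. This is only a sketch under assumptions, not the code of the
next revision: the ExampleOffloads struct, the example_* helpers and the
EX_TUN_F_* names are hypothetical stand-ins, the two tunnel flag values
are the fallback values from the tap-linux.c hunk in patch 13, and nesting
tnl_csum under tnl is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values; the tunnel ones mirror the fallback
 * definitions quoted from patch 13, the names are hypothetical. */
#define EX_TUN_F_TSO4                 0x002
#define EX_TUN_F_TSO6                 0x004
#define EX_TUN_F_USO4                 0x020
#define EX_TUN_F_USO6                 0x040
#define EX_TUN_F_UDP_TUNNEL_GSO       0x080
#define EX_TUN_F_UDP_TUNNEL_GSO_CSUM  0x100

/* Hypothetical stand-in for the offload struct from patch 14. */
typedef struct ExampleOffloads {
    bool tso4, tso6, uso4, uso6;
    bool tnl, tnl_csum;
} ExampleOffloads;

/* Sanity check meant for the feature negotiation path: the tunnel
 * offload is only meaningful together with an "inner" TSO/USO one. */
static bool example_offloads_valid(const ExampleOffloads *ol)
{
    bool inner = ol->tso4 || ol->tso6 || ol->uso4 || ol->uso6;

    return !ol->tnl || inner;
}

/* Backend side: the invalid combination must never get here, so it is
 * asserted instead of being silently turned into a no-op. */
static unsigned int example_tap_offload_flags(const ExampleOffloads *ol)
{
    unsigned int offload = 0;

    assert(example_offloads_valid(ol));

    if (ol->tso4) {
        offload |= EX_TUN_F_TSO4;
    }
    if (ol->tso6) {
        offload |= EX_TUN_F_TSO6;
    }
    if (ol->uso4) {
        offload |= EX_TUN_F_USO4;
    }
    if (ol->uso6) {
        offload |= EX_TUN_F_USO6;
    }
    if (ol->tnl) {
        offload |= EX_TUN_F_UDP_TUNNEL_GSO;
        /* Assumption: the csum variant only makes sense with tnl. */
        if (ol->tnl_csum) {
            offload |= EX_TUN_F_UDP_TUNNEL_GSO_CSUM;
        }
    }
    return offload;
}
```

With this split, the caller rejects a tnl-only request when features are
negotiated, and the backend helper documents the invariant with the
assert rather than hiding it behind an extra condition.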

Thanks!

/P



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC 15/16] net: implement tnl feature offloading
  2025-05-23 10:40     ` Paolo Abeni
  2025-05-23 10:54       ` Akihiko Odaki
@ 2025-05-23 11:35       ` Akihiko Odaki
  2025-05-23 14:46         ` Paolo Abeni
  1 sibling, 1 reply; 42+ messages in thread
From: Akihiko Odaki @ 2025-05-23 11:35 UTC (permalink / raw)
  To: Paolo Abeni, qemu-devel
  Cc: Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost,
	Dmitry Fleytman, Jason Wang, Sriram Yagnaraman,
	Michael S. Tsirkin, Stefano Garzarella, Peter Xu, Fabiano Rosas,
	Cornelia Huck, Luigi Rizzo, Giuseppe Lettieri, Vincenzo Maffione,
	Eric Blake, Markus Armbruster

On 2025/05/23 19:40, Paolo Abeni wrote:
> On 5/23/25 10:16 AM, Akihiko Odaki wrote:
>> On 2025/05/21 20:34, Paolo Abeni wrote:
>>> @@ -890,6 +915,12 @@ static void virtio_net_apply_guest_offloads(VirtIONet *n)
>>>           .ufo  = !!(n->curr_guest_offloads & (1ULL << VIRTIO_NET_F_GUEST_UFO)),
>>>           .uso4 = !!(n->curr_guest_offloads & (1ULL << VIRTIO_NET_F_GUEST_USO4)),
>>>           .uso6 = !!(n->curr_guest_offloads & (1ULL << VIRTIO_NET_F_GUEST_USO6)),
>>> +#ifdef CONFIG_INT128
>>> +       .tnl  = !!(n->curr_guest_offloads &
>>> +                  (1ULL << VIRTIO_NET_O_GUEST_UDP_TUNNEL_GSO)),
>>> +       .tnl_csum = !!(n->curr_guest_offloads &
>>> +                      (1ULL << VIRTIO_NET_O_GUEST_UDP_TUNNEL_GSO_CSUM)),
>>
>> "[PATCH RFC 14/16] net: bundle all offloads in a single struct" added a
>> struct for offloading, but how about passing n->curr_guest_offloads as
>> is instead?
>>
>> It loses some type safety and makes it prone to have unknown bits, but
>> omitting duplicate these bit operations may outweigh the downside.
> 
> I *think* that one of the relevant point about the current interface is
> that qemu_set_offload() abstracts from the virtio specifics, as it's
> also used by other drivers. Forcing them to covert the to-be-configured
> offloads to a virtio specific bitmask sound incorrect to me. Possibly I
> misread your suggestion?
> 

virtio is also an interface, and we can reuse it for QEMU-internal 
interfaces too if it is appropriate.

That said, the feature bitmask defined by virtio is inappropriate for
qemu_set_offload() because it also contains other features not
related to guest offloading. We need an alternative interface, and the
current qemu_set_offload() just passes each flag separately.

Now, "[PATCH RFC 12/16] virtio-net: implement extended features
support." is adding another format that derives from virtio for guest
offloading. This format only contains bits related to guest offloading
by definition and suits qemu_set_offload() well.

Bit names like VIRTIO_NET_O_GUEST_UDP_TUNNEL_GSO will imply that it
derives from the virtio spec. I think this is actually an improvement;
the virtio spec has been the definitive document of the offloading
features of tuntap, and some features even use the virtio header (so
e1000e and igb parse and build virtio headers). These bit names make
this relationship between tuntap and the virtio spec explicit.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC 15/16] net: implement tnl feature offloading
  2025-05-23 11:35       ` Akihiko Odaki
@ 2025-05-23 14:46         ` Paolo Abeni
  2025-05-24  4:13           ` Akihiko Odaki
  0 siblings, 1 reply; 42+ messages in thread
From: Paolo Abeni @ 2025-05-23 14:46 UTC (permalink / raw)
  To: Akihiko Odaki, qemu-devel
  Cc: Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost,
	Dmitry Fleytman, Jason Wang, Sriram Yagnaraman,
	Michael S. Tsirkin, Stefano Garzarella, Peter Xu, Fabiano Rosas,
	Cornelia Huck, Luigi Rizzo, Giuseppe Lettieri, Vincenzo Maffione,
	Eric Blake, Markus Armbruster

On 5/23/25 1:35 PM, Akihiko Odaki wrote:
> On 2025/05/23 19:40, Paolo Abeni wrote:
>> On 5/23/25 10:16 AM, Akihiko Odaki wrote:
>>> On 2025/05/21 20:34, Paolo Abeni wrote:
>>>> @@ -890,6 +915,12 @@ static void virtio_net_apply_guest_offloads(VirtIONet *n)
>>>>           .ufo  = !!(n->curr_guest_offloads & (1ULL << VIRTIO_NET_F_GUEST_UFO)),
>>>>           .uso4 = !!(n->curr_guest_offloads & (1ULL << VIRTIO_NET_F_GUEST_USO4)),
>>>>           .uso6 = !!(n->curr_guest_offloads & (1ULL << VIRTIO_NET_F_GUEST_USO6)),
>>>> +#ifdef CONFIG_INT128
>>>> +       .tnl  = !!(n->curr_guest_offloads &
>>>> +                  (1ULL << VIRTIO_NET_O_GUEST_UDP_TUNNEL_GSO)),
>>>> +       .tnl_csum = !!(n->curr_guest_offloads &
>>>> +                      (1ULL << VIRTIO_NET_O_GUEST_UDP_TUNNEL_GSO_CSUM)),
>>>
>>> "[PATCH RFC 14/16] net: bundle all offloads in a single struct" added a
>>> struct for offloading, but how about passing n->curr_guest_offloads as
>>> is instead?
>>>
>>> It loses some type safety and makes it prone to have unknown bits, but
>>> omitting duplicate these bit operations may outweigh the downside.
>>
>> I *think* that one of the relevant point about the current interface is
>> that qemu_set_offload() abstracts from the virtio specifics, as it's
>> also used by other drivers. Forcing them to covert the to-be-configured
>> offloads to a virtio specific bitmask sound incorrect to me. Possibly I
>> misread your suggestion?
>>
> 
> virtio is also an interface, and we can reuse it for QEMU-internal 
> interfaces too if it is appropriate.
> 
> That said, the feature bitmask defined by virtio is inappropriate for 
> for qemu_set_offload() because it also contains other features not 
> related to guest offloading. We need an alternative interface, and the 
> current qemu_set_offload() just passes each flag separately.
> 
> Now, "[PATCH RFC 12/16] virtio-net: implement extended features 
> support." is adding another format that derives from virtio for guest 
> offloading. This format only contains bits related to guest offloading 
> by definition and suits well with qemu_set_offload().
> 
> Bit names like VIRTIO_NET_O_GUEST_UDP_TUNNEL_GSO will imply that it 
> derives from the virtio spec I think this is actually an improvement; 
> the virtio spec have been the definitive document of the offloading 
> features of tuntap, and some features even used the virtio header (so 
> e1000e and igb parse and build virtio headers). These bit names make 
> this relationship between tuntap and the virtio spec explicit.

Let me check we are on the same page. You are suggesting the following:

- change set_offload() signature to:
typedef void (SetOffload)(NetClientState *, uint64_t);

- define VIRTIO_NET_O_GUEST_<offload> masks for the known/supported offloads
in include/net/net.h (including TSO, USO, etc...)

- adapt the drivers to the above interface.

- move this patch earlier, as a pre-req for the rest of the series.

Am I correct?
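
To make the question concrete, here is a rough sketch of what a
bitmask-based callback could look like. Everything in it is an assumption
for illustration: the EX_O_* bit positions are arbitrary and are not the
VIRTIO_NET_O_GUEST_* values from the series, and ExampleNetClient is a
hypothetical stand-in for NetClientState.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical offload bit positions, for illustration only. */
enum {
    EX_O_GUEST_TSO4,
    EX_O_GUEST_TSO6,
    EX_O_GUEST_USO4,
    EX_O_GUEST_USO6,
    EX_O_GUEST_UDP_TUNNEL_GSO,
    EX_O_GUEST_UDP_TUNNEL_GSO_CSUM,
    EX_O_GUEST_COUNT
};

#define EX_O_BIT(n)  (UINT64_C(1) << (n))
/* Mask of all the offload bits this sketch knows about. */
#define EX_O_KNOWN   (EX_O_BIT(EX_O_GUEST_COUNT) - 1)

/* Hypothetical stand-in for NetClientState. */
typedef struct ExampleNetClient {
    uint64_t applied_offloads;
} ExampleNetClient;

/* Same shape as the proposed SetOffload typedef: a single bitmask
 * replaces the per-offload flags. */
typedef void (ExampleSetOffload)(ExampleNetClient *, uint64_t);

/* A backend keeps only the bits it understands, which addresses the
 * "prone to have unknown bits" concern at the receiving end. */
static void example_backend_set_offload(ExampleNetClient *nc,
                                        uint64_t offloads)
{
    assert(nc != NULL);
    nc->applied_offloads = offloads & EX_O_KNOWN;
}

/* The device side forwards its current guest offloads as-is, with no
 * per-bit unpacking into separate booleans. */
static void example_apply_guest_offloads(ExampleNetClient *nc,
                                         ExampleSetOffload *set_offload,
                                         uint64_t curr_guest_offloads)
{
    set_offload(nc, curr_guest_offloads);
}
```

The trade-off is the one already noted in this thread: the caller loses
the per-field type safety of a struct, and in exchange the repeated
"!!(mask & bit)" conversions go away.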

Thanks,

Paolo




^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC 15/16] net: implement tnl feature offloading
  2025-05-23 14:46         ` Paolo Abeni
@ 2025-05-24  4:13           ` Akihiko Odaki
  0 siblings, 0 replies; 42+ messages in thread
From: Akihiko Odaki @ 2025-05-24  4:13 UTC (permalink / raw)
  To: Paolo Abeni, qemu-devel
  Cc: Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost,
	Dmitry Fleytman, Jason Wang, Sriram Yagnaraman,
	Michael S. Tsirkin, Stefano Garzarella, Peter Xu, Fabiano Rosas,
	Cornelia Huck, Luigi Rizzo, Giuseppe Lettieri, Vincenzo Maffione,
	Eric Blake, Markus Armbruster

On 2025/05/23 23:46, Paolo Abeni wrote:
> On 5/23/25 1:35 PM, Akihiko Odaki wrote:
>> On 2025/05/23 19:40, Paolo Abeni wrote:
>>> On 5/23/25 10:16 AM, Akihiko Odaki wrote:
>>>> On 2025/05/21 20:34, Paolo Abeni wrote:
>>>>> @@ -890,6 +915,12 @@ static void virtio_net_apply_guest_offloads(VirtIONet *n)
>>>>>            .ufo  = !!(n->curr_guest_offloads & (1ULL << VIRTIO_NET_F_GUEST_UFO)),
>>>>>            .uso4 = !!(n->curr_guest_offloads & (1ULL << VIRTIO_NET_F_GUEST_USO4)),
>>>>>            .uso6 = !!(n->curr_guest_offloads & (1ULL << VIRTIO_NET_F_GUEST_USO6)),
>>>>> +#ifdef CONFIG_INT128
>>>>> +       .tnl  = !!(n->curr_guest_offloads &
>>>>> +                  (1ULL << VIRTIO_NET_O_GUEST_UDP_TUNNEL_GSO)),
>>>>> +       .tnl_csum = !!(n->curr_guest_offloads &
>>>>> +                      (1ULL << VIRTIO_NET_O_GUEST_UDP_TUNNEL_GSO_CSUM)),
>>>>
>>>> "[PATCH RFC 14/16] net: bundle all offloads in a single struct" added a
>>>> struct for offloading, but how about passing n->curr_guest_offloads as
>>>> is instead?
>>>>
>>>> It loses some type safety and makes it prone to have unknown bits, but
>>>> omitting duplicate these bit operations may outweigh the downside.
>>>
>>> I *think* that one of the relevant point about the current interface is
>>> that qemu_set_offload() abstracts from the virtio specifics, as it's
>>> also used by other drivers. Forcing them to covert the to-be-configured
>>> offloads to a virtio specific bitmask sound incorrect to me. Possibly I
>>> misread your suggestion?
>>>
>>
>> virtio is also an interface, and we can reuse it for QEMU-internal
>> interfaces too if it is appropriate.
>>
>> That said, the feature bitmask defined by virtio is inappropriate for
>> for qemu_set_offload() because it also contains other features not
>> related to guest offloading. We need an alternative interface, and the
>> current qemu_set_offload() just passes each flag separately.
>>
>> Now, "[PATCH RFC 12/16] virtio-net: implement extended features
>> support." is adding another format that derives from virtio for guest
>> offloading. This format only contains bits related to guest offloading
>> by definition and suits well with qemu_set_offload().
>>
>> Bit names like VIRTIO_NET_O_GUEST_UDP_TUNNEL_GSO will imply that it
>> derives from the virtio spec I think this is actually an improvement;
>> the virtio spec have been the definitive document of the offloading
>> features of tuntap, and some features even used the virtio header (so
>> e1000e and igb parse and build virtio headers). These bit names make
>> this relationship between tuntap and the virtio spec explicit.
> 
> Let me check we are on the same page. You are suggesting the following:
> 
> - change set_offload() signature to:
> typedef void (SetOffload)(NetClientState *, uint64_t);
> 
> - define VIRTIO_NET_O_GUEST_<offload> masks for known/supported offload
> in include/net/net.h (including TSO, USO, etc...)
> 
> - adapt the drivers to the above interface.
> 
> - move this patch as series pre-req.
> 
> Am I correct?

Yes, that's what I meant.

Regards,
Akihiko Odaki


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC 16/16] net: make vhost-net aware of GSO over UDP tunnel hdr layout
  2025-05-23  8:22   ` Akihiko Odaki
@ 2025-05-28  3:04     ` Lei Yang
  0 siblings, 0 replies; 42+ messages in thread
From: Lei Yang @ 2025-05-28  3:04 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: qemu-devel, Paolo Bonzini, Daniel P. Berrangé,
	Eduardo Habkost, Dmitry Fleytman, Jason Wang, Sriram Yagnaraman,
	Michael S. Tsirkin, Stefano Garzarella, Peter Xu, Fabiano Rosas,
	Cornelia Huck, Luigi Rizzo, Giuseppe Lettieri, Vincenzo Maffione,
	Eric Blake, Markus Armbruster, Akihiko Odaki

I tested this series of patches with the vhost-net regression tests;
everything works fine.

Tested-by: Lei Yang <leiyang@redhat.com>

On Fri, May 23, 2025 at 4:24 PM Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
>
> On 2025/05/21 20:34, Paolo Abeni wrote:
> > When the GSO over UDP tunnel offload is enabled, the virtio net
> > header includes additional fields to support such offload.
> >
> > The vhost backend must be aware of the exact header layout, to
> > copy it correctly. The tunnel-related field are present if either
> > the guest or the host negotiated any UDP tunnel related feature:
> > add them to host kernel supported features list, to allow qemu
> > transder to such backend the needed information.
>
> s/transder/transfer/
>
> This patch should be squashed into the previous patch ("[PATCH RFC
> 15/16] net: implement tnl feature offloading") as QEMU only with the
> previous patch will incorrectly enable tunnel offloading even when vhost
> doesn't support it.
>
> >
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > ---
> >   hw/net/vhost_net.c | 4 ++++
> >   1 file changed, 4 insertions(+)
> >
> > diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> > index 58d7619fc8..c8e02d1732 100644
> > --- a/hw/net/vhost_net.c
> > +++ b/hw/net/vhost_net.c
> > @@ -52,6 +52,10 @@ static const int kernel_feature_bits[] = {
> >       VIRTIO_F_NOTIFICATION_DATA,
> >       VIRTIO_NET_F_RSC_EXT,
> >       VIRTIO_NET_F_HASH_REPORT,
> > +#ifdef CONFIG_INT128
> > +    VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO,
> > +    VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO,
> > +#endif
> >       VHOST_INVALID_FEATURE_BIT
> >   };
> >
>
>



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC 00/16] virtio: introduce support for GSO over UDP tunnel
  2025-05-21 11:33 [PATCH RFC 00/16] virtio: introduce support for GSO over UDP tunnel Paolo Abeni
                   ` (16 preceding siblings ...)
  2025-05-23  7:19 ` [PATCH RFC 00/16] virtio: introduce support for GSO over UDP tunnel Akihiko Odaki
@ 2025-06-17 15:01 ` Paolo Abeni
  17 siblings, 0 replies; 42+ messages in thread
From: Paolo Abeni @ 2025-06-17 15:01 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost,
	Dmitry Fleytman, Akihiko Odaki, Jason Wang, Sriram Yagnaraman,
	Michael S. Tsirkin, Stefano Garzarella, Peter Xu, Fabiano Rosas,
	Cornelia Huck, Luigi Rizzo, Giuseppe Lettieri, Vincenzo Maffione,
	Eric Blake, Markus Armbruster

On 5/21/25 1:33 PM, Paolo Abeni wrote:
> Some virtualized deployments use UDP tunnel pervasively and are impacted
> negatively by the lack of GSO support for such kind of traffic in the
> virtual NIC driver.
> 
> The virtio_net specification recently introduced support for GSO over
> UDP tunnel, this series updates the virtio implementation to support
> such a feature.
> 
> One of the reasons for the RFC tag is that the kernel-side
> implementation has just been shared upstream and is not merged yet, but
> there are also other relevant reasons, see below.
> 
> Currently, the kernel virtio support limits the feature space to 64 bits,
> while the virtio specification allows for a larger number of features.
> Specifically, the GSO-over-UDP-tunnel-related virtio features use bits
> 65-69; the larger part of this series (patches 2-11) actually deals with
> the extended feature space.
> 
> I tried to minimize the otherwise very large code churn by limiting the
> extended features support to arches with native 128 integer support and
> introducing the extended features space support only in virtio/vhost
> core and in the relevant device driver.
> 
> The actual offload implementation is in patches 12-16 and boils down to
> propagating the new offload to the tun devices and the vhost backend.
> 
> Tested with basic stream transfer with all the possible permutations of
> host kernel/qemu/guest kernel with/without GSO over UDP tunnel support
> and vs snapshots creation and restore.
> 
> Notably this does not include (yet) any additional tests. Some guidance
> on such matter would be really appreciated, and any feedback about the
> features extension strategy would be more than welcome!

Some time has passed, and I haven't followed up yet with a v2. The root
cause (beyond the usual ENOTIME ;) is that I'm focusing on the
kernel-side patches, which I almost co-posted with this RFC and which are
still under discussion.

Since the outcome of that discussion will also influence the userland,
I'll wait for it to settle before sharing a new revision here.

It could still take some time, and the new revision will likely be
significantly different from v1, especially WRT virtio feature space
expansion, as working on the kernel code showed a possibly better approach.
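
For context on what "feature space expansion" has to solve: bits 65-69
do not fit in a single 64-bit word. The sketch below is not the new
approach (it is not described here); it only illustrates the
fixed-size-array-of-u64 alternative mentioned earlier in this thread.
The ExampleFeatures type and the example_* helpers are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Two 64-bit words give a 128-bit feature space. */
#define EX_FEATURE_WORDS 2
#define EX_FEATURE_BITS  (64 * EX_FEATURE_WORDS)

/* Hypothetical container for an extended feature set. */
typedef struct ExampleFeatures {
    uint64_t w[EX_FEATURE_WORDS];
} ExampleFeatures;

/* Set feature bit 'bit': the word index is bit / 64 and the position
 * inside that word is bit % 64. */
static inline void example_feature_set(ExampleFeatures *f, unsigned int bit)
{
    assert(bit < EX_FEATURE_BITS);
    f->w[bit / 64] |= UINT64_C(1) << (bit % 64);
}

/* Test feature bit 'bit' using the same word/position split. */
static inline bool example_feature_test(const ExampleFeatures *f,
                                        unsigned int bit)
{
    assert(bit < EX_FEATURE_BITS);
    return (f->w[bit / 64] >> (bit % 64)) & 1;
}
```

With this layout, feature bit 65 lands in the second word at position 1,
and the existing 64-bit features stay untouched in the first word.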

Thanks,

Paolo



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC 00/16] virtio: introduce support for GSO over UDP tunnel
  2025-05-23  9:43   ` Paolo Abeni
  2025-05-23  9:48     ` Akihiko Odaki
@ 2025-06-21  6:39     ` Akihiko Odaki
  1 sibling, 0 replies; 42+ messages in thread
From: Akihiko Odaki @ 2025-06-21  6:39 UTC (permalink / raw)
  To: Paolo Abeni, qemu-devel
  Cc: Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost,
	Dmitry Fleytman, Jason Wang, Sriram Yagnaraman,
	Michael S. Tsirkin, Stefano Garzarella, Peter Xu, Fabiano Rosas,
	Cornelia Huck, Luigi Rizzo, Giuseppe Lettieri, Vincenzo Maffione,
	Eric Blake, Markus Armbruster

On 2025/05/23 18:43, Paolo Abeni wrote:
> On 5/23/25 9:19 AM, Akihiko Odaki wrote:
>> On 2025/05/21 20:33, Paolo Abeni wrote:
>>> Some virtualized deployments use UDP tunnel pervasively and are impacted
>>> negatively by the lack of GSO support for such kind of traffic in the
>>> virtual NIC driver.
>>>
>>> The virtio_net specification recently introduced support for GSO over
>>> UDP tunnel, this series updates the virtio implementation to support
>>> such a feature.
>>>
>>> One of the reasons for the RFC tag is that the kernel-side
>>> implementation has just been shared upstream and is not merged yet, but
>>> there are also other relevant reasons, see below.
>>>
>>> Currently, the kernel virtio support limits the feature space to 64 bits,
>>> while the virtio specification allows for a larger number of features.
>>> Specifically, the GSO-over-UDP-tunnel-related virtio features use bits
>>> 65-69; the larger part of this series (patches 2-11) actually deals with
>>> the extended feature space.
>>>
>>> I tried to minimize the otherwise very large code churn by limiting the
>>> extended features support to arches with native 128 integer support and
>>> introducing the extended features space support only in virtio/vhost
>>> core and in the relevant device driver.
>>
>> What about adding another 64-bit integer to hold the high bits? It makes
>> adding the 128-bit integer type to VMState and properties and
>> CONFIG_INT128 checks unnecessary.
> 
> I did a few others implementation attempts before the current one. The
> closes to the above proposal I tried was to implement virtio_features_t
> as fixed size array of u64.
> 
> A problem a found with that approach is that it requires a very large
> code churn, as ~ every line touching a feature related variable should
> be modified.
> 
> Let me think a little bit on this other option (I hope to avoid
> discarding a lot of work here).
> 
>>> The actual offload implementation is in patches 12-16 and boils down to
>>> propagating the new offload to the tun devices and the vhost backend.
>>>
>>> Tested with basic stream transfer with all the possible permutations of
>>> host kernel/qemu/guest kernel with/without GSO over UDP tunnel support
>>> and vs snapshots creation and restore.
>>>
>>> Notably this does not include (yet) any additional tests. Some guidance
>>> on such matter would be really appreciated, and any feedback about the
>>> features extension strategy would be more than welcome!
>>
>> My proposal to add a feature to tap devices[1] simply omitted tests and
>> I wrote simple testing scripts for my personal usage. As you can see,
>> there is no testing code that covers tap devices, unfortunately, and I
>> think adding one takes significant effort.
>>
>> [1] https://patchew.org/QEMU/20250313-hash-v4-0-c75c494b495e@daynix.com/
> 
> Thanks for the pointer

By the way, I did write selftests for the kernel-side change, which you
can find at:
https://lore.kernel.org/r/20250530-rss-v12-0-95d8b348de91@daynix.com/

Perhaps you may be able to steal and tweak them for the UDP tunnel feature.

Regards,
Akihiko Odaki


^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2025-06-21  6:41 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-21 11:33 [PATCH RFC 00/16] virtio: introduce support for GSO over UDP tunnel Paolo Abeni
2025-05-21 11:33 ` [PATCH RFC 01/16] linux-headers: Update to Linux v6.15-rc net-next Paolo Abeni
2025-05-23  9:50   ` Akihiko Odaki
2025-05-23 10:06     ` Paolo Abeni
2025-05-21 11:33 ` [PATCH RFC 02/16] migration: introduce support for 128 bit int state Paolo Abeni
2025-05-21 11:33 ` [PATCH RFC 03/16] virtio: introduce extended features type Paolo Abeni
2025-05-21 11:33 ` [PATCH RFC 04/16] virtio: serialize extended features state Paolo Abeni
2025-05-21 11:33 ` [PATCH RFC 05/16] qmp: update virtio features map to support extended features Paolo Abeni
2025-05-21 11:34 ` [PATCH RFC 06/16] virtio: add support for negotiating " Paolo Abeni
2025-05-21 11:34 ` [PATCH RFC 07/16] virtio-pci: implement support for " Paolo Abeni
2025-05-23  7:23   ` Akihiko Odaki
2025-05-23  9:52     ` Paolo Abeni
2025-05-21 11:34 ` [PATCH RFC 08/16] vhost: add support for negotiating " Paolo Abeni
2025-05-21 11:34 ` [PATCH RFC 09/16] vhost-backend: implement extended features support Paolo Abeni
2025-05-21 11:34 ` [PATCH RFC 10/16] vhost-net: " Paolo Abeni
2025-05-21 11:34 ` [PATCH RFC 11/16] qdev-properties: add property for extended virtio features Paolo Abeni
2025-05-21 11:34 ` [PATCH RFC 12/16] virtio-net: implement extended features support Paolo Abeni
2025-05-23  8:09   ` Akihiko Odaki
2025-05-23 10:01     ` Paolo Abeni
2025-05-23 10:14       ` Akihiko Odaki
2025-05-21 11:34 ` [PATCH RFC 13/16] net: implement tunnel probing Paolo Abeni
2025-05-23  7:39   ` Akihiko Odaki
2025-05-23 10:24     ` Paolo Abeni
2025-05-23 10:32       ` Akihiko Odaki
2025-05-21 11:34 ` [PATCH RFC 14/16] net: bundle all offloads in a single struct Paolo Abeni
2025-05-23  7:45   ` Akihiko Odaki
2025-05-21 11:34 ` [PATCH RFC 15/16] net: implement tnl feature offloading Paolo Abeni
2025-05-23  8:16   ` Akihiko Odaki
2025-05-23 10:40     ` Paolo Abeni
2025-05-23 10:54       ` Akihiko Odaki
2025-05-23 11:06         ` Paolo Abeni
2025-05-23 11:35       ` Akihiko Odaki
2025-05-23 14:46         ` Paolo Abeni
2025-05-24  4:13           ` Akihiko Odaki
2025-05-21 11:34 ` [PATCH RFC 16/16] net: make vhost-net aware of GSO over UDP tunnel hdr layout Paolo Abeni
2025-05-23  8:22   ` Akihiko Odaki
2025-05-28  3:04     ` Lei Yang
2025-05-23  7:19 ` [PATCH RFC 00/16] virtio: introduce support for GSO over UDP tunnel Akihiko Odaki
2025-05-23  9:43   ` Paolo Abeni
2025-05-23  9:48     ` Akihiko Odaki
2025-06-21  6:39     ` Akihiko Odaki
2025-06-17 15:01 ` Paolo Abeni
