[PATCH net-next 0/8] virtio: introduce GSO over UDP tunnel

* [PATCH net-next 0/8] virtio: introduce GSO over UDP tunnel
@ 2025-05-21 10:32 Paolo Abeni
  2025-05-21 10:32 ` [PATCH net-next 1/8] virtio: introduce virtio_features_t Paolo Abeni
                   ` (9 more replies)
  0 siblings, 10 replies; 59+ messages in thread
From: Paolo Abeni @ 2025-05-21 10:32 UTC (permalink / raw)
  To: netdev
  Cc: Willem de Bruijn, Jason Wang, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez

Some virtualized deployments use UDP tunnels pervasively and are
negatively impacted by the lack of GSO support for this kind of traffic
in the virtual NIC driver.

The virtio_net specification recently introduced support for GSO over
UDP tunnel; this series updates the virtio implementation to support
that feature.

Currently the kernel virtio support limits the feature space to 64 bits,
while the virtio specification allows for a larger number of features.
Specifically, the GSO-over-UDP-tunnel-related virtio features use bits
65-69.

The first four patches in this series rework the virtio and vhost
feature support to cope with up to 128 bits. The limit is arch-dependent:
only arches with native 128-bit integer support allow for the wider
feature space.

This implementation choice is aimed at keeping the code churn as
limited as possible. For the same reason, only the virtio_net driver is
reworked to leverage the extended feature space; all other
virtio/vhost drivers are unaffected, but could be upgraded to support
the extended feature space at a later time.

The last four patches bring in the actual GSO over UDP tunnel support.
As per specification, some additional fields are introduced into the
virtio net header to support the new offload. The presence of such
fields depends on the negotiated features.

A new pair of helpers is introduced to convert the UDP-tunneled skb
metadata to an extended virtio net header and vice versa. These helpers
are used by the tun and virtio_net drivers to cope with the newly
supported offloads.

Tested with basic stream transfer with all the possible permutations of
host kernel/qemu/guest kernel with/without GSO over UDP tunnel support.
Sharing somewhat early to collect feedback, especially on the userland
code.

Paolo Abeni (8):
  virtio: introduce virtio_features_t
  virtio_pci_modern: allow setting configuring extended features
  vhost-net: allow configuring extended features
  virtio_net: add supports for extended offloads
  net: implement virtio helpers to handle UDP GSO tunneling.
  virtio_net: enable gso over UDP tunnel support.
  tun: enable gso over UDP tunnel support.
  vhost/net: enable gso over UDP tunnel support.

 drivers/net/tun.c                      |  77 +++++++++--
 drivers/net/tun_vnet.h                 |  74 +++++++++--
 drivers/net/virtio_net.c               |  99 ++++++++++++--
 drivers/vhost/net.c                    |  32 ++++-
 drivers/vhost/vhost.h                  |   2 +-
 drivers/virtio/virtio.c                |  12 +-
 drivers/virtio/virtio_mmio.c           |   4 +-
 drivers/virtio/virtio_pci_legacy.c     |   2 +-
 drivers/virtio/virtio_pci_modern.c     |   7 +-
 drivers/virtio/virtio_pci_modern_dev.c |  44 +++---
 drivers/virtio/virtio_vdpa.c           |   2 +-
 include/linux/virtio.h                 |   5 +-
 include/linux/virtio_config.h          |  22 +--
 include/linux/virtio_features.h        |  23 ++++
 include/linux/virtio_net.h             | 177 +++++++++++++++++++++++--
 include/linux/virtio_pci_modern.h      |  11 +-
 include/uapi/linux/if_tun.h            |   9 ++
 include/uapi/linux/vhost.h             |   8 ++
 include/uapi/linux/virtio_net.h        |  33 +++++
 19 files changed, 551 insertions(+), 92 deletions(-)
 create mode 100644 include/linux/virtio_features.h

-- 
2.49.0


* [PATCH net-next 1/8] virtio: introduce virtio_features_t
  2025-05-21 10:32 [PATCH net-next 0/8] virtio: introduce GSO over UDP tunnel Paolo Abeni
@ 2025-05-21 10:32 ` Paolo Abeni
  2025-05-21 16:02   ` Michael S. Tsirkin
                     ` (2 more replies)
  2025-05-21 10:32 ` [PATCH net-next 2/8] virtio_pci_modern: allow setting configuring extended features Paolo Abeni
                   ` (8 subsequent siblings)
  9 siblings, 3 replies; 59+ messages in thread
From: Paolo Abeni @ 2025-05-21 10:32 UTC (permalink / raw)
  To: netdev
  Cc: Willem de Bruijn, Jason Wang, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez

The virtio specification allows for up to 128 bits for the
device features. Soon we are going to use some of the 'extended'
feature bits (above 64) for the virtio_net driver.

Introduce a specific type to represent the virtio features bitmask.
On platforms where 128-bit integers are available, use such a wide
integer for the features bitmask; otherwise keep the current u64.

Update all the relevant virtio APIs to use the new type.

Note that legacy and transport features don't need any change, as
they are always in the low 64 bit range.
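
As a rough illustration of the type selection described above, here is a
minimal user-space sketch. It is not the kernel code: features_t,
FEATURES_MAX, feature_bit() and feature_test() are hypothetical stand-ins
for virtio_features_t, VIRTIO_FEATURES_MAX, VIRTIO_BIT() and
__virtio_test_bit(), and the compiler-provided __SIZEOF_INT128__ macro is
assumed to play the role of CONFIG_ARCH_SUPPORTS_INT128.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space stand-ins for the kernel definitions. */
#ifdef __SIZEOF_INT128__
__extension__ typedef unsigned __int128 features_t;
#define FEATURES_MAX 128
#else
typedef uint64_t features_t;
#define FEATURES_MAX 64
#endif

/* Build a mask with only bit b set; b must be below FEATURES_MAX. */
static inline features_t feature_bit(unsigned int b)
{
	assert(b < FEATURES_MAX);	/* plays the role of BUG_ON() */
	return (features_t)1 << b;
}

/* Report whether bit b is set in the given feature mask. */
static inline bool feature_test(features_t features, unsigned int b)
{
	return (features & feature_bit(b)) != 0;
}
```

With the wide type, a bit such as 65 can be set and tested exactly like a
low bit; with the u64 fallback the same call trips the range check.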

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 drivers/virtio/virtio.c                | 12 ++++++------
 drivers/virtio/virtio_mmio.c           |  4 ++--
 drivers/virtio/virtio_pci_legacy.c     |  2 +-
 drivers/virtio/virtio_pci_modern.c     |  7 ++++---
 drivers/virtio/virtio_pci_modern_dev.c | 13 ++++++-------
 drivers/virtio/virtio_vdpa.c           |  2 +-
 include/linux/virtio.h                 |  5 +++--
 include/linux/virtio_config.h          | 22 +++++++++++-----------
 include/linux/virtio_features.h        | 23 +++++++++++++++++++++++
 include/linux/virtio_pci_modern.h      | 11 ++++++++---
 10 files changed, 65 insertions(+), 36 deletions(-)
 create mode 100644 include/linux/virtio_features.h

diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index 95d5d7993e5b1..542735d3a12ba 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -272,9 +272,9 @@ static int virtio_dev_probe(struct device *_d)
 	int err, i;
 	struct virtio_device *dev = dev_to_virtio(_d);
 	struct virtio_driver *drv = drv_to_virtio(dev->dev.driver);
-	u64 device_features;
-	u64 driver_features;
-	u64 driver_features_legacy;
+	virtio_features_t device_features;
+	virtio_features_t driver_features;
+	virtio_features_t driver_features_legacy;
 
 	/* We have a driver! */
 	virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER);
@@ -286,8 +286,8 @@ static int virtio_dev_probe(struct device *_d)
 	driver_features = 0;
 	for (i = 0; i < drv->feature_table_size; i++) {
 		unsigned int f = drv->feature_table[i];
-		BUG_ON(f >= 64);
-		driver_features |= (1ULL << f);
+		BUG_ON(f >= VIRTIO_FEATURES_MAX);
+		driver_features |= VIRTIO_BIT(f);
 	}
 
 	/* Some drivers have a separate feature table for virtio v1.0 */
@@ -320,7 +320,7 @@ static int virtio_dev_probe(struct device *_d)
 		goto err;
 
 	if (drv->validate) {
-		u64 features = dev->features;
+		virtio_features_t features = dev->features;
 
 		err = drv->validate(dev);
 		if (err)
diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
index 5d78c2d572abf..158c47ac67de7 100644
--- a/drivers/virtio/virtio_mmio.c
+++ b/drivers/virtio/virtio_mmio.c
@@ -106,10 +106,10 @@ struct virtio_mmio_vq_info {
 
 /* Configuration interface */
 
-static u64 vm_get_features(struct virtio_device *vdev)
+static virtio_features_t vm_get_features(struct virtio_device *vdev)
 {
 	struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
-	u64 features;
+	virtio_features_t features;
 
 	writel(1, vm_dev->base + VIRTIO_MMIO_DEVICE_FEATURES_SEL);
 	features = readl(vm_dev->base + VIRTIO_MMIO_DEVICE_FEATURES);
diff --git a/drivers/virtio/virtio_pci_legacy.c b/drivers/virtio/virtio_pci_legacy.c
index d9cbb02b35a11..b2fbc74f74b5c 100644
--- a/drivers/virtio/virtio_pci_legacy.c
+++ b/drivers/virtio/virtio_pci_legacy.c
@@ -18,7 +18,7 @@
 #include "virtio_pci_common.h"
 
 /* virtio config->get_features() implementation */
-static u64 vp_get_features(struct virtio_device *vdev)
+static virtio_features_t vp_get_features(struct virtio_device *vdev)
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
 
diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
index d50fe030d8253..c3e0ddc7ae9ab 100644
--- a/drivers/virtio/virtio_pci_modern.c
+++ b/drivers/virtio/virtio_pci_modern.c
@@ -22,7 +22,7 @@
 
 #define VIRTIO_AVQ_SGS_MAX	4
 
-static u64 vp_get_features(struct virtio_device *vdev)
+static virtio_features_t vp_get_features(struct virtio_device *vdev)
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
 
@@ -353,7 +353,8 @@ static void vp_modern_avq_cleanup(struct virtio_device *vdev)
 	}
 }
 
-static void vp_transport_features(struct virtio_device *vdev, u64 features)
+static void vp_transport_features(struct virtio_device *vdev,
+				  virtio_features_t features)
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
 	struct pci_dev *pci_dev = vp_dev->pci_dev;
@@ -409,7 +410,7 @@ static int vp_check_common_size(struct virtio_device *vdev)
 static int vp_finalize_features(struct virtio_device *vdev)
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
-	u64 features = vdev->features;
+	virtio_features_t features = vdev->features;
 
 	/* Give virtio_ring a chance to accept features. */
 	vring_transport_features(vdev);
diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
index 0d3dbfaf4b236..1d34655f6b658 100644
--- a/drivers/virtio/virtio_pci_modern_dev.c
+++ b/drivers/virtio/virtio_pci_modern_dev.c
@@ -393,11 +393,10 @@ EXPORT_SYMBOL_GPL(vp_modern_remove);
  *
  * Returns the features read from the device
  */
-u64 vp_modern_get_features(struct virtio_pci_modern_device *mdev)
+virtio_features_t vp_modern_get_features(struct virtio_pci_modern_device *mdev)
 {
 	struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
-
-	u64 features;
+	virtio_features_t features;
 
 	vp_iowrite32(0, &cfg->device_feature_select);
 	features = vp_ioread32(&cfg->device_feature);
@@ -414,11 +413,11 @@ EXPORT_SYMBOL_GPL(vp_modern_get_features);
  *
  * Returns the driver features read from the device
  */
-u64 vp_modern_get_driver_features(struct virtio_pci_modern_device *mdev)
+virtio_features_t
+vp_modern_get_driver_features(struct virtio_pci_modern_device *mdev)
 {
 	struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
-
-	u64 features;
+	virtio_features_t features;
 
 	vp_iowrite32(0, &cfg->guest_feature_select);
 	features = vp_ioread32(&cfg->guest_feature);
@@ -435,7 +434,7 @@ EXPORT_SYMBOL_GPL(vp_modern_get_driver_features);
  * @features: the features set to device
  */
 void vp_modern_set_features(struct virtio_pci_modern_device *mdev,
-			    u64 features)
+			    virtio_features_t features)
 {
 	struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
 
diff --git a/drivers/virtio/virtio_vdpa.c b/drivers/virtio/virtio_vdpa.c
index 1f60c9d5cb181..b92749174885e 100644
--- a/drivers/virtio/virtio_vdpa.c
+++ b/drivers/virtio/virtio_vdpa.c
@@ -409,7 +409,7 @@ static int virtio_vdpa_find_vqs(struct virtio_device *vdev, unsigned int nvqs,
 	return err;
 }
 
-static u64 virtio_vdpa_get_features(struct virtio_device *vdev)
+static virtio_features_t virtio_vdpa_get_features(struct virtio_device *vdev)
 {
 	struct vdpa_device *vdpa = vd_get_vdpa(vdev);
 	const struct vdpa_config_ops *ops = vdpa->config;
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 64cb4b04be7ad..6e51400d04635 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -11,6 +11,7 @@
 #include <linux/gfp.h>
 #include <linux/dma-mapping.h>
 #include <linux/completion.h>
+#include <linux/virtio_features.h>
 
 /**
  * struct virtqueue - a queue to register buffers for sending or receiving.
@@ -159,11 +160,11 @@ struct virtio_device {
 	const struct virtio_config_ops *config;
 	const struct vringh_config_ops *vringh_config;
 	struct list_head vqs;
-	u64 features;
+	virtio_features_t features;
 	void *priv;
 #ifdef CONFIG_VIRTIO_DEBUG
 	struct dentry *debugfs_dir;
-	u64 debugfs_filter_features;
+	virtio_features_t debugfs_filter_features;
 #endif
 };
 
diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index 169c7d367facb..bff57f675fca7 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -77,7 +77,7 @@ struct virtqueue_info {
  *      vdev: the virtio_device
  * @get_features: get the array of feature bits for this device.
  *	vdev: the virtio_device
- *	Returns the first 64 feature bits (all we currently need).
+ *	Returns the first VIRTIO_FEATURES_MAX feature bits (all we currently need).
  * @finalize_features: confirm what device features we'll be using.
  *	vdev: the virtio_device
  *	This sends the driver feature bits to the device: it can change
@@ -120,7 +120,7 @@ struct virtio_config_ops {
 			struct irq_affinity *desc);
 	void (*del_vqs)(struct virtio_device *);
 	void (*synchronize_cbs)(struct virtio_device *);
-	u64 (*get_features)(struct virtio_device *vdev);
+	virtio_features_t (*get_features)(struct virtio_device *vdev);
 	int (*finalize_features)(struct virtio_device *vdev);
 	const char *(*bus_name)(struct virtio_device *vdev);
 	int (*set_vq_affinity)(struct virtqueue *vq,
@@ -149,11 +149,11 @@ static inline bool __virtio_test_bit(const struct virtio_device *vdev,
 {
 	/* Did you forget to fix assumptions on max features? */
 	if (__builtin_constant_p(fbit))
-		BUILD_BUG_ON(fbit >= 64);
+		BUILD_BUG_ON(fbit >= VIRTIO_FEATURES_MAX);
 	else
-		BUG_ON(fbit >= 64);
+		BUG_ON(fbit >= VIRTIO_FEATURES_MAX);
 
-	return vdev->features & BIT_ULL(fbit);
+	return vdev->features & VIRTIO_BIT(fbit);
 }
 
 /**
@@ -166,11 +166,11 @@ static inline void __virtio_set_bit(struct virtio_device *vdev,
 {
 	/* Did you forget to fix assumptions on max features? */
 	if (__builtin_constant_p(fbit))
-		BUILD_BUG_ON(fbit >= 64);
+		BUILD_BUG_ON(fbit >= VIRTIO_FEATURES_MAX);
 	else
-		BUG_ON(fbit >= 64);
+		BUG_ON(fbit >= VIRTIO_FEATURES_MAX);
 
-	vdev->features |= BIT_ULL(fbit);
+	vdev->features |= VIRTIO_BIT(fbit);
 }
 
 /**
@@ -183,11 +183,11 @@ static inline void __virtio_clear_bit(struct virtio_device *vdev,
 {
 	/* Did you forget to fix assumptions on max features? */
 	if (__builtin_constant_p(fbit))
-		BUILD_BUG_ON(fbit >= 64);
+		BUILD_BUG_ON(fbit >= VIRTIO_FEATURES_MAX);
 	else
-		BUG_ON(fbit >= 64);
+		BUG_ON(fbit >= VIRTIO_FEATURES_MAX);
 
-	vdev->features &= ~BIT_ULL(fbit);
+	vdev->features &= ~VIRTIO_BIT(fbit);
 }
 
 /**
diff --git a/include/linux/virtio_features.h b/include/linux/virtio_features.h
new file mode 100644
index 0000000000000..2f742eeb45a29
--- /dev/null
+++ b/include/linux/virtio_features.h
@@ -0,0 +1,23 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_VIRTIO_FEATURES_H
+#define _LINUX_VIRTIO_FEATURES_H
+
+#include <linux/bits.h>
+
+#if IS_ENABLED(CONFIG_ARCH_SUPPORTS_INT128)
+#define VIRTIO_HAS_EXTENDED_FEATURES
+#define VIRTIO_FEATURES_MAX	128
+#define VIRTIO_FEATURES_WORDS	4
+#define VIRTIO_BIT(b)		_BIT128(b)
+
+typedef __uint128_t		virtio_features_t;
+
+#else
+#define VIRTIO_FEATURES_MAX	64
+#define VIRTIO_FEATURES_WORDS	2
+#define VIRTIO_BIT(b)		BIT_ULL(b)
+
+typedef u64			virtio_features_t;
+#endif
+
+#endif
diff --git a/include/linux/virtio_pci_modern.h b/include/linux/virtio_pci_modern.h
index c0b1b1ca11635..e55fbb272b4d3 100644
--- a/include/linux/virtio_pci_modern.h
+++ b/include/linux/virtio_pci_modern.h
@@ -3,6 +3,7 @@
 #define _LINUX_VIRTIO_PCI_MODERN_H
 
 #include <linux/pci.h>
+#include <linux/virtio_features.h>
 #include <linux/virtio_pci.h>
 
 /**
@@ -95,10 +96,14 @@ static inline void vp_iowrite64_twopart(u64 val,
 	vp_iowrite32(val >> 32, hi);
 }
 
-u64 vp_modern_get_features(struct virtio_pci_modern_device *mdev);
-u64 vp_modern_get_driver_features(struct virtio_pci_modern_device *mdev);
+virtio_features_t
+vp_modern_get_features(struct virtio_pci_modern_device *mdev);
+
+virtio_features_t
+vp_modern_get_driver_features(struct virtio_pci_modern_device *mdev);
+
 void vp_modern_set_features(struct virtio_pci_modern_device *mdev,
-		     u64 features);
+		     virtio_features_t features);
 u32 vp_modern_generation(struct virtio_pci_modern_device *mdev);
 u8 vp_modern_get_status(struct virtio_pci_modern_device *mdev);
 void vp_modern_set_status(struct virtio_pci_modern_device *mdev,
-- 
2.49.0



* [PATCH net-next 2/8] virtio_pci_modern: allow setting configuring extended features
  2025-05-21 10:32 [PATCH net-next 0/8] virtio: introduce GSO over UDP tunnel Paolo Abeni
  2025-05-21 10:32 ` [PATCH net-next 1/8] virtio: introduce virtio_features_t Paolo Abeni
@ 2025-05-21 10:32 ` Paolo Abeni
  2025-05-26  0:49   ` Jason Wang
  2025-05-21 10:32 ` [PATCH net-next 3/8] vhost-net: allow " Paolo Abeni
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 59+ messages in thread
From: Paolo Abeni @ 2025-05-21 10:32 UTC (permalink / raw)
  To: netdev
  Cc: Willem de Bruijn, Jason Wang, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez

The virtio specification allows for up to 128 bits for the
device features. Soon we are going to use some of the 'extended'
feature bits (above 64) for the virtio_net driver.

Extend the virtio pci modern driver to support configuring the full
virtio features range, replacing the unrolled reads and writes of the
feature space with explicit loops bounded by the actual feature space
size in words.
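
The select/read loop can be sketched in user space as follows. This is
only an illustration under stated assumptions: struct fake_cfg and the
fake_read()/fake_write() accessors are hypothetical models of the
feature-select and feature-window registers, and a 64-bit mask with two
32-bit words is used so the sketch builds everywhere. With the 128-bit
type the loop bound would be four words.

```c
#include <assert.h>
#include <stdint.h>

#define FEATURE_WORDS 2	/* hypothetical; 4 with a 128-bit mask */

/* Hypothetical model of the select register and the 32-bit window. */
struct fake_cfg {
	uint32_t select;
	uint32_t words[FEATURE_WORDS];
};

/* Read the 32-bit window currently chosen by the select register. */
static uint32_t fake_read(const struct fake_cfg *cfg)
{
	assert(cfg->select < FEATURE_WORDS);
	return cfg->words[cfg->select];
}

/* Write the 32-bit window currently chosen by the select register. */
static void fake_write(struct fake_cfg *cfg, uint32_t val)
{
	assert(cfg->select < FEATURE_WORDS);
	cfg->words[cfg->select] = val;
}

/* Assemble the full mask one 32-bit word at a time. */
static uint64_t get_features(struct fake_cfg *cfg)
{
	uint64_t features = 0;
	unsigned int i;

	for (i = 0; i < FEATURE_WORDS; i++) {
		uint64_t cur;

		cfg->select = i;
		cur = fake_read(cfg);
		features |= cur << (32 * i);
	}
	return features;
}

/* Split the mask into 32-bit words and write each one out. */
static void set_features(struct fake_cfg *cfg, uint64_t features)
{
	unsigned int i;

	for (i = 0; i < FEATURE_WORDS; i++) {
		cfg->select = i;
		fake_write(cfg, (uint32_t)(features >> (32 * i)));
	}
}
```

Widening cur before the shift matters: shifting a 32-bit value by 32 or
more would otherwise be undefined.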

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 drivers/virtio/virtio_pci_modern_dev.c | 39 +++++++++++++++++---------
 1 file changed, 25 insertions(+), 14 deletions(-)

diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
index 1d34655f6b658..e3025b6fa8540 100644
--- a/drivers/virtio/virtio_pci_modern_dev.c
+++ b/drivers/virtio/virtio_pci_modern_dev.c
@@ -396,12 +396,16 @@ EXPORT_SYMBOL_GPL(vp_modern_remove);
 virtio_features_t vp_modern_get_features(struct virtio_pci_modern_device *mdev)
 {
 	struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
-	virtio_features_t features;
+	virtio_features_t features = 0;
+	int i;
 
-	vp_iowrite32(0, &cfg->device_feature_select);
-	features = vp_ioread32(&cfg->device_feature);
-	vp_iowrite32(1, &cfg->device_feature_select);
-	features |= ((u64)vp_ioread32(&cfg->device_feature) << 32);
+	for (i = 0; i < VIRTIO_FEATURES_WORDS; i++) {
+		virtio_features_t cur;
+
+		vp_iowrite32(i, &cfg->device_feature_select);
+		cur = vp_ioread32(&cfg->device_feature);
+		features |= cur << (32 * i);
+	}
 
 	return features;
 }
@@ -417,12 +421,16 @@ virtio_features_t
 vp_modern_get_driver_features(struct virtio_pci_modern_device *mdev)
 {
 	struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
-	virtio_features_t features;
+	virtio_features_t features = 0;
+	int i;
+
+	for (i = 0; i < VIRTIO_FEATURES_WORDS; i++) {
+		virtio_features_t cur;
 
-	vp_iowrite32(0, &cfg->guest_feature_select);
-	features = vp_ioread32(&cfg->guest_feature);
-	vp_iowrite32(1, &cfg->guest_feature_select);
-	features |= ((u64)vp_ioread32(&cfg->guest_feature) << 32);
+		vp_iowrite32(i, &cfg->guest_feature_select);
+		cur = vp_ioread32(&cfg->guest_feature);
+		features |= cur << (32 * i);
+	}
 
 	return features;
 }
@@ -437,11 +445,14 @@ void vp_modern_set_features(struct virtio_pci_modern_device *mdev,
 			    virtio_features_t features)
 {
 	struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
+	int i;
+
+	for (i = 0; i < VIRTIO_FEATURES_WORDS; i++) {
+		u32 cur = features >> (32 * i);
 
-	vp_iowrite32(0, &cfg->guest_feature_select);
-	vp_iowrite32((u32)features, &cfg->guest_feature);
-	vp_iowrite32(1, &cfg->guest_feature_select);
-	vp_iowrite32(features >> 32, &cfg->guest_feature);
+		vp_iowrite32(i, &cfg->guest_feature_select);
+		vp_iowrite32(cur, &cfg->guest_feature);
+	}
 }
 EXPORT_SYMBOL_GPL(vp_modern_set_features);
 
-- 
2.49.0



* [PATCH net-next 3/8] vhost-net: allow configuring extended features
  2025-05-21 10:32 [PATCH net-next 0/8] virtio: introduce GSO over UDP tunnel Paolo Abeni
  2025-05-21 10:32 ` [PATCH net-next 1/8] virtio: introduce virtio_features_t Paolo Abeni
  2025-05-21 10:32 ` [PATCH net-next 2/8] virtio_pci_modern: allow setting configuring extended features Paolo Abeni
@ 2025-05-21 10:32 ` Paolo Abeni
  2025-05-26  0:47   ` Jason Wang
  2025-05-21 10:32 ` [PATCH net-next 4/8] virtio_net: add supports for extended offloads Paolo Abeni
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 59+ messages in thread
From: Paolo Abeni @ 2025-05-21 10:32 UTC (permalink / raw)
  To: netdev
  Cc: Willem de Bruijn, Jason Wang, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez

Use the extended feature type for 'acked_features' and implement
two new ioctl operations to get and set the extended features.

Note that the legacy ioctls implicitly truncate the negotiated
features to the lower 64-bit range.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 drivers/vhost/net.c        | 26 +++++++++++++++++++++++++-
 drivers/vhost/vhost.h      |  2 +-
 include/uapi/linux/vhost.h |  8 ++++++++
 3 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 7cbfc7d718b3f..b894685dded3e 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -77,6 +77,10 @@ enum {
 			 (1ULL << VIRTIO_F_RING_RESET)
 };
 
+#ifdef VIRTIO_HAS_EXTENDED_FEATURES
+#define VHOST_NET_FEATURES_EX VHOST_NET_FEATURES
+#endif
+
 enum {
 	VHOST_NET_BACKEND_FEATURES = (1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2)
 };
@@ -1614,7 +1618,7 @@ static long vhost_net_reset_owner(struct vhost_net *n)
 	return err;
 }
 
-static int vhost_net_set_features(struct vhost_net *n, u64 features)
+static int vhost_net_set_features(struct vhost_net *n, virtio_features_t features)
 {
 	size_t vhost_hlen, sock_hlen, hdr_len;
 	int i;
@@ -1704,6 +1708,26 @@ static long vhost_net_ioctl(struct file *f, unsigned int ioctl,
 		if (features & ~VHOST_NET_FEATURES)
 			return -EOPNOTSUPP;
 		return vhost_net_set_features(n, features);
+#ifdef VIRTIO_HAS_EXTENDED_FEATURES
+	case VHOST_GET_FEATURES_EX:
+	{
+		virtio_features_t features_ex = VHOST_NET_FEATURES_EX;
+
+		if (copy_to_user(featurep, &features_ex, sizeof(features_ex)))
+			return -EFAULT;
+		return 0;
+	}
+	case VHOST_SET_FEATURES_EX:
+	{
+		virtio_features_t features_ex;
+
+		if (copy_from_user(&features_ex, featurep, sizeof(features_ex)))
+			return -EFAULT;
+		if (features_ex & ~VHOST_NET_FEATURES_EX)
+			return -EOPNOTSUPP;
+		return vhost_net_set_features(n, features_ex);
+	}
+#endif
 	case VHOST_GET_BACKEND_FEATURES:
 		features = VHOST_NET_BACKEND_FEATURES;
 		if (copy_to_user(featurep, &features, sizeof(features)))
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index bb75a292d50cd..ef1c7fd6f4e19 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -133,7 +133,7 @@ struct vhost_virtqueue {
 	struct vhost_iotlb *umem;
 	struct vhost_iotlb *iotlb;
 	void *private_data;
-	u64 acked_features;
+	virtio_features_t acked_features;
 	u64 acked_backend_features;
 	/* Log write descriptors */
 	void __user *log_base;
diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
index d4b3e2ae1314d..328e81badf1ad 100644
--- a/include/uapi/linux/vhost.h
+++ b/include/uapi/linux/vhost.h
@@ -235,4 +235,12 @@
  */
 #define VHOST_VDPA_GET_VRING_SIZE	_IOWR(VHOST_VIRTIO, 0x82,	\
 					      struct vhost_vring_state)
+
+/* Extended features manipulation
+ */
+#ifdef __SIZEOF_INT128__
+#define VHOST_GET_FEATURES_EX  _IOR(VHOST_VIRTIO, 0x83, __u128)
+#define VHOST_SET_FEATURES_EX  _IOW(VHOST_VIRTIO, 0x83, __u128)
+#endif
+
 #endif
-- 
2.49.0



* [PATCH net-next 4/8] virtio_net: add supports for extended offloads
  2025-05-21 10:32 [PATCH net-next 0/8] virtio: introduce GSO over UDP tunnel Paolo Abeni
                   ` (2 preceding siblings ...)
  2025-05-21 10:32 ` [PATCH net-next 3/8] vhost-net: allow " Paolo Abeni
@ 2025-05-21 10:32 ` Paolo Abeni
  2025-05-26  1:01   ` Jason Wang
  2025-05-21 10:32 ` [PATCH net-next 5/8] net: implement virtio helpers to handle UDP GSO tunneling Paolo Abeni
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 59+ messages in thread
From: Paolo Abeni @ 2025-05-21 10:32 UTC (permalink / raw)
  To: netdev
  Cc: Willem de Bruijn, Jason Wang, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez

The virtio_net driver needs extended offloads to implement the GSO
over UDP tunnel offload.

The only missing piece is mapping the offload bits to/from the
extended features.
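
For illustration, the mapping can be written as plain functions. This is
a stand-alone sketch, not the driver code: the constants are copied from
this patch, while the function names are hypothetical equivalents of the
VIRTIO_OFFLOAD_TO_FEATURE() and VIRTIO_FEATURE_TO_OFFLOAD() macros.

```c
#include <stdbool.h>

/* Values taken from this patch; the names are shortened for the sketch. */
#define OFFLOAD_MAP_MIN		46
#define OFFLOAD_MAP_MAX		49
#define FEATURES_MAP_MIN	65
#define O2F_DELTA		(FEATURES_MAP_MIN - OFFLOAD_MAP_MIN)

/* Offload bits in [46, 49] live at a different position as features. */
static bool is_mapped_offload(unsigned int obit)
{
	return obit >= OFFLOAD_MAP_MIN && obit <= OFFLOAD_MAP_MAX;
}

/* Mapped offload bits shift up by the delta; all others are unchanged. */
static unsigned int offload_to_feature(unsigned int obit)
{
	return is_mapped_offload(obit) ? obit + O2F_DELTA : obit;
}

/* Feature bits at or above 65 shift down; all others are unchanged. */
static unsigned int feature_to_offload(unsigned int fbit)
{
	return fbit >= FEATURES_MAP_MIN ? fbit - O2F_DELTA : fbit;
}
```

Offload bit 46 thus corresponds to feature bit 65, and low bits keep
their position in both directions.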

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 drivers/net/virtio_net.c | 31 +++++++++++++++++++++++++++++--
 1 file changed, 29 insertions(+), 2 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index e53ba600605a5..71a972f20f19b 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -35,6 +35,29 @@ module_param(csum, bool, 0444);
 module_param(gso, bool, 0444);
 module_param(napi_tx, bool, 0644);
 
+#define VIRTIO_OFFLOAD_MAP_MIN	46
+#define VIRTIO_OFFLOAD_MAP_MAX	49
+#define VIRTIO_FEATURES_MAP_MIN	65
+#define VIRTIO_O2F_DELTA	(VIRTIO_FEATURES_MAP_MIN - VIRTIO_OFFLOAD_MAP_MIN)
+
+static bool virtio_is_mapped_offload(unsigned int obit)
+{
+	return obit >= VIRTIO_OFFLOAD_MAP_MIN &&
+	       obit <= VIRTIO_OFFLOAD_MAP_MAX;
+}
+
+#define VIRTIO_FEATURE_TO_OFFLOAD(fbit)	\
+	({								\
+		unsigned int __f = fbit;				\
+		__f >= VIRTIO_FEATURES_MAP_MIN ? __f - VIRTIO_O2F_DELTA : __f; \
+	})
+#define VIRTIO_OFFLOAD_TO_FEATURE(obit)	\
+	({								\
+		unsigned int __o = obit;				\
+		virtio_is_mapped_offload(__o) ? __o + VIRTIO_O2F_DELTA :\
+						__o;			\
+	})
+
 /* FIXME: MTU in config. */
 #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
 #define GOOD_COPY_LEN	128
@@ -7037,9 +7060,13 @@ static int virtnet_probe(struct virtio_device *vdev)
 		netif_carrier_on(dev);
 	}
 
-	for (i = 0; i < ARRAY_SIZE(guest_offloads); i++)
-		if (virtio_has_feature(vi->vdev, guest_offloads[i]))
+	for (i = 0; i < ARRAY_SIZE(guest_offloads); i++) {
+		unsigned int fbit;
+
+		fbit = VIRTIO_OFFLOAD_TO_FEATURE(guest_offloads[i]);
+		if (virtio_has_feature(vi->vdev, fbit))
 			set_bit(guest_offloads[i], &vi->guest_offloads);
+	}
 	vi->guest_offloads_capable = vi->guest_offloads;
 
 	rtnl_unlock();
-- 
2.49.0



* [PATCH net-next 5/8] net: implement virtio helpers to handle UDP GSO tunneling.
  2025-05-21 10:32 [PATCH net-next 0/8] virtio: introduce GSO over UDP tunnel Paolo Abeni
                   ` (3 preceding siblings ...)
  2025-05-21 10:32 ` [PATCH net-next 4/8] virtio_net: add supports for extended offloads Paolo Abeni
@ 2025-05-21 10:32 ` Paolo Abeni
  2025-05-22 22:29   ` Willem de Bruijn
  2025-05-26  4:40   ` Jason Wang
  2025-05-21 10:32 ` [PATCH net-next 6/8] virtio_net: enable gso over UDP tunnel support Paolo Abeni
                   ` (4 subsequent siblings)
  9 siblings, 2 replies; 59+ messages in thread
From: Paolo Abeni @ 2025-05-21 10:32 UTC (permalink / raw)
  To: netdev
  Cc: Willem de Bruijn, Jason Wang, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez

The virtio specification is introducing support for GSO over
UDP tunnel.

This patch brings in the needed defines and the additional
virtio hdr parsing/building helpers.

The UDP tunnel support uses additional fields in the virtio hdr,
and the location of such fields can change depending on other
negotiated features - specifically VIRTIO_NET_F_HASH_REPORT.

Try to be as conservative as possible with the new field validation.
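
The offset checks performed by virtio_net_hdr_tnl_to_skb() can be
sketched in isolation as below. This is a simplified user-space model:
the header sizes are hard-coded assumptions (Ethernet 14, IPv4 without
options 20, IPv6 without extension headers 40, UDP 8) and
tunnel_offsets_valid() is a hypothetical name, not a helper added by this
patch.

```c
#include <stdbool.h>

/* Header sizes assumed for this sketch. */
#define SKETCH_ETH_HLEN		14
#define SKETCH_IPV4_HLEN	20
#define SKETCH_IPV6_HLEN	40
#define SKETCH_UDP_HLEN		8

/* Minimal L3 header length for the given IP version. */
static unsigned int l3min(bool is_ipv6)
{
	return is_ipv6 ? SKETCH_IPV6_HLEN : SKETCH_IPV4_HLEN;
}

/*
 * All offsets are relative to the start of the packet:
 * outer_th: outer transport (UDP) header
 * inner_nh: inner network header
 * inner_th: inner transport header
 */
static bool tunnel_offsets_valid(unsigned int outer_th,
				 unsigned int inner_nh,
				 unsigned int inner_th,
				 bool outer_is_v6, bool inner_is_v6)
{
	unsigned int outer_l3min = SKETCH_ETH_HLEN + l3min(outer_is_v6);
	unsigned int inner_l3min = l3min(inner_is_v6);

	/* Each header must start after the minimal previous header. */
	return outer_th >= outer_l3min &&
	       inner_nh >= outer_th + SKETCH_UDP_HLEN &&
	       inner_th >= inner_nh + inner_l3min;
}
```

For an IPv4 VXLAN frame carrying an IPv4 TCP segment, the offsets would
typically be 34, 64 and 84, which pass all three checks.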

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 include/linux/virtio_net.h      | 177 ++++++++++++++++++++++++++++++--
 include/uapi/linux/virtio_net.h |  33 ++++++
 2 files changed, 202 insertions(+), 8 deletions(-)

diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
index 02a9f4dc594d0..cf9c712a67cd4 100644
--- a/include/linux/virtio_net.h
+++ b/include/linux/virtio_net.h
@@ -47,9 +47,9 @@ static inline int virtio_net_hdr_set_proto(struct sk_buff *skb,
 	return 0;
 }
 
-static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
-					const struct virtio_net_hdr *hdr,
-					bool little_endian)
+static inline int __virtio_net_hdr_to_skb(struct sk_buff *skb,
+					  const struct virtio_net_hdr *hdr,
+					  bool little_endian, u8 hdr_gso_type)
 {
 	unsigned int nh_min_len = sizeof(struct iphdr);
 	unsigned int gso_type = 0;
@@ -57,8 +57,8 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
 	unsigned int p_off = 0;
 	unsigned int ip_proto;
 
-	if (hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) {
-		switch (hdr->gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
+	if (hdr_gso_type != VIRTIO_NET_HDR_GSO_NONE) {
+		switch (hdr_gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
 		case VIRTIO_NET_HDR_GSO_TCPV4:
 			gso_type = SKB_GSO_TCPV4;
 			ip_proto = IPPROTO_TCP;
@@ -84,7 +84,7 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
 			return -EINVAL;
 		}
 
-		if (hdr->gso_type & VIRTIO_NET_HDR_GSO_ECN)
+		if (hdr_gso_type & VIRTIO_NET_HDR_GSO_ECN)
 			gso_type |= SKB_GSO_TCP_ECN;
 
 		if (hdr->gso_size == 0)
@@ -122,7 +122,7 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
 
 				if (!protocol)
 					virtio_net_hdr_set_proto(skb, hdr);
-				else if (!virtio_net_hdr_match_proto(protocol, hdr->gso_type))
+				else if (!virtio_net_hdr_match_proto(protocol, hdr_gso_type))
 					return -EINVAL;
 				else
 					skb->protocol = protocol;
@@ -153,7 +153,7 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
 		}
 	}
 
-	if (hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) {
+	if (hdr_gso_type != VIRTIO_NET_HDR_GSO_NONE) {
 		u16 gso_size = __virtio16_to_cpu(little_endian, hdr->gso_size);
 		unsigned int nh_off = p_off;
 		struct skb_shared_info *shinfo = skb_shinfo(skb);
@@ -199,6 +199,13 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
 	return 0;
 }
 
+static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
+					const struct virtio_net_hdr *hdr,
+					bool little_endian)
+{
+	return __virtio_net_hdr_to_skb(skb, hdr, little_endian, hdr->gso_type);
+}
+
 static inline int virtio_net_hdr_from_skb(const struct sk_buff *skb,
 					  struct virtio_net_hdr *hdr,
 					  bool little_endian,
@@ -242,4 +249,158 @@ static inline int virtio_net_hdr_from_skb(const struct sk_buff *skb,
 	return 0;
 }
 
+static inline unsigned int virtio_l3min(bool is_ipv6)
+{
+	return is_ipv6 ? sizeof(struct ipv6hdr) : sizeof(struct iphdr);
+}
+
+static inline int virtio_net_hdr_tnl_to_skb(struct sk_buff *skb,
+					    const struct virtio_net_hdr *hdr,
+					    unsigned int tnl_hdr_offset,
+					    bool tnl_csum_negotiated,
+					    bool little_endian)
+{
+	u8 gso_tunnel_type = hdr->gso_type & VIRTIO_NET_HDR_GSO_UDP_TUNNEL;
+	unsigned int inner_nh, outer_th, inner_th;
+	unsigned int inner_l3min, outer_l3min;
+	struct virtio_net_hdr_tunnel *tnl;
+	u8 gso_inner_type;
+	bool outer_isv6;
+	int ret;
+
+	if (!gso_tunnel_type)
+		return virtio_net_hdr_to_skb(skb, hdr, little_endian);
+
+	/* Tunnel not supported/negotiated, but the hdr asks for it. */
+	if (!tnl_hdr_offset)
+		return -EINVAL;
+
+	/* Either ipv4 or ipv6. */
+	if (gso_tunnel_type & VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 &&
+	    gso_tunnel_type & VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6)
+		return -EINVAL;
+
+	/* No UDP fragments over UDP tunnel. */
+	gso_inner_type = hdr->gso_type & ~(VIRTIO_NET_HDR_GSO_ECN |
+					   gso_tunnel_type);
+	if (!gso_inner_type || gso_inner_type == VIRTIO_NET_HDR_GSO_UDP)
+		return -EINVAL;
+
+	/* Rely on csum being present. */
+	if (!(hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM))
+		return -EINVAL;
+
+	/* Validate offsets. */
+	outer_isv6 = gso_tunnel_type & VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6;
+	inner_l3min = virtio_l3min(gso_inner_type == VIRTIO_NET_HDR_GSO_TCPV6);
+	outer_l3min = ETH_HLEN + virtio_l3min(outer_isv6);
+
+	tnl = ((void *)hdr) + tnl_hdr_offset;
+	inner_th = __virtio16_to_cpu(little_endian, hdr->csum_start);
+	inner_nh = __virtio16_to_cpu(little_endian, tnl->inner_nh_offset);
+	outer_th = __virtio16_to_cpu(little_endian, tnl->outer_th_offset);
+	if (outer_th < outer_l3min ||
+	    inner_nh < outer_th + sizeof(struct udphdr) ||
+	    inner_th < inner_nh + inner_l3min)
+		return -EINVAL;
+
+	/* Let the basic parsing deal with plain GSO features. */
+	ret = __virtio_net_hdr_to_skb(skb, hdr, little_endian,
+				      hdr->gso_type & ~gso_tunnel_type);
+	if (ret)
+		return ret;
+
+	skb_set_inner_protocol(skb, outer_isv6 ? htons(ETH_P_IPV6) :
+						 htons(ETH_P_IP));
+	if (hdr->flags & VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM) {
+		if (!tnl_csum_negotiated)
+			return -EINVAL;
+
+		skb_shinfo(skb)->gso_type |= SKB_GSO_UDP_TUNNEL_CSUM;
+	} else {
+		skb_shinfo(skb)->gso_type |= SKB_GSO_UDP_TUNNEL;
+	}
+
+	skb->inner_transport_header = inner_th + skb_headroom(skb);
+	skb->inner_network_header = inner_nh + skb_headroom(skb);
+	skb->inner_mac_header = inner_nh + skb_headroom(skb);
+	skb->transport_header = outer_th + skb_headroom(skb);
+	skb->encapsulation = 1;
+	return 0;
+}
+
+static inline int virtio_net_chk_data_valid(struct sk_buff *skb,
+					    struct virtio_net_hdr *hdr,
+					    bool tun_csum_negotiated)
+{
+	if (!(hdr->gso_type & VIRTIO_NET_HDR_GSO_UDP_TUNNEL)) {
+		if (!(hdr->flags & VIRTIO_NET_HDR_F_DATA_VALID))
+			return 0;
+
+		skb->ip_summed = CHECKSUM_UNNECESSARY;
+		if (!(hdr->flags & VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM))
+			return 0;
+
+		/* tunnel csum packets are invalid when the related
+		 * feature has not been negotiated
+		 */
+		if (!tun_csum_negotiated)
+			return -EINVAL;
+		skb->csum_level = 1;
+		return 0;
+	}
+
+	/* DATA_VALID is mutually exclusive with NEEDS_CSUM, and GSO
+	 * over UDP tunnel requires the latter
+	 */
+	if (hdr->flags & VIRTIO_NET_HDR_F_DATA_VALID)
+		return -EINVAL;
+	return 0;
+}
+
+static inline int virtio_net_hdr_tnl_from_skb(const struct sk_buff *skb,
+					      struct virtio_net_hdr *hdr,
+					      unsigned int tnl_offset,
+					      bool little_endian,
+					      int vlan_hlen)
+{
+	struct virtio_net_hdr_tunnel *tnl;
+	unsigned int inner_nh, outer_th;
+	int tnl_gso_type;
+	int ret;
+
+	tnl_gso_type = skb_shinfo(skb)->gso_type & (SKB_GSO_UDP_TUNNEL |
+						    SKB_GSO_UDP_TUNNEL_CSUM);
+	if (!tnl_gso_type)
+		return virtio_net_hdr_from_skb(skb, hdr, little_endian, false,
+					       vlan_hlen);
+
+	/* Tunnel support not negotiated but the skb asks for it. */
+	if (!tnl_offset)
+		return -EINVAL;
+
+	/* Let the basic parsing deal with plain GSO features. */
+	skb_shinfo(skb)->gso_type &= ~tnl_gso_type;
+	ret = virtio_net_hdr_from_skb(skb, hdr, little_endian, false,
+				      vlan_hlen);
+	skb_shinfo(skb)->gso_type |= tnl_gso_type;
+	if (ret)
+		return ret;
+
+	if (skb->protocol == htons(ETH_P_IPV6))
+		hdr->gso_type |= VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6;
+	else
+		hdr->gso_type |= VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4;
+
+	if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_TUNNEL_CSUM)
+		hdr->flags |= VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM;
+
+	tnl = ((void *)hdr) + tnl_offset;
+	inner_nh = skb->inner_network_header - skb_headroom(skb);
+	outer_th = skb->transport_header - skb_headroom(skb);
+	tnl->inner_nh_offset = __cpu_to_virtio16(little_endian, inner_nh);
+	tnl->outer_th_offset = __cpu_to_virtio16(little_endian, outer_th);
+	return 0;
+}
+
 #endif /* _LINUX_VIRTIO_NET_H */
diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
index 963540deae66a..1f1ff88a5749f 100644
--- a/include/uapi/linux/virtio_net.h
+++ b/include/uapi/linux/virtio_net.h
@@ -70,6 +70,28 @@
 					 * with the same MAC.
 					 */
 #define VIRTIO_NET_F_SPEED_DUPLEX 63	/* Device set linkspeed and duplex */
+#define VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO 65 /* Driver can receive
+					      * GSO-over-UDP-tunnel packets
+					      */
+#define VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM 66 /* Driver handles
+						   * GSO-over-UDP-tunnel
+						   * packets with partial csum
+						   * for the outer header
+						   */
+#define VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO 67 /* Device can receive
+					     * GSO-over-UDP-tunnel packets
+					     */
+#define VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO_CSUM 68 /* Device handles
+						  * GSO-over-UDP-tunnel
+						  * packets with partial csum
+						  * for the outer header
+						  */
+
+/* Offload bits corresponding to the
+ * VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO{,_CSUM} features
+ */
+#define VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_MAPPED	46
+#define VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM_MAPPED	47
 
 #ifndef VIRTIO_NET_NO_LEGACY
 #define VIRTIO_NET_F_GSO	6	/* Host handles pkts w/ any GSO type */
@@ -131,12 +153,17 @@ struct virtio_net_hdr_v1 {
 #define VIRTIO_NET_HDR_F_NEEDS_CSUM	1	/* Use csum_start, csum_offset */
 #define VIRTIO_NET_HDR_F_DATA_VALID	2	/* Csum is valid */
 #define VIRTIO_NET_HDR_F_RSC_INFO	4	/* rsc info in csum_ fields */
+#define VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM 8	/* UDP tunnel requires csum offload */
 	__u8 flags;
 #define VIRTIO_NET_HDR_GSO_NONE		0	/* Not a GSO frame */
 #define VIRTIO_NET_HDR_GSO_TCPV4	1	/* GSO frame, IPv4 TCP (TSO) */
 #define VIRTIO_NET_HDR_GSO_UDP		3	/* GSO frame, IPv4 UDP (UFO) */
 #define VIRTIO_NET_HDR_GSO_TCPV6	4	/* GSO frame, IPv6 TCP */
 #define VIRTIO_NET_HDR_GSO_UDP_L4	5	/* GSO frame, IPv4& IPv6 UDP (USO) */
+#define VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 0x20 /* UDP over IPv4 tunnel present */
+#define VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 0x40 /* UDP over IPv6 tunnel present */
+#define VIRTIO_NET_HDR_GSO_UDP_TUNNEL (VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 | \
+				       VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6)
 #define VIRTIO_NET_HDR_GSO_ECN		0x80	/* TCP has ECN set */
 	__u8 gso_type;
 	__virtio16 hdr_len;	/* Ethernet + IP + tcp/udp hdrs */
@@ -181,6 +208,12 @@ struct virtio_net_hdr_v1_hash {
 	__le16 padding;
 };
 
+/* This header comes after the hashing information */
+struct virtio_net_hdr_tunnel {
+	__virtio16 outer_th_offset;
+	__virtio16 inner_nh_offset;
+};
+
 #ifndef VIRTIO_NET_NO_LEGACY
 /* This header comes first in the scatter-gather list.
  * For legacy virtio, if VIRTIO_F_ANY_LAYOUT is not negotiated, it must
-- 
2.49.0
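
The sanity checks done by virtio_net_hdr_tnl_to_skb() above can be
modelled in plain userspace C. What follows is only a minimal sketch,
not the kernel code: the MODEL_* names and model_tnl_hdr_ok() are
hypothetical, the header sizes are the usual fixed minimums (14-byte
Ethernet, 20-byte IPv4, 40-byte IPv6, 8-byte UDP), and the
VIRTIO_NET_HDR_F_NEEDS_CSUM check is left out for brevity.

```c
#include <assert.h>	/* only for assert()-based checks added by a caller */
#include <stdbool.h>
#include <stdint.h>

/* gso_type values, copied from the uapi hunk of the patch. */
#define MODEL_GSO_UDP		3
#define MODEL_GSO_TCPV6		4
#define MODEL_GSO_TNL_IPV4	0x20
#define MODEL_GSO_TNL_IPV6	0x40
#define MODEL_GSO_TNL		(MODEL_GSO_TNL_IPV4 | MODEL_GSO_TNL_IPV6)
#define MODEL_GSO_ECN		0x80

/* Assumed minimum header sizes (hypothetical names). */
enum {
	MODEL_ETH_HLEN  = 14,
	MODEL_IPV4_HLEN = 20,
	MODEL_IPV6_HLEN = 40,
	MODEL_UDP_HLEN  = 8,
};

static unsigned int model_l3min(bool is_ipv6)
{
	return is_ipv6 ? MODEL_IPV6_HLEN : MODEL_IPV4_HLEN;
}

/*
 * Return true when a tunnel header passes the same gso_type and offset
 * checks as the helper in the patch. Offsets are relative to the start
 * of the packet; a header without any tunnel bit is rejected here,
 * while the kernel helper falls back to the plain parsing in that case.
 */
static bool model_tnl_hdr_ok(uint8_t gso_type, unsigned int outer_th,
			     unsigned int inner_nh, unsigned int inner_th)
{
	uint8_t tnl = gso_type & MODEL_GSO_TNL;
	uint8_t inner = gso_type & ~(MODEL_GSO_ECN | tnl);
	bool outer_isv6 = tnl & MODEL_GSO_TNL_IPV6;

	if (!tnl)
		return false;

	/* Either IPv4 or IPv6 outer header, never both. */
	if (tnl == MODEL_GSO_TNL)
		return false;

	/* An inner GSO type is required and UFO is not allowed. */
	if (!inner || inner == MODEL_GSO_UDP)
		return false;

	/* Each header must start after the minimal previous one. */
	if (outer_th < MODEL_ETH_HLEN + model_l3min(outer_isv6) ||
	    inner_nh < outer_th + MODEL_UDP_HLEN ||
	    inner_th < inner_nh + model_l3min(inner == MODEL_GSO_TCPV6))
		return false;

	return true;
}
```

For instance, a TCPv4 segment carried in VXLAN over IPv4 would
typically use outer_th = 34, inner_nh = 64 and inner_th = 84, which
satisfies all three offset constraints.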


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH net-next 6/8] virtio_net: enable gso over UDP tunnel support.
  2025-05-21 10:32 [PATCH net-next 0/8] virtio: introduce GSO over UDP tunnel Paolo Abeni
                   ` (4 preceding siblings ...)
  2025-05-21 10:32 ` [PATCH net-next 5/8] net: implement virtio helpers to handle UDP GSO tunneling Paolo Abeni
@ 2025-05-21 10:32 ` Paolo Abeni
  2025-05-22  8:38   ` kernel test robot
  2025-05-22 22:33   ` Willem de Bruijn
  2025-05-21 10:32 ` [PATCH net-next 7/8] tun: " Paolo Abeni
                   ` (3 subsequent siblings)
  9 siblings, 2 replies; 59+ messages in thread
From: Paolo Abeni @ 2025-05-21 10:32 UTC (permalink / raw)
  To: netdev
  Cc: Willem de Bruijn, Jason Wang, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez

If the related virtio feature is set, enable transmission and reception
of GSO over UDP tunnel packets.

Most of the work is done by the previously introduced helpers; the
driver just needs to determine where the UDP tunnel fields sit inside
the virtio_net_hdr and update the virtio net hdr size accordingly.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 drivers/net/virtio_net.c | 78 +++++++++++++++++++++++++++++++---------
 1 file changed, 62 insertions(+), 16 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 71a972f20f19b..3ca275ab887fe 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -46,11 +46,6 @@ static bool virtio_is_mapped_offload(unsigned int obit)
 	       obit <= VIRTIO_OFFLOAD_MAP_MAX;
 }
 
-#define VIRTIO_FEATURE_TO_OFFLOAD(fbit)	\
-	({								\
-		unsigned int __f = fbit;				\
-		__f >= VIRTIO_FEATURES_MAP_MIN ? __f - VIRTIO_O2F_DELTA : __f; \
-	})
 #define VIRTIO_OFFLOAD_TO_FEATURE(obit)	\
 	({								\
 		unsigned int __o = obit;				\
@@ -85,16 +80,30 @@ static const unsigned long guest_offloads[] = {
 	VIRTIO_NET_F_GUEST_CSUM,
 	VIRTIO_NET_F_GUEST_USO4,
 	VIRTIO_NET_F_GUEST_USO6,
-	VIRTIO_NET_F_GUEST_HDRLEN
+	VIRTIO_NET_F_GUEST_HDRLEN,
+#ifdef VIRTIO_HAS_EXTENDED_FEATURES
+	VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_MAPPED,
+	VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM_MAPPED,
+#endif
 };
 
-#define GUEST_OFFLOAD_GRO_HW_MASK ((1ULL << VIRTIO_NET_F_GUEST_TSO4) | \
+#define __GUEST_OFFLOAD_GRO_HW_MASK ((1ULL << VIRTIO_NET_F_GUEST_TSO4) | \
 				(1ULL << VIRTIO_NET_F_GUEST_TSO6) | \
 				(1ULL << VIRTIO_NET_F_GUEST_ECN)  | \
 				(1ULL << VIRTIO_NET_F_GUEST_UFO)  | \
 				(1ULL << VIRTIO_NET_F_GUEST_USO4) | \
 				(1ULL << VIRTIO_NET_F_GUEST_USO6))
 
+#ifdef VIRTIO_HAS_EXTENDED_FEATURES
+
+#define GUEST_OFFLOAD_GRO_HW_MASK (__GUEST_OFFLOAD_GRO_HW_MASK | \
+	(1ULL << VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_MAPPED) | \
+	(1ULL << VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM_MAPPED))
+#else
+
+#define GUEST_OFFLOAD_GRO_HW_MASK __GUEST_OFFLOAD_GRO_HW_MASK
+#endif
+
 struct virtnet_stat_desc {
 	char desc[ETH_GSTRING_LEN];
 	size_t offset;
@@ -443,9 +452,14 @@ struct virtnet_info {
 	/* Packet virtio header size */
 	u8 hdr_len;
 
+	/* UDP tunnel support */
+	u8 tnl_offset;
+
 	/* Work struct for delayed refilling if we run low on memory. */
 	struct delayed_work refill;
 
+	bool rx_tnl_csum;
+
 	/* Is delayed refill enabled? */
 	bool refill_enabled;
 
@@ -2538,12 +2552,19 @@ static void virtnet_receive_done(struct virtnet_info *vi, struct receive_queue *
 	if (dev->features & NETIF_F_RXHASH && vi->has_rss_hash_report)
 		virtio_skb_set_hash(&hdr->hash_v1_hdr, skb);
 
-	if (flags & VIRTIO_NET_HDR_F_DATA_VALID)
-		skb->ip_summed = CHECKSUM_UNNECESSARY;
+	/* restore the received value */
+	hdr->hdr.flags = flags;
+	if (virtio_net_chk_data_valid(skb, &hdr->hdr, vi->rx_tnl_csum)) {
+		net_warn_ratelimited("%s: bad csum: flags: %x, gso_type: %x\n",
+				     dev->name, hdr->hdr.flags,
+				     hdr->hdr.gso_type);
+		goto frame_err;
+	}
 
-	if (virtio_net_hdr_to_skb(skb, &hdr->hdr,
-				  virtio_is_little_endian(vi->vdev))) {
-		net_warn_ratelimited("%s: bad gso: type: %u, size: %u\n",
+	if (virtio_net_hdr_tnl_to_skb(skb, &hdr->hdr, vi->tnl_offset,
+				      vi->rx_tnl_csum,
+				      virtio_is_little_endian(vi->vdev))) {
+		net_warn_ratelimited("%s: bad gso: type: %x, size: %u\n",
 				     dev->name, hdr->hdr.gso_type,
 				     hdr->hdr.gso_size);
 		goto frame_err;
@@ -3276,9 +3297,8 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb, bool orphan)
 	else
 		hdr = &skb_vnet_common_hdr(skb)->mrg_hdr;
 
-	if (virtio_net_hdr_from_skb(skb, &hdr->hdr,
-				    virtio_is_little_endian(vi->vdev), false,
-				    0))
+	if (virtio_net_hdr_tnl_from_skb(skb, &hdr->hdr, vi->tnl_offset,
+					virtio_is_little_endian(vi->vdev), 0))
 		return -EPROTO;
 
 	if (vi->mergeable_rx_bufs)
@@ -6782,10 +6802,20 @@ static int virtnet_probe(struct virtio_device *vdev)
 		if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_USO))
 			dev->hw_features |= NETIF_F_GSO_UDP_L4;
 
+		if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO)) {
+			dev->hw_features |= NETIF_F_GSO_UDP_TUNNEL;
+			dev->hw_enc_features = dev->hw_features;
+		}
+		if (dev->hw_features & NETIF_F_GSO_UDP_TUNNEL &&
+		    virtio_has_feature(vdev, VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO_CSUM)) {
+			dev->hw_features |= NETIF_F_GSO_UDP_TUNNEL_CSUM;
+			dev->hw_enc_features |= NETIF_F_GSO_UDP_TUNNEL_CSUM;
+		}
+
 		dev->features |= NETIF_F_GSO_ROBUST;
 
 		if (gso)
-			dev->features |= dev->hw_features & NETIF_F_ALL_TSO;
+			dev->features |= dev->hw_features;
 		/* (!csum && gso) case will be fixed by register_netdev() */
 	}
 
@@ -6886,6 +6916,16 @@ static int virtnet_probe(struct virtio_device *vdev)
 	else
 		vi->hdr_len = sizeof(struct virtio_net_hdr);
 
+#ifdef VIRTIO_HAS_EXTENDED_FEATURES
+	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO) ||
+	    virtio_has_feature(vdev, VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO))
+		vi->tnl_offset = vi->hdr_len;
+	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM))
+		vi->rx_tnl_csum = true;
+	if (vi->tnl_offset)
+		vi->hdr_len += sizeof(struct virtio_net_hdr_tunnel);
+#endif
+
 	if (virtio_has_feature(vdev, VIRTIO_F_ANY_LAYOUT) ||
 	    virtio_has_feature(vdev, VIRTIO_F_VERSION_1))
 		vi->any_header_sg = true;
@@ -7196,6 +7236,12 @@ static struct virtio_device_id id_table[] = {
 
 static unsigned int features[] = {
 	VIRTNET_FEATURES,
+#ifdef VIRTIO_HAS_EXTENDED_FEATURES
+	VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO,
+	VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM,
+	VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO,
+	VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO_CSUM,
+#endif
 };
 
 static unsigned int features_legacy[] = {
-- 
2.49.0
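
The header layout selected by the virtnet_probe() hunk above can be
sketched as follows. This is an illustration under stated assumptions
rather than the driver code: struct model_layout and
model_hdr_layout() are hypothetical names, the sizes are the packed
sizes of the uapi structs (10, 12, 20 and 4 bytes), and the hash
report case is assumed from the surrounding, unchanged driver code
that the hunk does not show.

```c
#include <assert.h>	/* only for assert()-based checks added by a caller */
#include <stdbool.h>

enum {
	MODEL_HDR_LEGACY = 10,	/* sizeof(struct virtio_net_hdr) */
	MODEL_HDR_MRG    = 12,	/* sizeof(struct virtio_net_hdr_mrg_rxbuf) */
	MODEL_HDR_HASH   = 20,	/* sizeof(struct virtio_net_hdr_v1_hash) */
	MODEL_HDR_TUNNEL = 4,	/* sizeof(struct virtio_net_hdr_tunnel) */
};

struct model_layout {
	unsigned int hdr_len;		/* total virtio net header size */
	unsigned int tnl_offset;	/* 0 means "no tunnel fields" */
};

/*
 * hash_report: the hash report feature was negotiated (assumed case)
 * mrg_or_v1:   VIRTIO_NET_F_MRG_RXBUF or VIRTIO_F_VERSION_1 negotiated
 * tunnel:      guest or host UDP tunnel GSO feature negotiated
 */
static struct model_layout model_hdr_layout(bool hash_report, bool mrg_or_v1,
					    bool tunnel)
{
	struct model_layout l = { 0, 0 };

	if (hash_report)
		l.hdr_len = MODEL_HDR_HASH;
	else if (mrg_or_v1)
		l.hdr_len = MODEL_HDR_MRG;
	else
		l.hdr_len = MODEL_HDR_LEGACY;

	/* The tunnel fields are appended after the base header. */
	if (tunnel) {
		l.tnl_offset = l.hdr_len;
		l.hdr_len += MODEL_HDR_TUNNEL;
	}
	return l;
}
```

With a modern device and no hash report, the tunnel fields would start
at byte 12 and the whole header would take 16 bytes; with the hash
report they would start at byte 20, for 24 bytes in total.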


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH net-next 7/8] tun: enable gso over UDP tunnel support.
  2025-05-21 10:32 [PATCH net-next 0/8] virtio: introduce GSO over UDP tunnel Paolo Abeni
                   ` (5 preceding siblings ...)
  2025-05-21 10:32 ` [PATCH net-next 6/8] virtio_net: enable gso over UDP tunnel support Paolo Abeni
@ 2025-05-21 10:32 ` Paolo Abeni
  2025-05-26  4:40   ` Jason Wang
  2025-05-21 10:32 ` [PATCH net-next 8/8] vhost/net: " Paolo Abeni
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 59+ messages in thread
From: Paolo Abeni @ 2025-05-21 10:32 UTC (permalink / raw)
  To: netdev
  Cc: Willem de Bruijn, Jason Wang, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez

Add new tun features to represent the newly introduced virtio
GSO over UDP tunnel offload. Allow detection and selection of
such features via the existing TUNSETOFFLOAD ioctl, store the
tunnel offload configuration in the high bits of the tun flags,
and compute the expected virtio header size and tunnel header
offset using such bits, so that the newly introduced virtio
helpers can be plugged in almost seamlessly to serialize the
extended virtio header.

As the tun features and the virtio hdr size are configured
separately, the data path needs to cope with (hopefully transient)
inconsistent values.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 drivers/net/tun.c           | 77 ++++++++++++++++++++++++++++++++-----
 drivers/net/tun_vnet.h      | 74 ++++++++++++++++++++++++++++-------
 include/uapi/linux/if_tun.h |  9 +++++
 3 files changed, 137 insertions(+), 23 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 7babd1e9a378b..ef8cef48b66f5 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -186,7 +186,8 @@ struct tun_struct {
 	struct net_device	*dev;
 	netdev_features_t	set_features;
 #define TUN_USER_FEATURES (NETIF_F_HW_CSUM|NETIF_F_TSO_ECN|NETIF_F_TSO| \
-			  NETIF_F_TSO6 | NETIF_F_GSO_UDP_L4)
+			  NETIF_F_TSO6 | NETIF_F_GSO_UDP_L4 | \
+			  NETIF_F_GSO_UDP_TUNNEL | NETIF_F_GSO_UDP_TUNNEL_CSUM)
 
 	int			align;
 	int			vnet_hdr_sz;
@@ -925,6 +926,7 @@ static int tun_net_init(struct net_device *dev)
 	dev->hw_features = NETIF_F_SG | NETIF_F_FRAGLIST |
 			   TUN_USER_FEATURES | NETIF_F_HW_VLAN_CTAG_TX |
 			   NETIF_F_HW_VLAN_STAG_TX;
+	dev->hw_enc_features = dev->hw_features;
 	dev->features = dev->hw_features;
 	dev->vlan_features = dev->features &
 			     ~(NETIF_F_HW_VLAN_CTAG_TX |
@@ -1698,7 +1700,8 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 	struct sk_buff *skb;
 	size_t total_len = iov_iter_count(from);
 	size_t len = total_len, align = tun->align, linear;
-	struct virtio_net_hdr gso = { 0 };
+	char buf[TUN_VNET_TNL_SIZE];
+	struct virtio_net_hdr *gso;
 	int good_linear;
 	int copylen;
 	int hdr_len = 0;
@@ -1708,6 +1711,15 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 	int skb_xdp = 1;
 	bool frags = tun_napi_frags_enabled(tfile);
 	enum skb_drop_reason drop_reason = SKB_DROP_REASON_NOT_SPECIFIED;
+	unsigned int flags = tun->flags & ~TUN_VNET_TNL_MASK;
+
+	/*
+	 * Keep it easy and always zero the whole buffer, even if the
+	 * tunnel-related fields will be touched only when the feature
+	 * is enabled and the hdr size is compatible.
+	 */
+	memset(buf, 0, sizeof(buf));
+	gso = (void *)buf;
 
 	if (!(tun->flags & IFF_NO_PI)) {
 		if (len < sizeof(pi))
@@ -1720,8 +1732,16 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 
 	if (tun->flags & IFF_VNET_HDR) {
 		int vnet_hdr_sz = READ_ONCE(tun->vnet_hdr_sz);
+		int parsed_size;
 
-		hdr_len = tun_vnet_hdr_get(vnet_hdr_sz, tun->flags, from, &gso);
+		if (vnet_hdr_sz < TUN_VNET_TNL_SIZE) {
+			parsed_size = vnet_hdr_sz;
+		} else {
+			parsed_size = TUN_VNET_TNL_SIZE;
+			flags |= TUN_VNET_TNL_MASK;
+		}
+		hdr_len = __tun_vnet_hdr_get(vnet_hdr_sz, parsed_size,
+					     flags, from, gso);
 		if (hdr_len < 0)
 			return hdr_len;
 
@@ -1755,7 +1775,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 		 * (e.g gso or jumbo packet), we will do it at after
 		 * skb was created with generic XDP routine.
 		 */
-		skb = tun_build_skb(tun, tfile, from, &gso, len, &skb_xdp);
+		skb = tun_build_skb(tun, tfile, from, gso, len, &skb_xdp);
 		err = PTR_ERR_OR_ZERO(skb);
 		if (err)
 			goto drop;
@@ -1799,7 +1819,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 		}
 	}
 
-	if (tun_vnet_hdr_to_skb(tun->flags, skb, &gso)) {
+	if (tun_vnet_hdr_to_skb(flags, skb, gso)) {
 		atomic_long_inc(&tun->rx_frame_errors);
 		err = -EINVAL;
 		goto free_skb;
@@ -2050,13 +2070,26 @@ static ssize_t tun_put_user(struct tun_struct *tun,
 	}
 
 	if (vnet_hdr_sz) {
-		struct virtio_net_hdr gso;
+		char buf[TUN_VNET_TNL_SIZE];
+		struct virtio_net_hdr *gso;
+		int flags = tun->flags;
+		int parsed_size;
+
+		gso = (void *)buf;
+		parsed_size = tun_vnet_parse_size(tun->flags);
+		if (unlikely(vnet_hdr_sz < parsed_size)) {
+			/* Inconsistent hdr size and (tunnel) offloads:
+			 * strip the latter
+			 */
+			flags &= ~TUN_VNET_TNL_MASK;
+			parsed_size = sizeof(struct virtio_net_hdr);
+		}
 
-		ret = tun_vnet_hdr_from_skb(tun->flags, tun->dev, skb, &gso);
+		ret = tun_vnet_hdr_from_skb(flags, tun->dev, skb, gso);
 		if (ret)
 			return ret;
 
-		ret = tun_vnet_hdr_put(vnet_hdr_sz, iter, &gso);
+		ret = __tun_vnet_hdr_put(vnet_hdr_sz, parsed_size, iter, gso);
 		if (ret)
 			return ret;
 	}
@@ -2366,6 +2399,7 @@ static int tun_xdp_one(struct tun_struct *tun,
 	int metasize = 0;
 	int ret = 0;
 	bool skb_xdp = false;
+	unsigned int flags;
 	struct page *page;
 
 	if (unlikely(datasize < ETH_HLEN))
@@ -2426,7 +2460,16 @@ static int tun_xdp_one(struct tun_struct *tun,
 	if (metasize > 0)
 		skb_metadata_set(skb, metasize);
 
-	if (tun_vnet_hdr_to_skb(tun->flags, skb, gso)) {
+	/* Assume tun offloads are enabled if the provided hdr is large
+	 * enough.
+	 */
+	if (READ_ONCE(tun->vnet_hdr_sz) >= TUN_VNET_TNL_SIZE &&
+	    xdp->data - xdp->data_hard_start >= TUN_VNET_TNL_SIZE)
+		flags = tun->flags | TUN_VNET_TNL_MASK;
+	else
+		flags = tun->flags & ~TUN_VNET_TNL_MASK;
+
+	if (tun_vnet_hdr_to_skb(flags, skb, gso)) {
 		atomic_long_inc(&tun->rx_frame_errors);
 		kfree_skb(skb);
 		ret = -EINVAL;
@@ -2812,6 +2855,8 @@ static void tun_get_iff(struct tun_struct *tun, struct ifreq *ifr)
 
 }
 
+#define PLAIN_GSO (NETIF_F_GSO_UDP_L4 | NETIF_F_TSO | NETIF_F_TSO6)
+
 /* This is like a cut-down ethtool ops, except done via tun fd so no
  * privs required. */
 static int set_offload(struct tun_struct *tun, unsigned long arg)
@@ -2841,6 +2886,17 @@ static int set_offload(struct tun_struct *tun, unsigned long arg)
 			features |= NETIF_F_GSO_UDP_L4;
 			arg &= ~(TUN_F_USO4 | TUN_F_USO6);
 		}
+
+		/* Tunnel offload is allowed only if some plain offload is
+		 * available, too.
+		 */
+		if (features & PLAIN_GSO && arg & TUN_F_UDP_TUNNEL_GSO) {
+			features |= NETIF_F_GSO_UDP_TUNNEL;
+			if (arg & TUN_F_UDP_TUNNEL_GSO_CSUM)
+				features |= NETIF_F_GSO_UDP_TUNNEL_CSUM;
+			arg &= ~(TUN_F_UDP_TUNNEL_GSO |
+				 TUN_F_UDP_TUNNEL_GSO_CSUM);
+		}
 	}
 
 	/* This gives the user a way to test for new features in future by
@@ -2852,7 +2908,8 @@ static int set_offload(struct tun_struct *tun, unsigned long arg)
 	tun->dev->wanted_features &= ~TUN_USER_FEATURES;
 	tun->dev->wanted_features |= features;
 	netdev_update_features(tun->dev);
-
+	tun_set_vnet_tnl(&tun->flags, !!(features & NETIF_F_GSO_UDP_TUNNEL),
+			 !!(features & NETIF_F_GSO_UDP_TUNNEL_CSUM));
 	return 0;
 }
 
diff --git a/drivers/net/tun_vnet.h b/drivers/net/tun_vnet.h
index 58b9ac7a5fc40..ab2d4396941ca 100644
--- a/drivers/net/tun_vnet.h
+++ b/drivers/net/tun_vnet.h
@@ -5,6 +5,12 @@
 /* High bits in flags field are unused. */
 #define TUN_VNET_LE     0x80000000
 #define TUN_VNET_BE     0x40000000
+#define TUN_VNET_TNL		0x20000000
+#define TUN_VNET_TNL_CSUM	0x10000000
+#define TUN_VNET_TNL_MASK	(TUN_VNET_TNL | TUN_VNET_TNL_CSUM)
+
+#define TUN_VNET_TNL_SIZE (sizeof(struct virtio_net_hdr_v1) + \
+			   sizeof(struct virtio_net_hdr_tunnel))
 
 static inline bool tun_vnet_legacy_is_little_endian(unsigned int flags)
 {
@@ -45,6 +51,13 @@ static inline long tun_set_vnet_be(unsigned int *flags, int __user *argp)
 	return 0;
 }
 
+static inline void tun_set_vnet_tnl(unsigned int *flags, bool tnl, bool tnl_csum)
+{
+	*flags = (*flags & ~TUN_VNET_TNL_MASK) |
+		 tnl * TUN_VNET_TNL |
+		 tnl_csum * TUN_VNET_TNL_CSUM;
+}
+
 static inline bool tun_vnet_is_little_endian(unsigned int flags)
 {
 	return flags & TUN_VNET_LE || tun_vnet_legacy_is_little_endian(flags);
@@ -107,16 +120,33 @@ static inline long tun_vnet_ioctl(int *vnet_hdr_sz, unsigned int *flags,
 	}
 }
 
-static inline int tun_vnet_hdr_get(int sz, unsigned int flags,
-				   struct iov_iter *from,
-				   struct virtio_net_hdr *hdr)
+static inline unsigned int tun_vnet_parse_size(unsigned int flags)
+{
+	if (!(flags & TUN_VNET_TNL))
+		return sizeof(struct virtio_net_hdr);
+
+	return TUN_VNET_TNL_SIZE;
+}
+
+static inline unsigned int tun_vnet_tnl_offset(unsigned int flags)
+{
+	if (!(flags & TUN_VNET_TNL))
+		return 0;
+
+	return sizeof(struct virtio_net_hdr_v1);
+}
+
+static inline int __tun_vnet_hdr_get(int sz, int parsed_size,
+				     unsigned int flags,
+				     struct iov_iter *from,
+				     struct virtio_net_hdr *hdr)
 {
 	u16 hdr_len;
 
 	if (iov_iter_count(from) < sz)
 		return -EINVAL;
 
-	if (!copy_from_iter_full(hdr, sizeof(*hdr), from))
+	if (!copy_from_iter_full(hdr, parsed_size, from))
 		return -EFAULT;
 
 	hdr_len = tun_vnet16_to_cpu(flags, hdr->hdr_len);
@@ -129,30 +159,47 @@ static inline int tun_vnet_hdr_get(int sz, unsigned int flags,
 	if (hdr_len > iov_iter_count(from))
 		return -EINVAL;
 
-	iov_iter_advance(from, sz - sizeof(*hdr));
+	iov_iter_advance(from, sz - parsed_size);
 
 	return hdr_len;
 }
 
-static inline int tun_vnet_hdr_put(int sz, struct iov_iter *iter,
-				   const struct virtio_net_hdr *hdr)
+static inline int tun_vnet_hdr_get(int sz, unsigned int flags,
+				   struct iov_iter *from,
+				   struct virtio_net_hdr *hdr)
+{
+	return __tun_vnet_hdr_get(sz, sizeof(*hdr), flags, from, hdr);
+}
+
+static inline int __tun_vnet_hdr_put(int sz, int parsed_size,
+				     struct iov_iter *iter,
+				     const struct virtio_net_hdr *hdr)
 {
 	if (unlikely(iov_iter_count(iter) < sz))
 		return -EINVAL;
 
-	if (unlikely(copy_to_iter(hdr, sizeof(*hdr), iter) != sizeof(*hdr)))
+	if (unlikely(copy_to_iter(hdr, parsed_size, iter) != parsed_size))
 		return -EFAULT;
 
-	if (iov_iter_zero(sz - sizeof(*hdr), iter) != sz - sizeof(*hdr))
+	if (iov_iter_zero(sz - parsed_size, iter) != sz - parsed_size)
 		return -EFAULT;
 
 	return 0;
 }
 
+static inline int tun_vnet_hdr_put(int sz, struct iov_iter *iter,
+				   const struct virtio_net_hdr *hdr)
+{
+	return __tun_vnet_hdr_put(sz, sizeof(*hdr), iter, hdr);
+}
+
 static inline int tun_vnet_hdr_to_skb(unsigned int flags, struct sk_buff *skb,
 				      const struct virtio_net_hdr *hdr)
 {
-	return virtio_net_hdr_to_skb(skb, hdr, tun_vnet_is_little_endian(flags));
+	return virtio_net_hdr_tnl_to_skb(skb, hdr,
+					 tun_vnet_tnl_offset(flags),
+					 !!(flags & TUN_VNET_TNL_CSUM),
+					 tun_vnet_is_little_endian(flags));
 }
 
 static inline int tun_vnet_hdr_from_skb(unsigned int flags,
@@ -161,10 +208,11 @@ static inline int tun_vnet_hdr_from_skb(unsigned int flags,
 					struct virtio_net_hdr *hdr)
 {
 	int vlan_hlen = skb_vlan_tag_present(skb) ? VLAN_HLEN : 0;
+	int tnl_offset = tun_vnet_tnl_offset(flags);
 
-	if (virtio_net_hdr_from_skb(skb, hdr,
-				    tun_vnet_is_little_endian(flags), true,
-				    vlan_hlen)) {
+	if (virtio_net_hdr_tnl_from_skb(skb, hdr, tnl_offset,
+					tun_vnet_is_little_endian(flags),
+					vlan_hlen)) {
 		struct skb_shared_info *sinfo = skb_shinfo(skb);
 
 		if (net_ratelimit()) {
diff --git a/include/uapi/linux/if_tun.h b/include/uapi/linux/if_tun.h
index 287cdc81c9390..a25a5e7a08ffa 100644
--- a/include/uapi/linux/if_tun.h
+++ b/include/uapi/linux/if_tun.h
@@ -93,6 +93,15 @@
 #define TUN_F_USO4	0x20	/* I can handle USO for IPv4 packets */
 #define TUN_F_USO6	0x40	/* I can handle USO for IPv6 packets */
 
+#define TUN_F_UDP_TUNNEL_GSO		0x080 /* I can handle TSO/USO for UDP
+					       * tunneled packets
+					       */
+#define TUN_F_UDP_TUNNEL_GSO_CSUM	0x100 /* I can handle TSO/USO for UDP
+					       * tunneled packets requiring
+					       * csum offload for the outer
+					       * header
+					       */
+
 /* Protocol info prepended to the packets (when IFF_NO_PI is not set) */
 #define TUN_PKT_STRIP	0x0001
 struct tun_pi {
-- 
2.49.0
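
How the tun flags and the configured vnet hdr size interact on the
tun_put_user() path above can be sketched in userspace C. This is a
rough model only: the MODEL_* macros, model_set_tnl() and
model_tx_flags() are hypothetical names, the flag values are copied
from the tun_vnet.h hunk, and the sizes assume a 10-byte struct
virtio_net_hdr, a 12-byte struct virtio_net_hdr_v1 and a 4-byte
struct virtio_net_hdr_tunnel.

```c
#include <assert.h>	/* only for assert()-based checks added by a caller */
#include <stdbool.h>

#define MODEL_TNL	0x20000000u	/* TUN_VNET_TNL */
#define MODEL_TNL_CSUM	0x10000000u	/* TUN_VNET_TNL_CSUM */
#define MODEL_TNL_MASK	(MODEL_TNL | MODEL_TNL_CSUM)

enum {
	MODEL_HDR_LEGACY = 10,		/* sizeof(struct virtio_net_hdr) */
	MODEL_TNL_SIZE   = 12 + 4,	/* TUN_VNET_TNL_SIZE */
};

/* Record the tunnel offload configuration in the high flag bits. */
static unsigned int model_set_tnl(unsigned int flags, bool tnl, bool tnl_csum)
{
	flags &= ~MODEL_TNL_MASK;
	if (tnl)
		flags |= MODEL_TNL;
	if (tnl_csum)
		flags |= MODEL_TNL_CSUM;
	return flags;
}

/*
 * Pick the flags and the number of header bytes to serialize when
 * sending a packet to user space. If the configured vnet hdr size is
 * too small for the tunnel fields, the tunnel bits are stripped and
 * only the basic header is written.
 */
static unsigned int model_tx_flags(unsigned int flags,
				   unsigned int vnet_hdr_sz,
				   unsigned int *parsed_size)
{
	*parsed_size = (flags & MODEL_TNL) ? MODEL_TNL_SIZE : MODEL_HDR_LEGACY;
	if (vnet_hdr_sz < *parsed_size) {
		flags &= ~MODEL_TNL_MASK;
		*parsed_size = MODEL_HDR_LEGACY;
	}
	return flags;
}
```

In this model, a device with both tunnel offloads enabled but a 12-byte
vnet hdr size falls back to the 10-byte basic header, while a 16-byte
vnet hdr size keeps the tunnel bits and serializes all 16 bytes.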


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH net-next 8/8] vhost/net: enable gso over UDP tunnel support.
  2025-05-21 10:32 [PATCH net-next 0/8] virtio: introduce GSO over UDP tunnel Paolo Abeni
                   ` (6 preceding siblings ...)
  2025-05-21 10:32 ` [PATCH net-next 7/8] tun: " Paolo Abeni
@ 2025-05-21 10:32 ` Paolo Abeni
  2025-05-22  6:43   ` kernel test robot
  2025-05-26  4:40   ` Jason Wang
  2025-05-21 11:38 ` [PATCH net-next 0/8] virtio: introduce GSO over UDP tunnel Paolo Abeni
  2025-05-21 15:52 ` Michael S. Tsirkin
  9 siblings, 2 replies; 59+ messages in thread
From: Paolo Abeni @ 2025-05-21 10:32 UTC (permalink / raw)
  To: netdev
  Cc: Willem de Bruijn, Jason Wang, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez

Vhost net needs to know the exact virtio net hdr size to be able
to copy such header correctly. Teach it about the newly defined
UDP tunnel-related option and update the hdr size computation
accordingly.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 drivers/vhost/net.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index b894685dded3e..985f9662a9003 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -78,7 +78,9 @@ enum {
 };
 
 #ifdef VIRTIO_HAS_EXTENDED_FEATURES
-#define VHOST_NET_FEATURES_EX VHOST_NET_FEATURES
+#define VHOST_NET_FEATURES_EX (VHOST_NET_FEATURES | \
+			(VIRTIO_BIT(VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO)) | \
+			(VIRTIO_BIT(VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO)))
 #endif
 
 enum {
@@ -1621,12 +1623,16 @@ static long vhost_net_reset_owner(struct vhost_net *n)
 static int vhost_net_set_features(struct vhost_net *n, virtio_features_t features)
 {
 	size_t vhost_hlen, sock_hlen, hdr_len;
+	bool has_tunnel;
 	int i;
 
 	hdr_len = (features & ((1ULL << VIRTIO_NET_F_MRG_RXBUF) |
 			       (1ULL << VIRTIO_F_VERSION_1))) ?
 			sizeof(struct virtio_net_hdr_mrg_rxbuf) :
 			sizeof(struct virtio_net_hdr);
+	has_tunnel = !!(features & (VIRTIO_BIT(VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO) |
+				    VIRTIO_BIT(VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO)));
+	hdr_len += has_tunnel ? sizeof(struct virtio_net_hdr_tunnel) : 0;
 	if (features & (1 << VHOST_NET_F_VIRTIO_NET_HDR)) {
 		/* vhost provides vnet_hdr */
 		vhost_hlen = hdr_len;
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: [PATCH net-next 0/8] virtio: introduce GSO over UDP tunnel
  2025-05-21 10:32 [PATCH net-next 0/8] virtio: introduce GSO over UDP tunnel Paolo Abeni
                   ` (7 preceding siblings ...)
  2025-05-21 10:32 ` [PATCH net-next 8/8] vhost/net: " Paolo Abeni
@ 2025-05-21 11:38 ` Paolo Abeni
  2025-05-21 15:52 ` Michael S. Tsirkin
  9 siblings, 0 replies; 59+ messages in thread
From: Paolo Abeni @ 2025-05-21 11:38 UTC (permalink / raw)
  To: netdev
  Cc: Willem de Bruijn, Jason Wang, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez

On 5/21/25 12:32 PM, Paolo Abeni wrote:
> Some virtualized deployments use UDP tunnel pervasively and are impacted
> negatively by the lack of GSO support for such kind of traffic in the
> virtual NIC driver.
> 
> The virtio_net specification recently introduced support for GSO over
> UDP tunnel, this series updates the virtio implementation to support
> such a feature.
> 
> Currently the kernel virtio support limits the feature space to 64,
> while the virtio specification allows for a larger number of features.
> Specifically the GSO-over-UDP-tunnel-related virtio features use bits
> 65-69.
> 
> The first four patches in this series rework the virtio and vhost
> feature support to cope with up to 128 bits. The limit is arch-dependent:
> only arches with native 128 integer support allow for the wider feature
> space.
> 
> This implementation choice is aimed at keeping the code churn as
> limited as possible. For the same reason, only the virtio_net driver is
> reworked to leverage the extended feature space; all other
> virtio/vhost drivers are unaffected, but could be upgraded to support
> the extended features space in a later time.
> 
> The last four patches bring in the actual GSO over UDP tunnel support.
> As per specification, some additional fields are introduced into the
> virtio net header to support the new offload. The presence of such
> fields depends on the negotiated features.
> 
> A new pair of helpers is introduced to convert the UDP-tunneled skb
> metadata to an extended virtio net header and vice versa. Such helpers
> are used by the tun and virtio_net driver to cope with the newly
> supported offloads.
> 
> Tested with basic stream transfer with all the possible permutations of
> host kernel/qemu/guest kernel with/without GSO over UDP tunnel support.
> Sharing somewhat early to collect feedback, especially on the userland
> code.

FWIW, the user-space bits are available here:

https://lists.gnu.org/archive/html/qemu-devel/2025-05/msg05027.html

/P


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH net-next 0/8] virtio: introduce GSO over UDP tunnel
  2025-05-21 10:32 [PATCH net-next 0/8] virtio: introduce GSO over UDP tunnel Paolo Abeni
                   ` (8 preceding siblings ...)
  2025-05-21 11:38 ` [PATCH net-next 0/8] virtio: introduce GSO over UDP tunnel Paolo Abeni
@ 2025-05-21 15:52 ` Michael S. Tsirkin
  9 siblings, 0 replies; 59+ messages in thread
From: Michael S. Tsirkin @ 2025-05-21 15:52 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, Willem de Bruijn, Jason Wang, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Xuan Zhuo,
	Eugenio Pérez

On Wed, May 21, 2025 at 12:32:34PM +0200, Paolo Abeni wrote:
> Some virtualized deployments use UDP tunnel pervasively and are impacted
> negatively by the lack of GSO support for such kind of traffic in the
> virtual NIC driver.
> 
> The virtio_net specification recently introduced support for GSO over
> UDP tunnel, this series updates the virtio implementation to support
> such a feature.
> 
> Currently the kernel virtio support limits the feature space to 64,
> while the virtio specification allows for a larger number of features.
> Specifically the GSO-over-UDP-tunnel-related virtio features use bits
> 65-69.
> 
> The first four patches in this series rework the virtio and vhost
> feature support to cope with up to 128 bits. The limit is arch-dependent:
> only arches with native 128 integer support allow for the wider feature
> space.
> 
> This implementation choice is aimed at keeping the code churn as
> limited as possible. For the same reason, only the virtio_net driver is
> reworked to leverage the extended feature space; all other
> virtio/vhost drivers are unaffected, but could be upgraded to support
> the extended features space in a later time.
> 
> The last four patches bring in the actual GSO over UDP tunnel support.
> As per specification, some additional fields are introduced into the
> virtio net header to support the new offload. The presence of such
> fields depends on the negotiated features.
> 
> A new pair of helpers is introduced to convert the UDP-tunneled skb
> metadata to an extended virtio net header and vice versa. Such helpers
> are used by the tun and virtio_net driver to cope with the newly
> supported offloads.
> 
> Tested with basic stream transfer with all the possible permutations of
> host kernel/qemu/guest kernel with/without GSO over UDP tunnel support.
> Sharing somewhat early to collect feedback, especially on the userland
> code.


I like the approach. Some small comments/questions.

> Paolo Abeni (8):
>   virtio: introduce virtio_features_t
>   virtio_pci_modern: allow setting configuring extended features
>   vhost-net: allow configuring extended features
>   virtio_net: add supports for extended offloads
>   net: implement virtio helpers to handle UDP GSO tunneling.
>   virtio_net: enable gso over UDP tunnel support.
>   tun: enable gso over UDP tunnel support.
>   vhost/net: enable gso over UDP tunnel support.
> 
>  drivers/net/tun.c                      |  77 +++++++++--
>  drivers/net/tun_vnet.h                 |  74 +++++++++--
>  drivers/net/virtio_net.c               |  99 ++++++++++++--
>  drivers/vhost/net.c                    |  32 ++++-
>  drivers/vhost/vhost.h                  |   2 +-
>  drivers/virtio/virtio.c                |  12 +-
>  drivers/virtio/virtio_mmio.c           |   4 +-
>  drivers/virtio/virtio_pci_legacy.c     |   2 +-
>  drivers/virtio/virtio_pci_modern.c     |   7 +-
>  drivers/virtio/virtio_pci_modern_dev.c |  44 +++---
>  drivers/virtio/virtio_vdpa.c           |   2 +-
>  include/linux/virtio.h                 |   5 +-
>  include/linux/virtio_config.h          |  22 +--
>  include/linux/virtio_features.h        |  23 ++++
>  include/linux/virtio_net.h             | 177 +++++++++++++++++++++++--
>  include/linux/virtio_pci_modern.h      |  11 +-
>  include/uapi/linux/if_tun.h            |   9 ++
>  include/uapi/linux/vhost.h             |   8 ++
>  include/uapi/linux/virtio_net.h        |  33 +++++
>  19 files changed, 551 insertions(+), 92 deletions(-)
>  create mode 100644 include/linux/virtio_features.h
> 
> -- 
> 2.49.0



* Re: [PATCH net-next 1/8] virtio: introduce virtio_features_t
  2025-05-21 10:32 ` [PATCH net-next 1/8] virtio: introduce virtio_features_t Paolo Abeni
@ 2025-05-21 16:02   ` Michael S. Tsirkin
  2025-05-22  7:29     ` Paolo Abeni
  2025-05-22  8:17   ` kernel test robot
  2025-05-26  0:43   ` Jason Wang
  2 siblings, 1 reply; 59+ messages in thread
From: Michael S. Tsirkin @ 2025-05-21 16:02 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, Willem de Bruijn, Jason Wang, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Xuan Zhuo,
	Eugenio Pérez

On Wed, May 21, 2025 at 12:32:35PM +0200, Paolo Abeni wrote:
> The virtio specifications allows for up to 128 bits for the
> device features. Soon we are going to use some of the 'extended'
> bits features (above 64) for the virtio_net driver.
> 
> Introduce an specific type to represent the virtio features bitmask.
> On platform where 128 bits integer are available use such wide int
> for the features bitmask, otherwise maintain the current u64.
> 
> Updates all the relevant virtio API to use the new type.
> 
> Note that legacy and transport features don't need any change, as
> they are always in the low 64 bit range.
> 
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
>  drivers/virtio/virtio.c                | 12 ++++++------
>  drivers/virtio/virtio_mmio.c           |  4 ++--
>  drivers/virtio/virtio_pci_legacy.c     |  2 +-
>  drivers/virtio/virtio_pci_modern.c     |  7 ++++---
>  drivers/virtio/virtio_pci_modern_dev.c | 13 ++++++-------
>  drivers/virtio/virtio_vdpa.c           |  2 +-
>  include/linux/virtio.h                 |  5 +++--
>  include/linux/virtio_config.h          | 22 +++++++++++-----------
>  include/linux/virtio_features.h        | 23 +++++++++++++++++++++++
>  include/linux/virtio_pci_modern.h      | 11 ++++++++---
>  10 files changed, 65 insertions(+), 36 deletions(-)
>  create mode 100644 include/linux/virtio_features.h
> 
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index 95d5d7993e5b1..542735d3a12ba 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -272,9 +272,9 @@ static int virtio_dev_probe(struct device *_d)
>  	int err, i;
>  	struct virtio_device *dev = dev_to_virtio(_d);
>  	struct virtio_driver *drv = drv_to_virtio(dev->dev.driver);
> -	u64 device_features;
> -	u64 driver_features;
> -	u64 driver_features_legacy;
> +	virtio_features_t device_features;
> +	virtio_features_t driver_features;
> +	virtio_features_t driver_features_legacy;
>  
>  	/* We have a driver! */
>  	virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER);


> @@ -286,8 +286,8 @@ static int virtio_dev_probe(struct device *_d)
>  	driver_features = 0;
>  	for (i = 0; i < drv->feature_table_size; i++) {
>  		unsigned int f = drv->feature_table[i];
> -		BUG_ON(f >= 64);
> -		driver_features |= (1ULL << f);
> +		BUG_ON(f >= VIRTIO_FEATURES_MAX);
> +		driver_features |= VIRTIO_BIT(f);
>  	}
>  
>  	/* Some drivers have a separate feature table for virtio v1.0 */
> @@ -320,7 +320,7 @@ static int virtio_dev_probe(struct device *_d)
>  		goto err;
>  
>  	if (drv->validate) {
> -		u64 features = dev->features;
> +		virtio_features_t features = dev->features;
>  
>  		err = drv->validate(dev);
>  		if (err)
> diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> index 5d78c2d572abf..158c47ac67de7 100644
> --- a/drivers/virtio/virtio_mmio.c
> +++ b/drivers/virtio/virtio_mmio.c
> @@ -106,10 +106,10 @@ struct virtio_mmio_vq_info {
>  
>  /* Configuration interface */
>  
> -static u64 vm_get_features(struct virtio_device *vdev)
> +static virtio_features_t vm_get_features(struct virtio_device *vdev)
>  {
>  	struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
> -	u64 features;
> +	virtio_features_t features;
>  
>  	writel(1, vm_dev->base + VIRTIO_MMIO_DEVICE_FEATURES_SEL);
>  	features = readl(vm_dev->base + VIRTIO_MMIO_DEVICE_FEATURES);
> diff --git a/drivers/virtio/virtio_pci_legacy.c b/drivers/virtio/virtio_pci_legacy.c
> index d9cbb02b35a11..b2fbc74f74b5c 100644
> --- a/drivers/virtio/virtio_pci_legacy.c
> +++ b/drivers/virtio/virtio_pci_legacy.c
> @@ -18,7 +18,7 @@
>  #include "virtio_pci_common.h"
>  
>  /* virtio config->get_features() implementation */
> -static u64 vp_get_features(struct virtio_device *vdev)
> +static virtio_features_t vp_get_features(struct virtio_device *vdev)
>  {
>  	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>  
> diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
> index d50fe030d8253..c3e0ddc7ae9ab 100644
> --- a/drivers/virtio/virtio_pci_modern.c
> +++ b/drivers/virtio/virtio_pci_modern.c
> @@ -22,7 +22,7 @@
>  
>  #define VIRTIO_AVQ_SGS_MAX	4
>  
> -static u64 vp_get_features(struct virtio_device *vdev)
> +static virtio_features_t vp_get_features(struct virtio_device *vdev)
>  {
>  	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>  
> @@ -353,7 +353,8 @@ static void vp_modern_avq_cleanup(struct virtio_device *vdev)
>  	}
>  }
>  
> -static void vp_transport_features(struct virtio_device *vdev, u64 features)
> +static void vp_transport_features(struct virtio_device *vdev,
> +				  virtio_features_t features)
>  {
>  	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>  	struct pci_dev *pci_dev = vp_dev->pci_dev;
> @@ -409,7 +410,7 @@ static int vp_check_common_size(struct virtio_device *vdev)
>  static int vp_finalize_features(struct virtio_device *vdev)
>  {
>  	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> -	u64 features = vdev->features;
> +	virtio_features_t features = vdev->features;
>  
>  	/* Give virtio_ring a chance to accept features. */
>  	vring_transport_features(vdev);
> diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
> index 0d3dbfaf4b236..1d34655f6b658 100644
> --- a/drivers/virtio/virtio_pci_modern_dev.c
> +++ b/drivers/virtio/virtio_pci_modern_dev.c
> @@ -393,11 +393,10 @@ EXPORT_SYMBOL_GPL(vp_modern_remove);
>   *
>   * Returns the features read from the device
>   */
> -u64 vp_modern_get_features(struct virtio_pci_modern_device *mdev)
> +virtio_features_t vp_modern_get_features(struct virtio_pci_modern_device *mdev)
>  {
>  	struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
> -
> -	u64 features;
> +	virtio_features_t features;
>  
>  	vp_iowrite32(0, &cfg->device_feature_select);
>  	features = vp_ioread32(&cfg->device_feature);
> @@ -414,11 +413,11 @@ EXPORT_SYMBOL_GPL(vp_modern_get_features);
>   *
>   * Returns the driver features read from the device
>   */
> -u64 vp_modern_get_driver_features(struct virtio_pci_modern_device *mdev)
> +virtio_features_t
> +vp_modern_get_driver_features(struct virtio_pci_modern_device *mdev)
>  {
>  	struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
> -
> -	u64 features;
> +	virtio_features_t features;
>  
>  	vp_iowrite32(0, &cfg->guest_feature_select);
>  	features = vp_ioread32(&cfg->guest_feature);
> @@ -435,7 +434,7 @@ EXPORT_SYMBOL_GPL(vp_modern_get_driver_features);
>   * @features: the features set to device
>   */
>  void vp_modern_set_features(struct virtio_pci_modern_device *mdev,
> -			    u64 features)
> +			    virtio_features_t features)
>  {
>  	struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
>  
> diff --git a/drivers/virtio/virtio_vdpa.c b/drivers/virtio/virtio_vdpa.c
> index 1f60c9d5cb181..b92749174885e 100644
> --- a/drivers/virtio/virtio_vdpa.c
> +++ b/drivers/virtio/virtio_vdpa.c
> @@ -409,7 +409,7 @@ static int virtio_vdpa_find_vqs(struct virtio_device *vdev, unsigned int nvqs,
>  	return err;
>  }
>  
> -static u64 virtio_vdpa_get_features(struct virtio_device *vdev)
> +static virtio_features_t virtio_vdpa_get_features(struct virtio_device *vdev)
>  {
>  	struct vdpa_device *vdpa = vd_get_vdpa(vdev);
>  	const struct vdpa_config_ops *ops = vdpa->config;
> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> index 64cb4b04be7ad..6e51400d04635 100644
> --- a/include/linux/virtio.h
> +++ b/include/linux/virtio.h
> @@ -11,6 +11,7 @@
>  #include <linux/gfp.h>
>  #include <linux/dma-mapping.h>
>  #include <linux/completion.h>
> +#include <linux/virtio_features.h>
>  
>  /**
>   * struct virtqueue - a queue to register buffers for sending or receiving.
> @@ -159,11 +160,11 @@ struct virtio_device {
>  	const struct virtio_config_ops *config;
>  	const struct vringh_config_ops *vringh_config;
>  	struct list_head vqs;
> -	u64 features;
> +	virtio_features_t features;
>  	void *priv;
>  #ifdef CONFIG_VIRTIO_DEBUG
>  	struct dentry *debugfs_dir;
> -	u64 debugfs_filter_features;
> +	virtio_features_t debugfs_filter_features;
>  #endif
>  };
>  
> diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> index 169c7d367facb..bff57f675fca7 100644
> --- a/include/linux/virtio_config.h
> +++ b/include/linux/virtio_config.h
> @@ -77,7 +77,7 @@ struct virtqueue_info {
>   *      vdev: the virtio_device
>   * @get_features: get the array of feature bits for this device.
>   *	vdev: the virtio_device
> - *	Returns the first 64 feature bits (all we currently need).
> + *	Returns the first VIRTIO_FEATURES_MAX feature bits (all we currently need).
>   * @finalize_features: confirm what device features we'll be using.
>   *	vdev: the virtio_device
>   *	This sends the driver feature bits to the device: it can change
> @@ -120,7 +120,7 @@ struct virtio_config_ops {
>  			struct irq_affinity *desc);
>  	void (*del_vqs)(struct virtio_device *);
>  	void (*synchronize_cbs)(struct virtio_device *);
> -	u64 (*get_features)(struct virtio_device *vdev);
> +	virtio_features_t (*get_features)(struct virtio_device *vdev);
>  	int (*finalize_features)(struct virtio_device *vdev);
>  	const char *(*bus_name)(struct virtio_device *vdev);
>  	int (*set_vq_affinity)(struct virtqueue *vq,
> @@ -149,11 +149,11 @@ static inline bool __virtio_test_bit(const struct virtio_device *vdev,
>  {
>  	/* Did you forget to fix assumptions on max features? */
>  	if (__builtin_constant_p(fbit))
> -		BUILD_BUG_ON(fbit >= 64);
> +		BUILD_BUG_ON(fbit >= VIRTIO_FEATURES_MAX);
>  	else
> -		BUG_ON(fbit >= 64);
> +		BUG_ON(fbit >= VIRTIO_FEATURES_MAX);
>  
> -	return vdev->features & BIT_ULL(fbit);
> +	return vdev->features & VIRTIO_BIT(fbit);
>  }
>  
>  /**
> @@ -166,11 +166,11 @@ static inline void __virtio_set_bit(struct virtio_device *vdev,
>  {
>  	/* Did you forget to fix assumptions on max features? */
>  	if (__builtin_constant_p(fbit))
> -		BUILD_BUG_ON(fbit >= 64);
> +		BUILD_BUG_ON(fbit >= VIRTIO_FEATURES_MAX);
>  	else
> -		BUG_ON(fbit >= 64);
> +		BUG_ON(fbit >= VIRTIO_FEATURES_MAX);
>  
> -	vdev->features |= BIT_ULL(fbit);
> +	vdev->features |= VIRTIO_BIT(fbit);
>  }
>  
>  /**
> @@ -183,11 +183,11 @@ static inline void __virtio_clear_bit(struct virtio_device *vdev,
>  {
>  	/* Did you forget to fix assumptions on max features? */
>  	if (__builtin_constant_p(fbit))
> -		BUILD_BUG_ON(fbit >= 64);
> +		BUILD_BUG_ON(fbit >= VIRTIO_FEATURES_MAX);
>  	else
> -		BUG_ON(fbit >= 64);
> +		BUG_ON(fbit >= VIRTIO_FEATURES_MAX);
>  
> -	vdev->features &= ~BIT_ULL(fbit);
> +	vdev->features &= ~VIRTIO_BIT(fbit);
>  }
>  
>  /**
> diff --git a/include/linux/virtio_features.h b/include/linux/virtio_features.h
> new file mode 100644
> index 0000000000000..2f742eeb45a29
> --- /dev/null
> +++ b/include/linux/virtio_features.h
> @@ -0,0 +1,23 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _LINUX_VIRTIO_FEATURES_H
> +#define _LINUX_VIRTIO_FEATURES_H
> +
> +#include <linux/bits.h>
> +
> +#if IS_ENABLED(CONFIG_ARCH_SUPPORTS_INT128)
> +#define VIRTIO_HAS_EXTENDED_FEATURES
> +#define VIRTIO_FEATURES_MAX	128
> +#define VIRTIO_FEATURES_WORDS	4
> +#define VIRTIO_BIT(b)		_BIT128(b)
> +
> +typedef __uint128_t		virtio_features_t;

Since we are doing it anyway, what about __bitwise ?

We used to have a lot of bugs where people would stick
a bit number where BIT_ULL would be appropriate. These are all
fixed by now, I presume, but sounds like a good thing to have?
I think the changes would be localized to this patch, since
everyone should be going through these macros now.
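
To illustrate the idea, a rough userspace sketch of a sparse-checked
feature type follows. All names here (BITWISE_ANNOT, FORCE_ANNOT,
vfeatures_t, VFEATURE_BIT, vfeatures_test) are made up for the example
and are not from the patch; the annotations stand in for the kernel's
__bitwise and __force and only take effect when the code is run through
sparse.

```c
#include <stdint.h>

/* The annotations exist only under sparse; a normal compiler sees nothing. */
#ifdef __CHECKER__
#define BITWISE_ANNOT	__attribute__((bitwise))
#define FORCE_ANNOT	__attribute__((force))
#else
#define BITWISE_ANNOT
#define FORCE_ANNOT
#endif

/* Hypothetical feature-mask type: sparse refuses to mix it with plain ints. */
typedef uint64_t BITWISE_ANNOT vfeatures_t;

/* The only sanctioned way to turn a bit number into a mask. */
#define VFEATURE_BIT(b)	((FORCE_ANNOT vfeatures_t)(1ULL << (b)))

/* Returns non-zero if feature bit 'bit' is set in 'features'. */
static inline int vfeatures_test(vfeatures_t features, unsigned int bit)
{
	return (features & VFEATURE_BIT(bit)) != 0;
}
```

With something along these lines, code that ORs a bare bit number into
a mask, e.g. "features |= 5", would be flagged by sparse, while
"features |= VFEATURE_BIT(5)" stays clean.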


> +
> +#else
> +#define VIRTIO_FEATURES_MAX	64
> +#define VIRTIO_FEATURES_WORDS	2
> +#define VIRTIO_BIT(b)		BIT_ULL(b)

Hmm. We have
#define BIT_ULL(nr)             (ULL(1) << (nr))
So this is undefined behaviour if given bit > 63.


How about

(b > 63 ? 0 : BIT_ULL(b))


I think this will automatically make most code correct
on these platforms.

Some ceremony with a temp variable or an inline function
might be good here to avoid evaluating b twice.
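
As a minimal sketch of the inline-function variant, assuming the
64-bit fallback case: virtio_bit_safe() is a hypothetical name, and the
typedef and limit below are redefined locally only to keep the example
standalone.

```c
#include <stdint.h>

/* Fallback case: no 128-bit integers, so only 64 feature bits exist. */
#define VIRTIO_FEATURES_MAX	64

typedef uint64_t virtio_features_t;

/*
 * Hypothetical helper: the argument is evaluated exactly once, and a
 * bit number outside the supported range yields an empty mask instead
 * of an out-of-range shift.
 */
static inline virtio_features_t virtio_bit_safe(unsigned int b)
{
	return b < VIRTIO_FEATURES_MAX ? (virtio_features_t)1 << b : 0;
}
```

A test such as "features & virtio_bit_safe(65)" then simply evaluates
to 0 on these platforms, without any special casing at the call site.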


> +
> +typedef u64			virtio_features_t;
> +#endif
> +
> +#endif
> diff --git a/include/linux/virtio_pci_modern.h b/include/linux/virtio_pci_modern.h
> index c0b1b1ca11635..e55fbb272b4d3 100644
> --- a/include/linux/virtio_pci_modern.h
> +++ b/include/linux/virtio_pci_modern.h
> @@ -3,6 +3,7 @@
>  #define _LINUX_VIRTIO_PCI_MODERN_H
>  
>  #include <linux/pci.h>
> +#include <linux/virtio_features.h>
>  #include <linux/virtio_pci.h>
>  
>  /**
> @@ -95,10 +96,14 @@ static inline void vp_iowrite64_twopart(u64 val,
>  	vp_iowrite32(val >> 32, hi);
>  }
>  
> -u64 vp_modern_get_features(struct virtio_pci_modern_device *mdev);
> -u64 vp_modern_get_driver_features(struct virtio_pci_modern_device *mdev);
> +virtio_features_t
> +vp_modern_get_features(struct virtio_pci_modern_device *mdev);
> +
> +virtio_features_t
> +vp_modern_get_driver_features(struct virtio_pci_modern_device *mdev);
> +
>  void vp_modern_set_features(struct virtio_pci_modern_device *mdev,
> -		     u64 features);
> +		     virtio_features_t features);
>  u32 vp_modern_generation(struct virtio_pci_modern_device *mdev);
>  u8 vp_modern_get_status(struct virtio_pci_modern_device *mdev);
>  void vp_modern_set_status(struct virtio_pci_modern_device *mdev,
> -- 
> 2.49.0



* Re: [PATCH net-next 8/8] vhost/net: enable gso over UDP tunnel support.
  2025-05-21 10:32 ` [PATCH net-next 8/8] vhost/net: " Paolo Abeni
@ 2025-05-22  6:43   ` kernel test robot
  2025-05-23 19:54     ` Michael S. Tsirkin
  2025-05-26  4:40   ` Jason Wang
  1 sibling, 1 reply; 59+ messages in thread
From: kernel test robot @ 2025-05-22  6:43 UTC (permalink / raw)
  To: Paolo Abeni, netdev
  Cc: llvm, oe-kbuild-all, Willem de Bruijn, Jason Wang, Andrew Lunn,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez

Hi Paolo,

kernel test robot noticed the following build warnings:

[auto build test WARNING on net-next/main]

url:    https://github.com/intel-lab-lkp/linux/commits/Paolo-Abeni/virtio-introduce-virtio_features_t/20250521-183700
base:   net-next/main
patch link:    https://lore.kernel.org/r/f95716aed2c65d079cdb10518431088f3e103899.1747822866.git.pabeni%40redhat.com
patch subject: [PATCH net-next 8/8] vhost/net: enable gso over UDP tunnel support.
config: i386-buildonly-randconfig-001-20250522 (https://download.01.org/0day-ci/archive/20250522/202505221428.67HNn025-lkp@intel.com/config)
compiler: clang version 20.1.2 (https://github.com/llvm/llvm-project 58df0ef89dd64126512e4ee27b4ac3fd8ddf6247)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250522/202505221428.67HNn025-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202505221428.67HNn025-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> drivers/vhost/net.c:1633:30: warning: shift count >= width of type [-Wshift-count-overflow]
    1633 |         has_tunnel = !!(features & (VIRTIO_BIT(VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO) |
         |                                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/virtio_features.h:18:24: note: expanded from macro 'VIRTIO_BIT'
      18 | #define VIRTIO_BIT(b)           BIT_ULL(b)
         |                                 ^~~~~~~~~~
   include/vdso/bits.h:8:30: note: expanded from macro 'BIT_ULL'
       8 | #define BIT_ULL(nr)             (ULL(1) << (nr))
         |                                         ^  ~~~~
   drivers/vhost/net.c:1634:9: warning: shift count >= width of type [-Wshift-count-overflow]
    1634 |                                     VIRTIO_BIT(VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO)));
         |                                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/virtio_features.h:18:24: note: expanded from macro 'VIRTIO_BIT'
      18 | #define VIRTIO_BIT(b)           BIT_ULL(b)
         |                                 ^~~~~~~~~~
   include/vdso/bits.h:8:30: note: expanded from macro 'BIT_ULL'
       8 | #define BIT_ULL(nr)             (ULL(1) << (nr))
         |                                         ^  ~~~~
   2 warnings generated.


vim +1633 drivers/vhost/net.c

  1622	
  1623	static int vhost_net_set_features(struct vhost_net *n, virtio_features_t features)
  1624	{
  1625		size_t vhost_hlen, sock_hlen, hdr_len;
  1626		bool has_tunnel;
  1627		int i;
  1628	
  1629		hdr_len = (features & ((1ULL << VIRTIO_NET_F_MRG_RXBUF) |
  1630				       (1ULL << VIRTIO_F_VERSION_1))) ?
  1631				sizeof(struct virtio_net_hdr_mrg_rxbuf) :
  1632				sizeof(struct virtio_net_hdr);
> 1633		has_tunnel = !!(features & (VIRTIO_BIT(VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO) |
  1634					    VIRTIO_BIT(VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO)));
  1635		hdr_len += has_tunnel ? sizeof(struct virtio_net_hdr_tunnel) : 0;
  1636		if (features & (1 << VHOST_NET_F_VIRTIO_NET_HDR)) {
  1637			/* vhost provides vnet_hdr */
  1638			vhost_hlen = hdr_len;
  1639			sock_hlen = 0;
  1640		} else {
  1641			/* socket provides vnet_hdr */
  1642			vhost_hlen = 0;
  1643			sock_hlen = hdr_len;
  1644		}
  1645		mutex_lock(&n->dev.mutex);
  1646		if ((features & (1 << VHOST_F_LOG_ALL)) &&
  1647		    !vhost_log_access_ok(&n->dev))
  1648			goto out_unlock;
  1649	
  1650		if ((features & (1ULL << VIRTIO_F_ACCESS_PLATFORM))) {
  1651			if (vhost_init_device_iotlb(&n->dev))
  1652				goto out_unlock;
  1653		}
  1654	
  1655		for (i = 0; i < VHOST_NET_VQ_MAX; ++i) {
  1656			mutex_lock(&n->vqs[i].vq.mutex);
  1657			n->vqs[i].vq.acked_features = features;
  1658			n->vqs[i].vhost_hlen = vhost_hlen;
  1659			n->vqs[i].sock_hlen = sock_hlen;
  1660			mutex_unlock(&n->vqs[i].vq.mutex);
  1661		}
  1662		mutex_unlock(&n->dev.mutex);
  1663		return 0;
  1664	
  1665	out_unlock:
  1666		mutex_unlock(&n->dev.mutex);
  1667		return -EFAULT;
  1668	}
  1669	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* Re: [PATCH net-next 1/8] virtio: introduce virtio_features_t
  2025-05-21 16:02   ` Michael S. Tsirkin
@ 2025-05-22  7:29     ` Paolo Abeni
  2025-05-22 15:26       ` Paolo Abeni
  0 siblings, 1 reply; 59+ messages in thread
From: Paolo Abeni @ 2025-05-22  7:29 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, Willem de Bruijn, Jason Wang, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Xuan Zhuo,
	Eugenio Pérez

On 5/21/25 6:02 PM, Michael S. Tsirkin wrote:
> On Wed, May 21, 2025 at 12:32:35PM +0200, Paolo Abeni wrote:
>> The virtio specifications allows for up to 128 bits for the
>> device features. Soon we are going to use some of the 'extended'
>> bits features (above 64) for the virtio_net driver.
>>
>> Introduce an specific type to represent the virtio features bitmask.
>> On platform where 128 bits integer are available use such wide int
>> for the features bitmask, otherwise maintain the current u64.
>>
>> Updates all the relevant virtio API to use the new type.
>>
>> Note that legacy and transport features don't need any change, as
>> they are always in the low 64 bit range.
>>
>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>> ---
>>  drivers/virtio/virtio.c                | 12 ++++++------
>>  drivers/virtio/virtio_mmio.c           |  4 ++--
>>  drivers/virtio/virtio_pci_legacy.c     |  2 +-
>>  drivers/virtio/virtio_pci_modern.c     |  7 ++++---
>>  drivers/virtio/virtio_pci_modern_dev.c | 13 ++++++-------
>>  drivers/virtio/virtio_vdpa.c           |  2 +-
>>  include/linux/virtio.h                 |  5 +++--
>>  include/linux/virtio_config.h          | 22 +++++++++++-----------
>>  include/linux/virtio_features.h        | 23 +++++++++++++++++++++++
>>  include/linux/virtio_pci_modern.h      | 11 ++++++++---
>>  10 files changed, 65 insertions(+), 36 deletions(-)
>>  create mode 100644 include/linux/virtio_features.h
>>
>> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
>> index 95d5d7993e5b1..542735d3a12ba 100644
>> --- a/drivers/virtio/virtio.c
>> +++ b/drivers/virtio/virtio.c
>> @@ -272,9 +272,9 @@ static int virtio_dev_probe(struct device *_d)
>>  	int err, i;
>>  	struct virtio_device *dev = dev_to_virtio(_d);
>>  	struct virtio_driver *drv = drv_to_virtio(dev->dev.driver);
>> -	u64 device_features;
>> -	u64 driver_features;
>> -	u64 driver_features_legacy;
>> +	virtio_features_t device_features;
>> +	virtio_features_t driver_features;
>> +	virtio_features_t driver_features_legacy;
>>  
>>  	/* We have a driver! */
>>  	virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER);
> 
> 
>> @@ -286,8 +286,8 @@ static int virtio_dev_probe(struct device *_d)
>>  	driver_features = 0;
>>  	for (i = 0; i < drv->feature_table_size; i++) {
>>  		unsigned int f = drv->feature_table[i];
>> -		BUG_ON(f >= 64);
>> -		driver_features |= (1ULL << f);
>> +		BUG_ON(f >= VIRTIO_FEATURES_MAX);
>> +		driver_features |= VIRTIO_BIT(f);
>>  	}
>>  
>>  	/* Some drivers have a separate feature table for virtio v1.0 */
>> @@ -320,7 +320,7 @@ static int virtio_dev_probe(struct device *_d)
>>  		goto err;
>>  
>>  	if (drv->validate) {
>> -		u64 features = dev->features;
>> +		virtio_features_t features = dev->features;
>>  
>>  		err = drv->validate(dev);
>>  		if (err)
>> diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
>> index 5d78c2d572abf..158c47ac67de7 100644
>> --- a/drivers/virtio/virtio_mmio.c
>> +++ b/drivers/virtio/virtio_mmio.c
>> @@ -106,10 +106,10 @@ struct virtio_mmio_vq_info {
>>  
>>  /* Configuration interface */
>>  
>> -static u64 vm_get_features(struct virtio_device *vdev)
>> +static virtio_features_t vm_get_features(struct virtio_device *vdev)
>>  {
>>  	struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
>> -	u64 features;
>> +	virtio_features_t features;
>>  
>>  	writel(1, vm_dev->base + VIRTIO_MMIO_DEVICE_FEATURES_SEL);
>>  	features = readl(vm_dev->base + VIRTIO_MMIO_DEVICE_FEATURES);
>> diff --git a/drivers/virtio/virtio_pci_legacy.c b/drivers/virtio/virtio_pci_legacy.c
>> index d9cbb02b35a11..b2fbc74f74b5c 100644
>> --- a/drivers/virtio/virtio_pci_legacy.c
>> +++ b/drivers/virtio/virtio_pci_legacy.c
>> @@ -18,7 +18,7 @@
>>  #include "virtio_pci_common.h"
>>  
>>  /* virtio config->get_features() implementation */
>> -static u64 vp_get_features(struct virtio_device *vdev)
>> +static virtio_features_t vp_get_features(struct virtio_device *vdev)
>>  {
>>  	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>>  
>> diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
>> index d50fe030d8253..c3e0ddc7ae9ab 100644
>> --- a/drivers/virtio/virtio_pci_modern.c
>> +++ b/drivers/virtio/virtio_pci_modern.c
>> @@ -22,7 +22,7 @@
>>  
>>  #define VIRTIO_AVQ_SGS_MAX	4
>>  
>> -static u64 vp_get_features(struct virtio_device *vdev)
>> +static virtio_features_t vp_get_features(struct virtio_device *vdev)
>>  {
>>  	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>>  
>> @@ -353,7 +353,8 @@ static void vp_modern_avq_cleanup(struct virtio_device *vdev)
>>  	}
>>  }
>>  
>> -static void vp_transport_features(struct virtio_device *vdev, u64 features)
>> +static void vp_transport_features(struct virtio_device *vdev,
>> +				  virtio_features_t features)
>>  {
>>  	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>>  	struct pci_dev *pci_dev = vp_dev->pci_dev;
>> @@ -409,7 +410,7 @@ static int vp_check_common_size(struct virtio_device *vdev)
>>  static int vp_finalize_features(struct virtio_device *vdev)
>>  {
>>  	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>> -	u64 features = vdev->features;
>> +	virtio_features_t features = vdev->features;
>>  
>>  	/* Give virtio_ring a chance to accept features. */
>>  	vring_transport_features(vdev);
>> diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
>> index 0d3dbfaf4b236..1d34655f6b658 100644
>> --- a/drivers/virtio/virtio_pci_modern_dev.c
>> +++ b/drivers/virtio/virtio_pci_modern_dev.c
>> @@ -393,11 +393,10 @@ EXPORT_SYMBOL_GPL(vp_modern_remove);
>>   *
>>   * Returns the features read from the device
>>   */
>> -u64 vp_modern_get_features(struct virtio_pci_modern_device *mdev)
>> +virtio_features_t vp_modern_get_features(struct virtio_pci_modern_device *mdev)
>>  {
>>  	struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
>> -
>> -	u64 features;
>> +	virtio_features_t features;
>>  
>>  	vp_iowrite32(0, &cfg->device_feature_select);
>>  	features = vp_ioread32(&cfg->device_feature);
>> @@ -414,11 +413,11 @@ EXPORT_SYMBOL_GPL(vp_modern_get_features);
>>   *
>>   * Returns the driver features read from the device
>>   */
>> -u64 vp_modern_get_driver_features(struct virtio_pci_modern_device *mdev)
>> +virtio_features_t
>> +vp_modern_get_driver_features(struct virtio_pci_modern_device *mdev)
>>  {
>>  	struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
>> -
>> -	u64 features;
>> +	virtio_features_t features;
>>  
>>  	vp_iowrite32(0, &cfg->guest_feature_select);
>>  	features = vp_ioread32(&cfg->guest_feature);
>> @@ -435,7 +434,7 @@ EXPORT_SYMBOL_GPL(vp_modern_get_driver_features);
>>   * @features: the features set to device
>>   */
>>  void vp_modern_set_features(struct virtio_pci_modern_device *mdev,
>> -			    u64 features)
>> +			    virtio_features_t features)
>>  {
>>  	struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
>>  
>> diff --git a/drivers/virtio/virtio_vdpa.c b/drivers/virtio/virtio_vdpa.c
>> index 1f60c9d5cb181..b92749174885e 100644
>> --- a/drivers/virtio/virtio_vdpa.c
>> +++ b/drivers/virtio/virtio_vdpa.c
>> @@ -409,7 +409,7 @@ static int virtio_vdpa_find_vqs(struct virtio_device *vdev, unsigned int nvqs,
>>  	return err;
>>  }
>>  
>> -static u64 virtio_vdpa_get_features(struct virtio_device *vdev)
>> +static virtio_features_t virtio_vdpa_get_features(struct virtio_device *vdev)
>>  {
>>  	struct vdpa_device *vdpa = vd_get_vdpa(vdev);
>>  	const struct vdpa_config_ops *ops = vdpa->config;
>> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
>> index 64cb4b04be7ad..6e51400d04635 100644
>> --- a/include/linux/virtio.h
>> +++ b/include/linux/virtio.h
>> @@ -11,6 +11,7 @@
>>  #include <linux/gfp.h>
>>  #include <linux/dma-mapping.h>
>>  #include <linux/completion.h>
>> +#include <linux/virtio_features.h>
>>  
>>  /**
>>   * struct virtqueue - a queue to register buffers for sending or receiving.
>> @@ -159,11 +160,11 @@ struct virtio_device {
>>  	const struct virtio_config_ops *config;
>>  	const struct vringh_config_ops *vringh_config;
>>  	struct list_head vqs;
>> -	u64 features;
>> +	virtio_features_t features;
>>  	void *priv;
>>  #ifdef CONFIG_VIRTIO_DEBUG
>>  	struct dentry *debugfs_dir;
>> -	u64 debugfs_filter_features;
>> +	virtio_features_t debugfs_filter_features;
>>  #endif
>>  };
>>  
>> diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
>> index 169c7d367facb..bff57f675fca7 100644
>> --- a/include/linux/virtio_config.h
>> +++ b/include/linux/virtio_config.h
>> @@ -77,7 +77,7 @@ struct virtqueue_info {
>>   *      vdev: the virtio_device
>>   * @get_features: get the array of feature bits for this device.
>>   *	vdev: the virtio_device
>> - *	Returns the first 64 feature bits (all we currently need).
>> + *	Returns the first VIRTIO_FEATURES_MAX feature bits (all we currently need).
>>   * @finalize_features: confirm what device features we'll be using.
>>   *	vdev: the virtio_device
>>   *	This sends the driver feature bits to the device: it can change
>> @@ -120,7 +120,7 @@ struct virtio_config_ops {
>>  			struct irq_affinity *desc);
>>  	void (*del_vqs)(struct virtio_device *);
>>  	void (*synchronize_cbs)(struct virtio_device *);
>> -	u64 (*get_features)(struct virtio_device *vdev);
>> +	virtio_features_t (*get_features)(struct virtio_device *vdev);
>>  	int (*finalize_features)(struct virtio_device *vdev);
>>  	const char *(*bus_name)(struct virtio_device *vdev);
>>  	int (*set_vq_affinity)(struct virtqueue *vq,
>> @@ -149,11 +149,11 @@ static inline bool __virtio_test_bit(const struct virtio_device *vdev,
>>  {
>>  	/* Did you forget to fix assumptions on max features? */
>>  	if (__builtin_constant_p(fbit))
>> -		BUILD_BUG_ON(fbit >= 64);
>> +		BUILD_BUG_ON(fbit >= VIRTIO_FEATURES_MAX);
>>  	else
>> -		BUG_ON(fbit >= 64);
>> +		BUG_ON(fbit >= VIRTIO_FEATURES_MAX);
>>  
>> -	return vdev->features & BIT_ULL(fbit);
>> +	return vdev->features & VIRTIO_BIT(fbit);
>>  }
>>  
>>  /**
>> @@ -166,11 +166,11 @@ static inline void __virtio_set_bit(struct virtio_device *vdev,
>>  {
>>  	/* Did you forget to fix assumptions on max features? */
>>  	if (__builtin_constant_p(fbit))
>> -		BUILD_BUG_ON(fbit >= 64);
>> +		BUILD_BUG_ON(fbit >= VIRTIO_FEATURES_MAX);
>>  	else
>> -		BUG_ON(fbit >= 64);
>> +		BUG_ON(fbit >= VIRTIO_FEATURES_MAX);
>>  
>> -	vdev->features |= BIT_ULL(fbit);
>> +	vdev->features |= VIRTIO_BIT(fbit);
>>  }
>>  
>>  /**
>> @@ -183,11 +183,11 @@ static inline void __virtio_clear_bit(struct virtio_device *vdev,
>>  {
>>  	/* Did you forget to fix assumptions on max features? */
>>  	if (__builtin_constant_p(fbit))
>> -		BUILD_BUG_ON(fbit >= 64);
>> +		BUILD_BUG_ON(fbit >= VIRTIO_FEATURES_MAX);
>>  	else
>> -		BUG_ON(fbit >= 64);
>> +		BUG_ON(fbit >= VIRTIO_FEATURES_MAX);
>>  
>> -	vdev->features &= ~BIT_ULL(fbit);
>> +	vdev->features &= ~VIRTIO_BIT(fbit);
>>  }
>>  
>>  /**
>> diff --git a/include/linux/virtio_features.h b/include/linux/virtio_features.h
>> new file mode 100644
>> index 0000000000000..2f742eeb45a29
>> --- /dev/null
>> +++ b/include/linux/virtio_features.h
>> @@ -0,0 +1,23 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +#ifndef _LINUX_VIRTIO_FEATURES_H
>> +#define _LINUX_VIRTIO_FEATURES_H
>> +
>> +#include <linux/bits.h>
>> +
>> +#if IS_ENABLED(CONFIG_ARCH_SUPPORTS_INT128)
>> +#define VIRTIO_HAS_EXTENDED_FEATURES
>> +#define VIRTIO_FEATURES_MAX	128
>> +#define VIRTIO_FEATURES_WORDS	4
>> +#define VIRTIO_BIT(b)		_BIT128(b)
>> +
>> +typedef __uint128_t		virtio_features_t;
> 
> Since we are doing it anyway, what about __bitwise ?

Yep, I will add it in the next revision.

>> +
>> +#else
>> +#define VIRTIO_FEATURES_MAX	64
>> +#define VIRTIO_FEATURES_WORDS	2
>> +#define VIRTIO_BIT(b)		BIT_ULL(b)
> 
> Hmm. We have
> #define BIT_ULL(nr)             (ULL(1) << (nr))
> So this is undefined behaviour if given bit > 63.
> 
> 
> How about 
> 
> ((b) > 63 ? 0 : BIT_ULL(b))
> 
> 
> I think this will automatically make most code correct
> on these platforms.

Sounds good. Will add it in the next revision.
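
For illustration only, a minimal userspace sketch of the guarded macro
suggested above. FEATURE_BIT64() and feature_test64() are hypothetical
stand-ins, not names from the series; the point is just that an
out-of-range bit yields 0 instead of an undefined shift:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the 64-bit fallback of VIRTIO_BIT():
 * shifting a 64-bit value by 64 or more is undefined behaviour, so
 * out-of-range bits collapse to 0 instead.
 */
#define FEATURE_BIT64(b)	((b) > 63 ? 0ULL : 1ULL << (b))

/* Test a single feature bit in a 64-bit feature word. */
static inline int feature_test64(uint64_t features, unsigned int fbit)
{
	return (features & FEATURE_BIT64(fbit)) != 0;
}

/* The highest valid bit is still reachable. */
static_assert(FEATURE_BIT64(63) == 0x8000000000000000ULL,
	      "bit 63 must map to the top bit");
```

With such a guard, testing bit 65 on a 64-bit-only build simply reports
the feature as absent, which is the behaviour the suggestion is after.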

BTW I'm wondering if sharing a pull request from a stable tree would be
better, so that you could pull this also in the virtio/vhost tree and
avoid conflicts in the later pull to Linus.

Thanks,

Paolo



* Re: [PATCH net-next 1/8] virtio: introduce virtio_features_t
  2025-05-21 10:32 ` [PATCH net-next 1/8] virtio: introduce virtio_features_t Paolo Abeni
  2025-05-21 16:02   ` Michael S. Tsirkin
@ 2025-05-22  8:17   ` kernel test robot
  2025-05-26  0:43   ` Jason Wang
  2 siblings, 0 replies; 59+ messages in thread
From: kernel test robot @ 2025-05-22  8:17 UTC (permalink / raw)
  To: Paolo Abeni, netdev
  Cc: llvm, oe-kbuild-all, Willem de Bruijn, Jason Wang, Andrew Lunn,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez

Hi Paolo,

kernel test robot noticed the following build errors:

[auto build test ERROR on net-next/main]

url:    https://github.com/intel-lab-lkp/linux/commits/Paolo-Abeni/virtio-introduce-virtio_features_t/20250521-183700
base:   net-next/main
patch link:    https://lore.kernel.org/r/9a1c198245370c3ec403f14d118cd841df0fcfee.1747822866.git.pabeni%40redhat.com
patch subject: [PATCH net-next 1/8] virtio: introduce virtio_features_t
config: x86_64-buildonly-randconfig-001-20250522 (https://download.01.org/0day-ci/archive/20250522/202505221621.MhvgnFni-lkp@intel.com/config)
compiler: clang version 20.1.2 (https://github.com/llvm/llvm-project 58df0ef89dd64126512e4ee27b4ac3fd8ddf6247)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250522/202505221621.MhvgnFni-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202505221621.MhvgnFni-lkp@intel.com/

All errors (new ones prefixed by >>):

>> drivers/remoteproc/remoteproc_virtio.c:328:18: error: incompatible function pointer types initializing 'virtio_features_t (*)(struct virtio_device *)' (aka 'unsigned __int128 (*)(struct virtio_device *)') with an expression of type 'u64 (struct virtio_device *)' (aka 'unsigned long long (struct virtio_device *)') [-Wincompatible-function-pointer-types]
     328 |         .get_features   = rproc_virtio_get_features,
         |                           ^~~~~~~~~~~~~~~~~~~~~~~~~
   1 error generated.


vim +328 drivers/remoteproc/remoteproc_virtio.c

ac8954a413930d drivers/remoteproc/remoteproc_rpmsg.c  Ohad Ben-Cohen    2011-10-20  326  
9350393239153c drivers/remoteproc/remoteproc_virtio.c Stephen Hemminger 2013-02-10  327  static const struct virtio_config_ops rproc_virtio_config_ops = {
ac8954a413930d drivers/remoteproc/remoteproc_rpmsg.c  Ohad Ben-Cohen    2011-10-20 @328  	.get_features	= rproc_virtio_get_features,
ac8954a413930d drivers/remoteproc/remoteproc_rpmsg.c  Ohad Ben-Cohen    2011-10-20  329  	.finalize_features = rproc_virtio_finalize_features,
b49503eaf9c74c drivers/remoteproc/remoteproc_virtio.c Jiri Pirko        2024-07-08  330  	.find_vqs	= rproc_virtio_find_vqs,
ac8954a413930d drivers/remoteproc/remoteproc_rpmsg.c  Ohad Ben-Cohen    2011-10-20  331  	.del_vqs	= rproc_virtio_del_vqs,
ac8954a413930d drivers/remoteproc/remoteproc_rpmsg.c  Ohad Ben-Cohen    2011-10-20  332  	.reset		= rproc_virtio_reset,
ac8954a413930d drivers/remoteproc/remoteproc_rpmsg.c  Ohad Ben-Cohen    2011-10-20  333  	.set_status	= rproc_virtio_set_status,
ac8954a413930d drivers/remoteproc/remoteproc_rpmsg.c  Ohad Ben-Cohen    2011-10-20  334  	.get_status	= rproc_virtio_get_status,
92b38f851470f8 drivers/remoteproc/remoteproc_virtio.c Sjur Brændeland   2013-02-21  335  	.get		= rproc_virtio_get,
92b38f851470f8 drivers/remoteproc/remoteproc_virtio.c Sjur Brændeland   2013-02-21  336  	.set		= rproc_virtio_set,
ac8954a413930d drivers/remoteproc/remoteproc_rpmsg.c  Ohad Ben-Cohen    2011-10-20  337  };
ac8954a413930d drivers/remoteproc/remoteproc_rpmsg.c  Ohad Ben-Cohen    2011-10-20  338  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* Re: [PATCH net-next 6/8] virtio_net: enable gso over UDP tunnel support.
  2025-05-21 10:32 ` [PATCH net-next 6/8] virtio_net: enable gso over UDP tunnel support Paolo Abeni
@ 2025-05-22  8:38   ` kernel test robot
  2025-05-22 22:33   ` Willem de Bruijn
  1 sibling, 0 replies; 59+ messages in thread
From: kernel test robot @ 2025-05-22  8:38 UTC (permalink / raw)
  To: Paolo Abeni, netdev
  Cc: oe-kbuild-all, Willem de Bruijn, Jason Wang, Andrew Lunn,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez

Hi Paolo,

kernel test robot noticed the following build errors:

[auto build test ERROR on net-next/main]

url:    https://github.com/intel-lab-lkp/linux/commits/Paolo-Abeni/virtio-introduce-virtio_features_t/20250521-183700
base:   net-next/main
patch link:    https://lore.kernel.org/r/239bacdac9febd6f604f43fa8571aa2c44fd0f0b.1747822866.git.pabeni%40redhat.com
patch subject: [PATCH net-next 6/8] virtio_net: enable gso over UDP tunnel support.
config: i386-randconfig-011-20250522 (https://download.01.org/0day-ci/archive/20250522/202505221624.32GrJRU2-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250522/202505221624.32GrJRU2-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202505221624.32GrJRU2-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from <command-line>:
   In function '__virtio_test_bit',
       inlined from 'virtio_has_feature' at include/linux/virtio_config.h:204:9,
       inlined from 'virtnet_probe' at drivers/net/virtio_net.c:6805:7:
>> include/linux/compiler_types.h:557:45: error: call to '__compiletime_assert_792' declared with attribute error: BUILD_BUG_ON failed: fbit >= VIRTIO_FEATURES_MAX
     557 |         _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
         |                                             ^
   include/linux/compiler_types.h:538:25: note: in definition of macro '__compiletime_assert'
     538 |                         prefix ## suffix();                             \
         |                         ^~~~~~
   include/linux/compiler_types.h:557:9: note: in expansion of macro '_compiletime_assert'
     557 |         _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
         |         ^~~~~~~~~~~~~~~~~~~
   include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert'
      39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
         |                                     ^~~~~~~~~~~~~~~~~~
   include/linux/build_bug.h:50:9: note: in expansion of macro 'BUILD_BUG_ON_MSG'
      50 |         BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
         |         ^~~~~~~~~~~~~~~~
   include/linux/virtio_config.h:152:17: note: in expansion of macro 'BUILD_BUG_ON'
     152 |                 BUILD_BUG_ON(fbit >= VIRTIO_FEATURES_MAX);
         |                 ^~~~~~~~~~~~


vim +/__compiletime_assert_792 +557 include/linux/compiler_types.h

eb5c2d4b45e3d2 Will Deacon 2020-07-21  543  
eb5c2d4b45e3d2 Will Deacon 2020-07-21  544  #define _compiletime_assert(condition, msg, prefix, suffix) \
eb5c2d4b45e3d2 Will Deacon 2020-07-21  545  	__compiletime_assert(condition, msg, prefix, suffix)
eb5c2d4b45e3d2 Will Deacon 2020-07-21  546  
eb5c2d4b45e3d2 Will Deacon 2020-07-21  547  /**
eb5c2d4b45e3d2 Will Deacon 2020-07-21  548   * compiletime_assert - break build and emit msg if condition is false
eb5c2d4b45e3d2 Will Deacon 2020-07-21  549   * @condition: a compile-time constant condition to check
eb5c2d4b45e3d2 Will Deacon 2020-07-21  550   * @msg:       a message to emit if condition is false
eb5c2d4b45e3d2 Will Deacon 2020-07-21  551   *
eb5c2d4b45e3d2 Will Deacon 2020-07-21  552   * In tradition of POSIX assert, this macro will break the build if the
eb5c2d4b45e3d2 Will Deacon 2020-07-21  553   * supplied condition is *false*, emitting the supplied error message if the
eb5c2d4b45e3d2 Will Deacon 2020-07-21  554   * compiler has support to do so.
eb5c2d4b45e3d2 Will Deacon 2020-07-21  555   */
eb5c2d4b45e3d2 Will Deacon 2020-07-21  556  #define compiletime_assert(condition, msg) \
eb5c2d4b45e3d2 Will Deacon 2020-07-21 @557  	_compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
eb5c2d4b45e3d2 Will Deacon 2020-07-21  558  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* Re: [PATCH net-next 1/8] virtio: introduce virtio_features_t
  2025-05-22  7:29     ` Paolo Abeni
@ 2025-05-22 15:26       ` Paolo Abeni
  2025-05-23 19:50         ` Michael S. Tsirkin
  0 siblings, 1 reply; 59+ messages in thread
From: Paolo Abeni @ 2025-05-22 15:26 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, Willem de Bruijn, Jason Wang, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Xuan Zhuo,
	Eugenio Pérez

On 5/22/25 9:29 AM, Paolo Abeni wrote:
> On 5/21/25 6:02 PM, Michael S. Tsirkin wrote:
>> On Wed, May 21, 2025 at 12:32:35PM +0200, Paolo Abeni wrote:
>>> +++ b/include/linux/virtio_features.h
>>> @@ -0,0 +1,23 @@
>>> +/* SPDX-License-Identifier: GPL-2.0 */
>>> +#ifndef _LINUX_VIRTIO_FEATURES_H
>>> +#define _LINUX_VIRTIO_FEATURES_H
>>> +
>>> +#include <linux/bits.h>
>>> +
>>> +#if IS_ENABLED(CONFIG_ARCH_SUPPORTS_INT128)
>>> +#define VIRTIO_HAS_EXTENDED_FEATURES
>>> +#define VIRTIO_FEATURES_MAX	128
>>> +#define VIRTIO_FEATURES_WORDS	4
>>> +#define VIRTIO_BIT(b)		_BIT128(b)
>>> +
>>> +typedef __uint128_t		virtio_features_t;
>>
>> Since we are doing it anyway, what about __bitwise ?
> 
> Yep, I will add it in the next revision.

Uhm... this is actually problematic, as a key point of keeping the
diffstat manageable is converting only the relevant drivers to use the
extended features set, and adjusting local variables and expressions
accordingly.

The above means that in other devices a lot of code relies on extended
features being (harmlessly, because nobody is going to set the highest
bits for such features) downgraded to u64, or u64 promoted to
virtio_features_t.

The __bitwise annotation generates a warning for each of them; avoiding
those warnings requires touching the same code I wanted to leave
unmodified (and brings back a terrible diffstat).

/P


* Re: [PATCH net-next 5/8] net: implement virtio helpers to handle UDP GSO tunneling.
  2025-05-21 10:32 ` [PATCH net-next 5/8] net: implement virtio helpers to handle UDP GSO tunneling Paolo Abeni
@ 2025-05-22 22:29   ` Willem de Bruijn
  2025-05-23  6:09     ` Paolo Abeni
  2025-05-26  4:40   ` Jason Wang
  1 sibling, 1 reply; 59+ messages in thread
From: Willem de Bruijn @ 2025-05-22 22:29 UTC (permalink / raw)
  To: Paolo Abeni, netdev
  Cc: Willem de Bruijn, Jason Wang, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez

Paolo Abeni wrote:
> The virtio specification is introducing support for GSO over
> UDP tunnel.
> 
> This patch brings in the needed defines and the additional
> virtio hdr parsing/building helpers.
> 
> The UDP tunnel support uses additional fields in the virtio hdr,
> and such fields location can change depending on other negotiated
> features - specifically VIRTIO_NET_F_HASH_REPORT.
> 
> Try to be as conservative as possible with the new field validation.
> 
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>

No major concerns from me on this series. Much of the design
conversations took place earlier on the virtio list.

Maybe consider test coverage. If end-to-end testing requires qemu,
then perhaps KUnit is more suitable for testing basic to/from skb
transformations. Just a thought.

> ---
>  include/linux/virtio_net.h      | 177 ++++++++++++++++++++++++++++++--
>  include/uapi/linux/virtio_net.h |  33 ++++++
>  2 files changed, 202 insertions(+), 8 deletions(-)
> 
> diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
> index 02a9f4dc594d0..cf9c712a67cd4 100644
> --- a/include/linux/virtio_net.h
> +++ b/include/linux/virtio_net.h
> @@ -47,9 +47,9 @@ static inline int virtio_net_hdr_set_proto(struct sk_buff *skb,
>  	return 0;
>  }
>  
> +static inline int virtio_net_hdr_tnl_to_skb(struct sk_buff *skb,
> +					    const struct virtio_net_hdr *hdr,
> +					    unsigned int tnl_hdr_offset,
> +					    bool tnl_csum_negotiated,
> +					    bool little_endian)
> +{
> +	u8 gso_tunnel_type = hdr->gso_type & VIRTIO_NET_HDR_GSO_UDP_TUNNEL;
> +	unsigned int inner_nh, outer_th, inner_th;
> +	unsigned int inner_l3min, outer_l3min;
> +	struct virtio_net_hdr_tunnel *tnl;
> +	u8 gso_inner_type;
> +	bool outer_isv6;
> +	int ret;
> +
> +	if (!gso_tunnel_type)
> +		return virtio_net_hdr_to_skb(skb, hdr, little_endian);
> +
> +	/* Tunnel not supported/negotiated, but the hdr asks for it. */
> +	if (!tnl_hdr_offset)
> +		return -EINVAL;
> +
> +	/* Either ipv4 or ipv6. */
> +	if (gso_tunnel_type & VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 &&
> +	    gso_tunnel_type & VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6)
> +		return -EINVAL;
> +
> +	/* No UDP fragments over UDP tunnel. */

What are udp fragments and why is TCP with ECN not supported?

> +	gso_inner_type = hdr->gso_type & ~(VIRTIO_NET_HDR_GSO_ECN |
> +					   gso_tunnel_type);
> +	if (!gso_inner_type || gso_inner_type == VIRTIO_NET_HDR_GSO_UDP)
> +		return -EINVAL;
> +
> +	/* Relay on csum being present. */

Rely

>  #endif /* _LINUX_VIRTIO_NET_H */
> diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
> index 963540deae66a..1f1ff88a5749f 100644
> --- a/include/uapi/linux/virtio_net.h
> +++ b/include/uapi/linux/virtio_net.h
> @@ -70,6 +70,28 @@
>  					 * with the same MAC.
>  					 */
>  #define VIRTIO_NET_F_SPEED_DUPLEX 63	/* Device set linkspeed and duplex */
> +#define VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO 65 /* Driver can receive
> +					      * GSO-over-UDP-tunnel packets
> +					      */
> +#define VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM 66 /* Driver handles
> +						   * GSO-over-UDP-tunnel
> +						   * packets with partial csum
> +						   * for the outer header
> +						   */
> +#define VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO 67 /* Device can receive
> +					     * GSO-over-UDP-tunnel packets
> +					     */
> +#define VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO_CSUM 68 /* Device handles
> +						  * GSO-over-UDP-tunnel
> +						  * packets with partial csum
> +						  * for the outer header
> +						  */
> +
> +/* Offloads bits corresponding to VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO{,_CSUM}
> + * features
> + */
> +#define VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_MAPPED	46
> +#define VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM_MAPPED	47

I don't quite follow this. These are not real virtio bits?
  


* Re: [PATCH net-next 6/8] virtio_net: enable gso over UDP tunnel support.
  2025-05-21 10:32 ` [PATCH net-next 6/8] virtio_net: enable gso over UDP tunnel support Paolo Abeni
  2025-05-22  8:38   ` kernel test robot
@ 2025-05-22 22:33   ` Willem de Bruijn
  1 sibling, 0 replies; 59+ messages in thread
From: Willem de Bruijn @ 2025-05-22 22:33 UTC (permalink / raw)
  To: Paolo Abeni, netdev
  Cc: Willem de Bruijn, Jason Wang, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez

Paolo Abeni wrote:
> If the related virtio feature is set, enable transmission and reception
> of gso over UDP tunnel packets.
> 
> Most of the work is done by the previously introduced helper, just need
> to determine the UDP tunnel features inside the virtio_net_hdr and
> update accordingly the virtio net hdr size.
> 
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
>  drivers/net/virtio_net.c | 78 +++++++++++++++++++++++++++++++---------
>  1 file changed, 62 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 71a972f20f19b..3ca275ab887fe 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -46,11 +46,6 @@ static bool virtio_is_mapped_offload(unsigned int obit)
>  	       obit <= VIRTIO_OFFLOAD_MAP_MAX;
>  }
>  
> -#define VIRTIO_FEATURE_TO_OFFLOAD(fbit)	\
> -	({								\
> -		unsigned int __f = fbit;				\
> -		__f >= VIRTIO_FEATURES_MAP_MIN ? __f - VIRTIO_O2F_DELTA : __f; \
> -	})

This was introduced two patches ago. Never used. Remove entirely from the series.

>  #define VIRTIO_OFFLOAD_TO_FEATURE(obit)	\
>  	({								\
>  		unsigned int __o = obit;				\
> @@ -85,16 +80,30 @@ static const unsigned long guest_offloads[] = {
>  	VIRTIO_NET_F_GUEST_CSUM,
>  	VIRTIO_NET_F_GUEST_USO4,
>  	VIRTIO_NET_F_GUEST_USO6,
> -	VIRTIO_NET_F_GUEST_HDRLEN
> +	VIRTIO_NET_F_GUEST_HDRLEN,
> +#ifdef VIRTIO_HAS_EXTENDED_FEATURES
> +	VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_MAPPED,
> +	VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM_MAPPED,
> +#endif
>  };
>  
> -#define GUEST_OFFLOAD_GRO_HW_MASK ((1ULL << VIRTIO_NET_F_GUEST_TSO4) | \
> +#define __GUEST_OFFLOAD_GRO_HW_MASK ((1ULL << VIRTIO_NET_F_GUEST_TSO4) | \
>  				(1ULL << VIRTIO_NET_F_GUEST_TSO6) | \
>  				(1ULL << VIRTIO_NET_F_GUEST_ECN)  | \
>  				(1ULL << VIRTIO_NET_F_GUEST_UFO)  | \
>  				(1ULL << VIRTIO_NET_F_GUEST_USO4) | \
>  				(1ULL << VIRTIO_NET_F_GUEST_USO6))
>  
> +#ifdef VIRTIO_HAS_EXTENDED_FEATURES
> +
> +#define GUEST_OFFLOAD_GRO_HW_MASK (__GUEST_OFFLOAD_GRO_HW_MASK | \
> +	(1ULL << VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_MAPPED) | \
> +	(1ULL << VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM_MAPPED))
> +#else
> +
> +#define GUEST_OFFLOAD_GRO_HW_MASK __GUEST_OFFLOAD_GRO_HW_MASK
> +#endif
> +
>  struct virtnet_stat_desc {
>  	char desc[ETH_GSTRING_LEN];
>  	size_t offset;
> @@ -443,9 +452,14 @@ struct virtnet_info {
>  	/* Packet virtio header size */
>  	u8 hdr_len;
>  
> +	/* UDP tunnel support*/

space before closing asterisk

> +	u8 tnl_offset;
> +
>  	/* Work struct for delayed refilling if we run low on memory. */
>  	struct delayed_work refill;
>  
> +	bool rx_tnl_csum;
> +

There are an awful lot of non-consecutive bools here. It would probably
be a nice cleanup to convert them to an integer bitfield. Maybe not for
this series.

>  	/* Is delayed refill enabled? */
>  	bool refill_enabled;
>  


* Re: [PATCH net-next 5/8] net: implement virtio helpers to handle UDP GSO tunneling.
  2025-05-22 22:29   ` Willem de Bruijn
@ 2025-05-23  6:09     ` Paolo Abeni
  2025-05-23  6:44       ` Paolo Abeni
  2025-05-23 13:42       ` Willem de Bruijn
  0 siblings, 2 replies; 59+ messages in thread
From: Paolo Abeni @ 2025-05-23  6:09 UTC (permalink / raw)
  To: Willem de Bruijn, netdev
  Cc: Jason Wang, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo, Eugenio Pérez

On 5/23/25 12:29 AM, Willem de Bruijn wrote:
> Paolo Abeni wrote:
>> The virtio specification is introducing support for GSO over
>> UDP tunnel.
>>
>> This patch brings in the needed defines and the additional
>> virtio hdr parsing/building helpers.
>>
>> The UDP tunnel support uses additional fields in the virtio hdr,
>> and such fields location can change depending on other negotiated
>> features - specifically VIRTIO_NET_F_HASH_REPORT.
>>
>> Try to be as conservative as possible with the new field validation.
>>
>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> 
> No major concerns from me on this series. Much of the design
> conversations took place earlier on the virtio list.
> 
> Maybe consider test coverage. If end-to-end testing requires qemu,
> then perhaps KUnit is more suitable for testing basic to/from skb
> transformations. Just a thought.

My current idea is to follow-up on:

https://lore.kernel.org/netdev/20250522-vsock-vmtest-v8-1-367619bef134@gmail.com/

extending such infra to vhost/virtio, and implement GSO-over-UDP-tunnel
transfer with/without negotiated features on top of that.

In the longer term such infra could be used to have good code coverage
for virtio/vhost bundled into the kernel self-tests.

I hope it could be a follow-up, because I guess this series (and
especially the user-land counter-part) is going to b̵e̵ ̵a̵n̵ ̵h̵u̵g̵e̵
̵b̵l̵o̵o̵d̵b̵a̵t̵h̵  to take some time and effort in the current form.

>> ---
>>  include/linux/virtio_net.h      | 177 ++++++++++++++++++++++++++++++--
>>  include/uapi/linux/virtio_net.h |  33 ++++++
>>  2 files changed, 202 insertions(+), 8 deletions(-)
>>
>> diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
>> index 02a9f4dc594d0..cf9c712a67cd4 100644
>> --- a/include/linux/virtio_net.h
>> +++ b/include/linux/virtio_net.h
>> @@ -47,9 +47,9 @@ static inline int virtio_net_hdr_set_proto(struct sk_buff *skb,
>>  	return 0;
>>  }
>>  
>> +static inline int virtio_net_hdr_tnl_to_skb(struct sk_buff *skb,
>> +					    const struct virtio_net_hdr *hdr,
>> +					    unsigned int tnl_hdr_offset,
>> +					    bool tnl_csum_negotiated,
>> +					    bool little_endian)
>> +{
>> +	u8 gso_tunnel_type = hdr->gso_type & VIRTIO_NET_HDR_GSO_UDP_TUNNEL;
>> +	unsigned int inner_nh, outer_th, inner_th;
>> +	unsigned int inner_l3min, outer_l3min;
>> +	struct virtio_net_hdr_tunnel *tnl;
>> +	u8 gso_inner_type;
>> +	bool outer_isv6;
>> +	int ret;
>> +
>> +	if (!gso_tunnel_type)
>> +		return virtio_net_hdr_to_skb(skb, hdr, little_endian);
>> +
>> +	/* Tunnel not supported/negotiated, but the hdr asks for it. */
>> +	if (!tnl_hdr_offset)
>> +		return -EINVAL;
>> +
>> +	/* Either ipv4 or ipv6. */
>> +	if (gso_tunnel_type & VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 &&
>> +	    gso_tunnel_type & VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6)
>> +		return -EINVAL;
>> +
>> +	/* No UDP fragments over UDP tunnel. */
> 
> What are udp fragments and why is TCP with ECN not supported?

"udp fragments" is the syncopated form of "UDP datagrams carried by IP
fragments". I'll use UFO to be clearer ;)

The ECN part is cargo cult on my side from my original implementation
which dates back to ... a long time ago. A quick recheck makes me
think I could drop it. I'll have a better look and either document the
choice or drop the check in the next revision.

>> +	gso_inner_type = hdr->gso_type & ~(VIRTIO_NET_HDR_GSO_ECN |
>> +					   gso_tunnel_type);
>> +	if (!gso_inner_type || gso_inner_type == VIRTIO_NET_HDR_GSO_UDP)
>> +		return -EINVAL;
>> +
>> +	/* Relay on csum being present. */
> 
> Rely
> 
>>  #endif /* _LINUX_VIRTIO_NET_H */
>> diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
>> index 963540deae66a..1f1ff88a5749f 100644
>> --- a/include/uapi/linux/virtio_net.h
>> +++ b/include/uapi/linux/virtio_net.h
>> @@ -70,6 +70,28 @@
>>  					 * with the same MAC.
>>  					 */
>>  #define VIRTIO_NET_F_SPEED_DUPLEX 63	/* Device set linkspeed and duplex */
>> +#define VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO 65 /* Driver can receive
>> +					      * GSO-over-UDP-tunnel packets
>> +					      */
>> +#define VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM 66 /* Driver handles
>> +						   * GSO-over-UDP-tunnel
>> +						   * packets with partial csum
>> +						   * for the outer header
>> +						   */
>> +#define VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO 67 /* Device can receive
>> +					     * GSO-over-UDP-tunnel packets
>> +					     */
>> +#define VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO_CSUM 68 /* Device handles
>> +						  * GSO-over-UDP-tunnel
>> +						  * packets with partial csum
>> +						  * for the outer header
>> +						  */
>> +
>> +/* Offloads bits corresponding to VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO{,_CSUM}
>> + * features
>> + */
>> +#define VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_MAPPED	46
>> +#define VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM_MAPPED	47
> 
> I don't quite follow this. These are not real virtio bits?

This comes directly from the recent follow-up on the virtio
specification. While the features space has been extended to 128 bit,
the 'guest offload' space is still 64bit. The 'guest offload' are
used/defined by the specification for the
VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET command, which allows the guest to
dynamically enable/disable H/W GRO at runtime.

Up to ~now each offload bit corresponded to the feature bit with the
same value and vice versa.

Due to the limited 'guest offload' space, relevant features in the high
64 bits are 'mapped' to free bits in the lower range. That is simpler
than defining a new command (and associated features) to exchange an
extended guest offloads set.

It's also not a problem from a 'guest offload' space exhaustion PoV
because there are a lot of features in the lower 64 bits range that are
_not_ guest offloads and could be reused for mapping - among them the
'reserved features' that started this somewhat problematic features
space expansion.
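
As a rough sketch of the mapping described above (not the code from the
series): the four feature/offload bit numbers are copied from the quoted
uapi hunk, while OFFLOAD_MAP_MIN, OFFLOAD_MAP_MAX, O2F_DELTA and the two
helpers are hypothetical names, with the delta simply assumed to be
65 - 46:

```c
#include <assert.h>

/* Bit numbers as they appear in the quoted uapi hunk. */
#define VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO		65
#define VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM		66
#define VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_MAPPED	46
#define VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM_MAPPED	47

/* Hypothetical names, used only for this illustration. */
#define OFFLOAD_MAP_MIN	VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_MAPPED
#define OFFLOAD_MAP_MAX	VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM_MAPPED
#define O2F_DELTA	(VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO - \
			 VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_MAPPED)

/* Both mapped bits must share the same distance to their feature bit. */
static_assert(VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM -
	      VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM_MAPPED == O2F_DELTA,
	      "mapped offloads must use a single delta");

/* Offload bit -> feature bit: identity outside the mapped range. */
static inline unsigned int offload_to_feature(unsigned int obit)
{
	if (obit >= OFFLOAD_MAP_MIN && obit <= OFFLOAD_MAP_MAX)
		return obit + O2F_DELTA;
	return obit;
}

/* Feature bit -> offload bit: identity for the classic low bits. */
static inline unsigned int feature_to_offload(unsigned int fbit)
{
	if (fbit >= VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO &&
	    fbit <= VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM)
		return fbit - O2F_DELTA;
	return fbit;
}
```

Classic offloads such as bit 7 (VIRTIO_NET_F_GUEST_TSO4) pass through
unchanged, so only the two tunnel offloads need the translation.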

For more details:

https://lore.kernel.org/virtio-comment/6af50c9ada76d8168d248827e4af7c44bdfa34a8.1747826378.git.pabeni@redhat.com/T/#u
https://lore.kernel.org/virtio-comment/68c4e73a-fa9e-4e2e-8c38-ed4a322bf47e@redhat.com/

Thanks,

Paolo



* Re: [PATCH net-next 5/8] net: implement virtio helpers to handle UDP GSO tunneling.
  2025-05-23  6:09     ` Paolo Abeni
@ 2025-05-23  6:44       ` Paolo Abeni
  2025-05-23 13:42       ` Willem de Bruijn
  1 sibling, 0 replies; 59+ messages in thread
From: Paolo Abeni @ 2025-05-23  6:44 UTC (permalink / raw)
  To: Willem de Bruijn, netdev
  Cc: Jason Wang, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo, Eugenio Pérez

On 5/23/25 8:09 AM, Paolo Abeni wrote:
> On 5/23/25 12:29 AM, Willem de Bruijn wrote:
>> Paolo Abeni wrote:
>>> +	/* No UDP fragments over UDP tunnel. */
>>
>> What are udp fragments and why is TCP with ECN not supported?
> 
> "udp fragments" is the syncopated form of "UDP datagrams carried by IP
> fragments". I'll use UFO to be clearer ;)
> 
> The ECN part is cargo cult on my side from my original implementation
> which dates back to ... a long time ago. A quick recheck makes me
> think I could drop it. I'll have a better look and either document the
> choice or drop the check in the next revision.

Let me quote the relevant code:

>>> +	gso_inner_type = hdr->gso_type & ~(VIRTIO_NET_HDR_GSO_ECN |
>>> +					   gso_tunnel_type);
>>> +	if (!gso_inner_type || gso_inner_type == VIRTIO_NET_HDR_GSO_UDP)
>>> +		return -EINVAL;

Actually GSO_ECN is allowed. What is _not_ allowed is the GSO_ECN
offload without a paired plain GSO. The intention here is to ensure that
a GSO over UDP tunnel packet actually includes/requires an inner GSO
offload. I'll update the comment accordingly.
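
To make the intent of that check concrete, here is a small standalone
sketch. The GSO_TCPV4/GSO_UDP/GSO_TCPV6/GSO_ECN values mirror the
long-standing VIRTIO_NET_HDR_GSO_* uapi constants; the two tunnel bit
values (0x20 and 0x40) and the helper name are assumptions made for this
example only:

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors of the existing VIRTIO_NET_HDR_GSO_* values. */
#define GSO_TCPV4	1
#define GSO_UDP		3
#define GSO_TCPV6	4
#define GSO_ECN		0x80

/* Assumed values for the new tunnel bits (not shown in this thread). */
#define GSO_TNL_IPV4	0x20
#define GSO_TNL_IPV6	0x40
#define GSO_TNL		(GSO_TNL_IPV4 | GSO_TNL_IPV6)

/* Hypothetical helper: 0 if gso_type is acceptable, -1 otherwise. */
static inline int check_tunnel_gso_type(uint8_t gso_type)
{
	uint8_t tnl = gso_type & GSO_TNL;
	uint8_t inner;

	if (!tnl)
		return 0;	/* not a tunnel packet, nothing to check */

	if (tnl == GSO_TNL)
		return -1;	/* outer header is either IPv4 or IPv6 */

	/* Strip the ECN flag and the tunnel bit: what is left is the
	 * inner GSO type, which must exist and must not be UFO.
	 */
	inner = gso_type & (uint8_t)~(GSO_ECN | tnl);
	if (!inner || inner == GSO_UDP)
		return -1;

	return 0;
}
```

So a tunnel header carrying TCPv4 plus ECN passes, while a tunnel header
carrying only the ECN flag, with no inner GSO type, is rejected.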

Thanks,

Paolo



* Re: [PATCH net-next 5/8] net: implement virtio helpers to handle UDP GSO tunneling.
  2025-05-23  6:09     ` Paolo Abeni
  2025-05-23  6:44       ` Paolo Abeni
@ 2025-05-23 13:42       ` Willem de Bruijn
  2025-05-23 14:00         ` Paolo Abeni
  1 sibling, 1 reply; 59+ messages in thread
From: Willem de Bruijn @ 2025-05-23 13:42 UTC (permalink / raw)
  To: Paolo Abeni, Willem de Bruijn, netdev
  Cc: Jason Wang, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo, Eugenio Pérez

Paolo Abeni wrote:
> On 5/23/25 12:29 AM, Willem de Bruijn wrote:
> > Paolo Abeni wrote:
> >> The virtio specification is introducing support for GSO over
> >> UDP tunnel.
> >>
> >> This patch brings in the needed defines and the additional
> >> virtio hdr parsing/building helpers.
> >>
> >> The UDP tunnel support uses additional fields in the virtio hdr,
> >> and such fields location can change depending on other negotiated
> >> features - specifically VIRTIO_NET_F_HASH_REPORT.
> >>
> >> Try to be as conservative as possible with the new field validation.
> >>
> >> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > 
> > No major concerns from me on this series. Much of the design
> > conversations took place earlier on the virtio list.
> > 
> > Maybe consider test coverage. If end-to-end testing requires qemu,
> > then perhaps KUnit is more suitable for testing basic to/from skb
> > transformations. Just a thought.
> 
> My current idea is to follow-up on:
> 
> https://lore.kernel.org/netdev/20250522-vsock-vmtest-v8-1-367619bef134@gmail.com/
> 
> extending such infra to vhost/virtio, and implement GSO-over-UDP-tunnel
> transfer with/without negotiated features on top of that.
> 
> In the longer term such infra could be used to have good code coverage
> for virtio/vhost bundled into the kernel self-tests.
> 
> I hope it could be a follow-up,

SGTM!

Syzkaller will also give us coverage for the extended virtio_net_hdr
format. It has found many creative uses of that header before.

I did see the offset integrity checks you introduced when parsing the
header, which are exactly what is needed to avoid such frivolous abuse.
They looked sufficient to me too.

> >> +/* Offloads bits corresponding to VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO{,_CSUM}
> >> + * features
> >> + */
> >> +#define VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_MAPPED	46
> >> +#define VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM_MAPPED	47
> > 
> > I don't quite follow this. These are not real virtio bits?
> 
> This comes directly from the recent follow-up on the virtio
> specification. While the features space has been extended to 128 bit,
> the 'guest offload' space is still 64bit. The 'guest offloads' are
> used/defined by the specification for the
> VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET command, which allows the guest to
> dynamically enable/disable H/W GRO at runtime.
> 
> Up to ~now each offload bit corresponded to the feature bit with the
> same value and vice versa.
> 
> Due to the limited 'guest offload' space, relevant features in the high
> 64 bits are 'mapped' to free bits in the lower range. That is simpler
> than defining a new command (and associated features) to exchange an
> extended guest offloads set.
> 
> It's also not a problem from a 'guest offload' space exhaustion PoV
> because there are a lot of features in the lower 64 bits range that are
> _not_ guest offloads and could be reused for mapping - among them the
> 'reserved features' that started this somewhat problematic features
> space expansion.
> 

That's a great explanation, thanks. Can you add it either in the commit
message or as a comment at these definitions?



* Re: [PATCH net-next 5/8] net: implement virtio helpers to handle UDP GSO tunneling.
  2025-05-23 13:42       ` Willem de Bruijn
@ 2025-05-23 14:00         ` Paolo Abeni
  0 siblings, 0 replies; 59+ messages in thread
From: Paolo Abeni @ 2025-05-23 14:00 UTC (permalink / raw)
  To: Willem de Bruijn, netdev
  Cc: Jason Wang, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo, Eugenio Pérez

On 5/23/25 3:42 PM, Willem de Bruijn wrote:
> Paolo Abeni wrote:
>> On 5/23/25 12:29 AM, Willem de Bruijn wrote:
>>> Paolo Abeni wrote:
>>>> +#define VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_MAPPED	46
>>>> +#define VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM_MAPPED	47
>>>
>>> I don't quite follow this. These are not real virtio bits?
>>
>> This comes directly from the recent follow-up on the virtio
>> specification. While the features space has been extended to 128 bit,
>> the 'guest offload' space is still 64bit. The 'guest offloads' are
>> used/defined by the specification for the
>> VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET command, which allows the guest to
>> dynamically enable/disable H/W GRO at runtime.
>>
>> Up to ~now each offload bit corresponded to the feature bit with the
>> same value and vice versa.
>>
>> Due to the limited 'guest offload' space, relevant features in the high
>> 64 bits are 'mapped' to free bits in the lower range. That is simpler
>> than defining a new command (and associated features) to exchange an
>> extended guest offloads set.
>>
>> It's also not a problem from a 'guest offload' space exhaustion PoV
>> because there are a lot of features in the lower 64 bits range that are
>> _not_ guest offloads and could be reused for mapping - among them the
>> 'reserved features' that started this somewhat problematic features
>> space expansion.
> 
> That's a great explanation, thanks. Can you add it either in the commit
> message or as a comment at these definitions?

Sure, I'll add it to the commit message.

Thanks,

Paolo




* Re: [PATCH net-next 1/8] virtio: introduce virtio_features_t
  2025-05-22 15:26       ` Paolo Abeni
@ 2025-05-23 19:50         ` Michael S. Tsirkin
  0 siblings, 0 replies; 59+ messages in thread
From: Michael S. Tsirkin @ 2025-05-23 19:50 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, Willem de Bruijn, Jason Wang, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Xuan Zhuo,
	Eugenio Pérez

On Thu, May 22, 2025 at 05:26:33PM +0200, Paolo Abeni wrote:
> On 5/22/25 9:29 AM, Paolo Abeni wrote:
> > On 5/21/25 6:02 PM, Michael S. Tsirkin wrote:
> >> On Wed, May 21, 2025 at 12:32:35PM +0200, Paolo Abeni wrote:
> >>> +++ b/include/linux/virtio_features.h
> >>> @@ -0,0 +1,23 @@
> >>> +/* SPDX-License-Identifier: GPL-2.0 */
> >>> +#ifndef _LINUX_VIRTIO_FEATURES_H
> >>> +#define _LINUX_VIRTIO_FEATURES_H
> >>> +
> >>> +#include <linux/bits.h>
> >>> +
> >>> +#if IS_ENABLED(CONFIG_ARCH_SUPPORTS_INT128)
> >>> +#define VIRTIO_HAS_EXTENDED_FEATURES
> >>> +#define VIRTIO_FEATURES_MAX	128
> >>> +#define VIRTIO_FEATURES_WORDS	4
> >>> +#define VIRTIO_BIT(b)		_BIT128(b)
> >>> +
> >>> +typedef __uint128_t		virtio_features_t;
> >>
> >> Since we are doing it anyway, what about __bitwise ?
> > 
> > Yep, I will add it in the next revision.
> 
> Uhm... this is actually problematic, as a key point of keeping the
> diffstat manageable is converting only the relevant drivers to use the
> extended features set - and adjust accordingly local variables and
> expressions.
> 
> The above means that in other devices a lot of code relies on extended
> features being (harmlessly, because nobody is going to set the highest
> bits for such features) downgraded to u64, or u64 promoted to
> virtio_features_t.
> 
> The __bitwise annotation generates a warning for each of them; avoiding
> those warnings requires touching the same code I wanted to leave
> unmodified (and brings back a terrible diffstat).
> 
> /P

I am not insisting here, we can do it later as a patch on top.
But - could you give an example pls?



* Re: [PATCH net-next 8/8] vhost/net: enable gso over UDP tunnel support.
  2025-05-22  6:43   ` kernel test robot
@ 2025-05-23 19:54     ` Michael S. Tsirkin
  0 siblings, 0 replies; 59+ messages in thread
From: Michael S. Tsirkin @ 2025-05-23 19:54 UTC (permalink / raw)
  To: kernel test robot
  Cc: Paolo Abeni, netdev, llvm, oe-kbuild-all, Willem de Bruijn,
	Jason Wang, Andrew Lunn, Eric Dumazet, Jakub Kicinski, Xuan Zhuo,
	Eugenio Pérez

On Thu, May 22, 2025 at 02:43:50PM +0800, kernel test robot wrote:
> Hi Paolo,
> 
> kernel test robot noticed the following build warnings:
> 
> [auto build test WARNING on net-next/main]
> 
> url:    https://github.com/intel-lab-lkp/linux/commits/Paolo-Abeni/virtio-introduce-virtio_features_t/20250521-183700
> base:   net-next/main
> patch link:    https://lore.kernel.org/r/f95716aed2c65d079cdb10518431088f3e103899.1747822866.git.pabeni%40redhat.com
> patch subject: [PATCH net-next 8/8] vhost/net: enable gso over UDP tunnel support.
> config: i386-buildonly-randconfig-001-20250522 (https://download.01.org/0day-ci/archive/20250522/202505221428.67HNn025-lkp@intel.com/config)
> compiler: clang version 20.1.2 (https://github.com/llvm/llvm-project 58df0ef89dd64126512e4ee27b4ac3fd8ddf6247)
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250522/202505221428.67HNn025-lkp@intel.com/reproduce)
> 
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202505221428.67HNn025-lkp@intel.com/
> 
> All warnings (new ones prefixed by >>):
> 
> >> drivers/vhost/net.c:1633:30: warning: shift count >= width of type [-Wshift-count-overflow]
>     1633 |         has_tunnel = !!(features & (VIRTIO_BIT(VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO) |
>          |                                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>    include/linux/virtio_features.h:18:24: note: expanded from macro 'VIRTIO_BIT'
>       18 | #define VIRTIO_BIT(b)           BIT_ULL(b)
>          |                                 ^~~~~~~~~~
>    include/vdso/bits.h:8:30: note: expanded from macro 'BIT_ULL'
>        8 | #define BIT_ULL(nr)             (ULL(1) << (nr))
>          |                                         ^  ~~~~
>    drivers/vhost/net.c:1634:9: warning: shift count >= width of type [-Wshift-count-overflow]
>     1634 |                                     VIRTIO_BIT(VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO)));
>          |                                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


yep, this is why I suggested making VIRTIO_BIT(any value > 63) simply 0 on 32 bit.
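
For illustration only, a minimal user-space sketch of that idea. It is not
the kernel definition: FEATURES_MAX, FEATURE_BIT and has_any() are
hypothetical stand-ins for the 64-bit fallback of VIRTIO_FEATURES_MAX and
VIRTIO_BIT. Bits that do not fit evaluate to 0, and the shift count is
masked so it never reaches the width of the type.

```c
#include <stdint.h>

/* Hypothetical stand-in for the fallback case: a 64-bit features type. */
typedef uint64_t features_t;
#define FEATURES_MAX 64

/*
 * Bits at or above FEATURES_MAX cannot be represented, so they evaluate
 * to 0. The shift count is masked so that it stays below the type width
 * even in the branch that is not selected.
 */
#define FEATURE_BIT(b) \
	((b) < FEATURES_MAX ? \
	 (features_t)1 << ((b) & (FEATURES_MAX - 1)) : (features_t)0)

/* Returns 1 if either of the two (possibly unrepresentable) bits is set. */
static inline int has_any(features_t features, unsigned int a, unsigned int b)
{
	return !!(features & (FEATURE_BIT(a) | FEATURE_BIT(b)));
}
```

With such a definition, a test against two bits above 63 is always false on
a 64-bit features type, without needing an #ifdef at the call site.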

>    include/linux/virtio_features.h:18:24: note: expanded from macro 'VIRTIO_BIT'
>       18 | #define VIRTIO_BIT(b)           BIT_ULL(b)
>          |                                 ^~~~~~~~~~
>    include/vdso/bits.h:8:30: note: expanded from macro 'BIT_ULL'
>        8 | #define BIT_ULL(nr)             (ULL(1) << (nr))
>          |                                         ^  ~~~~
>    2 warnings generated.
> 
> 
> vim +1633 drivers/vhost/net.c
> 
>   1622	
>   1623	static int vhost_net_set_features(struct vhost_net *n, virtio_features_t features)
>   1624	{
>   1625		size_t vhost_hlen, sock_hlen, hdr_len;
>   1626		bool has_tunnel;
>   1627		int i;
>   1628	
>   1629		hdr_len = (features & ((1ULL << VIRTIO_NET_F_MRG_RXBUF) |
>   1630				       (1ULL << VIRTIO_F_VERSION_1))) ?
>   1631				sizeof(struct virtio_net_hdr_mrg_rxbuf) :
>   1632				sizeof(struct virtio_net_hdr);
> > 1633		has_tunnel = !!(features & (VIRTIO_BIT(VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO) |
>   1634					    VIRTIO_BIT(VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO)));
>   1635		hdr_len += has_tunnel ? sizeof(struct virtio_net_hdr_tunnel) : 0;
>   1636		if (features & (1 << VHOST_NET_F_VIRTIO_NET_HDR)) {
>   1637			/* vhost provides vnet_hdr */
>   1638			vhost_hlen = hdr_len;
>   1639			sock_hlen = 0;
>   1640		} else {
>   1641			/* socket provides vnet_hdr */
>   1642			vhost_hlen = 0;
>   1643			sock_hlen = hdr_len;
>   1644		}
>   1645		mutex_lock(&n->dev.mutex);
>   1646		if ((features & (1 << VHOST_F_LOG_ALL)) &&
>   1647		    !vhost_log_access_ok(&n->dev))
>   1648			goto out_unlock;
>   1649	
>   1650		if ((features & (1ULL << VIRTIO_F_ACCESS_PLATFORM))) {
>   1651			if (vhost_init_device_iotlb(&n->dev))
>   1652				goto out_unlock;
>   1653		}
>   1654	
>   1655		for (i = 0; i < VHOST_NET_VQ_MAX; ++i) {
>   1656			mutex_lock(&n->vqs[i].vq.mutex);
>   1657			n->vqs[i].vq.acked_features = features;
>   1658			n->vqs[i].vhost_hlen = vhost_hlen;
>   1659			n->vqs[i].sock_hlen = sock_hlen;
>   1660			mutex_unlock(&n->vqs[i].vq.mutex);
>   1661		}
>   1662		mutex_unlock(&n->dev.mutex);
>   1663		return 0;
>   1664	
>   1665	out_unlock:
>   1666		mutex_unlock(&n->dev.mutex);
>   1667		return -EFAULT;
>   1668	}
>   1669	
> 
> -- 
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki



* Re: [PATCH net-next 1/8] virtio: introduce virtio_features_t
  2025-05-21 10:32 ` [PATCH net-next 1/8] virtio: introduce virtio_features_t Paolo Abeni
  2025-05-21 16:02   ` Michael S. Tsirkin
  2025-05-22  8:17   ` kernel test robot
@ 2025-05-26  0:43   ` Jason Wang
  2025-05-26  7:20     ` Paolo Abeni
  2 siblings, 1 reply; 59+ messages in thread
From: Jason Wang @ 2025-05-26  0:43 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, Willem de Bruijn, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez

On Wed, May 21, 2025 at 6:33 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> The virtio specification allows for up to 128 bits for the
> device features. Soon we are going to use some of the 'extended'
> feature bits (above 64) for the virtio_net driver.
>
> Introduce a specific type to represent the virtio features bitmask.
> On platforms where 128-bit integers are available, use such a wide int
> for the features bitmask; otherwise maintain the current u64.
>
> Update all the relevant virtio APIs to use the new type.
>
> Note that legacy and transport features don't need any change, as
> they are always in the low 64 bit range.
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
>  drivers/virtio/virtio.c                | 12 ++++++------
>  drivers/virtio/virtio_mmio.c           |  4 ++--
>  drivers/virtio/virtio_pci_legacy.c     |  2 +-
>  drivers/virtio/virtio_pci_modern.c     |  7 ++++---
>  drivers/virtio/virtio_pci_modern_dev.c | 13 ++++++-------
>  drivers/virtio/virtio_vdpa.c           |  2 +-
>  include/linux/virtio.h                 |  5 +++--
>  include/linux/virtio_config.h          | 22 +++++++++++-----------
>  include/linux/virtio_features.h        | 23 +++++++++++++++++++++++
>  include/linux/virtio_pci_modern.h      | 11 ++++++++---
>  10 files changed, 65 insertions(+), 36 deletions(-)
>  create mode 100644 include/linux/virtio_features.h
>
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index 95d5d7993e5b1..542735d3a12ba 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -272,9 +272,9 @@ static int virtio_dev_probe(struct device *_d)
>         int err, i;
>         struct virtio_device *dev = dev_to_virtio(_d);
>         struct virtio_driver *drv = drv_to_virtio(dev->dev.driver);
> -       u64 device_features;
> -       u64 driver_features;
> -       u64 driver_features_legacy;
> +       virtio_features_t device_features;
> +       virtio_features_t driver_features;
> +       virtio_features_t driver_features_legacy;
>
>         /* We have a driver! */
>         virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER);
> @@ -286,8 +286,8 @@ static int virtio_dev_probe(struct device *_d)
>         driver_features = 0;
>         for (i = 0; i < drv->feature_table_size; i++) {
>                 unsigned int f = drv->feature_table[i];
> -               BUG_ON(f >= 64);
> -               driver_features |= (1ULL << f);
> +               BUG_ON(f >= VIRTIO_FEATURES_MAX);
> +               driver_features |= VIRTIO_BIT(f);
>         }
>
>         /* Some drivers have a separate feature table for virtio v1.0 */
> @@ -320,7 +320,7 @@ static int virtio_dev_probe(struct device *_d)
>                 goto err;
>
>         if (drv->validate) {
> -               u64 features = dev->features;
> +               virtio_features_t features = dev->features;
>
>                 err = drv->validate(dev);
>                 if (err)
> diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> index 5d78c2d572abf..158c47ac67de7 100644
> --- a/drivers/virtio/virtio_mmio.c
> +++ b/drivers/virtio/virtio_mmio.c
> @@ -106,10 +106,10 @@ struct virtio_mmio_vq_info {
>
>  /* Configuration interface */
>
> -static u64 vm_get_features(struct virtio_device *vdev)
> +static virtio_features_t vm_get_features(struct virtio_device *vdev)
>  {
>         struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
> -       u64 features;
> +       virtio_features_t features;
>
>         writel(1, vm_dev->base + VIRTIO_MMIO_DEVICE_FEATURES_SEL);
>         features = readl(vm_dev->base + VIRTIO_MMIO_DEVICE_FEATURES);
> diff --git a/drivers/virtio/virtio_pci_legacy.c b/drivers/virtio/virtio_pci_legacy.c
> index d9cbb02b35a11..b2fbc74f74b5c 100644
> --- a/drivers/virtio/virtio_pci_legacy.c
> +++ b/drivers/virtio/virtio_pci_legacy.c
> @@ -18,7 +18,7 @@
>  #include "virtio_pci_common.h"
>
>  /* virtio config->get_features() implementation */
> -static u64 vp_get_features(struct virtio_device *vdev)
> +static virtio_features_t vp_get_features(struct virtio_device *vdev)
>  {
>         struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>
> diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
> index d50fe030d8253..c3e0ddc7ae9ab 100644
> --- a/drivers/virtio/virtio_pci_modern.c
> +++ b/drivers/virtio/virtio_pci_modern.c
> @@ -22,7 +22,7 @@
>
>  #define VIRTIO_AVQ_SGS_MAX     4
>
> -static u64 vp_get_features(struct virtio_device *vdev)
> +static virtio_features_t vp_get_features(struct virtio_device *vdev)
>  {
>         struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>
> @@ -353,7 +353,8 @@ static void vp_modern_avq_cleanup(struct virtio_device *vdev)
>         }
>  }
>
> -static void vp_transport_features(struct virtio_device *vdev, u64 features)
> +static void vp_transport_features(struct virtio_device *vdev,
> +                                 virtio_features_t features)
>  {
>         struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>         struct pci_dev *pci_dev = vp_dev->pci_dev;
> @@ -409,7 +410,7 @@ static int vp_check_common_size(struct virtio_device *vdev)
>  static int vp_finalize_features(struct virtio_device *vdev)
>  {
>         struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> -       u64 features = vdev->features;
> +       virtio_features_t features = vdev->features;
>
>         /* Give virtio_ring a chance to accept features. */
>         vring_transport_features(vdev);
> diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
> index 0d3dbfaf4b236..1d34655f6b658 100644
> --- a/drivers/virtio/virtio_pci_modern_dev.c
> +++ b/drivers/virtio/virtio_pci_modern_dev.c
> @@ -393,11 +393,10 @@ EXPORT_SYMBOL_GPL(vp_modern_remove);
>   *
>   * Returns the features read from the device
>   */
> -u64 vp_modern_get_features(struct virtio_pci_modern_device *mdev)
> +virtio_features_t vp_modern_get_features(struct virtio_pci_modern_device *mdev)
>  {
>         struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
> -
> -       u64 features;
> +       virtio_features_t features;
>
>         vp_iowrite32(0, &cfg->device_feature_select);
>         features = vp_ioread32(&cfg->device_feature);
> @@ -414,11 +413,11 @@ EXPORT_SYMBOL_GPL(vp_modern_get_features);
>   *
>   * Returns the driver features read from the device
>   */
> -u64 vp_modern_get_driver_features(struct virtio_pci_modern_device *mdev)
> +virtio_features_t
> +vp_modern_get_driver_features(struct virtio_pci_modern_device *mdev)
>  {
>         struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
> -
> -       u64 features;
> +       virtio_features_t features;
>
>         vp_iowrite32(0, &cfg->guest_feature_select);
>         features = vp_ioread32(&cfg->guest_feature);
> @@ -435,7 +434,7 @@ EXPORT_SYMBOL_GPL(vp_modern_get_driver_features);
>   * @features: the features set to device
>   */
>  void vp_modern_set_features(struct virtio_pci_modern_device *mdev,
> -                           u64 features)
> +                           virtio_features_t features)
>  {
>         struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
>
> diff --git a/drivers/virtio/virtio_vdpa.c b/drivers/virtio/virtio_vdpa.c
> index 1f60c9d5cb181..b92749174885e 100644
> --- a/drivers/virtio/virtio_vdpa.c
> +++ b/drivers/virtio/virtio_vdpa.c
> @@ -409,7 +409,7 @@ static int virtio_vdpa_find_vqs(struct virtio_device *vdev, unsigned int nvqs,
>         return err;
>  }
>
> -static u64 virtio_vdpa_get_features(struct virtio_device *vdev)
> +static virtio_features_t virtio_vdpa_get_features(struct virtio_device *vdev)
>  {
>         struct vdpa_device *vdpa = vd_get_vdpa(vdev);
>         const struct vdpa_config_ops *ops = vdpa->config;
> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> index 64cb4b04be7ad..6e51400d04635 100644
> --- a/include/linux/virtio.h
> +++ b/include/linux/virtio.h
> @@ -11,6 +11,7 @@
>  #include <linux/gfp.h>
>  #include <linux/dma-mapping.h>
>  #include <linux/completion.h>
> +#include <linux/virtio_features.h>
>
>  /**
>   * struct virtqueue - a queue to register buffers for sending or receiving.
> @@ -159,11 +160,11 @@ struct virtio_device {
>         const struct virtio_config_ops *config;
>         const struct vringh_config_ops *vringh_config;
>         struct list_head vqs;
> -       u64 features;
> +       virtio_features_t features;
>         void *priv;
>  #ifdef CONFIG_VIRTIO_DEBUG
>         struct dentry *debugfs_dir;
> -       u64 debugfs_filter_features;
> +       virtio_features_t debugfs_filter_features;
>  #endif
>  };
>
> diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> index 169c7d367facb..bff57f675fca7 100644
> --- a/include/linux/virtio_config.h
> +++ b/include/linux/virtio_config.h
> @@ -77,7 +77,7 @@ struct virtqueue_info {
>   *      vdev: the virtio_device
>   * @get_features: get the array of feature bits for this device.
>   *     vdev: the virtio_device
> - *     Returns the first 64 feature bits (all we currently need).
> + *     Returns the first VIRTIO_FEATURES_MAX feature bits (all we currently need).
>   * @finalize_features: confirm what device features we'll be using.
>   *     vdev: the virtio_device
>   *     This sends the driver feature bits to the device: it can change
> @@ -120,7 +120,7 @@ struct virtio_config_ops {
>                         struct irq_affinity *desc);
>         void (*del_vqs)(struct virtio_device *);
>         void (*synchronize_cbs)(struct virtio_device *);
> -       u64 (*get_features)(struct virtio_device *vdev);
> +       virtio_features_t (*get_features)(struct virtio_device *vdev);
>         int (*finalize_features)(struct virtio_device *vdev);
>         const char *(*bus_name)(struct virtio_device *vdev);
>         int (*set_vq_affinity)(struct virtqueue *vq,
> @@ -149,11 +149,11 @@ static inline bool __virtio_test_bit(const struct virtio_device *vdev,
>  {
>         /* Did you forget to fix assumptions on max features? */
>         if (__builtin_constant_p(fbit))
> -               BUILD_BUG_ON(fbit >= 64);
> +               BUILD_BUG_ON(fbit >= VIRTIO_FEATURES_MAX);
>         else
> -               BUG_ON(fbit >= 64);
> +               BUG_ON(fbit >= VIRTIO_FEATURES_MAX);
>
> -       return vdev->features & BIT_ULL(fbit);
> +       return vdev->features & VIRTIO_BIT(fbit);
>  }
>
>  /**
> @@ -166,11 +166,11 @@ static inline void __virtio_set_bit(struct virtio_device *vdev,
>  {
>         /* Did you forget to fix assumptions on max features? */
>         if (__builtin_constant_p(fbit))
> -               BUILD_BUG_ON(fbit >= 64);
> +               BUILD_BUG_ON(fbit >= VIRTIO_FEATURES_MAX);
>         else
> -               BUG_ON(fbit >= 64);
> +               BUG_ON(fbit >= VIRTIO_FEATURES_MAX);
>
> -       vdev->features |= BIT_ULL(fbit);
> +       vdev->features |= VIRTIO_BIT(fbit);
>  }
>
>  /**
> @@ -183,11 +183,11 @@ static inline void __virtio_clear_bit(struct virtio_device *vdev,
>  {
>         /* Did you forget to fix assumptions on max features? */
>         if (__builtin_constant_p(fbit))
> -               BUILD_BUG_ON(fbit >= 64);
> +               BUILD_BUG_ON(fbit >= VIRTIO_FEATURES_MAX);
>         else
> -               BUG_ON(fbit >= 64);
> +               BUG_ON(fbit >= VIRTIO_FEATURES_MAX);
>
> -       vdev->features &= ~BIT_ULL(fbit);
> +       vdev->features &= ~VIRTIO_BIT(fbit);
>  }
>
>  /**
> diff --git a/include/linux/virtio_features.h b/include/linux/virtio_features.h
> new file mode 100644
> index 0000000000000..2f742eeb45a29
> --- /dev/null
> +++ b/include/linux/virtio_features.h
> @@ -0,0 +1,23 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _LINUX_VIRTIO_FEATURES_H
> +#define _LINUX_VIRTIO_FEATURES_H
> +
> +#include <linux/bits.h>
> +
> +#if IS_ENABLED(CONFIG_ARCH_SUPPORTS_INT128)
> +#define VIRTIO_HAS_EXTENDED_FEATURES
> +#define VIRTIO_FEATURES_MAX    128
> +#define VIRTIO_FEATURES_WORDS  4
> +#define VIRTIO_BIT(b)          _BIT128(b)
> +
> +typedef __uint128_t            virtio_features_t;

Consider:

1) we need a trick for arches that don't support 128-bit integers
2) some transports (e.g. PCI) allow many more than 128 feature bits

I wonder if it's better to just use arrays here.

Thanks
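
As a rough sketch of what an array-based representation could look like
(user-space C, not taken from the series; struct feature_set, FEATURE_WORDS
and the helper names are all hypothetical), the bitmask becomes a fixed
array of 64-bit words with bounds-checked accessors, so no 128-bit integer
type is needed:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sizing: two 64-bit words cover 128 feature bits. */
#define FEATURE_WORDS 2
#define FEATURE_BITS  (FEATURE_WORDS * 64)

struct feature_set {
	uint64_t word[FEATURE_WORDS];
};

/* Out-of-range bits are reported as not set. */
static inline bool feature_test(const struct feature_set *f, unsigned int bit)
{
	if (bit >= FEATURE_BITS)
		return false;
	return (f->word[bit / 64] >> (bit % 64)) & 1;
}

/* Out-of-range bits are silently ignored in this sketch. */
static inline void feature_set_bit(struct feature_set *f, unsigned int bit)
{
	if (bit < FEATURE_BITS)
		f->word[bit / 64] |= (uint64_t)1 << (bit % 64);
}

/* dst = a & b, e.g. intersecting device and driver features. */
static inline void feature_and(struct feature_set *dst,
			       const struct feature_set *a,
			       const struct feature_set *b)
{
	for (unsigned int i = 0; i < FEATURE_WORDS; i++)
		dst->word[i] = a->word[i] & b->word[i];
}
```

Growing the feature space then only means changing FEATURE_WORDS, at the
cost of replacing plain integer expressions with helper calls.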

> +
> +#else
> +#define VIRTIO_FEATURES_MAX    64
> +#define VIRTIO_FEATURES_WORDS  2
> +#define VIRTIO_BIT(b)          BIT_ULL(b)
> +
> +typedef u64                    virtio_features_t;
> +#endif
> +
> +#endif
> diff --git a/include/linux/virtio_pci_modern.h b/include/linux/virtio_pci_modern.h
> index c0b1b1ca11635..e55fbb272b4d3 100644
> --- a/include/linux/virtio_pci_modern.h
> +++ b/include/linux/virtio_pci_modern.h
> @@ -3,6 +3,7 @@
>  #define _LINUX_VIRTIO_PCI_MODERN_H
>
>  #include <linux/pci.h>
> +#include <linux/virtio_features.h>
>  #include <linux/virtio_pci.h>
>
>  /**
> @@ -95,10 +96,14 @@ static inline void vp_iowrite64_twopart(u64 val,
>         vp_iowrite32(val >> 32, hi);
>  }
>
> -u64 vp_modern_get_features(struct virtio_pci_modern_device *mdev);
> -u64 vp_modern_get_driver_features(struct virtio_pci_modern_device *mdev);
> +virtio_features_t
> +vp_modern_get_features(struct virtio_pci_modern_device *mdev);
> +
> +virtio_features_t
> +vp_modern_get_driver_features(struct virtio_pci_modern_device *mdev);
> +
>  void vp_modern_set_features(struct virtio_pci_modern_device *mdev,
> -                    u64 features);
> +                    virtio_features_t features);
>  u32 vp_modern_generation(struct virtio_pci_modern_device *mdev);
>  u8 vp_modern_get_status(struct virtio_pci_modern_device *mdev);
>  void vp_modern_set_status(struct virtio_pci_modern_device *mdev,
> --
> 2.49.0
>



* Re: [PATCH net-next 3/8] vhost-net: allow configuring extended features
  2025-05-21 10:32 ` [PATCH net-next 3/8] vhost-net: allow " Paolo Abeni
@ 2025-05-26  0:47   ` Jason Wang
  2025-05-26 10:57     ` Paolo Abeni
  0 siblings, 1 reply; 59+ messages in thread
From: Jason Wang @ 2025-05-26  0:47 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, Willem de Bruijn, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez

On Wed, May 21, 2025 at 6:33 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> Use the extended feature type for 'acked_features' and implement
> two new ioctl operations to get and set the extended features.
>
> Note that the legacy ioctls implicitly truncate the negotiated
> features to the lower 64 bits range.
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
>  drivers/vhost/net.c        | 26 +++++++++++++++++++++++++-
>  drivers/vhost/vhost.h      |  2 +-
>  include/uapi/linux/vhost.h |  8 ++++++++
>  3 files changed, 34 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 7cbfc7d718b3f..b894685dded3e 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -77,6 +77,10 @@ enum {
>                          (1ULL << VIRTIO_F_RING_RESET)
>  };
>
> +#ifdef VIRTIO_HAS_EXTENDED_FEATURES
> +#define VHOST_NET_FEATURES_EX VHOST_NET_FEATURES
> +#endif
> +
>  enum {
>         VHOST_NET_BACKEND_FEATURES = (1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2)
>  };
> @@ -1614,7 +1618,7 @@ static long vhost_net_reset_owner(struct vhost_net *n)
>         return err;
>  }
>
> -static int vhost_net_set_features(struct vhost_net *n, u64 features)
> +static int vhost_net_set_features(struct vhost_net *n, virtio_features_t features)
>  {
>         size_t vhost_hlen, sock_hlen, hdr_len;
>         int i;
> @@ -1704,6 +1708,26 @@ static long vhost_net_ioctl(struct file *f, unsigned int ioctl,
>                 if (features & ~VHOST_NET_FEATURES)
>                         return -EOPNOTSUPP;
>                 return vhost_net_set_features(n, features);
> +#ifdef VIRTIO_HAS_EXTENDED_FEATURES

Vhost doesn't depend on virtio. But this invents a dependency, and I
don't understand why we need to do that.

Thanks



* Re: [PATCH net-next 2/8] virtio_pci_modern: allow setting configuring extended features
  2025-05-21 10:32 ` [PATCH net-next 2/8] virtio_pci_modern: allow setting configuring extended features Paolo Abeni
@ 2025-05-26  0:49   ` Jason Wang
  2025-05-26 10:53     ` Paolo Abeni
  0 siblings, 1 reply; 59+ messages in thread
From: Jason Wang @ 2025-05-26  0:49 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, Willem de Bruijn, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez

On Wed, May 21, 2025 at 6:33 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> The virtio specification allows for up to 128 bits for the
> device features. Soon we are going to use some of the 'extended'
> feature bits (above 64) for the virtio_net driver.
>
> Extend the virtio pci modern driver to support configuring the full
> virtio features range, replacing the unrolled loops reading and
> writing the features space with an explicit one bounded by the actual
> features space size in words.
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
>  drivers/virtio/virtio_pci_modern_dev.c | 39 +++++++++++++++++---------
>  1 file changed, 25 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
> index 1d34655f6b658..e3025b6fa8540 100644
> --- a/drivers/virtio/virtio_pci_modern_dev.c
> +++ b/drivers/virtio/virtio_pci_modern_dev.c
> @@ -396,12 +396,16 @@ EXPORT_SYMBOL_GPL(vp_modern_remove);
>  virtio_features_t vp_modern_get_features(struct virtio_pci_modern_device *mdev)
>  {
>         struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
> -       virtio_features_t features;
> +       virtio_features_t features = 0;
> +       int i;
>
> -       vp_iowrite32(0, &cfg->device_feature_select);
> -       features = vp_ioread32(&cfg->device_feature);
> -       vp_iowrite32(1, &cfg->device_feature_select);
> -       features |= ((u64)vp_ioread32(&cfg->device_feature) << 32);
> +       for (i = 0; i < VIRTIO_FEATURES_WORDS; i++) {
> +               virtio_features_t cur;
> +
> +               vp_iowrite32(i, &cfg->device_feature_select);
> +               cur = vp_ioread32(&cfg->device_feature);
> +               features |= cur << (32 * i);
> +       }

Whether or not we decide to go with 128 bits, I think that at a lower
layer like this it's time to allow features of arbitrary length, as the
spec supports.

Thanks
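
A minimal sketch of such a length-agnostic read loop, under stated
assumptions: this is user-space C against a mock, struct mock_cfg does not
reflect the real virtio_pci_common_cfg layout, and all names are
hypothetical. The caller chooses how many 32-bit windows to read, and the
mock returns 0 for windows it does not implement.

```c
#include <stddef.h>
#include <stdint.h>

/* Mock of a select/window register pair; not the real PCI config layout. */
struct mock_cfg {
	uint32_t feature_select;
	const uint32_t *device_words;
	size_t nr_device_words;
};

static uint32_t mock_read_window(const struct mock_cfg *cfg)
{
	/* Assumption of this mock: unimplemented windows read as 0. */
	if (cfg->feature_select >= cfg->nr_device_words)
		return 0;
	return cfg->device_words[cfg->feature_select];
}

/* Read nr_words 32-bit feature windows into out[], lowest window first. */
static void read_features(struct mock_cfg *cfg, uint32_t *out, size_t nr_words)
{
	for (size_t i = 0; i < nr_words; i++) {
		cfg->feature_select = (uint32_t)i;
		out[i] = mock_read_window(cfg);
	}
}
```

The upper layers can then decide how many of those words they are able to
interpret, independently of the transport code.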



* Re: [PATCH net-next 4/8] virtio_net: add supports for extended offloads
  2025-05-21 10:32 ` [PATCH net-next 4/8] virtio_net: add supports for extended offloads Paolo Abeni
@ 2025-05-26  1:01   ` Jason Wang
  0 siblings, 0 replies; 59+ messages in thread
From: Jason Wang @ 2025-05-26  1:01 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, Willem de Bruijn, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez

On Wed, May 21, 2025 at 6:33 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> The virtio_net driver needs it to implement GSO over UDP tunnel
> offload.
>
> The only missing piece is mapping them to/from the extended
> features.
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
>  drivers/net/virtio_net.c | 31 +++++++++++++++++++++++++++++--
>  1 file changed, 29 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index e53ba600605a5..71a972f20f19b 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -35,6 +35,29 @@ module_param(csum, bool, 0444);
>  module_param(gso, bool, 0444);
>  module_param(napi_tx, bool, 0644);
>
> +#define VIRTIO_OFFLOAD_MAP_MIN 46
> +#define VIRTIO_OFFLOAD_MAP_MAX 49
> +#define VIRTIO_FEATURES_MAP_MIN        65
> +#define VIRTIO_O2F_DELTA       (VIRTIO_FEATURES_MAP_MIN - VIRTIO_OFFLOAD_MAP_MIN)

Instead of doing this, I wonder if it's simpler to just have an array
for the mapping from all guest offload features to guest controllable
offload bits?
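
One possible shape for such a table, as a user-space sketch only. The
table and function names are hypothetical; the entries 46..49 -> 65..68 are
derived from the VIRTIO_OFFLOAD_MAP_MIN/MAX and VIRTIO_FEATURES_MAP_MIN
values quoted above, and an entry of 0 is taken to mean "offload bit equals
feature bit".

```c
#include <stdint.h>

#define NR_OFFLOAD_BITS 64

/* Entries left at 0 mean the offload bit maps to the same feature bit. */
static const uint8_t offload_to_feature_tbl[NR_OFFLOAD_BITS] = {
	[46] = 65,
	[47] = 66,
	[48] = 67,
	[49] = 68,
};

/* Translate a guest offload bit into the corresponding feature bit. */
static unsigned int offload_to_feature(unsigned int obit)
{
	if (obit < NR_OFFLOAD_BITS && offload_to_feature_tbl[obit])
		return offload_to_feature_tbl[obit];
	return obit;
}

/* Reverse lookup; unmapped feature bits are returned unchanged. */
static unsigned int feature_to_offload(unsigned int fbit)
{
	for (unsigned int o = 0; o < NR_OFFLOAD_BITS; o++)
		if (offload_to_feature_tbl[o] &&
		    offload_to_feature_tbl[o] == fbit)
			return o;
	return fbit;
}
```

Compared with the delta-based macros, a table does not require the mapped
features to stay contiguous, at the price of a linear reverse lookup.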

> +
> +static bool virtio_is_mapped_offload(unsigned int obit)
> +{
> +       return obit >= VIRTIO_OFFLOAD_MAP_MIN &&
> +              obit <= VIRTIO_OFFLOAD_MAP_MAX;
> +}
> +
> +#define VIRTIO_FEATURE_TO_OFFLOAD(fbit)        \
> +       ({                                                              \
> +               unsigned int __f = fbit;                                \
> +               __f >= VIRTIO_FEATURES_MAP_MIN ? __f - VIRTIO_O2F_DELTA : __f; \
> +       })
> +#define VIRTIO_OFFLOAD_TO_FEATURE(obit)        \
> +       ({                                                              \
> +               unsigned int __o = obit;                                \
> +               virtio_is_mapped_offload(__o) ? __o + VIRTIO_O2F_DELTA :\
> +                                               __o;                    \
> +       })
> +
>  /* FIXME: MTU in config. */
>  #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
>  #define GOOD_COPY_LEN  128
> @@ -7037,9 +7060,13 @@ static int virtnet_probe(struct virtio_device *vdev)
>                 netif_carrier_on(dev);
>         }
>
> -       for (i = 0; i < ARRAY_SIZE(guest_offloads); i++)
> -               if (virtio_has_feature(vi->vdev, guest_offloads[i]))
> +       for (i = 0; i < ARRAY_SIZE(guest_offloads); i++) {
> +               unsigned int fbit;
> +
> +               fbit = VIRTIO_OFFLOAD_TO_FEATURE(guest_offloads[i]);
> +               if (virtio_has_feature(vi->vdev, fbit))
>                         set_bit(guest_offloads[i], &vi->guest_offloads);
> +       }
>         vi->guest_offloads_capable = vi->guest_offloads;
>
>         rtnl_unlock();
> --
> 2.49.0
>

Thanks


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH net-next 5/8] net: implement virtio helpers to handle UDP GSO tunneling.
  2025-05-21 10:32 ` [PATCH net-next 5/8] net: implement virtio helpers to handle UDP GSO tunneling Paolo Abeni
  2025-05-22 22:29   ` Willem de Bruijn
@ 2025-05-26  4:40   ` Jason Wang
  2025-05-29 11:55     ` Paolo Abeni
  2025-05-29 15:30     ` Paolo Abeni
  1 sibling, 2 replies; 59+ messages in thread
From: Jason Wang @ 2025-05-26  4:40 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, Willem de Bruijn, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez

On Wed, May 21, 2025 at 6:34 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> The virtio specification are introducing support for GSO over
> UDP tunnel.
>
> This patch brings in the needed defines and the additional
> virtio hdr parsing/building helpers.
>
> The UDP tunnel support uses additional fields in the virtio hdr,
> and such fields location can change depending on other negotiated
> features - specifically VIRTIO_NET_F_HASH_REPORT.
>
> Try to be as conservative as possible with the new field validation.
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
>  include/linux/virtio_net.h      | 177 ++++++++++++++++++++++++++++++--
>  include/uapi/linux/virtio_net.h |  33 ++++++
>  2 files changed, 202 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
> index 02a9f4dc594d0..cf9c712a67cd4 100644
> --- a/include/linux/virtio_net.h
> +++ b/include/linux/virtio_net.h
> @@ -47,9 +47,9 @@ static inline int virtio_net_hdr_set_proto(struct sk_buff *skb,
>         return 0;
>  }
>
> -static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
> -                                       const struct virtio_net_hdr *hdr,
> -                                       bool little_endian)
> +static inline int __virtio_net_hdr_to_skb(struct sk_buff *skb,
> +                                         const struct virtio_net_hdr *hdr,
> +                                         bool little_endian, u8 hdr_gso_type)
>  {
>         unsigned int nh_min_len = sizeof(struct iphdr);
>         unsigned int gso_type = 0;
> @@ -57,8 +57,8 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
>         unsigned int p_off = 0;
>         unsigned int ip_proto;
>
> -       if (hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) {
> -               switch (hdr->gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
> +       if (hdr_gso_type != VIRTIO_NET_HDR_GSO_NONE) {
> +               switch (hdr_gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
>                 case VIRTIO_NET_HDR_GSO_TCPV4:
>                         gso_type = SKB_GSO_TCPV4;
>                         ip_proto = IPPROTO_TCP;
> @@ -84,7 +84,7 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
>                         return -EINVAL;
>                 }
>
> -               if (hdr->gso_type & VIRTIO_NET_HDR_GSO_ECN)
> +               if (hdr_gso_type & VIRTIO_NET_HDR_GSO_ECN)
>                         gso_type |= SKB_GSO_TCP_ECN;
>
>                 if (hdr->gso_size == 0)
> @@ -122,7 +122,7 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
>
>                                 if (!protocol)
>                                         virtio_net_hdr_set_proto(skb, hdr);
> -                               else if (!virtio_net_hdr_match_proto(protocol, hdr->gso_type))
> +                               else if (!virtio_net_hdr_match_proto(protocol, hdr_gso_type))
>                                         return -EINVAL;
>                                 else
>                                         skb->protocol = protocol;
> @@ -153,7 +153,7 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
>                 }
>         }
>
> -       if (hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) {
> +       if (hdr_gso_type != VIRTIO_NET_HDR_GSO_NONE) {
>                 u16 gso_size = __virtio16_to_cpu(little_endian, hdr->gso_size);
>                 unsigned int nh_off = p_off;
>                 struct skb_shared_info *shinfo = skb_shinfo(skb);
> @@ -199,6 +199,13 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
>         return 0;
>  }
>
> +static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
> +                                       const struct virtio_net_hdr *hdr,
> +                                       bool little_endian)
> +{
> +       return __virtio_net_hdr_to_skb(skb, hdr, little_endian, hdr->gso_type);
> +}
> +
>  static inline int virtio_net_hdr_from_skb(const struct sk_buff *skb,
>                                           struct virtio_net_hdr *hdr,
>                                           bool little_endian,
> @@ -242,4 +249,158 @@ static inline int virtio_net_hdr_from_skb(const struct sk_buff *skb,
>         return 0;
>  }
>
> +static inline unsigned int virtio_l3min(bool is_ipv6)
> +{
> +       return is_ipv6 ? sizeof(struct ipv6hdr) : sizeof(struct iphdr);
> +}
> +
> +static inline int virtio_net_hdr_tnl_to_skb(struct sk_buff *skb,
> +                                           const struct virtio_net_hdr *hdr,
> +                                           unsigned int tnl_hdr_offset,
> +                                           bool tnl_csum_negotiated,
> +                                           bool little_endian)

Considering tunnel gso requires VERSION_1, I think there's no chance
for little_endian to be false here.

> +{
> +       u8 gso_tunnel_type = hdr->gso_type & VIRTIO_NET_HDR_GSO_UDP_TUNNEL;
> +       unsigned int inner_nh, outer_th, inner_th;
> +       unsigned int inner_l3min, outer_l3min;
> +       struct virtio_net_hdr_tunnel *tnl;
> +       u8 gso_inner_type;
> +       bool outer_isv6;
> +       int ret;
> +
> +       if (!gso_tunnel_type)
> +               return virtio_net_hdr_to_skb(skb, hdr, little_endian);
> +
> +       /* Tunnel not supported/negotiated, but the hdr asks for it. */
> +       if (!tnl_hdr_offset)
> +               return -EINVAL;
> +
> +       /* Either ipv4 or ipv6. */
> +       if (gso_tunnel_type & VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 &&
> +           gso_tunnel_type & VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6)
> +               return -EINVAL;

Could be simplified with gso_tunnel_type == VIRTIO_NET_HDR_GSO_UDP_TUNNEL ?
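
A quick userspace sketch of why the two forms are equivalent, assuming the
0x20/0x40 values from the uapi hunk of this patch. The helper names
both_set_verbose() and both_set_compact() are made up for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the quoted uapi hunk. */
#define GSO_UDP_TUNNEL_IPV4 0x20
#define GSO_UDP_TUNNEL_IPV6 0x40
#define GSO_UDP_TUNNEL      (GSO_UDP_TUNNEL_IPV4 | GSO_UDP_TUNNEL_IPV6)

/* The check as written in the quoted patch: both tunnel bits set. */
static bool both_set_verbose(uint8_t gso_type)
{
	uint8_t t = gso_type & GSO_UDP_TUNNEL;

	return (t & GSO_UDP_TUNNEL_IPV4) && (t & GSO_UDP_TUNNEL_IPV6);
}

/* The suggested form: compare the masked value against the full mask. */
static bool both_set_compact(uint8_t gso_type)
{
	uint8_t t = gso_type & GSO_UDP_TUNNEL;

	return t == GSO_UDP_TUNNEL;
}
```

Since the masked value can only carry those two bits, "both bits set" and
"equal to the mask" select the same inputs.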

> +
> +       /* No UDP fragments over UDP tunnel. */
> +       gso_inner_type = hdr->gso_type & ~(VIRTIO_NET_HDR_GSO_ECN |
> +                                          gso_tunnel_type);

VIRTIO_NET_HDR_GSO_UDP_TUNNEL seems to be better than gso_tunnel_type here.

> +       if (!gso_inner_type || gso_inner_type == VIRTIO_NET_HDR_GSO_UDP)
> +               return -EINVAL;
> +
> +       /* Relay on csum being present. */
> +       if (!(hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM))
> +               return -EINVAL;
> +
> +       /* Validate offsets. */
> +       outer_isv6 = gso_tunnel_type & VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6;
> +       inner_l3min = virtio_l3min(gso_inner_type == VIRTIO_NET_HDR_GSO_TCPV6);
> +       outer_l3min = ETH_HLEN + virtio_l3min(outer_isv6);
> +
> +       tnl = ((void *)hdr) + tnl_hdr_offset;
> +       inner_th = __virtio16_to_cpu(little_endian, hdr->csum_start);
> +       inner_nh = __virtio16_to_cpu(little_endian, tnl->inner_nh_offset);
> +       outer_th = __virtio16_to_cpu(little_endian, tnl->outer_th_offset);
> +       if (outer_th < outer_l3min ||
> +           inner_nh < outer_th + sizeof(struct udphdr) ||
> +           inner_th < inner_nh + inner_l3min)
> +               return -EINVAL;

I wonder whether the kernel already has helpers to validate the tunnel
headers, or whether the above check is sufficient here.
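
For reference, the ordering constraint enforced by the quoted check can be
modelled in userspace roughly as below. This is only a sketch: the header
sizes are the usual minimum lengths (14/20/40/8 bytes), and the function name
tnl_offsets_valid() is hypothetical. Note that, like the quoted code, it only
enforces lower bounds on the offsets.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimum header lengths, assumed here instead of the kernel's sizeof(). */
enum {
	ETH_HDR_LEN  = 14,
	IPV4_HDR_MIN = 20,
	IPV6_HDR_MIN = 40,
	UDP_HDR_LEN  = 8,
};

static unsigned int l3min(bool is_ipv6)
{
	return is_ipv6 ? IPV6_HDR_MIN : IPV4_HDR_MIN;
}

/* Mirrors the three comparisons in the quoted hunk:
 * outer transport header after eth + outer L3,
 * inner network header after the outer UDP header,
 * inner transport header after the inner L3 header.
 */
static bool tnl_offsets_valid(unsigned int outer_th, unsigned int inner_nh,
			      unsigned int inner_th, bool outer_isv6,
			      bool inner_isv6)
{
	unsigned int outer_l3min = ETH_HDR_LEN + l3min(outer_isv6);
	unsigned int inner_l3min = l3min(inner_isv6);

	if (outer_th < outer_l3min)
		return false;
	if (inner_nh < outer_th + UDP_HDR_LEN)
		return false;
	if (inner_th < inner_nh + inner_l3min)
		return false;
	return true;
}
```

For example, a VXLAN-style IPv4-in-IPv4 layout with outer_th = 34,
inner_nh = 64 and inner_th = 84 passes, while moving any offset below its
minimum fails.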

> +
> +       /* Let the basic parsing deal with plain GSO features. */
> +       ret = __virtio_net_hdr_to_skb(skb, hdr, little_endian,
> +                                     hdr->gso_type & ~gso_tunnel_type);
> +       if (ret)
> +               return ret;
> +
> +       skb_set_inner_protocol(skb, outer_isv6 ? htons(ETH_P_IPV6) :
> +                                                htons(ETH_P_IP));

The name outer_isv6 is somewhat misleading here; I think we'd better
rename it to inner_isv6?

> +       if (hdr->flags & VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM) {
> +               if (!tnl_csum_negotiated)
> +                       return -EINVAL;
> +
> +               skb_shinfo(skb)->gso_type |= SKB_GSO_UDP_TUNNEL_CSUM;
> +       } else {
> +               skb_shinfo(skb)->gso_type |= SKB_GSO_UDP_TUNNEL;
> +       }
> +
> +       skb->inner_transport_header = inner_th + skb_headroom(skb);

I may be missing something, but using skb_headroom() means the value
depends on the geometry of the skb, and the headroom might vary depending
on the size of the packet and other factors (see receive_buf()).

> +       skb->inner_network_header = inner_nh + skb_headroom(skb);
> +       skb->inner_mac_header = inner_nh + skb_headroom(skb);

This actually equals inner_network_header; is this intended?

> +       skb->transport_header = outer_th + skb_headroom(skb);
> +       skb->encapsulation = 1;
> +       return 0;
> +}
> +
> +static inline int virtio_net_chk_data_valid(struct sk_buff *skb,
> +                                           struct virtio_net_hdr *hdr,
> +                                           bool tun_csum_negotiated)

This is virtio_net.h so it's better to avoid using "tun". Btw, I
wonder why this needs to be called by virtio-net instead of being
called by the hdr_to_skb helpers.

> +{
> +       if (!(hdr->gso_type & VIRTIO_NET_HDR_GSO_UDP_TUNNEL)) {
> +               if (!(hdr->flags & VIRTIO_NET_HDR_F_DATA_VALID))
> +                       return 0;
> +
> +               skb->ip_summed = CHECKSUM_UNNECESSARY;
> +               if (!(hdr->flags & VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM))
> +                       return 0;
> +
> +               /* tunnel csum packets are invalid when the related
> +                * feature has not been negotiated
> +                */
> +               if (!tun_csum_negotiated)
> +                       return -EINVAL;

Should we move this check above VIRTIO_NET_HDR_F_DATA_VALID check?

> +               skb->csum_level = 1;
> +               return 0;
> +       }
> +
> +       /* DATA_VALID is mutually exclusive with NEEDS_CSUM,

I may be missing something, but I think we had a discussion about this,
and the conclusion was that it's too late to fix as it may break some
legacy devices?

> and GSO
> +        * over UDP tunnel requires the latter
> +        */
> +       if (hdr->flags & VIRTIO_NET_HDR_F_DATA_VALID)
> +               return -EINVAL;
> +       return 0;
> +}
> +
> +static inline int virtio_net_hdr_tnl_from_skb(const struct sk_buff *skb,
> +                                             struct virtio_net_hdr *hdr,
> +                                             unsigned int tnl_offset,
> +                                             bool little_endian,
> +                                             int vlan_hlen)
> +{
> +       struct virtio_net_hdr_tunnel *tnl;
> +       unsigned int inner_nh, outer_th;
> +       int tnl_gso_type;
> +       int ret;
> +
> +       tnl_gso_type = skb_shinfo(skb)->gso_type & (SKB_GSO_UDP_TUNNEL |
> +                                                   SKB_GSO_UDP_TUNNEL_CSUM);
> +       if (!tnl_gso_type)
> +               return virtio_net_hdr_from_skb(skb, hdr, little_endian, false,
> +                                              vlan_hlen);
> +
> +       /* Tunnel support not negotiated but skb ask for it. */
> +       if (!tnl_offset)
> +               return -EINVAL;

Should we do BUG_ON here?

> +
> +       /* Let the basic parsing deal with plain GSO features. */
> +       skb_shinfo(skb)->gso_type &= ~tnl_gso_type;
> +       ret = virtio_net_hdr_from_skb(skb, hdr, little_endian, false,
> +                                     vlan_hlen);
> +       skb_shinfo(skb)->gso_type |= tnl_gso_type;
> +       if (ret)
> +               return ret;

Could we do the plain GSO after setting inner flags below to avoid
masking and unmasking tnl_gso_type?

> +
> +       if (skb->protocol == htons(ETH_P_IPV6))
> +               hdr->gso_type |= VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6;
> +       else
> +               hdr->gso_type |= VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4;
> +
> +       if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_TUNNEL_CSUM)
> +               hdr->flags |= VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM;
> +
> +       tnl = ((void *)hdr) + tnl_offset;
> +       inner_nh = skb->inner_network_header - skb_headroom(skb);
> +       outer_th = skb->transport_header - skb_headroom(skb);
> +       tnl->inner_nh_offset =  __cpu_to_virtio16(little_endian, inner_nh);
> +       tnl->outer_th_offset =  __cpu_to_virtio16(little_endian, outer_th);

little_endian should be true here as it depends on version 1.

> +       return 0;
> +}
> +
>  #endif /* _LINUX_VIRTIO_NET_H */
> diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
> index 963540deae66a..1f1ff88a5749f 100644
> --- a/include/uapi/linux/virtio_net.h
> +++ b/include/uapi/linux/virtio_net.h
> @@ -70,6 +70,28 @@
>                                          * with the same MAC.
>                                          */
>  #define VIRTIO_NET_F_SPEED_DUPLEX 63   /* Device set linkspeed and duplex */
> +#define VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO 65 /* Driver can receive
> +                                             * GSO-over-UDP-tunnel packets
> +                                             */
> +#define VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM 66 /* Driver handles
> +                                                  * GSO-over-UDP-tunnel
> +                                                  * packets with partial csum
> +                                                  * for the outer header
> +                                                  */
> +#define VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO 67 /* Device can receive
> +                                            * GSO-over-UDP-tunnel packets
> +                                            */
> +#define VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO_CSUM 68 /* Device handles
> +                                                 * GSO-over-UDP-tunnel
> +                                                 * packets with partial csum
> +                                                 * for the outer header
> +                                                 */
> +
> +/* Offloads bits corresponding to VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO{,_CSUM}
> + * features
> + */
> +#define VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_MAPPED       46
> +#define VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM_MAPPED  47
>
>  #ifndef VIRTIO_NET_NO_LEGACY
>  #define VIRTIO_NET_F_GSO       6       /* Host handles pkts w/ any GSO type */
> @@ -131,12 +153,17 @@ struct virtio_net_hdr_v1 {
>  #define VIRTIO_NET_HDR_F_NEEDS_CSUM    1       /* Use csum_start, csum_offset */
>  #define VIRTIO_NET_HDR_F_DATA_VALID    2       /* Csum is valid */
>  #define VIRTIO_NET_HDR_F_RSC_INFO      4       /* rsc info in csum_ fields */
> +#define VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM 8     /* UDP tunnel requires csum offload */
>         __u8 flags;
>  #define VIRTIO_NET_HDR_GSO_NONE                0       /* Not a GSO frame */
>  #define VIRTIO_NET_HDR_GSO_TCPV4       1       /* GSO frame, IPv4 TCP (TSO) */
>  #define VIRTIO_NET_HDR_GSO_UDP         3       /* GSO frame, IPv4 UDP (UFO) */
>  #define VIRTIO_NET_HDR_GSO_TCPV6       4       /* GSO frame, IPv6 TCP */
>  #define VIRTIO_NET_HDR_GSO_UDP_L4      5       /* GSO frame, IPv4& IPv6 UDP (USO) */
> +#define VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 0x20 /* UDP over IPv4 tunnel present */
> +#define VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 0x40 /* UDP over IPv6 tunnel present */
> +#define VIRTIO_NET_HDR_GSO_UDP_TUNNEL (VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 | \
> +                                      VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6)
>  #define VIRTIO_NET_HDR_GSO_ECN         0x80    /* TCP has ECN set */
>         __u8 gso_type;
>         __virtio16 hdr_len;     /* Ethernet + IP + tcp/udp hdrs */
> @@ -181,6 +208,12 @@ struct virtio_net_hdr_v1_hash {
>         __le16 padding;
>  };
>
> +/* This header after hashing information */
> +struct virtio_net_hdr_tunnel {
> +       __virtio16 outer_th_offset;
> +       __virtio16 inner_nh_offset;
> +};
> +
>  #ifndef VIRTIO_NET_NO_LEGACY
>  /* This header comes first in the scatter-gather list.
>   * For legacy virtio, if VIRTIO_F_ANY_LAYOUT is not negotiated, it must
> --
> 2.49.0
>

Thanks



* Re: [PATCH net-next 7/8] tun: enable gso over UDP tunnel support.
  2025-05-21 10:32 ` [PATCH net-next 7/8] tun: " Paolo Abeni
@ 2025-05-26  4:40   ` Jason Wang
  2025-05-26 11:20     ` Paolo Abeni
  0 siblings, 1 reply; 59+ messages in thread
From: Jason Wang @ 2025-05-26  4:40 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, Willem de Bruijn, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez

On Wed, May 21, 2025 at 6:34 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> Add new tun features to represent the newly introduced virtio
> GSO over UDP tunnel offload. Allows detection and selection of
> such features via the existing TUNSETOFFLOAD ioctl, store the
> tunnel offload configuration in the highest bit of the tun flags
> and compute the expected virtio header size and tunnel header
> offset using such bits, so that we can plug almost seamless the
> the newly introduced virtio helpers to serialize the extended
> virtio header.
>
> As the tun features and the virtio hdr size are configured
> separately, the data path need to cope with (hopefully transient)
> inconsistent values.

I'm not sure it's a good idea to deal with this inconsistency in this
series, as it is not specific to tunnel offloading. It could be a
dependency for this patch, or we can leave it for the future and just
make sure misconfiguration won't cause any kernel issues.

>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
>  drivers/net/tun.c           | 77 ++++++++++++++++++++++++++++++++-----
>  drivers/net/tun_vnet.h      | 74 ++++++++++++++++++++++++++++-------
>  include/uapi/linux/if_tun.h |  9 +++++
>  3 files changed, 137 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 7babd1e9a378b..ef8cef48b66f5 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -186,7 +186,8 @@ struct tun_struct {
>         struct net_device       *dev;
>         netdev_features_t       set_features;
>  #define TUN_USER_FEATURES (NETIF_F_HW_CSUM|NETIF_F_TSO_ECN|NETIF_F_TSO| \
> -                         NETIF_F_TSO6 | NETIF_F_GSO_UDP_L4)
> +                         NETIF_F_TSO6 | NETIF_F_GSO_UDP_L4 | \
> +                         NETIF_F_GSO_UDP_TUNNEL | NETIF_F_GSO_UDP_TUNNEL_CSUM)
>
>         int                     align;
>         int                     vnet_hdr_sz;
> @@ -925,6 +926,7 @@ static int tun_net_init(struct net_device *dev)
>         dev->hw_features = NETIF_F_SG | NETIF_F_FRAGLIST |
>                            TUN_USER_FEATURES | NETIF_F_HW_VLAN_CTAG_TX |
>                            NETIF_F_HW_VLAN_STAG_TX;
> +       dev->hw_enc_features = dev->hw_features;
>         dev->features = dev->hw_features;
>         dev->vlan_features = dev->features &
>                              ~(NETIF_F_HW_VLAN_CTAG_TX |
> @@ -1698,7 +1700,8 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
>         struct sk_buff *skb;
>         size_t total_len = iov_iter_count(from);
>         size_t len = total_len, align = tun->align, linear;
> -       struct virtio_net_hdr gso = { 0 };
> +       char buf[TUN_VNET_TNL_SIZE];

I wonder why not simply

1) define the structure virtio_net_hdr_tnl_gso and use that

or

2) stick the gso here and use iter advance to get
virtio_net_hdr_tunnel when necessary?
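
As a rough sketch of option 1, something like the following layout is what I
have in mind. The structs below are userspace mirrors with fixed-width types
standing in for __virtio16, the struct name is only a suggestion, and it
assumes the tunnel fields directly follow virtio_net_hdr_v1 as in the quoted
TUN_VNET_TNL_SIZE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of struct virtio_net_hdr_v1 (12 bytes). */
struct vnet_hdr_v1 {
	uint8_t  flags;
	uint8_t  gso_type;
	uint16_t hdr_len;
	uint16_t gso_size;
	uint16_t csum_start;
	uint16_t csum_offset;
	uint16_t num_buffers;
};

/* Userspace mirror of struct virtio_net_hdr_tunnel from this patch. */
struct vnet_hdr_tunnel {
	uint16_t outer_th_offset;
	uint16_t inner_nh_offset;
};

/* Hypothetical combined header: base header followed by tunnel fields. */
struct vnet_hdr_tnl_gso {
	struct vnet_hdr_v1     hdr;
	struct vnet_hdr_tunnel tnl;
};
```

With such a struct the on-stack buffer becomes a typed object, and the tunnel
offset is offsetof(struct vnet_hdr_tnl_gso, tnl) rather than pointer
arithmetic on a char array.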

> +       struct virtio_net_hdr *gso;
>         int good_linear;
>         int copylen;
>         int hdr_len = 0;
> @@ -1708,6 +1711,15 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
>         int skb_xdp = 1;
>         bool frags = tun_napi_frags_enabled(tfile);
>         enum skb_drop_reason drop_reason = SKB_DROP_REASON_NOT_SPECIFIED;
> +       unsigned int flags = tun->flags & ~TUN_VNET_TNL_MASK;
> +
> +       /*
> +        * Keep it easy and always zero the whole buffer, even if the
> +        * tunnel-related field will be touched only when the feature
> +        * is enabled and the hdr size id compatible.
> +        */
> +       memset(buf, 0, sizeof(buf));
> +       gso = (void *)buf;
>
>         if (!(tun->flags & IFF_NO_PI)) {
>                 if (len < sizeof(pi))
> @@ -1720,8 +1732,16 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
>
>         if (tun->flags & IFF_VNET_HDR) {
>                 int vnet_hdr_sz = READ_ONCE(tun->vnet_hdr_sz);
> +               int parsed_size;
>
> -               hdr_len = tun_vnet_hdr_get(vnet_hdr_sz, tun->flags, from, &gso);
> +               if (vnet_hdr_sz < TUN_VNET_TNL_SIZE) {
> +                       parsed_size = vnet_hdr_sz;
> +               } else {
> +                       parsed_size = TUN_VNET_TNL_SIZE;
> +                       flags |= TUN_VNET_TNL_MASK;
> +               }
> +               hdr_len = __tun_vnet_hdr_get(vnet_hdr_sz, parsed_size,
> +                                            flags, from, gso);
>                 if (hdr_len < 0)
>                         return hdr_len;
>
> @@ -1755,7 +1775,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
>                  * (e.g gso or jumbo packet), we will do it at after
>                  * skb was created with generic XDP routine.
>                  */
> -               skb = tun_build_skb(tun, tfile, from, &gso, len, &skb_xdp);
> +               skb = tun_build_skb(tun, tfile, from, gso, len, &skb_xdp);
>                 err = PTR_ERR_OR_ZERO(skb);
>                 if (err)
>                         goto drop;
> @@ -1799,7 +1819,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
>                 }
>         }
>
> -       if (tun_vnet_hdr_to_skb(tun->flags, skb, &gso)) {
> +       if (tun_vnet_hdr_to_skb(flags, skb, gso)) {
>                 atomic_long_inc(&tun->rx_frame_errors);
>                 err = -EINVAL;
>                 goto free_skb;
> @@ -2050,13 +2070,26 @@ static ssize_t tun_put_user(struct tun_struct *tun,
>         }
>
>         if (vnet_hdr_sz) {
> -               struct virtio_net_hdr gso;
> +               char buf[TUN_VNET_TNL_SIZE];
> +               struct virtio_net_hdr *gso;
> +               int flags = tun->flags;
> +               int parsed_size;
> +
> +               gso = (void *)buf;
> +               parsed_size = tun_vnet_parse_size(tun->flags);
> +               if (unlikely(vnet_hdr_sz < parsed_size)) {
> +                       /* Inconsistent hdr size and (tunnel) offloads:
> +                        * strips the latter
> +                        */
> +                       flags &= ~TUN_VNET_TNL_MASK;
> +                       parsed_size = sizeof(struct virtio_net_hdr);
> +               };
>
> -               ret = tun_vnet_hdr_from_skb(tun->flags, tun->dev, skb, &gso);
> +               ret = tun_vnet_hdr_from_skb(flags, tun->dev, skb, gso);
>                 if (ret)
>                         return ret;
>
> -               ret = tun_vnet_hdr_put(vnet_hdr_sz, iter, &gso);
> +               ret = __tun_vnet_hdr_put(vnet_hdr_sz, parsed_size, iter, gso);
>                 if (ret)
>                         return ret;
>         }
> @@ -2366,6 +2399,7 @@ static int tun_xdp_one(struct tun_struct *tun,
>         int metasize = 0;
>         int ret = 0;
>         bool skb_xdp = false;
> +       unsigned int flags;
>         struct page *page;
>
>         if (unlikely(datasize < ETH_HLEN))
> @@ -2426,7 +2460,16 @@ static int tun_xdp_one(struct tun_struct *tun,
>         if (metasize > 0)
>                 skb_metadata_set(skb, metasize);
>
> -       if (tun_vnet_hdr_to_skb(tun->flags, skb, gso)) {
> +       /* Assume tun offloads are enabled if the provided hdr is large
> +        * enough.
> +        */
> +       if (READ_ONCE(tun->vnet_hdr_sz) >= TUN_VNET_TNL_SIZE &&
> +           xdp->data - xdp->data_hard_start >= TUN_VNET_TNL_SIZE)
> +               flags = tun->flags | TUN_VNET_TNL_MASK;
> +       else
> +               flags = tun->flags & ~TUN_VNET_TNL_MASK;
> +
> +       if (tun_vnet_hdr_to_skb(flags, skb, gso)) {
>                 atomic_long_inc(&tun->rx_frame_errors);
>                 kfree_skb(skb);
>                 ret = -EINVAL;
> @@ -2812,6 +2855,8 @@ static void tun_get_iff(struct tun_struct *tun, struct ifreq *ifr)
>
>  }
>
> +#define PLAIN_GSO (NETIF_F_GSO_UDP_L4 | NETIF_F_TSO | NETIF_F_TSO6)
> +
>  /* This is like a cut-down ethtool ops, except done via tun fd so no
>   * privs required. */
>  static int set_offload(struct tun_struct *tun, unsigned long arg)
> @@ -2841,6 +2886,17 @@ static int set_offload(struct tun_struct *tun, unsigned long arg)
>                         features |= NETIF_F_GSO_UDP_L4;
>                         arg &= ~(TUN_F_USO4 | TUN_F_USO6);
>                 }
> +
> +               /* Tunnel offload is allowed only if some plain offload is
> +                * available, too.
> +                */
> +               if (features & PLAIN_GSO && arg & TUN_F_UDP_TUNNEL_GSO) {
> +                       features |= NETIF_F_GSO_UDP_TUNNEL;
> +                       if (arg & TUN_F_UDP_TUNNEL_GSO_CSUM)
> +                               features |= NETIF_F_GSO_UDP_TUNNEL_CSUM;
> +                       arg &= ~(TUN_F_UDP_TUNNEL_GSO |
> +                                TUN_F_UDP_TUNNEL_GSO_CSUM);
> +               }
>         }
>
>         /* This gives the user a way to test for new features in future by
> @@ -2852,7 +2908,8 @@ static int set_offload(struct tun_struct *tun, unsigned long arg)
>         tun->dev->wanted_features &= ~TUN_USER_FEATURES;
>         tun->dev->wanted_features |= features;
>         netdev_update_features(tun->dev);
> -
> +       tun_set_vnet_tnl(&tun->flags, !!(features & NETIF_F_GSO_UDP_TUNNEL),
> +                        !!(features & NETIF_F_GSO_UDP_TUNNEL_CSUM));
>         return 0;
>  }
>
> diff --git a/drivers/net/tun_vnet.h b/drivers/net/tun_vnet.h
> index 58b9ac7a5fc40..ab2d4396941ca 100644
> --- a/drivers/net/tun_vnet.h
> +++ b/drivers/net/tun_vnet.h
> @@ -5,6 +5,12 @@
>  /* High bits in flags field are unused. */
>  #define TUN_VNET_LE     0x80000000
>  #define TUN_VNET_BE     0x40000000
> +#define TUN_VNET_TNL           0x20000000
> +#define TUN_VNET_TNL_CSUM      0x10000000
> +#define TUN_VNET_TNL_MASK      (TUN_VNET_TNL | TUN_VNET_TNL_CSUM)
> +
> +#define TUN_VNET_TNL_SIZE (sizeof(struct virtio_net_hdr_v1) + \

Should this be virtio_net_hdr_v1_hash?

> +                          sizeof(struct virtio_net_hdr_tunnel))
>
>  static inline bool tun_vnet_legacy_is_little_endian(unsigned int flags)
>  {
> @@ -45,6 +51,13 @@ static inline long tun_set_vnet_be(unsigned int *flags, int __user *argp)
>         return 0;
>  }
>
> +static inline void tun_set_vnet_tnl(unsigned int *flags, bool tnl, bool tnl_csum)
> +{
> +       *flags = (*flags & ~TUN_VNET_TNL_MASK) |
> +                tnl * TUN_VNET_TNL |
> +                tnl_csum * TUN_VNET_TNL_CSUM;

We could refer to the netdev via tun_struct, so I don't understand why we
need to duplicate the features in tun->flags (we don't do that for
other GSO/CSUM stuff).

> +}
> +
>  static inline bool tun_vnet_is_little_endian(unsigned int flags)
>  {
>         return flags & TUN_VNET_LE || tun_vnet_legacy_is_little_endian(flags);
> @@ -107,16 +120,33 @@ static inline long tun_vnet_ioctl(int *vnet_hdr_sz, unsigned int *flags,
>         }
>  }
>
> -static inline int tun_vnet_hdr_get(int sz, unsigned int flags,
> -                                  struct iov_iter *from,
> -                                  struct virtio_net_hdr *hdr)
> +static inline unsigned int tun_vnet_parse_size(unsigned int flags)
> +{
> +       if (!(flags & TUN_VNET_TNL))
> +               return sizeof(struct virtio_net_hdr);
> +
> +       return TUN_VNET_TNL_SIZE;
> +}
> +
> +static inline unsigned int tun_vnet_tnl_offset(unsigned int flags)
> +{
> +       if (!(flags & TUN_VNET_TNL))
> +               return 0;
> +
> +       return sizeof(struct virtio_net_hdr_v1);
> +}
> +
> +static inline int __tun_vnet_hdr_get(int sz, int parsed_size,
> +                                    unsigned int flags,
> +                                    struct iov_iter *from,
> +                                    struct virtio_net_hdr *hdr)
>  {
>         u16 hdr_len;
>
>         if (iov_iter_count(from) < sz)
>                 return -EINVAL;
>
> -       if (!copy_from_iter_full(hdr, sizeof(*hdr), from))
> +       if (!copy_from_iter_full(hdr, parsed_size, from))
>                 return -EFAULT;
>
>         hdr_len = tun_vnet16_to_cpu(flags, hdr->hdr_len);
> @@ -129,30 +159,47 @@ static inline int tun_vnet_hdr_get(int sz, unsigned int flags,
>         if (hdr_len > iov_iter_count(from))
>                 return -EINVAL;
>
> -       iov_iter_advance(from, sz - sizeof(*hdr));
> +       iov_iter_advance(from, sz - parsed_size);
>
>         return hdr_len;
>  }
>
> -static inline int tun_vnet_hdr_put(int sz, struct iov_iter *iter,
> -                                  const struct virtio_net_hdr *hdr)
> +static inline int tun_vnet_hdr_get(int sz, unsigned int flags,
> +                                  struct iov_iter *from,
> +                                  struct virtio_net_hdr *hdr)
> +{
> +       return __tun_vnet_hdr_get(sz, sizeof(*hdr), flags, from, hdr);
> +}
> +
> +static inline int __tun_vnet_hdr_put(int sz, int parsed_size,
> +                                    struct iov_iter *iter,
> +                                    const struct virtio_net_hdr *hdr)
>  {
>         if (unlikely(iov_iter_count(iter) < sz))
>                 return -EINVAL;
>
> -       if (unlikely(copy_to_iter(hdr, sizeof(*hdr), iter) != sizeof(*hdr)))
> +       if (unlikely(copy_to_iter(hdr, parsed_size, iter) != parsed_size))
>                 return -EFAULT;
>
> -       if (iov_iter_zero(sz - sizeof(*hdr), iter) != sz - sizeof(*hdr))
> +       if (iov_iter_zero(sz - parsed_size, iter) != sz - parsed_size)
>                 return -EFAULT;
>
>         return 0;
>  }
>
> +static inline int tun_vnet_hdr_put(int sz, struct iov_iter *iter,
> +                                  const struct virtio_net_hdr *hdr)
> +{
> +       return __tun_vnet_hdr_put(sz, sizeof(*hdr), iter, hdr);
> +}
> +
>  static inline int tun_vnet_hdr_to_skb(unsigned int flags, struct sk_buff *skb,
>                                       const struct virtio_net_hdr *hdr)
>  {
> -       return virtio_net_hdr_to_skb(skb, hdr, tun_vnet_is_little_endian(flags));
> +       return virtio_net_hdr_tnl_to_skb(skb, hdr,
> +                                        tun_vnet_tnl_offset(flags),
> +                                        !!(flags & TUN_VNET_TNL_CSUM),
> +                                        tun_vnet_is_little_endian(flags));
>  }
>
>  static inline int tun_vnet_hdr_from_skb(unsigned int flags,
> @@ -161,10 +208,11 @@ static inline int tun_vnet_hdr_from_skb(unsigned int flags,
>                                         struct virtio_net_hdr *hdr)
>  {
>         int vlan_hlen = skb_vlan_tag_present(skb) ? VLAN_HLEN : 0;
> +       int tnl_offset = tun_vnet_tnl_offset(flags);
>
> -       if (virtio_net_hdr_from_skb(skb, hdr,
> -                                   tun_vnet_is_little_endian(flags), true,
> -                                   vlan_hlen)) {
> +       if (virtio_net_hdr_tnl_from_skb(skb, hdr, tnl_offset,
> +                                       tun_vnet_is_little_endian(flags),
> +                                       vlan_hlen)) {
>                 struct skb_shared_info *sinfo = skb_shinfo(skb);
>
>                 if (net_ratelimit()) {
> diff --git a/include/uapi/linux/if_tun.h b/include/uapi/linux/if_tun.h
> index 287cdc81c9390..a25a5e7a08ffa 100644
> --- a/include/uapi/linux/if_tun.h
> +++ b/include/uapi/linux/if_tun.h
> @@ -93,6 +93,15 @@
>  #define TUN_F_USO4     0x20    /* I can handle USO for IPv4 packets */
>  #define TUN_F_USO6     0x40    /* I can handle USO for IPv6 packets */
>
> +#define TUN_F_UDP_TUNNEL_GSO           0x080 /* I can handle TSO/USO for UDP
> +                                              * tunneled packets
> +                                              */
> +#define TUN_F_UDP_TUNNEL_GSO_CSUM      0x100 /* I can handle TSO/USO for UDP
> +                                              * tunneled packets requiring
> +                                              * csum offload for the outer
> +                                              * header
> +                                              */
> +
>  /* Protocol info prepended to the packets (when IFF_NO_PI is not set) */
>  #define TUN_PKT_STRIP  0x0001
>  struct tun_pi {
> --
> 2.49.0
>

Thanks


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH net-next 8/8] vhost/net: enable gso over UDP tunnel support.
  2025-05-21 10:32 ` [PATCH net-next 8/8] vhost/net: " Paolo Abeni
  2025-05-22  6:43   ` kernel test robot
@ 2025-05-26  4:40   ` Jason Wang
  1 sibling, 0 replies; 59+ messages in thread
From: Jason Wang @ 2025-05-26  4:40 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, Willem de Bruijn, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez

On Wed, May 21, 2025 at 6:34 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> Vhost net needs to know the exact virtio net hdr size to be able
> to copy such header correctly. Teach it about the newly defined
> UDP tunnel-related option and update the hdr size computation
> accordingly.
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
>  drivers/vhost/net.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index b894685dded3e..985f9662a9003 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -78,7 +78,9 @@ enum {
>  };
>
>  #ifdef VIRTIO_HAS_EXTENDED_FEATURES
> -#define VHOST_NET_FEATURES_EX VHOST_NET_FEATURES
> +#define VHOST_NET_FEATURES_EX (VHOST_NET_FEATURES | \
> +                       (VIRTIO_BIT(VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO)) | \
> +                       (VIRTIO_BIT(VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO)))
>  #endif
>
>  enum {
> @@ -1621,12 +1623,16 @@ static long vhost_net_reset_owner(struct vhost_net *n)
>  static int vhost_net_set_features(struct vhost_net *n, virtio_features_t features)
>  {
>         size_t vhost_hlen, sock_hlen, hdr_len;
> +       bool has_tunnel;
>         int i;
>
>         hdr_len = (features & ((1ULL << VIRTIO_NET_F_MRG_RXBUF) |
>                                (1ULL << VIRTIO_F_VERSION_1))) ?
>                         sizeof(struct virtio_net_hdr_mrg_rxbuf) :
>                         sizeof(struct virtio_net_hdr);
> +       has_tunnel = !!(features & (VIRTIO_BIT(VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO) |
> +                                   VIRTIO_BIT(VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO)));
> +       hdr_len += has_tunnel ? sizeof(struct virtio_net_hdr_tunnel) : 0;

Same as patch 7, this seems to ignore the hash report fields.
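
To make the point about the hash report fields concrete, here is a minimal
sketch of a header-length computation that accounts for them. It is not the
vhost code: vnet_hdr_len() and its boolean parameters are hypothetical, and
the 4-byte size of the tunnel fields is an assumption (two 16-bit offsets).
The 10, 12 and 20 byte sizes are those of struct virtio_net_hdr,
struct virtio_net_hdr_mrg_rxbuf and struct virtio_net_hdr_v1_hash.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Sizes in bytes. TUNNEL_FIELDS_LEN is an assumption, see the text above. */
enum {
	HDR_LEN           = 10, /* struct virtio_net_hdr */
	HDR_MRG_LEN       = 12, /* struct virtio_net_hdr_mrg_rxbuf */
	HDR_HASH_LEN      = 20, /* struct virtio_net_hdr_v1_hash */
	TUNNEL_FIELDS_LEN = 4,  /* assumed: two __le16 offsets */
};

/* Hypothetical helper: header length for a given set of negotiated options. */
static size_t vnet_hdr_len(bool mrg_or_v1, bool hash_report, bool tunnel)
{
	size_t len = mrg_or_v1 ? HDR_MRG_LEN : HDR_LEN;

	/* The hash report fields extend the v1 header up to 20 bytes. */
	if (hash_report)
		len = HDR_HASH_LEN;

	/* The tunnel fields follow whatever precedes them. */
	if (tunnel)
		len += TUNNEL_FIELDS_LEN;

	return len;
}
```

Under these assumptions the tunnel fields start at offset 12 without hash
report and at offset 20 with it, which is why both the size and the offset
computation need to know about the hash report option.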

Thanks


>         if (features & (1 << VHOST_NET_F_VIRTIO_NET_HDR)) {
>                 /* vhost provides vnet_hdr */
>                 vhost_hlen = hdr_len;
> --
> 2.49.0
>


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH net-next 1/8] virtio: introduce virtio_features_t
  2025-05-26  0:43   ` Jason Wang
@ 2025-05-26  7:20     ` Paolo Abeni
  2025-05-27  3:51       ` Jason Wang
  2025-05-27 14:14       ` Michael S. Tsirkin
  0 siblings, 2 replies; 59+ messages in thread
From: Paolo Abeni @ 2025-05-26  7:20 UTC (permalink / raw)
  To: Jason Wang
  Cc: netdev, Willem de Bruijn, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez

On 5/26/25 2:43 AM, Jason Wang wrote:
> On Wed, May 21, 2025 at 6:33 PM Paolo Abeni <pabeni@redhat.com> wrote:
>> diff --git a/include/linux/virtio_features.h b/include/linux/virtio_features.h
>> new file mode 100644
>> index 0000000000000..2f742eeb45a29
>> --- /dev/null
>> +++ b/include/linux/virtio_features.h
>> @@ -0,0 +1,23 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +#ifndef _LINUX_VIRTIO_FEATURES_H
>> +#define _LINUX_VIRTIO_FEATURES_H
>> +
>> +#include <linux/bits.h>
>> +
>> +#if IS_ENABLED(CONFIG_ARCH_SUPPORTS_INT128)
>> +#define VIRTIO_HAS_EXTENDED_FEATURES
>> +#define VIRTIO_FEATURES_MAX    128
>> +#define VIRTIO_FEATURES_WORDS  4
>> +#define VIRTIO_BIT(b)          _BIT128(b)
>> +
>> +typedef __uint128_t            virtio_features_t;
> 
> Consider:
> 
> 1) need the trick for arch that doesn't support 128bit
> 2) some transport (e.g PCI) allows much more than just 128 bit features
> 
>  I wonder if it's better to just use arrays here.

I considered that; it has been discussed both on the virtio ML and
privately, and I made a reasonable attempt at such an implementation.

The diffstat would be horrible, touching a lot of the virtio/vhost code.
Such an approach would block any progress for a long time (more likely
forever, since I will not have the capacity to complete it).

Also, the benefits are AFAICS marginal, as 32-bit platforms with huge
virtualization deployments on top of them (that could benefit from GSO
over UDP tunnel) are IMHO unlikely, and transport feature space
exhaustion is AFAIK far from being reached (also thanks to the reserved
features available).

TL;DR: if you consider a generic implementation for an arbitrarily wide
feature space a blocking requirement, please LMK, because any other
consideration would likely be irrelevant otherwise.

/P

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH net-next 2/8] virtio_pci_modern: allow setting configuring extended features
  2025-05-26  0:49   ` Jason Wang
@ 2025-05-26 10:53     ` Paolo Abeni
  2025-05-27  3:04       ` Jason Wang
  0 siblings, 1 reply; 59+ messages in thread
From: Paolo Abeni @ 2025-05-26 10:53 UTC (permalink / raw)
  To: Jason Wang
  Cc: netdev, Willem de Bruijn, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez

On 5/26/25 2:49 AM, Jason Wang wrote:
> On Wed, May 21, 2025 at 6:33 PM Paolo Abeni <pabeni@redhat.com> wrote:
>>
>> The virtio specifications allows for up to 128 bits for the
>> device features. Soon we are going to use some of the 'extended'
>> bits features (above 64) for the virtio_net driver.
>>
>> Extend the virtio pci modern driver to support configuring the full
>> virtio features range, replacing the unrolled loops reading and
>> writing the features space with explicit one bounded to the actual
>> features space size in word.
>>
>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>> ---
>>  drivers/virtio/virtio_pci_modern_dev.c | 39 +++++++++++++++++---------
>>  1 file changed, 25 insertions(+), 14 deletions(-)
>>
>> diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
>> index 1d34655f6b658..e3025b6fa8540 100644
>> --- a/drivers/virtio/virtio_pci_modern_dev.c
>> +++ b/drivers/virtio/virtio_pci_modern_dev.c
>> @@ -396,12 +396,16 @@ EXPORT_SYMBOL_GPL(vp_modern_remove);
>>  virtio_features_t vp_modern_get_features(struct virtio_pci_modern_device *mdev)
>>  {
>>         struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
>> -       virtio_features_t features;
>> +       virtio_features_t features = 0;
>> +       int i;
>>
>> -       vp_iowrite32(0, &cfg->device_feature_select);
>> -       features = vp_ioread32(&cfg->device_feature);
>> -       vp_iowrite32(1, &cfg->device_feature_select);
>> -       features |= ((u64)vp_ioread32(&cfg->device_feature) << 32);
>> +       for (i = 0; i < VIRTIO_FEATURES_WORDS; i++) {
>> +               virtio_features_t cur;
>> +
>> +               vp_iowrite32(i, &cfg->device_feature_select);
>> +               cur = vp_ioread32(&cfg->device_feature);
>> +               features |= cur << (32 * i);
>> +       }
> 
> No matter if we decide to go with 128bit or not. I think at the lower
> layer like this, it's time to allow arbitrary length of the features
> as the spec supports.

Is that useful if the vhost interface is not going to support it?

Note that the above code is independent of the feature-space size. With a
larger value of VIRTIO_FEATURES_WORDS it will deal with a larger number of
features.
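
As an illustration of that point, here is a self-contained sketch of the
same select-then-read loop, filling an array of 64-bit words instead of a
single wide integer. The mock_cfg structure and the mock_* accessors are
hypothetical stand-ins for the PCI common config registers, not the real
driver API; only the loop shape mirrors the quoted patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mock device: a selector register picks which 32-bit
 * window of the feature bits is visible through the feature register. */
struct mock_cfg {
	uint32_t select;
	const uint32_t *words; /* 32-bit feature words the device implements */
	size_t nwords;
};

static void mock_write_select(struct mock_cfg *cfg, uint32_t sel)
{
	cfg->select = sel;
}

static uint32_t mock_read_feature(const struct mock_cfg *cfg)
{
	/* Windows the mock device does not implement read back as zero. */
	return cfg->select < cfg->nwords ? cfg->words[cfg->select] : 0;
}

/* Fill out[0..nwords64-1] with the device features, 64 bits per entry. */
static void read_features(struct mock_cfg *cfg, uint64_t *out, size_t nwords64)
{
	assert(cfg && out);

	for (size_t i = 0; i < nwords64 * 2; i++) {
		uint64_t cur;

		mock_write_select(cfg, (uint32_t)i);
		cur = mock_read_feature(cfg);

		if (i % 2 == 0)
			out[i / 2] = cur;       /* low half starts the word */
		else
			out[i / 2] |= cur << 32; /* high half completes it */
	}
}
```

The loop bound is the only place where the feature-space size shows up, so
the same code serves 64, 128 or more bits depending on nwords64.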

/P


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH net-next 3/8] vhost-net: allow configuring extended features
  2025-05-26  0:47   ` Jason Wang
@ 2025-05-26 10:57     ` Paolo Abeni
  2025-05-27  3:56       ` Jason Wang
  0 siblings, 1 reply; 59+ messages in thread
From: Paolo Abeni @ 2025-05-26 10:57 UTC (permalink / raw)
  To: Jason Wang
  Cc: netdev, Willem de Bruijn, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez

On 5/26/25 2:47 AM, Jason Wang wrote:
> On Wed, May 21, 2025 at 6:33 PM Paolo Abeni <pabeni@redhat.com> wrote:
>>
>> Use the extended feature type for 'acked_features' and implement
>> two new ioctls operation to get and set the extended features.
>>
>> Note that the legacy ioctls implicitly truncate the negotiated
>> features to the lower 64 bits range.
>>
>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>> ---
>>  drivers/vhost/net.c        | 26 +++++++++++++++++++++++++-
>>  drivers/vhost/vhost.h      |  2 +-
>>  include/uapi/linux/vhost.h |  8 ++++++++
>>  3 files changed, 34 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>> index 7cbfc7d718b3f..b894685dded3e 100644
>> --- a/drivers/vhost/net.c
>> +++ b/drivers/vhost/net.c
>> @@ -77,6 +77,10 @@ enum {
>>                          (1ULL << VIRTIO_F_RING_RESET)
>>  };
>>
>> +#ifdef VIRTIO_HAS_EXTENDED_FEATURES
>> +#define VHOST_NET_FEATURES_EX VHOST_NET_FEATURES
>> +#endif
>> +
>>  enum {
>>         VHOST_NET_BACKEND_FEATURES = (1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2)
>>  };
>> @@ -1614,7 +1618,7 @@ static long vhost_net_reset_owner(struct vhost_net *n)
>>         return err;
>>  }
>>
>> -static int vhost_net_set_features(struct vhost_net *n, u64 features)
>> +static int vhost_net_set_features(struct vhost_net *n, virtio_features_t features)
>>  {
>>         size_t vhost_hlen, sock_hlen, hdr_len;
>>         int i;
>> @@ -1704,6 +1708,26 @@ static long vhost_net_ioctl(struct file *f, unsigned int ioctl,
>>                 if (features & ~VHOST_NET_FEATURES)
>>                         return -EOPNOTSUPP;
>>                 return vhost_net_set_features(n, features);
>> +#ifdef VIRTIO_HAS_EXTENDED_FEATURES
> 
> Vhost doesn't depend on virtio. But this invents a dependency, and I
> don't understand why we need to do that.

What do you mean by "dependency" here? vhost already has a build
dependency on virtio, as it includes several virtio headers. It also has a
logical dependency, as it uses several virtio features.

Do you mean a build dependency? This change does not introduce such a thing.

/P


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH net-next 7/8] tun: enable gso over UDP tunnel support.
  2025-05-26  4:40   ` Jason Wang
@ 2025-05-26 11:20     ` Paolo Abeni
  2025-05-27  4:19       ` Jason Wang
  0 siblings, 1 reply; 59+ messages in thread
From: Paolo Abeni @ 2025-05-26 11:20 UTC (permalink / raw)
  To: Jason Wang
  Cc: netdev, Willem de Bruijn, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez

On 5/26/25 6:40 AM, Jason Wang wrote:
> On Wed, May 21, 2025 at 6:34 PM Paolo Abeni <pabeni@redhat.com> wrote:
>>
>> Add new tun features to represent the newly introduced virtio
>> GSO over UDP tunnel offload. Allows detection and selection of
>> such features via the existing TUNSETOFFLOAD ioctl, store the
>> tunnel offload configuration in the highest bit of the tun flags
>> and compute the expected virtio header size and tunnel header
>> offset using such bits, so that we can plug in almost seamlessly the
>> newly introduced virtio helpers to serialize the extended
>> virtio header.
>>
>> As the tun features and the virtio hdr size are configured
>> separately, the data path need to cope with (hopefully transient)
>> inconsistent values.
> 
> I'm not sure it's a good idea to deal with this inconsistency in this
> series as it is not specific to tunnel offloading. It could be a
> dependency for this patch or we can leave it for the future and just
> to make sure mis-configuration won't cause any kernel issues.

The possible inconsistency is not due to a misconfiguration, but to the
fact that:
- configuring the virtio hdr len and the offload is not atomic
- successful GSO over UDP tunnel parsing requires the relevant offloads
to be enabled and a suitable hdr len.

Plain GSO doesn't have a similar problem because all the relevant fields
are always available for any sane virtio hdr length, but we need to deal
with the inconsistency here.
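
A minimal sketch of the kind of check this implies on the data path: the
tunnel fields are parsed only when the offload is enabled and the
configured header length is large enough to contain them. The helper name
parse_size() and the constants are hypothetical, chosen for illustration;
the 4-byte tunnel field size is an assumption, and the sketch relies on the
configured length never being smaller than sizeof(struct virtio_net_hdr).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PLAIN_PARSE_LEN 10 /* sizeof(struct virtio_net_hdr) */
#define V1_HDR_LEN      12 /* sizeof(struct virtio_net_hdr_v1) */
#define TUNNEL_LEN       4 /* assumed size of the tunnel fields */
#define TNL_PARSE_LEN   (V1_HDR_LEN + TUNNEL_LEN)

/* Hypothetical helper: how many header bytes the data path should parse.
 * If the two settings disagree, fall back to the plain header so that no
 * byte beyond the configured header is ever read as tunnel metadata. */
static size_t parse_size(bool tnl_offload_enabled, size_t configured_hdr_len)
{
	assert(configured_hdr_len >= PLAIN_PARSE_LEN);

	if (tnl_offload_enabled && configured_hdr_len >= TNL_PARSE_LEN)
		return TNL_PARSE_LEN;

	return PLAIN_PARSE_LEN;
}
```

With this shape, a window where the offload is already enabled but the
header length is still the old one simply behaves like plain GSO.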

>> @@ -1698,7 +1700,8 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
>>         struct sk_buff *skb;
>>         size_t total_len = iov_iter_count(from);
>>         size_t len = total_len, align = tun->align, linear;
>> -       struct virtio_net_hdr gso = { 0 };
>> +       char buf[TUN_VNET_TNL_SIZE];
> 
> I wonder why not simply
> 
> 1) define the structure virtio_net_hdr_tnl_gso and use that
> 
> or
> 
> 2) stick the gso here and use iter advance to get
> virtio_net_hdr_tunnel when necessary?

Code-wise, 2) looks more complex and 1) will require additional care when
adding hash report support.

>> diff --git a/drivers/net/tun_vnet.h b/drivers/net/tun_vnet.h
>> index 58b9ac7a5fc40..ab2d4396941ca 100644
>> --- a/drivers/net/tun_vnet.h
>> +++ b/drivers/net/tun_vnet.h
>> @@ -5,6 +5,12 @@
>>  /* High bits in flags field are unused. */
>>  #define TUN_VNET_LE     0x80000000
>>  #define TUN_VNET_BE     0x40000000
>> +#define TUN_VNET_TNL           0x20000000
>> +#define TUN_VNET_TNL_CSUM      0x10000000
>> +#define TUN_VNET_TNL_MASK      (TUN_VNET_TNL | TUN_VNET_TNL_CSUM)
>> +
>> +#define TUN_VNET_TNL_SIZE (sizeof(struct virtio_net_hdr_v1) + \
> 
> Should this be virtio_net_hdr_v1_hash?

If tun does not support HASH_REPORT, no: the GSO over UDP tunnels header
could be present regardless of the hash-related field presence. This has
been discussed extensively while crafting the specification.

Note that tun_vnet_parse_size() and tun_vnet_tnl_offset() should be
adjusted accordingly once HASH_REPORT support is introduced.

>> +                          sizeof(struct virtio_net_hdr_tunnel))
>>
>>  static inline bool tun_vnet_legacy_is_little_endian(unsigned int flags)
>>  {
>> @@ -45,6 +51,13 @@ static inline long tun_set_vnet_be(unsigned int *flags, int __user *argp)
>>         return 0;
>>  }
>>
>> +static inline void tun_set_vnet_tnl(unsigned int *flags, bool tnl, bool tnl_csum)
>> +{
>> +       *flags = (*flags & ~TUN_VNET_TNL_MASK) |
>> +                tnl * TUN_VNET_TNL |
>> +                tnl_csum * TUN_VNET_TNL_CSUM;
> 
> We could refer to netdev via tun_struct, so I don't understand why we
> need to duplicate the features in tun->flags (we don't do that for
> other GSO/CSUM stuffs).

Just to be consistent with commit 60df67b94804b1adca74854db502a72f7aeaa125

/P


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH net-next 2/8] virtio_pci_modern: allow setting configuring extended features
  2025-05-26 10:53     ` Paolo Abeni
@ 2025-05-27  3:04       ` Jason Wang
  2025-05-28 16:02         ` Paolo Abeni
  0 siblings, 1 reply; 59+ messages in thread
From: Jason Wang @ 2025-05-27  3:04 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, Willem de Bruijn, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez

On Mon, May 26, 2025 at 6:53 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On 5/26/25 2:49 AM, Jason Wang wrote:
> > On Wed, May 21, 2025 at 6:33 PM Paolo Abeni <pabeni@redhat.com> wrote:
> >>
> >> The virtio specifications allows for up to 128 bits for the
> >> device features. Soon we are going to use some of the 'extended'
> >> bits features (above 64) for the virtio_net driver.
> >>
> >> Extend the virtio pci modern driver to support configuring the full
> >> virtio features range, replacing the unrolled loops reading and
> >> writing the features space with explicit one bounded to the actual
> >> features space size in word.
> >>
> >> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> >> ---
> >>  drivers/virtio/virtio_pci_modern_dev.c | 39 +++++++++++++++++---------
> >>  1 file changed, 25 insertions(+), 14 deletions(-)
> >>
> >> diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
> >> index 1d34655f6b658..e3025b6fa8540 100644
> >> --- a/drivers/virtio/virtio_pci_modern_dev.c
> >> +++ b/drivers/virtio/virtio_pci_modern_dev.c
> >> @@ -396,12 +396,16 @@ EXPORT_SYMBOL_GPL(vp_modern_remove);
> >>  virtio_features_t vp_modern_get_features(struct virtio_pci_modern_device *mdev)
> >>  {
> >>         struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
> >> -       virtio_features_t features;
> >> +       virtio_features_t features = 0;
> >> +       int i;
> >>
> >> -       vp_iowrite32(0, &cfg->device_feature_select);
> >> -       features = vp_ioread32(&cfg->device_feature);
> >> -       vp_iowrite32(1, &cfg->device_feature_select);
> >> -       features |= ((u64)vp_ioread32(&cfg->device_feature) << 32);
> >> +       for (i = 0; i < VIRTIO_FEATURES_WORDS; i++) {
> >> +               virtio_features_t cur;
> >> +
> >> +               vp_iowrite32(i, &cfg->device_feature_select);
> >> +               cur = vp_ioread32(&cfg->device_feature);
> >> +               features |= cur << (32 * i);
> >> +       }
> >
> > No matter if we decide to go with 128bit or not. I think at the lower
> > layer like this, it's time to allow arbitrary length of the features
> > as the spec supports.
>
> Is that useful if the vhost interface is not going to support it?

I think so, as there are hardware virtio devices that can benefit from this.

>
> Note that the above code is independent from the feature-space. Defining
> larger value of VIRTIO_FEATURES_WORDS it will deal with larger number of
> features.
>
> /P
>

Thanks


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH net-next 1/8] virtio: introduce virtio_features_t
  2025-05-26  7:20     ` Paolo Abeni
@ 2025-05-27  3:51       ` Jason Wang
  2025-05-28 15:47         ` Paolo Abeni
  2025-05-27 14:14       ` Michael S. Tsirkin
  1 sibling, 1 reply; 59+ messages in thread
From: Jason Wang @ 2025-05-27  3:51 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, Willem de Bruijn, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez

On Mon, May 26, 2025 at 3:20 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On 5/26/25 2:43 AM, Jason Wang wrote:
> > On Wed, May 21, 2025 at 6:33 PM Paolo Abeni <pabeni@redhat.com> wrote:
> >> diff --git a/include/linux/virtio_features.h b/include/linux/virtio_features.h
> >> new file mode 100644
> >> index 0000000000000..2f742eeb45a29
> >> --- /dev/null
> >> +++ b/include/linux/virtio_features.h
> >> @@ -0,0 +1,23 @@
> >> +/* SPDX-License-Identifier: GPL-2.0 */
> >> +#ifndef _LINUX_VIRTIO_FEATURES_H
> >> +#define _LINUX_VIRTIO_FEATURES_H
> >> +
> >> +#include <linux/bits.h>
> >> +
> >> +#if IS_ENABLED(CONFIG_ARCH_SUPPORTS_INT128)
> >> +#define VIRTIO_HAS_EXTENDED_FEATURES
> >> +#define VIRTIO_FEATURES_MAX    128
> >> +#define VIRTIO_FEATURES_WORDS  4
> >> +#define VIRTIO_BIT(b)          _BIT128(b)
> >> +
> >> +typedef __uint128_t            virtio_features_t;
> >
> > Consider:
> >
> > 1) need the trick for arch that doesn't support 128bit
> > 2) some transport (e.g PCI) allows much more than just 128 bit features
> >
> >  I wonder if it's better to just use arrays here.
>
> I considered that, it has been discussed both on the virtio ML and
> privately, and I tried a reasonable attempt with such implementation.
>
> The diffstat would be horrible, touching a lot of the virtio/vhost code.

Let's start with the driver. For example, the driver already uses an
array for features:

        const unsigned int *feature_table;
        unsigned int feature_table_size;

For vhost, we need new ioctls anyhow:

/* Features bitmask for forward compatibility.  Transport bits are used for
 * vhost specific features. */
#define VHOST_GET_FEATURES      _IOR(VHOST_VIRTIO, 0x00, __u64)
#define VHOST_SET_FEATURES      _IOW(VHOST_VIRTIO, 0x00, __u64)

As we can't change uAPI for existing ioctls.

> Such approach will block any progress for a long time (more likely
> forever, since I will not have the capacity to complete it).
>

Well, could we at least start from using u64[2] for virtio_features_t?
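
For reference, a minimal sketch of what a two-word feature set could look
like, with the usual set/test/subset helpers. The type and function names
(struct features, features_set() and so on) are hypothetical and are not
taken from the series; bit 65 is used in the usage note only as an example
of a bit above the 64-bit range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FEATURES_WORDS 2 /* hypothetical: 2 x 64 = 128 feature bits */

struct features {
	uint64_t w[FEATURES_WORDS];
};

static void features_set(struct features *f, unsigned int bit)
{
	assert(bit < 64 * FEATURES_WORDS);
	f->w[bit / 64] |= UINT64_C(1) << (bit % 64);
}

static bool features_test(const struct features *f, unsigned int bit)
{
	assert(bit < 64 * FEATURES_WORDS);
	return (f->w[bit / 64] >> (bit % 64)) & 1;
}

/* True if 'f' contains no bit outside of 'supported'. */
static bool features_subset(const struct features *f,
			    const struct features *supported)
{
	for (int i = 0; i < FEATURES_WORDS; i++)
		if (f->w[i] & ~supported->w[i])
			return false;
	return true;
}
```

Setting bit 65 lands in w[1] as value 0x2, and nothing here depends on the
compiler providing a 128-bit integer type.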

> Also the benefit are AFAICS marginal, as 32 bits platform with huge
> virtualization deployments on top of it (that could benefit from GSO
> over UDP tunnel) are IMHO unlikely,

I think it's better not to make those architecture-specific assumptions since:

1) we would need to prove the assumption is correct, or
2) we may also create blockers for 64-bit archs that don't support
ARCH_SUPPORTS_INT128.

> and transport features space
> exhaustion is AFAIK far from being reached (also thanks to reserved
> features available).

I wouldn't be worried if a straightforward switch to int128 worked,
but it looks like that is not the case:

1) ARCH_SUPPORTS_INT128 dependency
2) new uAPI
3) we might want new virtio config ops as well, since most transports
can only return 64 bits now

>
> TL;DR: if you consider a generic implementation for an arbitrary wide
> features space blocking, please LMK, because any other consideration
> would be likely irrelevant otherwise.
>
> /P
>

Thanks


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH net-next 3/8] vhost-net: allow configuring extended features
  2025-05-26 10:57     ` Paolo Abeni
@ 2025-05-27  3:56       ` Jason Wang
  2025-05-29 11:10         ` Paolo Abeni
  0 siblings, 1 reply; 59+ messages in thread
From: Jason Wang @ 2025-05-27  3:56 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, Willem de Bruijn, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez

On Mon, May 26, 2025 at 6:57 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On 5/26/25 2:47 AM, Jason Wang wrote:
> > On Wed, May 21, 2025 at 6:33 PM Paolo Abeni <pabeni@redhat.com> wrote:
> >>
> >> Use the extended feature type for 'acked_features' and implement
> >> two new ioctls operation to get and set the extended features.
> >>
> >> Note that the legacy ioctls implicitly truncate the negotiated
> >> features to the lower 64 bits range.
> >>
> >> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> >> ---
> >>  drivers/vhost/net.c        | 26 +++++++++++++++++++++++++-
> >>  drivers/vhost/vhost.h      |  2 +-
> >>  include/uapi/linux/vhost.h |  8 ++++++++
> >>  3 files changed, 34 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> >> index 7cbfc7d718b3f..b894685dded3e 100644
> >> --- a/drivers/vhost/net.c
> >> +++ b/drivers/vhost/net.c
> >> @@ -77,6 +77,10 @@ enum {
> >>                          (1ULL << VIRTIO_F_RING_RESET)
> >>  };
> >>
> >> +#ifdef VIRTIO_HAS_EXTENDED_FEATURES
> >> +#define VHOST_NET_FEATURES_EX VHOST_NET_FEATURES
> >> +#endif
> >> +
> >>  enum {
> >>         VHOST_NET_BACKEND_FEATURES = (1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2)
> >>  };
> >> @@ -1614,7 +1618,7 @@ static long vhost_net_reset_owner(struct vhost_net *n)
> >>         return err;
> >>  }
> >>
> >> -static int vhost_net_set_features(struct vhost_net *n, u64 features)
> >> +static int vhost_net_set_features(struct vhost_net *n, virtio_features_t features)
> >>  {
> >>         size_t vhost_hlen, sock_hlen, hdr_len;
> >>         int i;
> >> @@ -1704,6 +1708,26 @@ static long vhost_net_ioctl(struct file *f, unsigned int ioctl,
> >>                 if (features & ~VHOST_NET_FEATURES)
> >>                         return -EOPNOTSUPP;
> >>                 return vhost_net_set_features(n, features);
> >> +#ifdef VIRTIO_HAS_EXTENDED_FEATURES
> >
> > Vhost doesn't depend on virtio. But this invents a dependency, and I
> > don't understand why we need to do that.
>
> What do you mean with "dependency" here? vhost has already a build
> dependency vs virtio, including several virtio headers. It has also a
> logical dependency, using several virtio features.
>
> Do you mean a build dependency? this change does not introduce such a thing.

I mean vhost can be built without virtio drivers. So old vhost can run
new virtio drivers on top. So I don't see why vhost needs to check if
virtio of the same source tree supports 128 bit or not.

We can just accept an array of features now as

1) the changes are limited to vhost so it wouldn't be too much
2) we don't have to have VHOST_GET_FEATURES_EX2 in the future.
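
A rough sketch of how an array-based set-features path could validate its
input, assuming the caller passes a word count plus that many 64-bit words.
The function name, the KERNEL_WORDS constant and the calling convention are
all hypothetical; this is not a proposed uAPI, only an illustration of why a
counted array avoids another ioctl when the feature space grows.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define KERNEL_WORDS 2 /* hypothetical: 64-bit words this side understands */

/* Validate 'count' caller-supplied feature words against 'supported' and
 * store the result in 'acked'. Words beyond KERNEL_WORDS are accepted only
 * if they are zero; words the caller did not supply are treated as zero. */
static int set_features_array(const uint64_t *user, size_t count,
			      const uint64_t supported[KERNEL_WORDS],
			      uint64_t acked[KERNEL_WORDS])
{
	for (size_t i = 0; i < count; i++) {
		uint64_t sup = i < KERNEL_WORDS ? supported[i] : 0;

		if (user[i] & ~sup)
			return -EOPNOTSUPP;
	}

	for (size_t i = 0; i < KERNEL_WORDS; i++)
		acked[i] = i < count ? user[i] : 0;

	return 0;
}
```

An older caller passing a single word and a newer caller passing three
words both go through the same path, as long as they do not ask for bits
this side does not support.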

Thanks

>
> /P
>


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH net-next 7/8] tun: enable gso over UDP tunnel support.
  2025-05-26 11:20     ` Paolo Abeni
@ 2025-05-27  4:19       ` Jason Wang
  2025-05-29 16:17         ` Paolo Abeni
  0 siblings, 1 reply; 59+ messages in thread
From: Jason Wang @ 2025-05-27  4:19 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, Willem de Bruijn, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez

On Mon, May 26, 2025 at 7:20 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On 5/26/25 6:40 AM, Jason Wang wrote:
> > On Wed, May 21, 2025 at 6:34 PM Paolo Abeni <pabeni@redhat.com> wrote:
> >>
> >> Add new tun features to represent the newly introduced virtio
> >> GSO over UDP tunnel offload. Allows detection and selection of
> >> such features via the existing TUNSETOFFLOAD ioctl, store the
> >> tunnel offload configuration in the highest bit of the tun flags
> >> and compute the expected virtio header size and tunnel header
> >> offset using such bits, so that we can plug in almost seamlessly the
> >> newly introduced virtio helpers to serialize the extended
> >> virtio header.
> >>
> >> As the tun features and the virtio hdr size are configured
> >> separately, the data path need to cope with (hopefully transient)
> >> inconsistent values.
> >
> > I'm not sure it's a good idea to deal with this inconsistency in this
> > series as it is not specific to tunnel offloading. It could be a
> > dependency for this patch or we can leave it for the future and just
> > to make sure mis-configuration won't cause any kernel issues.
>
> The possible inconsistency is not due to a misconfiguration, but to the
> facts that:
> - configuring the virtio hdr len and the offload is not atomic
> - successful GSO over udp tunnel parsing requires the relevant offloads
> to be enabled and a suitable hdr len.
>
> Plain GSO don't have a similar problem because all the relevant fields
> are always available for any sane virtio hdr length, but we need to deal
> with them here.

Just to make sure we're on the same page.

I meant that tun has TUNSETVNETHDRSZ, so user space can set the header size
to any value at any time, as long as it's not smaller than sizeof(struct
virtio_net_hdr). Tun and vhost need to cope with this; otherwise it is a
bug. This was already allowed before the introduction of tunnel GSO.

>
> >> @@ -1698,7 +1700,8 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
> >>         struct sk_buff *skb;
> >>         size_t total_len = iov_iter_count(from);
> >>         size_t len = total_len, align = tun->align, linear;
> >> -       struct virtio_net_hdr gso = { 0 };
> >> +       char buf[TUN_VNET_TNL_SIZE];
> >
> > I wonder why not simply
> >
> > 1) define the structure virtio_net_hdr_tnl_gso and use that
> >
> > or
> >
> > 2) stick the gso here and use iter advance to get
> > virtio_net_hdr_tunnel when necessary?
>
> Code wise 2) looks more complex

I don't know how to define complex, but we already use a container structure:

struct virtio_net_hdr_v1_hash {
        struct virtio_net_hdr_v1 hdr;
        __le32 hash_value;
...
        __le16 hash_report;
        __le16 padding;
};

> and 1) will require additional care when
> adding hash report support.

I don't understand here, you're doing:

        iov_iter_advance(from, sz - parsed_size);

in __tun_vnet_hdr_get(), so this logic needs to be extended for hash
report as well.

>
> >> diff --git a/drivers/net/tun_vnet.h b/drivers/net/tun_vnet.h
> >> index 58b9ac7a5fc40..ab2d4396941ca 100644
> >> --- a/drivers/net/tun_vnet.h
> >> +++ b/drivers/net/tun_vnet.h
> >> @@ -5,6 +5,12 @@
> >>  /* High bits in flags field are unused. */
> >>  #define TUN_VNET_LE     0x80000000
> >>  #define TUN_VNET_BE     0x40000000
> >> +#define TUN_VNET_TNL           0x20000000
> >> +#define TUN_VNET_TNL_CSUM      0x10000000
> >> +#define TUN_VNET_TNL_MASK      (TUN_VNET_TNL | TUN_VNET_TNL_CSUM)
> >> +
> >> +#define TUN_VNET_TNL_SIZE (sizeof(struct virtio_net_hdr_v1) + \
> >
> > Should this be virtio_net_hdr_v1_hash?
>
> If tun does not support HASH_REPORT, no: the GSO over UDP tunnels header
> could be present regardless of the hash-related field presence. This has
> been discussed extensively while crafting the specification.

Ok, so it excludes the hash report fields, more below.

>
> Note that tun_vnet_parse_size() and  tun_vnet_tnl_offset() should be
> adjusted accordingly after that HASH_REPORT support is introduced.

This is suboptimal: we know a hash report will be added, so we can
treat the field as an anonymous one. See

https://patchwork.kernel.org/project/linux-kselftest/patch/20250307-rss-v9-3-df76624025eb@daynix.com/

>
> >> +                          sizeof(struct virtio_net_hdr_tunnel))
> >>
> >>  static inline bool tun_vnet_legacy_is_little_endian(unsigned int flags)
> >>  {
> >> @@ -45,6 +51,13 @@ static inline long tun_set_vnet_be(unsigned int *flags, int __user *argp)
> >>         return 0;
> >>  }
> >>
> >> +static inline void tun_set_vnet_tnl(unsigned int *flags, bool tnl, bool tnl_csum)
> >> +{
> >> +       *flags = (*flags & ~TUN_VNET_TNL_MASK) |
> >> +                tnl * TUN_VNET_TNL |
> >> +                tnl_csum * TUN_VNET_TNL_CSUM;
> >
> > We could refer to netdev via tun_struct, so I don't understand why we
> > need to duplicate the features in tun->flags (we don't do that for
> > other GSO/CSUM stuffs).
>
> Just to be consistent with commit 60df67b94804b1adca74854db502a72f7aeaa125

I don't see a connection here: the above commit just decouples the vnet
code to make it reusable; it doesn't change the semantics of
tun->flags.

Thanks

>
> /P
>


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH net-next 1/8] virtio: introduce virtio_features_t
  2025-05-26  7:20     ` Paolo Abeni
  2025-05-27  3:51       ` Jason Wang
@ 2025-05-27 14:14       ` Michael S. Tsirkin
  1 sibling, 0 replies; 59+ messages in thread
From: Michael S. Tsirkin @ 2025-05-27 14:14 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Jason Wang, netdev, Willem de Bruijn, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Xuan Zhuo,
	Eugenio Pérez

On Mon, May 26, 2025 at 09:20:50AM +0200, Paolo Abeni wrote:
> On 5/26/25 2:43 AM, Jason Wang wrote:
> > On Wed, May 21, 2025 at 6:33 PM Paolo Abeni <pabeni@redhat.com> wrote:
> >> diff --git a/include/linux/virtio_features.h b/include/linux/virtio_features.h
> >> new file mode 100644
> >> index 0000000000000..2f742eeb45a29
> >> --- /dev/null
> >> +++ b/include/linux/virtio_features.h
> >> @@ -0,0 +1,23 @@
> >> +/* SPDX-License-Identifier: GPL-2.0 */
> >> +#ifndef _LINUX_VIRTIO_FEATURES_H
> >> +#define _LINUX_VIRTIO_FEATURES_H
> >> +
> >> +#include <linux/bits.h>
> >> +
> >> +#if IS_ENABLED(CONFIG_ARCH_SUPPORTS_INT128)
> >> +#define VIRTIO_HAS_EXTENDED_FEATURES
> >> +#define VIRTIO_FEATURES_MAX    128
> >> +#define VIRTIO_FEATURES_WORDS  4
> >> +#define VIRTIO_BIT(b)          _BIT128(b)
> >> +
> >> +typedef __uint128_t            virtio_features_t;
> > 
> > Consider:
> > 
> > 1) need the trick for arch that doesn't support 128bit
> > 2) some transport (e.g PCI) allows much more than just 128 bit features
> > 
> >  I wonder if it's better to just use arrays here.
> 
> I considered that; it has been discussed both on the virtio ML and
> privately, and I made a reasonable attempt at such an implementation.
> 
> The diffstat would be horrible, touching a lot of the virtio/vhost code.
> Such an approach would block any progress for a long time (more likely
> forever, since I will not have the capacity to complete it).
> 
> Also the benefits are AFAICS marginal, as 32-bit platforms with huge
> virtualization deployments on top of them (that could benefit from GSO
> over UDP tunnel) are IMHO unlikely, and transport feature space
> exhaustion is AFAIK far from being reached (also thanks to the reserved
> features available).
> 
> TL;DR: if you consider a generic implementation for an arbitrarily wide
> feature space blocking, please LMK, because any other consideration
> would likely be irrelevant otherwise.
> 
> /P

Let's just say I'm fine with starting with this, and
we can move to an array later. The nice thing here
is that there's this typedef: it can later be changed to
any struct at all.
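
To illustrate the point about the typedef, here is a minimal userspace sketch of what an array-backed feature type with accessor helpers might look like. Everything named demo_* is hypothetical and not part of the series; it only assumes that callers go through helpers rather than using integer operators directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_FEATURES_WORDS	2	/* 2 x 64 bits, as with u64[2] */
#define DEMO_FEATURES_BITS	(64 * DEMO_FEATURES_WORDS)

/* Hypothetical array-backed type hidden behind a typedef. */
typedef struct {
	uint64_t w[DEMO_FEATURES_WORDS];
} demo_features_t;

/* Set one feature bit; the bit index selects the word and the offset. */
static inline void demo_features_set(demo_features_t *f, unsigned int bit)
{
	assert(bit < DEMO_FEATURES_BITS);
	f->w[bit / 64] |= UINT64_C(1) << (bit % 64);
}

/* Test one feature bit. */
static inline bool demo_features_test(const demo_features_t *f, unsigned int bit)
{
	assert(bit < DEMO_FEATURES_BITS);
	return (f->w[bit / 64] >> (bit % 64)) & 1;
}
```

With such helpers, widening the feature space would mostly be a matter of changing DEMO_FEATURES_WORDS, at the cost of converting every open-coded bit operation first.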


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH net-next 1/8] virtio: introduce virtio_features_t
  2025-05-27  3:51       ` Jason Wang
@ 2025-05-28 15:47         ` Paolo Abeni
  2025-05-28 15:52           ` Michael S. Tsirkin
  0 siblings, 1 reply; 59+ messages in thread
From: Paolo Abeni @ 2025-05-28 15:47 UTC (permalink / raw)
  To: Jason Wang
  Cc: netdev, Willem de Bruijn, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez

On 5/27/25 5:51 AM, Jason Wang wrote:
> On Mon, May 26, 2025 at 3:20 PM Paolo Abeni <pabeni@redhat.com> wrote:
>> On 5/26/25 2:43 AM, Jason Wang wrote:
>>> On Wed, May 21, 2025 at 6:33 PM Paolo Abeni <pabeni@redhat.com> wrote:
>>>> diff --git a/include/linux/virtio_features.h b/include/linux/virtio_features.h
>>>> new file mode 100644
>>>> index 0000000000000..2f742eeb45a29
>>>> --- /dev/null
>>>> +++ b/include/linux/virtio_features.h
>>>> @@ -0,0 +1,23 @@
>>>> +/* SPDX-License-Identifier: GPL-2.0 */
>>>> +#ifndef _LINUX_VIRTIO_FEATURES_H
>>>> +#define _LINUX_VIRTIO_FEATURES_H
>>>> +
>>>> +#include <linux/bits.h>
>>>> +
>>>> +#if IS_ENABLED(CONFIG_ARCH_SUPPORTS_INT128)
>>>> +#define VIRTIO_HAS_EXTENDED_FEATURES
>>>> +#define VIRTIO_FEATURES_MAX    128
>>>> +#define VIRTIO_FEATURES_WORDS  4
>>>> +#define VIRTIO_BIT(b)          _BIT128(b)
>>>> +
>>>> +typedef __uint128_t            virtio_features_t;
>>>
>>> Consider:
>>>
>>> 1) need the trick for arch that doesn't support 128bit
>>> 2) some transport (e.g PCI) allows much more than just 128 bit features
>>>
>>>  I wonder if it's better to just use arrays here.
>>
>> I considered that; it has been discussed both on the virtio ML and
>> privately, and I made a reasonable attempt at such an implementation.
>>
>> The diffstat would be horrible, touching a lot of the virtio/vhost code.
> 
> Let's start with the driver. For example, the driver already uses an
> array for features:
> 
>         const unsigned int *feature_table;
>         unsigned int feature_table_size;
> 
> For vhost, we need new ioctls anyhow:
> 
> /* Features bitmask for forward compatibility.  Transport bits are used for
>  * vhost specific features. */
> #define VHOST_GET_FEATURES      _IOR(VHOST_VIRTIO, 0x00, __u64)
> #define VHOST_SET_FEATURES      _IOW(VHOST_VIRTIO, 0x00, __u64)
> 
> As we can't change uAPI for existing ioctls.
> 
>> Such an approach would block any progress for a long time (more likely
>> forever, since I will not have the capacity to complete it).
>>
> 
> Well, could we at least start from using u64[2] for virtio_features_t?
> 
>> Also the benefits are AFAICS marginal, as 32-bit platforms with huge
>> virtualization deployments on top of them (that could benefit from GSO
>> over UDP tunnel) are IMHO unlikely,
> 
> I think it's better to not have those architecture specific assumptions since:
> 
> 1) need to prove the assumption is correct or
> 2) we may also create blockers for 64 bit archs that don't support
> ARCH_SUPPORTS_INT128.
> 
>> and transport feature space
>> exhaustion is AFAIK far from being reached (also thanks to the reserved
>> features available).
> 
> I wouldn't be worried if a straightforward switch to int128 worked,
> but it looks like that is not the case:
> 
> 1) ARCH_SUPPORTS_INT128 dependency
> 2) new uAPI
> 3) we might want new virtio config ops as well, as most transports
> can only return 64 bits now
>>
>> TL;DR: if you consider a generic implementation for an arbitrarily wide
>> feature space blocking, please LMK, because any other consideration
>> would likely be irrelevant otherwise.

I read your comments above as saying that the only way forward is
abandoning the uint128_t usage. Could you please confirm that?

Side note: a new uAPI will be required by any implementation of the
feature-space extension, as the current ones are bound to 64 bits.

Thanks,

Paolo


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH net-next 1/8] virtio: introduce virtio_features_t
  2025-05-28 15:47         ` Paolo Abeni
@ 2025-05-28 15:52           ` Michael S. Tsirkin
  2025-05-29  2:15             ` Jason Wang
  0 siblings, 1 reply; 59+ messages in thread
From: Michael S. Tsirkin @ 2025-05-28 15:52 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Jason Wang, netdev, Willem de Bruijn, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Xuan Zhuo,
	Eugenio Pérez

On Wed, May 28, 2025 at 05:47:53PM +0200, Paolo Abeni wrote:
> On 5/27/25 5:51 AM, Jason Wang wrote:
> > On Mon, May 26, 2025 at 3:20 PM Paolo Abeni <pabeni@redhat.com> wrote:
> >> On 5/26/25 2:43 AM, Jason Wang wrote:
> >>> On Wed, May 21, 2025 at 6:33 PM Paolo Abeni <pabeni@redhat.com> wrote:
> >>>> diff --git a/include/linux/virtio_features.h b/include/linux/virtio_features.h
> >>>> new file mode 100644
> >>>> index 0000000000000..2f742eeb45a29
> >>>> --- /dev/null
> >>>> +++ b/include/linux/virtio_features.h
> >>>> @@ -0,0 +1,23 @@
> >>>> +/* SPDX-License-Identifier: GPL-2.0 */
> >>>> +#ifndef _LINUX_VIRTIO_FEATURES_H
> >>>> +#define _LINUX_VIRTIO_FEATURES_H
> >>>> +
> >>>> +#include <linux/bits.h>
> >>>> +
> >>>> +#if IS_ENABLED(CONFIG_ARCH_SUPPORTS_INT128)
> >>>> +#define VIRTIO_HAS_EXTENDED_FEATURES
> >>>> +#define VIRTIO_FEATURES_MAX    128
> >>>> +#define VIRTIO_FEATURES_WORDS  4
> >>>> +#define VIRTIO_BIT(b)          _BIT128(b)
> >>>> +
> >>>> +typedef __uint128_t            virtio_features_t;
> >>>
> >>> Consider:
> >>>
> >>> 1) need the trick for arch that doesn't support 128bit
> >>> 2) some transport (e.g PCI) allows much more than just 128 bit features
> >>>
> >>>  I wonder if it's better to just use arrays here.
> >>
> >> I considered that; it has been discussed both on the virtio ML and
> >> privately, and I made a reasonable attempt at such an implementation.
> >>
> >> The diffstat would be horrible, touching a lot of the virtio/vhost code.
> > 
> > Let's start with the driver. For example, the driver already uses an
> > array for features:
> > 
> >         const unsigned int *feature_table;
> >         unsigned int feature_table_size;
> > 
> > For vhost, we need new ioctls anyhow:
> > 
> > /* Features bitmask for forward compatibility.  Transport bits are used for
> >  * vhost specific features. */
> > #define VHOST_GET_FEATURES      _IOR(VHOST_VIRTIO, 0x00, __u64)
> > #define VHOST_SET_FEATURES      _IOW(VHOST_VIRTIO, 0x00, __u64)
> > 
> > As we can't change uAPI for existing ioctls.
> > 
> >> Such an approach would block any progress for a long time (more likely
> >> forever, since I will not have the capacity to complete it).
> >>
> > 
> > Well, could we at least start from using u64[2] for virtio_features_t?
> > 
> >> Also the benefits are AFAICS marginal, as 32-bit platforms with huge
> >> virtualization deployments on top of them (that could benefit from GSO
> >> over UDP tunnel) are IMHO unlikely,
> > 
> > I think it's better to not have those architecture specific assumptions since:
> > 
> > 1) need to prove the assumption is correct or
> > 2) we may also create blockers for 64 bit archs that don't support
> > ARCH_SUPPORTS_INT128.
> > 
> >> and transport feature space
> >> exhaustion is AFAIK far from being reached (also thanks to the reserved
> >> features available).
> > 
> > I wouldn't be worried if a straightforward switch to int128 worked,
> > but it looks like that is not the case:
> > 
> > 1) ARCH_SUPPORTS_INT128 dependency
> > 2) new uAPI
> > 3) we might want new virtio config ops as well, as most transports
> > can only return 64 bits now
> >>
> >> TL;DR: if you consider a generic implementation for an arbitrarily wide
> >> feature space blocking, please LMK, because any other consideration
> >> would likely be irrelevant otherwise.
> 
> I read your comments above as saying that the only way forward is
> abandoning the uint128_t usage. Could you please confirm that?
> 
> Side note: a new uAPI will be required by any implementation of the
> feature-space extension, as the current ones are bound to 64 bits.
> 
> Thanks,
> 
> Paolo


Jason, I think what Paolo's doing is a step in the right direction: we
can do this, then gradually transfer all drivers, devices and transports
to use virtio_features_t, then make virtio_features_t an array if we want.

If instead you jump to an array straight away, it's a huge change that
cannot be split up cleanly.


-- 
MST


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH net-next 2/8] virtio_pci_modern: allow setting configuring extended features
  2025-05-27  3:04       ` Jason Wang
@ 2025-05-28 16:02         ` Paolo Abeni
  2025-05-29  2:22           ` Jason Wang
  2025-05-29 14:28           ` Michael S. Tsirkin
  0 siblings, 2 replies; 59+ messages in thread
From: Paolo Abeni @ 2025-05-28 16:02 UTC (permalink / raw)
  To: Jason Wang
  Cc: netdev, Willem de Bruijn, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez

On 5/27/25 5:04 AM, Jason Wang wrote:
> On Mon, May 26, 2025 at 6:53 PM Paolo Abeni <pabeni@redhat.com> wrote:
>> On 5/26/25 2:49 AM, Jason Wang wrote:
>>> On Wed, May 21, 2025 at 6:33 PM Paolo Abeni <pabeni@redhat.com> wrote:
>>>>
>>>> The virtio specification allows for up to 128 bits for the
>>>> device features. Soon we are going to use some of the 'extended'
>>>> feature bits (above 64) for the virtio_net driver.
>>>>
>>>> Extend the virtio pci modern driver to support configuring the full
>>>> virtio features range, replacing the unrolled loops reading and
>>>> writing the features space with explicit ones bounded by the actual
>>>> features space size in words.
>>>>
>>>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>>>> ---
>>>>  drivers/virtio/virtio_pci_modern_dev.c | 39 +++++++++++++++++---------
>>>>  1 file changed, 25 insertions(+), 14 deletions(-)
>>>>
>>>> diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
>>>> index 1d34655f6b658..e3025b6fa8540 100644
>>>> --- a/drivers/virtio/virtio_pci_modern_dev.c
>>>> +++ b/drivers/virtio/virtio_pci_modern_dev.c
>>>> @@ -396,12 +396,16 @@ EXPORT_SYMBOL_GPL(vp_modern_remove);
>>>>  virtio_features_t vp_modern_get_features(struct virtio_pci_modern_device *mdev)
>>>>  {
>>>>         struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
>>>> -       virtio_features_t features;
>>>> +       virtio_features_t features = 0;
>>>> +       int i;
>>>>
>>>> -       vp_iowrite32(0, &cfg->device_feature_select);
>>>> -       features = vp_ioread32(&cfg->device_feature);
>>>> -       vp_iowrite32(1, &cfg->device_feature_select);
>>>> -       features |= ((u64)vp_ioread32(&cfg->device_feature) << 32);
>>>> +       for (i = 0; i < VIRTIO_FEATURES_WORDS; i++) {
>>>> +               virtio_features_t cur;
>>>> +
>>>> +               vp_iowrite32(i, &cfg->device_feature_select);
>>>> +               cur = vp_ioread32(&cfg->device_feature);
>>>> +               features |= cur << (32 * i);
>>>> +       }
>>>
>>> Regardless of whether we go with 128 bits or not, I think at a lower
>>> layer like this it's time to allow an arbitrary feature length, as
>>> the spec supports.
>>
>> Is that useful if the vhost interface is not going to support it?
> 
> I think so, as there are hardware virtio devices that can benefit from this.

Let me look at the question from another perspective. Let's suppose that
the virtio device supports an arbitrarily wide features space, and the
uAPI allows passing an arbitrarily high number of features to/from the
kernel.

How could the kernel stop the above loop? AFAICS the virtio spec does
not define any way to detect the end of the features space. An arbitrary
bound is actually needed.

If 128 looks too low (why?) it can be raised to, say, 256 (why?). But
AFAICS the only visible effect would be slower configuration due to a
larger number of unneeded I/O operations.

/P
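
To make the bounded-loop argument concrete, here is a minimal userspace sketch of the select/read pattern from the quoted patch, run against a simulated device. The struct demo_dev model and the demo_* helpers are hypothetical; in particular, the sketch assumes that selecting a word past what the device implements simply reads back as zero, which is a property of this model only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the device's feature window registers. */
struct demo_dev {
	uint32_t select;	/* models device_feature_select */
	const uint32_t *words;	/* feature words the device exposes */
	size_t nwords;
};

/* Models a read of device_feature: selectors past the end read as 0. */
static uint32_t demo_read_feature(const struct demo_dev *dev)
{
	return dev->select < dev->nwords ? dev->words[dev->select] : 0;
}

/* Read max_words 32-bit words; the bound is entirely the caller's choice. */
static void demo_get_features(struct demo_dev *dev, uint32_t *out,
			      size_t max_words)
{
	assert(out != NULL);
	for (size_t i = 0; i < max_words; i++) {
		dev->select = (uint32_t)i;
		out[i] = demo_read_feature(dev);
	}
}
```

Nothing in the loop can tell a word that is all zeroes from a word that does not exist, so the iteration count has to come from outside: a compile-time constant, a driver table size, or a future spec field.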


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH net-next 1/8] virtio: introduce virtio_features_t
  2025-05-28 15:52           ` Michael S. Tsirkin
@ 2025-05-29  2:15             ` Jason Wang
  0 siblings, 0 replies; 59+ messages in thread
From: Jason Wang @ 2025-05-29  2:15 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Paolo Abeni, netdev, Willem de Bruijn, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Xuan Zhuo,
	Eugenio Pérez

On Wed, May 28, 2025 at 11:52 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, May 28, 2025 at 05:47:53PM +0200, Paolo Abeni wrote:
> > On 5/27/25 5:51 AM, Jason Wang wrote:
> > > On Mon, May 26, 2025 at 3:20 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > >> On 5/26/25 2:43 AM, Jason Wang wrote:
> > >>> On Wed, May 21, 2025 at 6:33 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > >>>> diff --git a/include/linux/virtio_features.h b/include/linux/virtio_features.h
> > >>>> new file mode 100644
> > >>>> index 0000000000000..2f742eeb45a29
> > >>>> --- /dev/null
> > >>>> +++ b/include/linux/virtio_features.h
> > >>>> @@ -0,0 +1,23 @@
> > >>>> +/* SPDX-License-Identifier: GPL-2.0 */
> > >>>> +#ifndef _LINUX_VIRTIO_FEATURES_H
> > >>>> +#define _LINUX_VIRTIO_FEATURES_H
> > >>>> +
> > >>>> +#include <linux/bits.h>
> > >>>> +
> > >>>> +#if IS_ENABLED(CONFIG_ARCH_SUPPORTS_INT128)
> > >>>> +#define VIRTIO_HAS_EXTENDED_FEATURES
> > >>>> +#define VIRTIO_FEATURES_MAX    128
> > >>>> +#define VIRTIO_FEATURES_WORDS  4
> > >>>> +#define VIRTIO_BIT(b)          _BIT128(b)
> > >>>> +
> > >>>> +typedef __uint128_t            virtio_features_t;
> > >>>
> > >>> Consider:
> > >>>
> > >>> 1) need the trick for arch that doesn't support 128bit
> > >>> 2) some transport (e.g PCI) allows much more than just 128 bit features
> > >>>
> > >>>  I wonder if it's better to just use arrays here.
> > >>
> > >> I considered that; it has been discussed both on the virtio ML and
> > >> privately, and I made a reasonable attempt at such an implementation.
> > >>
> > >> The diffstat would be horrible, touching a lot of the virtio/vhost code.
> > >
> > > Let's start with the driver. For example, the driver already uses an
> > > array for features:
> > >
> > >         const unsigned int *feature_table;
> > >         unsigned int feature_table_size;
> > >
> > > For vhost, we need new ioctls anyhow:
> > >
> > > /* Features bitmask for forward compatibility.  Transport bits are used for
> > >  * vhost specific features. */
> > > #define VHOST_GET_FEATURES      _IOR(VHOST_VIRTIO, 0x00, __u64)
> > > #define VHOST_SET_FEATURES      _IOW(VHOST_VIRTIO, 0x00, __u64)
> > >
> > > As we can't change uAPI for existing ioctls.
> > >
> > >> Such an approach would block any progress for a long time (more likely
> > >> forever, since I will not have the capacity to complete it).
> > >>
> > >
> > > Well, could we at least start from using u64[2] for virtio_features_t?
> > >
> > >> Also the benefits are AFAICS marginal, as 32-bit platforms with huge
> > >> virtualization deployments on top of them (that could benefit from GSO
> > >> over UDP tunnel) are IMHO unlikely,
> > >
> > > I think it's better to not have those architecture specific assumptions since:
> > >
> > > 1) need to prove the assumption is correct or
> > > 2) we may also create blockers for 64 bit archs that don't support
> > > ARCH_SUPPORTS_INT128.
> > >
> > >> and transport feature space
> > >> exhaustion is AFAIK far from being reached (also thanks to the reserved
> > >> features available).
> > >
> > > I wouldn't be worried if a straightforward switch to int128 worked,
> > > but it looks like that is not the case:
> > >
> > > 1) ARCH_SUPPORTS_INT128 dependency
> > > 2) new uAPI
> > > 3) we might want new virtio config ops as well, as most transports
> > > can only return 64 bits now
> > >>
> > >> TL;DR: if you consider a generic implementation for an arbitrarily wide
> > >> feature space blocking, please LMK, because any other consideration
> > >> would likely be irrelevant otherwise.
> >
> > I read your comments above as saying that the only way forward is
> > abandoning the uint128_t usage. Could you please confirm that?
> >
> > Side note: a new uAPI will be required by any implementation of the
> > feature-space extension, as the current ones are bound to 64 bits.
> >
> > Thanks,
> >
> > Paolo
>
>
> Jason, I think what Paolo's doing is a step in the right direction: we
> can do this, then gradually transfer all drivers, devices and transports
> to use virtio_features_t, then make virtio_features_t an array if we want.
>
> If instead you jump to an array straight away, it's a huge change that
> cannot be split up cleanly.

OK, considering that we're moving towards arrays, I'm fine.

Thanks

>
>
> --
> MST
>


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH net-next 2/8] virtio_pci_modern: allow setting configuring extended features
  2025-05-28 16:02         ` Paolo Abeni
@ 2025-05-29  2:22           ` Jason Wang
  2025-05-29 11:07             ` Paolo Abeni
  2025-05-29 14:28           ` Michael S. Tsirkin
  1 sibling, 1 reply; 59+ messages in thread
From: Jason Wang @ 2025-05-29  2:22 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, Willem de Bruijn, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez

On Thu, May 29, 2025 at 12:02 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On 5/27/25 5:04 AM, Jason Wang wrote:
> > On Mon, May 26, 2025 at 6:53 PM Paolo Abeni <pabeni@redhat.com> wrote:
> >> On 5/26/25 2:49 AM, Jason Wang wrote:
> >>> On Wed, May 21, 2025 at 6:33 PM Paolo Abeni <pabeni@redhat.com> wrote:
> >>>>
> >>>> The virtio specification allows for up to 128 bits for the
> >>>> device features. Soon we are going to use some of the 'extended'
> >>>> feature bits (above 64) for the virtio_net driver.
> >>>>
> >>>> Extend the virtio pci modern driver to support configuring the full
> >>>> virtio features range, replacing the unrolled loops reading and
> >>>> writing the features space with explicit ones bounded by the actual
> >>>> features space size in words.
> >>>>
> >>>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> >>>> ---
> >>>>  drivers/virtio/virtio_pci_modern_dev.c | 39 +++++++++++++++++---------
> >>>>  1 file changed, 25 insertions(+), 14 deletions(-)
> >>>>
> >>>> diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
> >>>> index 1d34655f6b658..e3025b6fa8540 100644
> >>>> --- a/drivers/virtio/virtio_pci_modern_dev.c
> >>>> +++ b/drivers/virtio/virtio_pci_modern_dev.c
> >>>> @@ -396,12 +396,16 @@ EXPORT_SYMBOL_GPL(vp_modern_remove);
> >>>>  virtio_features_t vp_modern_get_features(struct virtio_pci_modern_device *mdev)
> >>>>  {
> >>>>         struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
> >>>> -       virtio_features_t features;
> >>>> +       virtio_features_t features = 0;
> >>>> +       int i;
> >>>>
> >>>> -       vp_iowrite32(0, &cfg->device_feature_select);
> >>>> -       features = vp_ioread32(&cfg->device_feature);
> >>>> -       vp_iowrite32(1, &cfg->device_feature_select);
> >>>> -       features |= ((u64)vp_ioread32(&cfg->device_feature) << 32);
> >>>> +       for (i = 0; i < VIRTIO_FEATURES_WORDS; i++) {
> >>>> +               virtio_features_t cur;
> >>>> +
> >>>> +               vp_iowrite32(i, &cfg->device_feature_select);
> >>>> +               cur = vp_ioread32(&cfg->device_feature);
> >>>> +               features |= cur << (32 * i);
> >>>> +       }
> >>>
> >>> Regardless of whether we go with 128 bits or not, I think at a lower
> >>> layer like this it's time to allow an arbitrary feature length, as
> >>> the spec supports.
> >>
> >> Is that useful if the vhost interface is not going to support it?
> >
> > I think so, as there are hardware virtio devices that can benefit from this.
>
> Let me look at the question from another perspective. Let's suppose that
> the virtio device supports an arbitrarily wide features space, and the
> uAPI allows passing an arbitrarily high number of features to/from the
> kernel.
>
> How could the kernel stop the above loop? AFAICS the virtio spec does
> not define any way to detect the end of the features space. An arbitrary
> bound is actually needed.

I think this is a good question, and we have something that could work:

1) the current driver has drv->feature_table_size, so the driver knows
it's meaningless to read above that size

and

2) we can extend the spec, e.g. add a transport-specific field to let
the driver know the feature size

>
> If 128 looks too low (why?) it can be raised to, say, 256 (why?). But
> AFAICS the only visible effect would be slower configuration due to a
> larger number of unneeded I/O operations.

See above.

>
> /P
>

Thanks


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH net-next 2/8] virtio_pci_modern: allow setting configuring extended features
  2025-05-29  2:22           ` Jason Wang
@ 2025-05-29 11:07             ` Paolo Abeni
  2025-05-29 14:28               ` Michael S. Tsirkin
  2025-06-03  2:11               ` Jason Wang
  0 siblings, 2 replies; 59+ messages in thread
From: Paolo Abeni @ 2025-05-29 11:07 UTC (permalink / raw)
  To: Jason Wang
  Cc: netdev, Willem de Bruijn, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez

On 5/29/25 4:22 AM, Jason Wang wrote:
> On Thu, May 29, 2025 at 12:02 AM Paolo Abeni <pabeni@redhat.com> wrote:
>> On 5/27/25 5:04 AM, Jason Wang wrote:
>>> On Mon, May 26, 2025 at 6:53 PM Paolo Abeni <pabeni@redhat.com> wrote:
>>>> On 5/26/25 2:49 AM, Jason Wang wrote:
>>>>> On Wed, May 21, 2025 at 6:33 PM Paolo Abeni <pabeni@redhat.com> wrote:
>>>>>>
>>>>>> The virtio specification allows for up to 128 bits for the
>>>>>> device features. Soon we are going to use some of the 'extended'
>>>>>> feature bits (above 64) for the virtio_net driver.
>>>>>>
>>>>>> Extend the virtio pci modern driver to support configuring the full
>>>>>> virtio features range, replacing the unrolled loops reading and
>>>>>> writing the features space with explicit ones bounded by the actual
>>>>>> features space size in words.
>>>>>>
>>>>>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>>>>>> ---
>>>>>>  drivers/virtio/virtio_pci_modern_dev.c | 39 +++++++++++++++++---------
>>>>>>  1 file changed, 25 insertions(+), 14 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
>>>>>> index 1d34655f6b658..e3025b6fa8540 100644
>>>>>> --- a/drivers/virtio/virtio_pci_modern_dev.c
>>>>>> +++ b/drivers/virtio/virtio_pci_modern_dev.c
>>>>>> @@ -396,12 +396,16 @@ EXPORT_SYMBOL_GPL(vp_modern_remove);
>>>>>>  virtio_features_t vp_modern_get_features(struct virtio_pci_modern_device *mdev)
>>>>>>  {
>>>>>>         struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
>>>>>> -       virtio_features_t features;
>>>>>> +       virtio_features_t features = 0;
>>>>>> +       int i;
>>>>>>
>>>>>> -       vp_iowrite32(0, &cfg->device_feature_select);
>>>>>> -       features = vp_ioread32(&cfg->device_feature);
>>>>>> -       vp_iowrite32(1, &cfg->device_feature_select);
>>>>>> -       features |= ((u64)vp_ioread32(&cfg->device_feature) << 32);
>>>>>> +       for (i = 0; i < VIRTIO_FEATURES_WORDS; i++) {
>>>>>> +               virtio_features_t cur;
>>>>>> +
>>>>>> +               vp_iowrite32(i, &cfg->device_feature_select);
>>>>>> +               cur = vp_ioread32(&cfg->device_feature);
>>>>>> +               features |= cur << (32 * i);
>>>>>> +       }
>>>>>
>>>>> Regardless of whether we go with 128 bits or not, I think at a lower
>>>>> layer like this it's time to allow an arbitrary feature length, as
>>>>> the spec supports.
>>>>
>>>> Is that useful if the vhost interface is not going to support it?
>>>
>>> I think so, as there are hardware virtio devices that can benefit from this.
>>
>> Let me look at the question from another perspective. Let's suppose that
>> the virtio device supports an arbitrarily wide features space, and the
>> uAPI allows passing an arbitrarily high number of features to/from the
>> kernel.
>>
>> How could the kernel stop the above loop? AFAICS the virtio spec does
>> not define any way to detect the end of the features space. An arbitrary
>> bound is actually needed.
> 
> I think this is a good question, and we have something that could work:
> 
> 1) the current driver has drv->feature_table_size, so the driver knows
> it's meaningless to read above that size
> 
> and
> 
> 2) we can extend the spec, e.g. add a transport-specific field to let
> the driver know the feature size

So I guess we can postpone any additional change here until we have some
spec in place, right?

/P


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH net-next 3/8] vhost-net: allow configuring extended features
  2025-05-27  3:56       ` Jason Wang
@ 2025-05-29 11:10         ` Paolo Abeni
  2025-06-03  2:11           ` Jason Wang
  0 siblings, 1 reply; 59+ messages in thread
From: Paolo Abeni @ 2025-05-29 11:10 UTC (permalink / raw)
  To: Jason Wang
  Cc: netdev, Willem de Bruijn, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez

On 5/27/25 5:56 AM, Jason Wang wrote:
> On Mon, May 26, 2025 at 6:57 PM Paolo Abeni <pabeni@redhat.com> wrote:
>> On 5/26/25 2:47 AM, Jason Wang wrote:
>>> On Wed, May 21, 2025 at 6:33 PM Paolo Abeni <pabeni@redhat.com> wrote:
>>>>
>>>> Use the extended feature type for 'acked_features' and implement
>>>> two new ioctl operations to get and set the extended features.
>>>>
>>>> Note that the legacy ioctls implicitly truncate the negotiated
>>>> features to the lower 64-bit range.
>>>>
>>>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>>>> ---
>>>>  drivers/vhost/net.c        | 26 +++++++++++++++++++++++++-
>>>>  drivers/vhost/vhost.h      |  2 +-
>>>>  include/uapi/linux/vhost.h |  8 ++++++++
>>>>  3 files changed, 34 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>>>> index 7cbfc7d718b3f..b894685dded3e 100644
>>>> --- a/drivers/vhost/net.c
>>>> +++ b/drivers/vhost/net.c
>>>> @@ -77,6 +77,10 @@ enum {
>>>>                          (1ULL << VIRTIO_F_RING_RESET)
>>>>  };
>>>>
>>>> +#ifdef VIRTIO_HAS_EXTENDED_FEATURES
>>>> +#define VHOST_NET_FEATURES_EX VHOST_NET_FEATURES
>>>> +#endif
>>>> +
>>>>  enum {
>>>>         VHOST_NET_BACKEND_FEATURES = (1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2)
>>>>  };
>>>> @@ -1614,7 +1618,7 @@ static long vhost_net_reset_owner(struct vhost_net *n)
>>>>         return err;
>>>>  }
>>>>
>>>> -static int vhost_net_set_features(struct vhost_net *n, u64 features)
>>>> +static int vhost_net_set_features(struct vhost_net *n, virtio_features_t features)
>>>>  {
>>>>         size_t vhost_hlen, sock_hlen, hdr_len;
>>>>         int i;
>>>> @@ -1704,6 +1708,26 @@ static long vhost_net_ioctl(struct file *f, unsigned int ioctl,
>>>>                 if (features & ~VHOST_NET_FEATURES)
>>>>                         return -EOPNOTSUPP;
>>>>                 return vhost_net_set_features(n, features);
>>>> +#ifdef VIRTIO_HAS_EXTENDED_FEATURES
>>>
>>> Vhost doesn't depend on virtio. But this invents a dependency, and I
>>> don't understand why we need to do that.
>>
>> What do you mean by "dependency" here? vhost already has a build
>> dependency on virtio, including several virtio headers. It also has a
>> logical dependency, using several virtio features.
>>
>> Do you mean a build dependency? This change does not introduce such a thing.
> 
> I mean vhost can be built without virtio drivers. So old vhost can run
> new virtio drivers on top. So I don't see why vhost needs to check if
> virtio of the same source tree supports 128 bit or not.
> 
> We can just accept an array of features now as
> 
> 1) the changes are limited to vhost so it wouldn't be too much
> 2) we don't have to have VHOST_GET_FEATURES_EX2 in the future.

AFAICS the ioctl() interface, code-wise, only impacts the devices
implementing extended features support. I guess it could be changed to
something like:

struct vhost_virtio_features {
	__u64 count;
	__u64 features[];
};

#define VHOST_GET_FEATURES_VECTOR \
	_IOR(VHOST_VIRTIO, 0x83, struct vhost_virtio_features)
#define VHOST_SET_FEATURES_VECTOR \
	_IOW(VHOST_VIRTIO, 0x83, struct vhost_virtio_features)

I could drop the above #ifdef, and the implementation would copy in/out
only the known/supported number of features.

WDYT?

/P
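
As a rough illustration of "copy out only the known/supported number of features", here is a minimal userspace sketch. It does not use copy_to_user() or any vhost code; DEMO_SUPPORTED_WORDS and demo_get_features_vector() are hypothetical, and zero-filling the words the caller asked for beyond the supported ones is an assumption of this sketch, not something the proposal above specifies.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_SUPPORTED_WORDS 2	/* hypothetical: 64-bit words this side knows */

/*
 * Fill a caller buffer of `count` words from the supported feature words.
 * Returns how many of the words carry meaningful data.
 */
static uint64_t demo_get_features_vector(const uint64_t *supported,
					 uint64_t *user_buf, uint64_t count)
{
	uint64_t n = count < DEMO_SUPPORTED_WORDS ? count : DEMO_SUPPORTED_WORDS;

	assert(supported != NULL && user_buf != NULL);
	memcpy(user_buf, supported, n * sizeof(*user_buf));
	/* Zero any extra words the caller asked for but we do not know. */
	memset(user_buf + n, 0, (count - n) * sizeof(*user_buf));
	return n;
}
```

A caller built against a wider feature space would then see the unknown words as "not offered", while a caller with a narrower buffer gets a truncated view, similar to what the legacy 64-bit ioctls do today.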


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH net-next 5/8] net: implement virtio helpers to handle UDP GSO tunneling.
  2025-05-26  4:40   ` Jason Wang
@ 2025-05-29 11:55     ` Paolo Abeni
  2025-05-30  8:43       ` Paolo Abeni
  2025-06-03  2:11       ` Jason Wang
  2025-05-29 15:30     ` Paolo Abeni
  1 sibling, 2 replies; 59+ messages in thread
From: Paolo Abeni @ 2025-05-29 11:55 UTC (permalink / raw)
  To: Jason Wang
  Cc: netdev, Willem de Bruijn, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez

On 5/26/25 6:40 AM, Jason Wang wrote:
> On Wed, May 21, 2025 at 6:34 PM Paolo Abeni <pabeni@redhat.com> wrote:
>> +       if (!gso_inner_type || gso_inner_type == VIRTIO_NET_HDR_GSO_UDP)
>> +               return -EINVAL;
>> +
>> +       /* Rely on csum being present. */
>> +       if (!(hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM))
>> +               return -EINVAL;
>> +
>> +       /* Validate offsets. */
>> +       outer_isv6 = gso_tunnel_type & VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6;
>> +       inner_l3min = virtio_l3min(gso_inner_type == VIRTIO_NET_HDR_GSO_TCPV6);
>> +       outer_l3min = ETH_HLEN + virtio_l3min(outer_isv6);
>> +
>> +       tnl = ((void *)hdr) + tnl_hdr_offset;
>> +       inner_th = __virtio16_to_cpu(little_endian, hdr->csum_start);
>> +       inner_nh = __virtio16_to_cpu(little_endian, tnl->inner_nh_offset);
>> +       outer_th = __virtio16_to_cpu(little_endian, tnl->outer_th_offset);
>> +       if (outer_th < outer_l3min ||
>> +           inner_nh < outer_th + sizeof(struct udphdr) ||
>> +           inner_th < inner_nh + inner_l3min)
>> +               return -EINVAL;
> 
> I wonder if kernel has already had helpers to validate the tunnel
> headers 

Not that I know of.

> or if the above check is sufficient here.

AFAICS yes. Syzkaller is out there just to prove me wrong...
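
For reference, the ordering constraints enforced by the check quoted
above can be sketched in plain userspace C as follows. All the sk_*
names and the header-size constants are stand-ins chosen for this
sketch (minimal headers, no IP options or extension headers); the
kernel code uses sizeof() on the real structures.

```c
#include <stdbool.h>
#include <stdint.h>

#define SK_ETH_HLEN   14u	/* Ethernet header */
#define SK_IPV4_HLEN  20u	/* IPv4 header without options */
#define SK_IPV6_HLEN  40u	/* fixed IPv6 header */
#define SK_UDP_HLEN    8u	/* UDP header */

static unsigned int sk_l3min(bool is_ipv6)
{
	return is_ipv6 ? SK_IPV6_HLEN : SK_IPV4_HLEN;
}

/*
 * Return true when the three offsets are mutually consistent:
 * outer transport header after the outer L2+L3 headers, inner network
 * header after the outer UDP header, inner transport header after the
 * inner L3 header.
 */
static bool sk_tnl_offsets_ok(unsigned int outer_th, unsigned int inner_nh,
			      unsigned int inner_th, bool outer_isv6,
			      bool inner_isv6)
{
	unsigned int outer_l3min = SK_ETH_HLEN + sk_l3min(outer_isv6);
	unsigned int inner_l3min = sk_l3min(inner_isv6);

	return outer_th >= outer_l3min &&
	       inner_nh >= outer_th + SK_UDP_HLEN &&
	       inner_th >= inner_nh + inner_l3min;
}
```

Note that these are lower bounds only: the tunnel header and the inner
mac header sit between the outer UDP header and inner_nh, and their
size is not known at this point.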


>> +
>> +       /* Let the basic parsing deal with plain GSO features. */
>> +       ret = __virtio_net_hdr_to_skb(skb, hdr, little_endian,
>> +                                     hdr->gso_type & ~gso_tunnel_type);
>> +       if (ret)
>> +               return ret;
>> +
>> +       skb_set_inner_protocol(skb, outer_isv6 ? htons(ETH_P_IPV6) :
>> +                                                htons(ETH_P_IP));
> 
> The outer_isv6 is somehow misleading here, I think we'd better rename
> it as inner_isv6?

There is bug above, thanks for spotting it. I should not use the
`outer_isv6` variable, instead I should compute separately `inner_isv6`

>> +       if (hdr->flags & VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM) {
>> +               if (!tnl_csum_negotiated)
>> +                       return -EINVAL;
>> +
>> +               skb_shinfo(skb)->gso_type |= SKB_GSO_UDP_TUNNEL_CSUM;
>> +       } else {
>> +               skb_shinfo(skb)->gso_type |= SKB_GSO_UDP_TUNNEL;
>> +       }
>> +
>> +       skb->inner_transport_header = inner_th + skb_headroom(skb);
> 
> I may miss something but using skb_headroom() means the value depends
> on the geometry of the skb and the headroom might vary depending on
> the size of the packet and other factors.  (see receive_buf())

Yes, that is correct: the actual inner_transport_header value depends on
the skb geometry, because the (inner) transport header is located at
skb->head + skb->inner_transport_header.
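
A tiny userspace model of that relation, in case it helps. The
sk_buf_sketch structure and the sk_* helpers are invented for this
example and only mimic the head/data/offset arithmetic; they are not
the real sk_buff API.

```c
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-in for the skb fields involved. */
struct sk_buf_sketch {
	unsigned char *head;		/* start of the buffer */
	unsigned char *data;		/* start of the packet data */
	uint16_t inner_transport_header;/* offset from head */
};

static unsigned int sk_headroom(const struct sk_buf_sketch *b)
{
	return (unsigned int)(b->data - b->head);
}

/* 'off' is relative to the packet start, as carried in the virtio header. */
static void sk_set_inner_th(struct sk_buf_sketch *b, unsigned int off)
{
	b->inner_transport_header = (uint16_t)(off + sk_headroom(b));
}

static unsigned char *sk_inner_th_ptr(const struct sk_buf_sketch *b)
{
	return b->head + b->inner_transport_header;
}
```

With two buffers using different headrooms, the stored offsets differ,
but both resolve to the same position relative to the packet start.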

>> +       skb->inner_network_header = inner_nh + skb_headroom(skb);
>> +       skb->inner_mac_header = inner_nh + skb_headroom(skb);
> 
> This actually equals to inner_network_header, is this intended?

Yes. AFAICS the inner mac header field is used only for GSO/TSO.

At this point we don't know if the inner mac header is actually present
nor its length (it could include a vlan tag).

Still, the above allows correct segmentation by the GSO stage because the
inner mac header is not copied verbatim into the segmented packets, like
the tunnel header.

With the above code, the inner mac header, if really present, will be
logically considered part of the tunnel header by the GSO stage.

Note that some devices restrict the TSO capability to some fixed values
of the UDP tunnel sizes and inner mac header. In such cases, they will
fall back to S/W GSO.

>> +       skb->transport_header = outer_th + skb_headroom(skb);
>> +       skb->encapsulation = 1;
>> +       return 0;
>> +}
>> +
>> +static inline int virtio_net_chk_data_valid(struct sk_buff *skb,
>> +                                           struct virtio_net_hdr *hdr,
>> +                                           bool tun_csum_negotiated)
> 
> This is virtio_net.h so it's better to avoid using "tun". Btw, I
> wonder why this needs to be called by the virtio-net instead of being
> called by hdr_to_skb helpers.

I can squash into virtio_net_hdr_tnl_to_skb(), I kept them separated to
avoid extra long argument lists, but we are dropping an argument from
virtio_net_hdr_tnl_to_skb(), so should be ok.

>> +{
>> +       if (!(hdr->gso_type & VIRTIO_NET_HDR_GSO_UDP_TUNNEL)) {
>> +               if (!(hdr->flags & VIRTIO_NET_HDR_F_DATA_VALID))
>> +                       return 0;
>> +
>> +               skb->ip_summed = CHECKSUM_UNNECESSARY;
>> +               if (!(hdr->flags & VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM))
>> +                       return 0;
>> +
>> +               /* tunnel csum packets are invalid when the related
>> +                * feature has not been negotiated
>> +                */
>> +               if (!tun_csum_negotiated)
>> +                       return -EINVAL;
> 
> Should we move this check above VIRTIO_NET_HDR_F_DATA_VALID check?

It could break existing setups. We can safely do extra validation only
when we know that the UDP tunnel features have been negotiated.

>> +               skb->csum_level = 1;
>> +               return 0;
>> +       }
>> +
>> +       /* DATA_VALID is mutually exclusive with NEEDS_CSUM,
> 
> I may miss something but I think we had a discussion about this, and
> the conclusion is it's too late to fix as it may break some legacy
> devices?

I'm not sure what should be fixed here. This check implements exactly
the restriction you asked for while discussing the spec. We can't have a
similar check for non UDP tunneled packets, because it could break
existing setups.

> 
>> and GSO
>> +        * over UDP tunnel requires the latter
>> +        */
>> +       if (hdr->flags & VIRTIO_NET_HDR_F_DATA_VALID)
>> +               return -EINVAL;
>> +       return 0;
>> +}
>> +
>> +static inline int virtio_net_hdr_tnl_from_skb(const struct sk_buff *skb,
>> +                                             struct virtio_net_hdr *hdr,
>> +                                             unsigned int tnl_offset,
>> +                                             bool little_endian,
>> +                                             int vlan_hlen)
>> +{
>> +       struct virtio_net_hdr_tunnel *tnl;
>> +       unsigned int inner_nh, outer_th;
>> +       int tnl_gso_type;
>> +       int ret;
>> +
>> +       tnl_gso_type = skb_shinfo(skb)->gso_type & (SKB_GSO_UDP_TUNNEL |
>> +                                                   SKB_GSO_UDP_TUNNEL_CSUM);
>> +       if (!tnl_gso_type)
>> +               return virtio_net_hdr_from_skb(skb, hdr, little_endian, false,
>> +                                              vlan_hlen);
>> +
>> +       /* Tunnel support not negotiated but skb ask for it. */
>> +       if (!tnl_offset)
>> +               return -EINVAL;
> 
> Should we do BUG_ON here?

I don't think so. BUG_ON()s are explicitly discouraged, to avoid crashing
the kernel on exceptional/unexpected situations.

The caller will emit rate-limited warnings with the relevant info if this
is hit. The BUG_ON() stack trace would add little value.

>> +
>> +       /* Let the basic parsing deal with plain GSO features. */
>> +       skb_shinfo(skb)->gso_type &= ~tnl_gso_type;
>> +       ret = virtio_net_hdr_from_skb(skb, hdr, little_endian, false,
>> +                                     vlan_hlen);
>> +       skb_shinfo(skb)->gso_type |= tnl_gso_type;
>> +       if (ret)
>> +               return ret;
> 
> Could we do the plain GSO after setting inner flags below to avoid
> masking and unmasking tnl_gso_type?

virtio_net_hdr_from_skb() will still receive a skb with UDP tunnel GSO
type and will error out.

The masking could be avoided by factoring out a __virtio_net_hdr_from_skb()
helper receiving an explicit gso_type argument. I can do that if it's
preferred.
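
Roughly, the explicit-argument variant would have the following shape.
This is only a userspace sketch: the SK_* bit values and both function
names are placeholders picked for the example, and the "plain" parser
is reduced to a single known GSO type.

```c
#include <errno.h>

/* Placeholder bits for the skb-side GSO type (values are arbitrary). */
#define SK_GSO_TCPV4		(1u << 0)
#define SK_GSO_UDP_TUNNEL	(1u << 1)
#define SK_GSO_UDP_TUNNEL_CSUM	(1u << 2)

/* Placeholder bits for the header-side GSO type. */
#define SK_HDR_GSO_TCPV4	0x01
#define SK_HDR_GSO_UDP_TUNNEL	0x20

/* Plain parser: takes the GSO type explicitly, rejects unknown bits. */
static int sk_plain_from_gso(unsigned int gso_type, unsigned char *hdr_gso)
{
	if (gso_type & ~SK_GSO_TCPV4)
		return -EINVAL;
	*hdr_gso = gso_type ? SK_HDR_GSO_TCPV4 : 0;
	return 0;
}

/* Tunnel-aware wrapper: no need to mask and restore the skb field. */
static int sk_tnl_from_gso(unsigned int gso_type, unsigned char *hdr_gso)
{
	unsigned int tnl = gso_type & (SK_GSO_UDP_TUNNEL |
				       SK_GSO_UDP_TUNNEL_CSUM);
	int ret;

	ret = sk_plain_from_gso(gso_type & ~tnl, hdr_gso);
	if (ret)
		return ret;
	if (tnl)
		*hdr_gso |= SK_HDR_GSO_UDP_TUNNEL;
	return 0;
}
```

The skb itself stays const and untouched; the wrapper only passes a
filtered copy of the GSO type down to the plain parser.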

/P


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH net-next 2/8] virtio_pci_modern: allow setting configuring extended features
  2025-05-28 16:02         ` Paolo Abeni
  2025-05-29  2:22           ` Jason Wang
@ 2025-05-29 14:28           ` Michael S. Tsirkin
  1 sibling, 0 replies; 59+ messages in thread
From: Michael S. Tsirkin @ 2025-05-29 14:28 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Jason Wang, netdev, Willem de Bruijn, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Xuan Zhuo,
	Eugenio Pérez

On Wed, May 28, 2025 at 06:02:43PM +0200, Paolo Abeni wrote:
> On 5/27/25 5:04 AM, Jason Wang wrote:
> > On Mon, May 26, 2025 at 6:53 PM Paolo Abeni <pabeni@redhat.com> wrote:
> >> On 5/26/25 2:49 AM, Jason Wang wrote:
> >>> On Wed, May 21, 2025 at 6:33 PM Paolo Abeni <pabeni@redhat.com> wrote:
> >>>>
> >>>> The virtio specifications allows for up to 128 bits for the
> >>>> device features. Soon we are going to use some of the 'extended'
> >>>> bits features (above 64) for the virtio_net driver.
> >>>>
> >>>> Extend the virtio pci modern driver to support configuring the full
> >>>> virtio features range, replacing the unrolled loops reading and
> >>>> writing the features space with explicit one bounded to the actual
> >>>> features space size in word.
> >>>>
> >>>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> >>>> ---
> >>>>  drivers/virtio/virtio_pci_modern_dev.c | 39 +++++++++++++++++---------
> >>>>  1 file changed, 25 insertions(+), 14 deletions(-)
> >>>>
> >>>> diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
> >>>> index 1d34655f6b658..e3025b6fa8540 100644
> >>>> --- a/drivers/virtio/virtio_pci_modern_dev.c
> >>>> +++ b/drivers/virtio/virtio_pci_modern_dev.c
> >>>> @@ -396,12 +396,16 @@ EXPORT_SYMBOL_GPL(vp_modern_remove);
> >>>>  virtio_features_t vp_modern_get_features(struct virtio_pci_modern_device *mdev)
> >>>>  {
> >>>>         struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
> >>>> -       virtio_features_t features;
> >>>> +       virtio_features_t features = 0;
> >>>> +       int i;
> >>>>
> >>>> -       vp_iowrite32(0, &cfg->device_feature_select);
> >>>> -       features = vp_ioread32(&cfg->device_feature);
> >>>> -       vp_iowrite32(1, &cfg->device_feature_select);
> >>>> -       features |= ((u64)vp_ioread32(&cfg->device_feature) << 32);
> >>>> +       for (i = 0; i < VIRTIO_FEATURES_WORDS; i++) {
> >>>> +               virtio_features_t cur;
> >>>> +
> >>>> +               vp_iowrite32(i, &cfg->device_feature_select);
> >>>> +               cur = vp_ioread32(&cfg->device_feature);
> >>>> +               features |= cur << (32 * i);
> >>>> +       }
> >>>
> >>> No matter if we decide to go with 128bit or not. I think at the lower
> >>> layer like this, it's time to allow arbitrary length of the features
> >>> as the spec supports.
> >>
> >> Is that useful if the vhost interface is not going to support it?
> > 
> > I think so, as there are hardware virtio devices that can benefit from this.
> 
> Let me look at the question from another perspective. Let's suppose that
> the virtio device supports an arbitrary wide features space, and the
> uAPI allows passing to/from the kernel an arbitrary high number of features.
> 
> How could the kernel stop the above loop? AFAICS the virtio spec does
> not define any way to detect the end of the features space. An arbitrary
> bound is actually needed.

Well, no. Let me explain.

Only the features that are negotiated matter.

Thus, as long as the driver only knows how to handle the low 128 bits,
it has no reason at all to ever look at other bits.
Not read them, nor write them, nor store them.

Hope that helps.
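
To illustrate with a toy model (plain userspace C, not the driver code;
mock_cfg, DEV_WORDS and DRV_WORDS are names made up for this sketch):
the device below exposes 8 feature words, but a driver that only knows
the low 128 bits stops after 4 and never touches the rest.

```c
#include <stdint.h>

#define DEV_WORDS 8	/* hypothetical device exposing 256 feature bits */
#define DRV_WORDS 4	/* driver understands the low 128 bits only */

/* Mock of the select/value register pair in the common config. */
struct mock_cfg {
	uint32_t select;
	uint32_t words[DEV_WORDS];
	unsigned int reads;	/* counts register reads */
};

static uint32_t mock_read_feature(struct mock_cfg *cfg)
{
	cfg->reads++;
	return cfg->select < DEV_WORDS ? cfg->words[cfg->select] : 0;
}

/* Read only the words the driver knows about, as two 64-bit halves. */
static void drv_get_features(struct mock_cfg *cfg, uint64_t out[DRV_WORDS / 2])
{
	unsigned int i;

	for (i = 0; i < DRV_WORDS / 2; i++)
		out[i] = 0;
	for (i = 0; i < DRV_WORDS; i++) {
		cfg->select = i;
		out[i / 2] |= (uint64_t)mock_read_feature(cfg) << (32 * (i % 2));
	}
}
```

The loop bound comes from the driver side, not from the device.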

> If 128 looks too low (why?) it can be raised to say 256 (why?). But
> AFAICS the only visible effect would be slower configuration due to
> larger number of unneeded I/O operations.
> 
> /P


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH net-next 2/8] virtio_pci_modern: allow setting configuring extended features
  2025-05-29 11:07             ` Paolo Abeni
@ 2025-05-29 14:28               ` Michael S. Tsirkin
  2025-06-03  2:11               ` Jason Wang
  1 sibling, 0 replies; 59+ messages in thread
From: Michael S. Tsirkin @ 2025-05-29 14:28 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Jason Wang, netdev, Willem de Bruijn, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Xuan Zhuo,
	Eugenio Pérez

On Thu, May 29, 2025 at 01:07:30PM +0200, Paolo Abeni wrote:
> On 5/29/25 4:22 AM, Jason Wang wrote:
> > On Thu, May 29, 2025 at 12:02 AM Paolo Abeni <pabeni@redhat.com> wrote:
> >> On 5/27/25 5:04 AM, Jason Wang wrote:
> >>> On Mon, May 26, 2025 at 6:53 PM Paolo Abeni <pabeni@redhat.com> wrote:
> >>>> On 5/26/25 2:49 AM, Jason Wang wrote:
> >>>>> On Wed, May 21, 2025 at 6:33 PM Paolo Abeni <pabeni@redhat.com> wrote:
> >>>>>>
> >>>>>> The virtio specifications allows for up to 128 bits for the
> >>>>>> device features. Soon we are going to use some of the 'extended'
> >>>>>> bits features (above 64) for the virtio_net driver.
> >>>>>>
> >>>>>> Extend the virtio pci modern driver to support configuring the full
> >>>>>> virtio features range, replacing the unrolled loops reading and
> >>>>>> writing the features space with explicit one bounded to the actual
> >>>>>> features space size in word.
> >>>>>>
> >>>>>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> >>>>>> ---
> >>>>>>  drivers/virtio/virtio_pci_modern_dev.c | 39 +++++++++++++++++---------
> >>>>>>  1 file changed, 25 insertions(+), 14 deletions(-)
> >>>>>>
> >>>>>> diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
> >>>>>> index 1d34655f6b658..e3025b6fa8540 100644
> >>>>>> --- a/drivers/virtio/virtio_pci_modern_dev.c
> >>>>>> +++ b/drivers/virtio/virtio_pci_modern_dev.c
> >>>>>> @@ -396,12 +396,16 @@ EXPORT_SYMBOL_GPL(vp_modern_remove);
> >>>>>>  virtio_features_t vp_modern_get_features(struct virtio_pci_modern_device *mdev)
> >>>>>>  {
> >>>>>>         struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
> >>>>>> -       virtio_features_t features;
> >>>>>> +       virtio_features_t features = 0;
> >>>>>> +       int i;
> >>>>>>
> >>>>>> -       vp_iowrite32(0, &cfg->device_feature_select);
> >>>>>> -       features = vp_ioread32(&cfg->device_feature);
> >>>>>> -       vp_iowrite32(1, &cfg->device_feature_select);
> >>>>>> -       features |= ((u64)vp_ioread32(&cfg->device_feature) << 32);
> >>>>>> +       for (i = 0; i < VIRTIO_FEATURES_WORDS; i++) {
> >>>>>> +               virtio_features_t cur;
> >>>>>> +
> >>>>>> +               vp_iowrite32(i, &cfg->device_feature_select);
> >>>>>> +               cur = vp_ioread32(&cfg->device_feature);
> >>>>>> +               features |= cur << (32 * i);
> >>>>>> +       }
> >>>>>
> >>>>> No matter if we decide to go with 128bit or not. I think at the lower
> >>>>> layer like this, it's time to allow arbitrary length of the features
> >>>>> as the spec supports.
> >>>>
> >>>> Is that useful if the vhost interface is not going to support it?
> >>>
> >>> I think so, as there are hardware virtio devices that can benefit from this.
> >>
> >> Let me look at the question from another perspective. Let's suppose that
> >> the virtio device supports an arbitrary wide features space, and the
> >> uAPI allows passing to/from the kernel an arbitrary high number of features.
> >>
> >> How could the kernel stop the above loop? AFAICS the virtio spec does
> >> not define any way to detect the end of the features space. An arbitrary
> >> bound is actually needed.
> > 
> > I think this is a good question ad we have something that could work:
> > 
> > 1) current driver has drv->feature_table_size, so the driver knows
> > it's meaningless to read above the size
> > 
> > and
> > 
> > 2) we can extend the spec, e.g add a transport specific field to let
> > the driver to know the feature size
> 
> So I guess we can postpone any additional change here until we have some
> spec in place, right?
> 
> /P

Agree on this.


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH net-next 5/8] net: implement virtio helpers to handle UDP GSO tunneling.
  2025-05-26  4:40   ` Jason Wang
  2025-05-29 11:55     ` Paolo Abeni
@ 2025-05-29 15:30     ` Paolo Abeni
  2025-06-03  2:11       ` Jason Wang
  1 sibling, 1 reply; 59+ messages in thread
From: Paolo Abeni @ 2025-05-29 15:30 UTC (permalink / raw)
  To: Jason Wang
  Cc: netdev, Willem de Bruijn, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez

On 5/26/25 6:40 AM, Jason Wang wrote:
> On Wed, May 21, 2025 at 6:34 PM Paolo Abeni <pabeni@redhat.com> wrote:
>> @@ -242,4 +249,158 @@ static inline int virtio_net_hdr_from_skb(const struct sk_buff *skb,
>>         return 0;
>>  }
>>
>> +static inline unsigned int virtio_l3min(bool is_ipv6)
>> +{
>> +       return is_ipv6 ? sizeof(struct ipv6hdr) : sizeof(struct iphdr);
>> +}
>> +
>> +static inline int virtio_net_hdr_tnl_to_skb(struct sk_buff *skb,
>> +                                           const struct virtio_net_hdr *hdr,
>> +                                           unsigned int tnl_hdr_offset,
>> +                                           bool tnl_csum_negotiated,
>> +                                           bool little_endian)
> 
> Considering tunnel gso requires VERSION_1, I think there's no chance
> for little_endian to be false here.

If tnl_hdr_offset == 0, tunnel gso has not been negotiated, and
little_endian could be false. I can assume little_endian is true in the
!!tnl_hdr_offset branch.

/P


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH net-next 7/8] tun: enable gso over UDP tunnel support.
  2025-05-27  4:19       ` Jason Wang
@ 2025-05-29 16:17         ` Paolo Abeni
  2025-06-03  2:11           ` Jason Wang
  0 siblings, 1 reply; 59+ messages in thread
From: Paolo Abeni @ 2025-05-29 16:17 UTC (permalink / raw)
  To: Jason Wang
  Cc: netdev, Willem de Bruijn, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez

On 5/27/25 6:19 AM, Jason Wang wrote:
> On Mon, May 26, 2025 at 7:20 PM Paolo Abeni <pabeni@redhat.com> wrote:
>> On 5/26/25 6:40 AM, Jason Wang wrote:
>>> On Wed, May 21, 2025 at 6:34 PM Paolo Abeni <pabeni@redhat.com> wrote:
>>>>
>>>> Add new tun features to represent the newly introduced virtio
>>>> GSO over UDP tunnel offload. Allows detection and selection of
>>>> such features via the existing TUNSETOFFLOAD ioctl, store the
>>>> tunnel offload configuration in the highest bit of the tun flags
>>>> and compute the expected virtio header size and tunnel header
>>>> offset using such bits, so that we can plug almost seamless the
>>>> the newly introduced virtio helpers to serialize the extended
>>>> virtio header.
>>>>
>>>> As the tun features and the virtio hdr size are configured
>>>> separately, the data path need to cope with (hopefully transient)
>>>> inconsistent values.
>>>
>>> I'm not sure it's a good idea to deal with this inconsistency in this
>>> series as it is not specific to tunnel offloading. It could be a
>>> dependency for this patch or we can leave it for the future and just
>>> to make sure mis-configuration won't cause any kernel issues.
>>
>> The possible inconsistency is not due to a misconfiguration, but to the
>> facts that:
>> - configuring the virtio hdr len and the offload is not atomic
>> - successful GSO over udp tunnel parsing requires the relevant offloads
>> to be enabled and a suitable hdr len.
>>
>> Plain GSO don't have a similar problem because all the relevant fields
>> are always available for any sane virtio hdr length, but we need to deal
>> with them here.
> 
> Just to make sure we're on the same page.
> 
> I meant tun has TUNSETVNETHDRSZ, so user space can set it to any value
> at any time as long as it's not smaller than sizeof(struct
> virtio_net_hdr). Tun and vhost need to cope with this otherwise it
> should be a bug. This is allowed before the introduction of tunnel
> gso.

The code here is intended to support such a scenario; but if the virtio
hdr size is configured to be lower than the minimum required for the UDP
tunnel hdr fields, the related offload cannot be used.

>>>> @@ -1698,7 +1700,8 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
>>>>         struct sk_buff *skb;
>>>>         size_t total_len = iov_iter_count(from);
>>>>         size_t len = total_len, align = tun->align, linear;
>>>> -       struct virtio_net_hdr gso = { 0 };
>>>> +       char buf[TUN_VNET_TNL_SIZE];
>>>
>>> I wonder why not simply
>>>
>>> 1) define the structure virtio_net_hdr_tnl_gso and use that
>>>
>>> or
>>>
>>> 2) stick the gso here and use iter advance to get
>>> virtio_net_hdr_tunnel when necessary?
>>
>> Code wise 2) looks more complex
> 
> I don't know how to define complex but we've already use a conatiner structure:
> 
> struct virtio_net_hdr_v1_hash {
>         struct virtio_net_hdr_v1 hdr;
>         __le32 hash_value;
> ...
>         __le16 hash_report;
>         __le16 padding;
> };
> 
>> and 1) will require additional care when
>> adding hash report support.
> 
> I don't understand here, you're doing:
> 
>         iov_iter_advance(from, sz - parsed_size);
> 
> in __tun_vnet_hdr_get(), so this logic needs to be extended for hash
> report as well.

Note that there are at least 2 different virtio net hdr binary layouts
supporting UDP tunnel offload:

struct virtio_net_hdr_v1_tnl {
   struct virtio_net_hdr_v1 hdr;
   struct virtio_net_hdr_tunnel tnl;
};

and

struct virtio_net_hdr_v1_hash_tnl {
   struct virtio_net_hdr_v1_hash hdr;
   struct virtio_net_hdr_tunnel tnl;
};

depending on the negotiated features. Using directly a struct to
fill/fetch the tunnel fields is problematic.

With the current approach the binary layout differences are abstracted
by the tun_vnet_parse_size()/tun_vnet_tnl_offset() helpers. The
expectation is that enabling hash report will set a bit in `flags`, too,
so that the helpers could compute the correct offset accordingly.

No other change should be required.
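
As a rough illustration of the idea (userspace sketch only, not the
helpers from this series): the structures below are local stand-ins
whose layouts I assume to match the uapi ones, SK_F_HASH is a purely
hypothetical flag bit, and the no-tunnel case of sk_parse_size() is
simplified.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the uapi structures; layouts are assumed. */
struct sk_hdr_v1 {
	uint8_t flags;
	uint8_t gso_type;
	uint16_t hdr_len;
	uint16_t gso_size;
	uint16_t csum_start;
	uint16_t csum_offset;
	uint16_t num_buffers;
};

struct sk_hdr_v1_hash {
	struct sk_hdr_v1 hdr;
	uint32_t hash_value;
	uint16_t hash_report;
	uint16_t padding;
};

struct sk_hdr_tunnel {
	uint16_t outer_th_offset;
	uint16_t inner_nh_offset;
};

#define SK_F_TNL	0x20000000u
#define SK_F_HASH	0x08000000u	/* hypothetical hash report bit */

/* Size of the part preceding the tunnel fields. */
static size_t sk_base_size(unsigned int flags)
{
	return (flags & SK_F_HASH) ? sizeof(struct sk_hdr_v1_hash) :
				     sizeof(struct sk_hdr_v1);
}

/* Offset of the tunnel fields, or 0 when tunnel GSO is not enabled. */
static size_t sk_tnl_offset(unsigned int flags)
{
	return (flags & SK_F_TNL) ? sk_base_size(flags) : 0;
}

/* Number of header bytes to parse for the given flags. */
static size_t sk_parse_size(unsigned int flags)
{
	size_t sz = sk_base_size(flags);

	if (flags & SK_F_TNL)
		sz += sizeof(struct sk_hdr_tunnel);
	return sz;
}
```

Both layouts are handled by a single `flags` argument, which is the
point of keeping that information in one place.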

>>>> diff --git a/drivers/net/tun_vnet.h b/drivers/net/tun_vnet.h
>>>> index 58b9ac7a5fc40..ab2d4396941ca 100644
>>>> --- a/drivers/net/tun_vnet.h
>>>> +++ b/drivers/net/tun_vnet.h
>>>> @@ -5,6 +5,12 @@
>>>>  /* High bits in flags field are unused. */
>>>>  #define TUN_VNET_LE     0x80000000
>>>>  #define TUN_VNET_BE     0x40000000
>>>> +#define TUN_VNET_TNL           0x20000000
>>>> +#define TUN_VNET_TNL_CSUM      0x10000000
>>>> +#define TUN_VNET_TNL_MASK      (TUN_VNET_TNL | TUN_VNET_TNL_CSUM)
>>>> +
>>>> +#define TUN_VNET_TNL_SIZE (sizeof(struct virtio_net_hdr_v1) + \
>>>
>>> Should this be virtio_net_hdr_v1_hash?
>>
>> If tun does not support HASH_REPORT, no: the GSO over UDP tunnels header
>> could be present regardless of the hash-related field presence. This has
>> been discussed extensively while crafting the specification.
> 
> Ok, so it excludes the hash report fields, more below.
> 
>>
>> Note that tun_vnet_parse_size() and  tun_vnet_tnl_offset() should be
>> adjusted accordingly after that HASH_REPORT support is introduced.
> 
> This is suboptimal as we know a hash report will be added so we can
> treat the field as anonymous one. See
> 
> https://patchwork.kernel.org/project/linux-kselftest/patch/20250307-rss-v9-3-df76624025eb@daynix.com/

I know hash support is in the works. The current design is intended to
minimize the conflicts with such a feature. But I can't follow the
statement above. Could you please rephrase it?

>>>> +                          sizeof(struct virtio_net_hdr_tunnel))
>>>>
>>>>  static inline bool tun_vnet_legacy_is_little_endian(unsigned int flags)
>>>>  {
>>>> @@ -45,6 +51,13 @@ static inline long tun_set_vnet_be(unsigned int *flags, int __user *argp)
>>>>         return 0;
>>>>  }
>>>>
>>>> +static inline void tun_set_vnet_tnl(unsigned int *flags, bool tnl, bool tnl_csum)
>>>> +{
>>>> +       *flags = (*flags & ~TUN_VNET_TNL_MASK) |
>>>> +                tnl * TUN_VNET_TNL |
>>>> +                tnl_csum * TUN_VNET_TNL_CSUM;
>>>
>>> We could refer to netdev via tun_struct, so I don't understand why we
>>> need to duplicate the features in tun->flags (we don't do that for
>>> other GSO/CSUM stuffs).
>>
>> Just to be consistent with commit 60df67b94804b1adca74854db502a72f7aeaa125
> 
> I don't see a connection here, the above commit just moves decouple
> vnet to make it reusable, it doesn't change the semantic of
> tun->flags.

You are right, I used a bad commit reference.

The goal here is to keep all the virtio-layout-related information in a
single place. tun->flags is already used for that (for little endian
flag), so I piggybacked there.

Ideally another bit there will be allocated to mark the hash report
presence, too. That will allow the tun_vnet helpers to determine the
virtio net hdr layout using a single argument.

Note that we can't rely on netdev->features to determine the virtio
net hdr binary layout because user-space could enable/disable GSO over
UDP tunnel support after ioctl(TUNSETOFFLOAD).
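
For clarity, the flag handling boils down to something like the
following userspace sketch. The SK_* names mirror the TUN_VNET_* bits
quoted above but are local to this example, and the if-based form is
just an equivalent spelling of the multiplication in the patch.

```c
#include <stdbool.h>

#define SK_VNET_LE		0x80000000u
#define SK_VNET_TNL		0x20000000u
#define SK_VNET_TNL_CSUM	0x10000000u
#define SK_VNET_TNL_MASK	(SK_VNET_TNL | SK_VNET_TNL_CSUM)

/* Update only the tunnel bits, leaving every other flag untouched. */
static void sk_set_vnet_tnl(unsigned int *flags, bool tnl, bool tnl_csum)
{
	*flags &= ~SK_VNET_TNL_MASK;
	if (tnl)
		*flags |= SK_VNET_TNL;
	if (tnl_csum)
		*flags |= SK_VNET_TNL_CSUM;
}
```

The endianness bit and the low flag bits survive any tunnel offload
reconfiguration.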

/P




^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH net-next 5/8] net: implement virtio helpers to handle UDP GSO tunneling.
  2025-05-29 11:55     ` Paolo Abeni
@ 2025-05-30  8:43       ` Paolo Abeni
  2025-06-03  2:11       ` Jason Wang
  1 sibling, 0 replies; 59+ messages in thread
From: Paolo Abeni @ 2025-05-30  8:43 UTC (permalink / raw)
  To: Jason Wang
  Cc: netdev, Willem de Bruijn, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez

On 5/29/25 1:55 PM, Paolo Abeni wrote:
> On 5/26/25 6:40 AM, Jason Wang wrote:
>> On Wed, May 21, 2025 at 6:34 PM Paolo Abeni <pabeni@redhat.com> wrote:
>>> +       skb->transport_header = outer_th + skb_headroom(skb);
>>> +       skb->encapsulation = 1;
>>> +       return 0;
>>> +}
>>> +
>>> +static inline int virtio_net_chk_data_valid(struct sk_buff *skb,
>>> +                                           struct virtio_net_hdr *hdr,
>>> +                                           bool tun_csum_negotiated)
>>
>> This is virtio_net.h so it's better to avoid using "tun". Btw, I
>> wonder why this needs to be called by the virtio-net instead of being
>> called by hdr_to_skb helpers.
> 
> I can squash into virtio_net_hdr_tnl_to_skb(), I kept them separated to
> avoid extra long argument lists, but we are dropping an argument from
> virtio_net_hdr_tnl_to_skb(), so should be ok.

I have to retract what I said above. The driver and the device have
different csum-related offload support, as per specification (i.e.
DATA_VALID), and need different validation.

This helper is intended to be called only by the driver, and will do the
wrong thing if used on the device side.

I'll try to clarify the usage in the next iteration.

/P


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH net-next 2/8] virtio_pci_modern: allow setting configuring extended features
  2025-05-29 11:07             ` Paolo Abeni
  2025-05-29 14:28               ` Michael S. Tsirkin
@ 2025-06-03  2:11               ` Jason Wang
  1 sibling, 0 replies; 59+ messages in thread
From: Jason Wang @ 2025-06-03  2:11 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, Willem de Bruijn, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez

On Thu, May 29, 2025 at 7:07 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On 5/29/25 4:22 AM, Jason Wang wrote:
> > On Thu, May 29, 2025 at 12:02 AM Paolo Abeni <pabeni@redhat.com> wrote:
> >> On 5/27/25 5:04 AM, Jason Wang wrote:
> >>> On Mon, May 26, 2025 at 6:53 PM Paolo Abeni <pabeni@redhat.com> wrote:
> >>>> On 5/26/25 2:49 AM, Jason Wang wrote:
> >>>>> On Wed, May 21, 2025 at 6:33 PM Paolo Abeni <pabeni@redhat.com> wrote:
> >>>>>>
> >>>>>> The virtio specifications allows for up to 128 bits for the
> >>>>>> device features. Soon we are going to use some of the 'extended'
> >>>>>> bits features (above 64) for the virtio_net driver.
> >>>>>>
> >>>>>> Extend the virtio pci modern driver to support configuring the full
> >>>>>> virtio features range, replacing the unrolled loops reading and
> >>>>>> writing the features space with explicit one bounded to the actual
> >>>>>> features space size in word.
> >>>>>>
> >>>>>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> >>>>>> ---
> >>>>>>  drivers/virtio/virtio_pci_modern_dev.c | 39 +++++++++++++++++---------
> >>>>>>  1 file changed, 25 insertions(+), 14 deletions(-)
> >>>>>>
> >>>>>> diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
> >>>>>> index 1d34655f6b658..e3025b6fa8540 100644
> >>>>>> --- a/drivers/virtio/virtio_pci_modern_dev.c
> >>>>>> +++ b/drivers/virtio/virtio_pci_modern_dev.c
> >>>>>> @@ -396,12 +396,16 @@ EXPORT_SYMBOL_GPL(vp_modern_remove);
> >>>>>>  virtio_features_t vp_modern_get_features(struct virtio_pci_modern_device *mdev)
> >>>>>>  {
> >>>>>>         struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
> >>>>>> -       virtio_features_t features;
> >>>>>> +       virtio_features_t features = 0;
> >>>>>> +       int i;
> >>>>>>
> >>>>>> -       vp_iowrite32(0, &cfg->device_feature_select);
> >>>>>> -       features = vp_ioread32(&cfg->device_feature);
> >>>>>> -       vp_iowrite32(1, &cfg->device_feature_select);
> >>>>>> -       features |= ((u64)vp_ioread32(&cfg->device_feature) << 32);
> >>>>>> +       for (i = 0; i < VIRTIO_FEATURES_WORDS; i++) {
> >>>>>> +               virtio_features_t cur;
> >>>>>> +
> >>>>>> +               vp_iowrite32(i, &cfg->device_feature_select);
> >>>>>> +               cur = vp_ioread32(&cfg->device_feature);
> >>>>>> +               features |= cur << (32 * i);
> >>>>>> +       }
> >>>>>
> >>>>> No matter if we decide to go with 128bit or not. I think at the lower
> >>>>> layer like this, it's time to allow arbitrary length of the features
> >>>>> as the spec supports.
> >>>>
> >>>> Is that useful if the vhost interface is not going to support it?
> >>>
> >>> I think so, as there are hardware virtio devices that can benefit from this.
> >>
> >> Let me look at the question from another perspective. Let's suppose that
> >> the virtio device supports an arbitrary wide features space, and the
> >> uAPI allows passing to/from the kernel an arbitrary high number of features.
> >>
> >> How could the kernel stop the above loop? AFAICS the virtio spec does
> >> not define any way to detect the end of the features space. An arbitrary
> >> bound is actually needed.
> >
> > I think this is a good question ad we have something that could work:
> >
> > 1) current driver has drv->feature_table_size, so the driver knows
> > it's meaningless to read above the size
> >
> > and
> >
> > 2) we can extend the spec, e.g add a transport specific field to let
> > the driver to know the feature size
>
> So I guess we can postpone any additional change here until we have some
> spec in place, right?

I think 1) should be sufficient. Considering we agree that
virtio_features_t will use arrays in the future, I'm fine to start
with int128.

Thanks

>
> /P
>


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH net-next 3/8] vhost-net: allow configuring extended features
  2025-05-29 11:10         ` Paolo Abeni
@ 2025-06-03  2:11           ` Jason Wang
  0 siblings, 0 replies; 59+ messages in thread
From: Jason Wang @ 2025-06-03  2:11 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, Willem de Bruijn, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez

On Thu, May 29, 2025 at 7:10 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On 5/27/25 5:56 AM, Jason Wang wrote:
> > On Mon, May 26, 2025 at 6:57 PM Paolo Abeni <pabeni@redhat.com> wrote:
> >> On 5/26/25 2:47 AM, Jason Wang wrote:
> >>> On Wed, May 21, 2025 at 6:33 PM Paolo Abeni <pabeni@redhat.com> wrote:
> >>>>
> >>>> Use the extended feature type for 'acked_features' and implement
> >>>> two new ioctls operation to get and set the extended features.
> >>>>
> >>>> Note that the legacy ioctls implicitly truncate the negotiated
> >>>> features to the lower 64 bits range.
> >>>>
> >>>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> >>>> ---
> >>>>  drivers/vhost/net.c        | 26 +++++++++++++++++++++++++-
> >>>>  drivers/vhost/vhost.h      |  2 +-
> >>>>  include/uapi/linux/vhost.h |  8 ++++++++
> >>>>  3 files changed, 34 insertions(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> >>>> index 7cbfc7d718b3f..b894685dded3e 100644
> >>>> --- a/drivers/vhost/net.c
> >>>> +++ b/drivers/vhost/net.c
> >>>> @@ -77,6 +77,10 @@ enum {
> >>>>                          (1ULL << VIRTIO_F_RING_RESET)
> >>>>  };
> >>>>
> >>>> +#ifdef VIRTIO_HAS_EXTENDED_FEATURES
> >>>> +#define VHOST_NET_FEATURES_EX VHOST_NET_FEATURES
> >>>> +#endif
> >>>> +
> >>>>  enum {
> >>>>         VHOST_NET_BACKEND_FEATURES = (1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2)
> >>>>  };
> >>>> @@ -1614,7 +1618,7 @@ static long vhost_net_reset_owner(struct vhost_net *n)
> >>>>         return err;
> >>>>  }
> >>>>
> >>>> -static int vhost_net_set_features(struct vhost_net *n, u64 features)
> >>>> +static int vhost_net_set_features(struct vhost_net *n, virtio_features_t features)
> >>>>  {
> >>>>         size_t vhost_hlen, sock_hlen, hdr_len;
> >>>>         int i;
> >>>> @@ -1704,6 +1708,26 @@ static long vhost_net_ioctl(struct file *f, unsigned int ioctl,
> >>>>                 if (features & ~VHOST_NET_FEATURES)
> >>>>                         return -EOPNOTSUPP;
> >>>>                 return vhost_net_set_features(n, features);
> >>>> +#ifdef VIRTIO_HAS_EXTENDED_FEATURES
> >>>
> >>> Vhost doesn't depend on virtio. But this invents a dependency, and I
> >>> don't understand why we need to do that.
> >>
> >> What do you mean with "dependency" here? vhost has already a build
> >> dependency vs virtio, including several virtio headers. It has also a
> >> logical dependency, using several virtio features.
> >>
> >> Do you mean a build dependency? this change does not introduce such a thing.
> >
> > I mean vhost can be built without virtio drivers. So old vhost can run
> > new virtio drivers on top. So I don't see why vhost needs to check if
> > virtio of the same source tree supports 128 bit or not.
> >
> > We can just accept an array of features now as
> >
> > 1) the changes are limited to vhost so it wouldn't be too much
> > 2) we don't have to have VHOST_GET_FEATURES_EX2 in the future.
>
> AFAICS the ioctl() interface code wise only impacts on the device
> implementing extended features support, I guess it could be changed to
> something like:
>
> struct vhost_virtio_features {
>         __u64 count;
>         __u64 features[];
> };
>
> #define VHOST_GET_FEATURES_VECTOR _IOR(VHOST_VIRTIO, 0x83, struct
> vhost_virtio_features)
> #define VHOST_SET_FEATURES_VECTOR _IOW(VHOST_VIRTIO, 0x83, struct
> vhost_virtio_features)
>
> I could drop the above #ifdef, and the implementation would copy in/out
> only the known/supported number of features.
>
> WDYT?

This looks good.

Thanks
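
A rough user-space model of the "copy in/out only the known/supported
number of features" behaviour, in case it helps the discussion. Everything
here is an assumption made for illustration: KNOWN_FEATURE_WORDS and the
features_vector_*() helpers are hypothetical, and rejecting unknown bits on
set is modelled after the existing "features & ~VHOST_NET_FEATURES" check
rather than taken from any posted patch.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical: number of 64-bit feature words the implementation knows. */
#define KNOWN_FEATURE_WORDS 2

/* "Set" direction: accept a caller vector of 'count' words.
 * Only the known words are copied; a shorter vector is zero-extended.
 * Returns -1 if the caller sets any bit in a word we do not know
 * (assumption, mirroring the legacy unsupported-bits check).
 */
static int features_vector_set(uint64_t *dst, const uint64_t *src,
			       uint64_t count)
{
	uint64_t i, n = count < KNOWN_FEATURE_WORDS ? count : KNOWN_FEATURE_WORDS;

	for (i = n; i < count; i++)
		if (src[i])
			return -1;

	memset(dst, 0, KNOWN_FEATURE_WORDS * sizeof(*dst));
	memcpy(dst, src, n * sizeof(*dst));
	return 0;
}

/* "Get" direction: fill a caller vector of 'count' words.
 * Known words are copied, any extra words are zeroed.
 */
static void features_vector_get(uint64_t *dst, uint64_t count,
				const uint64_t *known)
{
	uint64_t n = count < KNOWN_FEATURE_WORDS ? count : KNOWN_FEATURE_WORDS;

	memcpy(dst, known, n * sizeof(*dst));
	memset(dst + n, 0, (count - n) * sizeof(*dst));
}
```

With this shape a larger user-space vector keeps working against an older
implementation, which is what avoids a future VHOST_GET_FEATURES_EX2.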

>
> /P
>


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH net-next 5/8] net: implement virtio helpers to handle UDP GSO tunneling.
  2025-05-29 11:55     ` Paolo Abeni
  2025-05-30  8:43       ` Paolo Abeni
@ 2025-06-03  2:11       ` Jason Wang
  1 sibling, 0 replies; 59+ messages in thread
From: Jason Wang @ 2025-06-03  2:11 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, Willem de Bruijn, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez

On Thu, May 29, 2025 at 7:55 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On 5/26/25 6:40 AM, Jason Wang wrote:
> > On Wed, May 21, 2025 at 6:34 PM Paolo Abeni <pabeni@redhat.com> wrote:
> >> +       if (!gso_inner_type || gso_inner_type == VIRTIO_NET_HDR_GSO_UDP)
> >> +               return -EINVAL;
> >> +
> >> +       /* Rely on csum being present. */
> >> +       if (!(hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM))
> >> +               return -EINVAL;
> >> +
> >> +       /* Validate offsets. */
> >> +       outer_isv6 = gso_tunnel_type & VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6;
> >> +       inner_l3min = virtio_l3min(gso_inner_type == VIRTIO_NET_HDR_GSO_TCPV6);
> >> +       outer_l3min = ETH_HLEN + virtio_l3min(outer_isv6);
> >> +
> >> +       tnl = ((void *)hdr) + tnl_hdr_offset;
> >> +       inner_th = __virtio16_to_cpu(little_endian, hdr->csum_start);
> >> +       inner_nh = __virtio16_to_cpu(little_endian, tnl->inner_nh_offset);
> >> +       outer_th = __virtio16_to_cpu(little_endian, tnl->outer_th_offset);
> >> +       if (outer_th < outer_l3min ||
> >> +           inner_nh < outer_th + sizeof(struct udphdr) ||
> >> +           inner_th < inner_nh + inner_l3min)
> >> +               return -EINVAL;
> >
> > I wonder if kernel has already had helpers to validate the tunnel
> > headers
>
> Not that I know of.
>
> > or if the above check is sufficient here.
>
> AFAICS yes. Syzkaller is out there just to prove me wrong...
>
>
> >> +
> >> +       /* Let the basic parsing deal with plain GSO features. */
> >> +       ret = __virtio_net_hdr_to_skb(skb, hdr, little_endian,
> >> +                                     hdr->gso_type & ~gso_tunnel_type);
> >> +       if (ret)
> >> +               return ret;
> >> +
> >> +       skb_set_inner_protocol(skb, outer_isv6 ? htons(ETH_P_IPV6) :
> >> +                                                htons(ETH_P_IP));
> >
> > The outer_isv6 is somehow misleading here, I think we'd better rename
> > it as inner_isv6?
>
> There is a bug above, thanks for spotting it. I should not use the
> `outer_isv6` variable, instead I should compute separately `inner_isv6`
>
> >> +       if (hdr->flags & VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM) {
> >> +               if (!tnl_csum_negotiated)
> >> +                       return -EINVAL;
> >> +
> >> +               skb_shinfo(skb)->gso_type |= SKB_GSO_UDP_TUNNEL_CSUM;
> >> +       } else {
> >> +               skb_shinfo(skb)->gso_type |= SKB_GSO_UDP_TUNNEL;
> >> +       }
> >> +
> >> +       skb->inner_transport_header = inner_th + skb_headroom(skb);
> >
> > I may miss something but using skb_headroom() means the value depends
> > on the geometry of the skb and the headroom might vary depending on
> > the size of the packet and other factors.  (see receive_buf())
>
> Yes, that is correct: the actual inner_transport_header value depends on
> the skb geometry, because the (inner) transport header is located at
> skb->head + skb->inner_transport_header.


Right, I see. Btw, is skb_set_inner_transport_header() considered to
be better here?
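
To spell out why the two forms should be equivalent, here is a toy
user-space model. The toy_skb structure and the toy_*() helpers are
hypothetical stand-ins, not kernel code; the second helper only mirrors the
"reset to data, then add the offset" behaviour of
skb_set_inner_transport_header(), and headroom is data - head.

```c
#include <stdint.h>

/* Toy stand-in for struct sk_buff: only the fields needed here. */
struct toy_skb {
	unsigned char *head;			/* start of the buffer */
	unsigned char *data;			/* start of the packet data */
	uint16_t inner_transport_header;	/* offset from head */
};

/* Models skb_headroom(): bytes between head and data. */
static unsigned int toy_headroom(const struct toy_skb *skb)
{
	return (unsigned int)(skb->data - skb->head);
}

/* Open-coded form, as in the patch under review. */
static void toy_set_inner_th_open_coded(struct toy_skb *skb, unsigned int off)
{
	skb->inner_transport_header = (uint16_t)(off + toy_headroom(skb));
}

/* Helper form: reset the offset to 'data', then add the packet offset. */
static void toy_set_inner_th_helper(struct toy_skb *skb, int offset)
{
	skb->inner_transport_header = (uint16_t)(skb->data - skb->head);
	skb->inner_transport_header += offset;
}
```

Both store the same head-relative value, so the helper would mostly be a
readability improvement.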

>
> >> +       skb->inner_network_header = inner_nh + skb_headroom(skb);
> >> +       skb->inner_mac_header = inner_nh + skb_headroom(skb);
> >
> > This actually equals to inner_network_header, is this intended?
>
> Yes. AFAICS the inner mac header field is used only for GSO/TSO.
>
> At this point we don't know if the inner mac header is actually present
> nor its len (could include vlan tag).
>
> Still the above allows correct segmentation by the GSO stage because the
> inner mac header is not copied verbatim in the segmented packets, alike
> the tunnel header.
>
> With the above code, the inner mac header if really present will be
> logically considered part of the tunnel header by the GSO stage.
>
> Note that some devices restrict the TSO capability to some fixed values
> of the UDP tunnel sizes and inner mac header. In such cases, they will
> fallback to S/W GSO.

Ok.

>
> >> +       skb->transport_header = outer_th + skb_headroom(skb);
> >> +       skb->encapsulation = 1;
> >> +       return 0;
> >> +}
> >> +
> >> +static inline int virtio_net_chk_data_valid(struct sk_buff *skb,
> >> +                                           struct virtio_net_hdr *hdr,
> >> +                                           bool tun_csum_negotiated)
> >
> > This is virtio_net.h so it's better to avoid using "tun". Btw, I
> > wonder why this needs to be called by the virtio-net instead of being
> > called by hdr_to_skb helpers.
>
> I can squash into virtio_net_hdr_tnl_to_skb(), I kept them separated to
> avoid extra long argument lists, but we are dropping an argument from
> virtio_net_hdr_tnl_to_skb(), so should be ok.
>
> >> +{
> >> +       if (!(hdr->gso_type & VIRTIO_NET_HDR_GSO_UDP_TUNNEL)) {
> >> +               if (!(hdr->flags & VIRTIO_NET_HDR_F_DATA_VALID))
> >> +                       return 0;
> >> +
> >> +               skb->ip_summed = CHECKSUM_UNNECESSARY;
> >> +               if (!(hdr->flags & VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM))
> >> +                       return 0;
> >> +
> >> +               /* tunnel csum packets are invalid when the related
> >> +                * feature has not been negotiated
> >> +                */
> >> +               if (!tun_csum_negotiated)
> >> +                       return -EINVAL;
> >
> > Should we move this check above VIRTIO_NET_HDR_F_DATA_VALID check?
>
> It could break existing setups. We can safely do extra validation only
> when we know that the UDP tunnel features have been negotiated.

You are right.

>
> >> +               skb->csum_level = 1;
> >> +               return 0;
> >> +       }
> >> +
> >> +       /* DATA_VALID is mutually exclusive with NEEDS_CSUM,
> >
> > I may miss something but I think we had a discussion about this, and
> > the conclusion is it's too late to fix as it may break some legacy
> > devices?
>
> I'm not sure what should be fixed here? This check implements exactly
> restriction you asked for while discussing the spec. We can't have a
> similar check for non UDP tunneled packets, because it could break
> existing setup.

Right.

>
> >
> >> and GSO
> >> +        * over UDP tunnel requires the latter
> >> +        */
> >> +       if (hdr->flags & VIRTIO_NET_HDR_F_DATA_VALID)
> >> +               return -EINVAL;
> >> +       return 0;
> >> +}
> >> +
> >> +static inline int virtio_net_hdr_tnl_from_skb(const struct sk_buff *skb,
> >> +                                             struct virtio_net_hdr *hdr,
> >> +                                             unsigned int tnl_offset,
> >> +                                             bool little_endian,
> >> +                                             int vlan_hlen)
> >> +{
> >> +       struct virtio_net_hdr_tunnel *tnl;
> >> +       unsigned int inner_nh, outer_th;
> >> +       int tnl_gso_type;
> >> +       int ret;
> >> +
> >> +       tnl_gso_type = skb_shinfo(skb)->gso_type & (SKB_GSO_UDP_TUNNEL |
> >> +                                                   SKB_GSO_UDP_TUNNEL_CSUM);
> >> +       if (!tnl_gso_type)
> >> +               return virtio_net_hdr_from_skb(skb, hdr, little_endian, false,
> >> +                                              vlan_hlen);
> >> +
> >> +       /* Tunnel support not negotiated but skb ask for it. */
> >> +       if (!tnl_offset)
> >> +               return -EINVAL;
> >
> > Should we do BUG_ON here?
>
> I don't think so. BUG_ON()s are explicitly discouraged to avoid crashing
> the kernel on exceptional/unexpected situation.
>
> The caller will emit rate limited warns with the relevant info, if this
> is hit. The BUG_ON() stack trace will add little value.

Ok.

>
> >> +
> >> +       /* Let the basic parsing deal with plain GSO features. */
> >> +       skb_shinfo(skb)->gso_type &= ~tnl_gso_type;
> >> +       ret = virtio_net_hdr_from_skb(skb, hdr, little_endian, false,
> >> +                                     vlan_hlen);
> >> +       skb_shinfo(skb)->gso_type |= tnl_gso_type;
> >> +       if (ret)
> >> +               return ret;
> >
> > Could we do the plain GSO after setting inner flags below to avoid
> > masking and unmasking tnl_gso_type?
>
> virtio_net_hdr_from_skb() will still receive a skb with UDP tunnel GSO
> type and will error out.
>
> The masking could be avoided by factoring out a __virtio_net_hdr_from_skb()
> helper receiving an explicit gso_type argument. I can do that if it's
> preferred.

That would be fine.

Thanks

>
> /P
>


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH net-next 5/8] net: implement virtio helpers to handle UDP GSO tunneling.
  2025-05-29 15:30     ` Paolo Abeni
@ 2025-06-03  2:11       ` Jason Wang
  0 siblings, 0 replies; 59+ messages in thread
From: Jason Wang @ 2025-06-03  2:11 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, Willem de Bruijn, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez

On Thu, May 29, 2025 at 11:30 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On 5/26/25 6:40 AM, Jason Wang wrote:
> > On Wed, May 21, 2025 at 6:34 PM Paolo Abeni <pabeni@redhat.com> wrote:
> >> @@ -242,4 +249,158 @@ static inline int virtio_net_hdr_from_skb(const struct sk_buff *skb,
> >>         return 0;
> >>  }
> >>
> >> +static inline unsigned int virtio_l3min(bool is_ipv6)
> >> +{
> >> +       return is_ipv6 ? sizeof(struct ipv6hdr) : sizeof(struct iphdr);
> >> +}
> >> +
> >> +static inline int virtio_net_hdr_tnl_to_skb(struct sk_buff *skb,
> >> +                                           const struct virtio_net_hdr *hdr,
> >> +                                           unsigned int tnl_hdr_offset,
> >> +                                           bool tnl_csum_negotiated,
> >> +                                           bool little_endian)
> >
> > Considering tunnel gso requires VERSION_1, I think there's no chance
> > for little_endian to be false here.
>
> If tnl_hdr_offset == 0, tunnel gso has not been negotiated, and
> little_endian could be false. I can assume little_endian is true in the
> !!tnl_hdr_offset branch.

That would be fine. I also wonder whether, instead of tnl_hdr_offset, it
would be better to accept a struct virtio_net_hdr_tunnel * (assuming we
agree that it is part of the uAPI), which seems more consistent and avoids
a cast.

Thanks

>
> /P
>


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH net-next 7/8] tun: enable gso over UDP tunnel support.
  2025-05-29 16:17         ` Paolo Abeni
@ 2025-06-03  2:11           ` Jason Wang
  0 siblings, 0 replies; 59+ messages in thread
From: Jason Wang @ 2025-06-03  2:11 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, Willem de Bruijn, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez

On Fri, May 30, 2025 at 12:18 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On 5/27/25 6:19 AM, Jason Wang wrote:
> > On Mon, May 26, 2025 at 7:20 PM Paolo Abeni <pabeni@redhat.com> wrote:
> >> On 5/26/25 6:40 AM, Jason Wang wrote:
> >>> On Wed, May 21, 2025 at 6:34 PM Paolo Abeni <pabeni@redhat.com> wrote:
> >>>>
> >>>> Add new tun features to represent the newly introduced virtio
> >>>> GSO over UDP tunnel offload. Allows detection and selection of
> >>>> such features via the existing TUNSETOFFLOAD ioctl, store the
> >>>> tunnel offload configuration in the highest bit of the tun flags
> >>>> and compute the expected virtio header size and tunnel header
> >>>> offset using such bits, so that we can plug almost seamlessly the
> >>>> newly introduced virtio helpers to serialize the extended
> >>>> virtio header.
> >>>>
> >>>> As the tun features and the virtio hdr size are configured
> >>>> separately, the data path need to cope with (hopefully transient)
> >>>> inconsistent values.
> >>>
> >>> I'm not sure it's a good idea to deal with this inconsistency in this
> >>> series as it is not specific to tunnel offloading. It could be a
> >>> dependency for this patch or we can leave it for the future and just
> >>> to make sure mis-configuration won't cause any kernel issues.
> >>
> >> The possible inconsistency is not due to a misconfiguration, but to the
> >> facts that:
> >> - configuring the virtio hdr len and the offload is not atomic
> >> - successful GSO over udp tunnel parsing requires the relevant offloads
> >> to be enabled and a suitable hdr len.
> >>
> >> Plain GSO don't have a similar problem because all the relevant fields
> >> are always available for any sane virtio hdr length, but we need to deal
> >> with them here.
> >
> > Just to make sure we're on the same page.
> >
> > I meant tun has TUNSETVNETHDRSZ, so user space can set it to any value
> > at any time as long as it's not smaller than sizeof(struct
> > virtio_net_hdr). Tun and vhost need to cope with this otherwise it
> > should be a bug. This is allowed before the introduction of tunnel
> > gso.
>
> This code here is intended to support such scenario; but if the virtio
> hdr size is configured to be lower than the minimum required for UDP
> tunnel hdr fields, the related offload could not be used.

Ok I see.

>
> >>>> @@ -1698,7 +1700,8 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
> >>>>         struct sk_buff *skb;
> >>>>         size_t total_len = iov_iter_count(from);
> >>>>         size_t len = total_len, align = tun->align, linear;
> >>>> -       struct virtio_net_hdr gso = { 0 };
> >>>> +       char buf[TUN_VNET_TNL_SIZE];
> >>>
> >>> I wonder why not simply
> >>>
> >>> 1) define the structure virtio_net_hdr_tnl_gso and use that
> >>>
> >>> or
> >>>
> >>> 2) stick the gso here and use iter advance to get
> >>> virtio_net_hdr_tunnel when necessary?
> >>
> >> Code wise 2) looks more complex
> >
> > I don't know how to define complex but we already use a container structure:
> >
> > struct virtio_net_hdr_v1_hash {
> >         struct virtio_net_hdr_v1 hdr;
> >         __le32 hash_value;
> > ...
> >         __le16 hash_report;
> >         __le16 padding;
> > };
> >
> >> and 1) will require additional care when
> >> adding hash report support.
> >
> > I don't understand here, you're doing:
> >
> >         iov_iter_advance(from, sz - parsed_size);
> >
> > in __tun_vnet_hdr_get(), so this logic needs to be extended for hash
> > report as well.
>
> Note that there are at least 2 different virtio net hdr binary layout
> supporting UDP tunnel offload:
>
> struct virtio_net_hdr_v1_tnl {
>    struct virtio_net_hdr_v1 hdr;
>    struct virtio_net_hdr_tunnel tnl;
> };

Is this used by any guest? It looks problematic:

\begin{lstlisting}
struct virtio_net_hdr {
#define VIRTIO_NET_HDR_F_NEEDS_CSUM    1
#define VIRTIO_NET_HDR_F_DATA_VALID    2
#define VIRTIO_NET_HDR_F_RSC_INFO      4
#define VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM 8
        u8 flags;
#define VIRTIO_NET_HDR_GSO_NONE        0
#define VIRTIO_NET_HDR_GSO_TCPV4       1
#define VIRTIO_NET_HDR_GSO_UDP         3
#define VIRTIO_NET_HDR_GSO_TCPV6       4
#define VIRTIO_NET_HDR_GSO_UDP_L4      5
#define VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 0x20
#define VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 0x40
#define VIRTIO_NET_HDR_GSO_ECN      0x80
        u8 gso_type;
        le16 hdr_len;
        le16 gso_size;
        le16 csum_start;
        le16 csum_offset;
        le16 num_buffers;
        le32 hash_value;        (Only if VIRTIO_NET_F_HASH_REPORT negotiated)
        le16 hash_report;       (Only if VIRTIO_NET_F_HASH_REPORT negotiated)
        le16 padding_reserved;  (Only if VIRTIO_NET_F_HASH_REPORT negotiated)
        le16 outer_th_offset;   (Only if VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO or
                                 VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO negotiated)
        le16 inner_nh_offset;   (Only if VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO or
                                 VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO negotiated)
};
\end{lstlisting}

>
> and
>
> struct virtio_net_hdr_v1_hash_tnl {
>    struct virtio_net_hdr_v1_hash hdr;
>    struct virtio_net_hdr_tunnel tnl;
> };
>
> depending on the negotiated features. Using directly a struct to
> fill/fetch the tunnel fields is problematic.

I'm not sure what the problem is here: we can just skip the hash part,
and that would make things easier for the hash reporting feature.
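
Something along these lines, as a user-space sketch only. The toy_*
structures are local mirrors of the layouts quoted in this thread and
toy_parse() is a hypothetical helper; byte order conversion is left out, the
fields are copied verbatim. The in-memory representation is always the
larger container, and the parser skips the hash part when hash report has
not been negotiated.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirrors of the header layouts discussed above (hypothetical names). */
struct toy_hdr_v1 {
	uint8_t flags;
	uint8_t gso_type;
	uint16_t hdr_len;
	uint16_t gso_size;
	uint16_t csum_start;
	uint16_t csum_offset;
	uint16_t num_buffers;
};

struct toy_hdr_v1_hash {
	struct toy_hdr_v1 hdr;
	uint32_t hash_value;
	uint16_t hash_report;
	uint16_t padding;
};

struct toy_hdr_tunnel {
	uint16_t outer_th_offset;
	uint16_t inner_nh_offset;
};

struct toy_hdr_v1_hash_tnl {
	struct toy_hdr_v1_hash hdr;
	struct toy_hdr_tunnel tnl;
};

/* Parse a wire header into the single container structure.
 * Without hash report the tunnel fields follow the v1 header directly,
 * so the hash part of 'out' is skipped and left zeroed.
 */
static int toy_parse(struct toy_hdr_v1_hash_tnl *out,
		     const unsigned char *wire, size_t len, bool has_hash)
{
	size_t base = has_hash ? sizeof(struct toy_hdr_v1_hash)
			       : sizeof(struct toy_hdr_v1);

	if (len < base + sizeof(out->tnl))
		return -1;

	memset(out, 0, sizeof(*out));
	memcpy(&out->hdr, wire, base);
	memcpy(&out->tnl, wire + base, sizeof(out->tnl));
	return 0;
}
```

The wire offset of the tunnel fields still depends on the negotiated
features, but callers only ever deal with one structure.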

>
> With the current approach the binary layout differences are abstracted
> by the tun_vnet_parse_size()/tun_vnet_tnl_offset() helpers. The
> expectation is that enabling hash report will set a bit in `flags`, too,
>  so that helpers could compute the correct offset accordingly.
>
> No other change should be required.
>
> >>>> diff --git a/drivers/net/tun_vnet.h b/drivers/net/tun_vnet.h
> >>>> index 58b9ac7a5fc40..ab2d4396941ca 100644
> >>>> --- a/drivers/net/tun_vnet.h
> >>>> +++ b/drivers/net/tun_vnet.h
> >>>> @@ -5,6 +5,12 @@
> >>>>  /* High bits in flags field are unused. */
> >>>>  #define TUN_VNET_LE     0x80000000
> >>>>  #define TUN_VNET_BE     0x40000000
> >>>> +#define TUN_VNET_TNL           0x20000000
> >>>> +#define TUN_VNET_TNL_CSUM      0x10000000
> >>>> +#define TUN_VNET_TNL_MASK      (TUN_VNET_TNL | TUN_VNET_TNL_CSUM)
> >>>> +
> >>>> +#define TUN_VNET_TNL_SIZE (sizeof(struct virtio_net_hdr_v1) + \
> >>>
> >>> Should this be virtio_net_hdr_v1_hash?
> >>
> >> If tun does not support HASH_REPORT, no: the GSO over UDP tunnels header
> >> could be present regardless of the hash-related field presence. This has
> >> been discussed extensively while crafting the specification.
> >
> > Ok, so it excludes the hash report fields, more below.
> >
> >>
> >> Note that tun_vnet_parse_size() and  tun_vnet_tnl_offset() should be
> >> adjusted accordingly after that HASH_REPORT support is introduced.
> >
> > This is suboptimal as we know a hash report will be added so we can
> > treat the field as anonymous one. See
> >
> > https://patchwork.kernel.org/project/linux-kselftest/patch/20250307-rss-v9-3-df76624025eb@daynix.com/
>
> I know hash support is in the work. The current design is intended to
> minimize the conflicts with such feature. But I can't follow the
> statement above. Could you please re-phrase it?

See above: if I am not mistaken, virtio_net_hdr_v1_hash_tnl should be
sufficient for both tunnel offloading and hash reporting.

>
> >>>> +                          sizeof(struct virtio_net_hdr_tunnel))
> >>>>
> >>>>  static inline bool tun_vnet_legacy_is_little_endian(unsigned int flags)
> >>>>  {
> >>>> @@ -45,6 +51,13 @@ static inline long tun_set_vnet_be(unsigned int *flags, int __user *argp)
> >>>>         return 0;
> >>>>  }
> >>>>
> >>>> +static inline void tun_set_vnet_tnl(unsigned int *flags, bool tnl, bool tnl_csum)
> >>>> +{
> >>>> +       *flags = (*flags & ~TUN_VNET_TNL_MASK) |
> >>>> +                tnl * TUN_VNET_TNL |
> >>>> +                tnl_csum * TUN_VNET_TNL_CSUM;
> >>>
> >>> We could refer to netdev via tun_struct, so I don't understand why we
> >>> need to duplicate the features in tun->flags (we don't do that for
> >>> other GSO/CSUM stuffs).
> >>
> >> Just to be consistent with commit 60df67b94804b1adca74854db502a72f7aeaa125
> >
> > I don't see a connection here, the above commit just moves decouple
> > vnet to make it reusable, it doesn't change the semantic of
> > tun->flags.
>
> You are right, I used a bad commit reference.
>
> The goal here is to keep all the virtio-layout-related information in a
> single place. tun->flags is already used for that (for little endian
> flag), so I piggybacked there.

Note that the TUNSETVNETLE/TUNGETVNETLE stuff is not something virtio
should know about.

>
> Ideally another bit there will be allocated to mark the hash report
> presence, too. That will allow the tun_vnet helpers to determine the
> virtio net hdr layout using a single argument.
>
> Note that we can't rely on the netdev->features to determine the virtio
> net hdr binary layout because user-space could enable/disable GSO over
> UDP tunnel support after ioctl(TUNSETOFFLOAD).

I'm not sure I follow here. It works for non GSO offload; does anything
make UDP tunnel different here?

Thanks

>
> /P
>
>
>


^ permalink raw reply	[flat|nested] 59+ messages in thread

end of thread, other threads:[~2025-06-03  2:11 UTC | newest]

Thread overview: 59+ messages
2025-05-21 10:32 [PATCH net-next 0/8] virtio: introduce GSO over UDP tunnel Paolo Abeni
2025-05-21 10:32 ` [PATCH net-next 1/8] virtio: introduce virtio_features_t Paolo Abeni
2025-05-21 16:02   ` Michael S. Tsirkin
2025-05-22  7:29     ` Paolo Abeni
2025-05-22 15:26       ` Paolo Abeni
2025-05-23 19:50         ` Michael S. Tsirkin
2025-05-22  8:17   ` kernel test robot
2025-05-26  0:43   ` Jason Wang
2025-05-26  7:20     ` Paolo Abeni
2025-05-27  3:51       ` Jason Wang
2025-05-28 15:47         ` Paolo Abeni
2025-05-28 15:52           ` Michael S. Tsirkin
2025-05-29  2:15             ` Jason Wang
2025-05-27 14:14       ` Michael S. Tsirkin
2025-05-21 10:32 ` [PATCH net-next 2/8] virtio_pci_modern: allow setting configuring extended features Paolo Abeni
2025-05-26  0:49   ` Jason Wang
2025-05-26 10:53     ` Paolo Abeni
2025-05-27  3:04       ` Jason Wang
2025-05-28 16:02         ` Paolo Abeni
2025-05-29  2:22           ` Jason Wang
2025-05-29 11:07             ` Paolo Abeni
2025-05-29 14:28               ` Michael S. Tsirkin
2025-06-03  2:11               ` Jason Wang
2025-05-29 14:28           ` Michael S. Tsirkin
2025-05-21 10:32 ` [PATCH net-next 3/8] vhost-net: allow " Paolo Abeni
2025-05-26  0:47   ` Jason Wang
2025-05-26 10:57     ` Paolo Abeni
2025-05-27  3:56       ` Jason Wang
2025-05-29 11:10         ` Paolo Abeni
2025-06-03  2:11           ` Jason Wang
2025-05-21 10:32 ` [PATCH net-next 4/8] virtio_net: add supports for extended offloads Paolo Abeni
2025-05-26  1:01   ` Jason Wang
2025-05-21 10:32 ` [PATCH net-next 5/8] net: implement virtio helpers to handle UDP GSO tunneling Paolo Abeni
2025-05-22 22:29   ` Willem de Bruijn
2025-05-23  6:09     ` Paolo Abeni
2025-05-23  6:44       ` Paolo Abeni
2025-05-23 13:42       ` Willem de Bruijn
2025-05-23 14:00         ` Paolo Abeni
2025-05-26  4:40   ` Jason Wang
2025-05-29 11:55     ` Paolo Abeni
2025-05-30  8:43       ` Paolo Abeni
2025-06-03  2:11       ` Jason Wang
2025-05-29 15:30     ` Paolo Abeni
2025-06-03  2:11       ` Jason Wang
2025-05-21 10:32 ` [PATCH net-next 6/8] virtio_net: enable gso over UDP tunnel support Paolo Abeni
2025-05-22  8:38   ` kernel test robot
2025-05-22 22:33   ` Willem de Bruijn
2025-05-21 10:32 ` [PATCH net-next 7/8] tun: " Paolo Abeni
2025-05-26  4:40   ` Jason Wang
2025-05-26 11:20     ` Paolo Abeni
2025-05-27  4:19       ` Jason Wang
2025-05-29 16:17         ` Paolo Abeni
2025-06-03  2:11           ` Jason Wang
2025-05-21 10:32 ` [PATCH net-next 8/8] vhost/net: " Paolo Abeni
2025-05-22  6:43   ` kernel test robot
2025-05-23 19:54     ` Michael S. Tsirkin
2025-05-26  4:40   ` Jason Wang
2025-05-21 11:38 ` [PATCH net-next 0/8] virtio: introduce GSO over UDP tunnel Paolo Abeni
2025-05-21 15:52 ` Michael S. Tsirkin
