netdev.vger.kernel.org archive mirror
* [PATCH RFC v3 0/8] virtio: introduce GSO over UDP tunnel
@ 2025-06-06 11:45 Paolo Abeni
  2025-06-06 11:45 ` [PATCH RFC v3 1/8] virtio: introduce extended features Paolo Abeni
                   ` (7 more replies)
  0 siblings, 8 replies; 37+ messages in thread
From: Paolo Abeni @ 2025-06-06 11:45 UTC (permalink / raw)
  To: netdev
  Cc: Willem de Bruijn, Jason Wang, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez, Yuri Benditovich, Akihiko Odaki

Some virtualized deployments use UDP tunnels pervasively and are
negatively impacted by the lack of GSO support for such traffic in the
virtual NIC driver.

The virtio_net specification recently introduced support for GSO over
UDP tunnel; this series updates the virtio implementation to support
such a feature.

Currently the kernel virtio support limits the feature space to 64
bits, while the virtio specification allows for a larger number of
features. Specifically, the GSO-over-UDP-tunnel-related virtio features
use bits 65-69.

The first four patches in this series rework the virtio and vhost
feature support to cope with up to 128 bits. The limit is set by
a define and could easily be raised in the future, as needed.

This implementation choice is aimed at keeping the code churn as
limited as possible. For the same reason, only the virtio_net driver is
reworked to leverage the extended feature space; all other
virtio/vhost drivers are unaffected, but could be upgraded to support
the extended feature space at a later time.

The last four patches bring in the actual GSO over UDP tunnel support.
As per specification, some additional fields are introduced into the
virtio net header to support the new offload. The presence of such
fields depends on the negotiated features.

New helpers are introduced to convert the UDP-tunneled skb metadata to
an extended virtio net header and vice versa. Such helpers are used by
the tun and virtio_net drivers to cope with the newly supported offloads.
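
As a reference, a minimal usage sketch of the new helpers on the tx and
rx paths (the helper names match those introduced in patch 5; skb, hdr,
vdev and the remaining variables are placeholders, and tnl_hdr_offset is
zero when the tunnel offload is not negotiated):

	/* tx: serialize the tunnel metadata into the extended header */
	if (virtio_net_hdr_tnl_from_skb(skb, hdr, tnl_hdr_offset,
					virtio_is_little_endian(vdev), 0))
		goto drop;

	/* rx: rebuild the tunnel metadata from the extended header */
	if (virtio_net_hdr_tnl_to_skb(skb, hdr, tnl_hdr_offset,
				      tnl_csum_negotiated,
				      virtio_is_little_endian(vdev)))
		goto drop;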

Tested with basic stream transfers across all possible permutations of
host kernel/qemu/guest kernel, with and without GSO over UDP tunnel support.

--
v2 -> v3:
  - uint128_t -> u64[2]
  - dropped related ifdef
  - define and use vnet_hdr with tunnel layouts
v2: https://lore.kernel.org/netdev/cover.1748614223.git.pabeni@redhat.com/

v1 -> v2:
  - fix build failures
  - many comment clarifications
  - changed the vhost_net ioctl API
  - fixed some hdr <> skb helper bugs
v1: https://lore.kernel.org/netdev/cover.1747822866.git.pabeni@redhat.com/

Paolo Abeni (8):
  virtio: introduce extended features
  virtio_pci_modern: allow configuring extended features
  vhost-net: allow configuring extended features
  virtio_net: add support for extended offloads
  net: implement virtio helpers to handle UDP GSO tunneling.
  virtio_net: enable gso over UDP tunnel support.
  tun: enable gso over UDP tunnel support.
  vhost/net: enable gso over UDP tunnel support.

 drivers/net/tun.c                      |  77 ++++++++--
 drivers/net/tun_vnet.h                 |  73 ++++++++--
 drivers/net/virtio_net.c               |  94 +++++++++---
 drivers/vhost/net.c                    |  93 ++++++++++--
 drivers/vhost/vhost.c                  |   2 +-
 drivers/vhost/vhost.h                  |   4 +-
 drivers/virtio/virtio.c                |  39 +++--
 drivers/virtio/virtio_debug.c          |  27 ++--
 drivers/virtio/virtio_pci_modern.c     |  10 +-
 drivers/virtio/virtio_pci_modern_dev.c |  69 +++++----
 include/linux/virtio.h                 |   5 +-
 include/linux/virtio_config.h          |  35 +++--
 include/linux/virtio_features.h        |  70 +++++++++
 include/linux/virtio_net.h             | 191 +++++++++++++++++++++++--
 include/linux/virtio_pci_modern.h      |  46 +++++-
 include/uapi/linux/if_tun.h            |   9 ++
 include/uapi/linux/vhost.h             |   7 +
 include/uapi/linux/vhost_types.h       |   5 +
 include/uapi/linux/virtio_net.h        |  43 ++++++
 19 files changed, 758 insertions(+), 141 deletions(-)
 create mode 100644 include/linux/virtio_features.h

-- 
2.49.0



* [PATCH RFC v3 1/8] virtio: introduce extended features
  2025-06-06 11:45 [PATCH RFC v3 0/8] virtio: introduce GSO over UDP tunnel Paolo Abeni
@ 2025-06-06 11:45 ` Paolo Abeni
  2025-06-08  5:49   ` Akihiko Odaki
  2025-06-12  0:50   ` Jason Wang
  2025-06-06 11:45 ` [PATCH RFC v3 2/8] virtio_pci_modern: allow configuring " Paolo Abeni
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 37+ messages in thread
From: Paolo Abeni @ 2025-06-06 11:45 UTC (permalink / raw)
  To: netdev
  Cc: Willem de Bruijn, Jason Wang, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez, Yuri Benditovich, Akihiko Odaki

The virtio specification allows for up to 128 bits of device
features. Soon we are going to use some of the 'extended' feature
bits (above 64) for the virtio_net driver.

Introduce extended features as a fixed-size array of u64. To minimize
the diffstat, allow legacy drivers to access the low 64 bits via a
transparent union.

Introduce a get_extended_features configuration callback that devices
supporting the extended features range must implement in place of the
traditional one.

Note that legacy and transport features don't need any change, as
they are always in the low 64 bit range.
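
For reference, a rough usage sketch; use_mergeable_buffers() and
enable_tunnel_gso() are purely illustrative placeholders, and the
feature bit above 64 comes from a later patch in this series:

	/* legacy users keep accessing the low 64 bits via 'features' */
	if (vdev->features & BIT_ULL(VIRTIO_NET_F_MRG_RXBUF))
		use_mergeable_buffers();

	/* extended-aware users go through the accessors, which index the
	 * backing array and thus also work for bits >= 64
	 */
	if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO))
		enable_tunnel_gso();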

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
v2 -> v3:
  - uint128_t -> u64[2];
v1 -> v2:
  - let u64 VIRTIO_BIT() cope with higher bit values
  - add .get_features128 instead of changing .get_features signature
---
 drivers/virtio/virtio.c         | 39 +++++++++++-------
 drivers/virtio/virtio_debug.c   | 27 +++++++------
 include/linux/virtio.h          |  5 ++-
 include/linux/virtio_config.h   | 35 ++++++++++++-----
 include/linux/virtio_features.h | 70 +++++++++++++++++++++++++++++++++
 5 files changed, 137 insertions(+), 39 deletions(-)
 create mode 100644 include/linux/virtio_features.h

diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index 95d5d7993e5b..ed1ccedc47b3 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -53,7 +53,7 @@ static ssize_t features_show(struct device *_d,
 
 	/* We actually represent this as a bitstring, as it could be
 	 * arbitrary length in future. */
-	for (i = 0; i < sizeof(dev->features)*8; i++)
+	for (i = 0; i < VIRTIO_FEATURES_MAX; i++)
 		len += sysfs_emit_at(buf, len, "%c",
 			       __virtio_test_bit(dev, i) ? '1' : '0');
 	len += sysfs_emit_at(buf, len, "\n");
@@ -272,22 +272,22 @@ static int virtio_dev_probe(struct device *_d)
 	int err, i;
 	struct virtio_device *dev = dev_to_virtio(_d);
 	struct virtio_driver *drv = drv_to_virtio(dev->dev.driver);
-	u64 device_features;
-	u64 driver_features;
+	u64 device_features[VIRTIO_FEATURES_DWORDS];
+	u64 driver_features[VIRTIO_FEATURES_DWORDS];
 	u64 driver_features_legacy;
 
 	/* We have a driver! */
 	virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER);
 
 	/* Figure out what features the device supports. */
-	device_features = dev->config->get_features(dev);
+	virtio_get_features(dev, device_features);
 
 	/* Figure out what features the driver supports. */
-	driver_features = 0;
+	virtio_features_zero(driver_features);
 	for (i = 0; i < drv->feature_table_size; i++) {
 		unsigned int f = drv->feature_table[i];
-		BUG_ON(f >= 64);
-		driver_features |= (1ULL << f);
+		BUG_ON(f >= VIRTIO_FEATURES_MAX);
+		virtio_features_set_bit(driver_features, f);
 	}
 
 	/* Some drivers have a separate feature table for virtio v1.0 */
@@ -299,20 +299,25 @@ static int virtio_dev_probe(struct device *_d)
 			driver_features_legacy |= (1ULL << f);
 		}
 	} else {
-		driver_features_legacy = driver_features;
+		driver_features_legacy = driver_features[0];
 	}
 
-	if (device_features & (1ULL << VIRTIO_F_VERSION_1))
-		dev->features = driver_features & device_features;
-	else
-		dev->features = driver_features_legacy & device_features;
+	if (virtio_features_test_bit(device_features, VIRTIO_F_VERSION_1)) {
+		for (i = 0; i < VIRTIO_FEATURES_DWORDS; ++i)
+			dev->features_array[i] = driver_features[i] &
+						 device_features[i];
+	} else {
+		virtio_features_from_u64(dev->features_array,
+					 driver_features_legacy &
+					 device_features[0]);
+	}
 
 	/* When debugging, user may filter some features by hand. */
 	virtio_debug_device_filter_features(dev);
 
 	/* Transport features always preserved to pass to finalize_features. */
 	for (i = VIRTIO_TRANSPORT_F_START; i < VIRTIO_TRANSPORT_F_END; i++)
-		if (device_features & (1ULL << i))
+		if (virtio_features_test_bit(device_features, i))
 			__virtio_set_bit(dev, i);
 
 	err = dev->config->finalize_features(dev);
@@ -320,14 +325,15 @@ static int virtio_dev_probe(struct device *_d)
 		goto err;
 
 	if (drv->validate) {
-		u64 features = dev->features;
+		u64 features[VIRTIO_FEATURES_DWORDS];
 
+		virtio_features_copy(features, dev->features_array);
 		err = drv->validate(dev);
 		if (err)
 			goto err;
 
 		/* Did validation change any features? Then write them again. */
-		if (features != dev->features) {
+		if (!virtio_features_equal(features, dev->features_array)) {
 			err = dev->config->finalize_features(dev);
 			if (err)
 				goto err;
@@ -701,6 +707,9 @@ EXPORT_SYMBOL_GPL(virtio_device_reset_done);
 
 static int virtio_init(void)
 {
+	BUILD_BUG_ON(offsetof(struct virtio_device, features) !=
+		     offsetof(struct virtio_device, features_array[0]));
+
 	if (bus_register(&virtio_bus) != 0)
 		panic("virtio bus registration failed");
 	virtio_debug_init();
diff --git a/drivers/virtio/virtio_debug.c b/drivers/virtio/virtio_debug.c
index 95c8fc7705bb..6d066b5e8ec0 100644
--- a/drivers/virtio/virtio_debug.c
+++ b/drivers/virtio/virtio_debug.c
@@ -8,13 +8,13 @@ static struct dentry *virtio_debugfs_dir;
 
 static int virtio_debug_device_features_show(struct seq_file *s, void *data)
 {
+	u64 device_features[VIRTIO_FEATURES_DWORDS];
 	struct virtio_device *dev = s->private;
-	u64 device_features;
 	unsigned int i;
 
-	device_features = dev->config->get_features(dev);
-	for (i = 0; i < BITS_PER_LONG_LONG; i++) {
-		if (device_features & (1ULL << i))
+	virtio_get_features(dev, device_features);
+	for (i = 0; i < VIRTIO_FEATURES_MAX; i++) {
+		if (virtio_features_test_bit(device_features, i))
 			seq_printf(s, "%u\n", i);
 	}
 	return 0;
@@ -26,8 +26,8 @@ static int virtio_debug_filter_features_show(struct seq_file *s, void *data)
 	struct virtio_device *dev = s->private;
 	unsigned int i;
 
-	for (i = 0; i < BITS_PER_LONG_LONG; i++) {
-		if (dev->debugfs_filter_features & (1ULL << i))
+	for (i = 0; i < VIRTIO_FEATURES_MAX; i++) {
+		if (virtio_features_test_bit(dev->debugfs_filter_features, i))
 			seq_printf(s, "%u\n", i);
 	}
 	return 0;
@@ -39,7 +39,7 @@ static int virtio_debug_filter_features_clear(void *data, u64 val)
 	struct virtio_device *dev = data;
 
 	if (val == 1)
-		dev->debugfs_filter_features = 0;
+		virtio_features_zero(dev->debugfs_filter_features);
 	return 0;
 }
 
@@ -50,9 +50,10 @@ static int virtio_debug_filter_feature_add(void *data, u64 val)
 {
 	struct virtio_device *dev = data;
 
-	if (val >= BITS_PER_LONG_LONG)
+	if (val >= VIRTIO_FEATURES_MAX)
 		return -EINVAL;
-	dev->debugfs_filter_features |= BIT_ULL_MASK(val);
+
+	virtio_features_set_bit(dev->debugfs_filter_features, val);
 	return 0;
 }
 
@@ -63,9 +64,10 @@ static int virtio_debug_filter_feature_del(void *data, u64 val)
 {
 	struct virtio_device *dev = data;
 
-	if (val >= BITS_PER_LONG_LONG)
+	if (val >= VIRTIO_FEATURES_MAX)
 		return -EINVAL;
-	dev->debugfs_filter_features &= ~BIT_ULL_MASK(val);
+
+	virtio_features_clear_bit(dev->debugfs_filter_features, val);
 	return 0;
 }
 
@@ -91,7 +93,8 @@ EXPORT_SYMBOL_GPL(virtio_debug_device_init);
 
 void virtio_debug_device_filter_features(struct virtio_device *dev)
 {
-	dev->features &= ~dev->debugfs_filter_features;
+	virtio_features_and_not(dev->features_array, dev->features_array,
+				dev->debugfs_filter_features);
 }
 EXPORT_SYMBOL_GPL(virtio_debug_device_filter_features);
 
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 64cb4b04be7a..dcd3949572bd 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -11,6 +11,7 @@
 #include <linux/gfp.h>
 #include <linux/dma-mapping.h>
 #include <linux/completion.h>
+#include <linux/virtio_features.h>
 
 /**
  * struct virtqueue - a queue to register buffers for sending or receiving.
@@ -159,11 +160,11 @@ struct virtio_device {
 	const struct virtio_config_ops *config;
 	const struct vringh_config_ops *vringh_config;
 	struct list_head vqs;
-	u64 features;
+	VIRTIO_DECLARE_FEATURES(features);
 	void *priv;
 #ifdef CONFIG_VIRTIO_DEBUG
 	struct dentry *debugfs_dir;
-	u64 debugfs_filter_features;
+	u64 debugfs_filter_features[VIRTIO_FEATURES_DWORDS];
 #endif
 };
 
diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index 169c7d367fac..83cf25b3028d 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -77,7 +77,10 @@ struct virtqueue_info {
  *      vdev: the virtio_device
  * @get_features: get the array of feature bits for this device.
  *	vdev: the virtio_device
- *	Returns the first 64 feature bits (all we currently need).
+ *	Returns the first 64 feature bits.
+ * @get_extended_features:
+ *      vdev: the virtio_device
+ *      Returns the first VIRTIO_FEATURES_MAX feature bits (all we currently need).
  * @finalize_features: confirm what device features we'll be using.
  *	vdev: the virtio_device
  *	This sends the driver feature bits to the device: it can change
@@ -121,6 +124,8 @@ struct virtio_config_ops {
 	void (*del_vqs)(struct virtio_device *);
 	void (*synchronize_cbs)(struct virtio_device *);
 	u64 (*get_features)(struct virtio_device *vdev);
+	void (*get_extended_features)(struct virtio_device *vdev,
+				      u64 *features);
 	int (*finalize_features)(struct virtio_device *vdev);
 	const char *(*bus_name)(struct virtio_device *vdev);
 	int (*set_vq_affinity)(struct virtqueue *vq,
@@ -149,11 +154,11 @@ static inline bool __virtio_test_bit(const struct virtio_device *vdev,
 {
 	/* Did you forget to fix assumptions on max features? */
 	if (__builtin_constant_p(fbit))
-		BUILD_BUG_ON(fbit >= 64);
+		BUILD_BUG_ON(fbit >= VIRTIO_FEATURES_MAX);
 	else
-		BUG_ON(fbit >= 64);
+		BUG_ON(fbit >= VIRTIO_FEATURES_MAX);
 
-	return vdev->features & BIT_ULL(fbit);
+	return virtio_features_test_bit(vdev->features_array, fbit);
 }
 
 /**
@@ -166,11 +171,11 @@ static inline void __virtio_set_bit(struct virtio_device *vdev,
 {
 	/* Did you forget to fix assumptions on max features? */
 	if (__builtin_constant_p(fbit))
-		BUILD_BUG_ON(fbit >= 64);
+		BUILD_BUG_ON(fbit >= VIRTIO_FEATURES_MAX);
 	else
-		BUG_ON(fbit >= 64);
+		BUG_ON(fbit >= VIRTIO_FEATURES_MAX);
 
-	vdev->features |= BIT_ULL(fbit);
+	virtio_features_set_bit(vdev->features_array, fbit);
 }
 
 /**
@@ -183,11 +188,11 @@ static inline void __virtio_clear_bit(struct virtio_device *vdev,
 {
 	/* Did you forget to fix assumptions on max features? */
 	if (__builtin_constant_p(fbit))
-		BUILD_BUG_ON(fbit >= 64);
+		BUILD_BUG_ON(fbit >= VIRTIO_FEATURES_MAX);
 	else
-		BUG_ON(fbit >= 64);
+		BUG_ON(fbit >= VIRTIO_FEATURES_MAX);
 
-	vdev->features &= ~BIT_ULL(fbit);
+	virtio_features_clear_bit(vdev->features_array, fbit);
 }
 
 /**
@@ -204,6 +209,16 @@ static inline bool virtio_has_feature(const struct virtio_device *vdev,
 	return __virtio_test_bit(vdev, fbit);
 }
 
+static inline void virtio_get_features(struct virtio_device *vdev, u64 *features)
+{
+	if (vdev->config->get_extended_features) {
+		vdev->config->get_extended_features(vdev, features);
+		return;
+	}
+
+	virtio_features_from_u64(features, vdev->config->get_features(vdev));
+}
+
 /**
  * virtio_has_dma_quirk - determine whether this device has the DMA quirk
  * @vdev: the device
diff --git a/include/linux/virtio_features.h b/include/linux/virtio_features.h
new file mode 100644
index 000000000000..42c3c7cee500
--- /dev/null
+++ b/include/linux/virtio_features.h
@@ -0,0 +1,70 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_VIRTIO_FEATURES_H
+#define _LINUX_VIRTIO_FEATURES_H
+
+#include <linux/bits.h>
+
+#define VIRTIO_FEATURES_DWORDS	2
+#define VIRTIO_FEATURES_MAX	(VIRTIO_FEATURES_DWORDS * 64)
+#define VIRTIO_FEATURES_WORDS	(VIRTIO_FEATURES_DWORDS * 2)
+#define VIRTIO_BIT(b)		BIT_ULL((b) & 0x3f)
+#define VIRTIO_DWORD(b)		((b) >> 6)
+#define VIRTIO_DECLARE_FEATURES(name)			\
+	union {						\
+		u64 name;				\
+		u64 name##_array[VIRTIO_FEATURES_DWORDS];\
+	}
+
+static inline bool virtio_features_test_bit(const u64 *features,
+					    unsigned int bit)
+{
+	return !!(features[VIRTIO_DWORD(bit)] & VIRTIO_BIT(bit));
+}
+
+static inline void virtio_features_set_bit(u64 *features,
+					   unsigned int bit)
+{
+	features[VIRTIO_DWORD(bit)] |= VIRTIO_BIT(bit);
+}
+
+static inline void virtio_features_clear_bit(u64 *features,
+					     unsigned int bit)
+{
+	features[VIRTIO_DWORD(bit)] &= ~VIRTIO_BIT(bit);
+}
+
+static inline void virtio_features_zero(u64 *features)
+{
+	memset(features, 0, sizeof(features[0]) * VIRTIO_FEATURES_DWORDS);
+}
+
+static inline void virtio_features_from_u64(u64 *features, u64 from)
+{
+	virtio_features_zero(features);
+	features[0] = from;
+}
+
+static inline bool virtio_features_equal(const u64 *f1, const u64 *f2)
+{
+	u64 diff = 0;
+	int i;
+
+	for (i = 0; i < VIRTIO_FEATURES_DWORDS; ++i)
+		diff |= f1[i] ^ f2[i];
+	return !diff;
+}
+
+static inline void virtio_features_copy(u64 *to, const u64 *from)
+{
+	memcpy(to, from, sizeof(to[0]) * VIRTIO_FEATURES_DWORDS);
+}
+
+static inline void virtio_features_and_not(u64 *to, const u64 *f1, const u64 *f2)
+{
+	int i;
+
+	for (i = 0; i < VIRTIO_FEATURES_DWORDS; i++)
+		to[i] = f1[i] & ~f2[i];
+}
+
+#endif
-- 
2.49.0



* [PATCH RFC v3 2/8] virtio_pci_modern: allow configuring extended features
  2025-06-06 11:45 [PATCH RFC v3 0/8] virtio: introduce GSO over UDP tunnel Paolo Abeni
  2025-06-06 11:45 ` [PATCH RFC v3 1/8] virtio: introduce extended features Paolo Abeni
@ 2025-06-06 11:45 ` Paolo Abeni
  2025-06-12  0:57   ` Jason Wang
  2025-06-06 11:45 ` [PATCH RFC v3 3/8] vhost-net: " Paolo Abeni
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 37+ messages in thread
From: Paolo Abeni @ 2025-06-06 11:45 UTC (permalink / raw)
  To: netdev
  Cc: Willem de Bruijn, Jason Wang, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez, Yuri Benditovich, Akihiko Odaki

The virtio specification allows for up to 128 bits of device
features. Soon we are going to use some of the 'extended' feature
bits (above 64) for the virtio_net driver.

Extend the virtio PCI modern driver to support configuring the full
virtio feature range: replace the unrolled loops reading and writing
the feature space with explicit loops bounded by the actual feature
space size in words, and implement the get_extended_features
callback.

Note that in vp_finalize_features() we only need to cache the lower
64 feature bits, to process the transport features.
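
For clarity, this is the select-window to u64-slot mapping implemented
by the new loops; vp_feature_word_store() is only an illustrative name,
not part of the patch:

	/* feature window i (32 bits wide) lands in features[i >> 1],
	 * shifted left by 32 bits when i is odd
	 */
	static inline void vp_feature_word_store(u64 *features, int i, u32 val)
	{
		features[i >> 1] |= (u64)val << (32 * (i & 1));
	}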

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
v2 -> v3:
  - virtio_features_t -> u64 *

v1 -> v2:
  - use get_extended_features
---
 drivers/virtio/virtio_pci_modern.c     | 10 ++--
 drivers/virtio/virtio_pci_modern_dev.c | 69 +++++++++++++++-----------
 include/linux/virtio_pci_modern.h      | 46 +++++++++++++++--
 3 files changed, 87 insertions(+), 38 deletions(-)

diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
index d50fe030d825..255203441201 100644
--- a/drivers/virtio/virtio_pci_modern.c
+++ b/drivers/virtio/virtio_pci_modern.c
@@ -22,11 +22,11 @@
 
 #define VIRTIO_AVQ_SGS_MAX	4
 
-static u64 vp_get_features(struct virtio_device *vdev)
+static void vp_get_features(struct virtio_device *vdev, u64 *features)
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
 
-	return vp_modern_get_features(&vp_dev->mdev);
+	vp_modern_get_extended_features(&vp_dev->mdev, features);
 }
 
 static int vp_avq_index(struct virtio_device *vdev, u16 *index, u16 *num)
@@ -426,7 +426,7 @@ static int vp_finalize_features(struct virtio_device *vdev)
 	if (vp_check_common_size(vdev))
 		return -EINVAL;
 
-	vp_modern_set_features(&vp_dev->mdev, vdev->features);
+	vp_modern_set_extended_features(&vp_dev->mdev, vdev->features_array);
 
 	return 0;
 }
@@ -1223,7 +1223,7 @@ static const struct virtio_config_ops virtio_pci_config_nodev_ops = {
 	.find_vqs	= vp_modern_find_vqs,
 	.del_vqs	= vp_del_vqs,
 	.synchronize_cbs = vp_synchronize_vectors,
-	.get_features	= vp_get_features,
+	.get_extended_features = vp_get_features,
 	.finalize_features = vp_finalize_features,
 	.bus_name	= vp_bus_name,
 	.set_vq_affinity = vp_set_vq_affinity,
@@ -1243,7 +1243,7 @@ static const struct virtio_config_ops virtio_pci_config_ops = {
 	.find_vqs	= vp_modern_find_vqs,
 	.del_vqs	= vp_del_vqs,
 	.synchronize_cbs = vp_synchronize_vectors,
-	.get_features	= vp_get_features,
+	.get_extended_features = vp_get_features,
 	.finalize_features = vp_finalize_features,
 	.bus_name	= vp_bus_name,
 	.set_vq_affinity = vp_set_vq_affinity,
diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
index 0d3dbfaf4b23..d665f8f73ea8 100644
--- a/drivers/virtio/virtio_pci_modern_dev.c
+++ b/drivers/virtio/virtio_pci_modern_dev.c
@@ -388,63 +388,74 @@ void vp_modern_remove(struct virtio_pci_modern_device *mdev)
 EXPORT_SYMBOL_GPL(vp_modern_remove);
 
 /*
- * vp_modern_get_features - get features from device
+ * vp_modern_get_extended_features - get features from device
  * @mdev: the modern virtio-pci device
+ * @features: the features array to be filled
  *
- * Returns the features read from the device
+ * Fill the specified features array with the features read from the device
  */
-u64 vp_modern_get_features(struct virtio_pci_modern_device *mdev)
+void vp_modern_get_extended_features(struct virtio_pci_modern_device *mdev,
+				     u64 *features)
 {
 	struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
+	int i;
 
-	u64 features;
+	virtio_features_zero(features);
+	for (i = 0; i < VIRTIO_FEATURES_WORDS; i++) {
+		u64 cur;
 
-	vp_iowrite32(0, &cfg->device_feature_select);
-	features = vp_ioread32(&cfg->device_feature);
-	vp_iowrite32(1, &cfg->device_feature_select);
-	features |= ((u64)vp_ioread32(&cfg->device_feature) << 32);
-
-	return features;
+		vp_iowrite32(i, &cfg->device_feature_select);
+		cur = vp_ioread32(&cfg->device_feature);
+		features[i >> 1] |= cur << (32 * (i & 1));
+	}
 }
-EXPORT_SYMBOL_GPL(vp_modern_get_features);
+EXPORT_SYMBOL_GPL(vp_modern_get_extended_features);
 
 /*
  * vp_modern_get_driver_features - get driver features from device
  * @mdev: the modern virtio-pci device
+ * @features: the features array to be filled
  *
- * Returns the driver features read from the device
+ * Fill the specified features array with the driver features read from the
+ * device
  */
-u64 vp_modern_get_driver_features(struct virtio_pci_modern_device *mdev)
+void
+vp_modern_get_driver_extended_features(struct virtio_pci_modern_device *mdev,
+				       u64 *features)
 {
 	struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
+	int i;
 
-	u64 features;
-
-	vp_iowrite32(0, &cfg->guest_feature_select);
-	features = vp_ioread32(&cfg->guest_feature);
-	vp_iowrite32(1, &cfg->guest_feature_select);
-	features |= ((u64)vp_ioread32(&cfg->guest_feature) << 32);
+	virtio_features_zero(features);
+	for (i = 0; i < VIRTIO_FEATURES_WORDS; i++) {
+		u64 cur;
 
-	return features;
+		vp_iowrite32(i, &cfg->guest_feature_select);
+		cur = vp_ioread32(&cfg->guest_feature);
+		features[i >> 1] |= cur << (32 * (i & 1));
+	}
 }
-EXPORT_SYMBOL_GPL(vp_modern_get_driver_features);
+EXPORT_SYMBOL_GPL(vp_modern_get_driver_extended_features);
 
 /*
- * vp_modern_set_features - set features to device
+ * vp_modern_set_extended_features - set features to device
  * @mdev: the modern virtio-pci device
  * @features: the features set to device
  */
-void vp_modern_set_features(struct virtio_pci_modern_device *mdev,
-			    u64 features)
+void vp_modern_set_extended_features(struct virtio_pci_modern_device *mdev,
+				     const u64 *features)
 {
 	struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
+	int i;
+
+	for (i = 0; i < VIRTIO_FEATURES_WORDS; i++) {
+		u32 cur = features[i >> 1] >> (32 * (i & 1));
 
-	vp_iowrite32(0, &cfg->guest_feature_select);
-	vp_iowrite32((u32)features, &cfg->guest_feature);
-	vp_iowrite32(1, &cfg->guest_feature_select);
-	vp_iowrite32(features >> 32, &cfg->guest_feature);
+		vp_iowrite32(i, &cfg->guest_feature_select);
+		vp_iowrite32(cur, &cfg->guest_feature);
+	}
 }
-EXPORT_SYMBOL_GPL(vp_modern_set_features);
+EXPORT_SYMBOL_GPL(vp_modern_set_extended_features);
 
 /*
  * vp_modern_generation - get the device genreation
diff --git a/include/linux/virtio_pci_modern.h b/include/linux/virtio_pci_modern.h
index c0b1b1ca1163..0764802a50ea 100644
--- a/include/linux/virtio_pci_modern.h
+++ b/include/linux/virtio_pci_modern.h
@@ -3,6 +3,7 @@
 #define _LINUX_VIRTIO_PCI_MODERN_H
 
 #include <linux/pci.h>
+#include <linux/virtio_config.h>
 #include <linux/virtio_pci.h>
 
 /**
@@ -95,10 +96,47 @@ static inline void vp_iowrite64_twopart(u64 val,
 	vp_iowrite32(val >> 32, hi);
 }
 
-u64 vp_modern_get_features(struct virtio_pci_modern_device *mdev);
-u64 vp_modern_get_driver_features(struct virtio_pci_modern_device *mdev);
-void vp_modern_set_features(struct virtio_pci_modern_device *mdev,
-		     u64 features);
+void
+vp_modern_get_driver_extended_features(struct virtio_pci_modern_device *mdev,
+				       u64 *features);
+void vp_modern_get_extended_features(struct virtio_pci_modern_device *mdev,
+				     u64 *features);
+void vp_modern_set_extended_features(struct virtio_pci_modern_device *mdev,
+				     const u64 *features);
+
+static inline u64
+vp_modern_get_features(struct virtio_pci_modern_device *mdev)
+{
+	u64 features_array[VIRTIO_FEATURES_DWORDS];
+	int i;
+
+	vp_modern_get_extended_features(mdev, features_array);
+	for (i = 1; i < VIRTIO_FEATURES_DWORDS; ++i)
+		WARN_ON_ONCE(features_array[i]);
+	return features_array[0];
+}
+
+static inline u64
+vp_modern_get_driver_features(struct virtio_pci_modern_device *mdev)
+{
+	u64 features_array[VIRTIO_FEATURES_DWORDS];
+	int i;
+
+	vp_modern_get_driver_extended_features(mdev, features_array);
+	for (i = 1; i < VIRTIO_FEATURES_DWORDS; ++i)
+		WARN_ON_ONCE(features_array[i]);
+	return features_array[0];
+}
+
+static inline void
+vp_modern_set_features(struct virtio_pci_modern_device *mdev, u64 features)
+{
+	u64 features_array[VIRTIO_FEATURES_DWORDS];
+
+	virtio_features_from_u64(features_array, features);
+	vp_modern_set_extended_features(mdev, features_array);
+}
+
 u32 vp_modern_generation(struct virtio_pci_modern_device *mdev);
 u8 vp_modern_get_status(struct virtio_pci_modern_device *mdev);
 void vp_modern_set_status(struct virtio_pci_modern_device *mdev,
-- 
2.49.0



* [PATCH RFC v3 3/8] vhost-net: allow configuring extended features
  2025-06-06 11:45 [PATCH RFC v3 0/8] virtio: introduce GSO over UDP tunnel Paolo Abeni
  2025-06-06 11:45 ` [PATCH RFC v3 1/8] virtio: introduce extended features Paolo Abeni
  2025-06-06 11:45 ` [PATCH RFC v3 2/8] virtio_pci_modern: allow configuring " Paolo Abeni
@ 2025-06-06 11:45 ` Paolo Abeni
  2025-06-08  6:16   ` Akihiko Odaki
  2025-06-12  1:31   ` Jason Wang
  2025-06-06 11:45 ` [PATCH RFC v3 4/8] virtio_net: add support for extended offloads Paolo Abeni
                   ` (4 subsequent siblings)
  7 siblings, 2 replies; 37+ messages in thread
From: Paolo Abeni @ 2025-06-06 11:45 UTC (permalink / raw)
  To: netdev
  Cc: Willem de Bruijn, Jason Wang, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez, Yuri Benditovich, Akihiko Odaki

Use the extended feature type for 'acked_features' and implement
two new ioctl operations allowing user-space to set/query an
unbounded number of features.

The actual number of processed features is limited by VIRTIO_FEATURES_MAX
and attempts to set features above such limit fail with
EOPNOTSUPP.

Note that the legacy ioctls implicitly truncate the negotiated
features to the lower 64-bit range, and the 'acked_backend_features'
field doesn't need conversion, as the only feature negotiated there
is in the low 64-bit range.
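
For illustration, a hypothetical user-space sketch of the new SET ioctl;
vhost_fd and the feature values are placeholders and error handling is
omitted. The flat buffer matches the struct vhost_features_array layout,
i.e. a count followed by 'count' u64 entries:

	__u64 buf[3] = { 2 /* count */, lo_features, hi_features };

	if (ioctl(vhost_fd, VHOST_SET_FEATURES_ARRAY, buf))
		perror("VHOST_SET_FEATURES_ARRAY");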

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
v2 -> v3:
  - virtio_features_t -> u64[2]
  - add __counted_by annotation to vhost_features_array

v1 -> v2:
  - change the ioctl to use an extensible API
---
 drivers/vhost/net.c              | 85 +++++++++++++++++++++++++++-----
 drivers/vhost/vhost.c            |  2 +-
 drivers/vhost/vhost.h            |  4 +-
 include/uapi/linux/vhost.h       |  7 +++
 include/uapi/linux/vhost_types.h |  5 ++
 5 files changed, 88 insertions(+), 15 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 7cbfc7d718b3..0291fce24bbf 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -77,6 +77,8 @@ enum {
 			 (1ULL << VIRTIO_F_RING_RESET)
 };
 
+const u64 VHOST_NET_ALL_FEATURES[VIRTIO_FEATURES_DWORDS] = { VHOST_NET_FEATURES };
+
 enum {
 	VHOST_NET_BACKEND_FEATURES = (1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2)
 };
@@ -1614,16 +1616,17 @@ static long vhost_net_reset_owner(struct vhost_net *n)
 	return err;
 }
 
-static int vhost_net_set_features(struct vhost_net *n, u64 features)
+static int vhost_net_set_features(struct vhost_net *n, const u64 *features)
 {
 	size_t vhost_hlen, sock_hlen, hdr_len;
 	int i;
 
-	hdr_len = (features & ((1ULL << VIRTIO_NET_F_MRG_RXBUF) |
-			       (1ULL << VIRTIO_F_VERSION_1))) ?
-			sizeof(struct virtio_net_hdr_mrg_rxbuf) :
-			sizeof(struct virtio_net_hdr);
-	if (features & (1 << VHOST_NET_F_VIRTIO_NET_HDR)) {
+	hdr_len = virtio_features_test_bit(features, VIRTIO_NET_F_MRG_RXBUF) ||
+		  virtio_features_test_bit(features, VIRTIO_F_VERSION_1) ?
+		  sizeof(struct virtio_net_hdr_mrg_rxbuf) :
+		  sizeof(struct virtio_net_hdr);
+
+	if (virtio_features_test_bit(features, VHOST_NET_F_VIRTIO_NET_HDR)) {
 		/* vhost provides vnet_hdr */
 		vhost_hlen = hdr_len;
 		sock_hlen = 0;
@@ -1633,18 +1636,19 @@ static int vhost_net_set_features(struct vhost_net *n, u64 features)
 		sock_hlen = hdr_len;
 	}
 	mutex_lock(&n->dev.mutex);
-	if ((features & (1 << VHOST_F_LOG_ALL)) &&
+	if (virtio_features_test_bit(features, VHOST_F_LOG_ALL) &&
 	    !vhost_log_access_ok(&n->dev))
 		goto out_unlock;
 
-	if ((features & (1ULL << VIRTIO_F_ACCESS_PLATFORM))) {
+	if (virtio_features_test_bit(features, VIRTIO_F_ACCESS_PLATFORM)) {
 		if (vhost_init_device_iotlb(&n->dev))
 			goto out_unlock;
 	}
 
 	for (i = 0; i < VHOST_NET_VQ_MAX; ++i) {
 		mutex_lock(&n->vqs[i].vq.mutex);
-		n->vqs[i].vq.acked_features = features;
+		virtio_features_copy(n->vqs[i].vq.acked_features_array,
+				     features);
 		n->vqs[i].vhost_hlen = vhost_hlen;
 		n->vqs[i].sock_hlen = sock_hlen;
 		mutex_unlock(&n->vqs[i].vq.mutex);
@@ -1681,12 +1685,13 @@ static long vhost_net_set_owner(struct vhost_net *n)
 static long vhost_net_ioctl(struct file *f, unsigned int ioctl,
 			    unsigned long arg)
 {
+	u64 all_features[VIRTIO_FEATURES_DWORDS];
 	struct vhost_net *n = f->private_data;
 	void __user *argp = (void __user *)arg;
 	u64 __user *featurep = argp;
 	struct vhost_vring_file backend;
-	u64 features;
-	int r;
+	u64 features, count;
+	int r, i;
 
 	switch (ioctl) {
 	case VHOST_NET_SET_BACKEND:
@@ -1703,7 +1708,63 @@ static long vhost_net_ioctl(struct file *f, unsigned int ioctl,
 			return -EFAULT;
 		if (features & ~VHOST_NET_FEATURES)
 			return -EOPNOTSUPP;
-		return vhost_net_set_features(n, features);
+
+		virtio_features_from_u64(all_features, features);
+		return vhost_net_set_features(n, all_features);
+	case VHOST_GET_FEATURES_ARRAY:
+	{
+		if (copy_from_user(&count, argp, sizeof(u64)))
+			return -EFAULT;
+
+		/* Copy the net features, up to the user-provided buffer size */
+		virtio_features_copy(all_features, VHOST_NET_ALL_FEATURES);
+		argp += sizeof(u64);
+		for (i = 0; i < min_t(u64, count, VIRTIO_FEATURES_DWORDS); ++i) {
+			i = array_index_nospec(i, VIRTIO_FEATURES_DWORDS);
+			if (copy_to_user(argp, &all_features[i], sizeof(u64)))
+				return -EFAULT;
+
+			argp += sizeof(u64);
+		}
+
+		/* Zero the trailing space provided by user-space, if any */
+		if (i < count && clear_user(argp, (count - i) * sizeof(u64)))
+			return -EFAULT;
+		return 0;
+	}
+	case VHOST_SET_FEATURES_ARRAY:
+	{
+		u64 tmp[VIRTIO_FEATURES_DWORDS];
+
+		if (copy_from_user(&count, argp, sizeof(u64)))
+			return -EFAULT;
+
+		virtio_features_zero(all_features);
+		for (i = 0; i < min_t(u64, count, VIRTIO_FEATURES_DWORDS); ++i) {
+			argp += sizeof(u64);
+			if (copy_from_user(&features, argp, sizeof(u64)))
+				return -EFAULT;
+
+			all_features[i] = features;
+		}
+
+		/* Any feature specified by user-space above VIRTIO_FEATURES_MAX is
+		 * not supported by definition.
+		 */
+		for (; i < count; ++i) {
+			argp += sizeof(u64);
+			if (copy_from_user(&features, argp, sizeof(u64)))
+				return -EFAULT;
+			if (features)
+				return -EOPNOTSUPP;
+		}
+
+		virtio_features_and_not(tmp, all_features, VHOST_NET_ALL_FEATURES);
+		for (i = 0; i < VIRTIO_FEATURES_DWORDS; i++)
+			if (tmp[i])
+				return -EOPNOTSUPP;
+
+		return vhost_net_set_features(n, all_features);
+	}
 	case VHOST_GET_BACKEND_FEATURES:
 		features = VHOST_NET_BACKEND_FEATURES;
 		if (copy_to_user(featurep, &features, sizeof(features)))
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 63612faeab72..6d3b9f0a9163 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -372,7 +372,7 @@ static void vhost_vq_reset(struct vhost_dev *dev,
 	vq->log_used = false;
 	vq->log_addr = -1ull;
 	vq->private_data = NULL;
-	vq->acked_features = 0;
+	virtio_features_zero(vq->acked_features_array);
 	vq->acked_backend_features = 0;
 	vq->log_base = NULL;
 	vq->error_ctx = NULL;
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index bb75a292d50c..d1aed35c4b07 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -133,7 +133,7 @@ struct vhost_virtqueue {
 	struct vhost_iotlb *umem;
 	struct vhost_iotlb *iotlb;
 	void *private_data;
-	u64 acked_features;
+	VIRTIO_DECLARE_FEATURES(acked_features);
 	u64 acked_backend_features;
 	/* Log write descriptors */
 	void __user *log_base;
@@ -291,7 +291,7 @@ static inline void *vhost_vq_get_backend(struct vhost_virtqueue *vq)
 
 static inline bool vhost_has_feature(struct vhost_virtqueue *vq, int bit)
 {
-	return vq->acked_features & (1ULL << bit);
+	return virtio_features_test_bit(vq->acked_features_array, bit);
 }
 
 static inline bool vhost_backend_has_feature(struct vhost_virtqueue *vq, int bit)
diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
index d4b3e2ae1314..d6ad01fbb8d2 100644
--- a/include/uapi/linux/vhost.h
+++ b/include/uapi/linux/vhost.h
@@ -235,4 +235,11 @@
  */
 #define VHOST_VDPA_GET_VRING_SIZE	_IOWR(VHOST_VIRTIO, 0x82,	\
 					      struct vhost_vring_state)
+
+/* Extended features manipulation */
+#define VHOST_GET_FEATURES_ARRAY _IOR(VHOST_VIRTIO, 0x83, \
+				       struct vhost_features_array)
+#define VHOST_SET_FEATURES_ARRAY _IOW(VHOST_VIRTIO, 0x83, \
+				       struct vhost_features_array)
+
 #endif
diff --git a/include/uapi/linux/vhost_types.h b/include/uapi/linux/vhost_types.h
index d7656908f730..1c39cc5f5a31 100644
--- a/include/uapi/linux/vhost_types.h
+++ b/include/uapi/linux/vhost_types.h
@@ -110,6 +110,11 @@ struct vhost_msg_v2 {
 	};
 };
 
+struct vhost_features_array {
+	__u64 count; /* number of entries present in features array */
+	__u64 features[] __counted_by(count);
+};
+
 struct vhost_memory_region {
 	__u64 guest_phys_addr;
 	__u64 memory_size; /* bytes */
-- 
2.49.0



* [PATCH RFC v3 4/8] virtio_net: add support for extended offloads
  2025-06-06 11:45 [PATCH RFC v3 0/8] virtio: introduce GSO over UDP tunnel Paolo Abeni
                   ` (2 preceding siblings ...)
  2025-06-06 11:45 ` [PATCH RFC v3 3/8] vhost-net: " Paolo Abeni
@ 2025-06-06 11:45 ` Paolo Abeni
  2025-06-06 11:45 ` [PATCH RFC v3 5/8] net: implement virtio helpers to handle UDP GSO tunneling Paolo Abeni
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 37+ messages in thread
From: Paolo Abeni @ 2025-06-06 11:45 UTC (permalink / raw)
  To: netdev
  Cc: Willem de Bruijn, Jason Wang, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez, Yuri Benditovich, Akihiko Odaki

The virtio_net driver needs the extended feature space to implement
the GSO over UDP tunnel offload.

The only missing piece is mapping the related guest offloads to/from
the extended features.
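
As a worked example of the mapping introduced below (the *_MAPPED names
are added by a later patch in this series; this patch itself only uses
the raw bit values):

	fbit = virtio_offload_to_feature(VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_MAPPED);
	/* offload bit 46 maps to feature bit 65, i.e.
	 * VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO; likewise offload bit 47 maps
	 * to feature bit 66
	 */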

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
v2 -> v3:
 - offload to feature conversion is now an inline function

v1 -> v2:
 - drop unused macro
 - restrict the offload remap range as per latest spec update
---
 drivers/net/virtio_net.c | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index e53ba600605a..18ad50de4928 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -35,6 +35,22 @@ module_param(csum, bool, 0444);
 module_param(gso, bool, 0444);
 module_param(napi_tx, bool, 0644);
 
+#define VIRTIO_OFFLOAD_MAP_MIN	46
+#define VIRTIO_OFFLOAD_MAP_MAX	47
+#define VIRTIO_FEATURES_MAP_MIN	65
+#define VIRTIO_O2F_DELTA	(VIRTIO_FEATURES_MAP_MIN - VIRTIO_OFFLOAD_MAP_MIN)
+
+static bool virtio_is_mapped_offload(unsigned int obit)
+{
+	return obit >= VIRTIO_OFFLOAD_MAP_MIN &&
+	       obit <= VIRTIO_OFFLOAD_MAP_MAX;
+}
+
+static unsigned int virtio_offload_to_feature(unsigned int obit)
+{
+	return virtio_is_mapped_offload(obit) ? obit + VIRTIO_O2F_DELTA : obit;
+}
+
 /* FIXME: MTU in config. */
 #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
 #define GOOD_COPY_LEN	128
@@ -7037,9 +7053,13 @@ static int virtnet_probe(struct virtio_device *vdev)
 		netif_carrier_on(dev);
 	}
 
-	for (i = 0; i < ARRAY_SIZE(guest_offloads); i++)
-		if (virtio_has_feature(vi->vdev, guest_offloads[i]))
+	for (i = 0; i < ARRAY_SIZE(guest_offloads); i++) {
+		unsigned int fbit;
+
+		fbit = virtio_offload_to_feature(guest_offloads[i]);
+		if (virtio_has_feature(vi->vdev, fbit))
 			set_bit(guest_offloads[i], &vi->guest_offloads);
+	}
 	vi->guest_offloads_capable = vi->guest_offloads;
 
 	rtnl_unlock();
-- 
2.49.0



* [PATCH RFC v3 5/8] net: implement virtio helpers to handle UDP GSO tunneling.
  2025-06-06 11:45 [PATCH RFC v3 0/8] virtio: introduce GSO over UDP tunnel Paolo Abeni
                   ` (3 preceding siblings ...)
  2025-06-06 11:45 ` [PATCH RFC v3 4/8] virtio_net: add support for extended offloads Paolo Abeni
@ 2025-06-06 11:45 ` Paolo Abeni
  2025-06-12  3:53   ` Jason Wang
  2025-06-06 11:45 ` [PATCH RFC v3 6/8] virtio_net: enable gso over UDP tunnel support Paolo Abeni
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 37+ messages in thread
From: Paolo Abeni @ 2025-06-06 11:45 UTC (permalink / raw)
  To: netdev
  Cc: Willem de Bruijn, Jason Wang, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez, Yuri Benditovich, Akihiko Odaki

The virtio specification is introducing support for GSO over
UDP tunnels.

This patch brings in the needed defines and the additional
virtio hdr parsing/building helpers.

The UDP tunnel support uses additional fields in the virtio hdr,
and the location of such fields can change depending on other negotiated
features - specifically VIRTIO_NET_F_HASH_REPORT.

Try to be as conservative as possible with the new field validation.

The existing implementation for plain GSO offloads allows for invalid/
self-contradictory values of such fields. With GSO over UDP tunnel we can
be more strict, with no need to deal with legacy implementations.

Since the checksum-related field validation is asymmetric in the driver
and in the device, introduce a separate helper to implement the new checks
(to be used only on the driver side).

Note that while the feature space exceeds the 64-bit boundaries, the
guest offload space is fixed by the specification of the
VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET command to a 64-bit size.

Prior to the UDP tunnel GSO support, each guest offload bit corresponded
to the feature bit with the same value and vice versa.

Due to the limited 'guest offload' space, relevant features in the high
64 bits are 'mapped' to free bits in the lower range. That is simpler
than defining a new command (and associated features) to exchange an
extended guest offloads set.

As a consequence, the uAPIs also specify the mapped guest offload value
corresponding to the UDP tunnel GSO features.
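
For reference, a sketch of how a driver can compute the tunnel header
offset from the negotiated features, using the layouts added below
(hash_report_negotiated is just a placeholder flag):

	/* the tunnel fields follow the hash report when
	 * VIRTIO_NET_F_HASH_REPORT is negotiated, and the base v1 header
	 * otherwise
	 */
	tnl_hdr_offset = hash_report_negotiated ?
			 sizeof(struct virtio_net_hdr_v1_hash) :
			 sizeof(struct virtio_net_hdr_v1);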

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
--
v2 -> v3:
  - add definitions for possible vnet hdr layouts with tunnel support

v1 -> v2:
  - 'relay' -> 'rely' typo
  - less unclear comment WRT enforced inner GSO checks
  - inner header fields are allowed only with 'modern' virtio,
    thus are always le
  - clarified in the commit message the need for 'mapped features'
    defines
  - assume little_endian is true when UDP GSO is enabled.
  - fix inner proto type value
---
 include/linux/virtio_net.h      | 191 ++++++++++++++++++++++++++++++--
 include/uapi/linux/virtio_net.h |  43 +++++++
 2 files changed, 226 insertions(+), 8 deletions(-)

diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
index 02a9f4dc594d..bcf80534a739 100644
--- a/include/linux/virtio_net.h
+++ b/include/linux/virtio_net.h
@@ -47,9 +47,9 @@ static inline int virtio_net_hdr_set_proto(struct sk_buff *skb,
 	return 0;
 }
 
-static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
-					const struct virtio_net_hdr *hdr,
-					bool little_endian)
+static inline int __virtio_net_hdr_to_skb(struct sk_buff *skb,
+					  const struct virtio_net_hdr *hdr,
+					  bool little_endian, u8 hdr_gso_type)
 {
 	unsigned int nh_min_len = sizeof(struct iphdr);
 	unsigned int gso_type = 0;
@@ -57,8 +57,8 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
 	unsigned int p_off = 0;
 	unsigned int ip_proto;
 
-	if (hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) {
-		switch (hdr->gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
+	if (hdr_gso_type != VIRTIO_NET_HDR_GSO_NONE) {
+		switch (hdr_gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
 		case VIRTIO_NET_HDR_GSO_TCPV4:
 			gso_type = SKB_GSO_TCPV4;
 			ip_proto = IPPROTO_TCP;
@@ -84,7 +84,7 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
 			return -EINVAL;
 		}
 
-		if (hdr->gso_type & VIRTIO_NET_HDR_GSO_ECN)
+		if (hdr_gso_type & VIRTIO_NET_HDR_GSO_ECN)
 			gso_type |= SKB_GSO_TCP_ECN;
 
 		if (hdr->gso_size == 0)
@@ -122,7 +122,7 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
 
 				if (!protocol)
 					virtio_net_hdr_set_proto(skb, hdr);
-				else if (!virtio_net_hdr_match_proto(protocol, hdr->gso_type))
+				else if (!virtio_net_hdr_match_proto(protocol, hdr_gso_type))
 					return -EINVAL;
 				else
 					skb->protocol = protocol;
@@ -153,7 +153,7 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
 		}
 	}
 
-	if (hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) {
+	if (hdr_gso_type != VIRTIO_NET_HDR_GSO_NONE) {
 		u16 gso_size = __virtio16_to_cpu(little_endian, hdr->gso_size);
 		unsigned int nh_off = p_off;
 		struct skb_shared_info *shinfo = skb_shinfo(skb);
@@ -199,6 +199,13 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
 	return 0;
 }
 
+static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
+					const struct virtio_net_hdr *hdr,
+					bool little_endian)
+{
+	return __virtio_net_hdr_to_skb(skb, hdr, little_endian, hdr->gso_type);
+}
+
 static inline int virtio_net_hdr_from_skb(const struct sk_buff *skb,
 					  struct virtio_net_hdr *hdr,
 					  bool little_endian,
@@ -242,4 +249,172 @@ static inline int virtio_net_hdr_from_skb(const struct sk_buff *skb,
 	return 0;
 }
 
+static inline unsigned int virtio_l3min(bool is_ipv6)
+{
+	return is_ipv6 ? sizeof(struct ipv6hdr) : sizeof(struct iphdr);
+}
+
+static inline int virtio_net_hdr_tnl_to_skb(struct sk_buff *skb,
+					    const struct virtio_net_hdr *hdr,
+					    unsigned int tnl_hdr_offset,
+					    bool tnl_csum_negotiated,
+					    bool little_endian)
+{
+	u8 gso_tunnel_type = hdr->gso_type & VIRTIO_NET_HDR_GSO_UDP_TUNNEL;
+	unsigned int inner_nh, outer_th, inner_th;
+	unsigned int inner_l3min, outer_l3min;
+	struct virtio_net_hdr_tunnel *tnl;
+	bool outer_isv6, inner_isv6;
+	u8 gso_inner_type;
+	int ret;
+
+	if (!gso_tunnel_type)
+		return virtio_net_hdr_to_skb(skb, hdr, little_endian);
+
+	/* Tunnel not supported/negotiated, but the hdr asks for it. */
+	if (!tnl_hdr_offset)
+		return -EINVAL;
+
+	/* Either ipv4 or ipv6. */
+	if (gso_tunnel_type == VIRTIO_NET_HDR_GSO_UDP_TUNNEL)
+		return -EINVAL;
+
+	/* The UDP tunnel must carry a GSO packet, but no UFO. */
+	gso_inner_type = hdr->gso_type & ~(VIRTIO_NET_HDR_GSO_ECN |
+					   VIRTIO_NET_HDR_GSO_UDP_TUNNEL);
+	if (!gso_inner_type || gso_inner_type == VIRTIO_NET_HDR_GSO_UDP)
+		return -EINVAL;
+
+	/* Rely on csum being present. */
+	if (!(hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM))
+		return -EINVAL;
+
+	/* Validate offsets. */
+	outer_isv6 = gso_tunnel_type & VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6;
+	inner_isv6 = gso_inner_type == VIRTIO_NET_HDR_GSO_TCPV6;
+	inner_l3min = virtio_l3min(inner_isv6);
+	outer_l3min = ETH_HLEN + virtio_l3min(outer_isv6);
+
+	tnl = ((void *)hdr) + tnl_hdr_offset;
+	inner_th = __virtio16_to_cpu(little_endian, hdr->csum_start);
+	inner_nh = le16_to_cpu(tnl->inner_nh_offset);
+	outer_th = le16_to_cpu(tnl->outer_th_offset);
+	if (outer_th < outer_l3min ||
+	    inner_nh < outer_th + sizeof(struct udphdr) ||
+	    inner_th < inner_nh + inner_l3min)
+		return -EINVAL;
+
+	/* Let the basic parsing deal with plain GSO features. */
+	ret = __virtio_net_hdr_to_skb(skb, hdr, true,
+				      hdr->gso_type & ~gso_tunnel_type);
+	if (ret)
+		return ret;
+
+	/* In case of USO, the inner protocol is still unknown and
+	 * `inner_isv6` is just a guess, additional parsing is needed.
+	 * The previous validation ensures that accessing an ipv4 inner
+	 * network header is safe.
+	 */
+	if (gso_inner_type == VIRTIO_NET_HDR_GSO_UDP_L4) {
+		struct iphdr *iphdr = (struct iphdr *)(skb->data + inner_nh);
+
+		inner_isv6 = iphdr->version == 6;
+		inner_l3min = virtio_l3min(inner_isv6);
+		if (inner_th < inner_nh + inner_l3min)
+			return -EINVAL;
+	}
+
+	skb_set_inner_protocol(skb, inner_isv6 ? htons(ETH_P_IPV6) :
+						 htons(ETH_P_IP));
+	if (hdr->flags & VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM) {
+		if (!tnl_csum_negotiated)
+			return -EINVAL;
+
+		skb_shinfo(skb)->gso_type |= SKB_GSO_UDP_TUNNEL_CSUM;
+	} else {
+		skb_shinfo(skb)->gso_type |= SKB_GSO_UDP_TUNNEL;
+	}
+
+	skb->inner_transport_header = inner_th + skb_headroom(skb);
+	skb->inner_network_header = inner_nh + skb_headroom(skb);
+	skb->inner_mac_header = inner_nh + skb_headroom(skb);
+	skb->transport_header = outer_th + skb_headroom(skb);
+	skb->encapsulation = 1;
+	return 0;
+}
+
+/* Checksum-related fields validation for the driver */
+static inline int virtio_net_chk_data_valid(struct sk_buff *skb,
+					    struct virtio_net_hdr *hdr,
+					    bool tnl_csum_negotiated)
+{
+	if (!(hdr->gso_type & VIRTIO_NET_HDR_GSO_UDP_TUNNEL)) {
+		if (!(hdr->flags & VIRTIO_NET_HDR_F_DATA_VALID))
+			return 0;
+
+		skb->ip_summed = CHECKSUM_UNNECESSARY;
+		if (!(hdr->flags & VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM))
+			return 0;
+
+		/* tunnel csum packets are invalid when the related
+		 * feature has not been negotiated
+		 */
+		if (!tnl_csum_negotiated)
+			return -EINVAL;
+		skb->csum_level = 1;
+		return 0;
+	}
+
+	/* DATA_VALID is mutually exclusive with NEEDS_CSUM, and GSO
+	 * over UDP tunnel requires the latter
+	 */
+	if (hdr->flags & VIRTIO_NET_HDR_F_DATA_VALID)
+		return -EINVAL;
+	return 0;
+}
+
+static inline int virtio_net_hdr_tnl_from_skb(const struct sk_buff *skb,
+					      struct virtio_net_hdr *hdr,
+					      unsigned int tnl_offset,
+					      bool little_endian,
+					      int vlan_hlen)
+{
+	struct virtio_net_hdr_tunnel *tnl;
+	unsigned int inner_nh, outer_th;
+	int tnl_gso_type;
+	int ret;
+
+	tnl_gso_type = skb_shinfo(skb)->gso_type & (SKB_GSO_UDP_TUNNEL |
+						    SKB_GSO_UDP_TUNNEL_CSUM);
+	if (!tnl_gso_type)
+		return virtio_net_hdr_from_skb(skb, hdr, little_endian, false,
+					       vlan_hlen);
+
+	/* Tunnel support not negotiated but skb ask for it. */
+	if (!tnl_offset)
+		return -EINVAL;
+
+	/* Let the basic parsing deal with plain GSO features. */
+	skb_shinfo(skb)->gso_type &= ~tnl_gso_type;
+	ret = virtio_net_hdr_from_skb(skb, hdr, true, false, vlan_hlen);
+	skb_shinfo(skb)->gso_type |= tnl_gso_type;
+	if (ret)
+		return ret;
+
+	if (skb->protocol == htons(ETH_P_IPV6))
+		hdr->gso_type |= VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6;
+	else
+		hdr->gso_type |= VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4;
+
+	if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_TUNNEL_CSUM)
+		hdr->flags |= VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM;
+
+	tnl = ((void *)hdr) + tnl_offset;
+	inner_nh = skb->inner_network_header - skb_headroom(skb);
+	outer_th = skb->transport_header - skb_headroom(skb);
+	tnl->inner_nh_offset = cpu_to_le16(inner_nh);
+	tnl->outer_th_offset = cpu_to_le16(outer_th);
+	return 0;
+}
+
 #endif /* _LINUX_VIRTIO_NET_H */
diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
index 963540deae66..313761be99b2 100644
--- a/include/uapi/linux/virtio_net.h
+++ b/include/uapi/linux/virtio_net.h
@@ -70,6 +70,28 @@
 					 * with the same MAC.
 					 */
 #define VIRTIO_NET_F_SPEED_DUPLEX 63	/* Device set linkspeed and duplex */
+#define VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO 65 /* Driver can receive
+					      * GSO-over-UDP-tunnel packets
+					      */
+#define VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM 66 /* Driver handles
+						   * GSO-over-UDP-tunnel
+						   * packets with partial csum
+						   * for the outer header
+						   */
+#define VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO 67 /* Device can receive
+					     * GSO-over-UDP-tunnel packets
+					     */
+#define VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO_CSUM 68 /* Device handles
+						  * GSO-over-UDP-tunnel
+						  * packets with partial csum
+						  * for the outer header
+						  */
+
+/* Offload bits corresponding to VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO{,_CSUM}
+ * features
+ */
+#define VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_MAPPED	46
+#define VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM_MAPPED	47
 
 #ifndef VIRTIO_NET_NO_LEGACY
 #define VIRTIO_NET_F_GSO	6	/* Host handles pkts w/ any GSO type */
@@ -131,12 +153,17 @@ struct virtio_net_hdr_v1 {
 #define VIRTIO_NET_HDR_F_NEEDS_CSUM	1	/* Use csum_start, csum_offset */
 #define VIRTIO_NET_HDR_F_DATA_VALID	2	/* Csum is valid */
 #define VIRTIO_NET_HDR_F_RSC_INFO	4	/* rsc info in csum_ fields */
+#define VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM 8	/* UDP tunnel requires csum offload */
 	__u8 flags;
 #define VIRTIO_NET_HDR_GSO_NONE		0	/* Not a GSO frame */
 #define VIRTIO_NET_HDR_GSO_TCPV4	1	/* GSO frame, IPv4 TCP (TSO) */
 #define VIRTIO_NET_HDR_GSO_UDP		3	/* GSO frame, IPv4 UDP (UFO) */
 #define VIRTIO_NET_HDR_GSO_TCPV6	4	/* GSO frame, IPv6 TCP */
 #define VIRTIO_NET_HDR_GSO_UDP_L4	5	/* GSO frame, IPv4& IPv6 UDP (USO) */
+#define VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 0x20 /* UDP over IPv4 tunnel present */
+#define VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 0x40 /* UDP over IPv6 tunnel present */
+#define VIRTIO_NET_HDR_GSO_UDP_TUNNEL (VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 | \
+				       VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6)
 #define VIRTIO_NET_HDR_GSO_ECN		0x80	/* TCP has ECN set */
 	__u8 gso_type;
 	__virtio16 hdr_len;	/* Ethernet + IP + tcp/udp hdrs */
@@ -181,6 +208,22 @@ struct virtio_net_hdr_v1_hash {
 	__le16 padding;
 };
 
+/* This header comes after the hashing information */
+struct virtio_net_hdr_tunnel {
+	__le16 outer_th_offset;
+	__le16 inner_nh_offset;
+};
+
+struct virtio_net_hdr_v1_tunnel {
+	struct virtio_net_hdr_v1 hdr;
+	struct virtio_net_hdr_tunnel tnl;
+};
+
+struct virtio_net_hdr_v1_hash_tunnel {
+	struct virtio_net_hdr_v1_hash hdr;
+	struct virtio_net_hdr_tunnel tnl;
+};
+
 #ifndef VIRTIO_NET_NO_LEGACY
 /* This header comes first in the scatter-gather list.
  * For legacy virtio, if VIRTIO_F_ANY_LAYOUT is not negotiated, it must
-- 
2.49.0



* [PATCH RFC v3 6/8] virtio_net: enable gso over UDP tunnel support.
  2025-06-06 11:45 [PATCH RFC v3 0/8] virtio: introduce GSO over UDP tunnel Paolo Abeni
                   ` (4 preceding siblings ...)
  2025-06-06 11:45 ` [PATCH RFC v3 5/8] net: implement virtio helpers to handle UDP GSO tunneling Paolo Abeni
@ 2025-06-06 11:45 ` Paolo Abeni
  2025-06-12  4:05   ` Jason Wang
  2025-06-06 11:45 ` [PATCH RFC v3 7/8] tun: " Paolo Abeni
  2025-06-06 11:45 ` [PATCH RFC v3 8/8] vhost/net: " Paolo Abeni
  7 siblings, 1 reply; 37+ messages in thread
From: Paolo Abeni @ 2025-06-06 11:45 UTC (permalink / raw)
  To: netdev
  Cc: Willem de Bruijn, Jason Wang, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez, Yuri Benditovich, Akihiko Odaki

If the related virtio feature is set, enable transmission and reception
of gso over UDP tunnel packets.

Most of the work is done by the previously introduced helpers; we just
need to determine the UDP tunnel features inside the virtio_net_hdr and
update the virtio net hdr size accordingly.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
v2 -> v3:
  - drop the VIRTIO_HAS_EXTENDED_FEATURES conditionals

v1 -> v2:
  - test for UDP_TUNNEL_GSO* only on builds with extended features support
  - comment indentation cleanup
  - rebased on top of virtio helpers changes
  - dump more information in case of bad offloads
---
 drivers/net/virtio_net.c | 70 +++++++++++++++++++++++++++++++---------
 1 file changed, 54 insertions(+), 16 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 18ad50de4928..0b234f318e39 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -78,15 +78,19 @@ static const unsigned long guest_offloads[] = {
 	VIRTIO_NET_F_GUEST_CSUM,
 	VIRTIO_NET_F_GUEST_USO4,
 	VIRTIO_NET_F_GUEST_USO6,
-	VIRTIO_NET_F_GUEST_HDRLEN
+	VIRTIO_NET_F_GUEST_HDRLEN,
+	VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_MAPPED,
+	VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM_MAPPED,
 };
 
 #define GUEST_OFFLOAD_GRO_HW_MASK ((1ULL << VIRTIO_NET_F_GUEST_TSO4) | \
-				(1ULL << VIRTIO_NET_F_GUEST_TSO6) | \
-				(1ULL << VIRTIO_NET_F_GUEST_ECN)  | \
-				(1ULL << VIRTIO_NET_F_GUEST_UFO)  | \
-				(1ULL << VIRTIO_NET_F_GUEST_USO4) | \
-				(1ULL << VIRTIO_NET_F_GUEST_USO6))
+			(1ULL << VIRTIO_NET_F_GUEST_TSO6) | \
+			(1ULL << VIRTIO_NET_F_GUEST_ECN)  | \
+			(1ULL << VIRTIO_NET_F_GUEST_UFO)  | \
+			(1ULL << VIRTIO_NET_F_GUEST_USO4) | \
+			(1ULL << VIRTIO_NET_F_GUEST_USO6) | \
+			(1ULL << VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_MAPPED) | \
+			(1ULL << VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM_MAPPED))
 
 struct virtnet_stat_desc {
 	char desc[ETH_GSTRING_LEN];
@@ -436,9 +440,14 @@ struct virtnet_info {
 	/* Packet virtio header size */
 	u8 hdr_len;
 
+	/* UDP tunnel support */
+	u8 tnl_offset;
+
 	/* Work struct for delayed refilling if we run low on memory. */
 	struct delayed_work refill;
 
+	bool rx_tnl_csum;
+
 	/* Is delayed refill enabled? */
 	bool refill_enabled;
 
@@ -2531,14 +2540,22 @@ static void virtnet_receive_done(struct virtnet_info *vi, struct receive_queue *
 	if (dev->features & NETIF_F_RXHASH && vi->has_rss_hash_report)
 		virtio_skb_set_hash(&hdr->hash_v1_hdr, skb);
 
-	if (flags & VIRTIO_NET_HDR_F_DATA_VALID)
-		skb->ip_summed = CHECKSUM_UNNECESSARY;
+	/* restore the received value */
+	hdr->hdr.flags = flags;
+	if (virtio_net_chk_data_valid(skb, &hdr->hdr, vi->rx_tnl_csum)) {
+		net_warn_ratelimited("%s: bad csum: flags: %x, gso_type: %x rx_tnl_csum %d\n",
+				     dev->name, hdr->hdr.flags,
+				     hdr->hdr.gso_type, vi->rx_tnl_csum);
+		goto frame_err;
+	}
 
-	if (virtio_net_hdr_to_skb(skb, &hdr->hdr,
-				  virtio_is_little_endian(vi->vdev))) {
-		net_warn_ratelimited("%s: bad gso: type: %u, size: %u\n",
+	if (virtio_net_hdr_tnl_to_skb(skb, &hdr->hdr, vi->tnl_offset,
+				      vi->rx_tnl_csum,
+				      virtio_is_little_endian(vi->vdev))) {
+		net_warn_ratelimited("%s: bad gso: type: %x, size: %u, flags %x tunnel %d tnl csum %d\n",
 				     dev->name, hdr->hdr.gso_type,
-				     hdr->hdr.gso_size);
+				     hdr->hdr.gso_size, hdr->hdr.flags,
+				     vi->tnl_offset, vi->rx_tnl_csum);
 		goto frame_err;
 	}
 
@@ -3269,9 +3286,8 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb, bool orphan)
 	else
 		hdr = &skb_vnet_common_hdr(skb)->mrg_hdr;
 
-	if (virtio_net_hdr_from_skb(skb, &hdr->hdr,
-				    virtio_is_little_endian(vi->vdev), false,
-				    0))
+	if (virtio_net_hdr_tnl_from_skb(skb, &hdr->hdr, vi->tnl_offset,
+					virtio_is_little_endian(vi->vdev), 0))
 		return -EPROTO;
 
 	if (vi->mergeable_rx_bufs)
@@ -6775,10 +6791,20 @@ static int virtnet_probe(struct virtio_device *vdev)
 		if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_USO))
 			dev->hw_features |= NETIF_F_GSO_UDP_L4;
 
+		if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO)) {
+			dev->hw_features |= NETIF_F_GSO_UDP_TUNNEL;
+			dev->hw_enc_features = dev->hw_features;
+		}
+		if (dev->hw_features & NETIF_F_GSO_UDP_TUNNEL &&
+		    virtio_has_feature(vdev, VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO_CSUM)) {
+			dev->hw_features |= NETIF_F_GSO_UDP_TUNNEL_CSUM;
+			dev->hw_enc_features |= NETIF_F_GSO_UDP_TUNNEL_CSUM;
+		}
+
 		dev->features |= NETIF_F_GSO_ROBUST;
 
 		if (gso)
-			dev->features |= dev->hw_features & NETIF_F_ALL_TSO;
+			dev->features |= dev->hw_features;
 		/* (!csum && gso) case will be fixed by register_netdev() */
 	}
 
@@ -6879,6 +6905,14 @@ static int virtnet_probe(struct virtio_device *vdev)
 	else
 		vi->hdr_len = sizeof(struct virtio_net_hdr);
 
+	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO) ||
+	    virtio_has_feature(vdev, VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO))
+		vi->tnl_offset = vi->hdr_len;
+	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM))
+		vi->rx_tnl_csum = true;
+	if (vi->tnl_offset)
+		vi->hdr_len += sizeof(struct virtio_net_hdr_tunnel);
+
 	if (virtio_has_feature(vdev, VIRTIO_F_ANY_LAYOUT) ||
 	    virtio_has_feature(vdev, VIRTIO_F_VERSION_1))
 		vi->any_header_sg = true;
@@ -7189,6 +7223,10 @@ static struct virtio_device_id id_table[] = {
 
 static unsigned int features[] = {
 	VIRTNET_FEATURES,
+	VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO,
+	VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM,
+	VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO,
+	VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO_CSUM,
 };
 
 static unsigned int features_legacy[] = {
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH RFC v3 7/8] tun: enable gso over UDP tunnel support.
  2025-06-06 11:45 [PATCH RFC v3 0/8] virtio: introduce GSO over UDP tunnel Paolo Abeni
                   ` (5 preceding siblings ...)
  2025-06-06 11:45 ` [PATCH RFC v3 6/8] virtio_net: enable gso over UDP tunnel support Paolo Abeni
@ 2025-06-06 11:45 ` Paolo Abeni
  2025-06-12  4:55   ` Jason Wang
  2025-06-06 11:45 ` [PATCH RFC v3 8/8] vhost/net: " Paolo Abeni
  7 siblings, 1 reply; 37+ messages in thread
From: Paolo Abeni @ 2025-06-06 11:45 UTC (permalink / raw)
  To: netdev
  Cc: Willem de Bruijn, Jason Wang, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez, Yuri Benditovich, Akihiko Odaki

Add new tun features to represent the newly introduced virtio
GSO over UDP tunnel offloads. Allow detection and selection of
such features via the existing TUNSETOFFLOAD ioctl, store the
tunnel offload configuration in the upper bits of the tun flags
and compute the expected virtio header size and tunnel header
offset from such bits, so that the newly introduced virtio
helpers can be plugged in almost seamlessly to serialize the
extended virtio header.

As the tun features and the virtio hdr size are configured
separately, the data path needs to cope with (hopefully transient)
inconsistent values.
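
For illustration, a minimal user-space sketch of enabling the new
offloads (the TUN_F_UDP_TUNNEL_GSO* flags and the extended header
struct come from this series, everything else is pre-existing tun
API; tun_fd is assumed to be an already configured tap descriptor.
As per set_offload(), the tunnel flags are only honoured together
with some plain GSO offload):

	int hdr_sz = sizeof(struct virtio_net_hdr_v1_tunnel);
	unsigned long offloads = TUN_F_CSUM | TUN_F_TSO4 | TUN_F_TSO6 |
				 TUN_F_UDP_TUNNEL_GSO |
				 TUN_F_UDP_TUNNEL_GSO_CSUM;

	/* make room for the extended header first... */
	if (ioctl(tun_fd, TUNSETVNETHDRSZ, &hdr_sz) < 0)
		return -1;
	/* ...then turn on the offloads */
	if (ioctl(tun_fd, TUNSETOFFLOAD, offloads) < 0)
		return -1;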

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
v2 -> v3:
  - cleaned-up uAPI comments
  - use explicit struct layout instead of raw buf.
---
 drivers/net/tun.c           | 77 ++++++++++++++++++++++++++++++++-----
 drivers/net/tun_vnet.h      | 73 ++++++++++++++++++++++++++++-------
 include/uapi/linux/if_tun.h |  9 +++++
 3 files changed, 136 insertions(+), 23 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 1207196cbbed..d326800dce9d 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -186,7 +186,8 @@ struct tun_struct {
 	struct net_device	*dev;
 	netdev_features_t	set_features;
 #define TUN_USER_FEATURES (NETIF_F_HW_CSUM|NETIF_F_TSO_ECN|NETIF_F_TSO| \
-			  NETIF_F_TSO6 | NETIF_F_GSO_UDP_L4)
+			  NETIF_F_TSO6 | NETIF_F_GSO_UDP_L4 | \
+			  NETIF_F_GSO_UDP_TUNNEL | NETIF_F_GSO_UDP_TUNNEL_CSUM)
 
 	int			align;
 	int			vnet_hdr_sz;
@@ -925,6 +926,7 @@ static int tun_net_init(struct net_device *dev)
 	dev->hw_features = NETIF_F_SG | NETIF_F_FRAGLIST |
 			   TUN_USER_FEATURES | NETIF_F_HW_VLAN_CTAG_TX |
 			   NETIF_F_HW_VLAN_STAG_TX;
+	dev->hw_enc_features = dev->hw_features;
 	dev->features = dev->hw_features;
 	dev->vlan_features = dev->features &
 			     ~(NETIF_F_HW_VLAN_CTAG_TX |
@@ -1698,7 +1700,8 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 	struct sk_buff *skb;
 	size_t total_len = iov_iter_count(from);
 	size_t len = total_len, align = tun->align, linear;
-	struct virtio_net_hdr gso = { 0 };
+	struct virtio_net_hdr_v1_tunnel full_hdr;
+	struct virtio_net_hdr *gso;
 	int good_linear;
 	int copylen;
 	int hdr_len = 0;
@@ -1708,6 +1711,15 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 	int skb_xdp = 1;
 	bool frags = tun_napi_frags_enabled(tfile);
 	enum skb_drop_reason drop_reason = SKB_DROP_REASON_NOT_SPECIFIED;
+	unsigned int flags = tun->flags & ~TUN_VNET_TNL_MASK;
+
+	/*
+	 * Keep it easy and always zero the whole buffer, even if the
+	 * tunnel-related field will be touched only when the feature
+	 * is enabled and the hdr size id compatible.
+	 */
+	memset(&full_hdr, 0, sizeof(full_hdr));
+	gso = (struct virtio_net_hdr *)&full_hdr.hdr;
 
 	if (!(tun->flags & IFF_NO_PI)) {
 		if (len < sizeof(pi))
@@ -1720,8 +1732,16 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 
 	if (tun->flags & IFF_VNET_HDR) {
 		int vnet_hdr_sz = READ_ONCE(tun->vnet_hdr_sz);
+		int parsed_size;
 
-		hdr_len = tun_vnet_hdr_get(vnet_hdr_sz, tun->flags, from, &gso);
+		if (vnet_hdr_sz < TUN_VNET_TNL_SIZE) {
+			parsed_size = vnet_hdr_sz;
+		} else {
+			parsed_size = TUN_VNET_TNL_SIZE;
+			flags |= TUN_VNET_TNL_MASK;
+		}
+		hdr_len = __tun_vnet_hdr_get(vnet_hdr_sz, parsed_size,
+					     flags, from, gso);
 		if (hdr_len < 0)
 			return hdr_len;
 
@@ -1755,7 +1775,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 		 * (e.g gso or jumbo packet), we will do it at after
 		 * skb was created with generic XDP routine.
 		 */
-		skb = tun_build_skb(tun, tfile, from, &gso, len, &skb_xdp);
+		skb = tun_build_skb(tun, tfile, from, gso, len, &skb_xdp);
 		err = PTR_ERR_OR_ZERO(skb);
 		if (err)
 			goto drop;
@@ -1799,7 +1819,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 		}
 	}
 
-	if (tun_vnet_hdr_to_skb(tun->flags, skb, &gso)) {
+	if (tun_vnet_hdr_to_skb(flags, skb, gso)) {
 		atomic_long_inc(&tun->rx_frame_errors);
 		err = -EINVAL;
 		goto free_skb;
@@ -2050,13 +2070,26 @@ static ssize_t tun_put_user(struct tun_struct *tun,
 	}
 
 	if (vnet_hdr_sz) {
-		struct virtio_net_hdr gso;
+		struct virtio_net_hdr_v1_tunnel full_hdr;
+		struct virtio_net_hdr *gso;
+		int flags = tun->flags;
+		int parsed_size;
+
+		gso = (struct virtio_net_hdr *)&full_hdr.hdr;
+		parsed_size = tun_vnet_parse_size(tun->flags);
+		if (unlikely(vnet_hdr_sz < parsed_size)) {
+			/* Inconsistent hdr size and (tunnel) offloads:
+			 * strips the latter
+			 */
+			flags &= ~TUN_VNET_TNL_MASK;
+			parsed_size = sizeof(struct virtio_net_hdr);
+		};
 
-		ret = tun_vnet_hdr_from_skb(tun->flags, tun->dev, skb, &gso);
+		ret = tun_vnet_hdr_from_skb(flags, tun->dev, skb, gso);
 		if (ret)
 			return ret;
 
-		ret = tun_vnet_hdr_put(vnet_hdr_sz, iter, &gso);
+		ret = __tun_vnet_hdr_put(vnet_hdr_sz, parsed_size, iter, gso);
 		if (ret)
 			return ret;
 	}
@@ -2366,6 +2399,7 @@ static int tun_xdp_one(struct tun_struct *tun,
 	int metasize = 0;
 	int ret = 0;
 	bool skb_xdp = false;
+	unsigned int flags;
 	struct page *page;
 
 	if (unlikely(datasize < ETH_HLEN))
@@ -2426,7 +2460,16 @@ static int tun_xdp_one(struct tun_struct *tun,
 	if (metasize > 0)
 		skb_metadata_set(skb, metasize);
 
-	if (tun_vnet_hdr_to_skb(tun->flags, skb, gso)) {
+	/* Assume tun offloads are enabled if the provided hdr is large
+	 * enough.
+	 */
+	if (READ_ONCE(tun->vnet_hdr_sz) >= TUN_VNET_TNL_SIZE &&
+	    xdp->data - xdp->data_hard_start >= TUN_VNET_TNL_SIZE)
+		flags = tun->flags | TUN_VNET_TNL_MASK;
+	else
+		flags = tun->flags & ~TUN_VNET_TNL_MASK;
+
+	if (tun_vnet_hdr_to_skb(flags, skb, gso)) {
 		atomic_long_inc(&tun->rx_frame_errors);
 		kfree_skb(skb);
 		ret = -EINVAL;
@@ -2812,6 +2855,8 @@ static void tun_get_iff(struct tun_struct *tun, struct ifreq *ifr)
 
 }
 
+#define PLAIN_GSO (NETIF_F_GSO_UDP_L4 | NETIF_F_TSO | NETIF_F_TSO6)
+
 /* This is like a cut-down ethtool ops, except done via tun fd so no
  * privs required. */
 static int set_offload(struct tun_struct *tun, unsigned long arg)
@@ -2841,6 +2886,17 @@ static int set_offload(struct tun_struct *tun, unsigned long arg)
 			features |= NETIF_F_GSO_UDP_L4;
 			arg &= ~(TUN_F_USO4 | TUN_F_USO6);
 		}
+
+		/* Tunnel offload is allowed only if some plain offload is
+		 * available, too.
+		 */
+		if (features & PLAIN_GSO && arg & TUN_F_UDP_TUNNEL_GSO) {
+			features |= NETIF_F_GSO_UDP_TUNNEL;
+			if (arg & TUN_F_UDP_TUNNEL_GSO_CSUM)
+				features |= NETIF_F_GSO_UDP_TUNNEL_CSUM;
+			arg &= ~(TUN_F_UDP_TUNNEL_GSO |
+				 TUN_F_UDP_TUNNEL_GSO_CSUM);
+		}
 	}
 
 	/* This gives the user a way to test for new features in future by
@@ -2852,7 +2908,8 @@ static int set_offload(struct tun_struct *tun, unsigned long arg)
 	tun->dev->wanted_features &= ~TUN_USER_FEATURES;
 	tun->dev->wanted_features |= features;
 	netdev_update_features(tun->dev);
-
+	tun_set_vnet_tnl(&tun->flags, !!(features & NETIF_F_GSO_UDP_TUNNEL),
+			 !!(features & NETIF_F_GSO_UDP_TUNNEL_CSUM));
 	return 0;
 }
 
diff --git a/drivers/net/tun_vnet.h b/drivers/net/tun_vnet.h
index 58b9ac7a5fc4..c2374c26538b 100644
--- a/drivers/net/tun_vnet.h
+++ b/drivers/net/tun_vnet.h
@@ -5,6 +5,11 @@
 /* High bits in flags field are unused. */
 #define TUN_VNET_LE     0x80000000
 #define TUN_VNET_BE     0x40000000
+#define TUN_VNET_TNL		0x20000000
+#define TUN_VNET_TNL_CSUM	0x10000000
+#define TUN_VNET_TNL_MASK	(TUN_VNET_TNL | TUN_VNET_TNL_CSUM)
+
+#define TUN_VNET_TNL_SIZE	sizeof(struct virtio_net_hdr_v1_tunnel)
 
 static inline bool tun_vnet_legacy_is_little_endian(unsigned int flags)
 {
@@ -45,6 +50,13 @@ static inline long tun_set_vnet_be(unsigned int *flags, int __user *argp)
 	return 0;
 }
 
+static inline void tun_set_vnet_tnl(unsigned int *flags, bool tnl, bool tnl_csum)
+{
+	*flags = (*flags & ~TUN_VNET_TNL_MASK) |
+		 tnl * TUN_VNET_TNL |
+		 tnl_csum * TUN_VNET_TNL_CSUM;
+}
+
 static inline bool tun_vnet_is_little_endian(unsigned int flags)
 {
 	return flags & TUN_VNET_LE || tun_vnet_legacy_is_little_endian(flags);
@@ -107,16 +119,33 @@ static inline long tun_vnet_ioctl(int *vnet_hdr_sz, unsigned int *flags,
 	}
 }
 
-static inline int tun_vnet_hdr_get(int sz, unsigned int flags,
-				   struct iov_iter *from,
-				   struct virtio_net_hdr *hdr)
+static inline unsigned int tun_vnet_parse_size(unsigned int flags)
+{
+	if (!(flags & TUN_VNET_TNL))
+		return sizeof(struct virtio_net_hdr);
+
+	return TUN_VNET_TNL_SIZE;
+}
+
+static inline unsigned int tun_vnet_tnl_offset(unsigned int flags)
+{
+	if (!(flags & TUN_VNET_TNL))
+		return 0;
+
+	return sizeof(struct virtio_net_hdr_v1);
+}
+
+static inline int __tun_vnet_hdr_get(int sz, int parsed_size,
+				     unsigned int flags,
+				     struct iov_iter *from,
+				     struct virtio_net_hdr *hdr)
 {
 	u16 hdr_len;
 
 	if (iov_iter_count(from) < sz)
 		return -EINVAL;
 
-	if (!copy_from_iter_full(hdr, sizeof(*hdr), from))
+	if (!copy_from_iter_full(hdr, parsed_size, from))
 		return -EFAULT;
 
 	hdr_len = tun_vnet16_to_cpu(flags, hdr->hdr_len);
@@ -129,30 +158,47 @@ static inline int tun_vnet_hdr_get(int sz, unsigned int flags,
 	if (hdr_len > iov_iter_count(from))
 		return -EINVAL;
 
-	iov_iter_advance(from, sz - sizeof(*hdr));
+	iov_iter_advance(from, sz - parsed_size);
 
 	return hdr_len;
 }
 
-static inline int tun_vnet_hdr_put(int sz, struct iov_iter *iter,
-				   const struct virtio_net_hdr *hdr)
+static inline int tun_vnet_hdr_get(int sz, unsigned int flags,
+				   struct iov_iter *from,
+				   struct virtio_net_hdr *hdr)
+{
+	return __tun_vnet_hdr_get(sz, sizeof(*hdr), flags, from, hdr);
+}
+
+static inline int __tun_vnet_hdr_put(int sz, int parsed_size,
+				     struct iov_iter *iter,
+				     const struct virtio_net_hdr *hdr)
 {
 	if (unlikely(iov_iter_count(iter) < sz))
 		return -EINVAL;
 
-	if (unlikely(copy_to_iter(hdr, sizeof(*hdr), iter) != sizeof(*hdr)))
+	if (unlikely(copy_to_iter(hdr, parsed_size, iter) != parsed_size))
 		return -EFAULT;
 
-	if (iov_iter_zero(sz - sizeof(*hdr), iter) != sz - sizeof(*hdr))
+	if (iov_iter_zero(sz - parsed_size, iter) != sz - parsed_size)
 		return -EFAULT;
 
 	return 0;
 }
 
+static inline int tun_vnet_hdr_put(int sz, struct iov_iter *iter,
+				   const struct virtio_net_hdr *hdr)
+{
+	return __tun_vnet_hdr_put(sz, sizeof(*hdr), iter, hdr);
+}
+
 static inline int tun_vnet_hdr_to_skb(unsigned int flags, struct sk_buff *skb,
 				      const struct virtio_net_hdr *hdr)
 {
-	return virtio_net_hdr_to_skb(skb, hdr, tun_vnet_is_little_endian(flags));
+	return virtio_net_hdr_tnl_to_skb(skb, hdr,
+					 tun_vnet_tnl_offset(flags),
+					 !!(flags & TUN_VNET_TNL_CSUM),
+					 tun_vnet_is_little_endian(flags));
 }
 
 static inline int tun_vnet_hdr_from_skb(unsigned int flags,
@@ -161,10 +207,11 @@ static inline int tun_vnet_hdr_from_skb(unsigned int flags,
 					struct virtio_net_hdr *hdr)
 {
 	int vlan_hlen = skb_vlan_tag_present(skb) ? VLAN_HLEN : 0;
+	int tnl_offset = tun_vnet_tnl_offset(flags);
 
-	if (virtio_net_hdr_from_skb(skb, hdr,
-				    tun_vnet_is_little_endian(flags), true,
-				    vlan_hlen)) {
+	if (virtio_net_hdr_tnl_from_skb(skb, hdr, tnl_offset,
+					tun_vnet_is_little_endian(flags),
+					vlan_hlen)) {
 		struct skb_shared_info *sinfo = skb_shinfo(skb);
 
 		if (net_ratelimit()) {
diff --git a/include/uapi/linux/if_tun.h b/include/uapi/linux/if_tun.h
index 287cdc81c939..79d53c7a1ebd 100644
--- a/include/uapi/linux/if_tun.h
+++ b/include/uapi/linux/if_tun.h
@@ -93,6 +93,15 @@
 #define TUN_F_USO4	0x20	/* I can handle USO for IPv4 packets */
 #define TUN_F_USO6	0x40	/* I can handle USO for IPv6 packets */
 
+/* I can handle TSO/USO for UDP tunneled packets */
+#define TUN_F_UDP_TUNNEL_GSO		0x080
+
+/*
+ * I can handle TSO/USO for UDP tunneled packets requiring csum offload for
+ * the outer header
+ */
+#define TUN_F_UDP_TUNNEL_GSO_CSUM	0x100
+
 /* Protocol info prepended to the packets (when IFF_NO_PI is not set) */
 #define TUN_PKT_STRIP	0x0001
 struct tun_pi {
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH RFC v3 8/8] vhost/net: enable gso over UDP tunnel support.
  2025-06-06 11:45 [PATCH RFC v3 0/8] virtio: introduce GSO over UDP tunnel Paolo Abeni
                   ` (6 preceding siblings ...)
  2025-06-06 11:45 ` [PATCH RFC v3 7/8] tun: " Paolo Abeni
@ 2025-06-06 11:45 ` Paolo Abeni
  7 siblings, 0 replies; 37+ messages in thread
From: Paolo Abeni @ 2025-06-06 11:45 UTC (permalink / raw)
  To: netdev
  Cc: Willem de Bruijn, Jason Wang, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez, Yuri Benditovich, Akihiko Odaki

Vhost net needs to know the exact virtio net hdr size to be able
to copy such header correctly. Teach it about the newly defined
UDP tunnel-related options and update the hdr size computation
accordingly.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 drivers/vhost/net.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 0291fce24bbf..53491c67e89c 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -77,7 +77,9 @@ enum {
 			 (1ULL << VIRTIO_F_RING_RESET)
 };
 
-const u64 VHOST_NET_ALL_FEATURES[VIRTIO_FEATURES_DWORDS] = { VHOST_NET_FEATURES };
+const u64 VHOST_NET_ALL_FEATURES[VIRTIO_FEATURES_DWORDS] = { VHOST_NET_FEATURES,
+	VIRTIO_BIT(VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO) |
+	VIRTIO_BIT(VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO) };
 
 enum {
 	VHOST_NET_BACKEND_FEATURES = (1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2)
@@ -1626,6 +1628,12 @@ static int vhost_net_set_features(struct vhost_net *n, const u64 *features)
 		  sizeof(struct virtio_net_hdr_mrg_rxbuf) :
 		  sizeof(struct virtio_net_hdr);
 
+	if (virtio_features_test_bit(features,
+				     VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO) ||
+	    virtio_features_test_bit(features,
+				     VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO))
+		hdr_len += sizeof(struct virtio_net_hdr_tunnel);
+
 	if (virtio_features_test_bit(features, VHOST_NET_F_VIRTIO_NET_HDR)) {
 		/* vhost provides vnet_hdr */
 		vhost_hlen = hdr_len;
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC v3 1/8] virtio: introduce extended features
  2025-06-06 11:45 ` [PATCH RFC v3 1/8] virtio: introduce extended features Paolo Abeni
@ 2025-06-08  5:49   ` Akihiko Odaki
  2025-06-12  8:50     ` Paolo Abeni
  2025-06-12  0:50   ` Jason Wang
  1 sibling, 1 reply; 37+ messages in thread
From: Akihiko Odaki @ 2025-06-08  5:49 UTC (permalink / raw)
  To: Paolo Abeni, netdev
  Cc: Willem de Bruijn, Jason Wang, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez, Yuri Benditovich

On 2025/06/06 20:45, Paolo Abeni wrote:
> The virtio specifications allows for up to 128 bits for the
> device features. Soon we are going to use some of the 'extended'
> bits features (above 64) for the virtio_net driver.
> 
> Introduce extended features as a fixed size array of u64. To minimize
> the diffstat allows legacy driver to access the low 64 bits via a
> transparent union.
> 
> Introduce an extended get_extended_features128 configuration callback
> that devices supporting the extended features range must implement in
> place of the traditional one.

It's named as get_extended_features(); now it can grow to have more than 
128 bits, which is nice.

> 
> Note that legacy and transport features don't need any change, as
> they are always in the low 64 bit range.
> 
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> v2 -> v3:
>    - uint128_t -> u64[2];
> v1 -> v2:
>    - let u64 VIRTIO_BIT() cope with higher bit values
>    - add .get_features128 instead of changing .get_features signature
> ---
>   drivers/virtio/virtio.c         | 39 +++++++++++-------
>   drivers/virtio/virtio_debug.c   | 27 +++++++------
>   include/linux/virtio.h          |  5 ++-
>   include/linux/virtio_config.h   | 35 ++++++++++++-----
>   include/linux/virtio_features.h | 70 +++++++++++++++++++++++++++++++++
>   5 files changed, 137 insertions(+), 39 deletions(-)
>   create mode 100644 include/linux/virtio_features.h
> 
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index 95d5d7993e5b..ed1ccedc47b3 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -53,7 +53,7 @@ static ssize_t features_show(struct device *_d,
>   
>   	/* We actually represent this as a bitstring, as it could be
>   	 * arbitrary length in future. */
> -	for (i = 0; i < sizeof(dev->features)*8; i++)
> +	for (i = 0; i < VIRTIO_FEATURES_MAX; i++)
>   		len += sysfs_emit_at(buf, len, "%c",
>   			       __virtio_test_bit(dev, i) ? '1' : '0');
>   	len += sysfs_emit_at(buf, len, "\n");
> @@ -272,22 +272,22 @@ static int virtio_dev_probe(struct device *_d)
>   	int err, i;
>   	struct virtio_device *dev = dev_to_virtio(_d);
>   	struct virtio_driver *drv = drv_to_virtio(dev->dev.driver);
> -	u64 device_features;
> -	u64 driver_features;
> +	u64 device_features[VIRTIO_FEATURES_DWORDS];
> +	u64 driver_features[VIRTIO_FEATURES_DWORDS];
>   	u64 driver_features_legacy;
>   
>   	/* We have a driver! */
>   	virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER);
>   
>   	/* Figure out what features the device supports. */
> -	device_features = dev->config->get_features(dev);
> +	virtio_get_features(dev, device_features);
>   
>   	/* Figure out what features the driver supports. */
> -	driver_features = 0;
> +	virtio_features_zero(driver_features);
>   	for (i = 0; i < drv->feature_table_size; i++) {
>   		unsigned int f = drv->feature_table[i];
> -		BUG_ON(f >= 64);
> -		driver_features |= (1ULL << f);
> +		BUG_ON(f >= VIRTIO_FEATURES_MAX);
> +		virtio_features_set_bit(driver_features, f);
>   	}
>   
>   	/* Some drivers have a separate feature table for virtio v1.0 */
> @@ -299,20 +299,25 @@ static int virtio_dev_probe(struct device *_d)
>   			driver_features_legacy |= (1ULL << f);
>   		}
>   	} else {
> -		driver_features_legacy = driver_features;
> +		driver_features_legacy = driver_features[0];
>   	}
>   
> -	if (device_features & (1ULL << VIRTIO_F_VERSION_1))
> -		dev->features = driver_features & device_features;
> -	else
> -		dev->features = driver_features_legacy & device_features;
> +	if (virtio_features_test_bit(device_features, VIRTIO_F_VERSION_1)) {
> +		for (i = 0; i < VIRTIO_FEATURES_DWORDS; ++i)
> +			dev->features_array[i] = driver_features[i] &
> +						 device_features[i];
> +	} else {
> +		virtio_features_from_u64(dev->features_array,
> +					 driver_features_legacy &
> +					 device_features[0]);
> +	}
>   
>   	/* When debugging, user may filter some features by hand. */
>   	virtio_debug_device_filter_features(dev);
>   
>   	/* Transport features always preserved to pass to finalize_features. */
>   	for (i = VIRTIO_TRANSPORT_F_START; i < VIRTIO_TRANSPORT_F_END; i++)
> -		if (device_features & (1ULL << i))
> +		if (virtio_features_test_bit(device_features, i))
>   			__virtio_set_bit(dev, i);
>   
>   	err = dev->config->finalize_features(dev);
> @@ -320,14 +325,15 @@ static int virtio_dev_probe(struct device *_d)
>   		goto err;
>   
>   	if (drv->validate) {
> -		u64 features = dev->features;
> +		u64 features[VIRTIO_FEATURES_DWORDS];
>   
> +		virtio_features_copy(features, dev->features_array);
>   		err = drv->validate(dev);
>   		if (err)
>   			goto err;
>   
>   		/* Did validation change any features? Then write them again. */
> -		if (features != dev->features) {
> +		if (!virtio_features_equal(features, dev->features_array)) {
>   			err = dev->config->finalize_features(dev);
>   			if (err)
>   				goto err;
> @@ -701,6 +707,9 @@ EXPORT_SYMBOL_GPL(virtio_device_reset_done);
>   
>   static int virtio_init(void)
>   {
> +	BUILD_BUG_ON(offsetof(struct virtio_device, features) !=
> +		     offsetof(struct virtio_device, features_array[0]));
> +
>   	if (bus_register(&virtio_bus) != 0)
>   		panic("virtio bus registration failed");
>   	virtio_debug_init();
> diff --git a/drivers/virtio/virtio_debug.c b/drivers/virtio/virtio_debug.c
> index 95c8fc7705bb..6d066b5e8ec0 100644
> --- a/drivers/virtio/virtio_debug.c
> +++ b/drivers/virtio/virtio_debug.c
> @@ -8,13 +8,13 @@ static struct dentry *virtio_debugfs_dir;
>   
>   static int virtio_debug_device_features_show(struct seq_file *s, void *data)
>   {
> +	u64 device_features[VIRTIO_FEATURES_DWORDS];
>   	struct virtio_device *dev = s->private;
> -	u64 device_features;
>   	unsigned int i;
>   
> -	device_features = dev->config->get_features(dev);
> -	for (i = 0; i < BITS_PER_LONG_LONG; i++) {
> -		if (device_features & (1ULL << i))
> +	virtio_get_features(dev, device_features);
> +	for (i = 0; i < VIRTIO_FEATURES_MAX; i++) {
> +		if (virtio_features_test_bit(device_features, i))
>   			seq_printf(s, "%u\n", i);
>   	}
>   	return 0;
> @@ -26,8 +26,8 @@ static int virtio_debug_filter_features_show(struct seq_file *s, void *data)
>   	struct virtio_device *dev = s->private;
>   	unsigned int i;
>   
> -	for (i = 0; i < BITS_PER_LONG_LONG; i++) {
> -		if (dev->debugfs_filter_features & (1ULL << i))
> +	for (i = 0; i < VIRTIO_FEATURES_MAX; i++) {
> +		if (virtio_features_test_bit(dev->debugfs_filter_features, i))
>   			seq_printf(s, "%u\n", i);
>   	}
>   	return 0;
> @@ -39,7 +39,7 @@ static int virtio_debug_filter_features_clear(void *data, u64 val)
>   	struct virtio_device *dev = data;
>   
>   	if (val == 1)
> -		dev->debugfs_filter_features = 0;
> +		virtio_features_zero(dev->debugfs_filter_features);
>   	return 0;
>   }
>   
> @@ -50,9 +50,10 @@ static int virtio_debug_filter_feature_add(void *data, u64 val)
>   {
>   	struct virtio_device *dev = data;
>   
> -	if (val >= BITS_PER_LONG_LONG)
> +	if (val >= VIRTIO_FEATURES_MAX)
>   		return -EINVAL;
> -	dev->debugfs_filter_features |= BIT_ULL_MASK(val);
> +
> +	virtio_features_set_bit(dev->debugfs_filter_features, val);
>   	return 0;
>   }
>   
> @@ -63,9 +64,10 @@ static int virtio_debug_filter_feature_del(void *data, u64 val)
>   {
>   	struct virtio_device *dev = data;
>   
> -	if (val >= BITS_PER_LONG_LONG)
> +	if (val >= VIRTIO_FEATURES_MAX)
>   		return -EINVAL;
> -	dev->debugfs_filter_features &= ~BIT_ULL_MASK(val);
> +
> +	virtio_features_clear_bit(dev->debugfs_filter_features, val);
>   	return 0;
>   }
>   
> @@ -91,7 +93,8 @@ EXPORT_SYMBOL_GPL(virtio_debug_device_init);
>   
>   void virtio_debug_device_filter_features(struct virtio_device *dev)
>   {
> -	dev->features &= ~dev->debugfs_filter_features;
> +	virtio_features_and_not(dev->features_array, dev->features_array,
> +				dev->debugfs_filter_features);
>   }
>   EXPORT_SYMBOL_GPL(virtio_debug_device_filter_features);
>   
> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> index 64cb4b04be7a..dcd3949572bd 100644
> --- a/include/linux/virtio.h
> +++ b/include/linux/virtio.h
> @@ -11,6 +11,7 @@
>   #include <linux/gfp.h>
>   #include <linux/dma-mapping.h>
>   #include <linux/completion.h>
> +#include <linux/virtio_features.h>
>   
>   /**
>    * struct virtqueue - a queue to register buffers for sending or receiving.
> @@ -159,11 +160,11 @@ struct virtio_device {
>   	const struct virtio_config_ops *config;
>   	const struct vringh_config_ops *vringh_config;
>   	struct list_head vqs;
> -	u64 features;
> +	VIRTIO_DECLARE_FEATURES(features);
>   	void *priv;
>   #ifdef CONFIG_VIRTIO_DEBUG
>   	struct dentry *debugfs_dir;
> -	u64 debugfs_filter_features;
> +	u64 debugfs_filter_features[VIRTIO_FEATURES_DWORDS];
>   #endif
>   };
>   
> diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> index 169c7d367fac..83cf25b3028d 100644
> --- a/include/linux/virtio_config.h
> +++ b/include/linux/virtio_config.h
> @@ -77,7 +77,10 @@ struct virtqueue_info {
>    *      vdev: the virtio_device
>    * @get_features: get the array of feature bits for this device.
>    *	vdev: the virtio_device
> - *	Returns the first 64 feature bits (all we currently need).
> + *	Returns the first 64 feature bits.
> + * @get_extended_features:
> + *      vdev: the virtio_device
> + *      Returns the first VIRTIO_FEATURES_MAX feature bits (all we currently need).
>    * @finalize_features: confirm what device features we'll be using.
>    *	vdev: the virtio_device
>    *	This sends the driver feature bits to the device: it can change
> @@ -121,6 +124,8 @@ struct virtio_config_ops {
>   	void (*del_vqs)(struct virtio_device *);
>   	void (*synchronize_cbs)(struct virtio_device *);
>   	u64 (*get_features)(struct virtio_device *vdev);
> +	void (*get_extended_features)(struct virtio_device *vdev,
> +				      u64 *features);
>   	int (*finalize_features)(struct virtio_device *vdev);
>   	const char *(*bus_name)(struct virtio_device *vdev);
>   	int (*set_vq_affinity)(struct virtqueue *vq,
> @@ -149,11 +154,11 @@ static inline bool __virtio_test_bit(const struct virtio_device *vdev,
>   {
>   	/* Did you forget to fix assumptions on max features? */
>   	if (__builtin_constant_p(fbit))
> -		BUILD_BUG_ON(fbit >= 64);
> +		BUILD_BUG_ON(fbit >= VIRTIO_FEATURES_MAX);
>   	else
> -		BUG_ON(fbit >= 64);
> +		BUG_ON(fbit >= VIRTIO_FEATURES_MAX);

This check would be better moved into virtio_features_test_bit().

>   
> -	return vdev->features & BIT_ULL(fbit);
> +	return virtio_features_test_bit(vdev->features_array, fbit);
>   }
>   
>   /**
> @@ -166,11 +171,11 @@ static inline void __virtio_set_bit(struct virtio_device *vdev,
>   {
>   	/* Did you forget to fix assumptions on max features? */
>   	if (__builtin_constant_p(fbit))
> -		BUILD_BUG_ON(fbit >= 64);
> +		BUILD_BUG_ON(fbit >= VIRTIO_FEATURES_MAX);
>   	else
> -		BUG_ON(fbit >= 64);
> +		BUG_ON(fbit >= VIRTIO_FEATURES_MAX);
>   
> -	vdev->features |= BIT_ULL(fbit);
> +	virtio_features_set_bit(vdev->features_array, fbit);
>   }
>   
>   /**
> @@ -183,11 +188,11 @@ static inline void __virtio_clear_bit(struct virtio_device *vdev,
>   {
>   	/* Did you forget to fix assumptions on max features? */
>   	if (__builtin_constant_p(fbit))
> -		BUILD_BUG_ON(fbit >= 64);
> +		BUILD_BUG_ON(fbit >= VIRTIO_FEATURES_MAX);
>   	else
> -		BUG_ON(fbit >= 64);
> +		BUG_ON(fbit >= VIRTIO_FEATURES_MAX);
>   
> -	vdev->features &= ~BIT_ULL(fbit);
> +	virtio_features_clear_bit(vdev->features_array, fbit);
>   }
>   
>   /**
> @@ -204,6 +209,16 @@ static inline bool virtio_has_feature(const struct virtio_device *vdev,
>   	return __virtio_test_bit(vdev, fbit);
>   }
>   
> +static inline void virtio_get_features(struct virtio_device *vdev, u64 *features)
> +{
> +	if (vdev->config->get_extended_features) {
> +		vdev->config->get_extended_features(vdev, features);
> +		return;
> +	}
> +
> +	virtio_features_from_u64(features, vdev->config->get_features(vdev));
> +}
> +
>   /**
>    * virtio_has_dma_quirk - determine whether this device has the DMA quirk
>    * @vdev: the device
> diff --git a/include/linux/virtio_features.h b/include/linux/virtio_features.h
> new file mode 100644
> index 000000000000..42c3c7cee500
> --- /dev/null
> +++ b/include/linux/virtio_features.h
> @@ -0,0 +1,70 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _LINUX_VIRTIO_FEATURES_H
> +#define _LINUX_VIRTIO_FEATURES_H
> +
> +#include <linux/bits.h>
> +
> +#define VIRTIO_FEATURES_DWORDS	2
> +#define VIRTIO_FEATURES_MAX	(VIRTIO_FEATURES_DWORDS * 64)
> +#define VIRTIO_FEATURES_WORDS	(VIRTIO_FEATURES_DWORDS * 2)
> +#define VIRTIO_BIT(b)		BIT_ULL((b) & 0x3f)
> +#define VIRTIO_DWORD(b)		((b) >> 6)

You can use BIT_ULL_MASK() and BIT_ULL_WORD().

> +#define VIRTIO_DECLARE_FEATURES(name)			\
> +	union {						\
> +		u64 name;				\
> +		u64 name##_array[VIRTIO_FEATURES_DWORDS];\
> +	}

This is clever. I like this.

> +
> +static inline bool virtio_features_test_bit(const u64 *features,
> +					    unsigned int bit)
> +{
> +	return !!(features[VIRTIO_DWORD(bit)] & VIRTIO_BIT(bit));
> +}
> +
> +static inline void virtio_features_set_bit(u64 *features,
> +					   unsigned int bit)
> +{
> +	features[VIRTIO_DWORD(bit)] |= VIRTIO_BIT(bit);
> +}
> +
> +static inline void virtio_features_clear_bit(u64 *features,
> +					     unsigned int bit)
> +{
> +	features[VIRTIO_DWORD(bit)] &= ~VIRTIO_BIT(bit);
> +}
> +
> +static inline void virtio_features_zero(u64 *features)
> +{
> +	memset(features, 0, sizeof(features[0]) * VIRTIO_FEATURES_DWORDS);
> +}
> +
> +static inline void virtio_features_from_u64(u64 *features, u64 from)
> +{
> +	virtio_features_zero(features);
> +	features[0] = from;
> +}
> +
> +static inline bool virtio_features_equal(const u64 *f1, const u64 *f2)
> +{
> +	u64 diff = 0;
> +	int i;
> +
> +	for (i = 0; i < VIRTIO_FEATURES_DWORDS; ++i)
> +		diff |= f1[i] ^ f2[i];
> +	return !!diff;
> +}
> +
> +static inline void virtio_features_copy(u64 *to, const u64 *from)
> +{
> +	memcpy(to, from, sizeof(to[0]) * VIRTIO_FEATURES_DWORDS);
> +}
> +
> +static inline void virtio_features_and_not(u64 *to, const u64 *f1, const u64 *f2)

Other similar functions use "andnot" instead of "and_not" in their names.

Regards,
Akihiko Odaki

> +{
> +	int i;
> +
> +	for (i = 0; i < VIRTIO_FEATURES_DWORDS; i++)
> +		to[i] = f1[i] & ~f2[i];
> +}
> +
> +#endif


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC v3 3/8] vhost-net: allow configuring extended features
  2025-06-06 11:45 ` [PATCH RFC v3 3/8] vhost-net: " Paolo Abeni
@ 2025-06-08  6:16   ` Akihiko Odaki
  2025-06-14  7:27     ` Paolo Abeni
  2025-06-12  1:31   ` Jason Wang
  1 sibling, 1 reply; 37+ messages in thread
From: Akihiko Odaki @ 2025-06-08  6:16 UTC (permalink / raw)
  To: Paolo Abeni, netdev
  Cc: Willem de Bruijn, Jason Wang, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez, Yuri Benditovich

On 2025/06/06 20:45, Paolo Abeni wrote:
> Use the extended feature type for 'acked_features' and implement
> two new ioctls operation allowing the user-space to set/query an
> unbounded amount of features.
> 
> The actual number of processed features is limited by VIRTIO_FEATURES_MAX
> and attempts to set features above such limit fail with
> EOPNOTSUPP.
> 
> Note that: the legacy ioctls implicitly truncate the negotiated
> features to the lower 64 bits range and the 'acked_backend_features'
> field don't need conversion, as the only negotiated feature there
> is in the low 64 bit range.
> 
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> v2 -> v3:
>    - virtio_features_t -> u64[2]
>    - add __counted_by annotation to vhost_features_array
> 
> v1 -> v2:
>    - change the ioctl to use an extensible API
> ---
>   drivers/vhost/net.c              | 85 +++++++++++++++++++++++++++-----
>   drivers/vhost/vhost.c            |  2 +-
>   drivers/vhost/vhost.h            |  4 +-
>   include/uapi/linux/vhost.h       |  7 +++
>   include/uapi/linux/vhost_types.h |  5 ++
>   5 files changed, 88 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 7cbfc7d718b3..0291fce24bbf 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -77,6 +77,8 @@ enum {
>   			 (1ULL << VIRTIO_F_RING_RESET)
>   };
>   
> +const u64 VHOST_NET_ALL_FEATURES[VIRTIO_FEATURES_DWORDS] = { VHOST_NET_FEATURES };

This should be static.

Probably it should be lower-case too.
Documentation/process/coding-style.rst says: "Names of macros defining
constants and labels in enums are capitalized"; note that variables are
not mentioned there.

I think it's also better to remove the definition of VHOST_NET_FEATURES 
since having two definitions with similar names and meaning is 
confusing. (Just in case you wonder: GCC is able to optimize accesses 
like "VHOST_NET_ALL_FEATURES[0]" to eliminate array accesses, by the way.)

> +
>   enum {
>   	VHOST_NET_BACKEND_FEATURES = (1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2)
>   };
> @@ -1614,16 +1616,17 @@ static long vhost_net_reset_owner(struct vhost_net *n)
>   	return err;
>   }
>   
> -static int vhost_net_set_features(struct vhost_net *n, u64 features)
> +static int vhost_net_set_features(struct vhost_net *n, const u64 *features)
>   {
>   	size_t vhost_hlen, sock_hlen, hdr_len;
>   	int i;
>   
> -	hdr_len = (features & ((1ULL << VIRTIO_NET_F_MRG_RXBUF) |
> -			       (1ULL << VIRTIO_F_VERSION_1))) ?
> -			sizeof(struct virtio_net_hdr_mrg_rxbuf) :
> -			sizeof(struct virtio_net_hdr);
> -	if (features & (1 << VHOST_NET_F_VIRTIO_NET_HDR)) {
> +	hdr_len = virtio_features_test_bit(features, VIRTIO_NET_F_MRG_RXBUF) ||
> +		  virtio_features_test_bit(features, VIRTIO_F_VERSION_1) ?
> +		  sizeof(struct virtio_net_hdr_mrg_rxbuf) :
> +		  sizeof(struct virtio_net_hdr);
> +
> +	if (virtio_features_test_bit(features, VHOST_NET_F_VIRTIO_NET_HDR)) {
>   		/* vhost provides vnet_hdr */
>   		vhost_hlen = hdr_len;
>   		sock_hlen = 0;
> @@ -1633,18 +1636,19 @@ static int vhost_net_set_features(struct vhost_net *n, u64 features)
>   		sock_hlen = hdr_len;
>   	}
>   	mutex_lock(&n->dev.mutex);
> -	if ((features & (1 << VHOST_F_LOG_ALL)) &&
> +	if (virtio_features_test_bit(features, VHOST_F_LOG_ALL) &&
>   	    !vhost_log_access_ok(&n->dev))
>   		goto out_unlock;
>   
> -	if ((features & (1ULL << VIRTIO_F_ACCESS_PLATFORM))) {
> +	if (virtio_features_test_bit(features, VIRTIO_F_ACCESS_PLATFORM)) {
>   		if (vhost_init_device_iotlb(&n->dev))
>   			goto out_unlock;
>   	}
>   
>   	for (i = 0; i < VHOST_NET_VQ_MAX; ++i) {
>   		mutex_lock(&n->vqs[i].vq.mutex);
> -		n->vqs[i].vq.acked_features = features;
> +		virtio_features_copy(n->vqs[i].vq.acked_features_array,
> +				     features);
>   		n->vqs[i].vhost_hlen = vhost_hlen;
>   		n->vqs[i].sock_hlen = sock_hlen;
>   		mutex_unlock(&n->vqs[i].vq.mutex);
> @@ -1681,12 +1685,13 @@ static long vhost_net_set_owner(struct vhost_net *n)
>   static long vhost_net_ioctl(struct file *f, unsigned int ioctl,
>   			    unsigned long arg)
>   {
> +	u64 all_features[VIRTIO_FEATURES_DWORDS];
>   	struct vhost_net *n = f->private_data;
>   	void __user *argp = (void __user *)arg;
>   	u64 __user *featurep = argp;
>   	struct vhost_vring_file backend;
> -	u64 features;
> -	int r;
> +	u64 features, count;
> +	int r, i;
>   
>   	switch (ioctl) {
>   	case VHOST_NET_SET_BACKEND:
> @@ -1703,7 +1708,63 @@ static long vhost_net_ioctl(struct file *f, unsigned int ioctl,
>   			return -EFAULT;
>   		if (features & ~VHOST_NET_FEATURES)
>   			return -EOPNOTSUPP;
> -		return vhost_net_set_features(n, features);
> +
> +		virtio_features_from_u64(all_features, features);
> +		return vhost_net_set_features(n, all_features);
> +	case VHOST_GET_FEATURES_ARRAY:
> +	{
> +		if (copy_from_user(&count, argp, sizeof(u64)))
> +			return -EFAULT;
> +
> +		/* Copy the net features, up to the user-provided buffer size */
> +		virtio_features_copy(all_features, VHOST_NET_ALL_FEATURES);
> +		argp += sizeof(u64);
> +		for (i = 0; i < min(count, VIRTIO_FEATURES_DWORDS); ++i) {
> +			i = array_index_nospec(i, VIRTIO_FEATURES_DWORDS);
> +			if (copy_to_user(argp, &all_features[i], sizeof(u64)))
> +				return -EFAULT;
> +
> +			argp += sizeof(u64);
> +		}

Simpler:

copy_to_user(argp, all_features, min(count, VIRTIO_FEATURES_DWORDS) * 
sizeof(u64));

> +
> +		/* Zero the trailing space provided by user-space, if any */
> +		if (i < count && clear_user(argp, (count - i) * sizeof(u64)))

I think checking i < count is a premature optimization; it doesn't
matter if we spend a bit longer without the check.

> +			return -EFAULT;
> +		return 0;
> +	}
> +	case VHOST_SET_FEATURES_ARRAY:
> +	{
> +		u64 tmp[VIRTIO_FEATURES_DWORDS];
> +
> +		if (copy_from_user(&count, argp, sizeof(u64)))
> +			return -EFAULT;
> +
> +		virtio_features_zero(all_features);
> +		for (i = 0; i < min(count, VIRTIO_FEATURES_DWORDS); ++i) {
> +			argp += sizeof(u64);
> +			if (copy_from_user(&features, argp, sizeof(u64)))
> +				return -EFAULT;
> +
> +			all_features[i] = features;
> +		}
> +
> +		/* Any feature specified by user-space above VIRTIO_FEATURES_MAX is
> +		 * not supported by definition.
> +		 */
> +		for (; i < count; ++i) {
> +			if (copy_from_user(&features, argp, sizeof(u64)))
> +				return -EFAULT;
> +			if (features)
> +				return -EOPNOTSUPP;
> +		}
> +
> +		virtio_features_and_not(tmp, all_features, VHOST_NET_ALL_FEATURES);
> +		for (i = 0; i < VIRTIO_FEATURES_DWORDS; i++)
> +			if (tmp[i])

I don't think using virtio_features_and_not() helps much here. Instead,
we can check all_features[i] & ~VHOST_NET_ALL_FEATURES[i], which allows
removing the tmp array.

Regards,
Akihiko Odaki

> +				return -EOPNOTSUPP;
> +
> +		return vhost_net_set_features(n, all_features);
> +	}
>   	case VHOST_GET_BACKEND_FEATURES:
>   		features = VHOST_NET_BACKEND_FEATURES;
>   		if (copy_to_user(featurep, &features, sizeof(features)))
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 63612faeab72..6d3b9f0a9163 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -372,7 +372,7 @@ static void vhost_vq_reset(struct vhost_dev *dev,
>   	vq->log_used = false;
>   	vq->log_addr = -1ull;
>   	vq->private_data = NULL;
> -	vq->acked_features = 0;
> +	virtio_features_zero(vq->acked_features_array);
>   	vq->acked_backend_features = 0;
>   	vq->log_base = NULL;
>   	vq->error_ctx = NULL;
> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> index bb75a292d50c..d1aed35c4b07 100644
> --- a/drivers/vhost/vhost.h
> +++ b/drivers/vhost/vhost.h
> @@ -133,7 +133,7 @@ struct vhost_virtqueue {
>   	struct vhost_iotlb *umem;
>   	struct vhost_iotlb *iotlb;
>   	void *private_data;
> -	u64 acked_features;
> +	VIRTIO_DECLARE_FEATURES(acked_features);
>   	u64 acked_backend_features;
>   	/* Log write descriptors */
>   	void __user *log_base;
> @@ -291,7 +291,7 @@ static inline void *vhost_vq_get_backend(struct vhost_virtqueue *vq)
>   
>   static inline bool vhost_has_feature(struct vhost_virtqueue *vq, int bit)
>   {
> -	return vq->acked_features & (1ULL << bit);
> +	return virtio_features_test_bit(vq->acked_features_array, bit);
>   }
>   
>   static inline bool vhost_backend_has_feature(struct vhost_virtqueue *vq, int bit)
> diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
> index d4b3e2ae1314..d6ad01fbb8d2 100644
> --- a/include/uapi/linux/vhost.h
> +++ b/include/uapi/linux/vhost.h
> @@ -235,4 +235,11 @@
>    */
>   #define VHOST_VDPA_GET_VRING_SIZE	_IOWR(VHOST_VIRTIO, 0x82,	\
>   					      struct vhost_vring_state)
> +
> +/* Extended features manipulation */
> +#define VHOST_GET_FEATURES_ARRAY _IOR(VHOST_VIRTIO, 0x83, \
> +				       struct vhost_features_array)
> +#define VHOST_SET_FEATURES_ARRAY _IOW(VHOST_VIRTIO, 0x83, \
> +				       struct vhost_features_array)
> +
>   #endif
> diff --git a/include/uapi/linux/vhost_types.h b/include/uapi/linux/vhost_types.h
> index d7656908f730..1c39cc5f5a31 100644
> --- a/include/uapi/linux/vhost_types.h
> +++ b/include/uapi/linux/vhost_types.h
> @@ -110,6 +110,11 @@ struct vhost_msg_v2 {
>   	};
>   };
>   
> +struct vhost_features_array {
> +	__u64 count; /* number of entries present in features array */
> +	__u64 features[] __counted_by(count);
> +};
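
As a usage illustration (not part of the patch): user space passes the
number of u64 slots in 'count' and the kernel fills or reads that many
entries; vhost_fd below is assumed to be an open /dev/vhost-net
descriptor, e.g. roughly:

	size_t len = sizeof(struct vhost_features_array) +
		     2 * sizeof(__u64);
	struct vhost_features_array *arr = calloc(1, len);

	arr->count = 2;	/* VIRTIO_FEATURES_DWORDS in the current kernel */
	if (ioctl(vhost_fd, VHOST_GET_FEATURES_ARRAY, arr) < 0)
		perror("VHOST_GET_FEATURES_ARRAY");
	/* arr->features[1] now holds feature bits 64..127 */
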
> +
>   struct vhost_memory_region {
>   	__u64 guest_phys_addr;
>   	__u64 memory_size; /* bytes */


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC v3 1/8] virtio: introduce extended features
  2025-06-06 11:45 ` [PATCH RFC v3 1/8] virtio: introduce extended features Paolo Abeni
  2025-06-08  5:49   ` Akihiko Odaki
@ 2025-06-12  0:50   ` Jason Wang
  2025-06-12  9:05     ` Paolo Abeni
  1 sibling, 1 reply; 37+ messages in thread
From: Jason Wang @ 2025-06-12  0:50 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, Willem de Bruijn, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez, Yuri Benditovich, Akihiko Odaki

On Fri, Jun 6, 2025 at 7:46 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> The virtio specifications allows for up to 128 bits for the
> device features. Soon we are going to use some of the 'extended'
> bits features (above 64) for the virtio_net driver.
>
> Introduce extended features as a fixed size array of u64. To minimize
> the diffstat allows legacy driver to access the low 64 bits via a
> transparent union.
>
> Introduce an extended get_extended_features128 configuration callback
> that devices supporting the extended features range must implement in
> place of the traditional one.
>
> Note that legacy and transport features don't need any change, as
> they are always in the low 64 bit range.
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> v2 -> v3:
>   - uint128_t -> u64[2];
> v1 -> v2:
>   - let u64 VIRTIO_BIT() cope with higher bit values
>   - add .get_features128 instead of changing .get_features signature
> ---
>  drivers/virtio/virtio.c         | 39 +++++++++++-------
>  drivers/virtio/virtio_debug.c   | 27 +++++++------
>  include/linux/virtio.h          |  5 ++-
>  include/linux/virtio_config.h   | 35 ++++++++++++-----
>  include/linux/virtio_features.h | 70 +++++++++++++++++++++++++++++++++
>  5 files changed, 137 insertions(+), 39 deletions(-)
>  create mode 100644 include/linux/virtio_features.h
>
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index 95d5d7993e5b..ed1ccedc47b3 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -53,7 +53,7 @@ static ssize_t features_show(struct device *_d,
>
>         /* We actually represent this as a bitstring, as it could be
>          * arbitrary length in future. */
> -       for (i = 0; i < sizeof(dev->features)*8; i++)
> +       for (i = 0; i < VIRTIO_FEATURES_MAX; i++)
>                 len += sysfs_emit_at(buf, len, "%c",
>                                __virtio_test_bit(dev, i) ? '1' : '0');
>         len += sysfs_emit_at(buf, len, "\n");
> @@ -272,22 +272,22 @@ static int virtio_dev_probe(struct device *_d)
>         int err, i;
>         struct virtio_device *dev = dev_to_virtio(_d);
>         struct virtio_driver *drv = drv_to_virtio(dev->dev.driver);
> -       u64 device_features;
> -       u64 driver_features;
> +       u64 device_features[VIRTIO_FEATURES_DWORDS];
> +       u64 driver_features[VIRTIO_FEATURES_DWORDS];
>         u64 driver_features_legacy;
>
>         /* We have a driver! */
>         virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER);
>
>         /* Figure out what features the device supports. */
> -       device_features = dev->config->get_features(dev);
> +       virtio_get_features(dev, device_features);
>
>         /* Figure out what features the driver supports. */
> -       driver_features = 0;
> +       virtio_features_zero(driver_features);
>         for (i = 0; i < drv->feature_table_size; i++) {
>                 unsigned int f = drv->feature_table[i];
> -               BUG_ON(f >= 64);
> -               driver_features |= (1ULL << f);
> +               BUG_ON(f >= VIRTIO_FEATURES_MAX);
> +               virtio_features_set_bit(driver_features, f);

Instead of doing BUG_ON here, could we just stop at 128 bits?
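
Something along these lines, i.e. ignoring bits the core cannot
represent instead of crashing (a sketch of the suggestion only):

	if (f >= VIRTIO_FEATURES_MAX)
		continue;
	virtio_features_set_bit(driver_features, f);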

>         }
>
>         /* Some drivers have a separate feature table for virtio v1.0 */
> @@ -299,20 +299,25 @@ static int virtio_dev_probe(struct device *_d)
>                         driver_features_legacy |= (1ULL << f);
>                 }
>         } else {
> -               driver_features_legacy = driver_features;
> +               driver_features_legacy = driver_features[0];
>         }
>
> -       if (device_features & (1ULL << VIRTIO_F_VERSION_1))
> -               dev->features = driver_features & device_features;
> -       else
> -               dev->features = driver_features_legacy & device_features;
> +       if (virtio_features_test_bit(device_features, VIRTIO_F_VERSION_1)) {
> +               for (i = 0; i < VIRTIO_FEATURES_DWORDS; ++i)
> +                       dev->features_array[i] = driver_features[i] &
> +                                                device_features[i];
> +       } else {
> +               virtio_features_from_u64(dev->features_array,
> +                                        driver_features_legacy &
> +                                        device_features[0]);
> +       }
>
>         /* When debugging, user may filter some features by hand. */
>         virtio_debug_device_filter_features(dev);
>
>         /* Transport features always preserved to pass to finalize_features. */
>         for (i = VIRTIO_TRANSPORT_F_START; i < VIRTIO_TRANSPORT_F_END; i++)
> -               if (device_features & (1ULL << i))
> +               if (virtio_features_test_bit(device_features, i))
>                         __virtio_set_bit(dev, i);
>
>         err = dev->config->finalize_features(dev);
> @@ -320,14 +325,15 @@ static int virtio_dev_probe(struct device *_d)
>                 goto err;
>
>         if (drv->validate) {
> -               u64 features = dev->features;
> +               u64 features[VIRTIO_FEATURES_DWORDS];
>
> +               virtio_features_copy(features, dev->features_array);
>                 err = drv->validate(dev);
>                 if (err)
>                         goto err;
>
>                 /* Did validation change any features? Then write them again. */
> -               if (features != dev->features) {
> +               if (!virtio_features_equal(features, dev->features_array)) {
>                         err = dev->config->finalize_features(dev);
>                         if (err)
>                                 goto err;
> @@ -701,6 +707,9 @@ EXPORT_SYMBOL_GPL(virtio_device_reset_done);
>
>  static int virtio_init(void)
>  {
> +       BUILD_BUG_ON(offsetof(struct virtio_device, features) !=
> +                    offsetof(struct virtio_device, features_array[0]));
> +
>         if (bus_register(&virtio_bus) != 0)
>                 panic("virtio bus registration failed");
>         virtio_debug_init();
> diff --git a/drivers/virtio/virtio_debug.c b/drivers/virtio/virtio_debug.c
> index 95c8fc7705bb..6d066b5e8ec0 100644
> --- a/drivers/virtio/virtio_debug.c
> +++ b/drivers/virtio/virtio_debug.c
> @@ -8,13 +8,13 @@ static struct dentry *virtio_debugfs_dir;
>
>  static int virtio_debug_device_features_show(struct seq_file *s, void *data)
>  {
> +       u64 device_features[VIRTIO_FEATURES_DWORDS];
>         struct virtio_device *dev = s->private;
> -       u64 device_features;
>         unsigned int i;
>
> -       device_features = dev->config->get_features(dev);
> -       for (i = 0; i < BITS_PER_LONG_LONG; i++) {
> -               if (device_features & (1ULL << i))
> +       virtio_get_features(dev, device_features);
> +       for (i = 0; i < VIRTIO_FEATURES_MAX; i++) {
> +               if (virtio_features_test_bit(device_features, i))
>                         seq_printf(s, "%u\n", i);
>         }
>         return 0;
> @@ -26,8 +26,8 @@ static int virtio_debug_filter_features_show(struct seq_file *s, void *data)
>         struct virtio_device *dev = s->private;
>         unsigned int i;
>
> -       for (i = 0; i < BITS_PER_LONG_LONG; i++) {
> -               if (dev->debugfs_filter_features & (1ULL << i))
> +       for (i = 0; i < VIRTIO_FEATURES_MAX; i++) {
> +               if (virtio_features_test_bit(dev->debugfs_filter_features, i))
>                         seq_printf(s, "%u\n", i);
>         }
>         return 0;
> @@ -39,7 +39,7 @@ static int virtio_debug_filter_features_clear(void *data, u64 val)
>         struct virtio_device *dev = data;
>
>         if (val == 1)
> -               dev->debugfs_filter_features = 0;
> +               virtio_features_zero(dev->debugfs_filter_features);
>         return 0;
>  }
>
> @@ -50,9 +50,10 @@ static int virtio_debug_filter_feature_add(void *data, u64 val)
>  {
>         struct virtio_device *dev = data;
>
> -       if (val >= BITS_PER_LONG_LONG)
> +       if (val >= VIRTIO_FEATURES_MAX)
>                 return -EINVAL;
> -       dev->debugfs_filter_features |= BIT_ULL_MASK(val);
> +
> +       virtio_features_set_bit(dev->debugfs_filter_features, val);
>         return 0;
>  }
>
> @@ -63,9 +64,10 @@ static int virtio_debug_filter_feature_del(void *data, u64 val)
>  {
>         struct virtio_device *dev = data;
>
> -       if (val >= BITS_PER_LONG_LONG)
> +       if (val >= VIRTIO_FEATURES_MAX)
>                 return -EINVAL;
> -       dev->debugfs_filter_features &= ~BIT_ULL_MASK(val);
> +
> +       virtio_features_clear_bit(dev->debugfs_filter_features, val);
>         return 0;
>  }
>
> @@ -91,7 +93,8 @@ EXPORT_SYMBOL_GPL(virtio_debug_device_init);
>
>  void virtio_debug_device_filter_features(struct virtio_device *dev)
>  {
> -       dev->features &= ~dev->debugfs_filter_features;
> +       virtio_features_and_not(dev->features_array, dev->features_array,
> +                               dev->debugfs_filter_features);
>  }
>  EXPORT_SYMBOL_GPL(virtio_debug_device_filter_features);
>
> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> index 64cb4b04be7a..dcd3949572bd 100644
> --- a/include/linux/virtio.h
> +++ b/include/linux/virtio.h
> @@ -11,6 +11,7 @@
>  #include <linux/gfp.h>
>  #include <linux/dma-mapping.h>
>  #include <linux/completion.h>
> +#include <linux/virtio_features.h>
>
>  /**
>   * struct virtqueue - a queue to register buffers for sending or receiving.
> @@ -159,11 +160,11 @@ struct virtio_device {
>         const struct virtio_config_ops *config;
>         const struct vringh_config_ops *vringh_config;
>         struct list_head vqs;
> -       u64 features;
> +       VIRTIO_DECLARE_FEATURES(features);
>         void *priv;
>  #ifdef CONFIG_VIRTIO_DEBUG
>         struct dentry *debugfs_dir;
> -       u64 debugfs_filter_features;
> +       u64 debugfs_filter_features[VIRTIO_FEATURES_DWORDS];
>  #endif
>  };
>
> diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> index 169c7d367fac..83cf25b3028d 100644
> --- a/include/linux/virtio_config.h
> +++ b/include/linux/virtio_config.h
> @@ -77,7 +77,10 @@ struct virtqueue_info {
>   *      vdev: the virtio_device
>   * @get_features: get the array of feature bits for this device.
>   *     vdev: the virtio_device
> - *     Returns the first 64 feature bits (all we currently need).
> + *     Returns the first 64 feature bits.
> + * @get_extended_features:
> + *      vdev: the virtio_device
> + *      Returns the first VIRTIO_FEATURES_MAX feature bits (all we currently need).
>   * @finalize_features: confirm what device features we'll be using.
>   *     vdev: the virtio_device
>   *     This sends the driver feature bits to the device: it can change
> @@ -121,6 +124,8 @@ struct virtio_config_ops {
>         void (*del_vqs)(struct virtio_device *);
>         void (*synchronize_cbs)(struct virtio_device *);
>         u64 (*get_features)(struct virtio_device *vdev);
> +       void (*get_extended_features)(struct virtio_device *vdev,
> +                                     u64 *features);

I think it would be better to add a size parameter to simplify future extensions.
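
E.g. something like (a sketch of the suggestion, not part of this series):

	void (*get_extended_features)(struct virtio_device *vdev,
				      u64 *features, unsigned int dwords);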

>         int (*finalize_features)(struct virtio_device *vdev);
>         const char *(*bus_name)(struct virtio_device *vdev);
>         int (*set_vq_affinity)(struct virtqueue *vq,
> @@ -149,11 +154,11 @@ static inline bool __virtio_test_bit(const struct virtio_device *vdev,
>  {
>         /* Did you forget to fix assumptions on max features? */
>         if (__builtin_constant_p(fbit))
> -               BUILD_BUG_ON(fbit >= 64);
> +               BUILD_BUG_ON(fbit >= VIRTIO_FEATURES_MAX);
>         else
> -               BUG_ON(fbit >= 64);
> +               BUG_ON(fbit >= VIRTIO_FEATURES_MAX);
>
> -       return vdev->features & BIT_ULL(fbit);
> +       return virtio_features_test_bit(vdev->features_array, fbit);
>  }
>
>  /**
> @@ -166,11 +171,11 @@ static inline void __virtio_set_bit(struct virtio_device *vdev,
>  {
>         /* Did you forget to fix assumptions on max features? */
>         if (__builtin_constant_p(fbit))
> -               BUILD_BUG_ON(fbit >= 64);
> +               BUILD_BUG_ON(fbit >= VIRTIO_FEATURES_MAX);
>         else
> -               BUG_ON(fbit >= 64);
> +               BUG_ON(fbit >= VIRTIO_FEATURES_MAX);
>
> -       vdev->features |= BIT_ULL(fbit);
> +       virtio_features_set_bit(vdev->features_array, fbit);
>  }
>
>  /**
> @@ -183,11 +188,11 @@ static inline void __virtio_clear_bit(struct virtio_device *vdev,
>  {
>         /* Did you forget to fix assumptions on max features? */
>         if (__builtin_constant_p(fbit))
> -               BUILD_BUG_ON(fbit >= 64);
> +               BUILD_BUG_ON(fbit >= VIRTIO_FEATURES_MAX);
>         else
> -               BUG_ON(fbit >= 64);
> +               BUG_ON(fbit >= VIRTIO_FEATURES_MAX);
>
> -       vdev->features &= ~BIT_ULL(fbit);
> +       virtio_features_clear_bit(vdev->features_array, fbit);
>  }
>
>  /**
> @@ -204,6 +209,16 @@ static inline bool virtio_has_feature(const struct virtio_device *vdev,
>         return __virtio_test_bit(vdev, fbit);
>  }
>
> +static inline void virtio_get_features(struct virtio_device *vdev, u64 *features)
> +{
> +       if (vdev->config->get_extended_features) {
> +               vdev->config->get_extended_features(vdev, features);
> +               return;
> +       }
> +
> +       virtio_features_from_u64(features, vdev->config->get_features(vdev));
> +}
> +
>  /**
>   * virtio_has_dma_quirk - determine whether this device has the DMA quirk
>   * @vdev: the device
> diff --git a/include/linux/virtio_features.h b/include/linux/virtio_features.h
> new file mode 100644
> index 000000000000..42c3c7cee500
> --- /dev/null
> +++ b/include/linux/virtio_features.h
> @@ -0,0 +1,70 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _LINUX_VIRTIO_FEATURES_H
> +#define _LINUX_VIRTIO_FEATURES_H
> +
> +#include <linux/bits.h>
> +
> +#define VIRTIO_FEATURES_DWORDS 2
> +#define VIRTIO_FEATURES_MAX    (VIRTIO_FEATURES_DWORDS * 64)
> +#define VIRTIO_FEATURES_WORDS  (VIRTIO_FEATURES_DWORDS * 2)
> +#define VIRTIO_BIT(b)          BIT_ULL((b) & 0x3f)
> +#define VIRTIO_DWORD(b)                ((b) >> 6)
> +#define VIRTIO_DECLARE_FEATURES(name)                  \
> +       union {                                         \
> +               u64 name;                               \
> +               u64 name##_array[VIRTIO_FEATURES_DWORDS];\
> +       }
> +
> +static inline bool virtio_features_test_bit(const u64 *features,
> +                                           unsigned int bit)
> +{
> +       return !!(features[VIRTIO_DWORD(bit)] & VIRTIO_BIT(bit));
> +}
> +
> +static inline void virtio_features_set_bit(u64 *features,
> +                                          unsigned int bit)
> +{
> +       features[VIRTIO_DWORD(bit)] |= VIRTIO_BIT(bit);
> +}
> +
> +static inline void virtio_features_clear_bit(u64 *features,
> +                                            unsigned int bit)
> +{
> +       features[VIRTIO_DWORD(bit)] &= ~VIRTIO_BIT(bit);
> +}
> +
> +static inline void virtio_features_zero(u64 *features)
> +{
> +       memset(features, 0, sizeof(features[0]) * VIRTIO_FEATURES_DWORDS);
> +}
> +
> +static inline void virtio_features_from_u64(u64 *features, u64 from)
> +{
> +       virtio_features_zero(features);
> +       features[0] = from;
> +}
> +
> +static inline bool virtio_features_equal(const u64 *f1, const u64 *f2)
> +{
> +       u64 diff = 0;
> +       int i;
> +
> +       for (i = 0; i < VIRTIO_FEATURES_DWORDS; ++i)
> +               diff |= f1[i] ^ f2[i];
> +       return !!diff;

Nit: we can return false early here.
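
Something like this (untested) would also make the helper return true when
the two feature arrays actually match:

        static inline bool virtio_features_equal(const u64 *f1, const u64 *f2)
        {
                int i;

                for (i = 0; i < VIRTIO_FEATURES_DWORDS; ++i)
                        if (f1[i] != f2[i])
                                return false;
                return true;
        }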

> +}
> +
> +static inline void virtio_features_copy(u64 *to, const u64 *from)
> +{
> +       memcpy(to, from, sizeof(to[0]) * VIRTIO_FEATURES_DWORDS);
> +}
> +
> +static inline void virtio_features_and_not(u64 *to, const u64 *f1, const u64 *f2)
> +{
> +       int i;
> +
> +       for (i = 0; i < VIRTIO_FEATURES_DWORDS; i++)
> +               to[i] = f1[i] & ~f2[i];
> +}
> +
> +#endif
> --
> 2.49.0

Others look good to me.

Thanks

>


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC v3 2/8] virtio_pci_modern: allow configuring extended features
  2025-06-06 11:45 ` [PATCH RFC v3 2/8] virtio_pci_modern: allow configuring " Paolo Abeni
@ 2025-06-12  0:57   ` Jason Wang
  2025-06-12  9:34     ` Paolo Abeni
  0 siblings, 1 reply; 37+ messages in thread
From: Jason Wang @ 2025-06-12  0:57 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, Willem de Bruijn, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez, Yuri Benditovich, Akihiko Odaki

On Fri, Jun 6, 2025 at 7:46 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> The virtio specification allows for up to 128 bits for the
> device features. Soon we are going to use some of the 'extended'
> feature bits (above 64) for the virtio_net driver.
>
> Extend the virtio pci modern driver to support configuring the full
> virtio feature range, replacing the unrolled reads and writes of the
> feature space with explicit loops bounded to the actual feature space
> size in words, and implementing the get_extended_features callback.
>
> Note that in vp_finalize_features() we only need to cache the lower
> 64 feature bits to process the transport features.
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> v2 -> v3:
>   - virtio_features_t -> u64 *
>
> v1 -> v2:
>   - use get_extended_features
> ---
>  drivers/virtio/virtio_pci_modern.c     | 10 ++--
>  drivers/virtio/virtio_pci_modern_dev.c | 69 +++++++++++++++-----------
>  include/linux/virtio_pci_modern.h      | 46 +++++++++++++++--
>  3 files changed, 87 insertions(+), 38 deletions(-)
>
> diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
> index d50fe030d825..255203441201 100644
> --- a/drivers/virtio/virtio_pci_modern.c
> +++ b/drivers/virtio/virtio_pci_modern.c
> @@ -22,11 +22,11 @@
>
>  #define VIRTIO_AVQ_SGS_MAX     4
>
> -static u64 vp_get_features(struct virtio_device *vdev)
> +static void vp_get_features(struct virtio_device *vdev, u64 *features)
>  {
>         struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>
> -       return vp_modern_get_features(&vp_dev->mdev);
> +       vp_modern_get_extended_features(&vp_dev->mdev, features);
>  }
>
>  static int vp_avq_index(struct virtio_device *vdev, u16 *index, u16 *num)
> @@ -426,7 +426,7 @@ static int vp_finalize_features(struct virtio_device *vdev)
>         if (vp_check_common_size(vdev))
>                 return -EINVAL;
>
> -       vp_modern_set_features(&vp_dev->mdev, vdev->features);
> +       vp_modern_set_extended_features(&vp_dev->mdev, vdev->features_array);
>
>         return 0;
>  }
> @@ -1223,7 +1223,7 @@ static const struct virtio_config_ops virtio_pci_config_nodev_ops = {
>         .find_vqs       = vp_modern_find_vqs,
>         .del_vqs        = vp_del_vqs,
>         .synchronize_cbs = vp_synchronize_vectors,
> -       .get_features   = vp_get_features,
> +       .get_extended_features = vp_get_features,
>         .finalize_features = vp_finalize_features,
>         .bus_name       = vp_bus_name,
>         .set_vq_affinity = vp_set_vq_affinity,
> @@ -1243,7 +1243,7 @@ static const struct virtio_config_ops virtio_pci_config_ops = {
>         .find_vqs       = vp_modern_find_vqs,
>         .del_vqs        = vp_del_vqs,
>         .synchronize_cbs = vp_synchronize_vectors,
> -       .get_features   = vp_get_features,
> +       .get_extended_features = vp_get_features,
>         .finalize_features = vp_finalize_features,
>         .bus_name       = vp_bus_name,
>         .set_vq_affinity = vp_set_vq_affinity,
> diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
> index 0d3dbfaf4b23..d665f8f73ea8 100644
> --- a/drivers/virtio/virtio_pci_modern_dev.c
> +++ b/drivers/virtio/virtio_pci_modern_dev.c
> @@ -388,63 +388,74 @@ void vp_modern_remove(struct virtio_pci_modern_device *mdev)
>  EXPORT_SYMBOL_GPL(vp_modern_remove);
>
>  /*
> - * vp_modern_get_features - get features from device
> + * vp_modern_get_extended_features - get features from device
>   * @mdev: the modern virtio-pci device
> + * @features: the features array to be filled
>   *
> - * Returns the features read from the device
> + * Fill the specified features array with the features read from the device
>   */
> -u64 vp_modern_get_features(struct virtio_pci_modern_device *mdev)
> +void vp_modern_get_extended_features(struct virtio_pci_modern_device *mdev,
> +                                    u64 *features)
>  {
>         struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
> +       int i;
>
> -       u64 features;
> +       virtio_features_zero(features);
> +       for (i = 0; i < VIRTIO_FEATURES_WORDS; i++) {
> +               u64 cur;
>
> -       vp_iowrite32(0, &cfg->device_feature_select);
> -       features = vp_ioread32(&cfg->device_feature);
> -       vp_iowrite32(1, &cfg->device_feature_select);
> -       features |= ((u64)vp_ioread32(&cfg->device_feature) << 32);
> -
> -       return features;
> +               vp_iowrite32(i, &cfg->device_feature_select);
> +               cur = vp_ioread32(&cfg->device_feature);
> +               features[i >> 1] |= cur << (32 * (i & 1));
> +       }
>  }
> -EXPORT_SYMBOL_GPL(vp_modern_get_features);
> +EXPORT_SYMBOL_GPL(vp_modern_get_extended_features);
>
>  /*
>   * vp_modern_get_driver_features - get driver features from device
>   * @mdev: the modern virtio-pci device
> + * @features: the features array to be filled
>   *
> - * Returns the driver features read from the device
> + * Fill the specified features array with the driver features read from the
> + * device
>   */
> -u64 vp_modern_get_driver_features(struct virtio_pci_modern_device *mdev)
> +void
> +vp_modern_get_driver_extended_features(struct virtio_pci_modern_device *mdev,
> +                                      u64 *features)
>  {
>         struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
> +       int i;
>
> -       u64 features;
> -
> -       vp_iowrite32(0, &cfg->guest_feature_select);
> -       features = vp_ioread32(&cfg->guest_feature);
> -       vp_iowrite32(1, &cfg->guest_feature_select);
> -       features |= ((u64)vp_ioread32(&cfg->guest_feature) << 32);
> +       virtio_features_zero(features);
> +       for (i = 0; i < VIRTIO_FEATURES_WORDS; i++) {
> +               u64 cur;
>
> -       return features;
> +               vp_iowrite32(i, &cfg->guest_feature_select);
> +               cur = vp_ioread32(&cfg->guest_feature);
> +               features[i >> 1] |= cur << (32 * (i & 1));
> +       }
>  }
> -EXPORT_SYMBOL_GPL(vp_modern_get_driver_features);
> +EXPORT_SYMBOL_GPL(vp_modern_get_driver_extended_features);
>
>  /*
> - * vp_modern_set_features - set features to device
> + * vp_modern_set_extended_features - set features to device
>   * @mdev: the modern virtio-pci device
>   * @features: the features set to device
>   */
> -void vp_modern_set_features(struct virtio_pci_modern_device *mdev,
> -                           u64 features)
> +void vp_modern_set_extended_features(struct virtio_pci_modern_device *mdev,
> +                                    const u64 *features)
>  {
>         struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
> +       int i;
> +
> +       for (i = 0; i < VIRTIO_FEATURES_WORDS; i++) {
> +               u32 cur = features[i >> 1] >> (32 * (i & 1));
>
> -       vp_iowrite32(0, &cfg->guest_feature_select);
> -       vp_iowrite32((u32)features, &cfg->guest_feature);
> -       vp_iowrite32(1, &cfg->guest_feature_select);
> -       vp_iowrite32(features >> 32, &cfg->guest_feature);
> +               vp_iowrite32(i, &cfg->guest_feature_select);
> +               vp_iowrite32(cur, &cfg->guest_feature);
> +       }
>  }
> -EXPORT_SYMBOL_GPL(vp_modern_set_features);
> +EXPORT_SYMBOL_GPL(vp_modern_set_extended_features);
>
>  /*
>   * vp_modern_generation - get the device genreation
> diff --git a/include/linux/virtio_pci_modern.h b/include/linux/virtio_pci_modern.h
> index c0b1b1ca1163..0764802a50ea 100644
> --- a/include/linux/virtio_pci_modern.h
> +++ b/include/linux/virtio_pci_modern.h
> @@ -3,6 +3,7 @@
>  #define _LINUX_VIRTIO_PCI_MODERN_H
>
>  #include <linux/pci.h>
> +#include <linux/virtio_config.h>
>  #include <linux/virtio_pci.h>
>
>  /**
> @@ -95,10 +96,47 @@ static inline void vp_iowrite64_twopart(u64 val,
>         vp_iowrite32(val >> 32, hi);
>  }
>
> -u64 vp_modern_get_features(struct virtio_pci_modern_device *mdev);
> -u64 vp_modern_get_driver_features(struct virtio_pci_modern_device *mdev);
> -void vp_modern_set_features(struct virtio_pci_modern_device *mdev,
> -                    u64 features);
> +void
> +vp_modern_get_driver_extended_features(struct virtio_pci_modern_device *mdev,
> +                                      u64 *features);
> +void vp_modern_get_extended_features(struct virtio_pci_modern_device *mdev,
> +                                    u64 *features);
> +void vp_modern_set_extended_features(struct virtio_pci_modern_device *mdev,
> +                                    const u64 *features);
> +
> +static inline u64
> +vp_modern_get_features(struct virtio_pci_modern_device *mdev)
> +{
> +       u64 features_array[VIRTIO_FEATURES_DWORDS];
> +       int i;
> +
> +       vp_modern_get_extended_features(mdev, features_array);
> +       for (i = 1; i < VIRTIO_FEATURES_DWORDS; ++i)
> +               WARN_ON_ONCE(features_array[i]);

It looks to me that it's sufficient and safe to just return
features_array[0] here. Or maybe we need a comment to explain why the
WARN is needed.
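
I.e. the first option would be roughly (untested sketch):

        static inline u64
        vp_modern_get_features(struct virtio_pci_modern_device *mdev)
        {
                u64 features_array[VIRTIO_FEATURES_DWORDS];

                vp_modern_get_extended_features(mdev, features_array);

                /* legacy helper: callers only care about the low 64 bits */
                return features_array[0];
        }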

Thanks


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC v3 3/8] vhost-net: allow configuring extended features
  2025-06-06 11:45 ` [PATCH RFC v3 3/8] vhost-net: " Paolo Abeni
  2025-06-08  6:16   ` Akihiko Odaki
@ 2025-06-12  1:31   ` Jason Wang
  1 sibling, 0 replies; 37+ messages in thread
From: Jason Wang @ 2025-06-12  1:31 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, Willem de Bruijn, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez, Yuri Benditovich, Akihiko Odaki

On Fri, Jun 6, 2025 at 7:46 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> Use the extended feature type for 'acked_features' and implement
> two new ioctl operations allowing user space to set/query an
> unbounded number of features.
>
> The actual number of processed features is limited by VIRTIO_FEATURES_MAX
> and attempts to set features above such limit fail with
> EOPNOTSUPP.
>
> Note that the legacy ioctls implicitly truncate the negotiated
> features to the lower 64-bit range, and the 'acked_backend_features'
> field doesn't need conversion, as the only feature negotiated there
> is in the low 64-bit range.
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---

Acked-by: Jason Wang <jasowang@redhat.com>

Thanks


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC v3 5/8] net: implement virtio helpers to handle UDP GSO tunneling.
  2025-06-06 11:45 ` [PATCH RFC v3 5/8] net: implement virtio helpers to handle UDP GSO tunneling Paolo Abeni
@ 2025-06-12  3:53   ` Jason Wang
  2025-06-12 10:10     ` Paolo Abeni
  0 siblings, 1 reply; 37+ messages in thread
From: Jason Wang @ 2025-06-12  3:53 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, Willem de Bruijn, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez, Yuri Benditovich, Akihiko Odaki

On Fri, Jun 6, 2025 at 7:46 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> The virtio specification is introducing support for GSO over
> UDP tunnel.
>
> This patch brings in the needed defines and the additional
> virtio hdr parsing/building helpers.
>
> The UDP tunnel support uses additional fields in the virtio hdr,
> and such fields' location can change depending on other negotiated
> features - specifically VIRTIO_NET_F_HASH_REPORT.
>
> Try to be as conservative as possible with the new field validation.
>
> The existing implementation for plain GSO offloads allows for invalid/
> self-contradictory values of such fields. With GSO over UDP tunnel we can
> be more strict, with no need to deal with legacy implementations.
>
> Since the checksum-related field validation is asymmetric in the driver
> and in the device, introduce a separate helper to implement the new checks
> (to be used only on the driver side).
>
> Note that while the feature space exceeds the 64-bit boundary, the
> guest offload space is fixed to 64 bits by the specification of the
> VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET command.
>
> Prior to the UDP tunnel GSO support, each guest offload bit corresponded
> to the feature bit with the same value and vice versa.
>
> Due to the limited 'guest offload' space, relevant features in the high
> 64 bits are 'mapped' to free bits in the lower range. That is simpler
> than defining a new command (and associated features) to exchange an
> extended guest offloads set.
>
> As a consequence, the uAPIs also specify the mapped guest offload value
> corresponding to the UDP tunnel GSO features.
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> --
> v2 -> v3:
>   - add definitions for possible vnet hdr layouts with tunnel support
>
> v1 -> v2:
>   - 'relay' -> 'rely' typo
>   - less unclear comment WRT enforced inner GSO checks
>   - inner header fields are allowed only with 'modern' virtio,
>     thus are always le
>   - clarified in the commit message the need for 'mapped features'
>     defines
>   - assume little_endian is true when UDP GSO is enabled.
>   - fix inner proto type value
> ---
>  include/linux/virtio_net.h      | 191 ++++++++++++++++++++++++++++++--
>  include/uapi/linux/virtio_net.h |  43 +++++++
>  2 files changed, 226 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
> index 02a9f4dc594d..bcf80534a739 100644
> --- a/include/linux/virtio_net.h
> +++ b/include/linux/virtio_net.h
> @@ -47,9 +47,9 @@ static inline int virtio_net_hdr_set_proto(struct sk_buff *skb,
>         return 0;
>  }
>
> -static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
> -                                       const struct virtio_net_hdr *hdr,
> -                                       bool little_endian)
> +static inline int __virtio_net_hdr_to_skb(struct sk_buff *skb,
> +                                         const struct virtio_net_hdr *hdr,
> +                                         bool little_endian, u8 hdr_gso_type)
>  {
>         unsigned int nh_min_len = sizeof(struct iphdr);
>         unsigned int gso_type = 0;
> @@ -57,8 +57,8 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
>         unsigned int p_off = 0;
>         unsigned int ip_proto;
>
> -       if (hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) {
> -               switch (hdr->gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
> +       if (hdr_gso_type != VIRTIO_NET_HDR_GSO_NONE) {
> +               switch (hdr_gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
>                 case VIRTIO_NET_HDR_GSO_TCPV4:
>                         gso_type = SKB_GSO_TCPV4;
>                         ip_proto = IPPROTO_TCP;
> @@ -84,7 +84,7 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
>                         return -EINVAL;
>                 }
>
> -               if (hdr->gso_type & VIRTIO_NET_HDR_GSO_ECN)
> +               if (hdr_gso_type & VIRTIO_NET_HDR_GSO_ECN)
>                         gso_type |= SKB_GSO_TCP_ECN;
>
>                 if (hdr->gso_size == 0)
> @@ -122,7 +122,7 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
>
>                                 if (!protocol)
>                                         virtio_net_hdr_set_proto(skb, hdr);
> -                               else if (!virtio_net_hdr_match_proto(protocol, hdr->gso_type))
> +                               else if (!virtio_net_hdr_match_proto(protocol, hdr_gso_type))
>                                         return -EINVAL;
>                                 else
>                                         skb->protocol = protocol;
> @@ -153,7 +153,7 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
>                 }
>         }
>
> -       if (hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) {
> +       if (hdr_gso_type != VIRTIO_NET_HDR_GSO_NONE) {
>                 u16 gso_size = __virtio16_to_cpu(little_endian, hdr->gso_size);
>                 unsigned int nh_off = p_off;
>                 struct skb_shared_info *shinfo = skb_shinfo(skb);
> @@ -199,6 +199,13 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
>         return 0;
>  }
>
> +static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
> +                                       const struct virtio_net_hdr *hdr,
> +                                       bool little_endian)
> +{
> +       return __virtio_net_hdr_to_skb(skb, hdr, little_endian, hdr->gso_type);
> +}
> +
>  static inline int virtio_net_hdr_from_skb(const struct sk_buff *skb,
>                                           struct virtio_net_hdr *hdr,
>                                           bool little_endian,
> @@ -242,4 +249,172 @@ static inline int virtio_net_hdr_from_skb(const struct sk_buff *skb,
>         return 0;
>  }
>
> +static inline unsigned int virtio_l3min(bool is_ipv6)
> +{
> +       return is_ipv6 ? sizeof(struct ipv6hdr) : sizeof(struct iphdr);
> +}
> +
> +static inline int virtio_net_hdr_tnl_to_skb(struct sk_buff *skb,
> +                                           const struct virtio_net_hdr *hdr,
> +                                           unsigned int tnl_hdr_offset,
> +                                           bool tnl_csum_negotiated,
> +                                           bool little_endian)
> +{
> +       u8 gso_tunnel_type = hdr->gso_type & VIRTIO_NET_HDR_GSO_UDP_TUNNEL;
> +       unsigned int inner_nh, outer_th, inner_th;
> +       unsigned int inner_l3min, outer_l3min;
> +       struct virtio_net_hdr_tunnel *tnl;
> +       bool outer_isv6, inner_isv6;
> +       u8 gso_inner_type;
> +       int ret;
> +
> +       if (!gso_tunnel_type)
> +               return virtio_net_hdr_to_skb(skb, hdr, little_endian);
> +
> +       /* Tunnel not supported/negotiated, but the hdr asks for it. */
> +       if (!tnl_hdr_offset)
> +               return -EINVAL;
> +
> +       /* Either ipv4 or ipv6. */
> +       if (gso_tunnel_type == VIRTIO_NET_HDR_GSO_UDP_TUNNEL)
> +               return -EINVAL;
> +
> +       /* The UDP tunnel must carry a GSO packet, but no UFO. */
> +       gso_inner_type = hdr->gso_type & ~(VIRTIO_NET_HDR_GSO_ECN |
> +                                          VIRTIO_NET_HDR_GSO_UDP_TUNNEL);
> +       if (!gso_inner_type || gso_inner_type == VIRTIO_NET_HDR_GSO_UDP)
> +               return -EINVAL;
> +
> +       /* Rely on csum being present. */
> +       if (!(hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM))
> +               return -EINVAL;
> +
> +       /* Validate offsets. */
> +       outer_isv6 = gso_tunnel_type & VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6;
> +       inner_isv6 = gso_inner_type == VIRTIO_NET_HDR_GSO_TCPV6;
> +       inner_l3min = virtio_l3min(inner_isv6);
> +       outer_l3min = ETH_HLEN + virtio_l3min(outer_isv6);
> +
> +       tnl = ((void *)hdr) + tnl_hdr_offset;
> +       inner_th = __virtio16_to_cpu(little_endian, hdr->csum_start);
> +       inner_nh = le16_to_cpu(tnl->inner_nh_offset);
> +       outer_th = le16_to_cpu(tnl->outer_th_offset);
> +       if (outer_th < outer_l3min ||
> +           inner_nh < outer_th + sizeof(struct udphdr) ||
> +           inner_th < inner_nh + inner_l3min)
> +               return -EINVAL;
> +
> +       /* Let the basic parsing deal with plain GSO features. */
> +       ret = __virtio_net_hdr_to_skb(skb, hdr, true,
> +                                     hdr->gso_type & ~gso_tunnel_type);
> +       if (ret)
> +               return ret;
> +
> +       /* In case of USO, the inner protocol is still unknown and
> +        * `inner_isv6` is just a guess, additional parsing is needed.
> +        * The previous validation ensures that accessing an ipv4 inner
> +        * network header is safe.
> +        */
> +       if (gso_inner_type == VIRTIO_NET_HDR_GSO_UDP_L4) {
> +               struct iphdr *iphdr = (struct iphdr *)(skb->data + inner_nh);
> +
> +               inner_isv6 = iphdr->version == 6;
> +               inner_l3min = virtio_l3min(inner_isv6);
> +               if (inner_th < inner_nh + inner_l3min)
> +                       return -EINVAL;
> +       }
> +
> +       skb_set_inner_protocol(skb, inner_isv6 ? htons(ETH_P_IPV6) :
> +                                                htons(ETH_P_IP));
> +       if (hdr->flags & VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM) {
> +               if (!tnl_csum_negotiated)
> +                       return -EINVAL;
> +
> +               skb_shinfo(skb)->gso_type |= SKB_GSO_UDP_TUNNEL_CSUM;
> +       } else {
> +               skb_shinfo(skb)->gso_type |= SKB_GSO_UDP_TUNNEL;
> +       }
> +
> +       skb->inner_transport_header = inner_th + skb_headroom(skb);
> +       skb->inner_network_header = inner_nh + skb_headroom(skb);
> +       skb->inner_mac_header = inner_nh + skb_headroom(skb);
> +       skb->transport_header = outer_th + skb_headroom(skb);
> +       skb->encapsulation = 1;
> +       return 0;
> +}
> +
> +/* Checksum-related fields validation for the driver */
> +static inline int virtio_net_chk_data_valid(struct sk_buff *skb,
> +                                           struct virtio_net_hdr *hdr,
> +                                           bool tnl_csum_negotiated)
> +{
> +       if (!(hdr->gso_type & VIRTIO_NET_HDR_GSO_UDP_TUNNEL)) {
> +               if (!(hdr->flags & VIRTIO_NET_HDR_F_DATA_VALID))
> +                       return 0;
> +
> +               skb->ip_summed = CHECKSUM_UNNECESSARY;
> +               if (!(hdr->flags & VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM))
> +                       return 0;
> +
> +               /* tunnel csum packets are invalid when the related
> +                * feature has not been negotiated
> +                */
> +               if (!tnl_csum_negotiated)
> +                       return -EINVAL;
> +               skb->csum_level = 1;
> +               return 0;
> +       }
> +
> +       /* DATA_VALID is mutually exclusive with NEEDS_CSUM, and GSO
> +        * over UDP tunnel requires the latter
> +        */
> +       if (hdr->flags & VIRTIO_NET_HDR_F_DATA_VALID)
> +               return -EINVAL;
> +       return 0;
> +}
> +
> +static inline int virtio_net_hdr_tnl_from_skb(const struct sk_buff *skb,
> +                                             struct virtio_net_hdr *hdr,
> +                                             unsigned int tnl_offset,
> +                                             bool little_endian,
> +                                             int vlan_hlen)
> +{
> +       struct virtio_net_hdr_tunnel *tnl;
> +       unsigned int inner_nh, outer_th;
> +       int tnl_gso_type;
> +       int ret;
> +
> +       tnl_gso_type = skb_shinfo(skb)->gso_type & (SKB_GSO_UDP_TUNNEL |
> +                                                   SKB_GSO_UDP_TUNNEL_CSUM);
> +       if (!tnl_gso_type)
> +               return virtio_net_hdr_from_skb(skb, hdr, little_endian, false,
> +                                              vlan_hlen);

So tun_vnet_hdr_from_skb() has

        int vlan_hlen = skb_vlan_tag_present(skb) ? VLAN_HLEN : 0;
        int tnl_offset = tun_vnet_tnl_offset(flags);

        if (virtio_net_hdr_tnl_from_skb(skb, hdr, tnl_offset,
                                        tun_vnet_is_little_endian(flags),
                                        vlan_hlen)) {


It looks like the outer vlan_hlen is used for the inner here?

> +
> +       /* Tunnel support not negotiated but skb ask for it. */
> +       if (!tnl_offset)
> +               return -EINVAL;
> +
> +       /* Let the basic parsing deal with plain GSO features. */
> +       skb_shinfo(skb)->gso_type &= ~tnl_gso_type;
> +       ret = virtio_net_hdr_from_skb(skb, hdr, true, false, vlan_hlen);
> +       skb_shinfo(skb)->gso_type |= tnl_gso_type;
> +       if (ret)
> +               return ret;
> +
> +       if (skb->protocol == htons(ETH_P_IPV6))
> +               hdr->gso_type |= VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6;
> +       else
> +               hdr->gso_type |= VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4;
> +
> +       if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_TUNNEL_CSUM)
> +               hdr->flags |= VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM;
> +
> +       tnl = ((void *)hdr) + tnl_offset;
> +       inner_nh = skb->inner_network_header - skb_headroom(skb);
> +       outer_th = skb->transport_header - skb_headroom(skb);
> +       tnl->inner_nh_offset = cpu_to_le16(inner_nh);
> +       tnl->outer_th_offset = cpu_to_le16(outer_th);
> +       return 0;
> +}
> +
>  #endif /* _LINUX_VIRTIO_NET_H */
> diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
> index 963540deae66..313761be99b2 100644
> --- a/include/uapi/linux/virtio_net.h
> +++ b/include/uapi/linux/virtio_net.h
> @@ -70,6 +70,28 @@
>                                          * with the same MAC.
>                                          */
>  #define VIRTIO_NET_F_SPEED_DUPLEX 63   /* Device set linkspeed and duplex */
> +#define VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO 65 /* Driver can receive
> +                                             * GSO-over-UDP-tunnel packets
> +                                             */
> +#define VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM 66 /* Driver handles
> +                                                  * GSO-over-UDP-tunnel
> +                                                  * packets with partial csum
> +                                                  * for the outer header
> +                                                  */
> +#define VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO 67 /* Device can receive
> +                                            * GSO-over-UDP-tunnel packets
> +                                            */
> +#define VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO_CSUM 68 /* Device handles
> +                                                 * GSO-over-UDP-tunnel
> +                                                 * packets with partial csum
> +                                                 * for the outer header
> +                                                 */
> +
> +/* Offloads bits corresponding to VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO{,_CSUM}
> + * features
> + */
> +#define VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_MAPPED       46
> +#define VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM_MAPPED  47
>
>  #ifndef VIRTIO_NET_NO_LEGACY
>  #define VIRTIO_NET_F_GSO       6       /* Host handles pkts w/ any GSO type */
> @@ -131,12 +153,17 @@ struct virtio_net_hdr_v1 {
>  #define VIRTIO_NET_HDR_F_NEEDS_CSUM    1       /* Use csum_start, csum_offset */
>  #define VIRTIO_NET_HDR_F_DATA_VALID    2       /* Csum is valid */
>  #define VIRTIO_NET_HDR_F_RSC_INFO      4       /* rsc info in csum_ fields */
> +#define VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM 8     /* UDP tunnel requires csum offload */
>         __u8 flags;
>  #define VIRTIO_NET_HDR_GSO_NONE                0       /* Not a GSO frame */
>  #define VIRTIO_NET_HDR_GSO_TCPV4       1       /* GSO frame, IPv4 TCP (TSO) */
>  #define VIRTIO_NET_HDR_GSO_UDP         3       /* GSO frame, IPv4 UDP (UFO) */
>  #define VIRTIO_NET_HDR_GSO_TCPV6       4       /* GSO frame, IPv6 TCP */
>  #define VIRTIO_NET_HDR_GSO_UDP_L4      5       /* GSO frame, IPv4& IPv6 UDP (USO) */
> +#define VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 0x20 /* UDP over IPv4 tunnel present */
> +#define VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 0x40 /* UDP over IPv6 tunnel present */
> +#define VIRTIO_NET_HDR_GSO_UDP_TUNNEL (VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 | \
> +                                      VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6)
>  #define VIRTIO_NET_HDR_GSO_ECN         0x80    /* TCP has ECN set */
>         __u8 gso_type;
>         __virtio16 hdr_len;     /* Ethernet + IP + tcp/udp hdrs */
> @@ -181,6 +208,22 @@ struct virtio_net_hdr_v1_hash {
>         __le16 padding;
>  };
>
> +/* This header after hashing information */
> +struct virtio_net_hdr_tunnel {
> +       __le16 outer_th_offset;
> +       __le16 inner_nh_offset;
> +};
> +
> +struct virtio_net_hdr_v1_tunnel {
> +       struct virtio_net_hdr_v1 hdr;
> +       struct virtio_net_hdr_tunnel tnl;
> +};
> +
> +struct virtio_net_hdr_v1_hash_tunnel {
> +       struct virtio_net_hdr_v1_hash hdr;
> +       struct virtio_net_hdr_tunnel tnl;
> +};

Not a native speaker but I realize there's probably an issue:

        le32 hash_value;        (Only if VIRTIO_NET_F_HASH_REPORT negotiated)
        le16 hash_report;       (Only if VIRTIO_NET_F_HASH_REPORT negotiated)
        le16 padding_reserved;  (Only if VIRTIO_NET_F_HASH_REPORT negotiated)
        le16 outer_th_offset    (Only if
VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO or VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO
negotiated)
        le16 inner_nh_offset;   (Only if
VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO or VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO
negotiated)
        le16 outer_nh_offset;   /* Only if VIRTIO_NET_F_OUT_NET_HEADER
negotiated */
        /* Only if VIRTIO_NET_F_OUT_NET_HEADER or VIRTIO_NET_F_IPSEC
negotiated */
        union {
                u8 padding_reserved_2[6];
                struct ipsec_resource_hdr {
                        le32 resource_id;
                        le16 resource_type;
                } ipsec_resource_hdr;
        };

I thought e.g. outer_th_offset would have a fixed offset, which would
simplify everything, but that does not look like the case here. If we
decide to do things like this, we will end up with a very large uAPI
definition for the different feature combinations. This doesn't follow
the existing headers; for example, num_buffers exists no matter whether
MRG_RXBUF is negotiated.

At least, if we decide to go with the dynamic offset, it seems less
valuable to define those headers for the different combinations if both
device and driver process the vnet header piecewise.
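
FWIW, by "piecewise" I mean something roughly like this (sketch only,
reusing the offset logic this series applies in the virtio_net patch):

        unsigned int tnl_off = sizeof(struct virtio_net_hdr_v1);

        if (virtio_has_feature(vdev, VIRTIO_NET_F_HASH_REPORT))
                tnl_off = sizeof(struct virtio_net_hdr_v1_hash);
        /* the tunnel fields, when negotiated, start at tnl_off */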

> +
>  #ifndef VIRTIO_NET_NO_LEGACY
>  /* This header comes first in the scatter-gather list.
>   * For legacy virtio, if VIRTIO_F_ANY_LAYOUT is not negotiated, it must
> --
> 2.49.0
>

Thanks


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC v3 6/8] virtio_net: enable gso over UDP tunnel support.
  2025-06-06 11:45 ` [PATCH RFC v3 6/8] virtio_net: enable gso over UDP tunnel support Paolo Abeni
@ 2025-06-12  4:05   ` Jason Wang
  2025-06-12 10:17     ` Paolo Abeni
  0 siblings, 1 reply; 37+ messages in thread
From: Jason Wang @ 2025-06-12  4:05 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, Willem de Bruijn, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez, Yuri Benditovich, Akihiko Odaki

On Fri, Jun 6, 2025 at 7:46 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> If the related virtio feature is set, enable transmission and reception
> of gso over UDP tunnel packets.
>
> Most of the work is done by the previously introduced helpers; we just
> need to determine the UDP tunnel features inside the virtio_net_hdr and
> update the virtio net hdr size accordingly.
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> v2 -> v3:
>   - drop the VIRTIO_HAS_EXTENDED_FEATURES conditionals
>
> v1 -> v2:
>   - test for UDP_TUNNEL_GSO* only on builds with extended features support
>   - comment indentation cleanup
>   - rebased on top of virtio helpers changes
>   - dump more information in case of bad offloads
> ---
>  drivers/net/virtio_net.c | 70 +++++++++++++++++++++++++++++++---------
>  1 file changed, 54 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 18ad50de4928..0b234f318e39 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -78,15 +78,19 @@ static const unsigned long guest_offloads[] = {
>         VIRTIO_NET_F_GUEST_CSUM,
>         VIRTIO_NET_F_GUEST_USO4,
>         VIRTIO_NET_F_GUEST_USO6,
> -       VIRTIO_NET_F_GUEST_HDRLEN
> +       VIRTIO_NET_F_GUEST_HDRLEN,
> +       VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_MAPPED,
> +       VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM_MAPPED,
>  };
>
>  #define GUEST_OFFLOAD_GRO_HW_MASK ((1ULL << VIRTIO_NET_F_GUEST_TSO4) | \
> -                               (1ULL << VIRTIO_NET_F_GUEST_TSO6) | \
> -                               (1ULL << VIRTIO_NET_F_GUEST_ECN)  | \
> -                               (1ULL << VIRTIO_NET_F_GUEST_UFO)  | \
> -                               (1ULL << VIRTIO_NET_F_GUEST_USO4) | \
> -                               (1ULL << VIRTIO_NET_F_GUEST_USO6))
> +                       (1ULL << VIRTIO_NET_F_GUEST_TSO6) | \
> +                       (1ULL << VIRTIO_NET_F_GUEST_ECN)  | \
> +                       (1ULL << VIRTIO_NET_F_GUEST_UFO)  | \
> +                       (1ULL << VIRTIO_NET_F_GUEST_USO4) | \
> +                       (1ULL << VIRTIO_NET_F_GUEST_USO6) | \
> +                       (1ULL << VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_MAPPED) | \
> +                       (1ULL << VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM_MAPPED))
>
>  struct virtnet_stat_desc {
>         char desc[ETH_GSTRING_LEN];
> @@ -436,9 +440,14 @@ struct virtnet_info {
>         /* Packet virtio header size */
>         u8 hdr_len;
>
> +       /* UDP tunnel support */
> +       u8 tnl_offset;
> +
>         /* Work struct for delayed refilling if we run low on memory. */
>         struct delayed_work refill;
>
> +       bool rx_tnl_csum;
> +
>         /* Is delayed refill enabled? */
>         bool refill_enabled;
>
> @@ -2531,14 +2540,22 @@ static void virtnet_receive_done(struct virtnet_info *vi, struct receive_queue *
>         if (dev->features & NETIF_F_RXHASH && vi->has_rss_hash_report)
>                 virtio_skb_set_hash(&hdr->hash_v1_hdr, skb);
>
> -       if (flags & VIRTIO_NET_HDR_F_DATA_VALID)
> -               skb->ip_summed = CHECKSUM_UNNECESSARY;
> +       /* restore the received value */

Nit: this comment seems to be redundant

> +       hdr->hdr.flags = flags;
> +       if (virtio_net_chk_data_valid(skb, &hdr->hdr, vi->rx_tnl_csum)) {

Nit: this function does more than just check DATA_VALID; we probably
need a better name.

> +               net_warn_ratelimited("%s: bad csum: flags: %x, gso_type: %x rx_tnl_csum %d\n",
> +                                    dev->name, hdr->hdr.flags,
> +                                    hdr->hdr.gso_type, vi->rx_tnl_csum);
> +               goto frame_err;
> +       }
>
> -       if (virtio_net_hdr_to_skb(skb, &hdr->hdr,
> -                                 virtio_is_little_endian(vi->vdev))) {
> -               net_warn_ratelimited("%s: bad gso: type: %u, size: %u\n",
> +       if (virtio_net_hdr_tnl_to_skb(skb, &hdr->hdr, vi->tnl_offset,

I wonder why virtio_net_chk_data_valid() is not part of the
virtio_net_hdr_tnl_to_skb().

> +                                     vi->rx_tnl_csum,
> +                                     virtio_is_little_endian(vi->vdev))) {
> +               net_warn_ratelimited("%s: bad gso: type: %x, size: %u, flags %x tunnel %d tnl csum %d\n",
>                                      dev->name, hdr->hdr.gso_type,
> -                                    hdr->hdr.gso_size);
> +                                    hdr->hdr.gso_size, hdr->hdr.flags,
> +                                    vi->tnl_offset, vi->rx_tnl_csum);
>                 goto frame_err;
>         }
>
> @@ -3269,9 +3286,8 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb, bool orphan)
>         else
>                 hdr = &skb_vnet_common_hdr(skb)->mrg_hdr;
>
> -       if (virtio_net_hdr_from_skb(skb, &hdr->hdr,
> -                                   virtio_is_little_endian(vi->vdev), false,
> -                                   0))
> +       if (virtio_net_hdr_tnl_from_skb(skb, &hdr->hdr, vi->tnl_offset,
> +                                       virtio_is_little_endian(vi->vdev), 0))
>                 return -EPROTO;
>
>         if (vi->mergeable_rx_bufs)
> @@ -6775,10 +6791,20 @@ static int virtnet_probe(struct virtio_device *vdev)
>                 if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_USO))
>                         dev->hw_features |= NETIF_F_GSO_UDP_L4;
>
> +               if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO)) {
> +                       dev->hw_features |= NETIF_F_GSO_UDP_TUNNEL;
> +                       dev->hw_enc_features = dev->hw_features;
> +               }
> +               if (dev->hw_features & NETIF_F_GSO_UDP_TUNNEL &&
> +                   virtio_has_feature(vdev, VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO_CSUM)) {
> +                       dev->hw_features |= NETIF_F_GSO_UDP_TUNNEL_CSUM;
> +                       dev->hw_enc_features |= NETIF_F_GSO_UDP_TUNNEL_CSUM;
> +               }
> +
>                 dev->features |= NETIF_F_GSO_ROBUST;
>
>                 if (gso)
> -                       dev->features |= dev->hw_features & NETIF_F_ALL_TSO;
> +                       dev->features |= dev->hw_features;
>                 /* (!csum && gso) case will be fixed by register_netdev() */
>         }
>
> @@ -6879,6 +6905,14 @@ static int virtnet_probe(struct virtio_device *vdev)
>         else
>                 vi->hdr_len = sizeof(struct virtio_net_hdr);
>
> +       if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO) ||
> +           virtio_has_feature(vdev, VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO))
> +               vi->tnl_offset = vi->hdr_len;
> +       if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM))
> +               vi->rx_tnl_csum = true;
> +       if (vi->tnl_offset)
> +               vi->hdr_len += sizeof(struct virtio_net_hdr_tunnel);
> +
>         if (virtio_has_feature(vdev, VIRTIO_F_ANY_LAYOUT) ||
>             virtio_has_feature(vdev, VIRTIO_F_VERSION_1))
>                 vi->any_header_sg = true;
> @@ -7189,6 +7223,10 @@ static struct virtio_device_id id_table[] = {
>
>  static unsigned int features[] = {
>         VIRTNET_FEATURES,
> +       VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO,
> +       VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM,
> +       VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO,
> +       VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO_CSUM,
>  };
>
>  static unsigned int features_legacy[] = {
> --
> 2.49.0
>

Thanks


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC v3 7/8] tun: enable gso over UDP tunnel support.
  2025-06-06 11:45 ` [PATCH RFC v3 7/8] tun: " Paolo Abeni
@ 2025-06-12  4:55   ` Jason Wang
  2025-06-12 11:03     ` Paolo Abeni
  2025-06-14  8:04     ` Paolo Abeni
  0 siblings, 2 replies; 37+ messages in thread
From: Jason Wang @ 2025-06-12  4:55 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, Willem de Bruijn, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez, Yuri Benditovich, Akihiko Odaki

On Fri, Jun 6, 2025 at 7:46 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> Add new tun features to represent the newly introduced virtio
> GSO over UDP tunnel offload. Allow detection and selection of
> such features via the existing TUNSETOFFLOAD ioctl, store the
> tunnel offload configuration in the highest bits of the tun flags,
> and compute the expected virtio header size and tunnel header
> offset from such bits, so that we can plug in the newly introduced
> virtio helpers to serialize the extended virtio header almost
> seamlessly.
>
> As the tun features and the virtio hdr size are configured
> separately, the data path needs to cope with (hopefully transient)
> inconsistent values.
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> v2 -> v3:
>   - cleaned-up uAPI comments
>   - use explicit struct layout instead of raw buf.
> ---
>  drivers/net/tun.c           | 77 ++++++++++++++++++++++++++++++++-----
>  drivers/net/tun_vnet.h      | 73 ++++++++++++++++++++++++++++-------
>  include/uapi/linux/if_tun.h |  9 +++++
>  3 files changed, 136 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 1207196cbbed..d326800dce9d 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -186,7 +186,8 @@ struct tun_struct {
>         struct net_device       *dev;
>         netdev_features_t       set_features;
>  #define TUN_USER_FEATURES (NETIF_F_HW_CSUM|NETIF_F_TSO_ECN|NETIF_F_TSO| \
> -                         NETIF_F_TSO6 | NETIF_F_GSO_UDP_L4)
> +                         NETIF_F_TSO6 | NETIF_F_GSO_UDP_L4 | \
> +                         NETIF_F_GSO_UDP_TUNNEL | NETIF_F_GSO_UDP_TUNNEL_CSUM)
>
>         int                     align;
>         int                     vnet_hdr_sz;
> @@ -925,6 +926,7 @@ static int tun_net_init(struct net_device *dev)
>         dev->hw_features = NETIF_F_SG | NETIF_F_FRAGLIST |
>                            TUN_USER_FEATURES | NETIF_F_HW_VLAN_CTAG_TX |
>                            NETIF_F_HW_VLAN_STAG_TX;
> +       dev->hw_enc_features = dev->hw_features;
>         dev->features = dev->hw_features;
>         dev->vlan_features = dev->features &
>                              ~(NETIF_F_HW_VLAN_CTAG_TX |
> @@ -1698,7 +1700,8 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
>         struct sk_buff *skb;
>         size_t total_len = iov_iter_count(from);
>         size_t len = total_len, align = tun->align, linear;
> -       struct virtio_net_hdr gso = { 0 };
> +       struct virtio_net_hdr_v1_tunnel full_hdr;
> +       struct virtio_net_hdr *gso;
>         int good_linear;
>         int copylen;
>         int hdr_len = 0;
> @@ -1708,6 +1711,15 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
>         int skb_xdp = 1;
>         bool frags = tun_napi_frags_enabled(tfile);
>         enum skb_drop_reason drop_reason = SKB_DROP_REASON_NOT_SPECIFIED;
> +       unsigned int flags = tun->flags & ~TUN_VNET_TNL_MASK;
> +
> +       /*
> +        * Keep it easy and always zero the whole buffer, even if the
> +        * tunnel-related fields will be touched only when the feature
> +        * is enabled and the hdr size is compatible.
> +        */
> +       memset(&full_hdr, 0, sizeof(full_hdr));
> +       gso = (struct virtio_net_hdr *)&full_hdr.hdr;
>
>         if (!(tun->flags & IFF_NO_PI)) {
>                 if (len < sizeof(pi))
> @@ -1720,8 +1732,16 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
>
>         if (tun->flags & IFF_VNET_HDR) {
>                 int vnet_hdr_sz = READ_ONCE(tun->vnet_hdr_sz);
> +               int parsed_size;
>
> -               hdr_len = tun_vnet_hdr_get(vnet_hdr_sz, tun->flags, from, &gso);
> +               if (vnet_hdr_sz < TUN_VNET_TNL_SIZE) {

I still don't understand why we need to duplicate netdev features in
flags, and it seems to introduce unnecessary complexities. Can we
simply check dev->features instead?

I think I've asked this before; for example, we don't duplicate gso and
csum for non-tunnel packets.
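
I.e. just something like (untested):

        bool tnl = tun->dev->features & NETIF_F_GSO_UDP_TUNNEL;
        bool tnl_csum = tun->dev->features & NETIF_F_GSO_UDP_TUNNEL_CSUM;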

> +                       parsed_size = vnet_hdr_sz;
> +               } else {
> +                       parsed_size = TUN_VNET_TNL_SIZE;
> +                       flags |= TUN_VNET_TNL_MASK;
> +               }
> +               hdr_len = __tun_vnet_hdr_get(vnet_hdr_sz, parsed_size,
> +                                            flags, from, gso);
>                 if (hdr_len < 0)
>                         return hdr_len;
>
> @@ -1755,7 +1775,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
>                  * (e.g gso or jumbo packet), we will do it at after
>                  * skb was created with generic XDP routine.
>                  */
> -               skb = tun_build_skb(tun, tfile, from, &gso, len, &skb_xdp);
> +               skb = tun_build_skb(tun, tfile, from, gso, len, &skb_xdp);
>                 err = PTR_ERR_OR_ZERO(skb);
>                 if (err)
>                         goto drop;
> @@ -1799,7 +1819,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
>                 }
>         }
>
> -       if (tun_vnet_hdr_to_skb(tun->flags, skb, &gso)) {
> +       if (tun_vnet_hdr_to_skb(flags, skb, gso)) {
>                 atomic_long_inc(&tun->rx_frame_errors);
>                 err = -EINVAL;
>                 goto free_skb;
> @@ -2050,13 +2070,26 @@ static ssize_t tun_put_user(struct tun_struct *tun,
>         }
>
>         if (vnet_hdr_sz) {
> -               struct virtio_net_hdr gso;
> +               struct virtio_net_hdr_v1_tunnel full_hdr;
> +               struct virtio_net_hdr *gso;
> +               int flags = tun->flags;
> +               int parsed_size;
> +
> +               gso = (struct virtio_net_hdr *)&full_hdr.hdr;
> +               parsed_size = tun_vnet_parse_size(tun->flags);
> +               if (unlikely(vnet_hdr_sz < parsed_size)) {
> +                       /* Inconsistent hdr size and (tunnel) offloads:
> +                        * strips the latter
> +                        */
> +                       flags &= ~TUN_VNET_TNL_MASK;
> +                       parsed_size = sizeof(struct virtio_net_hdr);
> +               };
>
> -               ret = tun_vnet_hdr_from_skb(tun->flags, tun->dev, skb, &gso);
> +               ret = tun_vnet_hdr_from_skb(flags, tun->dev, skb, gso);
>                 if (ret)
>                         return ret;
>
> -               ret = tun_vnet_hdr_put(vnet_hdr_sz, iter, &gso);
> +               ret = __tun_vnet_hdr_put(vnet_hdr_sz, parsed_size, iter, gso);
>                 if (ret)
>                         return ret;
>         }
> @@ -2366,6 +2399,7 @@ static int tun_xdp_one(struct tun_struct *tun,
>         int metasize = 0;
>         int ret = 0;
>         bool skb_xdp = false;
> +       unsigned int flags;
>         struct page *page;
>
>         if (unlikely(datasize < ETH_HLEN))
> @@ -2426,7 +2460,16 @@ static int tun_xdp_one(struct tun_struct *tun,
>         if (metasize > 0)
>                 skb_metadata_set(skb, metasize);
>
> -       if (tun_vnet_hdr_to_skb(tun->flags, skb, gso)) {
> +       /* Assume tun offloads are enabled if the provided hdr is large
> +        * enough.
> +        */
> +       if (READ_ONCE(tun->vnet_hdr_sz) >= TUN_VNET_TNL_SIZE &&
> +           xdp->data - xdp->data_hard_start >= TUN_VNET_TNL_SIZE)
> +               flags = tun->flags | TUN_VNET_TNL_MASK;
> +       else
> +               flags = tun->flags & ~TUN_VNET_TNL_MASK;

I'm not sure I get why we need the dynamic TUN_VNET_TNL_MASK here. We
know whether tunnel gso and its csum are enabled or not, and we know the
vnet_hdr_sz here, so we can simply drop packets whose header is too
small.
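
I.e. roughly (untested, error handling simplified):

        if ((tun->dev->features & NETIF_F_GSO_UDP_TUNNEL) &&
            READ_ONCE(tun->vnet_hdr_sz) < TUN_VNET_TNL_SIZE) {
                /* tunnel offload negotiated but the header can't carry it */
                atomic_long_inc(&tun->rx_frame_errors);
                kfree_skb(skb);
                return -EINVAL;
        }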

> +
> +       if (tun_vnet_hdr_to_skb(flags, skb, gso)) {
>                 atomic_long_inc(&tun->rx_frame_errors);
>                 kfree_skb(skb);
>                 ret = -EINVAL;
> @@ -2812,6 +2855,8 @@ static void tun_get_iff(struct tun_struct *tun, struct ifreq *ifr)
>
>  }
>
> +#define PLAIN_GSO (NETIF_F_GSO_UDP_L4 | NETIF_F_TSO | NETIF_F_TSO6)
> +
>  /* This is like a cut-down ethtool ops, except done via tun fd so no
>   * privs required. */
>  static int set_offload(struct tun_struct *tun, unsigned long arg)
> @@ -2841,6 +2886,17 @@ static int set_offload(struct tun_struct *tun, unsigned long arg)
>                         features |= NETIF_F_GSO_UDP_L4;
>                         arg &= ~(TUN_F_USO4 | TUN_F_USO6);
>                 }
> +
> +               /* Tunnel offload is allowed only if some plain offload is
> +                * available, too.
> +                */
> +               if (features & PLAIN_GSO && arg & TUN_F_UDP_TUNNEL_GSO) {
> +                       features |= NETIF_F_GSO_UDP_TUNNEL;
> +                       if (arg & TUN_F_UDP_TUNNEL_GSO_CSUM)
> +                               features |= NETIF_F_GSO_UDP_TUNNEL_CSUM;
> +                       arg &= ~(TUN_F_UDP_TUNNEL_GSO |
> +                                TUN_F_UDP_TUNNEL_GSO_CSUM);
> +               }
>         }
>
>         /* This gives the user a way to test for new features in future by
> @@ -2852,7 +2908,8 @@ static int set_offload(struct tun_struct *tun, unsigned long arg)
>         tun->dev->wanted_features &= ~TUN_USER_FEATURES;
>         tun->dev->wanted_features |= features;
>         netdev_update_features(tun->dev);
> -
> +       tun_set_vnet_tnl(&tun->flags, !!(features & NETIF_F_GSO_UDP_TUNNEL),
> +                        !!(features & NETIF_F_GSO_UDP_TUNNEL_CSUM));
>         return 0;
>  }
>
> diff --git a/drivers/net/tun_vnet.h b/drivers/net/tun_vnet.h
> index 58b9ac7a5fc4..c2374c26538b 100644
> --- a/drivers/net/tun_vnet.h
> +++ b/drivers/net/tun_vnet.h
> @@ -5,6 +5,11 @@
>  /* High bits in flags field are unused. */
>  #define TUN_VNET_LE     0x80000000
>  #define TUN_VNET_BE     0x40000000
> +#define TUN_VNET_TNL           0x20000000
> +#define TUN_VNET_TNL_CSUM      0x10000000
> +#define TUN_VNET_TNL_MASK      (TUN_VNET_TNL | TUN_VNET_TNL_CSUM)
> +
> +#define TUN_VNET_TNL_SIZE      sizeof(struct virtio_net_hdr_v1_tunnel)
>
>  static inline bool tun_vnet_legacy_is_little_endian(unsigned int flags)
>  {
> @@ -45,6 +50,13 @@ static inline long tun_set_vnet_be(unsigned int *flags, int __user *argp)
>         return 0;
>  }
>
> +static inline void tun_set_vnet_tnl(unsigned int *flags, bool tnl, bool tnl_csum)
> +{
> +       *flags = (*flags & ~TUN_VNET_TNL_MASK) |
> +                tnl * TUN_VNET_TNL |
> +                tnl_csum * TUN_VNET_TNL_CSUM;
> +}
> +
>  static inline bool tun_vnet_is_little_endian(unsigned int flags)
>  {
>         return flags & TUN_VNET_LE || tun_vnet_legacy_is_little_endian(flags);
> @@ -107,16 +119,33 @@ static inline long tun_vnet_ioctl(int *vnet_hdr_sz, unsigned int *flags,
>         }
>  }
>
> -static inline int tun_vnet_hdr_get(int sz, unsigned int flags,
> -                                  struct iov_iter *from,
> -                                  struct virtio_net_hdr *hdr)
> +static inline unsigned int tun_vnet_parse_size(unsigned int flags)
> +{
> +       if (!(flags & TUN_VNET_TNL))
> +               return sizeof(struct virtio_net_hdr);
> +
> +       return TUN_VNET_TNL_SIZE;
> +}
> +
> +static inline unsigned int tun_vnet_tnl_offset(unsigned int flags)
> +{
> +       if (!(flags & TUN_VNET_TNL))
> +               return 0;
> +
> +       return sizeof(struct virtio_net_hdr_v1);
> +}
> +
> +static inline int __tun_vnet_hdr_get(int sz, int parsed_size,
> +                                    unsigned int flags,
> +                                    struct iov_iter *from,
> +                                    struct virtio_net_hdr *hdr)
>  {
>         u16 hdr_len;
>
>         if (iov_iter_count(from) < sz)
>                 return -EINVAL;
>
> -       if (!copy_from_iter_full(hdr, sizeof(*hdr), from))
> +       if (!copy_from_iter_full(hdr, parsed_size, from))
>                 return -EFAULT;
>
>         hdr_len = tun_vnet16_to_cpu(flags, hdr->hdr_len);
> @@ -129,30 +158,47 @@ static inline int tun_vnet_hdr_get(int sz, unsigned int flags,
>         if (hdr_len > iov_iter_count(from))
>                 return -EINVAL;
>
> -       iov_iter_advance(from, sz - sizeof(*hdr));
> +       iov_iter_advance(from, sz - parsed_size);
>
>         return hdr_len;
>  }
>
> -static inline int tun_vnet_hdr_put(int sz, struct iov_iter *iter,
> -                                  const struct virtio_net_hdr *hdr)
> +static inline int tun_vnet_hdr_get(int sz, unsigned int flags,
> +                                  struct iov_iter *from,
> +                                  struct virtio_net_hdr *hdr)
> +{
> +       return __tun_vnet_hdr_get(sz, sizeof(*hdr), flags, from, hdr);
> +}
> +
> +static inline int __tun_vnet_hdr_put(int sz, int parsed_size,
> +                                    struct iov_iter *iter,
> +                                    const struct virtio_net_hdr *hdr)
>  {
>         if (unlikely(iov_iter_count(iter) < sz))
>                 return -EINVAL;
>
> -       if (unlikely(copy_to_iter(hdr, sizeof(*hdr), iter) != sizeof(*hdr)))
> +       if (unlikely(copy_to_iter(hdr, parsed_size, iter) != parsed_size))
>                 return -EFAULT;
>
> -       if (iov_iter_zero(sz - sizeof(*hdr), iter) != sz - sizeof(*hdr))
> +       if (iov_iter_zero(sz - parsed_size, iter) != sz - parsed_size)
>                 return -EFAULT;
>
>         return 0;
>  }
>
> +static inline int tun_vnet_hdr_put(int sz, struct iov_iter *iter,
> +                                  const struct virtio_net_hdr *hdr)
> +{
> +       return __tun_vnet_hdr_put(sz, sizeof(*hdr), iter, hdr);
> +}
> +
>  static inline int tun_vnet_hdr_to_skb(unsigned int flags, struct sk_buff *skb,
>                                       const struct virtio_net_hdr *hdr)
>  {
> -       return virtio_net_hdr_to_skb(skb, hdr, tun_vnet_is_little_endian(flags));
> +       return virtio_net_hdr_tnl_to_skb(skb, hdr,
> +                                        tun_vnet_tnl_offset(flags),
> +                                        !!(flags & TUN_VNET_TNL_CSUM),
> +                                        tun_vnet_is_little_endian(flags));
>  }
>
>  static inline int tun_vnet_hdr_from_skb(unsigned int flags,
> @@ -161,10 +207,11 @@ static inline int tun_vnet_hdr_from_skb(unsigned int flags,
>                                         struct virtio_net_hdr *hdr)
>  {
>         int vlan_hlen = skb_vlan_tag_present(skb) ? VLAN_HLEN : 0;
> +       int tnl_offset = tun_vnet_tnl_offset(flags);
>
> -       if (virtio_net_hdr_from_skb(skb, hdr,
> -                                   tun_vnet_is_little_endian(flags), true,
> -                                   vlan_hlen)) {
> +       if (virtio_net_hdr_tnl_from_skb(skb, hdr, tnl_offset,
> +                                       tun_vnet_is_little_endian(flags),
> +                                       vlan_hlen)) {
>                 struct skb_shared_info *sinfo = skb_shinfo(skb);
>
>                 if (net_ratelimit()) {
> diff --git a/include/uapi/linux/if_tun.h b/include/uapi/linux/if_tun.h
> index 287cdc81c939..79d53c7a1ebd 100644
> --- a/include/uapi/linux/if_tun.h
> +++ b/include/uapi/linux/if_tun.h
> @@ -93,6 +93,15 @@
>  #define TUN_F_USO4     0x20    /* I can handle USO for IPv4 packets */
>  #define TUN_F_USO6     0x40    /* I can handle USO for IPv6 packets */
>
> +/* I can handle TSO/USO for UDP tunneled packets */
> +#define TUN_F_UDP_TUNNEL_GSO           0x080
> +
> +/*
> + * I can handle TSO/USO for UDP tunneled packets requiring csum offload for
> + * the outer header
> + */
> +#define TUN_F_UDP_TUNNEL_GSO_CSUM      0x100
> +

Any reason we don't choose to use 0x40 and 0x60?

>  /* Protocol info prepended to the packets (when IFF_NO_PI is not set) */
>  #define TUN_PKT_STRIP  0x0001
>  struct tun_pi {
> --
> 2.49.0
>

Thanks


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC v3 1/8] virtio: introduce extended features
  2025-06-08  5:49   ` Akihiko Odaki
@ 2025-06-12  8:50     ` Paolo Abeni
  0 siblings, 0 replies; 37+ messages in thread
From: Paolo Abeni @ 2025-06-12  8:50 UTC (permalink / raw)
  To: Akihiko Odaki, netdev
  Cc: Willem de Bruijn, Jason Wang, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez, Yuri Benditovich

On 6/8/25 7:49 AM, Akihiko Odaki wrote:
> On 2025/06/06 20:45, Paolo Abeni wrote:
>>   	int (*set_vq_affinity)(struct virtqueue *vq,
>> @@ -149,11 +154,11 @@ static inline bool __virtio_test_bit(const struct virtio_device *vdev,
>>   {
>>   	/* Did you forget to fix assumptions on max features? */
>>   	if (__builtin_constant_p(fbit))
>> -		BUILD_BUG_ON(fbit >= 64);
>> +		BUILD_BUG_ON(fbit >= VIRTIO_FEATURES_MAX);
>>   	else
>> -		BUG_ON(fbit >= 64);
>> +		BUG_ON(fbit >= VIRTIO_FEATURES_MAX);
> 
> This check is better to be moved into virtio_features_test_bit().

I left the check here mostly unmodified to try to keep the diffstat as
low as possible. I see there is consensus to clean this up; I'll do it in
the next revision.

[BTW, I'm sorry for the latency: I'm traveling for the whole week, my
replies will be sparse]

/P


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC v3 1/8] virtio: introduce extended features
  2025-06-12  0:50   ` Jason Wang
@ 2025-06-12  9:05     ` Paolo Abeni
  0 siblings, 0 replies; 37+ messages in thread
From: Paolo Abeni @ 2025-06-12  9:05 UTC (permalink / raw)
  To: Jason Wang
  Cc: netdev, Willem de Bruijn, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez, Yuri Benditovich, Akihiko Odaki

On 6/12/25 2:50 AM, Jason Wang wrote:
> On Fri, Jun 6, 2025 at 7:46 PM Paolo Abeni <pabeni@redhat.com> wrote:
>> @@ -272,22 +272,22 @@ static int virtio_dev_probe(struct device *_d)
>>         int err, i;
>>         struct virtio_device *dev = dev_to_virtio(_d);
>>         struct virtio_driver *drv = drv_to_virtio(dev->dev.driver);
>> -       u64 device_features;
>> -       u64 driver_features;
>> +       u64 device_features[VIRTIO_FEATURES_DWORDS];
>> +       u64 driver_features[VIRTIO_FEATURES_DWORDS];
>>         u64 driver_features_legacy;
>>
>>         /* We have a driver! */
>>         virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER);
>>
>>         /* Figure out what features the device supports. */
>> -       device_features = dev->config->get_features(dev);
>> +       virtio_get_features(dev, device_features);
>>
>>         /* Figure out what features the driver supports. */
>> -       driver_features = 0;
>> +       virtio_features_zero(driver_features);
>>         for (i = 0; i < drv->feature_table_size; i++) {
>>                 unsigned int f = drv->feature_table[i];
>> -               BUG_ON(f >= 64);
>> -               driver_features |= (1ULL << f);
>> +               BUG_ON(f >= VIRTIO_FEATURES_MAX);
>> +               virtio_features_set_bit(driver_features, f);
> 
> Instead of doing BUG_ON here, could we just stop at 128 bits?

I think it would be nice to have a sanity check to ensure the driver
code is in sync with the core. What about a WARN_ON_ONCE?
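
Something along these lines (untested sketch on top of this patch):

	/* warn and skip out-of-range feature bits instead of BUG()-ing */
	for (i = 0; i < drv->feature_table_size; i++) {
		unsigned int f = drv->feature_table[i];

		if (WARN_ON_ONCE(f >= VIRTIO_FEATURES_MAX))
			continue;
		virtio_features_set_bit(driver_features, f);
	}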

>> @@ -121,6 +124,8 @@ struct virtio_config_ops {
>>         void (*del_vqs)(struct virtio_device *);
>>         void (*synchronize_cbs)(struct virtio_device *);
>>         u64 (*get_features)(struct virtio_device *vdev);
>> +       void (*get_extended_features)(struct virtio_device *vdev,
>> +                                     u64 *features);
> 
> I think it would be better to add a size to simplify the future extension.

Note that the array size is implied by the virtio-features definition.
Future extensions will be handled by increasing the
VIRTIO_FEATURES_DWORDS define, with no other changes to the code.

I think a length argument here would be redundant.
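
FWIW, the callers already look roughly like this (sketch), so bumping
VIRTIO_FEATURES_DWORDS propagates the size everywhere:

	u64 features[VIRTIO_FEATURES_DWORDS];

	vdev->config->get_extended_features(vdev, features);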

>> +static inline bool virtio_features_equal(const u64 *f1, const u64 *f2)
>> +{
>> +       u64 diff = 0;
>> +       int i;
>> +
>> +       for (i = 0; i < VIRTIO_FEATURES_DWORDS; ++i)
>> +               diff |= f1[i] ^ f2[i];
>> +       return !!diff;
> 
> Nit: we can return false early here.

I can do that in the next revision.
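
i.e. roughly (untested sketch, returning true when the feature sets match):

static inline bool virtio_features_equal(const u64 *f1, const u64 *f2)
{
	int i;

	for (i = 0; i < VIRTIO_FEATURES_DWORDS; ++i)
		if (f1[i] != f2[i])
			return false;

	return true;
}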

[same disclaimer here: I'm traveling for the whole week, my replies will
be rare and delayed]

/P


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC v3 2/8] virtio_pci_modern: allow configuring extended features
  2025-06-12  0:57   ` Jason Wang
@ 2025-06-12  9:34     ` Paolo Abeni
  0 siblings, 0 replies; 37+ messages in thread
From: Paolo Abeni @ 2025-06-12  9:34 UTC (permalink / raw)
  To: Jason Wang
  Cc: netdev, Willem de Bruijn, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez, Yuri Benditovich, Akihiko Odaki

On 6/12/25 2:57 AM, Jason Wang wrote:
> On Fri, Jun 6, 2025 at 7:46 PM Paolo Abeni <pabeni@redhat.com> wrote:
>> -u64 vp_modern_get_features(struct virtio_pci_modern_device *mdev);
>> -u64 vp_modern_get_driver_features(struct virtio_pci_modern_device *mdev);
>> -void vp_modern_set_features(struct virtio_pci_modern_device *mdev,
>> -                    u64 features);
>> +void
>> +vp_modern_get_driver_extended_features(struct virtio_pci_modern_device *mdev,
>> +                                      u64 *features);
>> +void vp_modern_get_extended_features(struct virtio_pci_modern_device *mdev,
>> +                                    u64 *features);
>> +void vp_modern_set_extended_features(struct virtio_pci_modern_device *mdev,
>> +                                    const u64 *features);
>> +
>> +static inline u64
>> +vp_modern_get_features(struct virtio_pci_modern_device *mdev)
>> +{
>> +       u64 features_array[VIRTIO_FEATURES_DWORDS];
>> +       int i;
>> +
>> +       vp_modern_get_extended_features(mdev, features_array);
>> +       for (i = 1; i < VIRTIO_FEATURES_DWORDS; ++i)
>> +               WARN_ON_ONCE(features_array[i]);
> 
> It looks to me it's sufficient and safe to just return
> featuers_array[0] here. Or maybe we need some comment to explain why
> we need WARN here.

vp_modern_get_extended_features() can return a 'features' array
including extended features. Callers of vp_modern_get_features() can
deal with/expect features only in the lower 64-bit range.

This check is intended to catch device bugs/inconsistencies early, but
I'm unsure if e.g. syzkaller could hit it with some fancy setup.

I can drop the check in the next revision.

/P


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC v3 5/8] net: implement virtio helpers to handle UDP GSO tunneling.
  2025-06-12  3:53   ` Jason Wang
@ 2025-06-12 10:10     ` Paolo Abeni
  2025-06-16  3:17       ` Jason Wang
  0 siblings, 1 reply; 37+ messages in thread
From: Paolo Abeni @ 2025-06-12 10:10 UTC (permalink / raw)
  To: Jason Wang
  Cc: netdev, Willem de Bruijn, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez, Yuri Benditovich, Akihiko Odaki

On 6/12/25 5:53 AM, Jason Wang wrote:
> On Fri, Jun 6, 2025 at 7:46 PM Paolo Abeni <pabeni@redhat.com> wrote:
>> +static inline int virtio_net_hdr_tnl_from_skb(const struct sk_buff *skb,
>> +                                             struct virtio_net_hdr *hdr,
>> +                                             unsigned int tnl_offset,
>> +                                             bool little_endian,
>> +                                             int vlan_hlen)
>> +{
>> +       struct virtio_net_hdr_tunnel *tnl;
>> +       unsigned int inner_nh, outer_th;
>> +       int tnl_gso_type;
>> +       int ret;
>> +
>> +       tnl_gso_type = skb_shinfo(skb)->gso_type & (SKB_GSO_UDP_TUNNEL |
>> +                                                   SKB_GSO_UDP_TUNNEL_CSUM);
>> +       if (!tnl_gso_type)
>> +               return virtio_net_hdr_from_skb(skb, hdr, little_endian, false,
>> +                                              vlan_hlen);
>
> So tun_vnet_hdr_from_skb() has
>
>         int vlan_hlen = skb_vlan_tag_present(skb) ? VLAN_HLEN : 0;
>         int tnl_offset = tun_vnet_tnl_offset(flags);
>
>         if (virtio_net_hdr_tnl_from_skb(skb, hdr, tnl_offset,
>                                         tun_vnet_is_little_endian(flags),
>                                         vlan_hlen)) {
>
>
> It looks like the outer vlan_hlen is used for the inner here?

vlan_hlen always refers to the outer vlan tag (if present), as it moves
the (inner) transport csum offset accordingly.

I can add a comment to clarify the parsing.

Note that in the above call there is a single set of headers (no
encapsulation), so the vlan_hlen should be unambiguous.

>> +
>> +       /* Tunnel support not negotiated but skb ask for it. */
>> +       if (!tnl_offset)
>> +               return -EINVAL;
>> +
>> +       /* Let the basic parsing deal with plain GSO features. */
>> +       skb_shinfo(skb)->gso_type &= ~tnl_gso_type;
>> +       ret = virtio_net_hdr_from_skb(skb, hdr, true, false, vlan_hlen);

Here I'll add:

	Here vlan_hlen refers to the outer headers set, but still affects
	the inner transport header offset.

>> @@ -181,6 +208,22 @@ struct virtio_net_hdr_v1_hash {
>>         __le16 padding;
>>  };
>>
>> +/* This header after hashing information */
>> +struct virtio_net_hdr_tunnel {
>> +       __le16 outer_th_offset;
>> +       __le16 inner_nh_offset;
>> +};
>> +
>> +struct virtio_net_hdr_v1_tunnel {
>> +       struct virtio_net_hdr_v1 hdr;
>> +       struct virtio_net_hdr_tunnel tnl;
>> +};
>> +
>> +struct virtio_net_hdr_v1_hash_tunnel {
>> +       struct virtio_net_hdr_v1_hash hdr;
>> +       struct virtio_net_hdr_tunnel tnl;
>> +};
>
> Not a native speaker but I realize there's probably an issue:
>
>         le32 hash_value;        (Only if VIRTIO_NET_F_HASH_REPORT negotiated)
>         le16 hash_report;       (Only if VIRTIO_NET_F_HASH_REPORT negotiated)
>         le16 padding_reserved;  (Only if VIRTIO_NET_F_HASH_REPORT negotiated)
>         le16 outer_th_offset    (Only if
> VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO or VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO
> negotiated)
>         le16 inner_nh_offset;   (Only if
> VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO or VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO
> negotiated)
>         le16 outer_nh_offset;   /* Only if VIRTIO_NET_F_OUT_NET_HEADER
> negotiated */
>         /* Only if VIRTIO_NET_F_OUT_NET_HEADER or VIRTIO_NET_F_IPSEC
> negotiated */
>         union {
>                 u8 padding_reserved_2[6];
>                 struct ipsec_resource_hdr {
>                         le32 resource_id;
>                         le16 resource_type;
>                 } ipsec_resource_hdr;
>         };
>
> I thought e.g outer_th_offset should have a fixed offset then
> everything is simplified but it looks not the case here. If we decide
> to do things like this, we will end up with a very huge uAPI
> definition for different features combinations. This doesn't follow
> the existing headers for example num_buffers exist no matter if
> MRG_RXBUF is negotiated.
>
> At least, if we decide to go with the dynamic offset, it seems less
> valuable to define those headers with different combinations if both
> device and driver process the vnet header piece wisely

I'm a little confused here. AFAICT the dynamic offset is
requested/mandated by the specification: if the hash-related fields are
not present, they simply don't exist and everything below moves
upward. I think we spent together quite some time to agree on this.

Do you want/intend the tunnel header to be at a fixed offset inside the
virtio_hdr regardless of the negotiated features? That would yield a
slightly simpler but also slightly less efficient implementation.

Also I guess (fear mostly) some specification clarification would be needed.

/P


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC v3 6/8] virtio_net: enable gso over UDP tunnel support.
  2025-06-12  4:05   ` Jason Wang
@ 2025-06-12 10:17     ` Paolo Abeni
  2025-06-16  3:20       ` Jason Wang
  0 siblings, 1 reply; 37+ messages in thread
From: Paolo Abeni @ 2025-06-12 10:17 UTC (permalink / raw)
  To: Jason Wang
  Cc: netdev, Willem de Bruijn, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez, Yuri Benditovich, Akihiko Odaki

On 6/12/25 6:05 AM, Jason Wang wrote:
> On Fri, Jun 6, 2025 at 7:46 PM Paolo Abeni <pabeni@redhat.com> wrote:
>>
>> If the related virtio feature is set, enable transmission and reception
>> of gso over UDP tunnel packets.
>>
>> Most of the work is done by the previously introduced helper, just need
>> to determine the UDP tunnel features inside the virtio_net_hdr and
>> update accordingly the virtio net hdr size.
>>
>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>> ---
>> v2 -> v3:
>>   - drop the VIRTIO_HAS_EXTENDED_FEATURES conditionals
>>
>> v1 -> v2:
>>   - test for UDP_TUNNEL_GSO* only on builds with extended features support
>>   - comment indentation cleanup
>>   - rebased on top of virtio helpers changes
>>   - dump more information in case of bad offloads
>> ---
>>  drivers/net/virtio_net.c | 70 +++++++++++++++++++++++++++++++---------
>>  1 file changed, 54 insertions(+), 16 deletions(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index 18ad50de4928..0b234f318e39 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -78,15 +78,19 @@ static const unsigned long guest_offloads[] = {
>>         VIRTIO_NET_F_GUEST_CSUM,
>>         VIRTIO_NET_F_GUEST_USO4,
>>         VIRTIO_NET_F_GUEST_USO6,
>> -       VIRTIO_NET_F_GUEST_HDRLEN
>> +       VIRTIO_NET_F_GUEST_HDRLEN,
>> +       VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_MAPPED,
>> +       VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM_MAPPED,
>>  };
>>
>>  #define GUEST_OFFLOAD_GRO_HW_MASK ((1ULL << VIRTIO_NET_F_GUEST_TSO4) | \
>> -                               (1ULL << VIRTIO_NET_F_GUEST_TSO6) | \
>> -                               (1ULL << VIRTIO_NET_F_GUEST_ECN)  | \
>> -                               (1ULL << VIRTIO_NET_F_GUEST_UFO)  | \
>> -                               (1ULL << VIRTIO_NET_F_GUEST_USO4) | \
>> -                               (1ULL << VIRTIO_NET_F_GUEST_USO6))
>> +                       (1ULL << VIRTIO_NET_F_GUEST_TSO6) | \
>> +                       (1ULL << VIRTIO_NET_F_GUEST_ECN)  | \
>> +                       (1ULL << VIRTIO_NET_F_GUEST_UFO)  | \
>> +                       (1ULL << VIRTIO_NET_F_GUEST_USO4) | \
>> +                       (1ULL << VIRTIO_NET_F_GUEST_USO6) | \
>> +                       (1ULL << VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_MAPPED) | \
>> +                       (1ULL << VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM_MAPPED))
>>
>>  struct virtnet_stat_desc {
>>         char desc[ETH_GSTRING_LEN];
>> @@ -436,9 +440,14 @@ struct virtnet_info {
>>         /* Packet virtio header size */
>>         u8 hdr_len;
>>
>> +       /* UDP tunnel support */
>> +       u8 tnl_offset;
>> +
>>         /* Work struct for delayed refilling if we run low on memory. */
>>         struct delayed_work refill;
>>
>> +       bool rx_tnl_csum;
>> +
>>         /* Is delayed refill enabled? */
>>         bool refill_enabled;
>>
>> @@ -2531,14 +2540,22 @@ static void virtnet_receive_done(struct virtnet_info *vi, struct receive_queue *
>>         if (dev->features & NETIF_F_RXHASH && vi->has_rss_hash_report)
>>                 virtio_skb_set_hash(&hdr->hash_v1_hdr, skb);
>>
>> -       if (flags & VIRTIO_NET_HDR_F_DATA_VALID)
>> -               skb->ip_summed = CHECKSUM_UNNECESSARY;
>> +       /* restore the received value */
> 
> Nit: this comment seems to be redundant
> 
>> +       hdr->hdr.flags = flags;
>> +       if (virtio_net_chk_data_valid(skb, &hdr->hdr, vi->rx_tnl_csum)) {
> 
> Nit: this function did more than just check DATA_VALID, we probably
> need a better name.

What about virtio_net_handle_csum_offload()?

> 
>> +               net_warn_ratelimited("%s: bad csum: flags: %x, gso_type: %x rx_tnl_csum %d\n",
>> +                                    dev->name, hdr->hdr.flags,
>> +                                    hdr->hdr.gso_type, vi->rx_tnl_csum);
>> +               goto frame_err;
>> +       }
>>
>> -       if (virtio_net_hdr_to_skb(skb, &hdr->hdr,
>> -                                 virtio_is_little_endian(vi->vdev))) {
>> -               net_warn_ratelimited("%s: bad gso: type: %u, size: %u\n",
>> +       if (virtio_net_hdr_tnl_to_skb(skb, &hdr->hdr, vi->tnl_offset,
> 
> I wonder why virtio_net_chk_data_valid() is not part of the
> virtio_net_hdr_tnl_to_skb().

It can't be part of virtio_net_hdr_tnl_to_skb(), as hdr to skb
conversion is actually not symmetric with respect to the checksum - only
the driver handles DATA_VALID.

Tun must not call virtio_net_chk_data_valid()  (or whatever different
name will use).

/P


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC v3 7/8] tun: enable gso over UDP tunnel support.
  2025-06-12  4:55   ` Jason Wang
@ 2025-06-12 11:03     ` Paolo Abeni
  2025-06-14 10:21       ` Paolo Abeni
  2025-06-16  4:53       ` Jason Wang
  2025-06-14  8:04     ` Paolo Abeni
  1 sibling, 2 replies; 37+ messages in thread
From: Paolo Abeni @ 2025-06-12 11:03 UTC (permalink / raw)
  To: Jason Wang
  Cc: netdev, Willem de Bruijn, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez, Yuri Benditovich, Akihiko Odaki

On 6/12/25 6:55 AM, Jason Wang wrote:
> On Fri, Jun 6, 2025 at 7:46 PM Paolo Abeni <pabeni@redhat.com> wrote:
>> @@ -1720,8 +1732,16 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
>>
>>         if (tun->flags & IFF_VNET_HDR) {
>>                 int vnet_hdr_sz = READ_ONCE(tun->vnet_hdr_sz);
>> +               int parsed_size;
>>
>> -               hdr_len = tun_vnet_hdr_get(vnet_hdr_sz, tun->flags, from, &gso);
>> +               if (vnet_hdr_sz < TUN_VNET_TNL_SIZE) {
> 
> I still don't understand why we need to duplicate netdev features in
> flags, and it seems to introduce unnecessary complexities. Can we
> simply check dev->features instead?
> 
> I think I've asked before, for example, we don't duplicate gso and
> csum for non tunnel packets.

My fear was that if
- the guest negotiated VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO
- tun stores the negotiated offload info in netdev->features
- the tun netdev UDP tunnel feature is disabled via ethtool

tun may end up sending to the guest packets without filling the tnl hdr,
which should be safe, as the driver should not use such info since no GSO
over UDP packets will go through, but is technically against the
specification.

The current implementation always zeroes the whole virtio net hdr space,
so there is no such issue.

Still, the additional complexity is ~5 lines and it makes all the needed
information available in a single int, which is quite nice performance
wise. Do you have strong feelings against it?

>> @@ -2426,7 +2460,16 @@ static int tun_xdp_one(struct tun_struct *tun,
>>         if (metasize > 0)
>>                 skb_metadata_set(skb, metasize);
>>
>> -       if (tun_vnet_hdr_to_skb(tun->flags, skb, gso)) {
>> +       /* Assume tun offloads are enabled if the provided hdr is large
>> +        * enough.
>> +        */
>> +       if (READ_ONCE(tun->vnet_hdr_sz) >= TUN_VNET_TNL_SIZE &&
>> +           xdp->data - xdp->data_hard_start >= TUN_VNET_TNL_SIZE)
>> +               flags = tun->flags | TUN_VNET_TNL_MASK;
>> +       else
>> +               flags = tun->flags & ~TUN_VNET_TNL_MASK;
> 
> I'm not sure I get the point that we need dynamics of
> TUN_VNET_TNL_MASK here. We know if tunnel gso and its csum or enabled
> or not,

How does tun know about VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO or
VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO_CSUM?

User-space does not tell the tun device about any of the host
offload features. Plain/baremetal GSO information is always available
in the basic virtio net header, so there is no size check, but the
overall behavior is similar - tun assumes the features have been
negotiated if the relevant bits are present in the header.

Here, before checking the relevant bit, we ensure we have enough virtio
net hdr data - that makes the follow-up test simpler.

> and we know the vnet_hdr_sz here, we can simply drop the
> packet with less header.

That looks prone to migration or connectivity issues, and it differs from the
current general behavior of always accepting any well-formed packet even
if it is shorter than what was actually negotiated (i.e. tun accepts packets
with just a virtio_net_hdr header even when V1 has been negotiated).

/P


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC v3 3/8] vhost-net: allow configuring extended features
  2025-06-08  6:16   ` Akihiko Odaki
@ 2025-06-14  7:27     ` Paolo Abeni
  2025-06-16  8:57       ` Akihiko Odaki
  0 siblings, 1 reply; 37+ messages in thread
From: Paolo Abeni @ 2025-06-14  7:27 UTC (permalink / raw)
  To: Akihiko Odaki, netdev
  Cc: Willem de Bruijn, Jason Wang, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez, Yuri Benditovich

On 6/8/25 8:16 AM, Akihiko Odaki wrote:
> On 2025/06/06 20:45, Paolo Abeni wrote:
>> +
>> +		/* Zero the trailing space provided by user-space, if any */
>> +		if (i < count && clear_user(argp, (count - i) * sizeof(u64)))
> 
> I think checking i < count is a premature optimization; it doesn't 
> matter even if we spend a bit longer because of the lack of the check.

FTR, the check is not an optimization. If `i` were greater than `count`,
`clear_user` would try to clear almost the whole address space (the
2nd argument is an unsigned one) and would likely return an error.

I think it's functionally needed.

Roger to the other comments.

/P


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC v3 7/8] tun: enable gso over UDP tunnel support.
  2025-06-12  4:55   ` Jason Wang
  2025-06-12 11:03     ` Paolo Abeni
@ 2025-06-14  8:04     ` Paolo Abeni
  2025-06-16  5:34       ` Jason Wang
  1 sibling, 1 reply; 37+ messages in thread
From: Paolo Abeni @ 2025-06-14  8:04 UTC (permalink / raw)
  To: Jason Wang
  Cc: netdev, Willem de Bruijn, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez, Yuri Benditovich, Akihiko Odaki

On 6/12/25 6:55 AM, Jason Wang wrote:
> On Fri, Jun 6, 2025 at 7:46 PM Paolo Abeni <pabeni@redhat.com> wrote:
>> diff --git a/include/uapi/linux/if_tun.h b/include/uapi/linux/if_tun.h
>> index 287cdc81c939..79d53c7a1ebd 100644
>> --- a/include/uapi/linux/if_tun.h
>> +++ b/include/uapi/linux/if_tun.h
>> @@ -93,6 +93,15 @@
>>  #define TUN_F_USO4     0x20    /* I can handle USO for IPv4 packets */
>>  #define TUN_F_USO6     0x40    /* I can handle USO for IPv6 packets */
>>
>> +/* I can handle TSO/USO for UDP tunneled packets */
>> +#define TUN_F_UDP_TUNNEL_GSO           0x080
>> +
>> +/*
>> + * I can handle TSO/USO for UDP tunneled packets requiring csum offload for
>> + * the outer header
>> + */
>> +#define TUN_F_UDP_TUNNEL_GSO_CSUM      0x100
>> +
> 
> Any reason we don't choose to use 0x40 and 0x60?

I just noticed I forgot to answer this one, I'm sorry.

0x40 is already in use (for TUN_F_USO6, as you can see above), and 0x60
is a bitmask, not a single bit. I used the lowest available free bits.
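
For context, user-space would turn these on like any other TUN_F_* bit via
TUNSETOFFLOAD, e.g. (illustrative sketch, with tun_fd being the tap fd;
this is not actual qemu code):

	unsigned int offloads = TUN_F_CSUM | TUN_F_TSO4 | TUN_F_TSO6 |
				TUN_F_UDP_TUNNEL_GSO |
				TUN_F_UDP_TUNNEL_GSO_CSUM;

	if (ioctl(tun_fd, TUNSETOFFLOAD, offloads))
		perror("TUNSETOFFLOAD");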

/P


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC v3 7/8] tun: enable gso over UDP tunnel support.
  2025-06-12 11:03     ` Paolo Abeni
@ 2025-06-14 10:21       ` Paolo Abeni
  2025-06-16  4:53       ` Jason Wang
  1 sibling, 0 replies; 37+ messages in thread
From: Paolo Abeni @ 2025-06-14 10:21 UTC (permalink / raw)
  To: Jason Wang
  Cc: netdev, Willem de Bruijn, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez, Yuri Benditovich, Akihiko Odaki

On Thu, Jun 12, 2025 at 1:03 PM Paolo Abeni <pabeni@redhat.com> wrote:
> On 6/12/25 6:55 AM, Jason Wang wrote:
> > On Fri, Jun 6, 2025 at 7:46 PM Paolo Abeni <pabeni@redhat.com> wrote:
> >> @@ -1720,8 +1732,16 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
> >>
> >>         if (tun->flags & IFF_VNET_HDR) {
> >>                 int vnet_hdr_sz = READ_ONCE(tun->vnet_hdr_sz);
> >> +               int parsed_size;
> >>
> >> -               hdr_len = tun_vnet_hdr_get(vnet_hdr_sz, tun->flags, from, &gso);
> >> +               if (vnet_hdr_sz < TUN_VNET_TNL_SIZE) {
> >
> > I still don't understand why we need to duplicate netdev features in
> > flags, and it seems to introduce unnecessary complexities. Can we
> > simply check dev->features instead?
> >
> > I think I've asked before, for example, we don't duplicate gso and
> > csum for non tunnel packets.

[...]

> Still the additional complexity is ~5 lines and makes all the needed
> information available on a single int, which is quite nice performance
> wise. Do you have strong feeling against it?

I forgot to mention a couple of relevant points: the tun_vnet_*
helpers are also used by tap devices, so we can't pass the tun struct
as an argument, and we would need to add a new argument to pass the
dev->features or the dev pointer, which is IMHO not nice. Also we would
have to provide backward-compatible variants for all the helpers to
avoid touching the tap driver. Overall, using 'dev->features' would
require a comparable amount of code churn, likely even greater.

For plain GSO offload, the current validation code is quite liberal
and doesn't check the actual offloaded features. We can't change the
existing behaviour for backward compatibility, but we want to be more
conservative with the new code, where possible - so we want the
information available to the helpers.
/P


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC v3 5/8] net: implement virtio helpers to handle UDP GSO tunneling.
  2025-06-12 10:10     ` Paolo Abeni
@ 2025-06-16  3:17       ` Jason Wang
  0 siblings, 0 replies; 37+ messages in thread
From: Jason Wang @ 2025-06-16  3:17 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, Willem de Bruijn, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez, Yuri Benditovich, Akihiko Odaki

On Thu, Jun 12, 2025 at 6:10 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On 6/12/25 5:53 AM, Jason Wang wrote:
> > On Fri, Jun 6, 2025 at 7:46 PM Paolo Abeni <pabeni@redhat.com> wrote:
> >> +static inline int virtio_net_hdr_tnl_from_skb(const struct sk_buff *skb,
> >> +                                             struct virtio_net_hdr *hdr,
> >> +                                             unsigned int tnl_offset,
> >> +                                             bool little_endian,
> >> +                                             int vlan_hlen)
> >> +{
> >> +       struct virtio_net_hdr_tunnel *tnl;
> >> +       unsigned int inner_nh, outer_th;
> >> +       int tnl_gso_type;
> >> +       int ret;
> >> +
> >> +       tnl_gso_type = skb_shinfo(skb)->gso_type & (SKB_GSO_UDP_TUNNEL |
> >> +                                                   SKB_GSO_UDP_TUNNEL_CSUM);
> >> +       if (!tnl_gso_type)
> >> +               return virtio_net_hdr_from_skb(skb, hdr, little_endian, false,
> >> +                                              vlan_hlen);
> >
> > So tun_vnet_hdr_from_skb() has
> >
> >         int vlan_hlen = skb_vlan_tag_present(skb) ? VLAN_HLEN : 0;
> >         int tnl_offset = tun_vnet_tnl_offset(flags);
> >
> >         if (virtio_net_hdr_tnl_from_skb(skb, hdr, tnl_offset,
> >                                         tun_vnet_is_little_endian(flags),
> >                                         vlan_hlen)) {
> >
> >
> > It looks like the outer vlan_hlen is used for the inner here?
> vlan_hlen always refers to the outer vlan tag (if present), as it moves
> the (inner) transport csum offset accordingly.
>
> I can a comment to clarify the parsing.
>
> Note that in the above call there is a single set of headers (no
> encapsulation) so the vlan_hlen should be unambigous.

I see.

> >> +
> >> +       /* Tunnel support not negotiated but skb ask for it. */
> >> +       if (!tnl_offset)
> >> +               return -EINVAL;
> >> +
> >> +       /* Let the basic parsing deal with plain GSO features. */
> >> +       skb_shinfo(skb)->gso_type &= ~tnl_gso_type;
> >> +       ret = virtio_net_hdr_from_skb(skb, hdr, true, false, vlan_hlen);
>
> Here I'll add:
>
>         Here vlan_hlen refers to the outer headers set, but still affect
>         the inner transport header offset.

Thanks, then I want to know if we need to care about the inner vlan, or
if it is something that is not supported by the kernel right now.

>
> >> @@ -181,6 +208,22 @@ struct virtio_net_hdr_v1_hash {
> >>         __le16 padding;
> >>  };
> >>
> >> +/* This header after hashing information */
> >> +struct virtio_net_hdr_tunnel {
> >> +       __le16 outer_th_offset;
> >> +       __le16 inner_nh_offset;
> >> +};
> >> +
> >> +struct virtio_net_hdr_v1_tunnel {
> >> +       struct virtio_net_hdr_v1 hdr;
> >> +       struct virtio_net_hdr_tunnel tnl;
> >> +};
> >> +
> >> +struct virtio_net_hdr_v1_hash_tunnel {
> >> +       struct virtio_net_hdr_v1_hash hdr;
> >> +       struct virtio_net_hdr_tunnel tnl;
> >> +};
> >
> > Not a native speaker but I realize there's probably an issue:
> >
> >         le32 hash_value;        (Only if VIRTIO_NET_F_HASH_REPORT negotiated)
> >         le16 hash_report;       (Only if VIRTIO_NET_F_HASH_REPORT negotiated)
> >         le16 padding_reserved;  (Only if VIRTIO_NET_F_HASH_REPORT negotiated)
> >         le16 outer_th_offset    (Only if
> > VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO or VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO
> > negotiated)
> >         le16 inner_nh_offset;   (Only if
> > VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO or VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO
> > negotiated)
> >         le16 outer_nh_offset;   /* Only if VIRTIO_NET_F_OUT_NET_HEADER
> > negotiated */
> >         /* Only if VIRTIO_NET_F_OUT_NET_HEADER or VIRTIO_NET_F_IPSEC
> > negotiated */
> >         union {
> >                 u8 padding_reserved_2[6];
> >                 struct ipsec_resource_hdr {
> >                         le32 resource_id;
> >                         le16 resource_type;
> >                 } ipsec_resource_hdr;
> >         };
> >
> > I thought e.g outer_th_offset should have a fixed offset then
> > everything is simplified but it looks not the case here. If we decide
> > to do things like this, we will end up with a very huge uAPI
> > definition for different features combinations. This doesn't follow
> > the existing headers for example num_buffers exist no matter if
> > MRG_RXBUF is negotiated.
> >
> > At least, if we decide to go with the dynamic offset, it seems less
> > valuable to define those headers with different combinations if both
> > device and driver process the vnet header piece wisely
>
> I'm a little confused here. AFAICT the dynamic offset is
> requested/mandated by the specifications: if the hash related fields are
> not present, they are actually non existing and everything below moves
> upward.  I think we spent together quite some time to agree on this.

I'm sorry if I lost some context there.

>
> If you want/intend the tunnel header to be at fixed offset inside the
> virtio_hdr regardless of the negotiated features? That would yield to
> slightly simpler but also slightly less efficient implementation.

Yes. I feel it's probably too late to fix the spec. But I meant that if the
header offset of the tunnel gso fields is dynamic, there's probably no need
to define:

virtio_net_hdr_v1_tunnel and virtio_net_hdr_v1_hash_tunnel

in the uAPI.

>
> Also I guess (fear mostly) some specification clarification would be needed.
>
> /P
>

Thanks


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC v3 6/8] virtio_net: enable gso over UDP tunnel support.
  2025-06-12 10:17     ` Paolo Abeni
@ 2025-06-16  3:20       ` Jason Wang
  2025-06-16 14:44         ` Paolo Abeni
  0 siblings, 1 reply; 37+ messages in thread
From: Jason Wang @ 2025-06-16  3:20 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, Willem de Bruijn, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez, Yuri Benditovich, Akihiko Odaki

On Thu, Jun 12, 2025 at 6:18 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On 6/12/25 6:05 AM, Jason Wang wrote:
> > On Fri, Jun 6, 2025 at 7:46 PM Paolo Abeni <pabeni@redhat.com> wrote:
> >>
> >> If the related virtio feature is set, enable transmission and reception
> >> of gso over UDP tunnel packets.
> >>
> >> Most of the work is done by the previously introduced helper, just need
> >> to determine the UDP tunnel features inside the virtio_net_hdr and
> >> update accordingly the virtio net hdr size.
> >>
> >> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> >> ---
> >> v2 -> v3:
> >>   - drop the VIRTIO_HAS_EXTENDED_FEATURES conditionals
> >>
> >> v1 -> v2:
> >>   - test for UDP_TUNNEL_GSO* only on builds with extended features support
> >>   - comment indentation cleanup
> >>   - rebased on top of virtio helpers changes
> >>   - dump more information in case of bad offloads
> >> ---
> >>  drivers/net/virtio_net.c | 70 +++++++++++++++++++++++++++++++---------
> >>  1 file changed, 54 insertions(+), 16 deletions(-)
> >>
> >> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> >> index 18ad50de4928..0b234f318e39 100644
> >> --- a/drivers/net/virtio_net.c
> >> +++ b/drivers/net/virtio_net.c
> >> @@ -78,15 +78,19 @@ static const unsigned long guest_offloads[] = {
> >>         VIRTIO_NET_F_GUEST_CSUM,
> >>         VIRTIO_NET_F_GUEST_USO4,
> >>         VIRTIO_NET_F_GUEST_USO6,
> >> -       VIRTIO_NET_F_GUEST_HDRLEN
> >> +       VIRTIO_NET_F_GUEST_HDRLEN,
> >> +       VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_MAPPED,
> >> +       VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM_MAPPED,
> >>  };
> >>
> >>  #define GUEST_OFFLOAD_GRO_HW_MASK ((1ULL << VIRTIO_NET_F_GUEST_TSO4) | \
> >> -                               (1ULL << VIRTIO_NET_F_GUEST_TSO6) | \
> >> -                               (1ULL << VIRTIO_NET_F_GUEST_ECN)  | \
> >> -                               (1ULL << VIRTIO_NET_F_GUEST_UFO)  | \
> >> -                               (1ULL << VIRTIO_NET_F_GUEST_USO4) | \
> >> -                               (1ULL << VIRTIO_NET_F_GUEST_USO6))
> >> +                       (1ULL << VIRTIO_NET_F_GUEST_TSO6) | \
> >> +                       (1ULL << VIRTIO_NET_F_GUEST_ECN)  | \
> >> +                       (1ULL << VIRTIO_NET_F_GUEST_UFO)  | \
> >> +                       (1ULL << VIRTIO_NET_F_GUEST_USO4) | \
> >> +                       (1ULL << VIRTIO_NET_F_GUEST_USO6) | \
> >> +                       (1ULL << VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_MAPPED) | \
> >> +                       (1ULL << VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM_MAPPED))
> >>
> >>  struct virtnet_stat_desc {
> >>         char desc[ETH_GSTRING_LEN];
> >> @@ -436,9 +440,14 @@ struct virtnet_info {
> >>         /* Packet virtio header size */
> >>         u8 hdr_len;
> >>
> >> +       /* UDP tunnel support */
> >> +       u8 tnl_offset;
> >> +
> >>         /* Work struct for delayed refilling if we run low on memory. */
> >>         struct delayed_work refill;
> >>
> >> +       bool rx_tnl_csum;
> >> +
> >>         /* Is delayed refill enabled? */
> >>         bool refill_enabled;
> >>
> >> @@ -2531,14 +2540,22 @@ static void virtnet_receive_done(struct virtnet_info *vi, struct receive_queue *
> >>         if (dev->features & NETIF_F_RXHASH && vi->has_rss_hash_report)
> >>                 virtio_skb_set_hash(&hdr->hash_v1_hdr, skb);
> >>
> >> -       if (flags & VIRTIO_NET_HDR_F_DATA_VALID)
> >> -               skb->ip_summed = CHECKSUM_UNNECESSARY;
> >> +       /* restore the received value */
> >
> > Nit: this comment seems to be redundant
> >
> >> +       hdr->hdr.flags = flags;
> >> +       if (virtio_net_chk_data_valid(skb, &hdr->hdr, vi->rx_tnl_csum)) {
> >
> > Nit: this function did more than just check DATA_VALID, we probably
> > need a better name.
>
> What about virtio_net_handle_csum_offload()?

Works for me.

>
> >
> >> +               net_warn_ratelimited("%s: bad csum: flags: %x, gso_type: %x rx_tnl_csum %d\n",
> >> +                                    dev->name, hdr->hdr.flags,
> >> +                                    hdr->hdr.gso_type, vi->rx_tnl_csum);
> >> +               goto frame_err;
> >> +       }
> >>
> >> -       if (virtio_net_hdr_to_skb(skb, &hdr->hdr,
> >> -                                 virtio_is_little_endian(vi->vdev))) {
> >> -               net_warn_ratelimited("%s: bad gso: type: %u, size: %u\n",
> >> +       if (virtio_net_hdr_tnl_to_skb(skb, &hdr->hdr, vi->tnl_offset,
> >
> > I wonder why virtio_net_chk_data_valid() is not part of the
> > virtio_net_hdr_tnl_to_skb().
>
> It can't be part of virtio_net_hdr_tnl_to_skb(), as hdr to skb
> conversion is actually not symmetric with respect to the checksum - only
> the driver handles DATA_VALID.
>
> Tun must not call virtio_net_chk_data_valid()  (or whatever different
> name will use).

Note that we've already dealt with this by introducing a boolean
has_data_valid:

static inline int virtio_net_hdr_from_skb(const struct sk_buff *skb,
                                          struct virtio_net_hdr *hdr,
                                          bool little_endian,
                                          bool has_data_valid,
                                          int vlan_hlen)

>
> /P
>

Thanks


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC v3 7/8] tun: enable gso over UDP tunnel support.
  2025-06-12 11:03     ` Paolo Abeni
  2025-06-14 10:21       ` Paolo Abeni
@ 2025-06-16  4:53       ` Jason Wang
  2025-06-16 10:17         ` Paolo Abeni
  1 sibling, 1 reply; 37+ messages in thread
From: Jason Wang @ 2025-06-16  4:53 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, Willem de Bruijn, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez, Yuri Benditovich, Akihiko Odaki

On Thu, Jun 12, 2025 at 7:03 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On 6/12/25 6:55 AM, Jason Wang wrote:
> > On Fri, Jun 6, 2025 at 7:46 PM Paolo Abeni <pabeni@redhat.com> wrote:
> >> @@ -1720,8 +1732,16 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
> >>
> >>         if (tun->flags & IFF_VNET_HDR) {
> >>                 int vnet_hdr_sz = READ_ONCE(tun->vnet_hdr_sz);
> >> +               int parsed_size;
> >>
> >> -               hdr_len = tun_vnet_hdr_get(vnet_hdr_sz, tun->flags, from, &gso);
> >> +               if (vnet_hdr_sz < TUN_VNET_TNL_SIZE) {
> >
> > I still don't understand why we need to duplicate netdev features in
> > flags, and it seems to introduce unnecessary complexities. Can we
> > simply check dev->features instead?
> >
> > I think I've asked before, for example, we don't duplicate gso and
> > csum for non tunnel packets.
>
> My fear was that if
> - the guest negotiated VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO
> - tun stores the negotiated offload info netdev->features
> - the tun netdev UDP tunnel feature is disabled via ethtool
>
> tun may end-up sending to the guest packets without filling the tnl hdr,
> which should be safe, as the driver should not use such info as no GSO
> over UDP packets will go through, but is technically against the
> specification.

Probably not? For example, this is the way tun works with non-tunnel GSO as well.

(And it allows the flexibility of debugging etc).

>
> The current implementation always zero the whole virtio net hdr space,
> so there is no such an issue.
>
> Still the additional complexity is ~5 lines and makes all the needed
> information available on a single int, which is quite nice performance
> wise. Do you have strong feeling against it?

See above, and at least we can disallow the changing of UDP tunnel GSO
(but I don't see too much value in that).

>
> >> @@ -2426,7 +2460,16 @@ static int tun_xdp_one(struct tun_struct *tun,
> >>         if (metasize > 0)
> >>                 skb_metadata_set(skb, metasize);
> >>
> >> -       if (tun_vnet_hdr_to_skb(tun->flags, skb, gso)) {
> >> +       /* Assume tun offloads are enabled if the provided hdr is large
> >> +        * enough.
> >> +        */
> >> +       if (READ_ONCE(tun->vnet_hdr_sz) >= TUN_VNET_TNL_SIZE &&
> >> +           xdp->data - xdp->data_hard_start >= TUN_VNET_TNL_SIZE)
> >> +               flags = tun->flags | TUN_VNET_TNL_MASK;
> >> +       else
> >> +               flags = tun->flags & ~TUN_VNET_TNL_MASK;
> >
> > I'm not sure I get the point that we need dynamics of
> > TUN_VNET_TNL_MASK here. We know if tunnel gso and its csum or enabled
> > or not,
>
> How does tun know about VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO or
> VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO_CSUM?

I think it can be done in a way that works for non-tunnel gso.

The most complicated case is probably when HOST_UDP_TUNNEL_X is
enabled but GUEST_UDP_TUNNEL_X is not. In this case tun can know this
by checking that:

1) vnet_hdr_len is large enough
2) UDP tunnel GSO is not enabled in netdev->features

(see the rough sketch below)

If HOST_UDP_TUNNEL_X is not enabled but GUEST_UDP_TUNNEL_X is enabled,
it can behave like the existing non-tunnel GSO: still accept the UDP GSO
tunnel packet.
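
Roughly (an untested sketch; the NETIF_F_* mapping below is my assumption):

	/* HOST_UDP_TUNNEL_X negotiated: user-space set a large enough header */
	bool host_tnl = READ_ONCE(tun->vnet_hdr_sz) >= TUN_VNET_TNL_SIZE;
	/* GUEST_UDP_TUNNEL_X negotiated: reflected into the netdev features */
	bool guest_tnl = !!(tun->dev->features & NETIF_F_GSO_UDP_TUNNEL);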

>
> The user-space does not tell the tun device about any of the host
> offload features. Plain/baremetal GSO information are always available
> in the basic virtio net header, so there is no size check, but the
> overall behavior is similar - tun assumes the features have been
> negotiated if the relevant bits are present in the header.

I'm not sure I understand here, there's no bit in the virtio net
header that tells us if the packet contains the tunnel gso field. And
the check of:

READ_ONCE(tun->vnet_hdr_sz) >= TUN_VNET_TNL_SIZE

seems to be not buggy. As qemu already did:

static void virtio_net_set_mrg_rx_bufs(VirtIONet *n, int mergeable_rx_bufs,
                                       int version_1, int hash_report)
{
    int i;
    NetClientState *nc;

    n->mergeable_rx_bufs = mergeable_rx_bufs;

    if (version_1) {
        n->guest_hdr_len = hash_report ?
            sizeof(struct virtio_net_hdr_v1_hash) :
            sizeof(struct virtio_net_hdr_mrg_rxbuf);
        n->rss_data.populate_hash = !!hash_report;

...

>
> Here before checking the relevant bit we ensures we have enough vitio
> net hdr data - that makes the follow-up test simpler.
>
> > and we know the vnet_hdr_sz here, we can simply drop the
> > packet with less header.
>
> That looks prone migration or connectivity issue, and different from the
> current general behavior of always accepting any well formed packet even
> if shorter than what is actually negotiated (i.e. tun accepts packets
> with just virtio_net_hdr header even when V1 has been negotiated).

Tun doesn't know about V1, it can only see vnet_hdr_len; it is the userspace
(Qemu) that needs to configure all parties correctly. So I meant that if we
get an XDP buff with a shorter header, it looks more like a userspace
(Qemu) bug. I'm not sure we need to work around it in the kernel.

Thanks

>
> /P
>


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC v3 7/8] tun: enable gso over UDP tunnel support.
  2025-06-14  8:04     ` Paolo Abeni
@ 2025-06-16  5:34       ` Jason Wang
  0 siblings, 0 replies; 37+ messages in thread
From: Jason Wang @ 2025-06-16  5:34 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, Willem de Bruijn, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez, Yuri Benditovich, Akihiko Odaki

On Sat, Jun 14, 2025 at 4:04 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On 6/12/25 6:55 AM, Jason Wang wrote:
> > On Fri, Jun 6, 2025 at 7:46 PM Paolo Abeni <pabeni@redhat.com> wrote:
> >> diff --git a/include/uapi/linux/if_tun.h b/include/uapi/linux/if_tun.h
> >> index 287cdc81c939..79d53c7a1ebd 100644
> >> --- a/include/uapi/linux/if_tun.h
> >> +++ b/include/uapi/linux/if_tun.h
> >> @@ -93,6 +93,15 @@
> >>  #define TUN_F_USO4     0x20    /* I can handle USO for IPv4 packets */
> >>  #define TUN_F_USO6     0x40    /* I can handle USO for IPv6 packets */
> >>
> >> +/* I can handle TSO/USO for UDP tunneled packets */
> >> +#define TUN_F_UDP_TUNNEL_GSO           0x080
> >> +
> >> +/*
> >> + * I can handle TSO/USO for UDP tunneled packets requiring csum offload for
> >> + * the outer header
> >> + */
> >> +#define TUN_F_UDP_TUNNEL_GSO_CSUM      0x100
> >> +
> >
> > Any reason we don't choose to use 0x40 and 0x60?
>
> I just noticed I forgot to answer this one, I'm sorry.
>
> 0x40 is already in use (for TUN_F_USO6, as you can see above), and 0x60
> is a bitmask, not a single bit. I used the lowest available free bits.

You are right.

Thanks

>
> /P
>


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC v3 3/8] vhost-net: allow configuring extended features
  2025-06-14  7:27     ` Paolo Abeni
@ 2025-06-16  8:57       ` Akihiko Odaki
  2025-06-16 10:27         ` Paolo Abeni
  0 siblings, 1 reply; 37+ messages in thread
From: Akihiko Odaki @ 2025-06-16  8:57 UTC (permalink / raw)
  To: Paolo Abeni, netdev
  Cc: Willem de Bruijn, Jason Wang, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez, Yuri Benditovich

On 2025/06/14 16:27, Paolo Abeni wrote:
> On 6/8/25 8:16 AM, Akihiko Odaki wrote:
>> On 2025/06/06 20:45, Paolo Abeni wrote:
>>> +
>>> +		/* Zero the trailing space provided by user-space, if any */
>>> +		if (i < count && clear_user(argp, (count - i) * sizeof(u64)))
>>
>> I think checking i < count is a premature optimization; it doesn't
>> matter even if we spend a bit longer because of the lack of the check.
> 
> FTR, the check is not an optimization. if `i` is greater than `count`,
> `clear_user` is going to try to clear almost all the memory space (the
> 2nd argument is an unsigned one) and will likely return with error.
> 
> I think it's functionally needed.

The for loop guarantees i cannot be greater than count, so i < count will be
false only when i == count. clear_user() does nothing except incurring a
small overhead in that case, so the i < count check can be safely removed.

Regards,
Akihiko Odaki

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC v3 7/8] tun: enable gso over UDP tunnel support.
  2025-06-16  4:53       ` Jason Wang
@ 2025-06-16 10:17         ` Paolo Abeni
  2025-06-16 10:43           ` Paolo Abeni
  2025-06-17  3:24           ` Jason Wang
  0 siblings, 2 replies; 37+ messages in thread
From: Paolo Abeni @ 2025-06-16 10:17 UTC (permalink / raw)
  To: Jason Wang, Michael S. Tsirkin
  Cc: netdev, Willem de Bruijn, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Xuan Zhuo, Eugenio Pérez,
	Yuri Benditovich, Akihiko Odaki

On 6/16/25 6:53 AM, Jason Wang wrote:
> On Thu, Jun 12, 2025 at 7:03 PM Paolo Abeni <pabeni@redhat.com> wrote:
>> On 6/12/25 6:55 AM, Jason Wang wrote:
>>> On Fri, Jun 6, 2025 at 7:46 PM Paolo Abeni <pabeni@redhat.com> wrote:
>>>> @@ -1720,8 +1732,16 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
>>>>
>>>>         if (tun->flags & IFF_VNET_HDR) {
>>>>                 int vnet_hdr_sz = READ_ONCE(tun->vnet_hdr_sz);
>>>> +               int parsed_size;
>>>>
>>>> -               hdr_len = tun_vnet_hdr_get(vnet_hdr_sz, tun->flags, from, &gso);
>>>> +               if (vnet_hdr_sz < TUN_VNET_TNL_SIZE) {
>>>
>>> I still don't understand why we need to duplicate netdev features in
>>> flags, and it seems to introduce unnecessary complexities. Can we
>>> simply check dev->features instead?
>>>
>>> I think I've asked before, for example, we don't duplicate gso and
>>> csum for non tunnel packets.
>>
>> My fear was that if
>> - the guest negotiated VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO
>> - tun stores the negotiated offload info netdev->features
>> - the tun netdev UDP tunnel feature is disabled via ethtool
>>
>> tun may end-up sending to the guest packets without filling the tnl hdr,
>> which should be safe, as the driver should not use such info as no GSO
>> over UDP packets will go through, but is technically against the
>> specification.
> 
> Probably not? For example this is the way tun works with non tunnel GSO as well.
> 
> (And it allows the flexibility of debugging etc).
> 
>>
>> The current implementation always zero the whole virtio net hdr space,
>> so there is no such an issue.
>>
>> Still the additional complexity is ~5 lines and makes all the needed
>> information available on a single int, which is quite nice performance
>> wise. Do you have strong feeling against it?
> 
> See above and at least we can disallow the changing of UDP tunnel GSO
> (but I don't see too much value).

I'm sorry, but I don't understand what the suggestion/request here is.
Could you please rephrase it?

Also, please allow me to re-state my main point. The current
implementation adds a very limited amount of code in the control path,
and makes the data path simpler and faster - requiring no new arguments
to the tun_hdr_* helpers, instead of (at least) one with the alternative.

It looks like the tun_hdr_* argument list could grow with every new feature,
but I think we should try to avoid that.

>>>> @@ -2426,7 +2460,16 @@ static int tun_xdp_one(struct tun_struct *tun,
>>>>         if (metasize > 0)
>>>>                 skb_metadata_set(skb, metasize);
>>>>
>>>> -       if (tun_vnet_hdr_to_skb(tun->flags, skb, gso)) {
>>>> +       /* Assume tun offloads are enabled if the provided hdr is large
>>>> +        * enough.
>>>> +        */
>>>> +       if (READ_ONCE(tun->vnet_hdr_sz) >= TUN_VNET_TNL_SIZE &&
>>>> +           xdp->data - xdp->data_hard_start >= TUN_VNET_TNL_SIZE)
>>>> +               flags = tun->flags | TUN_VNET_TNL_MASK;
>>>> +       else
>>>> +               flags = tun->flags & ~TUN_VNET_TNL_MASK;
>>>
>>> I'm not sure I get the point that we need dynamics of
>>> TUN_VNET_TNL_MASK here. We know if tunnel gso and its csum or enabled
>>> or not,
>>
>> How does tun know about VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO or
>> VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO_CSUM?
> 
> I think it can be done in a way that works for non-tunnel gso.
> 
> The most complicated case is probably the case HOST_UDP_TUNNEL_X is
> enabled but GUEST_UDP_TUNNEL_X is not. In this case tun can know this
> by:
> 
> 1) vnet_hdr_len is large enough
> 2) UDP tunnel GSO is not enabled in netdev->features
> 
> If HOST_UDP_TUNNEL_X is not enabled by GUEST_UDP_TUNNEL_X is enabled,
> it can behave like existing non-tunnel GSO: still accept the UDP GSO
> tunnel packet.

AFAICS the text above quite accurately matches/describes the
implementation proposed in this patch and quoted above, which in turn
confuses me, because I don't see what you would prefer to see
implemented differently.

>> The user-space does not tell the tun device about any of the host
>> offload features. Plain/baremetal GSO information are always available
>> in the basic virtio net header, so there is no size check, but the
>> overall behavior is similar - tun assumes the features have been
>> negotiated if the relevant bits are present in the header.
> 
> I'm not sure I understand here, there's no bit in the virtio net
> header that tells us if the packet contains the tunnel gso field. And
> the check of:
> 
> READ_ONCE(tun->vnet_hdr_sz) >= TUN_VNET_TNL_SIZE
> 
> seems to be not buggy. As qemu already did:
> 
> static void virtio_net_set_mrg_rx_bufs(VirtIONet *n, int mergeable_rx_bufs,
>                                        int version_1, int hash_report)
> {
>     int i;
>     NetClientState *nc;
> 
>     n->mergeable_rx_bufs = mergeable_rx_bufs;
> 
>     if (version_1) {
>         n->guest_hdr_len = hash_report ?
>             sizeof(struct virtio_net_hdr_v1_hash) :
>             sizeof(struct virtio_net_hdr_mrg_rxbuf);
>         n->rss_data.populate_hash = !!hash_report;
> 
> ...

Note that the qemu code quoted above does not include tunnel handling.

TUN_VNET_TNL_SIZE (== sizeof(struct virtio_net_hdr_v1_tunnel)) will be
too small when VIRTIO_NET_F_HASH_REPORT is enabled, too.

I did not handle that case here, due to the even greater overlap with:

https://lore.kernel.org/netdev/20250530-rss-v12-0-95d8b348de91@daynix.com/

What I intended to do is:
- set another bit in `flags` according to the negotiated
VIRTIO_NET_F_HASH_REPORT value
- use such info in tun_vnet_parse_size() to compute the expected vnet
hdr len correctly (see the sketch below)
- replace the TUN_VNET_TNL_SIZE usage in tun.c with tun_vnet_parse_size() calls
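
i.e. roughly (untested sketch; TUN_VNET_HASH is a flag name I have not
defined yet):

static inline int tun_vnet_parse_size(unsigned int flags)
{
	/* hypothetical bit mirroring the negotiated VIRTIO_NET_F_HASH_REPORT */
	bool hash = flags & TUN_VNET_HASH;

	if (flags & TUN_VNET_TNL_MASK)
		return hash ? sizeof(struct virtio_net_hdr_v1_hash_tunnel) :
			      sizeof(struct virtio_net_hdr_v1_tunnel);

	return hash ? sizeof(struct virtio_net_hdr_v1_hash) :
		      sizeof(struct virtio_net_hdr);
}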

I'm unsure if the above answers your question/doubt.

Anyhow, I now see that keeping the UDP GSO related fields at a constant
offset, regardless of VIRTIO_NET_F_HASH_REPORT, would remove some
ambiguity from the relevant control path.

I think/hope we are still in time to update the specification to clarify
that, but I'm hesitant to take that path due to the additional
(hopefully small) overhead for the data path - and the process overhead TBH.

On the flip (positive) side, such a change would further decouple this
series from the HASH_REPORT support.

Please advise, thanks!

/P


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC v3 3/8] vhost-net: allow configuring extended features
  2025-06-16  8:57       ` Akihiko Odaki
@ 2025-06-16 10:27         ` Paolo Abeni
  0 siblings, 0 replies; 37+ messages in thread
From: Paolo Abeni @ 2025-06-16 10:27 UTC (permalink / raw)
  To: Akihiko Odaki, netdev
  Cc: Willem de Bruijn, Jason Wang, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez, Yuri Benditovich

On 6/16/25 10:57 AM, Akihiko Odaki wrote:
> On 2025/06/14 16:27, Paolo Abeni wrote:
>> On 6/8/25 8:16 AM, Akihiko Odaki wrote:
>>> On 2025/06/06 20:45, Paolo Abeni wrote:
>>>> +
>>>> +		/* Zero the trailing space provided by user-space, if any */
>>>> +		if (i < count && clear_user(argp, (count - i) * sizeof(u64)))
>>>
>>> I think checking i < count is a premature optimization; it doesn't
>>> matter even if we spend a bit longer because of the lack of the check.
>>
>> FTR, the check is not an optimization. if `i` is greater than `count`,
>> `clear_user` is going to try to clear almost all the memory space (the
>> 2nd argument is an unsigned one) and will likely return with error.
>>
>> I think it's functionally needed.
> 
> The for loop tells i cannot be greater than count, so i < count will be 
> false only when i == count. 

Right you are, /me goes grabbing more coffee...

Thanks,

Paolo


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC v3 7/8] tun: enable gso over UDP tunnel support.
  2025-06-16 10:17         ` Paolo Abeni
@ 2025-06-16 10:43           ` Paolo Abeni
  2025-06-17  2:56             ` Jason Wang
  2025-06-17  3:24           ` Jason Wang
  1 sibling, 1 reply; 37+ messages in thread
From: Paolo Abeni @ 2025-06-16 10:43 UTC (permalink / raw)
  To: Jason Wang, Michael S. Tsirkin
  Cc: netdev, Willem de Bruijn, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Xuan Zhuo, Eugenio Pérez,
	Yuri Benditovich, Akihiko Odaki

On 6/16/25 12:17 PM, Paolo Abeni wrote:
> Anyhow I now see that keeping the UDP GSO related fields offset constant
> regardless of VIRTIO_NET_F_HASH_REPORT would remove some ambiguity from
> the relevant control path.
> 
> I think/hope we are still on time to update the specification clarifying
> that, but I'm hesitant to take that path due to the additional
> (hopefully small) overhead for the data path - and process overhead TBH.
> 
> On the flip (positive) side such action will decouple more this series
> from the HASH_REPORT support.

Whoops, I did not notice:

https://lore.kernel.org/virtio-comment/20250401195655.486230-1-kshankar@marvell.com/
(virtio-spec commit 8d76f64d2198b34046fbedc3c835a6f75336a440), and
specifically:

"""
When calculating the size of \field{struct virtio_net_hdr}, the driver
MUST consider all the fields inclusive up to \field{padding_reserved_2}
// ...
i.e. 24 bytes if VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO is negotiated or up to
\field{padding_reserved}
"""

that is, UDP related fields are always in a fixed position, regardless
of VIRTIO_NET_F_HASH_REPORT (and other previous discussion not captured
in the spec clearly enough).
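
For reference, my reading of how the 24 bytes above break down (v1 layout):

	struct virtio_net_hdr_v1                12 bytes
	hash_value + hash_report + padding       8 bytes (reserved even without HASH_REPORT)
	outer_th_offset + inner_nh_offset        4 bytes
	total                                   24 bytes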

TL;DR: please scratch my previous comment above, I'll update the patches
accordingly (fixed UDP GSO field offset).

/P


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC v3 6/8] virtio_net: enable gso over UDP tunnel support.
  2025-06-16  3:20       ` Jason Wang
@ 2025-06-16 14:44         ` Paolo Abeni
  0 siblings, 0 replies; 37+ messages in thread
From: Paolo Abeni @ 2025-06-16 14:44 UTC (permalink / raw)
  To: Jason Wang
  Cc: netdev, Willem de Bruijn, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez, Yuri Benditovich, Akihiko Odaki

On 6/16/25 5:20 AM, Jason Wang wrote:
> On Thu, Jun 12, 2025 at 6:18 PM Paolo Abeni <pabeni@redhat.com> wrote:
>> On 6/12/25 6:05 AM, Jason Wang wrote:
>>> I wonder why virtio_net_chk_data_valid() is not part of the
>>> virtio_net_hdr_tnl_to_skb().
>>
>> It can't be part of virtio_net_hdr_tnl_to_skb(), as hdr to skb
>> conversion is actually not symmetric with respect to the checksum - only
>> the driver handles DATA_VALID.
>>
>> Tun must not call virtio_net_chk_data_valid()  (or whatever different
>> name will use).
> 
> Note that we've already dealt with this via introducing a boolean
> has_data_valid:
> 
> static inline int virtio_net_hdr_from_skb(const struct sk_buff *skb,
>                                           struct virtio_net_hdr *hdr,
>                                           bool little_endian,
>                                           bool has_data_valid,
>                                           int vlan_hlen)
> 

I'll keep the csum offload function separate, to avoid adding even more
arguments to the common helper.
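
i.e. something along the lines of the following sketch, with the
driver-only step kept out of the shared helper (the function name here
is just a placeholder):

#include <linux/skbuff.h>
#include <linux/virtio_net.h>

/* Driver-only: honour DATA_VALID from the device.  tun never calls
 * this, only the hdr->skb conversion helper shared with it.
 */
static void virtio_net_handle_data_valid(struct sk_buff *skb,
					 const struct virtio_net_hdr *hdr)
{
	if (hdr->flags & VIRTIO_NET_HDR_F_DATA_VALID)
		skb->ip_summed = CHECKSUM_UNNECESSARY;
}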

/P


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC v3 7/8] tun: enable gso over UDP tunnel support.
  2025-06-16 10:43           ` Paolo Abeni
@ 2025-06-17  2:56             ` Jason Wang
  0 siblings, 0 replies; 37+ messages in thread
From: Jason Wang @ 2025-06-17  2:56 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Michael S. Tsirkin, netdev, Willem de Bruijn, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Xuan Zhuo,
	Eugenio Pérez, Yuri Benditovich, Akihiko Odaki

On Mon, Jun 16, 2025 at 6:43 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On 6/16/25 12:17 PM, Paolo Abeni wrote:
> > Anyhow I now see that keeping the UDP GSO related fields offset constant
> > regardless of VIRTIO_NET_F_HASH_REPORT would remove some ambiguity from
> > the relevant control path.
> >
> > I think/hope we are still on time to update the specification clarifying
> > that, but I'm hesitant to take that path due to the additional
> > (hopefully small) overhead for the data path - and process overhead TBH.
> >
> > On the positive side, such a change will further decouple this series
> > from the HASH_REPORT support.
>
> Whoops, I did not notice:
>
> https://lore.kernel.org/virtio-comment/20250401195655.486230-1-kshankar@marvell.com/
> (virtio-spec commit 8d76f64d2198b34046fbedc3c835a6f75336a440), and
> specifically:
>
> """
> When calculating the size of \field{struct virtio_net_hdr}, the driver
> MUST consider all the fields inclusive up to \field{padding_reserved_2}
> // ...
> i.e. 24 bytes if VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO is negotiated or up to
> \field{padding_reserved}
> """
>
> that is, the UDP tunnel related fields are always at a fixed position,
> regardless of VIRTIO_NET_F_HASH_REPORT (and of earlier discussions not
> captured clearly enough in the spec).

This is why I felt something was strange when reading the patches.
Everything would be simplified if the offset were fixed.

>
> TL;DR: please scratch my previous comment above, I'll update the patches
> accordingly (fixed UDP GSO field offset).
>
> /P
>

Thanks


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC v3 7/8] tun: enable gso over UDP tunnel support.
  2025-06-16 10:17         ` Paolo Abeni
  2025-06-16 10:43           ` Paolo Abeni
@ 2025-06-17  3:24           ` Jason Wang
  1 sibling, 0 replies; 37+ messages in thread
From: Jason Wang @ 2025-06-17  3:24 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Michael S. Tsirkin, netdev, Willem de Bruijn, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Xuan Zhuo,
	Eugenio Pérez, Yuri Benditovich, Akihiko Odaki

On Mon, Jun 16, 2025 at 6:17 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On 6/16/25 6:53 AM, Jason Wang wrote:
> > On Thu, Jun 12, 2025 at 7:03 PM Paolo Abeni <pabeni@redhat.com> wrote:
> >> On 6/12/25 6:55 AM, Jason Wang wrote:
> >>> On Fri, Jun 6, 2025 at 7:46 PM Paolo Abeni <pabeni@redhat.com> wrote:
> >>>> @@ -1720,8 +1732,16 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
> >>>>
> >>>>         if (tun->flags & IFF_VNET_HDR) {
> >>>>                 int vnet_hdr_sz = READ_ONCE(tun->vnet_hdr_sz);
> >>>> +               int parsed_size;
> >>>>
> >>>> -               hdr_len = tun_vnet_hdr_get(vnet_hdr_sz, tun->flags, from, &gso);
> >>>> +               if (vnet_hdr_sz < TUN_VNET_TNL_SIZE) {
> >>>
> >>> I still don't understand why we need to duplicate netdev features in
> >>> flags, and it seems to introduce unnecessary complexities. Can we
> >>> simply check dev->features instead?
> >>>
> >>> I think I've asked before, for example, we don't duplicate gso and
> >>> csum for non tunnel packets.
> >>
> >> My fear was that if
> >> - the guest negotiated VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO
> >> - tun stores the negotiated offload info netdev->features
> >> - the tun netdev UDP tunnel feature is disabled via ethtool
> >>
> >> tun may end-up sending to the guest packets without filling the tnl hdr,
> >> which should be safe, as the driver should not use such info as no GSO
> >> over UDP packets will go through, but is technically against the
> >> specification.
> >
> > Probably not? For example this is the way tun works with non tunnel GSO as well.
> >
> > (And it allows the flexibility of debugging etc).
> >
> >>
> >> The current implementation always zero the whole virtio net hdr space,
> >> so there is no such an issue.
> >>
> >> Still the additional complexity is ~5 lines and makes all the needed
> >> information available on a single int, which is quite nice performance
> >> wise. Do you have strong feeling against it?
> >
> > See above and at least we can disallow the changing of UDP tunnel GSO
> > (but I don't see too much value).
>
> I'm sorry, but I don't understand what is the suggestion/request here.
> Could you please phrase it?

I meant I don't see strong reasons to duplicate tunnel gso/csum in tun->flags:

1) extra complexity
2) non tunnel gso doesn't do this (and for macvtap, it tries to
emulate netdev->features)
3) we can find a way to disallow toggling tunnel gso/csum via ethtool
(but I don't see too much value)

>
> Also please allow me to re-state my main point. The current
> implementation adds a very limited amount of code in the control path,
> and makes the data path simpler and faster - requiring no new argument
> to the tun_hdr_* helper instead of (at least) one as the other alternative.
>
> It looks like tun_hdr_* argument list could grow with every new feature,
> but I think we should try to avoid that.

See above:

1) for HOST_UDP_XXX we can assume it is enabled
2) for GUEST_UDP_XXX we can check netdev->features

So passing netdev->features seems to be sufficient; it avoids
introducing a new argument when a new offload is supported in the
future.
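
Roughly something like the sketch below (the NETIF_F_* mapping is only
for illustration):

#include <linux/netdevice.h>

/* No bit cached in tun->flags:
 * 1) host-direction tunnel GSO is assumed enabled whenever the
 *    configured vnet header is large enough to carry the tunnel fields;
 * 2) guest-direction tunnel GSO is derived from netdev->features.
 */
static bool tun_guest_udp_tnl_enabled(const struct net_device *dev)
{
	return !!(dev->features & (NETIF_F_GSO_UDP_TUNNEL |
				   NETIF_F_GSO_UDP_TUNNEL_CSUM));
}

static bool tun_host_udp_tnl_enabled(int vnet_hdr_sz, int tnl_hdr_sz)
{
	return vnet_hdr_sz >= tnl_hdr_sz;
}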

>
> >>>> @@ -2426,7 +2460,16 @@ static int tun_xdp_one(struct tun_struct *tun,
> >>>>         if (metasize > 0)
> >>>>                 skb_metadata_set(skb, metasize);
> >>>>
> >>>> -       if (tun_vnet_hdr_to_skb(tun->flags, skb, gso)) {
> >>>> +       /* Assume tun offloads are enabled if the provided hdr is large
> >>>> +        * enough.
> >>>> +        */
> >>>> +       if (READ_ONCE(tun->vnet_hdr_sz) >= TUN_VNET_TNL_SIZE &&
> >>>> +           xdp->data - xdp->data_hard_start >= TUN_VNET_TNL_SIZE)
> >>>> +               flags = tun->flags | TUN_VNET_TNL_MASK;
> >>>> +       else
> >>>> +               flags = tun->flags & ~TUN_VNET_TNL_MASK;
> >>>
> >>> I'm not sure I get the point that we need dynamics of
> >>> TUN_VNET_TNL_MASK here. We know if tunnel gso and its csum or enabled
> >>> or not,
> >>
> >> How does tun know about VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO or
> >> VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO_CSUM?
> >
> > I think it can be done in a way that works for non-tunnel gso.
> >
> > The most complicated case is probably the case HOST_UDP_TUNNEL_X is
> > enabled but GUEST_UDP_TUNNEL_X is not. In this case tun can know this
> > by:
> >
> > 1) vnet_hdr_len is large enough
> > 2) UDP tunnel GSO is not enabled in netdev->features
> >
> > If HOST_UDP_TUNNEL_X is not enabled but GUEST_UDP_TUNNEL_X is enabled,
> > it can behave like existing non-tunnel GSO: still accept the UDP GSO
> > tunnel packet.
>
> AFAICS the text above matches/describes quite accurately the
> implementation proposed in this patch and quoted above. Which in turn
> confuses me, because I don't see what you would prefer to see
> implemented differently.

I meant those can be done without using tun->flags.

>
> >> The user-space does not tell the tun device about any of the host
> >> offload features. Plain/baremetal GSO information are always available
> >> in the basic virtio net header, so there is no size check, but the
> >> overall behavior is similar - tun assumes the features have been
> >> negotiated if the relevant bits are present in the header.
> >
> > I'm not sure I understand here, there's no bit in the virtio net
> > header that tells us if the packet contains the tunnel gso field. And
> > the check of:
> >
> > READ_ONCE(tun->vnet_hdr_sz) >= TUN_VNET_TNL_SIZE
> >
> > seems to be not buggy. As qemu already did:
> >
> > static void virtio_net_set_mrg_rx_bufs(VirtIONet *n, int mergeable_rx_bufs,
> >                                        int version_1, int hash_report)
> > {
> >     int i;
> >     NetClientState *nc;
> >
> >     n->mergeable_rx_bufs = mergeable_rx_bufs;
> >
> >     if (version_1) {
> >         n->guest_hdr_len = hash_report ?
> >             sizeof(struct virtio_net_hdr_v1_hash) :
> >             sizeof(struct virtio_net_hdr_mrg_rxbuf);
> >         n->rss_data.populate_hash = !!hash_report;
> >
> > ...
>
> Note that the qemu code quoted above does not include tunnel handling.
>
> TUN_VNET_TNL_SIZE (== sizeof(struct virtio_net_hdr_v1_tunnel)) will be
> too small when VIRTIO_NET_F_HASH_REPORT is enabled, too.
>
> I did not handle that case here, due to the even greater overlapping with:
>
> https://lore.kernel.org/netdev/20250530-rss-v12-0-95d8b348de91@daynix.com/
>
> What I intended to do is:
> - set another bit in `flags` according to the negotiated
> VIRTIO_NET_F_HASH_REPORT value
> - use such info in tun_vnet_parse_size() to compute the expected vnet
> hdr len correctly.
> - replace TUN_VNET_TNL_SIZE usage in tun.c with tun_vnet_parse_size() calls
>
> I'm unsure if the above answer your question/doubt.

For hash reporting, since we don't have a netdev feature for it, a new
argument for tun_hdr_XXX() is probably needed.
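
Roughly (sizes follow from the spec text quoted earlier in the thread;
helper and parameter names are placeholders):

#include <linux/types.h>

/* Expected guest vnet header length as a function of the negotiated
 * features, with the UDP tunnel fields at a fixed offset.
 */
static int vnet_expected_hdr_len(bool udp_tnl, bool hash_report)
{
	if (udp_tnl)
		return 24;	/* up to (and including) the tunnel fields */
	if (hash_report)
		return 20;	/* up to the hash report fields */
	return 12;		/* plain virtio_net_hdr_v1 / mrg_rxbuf */
}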

>
> Anyhow I now see that keeping the UDP GSO related fields offset constant
> regardless of VIRTIO_NET_F_HASH_REPORT would remove some ambiguity from
> the relevant control path.

Not a native speaker, but I think the ambiguity mainly comes from the
"Only if" wording, that is, something like

"Only if VIRTIO_NET_F_HASH_REPORT negotiated"

>
> I think/hope we are still on time to update the specification clarifying
> that, but I'm hesitant to take that path due to the additional
> (hopefully small) overhead for the data path - and process overhead TBH.
>
> On the positive side, such a change will further decouple this series
> from the HASH_REPORT support.
>
> Please advise, thanks!
>
> /P
>

Thanks


^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2025-06-17  3:24 UTC | newest]

Thread overview: 37+ messages
2025-06-06 11:45 [PATCH RFC v3 0/8] virtio: introduce GSO over UDP tunnel Paolo Abeni
2025-06-06 11:45 ` [PATCH RFC v3 1/8] virtio: introduce extended features Paolo Abeni
2025-06-08  5:49   ` Akihiko Odaki
2025-06-12  8:50     ` Paolo Abeni
2025-06-12  0:50   ` Jason Wang
2025-06-12  9:05     ` Paolo Abeni
2025-06-06 11:45 ` [PATCH RFC v3 2/8] virtio_pci_modern: allow configuring " Paolo Abeni
2025-06-12  0:57   ` Jason Wang
2025-06-12  9:34     ` Paolo Abeni
2025-06-06 11:45 ` [PATCH RFC v3 3/8] vhost-net: " Paolo Abeni
2025-06-08  6:16   ` Akihiko Odaki
2025-06-14  7:27     ` Paolo Abeni
2025-06-16  8:57       ` Akihiko Odaki
2025-06-16 10:27         ` Paolo Abeni
2025-06-12  1:31   ` Jason Wang
2025-06-06 11:45 ` [PATCH RFC v3 4/8] virtio_net: add supports for extended offloads Paolo Abeni
2025-06-06 11:45 ` [PATCH RFC v3 5/8] net: implement virtio helpers to handle UDP GSO tunneling Paolo Abeni
2025-06-12  3:53   ` Jason Wang
2025-06-12 10:10     ` Paolo Abeni
2025-06-16  3:17       ` Jason Wang
2025-06-06 11:45 ` [PATCH RFC v3 6/8] virtio_net: enable gso over UDP tunnel support Paolo Abeni
2025-06-12  4:05   ` Jason Wang
2025-06-12 10:17     ` Paolo Abeni
2025-06-16  3:20       ` Jason Wang
2025-06-16 14:44         ` Paolo Abeni
2025-06-06 11:45 ` [PATCH RFC v3 7/8] tun: " Paolo Abeni
2025-06-12  4:55   ` Jason Wang
2025-06-12 11:03     ` Paolo Abeni
2025-06-14 10:21       ` Paolo Abeni
2025-06-16  4:53       ` Jason Wang
2025-06-16 10:17         ` Paolo Abeni
2025-06-16 10:43           ` Paolo Abeni
2025-06-17  2:56             ` Jason Wang
2025-06-17  3:24           ` Jason Wang
2025-06-14  8:04     ` Paolo Abeni
2025-06-16  5:34       ` Jason Wang
2025-06-06 11:45 ` [PATCH RFC v3 8/8] vhost/net: " Paolo Abeni
