All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH v2 0/8] virtio: introduce GSO over UDP tunnel
@ 2025-05-30 14:49 Paolo Abeni
  2025-05-30 14:49 ` [RFC PATCH v2 1/8] virtio: introduce virtio_features_t Paolo Abeni
                   ` (7 more replies)
  0 siblings, 8 replies; 18+ messages in thread
From: Paolo Abeni @ 2025-05-30 14:49 UTC (permalink / raw)
  To: netdev
  Cc: Willem de Bruijn, Jason Wang, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez, Yuri Benditovich, Akihiko Odaki

Some virtualized deployments use UDP tunnel pervasively and are impacted
negatively by the lack of GSO support for such kind of traffic in the
virtual NIC driver.

The virtio_net specification recently introduced support for GSO over
UDP tunnel, this series updates the virtio implementation to support
such a feature.

Currently the kernel virtio support limits the feature space to 64,
while the virtio specification allows for a larger number of features.
Specifically the GSO-over-UDP-tunnel-related virtio features use bits
65-69.

The first four patches in this series rework the virtio and vhost
feature support to cope with up to 128 bits. The limit is arch-dependent:
only arches with native 128 integer support allow for the wider feature
space.

This implementation choice is aimed at keeping the code churn as
limited as possible. For the same reason, only the virtio_net driver is
reworked to leverage the extended feature space; all other
virtio/vhost drivers are unaffected, but could be upgraded to support
the extended features space in a later time.

The last four patches bring in the actual GSO over UDP tunnel support.
As per specification, some additional fields are introduced into the
virtio net header to support the new offload. The presence of such
fields depends on the negotiated features.

New helpers are introduced to convert the UDP-tunneled skb metadata to
an extended virtio net header and vice versa. Such helpers are used by
the tun and virtio_net driver to cope with the newly supported offloads.

Tested with basic stream transfer with all the possible permutations of
host kernel/qemu/guest kernel with/without GSO over UDP tunnel support.

--
v1 -> (rfc) v2:
  - fix build failures
  - many comment clarification
  - changed the vhost_net ioctl API
  - fixed some hdr <> skb helper bugs
v1: https://lore.kernel.org/netdev/cover.1747822866.git.pabeni@redhat.com/

Paolo Abeni (8):
  virtio: introduce virtio_features_t
  virtio_pci_modern: allow configuring extended features
  vhost-net: allow configuring extended features
  virtio_net: add supports for extended offloads
  net: implement virtio helpers to handle UDP GSO tunneling.
  virtio_net: enable gso over UDP tunnel support.
  tun: enable gso over UDP tunnel support.
  vhost/net: enable gso over UDP tunnel support.

 drivers/net/tun.c                      |  77 ++++++++--
 drivers/net/tun_vnet.h                 |  74 ++++++++--
 drivers/net/virtio_net.c               | 104 ++++++++++++--
 drivers/vhost/net.c                    |  71 ++++++++-
 drivers/vhost/vhost.h                  |   2 +-
 drivers/virtio/virtio.c                |  14 +-
 drivers/virtio/virtio_debug.c          |   2 +-
 drivers/virtio/virtio_pci_modern.c     |   6 +-
 drivers/virtio/virtio_pci_modern_dev.c |  44 +++---
 include/linux/virtio.h                 |   5 +-
 include/linux/virtio_config.h          |  32 +++--
 include/linux/virtio_features.h        |  27 ++++
 include/linux/virtio_net.h             | 191 +++++++++++++++++++++++--
 include/linux/virtio_pci_modern.h      |  10 +-
 include/uapi/linux/if_tun.h            |   9 ++
 include/uapi/linux/vhost.h             |   7 +
 include/uapi/linux/vhost_types.h       |   5 +
 include/uapi/linux/virtio_net.h        |  33 +++++
 18 files changed, 621 insertions(+), 92 deletions(-)
 create mode 100644 include/linux/virtio_features.h

-- 
2.49.0


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [RFC PATCH v2 1/8] virtio: introduce virtio_features_t
  2025-05-30 14:49 [RFC PATCH v2 0/8] virtio: introduce GSO over UDP tunnel Paolo Abeni
@ 2025-05-30 14:49 ` Paolo Abeni
  2025-05-30 17:40   ` kernel test robot
  2025-05-31  5:37   ` Akihiko Odaki
  2025-05-30 14:49 ` [RFC PATCH v2 2/8] virtio_pci_modern: allow configuring extended features Paolo Abeni
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 18+ messages in thread
From: Paolo Abeni @ 2025-05-30 14:49 UTC (permalink / raw)
  To: netdev
  Cc: Willem de Bruijn, Jason Wang, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez, Yuri Benditovich, Akihiko Odaki

The virtio specifications allows for up to 128 bits for the
device features. Soon we are going to use some of the 'extended'
bits features (above 64) for the virtio_net driver.

Introduce an specific type to represent the virtio features bitmask.

On platform where 128 bits integer are available use such wide int
for the features bitmask, otherwise maintain the current u64.

Introduce an extended get_features128() configuration callback that
devices supporting the extended features range must implement in
place of the traditional one.

Note that legacy and transport features don't need any change, as
they are always in the low 64 bit range.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
v1 -> v2:
  - let u64 VIRTIO_BIT() cope with higher bit values
  - add .get_extended_features instead of changing .get_features signature
---
 drivers/virtio/virtio.c         | 14 +++++++-------
 drivers/virtio/virtio_debug.c   |  2 +-
 include/linux/virtio.h          |  5 +++--
 include/linux/virtio_config.h   | 32 ++++++++++++++++++++++----------
 include/linux/virtio_features.h | 27 +++++++++++++++++++++++++++
 5 files changed, 60 insertions(+), 20 deletions(-)
 create mode 100644 include/linux/virtio_features.h

diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index 95d5d7993e5b..206ae8fa0654 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -272,22 +272,22 @@ static int virtio_dev_probe(struct device *_d)
 	int err, i;
 	struct virtio_device *dev = dev_to_virtio(_d);
 	struct virtio_driver *drv = drv_to_virtio(dev->dev.driver);
-	u64 device_features;
-	u64 driver_features;
-	u64 driver_features_legacy;
+	virtio_features_t device_features;
+	virtio_features_t driver_features;
+	virtio_features_t driver_features_legacy;
 
 	/* We have a driver! */
 	virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER);
 
 	/* Figure out what features the device supports. */
-	device_features = dev->config->get_features(dev);
+	device_features = virtio_get_features(dev);
 
 	/* Figure out what features the driver supports. */
 	driver_features = 0;
 	for (i = 0; i < drv->feature_table_size; i++) {
 		unsigned int f = drv->feature_table[i];
-		BUG_ON(f >= 64);
-		driver_features |= (1ULL << f);
+		BUG_ON(f >= VIRTIO_FEATURES_MAX);
+		driver_features |= VIRTIO_BIT(f);
 	}
 
 	/* Some drivers have a separate feature table for virtio v1.0 */
@@ -320,7 +320,7 @@ static int virtio_dev_probe(struct device *_d)
 		goto err;
 
 	if (drv->validate) {
-		u64 features = dev->features;
+		virtio_features_t features = dev->features;
 
 		err = drv->validate(dev);
 		if (err)
diff --git a/drivers/virtio/virtio_debug.c b/drivers/virtio/virtio_debug.c
index 95c8fc7705bb..5ca95422d3ca 100644
--- a/drivers/virtio/virtio_debug.c
+++ b/drivers/virtio/virtio_debug.c
@@ -12,7 +12,7 @@ static int virtio_debug_device_features_show(struct seq_file *s, void *data)
 	u64 device_features;
 	unsigned int i;
 
-	device_features = dev->config->get_features(dev);
+	device_features = virtio_get_features(dev);
 	for (i = 0; i < BITS_PER_LONG_LONG; i++) {
 		if (device_features & (1ULL << i))
 			seq_printf(s, "%u\n", i);
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 64cb4b04be7a..6e51400d0463 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -11,6 +11,7 @@
 #include <linux/gfp.h>
 #include <linux/dma-mapping.h>
 #include <linux/completion.h>
+#include <linux/virtio_features.h>
 
 /**
  * struct virtqueue - a queue to register buffers for sending or receiving.
@@ -159,11 +160,11 @@ struct virtio_device {
 	const struct virtio_config_ops *config;
 	const struct vringh_config_ops *vringh_config;
 	struct list_head vqs;
-	u64 features;
+	virtio_features_t features;
 	void *priv;
 #ifdef CONFIG_VIRTIO_DEBUG
 	struct dentry *debugfs_dir;
-	u64 debugfs_filter_features;
+	virtio_features_t debugfs_filter_features;
 #endif
 };
 
diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index 169c7d367fac..1cc43d9cf6e8 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -77,7 +77,10 @@ struct virtqueue_info {
  *      vdev: the virtio_device
  * @get_features: get the array of feature bits for this device.
  *	vdev: the virtio_device
- *	Returns the first 64 feature bits (all we currently need).
+ *	Returns the first 64 feature bits.
+ * @get_extended_features:
+ *      vdev: the virtio_device
+ *      Returns the first VIRTIO_FEATURES_MAX feature bits (all we currently need).
  * @finalize_features: confirm what device features we'll be using.
  *	vdev: the virtio_device
  *	This sends the driver feature bits to the device: it can change
@@ -121,6 +124,7 @@ struct virtio_config_ops {
 	void (*del_vqs)(struct virtio_device *);
 	void (*synchronize_cbs)(struct virtio_device *);
 	u64 (*get_features)(struct virtio_device *vdev);
+	virtio_features_t (*get_extended_features)(struct virtio_device *vdev);
 	int (*finalize_features)(struct virtio_device *vdev);
 	const char *(*bus_name)(struct virtio_device *vdev);
 	int (*set_vq_affinity)(struct virtqueue *vq,
@@ -149,11 +153,11 @@ static inline bool __virtio_test_bit(const struct virtio_device *vdev,
 {
 	/* Did you forget to fix assumptions on max features? */
 	if (__builtin_constant_p(fbit))
-		BUILD_BUG_ON(fbit >= 64);
+		BUILD_BUG_ON(fbit >= VIRTIO_FEATURES_MAX);
 	else
-		BUG_ON(fbit >= 64);
+		BUG_ON(fbit >= VIRTIO_FEATURES_MAX);
 
-	return vdev->features & BIT_ULL(fbit);
+	return vdev->features & VIRTIO_BIT(fbit);
 }
 
 /**
@@ -166,11 +170,11 @@ static inline void __virtio_set_bit(struct virtio_device *vdev,
 {
 	/* Did you forget to fix assumptions on max features? */
 	if (__builtin_constant_p(fbit))
-		BUILD_BUG_ON(fbit >= 64);
+		BUILD_BUG_ON(fbit >= VIRTIO_FEATURES_MAX);
 	else
-		BUG_ON(fbit >= 64);
+		BUG_ON(fbit >= VIRTIO_FEATURES_MAX);
 
-	vdev->features |= BIT_ULL(fbit);
+	vdev->features |= VIRTIO_BIT(fbit);
 }
 
 /**
@@ -183,11 +187,11 @@ static inline void __virtio_clear_bit(struct virtio_device *vdev,
 {
 	/* Did you forget to fix assumptions on max features? */
 	if (__builtin_constant_p(fbit))
-		BUILD_BUG_ON(fbit >= 64);
+		BUILD_BUG_ON(fbit >= VIRTIO_FEATURES_MAX);
 	else
-		BUG_ON(fbit >= 64);
+		BUG_ON(fbit >= VIRTIO_FEATURES_MAX);
 
-	vdev->features &= ~BIT_ULL(fbit);
+	vdev->features &= ~VIRTIO_BIT(fbit);
 }
 
 /**
@@ -204,6 +208,14 @@ static inline bool virtio_has_feature(const struct virtio_device *vdev,
 	return __virtio_test_bit(vdev, fbit);
 }
 
+static inline virtio_features_t virtio_get_features(struct virtio_device *vdev)
+{
+	if (vdev->config->get_extended_features)
+		return vdev->config->get_extended_features(vdev);
+
+	return vdev->config->get_features(vdev);
+}
+
 /**
  * virtio_has_dma_quirk - determine whether this device has the DMA quirk
  * @vdev: the device
diff --git a/include/linux/virtio_features.h b/include/linux/virtio_features.h
new file mode 100644
index 000000000000..0a7e2265f8b4
--- /dev/null
+++ b/include/linux/virtio_features.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_VIRTIO_FEATURES_H
+#define _LINUX_VIRTIO_FEATURES_H
+
+#include <linux/bits.h>
+
+#if IS_ENABLED(CONFIG_ARCH_SUPPORTS_INT128)
+#define VIRTIO_HAS_EXTENDED_FEATURES
+#define VIRTIO_FEATURES_MAX	128
+#define VIRTIO_FEATURES_WORDS	4
+#define VIRTIO_BIT(b)		_BIT128(b)
+
+typedef __uint128_t virtio_features_t;
+
+#else
+#define VIRTIO_FEATURES_MAX	64
+#define VIRTIO_FEATURES_WORDS	2
+
+static inline u64 VIRTIO_BIT(int bit)
+{
+	return bit >= 64 ? 0 : BIT_ULL(b);
+}
+
+typedef u64 virtio_features_t;
+#endif
+
+#endif
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC PATCH v2 2/8] virtio_pci_modern: allow configuring extended features
  2025-05-30 14:49 [RFC PATCH v2 0/8] virtio: introduce GSO over UDP tunnel Paolo Abeni
  2025-05-30 14:49 ` [RFC PATCH v2 1/8] virtio: introduce virtio_features_t Paolo Abeni
@ 2025-05-30 14:49 ` Paolo Abeni
  2025-05-30 14:49 ` [RFC PATCH v2 3/8] vhost-net: " Paolo Abeni
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Paolo Abeni @ 2025-05-30 14:49 UTC (permalink / raw)
  To: netdev
  Cc: Willem de Bruijn, Jason Wang, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez, Yuri Benditovich, Akihiko Odaki

The virtio specifications allows for up to 128 bits for the
device features. Soon we are going to use some of the 'extended'
bits features (above 64) for the virtio_net driver.

Extend the virtio pci modern driver to support configuring the full
virtio features range, replacing the unrolled loops reading and
writing the features space with explicit one bounded to the actual
features space size in word and implementing the get_extended_features
callback.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
v1 -> v2:
  - use get_extended_features
---
 drivers/virtio/virtio_pci_modern.c     |  6 ++--
 drivers/virtio/virtio_pci_modern_dev.c | 44 ++++++++++++++++----------
 include/linux/virtio_pci_modern.h      | 10 ++++--
 3 files changed, 37 insertions(+), 23 deletions(-)

diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
index d50fe030d825..7a66844e84f8 100644
--- a/drivers/virtio/virtio_pci_modern.c
+++ b/drivers/virtio/virtio_pci_modern.c
@@ -22,7 +22,7 @@
 
 #define VIRTIO_AVQ_SGS_MAX	4
 
-static u64 vp_get_features(struct virtio_device *vdev)
+static virtio_features_t vp_get_features(struct virtio_device *vdev)
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
 
@@ -1223,7 +1223,7 @@ static const struct virtio_config_ops virtio_pci_config_nodev_ops = {
 	.find_vqs	= vp_modern_find_vqs,
 	.del_vqs	= vp_del_vqs,
 	.synchronize_cbs = vp_synchronize_vectors,
-	.get_features	= vp_get_features,
+	.get_extended_features = vp_get_features,
 	.finalize_features = vp_finalize_features,
 	.bus_name	= vp_bus_name,
 	.set_vq_affinity = vp_set_vq_affinity,
@@ -1243,7 +1243,7 @@ static const struct virtio_config_ops virtio_pci_config_ops = {
 	.find_vqs	= vp_modern_find_vqs,
 	.del_vqs	= vp_del_vqs,
 	.synchronize_cbs = vp_synchronize_vectors,
-	.get_features	= vp_get_features,
+	.get_extended_features = vp_get_features,
 	.finalize_features = vp_finalize_features,
 	.bus_name	= vp_bus_name,
 	.set_vq_affinity = vp_set_vq_affinity,
diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
index 0d3dbfaf4b23..e3025b6fa854 100644
--- a/drivers/virtio/virtio_pci_modern_dev.c
+++ b/drivers/virtio/virtio_pci_modern_dev.c
@@ -393,16 +393,19 @@ EXPORT_SYMBOL_GPL(vp_modern_remove);
  *
  * Returns the features read from the device
  */
-u64 vp_modern_get_features(struct virtio_pci_modern_device *mdev)
+virtio_features_t vp_modern_get_features(struct virtio_pci_modern_device *mdev)
 {
 	struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
+	virtio_features_t features = 0;
+	int i;
 
-	u64 features;
+	for (i = 0; i < VIRTIO_FEATURES_WORDS; i++) {
+		virtio_features_t cur;
 
-	vp_iowrite32(0, &cfg->device_feature_select);
-	features = vp_ioread32(&cfg->device_feature);
-	vp_iowrite32(1, &cfg->device_feature_select);
-	features |= ((u64)vp_ioread32(&cfg->device_feature) << 32);
+		vp_iowrite32(i, &cfg->device_feature_select);
+		cur = vp_ioread32(&cfg->device_feature);
+		features |= cur << (32 * i);
+	}
 
 	return features;
 }
@@ -414,16 +417,20 @@ EXPORT_SYMBOL_GPL(vp_modern_get_features);
  *
  * Returns the driver features read from the device
  */
-u64 vp_modern_get_driver_features(struct virtio_pci_modern_device *mdev)
+virtio_features_t
+vp_modern_get_driver_features(struct virtio_pci_modern_device *mdev)
 {
 	struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
+	virtio_features_t features = 0;
+	int i;
 
-	u64 features;
+	for (i = 0; i < VIRTIO_FEATURES_WORDS; i++) {
+		virtio_features_t cur;
 
-	vp_iowrite32(0, &cfg->guest_feature_select);
-	features = vp_ioread32(&cfg->guest_feature);
-	vp_iowrite32(1, &cfg->guest_feature_select);
-	features |= ((u64)vp_ioread32(&cfg->guest_feature) << 32);
+		vp_iowrite32(i, &cfg->guest_feature_select);
+		cur = vp_ioread32(&cfg->guest_feature);
+		features |= cur << (32 * i);
+	}
 
 	return features;
 }
@@ -435,14 +442,17 @@ EXPORT_SYMBOL_GPL(vp_modern_get_driver_features);
  * @features: the features set to device
  */
 void vp_modern_set_features(struct virtio_pci_modern_device *mdev,
-			    u64 features)
+			    virtio_features_t features)
 {
 	struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
+	int i;
 
-	vp_iowrite32(0, &cfg->guest_feature_select);
-	vp_iowrite32((u32)features, &cfg->guest_feature);
-	vp_iowrite32(1, &cfg->guest_feature_select);
-	vp_iowrite32(features >> 32, &cfg->guest_feature);
+	for (i = 0; i < VIRTIO_FEATURES_WORDS; i++) {
+		u32 cur = features >> (32 * i);
+
+		vp_iowrite32(i, &cfg->guest_feature_select);
+		vp_iowrite32(cur, &cfg->guest_feature);
+	}
 }
 EXPORT_SYMBOL_GPL(vp_modern_set_features);
 
diff --git a/include/linux/virtio_pci_modern.h b/include/linux/virtio_pci_modern.h
index c0b1b1ca1163..77e4ca3a6e2f 100644
--- a/include/linux/virtio_pci_modern.h
+++ b/include/linux/virtio_pci_modern.h
@@ -3,6 +3,7 @@
 #define _LINUX_VIRTIO_PCI_MODERN_H
 
 #include <linux/pci.h>
+#include <linux/virtio_config.h>
 #include <linux/virtio_pci.h>
 
 /**
@@ -95,10 +96,13 @@ static inline void vp_iowrite64_twopart(u64 val,
 	vp_iowrite32(val >> 32, hi);
 }
 
-u64 vp_modern_get_features(struct virtio_pci_modern_device *mdev);
-u64 vp_modern_get_driver_features(struct virtio_pci_modern_device *mdev);
+virtio_features_t
+vp_modern_get_features(struct virtio_pci_modern_device *mdev);
+
+virtio_features_t
+vp_modern_get_driver_features(struct virtio_pci_modern_device *mdev);
 void vp_modern_set_features(struct virtio_pci_modern_device *mdev,
-		     u64 features);
+		     virtio_features_t features);
 u32 vp_modern_generation(struct virtio_pci_modern_device *mdev);
 u8 vp_modern_get_status(struct virtio_pci_modern_device *mdev);
 void vp_modern_set_status(struct virtio_pci_modern_device *mdev,
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC PATCH v2 3/8] vhost-net: allow configuring extended features
  2025-05-30 14:49 [RFC PATCH v2 0/8] virtio: introduce GSO over UDP tunnel Paolo Abeni
  2025-05-30 14:49 ` [RFC PATCH v2 1/8] virtio: introduce virtio_features_t Paolo Abeni
  2025-05-30 14:49 ` [RFC PATCH v2 2/8] virtio_pci_modern: allow configuring extended features Paolo Abeni
@ 2025-05-30 14:49 ` Paolo Abeni
  2025-05-31  6:15   ` Akihiko Odaki
  2025-05-30 14:49 ` [RFC PATCH v2 4/8] virtio_net: add supports for extended offloads Paolo Abeni
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: Paolo Abeni @ 2025-05-30 14:49 UTC (permalink / raw)
  To: netdev
  Cc: Willem de Bruijn, Jason Wang, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez, Yuri Benditovich, Akihiko Odaki

Use the extended feature type for 'acked_features' and implement
two new ioctls operation allowing the user-space to set/query an
unbounded amount of features.

The actual number of processed features is limited by virtio_features_t
size, and attempts to set features above such limit fail with
EOPNOTSUPP.

Note that the legacy ioctls implicitly truncate the negotiated
features to the lower 64 bits range.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
v1 -> v2:
  - change the ioctl to use an extensible API
---
 drivers/vhost/net.c              | 61 ++++++++++++++++++++++++++++++--
 drivers/vhost/vhost.h            |  2 +-
 include/uapi/linux/vhost.h       |  7 ++++
 include/uapi/linux/vhost_types.h |  5 +++
 4 files changed, 71 insertions(+), 4 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 7cbfc7d718b3..f53294440695 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -77,6 +77,8 @@ enum {
 			 (1ULL << VIRTIO_F_RING_RESET)
 };
 
+#define VHOST_NET_ALL_FEATURES VHOST_NET_FEATURES
+
 enum {
 	VHOST_NET_BACKEND_FEATURES = (1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2)
 };
@@ -1614,7 +1616,7 @@ static long vhost_net_reset_owner(struct vhost_net *n)
 	return err;
 }
 
-static int vhost_net_set_features(struct vhost_net *n, u64 features)
+static int vhost_net_set_features(struct vhost_net *n, virtio_features_t features)
 {
 	size_t vhost_hlen, sock_hlen, hdr_len;
 	int i;
@@ -1685,8 +1687,9 @@ static long vhost_net_ioctl(struct file *f, unsigned int ioctl,
 	void __user *argp = (void __user *)arg;
 	u64 __user *featurep = argp;
 	struct vhost_vring_file backend;
-	u64 features;
-	int r;
+	virtio_features_t all_features;
+	u64 features, count;
+	int r, i;
 
 	switch (ioctl) {
 	case VHOST_NET_SET_BACKEND:
@@ -1704,6 +1707,58 @@ static long vhost_net_ioctl(struct file *f, unsigned int ioctl,
 		if (features & ~VHOST_NET_FEATURES)
 			return -EOPNOTSUPP;
 		return vhost_net_set_features(n, features);
+	case VHOST_GET_FEATURES_ARRAY:
+	{
+		if (copy_from_user(&count, argp, sizeof(u64)))
+			return -EFAULT;
+
+		/* Copy the net features, up to the user-provided buffer size */
+		all_features = VHOST_NET_ALL_FEATURES;
+		for (i = 0; i < min(VIRTIO_FEATURES_WORDS / 2, count); ++i) {
+			argp += sizeof(u64);
+			features = all_features >> (64 * i);
+			if (copy_to_user(argp, &features, sizeof(u64)))
+				return -EFAULT;
+		}
+
+		/* Zero the trailing space provided by user-space, if any */
+		features = 0;
+		for (; i < count; ++i) {
+			argp += sizeof(u64);
+			if (copy_to_user(argp, &features, sizeof(u64)))
+				return -EFAULT;
+		}
+		return 0;
+	}
+	case VHOST_SET_FEATURES_ARRAY:
+	{
+		if (copy_from_user(&count, argp, sizeof(u64)))
+			return -EFAULT;
+
+		all_features = 0;
+		for (i = 0; i < min(count, VIRTIO_FEATURES_WORDS / 2); ++i) {
+			argp += sizeof(u64);
+			if (copy_from_user(&features, argp, sizeof(u64)))
+				return -EFAULT;
+
+			all_features |= ((virtio_features_t)features) << (64 * i);
+		}
+
+		/* Any feature specified by user-space above VIRTIO_FEATURES_MAX is
+		 * not supported by definition.
+		 */
+		for (; i < count; ++i) {
+			if (copy_from_user(&features, argp, sizeof(u64)))
+				return -EFAULT;
+			if (features)
+				return -EOPNOTSUPP;
+		}
+
+		if (all_features & ~VHOST_NET_ALL_FEATURES)
+			return -EOPNOTSUPP;
+
+		return vhost_net_set_features(n, all_features);
+	}
 	case VHOST_GET_BACKEND_FEATURES:
 		features = VHOST_NET_BACKEND_FEATURES;
 		if (copy_to_user(featurep, &features, sizeof(features)))
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index bb75a292d50c..ef1c7fd6f4e1 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -133,7 +133,7 @@ struct vhost_virtqueue {
 	struct vhost_iotlb *umem;
 	struct vhost_iotlb *iotlb;
 	void *private_data;
-	u64 acked_features;
+	virtio_features_t acked_features;
 	u64 acked_backend_features;
 	/* Log write descriptors */
 	void __user *log_base;
diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
index d4b3e2ae1314..d6ad01fbb8d2 100644
--- a/include/uapi/linux/vhost.h
+++ b/include/uapi/linux/vhost.h
@@ -235,4 +235,11 @@
  */
 #define VHOST_VDPA_GET_VRING_SIZE	_IOWR(VHOST_VIRTIO, 0x82,	\
 					      struct vhost_vring_state)
+
+/* Extended features manipulation */
+#define VHOST_GET_FEATURES_ARRAY _IOR(VHOST_VIRTIO, 0x83, \
+				       struct vhost_features_array)
+#define VHOST_SET_FEATURES_ARRAY _IOW(VHOST_VIRTIO, 0x83, \
+				       struct vhost_features_array)
+
 #endif
diff --git a/include/uapi/linux/vhost_types.h b/include/uapi/linux/vhost_types.h
index d7656908f730..3f227114c557 100644
--- a/include/uapi/linux/vhost_types.h
+++ b/include/uapi/linux/vhost_types.h
@@ -110,6 +110,11 @@ struct vhost_msg_v2 {
 	};
 };
 
+struct vhost_features_array {
+	__u64 count; /* number of entries present in features array */
+	__u64 features[];
+};
+
 struct vhost_memory_region {
 	__u64 guest_phys_addr;
 	__u64 memory_size; /* bytes */
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC PATCH v2 4/8] virtio_net: add supports for extended offloads
  2025-05-30 14:49 [RFC PATCH v2 0/8] virtio: introduce GSO over UDP tunnel Paolo Abeni
                   ` (2 preceding siblings ...)
  2025-05-30 14:49 ` [RFC PATCH v2 3/8] vhost-net: " Paolo Abeni
@ 2025-05-30 14:49 ` Paolo Abeni
  2025-05-31  6:18   ` Akihiko Odaki
  2025-05-30 14:49 ` [RFC PATCH v2 5/8] net: implement virtio helpers to handle UDP GSO tunneling Paolo Abeni
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: Paolo Abeni @ 2025-05-30 14:49 UTC (permalink / raw)
  To: netdev
  Cc: Willem de Bruijn, Jason Wang, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez, Yuri Benditovich, Akihiko Odaki

The virtio_net driver needs it to implement GSO over UDP tunnel
offload.

The only missing piece is mapping them to/from the extended
features.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
v1 -> v2:
 - drop unused macro
 - restrict the offload remap range as per latest spec update
---
 drivers/net/virtio_net.c | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index e53ba600605a..ec638b4aa1c1 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -35,6 +35,24 @@ module_param(csum, bool, 0444);
 module_param(gso, bool, 0444);
 module_param(napi_tx, bool, 0644);
 
+#define VIRTIO_OFFLOAD_MAP_MIN	46
+#define VIRTIO_OFFLOAD_MAP_MAX	47
+#define VIRTIO_FEATURES_MAP_MIN	65
+#define VIRTIO_O2F_DELTA	(VIRTIO_FEATURES_MAP_MIN - VIRTIO_OFFLOAD_MAP_MIN)
+
+static bool virtio_is_mapped_offload(unsigned int obit)
+{
+	return obit >= VIRTIO_OFFLOAD_MAP_MIN &&
+	       obit <= VIRTIO_OFFLOAD_MAP_MAX;
+}
+
+#define VIRTIO_OFFLOAD_TO_FEATURE(obit)	\
+	({								\
+		unsigned int __o = obit;				\
+		virtio_is_mapped_offload(__o) ? __o + VIRTIO_O2F_DELTA :\
+						__o;			\
+	})
+
 /* FIXME: MTU in config. */
 #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
 #define GOOD_COPY_LEN	128
@@ -7037,9 +7055,13 @@ static int virtnet_probe(struct virtio_device *vdev)
 		netif_carrier_on(dev);
 	}
 
-	for (i = 0; i < ARRAY_SIZE(guest_offloads); i++)
-		if (virtio_has_feature(vi->vdev, guest_offloads[i]))
+	for (i = 0; i < ARRAY_SIZE(guest_offloads); i++) {
+		unsigned int fbit;
+
+		fbit = VIRTIO_OFFLOAD_TO_FEATURE(guest_offloads[i]);
+		if (virtio_has_feature(vi->vdev, fbit))
 			set_bit(guest_offloads[i], &vi->guest_offloads);
+	}
 	vi->guest_offloads_capable = vi->guest_offloads;
 
 	rtnl_unlock();
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC PATCH v2 5/8] net: implement virtio helpers to handle UDP GSO tunneling.
  2025-05-30 14:49 [RFC PATCH v2 0/8] virtio: introduce GSO over UDP tunnel Paolo Abeni
                   ` (3 preceding siblings ...)
  2025-05-30 14:49 ` [RFC PATCH v2 4/8] virtio_net: add supports for extended offloads Paolo Abeni
@ 2025-05-30 14:49 ` Paolo Abeni
  2025-05-30 14:49 ` [RFC PATCH v2 6/8] virtio_net: enable gso over UDP tunnel support Paolo Abeni
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Paolo Abeni @ 2025-05-30 14:49 UTC (permalink / raw)
  To: netdev
  Cc: Willem de Bruijn, Jason Wang, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez, Yuri Benditovich, Akihiko Odaki

The virtio specification are introducing support for GSO over
UDP tunnel.

This patch brings in the needed defines and the additional
virtio hdr parsing/building helpers.

The UDP tunnel support uses additional fields in the virtio hdr,
and such fields location can change depending on other negotiated
features - specifically VIRTIO_NET_F_HASH_REPORT.

Try to be as conservative as possible with the new field validation.

Existing implementation for plain GSO offloads allow for invalid/
self-contradictory values of such fields. With GSO over UDP tunnel we can
be more strict, with no need to deal with legacy implementation.

Since the checksum-related field validation is asymmetric in the driver
and in the device, introduce a separate helper to implement the new checks
(to be used only on the driver side).

Note that while the feature space exceeds the 64-bit boundaries, the
guest offload space is fixed by the specification of the
VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET command to a 64-bit size.

Prior to the UDP tunnel GSO support, each guest offload bit corresponded
to the feature bit with the same value and vice versa.

Due to the limited 'guest offload' space, relevant features in the high
64 bits are 'mapped' to free bits in the lower range. That is simpler
than defining a new command (and associated features) to exchange an
extended guest offloads set.

As a consequence, the uAPIs also specify the mapped guest offload value
corresponding to the UDP tunnel GSO features.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
--
v1 -> v2:
  - 'relay' -> 'rely' typo
  - less unclear comment WRT enforced inner GSO checks
  - inner header fields are allowed only with 'modern' virtio,
    thus are always le
  - clarified in the commit message the need for 'mapped features'
    defines
  - assume little_endian is true when UDP GSO is enabled.
  - fix inner proto type value
---
 include/linux/virtio_net.h      | 191 ++++++++++++++++++++++++++++++--
 include/uapi/linux/virtio_net.h |  33 ++++++
 2 files changed, 216 insertions(+), 8 deletions(-)

diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
index 02a9f4dc594d..bcf80534a739 100644
--- a/include/linux/virtio_net.h
+++ b/include/linux/virtio_net.h
@@ -47,9 +47,9 @@ static inline int virtio_net_hdr_set_proto(struct sk_buff *skb,
 	return 0;
 }
 
-static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
-					const struct virtio_net_hdr *hdr,
-					bool little_endian)
+static inline int __virtio_net_hdr_to_skb(struct sk_buff *skb,
+					  const struct virtio_net_hdr *hdr,
+					  bool little_endian, u8 hdr_gso_type)
 {
 	unsigned int nh_min_len = sizeof(struct iphdr);
 	unsigned int gso_type = 0;
@@ -57,8 +57,8 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
 	unsigned int p_off = 0;
 	unsigned int ip_proto;
 
-	if (hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) {
-		switch (hdr->gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
+	if (hdr_gso_type != VIRTIO_NET_HDR_GSO_NONE) {
+		switch (hdr_gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
 		case VIRTIO_NET_HDR_GSO_TCPV4:
 			gso_type = SKB_GSO_TCPV4;
 			ip_proto = IPPROTO_TCP;
@@ -84,7 +84,7 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
 			return -EINVAL;
 		}
 
-		if (hdr->gso_type & VIRTIO_NET_HDR_GSO_ECN)
+		if (hdr_gso_type & VIRTIO_NET_HDR_GSO_ECN)
 			gso_type |= SKB_GSO_TCP_ECN;
 
 		if (hdr->gso_size == 0)
@@ -122,7 +122,7 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
 
 				if (!protocol)
 					virtio_net_hdr_set_proto(skb, hdr);
-				else if (!virtio_net_hdr_match_proto(protocol, hdr->gso_type))
+				else if (!virtio_net_hdr_match_proto(protocol, hdr_gso_type))
 					return -EINVAL;
 				else
 					skb->protocol = protocol;
@@ -153,7 +153,7 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
 		}
 	}
 
-	if (hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) {
+	if (hdr_gso_type != VIRTIO_NET_HDR_GSO_NONE) {
 		u16 gso_size = __virtio16_to_cpu(little_endian, hdr->gso_size);
 		unsigned int nh_off = p_off;
 		struct skb_shared_info *shinfo = skb_shinfo(skb);
@@ -199,6 +199,13 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
 	return 0;
 }
 
+static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
+					const struct virtio_net_hdr *hdr,
+					bool little_endian)
+{
+	return __virtio_net_hdr_to_skb(skb, hdr, little_endian, hdr->gso_type);
+}
+
 static inline int virtio_net_hdr_from_skb(const struct sk_buff *skb,
 					  struct virtio_net_hdr *hdr,
 					  bool little_endian,
@@ -242,4 +249,172 @@ static inline int virtio_net_hdr_from_skb(const struct sk_buff *skb,
 	return 0;
 }
 
+static inline unsigned int virtio_l3min(bool is_ipv6)
+{
+	return is_ipv6 ? sizeof(struct ipv6hdr) : sizeof(struct iphdr);
+}
+
+static inline int virtio_net_hdr_tnl_to_skb(struct sk_buff *skb,
+					    const struct virtio_net_hdr *hdr,
+					    unsigned int tnl_hdr_offset,
+					    bool tnl_csum_negotiated,
+					    bool little_endian)
+{
+	u8 gso_tunnel_type = hdr->gso_type & VIRTIO_NET_HDR_GSO_UDP_TUNNEL;
+	unsigned int inner_nh, outer_th, inner_th;
+	unsigned int inner_l3min, outer_l3min;
+	struct virtio_net_hdr_tunnel *tnl;
+	bool outer_isv6, inner_isv6;
+	u8 gso_inner_type;
+	int ret;
+
+	if (!gso_tunnel_type)
+		return virtio_net_hdr_to_skb(skb, hdr, little_endian);
+
+	/* Tunnel not supported/negotiated, but the hdr asks for it. */
+	if (!tnl_hdr_offset)
+		return -EINVAL;
+
+	/* Either ipv4 or ipv6. */
+	if (gso_tunnel_type == VIRTIO_NET_HDR_GSO_UDP_TUNNEL)
+		return -EINVAL;
+
+	/* The UDP tunnel must carry a GSO packet, but no UFO. */
+	gso_inner_type = hdr->gso_type & ~(VIRTIO_NET_HDR_GSO_ECN |
+					   VIRTIO_NET_HDR_GSO_UDP_TUNNEL);
+	if (!gso_inner_type || gso_inner_type == VIRTIO_NET_HDR_GSO_UDP)
+		return -EINVAL;
+
+	/* Rely on csum being present. */
+	if (!(hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM))
+		return -EINVAL;
+
+	/* Validate offsets. */
+	outer_isv6 = gso_tunnel_type & VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6;
+	inner_isv6 = gso_inner_type == VIRTIO_NET_HDR_GSO_TCPV6;
+	inner_l3min = virtio_l3min(inner_isv6);
+	outer_l3min = ETH_HLEN + virtio_l3min(outer_isv6);
+
+	tnl = ((void *)hdr) + tnl_hdr_offset;
+	inner_th = __virtio16_to_cpu(little_endian, hdr->csum_start);
+	inner_nh = le16_to_cpu(tnl->inner_nh_offset);
+	outer_th = le16_to_cpu(tnl->outer_th_offset);
+	if (outer_th < outer_l3min ||
+	    inner_nh < outer_th + sizeof(struct udphdr) ||
+	    inner_th < inner_nh + inner_l3min)
+		return -EINVAL;
+
+	/* Let the basic parsing deal with plain GSO features. */
+	ret = __virtio_net_hdr_to_skb(skb, hdr, true,
+				      hdr->gso_type & ~gso_tunnel_type);
+	if (ret)
+		return ret;
+
+	/* In case of USO, the inner protocol is still unknown and
+	 * `inner_isv6` is just a guess, additional parsing is needed.
+	 * The previous validation ensures that accessing an ipv4 inner
+	 * network header is safe.
+	 */
+	if (gso_inner_type == VIRTIO_NET_HDR_GSO_UDP_L4) {
+		struct iphdr *iphdr = (struct iphdr *)(skb->data + inner_nh);
+
+		inner_isv6 = iphdr->version == 6;
+		inner_l3min = virtio_l3min(inner_isv6);
+		if (inner_th < inner_nh + inner_l3min)
+			return -EINVAL;
+	}
+
+	skb_set_inner_protocol(skb, inner_isv6 ? htons(ETH_P_IPV6) :
+						 htons(ETH_P_IP));
+	if (hdr->flags & VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM) {
+		if (!tnl_csum_negotiated)
+			return -EINVAL;
+
+		skb_shinfo(skb)->gso_type |= SKB_GSO_UDP_TUNNEL_CSUM;
+	} else {
+		skb_shinfo(skb)->gso_type |= SKB_GSO_UDP_TUNNEL;
+	}
+
+	skb->inner_transport_header = inner_th + skb_headroom(skb);
+	skb->inner_network_header = inner_nh + skb_headroom(skb);
+	skb->inner_mac_header = inner_nh + skb_headroom(skb);
+	skb->transport_header = outer_th + skb_headroom(skb);
+	skb->encapsulation = 1;
+	return 0;
+}
+
+/* Checksum-related fields validation for the driver */
+static inline int virtio_net_chk_data_valid(struct sk_buff *skb,
+					    struct virtio_net_hdr *hdr,
+					    bool tnl_csum_negotiated)
+{
+	if (!(hdr->gso_type & VIRTIO_NET_HDR_GSO_UDP_TUNNEL)) {
+		if (!(hdr->flags & VIRTIO_NET_HDR_F_DATA_VALID))
+			return 0;
+
+		skb->ip_summed = CHECKSUM_UNNECESSARY;
+		if (!(hdr->flags & VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM))
+			return 0;
+
+		/* tunnel csum packets are invalid when the related
+		 * feature has not been negotiated
+		 */
+		if (!tnl_csum_negotiated)
+			return -EINVAL;
+		skb->csum_level = 1;
+		return 0;
+	}
+
+	/* DATA_VALID is mutually exclusive with NEEDS_CSUM, and GSO
+	 * over UDP tunnel requires the latter
+	 */
+	if (hdr->flags & VIRTIO_NET_HDR_F_DATA_VALID)
+		return -EINVAL;
+	return 0;
+}
+
+static inline int virtio_net_hdr_tnl_from_skb(const struct sk_buff *skb,
+					      struct virtio_net_hdr *hdr,
+					      unsigned int tnl_offset,
+					      bool little_endian,
+					      int vlan_hlen)
+{
+	struct virtio_net_hdr_tunnel *tnl;
+	unsigned int inner_nh, outer_th;
+	int tnl_gso_type;
+	int ret;
+
+	tnl_gso_type = skb_shinfo(skb)->gso_type & (SKB_GSO_UDP_TUNNEL |
+						    SKB_GSO_UDP_TUNNEL_CSUM);
+	if (!tnl_gso_type)
+		return virtio_net_hdr_from_skb(skb, hdr, little_endian, false,
+					       vlan_hlen);
+
+	/* Tunnel support not negotiated but skb ask for it. */
+	if (!tnl_offset)
+		return -EINVAL;
+
+	/* Let the basic parsing deal with plain GSO features. */
+	skb_shinfo(skb)->gso_type &= ~tnl_gso_type;
+	ret = virtio_net_hdr_from_skb(skb, hdr, true, false, vlan_hlen);
+	skb_shinfo(skb)->gso_type |= tnl_gso_type;
+	if (ret)
+		return ret;
+
+	if (skb->protocol == htons(ETH_P_IPV6))
+		hdr->gso_type |= VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6;
+	else
+		hdr->gso_type |= VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4;
+
+	if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_TUNNEL_CSUM)
+		hdr->flags |= VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM;
+
+	tnl = ((void *)hdr) + tnl_offset;
+	inner_nh = skb->inner_network_header - skb_headroom(skb);
+	outer_th = skb->transport_header - skb_headroom(skb);
+	tnl->inner_nh_offset = cpu_to_le16(inner_nh);
+	tnl->outer_th_offset = cpu_to_le16(outer_th);
+	return 0;
+}
+
 #endif /* _LINUX_VIRTIO_NET_H */
diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
index 963540deae66..3d0522166e67 100644
--- a/include/uapi/linux/virtio_net.h
+++ b/include/uapi/linux/virtio_net.h
@@ -70,6 +70,28 @@
 					 * with the same MAC.
 					 */
 #define VIRTIO_NET_F_SPEED_DUPLEX 63	/* Device set linkspeed and duplex */
+#define VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO 65 /* Driver can receive
+					      * GSO-over-UDP-tunnel packets
+					      */
+#define VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM 66 /* Driver handles
+						   * GSO-over-UDP-tunnel
+						   * packets with partial csum
+						   * for the outer header
+						   */
+#define VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO 67 /* Device can receive
+					     * GSO-over-UDP-tunnel packets
+					     */
+#define VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO_CSUM 68 /* Device handles
+						  * GSO-over-UDP-tunnel
+						  * packets with partial csum
+						  * for the outer header
+						  */
+
+/* Offloads bits corresponding to VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO{,_CSUM}
+ * features
+ */
+#define VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_MAPPED	46
+#define VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM_MAPPED	47
 
 #ifndef VIRTIO_NET_NO_LEGACY
 #define VIRTIO_NET_F_GSO	6	/* Host handles pkts w/ any GSO type */
@@ -131,12 +153,17 @@ struct virtio_net_hdr_v1 {
 #define VIRTIO_NET_HDR_F_NEEDS_CSUM	1	/* Use csum_start, csum_offset */
 #define VIRTIO_NET_HDR_F_DATA_VALID	2	/* Csum is valid */
 #define VIRTIO_NET_HDR_F_RSC_INFO	4	/* rsc info in csum_ fields */
+#define VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM 8	/* UDP tunnel requires csum offload */
 	__u8 flags;
 #define VIRTIO_NET_HDR_GSO_NONE		0	/* Not a GSO frame */
 #define VIRTIO_NET_HDR_GSO_TCPV4	1	/* GSO frame, IPv4 TCP (TSO) */
 #define VIRTIO_NET_HDR_GSO_UDP		3	/* GSO frame, IPv4 UDP (UFO) */
 #define VIRTIO_NET_HDR_GSO_TCPV6	4	/* GSO frame, IPv6 TCP */
 #define VIRTIO_NET_HDR_GSO_UDP_L4	5	/* GSO frame, IPv4& IPv6 UDP (USO) */
+#define VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 0x20 /* UDP over IPv4 tunnel present */
+#define VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 0x40 /* UDP over IPv6 tunnel present */
+#define VIRTIO_NET_HDR_GSO_UDP_TUNNEL (VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 | \
+				       VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6)
 #define VIRTIO_NET_HDR_GSO_ECN		0x80	/* TCP has ECN set */
 	__u8 gso_type;
 	__virtio16 hdr_len;	/* Ethernet + IP + tcp/udp hdrs */
@@ -181,6 +208,12 @@ struct virtio_net_hdr_v1_hash {
 	__le16 padding;
 };
 
+/* This header after hashing information */
+struct virtio_net_hdr_tunnel {
+	__le16 outer_th_offset;
+	__le16 inner_nh_offset;
+};
+
 #ifndef VIRTIO_NET_NO_LEGACY
 /* This header comes first in the scatter-gather list.
  * For legacy virtio, if VIRTIO_F_ANY_LAYOUT is not negotiated, it must
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC PATCH v2 6/8] virtio_net: enable gso over UDP tunnel support.
  2025-05-30 14:49 [RFC PATCH v2 0/8] virtio: introduce GSO over UDP tunnel Paolo Abeni
                   ` (4 preceding siblings ...)
  2025-05-30 14:49 ` [RFC PATCH v2 5/8] net: implement virtio helpers to handle UDP GSO tunneling Paolo Abeni
@ 2025-05-30 14:49 ` Paolo Abeni
  2025-05-30 14:49 ` [RFC PATCH v2 7/8] tun: " Paolo Abeni
  2025-05-30 14:49 ` [RFC PATCH v2 8/8] vhost/net: " Paolo Abeni
  7 siblings, 0 replies; 18+ messages in thread
From: Paolo Abeni @ 2025-05-30 14:49 UTC (permalink / raw)
  To: netdev
  Cc: Willem de Bruijn, Jason Wang, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez, Yuri Benditovich, Akihiko Odaki

If the related virtio feature is set, enable transmission and reception
of gso over UDP tunnel packets.

Most of the work is done by the previously introduced helper, just need
to determine the UDP tunnel features inside the virtio_net_hdr and
update accordingly the virtio net hdr size.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
v1 -> v2:
  - test for UDP_TUNNEL_GSO* only on builds with extended features support
  - comment indentation cleanup
  - rebased on top of virtio helpers changes
  - dump more information in case of bad offloads
---
 drivers/net/virtio_net.c | 78 +++++++++++++++++++++++++++++++++-------
 1 file changed, 66 insertions(+), 12 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index ec638b4aa1c1..cee23bd8dac2 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -80,16 +80,30 @@ static const unsigned long guest_offloads[] = {
 	VIRTIO_NET_F_GUEST_CSUM,
 	VIRTIO_NET_F_GUEST_USO4,
 	VIRTIO_NET_F_GUEST_USO6,
-	VIRTIO_NET_F_GUEST_HDRLEN
+	VIRTIO_NET_F_GUEST_HDRLEN,
+#ifdef VIRTIO_HAS_EXTENDED_FEATURES
+	VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_MAPPED,
+	VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM_MAPPED,
+#endif
 };
 
-#define GUEST_OFFLOAD_GRO_HW_MASK ((1ULL << VIRTIO_NET_F_GUEST_TSO4) | \
+#define __GUEST_OFFLOAD_GRO_HW_MASK ((1ULL << VIRTIO_NET_F_GUEST_TSO4) | \
 				(1ULL << VIRTIO_NET_F_GUEST_TSO6) | \
 				(1ULL << VIRTIO_NET_F_GUEST_ECN)  | \
 				(1ULL << VIRTIO_NET_F_GUEST_UFO)  | \
 				(1ULL << VIRTIO_NET_F_GUEST_USO4) | \
 				(1ULL << VIRTIO_NET_F_GUEST_USO6))
 
+#ifdef VIRTIO_HAS_EXTENDED_FEATURES
+
+#define GUEST_OFFLOAD_GRO_HW_MASK (__GUEST_OFFLOAD_GRO_HW_MASK | \
+	(1ULL << VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_MAPPED) | \
+	(1ULL << VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM_MAPPED))
+#else
+
+#define GUEST_OFFLOAD_GRO_HW_MASK __GUEST_OFFLOAD_GRO_HW_MASK
+#endif
+
 struct virtnet_stat_desc {
 	char desc[ETH_GSTRING_LEN];
 	size_t offset;
@@ -438,9 +452,14 @@ struct virtnet_info {
 	/* Packet virtio header size */
 	u8 hdr_len;
 
+	/* UDP tunnel support */
+	u8 tnl_offset;
+
 	/* Work struct for delayed refilling if we run low on memory. */
 	struct delayed_work refill;
 
+	bool rx_tnl_csum;
+
 	/* Is delayed refill enabled? */
 	bool refill_enabled;
 
@@ -2533,14 +2552,22 @@ static void virtnet_receive_done(struct virtnet_info *vi, struct receive_queue *
 	if (dev->features & NETIF_F_RXHASH && vi->has_rss_hash_report)
 		virtio_skb_set_hash(&hdr->hash_v1_hdr, skb);
 
-	if (flags & VIRTIO_NET_HDR_F_DATA_VALID)
-		skb->ip_summed = CHECKSUM_UNNECESSARY;
+	/* restore the received value */
+	hdr->hdr.flags = flags;
+	if (virtio_net_chk_data_valid(skb, &hdr->hdr, vi->rx_tnl_csum)) {
+		net_warn_ratelimited("%s: bad csum: flags: %x, gso_type: %x rx_tnl_csum %d\n",
+				     dev->name, hdr->hdr.flags,
+				     hdr->hdr.gso_type, vi->rx_tnl_csum);
+		goto frame_err;
+	}
 
-	if (virtio_net_hdr_to_skb(skb, &hdr->hdr,
-				  virtio_is_little_endian(vi->vdev))) {
-		net_warn_ratelimited("%s: bad gso: type: %u, size: %u\n",
+	if (virtio_net_hdr_tnl_to_skb(skb, &hdr->hdr, vi->tnl_offset,
+				      vi->rx_tnl_csum,
+				      virtio_is_little_endian(vi->vdev))) {
+		net_warn_ratelimited("%s: bad gso: type: %x, size: %u, flags %x tunnel %d tnl csum %d\n",
 				     dev->name, hdr->hdr.gso_type,
-				     hdr->hdr.gso_size);
+				     hdr->hdr.gso_size, hdr->hdr.flags,
+				     vi->tnl_offset,vi->rx_tnl_csum);
 		goto frame_err;
 	}
 
@@ -3271,9 +3298,8 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb, bool orphan)
 	else
 		hdr = &skb_vnet_common_hdr(skb)->mrg_hdr;
 
-	if (virtio_net_hdr_from_skb(skb, &hdr->hdr,
-				    virtio_is_little_endian(vi->vdev), false,
-				    0))
+	if (virtio_net_hdr_tnl_from_skb(skb, &hdr->hdr, vi->tnl_offset,
+					virtio_is_little_endian(vi->vdev), 0))
 		return -EPROTO;
 
 	if (vi->mergeable_rx_bufs)
@@ -6777,10 +6803,22 @@ static int virtnet_probe(struct virtio_device *vdev)
 		if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_USO))
 			dev->hw_features |= NETIF_F_GSO_UDP_L4;
 
+#ifdef VIRTIO_HAS_EXTENDED_FEATURES
+		if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO)) {
+			dev->hw_features |= NETIF_F_GSO_UDP_TUNNEL;
+			dev->hw_enc_features = dev->hw_features;
+		}
+		if (dev->hw_features & NETIF_F_GSO_UDP_TUNNEL &&
+		    virtio_has_feature(vdev, VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO_CSUM)) {
+			dev->hw_features |= NETIF_F_GSO_UDP_TUNNEL_CSUM;
+			dev->hw_enc_features |= NETIF_F_GSO_UDP_TUNNEL_CSUM;
+		}
+#endif
+
 		dev->features |= NETIF_F_GSO_ROBUST;
 
 		if (gso)
-			dev->features |= dev->hw_features & NETIF_F_ALL_TSO;
+			dev->features |= dev->hw_features;
 		/* (!csum && gso) case will be fixed by register_netdev() */
 	}
 
@@ -6881,6 +6919,16 @@ static int virtnet_probe(struct virtio_device *vdev)
 	else
 		vi->hdr_len = sizeof(struct virtio_net_hdr);
 
+#ifdef VIRTIO_HAS_EXTENDED_FEATURES
+	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO) ||
+	    virtio_has_feature(vdev, VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO))
+		vi->tnl_offset = vi->hdr_len;
+	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM))
+		vi->rx_tnl_csum = true;
+	if (vi->tnl_offset)
+		vi->hdr_len += sizeof(struct virtio_net_hdr_tunnel);
+#endif
+
 	if (virtio_has_feature(vdev, VIRTIO_F_ANY_LAYOUT) ||
 	    virtio_has_feature(vdev, VIRTIO_F_VERSION_1))
 		vi->any_header_sg = true;
@@ -7191,6 +7239,12 @@ static struct virtio_device_id id_table[] = {
 
 static unsigned int features[] = {
 	VIRTNET_FEATURES,
+#ifdef VIRTIO_HAS_EXTENDED_FEATURES
+	VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO,
+	VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM,
+	VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO,
+	VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO_CSUM,
+#endif
 };
 
 static unsigned int features_legacy[] = {
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC PATCH v2 7/8] tun: enable gso over UDP tunnel support.
  2025-05-30 14:49 [RFC PATCH v2 0/8] virtio: introduce GSO over UDP tunnel Paolo Abeni
                   ` (5 preceding siblings ...)
  2025-05-30 14:49 ` [RFC PATCH v2 6/8] virtio_net: enable gso over UDP tunnel support Paolo Abeni
@ 2025-05-30 14:49 ` Paolo Abeni
  2025-05-31  6:38   ` Akihiko Odaki
  2025-05-30 14:49 ` [RFC PATCH v2 8/8] vhost/net: " Paolo Abeni
  7 siblings, 1 reply; 18+ messages in thread
From: Paolo Abeni @ 2025-05-30 14:49 UTC (permalink / raw)
  To: netdev
  Cc: Willem de Bruijn, Jason Wang, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez, Yuri Benditovich, Akihiko Odaki

Add new tun features to represent the newly introduced virtio
GSO over UDP tunnel offload. Allows detection and selection of
such features via the existing TUNSETOFFLOAD ioctl, store the
tunnel offload configuration in the highest bit of the tun flags
and compute the expected virtio header size and tunnel header
offset using such bits, so that we can plug almost seamless the
the newly introduced virtio helpers to serialize the extended
virtio header.

As the tun features and the virtio hdr size are configured
separately, the data path need to cope with (hopefully transient)
inconsistent values.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
Note that this semantically conflicts with the hash report series, one,
the other or both should be adjusted to fit.
---
 drivers/net/tun.c           | 77 ++++++++++++++++++++++++++++++++-----
 drivers/net/tun_vnet.h      | 74 ++++++++++++++++++++++++++++-------
 include/uapi/linux/if_tun.h |  9 +++++
 3 files changed, 137 insertions(+), 23 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 1207196cbbed..2977aff5bc46 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -186,7 +186,8 @@ struct tun_struct {
 	struct net_device	*dev;
 	netdev_features_t	set_features;
 #define TUN_USER_FEATURES (NETIF_F_HW_CSUM|NETIF_F_TSO_ECN|NETIF_F_TSO| \
-			  NETIF_F_TSO6 | NETIF_F_GSO_UDP_L4)
+			  NETIF_F_TSO6 | NETIF_F_GSO_UDP_L4 | \
+			  NETIF_F_GSO_UDP_TUNNEL | NETIF_F_GSO_UDP_TUNNEL_CSUM)
 
 	int			align;
 	int			vnet_hdr_sz;
@@ -925,6 +926,7 @@ static int tun_net_init(struct net_device *dev)
 	dev->hw_features = NETIF_F_SG | NETIF_F_FRAGLIST |
 			   TUN_USER_FEATURES | NETIF_F_HW_VLAN_CTAG_TX |
 			   NETIF_F_HW_VLAN_STAG_TX;
+	dev->hw_enc_features = dev->hw_features;
 	dev->features = dev->hw_features;
 	dev->vlan_features = dev->features &
 			     ~(NETIF_F_HW_VLAN_CTAG_TX |
@@ -1698,7 +1700,8 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 	struct sk_buff *skb;
 	size_t total_len = iov_iter_count(from);
 	size_t len = total_len, align = tun->align, linear;
-	struct virtio_net_hdr gso = { 0 };
+	char buf[TUN_VNET_TNL_SIZE];
+	struct virtio_net_hdr *gso;
 	int good_linear;
 	int copylen;
 	int hdr_len = 0;
@@ -1708,6 +1711,15 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 	int skb_xdp = 1;
 	bool frags = tun_napi_frags_enabled(tfile);
 	enum skb_drop_reason drop_reason = SKB_DROP_REASON_NOT_SPECIFIED;
+	unsigned int flags = tun->flags & ~TUN_VNET_TNL_MASK;
+
+	/*
+	 * Keep it easy and always zero the whole buffer, even if the
+	 * tunnel-related field will be touched only when the feature
+	 * is enabled and the hdr size id compatible.
+	 */
+	memset(buf, 0, sizeof(buf));
+	gso = (void *)buf;
 
 	if (!(tun->flags & IFF_NO_PI)) {
 		if (len < sizeof(pi))
@@ -1720,8 +1732,16 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 
 	if (tun->flags & IFF_VNET_HDR) {
 		int vnet_hdr_sz = READ_ONCE(tun->vnet_hdr_sz);
+		int parsed_size;
 
-		hdr_len = tun_vnet_hdr_get(vnet_hdr_sz, tun->flags, from, &gso);
+		if (vnet_hdr_sz < TUN_VNET_TNL_SIZE) {
+			parsed_size = vnet_hdr_sz;
+		} else {
+			parsed_size = TUN_VNET_TNL_SIZE;
+			flags |= TUN_VNET_TNL_MASK;
+		}
+		hdr_len = __tun_vnet_hdr_get(vnet_hdr_sz, parsed_size,
+					     flags, from, gso);
 		if (hdr_len < 0)
 			return hdr_len;
 
@@ -1755,7 +1775,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 		 * (e.g gso or jumbo packet), we will do it at after
 		 * skb was created with generic XDP routine.
 		 */
-		skb = tun_build_skb(tun, tfile, from, &gso, len, &skb_xdp);
+		skb = tun_build_skb(tun, tfile, from, gso, len, &skb_xdp);
 		err = PTR_ERR_OR_ZERO(skb);
 		if (err)
 			goto drop;
@@ -1799,7 +1819,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 		}
 	}
 
-	if (tun_vnet_hdr_to_skb(tun->flags, skb, &gso)) {
+	if (tun_vnet_hdr_to_skb(flags, skb, gso)) {
 		atomic_long_inc(&tun->rx_frame_errors);
 		err = -EINVAL;
 		goto free_skb;
@@ -2050,13 +2070,26 @@ static ssize_t tun_put_user(struct tun_struct *tun,
 	}
 
 	if (vnet_hdr_sz) {
-		struct virtio_net_hdr gso;
+		char buf[TUN_VNET_TNL_SIZE];
+		struct virtio_net_hdr *gso;
+		int flags = tun->flags;
+		int parsed_size;
+
+		gso = (void *)buf;
+		parsed_size = tun_vnet_parse_size(tun->flags);
+		if (unlikely(vnet_hdr_sz < parsed_size)) {
+			/* Inconsistent hdr size and (tunnel) offloads:
+			 * strips the latter
+			 */
+			flags &= ~TUN_VNET_TNL_MASK;
+			parsed_size = sizeof(struct virtio_net_hdr);
+		};
 
-		ret = tun_vnet_hdr_from_skb(tun->flags, tun->dev, skb, &gso);
+		ret = tun_vnet_hdr_from_skb(flags, tun->dev, skb, gso);
 		if (ret)
 			return ret;
 
-		ret = tun_vnet_hdr_put(vnet_hdr_sz, iter, &gso);
+		ret = __tun_vnet_hdr_put(vnet_hdr_sz, parsed_size, iter, gso);
 		if (ret)
 			return ret;
 	}
@@ -2366,6 +2399,7 @@ static int tun_xdp_one(struct tun_struct *tun,
 	int metasize = 0;
 	int ret = 0;
 	bool skb_xdp = false;
+	unsigned int flags;
 	struct page *page;
 
 	if (unlikely(datasize < ETH_HLEN))
@@ -2426,7 +2460,16 @@ static int tun_xdp_one(struct tun_struct *tun,
 	if (metasize > 0)
 		skb_metadata_set(skb, metasize);
 
-	if (tun_vnet_hdr_to_skb(tun->flags, skb, gso)) {
+	/* Assume tun offloads are enabled if the provided hdr is large
+	 * enough.
+	 */
+	if (READ_ONCE(tun->vnet_hdr_sz) >= TUN_VNET_TNL_SIZE &&
+	    xdp->data - xdp->data_hard_start >= TUN_VNET_TNL_SIZE)
+		flags = tun->flags | TUN_VNET_TNL_MASK;
+	else
+		flags = tun->flags & ~TUN_VNET_TNL_MASK;
+
+	if (tun_vnet_hdr_to_skb(flags, skb, gso)) {
 		atomic_long_inc(&tun->rx_frame_errors);
 		kfree_skb(skb);
 		ret = -EINVAL;
@@ -2812,6 +2855,8 @@ static void tun_get_iff(struct tun_struct *tun, struct ifreq *ifr)
 
 }
 
+#define PLAIN_GSO (NETIF_F_GSO_UDP_L4 | NETIF_F_TSO | NETIF_F_TSO6)
+
 /* This is like a cut-down ethtool ops, except done via tun fd so no
  * privs required. */
 static int set_offload(struct tun_struct *tun, unsigned long arg)
@@ -2841,6 +2886,17 @@ static int set_offload(struct tun_struct *tun, unsigned long arg)
 			features |= NETIF_F_GSO_UDP_L4;
 			arg &= ~(TUN_F_USO4 | TUN_F_USO6);
 		}
+
+		/* Tunnel offload is allowed only if some plain offload is
+		 * available, too.
+		 */
+		if (features & PLAIN_GSO && arg & TUN_F_UDP_TUNNEL_GSO) {
+			features |= NETIF_F_GSO_UDP_TUNNEL;
+			if (arg & TUN_F_UDP_TUNNEL_GSO_CSUM)
+				features |= NETIF_F_GSO_UDP_TUNNEL_CSUM;
+			arg &= ~(TUN_F_UDP_TUNNEL_GSO |
+				 TUN_F_UDP_TUNNEL_GSO_CSUM);
+		}
 	}
 
 	/* This gives the user a way to test for new features in future by
@@ -2852,7 +2908,8 @@ static int set_offload(struct tun_struct *tun, unsigned long arg)
 	tun->dev->wanted_features &= ~TUN_USER_FEATURES;
 	tun->dev->wanted_features |= features;
 	netdev_update_features(tun->dev);
-
+	tun_set_vnet_tnl(&tun->flags, !!(features & NETIF_F_GSO_UDP_TUNNEL),
+			 !!(features & NETIF_F_GSO_UDP_TUNNEL_CSUM));
 	return 0;
 }
 
diff --git a/drivers/net/tun_vnet.h b/drivers/net/tun_vnet.h
index 58b9ac7a5fc4..ab2d4396941c 100644
--- a/drivers/net/tun_vnet.h
+++ b/drivers/net/tun_vnet.h
@@ -5,6 +5,12 @@
 /* High bits in flags field are unused. */
 #define TUN_VNET_LE     0x80000000
 #define TUN_VNET_BE     0x40000000
+#define TUN_VNET_TNL		0x20000000
+#define TUN_VNET_TNL_CSUM	0x10000000
+#define TUN_VNET_TNL_MASK	(TUN_VNET_TNL | TUN_VNET_TNL_CSUM)
+
+#define TUN_VNET_TNL_SIZE (sizeof(struct virtio_net_hdr_v1) + \
+			   sizeof(struct virtio_net_hdr_tunnel))
 
 static inline bool tun_vnet_legacy_is_little_endian(unsigned int flags)
 {
@@ -45,6 +51,13 @@ static inline long tun_set_vnet_be(unsigned int *flags, int __user *argp)
 	return 0;
 }
 
+static inline void tun_set_vnet_tnl(unsigned int *flags, bool tnl, bool tnl_csum)
+{
+	*flags = (*flags & ~TUN_VNET_TNL_MASK) |
+		 tnl * TUN_VNET_TNL |
+		 tnl_csum * TUN_VNET_TNL_CSUM;
+}
+
 static inline bool tun_vnet_is_little_endian(unsigned int flags)
 {
 	return flags & TUN_VNET_LE || tun_vnet_legacy_is_little_endian(flags);
@@ -107,16 +120,33 @@ static inline long tun_vnet_ioctl(int *vnet_hdr_sz, unsigned int *flags,
 	}
 }
 
-static inline int tun_vnet_hdr_get(int sz, unsigned int flags,
-				   struct iov_iter *from,
-				   struct virtio_net_hdr *hdr)
+static inline unsigned int tun_vnet_parse_size(unsigned int flags)
+{
+	if (!(flags & TUN_VNET_TNL))
+		return sizeof(struct virtio_net_hdr);
+
+	return TUN_VNET_TNL_SIZE;
+}
+
+static inline unsigned int tun_vnet_tnl_offset(unsigned int flags)
+{
+	if (!(flags & TUN_VNET_TNL))
+		return 0;
+
+	return sizeof(struct virtio_net_hdr_v1);
+}
+
+static inline int __tun_vnet_hdr_get(int sz, int parsed_size,
+				     unsigned int flags,
+				     struct iov_iter *from,
+				     struct virtio_net_hdr *hdr)
 {
 	u16 hdr_len;
 
 	if (iov_iter_count(from) < sz)
 		return -EINVAL;
 
-	if (!copy_from_iter_full(hdr, sizeof(*hdr), from))
+	if (!copy_from_iter_full(hdr, parsed_size, from))
 		return -EFAULT;
 
 	hdr_len = tun_vnet16_to_cpu(flags, hdr->hdr_len);
@@ -129,30 +159,47 @@ static inline int tun_vnet_hdr_get(int sz, unsigned int flags,
 	if (hdr_len > iov_iter_count(from))
 		return -EINVAL;
 
-	iov_iter_advance(from, sz - sizeof(*hdr));
+	iov_iter_advance(from, sz - parsed_size);
 
 	return hdr_len;
 }
 
-static inline int tun_vnet_hdr_put(int sz, struct iov_iter *iter,
-				   const struct virtio_net_hdr *hdr)
+static inline int tun_vnet_hdr_get(int sz, unsigned int flags,
+				   struct iov_iter *from,
+				   struct virtio_net_hdr *hdr)
+{
+	return __tun_vnet_hdr_get(sz, sizeof(*hdr), flags, from, hdr);
+}
+
+static inline int __tun_vnet_hdr_put(int sz, int parsed_size,
+				     struct iov_iter *iter,
+				     const struct virtio_net_hdr *hdr)
 {
 	if (unlikely(iov_iter_count(iter) < sz))
 		return -EINVAL;
 
-	if (unlikely(copy_to_iter(hdr, sizeof(*hdr), iter) != sizeof(*hdr)))
+	if (unlikely(copy_to_iter(hdr, parsed_size, iter) != parsed_size))
 		return -EFAULT;
 
-	if (iov_iter_zero(sz - sizeof(*hdr), iter) != sz - sizeof(*hdr))
+	if (iov_iter_zero(sz - parsed_size, iter) != sz - parsed_size)
 		return -EFAULT;
 
 	return 0;
 }
 
+static inline int tun_vnet_hdr_put(int sz, struct iov_iter *iter,
+				   const struct virtio_net_hdr *hdr)
+{
+	return __tun_vnet_hdr_put(sz, sizeof(*hdr), iter, hdr);
+}
+
 static inline int tun_vnet_hdr_to_skb(unsigned int flags, struct sk_buff *skb,
 				      const struct virtio_net_hdr *hdr)
 {
-	return virtio_net_hdr_to_skb(skb, hdr, tun_vnet_is_little_endian(flags));
+	return virtio_net_hdr_tnl_to_skb(skb, hdr,
+					 tun_vnet_tnl_offset(flags),
+					 !!(flags & TUN_VNET_TNL_CSUM),
+					 tun_vnet_is_little_endian(flags));
 }
 
 static inline int tun_vnet_hdr_from_skb(unsigned int flags,
@@ -161,10 +208,11 @@ static inline int tun_vnet_hdr_from_skb(unsigned int flags,
 					struct virtio_net_hdr *hdr)
 {
 	int vlan_hlen = skb_vlan_tag_present(skb) ? VLAN_HLEN : 0;
+	int tnl_offset = tun_vnet_tnl_offset(flags);
 
-	if (virtio_net_hdr_from_skb(skb, hdr,
-				    tun_vnet_is_little_endian(flags), true,
-				    vlan_hlen)) {
+	if (virtio_net_hdr_tnl_from_skb(skb, hdr, tnl_offset,
+					tun_vnet_is_little_endian(flags),
+					vlan_hlen)) {
 		struct skb_shared_info *sinfo = skb_shinfo(skb);
 
 		if (net_ratelimit()) {
diff --git a/include/uapi/linux/if_tun.h b/include/uapi/linux/if_tun.h
index 287cdc81c939..a25a5e7a08ff 100644
--- a/include/uapi/linux/if_tun.h
+++ b/include/uapi/linux/if_tun.h
@@ -93,6 +93,15 @@
 #define TUN_F_USO4	0x20	/* I can handle USO for IPv4 packets */
 #define TUN_F_USO6	0x40	/* I can handle USO for IPv6 packets */
 
+#define TUN_F_UDP_TUNNEL_GSO		0x080 /* I can handle TSO/USO for UDP
+					       * tunneled packets
+					       */
+#define TUN_F_UDP_TUNNEL_GSO_CSUM	0x100 /* I can handle TSO/USO for UDP
+					       * tunneled packets requiring
+					       * csum offload for the outer
+					       * header
+					       */
+
 /* Protocol info prepended to the packets (when IFF_NO_PI is not set) */
 #define TUN_PKT_STRIP	0x0001
 struct tun_pi {
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC PATCH v2 8/8] vhost/net: enable gso over UDP tunnel support.
  2025-05-30 14:49 [RFC PATCH v2 0/8] virtio: introduce GSO over UDP tunnel Paolo Abeni
                   ` (6 preceding siblings ...)
  2025-05-30 14:49 ` [RFC PATCH v2 7/8] tun: " Paolo Abeni
@ 2025-05-30 14:49 ` Paolo Abeni
  7 siblings, 0 replies; 18+ messages in thread
From: Paolo Abeni @ 2025-05-30 14:49 UTC (permalink / raw)
  To: netdev
  Cc: Willem de Bruijn, Jason Wang, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez, Yuri Benditovich, Akihiko Odaki

Vhost net need to know the exact virtio net hdr size to be able
to copy such header correctly. Teach it about the newly defined
UDP tunnel-related option and update the hdr size computation
accordingly.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 drivers/vhost/net.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index f53294440695..fa88f021e9b1 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -77,7 +77,13 @@ enum {
 			 (1ULL << VIRTIO_F_RING_RESET)
 };
 
+#ifdef VIRTIO_HAS_EXTENDED_FEATURES
+#define VHOST_NET_ALL_FEATURES (VHOST_NET_FEATURES | \
+			(VIRTIO_BIT(VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO)) | \
+			(VIRTIO_BIT(VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO)))
+#else
 #define VHOST_NET_ALL_FEATURES VHOST_NET_FEATURES
+#endif
 
 enum {
 	VHOST_NET_BACKEND_FEATURES = (1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2)
@@ -1619,12 +1625,16 @@ static long vhost_net_reset_owner(struct vhost_net *n)
 static int vhost_net_set_features(struct vhost_net *n, virtio_features_t features)
 {
 	size_t vhost_hlen, sock_hlen, hdr_len;
+	bool has_tunnel;
 	int i;
 
 	hdr_len = (features & ((1ULL << VIRTIO_NET_F_MRG_RXBUF) |
 			       (1ULL << VIRTIO_F_VERSION_1))) ?
 			sizeof(struct virtio_net_hdr_mrg_rxbuf) :
 			sizeof(struct virtio_net_hdr);
+	has_tunnel = !!(features & (VIRTIO_BIT(VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO) |
+				    VIRTIO_BIT(VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO)));
+	hdr_len += has_tunnel ? sizeof(struct virtio_net_hdr_tunnel) : 0;
 	if (features & (1 << VHOST_NET_F_VIRTIO_NET_HDR)) {
 		/* vhost provides vnet_hdr */
 		vhost_hlen = hdr_len;
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v2 1/8] virtio: introduce virtio_features_t
  2025-05-30 14:49 ` [RFC PATCH v2 1/8] virtio: introduce virtio_features_t Paolo Abeni
@ 2025-05-30 17:40   ` kernel test robot
  2025-05-31  5:37   ` Akihiko Odaki
  1 sibling, 0 replies; 18+ messages in thread
From: kernel test robot @ 2025-05-30 17:40 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: oe-kbuild-all

Hi Paolo,

[This is a private test report for your RFC patch.]
kernel test robot noticed the following build errors:

[auto build test ERROR on mst-vhost/linux-next]
[also build test ERROR on net/main net-next/main linus/master v6.15 next-20250530]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Paolo-Abeni/virtio-introduce-virtio_features_t/20250530-225213
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
patch link:    https://lore.kernel.org/r/5be304a8d0e2fffa6bd13cfa9ff848b2e5842171.1748614223.git.pabeni%40redhat.com
patch subject: [RFC PATCH v2 1/8] virtio: introduce virtio_features_t
config: riscv-randconfig-002-20250530 (https://download.01.org/0day-ci/archive/20250531/202505310153.xoBJS3F5-lkp@intel.com/config)
compiler: riscv32-linux-gcc (GCC) 10.5.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250531/202505310153.xoBJS3F5-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202505310153.xoBJS3F5-lkp@intel.com/

All error/warnings (new ones prefixed by >>):

   In file included from include/linux/bits.h:6,
                    from include/linux/bitops.h:6,
                    from include/linux/kernel.h:23,
                    from include/linux/uio.h:8,
                    from include/linux/socket.h:8,
                    from include/uapi/linux/in.h:25,
                    from include/linux/in.h:19,
                    from net/9p/trans_virtio.c:16:
   include/linux/virtio_features.h: In function 'VIRTIO_BIT':
>> include/linux/virtio_features.h:21:33: error: 'b' undeclared (first use in this function); did you mean 'mb'?
      21 |  return bit >= 64 ? 0 : BIT_ULL(b);
         |                                 ^
   include/vdso/bits.h:8:34: note: in definition of macro 'BIT_ULL'
       8 | #define BIT_ULL(nr)  (ULL(1) << (nr))
         |                                  ^~
   include/linux/virtio_features.h:21:33: note: each undeclared identifier is reported only once for each function it appears in
      21 |  return bit >= 64 ? 0 : BIT_ULL(b);
         |                                 ^
   include/vdso/bits.h:8:34: note: in definition of macro 'BIT_ULL'
       8 | #define BIT_ULL(nr)  (ULL(1) << (nr))
         |                                  ^~
   In file included from include/linux/virtio.h:14,
                    from net/9p/trans_virtio.c:34:
>> include/linux/virtio_features.h:22:1: warning: control reaches end of non-void function [-Wreturn-type]
      22 | }
         | ^


vim +21 include/linux/virtio_features.h

    18	
    19	static inline u64 VIRTIO_BIT(int bit)
    20	{
  > 21		return bit >= 64 ? 0 : BIT_ULL(b);
  > 22	}
    23	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v2 1/8] virtio: introduce virtio_features_t
  2025-05-30 14:49 ` [RFC PATCH v2 1/8] virtio: introduce virtio_features_t Paolo Abeni
  2025-05-30 17:40   ` kernel test robot
@ 2025-05-31  5:37   ` Akihiko Odaki
  1 sibling, 0 replies; 18+ messages in thread
From: Akihiko Odaki @ 2025-05-31  5:37 UTC (permalink / raw)
  To: Paolo Abeni, netdev
  Cc: Willem de Bruijn, Jason Wang, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez, Yuri Benditovich

On 2025/05/30 23:49, Paolo Abeni wrote:
> The virtio specifications allows for up to 128 bits for the
> device features. Soon we are going to use some of the 'extended'
> bits features (above 64) for the virtio_net driver.
> 
> Introduce an specific type to represent the virtio features bitmask.
> 
> On platform where 128 bits integer are available use such wide int
> for the features bitmask, otherwise maintain the current u64.
> 
> Introduce an extended get_features128() configuration callback that
> devices supporting the extended features range must implement in
> place of the traditional one.

The callback is called get_extended_features() in the code, which makes 
more sense as it is actually 64-bit if CONFIG_ARCH_SUPPORTS_INT128 is 
not defined.

That said, it doesn't seem that the code saved by the use of 128-bit 
integer type is significant enough.

I can think of three strategies to support more features:
1) Converting bitmasks to 128-bit, which is this patch does
2) Convert both "features" and "debugfs_filter_features" of the struct
    into bitmaps provided by include/linux/bitmap.h

2) requires more changes, but perhaps there may be a middle ground.

Looking at details, there are two fields that contain feature bitmasks 
in the common structure, struct virtio_device:
- features
- debugfs_filter_features

Fortunately, "debugfs_filter_features" is only referred by
drivers/virtio/virtio_debug.c. The problem is "features", which is 
referred everywhere. Perhaps converting all of them into bitmaps may 
make sense in a long term, but it may be something you want to avoid 
with this patch series.

So there is the following middle-ground option:

3) Adding e.g., "features_hi" for the upper 64-bit of features
    (which is similar to what I suggested for QEMU*),
    and convert only "debugfs_filter_features" into a bitmap.

This substantially reduces the code change required for "features" and 
make them contained in the following three functions:
- virtio_dev_probe() in drivers/virtio/virtio.c
- features_show() in drivers/virtio/virtio.c
- virtio_debug_device_filter_features() in drivers/virtio/virtio_debug.c

These functions can use bitmaps internally, and convert them from/into 
64-bit integers using bitmap_from_arr64() and bitmap_to_arr64().

Since you are adding the support for extended features to 
virtio_pci_modern and virtio_net, you'll later need to change their 
implementations too, but that won't add complexity much; some complexity 
is inevitable even when choosing 1) and "[RFC PATCH v2 2/8] 
virtio_pci_modern: allow configuring extended features" already include it.

And by choosing 3), you can remove the check of 
CONFIG_ARCH_SUPPORTS_INT128 and avoid the need to test platforms without 
the config. I suspect particularly eliminating the need of test 
outweighs the cost of the additional change required for 3).

Regards,
Akihiko Odaki

* 
https://lore.kernel.org/qemu-devel/473516b5-d52b-4671-aeca-d02ad1940364@daynix.com/

> 
> Note that legacy and transport features don't need any change, as
> they are always in the low 64 bit range.
> 
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> v1 -> v2:
>    - let u64 VIRTIO_BIT() cope with higher bit values
>    - add .get_extended_features instead of changing .get_features signature
> ---
>   drivers/virtio/virtio.c         | 14 +++++++-------
>   drivers/virtio/virtio_debug.c   |  2 +-
>   include/linux/virtio.h          |  5 +++--
>   include/linux/virtio_config.h   | 32 ++++++++++++++++++++++----------
>   include/linux/virtio_features.h | 27 +++++++++++++++++++++++++++
>   5 files changed, 60 insertions(+), 20 deletions(-)
>   create mode 100644 include/linux/virtio_features.h
> 
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index 95d5d7993e5b..206ae8fa0654 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -272,22 +272,22 @@ static int virtio_dev_probe(struct device *_d)
>   	int err, i;
>   	struct virtio_device *dev = dev_to_virtio(_d);
>   	struct virtio_driver *drv = drv_to_virtio(dev->dev.driver);
> -	u64 device_features;
> -	u64 driver_features;
> -	u64 driver_features_legacy;
> +	virtio_features_t device_features;
> +	virtio_features_t driver_features;
> +	virtio_features_t driver_features_legacy;
>   
>   	/* We have a driver! */
>   	virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER);
>   
>   	/* Figure out what features the device supports. */
> -	device_features = dev->config->get_features(dev);
> +	device_features = virtio_get_features(dev);
>   
>   	/* Figure out what features the driver supports. */
>   	driver_features = 0;
>   	for (i = 0; i < drv->feature_table_size; i++) {
>   		unsigned int f = drv->feature_table[i];
> -		BUG_ON(f >= 64);
> -		driver_features |= (1ULL << f);
> +		BUG_ON(f >= VIRTIO_FEATURES_MAX);
> +		driver_features |= VIRTIO_BIT(f);
>   	}
>   
>   	/* Some drivers have a separate feature table for virtio v1.0 */
> @@ -320,7 +320,7 @@ static int virtio_dev_probe(struct device *_d)
>   		goto err;
>   
>   	if (drv->validate) {
> -		u64 features = dev->features;
> +		virtio_features_t features = dev->features;
>   
>   		err = drv->validate(dev);
>   		if (err)
> diff --git a/drivers/virtio/virtio_debug.c b/drivers/virtio/virtio_debug.c
> index 95c8fc7705bb..5ca95422d3ca 100644
> --- a/drivers/virtio/virtio_debug.c
> +++ b/drivers/virtio/virtio_debug.c
> @@ -12,7 +12,7 @@ static int virtio_debug_device_features_show(struct seq_file *s, void *data)
>   	u64 device_features;
>   	unsigned int i;
>   
> -	device_features = dev->config->get_features(dev);
> +	device_features = virtio_get_features(dev);
>   	for (i = 0; i < BITS_PER_LONG_LONG; i++) {
>   		if (device_features & (1ULL << i))
>   			seq_printf(s, "%u\n", i);
> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> index 64cb4b04be7a..6e51400d0463 100644
> --- a/include/linux/virtio.h
> +++ b/include/linux/virtio.h
> @@ -11,6 +11,7 @@
>   #include <linux/gfp.h>
>   #include <linux/dma-mapping.h>
>   #include <linux/completion.h>
> +#include <linux/virtio_features.h>
>   
>   /**
>    * struct virtqueue - a queue to register buffers for sending or receiving.
> @@ -159,11 +160,11 @@ struct virtio_device {
>   	const struct virtio_config_ops *config;
>   	const struct vringh_config_ops *vringh_config;
>   	struct list_head vqs;
> -	u64 features;
> +	virtio_features_t features;
>   	void *priv;
>   #ifdef CONFIG_VIRTIO_DEBUG
>   	struct dentry *debugfs_dir;
> -	u64 debugfs_filter_features;
> +	virtio_features_t debugfs_filter_features;
>   #endif
>   };
>   
> diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> index 169c7d367fac..1cc43d9cf6e8 100644
> --- a/include/linux/virtio_config.h
> +++ b/include/linux/virtio_config.h
> @@ -77,7 +77,10 @@ struct virtqueue_info {
>    *      vdev: the virtio_device
>    * @get_features: get the array of feature bits for this device.
>    *	vdev: the virtio_device
> - *	Returns the first 64 feature bits (all we currently need).
> + *	Returns the first 64 feature bits.
> + * @get_extended_features:
> + *      vdev: the virtio_device
> + *      Returns the first VIRTIO_FEATURES_MAX feature bits (all we currently need).
>    * @finalize_features: confirm what device features we'll be using.
>    *	vdev: the virtio_device
>    *	This sends the driver feature bits to the device: it can change
> @@ -121,6 +124,7 @@ struct virtio_config_ops {
>   	void (*del_vqs)(struct virtio_device *);
>   	void (*synchronize_cbs)(struct virtio_device *);
>   	u64 (*get_features)(struct virtio_device *vdev);
> +	virtio_features_t (*get_extended_features)(struct virtio_device *vdev);
>   	int (*finalize_features)(struct virtio_device *vdev);
>   	const char *(*bus_name)(struct virtio_device *vdev);
>   	int (*set_vq_affinity)(struct virtqueue *vq,
> @@ -149,11 +153,11 @@ static inline bool __virtio_test_bit(const struct virtio_device *vdev,
>   {
>   	/* Did you forget to fix assumptions on max features? */
>   	if (__builtin_constant_p(fbit))
> -		BUILD_BUG_ON(fbit >= 64);
> +		BUILD_BUG_ON(fbit >= VIRTIO_FEATURES_MAX);
>   	else
> -		BUG_ON(fbit >= 64);
> +		BUG_ON(fbit >= VIRTIO_FEATURES_MAX);
>   
> -	return vdev->features & BIT_ULL(fbit);
> +	return vdev->features & VIRTIO_BIT(fbit);
>   }
>   
>   /**
> @@ -166,11 +170,11 @@ static inline void __virtio_set_bit(struct virtio_device *vdev,
>   {
>   	/* Did you forget to fix assumptions on max features? */
>   	if (__builtin_constant_p(fbit))
> -		BUILD_BUG_ON(fbit >= 64);
> +		BUILD_BUG_ON(fbit >= VIRTIO_FEATURES_MAX);
>   	else
> -		BUG_ON(fbit >= 64);
> +		BUG_ON(fbit >= VIRTIO_FEATURES_MAX);
>   
> -	vdev->features |= BIT_ULL(fbit);
> +	vdev->features |= VIRTIO_BIT(fbit);
>   }
>   
>   /**
> @@ -183,11 +187,11 @@ static inline void __virtio_clear_bit(struct virtio_device *vdev,
>   {
>   	/* Did you forget to fix assumptions on max features? */
>   	if (__builtin_constant_p(fbit))
> -		BUILD_BUG_ON(fbit >= 64);
> +		BUILD_BUG_ON(fbit >= VIRTIO_FEATURES_MAX);
>   	else
> -		BUG_ON(fbit >= 64);
> +		BUG_ON(fbit >= VIRTIO_FEATURES_MAX);
>   
> -	vdev->features &= ~BIT_ULL(fbit);
> +	vdev->features &= ~VIRTIO_BIT(fbit);
>   }
>   
>   /**
> @@ -204,6 +208,14 @@ static inline bool virtio_has_feature(const struct virtio_device *vdev,
>   	return __virtio_test_bit(vdev, fbit);
>   }
>   
> +static inline virtio_features_t virtio_get_features(struct virtio_device *vdev)
> +{
> +	if (vdev->config->get_extended_features)
> +		return vdev->config->get_extended_features(vdev);
> +
> +	return vdev->config->get_features(vdev);
> +}
> +
>   /**
>    * virtio_has_dma_quirk - determine whether this device has the DMA quirk
>    * @vdev: the device
> diff --git a/include/linux/virtio_features.h b/include/linux/virtio_features.h
> new file mode 100644
> index 000000000000..0a7e2265f8b4
> --- /dev/null
> +++ b/include/linux/virtio_features.h
> @@ -0,0 +1,27 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _LINUX_VIRTIO_FEATURES_H
> +#define _LINUX_VIRTIO_FEATURES_H
> +
> +#include <linux/bits.h>
> +
> +#if IS_ENABLED(CONFIG_ARCH_SUPPORTS_INT128)
> +#define VIRTIO_HAS_EXTENDED_FEATURES
> +#define VIRTIO_FEATURES_MAX	128
> +#define VIRTIO_FEATURES_WORDS	4
> +#define VIRTIO_BIT(b)		_BIT128(b)
> +
> +typedef __uint128_t virtio_features_t;
> +
> +#else
> +#define VIRTIO_FEATURES_MAX	64
> +#define VIRTIO_FEATURES_WORDS	2
> +
> +static inline u64 VIRTIO_BIT(int bit)
> +{
> +	return bit >= 64 ? 0 : BIT_ULL(b);
> +}
> +
> +typedef u64 virtio_features_t;
> +#endif
> +
> +#endif


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v2 3/8] vhost-net: allow configuring extended features
  2025-05-30 14:49 ` [RFC PATCH v2 3/8] vhost-net: " Paolo Abeni
@ 2025-05-31  6:15   ` Akihiko Odaki
  2025-06-03 13:32     ` Paolo Abeni
  0 siblings, 1 reply; 18+ messages in thread
From: Akihiko Odaki @ 2025-05-31  6:15 UTC (permalink / raw)
  To: Paolo Abeni, netdev
  Cc: Willem de Bruijn, Jason Wang, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez, Yuri Benditovich

On 2025/05/30 23:49, Paolo Abeni wrote:
> Use the extended feature type for 'acked_features' and implement
> two new ioctls operation allowing the user-space to set/query an
> unbounded amount of features.
> 
> The actual number of processed features is limited by virtio_features_t
> size, and attempts to set features above such limit fail with
> EOPNOTSUPP.
> 
> Note that the legacy ioctls implicitly truncate the negotiated
> features to the lower 64 bits range.
> 
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> v1 -> v2:
>    - change the ioctl to use an extensible API
> ---
>   drivers/vhost/net.c              | 61 ++++++++++++++++++++++++++++++--
>   drivers/vhost/vhost.h            |  2 +-
>   include/uapi/linux/vhost.h       |  7 ++++
>   include/uapi/linux/vhost_types.h |  5 +++
>   4 files changed, 71 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 7cbfc7d718b3..f53294440695 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -77,6 +77,8 @@ enum {
>   			 (1ULL << VIRTIO_F_RING_RESET)
>   };
>   
> +#define VHOST_NET_ALL_FEATURES VHOST_NET_FEATURES
> +
>   enum {
>   	VHOST_NET_BACKEND_FEATURES = (1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2)
>   };
> @@ -1614,7 +1616,7 @@ static long vhost_net_reset_owner(struct vhost_net *n)
>   	return err;
>   }
>   
> -static int vhost_net_set_features(struct vhost_net *n, u64 features)
> +static int vhost_net_set_features(struct vhost_net *n, virtio_features_t features)
>   {
>   	size_t vhost_hlen, sock_hlen, hdr_len;
>   	int i;
> @@ -1685,8 +1687,9 @@ static long vhost_net_ioctl(struct file *f, unsigned int ioctl,
>   	void __user *argp = (void __user *)arg;
>   	u64 __user *featurep = argp;
>   	struct vhost_vring_file backend;
> -	u64 features;
> -	int r;
> +	virtio_features_t all_features;
> +	u64 features, count;
> +	int r, i;
>   
>   	switch (ioctl) {
>   	case VHOST_NET_SET_BACKEND:
> @@ -1704,6 +1707,58 @@ static long vhost_net_ioctl(struct file *f, unsigned int ioctl,
>   		if (features & ~VHOST_NET_FEATURES)
>   			return -EOPNOTSUPP;
>   		return vhost_net_set_features(n, features);
> +	case VHOST_GET_FEATURES_ARRAY:
> +	{
> +		if (copy_from_user(&count, argp, sizeof(u64)))
> +			return -EFAULT;
> +
> +		/* Copy the net features, up to the user-provided buffer size */
> +		all_features = VHOST_NET_ALL_FEATURES;
> +		for (i = 0; i < min(VIRTIO_FEATURES_WORDS / 2, count); ++i) {

I think you need to use: array_index_nospec()

> +			argp += sizeof(u64);
> +			features = all_features >> (64 * i);
> +			if (copy_to_user(argp, &features, sizeof(u64)))
> +				return -EFAULT;
> +		}
> +
> +		/* Zero the trailing space provided by user-space, if any */
> +		features = 0;
> +		for (; i < count; ++i) {
> +			argp += sizeof(u64);
> +			if (copy_to_user(argp, &features, sizeof(u64)))
> +				return -EFAULT;

There is clear_user().

> +		}
> +		return 0;
> +	}
> +	case VHOST_SET_FEATURES_ARRAY:
> +	{
> +		if (copy_from_user(&count, argp, sizeof(u64)))
> +			return -EFAULT;
> +
> +		all_features = 0;
> +		for (i = 0; i < min(count, VIRTIO_FEATURES_WORDS / 2); ++i) {
> +			argp += sizeof(u64);
> +			if (copy_from_user(&features, argp, sizeof(u64)))
> +				return -EFAULT;
> +
> +			all_features |= ((virtio_features_t)features) << (64 * i);
> +		}
> +
> +		/* Any feature specified by user-space above VIRTIO_FEATURES_MAX is
> +		 * not supported by definition.
> +		 */
> +		for (; i < count; ++i) {
> +			if (copy_from_user(&features, argp, sizeof(u64)))
> +				return -EFAULT;
> +			if (features)
> +				return -EOPNOTSUPP;
> +		}
> +
> +		if (all_features & ~VHOST_NET_ALL_FEATURES)
> +			return -EOPNOTSUPP;
> +
> +		return vhost_net_set_features(n, all_features);
> +	}
>   	case VHOST_GET_BACKEND_FEATURES:
>   		features = VHOST_NET_BACKEND_FEATURES;
>   		if (copy_to_user(featurep, &features, sizeof(features)))
> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> index bb75a292d50c..ef1c7fd6f4e1 100644
> --- a/drivers/vhost/vhost.h
> +++ b/drivers/vhost/vhost.h
> @@ -133,7 +133,7 @@ struct vhost_virtqueue {
>   	struct vhost_iotlb *umem;
>   	struct vhost_iotlb *iotlb;
>   	void *private_data;
> -	u64 acked_features;
> +	virtio_features_t acked_features;
>   	u64 acked_backend_features;
>   	/* Log write descriptors */
>   	void __user *log_base;
> diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
> index d4b3e2ae1314..d6ad01fbb8d2 100644
> --- a/include/uapi/linux/vhost.h
> +++ b/include/uapi/linux/vhost.h
> @@ -235,4 +235,11 @@
>    */
>   #define VHOST_VDPA_GET_VRING_SIZE	_IOWR(VHOST_VIRTIO, 0x82,	\
>   					      struct vhost_vring_state)
> +
> +/* Extended features manipulation */
> +#define VHOST_GET_FEATURES_ARRAY _IOR(VHOST_VIRTIO, 0x83, \
> +				       struct vhost_features_array)
> +#define VHOST_SET_FEATURES_ARRAY _IOW(VHOST_VIRTIO, 0x83, \
> +				       struct vhost_features_array)
> +
>   #endif
> diff --git a/include/uapi/linux/vhost_types.h b/include/uapi/linux/vhost_types.h
> index d7656908f730..3f227114c557 100644
> --- a/include/uapi/linux/vhost_types.h
> +++ b/include/uapi/linux/vhost_types.h
> @@ -110,6 +110,11 @@ struct vhost_msg_v2 {
>   	};
>   };
>   
> +struct vhost_features_array {
> +	__u64 count; /* number of entries present in features array */
> +	__u64 features[];


An alternative idea:

#define VHOST_GET_FEATURES_ARRAY(len) _IOC(_IOC_READ, VHOST_VIRTIO,
                                            0x00, (len))

By doing so, the kernel can have share the code for 
VHOST_GET_FEATURES_ARRAY() with VHOST_GET_FEATURES() since 
VHOST_GET_FEATURES() will be just a specialized definition.

It also makes the life of the userspace a bit easier by not making it 
construct struct vhost_features_array.

Looking at include/uapi, it seems there are examples of both your 
pattern and my alternative, so please pick what you prefer.

If you are going to keep struct vhost_features_array, you may want to 
use __counted_by().

> +};
> +
>   struct vhost_memory_region {
>   	__u64 guest_phys_addr;
>   	__u64 memory_size; /* bytes */


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v2 4/8] virtio_net: add supports for extended offloads
  2025-05-30 14:49 ` [RFC PATCH v2 4/8] virtio_net: add supports for extended offloads Paolo Abeni
@ 2025-05-31  6:18   ` Akihiko Odaki
  0 siblings, 0 replies; 18+ messages in thread
From: Akihiko Odaki @ 2025-05-31  6:18 UTC (permalink / raw)
  To: Paolo Abeni, netdev
  Cc: Willem de Bruijn, Jason Wang, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez, Yuri Benditovich

On 2025/05/30 23:49, Paolo Abeni wrote:
> The virtio_net driver needs it to implement GSO over UDP tunnel
> offload.
> 
> The only missing piece is mapping them to/from the extended
> features.
> 
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> v1 -> v2:
>   - drop unused macro
>   - restrict the offload remap range as per latest spec update
> ---
>   drivers/net/virtio_net.c | 26 ++++++++++++++++++++++++--
>   1 file changed, 24 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index e53ba600605a..ec638b4aa1c1 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -35,6 +35,24 @@ module_param(csum, bool, 0444);
>   module_param(gso, bool, 0444);
>   module_param(napi_tx, bool, 0644);
>   
> +#define VIRTIO_OFFLOAD_MAP_MIN	46
> +#define VIRTIO_OFFLOAD_MAP_MAX	47
> +#define VIRTIO_FEATURES_MAP_MIN	65
> +#define VIRTIO_O2F_DELTA	(VIRTIO_FEATURES_MAP_MIN - VIRTIO_OFFLOAD_MAP_MIN)
> +
> +static bool virtio_is_mapped_offload(unsigned int obit)
> +{
> +	return obit >= VIRTIO_OFFLOAD_MAP_MIN &&
> +	       obit <= VIRTIO_OFFLOAD_MAP_MAX;
> +}
> +
> +#define VIRTIO_OFFLOAD_TO_FEATURE(obit)	\
> +	({								\
> +		unsigned int __o = obit;				\
> +		virtio_is_mapped_offload(__o) ? __o + VIRTIO_O2F_DELTA :\
> +						__o;			\
> +	})

I wonder why this is a macro while virtio_is_mapped_offload() is a function.

> +
 >   /* FIXME: MTU in config. */>   #define GOOD_PACKET_LEN (ETH_HLEN + 
VLAN_HLEN + ETH_DATA_LEN)
>   #define GOOD_COPY_LEN	128
> @@ -7037,9 +7055,13 @@ static int virtnet_probe(struct virtio_device *vdev)
>   		netif_carrier_on(dev);
>   	}
>   
> -	for (i = 0; i < ARRAY_SIZE(guest_offloads); i++)
> -		if (virtio_has_feature(vi->vdev, guest_offloads[i]))
> +	for (i = 0; i < ARRAY_SIZE(guest_offloads); i++) {
> +		unsigned int fbit;
> +
> +		fbit = VIRTIO_OFFLOAD_TO_FEATURE(guest_offloads[i]);
> +		if (virtio_has_feature(vi->vdev, fbit))
>   			set_bit(guest_offloads[i], &vi->guest_offloads);
> +	}
>   	vi->guest_offloads_capable = vi->guest_offloads;
>   
>   	rtnl_unlock();


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v2 7/8] tun: enable gso over UDP tunnel support.
  2025-05-30 14:49 ` [RFC PATCH v2 7/8] tun: " Paolo Abeni
@ 2025-05-31  6:38   ` Akihiko Odaki
  0 siblings, 0 replies; 18+ messages in thread
From: Akihiko Odaki @ 2025-05-31  6:38 UTC (permalink / raw)
  To: Paolo Abeni, netdev
  Cc: Willem de Bruijn, Jason Wang, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez, Yuri Benditovich

On 2025/05/30 23:49, Paolo Abeni wrote:
> Add new tun features to represent the newly introduced virtio
> GSO over UDP tunnel offload. Allows detection and selection of
> such features via the existing TUNSETOFFLOAD ioctl, store the
> tunnel offload configuration in the highest bit of the tun flags
> and compute the expected virtio header size and tunnel header
> offset using such bits, so that we can plug almost seamless the
> the newly introduced virtio helpers to serialize the extended
> virtio header.
> 
> As the tun features and the virtio hdr size are configured
> separately, the data path need to cope with (hopefully transient)
> inconsistent values.
> 
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> Note that this semantically conflicts with the hash report series, one,
> the other or both should be adjusted to fit.
> ---
>   drivers/net/tun.c           | 77 ++++++++++++++++++++++++++++++++-----
>   drivers/net/tun_vnet.h      | 74 ++++++++++++++++++++++++++++-------
>   include/uapi/linux/if_tun.h |  9 +++++
>   3 files changed, 137 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 1207196cbbed..2977aff5bc46 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -186,7 +186,8 @@ struct tun_struct {
>   	struct net_device	*dev;
>   	netdev_features_t	set_features;
>   #define TUN_USER_FEATURES (NETIF_F_HW_CSUM|NETIF_F_TSO_ECN|NETIF_F_TSO| \
> -			  NETIF_F_TSO6 | NETIF_F_GSO_UDP_L4)
> +			  NETIF_F_TSO6 | NETIF_F_GSO_UDP_L4 | \
> +			  NETIF_F_GSO_UDP_TUNNEL | NETIF_F_GSO_UDP_TUNNEL_CSUM)
>   
>   	int			align;
>   	int			vnet_hdr_sz;
> @@ -925,6 +926,7 @@ static int tun_net_init(struct net_device *dev)
>   	dev->hw_features = NETIF_F_SG | NETIF_F_FRAGLIST |
>   			   TUN_USER_FEATURES | NETIF_F_HW_VLAN_CTAG_TX |
>   			   NETIF_F_HW_VLAN_STAG_TX;
> +	dev->hw_enc_features = dev->hw_features;
>   	dev->features = dev->hw_features;
>   	dev->vlan_features = dev->features &
>   			     ~(NETIF_F_HW_VLAN_CTAG_TX |
> @@ -1698,7 +1700,8 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
>   	struct sk_buff *skb;
>   	size_t total_len = iov_iter_count(from);
>   	size_t len = total_len, align = tun->align, linear;
> -	struct virtio_net_hdr gso = { 0 };
> +	char buf[TUN_VNET_TNL_SIZE];
> +	struct virtio_net_hdr *gso;
>   	int good_linear;
>   	int copylen;
>   	int hdr_len = 0;
> @@ -1708,6 +1711,15 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
>   	int skb_xdp = 1;
>   	bool frags = tun_napi_frags_enabled(tfile);
>   	enum skb_drop_reason drop_reason = SKB_DROP_REASON_NOT_SPECIFIED;
> +	unsigned int flags = tun->flags & ~TUN_VNET_TNL_MASK;
> +
> +	/*
> +	 * Keep it easy and always zero the whole buffer, even if the
> +	 * tunnel-related field will be touched only when the feature
> +	 * is enabled and the hdr size id compatible.
> +	 */
> +	memset(buf, 0, sizeof(buf));
> +	gso = (void *)buf;

buf may be unaligned.

>   
>   	if (!(tun->flags & IFF_NO_PI)) {
>   		if (len < sizeof(pi))
> @@ -1720,8 +1732,16 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
>   
>   	if (tun->flags & IFF_VNET_HDR) {
>   		int vnet_hdr_sz = READ_ONCE(tun->vnet_hdr_sz);
> +		int parsed_size;
>   
> -		hdr_len = tun_vnet_hdr_get(vnet_hdr_sz, tun->flags, from, &gso);
> +		if (vnet_hdr_sz < TUN_VNET_TNL_SIZE) {
> +			parsed_size = vnet_hdr_sz;
> +		} else {
> +			parsed_size = TUN_VNET_TNL_SIZE;
> +			flags |= TUN_VNET_TNL_MASK;
> +		}
> +		hdr_len = __tun_vnet_hdr_get(vnet_hdr_sz, parsed_size,
> +					     flags, from, gso);
>   		if (hdr_len < 0)
>   			return hdr_len;
>   
> @@ -1755,7 +1775,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
>   		 * (e.g gso or jumbo packet), we will do it at after
>   		 * skb was created with generic XDP routine.
>   		 */
> -		skb = tun_build_skb(tun, tfile, from, &gso, len, &skb_xdp);
> +		skb = tun_build_skb(tun, tfile, from, gso, len, &skb_xdp);
>   		err = PTR_ERR_OR_ZERO(skb);
>   		if (err)
>   			goto drop;
> @@ -1799,7 +1819,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
>   		}
>   	}
>   
> -	if (tun_vnet_hdr_to_skb(tun->flags, skb, &gso)) {
> +	if (tun_vnet_hdr_to_skb(flags, skb, gso)) {
>   		atomic_long_inc(&tun->rx_frame_errors);
>   		err = -EINVAL;
>   		goto free_skb;
> @@ -2050,13 +2070,26 @@ static ssize_t tun_put_user(struct tun_struct *tun,
>   	}
>   
>   	if (vnet_hdr_sz) {
> -		struct virtio_net_hdr gso;
> +		char buf[TUN_VNET_TNL_SIZE];
> +		struct virtio_net_hdr *gso;
> +		int flags = tun->flags;
> +		int parsed_size;
> +
> +		gso = (void *)buf;
> +		parsed_size = tun_vnet_parse_size(tun->flags);
> +		if (unlikely(vnet_hdr_sz < parsed_size)) {
> +			/* Inconsistent hdr size and (tunnel) offloads:
> +			 * strips the latter
> +			 */
> +			flags &= ~TUN_VNET_TNL_MASK;
> +			parsed_size = sizeof(struct virtio_net_hdr);
> +		};
>   
> -		ret = tun_vnet_hdr_from_skb(tun->flags, tun->dev, skb, &gso);
> +		ret = tun_vnet_hdr_from_skb(flags, tun->dev, skb, gso);
>   		if (ret)
>   			return ret;
>   
> -		ret = tun_vnet_hdr_put(vnet_hdr_sz, iter, &gso);
> +		ret = __tun_vnet_hdr_put(vnet_hdr_sz, parsed_size, iter, gso);
>   		if (ret)
>   			return ret;
>   	}
> @@ -2366,6 +2399,7 @@ static int tun_xdp_one(struct tun_struct *tun,
>   	int metasize = 0;
>   	int ret = 0;
>   	bool skb_xdp = false;
> +	unsigned int flags;
>   	struct page *page;
>   
>   	if (unlikely(datasize < ETH_HLEN))
> @@ -2426,7 +2460,16 @@ static int tun_xdp_one(struct tun_struct *tun,
>   	if (metasize > 0)
>   		skb_metadata_set(skb, metasize);
>   
> -	if (tun_vnet_hdr_to_skb(tun->flags, skb, gso)) {
> +	/* Assume tun offloads are enabled if the provided hdr is large
> +	 * enough.
> +	 */
> +	if (READ_ONCE(tun->vnet_hdr_sz) >= TUN_VNET_TNL_SIZE &&
> +	    xdp->data - xdp->data_hard_start >= TUN_VNET_TNL_SIZE)
> +		flags = tun->flags | TUN_VNET_TNL_MASK;
> +	else
> +		flags = tun->flags & ~TUN_VNET_TNL_MASK;
> +
> +	if (tun_vnet_hdr_to_skb(flags, skb, gso)) {
>   		atomic_long_inc(&tun->rx_frame_errors);
>   		kfree_skb(skb);
>   		ret = -EINVAL;
> @@ -2812,6 +2855,8 @@ static void tun_get_iff(struct tun_struct *tun, struct ifreq *ifr)
>   
>   }
>   
> +#define PLAIN_GSO (NETIF_F_GSO_UDP_L4 | NETIF_F_TSO | NETIF_F_TSO6)
> +
>   /* This is like a cut-down ethtool ops, except done via tun fd so no
>    * privs required. */
>   static int set_offload(struct tun_struct *tun, unsigned long arg)
> @@ -2841,6 +2886,17 @@ static int set_offload(struct tun_struct *tun, unsigned long arg)
>   			features |= NETIF_F_GSO_UDP_L4;
>   			arg &= ~(TUN_F_USO4 | TUN_F_USO6);
>   		}
> +
> +		/* Tunnel offload is allowed only if some plain offload is
> +		 * available, too.
> +		 */
> +		if (features & PLAIN_GSO && arg & TUN_F_UDP_TUNNEL_GSO) {
> +			features |= NETIF_F_GSO_UDP_TUNNEL;
> +			if (arg & TUN_F_UDP_TUNNEL_GSO_CSUM)
> +				features |= NETIF_F_GSO_UDP_TUNNEL_CSUM;
> +			arg &= ~(TUN_F_UDP_TUNNEL_GSO |
> +				 TUN_F_UDP_TUNNEL_GSO_CSUM);
> +		}
>   	}
>   
>   	/* This gives the user a way to test for new features in future by
> @@ -2852,7 +2908,8 @@ static int set_offload(struct tun_struct *tun, unsigned long arg)
>   	tun->dev->wanted_features &= ~TUN_USER_FEATURES;
>   	tun->dev->wanted_features |= features;
>   	netdev_update_features(tun->dev);
> -
> +	tun_set_vnet_tnl(&tun->flags, !!(features & NETIF_F_GSO_UDP_TUNNEL),
> +			 !!(features & NETIF_F_GSO_UDP_TUNNEL_CSUM));
>   	return 0;
>   }
>   
> diff --git a/drivers/net/tun_vnet.h b/drivers/net/tun_vnet.h
> index 58b9ac7a5fc4..ab2d4396941c 100644
> --- a/drivers/net/tun_vnet.h
> +++ b/drivers/net/tun_vnet.h
> @@ -5,6 +5,12 @@
>   /* High bits in flags field are unused. */
>   #define TUN_VNET_LE     0x80000000
>   #define TUN_VNET_BE     0x40000000
> +#define TUN_VNET_TNL		0x20000000
> +#define TUN_VNET_TNL_CSUM	0x10000000
> +#define TUN_VNET_TNL_MASK	(TUN_VNET_TNL | TUN_VNET_TNL_CSUM)
> +
> +#define TUN_VNET_TNL_SIZE (sizeof(struct virtio_net_hdr_v1) + \
> +			   sizeof(struct virtio_net_hdr_tunnel))
>   
>   static inline bool tun_vnet_legacy_is_little_endian(unsigned int flags)
>   {
> @@ -45,6 +51,13 @@ static inline long tun_set_vnet_be(unsigned int *flags, int __user *argp)
>   	return 0;
>   }
>   
> +static inline void tun_set_vnet_tnl(unsigned int *flags, bool tnl, bool tnl_csum)
> +{
> +	*flags = (*flags & ~TUN_VNET_TNL_MASK) |
> +		 tnl * TUN_VNET_TNL |
> +		 tnl_csum * TUN_VNET_TNL_CSUM;
> +}
> +
>   static inline bool tun_vnet_is_little_endian(unsigned int flags)
>   {
>   	return flags & TUN_VNET_LE || tun_vnet_legacy_is_little_endian(flags);
> @@ -107,16 +120,33 @@ static inline long tun_vnet_ioctl(int *vnet_hdr_sz, unsigned int *flags,
>   	}
>   }
>   
> -static inline int tun_vnet_hdr_get(int sz, unsigned int flags,
> -				   struct iov_iter *from,
> -				   struct virtio_net_hdr *hdr)
> +static inline unsigned int tun_vnet_parse_size(unsigned int flags)
> +{
> +	if (!(flags & TUN_VNET_TNL))
> +		return sizeof(struct virtio_net_hdr);
> +
> +	return TUN_VNET_TNL_SIZE;
> +}
> +
> +static inline unsigned int tun_vnet_tnl_offset(unsigned int flags)
> +{
> +	if (!(flags & TUN_VNET_TNL))
> +		return 0;
> +
> +	return sizeof(struct virtio_net_hdr_v1);
> +}
> +
> +static inline int __tun_vnet_hdr_get(int sz, int parsed_size,
> +				     unsigned int flags,
> +				     struct iov_iter *from,
> +				     struct virtio_net_hdr *hdr)
>   {
>   	u16 hdr_len;
>   
>   	if (iov_iter_count(from) < sz)
>   		return -EINVAL;
>   
> -	if (!copy_from_iter_full(hdr, sizeof(*hdr), from))
> +	if (!copy_from_iter_full(hdr, parsed_size, from))
>   		return -EFAULT;
>   
>   	hdr_len = tun_vnet16_to_cpu(flags, hdr->hdr_len);
> @@ -129,30 +159,47 @@ static inline int tun_vnet_hdr_get(int sz, unsigned int flags,
>   	if (hdr_len > iov_iter_count(from))
>   		return -EINVAL;
>   
> -	iov_iter_advance(from, sz - sizeof(*hdr));
> +	iov_iter_advance(from, sz - parsed_size);
>   
>   	return hdr_len;
>   }
>   
> -static inline int tun_vnet_hdr_put(int sz, struct iov_iter *iter,
> -				   const struct virtio_net_hdr *hdr)
> +static inline int tun_vnet_hdr_get(int sz, unsigned int flags,
> +				   struct iov_iter *from,
> +				   struct virtio_net_hdr *hdr)
> +{
> +	return __tun_vnet_hdr_get(sz, sizeof(*hdr), flags, from, hdr);
> +}
> +
> +static inline int __tun_vnet_hdr_put(int sz, int parsed_size,
> +				     struct iov_iter *iter,
> +				     const struct virtio_net_hdr *hdr)
>   {
>   	if (unlikely(iov_iter_count(iter) < sz))
>   		return -EINVAL;
>   
> -	if (unlikely(copy_to_iter(hdr, sizeof(*hdr), iter) != sizeof(*hdr)))
> +	if (unlikely(copy_to_iter(hdr, parsed_size, iter) != parsed_size))
>   		return -EFAULT;
>   
> -	if (iov_iter_zero(sz - sizeof(*hdr), iter) != sz - sizeof(*hdr))
> +	if (iov_iter_zero(sz - parsed_size, iter) != sz - parsed_size)
>   		return -EFAULT;
>   
>   	return 0;
>   }
>   
> +static inline int tun_vnet_hdr_put(int sz, struct iov_iter *iter,
> +				   const struct virtio_net_hdr *hdr)
> +{
> +	return __tun_vnet_hdr_put(sz, sizeof(*hdr), iter, hdr);
> +}
> +
>   static inline int tun_vnet_hdr_to_skb(unsigned int flags, struct sk_buff *skb,
>   				      const struct virtio_net_hdr *hdr)
>   {
> -	return virtio_net_hdr_to_skb(skb, hdr, tun_vnet_is_little_endian(flags));
> +	return virtio_net_hdr_tnl_to_skb(skb, hdr,
> +					 tun_vnet_tnl_offset(flags),
> +					 !!(flags & TUN_VNET_TNL_CSUM),
> +					 tun_vnet_is_little_endian(flags));
>   }
>   
>   static inline int tun_vnet_hdr_from_skb(unsigned int flags,
> @@ -161,10 +208,11 @@ static inline int tun_vnet_hdr_from_skb(unsigned int flags,
>   					struct virtio_net_hdr *hdr)
>   {
>   	int vlan_hlen = skb_vlan_tag_present(skb) ? VLAN_HLEN : 0;
> +	int tnl_offset = tun_vnet_tnl_offset(flags);
>   
> -	if (virtio_net_hdr_from_skb(skb, hdr,
> -				    tun_vnet_is_little_endian(flags), true,
> -				    vlan_hlen)) {
> +	if (virtio_net_hdr_tnl_from_skb(skb, hdr, tnl_offset,
> +					tun_vnet_is_little_endian(flags),
> +					vlan_hlen)) {
>   		struct skb_shared_info *sinfo = skb_shinfo(skb);
>   
>   		if (net_ratelimit()) {
> diff --git a/include/uapi/linux/if_tun.h b/include/uapi/linux/if_tun.h
> index 287cdc81c939..a25a5e7a08ff 100644
> --- a/include/uapi/linux/if_tun.h
> +++ b/include/uapi/linux/if_tun.h
> @@ -93,6 +93,15 @@
>   #define TUN_F_USO4	0x20	/* I can handle USO for IPv4 packets */
>   #define TUN_F_USO6	0x40	/* I can handle USO for IPv6 packets */
>   
> +#define TUN_F_UDP_TUNNEL_GSO		0x080 /* I can handle TSO/USO for UDP
> +					       * tunneled packets
> +					       */
> +#define TUN_F_UDP_TUNNEL_GSO_CSUM	0x100 /* I can handle TSO/USO for UDP
> +					       * tunneled packets requiring
> +					       * csum offload for the outer
> +					       * header
> +					       */


Documentation/process/coding-style.rst says multiline comments starts a 
line only with "/*". I also feel a bit difficult to read the comments 
with multiple short lines. So I sugguest:

/*
  * I can handle TSO/USO for UDP tunneled packets requiring csum offload
  * for the outer header
  */
#define TUN_F_UDP_TUNNEL_GSO_CSUM	0x080


> +
>   /* Protocol info prepended to the packets (when IFF_NO_PI is not set) */
>   #define TUN_PKT_STRIP	0x0001
>   struct tun_pi {


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v2 3/8] vhost-net: allow configuring extended features
  2025-05-31  6:15   ` Akihiko Odaki
@ 2025-06-03 13:32     ` Paolo Abeni
  2025-06-06  9:57       ` Akihiko Odaki
  0 siblings, 1 reply; 18+ messages in thread
From: Paolo Abeni @ 2025-06-03 13:32 UTC (permalink / raw)
  To: Akihiko Odaki, netdev
  Cc: Willem de Bruijn, Jason Wang, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez, Yuri Benditovich

On 5/31/25 8:15 AM, Akihiko Odaki wrote:
> On 2025/05/30 23:49, Paolo Abeni wrote:
>> Use the extended feature type for 'acked_features' and implement
>> two new ioctls operation allowing the user-space to set/query an
>> unbounded amount of features.
>>
>> The actual number of processed features is limited by virtio_features_t
>> size, and attempts to set features above such limit fail with
>> EOPNOTSUPP.
>>
>> Note that the legacy ioctls implicitly truncate the negotiated
>> features to the lower 64 bits range.
>>
>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>> ---
>> v1 -> v2:
>>    - change the ioctl to use an extensible API
>> ---
>>   drivers/vhost/net.c              | 61 ++++++++++++++++++++++++++++++--
>>   drivers/vhost/vhost.h            |  2 +-
>>   include/uapi/linux/vhost.h       |  7 ++++
>>   include/uapi/linux/vhost_types.h |  5 +++
>>   4 files changed, 71 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>> index 7cbfc7d718b3..f53294440695 100644
>> --- a/drivers/vhost/net.c
>> +++ b/drivers/vhost/net.c
>> @@ -77,6 +77,8 @@ enum {
>>   			 (1ULL << VIRTIO_F_RING_RESET)
>>   };
>>   
>> +#define VHOST_NET_ALL_FEATURES VHOST_NET_FEATURES
>> +
>>   enum {
>>   	VHOST_NET_BACKEND_FEATURES = (1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2)
>>   };
>> @@ -1614,7 +1616,7 @@ static long vhost_net_reset_owner(struct vhost_net *n)
>>   	return err;
>>   }
>>   
>> -static int vhost_net_set_features(struct vhost_net *n, u64 features)
>> +static int vhost_net_set_features(struct vhost_net *n, virtio_features_t features)
>>   {
>>   	size_t vhost_hlen, sock_hlen, hdr_len;
>>   	int i;
>> @@ -1685,8 +1687,9 @@ static long vhost_net_ioctl(struct file *f, unsigned int ioctl,
>>   	void __user *argp = (void __user *)arg;
>>   	u64 __user *featurep = argp;
>>   	struct vhost_vring_file backend;
>> -	u64 features;
>> -	int r;
>> +	virtio_features_t all_features;
>> +	u64 features, count;
>> +	int r, i;
>>   
>>   	switch (ioctl) {
>>   	case VHOST_NET_SET_BACKEND:
>> @@ -1704,6 +1707,58 @@ static long vhost_net_ioctl(struct file *f, unsigned int ioctl,
>>   		if (features & ~VHOST_NET_FEATURES)
>>   			return -EOPNOTSUPP;
>>   		return vhost_net_set_features(n, features);
>> +	case VHOST_GET_FEATURES_ARRAY:
>> +	{
>> +		if (copy_from_user(&count, argp, sizeof(u64)))
>> +			return -EFAULT;
>> +
>> +		/* Copy the net features, up to the user-provided buffer size */
>> +		all_features = VHOST_NET_ALL_FEATURES;
>> +		for (i = 0; i < min(VIRTIO_FEATURES_WORDS / 2, count); ++i) {
> 
> I think you need to use: array_index_nospec()

Do you mean like:
			i = array_index_nospec(i, min(VIRTIO_FEATURES_WORDS / 2, count));

?

Note that even if the cpu would speculative execute the loop for too
high 'i' values, it will could only read `all_features`, which
user-space can access freely.

>> diff --git a/include/uapi/linux/vhost_types.h b/include/uapi/linux/vhost_types.h
>> index d7656908f730..3f227114c557 100644
>> --- a/include/uapi/linux/vhost_types.h
>> +++ b/include/uapi/linux/vhost_types.h
>> @@ -110,6 +110,11 @@ struct vhost_msg_v2 {
>>   	};
>>   };
>>   
>> +struct vhost_features_array {
>> +	__u64 count; /* number of entries present in features array */
>> +	__u64 features[];
> 
> 
> An alternative idea:
> 
> #define VHOST_GET_FEATURES_ARRAY(len) _IOC(_IOC_READ, VHOST_VIRTIO,
>                                             0x00, (len))
> 
> By doing so, the kernel can have share the code for 
> VHOST_GET_FEATURES_ARRAY() with VHOST_GET_FEATURES() since 
> VHOST_GET_FEATURES() will be just a specialized definition.
> 
> It also makes the life of the userspace a bit easier by not making it 
> construct struct vhost_features_array.
> 
> Looking at include/uapi, it seems there are examples of both your 
> pattern and my alternative, so please pick what you prefer.

I'm ok either way, but I don't see big win code-wise. The user-space
side saving will be literally a one liner. In the kernel the get/set
sockopt could be consolidated, but there will be a slightly increase in
complexity, to extract the ioctl len from the ioctl op value itself.

/P


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v2 3/8] vhost-net: allow configuring extended features
  2025-06-03 13:32     ` Paolo Abeni
@ 2025-06-06  9:57       ` Akihiko Odaki
  2025-06-06 11:52         ` Paolo Abeni
  0 siblings, 1 reply; 18+ messages in thread
From: Akihiko Odaki @ 2025-06-06  9:57 UTC (permalink / raw)
  To: Paolo Abeni, netdev
  Cc: Willem de Bruijn, Jason Wang, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez, Yuri Benditovich

On 2025/06/03 22:32, Paolo Abeni wrote:
> On 5/31/25 8:15 AM, Akihiko Odaki wrote:
>> On 2025/05/30 23:49, Paolo Abeni wrote:
>>> Use the extended feature type for 'acked_features' and implement
>>> two new ioctls operation allowing the user-space to set/query an
>>> unbounded amount of features.
>>>
>>> The actual number of processed features is limited by virtio_features_t
>>> size, and attempts to set features above such limit fail with
>>> EOPNOTSUPP.
>>>
>>> Note that the legacy ioctls implicitly truncate the negotiated
>>> features to the lower 64 bits range.
>>>
>>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>>> ---
>>> v1 -> v2:
>>>     - change the ioctl to use an extensible API
>>> ---
>>>    drivers/vhost/net.c              | 61 ++++++++++++++++++++++++++++++--
>>>    drivers/vhost/vhost.h            |  2 +-
>>>    include/uapi/linux/vhost.h       |  7 ++++
>>>    include/uapi/linux/vhost_types.h |  5 +++
>>>    4 files changed, 71 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>>> index 7cbfc7d718b3..f53294440695 100644
>>> --- a/drivers/vhost/net.c
>>> +++ b/drivers/vhost/net.c
>>> @@ -77,6 +77,8 @@ enum {
>>>    			 (1ULL << VIRTIO_F_RING_RESET)
>>>    };
>>>    
>>> +#define VHOST_NET_ALL_FEATURES VHOST_NET_FEATURES
>>> +
>>>    enum {
>>>    	VHOST_NET_BACKEND_FEATURES = (1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2)
>>>    };
>>> @@ -1614,7 +1616,7 @@ static long vhost_net_reset_owner(struct vhost_net *n)
>>>    	return err;
>>>    }
>>>    
>>> -static int vhost_net_set_features(struct vhost_net *n, u64 features)
>>> +static int vhost_net_set_features(struct vhost_net *n, virtio_features_t features)
>>>    {
>>>    	size_t vhost_hlen, sock_hlen, hdr_len;
>>>    	int i;
>>> @@ -1685,8 +1687,9 @@ static long vhost_net_ioctl(struct file *f, unsigned int ioctl,
>>>    	void __user *argp = (void __user *)arg;
>>>    	u64 __user *featurep = argp;
>>>    	struct vhost_vring_file backend;
>>> -	u64 features;
>>> -	int r;
>>> +	virtio_features_t all_features;
>>> +	u64 features, count;
>>> +	int r, i;
>>>    
>>>    	switch (ioctl) {
>>>    	case VHOST_NET_SET_BACKEND:
>>> @@ -1704,6 +1707,58 @@ static long vhost_net_ioctl(struct file *f, unsigned int ioctl,
>>>    		if (features & ~VHOST_NET_FEATURES)
>>>    			return -EOPNOTSUPP;
>>>    		return vhost_net_set_features(n, features);
>>> +	case VHOST_GET_FEATURES_ARRAY:
>>> +	{
>>> +		if (copy_from_user(&count, argp, sizeof(u64)))
>>> +			return -EFAULT;
>>> +
>>> +		/* Copy the net features, up to the user-provided buffer size */
>>> +		all_features = VHOST_NET_ALL_FEATURES;
>>> +		for (i = 0; i < min(VIRTIO_FEATURES_WORDS / 2, count); ++i) {
>>
>> I think you need to use: array_index_nospec()
> 
> Do you mean like:
> 			i = array_index_nospec(i, min(VIRTIO_FEATURES_WORDS / 2, count));
> 
> ?
> 
> Note that even if the cpu would speculative execute the loop for too
> high 'i' values, it will could only read `all_features`, which
> user-space can access freely.

I was wrong; I forgot you used a 128-bit integer instead of an array.

> 
>>> diff --git a/include/uapi/linux/vhost_types.h b/include/uapi/linux/vhost_types.h
>>> index d7656908f730..3f227114c557 100644
>>> --- a/include/uapi/linux/vhost_types.h
>>> +++ b/include/uapi/linux/vhost_types.h
>>> @@ -110,6 +110,11 @@ struct vhost_msg_v2 {
>>>    	};
>>>    };
>>>    
>>> +struct vhost_features_array {
>>> +	__u64 count; /* number of entries present in features array */
>>> +	__u64 features[];
>>
>>
>> An alternative idea:
>>
>> #define VHOST_GET_FEATURES_ARRAY(len) _IOC(_IOC_READ, VHOST_VIRTIO,
>>                                              0x00, (len))
>>
>> By doing so, the kernel can have share the code for
>> VHOST_GET_FEATURES_ARRAY() with VHOST_GET_FEATURES() since
>> VHOST_GET_FEATURES() will be just a specialized definition.
>>
>> It also makes the life of the userspace a bit easier by not making it
>> construct struct vhost_features_array.
>>
>> Looking at include/uapi, it seems there are examples of both your
>> pattern and my alternative, so please pick what you prefer.
> 
> I'm ok either way, but I don't see big win code-wise. The user-space
> side saving will be literally a one liner. In the kernel the get/set
> sockopt could be consolidated, but there will be a slightly increase in
> complexity, to extract the ioctl len from the ioctl op value itself.

The current patch also requires copy_from_user() to get the count, so I 
don't think they are different in that sense.

The difference will be marginal anyway, and it may turn out encoding the 
length in the ioctl number requires a bit more code.

Regards,
Akihiko Odaki

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v2 3/8] vhost-net: allow configuring extended features
  2025-06-06  9:57       ` Akihiko Odaki
@ 2025-06-06 11:52         ` Paolo Abeni
  2025-06-08  5:28           ` Akihiko Odaki
  0 siblings, 1 reply; 18+ messages in thread
From: Paolo Abeni @ 2025-06-06 11:52 UTC (permalink / raw)
  To: Akihiko Odaki, netdev
  Cc: Willem de Bruijn, Jason Wang, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez, Yuri Benditovich

On 6/6/25 11:57 AM, Akihiko Odaki wrote:
> On 2025/06/03 22:32, Paolo Abeni wrote:
>> On 5/31/25 8:15 AM, Akihiko Odaki wrote:
>>> On 2025/05/30 23:49, Paolo Abeni wrote:
>>>> Use the extended feature type for 'acked_features' and implement
>>>> two new ioctls operation allowing the user-space to set/query an
>>>> unbounded amount of features.
>>>>
>>>> The actual number of processed features is limited by virtio_features_t
>>>> size, and attempts to set features above such limit fail with
>>>> EOPNOTSUPP.
>>>>
>>>> Note that the legacy ioctls implicitly truncate the negotiated
>>>> features to the lower 64 bits range.
>>>>
>>>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>>>> ---
>>>> v1 -> v2:
>>>>     - change the ioctl to use an extensible API
>>>> ---
>>>>    drivers/vhost/net.c              | 61 ++++++++++++++++++++++++++++++--
>>>>    drivers/vhost/vhost.h            |  2 +-
>>>>    include/uapi/linux/vhost.h       |  7 ++++
>>>>    include/uapi/linux/vhost_types.h |  5 +++
>>>>    4 files changed, 71 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>>>> index 7cbfc7d718b3..f53294440695 100644
>>>> --- a/drivers/vhost/net.c
>>>> +++ b/drivers/vhost/net.c
>>>> @@ -77,6 +77,8 @@ enum {
>>>>    			 (1ULL << VIRTIO_F_RING_RESET)
>>>>    };
>>>>    
>>>> +#define VHOST_NET_ALL_FEATURES VHOST_NET_FEATURES
>>>> +
>>>>    enum {
>>>>    	VHOST_NET_BACKEND_FEATURES = (1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2)
>>>>    };
>>>> @@ -1614,7 +1616,7 @@ static long vhost_net_reset_owner(struct vhost_net *n)
>>>>    	return err;
>>>>    }
>>>>    
>>>> -static int vhost_net_set_features(struct vhost_net *n, u64 features)
>>>> +static int vhost_net_set_features(struct vhost_net *n, virtio_features_t features)
>>>>    {
>>>>    	size_t vhost_hlen, sock_hlen, hdr_len;
>>>>    	int i;
>>>> @@ -1685,8 +1687,9 @@ static long vhost_net_ioctl(struct file *f, unsigned int ioctl,
>>>>    	void __user *argp = (void __user *)arg;
>>>>    	u64 __user *featurep = argp;
>>>>    	struct vhost_vring_file backend;
>>>> -	u64 features;
>>>> -	int r;
>>>> +	virtio_features_t all_features;
>>>> +	u64 features, count;
>>>> +	int r, i;
>>>>    
>>>>    	switch (ioctl) {
>>>>    	case VHOST_NET_SET_BACKEND:
>>>> @@ -1704,6 +1707,58 @@ static long vhost_net_ioctl(struct file *f, unsigned int ioctl,
>>>>    		if (features & ~VHOST_NET_FEATURES)
>>>>    			return -EOPNOTSUPP;
>>>>    		return vhost_net_set_features(n, features);
>>>> +	case VHOST_GET_FEATURES_ARRAY:
>>>> +	{
>>>> +		if (copy_from_user(&count, argp, sizeof(u64)))
>>>> +			return -EFAULT;
>>>> +
>>>> +		/* Copy the net features, up to the user-provided buffer size */
>>>> +		all_features = VHOST_NET_ALL_FEATURES;
>>>> +		for (i = 0; i < min(VIRTIO_FEATURES_WORDS / 2, count); ++i) {
>>>
>>> I think you need to use: array_index_nospec()
>>
>> Do you mean like:
>> 			i = array_index_nospec(i, min(VIRTIO_FEATURES_WORDS / 2, count));
>>
>> ?
>>
>> Note that even if the cpu would speculative execute the loop for too
>> high 'i' values, it will could only read `all_features`, which
>> user-space can access freely.
> 
> I was wrong; I forgot you used a 128-bit integer instead of an array.
> 
>>
>>>> diff --git a/include/uapi/linux/vhost_types.h b/include/uapi/linux/vhost_types.h
>>>> index d7656908f730..3f227114c557 100644
>>>> --- a/include/uapi/linux/vhost_types.h
>>>> +++ b/include/uapi/linux/vhost_types.h
>>>> @@ -110,6 +110,11 @@ struct vhost_msg_v2 {
>>>>    	};
>>>>    };
>>>>    
>>>> +struct vhost_features_array {
>>>> +	__u64 count; /* number of entries present in features array */
>>>> +	__u64 features[];
>>>
>>>
>>> An alternative idea:
>>>
>>> #define VHOST_GET_FEATURES_ARRAY(len) _IOC(_IOC_READ, VHOST_VIRTIO,
>>>                                              0x00, (len))
>>>
>>> By doing so, the kernel can have share the code for
>>> VHOST_GET_FEATURES_ARRAY() with VHOST_GET_FEATURES() since
>>> VHOST_GET_FEATURES() will be just a specialized definition.
>>>
>>> It also makes the life of the userspace a bit easier by not making it
>>> construct struct vhost_features_array.
>>>
>>> Looking at include/uapi, it seems there are examples of both your
>>> pattern and my alternative, so please pick what you prefer.
>>
>> I'm ok either way, but I don't see big win code-wise. The user-space
>> side saving will be literally a one liner. In the kernel the get/set
>> sockopt could be consolidated, but there will be a slightly increase in
>> complexity, to extract the ioctl len from the ioctl op value itself.
> 
> The current patch also requires copy_from_user() to get the count, so I 
> don't think they are different in that sense.
> 
> The difference will be marginal anyway, and it may turn out encoding the 
> length in the ioctl number requires a bit more code.

I'm sorry, almost mid-air collision. I just send out the rfc v3, and I
read your reply here only afterwards.

I stuck to separate ioctls operations; as an additional reason for that,
I understand there is interest in extending the features space even
more, and let user-space/kernel with different features space limits
easily interact.

I think that with a single ioctl either the kernel or the user-space
should be update to handle explicitly every additional features space
expansion, while the API proposed here no additional changes should be
required.

Cheers,

Paolo


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v2 3/8] vhost-net: allow configuring extended features
  2025-06-06 11:52         ` Paolo Abeni
@ 2025-06-08  5:28           ` Akihiko Odaki
  0 siblings, 0 replies; 18+ messages in thread
From: Akihiko Odaki @ 2025-06-08  5:28 UTC (permalink / raw)
  To: Paolo Abeni, netdev
  Cc: Willem de Bruijn, Jason Wang, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Michael S. Tsirkin, Xuan Zhuo,
	Eugenio Pérez, Yuri Benditovich

On 2025/06/06 20:52, Paolo Abeni wrote:
> On 6/6/25 11:57 AM, Akihiko Odaki wrote:
>> On 2025/06/03 22:32, Paolo Abeni wrote:
>>> On 5/31/25 8:15 AM, Akihiko Odaki wrote:
>>>> On 2025/05/30 23:49, Paolo Abeni wrote:
>>>>> Use the extended feature type for 'acked_features' and implement
>>>>> two new ioctls operation allowing the user-space to set/query an
>>>>> unbounded amount of features.
>>>>>
>>>>> The actual number of processed features is limited by virtio_features_t
>>>>> size, and attempts to set features above such limit fail with
>>>>> EOPNOTSUPP.
>>>>>
>>>>> Note that the legacy ioctls implicitly truncate the negotiated
>>>>> features to the lower 64 bits range.
>>>>>
>>>>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>>>>> ---
>>>>> v1 -> v2:
>>>>>      - change the ioctl to use an extensible API
>>>>> ---
>>>>>     drivers/vhost/net.c              | 61 ++++++++++++++++++++++++++++++--
>>>>>     drivers/vhost/vhost.h            |  2 +-
>>>>>     include/uapi/linux/vhost.h       |  7 ++++
>>>>>     include/uapi/linux/vhost_types.h |  5 +++
>>>>>     4 files changed, 71 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>>>>> index 7cbfc7d718b3..f53294440695 100644
>>>>> --- a/drivers/vhost/net.c
>>>>> +++ b/drivers/vhost/net.c
>>>>> @@ -77,6 +77,8 @@ enum {
>>>>>     			 (1ULL << VIRTIO_F_RING_RESET)
>>>>>     };
>>>>>     
>>>>> +#define VHOST_NET_ALL_FEATURES VHOST_NET_FEATURES
>>>>> +
>>>>>     enum {
>>>>>     	VHOST_NET_BACKEND_FEATURES = (1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2)
>>>>>     };
>>>>> @@ -1614,7 +1616,7 @@ static long vhost_net_reset_owner(struct vhost_net *n)
>>>>>     	return err;
>>>>>     }
>>>>>     
>>>>> -static int vhost_net_set_features(struct vhost_net *n, u64 features)
>>>>> +static int vhost_net_set_features(struct vhost_net *n, virtio_features_t features)
>>>>>     {
>>>>>     	size_t vhost_hlen, sock_hlen, hdr_len;
>>>>>     	int i;
>>>>> @@ -1685,8 +1687,9 @@ static long vhost_net_ioctl(struct file *f, unsigned int ioctl,
>>>>>     	void __user *argp = (void __user *)arg;
>>>>>     	u64 __user *featurep = argp;
>>>>>     	struct vhost_vring_file backend;
>>>>> -	u64 features;
>>>>> -	int r;
>>>>> +	virtio_features_t all_features;
>>>>> +	u64 features, count;
>>>>> +	int r, i;
>>>>>     
>>>>>     	switch (ioctl) {
>>>>>     	case VHOST_NET_SET_BACKEND:
>>>>> @@ -1704,6 +1707,58 @@ static long vhost_net_ioctl(struct file *f, unsigned int ioctl,
>>>>>     		if (features & ~VHOST_NET_FEATURES)
>>>>>     			return -EOPNOTSUPP;
>>>>>     		return vhost_net_set_features(n, features);
>>>>> +	case VHOST_GET_FEATURES_ARRAY:
>>>>> +	{
>>>>> +		if (copy_from_user(&count, argp, sizeof(u64)))
>>>>> +			return -EFAULT;
>>>>> +
>>>>> +		/* Copy the net features, up to the user-provided buffer size */
>>>>> +		all_features = VHOST_NET_ALL_FEATURES;
>>>>> +		for (i = 0; i < min(VIRTIO_FEATURES_WORDS / 2, count); ++i) {
>>>>
>>>> I think you need to use: array_index_nospec()
>>>
>>> Do you mean like:
>>> 			i = array_index_nospec(i, min(VIRTIO_FEATURES_WORDS / 2, count));
>>>
>>> ?
>>>
>>> Note that even if the cpu would speculative execute the loop for too
>>> high 'i' values, it will could only read `all_features`, which
>>> user-space can access freely.
>>
>> I was wrong; I forgot you used a 128-bit integer instead of an array.
>>
>>>
>>>>> diff --git a/include/uapi/linux/vhost_types.h b/include/uapi/linux/vhost_types.h
>>>>> index d7656908f730..3f227114c557 100644
>>>>> --- a/include/uapi/linux/vhost_types.h
>>>>> +++ b/include/uapi/linux/vhost_types.h
>>>>> @@ -110,6 +110,11 @@ struct vhost_msg_v2 {
>>>>>     	};
>>>>>     };
>>>>>     
>>>>> +struct vhost_features_array {
>>>>> +	__u64 count; /* number of entries present in features array */
>>>>> +	__u64 features[];
>>>>
>>>>
>>>> An alternative idea:
>>>>
>>>> #define VHOST_GET_FEATURES_ARRAY(len) _IOC(_IOC_READ, VHOST_VIRTIO,
>>>>                                               0x00, (len))
>>>>
>>>> By doing so, the kernel can have share the code for
>>>> VHOST_GET_FEATURES_ARRAY() with VHOST_GET_FEATURES() since
>>>> VHOST_GET_FEATURES() will be just a specialized definition.
>>>>
>>>> It also makes the life of the userspace a bit easier by not making it
>>>> construct struct vhost_features_array.
>>>>
>>>> Looking at include/uapi, it seems there are examples of both your
>>>> pattern and my alternative, so please pick what you prefer.
>>>
>>> I'm ok either way, but I don't see big win code-wise. The user-space
>>> side saving will be literally a one liner. In the kernel the get/set
>>> sockopt could be consolidated, but there will be a slightly increase in
>>> complexity, to extract the ioctl len from the ioctl op value itself.
>>
>> The current patch also requires copy_from_user() to get the count, so I
>> don't think they are different in that sense.
>>
>> The difference will be marginal anyway, and it may turn out encoding the
>> length in the ioctl number requires a bit more code.
> 
> I'm sorry, almost mid-air collision. I just send out the rfc v3, and I
> read your reply here only afterwards.
> 
> I stuck to separate ioctls operations; as an additional reason for that,
> I understand there is interest in extending the features space even
> more, and let user-space/kernel with different features space limits
> easily interact.
> 
> I think that with a single ioctl either the kernel or the user-space
> should be update to handle explicitly every additional features space
> expansion, while the API proposed here no additional changes should be
> required.

It is not a problem with the VHOST_GET_FEATURES_ARRAY() macro I 
suggested. It takes the size of array as a parameter, enabling it to 
grow without updating the ioctl definition.

Regards,
Akihiko Odaki

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2025-06-08  5:28 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-30 14:49 [RFC PATCH v2 0/8] virtio: introduce GSO over UDP tunnel Paolo Abeni
2025-05-30 14:49 ` [RFC PATCH v2 1/8] virtio: introduce virtio_features_t Paolo Abeni
2025-05-30 17:40   ` kernel test robot
2025-05-31  5:37   ` Akihiko Odaki
2025-05-30 14:49 ` [RFC PATCH v2 2/8] virtio_pci_modern: allow configuring extended features Paolo Abeni
2025-05-30 14:49 ` [RFC PATCH v2 3/8] vhost-net: " Paolo Abeni
2025-05-31  6:15   ` Akihiko Odaki
2025-06-03 13:32     ` Paolo Abeni
2025-06-06  9:57       ` Akihiko Odaki
2025-06-06 11:52         ` Paolo Abeni
2025-06-08  5:28           ` Akihiko Odaki
2025-05-30 14:49 ` [RFC PATCH v2 4/8] virtio_net: add supports for extended offloads Paolo Abeni
2025-05-31  6:18   ` Akihiko Odaki
2025-05-30 14:49 ` [RFC PATCH v2 5/8] net: implement virtio helpers to handle UDP GSO tunneling Paolo Abeni
2025-05-30 14:49 ` [RFC PATCH v2 6/8] virtio_net: enable gso over UDP tunnel support Paolo Abeni
2025-05-30 14:49 ` [RFC PATCH v2 7/8] tun: " Paolo Abeni
2025-05-31  6:38   ` Akihiko Odaki
2025-05-30 14:49 ` [RFC PATCH v2 8/8] vhost/net: " Paolo Abeni

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.