Netdev List
* [PATCH net] tun/tap & vhost-net: make qdisc backpressure opt-in via IFF_BACKPRESSURE
@ 2026-07-04 11:20 Simon Schippers
  2026-07-04 11:58 ` Brett Sheffield
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Simon Schippers @ 2026-07-04 11:20 UTC (permalink / raw)
  To: Willem de Bruijn, Jason Wang, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Michael S . Tsirkin, netdev
  Cc: Simon Horman, Jonathan Corbet, Shuah Khan, Andrew Lunn,
	Tim Gebauer, Brett Sheffield, linux-doc, linux-kernel,
	Simon Schippers

Commit 1d6e569b7d0c ("tun/tap & vhost-net: avoid ptr_ring tail-drop
when a qdisc is present") did not show a relevant performance regression
in my testing but on Brett Sheffield's librecast testbed it shows a
significant performance drop. The regression can be pinpointed when
multiple iperf3 TCP threads are sending. For 8 threads the performance
dropped from 13.5 Gbit/s to 9.13 Gbit/s. This is the reason why this
patch makes the qdisc backpressure behavior opt-in.

One option to accomplish the opt-in would be to set the default qdisc to
noqueue at init. However this may also break userspace as users might
have chosen a custom qdisc even though most of the qdiscs did nothing
for tun/tap in the past due to missing backpressure...

For this reason, this patch instead introduces the flag IFF_BACKPRESSURE,
which must be set to enable the backpressure logic. When the flag is not
set, the stopping logic in tun_net_xmit() and the waking logic in
__tun_wake_queue() are skipped.
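
As a rough illustration of the gating described above, here is a toy
user-space model. It is only a sketch: struct toy_queue, toy_xmit() and
toy_consume() are hypothetical names that do not exist in the kernel, the
ring is a plain array rather than a ptr_ring, and the wake-up is simplified
to happen on every consume instead of following the driver's real
bookkeeping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy user-space model; none of these names exist in the kernel. */
#define RING_SIZE 4

struct toy_queue {
	void *ring[RING_SIZE];
	size_t head, tail, len;
	bool backpressure;	/* stands in for tun->flags & IFF_BACKPRESSURE */
	bool has_qdisc;		/* stands in for !qdisc_txq_has_no_queue() */
	bool stopped;		/* stands in for the stopped TX queue state */
	unsigned long drops;
};

/* Producer side, loosely modelled on tun_net_xmit(). Returns false on drop. */
static bool toy_xmit(struct toy_queue *q, void *pkt)
{
	assert(q != NULL);

	if (q->len == RING_SIZE) {
		q->drops++;	/* tail-drop: the ring was already full */
		return false;
	}
	q->ring[q->head] = pkt;
	q->head = (q->head + 1) % RING_SIZE;
	q->len++;

	/* Stop only if opted in, a qdisc is attached, and the ring is now full. */
	if (q->backpressure && q->has_qdisc && q->len == RING_SIZE)
		q->stopped = true;
	return true;
}

/* Consumer side, loosely modelled on the read path and __tun_wake_queue(). */
static void *toy_consume(struct toy_queue *q)
{
	void *pkt;

	assert(q != NULL);

	if (q->len == 0)
		return NULL;
	pkt = q->ring[q->tail];
	q->tail = (q->tail + 1) % RING_SIZE;
	q->len--;

	/* The wake-up is skipped entirely when the flag is not set. */
	if (q->backpressure && q->stopped)
		q->stopped = false;
	return pkt;
}
```

With backpressure set, filling the ring marks the queue as stopped, so in
the real driver the qdisc would hold further packets. Without it, the next
toy_xmit() on a full ring is counted as a drop.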

In tun_set_iff(), netif_tx_wake_all_queues() is replaced with a loop over
all tfiles that wakes each netdev queue and resets cons_cnt while both
consumer_lock and producer_lock are held. This ensures that
tun_net_xmit() cannot stop the queue concurrently, avoiding a possible
stall.
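
The wake-and-reset step can be pictured with the following user-space
sketch. It uses pthread mutexes in place of the kernel spinlocks, and
struct toy_ring and toy_wake_and_reset() are hypothetical names used for
illustration only. The lock order (consumer first, then producer) simply
mirrors the order used in the patch.

```c
#include <pthread.h>
#include <stdbool.h>

/* Toy stand-in for the per-queue state touched in tun_set_iff(). */
struct toy_ring {
	pthread_mutex_t consumer_lock;
	pthread_mutex_t producer_lock;
	bool stopped;	/* stands in for the stopped netdev queue */
	int cons_cnt;	/* stands in for tfile->cons_cnt */
};

/*
 * Wake the queue and reset the consumer counter with both locks held.
 * A producer that needs producer_lock to stop the queue cannot run in
 * between, so the queue cannot be stopped again while the reset is
 * in progress.
 */
static void toy_wake_and_reset(struct toy_ring *r)
{
	pthread_mutex_lock(&r->consumer_lock);
	pthread_mutex_lock(&r->producer_lock);
	r->stopped = false;
	r->cons_cnt = 0;
	pthread_mutex_unlock(&r->producer_lock);
	pthread_mutex_unlock(&r->consumer_lock);
}
```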

The documentation in tuntap.rst is updated accordingly.
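
From userspace, opting in would presumably look like the sketch below. It
assumes the flag value 0x0080 proposed in this patch, which is not present
in released headers (hence the #ifndef). tap_flags(), tun_supports() and
tap_open() are hypothetical helper names. Probing with TUNGETFEATURES
relies on the flag being part of TUN_FEATURES, as in this patch.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <linux/if.h>
#include <linux/if_tun.h>

#ifndef IFF_BACKPRESSURE
#define IFF_BACKPRESSURE 0x0080	/* value proposed by this patch */
#endif

/* Hypothetical helper: compose the TUNSETIFF flags for a TAP device. */
static short tap_flags(int want_backpressure)
{
	short flags = IFF_TAP | IFF_NO_PI;

	if (want_backpressure)
		flags |= IFF_BACKPRESSURE;
	return flags;
}

/* Hypothetical helper: ask the kernel whether it advertises a flag. */
static int tun_supports(int fd, unsigned int flag)
{
	unsigned int features = 0;

	if (ioctl(fd, TUNGETFEATURES, &features) < 0)
		return 0;
	return (features & flag) != 0;
}

/* Hypothetical helper: returns an open fd, or -1 on error. */
static int tap_open(const char *name, int want_backpressure)
{
	struct ifreq ifr;
	int fd = open("/dev/net/tun", O_RDWR);

	if (fd < 0)
		return -1;

	/* Only request the flag if the running kernel advertises it. */
	if (want_backpressure && !tun_supports(fd, IFF_BACKPRESSURE))
		want_backpressure = 0;

	memset(&ifr, 0, sizeof(ifr));
	ifr.ifr_flags = tap_flags(want_backpressure);
	strncpy(ifr.ifr_name, name, IFNAMSIZ - 1);

	if (ioctl(fd, TUNSETIFF, &ifr) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

A caller would use something like tap_open("tap0", 1) and then attach a
qdisc to the device, since the flag has no effect with noqueue.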

Fixes: 1d6e569b7d0c ("tun/tap & vhost-net: avoid ptr_ring tail-drop when a qdisc is present")
Reported-by: Brett Sheffield <brett@librecast.net>
Closes: https://lore.kernel.org/netdev/akVnoOYQOrt8k-Gu@karahi.librecast.net/T/#u
Signed-off-by: Simon Schippers <simon.schippers@tu-dortmund.de>
---
 Documentation/networking/tuntap.rst | 17 +++++++++++++++++
 drivers/net/tun.c                   | 29 +++++++++++++++++++++++------
 include/uapi/linux/if_tun.h         |  1 +
 tools/include/uapi/linux/if_tun.h   |  1 +
 4 files changed, 42 insertions(+), 6 deletions(-)

diff --git a/Documentation/networking/tuntap.rst b/Documentation/networking/tuntap.rst
index 4d7087f727be..599264825dd2 100644
--- a/Documentation/networking/tuntap.rst
+++ b/Documentation/networking/tuntap.rst
@@ -206,6 +206,23 @@ enable is true we enable it, otherwise we disable it::
       return ioctl(fd, TUNSETQUEUE, (void *)&ifr);
   }
 
+3.4 qdisc backpressure
+----------------------
+
+Starting with Linux 7.2, IFF_BACKPRESSURE can be set to enable qdisc
+backpressure. Without it, TX drops occur when the internal ring buffer is
+full. With it, the kernel stops the TX queue instead, letting the qdisc
+hold packets. Drops only occur as a rare race. This can benefit protocols
+like TCP that react to drops. Backpressure requires a qdisc to be
+attached and has no effect with noqueue.
+
+The TUN/TAP ring buffer size can be reduced alongside this flag to
+further shift buffering into the qdisc and reduce bufferbloat, but comes
+at possible performance cost.
+
+When running multiple network streams in parallel, the flag may reduce
+performance due to the extra overhead of the backpressure mechanism.
+
 Universal TUN/TAP device driver Frequently Asked Question
 =========================================================
 
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index ffbe6f13fb1f..3bf8a73a0816 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -98,7 +98,8 @@ static void tun_default_link_ksettings(struct net_device *dev,
 #define TUN_FASYNC	IFF_ATTACH_QUEUE
 
 #define TUN_FEATURES (IFF_NO_PI | IFF_ONE_QUEUE | IFF_VNET_HDR | \
-		      IFF_MULTI_QUEUE | IFF_NAPI | IFF_NAPI_FRAGS)
+		      IFF_MULTI_QUEUE | IFF_NAPI | IFF_NAPI_FRAGS | \
+		      IFF_BACKPRESSURE)
 
 #define GOODCOPY_LEN 128
 
@@ -1077,7 +1078,8 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	spin_lock(&tfile->tx_ring.producer_lock);
 	ret = __ptr_ring_produce(&tfile->tx_ring, skb);
-	if (!qdisc_txq_has_no_queue(queue) &&
+	if ((tun->flags & IFF_BACKPRESSURE) &&
+	    !qdisc_txq_has_no_queue(queue) &&
 	    __ptr_ring_check_produce(&tfile->tx_ring) == -ENOSPC) {
 		netif_tx_stop_queue(queue);
 		/* Paired with smp_mb() in __tun_wake_queue() */
@@ -2151,8 +2153,12 @@ static ssize_t tun_put_user(struct tun_struct *tun,
 static void __tun_wake_queue(struct tun_struct *tun,
 			     struct tun_file *tfile, int consumed)
 {
-	struct netdev_queue *txq = netdev_get_tx_queue(tun->dev,
-						tfile->queue_index);
+	struct netdev_queue *txq;
+
+	if (!(tun->flags & IFF_BACKPRESSURE))
+		return;
+
+	txq = netdev_get_tx_queue(tun->dev, tfile->queue_index);
 
 	/* Paired with smp_mb__after_atomic() in tun_net_xmit() */
 	smp_mb();
@@ -2893,8 +2899,19 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
 	/* Make sure persistent devices do not get stuck in
 	 * xoff state.
 	 */
-	if (netif_running(tun->dev))
-		netif_tx_wake_all_queues(tun->dev);
+	if (netif_running(tun->dev)) {
+		for (int i = 0; i < tun->numqueues; i++) {
+			struct tun_file *i_tfile;
+
+			i_tfile = rtnl_dereference(tun->tfiles[i]);
+			spin_lock_bh(&i_tfile->tx_ring.consumer_lock);
+			spin_lock(&i_tfile->tx_ring.producer_lock);
+			netif_wake_subqueue(tun->dev, i_tfile->queue_index);
+			i_tfile->cons_cnt = 0;
+			spin_unlock(&i_tfile->tx_ring.producer_lock);
+			spin_unlock_bh(&i_tfile->tx_ring.consumer_lock);
+		}
+	}
 
 	strscpy(ifr->ifr_name, tun->dev->name);
 	return 0;
diff --git a/include/uapi/linux/if_tun.h b/include/uapi/linux/if_tun.h
index 79d53c7a1ebd..73a77141315c 100644
--- a/include/uapi/linux/if_tun.h
+++ b/include/uapi/linux/if_tun.h
@@ -69,6 +69,7 @@
 #define IFF_NAPI_FRAGS	0x0020
 /* Used in TUNSETIFF to bring up tun/tap without carrier */
 #define IFF_NO_CARRIER	0x0040
+#define IFF_BACKPRESSURE	0x0080
 #define IFF_NO_PI	0x1000
 /* This flag has no real effect */
 #define IFF_ONE_QUEUE	0x2000
diff --git a/tools/include/uapi/linux/if_tun.h b/tools/include/uapi/linux/if_tun.h
index 2ec07de1d73b..97b670f5bc0a 100644
--- a/tools/include/uapi/linux/if_tun.h
+++ b/tools/include/uapi/linux/if_tun.h
@@ -67,6 +67,7 @@
 #define IFF_TAP		0x0002
 #define IFF_NAPI	0x0010
 #define IFF_NAPI_FRAGS	0x0020
+#define IFF_BACKPRESSURE	0x0080
 #define IFF_NO_PI	0x1000
 /* This flag has no real effect */
 #define IFF_ONE_QUEUE	0x2000
-- 
2.43.0



* Re: [PATCH net] tun/tap & vhost-net: make qdisc backpressure opt-in via IFF_BACKPRESSURE
  2026-07-04 11:20 [PATCH net] tun/tap & vhost-net: make qdisc backpressure opt-in via IFF_BACKPRESSURE Simon Schippers
@ 2026-07-04 11:58 ` Brett Sheffield
  2026-07-04 12:28 ` Michael S. Tsirkin
  2026-07-04 12:52 ` Michael S. Tsirkin
  2 siblings, 0 replies; 4+ messages in thread
From: Brett Sheffield @ 2026-07-04 11:58 UTC (permalink / raw)
  To: Simon Schippers
  Cc: Willem de Bruijn, Jason Wang, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Michael S . Tsirkin, netdev,
	Simon Horman, Jonathan Corbet, Shuah Khan, Andrew Lunn,
	Tim Gebauer, linux-doc, linux-kernel

On 2026-07-04 13:20, Simon Schippers wrote:
> Commit 1d6e569b7d0c ("tun/tap & vhost-net: avoid ptr_ring tail-drop
> when a qdisc is present") did not show a relevant performance regression
> in my testing but on Brett Sheffield's librecast testbed it shows a
> significant performance drop. The regression can be pinpointed when
> multiple iperf3 TCP threads are sending. For 8 threads the performance
> dropped from 13.5 Gbit/s to 9.13 Gbit/s. This is the reason why this
> patch makes the qdisc backpressure behavior opt-in.

Thanks Simon.

I've tested and confirmed this fixes the original failing IPv6 multicast (UDP)
test which flagged the problem.

Tested-by: Brett A C Sheffield <bacs@librecast.net>

Cheers,


Brett
-- 
Brett Sheffield (he/him)
Librecast - Decentralising the Internet with Multicast
https://librecast.net/
https://blog.brettsheffield.com/


* Re: [PATCH net] tun/tap & vhost-net: make qdisc backpressure opt-in via IFF_BACKPRESSURE
  2026-07-04 11:20 [PATCH net] tun/tap & vhost-net: make qdisc backpressure opt-in via IFF_BACKPRESSURE Simon Schippers
  2026-07-04 11:58 ` Brett Sheffield
@ 2026-07-04 12:28 ` Michael S. Tsirkin
  2026-07-04 12:52 ` Michael S. Tsirkin
  2 siblings, 0 replies; 4+ messages in thread
From: Michael S. Tsirkin @ 2026-07-04 12:28 UTC (permalink / raw)
  To: Simon Schippers
  Cc: Willem de Bruijn, Jason Wang, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, netdev, Simon Horman,
	Jonathan Corbet, Shuah Khan, Andrew Lunn, Tim Gebauer,
	Brett Sheffield, linux-doc, linux-kernel

On Sat, Jul 04, 2026 at 01:20:58PM +0200, Simon Schippers wrote:
> Commit 1d6e569b7d0c ("tun/tap & vhost-net: avoid ptr_ring tail-drop
> when a qdisc is present") did not show a relevant performance regression
> in my testing but on Brett Sheffield's librecast testbed it shows a
> significant performance drop. The regression can be pinpointed when
> multiple iperf3 TCP threads are sending. For 8 threads the performance
> dropped from 13.5 Gbit/s to 9.13 Gbit/s. This is the reason why this
> patch makes the qdisc backpressure behavior opt-in.
> 
> One option to accomplish the opt-in would be to set the default qdisc to
> noqueue at init. However this may also break userspace as users might
> have chosen a custom qdisc even though most of the qdiscs did nothing
> for tun/tap in the past due to missing backpressure...
> 
> This is the reason why in this patch, the flag IFF_BACKPRESSURE is
> introduced instead which is required to enable the backpressure logic.
> This means the stopping logic in tun_net_xmit() and the waking logic in
> __tun_wake_queue() are skipped if the flag is disabled.
> 
> In tun_set_iff(), netif_tx_wake_all_queues() is replaced with looping
> over all tfiles in which the netdev queues are woken and cons_cnt is
> reset while the consumer_lock and producer_lock are held. This is to
> ensure that tun_net_xmit() can not stop the queue concurrently, avoiding
> a possible stall.
> 
> The documentation in tuntap.rst is updated accordingly.
> 
> Fixes: 1d6e569b7d0c ("tun/tap & vhost-net: avoid ptr_ring tail-drop when a qdisc is present")
> Reported-by: Brett Sheffield <brett@librecast.net>
> Closes: https://lore.kernel.org/netdev/akVnoOYQOrt8k-Gu@karahi.librecast.net/T/#u
> Signed-off-by: Simon Schippers <simon.schippers@tu-dortmund.de>

I don't object to this approach. At the same time - a new UAPI outside
the merge window? Is this acceptable to net maintainers?





* Re: [PATCH net] tun/tap & vhost-net: make qdisc backpressure opt-in via IFF_BACKPRESSURE
  2026-07-04 11:20 [PATCH net] tun/tap & vhost-net: make qdisc backpressure opt-in via IFF_BACKPRESSURE Simon Schippers
  2026-07-04 11:58 ` Brett Sheffield
  2026-07-04 12:28 ` Michael S. Tsirkin
@ 2026-07-04 12:52 ` Michael S. Tsirkin
  2 siblings, 0 replies; 4+ messages in thread
From: Michael S. Tsirkin @ 2026-07-04 12:52 UTC (permalink / raw)
  To: Simon Schippers
  Cc: Willem de Bruijn, Jason Wang, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, netdev, Simon Horman,
	Jonathan Corbet, Shuah Khan, Andrew Lunn, Tim Gebauer,
	Brett Sheffield, linux-doc, linux-kernel

On Sat, Jul 04, 2026 at 01:20:58PM +0200, Simon Schippers wrote:
> Commit 1d6e569b7d0c ("tun/tap & vhost-net: avoid ptr_ring tail-drop
> when a qdisc is present") did not show a relevant performance regression
> in my testing but on Brett Sheffield's librecast testbed it shows a
> significant performance drop. The regression can be pinpointed when
> multiple iperf3 TCP threads are sending. For 8 threads the performance
> dropped from 13.5 Gbit/s to 9.13 Gbit/s. This is the reason why this
> patch makes the qdisc backpressure behavior opt-in.
> 
> One option to accomplish the opt-in would be to set the default qdisc to
> noqueue at init. However this may also break userspace as users might
> have chosen a custom qdisc even though most of the qdiscs did nothing
> for tun/tap in the past due to missing backpressure...
> 
> This is the reason why in this patch, the flag IFF_BACKPRESSURE is
> introduced instead which is required to enable the backpressure logic.
> This means the stopping logic in tun_net_xmit() and the waking logic in
> __tun_wake_queue() are skipped if the flag is disabled.
> 
> In tun_set_iff(), netif_tx_wake_all_queues() is replaced with looping
> over all tfiles in which the netdev queues are woken and cons_cnt is
> reset while the consumer_lock and producer_lock are held. This is to
> ensure that tun_net_xmit() can not stop the queue concurrently, avoiding
> a possible stall.
> 
> The documentation in tuntap.rst is updated accordingly.
> 
> Fixes: 1d6e569b7d0c ("tun/tap & vhost-net: avoid ptr_ring tail-drop when a qdisc is present")
> Reported-by: Brett Sheffield <brett@librecast.net>
> Closes: https://lore.kernel.org/netdev/akVnoOYQOrt8k-Gu@karahi.librecast.net/T/#u
> Signed-off-by: Simon Schippers <simon.schippers@tu-dortmund.de>

the patch itself makes sense

Acked-by: Michael S. Tsirkin <mst@redhat.com>

The issue is it would ideally be in next, but we need it now
to fix the regression introduced by 1d6e569b7d0c.



