* [PATCH net v2] TUN/TAP: Improving throughput and latency by avoiding SKB drops
@ 2025-08-11 22:03 Simon Schippers
From: Simon Schippers @ 2025-08-11 22:03 UTC (permalink / raw)
To: willemdebruijn.kernel, jasowang, netdev, linux-kernel
Cc: Simon Schippers, Tim Gebauer
This patch is the result of our paper titled "The NODROP Patch:
Hardening Secure Networking for Real-time Teleoperation by Preventing
Packet Drops in the Linux TUN Driver" [1].
It addresses the tun_net_xmit function, which drops SKBs with the reason
SKB_DROP_REASON_FULL_RING whenever the tx_ring (the TUN queue) is full,
resulting in reduced TCP performance and packet loss for bursty video
streams when used over VPNs.
The abstract reads as follows:
"Throughput-critical teleoperation requires robust and low-latency
communication to ensure safety and performance. Often, these kinds of
applications are implemented in Linux-based operating systems and transmit
over virtual private networks, which ensure encryption and ease of use by
providing a dedicated tunneling interface (TUN) to user space
applications. In this work, we identified a specific behavior in the Linux
TUN driver, which results in significant performance degradation due to
the sender stack silently dropping packets. This design issue drastically
impacts real-time video streaming, inducing up to 29 % packet loss with
noticeable video artifacts when the internal queue of the TUN driver is
reduced to 25 packets to minimize latency. Furthermore, a small queue
length also drastically reduces the throughput of TCP traffic due to many
retransmissions. Instead, with our open-source NODROP Patch, we propose
generating backpressure in case of burst traffic or network congestion.
The patch effectively addresses the packet-dropping behavior, hardening
real-time video streaming and improving TCP throughput by 36 % in high
latency scenarios."
In addition to the mentioned performance and latency improvements for VPN
applications, this patch also allows the proper use of qdiscs. For
example, fq_codel cannot control the queuing delay when packets are
already dropped in the TUN driver. This issue is also described in [2].
The performance evaluation of the paper (see Fig. 4) showed a 4%
performance hit for a single-queue TUN with the default TUN queue size of
500 packets. However, it is important to note that with the proposed
patch no packet drop ever occurred, even with a TUN queue size of 1
packet. The validation pipeline used is available at [3].
As reducing the TUN queue size down to 5 packets showed no further
performance hit in the paper, reducing the default TUN queue size might
be desirable alongside this patch. A reduction would obviously reduce
bufferbloat and memory requirements.
Implementation details:
- The netdev queue start/stop flow control is utilized.
- Compatible with multi-queue by only stopping/waking the specific
netdevice subqueue.
- No additional locking is used.
In the tun_net_xmit function:
- The subqueue is stopped when the tx_ring becomes full after inserting
the SKB into the tx_ring.
- In the unlikely case that the insertion with ptr_ring_produce fails,
the old dropping behavior is used for this SKB.
In the tun_ring_recv function:
- The subqueue is woken after consuming an SKB from the tx_ring when the
tx_ring is empty. Waking the subqueue whenever the tx_ring has any
available space (i.e., whenever it is not full) caused crashes in our
testing. We are open to suggestions.
- When the tx_ring is configured to be small (for example, to hold 1
SKB), queuing might be stopped in tun_net_xmit while, at the same time,
ptr_ring_consume is unable to grab an SKB. This prevents tun_net_xmit
from being called again and causes tun_ring_recv to wait indefinitely
for an SKB in the blocking wait queue. Therefore, the netdev queue is
woken in the wait queue if it has been stopped.
- Because the tun_struct is required to get the tx_queue into the new
txq pointer, the tun_struct is passed to tun_do_read as well. This is
likely faster than obtaining it via the tun_file tfile, because that
requires an RCU read lock.
We are open to suggestions regarding the implementation :)
Thank you for your work!
[1] Link:
https://cni.etit.tu-dortmund.de/storages/cni-etit/r/Research/Publications/2025/Gebauer_2025_VTCFall/Gebauer_VTCFall2025_AuthorsVersion.pdf
[2] Link:
https://unix.stackexchange.com/questions/762935/traffic-shaping-ineffective-on-tun-device
[3] Link: https://github.com/tudo-cni/nodrop
Co-developed-by: Tim Gebauer <tim.gebauer@tu-dortmund.de>
Signed-off-by: Tim Gebauer <tim.gebauer@tu-dortmund.de>
Signed-off-by: Simon Schippers <simon.schippers@tu-dortmund.de>
---
V1 -> V2: Removed NETDEV_TX_BUSY return case in tun_net_xmit and removed
unnecessary netif_tx_wake_queue in tun_ring_recv.
drivers/net/tun.c | 21 +++++++++++++++++----
1 file changed, 17 insertions(+), 4 deletions(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index cc6c50180663..81abdd3f9aca 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1060,13 +1060,16 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
nf_reset_ct(skb);
- if (ptr_ring_produce(&tfile->tx_ring, skb)) {
+ queue = netdev_get_tx_queue(dev, txq);
+ if (unlikely(ptr_ring_produce(&tfile->tx_ring, skb))) {
+ netif_tx_stop_queue(queue);
drop_reason = SKB_DROP_REASON_FULL_RING;
goto drop;
}
+ if (ptr_ring_full(&tfile->tx_ring))
+ netif_tx_stop_queue(queue);
/* dev->lltx requires to do our own update of trans_start */
- queue = netdev_get_tx_queue(dev, txq);
txq_trans_cond_update(queue);
/* Notify and wake up reader process */
@@ -2110,9 +2113,10 @@ static ssize_t tun_put_user(struct tun_struct *tun,
return total;
}
-static void *tun_ring_recv(struct tun_file *tfile, int noblock, int *err)
+static void *tun_ring_recv(struct tun_struct *tun, struct tun_file *tfile, int noblock, int *err)
{
DECLARE_WAITQUEUE(wait, current);
+ struct netdev_queue *txq;
void *ptr = NULL;
int error = 0;
@@ -2124,6 +2128,7 @@ static void *tun_ring_recv(struct tun_file *tfile, int noblock, int *err)
goto out;
}
+ txq = netdev_get_tx_queue(tun->dev, tfile->queue_index);
add_wait_queue(&tfile->socket.wq.wait, &wait);
while (1) {
@@ -2131,6 +2136,10 @@ static void *tun_ring_recv(struct tun_file *tfile, int noblock, int *err)
ptr = ptr_ring_consume(&tfile->tx_ring);
if (ptr)
break;
+
+ if (unlikely(netif_tx_queue_stopped(txq)))
+ netif_tx_wake_queue(txq);
+
if (signal_pending(current)) {
error = -ERESTARTSYS;
break;
@@ -2147,6 +2156,10 @@ static void *tun_ring_recv(struct tun_file *tfile, int noblock, int *err)
remove_wait_queue(&tfile->socket.wq.wait, &wait);
out:
+ if (ptr_ring_empty(&tfile->tx_ring)) {
+ txq = netdev_get_tx_queue(tun->dev, tfile->queue_index);
+ netif_tx_wake_queue(txq);
+ }
*err = error;
return ptr;
}
@@ -2165,7 +2178,7 @@ static ssize_t tun_do_read(struct tun_struct *tun, struct tun_file *tfile,
if (!ptr) {
/* Read frames from ring */
- ptr = tun_ring_recv(tfile, noblock, &err);
+ ptr = tun_ring_recv(tun, tfile, noblock, &err);
if (!ptr)
return err;
}
--
2.43.0
* Re: [PATCH net v2] TUN/TAP: Improving throughput and latency by avoiding SKB drops
From: Jason Wang @ 2025-08-12 3:10 UTC (permalink / raw)
To: Simon Schippers; +Cc: willemdebruijn.kernel, netdev, linux-kernel, Tim Gebauer
On Tue, Aug 12, 2025 at 6:04 AM Simon Schippers
<simon.schippers@tu-dortmund.de> wrote:
>
> This patch is the result of our paper with the title "The NODROP Patch:
> Hardening Secure Networking for Real-time Teleoperation by Preventing
> Packet Drops in the Linux TUN Driver" [1].
[...]
I would like to see some benchmark results. Not only VPN but also a
classical VM setup that is using vhost-net + TAP.
[...]
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index cc6c50180663..81abdd3f9aca 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -1060,13 +1060,16 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
>
> nf_reset_ct(skb);
>
> - if (ptr_ring_produce(&tfile->tx_ring, skb)) {
> + queue = netdev_get_tx_queue(dev, txq);
> + if (unlikely(ptr_ring_produce(&tfile->tx_ring, skb))) {
> + netif_tx_stop_queue(queue);
> drop_reason = SKB_DROP_REASON_FULL_RING;
This would still drop the packet. Should we detect if the ring is
about to be full and stop early, like virtio-net does?
Thanks
* Re: [PATCH net v2] TUN/TAP: Improving throughput and latency by avoiding SKB drops
From: Stephen Hemminger @ 2025-08-13 15:01 UTC (permalink / raw)
To: Simon Schippers
Cc: willemdebruijn.kernel, jasowang, netdev, linux-kernel,
Tim Gebauer
On Tue, 12 Aug 2025 00:03:48 +0200
Simon Schippers <simon.schippers@tu-dortmund.de> wrote:
> This patch is the result of our paper with the title "The NODROP Patch:
> Hardening Secure Networking for Real-time Teleoperation by Preventing
> Packet Drops in the Linux TUN Driver" [1].
[...]
I wonder if it would be possible to implement BQL in TUN/TAP?
https://lwn.net/Articles/454390/
BQL provides a feedback mechanism to the application when the queue fills.
* [PATCH net v2] TUN/TAP: Improving throughput and latency by avoiding SKB drops
From: Simon Schippers @ 2025-08-13 18:27 UTC (permalink / raw)
To: Jason Wang; +Cc: willemdebruijn.kernel, netdev, linux-kernel, Tim Gebauer
Jason Wang wrote:
> On Tue, Aug 12, 2025 at 6:04 AM Simon Schippers
> <simon.schippers@tu-dortmund.de> wrote:
>>
>> This patch is the result of our paper with the title "The NODROP Patch:
>> Hardening Secure Networking for Real-time Teleoperation by Preventing
>> Packet Drops in the Linux TUN Driver" [1].
[...]
>
> I would like to see some benchmark results. Not only VPN but also a
> classical VM setup that is using vhost-net + TAP.
>
I completely overlooked that tap.c also has a tap_do_read function.
I apologize for that and will implement the same behavior from
tun_ring_recv there as well.
The implementation is already done and has proven to work, but I will
test it a bit more before submitting a v3.
Regarding your proposed vhost-net + TAP setup: I need more time to
implement such a setup. However, I am wondering what kind of tests you
would like to see exactly. TCP connections to a remote host, as in our
paper?
[...]
>> @@ -1060,13 +1060,16 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
>>
>> nf_reset_ct(skb);
>>
>> - if (ptr_ring_produce(&tfile->tx_ring, skb)) {
>> + queue = netdev_get_tx_queue(dev, txq);
>> + if (unlikely(ptr_ring_produce(&tfile->tx_ring, skb))) {
>> + netif_tx_stop_queue(queue);
>> drop_reason = SKB_DROP_REASON_FULL_RING;
>
> This would still drop the packet. Should we detect if the ring is
> about to be full and stop then like a virtio-net?
>
> Thanks
>
I am a bit confused. You omitted the important part of the code, which
comes right after that: there I stop the netdev queue when the tx_ring
becomes full. Therefore tun_net_xmit is not called (except in a very
unlikely race) when there is no space for another SKB, and no SKBs are
dropped. It is only called again after the netdev queue is woken in
tun_ring_recv.
virtio-net basically does the same in its tx_may_stop function: it is
called after inserting an SKB and checks whether there is enough space
for another maximum-size SKB, which is equivalent to its send queue
being full.
Correct me if I am wrong!
Thank you!
* [PATCH net v2] TUN/TAP: Improving throughput and latency by avoiding SKB drops
From: Simon Schippers @ 2025-08-13 18:33 UTC (permalink / raw)
To: Stephen Hemminger
Cc: willemdebruijn.kernel, jasowang, netdev, linux-kernel,
Tim Gebauer
Stephen Hemminger wrote:
> On Tue, 12 Aug 2025 00:03:48 +0200
> Simon Schippers <simon.schippers@tu-dortmund.de> wrote:
>
>> This patch is the result of our paper with the title "The NODROP Patch:
>> Hardening Secure Networking for Real-time Teleoperation by Preventing
>> Packet Drops in the Linux TUN Driver" [1].
[...]
>
> I wonder if it would be possible to implement BQL in TUN/TAP?
>
> https://lwn.net/Articles/454390/
>
> BQL provides a feedback mechanism to application when queue fills.
Thank you very much for your reply,
I also thought about BQL before and like the idea!
However, I see the following challenges in the implementation:
- netdev_tx_sent_queue is no problem; it would simply be called in the
tun_net_xmit function.
- netdev_tx_completed_queue is challenging because there is no completion
routine as in a "normal" network driver. tun_ring_recv reads one SKB at
a time, so I am not sure when, and with what parameters, to call the
function.
- What should be done with the existing TUN queue packet limit (500
packets by default)? Use it as an upper limit?
Important note: The information included in this e-mail is confidential. It is solely intended for the recipient. If you are not the intended recipient of this e-mail please contact the sender and delete this message. Thank you. Without prejudice of e-mail correspondence, our statements are only legally binding when they are made in the conventional written form (with personal signature) or when such documents are sent by fax.
* Re: [PATCH net v2] TUN/TAP: Improving throughput and latency by avoiding SKB drops
2025-08-13 18:33 ` Simon Schippers
@ 2025-08-14 3:45 ` Jason Wang
2025-08-14 15:08 ` Willem de Bruijn
1 sibling, 0 replies; 10+ messages in thread
From: Jason Wang @ 2025-08-14 3:45 UTC (permalink / raw)
To: Simon Schippers
Cc: Stephen Hemminger, willemdebruijn.kernel, netdev, linux-kernel,
Tim Gebauer
On Thu, Aug 14, 2025 at 2:34 AM Simon Schippers
<simon.schippers@tu-dortmund.de> wrote:
>
> Stephen Hemminger wrote:
> > On Tue, 12 Aug 2025 00:03:48 +0200
> > Simon Schippers <simon.schippers@tu-dortmund.de> wrote:
> >
> >> [patch description and abstract snipped; identical to the original posting above]
> >
> > I wonder if it would be possible to implement BQL in TUN/TAP?
> >
> > https://lwn.net/Articles/454390/
> >
> > BQL provides a feedback mechanism to application when queue fills.
>
> Thank you very much for your reply,
> I also thought about BQL before and like the idea!
>
> However I see the following challenges in the implementation:
> - netdev_tx_sent_queue is no problem, it would just be called in
> tun_net_xmit function.
> - netdev_tx_completed_queue is challenging, because there is no completion
> routine like in a "normal" network driver. tun_ring_recv reads one SKB at
> a time and therefore I am not sure when and with what parameters to call
> the function.
Right, this is similar to virtio_net without TX NAPI. It would be
tricky to implement BQL on top (and TUN also did skb_orphan during
xmit).
Thanks
> - What to do with the existing TUN queue packet limit (500 packets
> default)? Use it as an upper limit?
>
>
* Re: [PATCH net v2] TUN/TAP: Improving throughput and latency by avoiding SKB drops
2025-08-13 18:33 ` Simon Schippers
2025-08-14 3:45 ` Jason Wang
@ 2025-08-14 15:08 ` Willem de Bruijn
2025-08-14 16:23 ` Simon Schippers
1 sibling, 1 reply; 10+ messages in thread
From: Willem de Bruijn @ 2025-08-14 15:08 UTC (permalink / raw)
To: Simon Schippers, Stephen Hemminger
Cc: willemdebruijn.kernel, jasowang, netdev, linux-kernel,
Tim Gebauer
Simon Schippers wrote:
> Stephen Hemminger wrote:
> > On Tue, 12 Aug 2025 00:03:48 +0200
> > Simon Schippers <simon.schippers@tu-dortmund.de> wrote:
> >
> >> [patch description and abstract snipped; identical to the original posting above]
> >
> > I wonder if it would be possible to implement BQL in TUN/TAP?
> >
> > https://lwn.net/Articles/454390/
> >
> > BQL provides a feedback mechanism to application when queue fills.
>
> Thank you very much for your reply,
> I also thought about BQL before and like the idea!
I would start with this patch series to convert TUN to a driver that
pauses the stack rather than drops.
Please reword the commit to describe the functional change concisely.
In general the effect of drops on TCP is well understood. You can
link to your paper for specific details.
I still suggest stopping the ring before a packet has to be dropped.
Note also that there is a mechanism to requeue an skb rather than
drop, see dev_requeue_skb and NETDEV_TX_BUSY. But simply pausing
before empty likely suffices.
Relevant to BQL: did your workload include particularly large packets,
e.g., TSO? Only then does the distinction between byte limits and packet
limits matter.
* [PATCH net v2] TUN/TAP: Improving throughput and latency by avoiding SKB drops
2025-08-14 15:08 ` Willem de Bruijn
@ 2025-08-14 16:23 ` Simon Schippers
2025-08-15 15:35 ` Jakub Kicinski
0 siblings, 1 reply; 10+ messages in thread
From: Simon Schippers @ 2025-08-14 16:23 UTC (permalink / raw)
To: Willem de Bruijn, Stephen Hemminger
Cc: jasowang, netdev, linux-kernel, Tim Gebauer
Willem de Bruijn wrote:
> Simon Schippers wrote:
>> Stephen Hemminger wrote:
>>> On Tue, 12 Aug 2025 00:03:48 +0200
>>> Simon Schippers <simon.schippers@tu-dortmund.de> wrote:
>>>
>>>> [patch description and abstract snipped; identical to the original posting above]
>>>
>>> I wonder if it would be possible to implement BQL in TUN/TAP?
>>>
>>> https://lwn.net/Articles/454390/
>>>
>>> BQL provides a feedback mechanism to application when queue fills.
>>
>> Thank you very much for your reply,
>> I also thought about BQL before and like the idea!
>
> I would start with this patch series to convert TUN to a driver that
> pauses the stack rather than drops.
>
> Please reword the commit to describe the functional change concisely.
> In general the effect of drops on TCP are well understood. You can
> link to your paper for specific details.
>
I will remove the paper abstract for the v3 to have a more concise
description.
Also I will clarify why no packets are dropped anymore.
> I still suggest stopping the ring before a packet has to be dropped.
> Note also that there is a mechanism to requeue an skb rather than
> drop, see dev_requeue_skb and NETDEV_TX_BUSY. But simply pausing
> before empty likely suffices.
>
As explained before in my reply to Jason, this patch does stop the netdev
queue before a packet has to be dropped. It uses a very similar approach
to the suggested virtio_net.
> Relevant to BQL: did your workload include particularly large packets,
> e.g., TSO? Only then does byte limits vs packet limits matter.
>
No, in my workload I did not use TSO/GSO. However, I think the most
important aspect is that the BQL algorithm uses a dynamic queue limit.
This will in most cases reduce the TUN queue size and thus bufferbloat.
I now have an idea of how to include BQL, but first I will add TAP support
in a v3. BQL could then be added in a v4.
Thank you :)
* Re: [PATCH net v2] TUN/TAP: Improving throughput and latency by avoiding SKB drops
2025-08-14 16:23 ` Simon Schippers
@ 2025-08-15 15:35 ` Jakub Kicinski
2025-08-15 17:40 ` Simon Schippers
0 siblings, 1 reply; 10+ messages in thread
From: Jakub Kicinski @ 2025-08-15 15:35 UTC (permalink / raw)
To: Simon Schippers
Cc: Willem de Bruijn, Stephen Hemminger, jasowang, netdev,
linux-kernel, Tim Gebauer
On Thu, 14 Aug 2025 18:23:56 +0200 Simon Schippers wrote:
> Important note: The information included in this e-mail is confidential.
You really need to try to get rid of this footer if you want to talk
to an open source community.
* Re: [PATCH net v2] TUN/TAP: Improving throughput and latency by avoiding SKB drops
2025-08-15 15:35 ` Jakub Kicinski
@ 2025-08-15 17:40 ` Simon Schippers
0 siblings, 0 replies; 10+ messages in thread
From: Simon Schippers @ 2025-08-15 17:40 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Willem de Bruijn, Stephen Hemminger, jasowang, netdev,
linux-kernel, Tim Gebauer
Jakub Kicinski wrote:
> On Thu, 14 Aug 2025 18:23:56 +0200 Simon Schippers wrote:
>> Important note: The information included in this e-mail is confidential.
>
> You really need to try to get rid of this footer if you want to talk
> to an open source community.
Hi,
I am sorry for that. I fixed it now by avoiding Outlook servers...
end of thread, other threads:[~2025-08-15 17:40 UTC | newest]
Thread overview: 10+ messages
2025-08-11 22:03 [PATCH net v2] TUN/TAP: Improving throughput and latency by avoiding SKB drops Simon Schippers
2025-08-12 3:10 ` Jason Wang
2025-08-13 18:27 ` Simon Schippers
2025-08-13 15:01 ` Stephen Hemminger
2025-08-13 18:33 ` Simon Schippers
2025-08-14 3:45 ` Jason Wang
2025-08-14 15:08 ` Willem de Bruijn
2025-08-14 16:23 ` Simon Schippers
2025-08-15 15:35 ` Jakub Kicinski
2025-08-15 17:40 ` Simon Schippers