From: "Michael S. Tsirkin" <mst@redhat.com>
To: Simon Schippers <simon.schippers@tu-dortmund.de>
Cc: willemdebruijn.kernel@gmail.com, jasowang@redhat.com,
	eperezma@redhat.com, stephen@networkplumber.org,
	leiyang@redhat.com, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, virtualization@lists.linux.dev,
	kvm@vger.kernel.org
Subject: Re: [PATCH net-next v5 0/8] TUN/TAP & vhost_net: netdev queue flow control to avoid ptr_ring tail drop
Date: Wed, 24 Sep 2025 02:12:04 -0400
Message-ID: <20250924021145-mutt-send-email-mst@kernel.org>
In-Reply-To: <96058e18-bb1e-46d1-99aa-9fdffb965e44@tu-dortmund.de>

On Wed, Sep 24, 2025 at 07:59:46AM +0200, Simon Schippers wrote:
> On 23.09.25 16:55, Michael S. Tsirkin wrote:
> > On Tue, Sep 23, 2025 at 12:15:45AM +0200, Simon Schippers wrote:
> >> This patch series deals with TUN, TAP and vhost_net which drop incoming 
> >> SKBs whenever their internal ptr_ring buffer is full. Instead, with this 
> >> patch series, the associated netdev queue is stopped before this happens. 
> >> This allows the connected qdisc to function correctly as reported by [1] 
> >> and improves application-layer performance, see our paper [2]. Meanwhile 
> >> the theoretical performance differs only slightly:
> >>
> >> +------------------------+----------+----------+
> >> | pktgen benchmarks      | Stock    | Patched  |
> >> | i5 6300HQ, 20M packets |          |          |
> >> +------------------------+----------+----------+
> >> | TAP                    | 2.10Mpps | 1.99Mpps |
> >> +------------------------+----------+----------+
> >> | TAP+vhost_net          | 6.05Mpps | 6.14Mpps |
> >> +------------------------+----------+----------+
> >> | Note: Patched had no TX drops at all,        |
> >> | while stock suffered numerous drops.         |
> >> +----------------------------------------------+
> >>
> >> This patch series includes TUN, TAP, and vhost_net because they share 
> >> logic. Adjusting only one of them would break the others. Therefore, the 
> >> patch series is structured as follows:
> >> 1+2: New ptr_ring helpers for 3 & 4
> >> 3: TUN & TAP: Stop netdev queue upon reaching a full ptr_ring
> > 
> > 
> > so what happens if you only apply patches 1-3?
> > 
> 
> The netdev queue of vhost_net would be stopped by tun_net_xmit but will
> never be woken again.

So this breaks bisect. Don't split patches like this please.


> >> 4: TUN & TAP: Wake netdev queue after consuming an entry
> >> 5+6+7: TUN & TAP: ptr_ring wrappers and other helpers to be called by 
> >> vhost_net
> >> 8: vhost_net: Call the wrappers & helpers
> >>
> >> Possible future work:
> >> - Introduction of Byte Queue Limits as suggested by Stephen Hemminger
> >> - Adaptation of the netdev queue flow control for ipvtap & macvtap
> >>
> >> [1] Link: 
> >> https://unix.stackexchange.com/questions/762935/traffic-shaping-ineffective-on-tun-device
> >> [2] Link: 
> >> https://cni.etit.tu-dortmund.de/storages/cni-etit/r/Research/Publications/2025/Gebauer_2025_VTCFall/Gebauer_VTCFall2025_AuthorsVersion.pdf
> >>
> >> Links to previous versions:
> >> V4: 
> >> https://lore.kernel.org/netdev/20250902080957.47265-1-simon.schippers@tu-dortmund.de/T/#u
> >> V3: 
> >> https://lore.kernel.org/netdev/20250825211832.84901-1-simon.schippers@tu-dortmund.de/T/#u
> >> V2: 
> >> https://lore.kernel.org/netdev/20250811220430.14063-1-simon.schippers@tu-dortmund.de/T/#u
> >> V1: 
> >> https://lore.kernel.org/netdev/20250808153721.261334-1-simon.schippers@tu-dortmund.de/T/#u
> >>
> >> Changelog:
> >> V4 -> V5:
> >> - Stop the netdev queue prior to producing the final fitting ptr_ring entry
> >> -> Ensures the consumer has the latest netdev queue state, making it safe 
> >> to wake the queue
> >> -> Resolves an issue in vhost_net where the netdev queue could remain 
> >> stopped despite being empty
> >> -> For TUN/TAP, the netdev queue no longer needs to be woken in the 
> >> blocking loop
> >> -> Introduces new helpers __ptr_ring_full_next and 
> >> __ptr_ring_will_invalidate for this purpose
> >>
> >> - vhost_net now uses wrappers of TUN/TAP for ptr_ring consumption rather 
> >> than maintaining its own rx_ring pointer
> >>
> >> V3 -> V4:
> >> - Target net-next instead of net
> >> - Changed to patch series instead of single patch
> >> - Changed to new title from old title
> >> "TUN/TAP: Improving throughput and latency by avoiding SKB drops"
> >> - Wake netdev queue with new helpers wake_netdev_queue when there is any 
> >> spare capacity in the ptr_ring instead of waiting for it to be empty
> >> - Use tun_file instead of tun_struct in tun_ring_recv as a more consistent 
> >> logic
> >> - Use smp_wmb() and smp_rmb() barrier pair, which avoids any packet drops 
> >> that happened rarely before
> >> - Use safer logic for vhost_net using RCU read locks to access TUN/TAP data
> >>
> >> V2 -> V3: Added support for TAP and TAP+vhost_net.
> >>
> >> V1 -> V2: Removed NETDEV_TX_BUSY return case in tun_net_xmit and removed 
> >> unnecessary netif_tx_wake_queue in tun_ring_recv.
> >>
> >> Thanks,
> >> Simon :)
> >>
> >> Simon Schippers (8):
> >>   __ptr_ring_full_next: Returns if ring will be full after next
> >>     insertion
> >>   Move the decision of invalidation out of __ptr_ring_discard_one
> >>   TUN, TAP & vhost_net: Stop netdev queue before reaching a full
> >>     ptr_ring
> >>   TUN & TAP: Wake netdev queue after consuming an entry
> >>   TUN & TAP: Provide ptr_ring_consume_batched wrappers for vhost_net
> >>   TUN & TAP: Provide ptr_ring_unconsume wrappers for vhost_net
> >>   TUN & TAP: Methods to determine whether file is TUN/TAP for vhost_net
> >>   vhost_net: Replace rx_ring with calls of TUN/TAP wrappers
> >>
> >>  drivers/net/tap.c        | 115 +++++++++++++++++++++++++++++++--
> >>  drivers/net/tun.c        | 136 +++++++++++++++++++++++++++++++++++----
> >>  drivers/vhost/net.c      |  90 +++++++++++++++++---------
> >>  include/linux/if_tap.h   |  15 +++++
> >>  include/linux/if_tun.h   |  18 ++++++
> >>  include/linux/ptr_ring.h |  54 +++++++++++++---
> >>  6 files changed, 367 insertions(+), 61 deletions(-)
> >>
> >> -- 
> >> 2.43.0
> > 
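
For readers following the bisect point above, here is a minimal userspace
sketch of why a tree with only the "stop" half applied stalls. It is not
the kernel code: every toy_* name is hypothetical, toy_ring_full_next()
is only an assumed stand-in for the idea behind __ptr_ring_full_next, and
the bool "stopped" flag stands in for the netdev queue state. The ring
uses NULL slots to mean "empty", as ptr_ring does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_RING_SIZE 4

/* Toy ring: a NULL slot is empty, a non-NULL slot holds a packet. */
struct toy_queue {
	void *slot[TOY_RING_SIZE];
	size_t producer;
	size_t consumer;
	bool stopped;	/* stands in for the stopped netdev queue */
};

static bool toy_ring_full(const struct toy_queue *q)
{
	return q->slot[q->producer] != NULL;
}

/* True if the ring will be full after one more insertion. */
static bool toy_ring_full_next(const struct toy_queue *q)
{
	return q->slot[(q->producer + 1) % TOY_RING_SIZE] != NULL;
}

/* Producer side: stop the queue before producing the last fitting entry. */
static bool toy_xmit(struct toy_queue *q, void *pkt)
{
	if (q->stopped)
		return false;	/* packet stays with the caller */
	assert(!toy_ring_full(q));	/* a running queue always has room */
	if (toy_ring_full_next(q))
		q->stopped = true;
	q->slot[q->producer] = pkt;
	q->producer = (q->producer + 1) % TOY_RING_SIZE;
	return true;
}

/* Consumer side: 'wake' models whether the wake-up half is applied. */
static void *toy_consume(struct toy_queue *q, bool wake)
{
	void *pkt = q->slot[q->consumer];

	if (!pkt)
		return NULL;
	q->slot[q->consumer] = NULL;
	q->consumer = (q->consumer + 1) % TOY_RING_SIZE;
	if (wake)
		q->stopped = false;	/* spare capacity exists again */
	return pkt;
}

/* Offer 'offered' packets, draining the ring whenever the queue is stopped.
 * Returns how many packets were accepted into the ring. */
static unsigned int toy_run(bool wake, unsigned int offered)
{
	static int payload;
	struct toy_queue q = {0};
	unsigned int queued = 0;

	for (unsigned int i = 0; i < offered; i++) {
		if (q.stopped)
			while (toy_consume(&q, wake))
				;
		if (toy_xmit(&q, &payload))
			queued++;
	}
	return queued;
}
```

With wake set to false, toy_run() accepts one ring's worth of packets and
then nothing more, even though the consumer drains the ring completely.
That is the state an intermediate commit would leave a bisecting user in.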

