From: "Michael S. Tsirkin" <mst@redhat.com>
To: Simon Schippers <simon.schippers@tu-dortmund.de>
Cc: willemdebruijn.kernel@gmail.com, jasowang@redhat.com,
	eperezma@redhat.com, stephen@networkplumber.org,
	leiyang@redhat.com, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, virtualization@lists.linux.dev,
	kvm@vger.kernel.org, Tim Gebauer <tim.gebauer@tu-dortmund.de>
Subject: Re: [PATCH net-next v5 3/8] TUN, TAP & vhost_net: Stop netdev queue before reaching a full ptr_ring
Date: Wed, 24 Sep 2025 01:50:34 -0400
Message-ID: <20250924014703-mutt-send-email-mst@kernel.org>
In-Reply-To: <71afbe18-3a5a-44ca-bb3b-b018f73ae8c6@tu-dortmund.de>

On Wed, Sep 24, 2025 at 07:41:28AM +0200, Simon Schippers wrote:
> Hi,
> First of all, thank you very much for your detailed replies! :)
> 
> On 23.09.25 16:47, Michael S. Tsirkin wrote:
> > On Tue, Sep 23, 2025 at 12:15:48AM +0200, Simon Schippers wrote:
> >> Stop the netdev queue ahead of __ptr_ring_produce when
> >> __ptr_ring_full_next signals the ring is about to fill. Due to the
> >> smp_wmb() of __ptr_ring_produce, the consumer is guaranteed to be able to
> >> notice the stopped netdev queue after seeing the new ptr_ring entry. As
> >> both __ptr_ring_full_next and __ptr_ring_produce need the producer_lock,
> >> the lock is held during the execution of both methods.
> >>
> >> dev->lltx is disabled to ensure that tun_net_xmit is not called even
> >> though the netdev queue is stopped (which happened in my testing,
> >> resulting in rare packet drops). Consequently, the update of trans_start
> >> in tun_net_xmit is also removed.
> >>
> >> Co-developed-by: Tim Gebauer <tim.gebauer@tu-dortmund.de>
> >> Signed-off-by: Tim Gebauer <tim.gebauer@tu-dortmund.de>
> >> Signed-off-by: Simon Schippers <simon.schippers@tu-dortmund.de>
> >> ---
> >>  drivers/net/tun.c | 16 ++++++++++------
> >>  1 file changed, 10 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> >> index 86a9e927d0ff..c6b22af9bae8 100644
> >> --- a/drivers/net/tun.c
> >> +++ b/drivers/net/tun.c
> >> @@ -931,7 +931,7 @@ static int tun_net_init(struct net_device *dev)
> >>  	dev->vlan_features = dev->features &
> >>  			     ~(NETIF_F_HW_VLAN_CTAG_TX |
> >>  			       NETIF_F_HW_VLAN_STAG_TX);
> >> -	dev->lltx = true;
> >> +	dev->lltx = false;
> >>  
> >>  	tun->flags = (tun->flags & ~TUN_FEATURES) |
> >>  		      (ifr->ifr_flags & TUN_FEATURES);
> >> @@ -1060,14 +1060,18 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
> >>  
> >>  	nf_reset_ct(skb);
> >>  
> >> -	if (ptr_ring_produce(&tfile->tx_ring, skb)) {
> >> +	queue = netdev_get_tx_queue(dev, txq);
> >> +
> >> +	spin_lock(&tfile->tx_ring.producer_lock);
> >> +	if (__ptr_ring_full_next(&tfile->tx_ring))
> >> +		netif_tx_stop_queue(queue);
> >> +
> >> +	if (unlikely(__ptr_ring_produce(&tfile->tx_ring, skb))) {
> >> +		spin_unlock(&tfile->tx_ring.producer_lock);
> >>  		drop_reason = SKB_DROP_REASON_FULL_RING;
> >>  		goto drop;
> >>  	}
> > 
> > The comment makes it sound like you always keep one slot free
> > in the queue, but that is not the case: you just
> > check before calling __ptr_ring_produce.
> > 
> 
> I agree.
> 
> > 
> > But it is racy, isn't it? So first of all, I suspect you
> > are missing an mb before netif_tx_stop_queue.
> > 
> 
> I don’t really get this point right now.

__ptr_ring_full_next is a read; netif_tx_stop_queue is a write. If you are
relying on ordering them in some way, you generally need a full mb.
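
To make this concrete, here is a minimal userspace sketch in C11 atomics
(not kernel code, and not how ptr_ring is implemented) of the usual
stop/wake idiom: the producer checks for full *after* producing, stops the
queue, issues a full barrier and rechecks; the consumer frees an entry,
issues a full barrier and only then tests the stopped flag. Each side does
a store followed by a load of the other side's variable, which is why both
need a full barrier. struct model, produce(), consume() and RING_SIZE are
hypothetical, the ring is modelled as two counters, and
atomic_thread_fence(memory_order_seq_cst) stands in for smp_mb().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define RING_SIZE 4	/* hypothetical ring capacity */

/* Hypothetical userspace model, not kernel code: two free-running
 * counters stand in for tfile->tx_ring and 'stopped' stands in for the
 * netdev TX queue state. One producer thread, one consumer thread. */
struct model {
	atomic_uint produced;	/* entries produced so far */
	atomic_uint consumed;	/* entries consumed so far */
	atomic_bool stopped;	/* models netif_tx_queue_stopped() */
};

static void model_init(struct model *m)
{
	atomic_init(&m->produced, 0);
	atomic_init(&m->consumed, 0);
	atomic_init(&m->stopped, false);
}

static bool ring_full(struct model *m)
{
	unsigned int p = atomic_load_explicit(&m->produced, memory_order_relaxed);
	unsigned int c = atomic_load_explicit(&m->consumed, memory_order_relaxed);

	return p - c >= RING_SIZE;
}

/* Producer (xmit path): produce first, then check for full. */
static bool produce(struct model *m)
{
	if (ring_full(m))
		return false;	/* would be a tail drop */
	atomic_fetch_add_explicit(&m->produced, 1, memory_order_release);

	if (ring_full(m)) {
		atomic_store_explicit(&m->stopped, true, memory_order_relaxed);
		/* Full fence (models smp_mb()): order the store to 'stopped'
		 * before the re-read of 'consumed' below. */
		atomic_thread_fence(memory_order_seq_cst);
		/* The consumer may have freed entries in the meantime. */
		if (!ring_full(m))
			atomic_store_explicit(&m->stopped, false, memory_order_relaxed);
	}
	return true;
}

/* Consumer: free one entry, then wake the queue if it is stopped. */
static bool consume(struct model *m)
{
	unsigned int p = atomic_load_explicit(&m->produced, memory_order_acquire);
	unsigned int c = atomic_load_explicit(&m->consumed, memory_order_relaxed);

	assert(p - c <= RING_SIZE);
	if (p == c)
		return false;	/* ring empty */
	atomic_store_explicit(&m->consumed, c + 1, memory_order_relaxed);
	/* Full fence (models smp_mb()): order the store to 'consumed'
	 * before the read of 'stopped'. */
	atomic_thread_fence(memory_order_seq_cst);
	if (atomic_load_explicit(&m->stopped, memory_order_relaxed))
		atomic_store_explicit(&m->stopped, false, memory_order_relaxed); /* wake */
	return true;
}
```

With both fences in place, at least one side observes the other's store,
so either the producer's recheck restarts the queue or the consumer wakes
it. That also covers entries that get freed after the full check.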

> > Second, it's racy because more entries can get freed
> > afterwards. Maybe that is OK in this instance?
> > But if so, it really should be explained in more detail.
> > 
> 
> Will be covered in the next mail.
> 
> > 
> > 
> > Now, why not just check ring full *after* __ptr_ring_produce?
> > Why do we need all these new APIs when we can
> > use existing ones, which at least are not so hard to understand?
> > 
> > 
> 
> You convinced me to change my implementation anyway, but here is my (old)
> idea:
> I did this in V1-V4. The problem is that vhost_net is only called on
> EPOLLIN triggered by tun_net_xmit. Then, after consuming a batch from the
> ptr_ring, it must be able to see whether the netdev queue is stopped. If
> it cannot, the ptr_ring might become empty and vhost_net is then unable to
> wake the queue again (because, from its POV, the queue is not stopped).
> This happened in my testing of V4.
> 
> This is why, in V5, tun_net_xmit stops the netdev queue before
> producing. With that I exploit the smp_wmb() in __ptr_ring_produce,
> which is paired with the READ_ONCE in __ptr_ring_peek, to ensure that
> the consumer in vhost_net sees that the netdev queue stopped after
> consuming a batch.

Yes, you said it somewhere in the code, too, and I am not sure I understand it all, but
smp_wmb() generally isn't paired with READ_ONCE. Barrier pairing
is described in memory-barriers.txt; READ_ONCE is not a barrier
at all.
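
For reference, memory-barriers.txt pairs a write barrier on one side with,
for example, a read barrier, an acquire or a general barrier on the other.
A minimal C11 sketch, again hypothetical userspace code rather than the
ptr_ring internals: the release store that publishes the entry plays
roughly the role of smp_wmb() + WRITE_ONCE(), and the acquire load is the
half that pairs with it. struct mp_model, mp_init(), mp_produce() and
mp_consume() are made-up names, and C11 release/acquire are somewhat
stronger than smp_wmb()/smp_rmb().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: one ring slot plus a 'stopped' flag. */
struct mp_model {
	_Atomic(void *) slot;	/* models a ptr_ring entry */
	atomic_bool stopped;	/* models the netdev TX queue state */
};

static void mp_init(struct mp_model *s)
{
	atomic_init(&s->slot, NULL);
	atomic_init(&s->stopped, false);
}

/* Writer: stop the queue, then publish the entry. The release store
 * orders the 'stopped' store before the entry becomes visible. */
static void mp_produce(struct mp_model *s, void *entry)
{
	atomic_store_explicit(&s->stopped, true, memory_order_relaxed);
	atomic_store_explicit(&s->slot, entry, memory_order_release);
}

/* Reader: the acquire load is the pairing half. With a relaxed load
 * instead (the analogue of a plain READ_ONCE()), nothing would order
 * the later read of 'stopped' after the slot read, so the reader could
 * see the entry yet still read a stale 'stopped'. */
static void *mp_consume(struct mp_model *s, bool *stopped)
{
	void *entry;

	assert(stopped != NULL);
	entry = atomic_load_explicit(&s->slot, memory_order_acquire);
	if (entry)
		*stopped = atomic_load_explicit(&s->stopped, memory_order_relaxed);
	return entry;
}
```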

> > 
> > 
> >> -
> >> -	/* dev->lltx requires to do our own update of trans_start */
> >> -	queue = netdev_get_tx_queue(dev, txq);
> >> -	txq_trans_cond_update(queue);
> >> +	spin_unlock(&tfile->tx_ring.producer_lock);
> >>  
> >>  	/* Notify and wake up reader process */
> >>  	if (tfile->flags & TUN_FASYNC)
> >> -- 
> >> 2.43.0
> > 
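
And a sequential replay of the lost wake-up you describe for V4, with one
interleaving hard-coded so it can be stepped through: the consumer drains
the ring and tests the stopped flag before the producer's stop becomes
visible. Without a recheck on the producer side, the queue stays stopped
with an empty ring; with the recheck (which in real code needs the full
barrier on both sides) the producer restarts it. struct replay, full(),
replay_lost_wakeup() and RING_SIZE are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define RING_SIZE 2	/* hypothetical ring capacity */

/* Hypothetical sequential model: counters stand in for tfile->tx_ring,
 * 'stopped' for the netdev TX queue state. No real concurrency: the
 * function replays one interleaving step by step. */
struct replay {
	unsigned int produced, consumed;
	bool stopped;
};

static bool full(const struct replay *r)
{
	assert(r->produced - r->consumed <= RING_SIZE);
	return r->produced - r->consumed >= RING_SIZE;
}

/* Returns true if the queue ends up stopped with an empty ring,
 * i.e. nobody will ever wake it (the lost wake-up). */
static bool replay_lost_wakeup(bool producer_rechecks)
{
	struct replay r = {0};
	bool decided_to_stop, consumer_wakes;

	/* Producer fills the ring and decides to stop the queue, but the
	 * store to 'stopped' is not visible yet. */
	r.produced = RING_SIZE;
	decided_to_stop = full(&r);

	/* Consumer drains everything, then tests 'stopped': still false,
	 * so it does not wake the queue. */
	r.consumed = r.produced;
	consumer_wakes = r.stopped;

	/* The producer's stop now lands. */
	if (decided_to_stop)
		r.stopped = true;

	/* Optional producer-side recheck: the ring is no longer full,
	 * so the producer restarts the queue itself. */
	if (producer_rechecks && r.stopped && !full(&r))
		r.stopped = false;

	if (consumer_wakes)
		r.stopped = false;

	return r.stopped && r.produced == r.consumed;
}
```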


Thread overview: 37+ messages
2025-09-22 22:15 [PATCH net-next v5 0/8] TUN/TAP & vhost_net: netdev queue flow control to avoid ptr_ring tail drop Simon Schippers
2025-09-22 22:15 ` [PATCH net-next v5 1/8] __ptr_ring_full_next: Returns if ring will be full after next insertion Simon Schippers
2025-09-22 22:15 ` [PATCH net-next v5 2/8] Move the decision of invalidation out of __ptr_ring_discard_one Simon Schippers
2025-09-22 22:15 ` [PATCH net-next v5 3/8] TUN, TAP & vhost_net: Stop netdev queue before reaching a full ptr_ring Simon Schippers
2025-09-23 14:47   ` Michael S. Tsirkin
2025-09-24  5:41     ` Simon Schippers
2025-09-24  5:50       ` Michael S. Tsirkin [this message]
2025-09-22 22:15 ` [PATCH net-next v5 4/8] TUN & TAP: Wake netdev queue after consuming an entry Simon Schippers
2025-09-23 14:54   ` Michael S. Tsirkin
2025-09-23 16:36   ` Michael S. Tsirkin
2025-09-24  5:56     ` Simon Schippers
2025-09-24  6:55       ` Michael S. Tsirkin
2025-09-24  7:42         ` Simon Schippers
2025-09-24  7:49           ` Michael S. Tsirkin
2025-09-24  8:40             ` Simon Schippers
2025-09-24  9:00               ` Michael S. Tsirkin
2025-09-28 21:27     ` Simon Schippers
2025-09-28 22:33       ` Michael S. Tsirkin
2025-09-29  9:43         ` Simon Schippers
2025-10-11  9:15           ` Simon Schippers
2025-09-22 22:15 ` [PATCH net-next v5 5/8] TUN & TAP: Provide ptr_ring_consume_batched wrappers for vhost_net Simon Schippers
2025-09-23 16:23   ` Michael S. Tsirkin
2025-09-22 22:15 ` [PATCH net-next v5 6/8] TUN & TAP: Provide ptr_ring_unconsume " Simon Schippers
2025-09-22 22:15 ` [PATCH net-next v5 7/8] TUN & TAP: Methods to determine whether file is TUN/TAP " Simon Schippers
2025-09-22 22:15 ` [PATCH net-next v5 8/8] vhost_net: Replace rx_ring with calls of TUN/TAP wrappers Simon Schippers
2025-09-23 14:14   ` kernel test robot
2025-09-26 13:47   ` kernel test robot
2025-09-23 14:55 ` [PATCH net-next v5 0/8] TUN/TAP & vhost_net: netdev queue flow control to avoid ptr_ring tail drop Michael S. Tsirkin
2025-09-24  5:59   ` Simon Schippers
2025-09-24  6:12     ` Michael S. Tsirkin
2025-09-24  7:18 ` Michael S. Tsirkin
2025-09-24  7:33   ` Jason Wang
2025-09-24  7:41     ` Michael S. Tsirkin
2025-09-24  8:08       ` Jason Wang
2025-09-24  8:09         ` Michael S. Tsirkin
2025-09-24  8:30           ` Jason Wang
2025-09-24  8:54             ` Michael S. Tsirkin
