From: Jesper Dangaard Brouer <hawk@kernel.org>
To: Jakub Kicinski <kuba@kernel.org>
Cc: netdev@vger.kernel.org, bpf@vger.kernel.org, tom@herbertland.com,
	"Eric Dumazet" <eric.dumazet@gmail.com>,
	"David S. Miller" <davem@davemloft.net>,
	"Paolo Abeni" <pabeni@redhat.com>,
	"Toke Høiland-Jørgensen" <toke@toke.dk>,
	dsahern@kernel.org, makita.toshiaki@lab.ntt.co.jp,
	kernel-team@cloudflare.com, phil@nwl.cc
Subject: Re: [PATCH net-next V6 2/2] veth: apply qdisc backpressure on full ptr_ring to reduce TX drops
Date: Fri, 25 Apr 2025 15:55:52 +0200
Message-ID: <d36cb5a0-902c-4de5-bdd2-cbf9e1b1c7b1@kernel.org>
In-Reply-To: <20250424085358.75d817ae@kernel.org>



On 24/04/2025 17.53, Jakub Kicinski wrote:
> On Thu, 24 Apr 2025 17:24:51 +0200 Jesper Dangaard Brouer wrote:
>>> Looks like I wrote a reply to v5 but didn't hit send. But I may have
>>> set v5 to Changes Requested because of it :S Here is my comment:
>>>
>>>    I think this is missing a memory barrier. When drivers do this dance
>>>    there's usually a barrier between stop and recheck, to make sure the
>>>    stop is visible before we check. And vice versa veth_xdp_rcv() needs
>>>    to make sure other side sees the "empty" indication before it checks
>>>    if the queue is stopped.
>>
>> The call netif_tx_stop_queue(txq); already contains a memory barrier
>> smp_mb__before_atomic() plus an atomic set_bit operation.  That should
>> be sufficient.
> 
> That barrier is _before_ stopping the queue. I'm saying we need a
> barrier between stop and emptiness re-check. Note that:
>   - smp_mb__after_atomic() is enough, and it 'compiles' to nothing
>     on x86

I see, I will add an smp_mb__after_atomic() after netif_tx_stop_queue()
and send a V7.  I had considered an atomic operation to be a full memory
barrier, which I guess is true on x86 (as you say, this compiles to
nothing there), but other archs do need the explicit barrier, so let's
add it.
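
For reference, the V7 delta I have in mind is roughly the below
(untested sketch against the veth_xmit() NETDEV_TX_BUSY path; exact
placement and naming may shift a bit in the final patch):

    /* veth_xmit(): ptr_ring was full, apply qdisc backpressure */
    netif_tx_stop_queue(txq);
    /* Barrier between stop and emptiness re-check: make the stopped
     * state visible before looking at the ring again.  Pairs with the
     * consumer in veth_xdp_rcv(), which drains the ring before it
     * checks whether the peer TXQ is stopped.
     */
    smp_mb__after_atomic();
    /* If the consumer already emptied the ring, it may have missed
     * the stop; restart the queue ourselves.
     */
    if (unlikely(__ptr_ring_empty(&rq->xdp_ring)))
            netif_tx_wake_queue(txq);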

>   - all of this is the unlikely path :) You restart the qdisc
>     when the ptr ring is completely full so the stopping in absolute
>     worst case will happen once or twice per full ptr_ring ?
> 

Yes, basically. It should only happen once per full ptr_ring event.
As soon as the TXQ is stopped, the driver code is no longer called.
Do remember that the remote CPU running the veth_poll call will
(re)start the TXQ again via the qdisc layer, which calls the veth
driver code again, e.g. racing to fill the ptr_ring again, which will
stop the TXQ again.  (Sysadmin hint: these full/TXQ-stop events are
recorded in the "requeues" counter of the qdisc stats.)  The remote
CPU running NAPI is in a fairly tight loop, so it will do its best to
empty the queue, and has a total budget of 300.  The race is still
very unlikely, but it is a race, and it would stop the TXQ forever for
the veth device (we don't recover).
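
To spell out the pairing on the NAPI side, the consumer end looks
roughly like this (sketch of the veth_xdp_rcv() path; the peer_txq
variable name is approximated, it would come from
netdev_get_tx_queue() on the peer device):

    /* veth_xdp_rcv(): drain the ring, up to the NAPI budget */
    for (i = 0; i < budget; i++) {
            void *ptr = __ptr_ring_consume(&rq->xdp_ring);

            if (!ptr)
                    break;
            /* ... process the xdp_frame / skb ... */
    }

    /* Ring drained (or budget spent).  If the producer stopped the
     * peer TXQ because the ring was full, wake it so the qdisc
     * requeues and transmission resumes.  Pairs with
     * netif_tx_stop_queue() + smp_mb__after_atomic() in veth_xmit().
     */
    if (unlikely(netif_tx_queue_stopped(peer_txq)))
            netif_tx_wake_queue(peer_txq);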


>> And the other side veth_poll(), have a smp_store_mb() before reading
>> ptr_ring.
>>
>> --Jesper
>>
>> p.s.
>> I actually had an alternative implementation of this, that only calls
>> stop when it is needed.  See below, it kind of looks prettier, but it
>> adds an extra memory barrier in the likely path. (And I'm not sure if
>> read memory barrier is strong enough).
> 
> Not sure that works either :S

