From: Jesper Dangaard Brouer <hawk@kernel.org>
To: Jakub Kicinski <kuba@kernel.org>
Cc: netdev@vger.kernel.org, bpf@vger.kernel.org, tom@herbertland.com,
"Eric Dumazet" <eric.dumazet@gmail.com>,
"David S. Miller" <davem@davemloft.net>,
"Paolo Abeni" <pabeni@redhat.com>,
"Toke Høiland-Jørgensen" <toke@toke.dk>,
dsahern@kernel.org, makita.toshiaki@lab.ntt.co.jp,
kernel-team@cloudflare.com, phil@nwl.cc
Subject: Re: [PATCH net-next V6 2/2] veth: apply qdisc backpressure on full ptr_ring to reduce TX drops
Date: Fri, 25 Apr 2025 15:55:52 +0200
Message-ID: <d36cb5a0-902c-4de5-bdd2-cbf9e1b1c7b1@kernel.org>
In-Reply-To: <20250424085358.75d817ae@kernel.org>

On 24/04/2025 17.53, Jakub Kicinski wrote:
> On Thu, 24 Apr 2025 17:24:51 +0200 Jesper Dangaard Brouer wrote:
>>> Looks like I wrote a reply to v5 but didn't hit send. But I may have
>>> set v5 to Changes Requested because of it :S Here is my comment:
>>>
>>> I think this is missing a memory barrier. When drivers do this dance
>>> there's usually a barrier between stop and recheck, to make sure the
>>> stop is visible before we check. And vice versa veth_xdp_rcv() needs
>>> to make sure other side sees the "empty" indication before it checks
>>> if the queue is stopped.
>>
>> The call netif_tx_stop_queue(txq); already contains a memory barrier
>> smp_mb__before_atomic() plus an atomic set_bit operation. That should
>> be sufficient.
>
> That barrier is _before_ stopping the queue. I'm saying we need a
> barrier between stop and emptiness re-check. Note that:
> - smp_mb__after_atomic() is enough, and it 'compiles' to nothing
> on x86

I see, I will add an smp_mb__after_atomic() after netif_tx_stop_queue()
and send a V7. I had considered an atomic operation to be a full memory
barrier, which I guess is correct for x86 (as you say, this compiles to
nothing there), but I guess other archs need this, so let's add it.

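To make the placement concrete, here is a minimal user-space sketch in
C11 of the stop / barrier / re-check sequence on the transmit side. It
is not the veth code: struct model_txq, model_xmit() and MODEL_RING_SIZE
are made-up names, a single producer is assumed, a relaxed store stands
in for the set_bit() inside netif_tx_stop_queue(), and
atomic_thread_fence(memory_order_seq_cst) stands in for
smp_mb__after_atomic().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_RING_SIZE 4 /* hypothetical capacity, tiny on purpose */

/* Hypothetical user-space stand-in for a TX queue feeding a ring. */
struct model_txq {
	atomic_bool stopped; /* models the queue's "stopped" state bit */
	atomic_int  used;    /* number of occupied ring slots */
};

/*
 * Producer side (single producer assumed). Returns true if the packet
 * was queued, false if the ring was full and the caller should requeue.
 */
static bool model_xmit(struct model_txq *q)
{
	assert(q);

	if (atomic_load_explicit(&q->used, memory_order_relaxed) < MODEL_RING_SIZE) {
		atomic_fetch_add_explicit(&q->used, 1, memory_order_release);
		return true;
	}

	/* Ring full: stop the queue. A relaxed store gives no ordering
	 * against the load below, much like a bare set_bit(). */
	atomic_store_explicit(&q->stopped, true, memory_order_relaxed);

	/* Stand-in for smp_mb__after_atomic(): the stop must be visible
	 * before the emptiness re-check reads the ring state. */
	atomic_thread_fence(memory_order_seq_cst);

	/* Re-check: if the consumer drained the ring in the meantime,
	 * nobody is left to wake the queue, so cancel the stop here. */
	if (atomic_load_explicit(&q->used, memory_order_relaxed) == 0)
		atomic_store_explicit(&q->stopped, false, memory_order_relaxed);

	return false;
}
```

The consumer side would need the mirror image: publish the emptied
ring, issue a full barrier, and only then test the stopped flag.
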
> - all of this is the unlikely path :) You restart the qdisc
> when the ptr ring is completely full so the stopping in absolute
> worst case will happen once or twice per full ptr_ring ?
>

Yes, basically. It should only happen once per full ptr_ring event.
As soon as the TXQ is stopped, the driver code is no longer called.

Do remember that the remote CPU running veth_poll will (re)start the
TXQ via the qdisc layer, which calls the veth driver code again. The
sender can then race to fill the ptr_ring again, which stops the TXQ
again. (Sysadmin help: these full/TXQ-stop events are recorded in the
"requeues" counter in the qdisc stats.) The remote CPU running NAPI is
in a fairly tight loop, so it will do its best to empty the queue, and
it has a total budget of 300.

The race is still very unlikely, but it is a race, and one that would
stop the TXQ forever for the veth device (we don't recover).

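For readers following along, the interleaving that leaves the TXQ stuck
can be replayed step by step. The sketch below is a single-threaded
model, not driver code: struct model_state and model_replay_race() are
hypothetical names, and plain fields are used because the steps run in
a fixed order instead of concurrently. It ignores memory ordering
entirely; on real hardware the barrier is what lets the re-check in
step 4 actually observe the drained ring.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_RING_SIZE 4 /* hypothetical capacity */

/* Hypothetical model state: ring occupancy plus the TXQ stopped flag. */
struct model_state {
	int  used;    /* occupied ring slots */
	bool stopped; /* TXQ stopped flag */
};

/*
 * Replays the racy interleaving. Returns true if the TXQ ends up
 * stopped with an empty ring, i.e. with nobody left to wake it.
 */
static bool model_replay_race(bool producer_rechecks)
{
	struct model_state s = { .used = MODEL_RING_SIZE, .stopped = false };

	/* 1. Producer finds the ring full and decides to stop the TXQ. */
	assert(s.used == MODEL_RING_SIZE);

	/* 2. Consumer drains the whole ring, then checks the stopped flag.
	 *    The flag is not set yet, so it issues no wake-up and goes idle. */
	s.used = 0;
	if (s.stopped)
		s.stopped = false;

	/* 3. Producer now sets the stopped flag. */
	s.stopped = true;

	/* 4. Optional re-check: the ring is empty, so cancel the stop. */
	if (producer_rechecks && s.used == 0)
		s.stopped = false;

	return s.stopped && s.used == 0;
}
```

Calling model_replay_race(false) reports the stuck state, while
model_replay_race(true) does not, which is the case the re-check after
the stop is meant to cover.
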
>> And the other side veth_poll(), have a smp_store_mb() before reading
>> ptr_ring.
>>
>> --Jesper
>>
>> p.s.
>> I actually had an alternative implementation of this, that only calls
>> stop when it is needed. See below, it kind of looks prettier, but it
>> adds an extra memory barrier in the likely path. (And I'm not sure if
>> read memory barrier is strong enough).
>
> Not sure that works either :S