linux-arm-kernel.lists.infradead.org archive mirror
From: Toshiaki Makita <toshiaki.makita1@gmail.com>
To: Jesper Dangaard Brouer <hawk@kernel.org>
Cc: "Eric Dumazet" <eric.dumazet@gmail.com>,
	"David S. Miller" <davem@davemloft.net>,
	"Jakub Kicinski" <kuba@kernel.org>,
	"Paolo Abeni" <pabeni@redhat.com>,
	ihor.solodrai@linux.dev, "Michael S. Tsirkin" <mst@redhat.com>,
	makita.toshiaki@lab.ntt.co.jp, bpf@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, kernel-team@cloudflare.com,
	netdev@vger.kernel.org, "Toke Høiland-Jørgensen" <toke@toke.dk>
Subject: Re: [PATCH net V2 2/2] veth: more robust handing of race to avoid txq getting stuck
Date: Mon, 3 Nov 2025 17:41:37 +0900
Message-ID: <8b70ba1d-323b-4e76-be7f-9df45b8f53d5@gmail.com>
In-Reply-To: <e3abd249-f348-4504-b1d9-4b5cd3df5822@kernel.org>

On 2025/10/31 4:06, Jesper Dangaard Brouer wrote:
> On 29/10/2025 16.00, Toshiaki Makita wrote:
>> On 2025/10/29 19:33, Jesper Dangaard Brouer wrote:
>>> On 28/10/2025 15.56, Toshiaki Makita wrote:
>>>> On 2025/10/28 5:05, Jesper Dangaard Brouer wrote:
>>>>> (3) Finally, the NAPI completion check in veth_poll() is updated. If NAPI is
>>>>> about to complete (napi_complete_done), it now also checks if the peer TXQ
>>>>> is stopped. If the ring is empty but the peer TXQ is stopped, NAPI will
>>>>> reschedule itself. This prevents a new race where the producer stops the
>>>>> queue just as the consumer is finishing its poll, ensuring the wakeup is not 
>>>>> missed.
>>>> ...
>>>>
>>>>> @@ -986,7 +979,8 @@ static int veth_poll(struct napi_struct *napi, int budget)
>>>>>       if (done < budget && napi_complete_done(napi, done)) {
>>>>>           /* Write rx_notify_masked before reading ptr_ring */
>>>>>           smp_store_mb(rq->rx_notify_masked, false);
>>>>> -        if (unlikely(!__ptr_ring_empty(&rq->xdp_ring))) {
>>>>> +        if (unlikely(!__ptr_ring_empty(&rq->xdp_ring) ||
>>>>> +                 (peer_txq && netif_tx_queue_stopped(peer_txq)))) {
>>>>
>>>> Not sure if this is necessary.
>>>
>>> How sure are you that this isn't necessary?
>>>
>>>> From the commit log, your intention seems to be to make sure the queue is
>>>> woken up, but you wake up the queue immediately after this hunk in the same
>>>> function, so isn't that guaranteed without scheduling another NAPI?
>>>>
>>>
>>> The above code catches the case where the ptr_ring is empty and the
>>> tx_queue is stopped.  It feels wrong not to react in this case, but you
>>> *might* be right that it isn't strictly necessary, because the code below
>>> will also call netif_tx_wake_queue(), which *should* have an SKB stored
>>> that will *indirectly* trigger a restart of the NAPI.
>>
>> I'm a bit confused.
>> Wrt (3), what you want is to wake up the queue, right?
>> Or do you actually want NAPI to reschedule itself?
> 
> I want NAPI to reschedule itself; the queue is woken up later, close to
> the exit of the function.  Maybe it is unnecessary for NAPI to
> reschedule itself here... and that is what you are objecting to?
> 
>> My understanding was the former (waking up the queue).
>> If that's correct, (3) seems unnecessary because you already wake up the
>> queue in the same function.
>>
>> First NAPI
>>   veth_poll()
>>     // ptr_ring_empty() and queue_stopped()
>>    __napi_schedule() ... schedule second NAPI
>>    netif_tx_wake_queue() ... wake up the queue if queue_stopped()
>>
>> Second NAPI
>>   veth_poll()
>>    netif_tx_wake_queue() ... this is what you want,
>>                              but the queue has been woken up in the first NAPI
>>                              What's the point?
>>
> 
> So, yes I agree that there is a potential for restarting NAPI one time
> too many.  But only *potential* because if NAPI is already/still running
> then the producer will not actually start NAPI.
> 
> I guess this is a kind of optimization, to avoid the time it takes to
> restart NAPI. When we see that the TXQ is stopped and the ptr_ring is empty,
> then we know that a packet will be sitting in the qdisc requeue queue,
> and netif_tx_wake_queue() will very soon "produce" a packet into the
> ptr_ring (via calling ndo_start_xmit/veth_xmit).

In some cases it may be an optimization, but not in every case, because it
can start NAPI prematurely, before the TX side has filled in any packets?
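
A toy, single-threaded C model can make this trade-off concrete. It is only
a sketch under stated assumptions, not veth code: struct toy_veth, the toy_*
helpers, TOY_BUDGET and the napi_first ordering switch are all hypothetical
stand-ins for the ptr_ring, the peer TXQ and NAPI. check_stopped models the
extra (3) test at NAPI completion, and napi_first is an assumed, simplified
choice of whether a rescheduled poll runs before the qdisc that
netif_tx_wake_queue() schedules; real softirq ordering is more involved.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_BUDGET 64 /* hypothetical NAPI budget */

/* Hypothetical model state; not the kernel's data structures. */
struct toy_veth {
	int ring_len;      /* packets in the ptr_ring */
	int ring_cap;      /* ptr_ring capacity */
	int qdisc_backlog; /* packets held (or requeued) by the qdisc */
	bool txq_stopped;  /* peer TXQ stopped by the producer */
	bool qdisc_sched;  /* qdisc run pending after a TXQ wake */
	bool napi_sched;   /* NAPI poll pending */
	int polls;         /* poll invocations */
	int empty_polls;   /* polls that found the ring empty */
};

/* Producer: move packets from the qdisc into the ring until it is full. */
static void toy_qdisc_run(struct toy_veth *v)
{
	v->qdisc_sched = false;
	while (v->qdisc_backlog > 0 && !v->txq_stopped) {
		if (v->ring_len == v->ring_cap) {
			v->txq_stopped = true; /* packet stays in the qdisc */
			break;
		}
		v->ring_len++;
		v->qdisc_backlog--;
		v->napi_sched = true; /* producer kicks NAPI */
	}
}

/* Consumer: drain up to TOY_BUDGET packets, then complete or repoll. */
static void toy_poll(struct toy_veth *v, bool check_stopped)
{
	int done = v->ring_len < TOY_BUDGET ? v->ring_len : TOY_BUDGET;

	v->napi_sched = false;
	v->polls++;
	if (done == 0)
		v->empty_polls++;
	v->ring_len -= done;

	if (done == TOY_BUDGET)
		v->napi_sched = true; /* budget used up: poll again */
	else if (v->ring_len > 0 || (check_stopped && v->txq_stopped))
		v->napi_sched = true; /* completion re-check; (3) if check_stopped.
				       * ring_len > 0 needs a concurrent producer,
				       * which this model omits. */

	/* Wake the stopped TXQ; the qdisc only runs later. */
	if (v->txq_stopped) {
		v->txq_stopped = false;
		v->qdisc_sched = true;
	}
}

/* Run pending work until idle; napi_first is the assumed ordering. */
static void toy_run(struct toy_veth *v, bool check_stopped, bool napi_first)
{
	int steps = 0;

	while (v->napi_sched || v->qdisc_sched) {
		steps++;
		assert(steps < 100); /* the model must reach idle */
		if (v->napi_sched && (napi_first || !v->qdisc_sched))
			toy_poll(v, check_stopped);
		else
			toy_qdisc_run(v);
	}
}

/* 4-slot ring, 6 queued packets: the ring fills and the TXQ stops. */
static struct toy_veth toy_scenario(bool check_stopped, bool napi_first)
{
	struct toy_veth v = { .ring_cap = 4, .qdisc_backlog = 6 };

	toy_qdisc_run(&v);
	toy_run(&v, check_stopped, napi_first);
	return v;
}
```

In this model, toy_scenario(true, true) ends with 3 polls, one of them empty
(NAPI started before the TX side refilled the ring), while
toy_scenario(false, true) and toy_scenario(true, false) each need 2 polls with
none empty. In every variant the qdisc backlog drains to zero, so dropping (3)
does not by itself leave the TXQ stuck here.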

> As this is a fixes patch, I can drop this optimization. It seems both
> Paolo and you think this isn't necessary.

I think it's better to drop (3) as a fix.

Toshiaki Makita




Thread overview: 12+ messages
2025-10-27 20:05 [PATCH net V2 0/2] veth: Fix TXQ stall race condition and add recovery Jesper Dangaard Brouer
2025-10-27 20:05 ` [PATCH net V2 1/2] veth: enable dev_watchdog for detecting stalled TXQs Jesper Dangaard Brouer
2025-10-28  9:10   ` Toke Høiland-Jørgensen
2025-10-27 20:05 ` [PATCH net V2 2/2] veth: more robust handing of race to avoid txq getting stuck Jesper Dangaard Brouer
2025-10-28  9:10   ` Toke Høiland-Jørgensen
2025-10-28 14:56   ` Toshiaki Makita
2025-10-29 10:33     ` Jesper Dangaard Brouer
2025-10-29 15:00       ` Toshiaki Makita
2025-10-30 19:06         ` Jesper Dangaard Brouer
2025-11-03  8:41           ` Toshiaki Makita [this message]
2025-10-30 12:28   ` Paolo Abeni
2025-11-05 15:54     ` Jesper Dangaard Brouer
