Re: [PATCH net V1 3/3] veth: more robust handing of race to avoid txq getting stuck

From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: Jesper Dangaard Brouer <hawk@kernel.org>,
	netdev@vger.kernel.org, makita.toshiaki@lab.ntt.co.jp
Cc: Eric Dumazet <eric.dumazet@gmail.com>,
	"David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	ihor.solodrai@linux.dev, toshiaki.makita1@gmail.com,
	bpf@vger.kernel.org, linux-kernel@vger.kernel.org,
	kernel-team@cloudflare.com
Subject: Re: [PATCH net V1 3/3] veth: more robust handing of race to avoid txq getting stuck
Date: Mon, 27 Oct 2025 15:12:48 +0100
Message-ID: <87sef4e8m7.fsf@toke.dk>
In-Reply-To: <b021f5c3-5105-445d-b919-8282363a19fc@kernel.org>

Jesper Dangaard Brouer <hawk@kernel.org> writes:

> On 24/10/2025 16.33, Toke Høiland-Jørgensen wrote:
>> Jesper Dangaard Brouer <hawk@kernel.org> writes:
>> 
>>> Commit dc82a33297fc ("veth: apply qdisc backpressure on full ptr_ring to
>>> reduce TX drops") introduced a race condition that can lead to a permanently
>>> stalled TXQ. This was observed in production on ARM64 systems (Ampere Altra
>>> Max).
>>>
>>> The race occurs in veth_xmit(). The producer observes a full ptr_ring and
>>> stops the queue (netif_tx_stop_queue()). The subsequent conditional logic,
>>> intended to re-wake the queue if the consumer had just emptied it (if
>>> (__ptr_ring_empty(...)) netif_tx_wake_queue()), can fail. This leads to a
>>> "lost wakeup" where the TXQ remains stopped (QUEUE_STATE_DRV_XOFF) and
>>> traffic halts.
>>>
>>> This failure is caused by an incorrect use of the __ptr_ring_empty() API
>>> from the producer side. As noted in kernel comments, this check is not
>>> guaranteed to be correct if a consumer is operating on another CPU. The
>>> empty test is based on ptr_ring->consumer_head, making it reliable only for
>>> the consumer. Using this check from the producer side is fundamentally racy.
>>>
>>> This patch fixes the race by adopting the more robust logic from an earlier
>>> version V4 of the patchset, which always flushed the peer:
>>>
>>> (1) In veth_xmit(), the racy conditional wake-up logic and its memory barrier
>>> are removed. Instead, after stopping the queue, we unconditionally call
>>> __veth_xdp_flush(rq). This guarantees that the NAPI consumer is scheduled,
>>> making it solely responsible for re-waking the TXQ.
>> 
>> This makes sense.
>> 
>>> (2) On the consumer side, the logic for waking the peer TXQ is centralized.
>>> It is moved out of veth_xdp_rcv() (which processes a batch) and placed at
>>> the end of the veth_poll() function. This ensures netif_tx_wake_queue() is
>>> called once per complete NAPI poll cycle.
>> 
>> So is this second point strictly necessary to fix the race, or is it
>> more of an optimisation?
>> 
>
> IMHO it is strictly necessary to fix the race.  The wakeup check
> netif_tx_queue_stopped() in veth_poll() needs to be after the code that
> (potentially) writes rx_notify_masked.
>
> This handles the race where veth_xmit() hasn't called
> netif_tx_stop_queue() yet, but veth_poll() manages to consume all packets
> and stop NAPI.  Then we know that __veth_xdp_flush(rq) in veth_xmit()
> will see rx_notify_masked==false and start NAPI/veth_poll() again, and
> even though there are no packets left to process we still hit the check
> netif_tx_queue_stopped(), which starts the txq and allows veth_xmit() to
> run again.
>
> I'll see if I can improve the description for (2).

Right, okay. Yes, adding this reasoning to the commit message would be
good :)

-Toke
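
The ordering argument quoted above can be sketched as a small
single-threaded user-space model. This is only a toy under stated
assumptions, not the veth code: struct toy_rq, every toy_*() helper and
the two-phase split of the poll are hypothetical stand-ins, and plain
sequential steps replace real concurrency and memory barriers. The
"stop lands mid-poll" interleaving is an assumed illustration of why
the wake check should sit after the rx_notify_masked write; the mail
itself only spells out the case where the poll has already finished.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for one veth rx queue plus the peer TXQ that feeds it. */
struct toy_rq {
	int  ring_len;         /* packets sitting in the ring */
	bool txq_stopped;      /* models the peer TXQ being stopped */
	bool rx_notify_masked; /* true while NAPI is scheduled or running */
	bool napi_scheduled;   /* a poll is pending */
};

/* Models __veth_xdp_flush(): schedule NAPI unless notifications are masked. */
static void toy_flush(struct toy_rq *rq)
{
	if (!rq->rx_notify_masked) {
		rq->rx_notify_masked = true;
		rq->napi_scheduled = true;
	}
}

/* Producer after it saw a full ring: stop the TXQ, then always flush. */
static void toy_xmit_stop_and_flush(struct toy_rq *rq)
{
	rq->txq_stopped = true;
	toy_flush(rq);
}

static void toy_wake_if_stopped(struct toy_rq *rq)
{
	if (rq->txq_stopped)
		rq->txq_stopped = false;
}

/* Poll phase 1: drain the ring; optionally do the wake check here. */
static void toy_poll_drain(struct toy_rq *rq, bool wake_in_rcv)
{
	rq->napi_scheduled = false;
	rq->ring_len = 0;
	if (wake_in_rcv)
		toy_wake_if_stopped(rq);
}

/* Poll phase 2: complete NAPI, unmask, re-arm if needed, then wake check. */
static void toy_poll_complete(struct toy_rq *rq, bool wake_at_end)
{
	assert(rq->rx_notify_masked); /* a running poll implies masked */
	rq->rx_notify_masked = false;
	if (rq->ring_len > 0)
		toy_flush(rq);
	if (wake_at_end)
		toy_wake_if_stopped(rq);
}

/*
 * Replays one interleaving and returns true if the TXQ ends up stalled.
 * stop_mid_poll: the producer stops the TXQ between the two poll phases
 *                instead of after the poll has fully completed.
 * wake_at_end:   the wake check runs after the rx_notify_masked write
 *                rather than inside the drain phase.
 */
static bool toy_run(bool stop_mid_poll, bool wake_at_end)
{
	struct toy_rq rq = {
		.ring_len = 4,            /* ring starts out full */
		.rx_notify_masked = true, /* a poll is already scheduled */
		.napi_scheduled = true,
	};

	toy_poll_drain(&rq, !wake_at_end);
	if (stop_mid_poll)
		toy_xmit_stop_and_flush(&rq);
	toy_poll_complete(&rq, wake_at_end);
	if (!stop_mid_poll)
		toy_xmit_stop_and_flush(&rq);

	/* Run whatever poll the flush managed to schedule. */
	while (rq.napi_scheduled) {
		toy_poll_drain(&rq, !wake_at_end);
		toy_poll_complete(&rq, wake_at_end);
	}
	return rq.txq_stopped;
}
```

In this model toy_run(false, true) replays the case described in the
quoted text: the flush sees rx_notify_masked == false, schedules an
empty poll, and that poll wakes the TXQ. toy_run(true, false) is the
only combination here that leaves the TXQ stopped, because the flush is
suppressed while the mask is still set and no later wake check runs.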
