From: Toshiaki Makita <toshiaki.makita1@gmail.com>
To: Jesper Dangaard Brouer <hawk@kernel.org>
Cc: "Eric Dumazet" <eric.dumazet@gmail.com>,
"David S. Miller" <davem@davemloft.net>,
"Jakub Kicinski" <kuba@kernel.org>,
"Paolo Abeni" <pabeni@redhat.com>,
ihor.solodrai@linux.dev, "Michael S. Tsirkin" <mst@redhat.com>,
makita.toshiaki@lab.ntt.co.jp, bpf@vger.kernel.org,
linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, kernel-team@cloudflare.com,
"Toke Høiland-Jørgensen" <toke@toke.dk>,
netdev@vger.kernel.org
Subject: Re: [PATCH net V3 2/2] veth: more robust handing of race to avoid txq getting stuck
Date: Thu, 6 Nov 2025 23:14:11 +0900
Message-ID: <4abd5327-ccb7-4dbc-9b09-e98069312e2f@gmail.com>
In-Reply-To: <176236369968.30034.1538535221816777531.stgit@firesoul>
On 2025/11/06 2:28, Jesper Dangaard Brouer wrote:
> Commit dc82a33297fc ("veth: apply qdisc backpressure on full ptr_ring to
> reduce TX drops") introduced a race condition that can lead to a permanently
> stalled TXQ. This was observed in production on ARM64 systems (Ampere Altra
> Max).
>
> The race occurs in veth_xmit(). The producer observes a full ptr_ring and
> stops the queue (netif_tx_stop_queue()). The subsequent conditional logic,
> intended to re-wake the queue if the consumer had just emptied it (if
> (__ptr_ring_empty(...)) netif_tx_wake_queue()), can fail. This leads to a
> "lost wakeup" where the TXQ remains stopped (QUEUE_STATE_DRV_XOFF) and
> traffic halts.
>
> This failure is caused by an incorrect use of the __ptr_ring_empty() API
> from the producer side. As noted in kernel comments, this check is not
> guaranteed to be correct if a consumer is operating on another CPU. The
> empty test is based on ptr_ring->consumer_head, making it reliable only for
> the consumer. Using this check from the producer side is fundamentally racy.
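To make the failure mode concrete, here is a small user-space model of the stop-then-recheck pattern described above. It is only a sketch: struct model, old_xmit_on_full() and the producer_sees_empty parameter are hypothetical names, standing in for the TXQ state and for whatever __ptr_ring_empty() happens to return on the producer CPU. It does not reproduce the real veth code or any memory-ordering behaviour.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state, not kernel structures. */
struct model {
	bool txq_stopped;   /* stands in for QUEUE_STATE_DRV_XOFF */
	bool napi_pending;  /* true if the consumer will run again */
};

/*
 * Old producer logic on a full ring: stop the queue, then wake it again
 * only if the producer-side emptiness check reports "empty".
 */
static void old_xmit_on_full(struct model *m, bool producer_sees_empty)
{
	assert(!m->txq_stopped);  /* producer only transmits on a running queue */
	m->txq_stopped = true;
	if (producer_sees_empty)
		m->txq_stopped = false;
}

/* A stopped queue is stuck if no consumer run is pending to wake it. */
static bool txq_stuck(const struct model *m)
{
	return m->txq_stopped && !m->napi_pending;
}

/*
 * Interleaving: the consumer has drained the ring and gone idle between
 * the producer's "ring full" observation and its stop/recheck sequence.
 */
static bool lost_wakeup(bool producer_sees_empty)
{
	struct model m = { .txq_stopped = false, .napi_pending = false };

	old_xmit_on_full(&m, producer_sees_empty);
	return txq_stuck(&m);
}
```

In this model lost_wakeup(false) reports a stuck queue: if the producer-side check wrongly says "not empty" after the consumer has already gone idle, nothing is left to wake the TXQ.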
>
> This patch fixes the race by adopting the more robust logic from an earlier
> version (V4) of the patchset, which always flushed the peer:
>
> (1) In veth_xmit(), the racy conditional wake-up logic and its memory barrier
> are removed. Instead, after stopping the queue, we unconditionally call
> __veth_xdp_flush(rq). This guarantees that the NAPI consumer is scheduled,
> making it solely responsible for re-waking the TXQ.
> This handles the race where veth_poll() consumes all packets and completes
> NAPI *before* veth_xmit() on the producer side has called netif_tx_stop_queue.
> The __veth_xdp_flush(rq) will observe rx_notify_masked is false and schedule
> NAPI.
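The interleaving in (1) can be walked through with a small user-space model. This is a sketch under assumptions: struct model and the model_*() helpers are hypothetical stand-ins, model_flush() only mirrors the "schedule NAPI unless rx_notify_masked is set" behaviour described above, and the real barriers and ring handling are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state, not kernel structures. */
struct model {
	bool txq_stopped;
	bool rx_notify_masked;
	bool napi_scheduled;
};

/* Stand-in for __veth_xdp_flush(): schedule NAPI unless already masked. */
static void model_flush(struct model *m)
{
	if (!m->rx_notify_masked) {
		m->rx_notify_masked = true;
		m->napi_scheduled = true;
	}
}

/* New producer logic on a full ring: stop the queue, then always flush. */
static void model_xmit_on_full(struct model *m)
{
	m->txq_stopped = true;
	model_flush(m);
}

/* Stand-in for one veth_poll() run that drains the ring and completes. */
static void model_poll(struct model *m)
{
	assert(m->napi_scheduled);
	m->napi_scheduled = false;
	m->rx_notify_masked = false;  /* NAPI completion */
	if (m->txq_stopped)           /* wake check at the end of the poll */
		m->txq_stopped = false;
}

/* Race (1): the consumer is already idle when the producer stops the TXQ. */
static bool race1_txq_running(void)
{
	struct model m = { false, false, false };

	model_xmit_on_full(&m);
	assert(m.napi_scheduled);     /* the flush restarted the consumer */
	model_poll(&m);
	return !m.txq_stopped;
}
```

Because the flush is unconditional, the producer never has to decide whether the ring is empty; race1_txq_running() ends with the queue running because the rescheduled poll does the wake-up.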
>
> (2) On the consumer side, the logic for waking the peer TXQ is moved out of
> veth_xdp_rcv() and placed at the end of the veth_poll() function. This
> placement is part of fixing the race, as the netif_tx_queue_stopped() check
> must occur after rx_notify_masked is potentially set to false during NAPI
> completion.
> This handles the race where veth_poll() has consumed all packets but has not
> yet finished (rx_notify_masked is still true). The producer veth_xmit() stops
> the TXQ, and __veth_xdp_flush(rq) observes that rx_notify_masked is true, so
> it does not schedule NAPI. Then veth_poll() sets rx_notify_masked to false
> and completes NAPI. Before exiting, veth_poll() observes that the TXQ is
> stopped and wakes it up.
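The ordering requirement in (2) can be checked the same way. Again this is only a sketch with hypothetical names (struct model, model_flush(), model_wake_if_stopped(), and the check_after_completion switch); it contrasts a wake check done before NAPI completion with one done after it, and ignores the ring re-check that a real poll function would also perform.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state, not kernel structures. */
struct model {
	bool txq_stopped;
	bool rx_notify_masked;
};

/* Stand-in for __veth_xdp_flush(); returns true if NAPI was scheduled. */
static bool model_flush(struct model *m)
{
	if (m->rx_notify_masked)
		return false;
	m->rx_notify_masked = true;
	return true;
}

/* Consumer-side check: wake the peer TXQ if it is stopped. */
static void model_wake_if_stopped(struct model *m)
{
	if (m->txq_stopped)
		m->txq_stopped = false;
}

/* Race (2): the producer stops the TXQ while the poll is still running. */
static bool race2_txq_running(bool check_after_completion)
{
	/* Poll is mid-run: ring drained, NAPI not yet completed. */
	struct model m = { .txq_stopped = false, .rx_notify_masked = true };
	bool scheduled;

	if (!check_after_completion)
		model_wake_if_stopped(&m);  /* early check: nothing to wake yet */

	/* Producer: the ring looked full, so stop the TXQ and flush. */
	m.txq_stopped = true;
	scheduled = model_flush(&m);
	assert(!scheduled);                 /* masked, so NAPI is not restarted */

	/* Consumer: NAPI completion unmasks notifications. */
	m.rx_notify_masked = false;

	if (check_after_completion)
		model_wake_if_stopped(&m);  /* late check sees the stopped TXQ */

	return !m.txq_stopped;
}
```

With the check after completion the model ends with the TXQ running; with the early check it ends stopped, which is the case the placement at the end of veth_poll() is meant to cover.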
>
> Fixes: dc82a33297fc ("veth: apply qdisc backpressure on full ptr_ring to reduce TX drops")
> Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>
Reviewed-by: Toshiaki Makita <toshiaki.makita1@gmail.com>