Re: [PATCH net V1 1/3] veth: enable dev_watchdog for detecting stalled TXQs

BPF List
 help / color / mirror / Atom feed

From: Jesper Dangaard Brouer <hawk@kernel.org>
To: "Toke Høiland-Jørgensen" <toke@redhat.com>,
	netdev@vger.kernel.org, makita.toshiaki@lab.ntt.co.jp
Cc: Eric Dumazet <eric.dumazet@gmail.com>,
	"David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	ihor.solodrai@linux.dev, toshiaki.makita1@gmail.com,
	bpf@vger.kernel.org, linux-kernel@vger.kernel.org,
	kernel-team@cloudflare.com
Subject: Re: [PATCH net V1 1/3] veth: enable dev_watchdog for detecting stalled TXQs
Date: Mon, 27 Oct 2025 17:18:43 +0100	[thread overview]
Message-ID: <c8ea44e0-e8f5-4356-af80-efa7f56921df@kernel.org> (raw)
In-Reply-To: <87v7k0e8qz.fsf@toke.dk>



On 27/10/2025 15.09, Toke Høiland-Jørgensen wrote:
> Jesper Dangaard Brouer <hawk@kernel.org> writes:
> 
>> On 24/10/2025 15.39, Toke Høiland-Jørgensen wrote:
>>> Jesper Dangaard Brouer <hawk@kernel.org> writes:
>>>
>>>> The changes introduced in commit dc82a33297fc ("veth: apply qdisc
>>>> backpressure on full ptr_ring to reduce TX drops") have been found to cause
>>>> a race condition in production environments.
>>>>
>>>> Under specific circumstances, observed exclusively on ARM64 (aarch64)
>>>> systems with Ampere Altra Max CPUs, a transmit queue (TXQ) can become
>>>> permanently stalled. This happens when the race condition leads to the TXQ
>>>> entering the QUEUE_STATE_DRV_XOFF state without a corresponding queue wake-up,
>>>> preventing the attached qdisc from dequeueing packets and causing the
>>>> network link to halt.
>>>>
>>>> As a first step towards resolving this issue, this patch introduces a
>>>> failsafe mechanism. It enables the net device watchdog by setting a timeout
>>>> value and implements the .ndo_tx_timeout callback.
>>>>
>>>> If a TXQ stalls, the watchdog will trigger the veth_tx_timeout() function,
>>>> which logs a warning and calls netif_tx_wake_queue() to unstall the queue
>>>> and allow traffic to resume.
>>>>
>>>> The log message will look like this:
>>>>
>>>>    veth42: NETDEV WATCHDOG: CPU: 34: transmit queue 0 timed out 5393 ms
>>>>    veth42: veth backpressure stalled(n:1) TXQ(0) re-enable
>>>>
>>>> This provides a necessary recovery mechanism while the underlying race
>>>> condition is investigated further. Subsequent patches will address the root
>>>> cause and add more robust state handling in ndo_open/ndo_stop.
>>>>
>>>> Fixes: dc82a33297fc ("veth: apply qdisc backpressure on full ptr_ring to reduce TX drops")
>>>> Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>
>>>> ---
>>>>    drivers/net/veth.c |   16 +++++++++++++++-
>>>>    1 file changed, 15 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/net/veth.c b/drivers/net/veth.c
>>>> index a3046142cb8e..7b1a9805b270 100644
>>>> --- a/drivers/net/veth.c
>>>> +++ b/drivers/net/veth.c
>>>> @@ -959,8 +959,10 @@ static int veth_xdp_rcv(struct veth_rq *rq, int budget,
>>>>    	rq->stats.vs.xdp_packets += done;
>>>>    	u64_stats_update_end(&rq->stats.syncp);
>>>>    
>>>> -	if (peer_txq && unlikely(netif_tx_queue_stopped(peer_txq)))
>>>> +	if (peer_txq && unlikely(netif_tx_queue_stopped(peer_txq))) {
>>>> +		txq_trans_cond_update(peer_txq);
>>>>    		netif_tx_wake_queue(peer_txq);
>>>> +	}
>>>
>>> Hmm, seems a bit weird that this call to txq_trans_cond_update() is only
>>> in veth_xdp_recv(). Shouldn't there (also?) be one in veth_xmit()?
>>>
>>
>> The veth_xmit() call (indirectly) *do* update the txq_trans start
>> timestamp, but only for return code NET_RX_SUCCESS / NETDEV_TX_OK.
>> As .ndo_start_xmit = veth_xmit and netdev_start_xmit[1] will call
>> txq_trans_update on NETDEV_TX_OK.
> 
> Ah, right; didn't think of checking the caller, thanks for the pointer :)
> 
>> This call to txq_trans_cond_update() isn't strictly necessary, as
>> veth_xmit() call will update it later, and the netif_tx_stop_queue()
>> call also updates trans_start.
>>
>> I primarily added it because other drivers that use BQL have their
>> helper functions update txq_trans.  As I see the veth implementation as
>> a simplified BQL, that we hopefully can extend to become more dynamic
>> like BQL.
>>
>> Do you prefer that I remove this?  (call to txq_trans_cond_update)
> 
> Hmm, don't we need it for the XDP path? I.e., if there's no traffic
> other than XDP_REDIRECT traffic, ndo_start_xmit() will not get called,
> so we need some way other to keep the watchdog from firing, I think?
> 

Yes, perhaps you are right.  Even-though the stop call
netif_tx_stop_queue() also updates the txq_trans start, then with XDP
redirect something else can keep the ptr_ring full.  The
netif_tx_wake_queue() call doesn't update txq_trans itself (it depend on
a successful netstack packet).  So, without this txq_trans update it can
get very old (out-of-date) if we starve normal network stack packets.
  I'm not 100% sure this will trigger a watchdog even, as the queue
stopped bit should have been cleared.  It is might worth keeping to
avoid it gets too much out-of-date due to XDP traffic.

--Jesper

next prev parent reply	other threads:[~2025-10-27 16:18 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-23 14:59 [PATCH net V1 0/3] veth: Fix TXQ stall race condition and add recovery Jesper Dangaard Brouer
2025-10-23 14:59 ` [PATCH net V1 1/3] veth: enable dev_watchdog for detecting stalled TXQs Jesper Dangaard Brouer
2025-10-24 13:39   ` Toke Høiland-Jørgensen
2025-10-27 11:41     ` Jesper Dangaard Brouer
2025-10-27 14:09       ` Toke Høiland-Jørgensen
2025-10-27 16:18         ` Jesper Dangaard Brouer [this message]
2025-10-23 14:59 ` [PATCH net V1 2/3] veth: stop and start all TX queue in netdev down/up Jesper Dangaard Brouer
2025-10-25  0:54   ` Jakub Kicinski
2025-10-27 10:33     ` Jesper Dangaard Brouer
2025-10-23 14:59 ` [PATCH net V1 3/3] veth: more robust handing of race to avoid txq getting stuck Jesper Dangaard Brouer
2025-10-24 14:33   ` Toke Høiland-Jørgensen
2025-10-27 12:19     ` Jesper Dangaard Brouer
2025-10-27 14:12       ` Toke Høiland-Jørgensen
2025-10-27 19:22   ` Jesper Dangaard Brouer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c8ea44e0-e8f5-4356-af80-efa7f56921df@kernel.org \
    --to=hawk@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=davem@davemloft.net \
    --cc=eric.dumazet@gmail.com \
    --cc=ihor.solodrai@linux.dev \
    --cc=kernel-team@cloudflare.com \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=makita.toshiaki@lab.ntt.co.jp \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=toke@redhat.com \
    --cc=toshiaki.makita1@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox