bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jakub Kicinski <kuba@kernel.org>
To: Jesper Dangaard Brouer <hawk@kernel.org>
Cc: netdev@vger.kernel.org, "Toke Høiland-Jørgensen" <toke@toke.dk>,
	"Eric Dumazet" <eric.dumazet@gmail.com>,
	"David S. Miller" <davem@davemloft.net>,
	"Paolo Abeni" <pabeni@redhat.com>,
	ihor.solodrai@linux.dev, "Michael S. Tsirkin" <mst@redhat.com>,
	makita.toshiaki@lab.ntt.co.jp, toshiaki.makita1@gmail.com,
	bpf@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, kernel-team@cloudflare.com
Subject: Re: [PATCH net V3 1/2] veth: enable dev_watchdog for detecting stalled TXQs
Date: Fri, 7 Nov 2025 17:54:45 -0800	[thread overview]
Message-ID: <20251107175445.58eba452@kernel.org> (raw)
In-Reply-To: <b9f01e64-f7cc-4f5a-9716-5767b37e2245@kernel.org>

On Fri, 7 Nov 2025 14:42:58 +0100 Jesper Dangaard Brouer wrote:
> > I think this belongs in net-next.. Fail safe is not really a bug fix.
> > I'm slightly worried we're missing a corner case and will cause
> > timeouts to get printed for someone's config.
> 
> This is a recovery fix.  If the race condition fix isn't 100% then this
> patch will allow veth to recover.  Thus, to me it makes sense to group
> these two patches together.
> 
> I'm more worried that we we're missing a corner case that we cannot
> recover from. Than triggering timeouts to get printed, for a config
> where NAPI consumer veth_poll() takes more that 5 seconds to run (budget
> max 64 packets this needs to consume packets at a rate less than 12.8
> pps). It might be good to get some warnings if the system is operating
> this slow.
> 
> Also remember this is not the default config that most people use.
> The code is only activated if attaching a qdisc to veth, which isn't
> default. Plus, NAPI mode need to be activated, where in normal NAPI mode
> the producer and consumer usually runs on the same CPU, which makes it
> impossible to overflow the ptr_ring.  The veth backpressure is primarily
> needed when running with threaded-NAPI, where it is natural that
> producer and consumer runs on different CPUs. In our production setup
> the consumer is always slower than the producer (as the product inside
> the namespace have installed too many nftables rules).

I understand all of this, but IMO the fix is in patch 2.
This is a resiliency improvement, not a fix.

> >> +static void veth_tx_timeout(struct net_device *dev, unsigned int txqueue)
> >> +{
> >> +	struct netdev_queue *txq = netdev_get_tx_queue(dev, txqueue);
> >> +
> >> +	netdev_err(dev, "veth backpressure stalled(n:%ld) TXQ(%u) re-enable\n",
> >> +		   atomic_long_read(&txq->trans_timeout), txqueue);  
> > 
> > If you think the trans_timeout is useful, let's add it to the message
> > core prints? And then we can make this msg just veth specific, I don't
> > think we should be repeating what core already printed.  
> 
> The trans_timeout is a counter for how many times this TXQ have seen a
> timeout.  It is practical as it directly tell us if this a frequent
> event (without having to search log files for similar events).
> 
> It does make sense to add this to the core message ("NETDEV WATCHDOG")
> with the same argument.  For physical NICs these logs are present in
> production. Looking at logs through Kibana (right now) and it would make
> my life easier to see the number of times the individual queues have
> experienced timeouts.  The logs naturally gets spaced in time by the
> timeout, making it harder to tell the even frequency. Such a patch would
> naturally go though net-next.

Right, I see how it'd be useful. I think it's worth adding in the core.

> Do you still want me to remove the frequency counter from this message?
> By the same argument it is practical for me to have as a single log line
> when troubleshooting this in practice. 

IOW it makes it easier to query logs for veth timeouts vs non-veth
timeouts? I'm tempted to suggest adding driver name to the logs in
the core :) but it's fine, I'm already asking you to add the timeout
count in the core.

I'm just trying to make sure everyone can benefit from the good ideas,
rather than hiding them in one driver.

> BTW, I've already backported this watchdog patch to prod kernel
> (without race fix) and I'll try to reproduce the race in staging/lab
> on some ARM64 servers.  If I reproduce it will be practical to have
> this counter.

  reply	other threads:[~2025-11-08  1:54 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-11-05 17:28 [PATCH net V3 0/2] veth: Fix TXQ stall race condition and add recovery Jesper Dangaard Brouer
2025-11-05 17:28 ` [PATCH net V3 1/2] veth: enable dev_watchdog for detecting stalled TXQs Jesper Dangaard Brouer
2025-11-07  1:29   ` Jakub Kicinski
2025-11-07 13:42     ` Jesper Dangaard Brouer
2025-11-08  1:54       ` Jakub Kicinski [this message]
2025-11-05 17:28 ` [PATCH net V3 2/2] veth: more robust handing of race to avoid txq getting stuck Jesper Dangaard Brouer
2025-11-06 14:14   ` Toshiaki Makita

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20251107175445.58eba452@kernel.org \
    --to=kuba@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=davem@davemloft.net \
    --cc=eric.dumazet@gmail.com \
    --cc=hawk@kernel.org \
    --cc=ihor.solodrai@linux.dev \
    --cc=kernel-team@cloudflare.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=makita.toshiaki@lab.ntt.co.jp \
    --cc=mst@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=toke@toke.dk \
    --cc=toshiaki.makita1@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).