From: Breno Leitao <leitao@debian.org>
To: Tariq Toukan <ttoukan.linux@gmail.com>
Cc: Saeed Mahameed <saeedm@nvidia.com>,
Tariq Toukan <tariqt@nvidia.com>, Mark Bloch <mbloch@nvidia.com>,
Leon Romanovsky <leon@kernel.org>,
Andrew Lunn <andrew+netdev@lunn.ch>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>,
Paolo Abeni <pabeni@redhat.com>, Amir Vadai <amirv@mellanox.com>,
netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
linux-kernel@vger.kernel.org, dcostantino@meta.com,
rneu@meta.com, kernel-team@meta.com
Subject: Re: [PATCH net] net/mlx5e: Skip NAPI polling when PCI channel is offline
Date: Wed, 11 Feb 2026 05:44:48 -0800
Message-ID: <aYyHNGBPu0dEIEzS@gmail.com>
In-Reply-To: <09a77964-37bf-4b3c-bfa9-8939eb7761ab@gmail.com>

Hello Tariq,

On Wed, Feb 11, 2026 at 01:26:35PM +0200, Tariq Toukan wrote:
> On 09/02/2026 20:01, Breno Leitao wrote:
> > When a PCI error (e.g. AER error or DPC containment) marks the PCI
> > channel as frozen or permanently failed, the IOMMU mappings for the
> > device may already be torn down. If mlx5e_napi_poll() continues
> > processing CQEs in this state, every call to dma_unmap_page() triggers
> > a WARN_ON in iommu_dma_unmap_phys().
> >
> > In a real-world crash scenario on an NVIDIA Grace (ARM64) platform,
> > a DPC event froze the PCI channel and the mlx5 NAPI poll continued
> > processing error CQEs, calling dma_unmap for each pending WQE. Here is
> > an example:
> >
> > The DPC event on port 0007:00:00.0 fires and eth1 (on 0017:01:00.0) starts
> > seeing error CQEs almost immediately:
> >
> > pcieport 0007:00:00.0: DPC: containment event, status:0x2009
> > mlx5_core 0017:01:00.0 eth1: Error cqe on cqn 0x54e, ci 0xb06, ...
> >
> > The WARN_ON storm begins ~0.4s later and repeats for every pending WQE:
> >
> > WARNING: CPU: 32 PID: 0 at drivers/iommu/dma-iommu.c:1237 iommu_dma_unmap_phys
> > Call trace:
> > iommu_dma_unmap_phys+0xd4/0xe0
> > mlx5e_tx_wi_dma_unmap+0xb4/0xf0
> > mlx5e_poll_tx_cq+0x14c/0x438
> > mlx5e_napi_poll+0x6c/0x5e0
> > net_rx_action+0x160/0x5c0
> > handle_softirqs+0xe8/0x320
> > run_ksoftirqd+0x30/0x58
> >
> > After 23 seconds of WARN_ON() storm, the watchdog fires:
> >
> > watchdog: BUG: soft lockup - CPU#32 stuck for 23s! [ksoftirqd/32:179]
> > Kernel panic - not syncing: softlockup: hung tasks
> >
> > Each unmap hit the WARN_ON in the IOMMU layer, printing a full stack
> > trace. With dozens of pending WQEs, this created a storm of WARN_ON
> > dumps in softirq context that monopolized the CPU for over 23 seconds,
> > triggering a soft lockup panic.
...
> You're introducing an interesting problem, but I am not convinced by this
> solution approach.
>
> Why would the driver perform this check if it doesn't guarantee prevention
> of invalid access? It only "allows one NAPI cycle", which happens to be good
> enough to prevent the soft lockup in your case.
>
> What if a NAPI cycle is configured with a larger budget?

Very good point. In that case, we would still see some WARN_ON() splats from
the DMA layer, and the patch might not help much if the AER error hits in the
middle of a NAPI poll while a large budget still remains.
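
To make the concern concrete, here is a minimal userspace sketch of a poll
cycle that checks the offline flag only on entry. It is a model under stated
assumptions, not the mlx5e code: `struct chan_model`, `poll_cycle()` and the
`fail_after` knob are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a device channel; none of these names come from mlx5e. */
struct chan_model {
    bool offline;        /* set once the modeled PCI error hits */
    unsigned pending;    /* completions still waiting to be processed */
    unsigned fail_after; /* the error hits after this many completions overall */
    unsigned handled;    /* completions processed so far */
};

/*
 * One modeled poll cycle that tests the offline flag only on entry.
 * Returns how many completions were processed after the channel went
 * offline, i.e. how many unmaps would have hit the warning.
 */
static unsigned poll_cycle(struct chan_model *c, unsigned budget)
{
    unsigned done = 0, bad = 0;

    assert(budget > 0);

    /* Entry check: skip the whole cycle if the channel is already offline. */
    if (c->offline)
        return 0;

    while (done < budget && c->pending > 0) {
        /* The error can land in the middle of a cycle. */
        if (c->handled == c->fail_after)
            c->offline = true;
        if (c->offline)
            bad++;
        c->pending--;
        c->handled++;
        done++;
    }
    return bad;
}
```

In this model, with 1000 pending completions and the error landing after 4 of
them, a cycle with a budget of 64 still processes 60 completions while offline,
and a cycle with a budget of 512 processes 508. Only the following cycle is
skipped, so the number of warnings scales with the budget.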

> If the problem is that the WARN_ON is being called at a high rate, then it
> should be rate-limited.

That would work as well, and I am happy to pursue it if that approach is more
appropriate.
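
For reference, the rate-limiting idea can be sketched in userspace as an
interval/burst limiter. This is only a model of the general technique, similar
in spirit to what the kernel's WARN_RATELIMIT() helper provides, and not the
kernel's implementation. `struct rl_state`, `rl_allow()` and
`simulate_unmap_storm()` are hypothetical names, and the interval and burst
values are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model of an interval/burst rate limiter. */
struct rl_state {
    uint64_t interval_ms;  /* length of one window */
    unsigned burst;        /* reports allowed per window */
    uint64_t window_start; /* start of the current window, in ms */
    unsigned printed;      /* reports emitted in the current window */
    unsigned missed;       /* reports suppressed in the current window */
};

/* Returns true if the caller may emit a report at time now_ms. */
static bool rl_allow(struct rl_state *rl, uint64_t now_ms)
{
    assert(rl->interval_ms > 0);

    /* Open a new window once the previous one has expired. */
    if (now_ms - rl->window_start >= rl->interval_ms) {
        rl->window_start = now_ms;
        rl->printed = 0;
        rl->missed = 0;
    }
    if (rl->printed < rl->burst) {
        rl->printed++;
        return true;
    }
    rl->missed++;
    return false;
}

/*
 * Model n_wqes failing unmaps that all happen at time now_ms.
 * Returns how many of them would actually emit a report.
 */
static unsigned simulate_unmap_storm(struct rl_state *rl, unsigned n_wqes,
                                     uint64_t now_ms)
{
    unsigned emitted = 0;

    for (unsigned i = 0; i < n_wqes; i++)
        if (rl_allow(rl, now_ms))
            emitted++;
    return emitted;
}
```

With a 5000 ms interval and a burst of 10, a storm of 64 failing unmaps emits
10 reports and suppresses the other 54, so the cost of the stack dumps is
bounded regardless of how many WQEs are pending.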

Thanks for reviewing it,
--breno