From: Breno Leitao <leitao@debian.org>
To: Jijie Shao <shaojijie@huawei.com>
Cc: Saeed Mahameed <saeedm@nvidia.com>,
	Tariq Toukan <tariqt@nvidia.com>,  Mark Bloch <mbloch@nvidia.com>,
	Leon Romanovsky <leon@kernel.org>,
	 Andrew Lunn <andrew+netdev@lunn.ch>,
	"David S. Miller" <davem@davemloft.net>,
	 Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>,
	 Paolo Abeni <pabeni@redhat.com>, Amir Vadai <amirv@mellanox.com>,
	netdev@vger.kernel.org,  linux-rdma@vger.kernel.org,
	linux-kernel@vger.kernel.org, dcostantino@meta.com,
	 rneu@meta.com, kernel-team@meta.com
Subject: Re: [PATCH net] net/mlx5e: Skip NAPI polling when PCI channel is offline
Date: Tue, 10 Feb 2026 07:18:43 -0800
Message-ID: <aYtIrl01U0uHo7RP@gmail.com>
In-Reply-To: <49fe0af5-7dcf-42e0-bd73-0bd42c067d26@huawei.com>

On Tue, Feb 10, 2026 at 10:19:46AM +0800, Jijie Shao wrote:
> 
> on 2026/2/10 2:01, Breno Leitao wrote:
> > When a PCI error (e.g. AER error or DPC containment) marks the PCI
> > channel as frozen or permanently failed, the IOMMU mappings for the
> > device may already be torn down. If mlx5e_napi_poll() continues
> > processing CQEs in this state, every call to dma_unmap_page() triggers
> > a WARN_ON in iommu_dma_unmap_phys().
> 
> Hi:
>   My comment has nothing to do with the changes made in this patch itself.
> 
> 
> I am more interested in this error itself.
> 1. If there is an issue with dma_unmap, does dma_map in TX have a similar problem?

I suspect that dma_map will still succeed in that case (when the DMA
mappings are gone).

dma_map_single()/dma_map_page() create new page table entries; they don't
look up existing ones. So even if the existing mappings are gone, new
mappings should still succeed, I think.

I haven't seen this issue on the TX path either.

> 2. Can this error be detected by mlx5_pci_err_detected()? If not, does this mean that all PCIe NIC drivers might have similar issues?

mlx5_pci_err_detected() is called for the device under DPC, so that's not
the issue.

From my naive view, the issue seems to be timing: there's a potential race
between DPC setting the PCI channel to frozen and the error handler completing
(which eventually calls napi_disable_locked).

During that window, NAPI poll can still fire and process CQEs, triggering a
storm of dma_unmap WARN_ONs, followed by a crash.
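
To make that window concrete, here is a small userspace model, not driver
code. The names (model_dev, model_napi_poll, bad_unmaps) are made up for
illustration, and the "offline" test only mirrors the idea behind the
kernel's pci_channel_offline(), i.e. any state other than normal. It
compares a poll that processes completions unconditionally with one that
bails out early when the channel is offline:

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's PCI channel states. */
enum channel_state {
	CHANNEL_IO_NORMAL,
	CHANNEL_IO_FROZEN,
	CHANNEL_IO_PERM_FAILURE,
};

/* Hypothetical device model: just enough state to show the race window. */
struct model_dev {
	enum channel_state state;
	int pending_cqes;	/* completions waiting to be processed */
	int bad_unmaps;		/* unmaps attempted while the channel was offline */
};

/* Same idea as pci_channel_offline(): anything but "normal" is offline. */
static bool model_channel_offline(const struct model_dev *dev)
{
	return dev->state != CHANNEL_IO_NORMAL;
}

/*
 * Process up to 'budget' completions and return how many were handled.
 * With check_offline set, the poll does no work on an offline channel.
 * Each completion handled while offline is counted as a bad unmap, which
 * stands in for the WARN_ON described above.
 */
static int model_napi_poll(struct model_dev *dev, int budget, bool check_offline)
{
	int done = 0;

	if (check_offline && model_channel_offline(dev))
		return 0;

	while (done < budget && dev->pending_cqes > 0) {
		if (model_channel_offline(dev))
			dev->bad_unmaps++;
		dev->pending_cqes--;
		done++;
	}

	return done;
}
```

In this model, a frozen device with four pending completions records four
bad unmaps without the check and none with it. What a real poll function
has to return and do with the NAPI state in the early-exit case is
driver-specific and not modelled here.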

>    Do other drivers need to do similar checks?

I honestly don't know. Are other drivers solving this problem differently?

Thanks for the question,
--breno
