Date: Wed, 11 Feb 2026 05:44:48 -0800
From: Breno Leitao
To: Tariq Toukan
Cc: Saeed Mahameed, Tariq Toukan, Mark Bloch, Leon Romanovsky,
	Andrew Lunn, "David S. Miller", Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Amir Vadai, netdev@vger.kernel.org,
	linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org,
	dcostantino@meta.com, rneu@meta.com, kernel-team@meta.com
Subject: Re: [PATCH net] net/mlx5e: Skip NAPI polling when PCI channel is offline
References: <20260209-mlx5_iommu-v1-1-b17ae501aeb2@debian.org>
	<09a77964-37bf-4b3c-bfa9-8939eb7761ab@gmail.com>
In-Reply-To: <09a77964-37bf-4b3c-bfa9-8939eb7761ab@gmail.com>

Hello Tariq,

On Wed, Feb 11, 2026 at 01:26:35PM +0200, Tariq Toukan wrote:
> On 09/02/2026 20:01, Breno Leitao wrote:
> > When a PCI error (e.g. AER error or DPC containment) marks the PCI
> > channel as frozen or permanently failed, the IOMMU mappings for the
> > device may already be torn down. If mlx5e_napi_poll() continues
> > processing CQEs in this state, every call to dma_unmap_page() triggers
> > a WARN_ON in iommu_dma_unmap_phys().
> >
> > In a real-world crash scenario on an NVIDIA Grace (ARM64) platform,
> > a DPC event froze the PCI channel and the mlx5 NAPI poll continued
> > processing error CQEs, calling dma_unmap for each pending WQE. Here is
> > an example:
> >
> > The DPC event on port 0007:00:00.0 fires and eth1 (on 0017:01:00.0) starts
> > seeing error CQEs almost immediately:
> >
> >   pcieport 0007:00:00.0: DPC: containment event, status:0x2009
> >   mlx5_core 0017:01:00.0 eth1: Error cqe on cqn 0x54e, ci 0xb06, ...
> >
> > The WARN_ON storm begins ~0.4s later and repeats for every pending WQE:
> >
> >   WARNING: CPU: 32 PID: 0 at drivers/iommu/dma-iommu.c:1237 iommu_dma_unmap_phys
> >   Call trace:
> >    iommu_dma_unmap_phys+0xd4/0xe0
> >    mlx5e_tx_wi_dma_unmap+0xb4/0xf0
> >    mlx5e_poll_tx_cq+0x14c/0x438
> >    mlx5e_napi_poll+0x6c/0x5e0
> >    net_rx_action+0x160/0x5c0
> >    handle_softirqs+0xe8/0x320
> >    run_ksoftirqd+0x30/0x58
> >
> > After 23 seconds of WARN_ON() storm, the watchdog fires:
> >
> >   watchdog: BUG: soft lockup - CPU#32 stuck for 23s! [ksoftirqd/32:179]
> >   Kernel panic - not syncing: softlockup: hung tasks
> >
> > Each unmap hit the WARN_ON in the IOMMU layer, printing a full stack
> > trace. With dozens of pending WQEs, this created a storm of WARN_ON
> > dumps in softirq context that monopolized the CPU for over 23 seconds,
> > triggering a soft lockup panic.

...

> You're introducing an interesting problem, but I am not convinced by this
> solution approach.
>
> Why would the driver perform this check if it doesn't guarantee prevention
> of invalid access? It only "allows one napi cycle", which happen to be good
> enough to prevent the soft lockup in your case.
>
> What if a napi cycle is configured with larger budget?

Very good point. In that case we would still see some WARN_ON()s from the
DMA unmap path, and the patch may not help much if the AER event hits
mid-poll while a large part of the NAPI budget is still remaining.
> If the problem is that the WARN_ON is being called at a high rate, then it
> should be rate-limited.

That would be a solution as well, and I am happy to pursue it if that
approach is more appropriate.

Thanks for reviewing it,
--breno