From: sashiko-bot@kernel.org
To: "Nicolin Chen" <nicolinc@nvidia.com>
Cc: linux-pci@vger.kernel.org
Subject: Re: [PATCH v5 11/18] iommu/arm-smmu-v3: Co-clear pending CMDQ_ERR when CMD_SYNC times out
Date: Fri, 03 Jul 2026 04:39:56 +0000
Message-ID: <20260703043956.A290E1F000E9@smtp.kernel.org>
In-Reply-To: <2ce44672e51a136c220d9c5f325ea20d850a249e.1783044582.git.nicolinc@nvidia.com>

Thank you for your contribution! Sashiko AI review found 2 potential issues to consider:

Pre-existing issues:
- [High] Sleeping in atomic context during Service Failure Mode (SFM) error handling.
- [High] Race condition in teardown leading to a use-after-free or NULL pointer dereference in the threaded ISR.
--
commit 3f49807deb6de6a8ccbe2ea880ec3f8f51819f03
Author: Nicolin Chen <nicolinc@nvidia.com>
iommu/arm-smmu-v3: Co-clear pending CMDQ_ERR when CMD_SYNC times out
This patch adds a synchronous mechanism to drain pending CMDQ_ERRs when a
CMD_SYNC times out. It introduces cmdq_err_handler callbacks for both the
standard ARM SMMU-v3 command queue and the Tegra241 VCMDQ to prevent missing
ATC_INV timeouts if the generic ISR is delayed.
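
As a reading aid, the toggle-based acknowledgement that both new handlers rely
on can be modelled in plain userspace C. This is only a sketch under stated
assumptions, not driver code: struct gerror_regs, gerror_active() and
drain_cmdq_err() are hypothetical names, and the bit positions are
illustrative. An error counts as pending while its bit differs between GERROR
and GERRORN, and toggling that one bit in GERRORN acknowledges it without
touching the others.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions only; the driver header has the real ones. */
#define GERROR_CMDQ_ERR (1u << 0)
#define GERROR_SFM_ERR  (1u << 8)

/* Hypothetical stand-in for the GERROR/GERRORN register pair. */
struct gerror_regs {
	uint32_t gerror;   /* toggled by hardware to raise an error */
	uint32_t gerrorn;  /* toggled by software to acknowledge it */
};

/* A bit is active while it differs between the two registers. */
static uint32_t gerror_active(const struct gerror_regs *r)
{
	return r->gerror ^ r->gerrorn;
}

/*
 * Acknowledge CMDQ_ERR only, leaving every other active bit pending.
 * Returns true if an error was pending and has now been acknowledged.
 */
static bool drain_cmdq_err(struct gerror_regs *r)
{
	if (!(gerror_active(r) & GERROR_CMDQ_ERR))
		return false;

	/* The real handler would skip the faulting command here. */
	r->gerrorn ^= GERROR_CMDQ_ERR;
	return true;
}

/* Small self-check showing the intended behaviour of the model. */
static void gerror_selfcheck(void)
{
	struct gerror_regs r = {
		.gerror = GERROR_CMDQ_ERR | GERROR_SFM_ERR,
		.gerrorn = 0,
	};

	assert(drain_cmdq_err(&r));                   /* CMDQ_ERR was pending */
	assert(gerror_active(&r) == GERROR_SFM_ERR);  /* SFM left for the ISR */
	assert(!drain_cmdq_err(&r));                  /* second call is a no-op */
}
```

In this model a second caller that arrives after the first one has toggled
GERRORN sees nothing pending and returns early, which is the property the
cmdq_err_lock plus recheck in the patch appears to be aiming for.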
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index c6e3d1be23403..4b4e8108d5944 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
[ ... ]
> @@ -2298,6 +2307,26 @@ static irqreturn_t arm_smmu_priq_thread(int irq, void *dev)
>
> static int arm_smmu_device_disable(struct arm_smmu_device *smmu);
>
> +/* Drain a pending CMDQ_ERR, used by arm_smmu_cmdq_issue_cmdlist() */
> +static void arm_smmu_cmdq_err_handler(struct arm_smmu_device *smmu,
> +				      struct arm_smmu_cmdq *cmdq)
> +{
> +	u32 gerror, gerrorn;
> +
> +	guard(raw_spinlock_irqsave)(&cmdq->cmdq_err_lock);
> +
> +	gerror = readl_relaxed(smmu->base + ARM_SMMU_GERROR);
> +	gerrorn = readl_relaxed(smmu->base + ARM_SMMU_GERRORN);
> +
> +	if (!((gerror ^ gerrorn) & GERROR_CMDQ_ERR))
> +		return;
> +
> +	__arm_smmu_cmdq_skip_err(smmu, cmdq);
> +
> +	/* Toggle only the CMDQ_ERR bit; other bits are left for the ISR. */
> +	writel(gerrorn ^ GERROR_CMDQ_ERR, smmu->base + ARM_SMMU_GERRORN);
> +}
> +
> static irqreturn_t arm_smmu_gerror_handler(int irq, void *dev)
> {

[Severity: High]
This isn't a bug introduced by this patch, but could this hardirq handler
trigger a "scheduling while atomic" bug when handling SFM errors?

Looking at arm_smmu_gerror_handler(), when the SMMU hardware reports Service
Failure Mode (GERROR_SFM_ERR), it calls arm_smmu_device_disable(smmu).
This in turn calls arm_smmu_write_reg_sync(), which performs a polling loop:

	return readl_relaxed_poll_timeout(smmu->base + ack_off, reg, reg == val,
					  1, ARM_SMMU_POLL_TIMEOUT_US);

Because readl_relaxed_poll_timeout() is passed a non-zero sleep interval (1us),
it calls usleep_range() between reads, which can sleep.

Since arm_smmu_gerror_handler() executes in hardirq context, could this sleep
produce a "scheduling while atomic" splat, or worse, at the moment the SMMU is
already reporting a failure?
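
To make the concern concrete, here is a userspace model of the two polling
styles. It is a sketch, not a proposed fix: fake_dev, poll_ctx, poll_ack() and
both delay callbacks are hypothetical, and "violations" simply counts how often
a sleeping delay was invoked while the model claims to be in atomic context.
In the kernel, <linux/iopoll.h> also provides readl_relaxed_poll_timeout_atomic(),
which busy-waits instead of sleeping; whether that is the right choice for this
path has not been verified here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device: the ack register flips after a number of reads. */
struct fake_dev {
	uint32_t ack;
	uint32_t target;
	unsigned int reads_until_ack;
};

/* Models the calling context and records forbidden sleeps. */
struct poll_ctx {
	bool atomic;              /* true models hardirq context */
	unsigned int violations;  /* sleeping delays taken while atomic */
};

static uint32_t fake_read(struct fake_dev *d)
{
	if (d->reads_until_ack)
		d->reads_until_ack--;
	else
		d->ack = d->target;
	return d->ack;
}

/* Stands in for a sleeping delay such as usleep_range(). */
static void sleeping_delay(struct poll_ctx *c)
{
	if (c->atomic)
		c->violations++;
}

/* Stands in for a busy-wait delay such as udelay(); never sleeps. */
static void busy_delay(struct poll_ctx *c)
{
	(void)c;
}

/* Poll until the ack register matches the target or polls run out. */
static int poll_ack(struct fake_dev *d, unsigned int max_polls,
		    void (*delay)(struct poll_ctx *), struct poll_ctx *c)
{
	for (unsigned int i = 0; i < max_polls; i++) {
		if (fake_read(d) == d->target)
			return 0;
		delay(c);
	}
	return -ETIMEDOUT;
}

/* Small self-check: same poll, different delay, different outcome. */
static void poll_selfcheck(void)
{
	struct fake_dev d = { .ack = 0, .target = 1, .reads_until_ack = 3 };
	struct poll_ctx c = { .atomic = true, .violations = 0 };

	assert(poll_ack(&d, 8, sleeping_delay, &c) == 0);
	assert(c.violations == 3);  /* one sleep per unsuccessful read */

	d = (struct fake_dev){ .ack = 0, .target = 1, .reads_until_ack = 3 };
	c.violations = 0;
	assert(poll_ack(&d, 8, busy_delay, &c) == 0);
	assert(c.violations == 0);
}
```

Both variants reach the same result in the model; the only difference is
whether the delay between reads is allowed in the calling context.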
> u32 gerror, gerrorn, active;
[ ... ]
> diff --git a/drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c b/drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c
> index 666dd23b0c7ca..628a3a7cc0335 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c
> @@ -337,6 +337,28 @@ static void tegra241_vintf0_handle_error(struct tegra241_vintf *vintf)
> }
> }
>
> +/* Drain a pending CMDQ_ERR, used by arm_smmu_cmdq_issue_cmdlist() */
> +static void tegra241_vcmdq_handle_cmdq_err(struct arm_smmu_device *smmu,
> +					   struct arm_smmu_cmdq *cmdq)
> +{
> +	struct tegra241_vcmdq *vcmdq =
> +		container_of(cmdq, struct tegra241_vcmdq, cmdq);
> +	u32 gerror, gerrorn;
> +
> +	guard(raw_spinlock_irqsave)(&cmdq->cmdq_err_lock);
> +
> +	gerror = readl_relaxed(REG_VCMDQ_PAGE0(vcmdq, GERROR));
> +	gerrorn = readl_relaxed(REG_VCMDQ_PAGE0(vcmdq, GERRORN));
> +
> +	if (!((gerror ^ gerrorn) & GERROR_CMDQ_ERR))
> +		return;
> +
> +	__arm_smmu_cmdq_skip_err(smmu, cmdq);
> +
> +	/* Toggle only the CMDQ_ERR bit on this VCMDQ's GERRORN */
> +	writel(gerrorn ^ GERROR_CMDQ_ERR, REG_VCMDQ_PAGE0(vcmdq, GERRORN));
> +}
> +
> static irqreturn_t tegra241_cmdqv_isr(int irq, void *devid)
> {

[Severity: High]
This isn't a bug introduced by this patch, but is there a race condition
in teardown that could lead to a NULL pointer dereference or use-after-free
in this ISR?

In tegra241_cmdqv_remove(), the driver appears to free the vintf objects and
clear their pointers before freeing the interrupt:

	tegra241_cmdqv_remove_vintf(cmdqv, idx);
	...
	free_irq(cmdqv->irq, cmdqv);

Since tegra241_cmdqv_isr() is a threaded IRQ handler, if a concurrent or
pending interrupt wakes the IRQ thread during this teardown window, could it
pass a NULL or already-freed cmdqv->vintfs[0] to tegra241_vintf0_handle_error()?

The tegra241_vintf0_handle_error() function dereferences the pointer
immediately:

	u64 map = readq_relaxed(REG_VINTF(vintf, LVCMDQ_ERR_MAP_64(i)));

Could this cause a kernel oops during device unbind or removal?
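
The ordering question can be illustrated with a small userspace model. Every
name below (fake_cmdqv, fake_isr(), remove_racy(), remove_ordered()) is
hypothetical, and the "interrupt" is simply a direct call placed inside the
teardown window. The model assumes, as free_irq() documents, that no handler
runs once free_irq() has returned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct fake_vintf {
	int err_map;
};

/* Hypothetical stand-in for the driver state shared with the ISR. */
struct fake_cmdqv {
	bool irq_registered;
	struct fake_vintf storage;
	struct fake_vintf *vintfs[1];
};

static void fake_init(struct fake_cmdqv *c)
{
	c->storage.err_map = 0;
	c->vintfs[0] = &c->storage;
	c->irq_registered = true;
}

/*
 * Returns 1 if the error was handled, 0 if the IRQ is already gone,
 * and -1 where the real handler would have dereferenced a stale pointer.
 */
static int fake_isr(struct fake_cmdqv *c)
{
	if (!c->irq_registered)
		return 0;
	if (!c->vintfs[0])
		return -1;
	return 1;
}

static void fake_free_irq(struct fake_cmdqv *c)
{
	c->irq_registered = false;
}

static void fake_remove_vintf(struct fake_cmdqv *c)
{
	c->vintfs[0] = NULL;
}

/* Objects torn down first: an interrupt in the window sees NULL. */
static int remove_racy(struct fake_cmdqv *c)
{
	int ret;

	fake_remove_vintf(c);
	ret = fake_isr(c);      /* interrupt lands inside the window */
	fake_free_irq(c);
	return ret;
}

/* IRQ released first: the handler can no longer observe the teardown. */
static int remove_ordered(struct fake_cmdqv *c)
{
	fake_free_irq(c);
	fake_remove_vintf(c);
	return fake_isr(c);     /* late interrupt is simply not delivered */
}

/* Small self-check contrasting the two orderings. */
static void teardown_selfcheck(void)
{
	struct fake_cmdqv c;

	fake_init(&c);
	assert(remove_racy(&c) == -1);

	fake_init(&c);
	assert(remove_ordered(&c) == 0);
}
```

If the real remove path can be reordered this way, or the ISR can tolerate a
cleared vintfs[0], the window described above would close; which of the two
fits the driver better is left to the author.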
> struct tegra241_cmdqv *cmdqv = (struct tegra241_cmdqv *)devid;
--
Sashiko AI review · https://sashiko.dev/#/patchset/cover.1783044582.git.nicolinc@nvidia.com?part=11