From: sashiko-bot@kernel.org
To: "Nicolin Chen" <nicolinc@nvidia.com>
Cc: linux-pci@vger.kernel.org
Subject: Re: [PATCH v5 10/18] iommu/arm-smmu-v3: Recheck CMDQ_ERR in tegra241_vintf0_handle_error()
Date: Fri, 03 Jul 2026 04:40:02 +0000
Message-ID: <20260703044002.9FB9E1F00A3E@smtp.kernel.org>
In-Reply-To: <2f83aa2157ea08642f7f55c5dd3fb1c66e883c91.1783044582.git.nicolinc@nvidia.com>
Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
Pre-existing issues:
- [High] Race condition between teardown paths and the threaded ISR leading to a NULL pointer dereference.
- [High] Missing index offset for the second 64-bit LVCMDQ error map causes incorrect array access and an unhandled IRQ storm.
- [High] Premature IRQ registration allows the ISR to fire before core data structures are allocated, leading to an uninitialized pointer dereference.
--
commit 24d0e50ef454f78b71fd6e45f50c5e36ef3cc631
Author: Nicolin Chen <nicolinc@nvidia.com>
iommu/arm-smmu-v3: Recheck CMDQ_ERR in tegra241_vintf0_handle_error()
This patch adds locking around reading and acknowledging the
GERROR/GERRORN pair in both tegra241_vintf0_handle_error() and
tegra241_vcmdq_hw_deinit(). The lock ensures that concurrent
acknowledgments neither skip pending errors nor act on errors that were
already handled. The patch also initializes cmdq_err_lock at allocation.
> diff --git a/drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c b/drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c
> index 9012ab584d1dd..666dd23b0c7ca 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c
> @@ -319,10 +319,19 @@ static void tegra241_vintf0_handle_error(struct tegra241_vintf *vintf)
> 	while (map) {
> 		unsigned long lidx = __ffs64(map);
> 		struct tegra241_vcmdq *vcmdq = vintf->lvcmdqs[lidx];
[Severity: High]
This isn't a bug introduced by this patch, but does this code compute the
correct array index for logical VCMDQs when the outer loop index i is
greater than 0?
Since this loop processes LVCMDQ_ERR_MAP_64(i), if an error occurs on a logical
VCMDQ with an index >= 64, the hardware sets the corresponding bit in the
second 64-bit error map. By computing lidx purely as __ffs64(map) and omitting
an (i * 64) offset, wouldn't it erroneously access an index in the 0..63 range
instead of the correct queue? Could this fail to acknowledge the actual queue's
hardware error and lead to an unhandled IRQ storm?
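To make the question concrete, here is a minimal userspace sketch of the
index arithmetic, not the driver's code. It assumes each 64-bit error map
covers 64 consecutive logical queues, so the queue index is (i * 64) plus
the bit position. The helper names lowest_set_bit() and
collect_error_queues() are hypothetical, and __builtin_ctzll() (a
GCC/Clang builtin) stands in for the kernel's __ffs64().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace stand-in for the kernel's __ffs64(): position of the lowest
 * set bit. The caller must pass a non-zero word.
 */
static unsigned int lowest_set_bit(uint64_t word)
{
	return (unsigned int)__builtin_ctzll(word);
}

/*
 * Hypothetical helper: walk an array of 64-bit error maps and record the
 * logical queue index of every set bit, applying the (i * 64) offset for
 * map i. Returns the number of indices written to out[].
 */
static size_t collect_error_queues(const uint64_t *maps, size_t num_maps,
				   unsigned int *out, size_t cap)
{
	size_t n = 0;
	size_t i;

	for (i = 0; i < num_maps; i++) {
		uint64_t map = maps[i];

		while (map && n < cap) {
			unsigned int bit = lowest_set_bit(map);

			/* Without the offset, map 1 bit 5 would alias queue 5. */
			out[n++] = (unsigned int)(i * 64) + bit;
			map &= ~(1ULL << bit);	/* clear the bit just handled */
		}
	}
	return n;
}
```

With maps { 1 << 3, 1 << 5 }, this model reports queues 3 and 69, whereas
dropping the offset would report 3 and 5.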
[Severity: High]
This is a pre-existing issue, but is it possible for a race condition to occur
between the teardown paths and this threaded ISR?
If the driver is removed (e.g., via rmmod) and tegra241_cmdqv_remove() initiates
teardown and nullifies the vintf->lvcmdqs[lidx] pointers before free_irq() is
called, could this concurrently active ISR thread read the hardware map and
dereference the newly nullified vintf->lvcmdqs[lidx] without holding any locks?
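One way to picture the ordering concern is the single-threaded userspace
model below. It is a sketch under stated assumptions, not the driver's
teardown path: every model_* name is hypothetical, the irq_registered
flag stands in for a live IRQ line, and clearing it plays the role of
free_irq() returning. The model only illustrates the ordering (stop the
handler first, then clear the slots) plus a defensive NULL check; it does
not demonstrate real concurrency.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NUM_QUEUES 4

/* Hypothetical stand-in for a logical command queue. */
struct model_queue {
	int errors_handled;
};

/* Hypothetical stand-in for the virtual interface owning the queues. */
struct model_vintf {
	bool irq_registered;	/* models a live IRQ line */
	struct model_queue *queues[MODEL_NUM_QUEUES];
};

/* Returns true if the error was handled, false if it was safely ignored. */
static bool model_handle_error(struct model_vintf *vintf, size_t lidx)
{
	struct model_queue *q;

	if (!vintf->irq_registered || lidx >= MODEL_NUM_QUEUES)
		return false;

	q = vintf->queues[lidx];
	if (!q)			/* defensive guard against a cleared slot */
		return false;

	q->errors_handled++;
	return true;
}

/* Teardown: stop the handler first, and only then clear the slots. */
static void model_teardown(struct model_vintf *vintf)
{
	size_t i;

	vintf->irq_registered = false;	/* analogous to free_irq() returning */
	for (i = 0; i < MODEL_NUM_QUEUES; i++)
		vintf->queues[i] = NULL;
}
```

In the real driver the equivalent guarantee would have to come from
free_irq() or from a lock shared with the handler; the flag here is only
a modelling device.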
> - u32 gerror = readl_relaxed(REG_VCMDQ_PAGE0(vcmdq, GERROR));
> + struct arm_smmu_cmdq *cmdq = &vcmdq->cmdq;
> + unsigned long flags;
> + u32 gerror, gerrorn;
[ ... ]
I also noticed another pre-existing issue outside the scope of this diff:
drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c:__tegra241_cmdqv_probe() {
	...
	ret = request_threaded_irq(irq, NULL, tegra241_cmdqv_isr,
				   IRQF_ONESHOT, "tegra241-cmdqv",
				   cmdqv);
	...
}
[Severity: High]
This is a pre-existing issue, but is there a race condition here where the IRQ
handler is registered before core data structures are fully initialized?
If request_threaded_irq() is called while cmdqv->vintfs is still an
uninitialized pointer (e.g. following devm_krealloc without __GFP_ZERO), could a
latched interrupt cause the ISR to fire instantly and dereference
cmdqv->vintfs[0] in tegra241_cmdqv_isr() before it is safely initialized?
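The ordering this question points at can be sketched with a small
userspace model. It assumes the remedy would be to allocate the array
zeroed and to register the handler as the last probe step; the patch
itself does not confirm that. All model_* names are hypothetical,
calloc() stands in for a zeroing kernel allocation, and the
irq_registered flag stands in for request_threaded_irq() having
succeeded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a virtual interface. */
struct model_vintf {
	int id;
};

/* Hypothetical stand-in for the device-wide state. */
struct model_cmdqv {
	struct model_vintf **vintfs;
	size_t num_vintfs;
	bool irq_registered;	/* models a successfully requested IRQ */
};

/* Probe model: allocate zeroed state first, register the handler last. */
static int model_probe(struct model_cmdqv *cmdqv, size_t num_vintfs)
{
	/* calloc() zeroes the array, so unused slots read as NULL. */
	cmdqv->vintfs = calloc(num_vintfs, sizeof(*cmdqv->vintfs));
	if (!cmdqv->vintfs)
		return -1;
	cmdqv->num_vintfs = num_vintfs;

	/* Only now may the handler run: everything it reads is valid. */
	cmdqv->irq_registered = true;
	return 0;
}

/* ISR model: returns true only if slot 0 is populated and safe to use. */
static bool model_isr(const struct model_cmdqv *cmdqv)
{
	if (!cmdqv->irq_registered || !cmdqv->vintfs || !cmdqv->vintfs[0])
		return false;
	return true;
}

/* Remove model: stop the handler before releasing what it reads. */
static void model_remove(struct model_cmdqv *cmdqv)
{
	cmdqv->irq_registered = false;
	free(cmdqv->vintfs);
	cmdqv->vintfs = NULL;
	cmdqv->num_vintfs = 0;
}
```

Because the array is zeroed before registration, an interrupt that fires
immediately sees a NULL slot rather than an indeterminate pointer.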
--
Sashiko AI review · https://sashiko.dev/#/patchset/cover.1783044582.git.nicolinc@nvidia.com?part=10
Thread overview: 37+ messages
2026-07-03 4:06 [PATCH v5 00/18] iommu/arm-smmu-v3: Quarantine device upon ATC invalidation timeout Nicolin Chen
2026-07-03 4:06 ` [PATCH v5 01/18] PCI: Don't suspend IOMMU when probing reset capability Nicolin Chen
2026-07-03 4:27 ` sashiko-bot
2026-07-03 4:06 ` [PATCH v5 02/18] PCI/CXL: Probe the underlying bus reset in cxl_reset_bus_function() Nicolin Chen
2026-07-03 4:29 ` sashiko-bot
2026-07-03 4:06 ` [PATCH v5 03/18] PCI: Propagate FLR return values to callers Nicolin Chen
2026-07-03 4:25 ` sashiko-bot
2026-07-03 4:06 ` [PATCH v5 04/18] iommu: Convert gdev->blocked from bool to enum gdev_blocked Nicolin Chen
2026-07-03 4:24 ` sashiko-bot
2026-07-03 4:06 ` [PATCH v5 05/18] iommu: Pass in reset result to pci_dev_reset_iommu_done() Nicolin Chen
2026-07-03 4:27 ` sashiko-bot
2026-07-03 4:06 ` [PATCH v5 06/18] iommu/arm-smmu-v3: Don't rb_erase() a never-inserted stream node Nicolin Chen
2026-07-03 4:25 ` sashiko-bot
2026-07-03 4:06 ` [PATCH v5 07/18] iommu/arm-smmu-v3: Mark ATC invalidate timeouts via lockless bitmap Nicolin Chen
2026-07-03 4:26 ` sashiko-bot
2026-07-03 4:06 ` [PATCH v5 08/18] iommu/arm-smmu-v3: Skip remaining GERROR causes on SFM Nicolin Chen
2026-07-03 4:29 ` sashiko-bot
2026-07-03 4:06 ` [PATCH v5 09/18] iommu/arm-smmu-v3: Introduce per-cmdq cmdq_err_handler callback Nicolin Chen
2026-07-03 4:32 ` sashiko-bot
2026-07-03 4:06 ` [PATCH v5 10/18] iommu/arm-smmu-v3: Recheck CMDQ_ERR in tegra241_vintf0_handle_error() Nicolin Chen
2026-07-03 4:40 ` sashiko-bot [this message]
2026-07-03 4:06 ` [PATCH v5 11/18] iommu/arm-smmu-v3: Co-clear pending CMDQ_ERR when CMD_SYNC times out Nicolin Chen
2026-07-03 4:39 ` sashiko-bot
2026-07-03 4:06 ` [PATCH v5 12/18] iommu/arm-smmu-v3: Introduce arm_smmu_cmdq_batch_issue() wrapper Nicolin Chen
2026-07-03 4:22 ` sashiko-bot
2026-07-03 4:06 ` [PATCH v5 13/18] iommu/arm-smmu-v3: Add streams_lock for atomic-context SID->master lookup Nicolin Chen
2026-07-03 4:26 ` sashiko-bot
2026-07-03 4:06 ` [PATCH v5 14/18] iommu/arm-smmu-v3: Add has_ats to struct arm_smmu_cmdq_batch Nicolin Chen
2026-07-03 4:29 ` sashiko-bot
2026-07-03 4:06 ` [PATCH v5 15/18] iommu/arm-smmu-v3: Add INV_TYPE_ATS_BROKEN to skip quarantined ATS masters Nicolin Chen
2026-07-03 4:34 ` sashiko-bot
2026-07-03 4:06 ` [PATCH v5 16/18] iommu/arm-smmu-v3: Factor out CMDQ batch force-sync conditions Nicolin Chen
2026-07-03 4:29 ` sashiko-bot
2026-07-03 4:06 ` [PATCH v5 17/18] iommu/arm-smmu-v3: Thread arm_smmu_master_domain on a per-master list Nicolin Chen
2026-07-03 4:32 ` sashiko-bot
2026-07-03 4:06 ` [PATCH v5 18/18] iommu/arm-smmu-v3: Block ATS for a master upon an ATC invalidation timeout Nicolin Chen
2026-07-03 4:36 ` sashiko-bot