Re: [PATCH v3 1/2] drivers/perf: riscv-iommu: add risc-v iommu pmu driver

Linux Perf Users
 help / color / mirror / Atom feed

From: sashiko-bot@kernel.org
To: "Zong Li" <zong.li@sifive.com>
Cc: linux-perf-users@vger.kernel.org
Subject: Re: [PATCH v3 1/2] drivers/perf: riscv-iommu: add risc-v iommu pmu driver
Date: Tue, 30 Jun 2026 08:56:18 +0000	[thread overview]
Message-ID: <20260630085618.71F961F000E9@smtp.kernel.org> (raw)
In-Reply-To: <20260630083833.1275837-2-zong.li@sifive.com>

Thank you for your contribution! Sashiko AI review found 12 potential issue(s) to consider:
- [High] Missing `.remove` callback in the auxiliary driver leads to Use-After-Free of PMU and CPU hotplug instances.
- [High] PMU IRQ is requested on the parent device instead of the auxiliary device, causing Use-After-Free on unbind.
- [High] Missing spinlocks and IRQ CPU affinity results in data races on shared MMIO and software state.
- [High] Passing a 32-bit `u32` pointer to `for_each_set_bit` causes KASAN stack-out-of-bounds reads and failure on big-endian architectures.
- [High] Failure to clear the event counter overflow (OF) bit causes a permanent interrupt storm.
- [High] Flawed cycle event check in `riscv_iommu_pmu_set_event` leads to an out-of-bounds 64-bit MMIO write.
- [High] Undefined behavior from shifting `icvec` by a bitmask causes the wrong IRQ vector to be mapped.
- [Medium] Unmanaged `free_irq` paired with `devm_request_irq` causes a devres leak and WARN on teardown.
- [Medium] Requesting `IRQF_ONESHOT` without a threaded handler triggers an unconditional `WARN_ON_ONCE` in the IRQ core.
- [Medium] Broken perf group validation logic enables the creation of oversized event groups.
- [Medium] Incorrect counter topology discovery using `hweight32` misses non-contiguous hardware counters and causes UB.
- [Medium] Race condition clearing the Write-1-to-Clear (W1C) pending bit loses new PMU interrupts.
--

commit d9f0a1344a5b91b6021c1f8b59b255eb9fdf5355
Author: Zong Li <zong.li@sifive.com>

drivers/perf: riscv-iommu: add risc-v iommu pmu driver

Add a new driver to support the RISC-V IOMMU PMU. This is an auxiliary
device driver created by the parent RISC-V IOMMU driver.

Link: https://lore.kernel.org/linux-riscv/20260618143634.7f3dd6c5@pumpkin/

> diff --git a/drivers/perf/riscv_iommu_pmu.c b/drivers/perf/riscv_iommu_pmu.c
> new file mode 100644
> index 0000000000000..2e856289ddc95
> --- /dev/null
> +++ b/drivers/perf/riscv_iommu_pmu.c
[ ... ]
> +static bool is_cycle_event(u64 event)
> +{
> +	return event == RISCV_IOMMU_HPMEVENT_CYCLE;
> +}
> +
> +static void riscv_iommu_pmu_set_event(struct riscv_iommu_pmu *pmu, u32 idx,
> +				      u64 value)
> +{
> +	/* There is no associtated IOHPMEVT0 for IOHPMCYCLES */
> +	if (is_cycle_event(value))
> +		return;
> +
> +	/* Event counter start from idx 1 */
> +	writeq(FIELD_GET(RISCV_IOMMU_IOHPMEVT_EVENT, value),
> +	       pmu->reg + RISCV_IOMMU_REG_IOHPMEVT(idx - 1));
> +}

[Severity: High]
What happens if a user configures a cycle event but sets filter bits
(e.g., pid_pscid)?

Because is_cycle_event() compares the raw 64-bit config to 0 instead of masking
the event ID, it would return false. The code would then execute a write to
IOHPMEVT(idx - 1).

Since idx is 0 for the cycle counter, idx - 1 underflows, which might cause an
out-of-bounds 64-bit MMIO write.

> +static void riscv_iommu_pmu_enable_counter(struct riscv_iommu_pmu *pmu, u32 idx)
> +{
> +	void __iomem *addr = pmu->reg + RISCV_IOMMU_REG_IOCOUNTINH;
> +	u32 value = readl(addr);
> +
> +	writel(value & ~BIT(idx), addr);
> +}

[Severity: High]
Could this lockless read-modify-write race with the IRQ handler?

Since the driver pins all perf events to a specific CPU but requests the shared
IRQ without setting CPU affinity, the hardirq handler can run concurrently on
any CPU.

Without spinlocks, the handler might race with perf core operations on MMIO
registers here, as well as shared state like the pmu->events array.

[ ... ]
> +static int riscv_iommu_pmu_event_init(struct perf_event *event)
> +{
[ ... ]
> +	if (is_cycle_event(get_event(event->group_leader)))
> +		if (++counters > total_event_counters)
> +			return -EINVAL;
> +
> +	for_each_sibling_event(sibling, event->group_leader) {
> +		if (is_cycle_event(get_event(sibling)))
> +			continue;
> +
> +		if (sibling->pmu != event->pmu && !is_software_event(sibling))
> +			return -EINVAL;
> +
> +		if (++counters > total_event_counters)
> +			return -EINVAL;
> +	}
> +
> +	return 0;
> +}

[Severity: Medium]
Does this group validation logic tally the counters correctly?

It appears to skip counting the new event itself, increments the event count
if the leader is a cycle event (which does not consume an event counter), and
skips counting non-cycle leaders entirely.

Could this allow the creation of oversized event groups that exceed hardware
constraints, causing them to fail late during scheduling?

[ ... ]
> +static irqreturn_t riscv_iommu_pmu_handle_irq(struct riscv_iommu_pmu *pmu)
> +{
> +	u32 ovf = readl(pmu->reg + RISCV_IOMMU_REG_IOCOUNTOVF);
> +	int idx;
> +
> +	if (!ovf)
> +		return IRQ_NONE;
> +
> +	riscv_iommu_pmu_stop_all(pmu);
> +
> +	for_each_set_bit(idx, (unsigned long *)&ovf, pmu->num_counters) {

[Severity: High]
Can casting this 32-bit u32 variable to an unsigned long pointer cause KASAN
stack-out-of-bounds reads on 64-bit architectures?

Since unsigned long is 8 bytes on 64-bit platforms, passing it to
for_each_set_bit() might evaluate uninitialized stack memory past the ovf
variable, and also misbehave on big-endian architectures.

> +		struct perf_event *event = pmu->events[idx];
> +
> +		if (WARN_ON_ONCE(!event))
> +			continue;
> +
> +		riscv_iommu_pmu_update(event);
> +		riscv_iommu_pmu_set_period(event);
> +	}

[Severity: High]
Does the handler need to clear the overflow (OF) bit in the IOHPMEVT register?

The handler recalculates the period and resets the counter, but without writing
back to IOHPMEVT to clear the OF bit, the interrupt pending condition might
never be resolved.

Could this result in an infinite PMU interrupt storm?

> +
> +	riscv_iommu_pmu_start_all(pmu);
> +
> +	return IRQ_HANDLED;
> +}
> +
> +static irqreturn_t riscv_iommu_pmu_irq_handler(int irq, void *dev_id)
> +{
> +	struct riscv_iommu_pmu *pmu = (struct riscv_iommu_pmu *)dev_id;
> +	irqreturn_t ret;
> +
> +	/* Check whether this interrupt is for PMU */
> +	if (!(readl_relaxed(pmu->reg + RISCV_IOMMU_REG_IPSR) & RISCV_IOMMU_IPSR_PMIP))
> +		return IRQ_NONE;
> +
> +	/* Process PMU IRQ */
> +	ret = riscv_iommu_pmu_handle_irq(pmu);
> +
> +	/* Clear performance monitoring interrupt pending bit */
> +	writel_relaxed(RISCV_IOMMU_IPSR_PMIP, pmu->reg + RISCV_IOMMU_REG_IPSR);

[Severity: Medium]
Is there a race condition when clearing the Write-1-to-Clear pending bit here?

The bit is cleared after the hardware counters are uninhibited by
riscv_iommu_pmu_start_all(). If a counter overflows very rapidly right after
being started, it will assert the interrupt again.

Clearing the pending bit at this point would wipe out that new pending state
before it can be handled, potentially losing PMU interrupts.

> +
> +	return ret;
> +}
> +
> +static unsigned int riscv_iommu_pmu_get_irq_num(struct riscv_iommu_device *iommu)
> +{
> +	/* Reuse ICVEC.CIV mask for all interrupt vectors mapping */
> +	int vec = (iommu->icvec >> (RISCV_IOMMU_IPSR_PMIP * 4)) & RISCV_IOMMU_ICVEC_CIV;

[Severity: High]
Does shifting icvec by a bitmask value cause undefined behavior?

RISCV_IOMMU_IPSR_PMIP is BIT(3) (which is 8). Shifting by (8 * 4 = 32) bits
on a 32-bit unsigned int is undefined in C.

Should this shift use the raw bit index (RISCV_IOMMU_INTR_PM) instead?

> +
> +	return iommu->irqs[vec];
> +}
> +
> +static int riscv_iommu_pmu_request_irq(struct riscv_iommu_device *iommu,
> +				       struct riscv_iommu_pmu *pmu)
> +{
> +	unsigned int irq = riscv_iommu_pmu_get_irq_num(iommu);
> +
> +	/*
> +	 * Set the IRQF_ONESHOT flag because this IRQ can be shared with
> +	 * other threaded IRQs by other queues.
> +	 */
> +	return devm_request_irq(iommu->dev, irq, riscv_iommu_pmu_irq_handler,
> +				IRQF_ONESHOT | IRQF_SHARED, dev_name(iommu->dev), pmu);
> +}

[Severity: High]
Since this uses devm_request_irq() against the parent iommu->dev instead of
the auxiliary device, could this lead to a use-after-free on unbind?

If the auxiliary device is unbound, devres would free the pmu dev_id, but the
IRQ would remain active on the parent. A subsequent interrupt would then
dereference the freed pmu pointer.

[Severity: Medium]
Also, does passing IRQF_ONESHOT without a threaded handler trigger a
WARN_ON_ONCE() in the IRQ core? __setup_irq() typically enforces that
IRQF_ONESHOT is only used when a thread_fn is provided.

> +
> +static void riscv_iommu_pmu_free_irq(struct riscv_iommu_device *iommu,
> +				     struct riscv_iommu_pmu *pmu)
> +{
> +	unsigned int irq = riscv_iommu_pmu_get_irq_num(iommu);
> +
> +	free_irq(irq, pmu);
> +}

[Severity: Medium]
Does using the unmanaged free_irq() to release an IRQ requested via
devm_request_irq() cause a devres memory leak?

This removes the IRQ action but leaves the devres record intact. When the
parent device is eventually destroyed, devres might attempt to free the IRQ
again, triggering a warning.

> +
> +static int riscv_iommu_pmu_probe(struct auxiliary_device *auxdev,
> +				 const struct auxiliary_device_id *id)
> +{
[ ... ]
> +	/* Counter number and width are hardware-implemented. Detect them by write 1s */
> +	addr = iommu_pmu->reg + RISCV_IOMMU_REG_IOCOUNTINH;
> +	writel(RISCV_IOMMU_IOCOUNTINH_HPM, addr);
> +	iommu_pmu->num_counters = hweight32(readl(addr));

[Severity: Medium]
If the hardware has non-contiguous counters, will using hweight32() miss the
valid upper bounds?

For example, if the cycle counter is absent, the population count would be
lower than the highest valid index, which might clip valid upper-index
counters when iterating.

Additionally, if hweight32() evaluates to 0, could GENMASK_ULL(-1, 0)
subsequently generate an undefined negative bitwise shift?

[ ... ]
> +static struct auxiliary_driver iommu_pmu_driver = {
> +	.probe		= riscv_iommu_pmu_probe,
> +	.id_table	= riscv_iommu_pmu_id_table,
> +};

[Severity: High]
Since the auxiliary driver lacks a .remove callback, are the PMU and CPU
hotplug instances left dangling if the driver is unbound?

If the driver is unbound via sysfs or parent IOMMU removal, devres will free
the iommu_pmu structure, which might leave dangling pointers in the perf core
and cpuhp subsystems.

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260630083833.1275837-1-zong.li@sifive.com?part=1

next prev parent reply	other threads:[~2026-06-30  8:56 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-30  8:38 [PATCH v3 0/2] RISC-V IOMMU HPM support Zong Li
2026-06-30  8:38 ` [PATCH v3 1/2] drivers/perf: riscv-iommu: add risc-v iommu pmu driver Zong Li
2026-06-30  8:56   ` sashiko-bot [this message]
2026-06-30 13:17   ` Guo Ren
2026-06-30  8:38 ` [PATCH v3 2/2] iommu/riscv: create a auxiliary device for HPM Zong Li
2026-06-30  8:55   ` sashiko-bot
2026-07-01  1:43   ` Guo Ren

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260630085618.71F961F000E9@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=sashiko-reviews@lists.linux.dev \
    --cc=zong.li@sifive.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox