Re: [PATCH v9 3/5] drivers: firmware: add riscv SSE support

Linux Perf Users
 help / color / mirror / Atom feed

From: sashiko-bot@kernel.org
To: "Zhanpeng Zhang" <zhangzhanpeng.jasper@bytedance.com>
Cc: linux-perf-users@vger.kernel.org
Subject: Re: [PATCH v9 3/5] drivers: firmware: add riscv SSE support
Date: Sat, 09 May 2026 14:38:15 +0000	[thread overview]
Message-ID: <20260509143818.B170EC2BCB2@smtp.kernel.org> (raw)
In-Reply-To: <7d86cbd47688700fbdfc27e530e824a6bec0548c.1778331862.3.git.zhangzhanpeng.jasper@bytedance.com>

Thank you for your contribution! Sashiko AI review found 12 potential issue(s) to consider:
- [Critical] Missing braces around `scoped_guard(cpus_read_lock)` in `sse_event_unregister` releases lock prematurely, leading to a race condition and use-after-free.
- [High] Event registration rollback uses the wrong revert function, leaking firmware registrations and causing use-after-free.
- [High] Missing validation of `riscv_hartid_to_cpuid` return value leads to out-of-bounds array access.
- [High] `sse_cpu_teardown` causes out-of-bounds array access on single-CPU teardown and fails to unregister global events across reboots.
- [High] 64-bit physical addresses are silently truncated to 32 bits on RV32.
- [High] CPU hotplug unconditionally enables local events, ignoring driver-defined disabled states.
- [High] Early release of `cpus_read_lock` in `sse_event_register` causes missed event registration on hotplugged CPUs.
- [High] `sse_event_alloc_global` calls `smp_processor_id()` in a preemptible context.
- [High] Missing synchronization in `sse_event_unregister()` leads to use-after-free with concurrently executing hardware event handlers.
- [Medium] `sse_event_set_target_cpu_nolock` silently ignores attribute write failures.
- [Medium] Mixing `goto` error handling with `guard()` and `scoped_guard()` declarations violates subsystem cleanup guidelines.
- [Medium] The commit message claims clients can 'complete SSE events', but no such API exists.
--

commit 1f43f4083e7681521e63a564e1ff80519c1569f5
Author: Clément Léger <cleger@rivosinc.com>
Subject: drivers: firmware: add riscv SSE support

This commit adds a driver-level interface for RISC-V SSE support, allowing
Linux clients to register handlers and manage SSE events. It supports both
local and global events and handles CPU hotplug and PM transitions.

>     Add a driver-level interface for RISC-V SSE support.  The interface lets
>     Linux clients register handlers and enable, disable and complete SSE
>     events.  It will be used by PMU and GHES drivers.

The commit message states the interface lets clients complete SSE events.
However, there does not appear to be any API provided to complete an event.

Could the commit message be updated to reflect the actual implementation?

> diff --git a/drivers/firmware/riscv/riscv_sbi_sse.c b/drivers/firmware/riscv/riscv_sbi_sse.c
> new file mode 100644
> index 0000000000000..830011ddd33f7
> --- /dev/null
> +++ b/drivers/firmware/riscv/riscv_sbi_sse.c
> @@ -0,0 +1,694 @@
[ ... ]
> +static int sse_event_attr_get_no_lock(struct sse_registered_event *reg_evt,
> +				      unsigned long attr_id, unsigned long *val)
> +{
> +	struct sbiret sret;
> +	u32 evt = reg_evt->event->evt_id;
> +	unsigned long phys;
> +
> +	phys = sse_event_get_attr_phys(reg_evt);

Can this physical address be truncated on RV32 systems?

When CONFIG_ARCH_PHYS_ADDR_T_64BIT is enabled, physical addresses are 64 bits,
but unsigned long is 32 bits. Will the firmware receive a corrupted 32-bit
physical address since the ABI expects 64-bit values to be passed in two
registers on RV32?

[ ... ]
> +static int sse_event_attr_set_nolock(struct sse_registered_event *reg_evt,
> +				     unsigned long attr_id, unsigned long val)
> +{
> +	struct sbiret sret;
> +	u32 evt = reg_evt->event->evt_id;
> +	unsigned long phys;
> +
> +	reg_evt->attr = val;
> +	phys = sse_event_get_attr_phys(reg_evt);

Is this subject to the same RV32 physical address truncation mentioned above?

[ ... ]
> +static int sse_event_set_target_cpu_nolock(struct sse_event *event,
> +					   unsigned int cpu)
> +{
> +	unsigned long hart_id = cpuid_to_hartid_map(cpu);
> +	struct sse_registered_event *reg_evt = event->global;
> +	u32 evt = event->evt_id;
> +	bool was_enabled;
> +	int ret;
> +
> +	if (!sse_event_is_global(evt))
> +		return -EINVAL;
> +
> +	was_enabled = reg_evt->is_enabled;
> +	if (was_enabled)
> +		sse_event_disable_local(event);
> +
> +	ret = sse_event_attr_set_nolock(reg_evt, SBI_SSE_ATTR_PREFERRED_HART,
> +					hart_id);
> +	if (ret == 0)
> +		sse_global_event_update_cpu(event, cpu);
> +
> +	if (was_enabled)
> +		sse_event_enable_local(event);
> +
> +	return 0;
> +}

Should this function return the error code instead of unconditionally returning 0?

If the SBI call to set the preferred hart fails, the error is masked from the
caller, leaving the kernel thinking the migration succeeded while the firmware
state is out of sync.

[ ... ]
> +static int sse_event_alloc_global(struct sse_event *event)
> +{
> +	int err;
> +	struct sse_registered_event *reg_evt;
> +
> +	reg_evt = kzalloc_obj(*reg_evt, GFP_KERNEL);
> +	if (!reg_evt)
> +		return -ENOMEM;
> +
> +	event->global = reg_evt;
> +	err = sse_event_init_registered(smp_processor_id(), reg_evt, event);

Is it safe to call smp_processor_id() here?

sse_event_alloc_global() is invoked from sse_event_register(), which only
holds a mutex and runs with preemption enabled. Could this trigger a warning
if CONFIG_DEBUG_PREEMPT is enabled, or allow the thread to migrate
immediately after, yielding an invalid CPU ID?

[ ... ]
> +struct sse_event *sse_event_register(u32 evt, u32 priority,
> +				     sse_event_handler_fn *handler, void *arg)
> +{
> +	struct sse_event *event;
> +	int cpu;
> +	int ret = 0;
> +
> +	if (!sse_available)
> +		return ERR_PTR(-EOPNOTSUPP);
> +
> +	guard(mutex)(&sse_mutex);

This isn't a bug, but is it intended to mix goto-based error handling with
scope-based guard declarations in the same function?

The cleanup subsystem documentation recommends against this pattern as it can
create confusing ownership semantics.

> +	if (sse_event_get(evt))
> +		return ERR_PTR(-EEXIST);
> +
> +	event = sse_event_alloc(evt, priority, handler, arg);
> +	if (IS_ERR(event))
> +		return event;
> +
> +	scoped_guard(cpus_read_lock) {
> +		if (sse_event_is_global(evt)) {
> +			unsigned long preferred_hart;
> +
> +			ret = sse_event_attr_get_no_lock(event->global,
> +							 SBI_SSE_ATTR_PREFERRED_HART,
> +							 &preferred_hart);
> +			if (ret)
> +				goto err_event_free;
> +
> +			cpu = riscv_hartid_to_cpuid(preferred_hart);

Can riscv_hartid_to_cpuid() return -ENOENT here?

If the firmware returns an unknown or unmapped hart ID, it appears the
negative value is passed to sse_global_event_update_cpu(), which uses it to
index the logical hart array via cpuid_to_hartid_map(), potentially causing
an out-of-bounds memory read.

> +			sse_global_event_update_cpu(event, cpu);
> +
> +			ret = sse_sbi_register_event(event, event->global);
> +			if (ret)
> +				goto err_event_free;
> +
> +		} else {
> +			ret = sse_on_each_cpu(event, SBI_SSE_EVENT_REGISTER,
> +					      SBI_SSE_EVENT_DISABLE);

Should the revert function be SBI_SSE_EVENT_UNREGISTER instead of
SBI_SSE_EVENT_DISABLE?

Disabling an event merely masks it without unregistering it from the firmware.
If registration fails on any CPU, does this leave the firmware holding an
active registration pointing to freed memory after the error path frees the
event structures?

> +			if (ret)
> +				goto err_event_free;
> +		}
> +	}
> +
> +	scoped_guard(spinlock, &events_list_lock)
> +		list_add(&event->list, &events);

Because the cpus_read_lock is released before acquiring events_list_lock to
add the event to the global list, could a CPU hotplug operation occur in
this window?

If so, sse_cpu_online() would iterate over the list and miss the new event,
resulting in the event not being registered on the new CPU.

> +
> +	return event;
> +
> +err_event_free:
> +	sse_event_free(event);
> +
> +	return ERR_PTR(ret);
> +}
[ ... ]
> +void sse_event_unregister(struct sse_event *event)
> +{
> +	scoped_guard(mutex, &sse_mutex) {
> +		scoped_guard(cpus_read_lock)
> +			sse_event_unregister_nolock(event);

Is there a missing compound block around the cpus_read_lock guard?

Without braces, the lock is dropped before acquiring events_list_lock. Could a
concurrent CPU hotplug operation acquire the list lock, find the partially
unregistered event still in the list, and erroneously register and enable
it in the firmware for the new CPU?

> +
> +		scoped_guard(spinlock, &events_list_lock)
> +			list_del(&event->list);
> +
> +		sse_event_free(event);

Since SBI events are delivered as NMI-like notifications, could an event have
just fired and its handler be actively executing on another CPU?

Without synchronization like waiting for active handlers or RCU grace periods,
it seems sse_handle_event() might dereference evt->handler and
evt->handler_arg after the memory has been freed.

> +	}
> +}
[ ... ]
> +static int sse_cpu_online(unsigned int cpu)
> +{
> +	struct sse_event *event;
> +
> +	scoped_guard(spinlock, &events_list_lock) {
> +		list_for_each_entry(event, &events, list) {
> +			if (sse_event_is_global(event->evt_id))
> +				continue;
> +
> +			sse_event_register_local(event);
> +			if (sse_get_reg_evt(event))
> +				sse_event_enable_local(event);

Should this check reg_evt->is_enabled before unconditionally enabling the event?

If a driver registers an event but intentionally leaves it disabled, it looks
like it will unexpectedly become enabled on the newly hotplugged CPU.

> +		}
> +	}
> +
> +	/* Ready to handle events. Unmask SSE. */
> +	return sse_events_unmask();
> +}
> +
> +static int sse_cpu_teardown(unsigned int cpu)
> +{
> +	int ret = 0;
> +	unsigned int next_cpu;
> +	struct sse_event *event;
> +	struct sse_registered_event *reg_evt;
> +
> +	/* Mask the sse events */
> +	ret = sse_events_mask();
> +	if (ret)
> +		return ret;
> +
> +	scoped_guard(spinlock, &events_list_lock) {
> +		list_for_each_entry(event, &events, list) {
> +			/* Disable local event on current cpu */
> +			if (!sse_event_is_global(event->evt_id)) {
> +				reg_evt = sse_get_reg_evt(event);
> +				if (reg_evt->is_enabled)
> +					sse_event_disable_local(event);
> +
> +				sse_sbi_unregister_event(event);
> +				continue;
> +			}
> +
> +			if (event->cpu != smp_processor_id())
> +				continue;
> +
> +			/* Update destination hart for global event */
> +			next_cpu = cpumask_any_but(cpu_online_mask, cpu);
> +			ret = sse_event_set_target_cpu_nolock(event, next_cpu);

What happens if there are no other CPUs online during teardown?

If cpumask_any_but() returns a value greater than or equal to nr_cpu_ids,
passing it to sse_event_set_target_cpu_nolock() might result in an
out-of-bounds array read in cpuid_to_hartid_map().

Furthermore, by migrating rather than unregistering global events on reboot,
does this break kexec by leaving the event actively registered for the next
kernel?

> +		}
> +	}
> +
> +	return ret;
> +}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/cover.1778331862.git.zhangzhanpeng.jasper@bytedance.com?part=3

next prev parent reply	other threads:[~2026-05-09 14:38 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-09 13:09 [PATCH v9 0/5] riscv: add SBI Supervisor Software Events support Zhanpeng Zhang
2026-05-09 13:09 ` [PATCH v9 1/5] riscv: add SBI SSE extension definitions Zhanpeng Zhang
2026-05-09 13:09 ` [PATCH v9 2/5] riscv: add support for SBI Supervisor Software Events extension Zhanpeng Zhang
2026-05-09 13:53   ` sashiko-bot
2026-05-09 13:10 ` [PATCH v9 3/5] drivers: firmware: add riscv SSE support Zhanpeng Zhang
2026-05-09 14:38   ` sashiko-bot [this message]
2026-05-09 13:10 ` [PATCH v9 4/5] perf: RISC-V: add support for SSE event Zhanpeng Zhang
2026-05-09 15:16   ` sashiko-bot
2026-05-09 13:10 ` [PATCH v9 5/5] selftests/riscv: add SSE test module Zhanpeng Zhang
2026-05-09 15:57   ` sashiko-bot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260509143818.B170EC2BCB2@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=sashiko@lists.linux.dev \
    --cc=zhangzhanpeng.jasper@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox