From mboxrd@z Thu Jan 1 00:00:00 1970
From: sashiko-bot@kernel.org
Subject: Re: [PATCH v9 3/5] drivers: firmware: add riscv SSE support
Reply-To: sashiko@lists.linux.dev
To: "Zhanpeng Zhang"
Cc: linux-perf-users@vger.kernel.org
In-Reply-To: <7d86cbd47688700fbdfc27e530e824a6bec0548c.1778331862.3.git.zhangzhanpeng.jasper@bytedance.com>
References: <7d86cbd47688700fbdfc27e530e824a6bec0548c.1778331862.3.git.zhangzhanpeng.jasper@bytedance.com>
Content-Type: text/plain; charset=utf-8
Date: Sat, 09 May 2026 14:38:15 +0000
Message-Id: <20260509143818.B170EC2BCB2@smtp.kernel.org>
Precedence: bulk
X-Mailing-List: linux-perf-users@vger.kernel.org

Thank you for your contribution! Sashiko AI review found 12 potential issue(s) to consider:

- [Critical] Missing braces around `scoped_guard(cpus_read_lock)` in `sse_event_unregister` releases the lock prematurely, leading to a race condition and use-after-free.
- [High] Event registration rollback uses the wrong revert function, leaking firmware registrations and causing use-after-free.
- [High] Missing validation of the `riscv_hartid_to_cpuid` return value leads to out-of-bounds array access.
- [High] `sse_cpu_teardown` causes out-of-bounds array access on single-CPU teardown and fails to unregister global events across reboots.
- [High] 64-bit physical addresses are silently truncated to 32 bits on RV32.
- [High] CPU hotplug unconditionally enables local events, ignoring driver-defined disabled states.
- [High] Early release of `cpus_read_lock` in `sse_event_register` causes missed event registration on hotplugged CPUs.
- [High] `sse_event_alloc_global` calls `smp_processor_id()` in a preemptible context.
- [High] Missing synchronization in `sse_event_unregister()` leads to use-after-free with concurrently executing hardware event handlers.
- [Medium] `sse_event_set_target_cpu_nolock` silently ignores attribute write failures.
- [Medium] Mixing `goto` error handling with `guard()` and `scoped_guard()` declarations violates subsystem cleanup guidelines.
- [Medium] The commit message claims clients can 'complete SSE events', but no such API exists.

--

commit 1f43f4083e7681521e63a564e1ff80519c1569f5
Author: Clément Léger
Subject: drivers: firmware: add riscv SSE support

This commit adds a driver-level interface for RISC-V SSE support, allowing Linux clients to register handlers and manage SSE events. It supports both local and global events and handles CPU hotplug and PM transitions.

> Add a driver-level interface for RISC-V SSE support. The interface lets
> Linux clients register handlers and enable, disable and complete SSE
> events. It will be used by PMU and GHES drivers.

The commit message states the interface lets clients complete SSE events. However, there does not appear to be any API provided to complete an event. Could the commit message be updated to reflect the actual implementation?

> diff --git a/drivers/firmware/riscv/riscv_sbi_sse.c b/drivers/firmware/riscv/riscv_sbi_sse.c
> new file mode 100644
> index 0000000000000..830011ddd33f7
> --- /dev/null
> +++ b/drivers/firmware/riscv/riscv_sbi_sse.c
> @@ -0,0 +1,694 @@

[ ... ]

> +static int sse_event_attr_get_no_lock(struct sse_registered_event *reg_evt,
> +				      unsigned long attr_id, unsigned long *val)
> +{
> +	struct sbiret sret;
> +	u32 evt = reg_evt->event->evt_id;
> +	unsigned long phys;
> +
> +	phys = sse_event_get_attr_phys(reg_evt);

Can this physical address be truncated on RV32 systems?

When CONFIG_ARCH_PHYS_ADDR_T_64BIT is enabled, physical addresses are 64 bits, but unsigned long is 32 bits.
Will the firmware receive a corrupted 32-bit physical address, since the ABI expects 64-bit values to be passed in two registers on RV32?

[ ... ]

> +static int sse_event_attr_set_nolock(struct sse_registered_event *reg_evt,
> +				     unsigned long attr_id, unsigned long val)
> +{
> +	struct sbiret sret;
> +	u32 evt = reg_evt->event->evt_id;
> +	unsigned long phys;
> +
> +	reg_evt->attr = val;
> +	phys = sse_event_get_attr_phys(reg_evt);

Is this subject to the same RV32 physical address truncation mentioned above?

[ ... ]

> +static int sse_event_set_target_cpu_nolock(struct sse_event *event,
> +					   unsigned int cpu)
> +{
> +	unsigned long hart_id = cpuid_to_hartid_map(cpu);
> +	struct sse_registered_event *reg_evt = event->global;
> +	u32 evt = event->evt_id;
> +	bool was_enabled;
> +	int ret;
> +
> +	if (!sse_event_is_global(evt))
> +		return -EINVAL;
> +
> +	was_enabled = reg_evt->is_enabled;
> +	if (was_enabled)
> +		sse_event_disable_local(event);
> +
> +	ret = sse_event_attr_set_nolock(reg_evt, SBI_SSE_ATTR_PREFERRED_HART,
> +					hart_id);
> +	if (ret == 0)
> +		sse_global_event_update_cpu(event, cpu);
> +
> +	if (was_enabled)
> +		sse_event_enable_local(event);
> +
> +	return 0;
> +}

Should this function return the error code instead of unconditionally returning 0?

If the SBI call to set the preferred hart fails, the error is masked from the caller, leaving the kernel thinking the migration succeeded while the firmware state is out of sync.

[ ... ]

> +static int sse_event_alloc_global(struct sse_event *event)
> +{
> +	int err;
> +	struct sse_registered_event *reg_evt;
> +
> +	reg_evt = kzalloc_obj(*reg_evt, GFP_KERNEL);
> +	if (!reg_evt)
> +		return -ENOMEM;
> +
> +	event->global = reg_evt;
> +	err = sse_event_init_registered(smp_processor_id(), reg_evt, event);

Is it safe to call smp_processor_id() here?

sse_event_alloc_global() is invoked from sse_event_register(), which only holds a mutex and runs with preemption enabled.
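If the initializing CPU id is genuinely needed here, one option (a sketch only, against the quoted code) would be to pin the task for the duration of the call using the stock get_cpu()/put_cpu() helpers:

```c
	/* Sketch: get_cpu() disables preemption and returns the current
	 * CPU id; put_cpu() re-enables preemption afterwards. */
	int cpu = get_cpu();

	err = sse_event_init_registered(cpu, reg_evt, event);
	put_cpu();
```

Whether a stable CPU id is actually required for a global event's initial registration, or any online CPU would do, is a separate question worth answering in the commit message.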
Could this trigger a warning if CONFIG_DEBUG_PREEMPT is enabled, or allow the thread to migrate immediately after, yielding an invalid CPU ID?

[ ... ]

> +struct sse_event *sse_event_register(u32 evt, u32 priority,
> +				     sse_event_handler_fn *handler, void *arg)
> +{
> +	struct sse_event *event;
> +	int cpu;
> +	int ret = 0;
> +
> +	if (!sse_available)
> +		return ERR_PTR(-EOPNOTSUPP);
> +
> +	guard(mutex)(&sse_mutex);

This isn't a bug, but is it intended to mix goto-based error handling with scope-based guard declarations in the same function?

The cleanup subsystem documentation recommends against this pattern, as it can create confusing ownership semantics.

> +	if (sse_event_get(evt))
> +		return ERR_PTR(-EEXIST);
> +
> +	event = sse_event_alloc(evt, priority, handler, arg);
> +	if (IS_ERR(event))
> +		return event;
> +
> +	scoped_guard(cpus_read_lock) {
> +		if (sse_event_is_global(evt)) {
> +			unsigned long preferred_hart;
> +
> +			ret = sse_event_attr_get_no_lock(event->global,
> +							 SBI_SSE_ATTR_PREFERRED_HART,
> +							 &preferred_hart);
> +			if (ret)
> +				goto err_event_free;
> +
> +			cpu = riscv_hartid_to_cpuid(preferred_hart);

Can riscv_hartid_to_cpuid() return -ENOENT here?

If the firmware returns an unknown or unmapped hart ID, it appears the negative value is passed to sse_global_event_update_cpu(), which uses it to index the logical hart array via cpuid_to_hartid_map(), potentially causing an out-of-bounds memory read.

> +			sse_global_event_update_cpu(event, cpu);
> +
> +			ret = sse_sbi_register_event(event, event->global);
> +			if (ret)
> +				goto err_event_free;
> +
> +		} else {
> +			ret = sse_on_each_cpu(event, SBI_SSE_EVENT_REGISTER,
> +					      SBI_SSE_EVENT_DISABLE);

Should the revert function be SBI_SSE_EVENT_UNREGISTER instead of SBI_SSE_EVENT_DISABLE?

Disabling an event merely masks it without unregistering it from the firmware.
If registration fails on any CPU, does this leave the firmware holding an active registration pointing to freed memory after the error path frees the event structures?

> +			if (ret)
> +				goto err_event_free;
> +		}
> +	}
> +
> +	scoped_guard(spinlock, &events_list_lock)
> +		list_add(&event->list, &events);

Because the cpus_read_lock is released before events_list_lock is acquired to add the event to the global list, could a CPU hotplug operation occur in this window?

If so, sse_cpu_online() would iterate over the list and miss the new event, resulting in the event not being registered on the new CPU.

> +
> +	return event;
> +
> +err_event_free:
> +	sse_event_free(event);
> +
> +	return ERR_PTR(ret);
> +}

[ ... ]

> +void sse_event_unregister(struct sse_event *event)
> +{
> +	scoped_guard(mutex, &sse_mutex) {
> +		scoped_guard(cpus_read_lock)
> +			sse_event_unregister_nolock(event);

Is there a missing compound block around the cpus_read_lock guard?

Without braces, the lock is dropped before events_list_lock is acquired. Could a concurrent CPU hotplug operation acquire the list lock, find the partially unregistered event still in the list, and erroneously register and enable it in the firmware for the new CPU?

> +
> +		scoped_guard(spinlock, &events_list_lock)
> +			list_del(&event->list);
> +
> +		sse_event_free(event);

Since SBI events are delivered as NMI-like notifications, could an event have just fired and its handler be actively executing on another CPU?

Without synchronization, such as waiting for active handlers or an RCU grace period, it seems sse_handle_event() might dereference evt->handler and evt->handler_arg after the memory has been freed.

> +	}
> +}

[ ... ]
> +static int sse_cpu_online(unsigned int cpu)
> +{
> +	struct sse_event *event;
> +
> +	scoped_guard(spinlock, &events_list_lock) {
> +		list_for_each_entry(event, &events, list) {
> +			if (sse_event_is_global(event->evt_id))
> +				continue;
> +
> +			sse_event_register_local(event);
> +			if (sse_get_reg_evt(event))
> +				sse_event_enable_local(event);

Should this check reg_evt->is_enabled before unconditionally enabling the event?

If a driver registers an event but intentionally leaves it disabled, it looks like it will unexpectedly become enabled on the newly hotplugged CPU.

> +		}
> +	}
> +
> +	/* Ready to handle events. Unmask SSE. */
> +	return sse_events_unmask();
> +}
> +
> +static int sse_cpu_teardown(unsigned int cpu)
> +{
> +	int ret = 0;
> +	unsigned int next_cpu;
> +	struct sse_event *event;
> +	struct sse_registered_event *reg_evt;
> +
> +	/* Mask the sse events */
> +	ret = sse_events_mask();
> +	if (ret)
> +		return ret;
> +
> +	scoped_guard(spinlock, &events_list_lock) {
> +		list_for_each_entry(event, &events, list) {
> +			/* Disable local event on current cpu */
> +			if (!sse_event_is_global(event->evt_id)) {
> +				reg_evt = sse_get_reg_evt(event);
> +				if (reg_evt->is_enabled)
> +					sse_event_disable_local(event);
> +
> +				sse_sbi_unregister_event(event);
> +				continue;
> +			}
> +
> +			if (event->cpu != smp_processor_id())
> +				continue;
> +
> +			/* Update destination hart for global event */
> +			next_cpu = cpumask_any_but(cpu_online_mask, cpu);
> +			ret = sse_event_set_target_cpu_nolock(event, next_cpu);

What happens if there are no other CPUs online during teardown?

If cpumask_any_but() returns a value greater than or equal to nr_cpu_ids, passing it to sse_event_set_target_cpu_nolock() might result in an out-of-bounds array read in cpuid_to_hartid_map().

Furthermore, by migrating rather than unregistering global events on reboot, does this break kexec by leaving the event actively registered for the next kernel?
> +		}
> +	}
> +
> +	return ret;
> +}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/cover.1778331862.git.zhangzhanpeng.jasper@bytedance.com?part=3