From mboxrd@z Thu Jan 1 00:00:00 1970
From: sashiko-bot@kernel.org
Subject: Re: [PATCH v9 5/5] selftests/riscv: add SSE test module
Reply-To: sashiko@lists.linux.dev
To: "Zhanpeng Zhang"
Cc: linux-perf-users@vger.kernel.org
In-Reply-To: <1f66bfb775e8b6658c6f2f2f48aa4f668f911db5.1778331862.5.git.zhangzhanpeng.jasper@bytedance.com>
References: <1f66bfb775e8b6658c6f2f2f48aa4f668f911db5.1778331862.5.git.zhangzhanpeng.jasper@bytedance.com>
Content-Type: text/plain; charset=utf-8
Date: Sat, 09 May 2026 15:57:23 +0000
Message-Id: <20260509155724.51B87C2BCB2@smtp.kernel.org>
Precedence: bulk
X-Mailing-List: linux-perf-users@vger.kernel.org

Thank you for your contribution! Sashiko AI review found 7 potential
issue(s) to consider:

- [High] Event resource leak in error path leads to use-after-free of
  stack memory.
- [High] Use-after-free of `task_struct` due to unsynchronized
  `kthread_stop()`.
- [High] Physical address truncation on RV32 leads to hypervisor memory
  corruption.
- [Medium] The executable test script `run_sse_test.sh` is incorrectly
  assigned to `TEST_FILES` instead of `TEST_PROGS`.
- [Medium] Time-of-check to time-of-use (TOCTOU) race in timeout
  evaluation causes false test failures.
- [Medium] Using `smp_processor_id()` in a preemptible context triggers
  kernel warnings and migration races.
- [Medium] Infinite busy-wait loop without a timeout risks test
  soft-lockups.

--

commit 044a5142115aaf579bd893af3ea7d69bca6b7dfa
Author: Clément Léger

    selftests/riscv: add SSE test module

    This module, once loaded, will execute a series of tests using the SSE
    framework. The provided script will check for any error reported by
    the test module.
> diff --git a/tools/testing/selftests/riscv/sse/Makefile b/tools/testing/selftests/riscv/sse/Makefile
> new file mode 100644
> --- /dev/null
> +++ b/tools/testing/selftests/riscv/sse/Makefile
> @@ -0,0 +1,5 @@
> +TEST_GEN_MODS_DIR := module
> +
> +TEST_FILES := run_sse_test.sh

Should run_sse_test.sh be in TEST_PROGS instead of TEST_FILES? According
to the selftests documentation, executable test scripts intended to be run
directly should be placed in TEST_PROGS, whereas TEST_FILES is for
auxiliary supporting files. Placing it in TEST_FILES might cause it to be
skipped during automated execution.

> diff --git a/tools/testing/selftests/riscv/sse/module/riscv_sse_test.c b/tools/testing/selftests/riscv/sse/module/riscv_sse_test.c
> new file mode 100644
> --- /dev/null
> +++ b/tools/testing/selftests/riscv/sse/module/riscv_sse_test.c

[ ... ]

> +static int sse_event_attr_get(u32 evt, unsigned long attr_id,
> +			      unsigned long *val)
> +{
> +	struct sbiret sret;
> +	unsigned long *attr_buf, phys;
> +
> +	attr_buf = kmalloc_obj(*attr_buf, GFP_KERNEL);
> +	if (!attr_buf)
> +		return -ENOMEM;
> +
> +	phys = virt_to_phys(attr_buf);

Will this truncate the physical address on RV32 systems with high memory?
virt_to_phys() returns a phys_addr_t, which can be 64-bit on RV32 (Sv32).
Since phys is an unsigned long, the upper 32 bits might be silently
truncated.

> +
> +	sret = sbi_ecall(SBI_EXT_SSE, SBI_SSE_EVENT_ATTR_READ, evt, attr_id, 1,
> +			 phys, 0, 0);

If the buffer resides above the 4 GB boundary, could this truncated
physical address cause the SBI firmware to write the attribute value to
the wrong memory location?

[ ... ]
> +static int sse_test_inject_event(struct sse_event *event, u32 evt, unsigned int cpu)
> +{
> +	int res;
> +	unsigned long status;
> +
> +	if (sse_event_is_global(evt)) {
> +		/*
> +		 * Due to the fact the completion might happen faster than
> +		 * the call to SBI_SSE_COMPLETE in the handler, if the event was
> +		 * running on another CPU, we need to wait for the event status
> +		 * to be !RUNNING.
> +		 */
> +		do {
> +			res = sse_event_attr_get(evt, SBI_SSE_ATTR_STATUS, &status);
> +			if (res) {
> +				sse_err("Failed to get status for evt %x, error %d\n", evt, res);
> +				return res;
> +			}
> +			status = status & SBI_SSE_ATTR_STATUS_STATE_MASK;
> +		} while (status == SBI_SSE_STATE_RUNNING);

Could this loop spin indefinitely if the SBI firmware encounters a bug or
takes an excessive amount of time? Adding a timeout mechanism might be
safer, especially since sse_event_attr_get() allocates memory with
GFP_KERNEL on each iteration.

[ ... ]

> +static void sse_run_fast_test(struct fast_test_arg *test_arg, struct sse_event *event, u32 evt)
> +{
> +	unsigned long timeout;
> +	int ret, cpu;
> +
> +	for_each_online_cpu(cpu) {
> +		WRITE_ONCE(test_arg->completion, false);
> +		WRITE_ONCE(test_arg->cpu, cpu);
> +		/* Test arg is used on another CPU */
> +		smp_wmb();
> +
> +		ret = sse_test_inject_event(event, evt, cpu);
> +		if (ret) {
> +			sse_err("event %s injection failed, err %d\n", sse_evt_name(evt), ret);
> +			return;
> +		}
> +
> +		timeout = jiffies + HZ / 100;
> +		/* We can not use completions since they are not NMI safe */
> +		while (!READ_ONCE(test_arg->completion) &&
> +		       time_before(jiffies, timeout)) {
> +			cpu_relax();
> +		}
> +		if (!time_before(jiffies, timeout)) {

Can this lead to a false test failure? If the SSE event successfully
completes at the exact same jiffy that the timeout expires, the loop will
exit safely because test_arg->completion is true. However, the
time_before() check right after will evaluate to false, causing the code
to incorrectly report a timeout.
Checking if (!READ_ONCE(test_arg->completion)) directly might be more
robust.

> +			sse_err("Failed to wait for event %s completion on CPU %d\n",
> +				sse_evt_name(evt), cpu);
> +			return;
> +		}
> +	}
> +}

[ ... ]

> +static void sse_test_injection_priority_arg(struct priority_test_arg *args, unsigned int args_size,
> +					    sse_event_handler_fn handler, const char *test_name)
> +{
> +	unsigned int i;
> +	int ret;
> +	struct sse_event *event;
> +	struct priority_test_arg *arg, *first_arg = NULL, *prev_arg = NULL;
> +
> +	pr_info("Starting SSE priority test (%s)\n", test_name);
> +	for (i = 0; i < args_size; i++) {
> +		arg = &args[i];
> +
> +		if (!sse_test_can_inject_event(arg->evt))
> +			continue;
> +
> +		WRITE_ONCE(arg->called, false);
> +		WRITE_ONCE(arg->next_evt_arg, NULL);
> +		WRITE_ONCE(arg->event, NULL);
> +
> +		event = sse_event_register(arg->evt, arg->prio, handler, (void *)arg);
> +		if (IS_ERR(event)) {
> +			if (PTR_ERR(event) == -EEXIST) {
> +				pr_info("Event %s already registered, skipping\n",
> +					sse_evt_name(arg->evt));
> +				continue;
> +			}
> +			sse_err("Failed to register event %s, err %ld\n", sse_evt_name(arg->evt),
> +				PTR_ERR(event));
> +			goto release_events;
> +		}
> +		arg->event = event;
> +
> +		if (sse_event_is_global(arg->evt)) {
> +			/* Target event at current CPU */
> +			ret = sse_event_set_target_cpu(event, smp_processor_id());

Is it safe to call smp_processor_id() here? This function executes during
module initialization, which is a preemptible context. Calling
smp_processor_id() without disabling preemption will trigger a warning.
Additionally, if the task migrates between CPUs during this loop, global
events might be bound to different CPUs, which might violate the
assumption that all events in the chain run on the exact same CPU.
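One possible shape for a fix (a hedged fragment against the kernel API,
not a drop-in patch, and not runnable on its own): pick a single online
CPU once, up front, and reuse that value everywhere, rather than reading
smp_processor_id() in a preemptible context.

```c
/*
 * Sketch only: choose one online CPU before the loop instead of calling
 * smp_processor_id() while preemptible. A single stable value also keeps
 * every chained global event bound to the same CPU even if the init task
 * migrates between iterations.
 */
unsigned int target_cpu = cpumask_first(cpu_online_mask);

/* ... inside the registration loop ... */
ret = sse_event_set_target_cpu(event, target_cpu);

/* ... and for the initial injection ... */
ret = sse_test_inject_event(first_arg->event, first_arg->evt, target_cpu);
```

get_cpu()/put_cpu() would silence the warning as well, but holding
preemption off across sse_event_register() (which allocates with
GFP_KERNEL) is not an option, so a precomputed CPU seems the simpler
route.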
> +			if (ret) {
> +				sse_err("Failed to set event %s target CPU, err %d\n",
> +					sse_evt_name(arg->evt), ret);
> +				goto release_events;
> +			}
> +		}
> +
> +		ret = sse_event_enable(event);
> +		if (ret) {
> +			sse_err("Failed to enable event %s, err %d\n", sse_evt_name(arg->evt), ret);
> +			goto release_events;
> +		}
> +
> +		if (prev_arg)
> +			WRITE_ONCE(prev_arg->next_evt_arg, arg);
> +
> +		prev_arg = arg;
> +
> +		if (!first_arg)
> +			first_arg = arg;
> +	}
> +
> +	if (!first_arg) {
> +		pr_info("No injectable event available for %s priority test\n",
> +			test_name);
> +		return;
> +	}
> +
> +	/* Inject first event, handler should trigger the others in chain. */
> +	ret = sse_test_inject_event(first_arg->event, first_arg->evt, smp_processor_id());
> +	if (ret) {
> +		sse_err("SSE event %s injection failed\n", sse_evt_name(first_arg->evt));
> +		goto release_events;
> +	}
> +
> +	/*
> +	 * Event are injected directly on the current CPU after calling sse_test_inject_event()
> +	 * so that execution is preempted right away, no need to wait for timeout.
> +	 */
> +	arg = first_arg;
> +	while (arg) {
> +		if (!READ_ONCE(arg->called)) {
> +			sse_err("Event %s handler was not called\n",
> +				sse_evt_name(arg->evt));
> +			ret = -EINVAL;
> +		}
> +
> +		event = arg->event;
> +		arg = READ_ONCE(arg->next_evt_arg);
> +	}
> +
> +release_events:
> +
> +	arg = first_arg;
> +	while (arg) {
> +		event = arg->event;
> +		if (!event)
> +			break;
> +
> +		sse_event_disable(event);
> +		sse_event_unregister(event);
> +		arg = READ_ONCE(arg->next_evt_arg);
> +	}

Could this cleanup loop leak an event in the error path? If
sse_event_set_target_cpu() or sse_event_enable() fails, the code jumps to
release_events. However, the current event hasn't been linked into the
next_evt_arg chain yet (since prev_arg is updated after). This causes the
failing event to be skipped during cleanup. If the firmware triggers this
leaked event later, could it result in a use-after-free?
The handler context (arg) points to a local array on the stack (e.g.,
default_hi_prio_args), so executing the handler after the function returns
would access destroyed stack memory.

> +
> +	pr_info("Finished SSE priority test (%s)\n", test_name);
> +}

[ ... ]

> +static int sse_stress_test_layer(int layer)
> +{
> +	struct sse_event *event;
> +	int ret;
> +
> +	if (layer < 0 || layer >= ARRAY_SIZE(stress_handlers))
> +		return -EINVAL;
> +
> +	pr_info("Starting SSE stress layer %d (%s)\n",
> +		layer, stress_layer_names[layer]);
> +
> +	memset(&stress_ctx, 0, sizeof(stress_ctx));
> +	stress_ctx.evt_id = SBI_SSE_EVENT_LOCAL_SOFTWARE_INJECTED;
> +	stress_ctx.layer = layer;
> +	WRITE_ONCE(stress_ctx.running, true);
> +	atomic_set(&stress_ctx.inject_count, 0);
> +	atomic_set(&stress_ctx.handler_count, 0);
> +	init_waitqueue_head(&stress_ctx.wait_q);
> +
> +	event = sse_event_register(stress_ctx.evt_id, 0,
> +				   stress_handlers[layer], &stress_ctx);
> +	if (IS_ERR(event)) {
> +		sse_err("Failed to register stress event, err %ld\n",
> +			PTR_ERR(event));
> +		return PTR_ERR(event);
> +	}
> +
> +	stress_ctx.event = event;
> +
> +	ret = sse_event_enable(event);
> +	if (ret) {
> +		sse_err("Failed to enable stress event, err %d\n", ret);
> +		goto err_unregister;
> +	}
> +
> +	stress_ctx.monitor_task = kthread_run(stress_monitor_thread,
> +					      &stress_ctx, "sse_stress_mon");
> +	if (IS_ERR(stress_ctx.monitor_task)) {
> +		ret = PTR_ERR(stress_ctx.monitor_task);
> +		sse_err("Failed to create stress monitor thread, err %d\n", ret);
> +		goto err_disable;
> +	}
> +
> +	hrtimer_setup(&stress_ctx.timer, stress_timer_callback,
> +		      CLOCK_MONOTONIC, HRTIMER_MODE_PINNED);
> +	hrtimer_start(&stress_ctx.timer, ns_to_ktime(STRESS_INJECT_NS),
> +		      HRTIMER_MODE_REL_PINNED);
> +
> +	hrtimer_setup(&stress_ctx.stop_timer, stress_stop_timer_callback,
> +		      CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> +	hrtimer_start(&stress_ctx.stop_timer, ms_to_ktime(STRESS_DURATION_MS),
> +		      HRTIMER_MODE_REL);
> +
> +	wait_event(stress_ctx.wait_q, READ_ONCE(stress_ctx.test_done));
> +
> +	WRITE_ONCE(stress_ctx.running, false);
> +	hrtimer_cancel(&stress_ctx.timer);
> +	hrtimer_cancel(&stress_ctx.stop_timer);
> +	kthread_stop(stress_ctx.monitor_task);

Can setting stress_ctx.running to false before calling kthread_stop() lead
to a use-after-free? If stress_monitor_thread() wakes up and observes
running == false, it will exit its loop and terminate naturally, dropping
its reference to the task_struct. Since the caller didn't take an extra
reference via get_task_struct(), the task_struct might be freed before
kthread_stop() is called.

> +
> +	pr_info("Finished SSE stress layer %d (%s): inject=%d, handler=%d\n",
> +		layer, stress_layer_names[layer],
> +		atomic_read(&stress_ctx.inject_count),
> +		atomic_read(&stress_ctx.handler_count));
> +
> +err_disable:
> +	sse_event_disable(event);
> +err_unregister:
> +	sse_event_unregister(event);
> +
> +	return ret;
> +}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/cover.1778331862.git.zhangzhanpeng.jasper@bytedance.com?part=5