Linux Perf Users
 help / color / mirror / Atom feed
From: sashiko-bot@kernel.org
To: "Zhanpeng Zhang" <zhangzhanpeng.jasper@bytedance.com>
Cc: linux-perf-users@vger.kernel.org
Subject: Re: [PATCH v9 5/5] selftests/riscv: add SSE test module
Date: Sat, 09 May 2026 15:57:23 +0000	[thread overview]
Message-ID: <20260509155724.51B87C2BCB2@smtp.kernel.org> (raw)
In-Reply-To: <1f66bfb775e8b6658c6f2f2f48aa4f668f911db5.1778331862.5.git.zhangzhanpeng.jasper@bytedance.com>

Thank you for your contribution! Sashiko AI review found 7 potential issue(s) to consider:
- [High] Event resource leak in error path leads to Use-After-Free of stack memory.
- [High] Use-After-Free of `task_struct` due to unsynchronized `kthread_stop()`.
- [High] Physical address truncation on RV32 leads to hypervisor memory corruption.
- [Medium] The executable test script `run_sse_test.sh` is incorrectly assigned to `TEST_FILES` instead of `TEST_PROGS`.
- [Medium] Time-Of-Check to Time-Of-Use (TOCTOU) race in timeout evaluation causes false test failures.
- [Medium] Using `smp_processor_id()` in a preemptible context triggers kernel warnings and migration races.
- [Medium] Infinite busy-wait loop without a timeout risks test soft-lockups.
--

commit 044a5142115aaf579bd893af3ea7d69bca6b7dfa
Author: Clément Léger <cleger@rivosinc.com>

selftests/riscv: add SSE test module

This module, once loaded, will execute a series of tests using the SSE
framework. The provided script will check for any error reported by the
test module.

> diff --git a/tools/testing/selftests/riscv/sse/Makefile b/tools/testing/selftests/riscv/sse/Makefile
> new file mode 100644
> --- /dev/null
> +++ b/tools/testing/selftests/riscv/sse/Makefile
> @@ -0,0 +1,5 @@
> +TEST_GEN_MODS_DIR := module
> +
> +TEST_FILES := run_sse_test.sh

Should run_sse_test.sh be in TEST_PROGS instead of TEST_FILES?

According to the selftests documentation, executable test scripts intended
to be run directly should be placed in TEST_PROGS, whereas TEST_FILES is
for auxiliary supporting files. Placing it in TEST_FILES might cause it to
be skipped during automated execution.

> diff --git a/tools/testing/selftests/riscv/sse/module/riscv_sse_test.c b/tools/testing/selftests/riscv/sse/module/riscv_sse_test.c
> new file mode 100644
> --- /dev/null
> +++ b/tools/testing/selftests/riscv/sse/module/riscv_sse_test.c

[ ... ]

> +static int sse_event_attr_get(u32 evt, unsigned long attr_id,
> +			      unsigned long *val)
> +{
> +	struct sbiret sret;
> +	unsigned long *attr_buf, phys;
> +
> +	attr_buf = kmalloc_obj(*attr_buf, GFP_KERNEL);
> +	if (!attr_buf)
> +		return -ENOMEM;
> +
> +	phys = virt_to_phys(attr_buf);

Will this truncate the physical address on RV32 systems with high memory?

virt_to_phys() returns a phys_addr_t, which can be 64-bit on RV32 (Sv32).
Since phys is an unsigned long, the upper 32 bits might be silently
truncated.

> +
> +	sret = sbi_ecall(SBI_EXT_SSE, SBI_SSE_EVENT_ATTR_READ, evt, attr_id, 1,
> +			 phys, 0, 0);

If the buffer resides above the 4GB boundary, could this corrupted physical
address cause the SBI firmware to write the attribute value to the wrong
memory location?

[ ... ]

> +static int sse_test_inject_event(struct sse_event *event, u32 evt, unsigned int cpu)
> +{
> +	int res;
> +	unsigned long status;
> +
> +	if (sse_event_is_global(evt)) {
> +		/*
> +		 * Due to the fact the completion might happen faster than
> +		 * the call to SBI_SSE_COMPLETE in the handler, if the event was
> +		 * running on another CPU, we need to wait for the event status
> +		 * to be !RUNNING.
> +		 */
> +		do {
> +			res = sse_event_attr_get(evt, SBI_SSE_ATTR_STATUS, &status);
> +			if (res) {
> +				sse_err("Failed to get status for evt %x, error %d\n", evt, res);
> +				return res;
> +			}
> +			status = status & SBI_SSE_ATTR_STATUS_STATE_MASK;
> +		} while (status == SBI_SSE_STATE_RUNNING);

Could this loop spin indefinitely if the SBI firmware encounters a bug or
takes an excessive amount of time?

Adding a timeout mechanism might be safer, especially since
sse_event_attr_get() allocates memory with GFP_KERNEL on each iteration.

[ ... ]

> +static void sse_run_fast_test(struct fast_test_arg *test_arg, struct sse_event *event, u32 evt)
> +{
> +	unsigned long timeout;
> +	int ret, cpu;
> +
> +	for_each_online_cpu(cpu) {
> +		WRITE_ONCE(test_arg->completion, false);
> +		WRITE_ONCE(test_arg->cpu, cpu);
> +		/* Test arg is used on another CPU */
> +		smp_wmb();
> +
> +		ret = sse_test_inject_event(event, evt, cpu);
> +		if (ret) {
> +			sse_err("event %s injection failed, err %d\n", sse_evt_name(evt), ret);
> +			return;
> +		}
> +
> +		timeout = jiffies + HZ / 100;
> +		/* We can not use <linux/completion.h> since they are not NMI safe */
> +		while (!READ_ONCE(test_arg->completion) &&
> +		       time_before(jiffies, timeout)) {
> +			cpu_relax();
> +		}
> +		if (!time_before(jiffies, timeout)) {

Can this lead to a false test failure?

If the SSE event successfully completes at the exact same jiffy that the
timeout expires, the loop will exit safely because test_arg->completion is
true. However, the time_before() check right after will evaluate to false,
causing the code to incorrectly report a timeout. Checking
if (!READ_ONCE(test_arg->completion)) directly might be more robust.

> +			sse_err("Failed to wait for event %s completion on CPU %d\n",
> +				sse_evt_name(evt), cpu);
> +			return;
> +		}
> +	}
> +}

[ ... ]

> +static void sse_test_injection_priority_arg(struct priority_test_arg *args, unsigned int args_size,
> +					    sse_event_handler_fn handler, const char *test_name)
> +{
> +	unsigned int i;
> +	int ret;
> +	struct sse_event *event;
> +	struct priority_test_arg *arg, *first_arg = NULL, *prev_arg = NULL;
> +
> +	pr_info("Starting SSE priority test (%s)\n", test_name);
> +	for (i = 0; i < args_size; i++) {
> +		arg = &args[i];
> +
> +		if (!sse_test_can_inject_event(arg->evt))
> +			continue;
> +
> +		WRITE_ONCE(arg->called, false);
> +		WRITE_ONCE(arg->next_evt_arg, NULL);
> +		WRITE_ONCE(arg->event, NULL);
> +
> +		event = sse_event_register(arg->evt, arg->prio, handler, (void *)arg);
> +		if (IS_ERR(event)) {
> +			if (PTR_ERR(event) == -EEXIST) {
> +				pr_info("Event %s already registered, skipping\n",
> +					sse_evt_name(arg->evt));
> +				continue;
> +			}
> +			sse_err("Failed to register event %s, err %ld\n", sse_evt_name(arg->evt),
> +				PTR_ERR(event));
> +			goto release_events;
> +		}
> +		arg->event = event;
> +
> +		if (sse_event_is_global(arg->evt)) {
> +			/* Target event at current CPU */
> +			ret = sse_event_set_target_cpu(event, smp_processor_id());

Is it safe to call smp_processor_id() here?

This function executes during module initialization, which is a preemptible
context. Calling smp_processor_id() without disabling preemption will
trigger a warning. Additionally, if the task migrates between CPUs during
this loop, global events might be bound to different CPUs, which might
violate the assumption that all events in the chain run on the exact same
CPU.

> +			if (ret) {
> +				sse_err("Failed to set event %s target CPU, err %d\n",
> +					sse_evt_name(arg->evt), ret);
> +				goto release_events;
> +			}
> +		}
> +
> +		ret = sse_event_enable(event);
> +		if (ret) {
> +			sse_err("Failed to enable event %s, err %d\n", sse_evt_name(arg->evt), ret);
> +			goto release_events;
> +		}
> +
> +		if (prev_arg)
> +			WRITE_ONCE(prev_arg->next_evt_arg, arg);
> +
> +		prev_arg = arg;
> +
> +		if (!first_arg)
> +			first_arg = arg;
> +	}
> +
> +	if (!first_arg) {
> +		pr_info("No injectable event available for %s priority test\n",
> +			test_name);
> +		return;
> +	}
> +
> +	/* Inject first event, handler should trigger the others in chain. */
> +	ret = sse_test_inject_event(first_arg->event, first_arg->evt, smp_processor_id());
> +	if (ret) {
> +		sse_err("SSE event %s injection failed\n", sse_evt_name(first_arg->evt));
> +		goto release_events;
> +	}
> +
> +	/*
> +	 * Event are injected directly on the current CPU after calling sse_test_inject_event()
> +	 * so that execution is preempted right away, no need to wait for timeout.
> +	 */
> +	arg = first_arg;
> +	while (arg) {
> +		if (!READ_ONCE(arg->called)) {
> +			sse_err("Event %s handler was not called\n",
> +				sse_evt_name(arg->evt));
> +			ret = -EINVAL;
> +		}
> +
> +		event = arg->event;
> +		arg = READ_ONCE(arg->next_evt_arg);
> +	}
> +
> +release_events:
> +
> +	arg = first_arg;
> +	while (arg) {
> +		event = arg->event;
> +		if (!event)
> +			break;
> +
> +		sse_event_disable(event);
> +		sse_event_unregister(event);
> +		arg = READ_ONCE(arg->next_evt_arg);
> +	}

Could this cleanup loop leak an event in the error path?

If sse_event_set_target_cpu() or sse_event_enable() fails, the code jumps
to release_events. However, the current event hasn't been linked into the
next_evt_arg chain yet (since prev_arg is updated after). This causes the
failing event to be skipped during cleanup.

If the firmware triggers this leaked event later, could it result in a
use-after-free? The handler context (arg) points to a local array on the
stack (e.g., default_hi_prio_args), so executing the handler after the
function returns would access destroyed stack memory.

> +
> +	pr_info("Finished SSE priority test (%s)\n", test_name);
> +}

[ ... ]

> +static int sse_stress_test_layer(int layer)
> +{
> +	struct sse_event *event;
> +	int ret;
> +
> +	if (layer < 0 || layer >= ARRAY_SIZE(stress_handlers))
> +		return -EINVAL;
> +
> +	pr_info("Starting SSE stress layer %d (%s)\n",
> +		layer, stress_layer_names[layer]);
> +
> +	memset(&stress_ctx, 0, sizeof(stress_ctx));
> +	stress_ctx.evt_id = SBI_SSE_EVENT_LOCAL_SOFTWARE_INJECTED;
> +	stress_ctx.layer = layer;
> +	WRITE_ONCE(stress_ctx.running, true);
> +	atomic_set(&stress_ctx.inject_count, 0);
> +	atomic_set(&stress_ctx.handler_count, 0);
> +	init_waitqueue_head(&stress_ctx.wait_q);
> +
> +	event = sse_event_register(stress_ctx.evt_id, 0,
> +				   stress_handlers[layer], &stress_ctx);
> +	if (IS_ERR(event)) {
> +		sse_err("Failed to register stress event, err %ld\n",
> +			PTR_ERR(event));
> +		return PTR_ERR(event);
> +	}
> +
> +	stress_ctx.event = event;
> +
> +	ret = sse_event_enable(event);
> +	if (ret) {
> +		sse_err("Failed to enable stress event, err %d\n", ret);
> +		goto err_unregister;
> +	}
> +
> +	stress_ctx.monitor_task = kthread_run(stress_monitor_thread,
> +					      &stress_ctx, "sse_stress_mon");
> +	if (IS_ERR(stress_ctx.monitor_task)) {
> +		ret = PTR_ERR(stress_ctx.monitor_task);
> +		sse_err("Failed to create stress monitor thread, err %d\n", ret);
> +		goto err_disable;
> +	}
> +
> +	hrtimer_setup(&stress_ctx.timer, stress_timer_callback,
> +		      CLOCK_MONOTONIC, HRTIMER_MODE_PINNED);
> +	hrtimer_start(&stress_ctx.timer, ns_to_ktime(STRESS_INJECT_NS),
> +		      HRTIMER_MODE_REL_PINNED);
> +
> +	hrtimer_setup(&stress_ctx.stop_timer, stress_stop_timer_callback,
> +		      CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> +	hrtimer_start(&stress_ctx.stop_timer, ms_to_ktime(STRESS_DURATION_MS),
> +		      HRTIMER_MODE_REL);
> +
> +	wait_event(stress_ctx.wait_q, READ_ONCE(stress_ctx.test_done));
> +
> +	WRITE_ONCE(stress_ctx.running, false);
> +	hrtimer_cancel(&stress_ctx.timer);
> +	hrtimer_cancel(&stress_ctx.stop_timer);
> +	kthread_stop(stress_ctx.monitor_task);

Can setting stress_ctx.running to false before calling kthread_stop() lead
to a use-after-free?

If stress_monitor_thread() wakes up and observes running == false, it will
exit its loop and terminate naturally, dropping its reference to the
task_struct. Since the caller didn't take an extra reference via
get_task_struct(), the task_struct might be freed before kthread_stop() is
called.

> +
> +	pr_info("Finished SSE stress layer %d (%s): inject=%d, handler=%d\n",
> +		layer, stress_layer_names[layer],
> +		atomic_read(&stress_ctx.inject_count),
> +		atomic_read(&stress_ctx.handler_count));
> +
> +err_disable:
> +	sse_event_disable(event);
> +err_unregister:
> +	sse_event_unregister(event);
> +
> +	return ret;
> +}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/cover.1778331862.git.zhangzhanpeng.jasper@bytedance.com?part=5

      reply	other threads:[~2026-05-09 15:57 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-09 13:09 [PATCH v9 0/5] riscv: add SBI Supervisor Software Events support Zhanpeng Zhang
2026-05-09 13:09 ` [PATCH v9 1/5] riscv: add SBI SSE extension definitions Zhanpeng Zhang
2026-05-09 13:09 ` [PATCH v9 2/5] riscv: add support for SBI Supervisor Software Events extension Zhanpeng Zhang
2026-05-09 13:53   ` sashiko-bot
2026-05-09 13:10 ` [PATCH v9 3/5] drivers: firmware: add riscv SSE support Zhanpeng Zhang
2026-05-09 14:38   ` sashiko-bot
2026-05-09 13:10 ` [PATCH v9 4/5] perf: RISC-V: add support for SSE event Zhanpeng Zhang
2026-05-09 15:16   ` sashiko-bot
2026-05-09 13:10 ` [PATCH v9 5/5] selftests/riscv: add SSE test module Zhanpeng Zhang
2026-05-09 15:57   ` sashiko-bot [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260509155724.51B87C2BCB2@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=sashiko@lists.linux.dev \
    --cc=zhangzhanpeng.jasper@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox