* [PATCH v2 1/3] perf sched stats: Fix SIGCHLD race in schedstat_record()
2026-04-09 16:22 [PATCH v2 0/3] perf: Fix SIGCHLD vs pause() race with short-lived workloads Swapnil Sapkal
@ 2026-04-09 16:22 ` Swapnil Sapkal
2026-04-09 16:51 ` sashiko-bot
2026-04-09 16:22 ` [PATCH v2 2/3] perf sched stats: Fix SIGCHLD race in schedstat_live() Swapnil Sapkal
2026-04-09 16:22 ` [PATCH v2 3/3] perf lock contention: Fix SIGCHLD race in __cmd_contention() Swapnil Sapkal
2 siblings, 1 reply; 7+ messages in thread
From: Swapnil Sapkal @ 2026-04-09 16:22 UTC (permalink / raw)
To: peterz, mingo, acme, namhyung, irogers, james.clark
Cc: mark.rutland, alexander.shishkin, jolsa, adrian.hunter,
ravi.bangoria, linux-perf-users, linux-kernel, Swapnil Sapkal
When a very short-lived workload is used with 'perf sched stats record',
the child process can exit and deliver SIGCHLD between
evlist__start_workload() and pause(). Since pause() only returns when a
signal is received while suspended, and the SIGCHLD has already been
delivered and handled by then, pause() blocks indefinitely.
Fix this by blocking SIGCHLD, SIGINT and SIGTERM after
evlist__prepare_workload() (so the forked child does not inherit a
modified signal mask) and replacing pause() with sigsuspend().
sigsuspend() atomically restores the original mask and suspends the
process, ensuring no signal is lost regardless of how quickly the child
exits or the user sends Ctrl+C.
Reviewed-by: James Clark <james.clark@linaro.org>
Assisted-by: Claude:claude-opus-4.6
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
---
tools/perf/builtin-sched.c | 19 +++++++++++++++++--
1 file changed, 17 insertions(+), 2 deletions(-)
diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 3f509cfdd58c..3f448acd5046 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -3807,6 +3807,7 @@ const char *output_name;
static int perf_sched__schedstat_record(struct perf_sched *sched,
int argc, const char **argv)
{
+ sigset_t sig_mask, oldmask;
struct perf_session *session;
struct target target = {};
struct evlist *evlist;
@@ -3899,11 +3900,25 @@ static int perf_sched__schedstat_record(struct perf_sched *sched,
if (err < 0)
goto out;
+ /*
+ * Block all handled signals so that a short-lived workload exiting
+ * (SIGCHLD) or an early Ctrl+C (SIGINT/SIGTERM) during the remaining
+ * setup cannot be delivered before we are ready to wait. sigsuspend()
+ * below will atomically unblock them. This is done after
+ * evlist__prepare_workload() so the forked child does not inherit a
+ * modified signal mask.
+ */
+ sigemptyset(&sig_mask);
+ sigaddset(&sig_mask, SIGCHLD);
+ sigaddset(&sig_mask, SIGINT);
+ sigaddset(&sig_mask, SIGTERM);
+ sigprocmask(SIG_BLOCK, &sig_mask, &oldmask);
+
if (argc)
evlist__start_workload(evlist);
- /* wait for signal */
- pause();
+ sigsuspend(&oldmask);
+ sigprocmask(SIG_SETMASK, &oldmask, NULL);
if (reset) {
err = disable_sched_schedstat();
--
2.43.0
^ permalink raw reply related [flat|nested] 7+ messages in thread* Re: [PATCH v2 1/3] perf sched stats: Fix SIGCHLD race in schedstat_record()
2026-04-09 16:22 ` [PATCH v2 1/3] perf sched stats: Fix SIGCHLD race in schedstat_record() Swapnil Sapkal
@ 2026-04-09 16:51 ` sashiko-bot
0 siblings, 0 replies; 7+ messages in thread
From: sashiko-bot @ 2026-04-09 16:51 UTC (permalink / raw)
To: Swapnil Sapkal; +Cc: linux-perf-users
Sashiko AI review found 1 potential issue(s):
- [Medium] The patch leaves a race window where early signals (like SIGINT) are lost, causing sigsuspend() to hang indefinitely.
--
commit 82e42f3b8c8062e3b83019c07f66b6e7199c4104
Author: Swapnil Sapkal <swapnil.sapkal@amd.com>
perf sched stats: Fix SIGCHLD race in schedstat_record()
This commit attempts to fix a SIGCHLD race in perf sched stats record by
replacing pause() with sigsuspend(). It blocks SIGCHLD, SIGINT, and SIGTERM
before starting the workload to ensure no signal is lost if the child exits
quickly.
> diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
> --- a/tools/perf/builtin-sched.c
> +++ b/tools/perf/builtin-sched.c
[ ... ]
> @@ -3899,11 +3900,25 @@ static int perf_sched__schedstat_record(struct perf_sched *sched,
> if (err < 0)
> goto out;
>
> + /*
> + * Block all handled signals so that a short-lived workload exiting
> + * (SIGCHLD) or an early Ctrl+C (SIGINT/SIGTERM) during the remaining
> + * setup cannot be delivered before we are ready to wait. sigsuspend()
> + * below will atomically unblock them. This is done after
> + * evlist__prepare_workload() so the forked child does not inherit a
> + * modified signal mask.
> + */
> + sigemptyset(&sig_mask);
> + sigaddset(&sig_mask, SIGCHLD);
> + sigaddset(&sig_mask, SIGINT);
> + sigaddset(&sig_mask, SIGTERM);
> + sigprocmask(SIG_BLOCK, &sig_mask, &oldmask);
Does this leave a race window where early signals like SIGINT can be lost,
causing sigsuspend() to hang indefinitely?
If a signal like SIGINT or SIGTERM is delivered before this sigprocmask()
call, the existing sighandler will consume it without setting any state flag
that gets checked here.
> if (argc)
> evlist__start_workload(evlist);
>
> - /* wait for signal */
> - pause();
> + sigsuspend(&oldmask);
When execution subsequently reaches sigsuspend(), wouldn't it block
indefinitely waiting for an event that was already processed?
To robustly resolve the TOCTOU race without leaking a modified signal mask
to the child, should the signal handler set a volatile sig_atomic_t flag,
and sigsuspend() be called conditionally based on that flag?
> + sigprocmask(SIG_SETMASK, &oldmask, NULL);
>
> if (reset) {
> err = disable_sched_schedstat();
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260409162249.25581-1-swapnil.sapkal@amd.com?part=1
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH v2 2/3] perf sched stats: Fix SIGCHLD race in schedstat_live()
2026-04-09 16:22 [PATCH v2 0/3] perf: Fix SIGCHLD vs pause() race with short-lived workloads Swapnil Sapkal
2026-04-09 16:22 ` [PATCH v2 1/3] perf sched stats: Fix SIGCHLD race in schedstat_record() Swapnil Sapkal
@ 2026-04-09 16:22 ` Swapnil Sapkal
2026-04-09 17:18 ` sashiko-bot
2026-04-09 16:22 ` [PATCH v2 3/3] perf lock contention: Fix SIGCHLD race in __cmd_contention() Swapnil Sapkal
2 siblings, 1 reply; 7+ messages in thread
From: Swapnil Sapkal @ 2026-04-09 16:22 UTC (permalink / raw)
To: peterz, mingo, acme, namhyung, irogers, james.clark
Cc: mark.rutland, alexander.shishkin, jolsa, adrian.hunter,
ravi.bangoria, linux-perf-users, linux-kernel, Swapnil Sapkal
The same signal race that exists in perf_sched__schedstat_record() also
affects perf_sched__schedstat_live(). A very short-lived workload can
exit and deliver SIGCHLD before pause() is entered, causing an
indefinite hang.
Apply the same fix: block SIGCHLD, SIGINT and SIGTERM after the
workload is forked (evlist__prepare_workload) but before it is started
(evlist__start_workload), then replace pause() with sigsuspend() to
atomically unblock and wait.
Reviewed-by: James Clark <james.clark@linaro.org>
Assisted-by: Claude:claude-opus-4.6
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
---
tools/perf/builtin-sched.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 3f448acd5046..c716dd33a9c9 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -4653,6 +4653,7 @@ static int perf_sched__schedstat_live(struct perf_sched *sched,
int argc, const char **argv)
{
struct cpu_domain_map **cd_map = NULL;
+ sigset_t sig_mask, oldmask;
struct target target = {};
u32 __maybe_unused md;
struct evlist *evlist;
@@ -4702,11 +4703,17 @@ static int perf_sched__schedstat_live(struct perf_sched *sched,
if (err < 0)
goto out;
+ sigemptyset(&sig_mask);
+ sigaddset(&sig_mask, SIGCHLD);
+ sigaddset(&sig_mask, SIGINT);
+ sigaddset(&sig_mask, SIGTERM);
+ sigprocmask(SIG_BLOCK, &sig_mask, &oldmask);
+
if (argc)
evlist__start_workload(evlist);
- /* wait for signal */
- pause();
+ sigsuspend(&oldmask);
+ sigprocmask(SIG_SETMASK, &oldmask, NULL);
if (reset) {
err = disable_sched_schedstat();
--
2.43.0
^ permalink raw reply related [flat|nested] 7+ messages in thread* Re: [PATCH v2 2/3] perf sched stats: Fix SIGCHLD race in schedstat_live()
2026-04-09 16:22 ` [PATCH v2 2/3] perf sched stats: Fix SIGCHLD race in schedstat_live() Swapnil Sapkal
@ 2026-04-09 17:18 ` sashiko-bot
0 siblings, 0 replies; 7+ messages in thread
From: sashiko-bot @ 2026-04-09 17:18 UTC (permalink / raw)
To: Swapnil Sapkal; +Cc: linux-perf-users
Sashiko AI review found 2 potential issue(s):
- [High] Unconditional use of sigsuspend() without a state flag creates a window where early signals are lost, leading to an indefinite hang.
- [High] Workload child process is orphaned if the tool is interrupted via SIGINT/SIGTERM.
--
commit f1c0b504081c33314d50fb6bf644c4e537e47385
Author: Swapnil Sapkal <swapnil.sapkal@amd.com>
Subject: perf sched stats: Fix SIGCHLD race in schedstat_live()
This commit attempts to fix a lost-wakeup race by blocking SIGCHLD, SIGINT,
and SIGTERM before starting the workload, and then using sigsuspend()
instead of pause() to wait for the workload to finish.
> diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
> index 3f448acd5046f..c716dd33a9c93 100644
> --- a/tools/perf/builtin-sched.c
> +++ b/tools/perf/builtin-sched.c
> @@ -4702,11 +4703,17 @@ static int perf_sched__schedstat_live(struct perf_sched *sched,
> if (err < 0)
> goto out;
>
> + sigemptyset(&sig_mask);
> + sigaddset(&sig_mask, SIGCHLD);
> + sigaddset(&sig_mask, SIGINT);
> + sigaddset(&sig_mask, SIGTERM);
> + sigprocmask(SIG_BLOCK, &sig_mask, &oldmask);
Will this still hang if a signal arrives before sigprocmask() is called?
Because the registered signal handler is an empty dummy function without
a state flag, any signal arriving before sigprocmask() will be consumed
by the handler.
If a signal such as SIGINT or a premature SIGCHLD arrives during the earlier
setup phases (e.g., during /proc/schedstat parsing), the signal is
discarded and sigsuspend() will block indefinitely waiting for a signal
that has already arrived.
Should this handler set a flag (like a volatile sig_atomic_t) that is
checked after the sigprocmask(SIG_BLOCK) call but before sigsuspend()?
> +
> if (argc)
> evlist__start_workload(evlist);
>
> - /* wait for signal */
> - pause();
> + sigsuspend(&oldmask);
> + sigprocmask(SIG_SETMASK, &oldmask, NULL);
Does the exit path leave the child workload running if interrupted?
While not introduced by this patch, if sigsuspend() returns due to an
interruption like SIGINT, the code proceeds to the cleanup path.
This path calls evlist__delete() but does not appear to send SIGTERM or
SIGKILL to the workload child process (evlist->workload.pid), nor does it
waitpid() for the child to terminate.
Could this leave the workload executing as an orphaned process in the
background, leaking resources?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260409162249.25581-1-swapnil.sapkal@amd.com?part=2
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH v2 3/3] perf lock contention: Fix SIGCHLD race in __cmd_contention()
2026-04-09 16:22 [PATCH v2 0/3] perf: Fix SIGCHLD vs pause() race with short-lived workloads Swapnil Sapkal
2026-04-09 16:22 ` [PATCH v2 1/3] perf sched stats: Fix SIGCHLD race in schedstat_record() Swapnil Sapkal
2026-04-09 16:22 ` [PATCH v2 2/3] perf sched stats: Fix SIGCHLD race in schedstat_live() Swapnil Sapkal
@ 2026-04-09 16:22 ` Swapnil Sapkal
2026-04-09 17:37 ` sashiko-bot
2 siblings, 1 reply; 7+ messages in thread
From: Swapnil Sapkal @ 2026-04-09 16:22 UTC (permalink / raw)
To: peterz, mingo, acme, namhyung, irogers, james.clark
Cc: mark.rutland, alexander.shishkin, jolsa, adrian.hunter,
ravi.bangoria, linux-perf-users, linux-kernel, Swapnil Sapkal
__cmd_contention() in builtin-lock.c has the same signal race condition
as the perf sched stats code paths. When running with a short-lived
workload via 'perf lock contention -- <cmd>', the child can exit and
deliver SIGCHLD before pause() is entered, causing an indefinite hang.
Fix this by blocking SIGCHLD, SIGINT and SIGTERM after the workload is
forked but before it is started, then replacing pause() with
sigsuspend() to atomically unblock and wait. Blocking is done after
evlist__prepare_workload() so the child does not inherit a modified
signal mask that could break workloads relying on SIGCHLD delivery.
Reviewed-by: James Clark <james.clark@linaro.org>
Assisted-by: Claude:claude-opus-4.6
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
---
tools/perf/builtin-lock.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)
diff --git a/tools/perf/builtin-lock.c b/tools/perf/builtin-lock.c
index e8962c985d34..2da689de134a 100644
--- a/tools/perf/builtin-lock.c
+++ b/tools/perf/builtin-lock.c
@@ -1991,6 +1991,7 @@ static int check_lock_contention_options(const struct option *options,
static int __cmd_contention(int argc, const char **argv)
{
+ sigset_t sig_mask, oldmask;
int err = -EINVAL;
struct perf_tool eops;
struct perf_data data = {
@@ -2123,12 +2124,25 @@ static int __cmd_contention(int argc, const char **argv)
}
if (use_bpf) {
+ /*
+ * Block all handled signals after evlist__prepare_workload()
+ * so the forked child does not inherit a modified signal
+ * mask, then use sigsuspend() to atomically unblock and
+ * wait. This prevents a race where a short-lived workload
+ * exits and delivers SIGCHLD before we are ready to wait.
+ */
+ sigemptyset(&sig_mask);
+ sigaddset(&sig_mask, SIGCHLD);
+ sigaddset(&sig_mask, SIGINT);
+ sigaddset(&sig_mask, SIGTERM);
+ sigprocmask(SIG_BLOCK, &sig_mask, &oldmask);
+
lock_contention_start();
if (argc)
evlist__start_workload(con.evlist);
- /* wait for signal */
- pause();
+ sigsuspend(&oldmask);
+ sigprocmask(SIG_SETMASK, &oldmask, NULL);
lock_contention_stop();
lock_contention_read(&con);
--
2.43.0
^ permalink raw reply related [flat|nested] 7+ messages in thread* Re: [PATCH v2 3/3] perf lock contention: Fix SIGCHLD race in __cmd_contention()
2026-04-09 16:22 ` [PATCH v2 3/3] perf lock contention: Fix SIGCHLD race in __cmd_contention() Swapnil Sapkal
@ 2026-04-09 17:37 ` sashiko-bot
0 siblings, 0 replies; 7+ messages in thread
From: sashiko-bot @ 2026-04-09 17:37 UTC (permalink / raw)
To: Swapnil Sapkal; +Cc: linux-perf-users
Sashiko AI review found 1 potential issue(s):
- [High] Unconditional use of `sigsuspend()` without a state flag fails to fix the race condition and causes indefinite hangs.
--
commit bbcae9323bad3368550c5fd9165d487b92cd3bbd
Author: Swapnil Sapkal <swapnil.sapkal@amd.com>
perf lock contention: Fix SIGCHLD race in __cmd_contention()
This commit replaces pause() with sigsuspend() to prevent a race condition
where a short-lived workload exits and delivers SIGCHLD before the wait
begins.
> diff --git a/tools/perf/builtin-lock.c b/tools/perf/builtin-lock.c
> index e8962c985d34a..2da689de134a4 100644
> --- a/tools/perf/builtin-lock.c
> +++ b/tools/perf/builtin-lock.c
[ ... ]
> @@ -2123,12 +2124,25 @@ static int __cmd_contention(int argc, const char **argv)
> }
>
> if (use_bpf) {
> + /*
> + * Block all handled signals after evlist__prepare_workload()
> + * so the forked child does not inherit a modified signal
> + * mask, then use sigsuspend() to atomically unblock and
> + * wait. This prevents a race where a short-lived workload
> + * exits and delivers SIGCHLD before we are ready to wait.
> + */
> + sigemptyset(&sig_mask);
> + sigaddset(&sig_mask, SIGCHLD);
> + sigaddset(&sig_mask, SIGINT);
> + sigaddset(&sig_mask, SIGTERM);
> + sigprocmask(SIG_BLOCK, &sig_mask, &oldmask);
> +
> lock_contention_start();
> if (argc)
> evlist__start_workload(con.evlist);
>
> - /* wait for signal */
> - pause();
> + sigsuspend(&oldmask);
If SIGINT or SIGCHLD is delivered before sigprocmask() blocks them, won't it
be consumed by sighandler without setting any checked state, causing
sigsuspend() to hang indefinitely?
The signals are registered earlier in the function, leaving a window during
evlist__prepare_workload() and lock_contention_prepare() where they are
unblocked.
Should there be a volatile sig_atomic_t flag set by sighandler that is
checked in a loop, for example:
while (!done)
sigsuspend(&oldmask);
This would avoid missing signals that arrive before the block and ensure
sigsuspend() doesn't hang.
> + sigprocmask(SIG_SETMASK, &oldmask, NULL);
>
> lock_contention_stop();
> lock_contention_read(&con);
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260409162249.25581-1-swapnil.sapkal@amd.com?part=3
^ permalink raw reply [flat|nested] 7+ messages in thread