Linux Perf Users
 help / color / mirror / Atom feed
* [PATCH v4 0/3] perf: Fix SIGCHLD vs pause() race with short-lived workloads
@ 2026-05-20 10:20 Swapnil Sapkal
  2026-05-20 10:20 ` [PATCH v4 1/3] perf sched stats: Fix SIGCHLD vs pause() race in schedstat_record() Swapnil Sapkal
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Swapnil Sapkal @ 2026-05-20 10:20 UTC (permalink / raw)
  To: peterz, mingo, acme, namhyung, irogers, james.clark
  Cc: mark.rutland, alexander.shishkin, jolsa, adrian.hunter,
	ravi.bangoria, linux-perf-users, linux-kernel, Swapnil Sapkal

Several perf subcommands (sched stats, lock contention) use the pattern
of forking a workload child, calling evlist__start_workload() to uncork
it, and then calling pause() to wait for a signal (typically SIGCHLD
when the child exits, or SIGINT/SIGTERM from the user).

This pattern has a race condition: if the workload is very short-lived,
the child can exit and deliver SIGCHLD in the window between
evlist__start_workload() and pause(). Since pause() only returns when a
signal is received *while the process is suspended*, and SIGCHLD has
already been delivered and handled by the empty sighandler(), pause()
blocks indefinitely.

The fix replaces pause() with a simpler approach:

 - The signal handler now sets a 'volatile sig_atomic_t done' flag
   to record delivery.  The flag is reset before handler registration
   so that a signal arriving during the setup phase is not discarded.

 - Replace pause() with a unified loop:

       while (!done) {
           if (argc && waitpid(evlist->workload.pid, NULL, WNOHANG) > 0)
               break;
           sleep(1);
       }

   This handles both workload mode (child exit detected via waitpid)
   and system-wide mode (user sends SIGINT/SIGTERM setting done).
   Using WNOHANG avoids the SA_RESTART problem where a blocking
   waitpid() would auto-restart and ignore the done flag if the child
   doesn't exit on signal.

Three call sites are affected across two files:
  - perf_sched__schedstat_record() in builtin-sched.c
  - perf_sched__schedstat_live()   in builtin-sched.c
  - __cmd_contention()             in builtin-lock.c

The two pause() sites in builtin-kwork.c are NOT affected because they
do not register SIGCHLD or fork workload children; they only wait for
user-initiated SIGINT/SIGTERM.

v3: https://lore.kernel.org/all/20260422064700.12345-1-swapnil.sapkal@amd.com/
Changes since v3:
  - Replaced blocking waitpid() with waitpid(WNOHANG) in a unified
    loop that also checks 'done'.  Because perf is compiled with
    _GNU_SOURCE, glibc's signal() sets SA_RESTART, causing a blocking
    waitpid() to auto-restart after the handler returns.  If the
    workload ignores SIGINT, the tool would hang.  (Sashiko review)
  - Moved 'done = 0' before signal handler registration so that an
    early signal during the setup phase is not discarded by a later
    reset.  (Sashiko review)
  - Used single unified loop for both workload and system-wide cases
    as suggested by Namhyung.

v2: https://lore.kernel.org/all/20260409162249.25581-1-swapnil.sapkal@amd.com/
Changes since v2:
  - The signal handler now sets a 'volatile sig_atomic_t done' flag
    instead of being an empty function.  pause()/sigsuspend() alone
    cannot cover the window between signal() handler registration and
    sigprocmask(): a signal consumed during that unblocked window was
    silently lost.  The while(!done) loop detects such early delivery.
    (Sashiko review)
  - Replaced sigprocmask()/sigsuspend() approach with the much simpler
    waitpid() + while(!done) sleep(1) pattern as suggested by the
    maintainer.

v1: https://lore.kernel.org/all/20260401064114.141066-1-swapnil.sapkal@amd.com/
Changes since v1:
  - Moved sigprocmask() to after evlist__prepare_workload() so the
    forked child does not inherit a blocked SIGCHLD mask, which would
    break workloads relying on SIGCHLD (e.g., Node.js, Python asyncio).
    (Sashiko review)
  - Block SIGINT and SIGTERM in addition to SIGCHLD to prevent an
    early Ctrl+C during setup from being consumed before sigsuspend().
  - Added signal mask restoration on error paths.
    (Sashiko review)

Swapnil Sapkal (3):
  perf sched stats: Fix SIGCHLD vs pause() race in schedstat_record()
  perf sched stats: Fix SIGCHLD vs pause() race in schedstat_live()
  perf lock contention: Fix SIGCHLD vs pause() race in __cmd_contention()

 tools/perf/builtin-lock.c  | 12 ++++++++++--
 tools/perf/builtin-sched.c | 20 ++++++++++++++++----
 2 files changed, 26 insertions(+), 6 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH v4 1/3] perf sched stats: Fix SIGCHLD vs pause() race in schedstat_record()
  2026-05-20 10:20 [PATCH v4 0/3] perf: Fix SIGCHLD vs pause() race with short-lived workloads Swapnil Sapkal
@ 2026-05-20 10:20 ` Swapnil Sapkal
  2026-05-20 10:20 ` [PATCH v4 2/3] perf sched stats: Fix SIGCHLD vs pause() race in schedstat_live() Swapnil Sapkal
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 6+ messages in thread
From: Swapnil Sapkal @ 2026-05-20 10:20 UTC (permalink / raw)
  To: peterz, mingo, acme, namhyung, irogers, james.clark
  Cc: mark.rutland, alexander.shishkin, jolsa, adrian.hunter,
	ravi.bangoria, linux-perf-users, linux-kernel, Swapnil Sapkal

If the profiled workload exits very quickly, SIGCHLD can be delivered
and consumed by the empty signal handler before the process enters
pause(), causing an indefinite hang.

Fix this with a simpler approach:

 - The signal handler now sets a 'volatile sig_atomic_t done' flag.
   Reset 'done' before registering signal handlers so that an early
   signal during setup is not discarded by a later reset.

 - Replace pause() with a loop that checks 'done' and uses
   waitpid(WNOHANG) to detect child exit without blocking.  This
   handles both workload mode (child exits) and system-wide mode
   (user sends SIGINT/SIGTERM).  Using WNOHANG avoids the SA_RESTART
   problem where a blocking waitpid() would auto-restart and ignore
   the done flag if the child doesn't exit on signal.

Suggested-by: Namhyung Kim <namhyung@kernel.org>
Assisted-by: Claude:claude-opus-4.6
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
---
 tools/perf/builtin-sched.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 53a93aa18853..7da71c372e25 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -36,6 +36,7 @@
 #include <linux/zalloc.h>
 #include <sys/prctl.h>
 #include <sys/resource.h>
+#include <sys/wait.h>
 #include <inttypes.h>
 
 #include <errno.h>
@@ -3758,8 +3759,11 @@ static int process_synthesized_schedstat_event(const struct perf_tool *tool,
 	return 0;
 }
 
+static volatile sig_atomic_t done;
+
 static void sighandler(int sig __maybe_unused)
 {
+	done = 1;
 }
 
 static int enable_sched_schedstats(int *reset)
@@ -3819,6 +3823,7 @@ static int perf_sched__schedstat_record(struct perf_sched *sched,
 		.mode  = PERF_DATA_MODE_WRITE,
 	};
 
+	done = 0;
 	signal(SIGINT, sighandler);
 	signal(SIGCHLD, sighandler);
 	signal(SIGTERM, sighandler);
@@ -3903,8 +3908,11 @@ static int perf_sched__schedstat_record(struct perf_sched *sched,
 	if (argc)
 		evlist__start_workload(evlist);
 
-	/* wait for signal */
-	pause();
+	while (!done) {
+		if (argc && waitpid(evlist->workload.pid, NULL, WNOHANG) > 0)
+			break;
+		sleep(1);
+	}
 
 	if (reset) {
 		err = disable_sched_schedstat();
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v4 2/3] perf sched stats: Fix SIGCHLD vs pause() race in schedstat_live()
  2026-05-20 10:20 [PATCH v4 0/3] perf: Fix SIGCHLD vs pause() race with short-lived workloads Swapnil Sapkal
  2026-05-20 10:20 ` [PATCH v4 1/3] perf sched stats: Fix SIGCHLD vs pause() race in schedstat_record() Swapnil Sapkal
@ 2026-05-20 10:20 ` Swapnil Sapkal
  2026-05-20 11:18   ` sashiko-bot
  2026-05-20 10:20 ` [PATCH v4 3/3] perf lock contention: Fix SIGCHLD vs pause() race in __cmd_contention() Swapnil Sapkal
  2026-05-20 19:58 ` [PATCH v4 0/3] perf: Fix SIGCHLD vs pause() race with short-lived workloads Arnaldo Carvalho de Melo
  3 siblings, 1 reply; 6+ messages in thread
From: Swapnil Sapkal @ 2026-05-20 10:20 UTC (permalink / raw)
  To: peterz, mingo, acme, namhyung, irogers, james.clark
  Cc: mark.rutland, alexander.shishkin, jolsa, adrian.hunter,
	ravi.bangoria, linux-perf-users, linux-kernel, Swapnil Sapkal

perf_sched__schedstat_live() has the same lost-wakeup race as
perf_sched__schedstat_record(): a short-lived workload's SIGCHLD
can be consumed by the signal handler before pause() is entered,
hanging the process.

Apply the same fix: replace pause() with a loop checking the 'done'
flag and using waitpid(WNOHANG) for the workload case.

Suggested-by: Namhyung Kim <namhyung@kernel.org>
Assisted-by: Claude:claude-opus-4.6
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
---
 tools/perf/builtin-sched.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 7da71c372e25..9e8941ae1b3d 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -4755,6 +4755,7 @@ static int perf_sched__schedstat_live(struct perf_sched *sched,
 	int reset = 0;
 	int err = 0;
 
+	done = 0;
 	signal(SIGINT, sighandler);
 	signal(SIGCHLD, sighandler);
 	signal(SIGTERM, sighandler);
@@ -4800,8 +4801,11 @@ static int perf_sched__schedstat_live(struct perf_sched *sched,
 	if (argc)
 		evlist__start_workload(evlist);
 
-	/* wait for signal */
-	pause();
+	while (!done) {
+		if (argc && waitpid(evlist->workload.pid, NULL, WNOHANG) > 0)
+			break;
+		sleep(1);
+	}
 
 	if (reset) {
 		err = disable_sched_schedstat();
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v4 3/3] perf lock contention: Fix SIGCHLD vs pause() race in __cmd_contention()
  2026-05-20 10:20 [PATCH v4 0/3] perf: Fix SIGCHLD vs pause() race with short-lived workloads Swapnil Sapkal
  2026-05-20 10:20 ` [PATCH v4 1/3] perf sched stats: Fix SIGCHLD vs pause() race in schedstat_record() Swapnil Sapkal
  2026-05-20 10:20 ` [PATCH v4 2/3] perf sched stats: Fix SIGCHLD vs pause() race in schedstat_live() Swapnil Sapkal
@ 2026-05-20 10:20 ` Swapnil Sapkal
  2026-05-20 19:58 ` [PATCH v4 0/3] perf: Fix SIGCHLD vs pause() race with short-lived workloads Arnaldo Carvalho de Melo
  3 siblings, 0 replies; 6+ messages in thread
From: Swapnil Sapkal @ 2026-05-20 10:20 UTC (permalink / raw)
  To: peterz, mingo, acme, namhyung, irogers, james.clark
  Cc: mark.rutland, alexander.shishkin, jolsa, adrian.hunter,
	ravi.bangoria, linux-perf-users, linux-kernel, Swapnil Sapkal

__cmd_contention() suffers from the same lost-wakeup race as the perf
sched stats paths: SIGCHLD can be consumed by the signal handler
before pause() is entered, hanging the process.

Apply the same fix: replace pause() with a loop checking the 'done'
flag and using waitpid(WNOHANG) for the workload case.

Suggested-by: Namhyung Kim <namhyung@kernel.org>
Assisted-by: Claude:claude-opus-4.6
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
---
 tools/perf/builtin-lock.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-lock.c b/tools/perf/builtin-lock.c
index 5585aeb97684..e1c033e3a8d5 100644
--- a/tools/perf/builtin-lock.c
+++ b/tools/perf/builtin-lock.c
@@ -31,6 +31,7 @@
 #include <stdio.h>
 #include <sys/types.h>
 #include <sys/prctl.h>
+#include <sys/wait.h>
 #include <semaphore.h>
 #include <math.h>
 #include <limits.h>
@@ -1921,8 +1922,11 @@ static int __cmd_report(bool display_info)
 	return err;
 }
 
+static volatile sig_atomic_t done;
+
 static void sighandler(int sig __maybe_unused)
 {
+	done = 1;
 }
 
 static int check_lock_contention_options(const struct option *options,
@@ -2060,6 +2064,7 @@ static int __cmd_contention(int argc, const char **argv)
 			goto out_delete;
 		}
 
+		done = 0;
 		signal(SIGINT, sighandler);
 		signal(SIGCHLD, sighandler);
 		signal(SIGTERM, sighandler);
@@ -2127,8 +2132,11 @@ static int __cmd_contention(int argc, const char **argv)
 		if (argc)
 			evlist__start_workload(con.evlist);
 
-		/* wait for signal */
-		pause();
+		while (!done) {
+			if (argc && waitpid(con.evlist->workload.pid, NULL, WNOHANG) > 0)
+				break;
+			sleep(1);
+		}
 
 		lock_contention_stop();
 		lock_contention_read(&con);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v4 2/3] perf sched stats: Fix SIGCHLD vs pause() race in schedstat_live()
  2026-05-20 10:20 ` [PATCH v4 2/3] perf sched stats: Fix SIGCHLD vs pause() race in schedstat_live() Swapnil Sapkal
@ 2026-05-20 11:18   ` sashiko-bot
  0 siblings, 0 replies; 6+ messages in thread
From: sashiko-bot @ 2026-05-20 11:18 UTC (permalink / raw)
  To: Swapnil Sapkal; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Low] The polling loop introduces unnecessary periodic wakeups when running system-wide without a workload.
--

commit 7bab731e30cd90b3c5ff6f388399fa3469974523
Author: Swapnil Sapkal <swapnil.sapkal@amd.com>

perf sched stats: Fix SIGCHLD vs pause() race in schedstat_live()

This commit addresses a lost-wakeup race in schedstat_live where a
short-lived workload's SIGCHLD could be consumed before entering pause().
It replaces the pause() call with a polling loop.

> diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
> --- a/tools/perf/builtin-sched.c
> +++ b/tools/perf/builtin-sched.c

[ ... ]

> @@ -4698,8 +4699,11 @@ static int perf_sched__schedstat_live(struct perf_sched *sched,
>  	if (argc)
>  		evlist__start_workload(evlist);
>  
> -	/* wait for signal */
> -	pause();
> +	while (!done) {
> +		if (argc && waitpid(evlist->workload.pid, NULL, WNOHANG) > 0)
> +			break;
> +		sleep(1);
> +	}

[Severity: Low]
When no workload is provided (argc == 0), there is no child process and thus
no risk of a SIGCHLD lost-wakeup race. Does replacing pause() with a 
sleep(1) loop unconditionally introduce unnecessary periodic wakeups for
the system-wide case?

>  
>  	if (reset) {
>  		err = disable_sched_schedstat();

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260520102017.293419-1-swapnil.sapkal@amd.com?part=2

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v4 0/3] perf: Fix SIGCHLD vs pause() race with short-lived workloads
  2026-05-20 10:20 [PATCH v4 0/3] perf: Fix SIGCHLD vs pause() race with short-lived workloads Swapnil Sapkal
                   ` (2 preceding siblings ...)
  2026-05-20 10:20 ` [PATCH v4 3/3] perf lock contention: Fix SIGCHLD vs pause() race in __cmd_contention() Swapnil Sapkal
@ 2026-05-20 19:58 ` Arnaldo Carvalho de Melo
  3 siblings, 0 replies; 6+ messages in thread
From: Arnaldo Carvalho de Melo @ 2026-05-20 19:58 UTC (permalink / raw)
  To: Swapnil Sapkal
  Cc: peterz, mingo, namhyung, irogers, james.clark, mark.rutland,
	alexander.shishkin, jolsa, adrian.hunter, ravi.bangoria,
	linux-perf-users, linux-kernel

On Wed, May 20, 2026 at 10:20:14AM +0000, Swapnil Sapkal wrote:
> Several perf subcommands (sched stats, lock contention) use the pattern
> of forking a workload child, calling evlist__start_workload() to uncork
> it, and then calling pause() to wait for a signal (typically SIGCHLD
> when the child exits, or SIGINT/SIGTERM from the user).
> 
> This pattern has a race condition: if the workload is very short-lived,
> the child can exit and deliver SIGCHLD in the window between
> evlist__start_workload() and pause(). Since pause() only returns when a
> signal is received *while the process is suspended*, and SIGCHLD has
> already been delivered and handled by the empty sighandler(), pause()
> blocks indefinitely.
> 
> The fix replaces pause() with a simpler approach:
> 
>  - The signal handler now sets a 'volatile sig_atomic_t done' flag
>    to record delivery.  The flag is reset before handler registration
>    so that a signal arriving during the setup phase is not discarded.
> 
>  - Replace pause() with a unified loop:
> 
>        while (!done) {
>            if (argc && waitpid(evlist->workload.pid, NULL, WNOHANG) > 0)
>                break;
>            sleep(1);
>        }
> 
>    This handles both workload mode (child exit detected via waitpid)
>    and system-wide mode (user sends SIGINT/SIGTERM setting done).
>    Using WNOHANG avoids the SA_RESTART problem where a blocking
>    waitpid() would auto-restart and ignore the done flag if the child
>    doesn't exit on signal.
> 
> Three call sites are affected across two files:
>   - perf_sched__schedstat_record() in builtin-sched.c
>   - perf_sched__schedstat_live()   in builtin-sched.c
>   - __cmd_contention()             in builtin-lock.c
> 
> The two pause() sites in builtin-kwork.c are NOT affected because they
> do not register SIGCHLD or fork workload children; they only wait for
> user-initiated SIGINT/SIGTERM.

Thanks, applied to perf-tools-next, for v7.2.

- Arnaldo

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2026-05-20 19:58 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-20 10:20 [PATCH v4 0/3] perf: Fix SIGCHLD vs pause() race with short-lived workloads Swapnil Sapkal
2026-05-20 10:20 ` [PATCH v4 1/3] perf sched stats: Fix SIGCHLD vs pause() race in schedstat_record() Swapnil Sapkal
2026-05-20 10:20 ` [PATCH v4 2/3] perf sched stats: Fix SIGCHLD vs pause() race in schedstat_live() Swapnil Sapkal
2026-05-20 11:18   ` sashiko-bot
2026-05-20 10:20 ` [PATCH v4 3/3] perf lock contention: Fix SIGCHLD vs pause() race in __cmd_contention() Swapnil Sapkal
2026-05-20 19:58 ` [PATCH v4 0/3] perf: Fix SIGCHLD vs pause() race with short-lived workloads Arnaldo Carvalho de Melo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox