[PATCH v3 0/3] perf: Fix SIGCHLD vs pause() race with short-lived workloads

public inbox for linux-perf-users@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH v3 0/3] perf: Fix SIGCHLD vs pause() race with short-lived workloads
@ 2026-04-22  5:05 Swapnil Sapkal
  2026-04-22  5:05 ` [PATCH v3 1/3] perf sched stats: Fix SIGCHLD vs pause() race in schedstat_record() Swapnil Sapkal
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Swapnil Sapkal @ 2026-04-22  5:05 UTC (permalink / raw)
  To: peterz, mingo, acme, namhyung, irogers, james.clark
  Cc: mark.rutland, alexander.shishkin, jolsa, adrian.hunter,
	ravi.bangoria, linux-perf-users, linux-kernel, Swapnil Sapkal

Several perf subcommands (sched stats, lock contention) use the pattern
of forking a workload child, calling evlist__start_workload() to uncork
it, and then calling pause() to wait for a signal (typically SIGCHLD
when the child exits, or SIGINT/SIGTERM from the user).

This pattern has a race condition: if the workload is very short-lived,
the child can exit and deliver SIGCHLD in the window between
evlist__start_workload() and pause(). Since pause() only returns when a
signal is received *while the process is suspended*, and SIGCHLD has
already been delivered and handled by the empty sighandler(), pause()
blocks indefinitely.

The fix replaces pause() with a simpler approach:

 - When a workload is given (argc), use waitpid() to directly wait for
   the child process to exit.  This is race-free since waitpid()
   collects the child's status regardless of when it exited.

 - In system-wide mode (no workload), use 'while (!done) sleep(1)' to
   wait for SIGINT/SIGTERM.  The signal handler sets a
   'volatile sig_atomic_t done' flag, and sleep() is interrupted by
   signal delivery so the flag is checked promptly.

Three call sites are affected across two files:
  - perf_sched__schedstat_record() in builtin-sched.c
  - perf_sched__schedstat_live()   in builtin-sched.c
  - __cmd_contention()             in builtin-lock.c

The two pause() sites in builtin-kwork.c are NOT affected because they
do not register SIGCHLD or fork workload children; they only wait for
user-initiated SIGINT/SIGTERM.

Changes since v2:
  - Replaced sigsuspend()/sigprocmask() approach with the much simpler
    waitpid() + while(!done) sleep(1) pattern as suggested by
    Namhyung.  This eliminates all signal mask manipulation while
    still being race-free.

Changes since v1:
  - Moved sigprocmask() to after evlist__prepare_workload() so the
    forked child does not inherit a blocked SIGCHLD mask, which would
    break workloads relying on SIGCHLD (Sashiko review)
  - Block SIGINT and SIGTERM in addition to SIGCHLD to prevent an
    early Ctrl+C during setup from being consumed before sigsuspend().
  - Error paths before sigprocmask no longer need mask restoration
    since the mask is not yet modified at that point.
    (Sashiko review)

Swapnil Sapkal (3):
  perf sched stats: Fix SIGCHLD vs pause() race in schedstat_record()
  perf sched stats: Fix SIGCHLD vs pause() race in schedstat_live()
  perf lock contention: Fix SIGCHLD vs pause() race in __cmd_contention()

 tools/perf/builtin-lock.c  | 17 +++++++++++++----
 tools/perf/builtin-sched.c | 28 ++++++++++++++++++++--------
 2 files changed, 33 insertions(+), 12 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v3 1/3] perf sched stats: Fix SIGCHLD vs pause() race in schedstat_record()
  2026-04-22  5:05 [PATCH v3 0/3] perf: Fix SIGCHLD vs pause() race with short-lived workloads Swapnil Sapkal
@ 2026-04-22  5:05 ` Swapnil Sapkal
  2026-04-22  5:29   ` sashiko-bot
  2026-04-22  5:05 ` [PATCH v3 2/3] perf sched stats: Fix SIGCHLD vs pause() race in schedstat_live() Swapnil Sapkal
  2026-04-22  5:05 ` [PATCH v3 3/3] perf lock contention: Fix SIGCHLD vs pause() race in __cmd_contention() Swapnil Sapkal
  2 siblings, 1 reply; 8+ messages in thread
From: Swapnil Sapkal @ 2026-04-22  5:05 UTC (permalink / raw)
  To: peterz, mingo, acme, namhyung, irogers, james.clark
  Cc: mark.rutland, alexander.shishkin, jolsa, adrian.hunter,
	ravi.bangoria, linux-perf-users, linux-kernel, Swapnil Sapkal

If the profiled workload exits very quickly, SIGCHLD can be delivered
and consumed by the empty signal handler before the process enters
pause(), causing an indefinite hang.

Fix this with a simpler approach:

 - When a workload is given, use waitpid() to directly wait for the
   child to exit.  This is race-free since waitpid() will collect the
   child regardless of when it exited.

 - In system-wide mode (no workload), use 'while (!done) sleep(1)'
   to wait for SIGINT/SIGTERM.  The signal handler now sets a
   'volatile sig_atomic_t done' flag, and sleep() is interrupted
   by signal delivery, so the flag is checked promptly.

Suggested-by: Namhyung Kim <namhyung@kernel.org>
Assisted-by: Claude:claude-opus-4.6
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
---
 tools/perf/builtin-sched.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 3f509cfdd58c..cfd93bf11c2e 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -36,6 +36,7 @@
 #include <linux/zalloc.h>
 #include <sys/prctl.h>
 #include <sys/resource.h>
+#include <sys/wait.h>
 #include <inttypes.h>
 
 #include <errno.h>
@@ -3757,8 +3758,11 @@ static int process_synthesized_schedstat_event(const struct perf_tool *tool,
 	return 0;
 }
 
+static volatile sig_atomic_t done;
+
 static void sighandler(int sig __maybe_unused)
 {
+	done = 1;
 }
 
 static int enable_sched_schedstats(int *reset)
@@ -3899,11 +3903,15 @@ static int perf_sched__schedstat_record(struct perf_sched *sched,
 	if (err < 0)
 		goto out;
 
-	if (argc)
-		evlist__start_workload(evlist);
+	done = 0;
 
-	/* wait for signal */
-	pause();
+	if (argc) {
+		evlist__start_workload(evlist);
+		waitpid(evlist->workload.pid, NULL, 0);
+	} else {
+		while (!done)
+			sleep(1);
+	}
 
 	if (reset) {
 		err = disable_sched_schedstat();
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v3 2/3] perf sched stats: Fix SIGCHLD vs pause() race in schedstat_live()
  2026-04-22  5:05 [PATCH v3 0/3] perf: Fix SIGCHLD vs pause() race with short-lived workloads Swapnil Sapkal
  2026-04-22  5:05 ` [PATCH v3 1/3] perf sched stats: Fix SIGCHLD vs pause() race in schedstat_record() Swapnil Sapkal
@ 2026-04-22  5:05 ` Swapnil Sapkal
  2026-04-22  7:20   ` sashiko-bot
  2026-04-22  5:05 ` [PATCH v3 3/3] perf lock contention: Fix SIGCHLD vs pause() race in __cmd_contention() Swapnil Sapkal
  2 siblings, 1 reply; 8+ messages in thread
From: Swapnil Sapkal @ 2026-04-22  5:05 UTC (permalink / raw)
  To: peterz, mingo, acme, namhyung, irogers, james.clark
  Cc: mark.rutland, alexander.shishkin, jolsa, adrian.hunter,
	ravi.bangoria, linux-perf-users, linux-kernel, Swapnil Sapkal

perf_sched__schedstat_live() has the same lost-wakeup race as
perf_sched__schedstat_record(): a short-lived workload's SIGCHLD
can be consumed by the signal handler before pause() is entered,
hanging the process.

Apply the same fix: use waitpid() for workload mode and
'while (!done) sleep(1)' for system-wide mode.

Suggested-by: Namhyung Kim <namhyung@kernel.org>
Assisted-by: Claude:claude-opus-4.6
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
---
 tools/perf/builtin-sched.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index cfd93bf11c2e..36fbc20dbebd 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -4695,11 +4695,15 @@ static int perf_sched__schedstat_live(struct perf_sched *sched,
 	if (err < 0)
 		goto out;
 
-	if (argc)
-		evlist__start_workload(evlist);
+	done = 0;
 
-	/* wait for signal */
-	pause();
+	if (argc) {
+		evlist__start_workload(evlist);
+		waitpid(evlist->workload.pid, NULL, 0);
+	} else {
+		while (!done)
+			sleep(1);
+	}
 
 	if (reset) {
 		err = disable_sched_schedstat();
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v3 3/3] perf lock contention: Fix SIGCHLD vs pause() race in __cmd_contention()
  2026-04-22  5:05 [PATCH v3 0/3] perf: Fix SIGCHLD vs pause() race with short-lived workloads Swapnil Sapkal
  2026-04-22  5:05 ` [PATCH v3 1/3] perf sched stats: Fix SIGCHLD vs pause() race in schedstat_record() Swapnil Sapkal
  2026-04-22  5:05 ` [PATCH v3 2/3] perf sched stats: Fix SIGCHLD vs pause() race in schedstat_live() Swapnil Sapkal
@ 2026-04-22  5:05 ` Swapnil Sapkal
  2026-04-22 11:31   ` sashiko-bot
  2 siblings, 1 reply; 8+ messages in thread
From: Swapnil Sapkal @ 2026-04-22  5:05 UTC (permalink / raw)
  To: peterz, mingo, acme, namhyung, irogers, james.clark
  Cc: mark.rutland, alexander.shishkin, jolsa, adrian.hunter,
	ravi.bangoria, linux-perf-users, linux-kernel, Swapnil Sapkal

__cmd_contention() suffers from the same lost-wakeup race as the perf
sched stats paths: SIGCHLD can be consumed by the signal handler
before pause() is entered, hanging the process.

Apply the same fix: use waitpid() for workload mode and
'while (!done) sleep(1)' for system-wide mode.

Suggested-by: Namhyung Kim <namhyung@kernel.org>
Assisted-by: Claude:claude-opus-4.6
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
---
 tools/perf/builtin-lock.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/tools/perf/builtin-lock.c b/tools/perf/builtin-lock.c
index e8962c985d34..3c165d632941 100644
--- a/tools/perf/builtin-lock.c
+++ b/tools/perf/builtin-lock.c
@@ -31,6 +31,7 @@
 #include <stdio.h>
 #include <sys/types.h>
 #include <sys/prctl.h>
+#include <sys/wait.h>
 #include <semaphore.h>
 #include <math.h>
 #include <limits.h>
@@ -1921,8 +1922,11 @@ static int __cmd_report(bool display_info)
 	return err;
 }
 
+static volatile sig_atomic_t done;
+
 static void sighandler(int sig __maybe_unused)
 {
+	done = 1;
 }
 
 static int check_lock_contention_options(const struct option *options,
@@ -2124,11 +2128,16 @@ static int __cmd_contention(int argc, const char **argv)
 
 	if (use_bpf) {
 		lock_contention_start();
-		if (argc)
-			evlist__start_workload(con.evlist);
 
-		/* wait for signal */
-		pause();
+		done = 0;
+
+		if (argc) {
+			evlist__start_workload(con.evlist);
+			waitpid(con.evlist->workload.pid, NULL, 0);
+		} else {
+			while (!done)
+				sleep(1);
+		}
 
 		lock_contention_stop();
 		lock_contention_read(&con);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v3 1/3] perf sched stats: Fix SIGCHLD vs pause() race in schedstat_record()
  2026-04-22  5:05 ` [PATCH v3 1/3] perf sched stats: Fix SIGCHLD vs pause() race in schedstat_record() Swapnil Sapkal
@ 2026-04-22  5:29   ` sashiko-bot
  2026-04-22 21:21     ` Namhyung Kim
  0 siblings, 1 reply; 8+ messages in thread
From: sashiko-bot @ 2026-04-22  5:29 UTC (permalink / raw)
  To: Swapnil Sapkal; +Cc: linux-perf-users

Sashiko AI review found 2 potential issue(s):
- [High] Blocking waitpid() ignores the cancellation flag, causing a hang or premature exit.
- [Medium] Unconditional `done = 0;` assignment discards termination signals received during initialization.
--

commit 1c95f818d983ad8f7f5ed28e4a0036553b2d2758
Author: Swapnil Sapkal <swapnil.sapkal@amd.com>

perf sched stats: Fix SIGCHLD vs pause() race in schedstat_record()

This commit replaces pause() with waitpid() or a sleep() loop to fix an
indefinite hang if SIGCHLD is delivered before the process enters pause().
It also introduces a sig_atomic_t flag set by the signal handler.

> diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
> index 3f509cfdd58c4..cfd93bf11c2ee 100644
> --- a/tools/perf/builtin-sched.c
> +++ b/tools/perf/builtin-sched.c
[ ... ]
> @@ -3899,11 +3903,15 @@ static int perf_sched__schedstat_record(struct perf_sched *sched,
>  	if (err < 0)
>  		goto out;
>  
> -	if (argc)
> -		evlist__start_workload(evlist);
> +	done = 0;

Is it possible that this unconditional assignment discards termination signals
received during initialization?

Since the signal handlers are registered at the beginning of the function,
if a user sends SIGINT during the setup phase, the handler will set done to 1.
This assignment would then erase the early cancellation request, requiring the
user to send the signal again.

>  
> -	/* wait for signal */
> -	pause();
> +	if (argc) {
> +		evlist__start_workload(evlist);
> +		waitpid(evlist->workload.pid, NULL, 0);

Could this blocking waitpid() cause the tool to ignore the cancellation flag?

Because perf is compiled with _GNU_SOURCE, glibc's signal() uses the
SA_RESTART flag by default. This causes waitpid() to automatically restart
after the signal handler returns.

If the profiled workload ignores or delays handling SIGINT, waitpid() will
continue to block indefinitely and ignore the done flag.

Conversely, if SA_RESTART were disabled, waitpid() would return -1 (EINTR).
Since the return value is not checked, the program would exit prematurely
and leave the workload orphaned in the background.

Would it be safer to use waitpid() with WNOHANG inside a loop that checks
the done flag?

> +	} else {
> +		while (!done)
> +			sleep(1);
> +	}
>  
>  	if (reset) {
>  		err = disable_sched_schedstat();

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260422050545.129448-1-swapnil.sapkal@amd.com?part=1

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3 2/3] perf sched stats: Fix SIGCHLD vs pause() race in schedstat_live()
  2026-04-22  5:05 ` [PATCH v3 2/3] perf sched stats: Fix SIGCHLD vs pause() race in schedstat_live() Swapnil Sapkal
@ 2026-04-22  7:20   ` sashiko-bot
  0 siblings, 0 replies; 8+ messages in thread
From: sashiko-bot @ 2026-04-22  7:20 UTC (permalink / raw)
  To: Swapnil Sapkal; +Cc: linux-perf-users

Sashiko AI review found 2 potential issue(s):
- [High] Overwriting `done` with `0` defeats the intended fix for the lost-wakeup race, causing an indefinite hang.
- [High] The blocking `waitpid()` ignores `SIGTERM` and `SIGINT` because of `SA_RESTART` semantics, causing the tool to hang unresponsively.
--

commit 5047a8fcd3c2b9c566153aee1d997f695b5b0543
Author: Swapnil Sapkal <swapnil.sapkal@amd.com>

perf sched stats: Fix SIGCHLD vs pause() race in schedstat_live()

This commit addresses a lost-wakeup race in perf_sched__schedstat_live()
by replacing pause() with waitpid() for workloads, and using a while (!done)
sleep loop for system-wide mode.

> diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
> index cfd93bf11c2ee..36fbc20dbebd8 100644
> --- a/tools/perf/builtin-sched.c
> +++ b/tools/perf/builtin-sched.c
> @@ -4695,11 +4695,15 @@ static int perf_sched__schedstat_live(struct perf_sched *sched,
>  	if (err < 0)
>  		goto out;
>  
> -	if (argc)
> -		evlist__start_workload(evlist);
> +	done = 0;

Could setting done = 0 here inadvertently overwrite the state if a signal
was received during the setup phase?

If a termination signal like SIGINT or SIGTERM is delivered during the
earlier perf_event__synthesize_schedstat() call, the signal handler
sets done = 1. This assignment unconditionally resets it back to 0, which
seems to bypass the intended fix and could cause the subsequent loop to
hang indefinitely.

>  
> -	/* wait for signal */
> -	pause();
> +	if (argc) {
> +		evlist__start_workload(evlist);
> +		waitpid(evlist->workload.pid, NULL, 0);

Could this blocking waitpid() cause the tool to become unresponsive to
SIGINT or SIGTERM?

Since the tool uses signal() for handler registration, glibc provides BSD
semantics which sets the SA_RESTART flag. If a signal arrives while
waitpid() is blocked, the signal handler executes and sets done = 1, but
the waitpid() syscall is automatically restarted by the kernel instead of
returning EINTR.

Because the code is blocked inside waitpid() and doesn't check the done
flag, the tool might ignore termination signals and hang until the workload
completes.

Would a non-blocking wait combined with sleep prevent this issue?
For example:

while (!done) {
        if (waitpid(evlist->workload.pid, NULL, WNOHANG) != 0)
                break;
        sleep(1);
}

> +	} else {
> +		while (!done)
> +			sleep(1);
> +	}
>  
>  	if (reset) {
>  		err = disable_sched_schedstat();

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260422050545.129448-1-swapnil.sapkal@amd.com?part=2

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3 3/3] perf lock contention: Fix SIGCHLD vs pause() race in __cmd_contention()
  2026-04-22  5:05 ` [PATCH v3 3/3] perf lock contention: Fix SIGCHLD vs pause() race in __cmd_contention() Swapnil Sapkal
@ 2026-04-22 11:31   ` sashiko-bot
  0 siblings, 0 replies; 8+ messages in thread
From: sashiko-bot @ 2026-04-22 11:31 UTC (permalink / raw)
  To: Swapnil Sapkal; +Cc: linux-perf-users

Sashiko AI review found 2 potential issue(s):
- [High] Blocking `waitpid()` without checking the `done` flag prevents clean teardown and ignores signals in workload mode.
- [Medium] The `done = 0` assignment introduces a race condition that causes early `SIGINT`/`SIGTERM` signals to be completely ignored.
--

commit 8ef94afdd78ecb3553bc6a801fe79f49d3d5680d
Author: Swapnil Sapkal <swapnil.sapkal@amd.com>

perf lock contention: Fix SIGCHLD vs pause() race in __cmd_contention()

This commit replaces the pause() call with waitpid() for workload mode
and a sleep loop for system-wide mode to fix a lost-wakeup race.

> diff --git a/tools/perf/builtin-lock.c b/tools/perf/builtin-lock.c
> index e8962c985d34a..3c165d632941f 100644
> --- a/tools/perf/builtin-lock.c
> +++ b/tools/perf/builtin-lock.c
[ ... ]
> @@ -2124,11 +2128,16 @@ static int __cmd_contention(int argc, const char **argv)
>  
>  	if (use_bpf) {
>  		lock_contention_start();
> -		if (argc)
> -			evlist__start_workload(con.evlist);
>  
> -		/* wait for signal */
> -		pause();
> +		done = 0;

Is it possible for this assignment to overwrite a signal received earlier?

The signal handlers are registered before BPF loading, which can take a
noticeable amount of time:

__cmd_contention() {
    ...
    signal(SIGINT, sighandler);
    ...
    err = lock_contention_prepare(&con);
    ...
}

If a user sends SIGINT during BPF loading, the handler correctly sets
done = 1, but this assignment then erases that state. Since done is
statically zero-initialized, could this assignment be safely removed?

> +
> +		if (argc) {
> +			evlist__start_workload(con.evlist);
> +			waitpid(con.evlist->workload.pid, NULL, 0);

Will this blocking waitpid() hang if the process receives a signal?

Signal handlers registered with glibc's signal() typically have the
SA_RESTART flag set by default. If perf receives a signal like SIGTERM
during this waitpid(), the handler sets done = 1, but the kernel will
automatically restart the interrupted waitpid().

If the child process ignores the signal or the signal is only sent to perf,
waitpid() resumes without checking the done flag. This prevents the process
from gracefully stopping the trace and printing the results.

Should this use a polling loop with WNOHANG to allow checking the done flag?

> +		} else {
> +			while (!done)
> +				sleep(1);
> +		}
>  
>  		lock_contention_stop();
>  		lock_contention_read(&con);

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260422050545.129448-1-swapnil.sapkal@amd.com?part=3

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3 1/3] perf sched stats: Fix SIGCHLD vs pause() race in schedstat_record()
  2026-04-22  5:29   ` sashiko-bot
@ 2026-04-22 21:21     ` Namhyung Kim
  0 siblings, 0 replies; 8+ messages in thread
From: Namhyung Kim @ 2026-04-22 21:21 UTC (permalink / raw)
  To: sashiko; +Cc: Swapnil Sapkal, linux-perf-users

Hello,

On Wed, Apr 22, 2026 at 05:29:52AM +0000, sashiko-bot@kernel.org wrote:
> Sashiko AI review found 2 potential issue(s):
> - [High] Blocking waitpid() ignores the cancellation flag, causing a hang or premature exit.
> - [Medium] Unconditional `done = 0;` assignment discards termination signals received during initialization.
> --
> 
> commit 1c95f818d983ad8f7f5ed28e4a0036553b2d2758
> Author: Swapnil Sapkal <swapnil.sapkal@amd.com>
> 
> perf sched stats: Fix SIGCHLD vs pause() race in schedstat_record()
> 
> This commit replaces pause() with waitpid() or a sleep() loop to fix an
> indefinite hang if SIGCHLD is delivered before the process enters pause().
> It also introduces a sig_atomic_t flag set by the signal handler.
> 
> > diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
> > index 3f509cfdd58c4..cfd93bf11c2ee 100644
> > --- a/tools/perf/builtin-sched.c
> > +++ b/tools/perf/builtin-sched.c
> [ ... ]
> > @@ -3899,11 +3903,15 @@ static int perf_sched__schedstat_record(struct perf_sched *sched,
> >  	if (err < 0)
> >  		goto out;
> >  
> > -	if (argc)
> > -		evlist__start_workload(evlist);
> > +	done = 0;
> 
> Is it possible that this unconditional assignment discards termination signals
> received during initialization?
> 
> Since the signal handlers are registered at the beginning of the function,
> if a user sends SIGINT during the setup phase, the handler will set done to 1.
> This assignment would then erase the early cancellation request, requiring the
> user to send the signal again.

Right, as it's initialized to 0 you don't need to set it here.

> 
> >  
> > -	/* wait for signal */
> > -	pause();
> > +	if (argc) {
> > +		evlist__start_workload(evlist);
> > +		waitpid(evlist->workload.pid, NULL, 0);
> 
> Could this blocking waitpid() cause the tool to ignore the cancellation flag?
> 
> Because perf is compiled with _GNU_SOURCE, glibc's signal() uses the
> SA_RESTART flag by default. This causes waitpid() to automatically restart
> after the signal handler returns.
> 
> If the profiled workload ignores or delays handling SIGINT, waitpid() will
> continue to block indefinitely and ignore the done flag.
> 
> Conversely, if SA_RESTART were disabled, waitpid() would return -1 (EINTR).
> Since the return value is not checked, the program would exit prematurely
> and leave the workload orphaned in the background.
> 
> Would it be safer to use waitpid() with WNOHANG inside a loop that checks
> the done flag?

Right, something like this? (not tested)

	while (!done) {
		if (argc && waitpid(evlist->workload.pid, NULL, WNOHANG) > 0)
			break;
		sleep(1);
	}

Thanks,
Namhyung

> 
> > +	} else {
> > +		while (!done)
> > +			sleep(1);
> > +	}
> >  
> >  	if (reset) {
> >  		err = disable_sched_schedstat();
> 
> -- 
> Sashiko AI review · https://sashiko.dev/#/patchset/20260422050545.129448-1-swapnil.sapkal@amd.com?part=1

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2026-04-22 21:21 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-22  5:05 [PATCH v3 0/3] perf: Fix SIGCHLD vs pause() race with short-lived workloads Swapnil Sapkal
2026-04-22  5:05 ` [PATCH v3 1/3] perf sched stats: Fix SIGCHLD vs pause() race in schedstat_record() Swapnil Sapkal
2026-04-22  5:29   ` sashiko-bot
2026-04-22 21:21     ` Namhyung Kim
2026-04-22  5:05 ` [PATCH v3 2/3] perf sched stats: Fix SIGCHLD vs pause() race in schedstat_live() Swapnil Sapkal
2026-04-22  7:20   ` sashiko-bot
2026-04-22  5:05 ` [PATCH v3 3/3] perf lock contention: Fix SIGCHLD vs pause() race in __cmd_contention() Swapnil Sapkal
2026-04-22 11:31   ` sashiko-bot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox