public inbox for linux-perf-users@vger.kernel.org
* [PATCH 0/3] perf: Fix SIGCHLD vs pause() race with short-lived workloads
@ 2026-04-01  6:41 Swapnil Sapkal
From: Swapnil Sapkal @ 2026-04-01  6:41 UTC
  To: peterz, mingo, acme, namhyung, irogers, james.clark
  Cc: mark.rutland, alexander.shishkin, jolsa, adrian.hunter,
	gautham.shenoy, ravi.bangoria, KPrateek.Nayak, linux-perf-users,
	linux-kernel, Swapnil Sapkal

Some perf subcommands (sched stats, lock contention) use the pattern
of forking a workload child, calling evlist__start_workload() to uncork
it, and then calling pause() to wait for a signal (typically SIGCHLD
when the child exits, or SIGINT/SIGTERM from the user).

This pattern has a race condition: if the workload is very short-lived,
the child can exit and deliver SIGCHLD in the window between
evlist__start_workload() and pause(). Since pause() only returns when a
signal is received *while the process is suspended*, and SIGCHLD has
already been delivered and handled by the empty sighandler(), pause()
blocks indefinitely.

The fix uses the standard POSIX pattern for this class of bug:

1. Block SIGCHLD (via sigprocmask) before starting the workload.
   If the child exits, the signal remains pending rather than being
   delivered and lost.

2. Replace pause() with sigsuspend(&oldmask), which atomically
   unblocks SIGCHLD and suspends the process. There is no window
   where the signal can slip through unnoticed.

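3. Restore the original signal mask after sigsuspend() returns.
   sigsuspend() reinstates the mask that was in effect before the
   call, so SIGCHLD would otherwise stay blocked.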

SIGINT and SIGTERM are not blocked at any point, so Ctrl+C and
graceful termination continue to work exactly as before.

Three call sites are affected across two files:
  - perf_sched__schedstat_record() in builtin-sched.c
  - perf_sched__schedstat_live()   in builtin-sched.c
  - __cmd_contention()             in builtin-lock.c

The two pause() sites in builtin-kwork.c are NOT affected because they
do not register SIGCHLD or fork workload children; they only wait for
user-initiated SIGINT/SIGTERM.

Swapnil Sapkal (3):
  perf sched stats: Fix SIGCHLD race in schedstat_record()
  perf sched stats: Fix SIGCHLD race in schedstat_live()
  perf lock contention: Fix SIGCHLD race in __cmd_contention()

 tools/perf/builtin-lock.c  | 20 ++++++++++++++++++--
 tools/perf/builtin-sched.c | 30 ++++++++++++++++++++++++++----
 2 files changed, 44 insertions(+), 6 deletions(-)

-- 
2.43.0



Thread overview: 7+ messages
2026-04-01  6:41 [PATCH 0/3] perf: Fix SIGCHLD vs pause() race with short-lived workloads Swapnil Sapkal
2026-04-01  6:41 ` [PATCH 1/3] perf sched stats: Fix SIGCHLD race in schedstat_record() Swapnil Sapkal
2026-04-01 16:26   ` Ian Rogers
2026-04-09 16:29     ` Swapnil Sapkal
2026-04-01  6:41 ` [PATCH 2/3] perf sched stats: Fix SIGCHLD race in schedstat_live() Swapnil Sapkal
2026-04-01  6:41 ` [PATCH 3/3] perf lock contention: Fix SIGCHLD race in __cmd_contention() Swapnil Sapkal
2026-04-01 10:55 ` [PATCH 0/3] perf: Fix SIGCHLD vs pause() race with short-lived workloads James Clark
