From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D55883B3BEC for ; Tue, 2 Jun 2026 07:46:56 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780386418; cv=none; b=mOWAkdJd+Vyz/GB1fGjqULlVp/Nn/4da2aq7qEg/88GG5FrhI/J/z7vTHWVy5ziIOWOMzlxEaFfd4c4IvRU/r/YRSBJm5DKSb4blDZXjZxQJX+cRCSF+nAH+WWh4UwZ7+EC4/8saKkbrniBJze7Sar+DqYZmVpYpJ+JppPPIB3o= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780386418; c=relaxed/simple; bh=orIKeLzshiqw+3j7kfb49NUWTF7fhwCmzUwdXCZM7DU=; h=From:Subject:To:Cc:In-Reply-To:References:Content-Type:Date: Message-Id; b=rKKnPevFcy56rEZLIBn0gckOmUSSpeQ2IBUMWH6YDfP0q3cJL+iRbIwcLEMDHkKgB4p9UZlxplhBR+wLq6X6NoS9D+/amQdA5plE1MPoAmnQdVuEf1iBklJYSnICOWJbiYRH+4A2M0XVivvBguxaNz0meUag32zJpP6wS3JPTyA= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=a3tZG0+R; arc=none smtp.client-ip=100.103.45.18 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="a3tZG0+R" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 499C01F00893; Tue, 2 Jun 2026 07:46:56 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org; s=k20260515; t=1780386416; bh=YJufmRRRKoaX933obu9iqv0yqWCGO69XpW16Mge4rBA=; h=From:Subject:Reply-To:To:Cc:In-Reply-To:References:Date; b=a3tZG0+RAeQwgisMc4znCpDZE+3G/o2sLZLPk6nPyAciwO3ChzyEcskR2lJ46mbbR 4wa/ATCeGDXtwcwg8q2QzbeN/vTELj+EMPhdCRfkvQvMSI2sMqhih9N4DmvDBN1XLU AY6T0Xri77u1znFUUTZbgBYcsHLXkEM1aS9HueUATv423XRqJma2+ZM143vFMwuyWo Rw7J4m5uwRnQDE0C791s0jIW6MBPknch6K7zOV3laDXg5zS3smF1MEx/5eDobhsHhn Jag+TexWFXxrGU0ZQ+TkM+AKamvaGj4JznPa9dlAz4gnnURZh1JmI0b4oG/JaTeq0Y MLYjh5EG7GuGg== From: sashiko-bot@kernel.org Subject: Re: [PATCH v7] perf tpebs: Fix concurrent stop races and PID reuse hazards in tpebs_stop Reply-To: sashiko-reviews@lists.linux.dev To: "Ian Rogers" Cc: linux-perf-users@vger.kernel.org In-Reply-To: <20260602073132.2653307-2-irogers@google.com> References: <20260602073132.2653307-2-irogers@google.com> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Date: Tue, 02 Jun 2026 07:46:55 +0000 Message-Id: <20260602074656.499C01F00893@smtp.kernel.org> Precedence: bulk X-Mailing-List: linux-perf-users@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Thank you for your contribution! Sashiko AI review found 5 potential issue(= s) to consider: New issues: - [High] Spurious `-EBUSY` failures and skipped event initialization due to= a lockless `tpebs_cmd.pid` check racing with temporary PID restoration. Pre-existing issues: - [Critical] Unsynchronized check of `tpebs_cmd.pid` allows concurrent thre= ads to hit `assert(tpebs_cmd.pid =3D=3D 0)` and crash the process. - [High] Unprotected `pthread_join` window allows concurrent session start,= leading to orphaned processes and state corruption. - [High] Statically allocated `tpebs_cmd` struct is not re-initialized betw= een executions, silently skipping `waitpid()` and leaking zombie processes. - [High] Error paths in `evsel__tpebs_open()` fail to clean up file descrip= tors, child processes, and global state, permanently breaking TPEBS functio= nality. -- commit 6304c5f7736398fadcc6a882cdc4bef08d2b9994 Author: Ian Rogers perf tpebs: Fix concurrent stop races and PID reuse hazards in tpebs_stop This commit introduces a tpebs_stopping flag to prevent redundant stop execution paths. It also temporarily restores cmd.pid during finish_command() to ensure the process is reaped correctly. > diff --git a/tools/perf/util/intel-tpebs.c b/tools/perf/util/intel-tpebs.c > index 8b615dc94e9ea..6d642356fd9dc 100644 > --- a/tools/perf/util/intel-tpebs.c > +++ b/tools/perf/util/intel-tpebs.c > @@ -37,6 +37,7 @@ static pthread_t tpebs_reader_thread; > static struct child_process tpebs_cmd; > static int control_fd[2], ack_fd[2]; > static struct mutex tpebs_mtx; > +static bool tpebs_stopping; [Severity: High] This is a pre-existing issue, but since tpebs_cmd is a static global, does it need to be re-initialized between executions? When a session finishes, finish_command() internally sets tpebs_cmd.finished =3D 1. When a subsequent session begins, evsel__tpebs_start_perf_record() calls start_command(&tpebs_cmd), which does not clear the finished flag from previous executions. When the second session stops, finish_command() will see finished =3D=3D 1 and skip waitpid() entirely, leaving subsequent perf record processes as zombies. [Severity: Critical] This isn't a bug introduced by this patch, but the unsynchronized check of tpebs_cmd.pid in evsel__tpebs_open() allows concurrent threads to bypass validation: int evsel__tpebs_open(...) { if (tpebs_cmd.pid !=3D 0) { If two threads call this concurrently while no session is running, both bypass the check. They serialize on tpebs_mtx in evsel__tpebs_prepare(), and both will observe !list_empty(&tpebs_results). Thread 1 successfully spawns the process and sets tpebs_cmd.pid. Once Thread 1 releases the lock, Thread 2 acquires it and calls evsel__tpebs_start_perf_record(), where it hits: assert(tpebs_cmd.pid =3D=3D 0); Can this cause the entire process to crash when the assertion fails? [Severity: High] This is a pre-existing issue, but the error path in evsel__tpebs_open() fails to fully clean up state when pthread_create() fails: if (pthread_create(&tpebs_reader_thread, NULL, __sample_reader, NULL)) { kill(tpebs_cmd.pid, SIGTERM); close(tpebs_cmd.out); pr_err("Could not create thread to process sample data.\n"); ret =3D -1; goto out; } This path does not close the control_fd and ack_fd pipes, and it does not reap the killed child with finish_command(). It also leaves tpebs_cmd.pid persistently non-zero. Would this permanently break TPEBS functionality for the process lifetime by causing future calls to locklessly observe a non-zero PID and return -EBUSY? [ ... ] > @@ -321,8 +322,14 @@ static int tpebs_stop(void) EXCLUSIVE_LOCKS_REQUIRED= (tpebs_mtx_get()) > { > int ret =3D 0; > =20 > + if (tpebs_stopping) > + return 0; > + > /* Like tpebs_start, we should only run tpebs_end once. */ > if (tpebs_cmd.pid !=3D 0) { > + pid_t actual_pid =3D tpebs_cmd.pid; > + > + tpebs_stopping =3D true; > tpebs_send_record_cmd(EVLIST_CTL_CMD_STOP_TAG); > tpebs_cmd.pid =3D 0; > mutex_unlock(tpebs_mtx_get()); > pthread_join(tpebs_reader_thread, NULL); [Severity: High] This is a pre-existing issue, but dropping the mutex while tpebs_cmd.pid is zero allows a concurrent call to evsel__tpebs_open() to spawn a brand new TPEBS session during the pthread_join() window. When this thread reacquires the lock, it will blindly close the file descriptors of the new session and clobber the new session's PID by restori= ng the old actual_pid. Does this leave the newly spawned process permanently untracked and its reader thread leaked? > mutex_lock(tpebs_mtx_get()); > close(control_fd[0]); > close(control_fd[1]); > close(ack_fd[0]); > close(ack_fd[1]); > close(tpebs_cmd.out); > + tpebs_cmd.pid =3D actual_pid; > ret =3D finish_command(&tpebs_cmd); > tpebs_cmd.pid =3D 0; [Severity: High] Does temporarily restoring tpebs_cmd.pid to a non-zero value here create a race condition with evsel__tpebs_open()? Since evsel__tpebs_open() performs a lockless check of tpebs_cmd.pid: int evsel__tpebs_open(...) { if (tpebs_cmd.pid !=3D 0) { struct tpebs_retire_lat *t; A concurrent thread could observe the restored non-zero PID, assume a sessi= on is actively running, and incorrectly return -EBUSY because it skipped the initialization sequence. > + tpebs_stopping =3D false; > if (ret =3D=3D -ERR_RUN_COMMAND_WAITPID_SIGNAL) > ret =3D 0; > } --=20 Sashiko AI review =C2=B7 https://sashiko.dev/#/patchset/20260602073132.2653= 307-2-irogers@google.com?part=3D1