From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3078D1DE4FB for ; Tue, 23 Jun 2026 19:46:19 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1782243981; cv=none; b=LE1BPzfo9EgQBG1NmTcMTJkvxxSdC5DNv6VBOBB89gy8k0FbrloZbO1l/1vBsSzGS+2ooWPbNWVrCig4pJKXiVKDJU/ReJQn5yBR9ihSnf5CifZ6S10GrscWoijWZ7O89gICwXUN9gDkyYpC4hoDmwnfeS3l/bFTVNZdQaAnChw= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1782243981; c=relaxed/simple; bh=m4B5vMh/o47S5POOX29zQeyx7jgSY2eMRd09QJSC2aw=; h=From:Subject:To:Cc:In-Reply-To:References:Content-Type:Date: Message-Id; b=RMz47h49CqFctnvGWz2XzjvPtHqDKhB2e1lBYuSdk1LZsD2cU9wnVYwfpYAImk9wJIeeqV7NrwpyDig+IHDO10dUcwaKicRhCy/zG7Qglce7FU6sGocLXjUuF5oQdgp7sxGPGhkgH8aELxjiWs7ddPap+Dp+ZC/5FCuhtdf/9Ls= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=J4Ko6Y8e; arc=none smtp.client-ip=100.103.45.18 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="J4Ko6Y8e" Received: by smtp.kernel.org (Postfix) with ESMTPSA id B3B4E1F000E9; Tue, 23 Jun 2026 19:46:19 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org; s=k20260515; t=1782243979; bh=ujDS3aVoudhawEQ6ZMdTp6Uk7EAL1kzM0xSSASvZelY=; h=From:Subject:Reply-To:To:Cc:In-Reply-To:References:Date; b=J4Ko6Y8esb1hxN+togE21KRypab41EHbsMMzxN6i//sHRHyUNomwVhI1vx2hZ0tzT NwXVyPjemyoOCpM8JLw6K0Espevjj68b88xvGz1W/Xafu1Yi+ymZL1gSPXCMglTaoM uEm2stBHfXEYvFnQYmtcJ7TZdiQOSykbYLb2/G7G+SvmajRvvaSvqBsuuuCEWrITPQ nGxkJb3Rf/sf4+57vy6GrM0/smK2ZOsoD4c8pHJp2MQAqQtR50jMRqbw2FY151RbF6 AWG0MLVyIY2LBR6I9P6Jqrh8XuISL0jkv/RY1KhRKhHxYEBzon8OvZ7GeknbT6JXEd QDLVqg4S9kHuw== From: sashiko-bot@kernel.org Subject: Re: [PATCH v7 4/8] perf cs-etm: Use thread-stack for last branch entries Reply-To: sashiko-reviews@lists.linux.dev To: "Leo Yan" Cc: linux-perf-users@vger.kernel.org In-Reply-To: <20260611-b4-arm_cs_callchain_support_v1-v7-4-1ba770c862ae@arm.com> References: <20260611-b4-arm_cs_callchain_support_v1-v7-4-1ba770c862ae@arm.com> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Date: Tue, 23 Jun 2026 19:46:19 +0000 Message-Id: <20260623194619.B3B4E1F000E9@smtp.kernel.org> Precedence: bulk X-Mailing-List: linux-perf-users@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Thank you for your contribution! Sashiko AI review found 4 potential issue(= s) to consider: New issues: - [High] Cross-CPU State Corruption via global thread stack flush in `cs_et= m__flush()` Pre-existing issues: - [Critical] Heap buffer overflow in `thread_stack__br_sample()` when `ts->= br_stack_sz > sz` - [High] Unbounded memory leak for per-CPU threads when `callstack` is false - [High] NULL pointer dereference crash when `thread_stack__event()` is int= erleaved with `thread_stack__process()` -- commit b1984c35dca6a97fee9de9c9a7a5db94fd997b41 Author: Leo Yan perf cs-etm: Use thread-stack for last branch entries This commit modifies the CoreSight ETM trace code to use the common thread_stack helper functions for last branch entries. It removes the private last_branch_rb circular buffer in favor of tracking branch history at the thread level. > diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c > index 4127120459418..8798bf0471faf 100644 > --- a/tools/perf/util/cs-etm.c > +++ b/tools/perf/util/cs-etm.c [ ... ] > @@ -1614,6 +1527,42 @@ static inline u64 cs_etm__resolve_sample_time(stru= ct cs_etm_queue *etmq, [ ... ] > +static void cs_etm__add_stack_event(struct cs_etm_queue *etmq, > + struct cs_etm_traceid_queue *tidq) > +{ [ ... ] > + if (etmq->etm->synth_opts.last_branch) { > + from =3D cs_etm__last_executed_instr(tidq->prev_packet); > + to =3D cs_etm__first_executed_instr(tidq->packet); > + > + size =3D cs_etm__instr_size(etmq, tidq, tidq->prev_packet, from); > + > + /* Enable callchain so thread stack entry can be allocated */ > + thread_stack__event(tidq->frontend_thread, tidq->prev_packet->cpu, > + tidq->prev_packet->flags, from, to, size, > + etmq->buffer->buffer_nr + 1, false, > + tidq->br_stack_sz, 0); [Severity: High] This is a pre-existing issue, but does passing false for callstack here trigger an unbounded memory leak in thread_stack__event()? In tools/perf/util/thread-stack.c, thread__cpu_stack() will unconditionally return NULL if !ts->stack. Because callstack=3Dfalse skips allocating ts->stack, thread__cpu_stack() constantly returns NULL. Subsequently, thread_stack__event() sees a missing stack and re-invokes thread_stack__new(), allocating and leaking a new ts->br_stack_rb ring buffer on every single branch event. This could lead to rapid OOM crashes during trace decoding. [Severity: High] This is also a pre-existing issue, but could the new thread stack usage cause a NULL pointer dereference crash if thread_stack__event() is interleaved with thread_stack__process()? If perf script is run with a DB export script, db_export__sample() calls thread_stack__process(), which supersedes the thread stack with br_stack_sz=3D0, leaving ts->br_stack_rb as NULL. If a subsequent branch packet invokes thread_stack__event() here with br_stack_sz > 0, thread_stack__update_br_stack() will unconditionally dereference the NULL ts->br_stack_rb buffer. [ ... ] > @@ -1644,8 +1593,11 @@ static int cs_etm__synth_instruction_sample(struct= cs_etm_queue *etmq, > =20 > cs_etm__copy_insn(etmq, tidq, packet, &sample); > =20 > - if (etm->synth_opts.last_branch) > + if (etm->synth_opts.last_branch) { > + thread_stack__br_sample(tidq->frontend_thread, tidq->packet->cpu, > + tidq->last_branch, tidq->br_stack_sz); [Severity: Critical] This isn't a bug introduced by this patch, but does calling thread_stack__br_sample() risk a heap buffer overflow here if ts->br_stack_sz > sz? In tools/perf/util/thread-stack.c:thread_stack__br_sample(), when wrapping the ring buffer, the memcpy incorrectly uses ts->br_stack_pos as the size scalar instead of the newly computed nr bound: if (src->nr >=3D ts->br_stack_sz) { sz -=3D nr; be =3D &dst->entries[nr]; nr =3D min(ts->br_stack_pos, sz); memcpy(be, &src->entries[0], bsz * ts->br_stack_pos); } If another tracing mechanism (like intel_pt) inflates ts->br_stack_sz beyond the requested branch stack size sz, this memcpy could write far beyond the end of the dst->entries allocation. [ ... ] > @@ -2020,7 +1956,7 @@ static int cs_etm__flush(struct cs_etm_queue *etmq, > =20 > /* Reset last branches after flush the trace */ > if (etm->synth_opts.last_branch) > - cs_etm__reset_last_branch_rb(tidq); > + thread_stack__flush(tidq->frontend_thread); [Severity: High] Could this global thread stack flush cause cross-CPU state corruption? When a trace discontinuity or block end on a single CPU invokes cs_etm__flush(), thread_stack__flush() unconditionally loops over all CPUs for the given thread.=20 For globally shared threads (like the idle thread) or multi-threaded processes, a local trace flush on one CPU appears to improperly reset and destroy the active branch stack arrays of all other concurrently traced CPUs. --=20 Sashiko AI review =C2=B7 https://sashiko.dev/#/patchset/20260611-b4-arm_cs_= callchain_support_v1-v7-0-1ba770c862ae@arm.com?part=3D4