From: sashiko-bot@kernel.org
Subject: Re: [PATCH v6 3/8] perf cs-etm: Use thread-stack for last branch entries
Reply-To: sashiko-reviews@lists.linux.dev
To: "Leo Yan"
Cc: linux-perf-users@vger.kernel.org
In-Reply-To: <20260526-b4-arm_cs_callchain_support_v1-v6-3-f9f49f53c9dd@arm.com>
References: <20260526-b4-arm_cs_callchain_support_v1-v6-3-f9f49f53c9dd@arm.com>
Content-Type: text/plain; charset=utf-8
Date: Tue, 26 May 2026 18:32:25 +0000
Message-Id: <20260526183226.376271F000E9@smtp.kernel.org>
X-Mailing-List: linux-perf-users@vger.kernel.org

Thank you for your contribution! Sashiko AI review found 5 potential issue(s)
to consider:

New issues:
- [High] Flushing the thread branch stack via thread_stack__flush() globally
  resets branch state for all CPUs a thread has run on, corrupting concurrent
  branch traces.
- [High] Unconditional NULL pointer dereference of `tidq->thread` in
  `cs_etm__flush()` when flushing the thread stack.
- [Medium] Forcing the creation of a full callstack introduces hidden CPU and
  memory overhead even when only branch tracking is requested.

Pre-existing issues:
- [Critical] Heap buffer overflow in thread_stack__br_sample() when copying
  branch entries.
- [High] NULL pointer dereference in thread_stack__update_br_stack() when
  processing mixed perf events.
--

commit 423cf82c1cfc3330f4878f3ed31b1cf46de41008
Author: Leo Yan

perf cs-etm: Use thread-stack for last branch entries

CS ETM maintains its own circular array for last branch entries, with
local helpers to update, copy and reset the branch stack. This duplicates
logic already provided by the common code.

Record branch with thread_stack__event() and synthesize branch stack with
thread_stack__br_sample().
This removes the local last_branch_rb buffer and position tracking.

> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> --- a/tools/perf/util/cs-etm.c
> +++ b/tools/perf/util/cs-etm.c

[ ... ]

> @@ -1579,6 +1492,37 @@ static inline u64 cs_etm__resolve_sample_time(
> +	/* Enable callchain so thread stack entry can be allocated */
> +	thread_stack__event(tidq->thread, tidq->prev_packet->cpu,
> +			    tidq->prev_packet->flags, from, to, size,
> +			    etmq->buffer->buffer_nr + 1, true,
> +			    tidq->br_stack_sz, 0);

[Severity: High]
This is a pre-existing issue, but is there a NULL pointer dereference in
thread_stack__update_br_stack() when processing mixed perf events?

If a thread stack is initially created by a normal PERF_RECORD_SAMPLE event
via thread_stack__process(), it is initialized with br_stack_sz = 0.
Consequently, ts->br_stack_rb is left as NULL.

When an auxtrace record (from intel-pt or cs-etm) is processed subsequently,
thread_stack__event() bypasses initialization because the thread stack already
exists.

It then unconditionally calls thread_stack__update_br_stack() since
tidq->br_stack_sz > 0, which dereferences ts->br_stack_rb->entries[...],
causing an immediate crash.

[Severity: Medium]
Does passing true for the callstack parameter force the creation of a full
callstack?

This causes thread_stack__event() to parse PERF_IP_FLAG_CALL and
PERF_IP_FLAG_RETURN events and maintain a full function call stack in memory.

Could this introduce hidden CPU and memory overhead even when users only
requested the lightweight --last-branch tracking?

[ ... ]

> @@ -1608,8 +1552,12 @@ static int cs_etm__synth_instruction_sample(
>
>  	cs_etm__copy_insn(etmq, tidq->trace_chan_id, tidq->packet, &sample);
>
> -	if (etm->synth_opts.last_branch)
> +	if (etm->synth_opts.last_branch) {
> +		thread_stack__br_sample(tidq->thread, tidq->packet->cpu,
> +					tidq->last_branch,
> +					tidq->br_stack_sz);

[Severity: Critical]
This is a pre-existing issue, but does thread_stack__br_sample() have a heap
buffer overflow when copying branch entries?

If ts->br_stack_sz is greater than the size of the destination buffer, the
wrap-around logic is triggered. The second memcpy intended to copy the
remaining elements is:

    memcpy(be, &src->entries[0], bsz * ts->br_stack_pos);

It completely ignores the calculated nr bounds and copies up to ts->br_stack_sz
entries into the destination buffer. For Intel PT, ts->br_stack_sz can be 1024
while the destination size is 16.

Can this write hundreds of branch entries past the end of the array, leading
to a heap buffer overflow?

[ ... ]

> @@ -1982,7 +1915,7 @@ static int cs_etm__flush(struct cs_etm_queue *etmq,
>
>  	/* Reset last branches after flush the trace */
>  	if (etm->synth_opts.last_branch)
> -		cs_etm__reset_last_branch_rb(tidq);
> +		thread_stack__flush(tidq->thread);

[Severity: High]
Does thread_stack__flush() globally reset the branch state for all CPUs a
thread has run on?

For threads that can run on multiple CPUs concurrently (e.g., the idle thread),
thread->ts points to an array of thread stacks for each CPU, and
thread_stack__flush() iterates over ts->arr_sz and resets all of them.

Can this cause a trace discontinuity on one CPU's queue to wipe out the active
branch histories and call chains of the same thread on all other CPUs?

[Severity: High]
Can tidq->thread be NULL here?

In cs_etm__init_traceid_queue(), tidq->thread is initialized via
machine__findnew_thread(), which may return NULL on allocation failure.
If tidq->thread is NULL, won't thread_stack__flush() cause an immediate NULL
pointer dereference?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260526-b4-arm_cs_callchain_support_v1-v6-0-f9f49f53c9dd@arm.com?part=3
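
For illustration, the bounded wrap-around copy that the [Critical] finding
asks about could look roughly like the sketch below. It is not the perf
implementation: struct entry, min_sz() and ring_copy_recent() are hypothetical
names, and the ring layout (newest entry at pos, older entries at increasing
indices, wrapping back to slot 0) is an assumption made for this example. The
point is only that both memcpy() lengths are clamped to the space left in the
destination, so a large ring cannot write past a small destination array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a branch entry; not perf's struct branch_entry. */
struct entry {
	unsigned long long from;
	unsigned long long to;
};

static size_t min_sz(size_t a, size_t b)
{
	return a < b ? a : b;
}

/*
 * Copy up to dst_cap entries, newest first, out of a ring buffer.
 *
 * Assumed layout: ring[pos] is the newest entry, older entries follow at
 * increasing indices and wrap to ring[0]. count is the number of valid
 * entries recorded so far (treated as at most ring_sz).
 *
 * Returns the number of entries written to dst, never more than dst_cap.
 */
static size_t ring_copy_recent(struct entry *dst, size_t dst_cap,
			       const struct entry *ring, size_t ring_sz,
			       size_t pos, size_t count)
{
	size_t total, first, second;

	assert(pos < ring_sz);

	total = min_sz(min_sz(count, ring_sz), dst_cap);
	if (total == 0)
		return 0;

	/* First chunk: from pos to the end of the ring, clamped to total. */
	first = min_sz(ring_sz - pos, total);
	memcpy(dst, &ring[pos], first * sizeof(*dst));

	/* Second chunk: wrapped part, clamped to the space left in dst. */
	second = min_sz(pos, total - first);
	memcpy(dst + first, &ring[0], second * sizeof(*dst));

	return first + second;
}
```

With a full 4-entry ring whose newest entry sits in slot 3 and a 2-entry
destination, this copies slot 3 and then slot 0, and stops there.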