From: sashiko-bot@kernel.org
Subject: Re: [GIT PULL] Performance events changes for v7.1
Reply-To: sashiko@lists.linux.dev
To: "Ingo Molnar"
Cc: linux-perf-users@vger.kernel.org
Date: Mon, 13 Apr 2026 09:41:09 +0000
Message-Id: <20260413094110.C3CAAC116C6@smtp.kernel.org>
X-Mailing-List: linux-perf-users@vger.kernel.org

Sashiko AI review found 2 potential issue(s):
- [Critical] Pre-existing Stack Buffer Overflow in AMD IBS NMI handler on CPUs supporting `IBS_CAPS_OPDATA4`.
- [High] Reference count leak of `perf_ctx_data` in `attach_global_ctx_data()` when racing with a concurrent non-global context creation.
--

commit 167b4d230f2ffe8284d5afdbd849ef2f4f05ef9a
Author: Ingo Molnar

Performance events changes for v7.1

This commit pulls in various performance event changes for v7.1. It includes
an optimization to allocate task_ctx_data quickly on large systems, updates
and fixes for the AMD PMU driver IBS support, Intel PEBS constraint
validation, and MSR driver changes.

> diff --git a/kernel/events/core.c b/kernel/events/core.c
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -5487,6 +5487,12 @@ attach_global_ctx_data(struct kmem_cache *ctx_cache)
>  					cd = NULL;
>  			}
>  			if (!cd) {
> +				/*
> +				 * Try to allocate context quickly before
> +				 * traversing the whole thread list again.
> +				 */
> +				if (!attach_task_ctx_data(p, ctx_cache, true, GFP_NOWAIT))
> +					continue;
>  				get_task_struct(p);
>  				goto alloc;
>  			}

Could the new GFP_NOWAIT fast-path in attach_global_ctx_data() leak the
perf_ctx_data reference if it races with a concurrent non-global context
creation?

If attach_task_ctx_data() races with a per-task event attaching a non-global
context, it will find an existing context where old is not NULL. It increments
old->refcount and returns 0, but intentionally ignores the global argument
and does not set old->global to 1.

Because it returns success, attach_global_ctx_data() executes continue and
skips to the next thread. The context for the thread is left with an extra
reference but global remains 0.

Later, when the global context is detached, __detach_global_ctx_data()
iterates over all threads and only calls detach_task_ctx_data(p) if cd and
cd->global are true. Since cd->global is 0, the context is skipped,
permanently leaking the perf_ctx_data allocation.

> @@ -5497,7 +5503,7 @@ attach_global_ctx_data(struct kmem_cache *ctx_cache)
>  
>  	return 0;
>  alloc:
> -	ret = attach_task_ctx_data(p, ctx_cache, true);
> +	ret = attach_task_ctx_data(p, ctx_cache, true, GFP_KERNEL);
>  	put_task_struct(p);
>  	if (ret) {
>  		__detach_global_ctx_data();

Can this result in a double-increment leak?

If the same race occurs, attach_task_ctx_data() bumps the refcount without
setting global to 1 and returns success. Then goto again restarts the loop.

When the loop reaches the thread again, it evaluates:

	cd = rcu_dereference(p->perf_ctx_data);
	if (cd && !cd->global) {
		cd->global = 1;
		refcount_inc_not_zero(&cd->refcount);
	}

This bumps the refcount a second time. Since __detach_global_ctx_data() only
decrements once, the extra reference is leaked.

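To make the two refcount scenarios above easier to check by hand, here is a
minimal userspace sketch of the accounting. It is not kernel code: struct
ctx_model and every model_* helper are hypothetical stand-ins that only
mirror the behaviour described in this review (an existing context gets its
refcount bumped without global being set; the detach side drops a reference
only when global is set). Locking, RCU and the real allocation paths are
deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct perf_ctx_data: only the two fields
 * this review talks about. */
struct ctx_model {
	int refcount;
	bool global;
};

/* Models attach_task_ctx_data() finding an existing context, as described
 * above: it takes a reference, ignores 'global', and reports success. */
static int model_attach_existing(struct ctx_model *cd, bool global)
{
	(void)global;		/* deliberately unused, per the review */
	assert(cd->refcount > 0);
	cd->refcount++;
	return 0;
}

/* Models the loop body in attach_global_ctx_data() when it visits a
 * thread whose context is not yet marked global. */
static void model_global_loop_visit(struct ctx_model *cd)
{
	if (!cd->global) {
		cd->global = true;
		cd->refcount++;
	}
}

/* Models __detach_global_ctx_data(): one reference is dropped, and only
 * for contexts marked global. */
static void model_global_detach(struct ctx_model *cd)
{
	if (cd->global) {
		assert(cd->refcount > 0);
		cd->refcount--;
	}
}

/* Scenario 1: the GFP_NOWAIT fast path hits the racing context and the
 * loop continues. Returns references left beyond the per-task event's. */
static int leaked_refs_fast_path(void)
{
	struct ctx_model cd = { .refcount = 1, .global = false };

	model_attach_existing(&cd, true);
	model_global_detach(&cd);	/* skipped: global was never set */
	return cd.refcount - 1;
}

/* Scenario 2: the alloc path hits the racing context, then 'goto again'
 * makes the loop visit the same thread a second time. */
static int leaked_refs_alloc_path(void)
{
	struct ctx_model cd = { .refcount = 1, .global = false };

	model_attach_existing(&cd, true);
	model_global_loop_visit(&cd);	/* second increment */
	model_global_detach(&cd);	/* only one decrement */
	return cd.refcount - 1;
}
```

Under these assumptions both helpers report one reference left over after
the global detach, which is the leak being asked about. Whether the real
code can actually reach these interleavings still needs to be confirmed
against the locking in kernel/events/core.c.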
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -698,6 +698,8 @@
>  #define MSR_AMD64_IBSBRTARGET		0xc001103b
>  #define MSR_AMD64_ICIBSEXTDCTL		0xc001103c
>  #define MSR_AMD64_IBSOPDATA4		0xc001103d
> +#define MSR_AMD64_IBSOPCTL2		0xc001103e
> +#define MSR_AMD64_IBSFETCHCTL2		0xc001103f
>  #define MSR_AMD64_IBS_REG_COUNT_MAX	8 /* includes MSR_AMD64_IBSBRTARGET */

This isn't a regression from the current patch, but looking at the adjacent
MSR handling code, is there a potential stack buffer overflow in the IBS NMI
handler on CPUs supporting IBS_CAPS_OPDATA4?

In arch/x86/events/amd/ibs.c, struct perf_ibs_data is allocated on the stack
in perf_ibs_handle_irq(). Its regs array is sized using
MSR_AMD64_IBS_REG_COUNT_MAX, which is 8.

When handling a perf_ibs_op event, the initial read loop fetches up to 7 MSRs,
advancing the pointer buf to &ibs_data.regs[7].

Subsequently, the handler reads additional MSRs depending on CPU caps:

	if (event->attr.sample_type & PERF_SAMPLE_RAW) {
		if (perf_ibs == &perf_ibs_op) {
			if (ibs_caps & IBS_CAPS_BRNTRGT) {
				rdmsrq(MSR_AMD64_IBSBRTARGET, *buf++);
				br_target_idx = size;
				size++;
			}
			if (ibs_caps & IBS_CAPS_OPDATA4) {
				rdmsrq(MSR_AMD64_IBSOPDATA4, *buf++);
				size++;
			}
		}

If the CPU supports IBS_CAPS_BRNTRGT, MSR_AMD64_IBSBRTARGET is read into
regs[7], advancing buf to &regs[8].

If the CPU also supports IBS_CAPS_OPDATA4, it reads MSR_AMD64_IBSOPDATA4 into
the next pointer position, which translates to regs[8]. This writes one
element past the end of the 8-element array, overwriting adjacent local
variables on the stack.

Furthermore, ibs_data.size is incremented to 9, so
perf_sample_save_raw_data() will subsequently read 72 bytes of regs, exposing
the corrupted stack data to user-space in the raw perf record. Since
unprivileged users can configure RAW IBS events, this might lead to
unintended behavior.

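The index arithmetic above can be written out as a small standalone sketch.
The constants REG_COUNT_MAX and OP_BASE_REGS mirror the numbers quoted in
this review (8 and 7); the CAP_* bit values and the op_raw_* helper names are
made up for illustration and are not the kernel's definitions. The sketch
only counts how many 64-bit slots the RAW path for an op event would write,
it does not touch any MSR.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define REG_COUNT_MAX	8	/* mirrors MSR_AMD64_IBS_REG_COUNT_MAX above */
#define OP_BASE_REGS	7	/* initial read loop for perf_ibs_op, per the review */

/* Hypothetical capability bits; the real IBS_CAPS_* values differ. */
#define CAP_BRNTRGT	(1u << 0)
#define CAP_OPDATA4	(1u << 1)

/* Number of regs[] slots written for an op event with PERF_SAMPLE_RAW. */
static size_t op_raw_regs_written(unsigned int caps)
{
	size_t n = OP_BASE_REGS;

	if (caps & CAP_BRNTRGT)
		n++;		/* lands in regs[7] */
	if (caps & CAP_OPDATA4)
		n++;		/* lands in regs[8] when BRNTRGT is also set */
	return n;
}

/* True when every write stays inside an array of REG_COUNT_MAX entries. */
static bool op_raw_regs_fit(unsigned int caps)
{
	return op_raw_regs_written(caps) <= REG_COUNT_MAX;
}

/* Bytes of regs[] that a later copy sized by the slot count would read. */
static size_t op_raw_bytes(unsigned int caps)
{
	return op_raw_regs_written(caps) * sizeof(uint64_t);
}
```

With both hypothetical capability bits set, the count comes to 9 slots and
72 bytes against a capacity of 8 slots; with either bit alone it still fits.
This only restates the arithmetic of the review and assumes the base count
of 7 holds for the hardware in question.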
Should MSR_AMD64_IBS_REG_COUNT_MAX be increased to account for
IBS_CAPS_OPDATA4?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/adynEfZ0r_GIu06k@gmail.com?part=1