Date: Tue, 13 Jan 2026 13:21:18 -0800
From: Namhyung Kim
To: Peter Zijlstra, Ingo Molnar
Cc: Mark Rutland, Alexander Shishkin, Arnaldo Carvalho de Melo, LKML, Rosalie Fang
Subject: Re: [PATCH] perf/core: Fix slow perf_event_task_exit() with LBR callstacks
References: <20260112165157.1919624-1-namhyung@kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
In-Reply-To: <20260112165157.1919624-1-namhyung@kernel.org>

On Mon, Jan 12, 2026 at 08:51:57AM -0800, Namhyung Kim wrote:
> I got a report that a task is stuck in perf_event_exit_task() waiting
> for global_ctx_data_rwsem. On large systems with lots threads, it'd
> have performance issues when it grabs the lock to iterate all threads
> in the system to allocate the context data.
>
> And it'd block task exit path which is problematic especially under
> memory pressure.
>
>   perf_event_open
>     perf_event_alloc
>       attach_perf_ctx_data
>         attach_global_ctx_data
>           percpu_down_write (global_ctx_data_rwsem)
>             for_each_process_thread
>               alloc_task_ctx_data
>   do_exit
>     perf_event_exit_task
>       percpu_down_read (global_ctx_data_rwsem)
>
> It should not hold the global_ctx_data_rwsem on the exit path. Let's
> skip allocation for exiting tasks and free the data carefully.
>
> Reported-by: Rosalie Fang
> Suggested-by: Peter Zijlstra
> Signed-off-by: Namhyung Kim
> ---
>  kernel/events/core.c | 20 ++++++++++++++++++--
>  1 file changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 376fb07d869b8b50..e87bb43b7bb3dd4b 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -5421,9 +5421,20 @@ attach_task_ctx_data(struct task_struct *task, struct kmem_cache *ctx_cache,
>  		return -ENOMEM;
>  
>  	for (;;) {
> -		if (try_cmpxchg((struct perf_ctx_data **)&task->perf_ctx_data, &old, cd)) {
> +		if (try_cmpxchg(&task->perf_ctx_data, &old, cd)) {

It seems we need to keep this casting to suppress sparse warnings.

Thanks,
Namhyung

>  			if (old)
>  				perf_free_ctx_data_rcu(old);
> +			/*
> +			 * Above try_cmpxchg() pairs with try_cmpxchg() from
> +			 * detach_task_ctx_data() such that
> +			 * if we race with perf_event_exit_task(), we must
> +			 * observe PF_EXITING.
> +			 */
> +			if (task->flags & PF_EXITING) {
> +				/* detach_task_ctx_data() may free it already */
> +				if (try_cmpxchg(&task->perf_ctx_data, &cd, NULL))
> +					perf_free_ctx_data_rcu(cd);
> +			}
>  			return 0;
>  		}
>  
> @@ -5469,6 +5480,8 @@ attach_global_ctx_data(struct kmem_cache *ctx_cache)
>  	/* Allocate everything */
>  	scoped_guard (rcu) {
>  		for_each_process_thread(g, p) {
> +			if (p->flags & PF_EXITING)
> +				continue;
>  			cd = rcu_dereference(p->perf_ctx_data);
>  			if (cd && !cd->global) {
>  				cd->global = 1;
> @@ -14562,8 +14575,11 @@ void perf_event_exit_task(struct task_struct *task)
>  
>  	/*
>  	 * Detach the perf_ctx_data for the system-wide event.
> +	 *
> +	 * Done without holding global_ctx_data_rwsem; typically
> +	 * attach_global_ctx_data() will skip over this task, but otherwise
> +	 * attach_task_ctx_data() will observe PF_EXITING.
>  	 */
> -	guard(percpu_read)(&global_ctx_data_rwsem);
>  	detach_task_ctx_data(task);
>  }
>  
> -- 
> 2.52.0.457.g6b5491de43-goog
>
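
For illustration only, here is a minimal userspace sketch of the publish-then-recheck handshake the quoted patch describes. It assumes C11 atomics in place of the kernel's try_cmpxchg() and a made-up TASK_EXITING bit in place of PF_EXITING. The names struct task, struct ctx_data, task_init(), attach_ctx_data(), detach_ctx_data(), exit_task() and freed_count are all hypothetical, and RCU, refcounting and the cd->global handling are left out, so it only shows why neither ordering leaks or double-frees, not how kernel/events/core.c actually does it.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins, not the kernel's types. */
struct ctx_data {
	int global;
};

struct task {
	atomic_uint flags;
	_Atomic(struct ctx_data *) ctx_data;
};

#define TASK_EXITING 0x4u	/* assumed stand-in for PF_EXITING */

/* Counts frees so a leak or double free shows up; for demonstration only. */
int freed_count;

void free_ctx_data(struct ctx_data *cd)
{
	free(cd);
	freed_count++;
}

void task_init(struct task *t)
{
	atomic_init(&t->flags, 0u);
	atomic_init(&t->ctx_data, (struct ctx_data *)NULL);
}

/* Attach side: publish the data first, then re-check the exiting flag. */
int attach_ctx_data(struct task *t)
{
	struct ctx_data *cd = calloc(1, sizeof(*cd));
	struct ctx_data *old = NULL;

	if (!cd)
		return -1;

	if (!atomic_compare_exchange_strong(&t->ctx_data, &old, cd)) {
		/* Someone else already installed data; ours was never published. */
		free(cd);
		return 0;
	}

	/*
	 * The exit side sets the flag before its own compare-exchange, so
	 * if it ran before our publish we see the flag here and undo it.
	 */
	if (atomic_load(&t->flags) & TASK_EXITING) {
		struct ctx_data *expected = cd;

		/* The exit side may already have freed it; only one side wins. */
		if (atomic_compare_exchange_strong(&t->ctx_data, &expected,
						   (struct ctx_data *)NULL))
			free_ctx_data(cd);
	}
	return 0;
}

/* Detach side: take the pointer with a compare-exchange, free on success. */
void detach_ctx_data(struct task *t)
{
	struct ctx_data *cd = atomic_load(&t->ctx_data);

	if (cd && atomic_compare_exchange_strong(&t->ctx_data, &cd,
						 (struct ctx_data *)NULL))
		free_ctx_data(cd);
}

/* Exit side: mark the task as exiting, then detach without any global lock. */
void exit_task(struct task *t)
{
	atomic_fetch_or(&t->flags, TASK_EXITING);
	detach_ctx_data(t);
}
```

Calling attach_ctx_data() and then exit_task(), or the other way round, leaves the pointer NULL and frees the data exactly once in this sketch. Real concurrency would additionally need the deferred freeing that perf_free_ctx_data_rcu() provides in the patch.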