From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Wed, 7 Jan 2026 23:28:23 +0100
From: Peter Zijlstra
To: Namhyung Kim
Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
	James Clark, linux-perf-users@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [BUG] perf/core: Task stuck on global_ctx_data_rwsem
Message-ID: <20260107222823.GC694817@noisy.programming.kicks-ass.net>
References: <20260107091652.GB3707891@noisy.programming.kicks-ass.net>
Precedence: bulk
X-Mailing-List: linux-perf-users@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Wed, Jan 07, 2026 at 11:01:53AM -0800, Namhyung Kim wrote:
> > But yes, I suppose this can do. The question is however, how do you get
> > into this predicament to begin with? Are you creating and destroying a
> > lot of global LBR events or something?
>
> I think it's just because there are too many tasks in the system like
> O(100K).  And any thread going to exit needs to wait for
> attach_global_ctx_data() to finish the iteration over every task.

OMG, so many tasks ...

> > Would it make sense to delay detach_global_ctx_data() for a second or
> > so? That is, what is your event creation pattern?
>
> I don't think it has a special pattern, but I'm curious how we can
> handle a race like below.
>
>   attach_global_ctx_data
>     check p->flags & PF_EXITING
>                                       do_exit
>     (preemption)                        set PF_EXITING
>                                         detach_task_ctx_data()
>                                           check p->perf_ctx_data
>     attach_task_ctx_data()  ---> memory leak

Oh right. Something like so perhaps?

---
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 3c2a491200c6..e5e716420eb3 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5421,9 +5421,19 @@ attach_task_ctx_data(struct task_struct *task, struct kmem_cache *ctx_cache,
 		return -ENOMEM;
 
 	for (;;) {
-		if (try_cmpxchg((struct perf_ctx_data **)&task->perf_ctx_data, &old, cd)) {
+		if (try_cmpxchg(&task->perf_ctx_data, &old, cd)) {
 			if (old)
 				perf_free_ctx_data_rcu(old);
+			/*
+			 * try_cmpxchg() pairs with try_cmpxchg() from
+			 * detach_task_ctx_data() such that
+			 * if we race with perf_event_exit_task(), we must
+			 * observe PF_EXITING.
+			 */
+			if (task->flags & PF_EXITING) {
+				task->perf_ctx_data = NULL;
+				perf_free_ctx_data_rcu(cd);
+			}
 			return 0;
 		}
 
@@ -5469,6 +5479,8 @@ attach_global_ctx_data(struct kmem_cache *ctx_cache)
 	/* Allocate everything */
 	scoped_guard (rcu) {
 		for_each_process_thread(g, p) {
+			if (p->flags & PF_EXITING)
+				continue;
 			cd = rcu_dereference(p->perf_ctx_data);
 			if (cd && !cd->global) {
 				cd->global = 1;
@@ -14568,8 +14580,11 @@ void perf_event_exit_task(struct task_struct *task)
 
 	/*
 	 * Detach the perf_ctx_data for the system-wide event.
+	 *
+	 * Done without holding global_ctx_data_rwsem; typically
+	 * attach_global_ctx_data() will skip over this task, but otherwise
+	 * attach_task_ctx_data() will observe PF_EXITING.
 	 */
-	guard(percpu_read)(&global_ctx_data_rwsem);
 	detach_task_ctx_data(task);
 }
 
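
For illustration only, the ordering argument in the comment above can be
modelled in userspace with C11 atomics. This is a sketch under assumptions,
not the kernel code: struct task, struct ctx_data, task_attach() and
task_exit() are hypothetical stand-ins for task_struct, perf_ctx_data,
attach_task_ctx_data() and the exit path, free() stands in for
perf_free_ctx_data_rcu(), and the sequentially consistent defaults of
<stdatomic.h> stand in for the kernel's try_cmpxchg() ordering. The sketch
also retracts the pointer with a second compare-and-exchange instead of the
plain store in the diff; that is a choice made here so the userspace model
cannot free the same object twice, not something the patch does.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct perf_ctx_data. */
struct ctx_data {
	int global;
};

/* Hypothetical stand-in for the two task_struct fields involved. */
struct task {
	atomic_bool exiting;            /* models PF_EXITING */
	_Atomic(struct ctx_data *) ctx; /* models task->perf_ctx_data */
};

/* Exit side: publish the flag first, then take whatever is installed. */
static void task_exit(struct task *t)
{
	assert(t);
	atomic_store(&t->exiting, true);
	free(atomic_exchange(&t->ctx, NULL)); /* free(NULL) is a no-op */
}

/* Attach side: install the pointer, then re-check the flag and undo. */
static int task_attach(struct task *t)
{
	struct ctx_data *cd, *old = NULL;

	assert(t);
	cd = calloc(1, sizeof(*cd));
	if (!cd)
		return -1;

	if (!atomic_compare_exchange_strong(&t->ctx, &old, cd)) {
		free(cd); /* somebody else already installed one */
		return 0;
	}

	/*
	 * Either task_exit() saw our pointer and freed it, or we see the
	 * flag here. In the second case retract the pointer, but only if
	 * it is still ours, so it is freed exactly once.
	 */
	if (atomic_load(&t->exiting)) {
		struct ctx_data *mine = cd;

		if (atomic_compare_exchange_strong(&t->ctx, &mine, NULL))
			free(cd);
	}
	return 0;
}
```

With both sides ordered like this, an attach that loses the race leaves
t->ctx NULL rather than leaking the allocation, which is the property the
diff is after without taking global_ctx_data_rwsem in the exit path.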