From: sashiko-bot@kernel.org
Subject: Re: [PATCH 2/2] sched_ext, rcu: Upgrade RCU stall paths to report cpumask of stalled CPUs
Reply-To: sashiko-reviews@lists.linux.dev
To: "Cheng-Yang Chou"
Cc: sched-ext@lists.linux.dev
In-Reply-To: <20260519171745.1551340-3-yphbchou0911@gmail.com>
References: <20260519171745.1551340-3-yphbchou0911@gmail.com>
Date: Tue, 19 May 2026 17:52:29 +0000
Message-Id: <20260519175231.520A0C2BCB3@smtp.kernel.org>
X-Mailing-List: sched-ext@lists.linux.dev

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] Dynamic memory allocation (`GFP_ATOMIC`) in the RCU stall detector path can cause a hard deadlock, masking the system hang instead of reporting it.
--

commit ffd18e78b03bfaba9705c61803e8bd0bb6325d38
Author: Cheng-Yang Chou

sched_ext, rcu: Upgrade RCU stall paths to report cpumask of stalled CPUs

This commit updates the RCU stall paths to capture all stalled CPUs in a
cpumask rather than just the detecting CPU. The cpumask is passed through
the stall handlers and dumped first during a sched_ext state dump to avoid
losing the stalled CPUs to truncation.

> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> index 82cada459e5d0..b8f4ca984c1c9 100644
> --- a/kernel/rcu/tree_exp.h
> +++ b/kernel/rcu/tree_exp.h

[ ... ]

> @@ -659,23 +662,28 @@ static void synchronize_rcu_expedited_wait(void)
>
>  	for (;;) {
>  		unsigned long j;
> +		cpumask_var_t stalled_mask;
> +		bool have_mask;
>
>  		if (synchronize_rcu_expedited_wait_once(jiffies_stall))
>  			return;
>  		if (rcu_stall_is_suppressed())
>  			continue;
>
> +		have_mask = zalloc_cpumask_var(&stalled_mask, GFP_ATOMIC | __GFP_NOWARN);
> +

[Severity: High]
Could this allocation cause a hard deadlock during an RCU stall?

If CONFIG_CPUMASK_OFFSTACK=y is set, zalloc_cpumask_var() performs a dynamic
memory allocation via kmalloc(). If the stalled CPU is stuck holding a memory
allocator lock (such as zone->lock or the SLUB list_lock), the detecting CPU
will deadlock trying to acquire it.

Additionally, GFP_ATOMIC includes __GFP_KSWAPD_RECLAIM, which can wake up
kswapd and acquire scheduler locks (pi_lock, rq->lock). If the stalled CPU
holds one of these scheduler locks, it could also deadlock here.

This would completely prevent the RCU stall warning from being printed and
mask the root cause of the hang. Would it be safer to use a statically
preallocated cpumask for stall reporting?

>  		nbcon_cpu_emergency_enter();
>
>  		j = jiffies;

[ ... ]

> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> index b67532cb87705..90d6eaf1b8415 100644
> --- a/kernel/rcu/tree_stall.h
> +++ b/kernel/rcu/tree_stall.h

[ ... ]

> @@ -645,6 +647,8 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
>  	if (rcu_stall_is_suppressed())
>  		return;
>
> +	have_mask = zalloc_cpumask_var(&stalled_mask, GFP_ATOMIC | __GFP_NOWARN);
> +

[Severity: High]
Can this dynamic allocation also trigger a hard deadlock?

print_other_cpu_stall() executes in a critical diagnostic context with IRQs
disabled (enforced by lockdep_assert_irqs_disabled()). Similar to the issue
in synchronize_rcu_expedited_wait(), allocating memory here might deadlock if
the stalled CPU holds a lock required by the memory allocator or scheduler.

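For illustration only, here is a minimal userspace sketch of the statically
preallocated alternative, written as a model and not as kernel code. Every
name in it (MODEL_NR_CPUS, stall_mask, stall_mask_tryget(), stall_mask_put(),
and the set/test helpers) is hypothetical and does not come from the patch.
It assumes that a single shared mask guarded by a non-blocking claim is
acceptable for both stall paths, and that a reporter which loses the claim
simply reports without a mask, much as the patch already does when have_mask
is false.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical CPU count for this userspace model. */
#define MODEL_NR_CPUS 64
#define BITS_PER_WORD (8 * sizeof(unsigned long))
#define MASK_WORDS ((MODEL_NR_CPUS + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Storage is reserved at build time, so detecting a stall allocates nothing. */
static unsigned long stall_mask[MASK_WORDS];
static atomic_flag stall_mask_busy = ATOMIC_FLAG_INIT;

/* Claim the shared mask without blocking or spinning; clears it on success. */
static bool stall_mask_tryget(void)
{
	if (atomic_flag_test_and_set_explicit(&stall_mask_busy,
					      memory_order_acquire))
		return false;	/* another reporter owns it: report without a mask */
	memset(stall_mask, 0, sizeof(stall_mask));
	return true;
}

/* Release the mask so a later stall report can claim it. */
static void stall_mask_put(void)
{
	atomic_flag_clear_explicit(&stall_mask_busy, memory_order_release);
}

/* Record one stalled CPU; out-of-range CPU numbers are ignored. */
static void stall_mask_set_cpu(unsigned int cpu)
{
	if (cpu < MODEL_NR_CPUS)
		stall_mask[cpu / BITS_PER_WORD] |= 1UL << (cpu % BITS_PER_WORD);
}

/* Return true if the given CPU was recorded as stalled. */
static bool stall_mask_test_cpu(unsigned int cpu)
{
	return cpu < MODEL_NR_CPUS &&
	       ((stall_mask[cpu / BITS_PER_WORD] >> (cpu % BITS_PER_WORD)) & 1UL);
}
```

In this model a reporter would call stall_mask_tryget(), record CPUs with
stall_mask_set_cpu() while walking the stalled set, hand the mask to the
dump code, and then call stall_mask_put(). The claim is a single atomic
test-and-set, so it takes no allocator or scheduler lock. In the kernel the
storage would more likely be a static struct cpumask, and whether one shared
mask is enough for the expedited and normal stall paths is a question for
the patch author.
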
>  	nbcon_cpu_emergency_enter();
>
>  	/*

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260519171745.1551340-1-yphbchou0911@gmail.com?part=2