From: sashiko-bot@kernel.org
To: "Cheng-Yang Chou" <yphbchou0911@gmail.com>
Cc: sched-ext@lists.linux.dev
Subject: Re: [PATCH 2/2] sched_ext, rcu: Upgrade RCU stall paths to report cpumask of stalled CPUs
Date: Tue, 19 May 2026 17:52:29 +0000
Message-ID: <20260519175231.520A0C2BCB3@smtp.kernel.org>
In-Reply-To: <20260519171745.1551340-3-yphbchou0911@gmail.com>

Thank you for your contribution! Sashiko AI review found 1 potential issue to consider:
- [High] Dynamic memory allocation (`GFP_ATOMIC`) in the RCU stall detector path can cause a hard deadlock, masking the system hang instead of reporting it.
--

commit ffd18e78b03bfaba9705c61803e8bd0bb6325d38
Author: Cheng-Yang Chou <yphbchou0911@gmail.com>

sched_ext, rcu: Upgrade RCU stall paths to report cpumask of stalled CPUs

This commit updates the RCU stall paths to capture all stalled CPUs in a
cpumask rather than just the detecting CPU. The cpumask is passed through
the stall handlers and dumped first during a sched_ext state dump to avoid
losing the stalled CPUs to truncation.

> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> index 82cada459e5d0..b8f4ca984c1c9 100644
> --- a/kernel/rcu/tree_exp.h
> +++ b/kernel/rcu/tree_exp.h

[ ... ]

> @@ -659,23 +662,28 @@ static void synchronize_rcu_expedited_wait(void)
>  
>  	for (;;) {
>  		unsigned long j;
> +		cpumask_var_t stalled_mask;
> +		bool have_mask;
>  
>  		if (synchronize_rcu_expedited_wait_once(jiffies_stall))
>  			return;
>  		if (rcu_stall_is_suppressed())
>  			continue;
>  
> +		have_mask = zalloc_cpumask_var(&stalled_mask, GFP_ATOMIC | __GFP_NOWARN);
> +

[Severity: High]
Could this allocation cause a hard deadlock during an RCU stall?

If CONFIG_CPUMASK_OFFSTACK=y is set, zalloc_cpumask_var() performs a dynamic
memory allocation via kmalloc(). If the stalled CPU is stuck holding a memory
allocator lock (such as zone->lock or the SLUB list_lock), the detecting CPU
will deadlock trying to acquire it.

Additionally, GFP_ATOMIC includes __GFP_KSWAPD_RECLAIM, so the allocation
can wake kswapd, and that wakeup acquires scheduler locks (pi_lock,
rq->lock). If the stalled CPU holds one of these scheduler locks, the
detecting CPU could deadlock here as well.

This would completely prevent the RCU stall warning from being printed
and mask the root cause of the hang. Would it be safer to use a
statically preallocated cpumask for stall reporting?

>  		nbcon_cpu_emergency_enter();
>  
>  		j = jiffies;

[ ... ]

> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> index b67532cb87705..90d6eaf1b8415 100644
> --- a/kernel/rcu/tree_stall.h
> +++ b/kernel/rcu/tree_stall.h

[ ... ]

> @@ -645,6 +647,8 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
>  	if (rcu_stall_is_suppressed())
>  		return;
>  
> +	have_mask = zalloc_cpumask_var(&stalled_mask, GFP_ATOMIC | __GFP_NOWARN);
> +

[Severity: High]
Can this dynamic allocation also trigger a hard deadlock?

print_other_cpu_stall() executes in a critical diagnostic context with
IRQs disabled (enforced by lockdep_assert_irqs_disabled()). Similar to
the issue in synchronize_rcu_expedited_wait(), allocating memory here
might deadlock if the stalled CPU holds a lock required by the memory
allocator or scheduler.

>  	nbcon_cpu_emergency_enter();
>  
>  	/*

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260519171745.1551340-1-yphbchou0911@gmail.com?part=2
