From: Steven Rostedt <rostedt@goodmis.org>
To: "Paul E. McKenney" <paulmck@kernel.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
Frederic Weisbecker <fweisbec@gmail.com>,
Joel Fernandes <joel@joelfernandes.org>,
Eric Dumazet <edumazet@google.com>,
Kuniyuki Iwashima <kuniyu@google.com>,
Paolo Abeni <pabeni@redhat.com>,
Willem de Bruijn <willemb@google.com>,
Yao Kai <yaokai34@huawei.com>,
Peter Zijlstra <peterz@infradead.org>,
Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [WARNING] RCU stall in sock_def_readable()
Date: Sat, 18 Apr 2026 18:36:57 -0400
Message-ID: <20260418183657.19c00883@fedora>
In-Reply-To: <61264010-608d-46dd-884c-2448b65a57ed@paulmck-laptop>
On Fri, 17 Apr 2026 12:03:22 -0700
"Paul E. McKenney" <paulmck@kernel.org> wrote:
> Huh. One other possibility is that CONFIG_PREEMPT_NONE=y and its friend
> CONFIG_PREEMPT_VOLUNTARY=y responded to RCU calls for help at any call
> to cond_resched() or (in the voluntary case) might_sleep(). Isn't it
> the case that with CONFIG_PREEMPT_LAZY=y, nothing happens until some
> other task shows up on that CPU? Maybe RCU needs to be more aggressive
> about invoking resched_cpu() when the grace period starts getting a bit
> long in the tooth?
Perhaps.
Note that I was still able to trigger the issue with PREEMPT_FULL. In
any case, I think I may have found the culprit: a __wake_up_common()
call that is waking up too many tasks:
hackbench_64-4695 [001] ..... 328.224508: function: consume_skb <-- unix_stream_read_generic
hackbench_64-4695 [001] ..... 328.224508: function: skb_release_head_state <-- consume_skb
hackbench_64-4695 [001] ..... 328.224508: function: unix_destruct_scm <-- skb_release_head_state
hackbench_64-4695 [001] ..... 328.224508: function: put_pid <-- unix_destruct_scm
hackbench_64-4695 [001] ..... 328.224508: function: sock_wfree <-- unix_destruct_scm
hackbench_64-4695 [001] ..... 328.224509: function: unix_write_space <-- sock_wfree
hackbench_64-4695 [001] ..... 328.224509: function: __rcu_read_lock <-- unix_write_space
hackbench_64-4695 [001] ..... 328.224509: function: __wake_up_sync_key <-- unix_write_space
hackbench_64-4695 [001] ..... 328.224510: function: _raw_spin_lock_irqsave <-- __wake_up_sync_key
hackbench_64-4695 [001] d.... 328.224510: function: preempt_count_add <-- _raw_spin_lock_irqsave
hackbench_64-4695 [001] d..1. 328.224510: function: do_raw_spin_lock <-- _raw_spin_lock_irqsave
hackbench_64-4695 [001] d..1. 328.224511: function: __wake_up_common <-- __wake_up_sync_key
hackbench_64-4695 [001] d..1. 328.224511: function: autoremove_wake_function <-- __wake_up_common
hackbench_64-4695 [001] d..1. 328.224511: function: default_wake_function <-- autoremove_wake_function
hackbench_64-4695 [001] d..1. 328.224512: function: try_to_wake_up <-- autoremove_wake_function
The above __wake_up_common() goes into a very long loop, so much so
that LAZY_NEED_RESCHED is triggered (look at the 'L' in the flags
field of the trace):
hackbench_64-4695 [001] d..4. 328.224582: function: vruntime_eligible <-- __pick_eevdf
hackbench_64-4695 [001] d..4. 328.224582: function: vruntime_eligible <-- __pick_eevdf
hackbench_64-4695 [001] d..4. 328.224584: function: resched_curr_lazy <-- wakeup_preempt
hackbench_64-4695 [001] d..4. 328.224584: function: __resched_curr <-- wakeup_preempt
hackbench_64-4695 [001] dL.4. 328.224584: function: __srcu_check_read_flavor <-- ttwu_do_activate
hackbench_64-4695 [001] dL.4. 328.224585: function: __srcu_check_read_flavor <-- ttwu_do_activate
hackbench_64-4695 [001] dL.4. 328.224585: function: _raw_spin_unlock <-- try_to_wake_up
hackbench_64-4695 [001] dL.4. 328.224585: function: do_raw_spin_unlock <-- _raw_spin_unlock
hackbench_64-4695 [001] dL.4. 328.224586: function: preempt_count_sub <-- _raw_spin_unlock
hackbench_64-4695 [001] dL.3. 328.224586: function: _raw_spin_unlock_irqrestore <-- try_to_wake_up
hackbench_64-4695 [001] dL.3. 328.224586: function: do_raw_spin_unlock <-- _raw_spin_unlock_irqrestore
hackbench_64-4695 [001] dL.3. 328.224586: function: preempt_count_sub <-- _raw_spin_unlock_irqrestore
hackbench_64-4695 [001] dL.2. 328.224586: function: preempt_count_sub <-- try_to_wake_up
hackbench_64-4695 [001] dL.1. 328.224587: function: autoremove_wake_function <-- __wake_up_common
hackbench_64-4695 [001] dL.1. 328.224587: function: default_wake_function <-- autoremove_wake_function
hackbench_64-4695 [001] dL.1. 328.224587: function: try_to_wake_up <-- autoremove_wake_function
hackbench_64-4695 [001] dL.1. 328.224588: function: preempt_count_add <-- try_to_wake_up
hackbench_64-4695 [001] dL.2. 328.224588: function: _raw_spin_lock_irqsave <-- try_to_wake_up
hackbench_64-4695 [001] dL.2. 328.224588: function: preempt_count_add <-- _raw_spin_lock_irqsave
hackbench_64-4695 [001] dL.3. 328.224588: function: do_raw_spin_lock <-- _raw_spin_lock_irqsave
hackbench_64-4695 [001] dL.3. 328.224589: function: select_task_rq_fair <-- try_to_wake_up
hackbench_64-4695 [001] dL.3. 328.224589: function: __rcu_read_lock <-- select_task_rq_fair
hackbench_64-4695 [001] dL.3. 328.224589: function: __rcu_read_unlock <-- select_task_rq_fair
hackbench_64-4695 [001] dL.3. 328.224590: function: ttwu_queue_wakelist <-- try_to_wake_up
hackbench_64-4695 [001] dL.3. 328.224590: function: __smp_call_single_queue <-- ttwu_queue_wakelist
(note the 'L' in the flags field above: LAZY_NEED_RESCHED is now set)
This goes on for a very long time, until the next tick arrives and
triggers a full NEED_RESCHED:
hackbench_64-4695 [001] dLh1. 328.229455: function: sched_tick <-- update_process_times
hackbench_64-4695 [001] dLh1. 328.229456: function: arch_scale_freq_tick <-- sched_tick
hackbench_64-4695 [001] dLh1. 328.229456: function: preempt_count_add <-- sched_tick
hackbench_64-4695 [001] dLh2. 328.229457: function: _raw_spin_lock_nested <-- sched_tick
hackbench_64-4695 [001] dLh2. 328.229457: function: preempt_count_add <-- _raw_spin_lock_nested
hackbench_64-4695 [001] dLh3. 328.229458: function: do_raw_spin_lock <-- sched_tick
hackbench_64-4695 [001] dLh3. 328.229470: function: preempt_count_sub <-- sched_tick
hackbench_64-4695 [001] dLh2. 328.229470: function: update_rq_clock <-- sched_tick
hackbench_64-4695 [001] dLh2. 328.229471: function: arch_scale_cpu_capacity <-- update_rq_clock
hackbench_64-4695 [001] dLh2. 328.229472: function: __resched_curr <-- sched_tick
hackbench_64-4695 [001] dNh2. 328.229472: function: task_tick_fair <-- sched_tick
hackbench_64-4695 [001] dNh2. 328.229473: function: update_curr <-- task_tick_fair
hackbench_64-4695 [001] dNh2. 328.229473: function: update_se <-- update_curr
hackbench_64-4695 [001] dNh2. 328.229474: function: cpuacct_charge <-- update_se
hackbench_64-4695 [001] dNh2. 328.229474: function: __cgroup_account_cputime <-- update_se
hackbench_64-4695 [001] dNh2. 328.229475: function: preempt_count_add <-- __cgroup_account_cputime
hackbench_64-4695 [001] dNh3. 328.229476: function: css_rstat_updated <-- __cgroup_account_cputime
hackbench_64-4695 [001] dNh3. 328.229476: function: preempt_count_sub <-- __cgroup_account_cputime
hackbench_64-4695 [001] dNh2. 328.229477: function: dl_server_update <-- update_curr
hackbench_64-4695 [001] dNh2. 328.229477: function: update_curr_dl_se <-- update_curr
hackbench_64-4695 [001] dNh2. 328.229478: function: resched_curr_lazy <-- update_curr
hackbench_64-4695 [001] dNh2. 328.229478: function: __resched_curr <-- update_curr
So this task finally gets scheduled out (while holding RCU read lock):
static void unix_write_space(struct sock *sk)
{
	struct socket_wq *wq;

	rcu_read_lock();
	if (unix_writable(sk, READ_ONCE(sk->sk_state))) {
		wq = rcu_dereference(sk->sk_wq);
		if (skwq_has_sleeper(wq))
			wake_up_interruptible_sync_poll(&wq->wait,
				EPOLLOUT | EPOLLWRNORM | EPOLLWRBAND);
		sk_wake_async_rcu(sk, SOCK_WAKE_SPACE, POLL_OUT);
	}
	rcu_read_unlock();
}
And because the system is very busy, it doesn't get scheduled in for a
long time, and RCU complains about a stall:
[ 357.679707] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 357.810980] rcu: Tasks blocked on level-0 rcu_node (CPUs 0-3): P4695/1:b..l
[ 357.813601] rcu: (detected by 2, t=6535 jiffies, g=4125, q=100 ncpus=4)
[ 357.816256] task:hackbench_64 state:R running task stack:0 pid:4695 tgid:4695 ppid:4690 task_flags:0x400000 flags:0x00080000
[ 357.821176] Call Trace:
[ 357.822382] <TASK>
[ 357.823474] __schedule+0x4ac/0x12f0
[ 357.825104] preempt_schedule_common+0x26/0xe0
[ 357.826988] ? preempt_schedule_thunk+0x16/0x30
[ 357.828930] preempt_schedule_thunk+0x16/0x30
[ 357.830788] ? _raw_spin_unlock_irqrestore+0x39/0x70
[ 357.832901] _raw_spin_unlock_irqrestore+0x5d/0x70
[ 357.834884] sock_def_readable+0x9c/0x2b0
[ 357.836633] unix_stream_sendmsg+0x2d7/0x710
[ 357.838423] sock_write_iter+0x185/0x190
[ 357.840084] vfs_write+0x457/0x5b0
[ 357.841568] ksys_write+0xc8/0xf0
[ 357.842914] do_syscall_64+0x117/0x1660
[ 357.844413] ? irqentry_exit+0xd9/0x690
Now I guess the question is: how do we fix this?

Before PREEMPT_LAZY, an rcu_read_lock() section would never be scheduled
out. Now, long loops under RCU can be scheduled out, and the preempted
reader can then be flagged as an RCU stall. Should we avoid scheduling
out when rcu_read_lock() is held?
-- Steve
Thread overview: 9+ messages
2026-04-15 17:27 [WARNING] RCU stall in sock_def_readable() Steven Rostedt
2026-04-17 0:16 ` Paul E. McKenney
2026-04-17 12:43 ` Steven Rostedt
2026-04-17 13:30 ` Steven Rostedt
2026-04-17 19:03 ` Paul E. McKenney
2026-04-18 22:36 ` Steven Rostedt [this message]
2026-04-18 23:01 ` Paul E. McKenney
2026-04-18 23:26 ` Steven Rostedt
2026-04-19 0:09 ` Steven Rostedt