From: Steven Rostedt <rostedt@goodmis.org>
To: "Paul E. McKenney" <paulmck@kernel.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
Frederic Weisbecker <fweisbec@gmail.com>,
Joel Fernandes <joel@joelfernandes.org>,
Eric Dumazet <edumazet@google.com>,
Kuniyuki Iwashima <kuniyu@google.com>,
Paolo Abeni <pabeni@redhat.com>,
Willem de Bruijn <willemb@google.com>,
Yao Kai <yaokai34@huawei.com>,
Peter Zijlstra <peterz@infradead.org>,
Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [WARNING] RCU stall in sock_def_readable()
Date: Sat, 18 Apr 2026 18:36:57 -0400
Message-ID: <20260418183657.19c00883@fedora>
In-Reply-To: <61264010-608d-46dd-884c-2448b65a57ed@paulmck-laptop>
On Fri, 17 Apr 2026 12:03:22 -0700
"Paul E. McKenney" <paulmck@kernel.org> wrote:
> Huh. One other possibility is that CONFIG_PREEMPT_NONE=y and its friend
> CONFIG_PREEMPT_VOLUNTARY=y responded to RCU calls for help at any call
> to cond_resched() or (in the voluntary case) might_sleep(). Isn't it
> the case that with CONFIG_PREEMPT_LAZY=y, nothing happens until some
> other task shows up on that CPU? Maybe RCU needs to be more aggressive
> about invoking resched_cpu() when the grace period starts getting a bit
> long in the tooth?
Perhaps.
Note, I was still able to trigger the issue with PREEMPT_FULL. Anyway,
I think I may have found the culprit: a __wake_up_common() call that is
waking up too many tasks:
hackbench_64-4695 [001] ..... 328.224508: function: consume_skb <-- unix_stream_read_generic
hackbench_64-4695 [001] ..... 328.224508: function: skb_release_head_state <-- consume_skb
hackbench_64-4695 [001] ..... 328.224508: function: unix_destruct_scm <-- skb_release_head_state
hackbench_64-4695 [001] ..... 328.224508: function: put_pid <-- unix_destruct_scm
hackbench_64-4695 [001] ..... 328.224508: function: sock_wfree <-- unix_destruct_scm
hackbench_64-4695 [001] ..... 328.224509: function: unix_write_space <-- sock_wfree
hackbench_64-4695 [001] ..... 328.224509: function: __rcu_read_lock <-- unix_write_space
hackbench_64-4695 [001] ..... 328.224509: function: __wake_up_sync_key <-- unix_write_space
hackbench_64-4695 [001] ..... 328.224510: function: _raw_spin_lock_irqsave <-- __wake_up_sync_key
hackbench_64-4695 [001] d.... 328.224510: function: preempt_count_add <-- _raw_spin_lock_irqsave
hackbench_64-4695 [001] d..1. 328.224510: function: do_raw_spin_lock <-- _raw_spin_lock_irqsave
hackbench_64-4695 [001] d..1. 328.224511: function: __wake_up_common <-- __wake_up_sync_key
hackbench_64-4695 [001] d..1. 328.224511: function: autoremove_wake_function <-- __wake_up_common
hackbench_64-4695 [001] d..1. 328.224511: function: default_wake_function <-- autoremove_wake_function
hackbench_64-4695 [001] d..1. 328.224512: function: try_to_wake_up <-- autoremove_wake_function
The above __wake_up_common() goes into a very long loop, so much so
that LAZY_NEED_RESCHED gets set (note the 'L' in the flags field of
the trace):
hackbench_64-4695 [001] d..4. 328.224582: function: vruntime_eligible <-- __pick_eevdf
hackbench_64-4695 [001] d..4. 328.224582: function: vruntime_eligible <-- __pick_eevdf
hackbench_64-4695 [001] d..4. 328.224584: function: resched_curr_lazy <-- wakeup_preempt
hackbench_64-4695 [001] d..4. 328.224584: function: __resched_curr <-- wakeup_preempt
hackbench_64-4695 [001] dL.4. 328.224584: function: __srcu_check_read_flavor <-- ttwu_do_activate
hackbench_64-4695 [001] dL.4. 328.224585: function: __srcu_check_read_flavor <-- ttwu_do_activate
hackbench_64-4695 [001] dL.4. 328.224585: function: _raw_spin_unlock <-- try_to_wake_up
hackbench_64-4695 [001] dL.4. 328.224585: function: do_raw_spin_unlock <-- _raw_spin_unlock
hackbench_64-4695 [001] dL.4. 328.224586: function: preempt_count_sub <-- _raw_spin_unlock
hackbench_64-4695 [001] dL.3. 328.224586: function: _raw_spin_unlock_irqrestore <-- try_to_wake_up
hackbench_64-4695 [001] dL.3. 328.224586: function: do_raw_spin_unlock <-- _raw_spin_unlock_irqrestore
hackbench_64-4695 [001] dL.3. 328.224586: function: preempt_count_sub <-- _raw_spin_unlock_irqrestore
hackbench_64-4695 [001] dL.2. 328.224586: function: preempt_count_sub <-- try_to_wake_up
hackbench_64-4695 [001] dL.1. 328.224587: function: autoremove_wake_function <-- __wake_up_common
hackbench_64-4695 [001] dL.1. 328.224587: function: default_wake_function <-- autoremove_wake_function
hackbench_64-4695 [001] dL.1. 328.224587: function: try_to_wake_up <-- autoremove_wake_function
hackbench_64-4695 [001] dL.1. 328.224588: function: preempt_count_add <-- try_to_wake_up
hackbench_64-4695 [001] dL.2. 328.224588: function: _raw_spin_lock_irqsave <-- try_to_wake_up
hackbench_64-4695 [001] dL.2. 328.224588: function: preempt_count_add <-- _raw_spin_lock_irqsave
hackbench_64-4695 [001] dL.3. 328.224588: function: do_raw_spin_lock <-- _raw_spin_lock_irqsave
hackbench_64-4695 [001] dL.3. 328.224589: function: select_task_rq_fair <-- try_to_wake_up
hackbench_64-4695 [001] dL.3. 328.224589: function: __rcu_read_lock <-- select_task_rq_fair
hackbench_64-4695 [001] dL.3. 328.224589: function: __rcu_read_unlock <-- select_task_rq_fair
hackbench_64-4695 [001] dL.3. 328.224590: function: ttwu_queue_wakelist <-- try_to_wake_up
hackbench_64-4695 [001] dL.3. 328.224590: function: __smp_call_single_queue <-- ttwu_queue_wakelist
                         ^
This goes on for a very long time (almost 5ms going by the timestamps),
until the next tick hits and sets a full NEED_RESCHED (the 'N' in the
flags field):
hackbench_64-4695 [001] dLh1. 328.229455: function: sched_tick <-- update_process_times
hackbench_64-4695 [001] dLh1. 328.229456: function: arch_scale_freq_tick <-- sched_tick
hackbench_64-4695 [001] dLh1. 328.229456: function: preempt_count_add <-- sched_tick
hackbench_64-4695 [001] dLh2. 328.229457: function: _raw_spin_lock_nested <-- sched_tick
hackbench_64-4695 [001] dLh2. 328.229457: function: preempt_count_add <-- _raw_spin_lock_nested
hackbench_64-4695 [001] dLh3. 328.229458: function: do_raw_spin_lock <-- sched_tick
hackbench_64-4695 [001] dLh3. 328.229470: function: preempt_count_sub <-- sched_tick
hackbench_64-4695 [001] dLh2. 328.229470: function: update_rq_clock <-- sched_tick
hackbench_64-4695 [001] dLh2. 328.229471: function: arch_scale_cpu_capacity <-- update_rq_clock
hackbench_64-4695 [001] dLh2. 328.229472: function: __resched_curr <-- sched_tick
hackbench_64-4695 [001] dNh2. 328.229472: function: task_tick_fair <-- sched_tick
hackbench_64-4695 [001] dNh2. 328.229473: function: update_curr <-- task_tick_fair
hackbench_64-4695 [001] dNh2. 328.229473: function: update_se <-- update_curr
hackbench_64-4695 [001] dNh2. 328.229474: function: cpuacct_charge <-- update_se
hackbench_64-4695 [001] dNh2. 328.229474: function: __cgroup_account_cputime <-- update_se
hackbench_64-4695 [001] dNh2. 328.229475: function: preempt_count_add <-- __cgroup_account_cputime
hackbench_64-4695 [001] dNh3. 328.229476: function: css_rstat_updated <-- __cgroup_account_cputime
hackbench_64-4695 [001] dNh3. 328.229476: function: preempt_count_sub <-- __cgroup_account_cputime
hackbench_64-4695 [001] dNh2. 328.229477: function: dl_server_update <-- update_curr
hackbench_64-4695 [001] dNh2. 328.229477: function: update_curr_dl_se <-- update_curr
hackbench_64-4695 [001] dNh2. 328.229478: function: resched_curr_lazy <-- update_curr
hackbench_64-4695 [001] dNh2. 328.229478: function: __resched_curr <-- update_curr
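For readers less used to ftrace's latency format, the hypothetical helper below (decode_trace_flags() and struct trace_flags are made-up names) splits the five-character flags field, e.g. "dLh1.", into its columns: irqs-off, need-resched, hardirq/softirq, preempt-depth and migrate-disable. It is a simplified sketch: it only understands the characters seen in these traces and reads 'L' and 'N' the way this message describes them; Documentation/trace/ftrace.rst has the authoritative table.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical, simplified decoder for the ftrace latency flags field.
 * Only the characters appearing in the traces above are interpreted.
 */
struct trace_flags {
	bool irqs_off;          /* column 1: 'd' means IRQs disabled */
	char need_resched;      /* column 2: '.', 'L' (lazy), 'N' (full) */
	bool in_hardirq;        /* column 3: 'h' means in hard interrupt */
	int  preempt_depth;     /* column 4: hex digit, '.' means 0 */
	bool migrate_disabled;  /* column 5: anything but '.' */
};

static bool decode_trace_flags(const char *field, struct trace_flags *out)
{
	assert(field != NULL && out != NULL);
	if (strlen(field) != 5)
		return false;

	out->irqs_off = (field[0] == 'd');
	out->need_resched = field[1];
	out->in_hardirq = (field[2] == 'h');

	if (field[3] == '.')
		out->preempt_depth = 0;
	else if (isdigit((unsigned char)field[3]))
		out->preempt_depth = field[3] - '0';
	else if (isxdigit((unsigned char)field[3]))
		out->preempt_depth = tolower((unsigned char)field[3]) - 'a' + 10;
	else
		return false;

	out->migrate_disabled = (field[4] != '.');
	return true;
}

/* True once a full (non-lazy) reschedule, shown as 'N', is pending. */
static bool full_resched_pending(const struct trace_flags *f)
{
	return f->need_resched == 'N';
}
```

Decoding "dLh1." yields IRQs off, a lazy resched request, hardirq context and preempt depth 1; only after __resched_curr() in the tick does the field become "dNh2.", which full_resched_pending() reports as a full reschedule.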
So this task finally gets scheduled out while holding the RCU read lock:
static void unix_write_space(struct sock *sk)
{
	struct socket_wq *wq;

	rcu_read_lock();	/* held across the whole wake-up walk below */
	if (unix_writable(sk, READ_ONCE(sk->sk_state))) {
		wq = rcu_dereference(sk->sk_wq);
		if (skwq_has_sleeper(wq))
			wake_up_interruptible_sync_poll(&wq->wait,
				EPOLLOUT | EPOLLWRNORM | EPOLLWRBAND);
		sk_wake_async_rcu(sk, SOCK_WAKE_SPACE, POLL_OUT);
	}
	rcu_read_unlock();
}
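wake_up_interruptible_sync_poll() reaches __wake_up_common() through __wake_up_sync_key(), as the first trace shows, with the wait-queue lock taken by _raw_spin_lock_irqsave(). The sketch below is a hypothetical userspace model of that walk (struct model_waiter, model_wake_fn() and model_wake_up_common() are made-up names), assuming the sync wake-up passes nr_exclusive = 1. Only waiters flagged exclusive count toward nr_exclusive, so when the queue is full of non-exclusive waiters every one of them is woken, and the time spent in the walk grows with the number of waiters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of the wait-queue walk in __wake_up_common().
 * Each entry's callback is invoked in order; the walk stops early only
 * after nr_exclusive exclusive entries have been woken.
 */
struct model_waiter {
	bool exclusive;  /* models WQ_FLAG_EXCLUSIVE */
	bool woken;
};

/* Stand-in for autoremove_wake_function(): always reports a wake-up. */
static int model_wake_fn(struct model_waiter *w)
{
	w->woken = true;
	return 1;
}

/* Returns how many wake callbacks were invoked during the walk. */
static size_t model_wake_up_common(struct model_waiter *q, size_t n,
				   int nr_exclusive)
{
	size_t calls = 0;

	assert(q != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		/* Read the flag before the callback, which may dequeue the entry. */
		bool excl = q[i].exclusive;
		int ret = model_wake_fn(&q[i]);

		calls++;
		if (ret < 0)
			break;
		if (ret && excl && !--nr_exclusive)
			break;
	}
	return calls;
}
```

With 1000 non-exclusive waiters and nr_exclusive = 1, model_wake_up_common() invokes the callback 1000 times; if every waiter were exclusive, it would stop after the first one.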
And because the system is very busy, the task doesn't get scheduled back
in for a long time, and RCU complains about a stall:
[ 357.679707] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 357.810980] rcu: Tasks blocked on level-0 rcu_node (CPUs 0-3): P4695/1:b..l
[ 357.813601] rcu: (detected by 2, t=6535 jiffies, g=4125, q=100 ncpus=4)
[ 357.816256] task:hackbench_64 state:R running task stack:0 pid:4695 tgid:4695 ppid:4690 task_flags:0x400000 flags:0x00080000
[ 357.821176] Call Trace:
[ 357.822382] <TASK>
[ 357.823474] __schedule+0x4ac/0x12f0
[ 357.825104] preempt_schedule_common+0x26/0xe0
[ 357.826988] ? preempt_schedule_thunk+0x16/0x30
[ 357.828930] preempt_schedule_thunk+0x16/0x30
[ 357.830788] ? _raw_spin_unlock_irqrestore+0x39/0x70
[ 357.832901] _raw_spin_unlock_irqrestore+0x5d/0x70
[ 357.834884] sock_def_readable+0x9c/0x2b0
[ 357.836633] unix_stream_sendmsg+0x2d7/0x710
[ 357.838423] sock_write_iter+0x185/0x190
[ 357.840084] vfs_write+0x457/0x5b0
[ 357.841568] ksys_write+0xc8/0xf0
[ 357.842914] do_syscall_64+0x117/0x1660
[ 357.844413] ? irqentry_exit+0xd9/0x690
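Now I guess the question is, how do we fix this?
Before PREEMPT_LAZY (that is, with PREEMPT_NONE or PREEMPT_VOLUNTARY), a
task inside rcu_read_lock() would never schedule out. Now, long loops
inside an RCU read-side critical section can be preempted, and if the
task then waits a long time to run again, that gets flagged as an RCU
stall. Should we not schedule out when rcu_read_lock() is held?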
-- Steve