All of lore.kernel.org
 help / color / mirror / Atom feed
From: Steven Rostedt <rostedt@goodmis.org>
To: "Paul E. McKenney" <paulmck@kernel.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Joel Fernandes <joel@joelfernandes.org>,
	Eric Dumazet <edumazet@google.com>,
	Kuniyuki Iwashima <kuniyu@google.com>,
	Paolo Abeni <pabeni@redhat.com>,
	Willem de Bruijn <willemb@google.com>,
	Yao Kai <yaokai34@huawei.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [WARNING] RCU stall in sock_def_readable()
Date: Sat, 18 Apr 2026 18:36:57 -0400	[thread overview]
Message-ID: <20260418183657.19c00883@fedora> (raw)
In-Reply-To: <61264010-608d-46dd-884c-2448b65a57ed@paulmck-laptop>

On Fri, 17 Apr 2026 12:03:22 -0700
"Paul E. McKenney" <paulmck@kernel.org> wrote:

> Huh.  One other possibility is that CONFIG_PREEMPT_NONE=y and its friend
> CONFIG_PREEMPT_VOLUNTARY=y responded to RCU calls for help at any call
> to cond_resched() or (in the voluntary case) might_sleep().  Isn't it
> the case that with CONFIG_PREEMPT_LAZY=y, nothing happens until some
> other task shows up on that CPU?  Maybe RCU needs to be more aggressive
> about invoking resched_cpu() when the grace period starts getting a bit
> long in the tooth?

Perhaps.

Note, I was still able to trigger the issue with PREEMPT_FULL. But
anyway, I think I may have found the culprit. It looks to be a
__wake_up_common that is waking up too many tasks:

    hackbench_64-4695  [001] .....   328.224508: function:                               consume_skb <-- unix_stream_read_generic
    hackbench_64-4695  [001] .....   328.224508: function:                                  skb_release_head_state <-- consume_skb
    hackbench_64-4695  [001] .....   328.224508: function:                                     unix_destruct_scm <-- skb_release_head_state
    hackbench_64-4695  [001] .....   328.224508: function:                                        put_pid <-- unix_destruct_scm
    hackbench_64-4695  [001] .....   328.224508: function:                                        sock_wfree <-- unix_destruct_scm
    hackbench_64-4695  [001] .....   328.224509: function:                                           unix_write_space <-- sock_wfree
    hackbench_64-4695  [001] .....   328.224509: function:                                              __rcu_read_lock <-- unix_write_space
    hackbench_64-4695  [001] .....   328.224509: function:                                              __wake_up_sync_key <-- unix_write_space
    hackbench_64-4695  [001] .....   328.224510: function:                                                 _raw_spin_lock_irqsave <-- __wake_up_sync_key
    hackbench_64-4695  [001] d....   328.224510: function:                                                    preempt_count_add <-- _raw_spin_lock_irqsave
    hackbench_64-4695  [001] d..1.   328.224510: function:                                                    do_raw_spin_lock <-- _raw_spin_lock_irqsave
    hackbench_64-4695  [001] d..1.   328.224511: function:                                                 __wake_up_common <-- __wake_up_sync_key
    hackbench_64-4695  [001] d..1.   328.224511: function:                                                    autoremove_wake_function <-- __wake_up_common
    hackbench_64-4695  [001] d..1.   328.224511: function:                                                       default_wake_function <-- autoremove_wake_function
    hackbench_64-4695  [001] d..1.   328.224512: function:                                                       try_to_wake_up <-- autoremove_wake_function

The above __wake_up_common() goes into a very long loop, so much so
that the LAZY_NEED_RESCHED is triggered (look at the 'L' in the flags
field of the trace:

    hackbench_64-4695  [001] d..4.   328.224582: function:                                                                      vruntime_eligible <-- __pick_eevdf
    hackbench_64-4695  [001] d..4.   328.224582: function:                                                                      vruntime_eligible <-- __pick_eevdf
    hackbench_64-4695  [001] d..4.   328.224584: function:                                                                resched_curr_lazy <-- wakeup_preempt
    hackbench_64-4695  [001] d..4.   328.224584: function:                                                                __resched_curr <-- wakeup_preempt
    hackbench_64-4695  [001] dL.4.   328.224584: function:                                                             __srcu_check_read_flavor <-- ttwu_do_activate
    hackbench_64-4695  [001] dL.4.   328.224585: function:                                                             __srcu_check_read_flavor <-- ttwu_do_activate
    hackbench_64-4695  [001] dL.4.   328.224585: function:                                                          _raw_spin_unlock <-- try_to_wake_up
    hackbench_64-4695  [001] dL.4.   328.224585: function:                                                             do_raw_spin_unlock <-- _raw_spin_unlock
    hackbench_64-4695  [001] dL.4.   328.224586: function:                                                             preempt_count_sub <-- _raw_spin_unlock
    hackbench_64-4695  [001] dL.3.   328.224586: function:                                                          _raw_spin_unlock_irqrestore <-- try_to_wake_up
    hackbench_64-4695  [001] dL.3.   328.224586: function:                                                             do_raw_spin_unlock <-- _raw_spin_unlock_irqrestore
    hackbench_64-4695  [001] dL.3.   328.224586: function:                                                             preempt_count_sub <-- _raw_spin_unlock_irqrestore
    hackbench_64-4695  [001] dL.2.   328.224586: function:                                                          preempt_count_sub <-- try_to_wake_up
    hackbench_64-4695  [001] dL.1.   328.224587: function:                                                    autoremove_wake_function <-- __wake_up_common
    hackbench_64-4695  [001] dL.1.   328.224587: function:                                                       default_wake_function <-- autoremove_wake_function
    hackbench_64-4695  [001] dL.1.   328.224587: function:                                                       try_to_wake_up <-- autoremove_wake_function
    hackbench_64-4695  [001] dL.1.   328.224588: function:                                                          preempt_count_add <-- try_to_wake_up
    hackbench_64-4695  [001] dL.2.   328.224588: function:                                                          _raw_spin_lock_irqsave <-- try_to_wake_up
    hackbench_64-4695  [001] dL.2.   328.224588: function:                                                             preempt_count_add <-- _raw_spin_lock_irqsave
    hackbench_64-4695  [001] dL.3.   328.224588: function:                                                             do_raw_spin_lock <-- _raw_spin_lock_irqsave
    hackbench_64-4695  [001] dL.3.   328.224589: function:                                                          select_task_rq_fair <-- try_to_wake_up
    hackbench_64-4695  [001] dL.3.   328.224589: function:                                                             __rcu_read_lock <-- select_task_rq_fair
    hackbench_64-4695  [001] dL.3.   328.224589: function:                                                             __rcu_read_unlock <-- select_task_rq_fair
    hackbench_64-4695  [001] dL.3.   328.224590: function:                                                          ttwu_queue_wakelist <-- try_to_wake_up
    hackbench_64-4695  [001] dL.3.   328.224590: function:                                                             __smp_call_single_queue <-- ttwu_queue_wakelist
                              ^
This goes on for a very long time, and the next tick is hit which causes
a full NEED_RESCHED to happen:

    hackbench_64-4695  [001] dLh1.   328.229455: function:                            sched_tick <-- update_process_times
    hackbench_64-4695  [001] dLh1.   328.229456: function:                               arch_scale_freq_tick <-- sched_tick
    hackbench_64-4695  [001] dLh1.   328.229456: function:                               preempt_count_add <-- sched_tick
    hackbench_64-4695  [001] dLh2.   328.229457: function:                               _raw_spin_lock_nested <-- sched_tick
    hackbench_64-4695  [001] dLh2.   328.229457: function:                                  preempt_count_add <-- _raw_spin_lock_nested
    hackbench_64-4695  [001] dLh3.   328.229458: function:                               do_raw_spin_lock <-- sched_tick
    hackbench_64-4695  [001] dLh3.   328.229470: function:                               preempt_count_sub <-- sched_tick
    hackbench_64-4695  [001] dLh2.   328.229470: function:                               update_rq_clock <-- sched_tick
    hackbench_64-4695  [001] dLh2.   328.229471: function:                                  arch_scale_cpu_capacity <-- update_rq_clock
    hackbench_64-4695  [001] dLh2.   328.229472: function:                               __resched_curr <-- sched_tick
    hackbench_64-4695  [001] dNh2.   328.229472: function:                               task_tick_fair <-- sched_tick
    hackbench_64-4695  [001] dNh2.   328.229473: function:                                  update_curr <-- task_tick_fair
    hackbench_64-4695  [001] dNh2.   328.229473: function:                                     update_se <-- update_curr
    hackbench_64-4695  [001] dNh2.   328.229474: function:                                        cpuacct_charge <-- update_se
    hackbench_64-4695  [001] dNh2.   328.229474: function:                                        __cgroup_account_cputime <-- update_se
    hackbench_64-4695  [001] dNh2.   328.229475: function:                                           preempt_count_add <-- __cgroup_account_cputime
    hackbench_64-4695  [001] dNh3.   328.229476: function:                                           css_rstat_updated <-- __cgroup_account_cputime
    hackbench_64-4695  [001] dNh3.   328.229476: function:                                           preempt_count_sub <-- __cgroup_account_cputime
    hackbench_64-4695  [001] dNh2.   328.229477: function:                                     dl_server_update <-- update_curr
    hackbench_64-4695  [001] dNh2.   328.229477: function:                                     update_curr_dl_se <-- update_curr
    hackbench_64-4695  [001] dNh2.   328.229478: function:                                     resched_curr_lazy <-- update_curr
    hackbench_64-4695  [001] dNh2.   328.229478: function:                                     __resched_curr <-- update_curr

So this task finally gets scheduled out (while holding RCU read lock):

static void unix_write_space(struct sock *sk)
{
        struct socket_wq *wq;

        rcu_read_lock();
        if (unix_writable(sk, READ_ONCE(sk->sk_state))) {
                wq = rcu_dereference(sk->sk_wq);
                if (skwq_has_sleeper(wq))
                        wake_up_interruptible_sync_poll(&wq->wait,
                                EPOLLOUT | EPOLLWRNORM | EPOLLWRBAND);
                sk_wake_async_rcu(sk, SOCK_WAKE_SPACE, POLL_OUT);
        }
        rcu_read_unlock();
}

And because the system is very busy, it doesn't get scheduled in for a
long time, and RCU complains about a stall:

[  357.679707] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[  357.810980] rcu:     Tasks blocked on level-0 rcu_node (CPUs 0-3): P4695/1:b..l
[  357.813601] rcu:     (detected by 2, t=6535 jiffies, g=4125, q=100 ncpus=4)
[  357.816256] task:hackbench_64    state:R  running task     stack:0     pid:4695  tgid:4695  ppid:4690   task_flags:0x400000 flags:0x00080000
[  357.821176] Call Trace:
[  357.822382]  <TASK>
[  357.823474]  __schedule+0x4ac/0x12f0
[  357.825104]  preempt_schedule_common+0x26/0xe0
[  357.826988]  ? preempt_schedule_thunk+0x16/0x30
[  357.828930]  preempt_schedule_thunk+0x16/0x30
[  357.830788]  ? _raw_spin_unlock_irqrestore+0x39/0x70
[  357.832901]  _raw_spin_unlock_irqrestore+0x5d/0x70
[  357.834884]  sock_def_readable+0x9c/0x2b0
[  357.836633]  unix_stream_sendmsg+0x2d7/0x710
[  357.838423]  sock_write_iter+0x185/0x190
[  357.840084]  vfs_write+0x457/0x5b0
[  357.841568]  ksys_write+0xc8/0xf0
[  357.842914]  do_syscall_64+0x117/0x1660
[  357.844413]  ? irqentry_exit+0xd9/0x690

Now I guess the question is, how do we fix this?

Before LAZY_PREEMPT, an rcu_read_lock() would never schedule out. Now,
long loops under RCU can schedule out, which can be flagged as an RCU
stall. Should we not schedule out when rcu_read_lock() is held?

-- Steve

  reply	other threads:[~2026-04-18 22:37 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-15 17:27 [WARNING] RCU stall in sock_def_readable() Steven Rostedt
2026-04-17  0:16 ` Paul E. McKenney
2026-04-17 12:43   ` Steven Rostedt
2026-04-17 13:30     ` Steven Rostedt
2026-04-17 19:03       ` Paul E. McKenney
2026-04-18 22:36         ` Steven Rostedt [this message]
2026-04-18 23:01           ` Paul E. McKenney
2026-04-18 23:26             ` Steven Rostedt
2026-04-19  0:09               ` Steven Rostedt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260418183657.19c00883@fedora \
    --to=rostedt@goodmis.org \
    --cc=edumazet@google.com \
    --cc=fweisbec@gmail.com \
    --cc=joel@joelfernandes.org \
    --cc=kuniyu@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=willemb@google.com \
    --cc=yaokai34@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.