Re: [WARNING] RCU stall in sock_def_readable()

From: Steven Rostedt <rostedt@goodmis.org>
To: "Paul E. McKenney" <paulmck@kernel.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Joel Fernandes <joel@joelfernandes.org>,
	Eric Dumazet <edumazet@google.com>,
	Kuniyuki Iwashima <kuniyu@google.com>,
	Paolo Abeni <pabeni@redhat.com>,
	Willem de Bruijn <willemb@google.com>,
	Yao Kai <yaokai34@huawei.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [WARNING] RCU stall in sock_def_readable()
Date: Fri, 17 Apr 2026 09:30:25 -0400
Message-ID: <20260417093025.38faf68d@fedora>
In-Reply-To: <20260417084313.010864e8@fedora>

On Fri, 17 Apr 2026 08:43:13 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Thu, 16 Apr 2026 17:16:11 -0700
> "Paul E. McKenney" <paulmck@kernel.org> wrote:
> 
> > One "hail Mary" thought is to revert this guy and see if it helps:
> > 
> > d41e37f26b31 ("rcu: Fix rcu_read_unlock() deadloop due to softirq")
> > 
> > This commit fixes a bug, so we cannot revert it in mainline, but there
> > is some reason to believe that there are more bugs beyond the one that
> > it fixed, and it might have (through no fault of its own) made those
> > other bugs more probable.
> > 
> > Worth a try, anyway!  
> 
> Hail mary's are worth a try, but the reason they call it a hail mary is
> because it is unlikely to succeed :-p
> 
> run test ssh -t root@tracetest "trace-cmd record -p function -e syscalls /work/c/hackbench_64 50"
> ssh -t root@tracetest "trace-cmd record -p function -e syscalls /work/c/hackbench_64 50" ...
> [  209.590500] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> [  209.592620] rcu:     Tasks blocked on level-0 rcu_node (CPUs 0-3): P3151/1:b..l
> [  209.595266] rcu:     (detected by 0, t=6502 jiffies, g=29673, q=186 ncpus=4)
> [  209.597557] task:hackbench_64    state:R  running task     stack:0     pid:3151  tgid:3151  ppid:3144   task_flags:0x400000 flags:0x00080000
> [  209.601871] Call Trace:
> [  209.602852]  <TASK>
> [  209.603752]  __schedule+0x4ac/0x12f0
> [  209.605172]  preempt_schedule_common+0x26/0xe0
> [  209.606755]  ? preempt_schedule_thunk+0x16/0x30
> [  209.608337]  preempt_schedule_thunk+0x16/0x30
> [  209.609973]  ? _raw_spin_unlock_irqrestore+0x39/0x70
> [  209.611688]  _raw_spin_unlock_irqrestore+0x5d/0x70
> [  209.613408]  sock_def_readable+0x9c/0x2b0
> [  209.614841]  unix_stream_sendmsg+0x2d7/0x710
> [  209.616420]  sock_write_iter+0x185/0x190
> [  209.617934]  vfs_write+0x457/0x5b0
> [  209.619242]  ksys_write+0xc8/0xf0
> [  209.620532]  do_syscall_64+0x117/0x1660
> [  209.621936]  ? irqentry_exit+0xd9/0x690
> [  209.623319]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [  209.625199] RIP: 0033:0x7f603e8e5190
> [  209.626628] RSP: 002b:00007ffd003f99c8 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
> [  209.629304] RAX: ffffffffffffffda RBX: 00007ffd003f9b58 RCX: 00007f603e8e5190
> [  209.631710] RDX: 0000000000000001 RSI: 00007ffd003f99ef RDI: 0000000000000006
> [  209.634200] RBP: 00007ffd003f9a40 R08: 0011861580000000 R09: 0000000000000000
> [  209.636638] R10: 00007f603e8064d0 R11: 0000000000000202 R12: 0000000000000000
> [  209.639050] R13: 00007ffd003f9b70 R14: 00005637df126dd8 R15: 00007f603ea10020
> [  209.641600]  </TASK>
> Detected kernel crash!
> 
> 
> That was with the revert :-(

I went and looked at the configs that it used, to see if anything had
changed. One thing that stands out is that it used to use
CONFIG_PREEMPT_VOLUNTARY. Now it's using CONFIG_PREEMPT_LAZY. I'm
thinking that because preemption now doesn't happen until tasks go back
to user space (and kernel threads do not preempt at all), this could
have delayed the RCU threads much longer.

I'm not sure why the stack trace is always the same. Maybe that's where
hackbench causes the biggest delay? I'm going to switch it over
to PREEMPT_FULL and see if that makes the warning go away. Oh, and when
I logged into this box, I noticed that it triggered an OOM due to
memory not being freed up fast enough.

All that said, my config is full of debugging options that have a
high overhead, which makes this issue much more pronounced. It may not
even be something to worry about. If switching to PREEMPT_FULL fixes it,
then that may be all I do.

Configs that cause overhead:

  PROVE_LOCKING
  FTRACE_RECORD_RECURSION - keeps track of function trace recursion
  RING_BUFFER_VALIDATE_TIME_DELTAS - this causes a big overhead with tracing:
                            it tests the timestamps of every event.
                            This requires walking the sub-buffer page
                            and adding the time deltas of each event to
                            make sure it adds up to the current event.
                            That's an O(n^2) operation on the number of
                            events in the sub-buffer.
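
To illustrate why the check described above is quadratic, here is a
minimal userspace sketch. It is not the kernel's ring buffer code: the
names (struct toy_subbuf, toy_validate(), toy_write_event(),
toy_fill_and_count()) and the 64-event capacity are hypothetical, made
up for illustration, and the real sub-buffer layout differs. It only
models the idea of re-adding every delta on the page each time an event
is written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capacity, chosen only for this sketch. */
#define TOY_SUBBUF_MAX_EVENTS 64

/*
 * Toy stand-in for one sub-buffer page: a base timestamp plus events
 * that each store only the delta from the previous event.
 */
struct toy_subbuf {
	uint64_t base_ts;
	uint64_t last_ts;                      /* timestamp of newest event */
	uint64_t delta[TOY_SUBBUF_MAX_EVENTS];
	size_t   nr_events;
	uint64_t steps;                        /* delta additions performed */
};

static void toy_init(struct toy_subbuf *sb, uint64_t base_ts)
{
	sb->base_ts = base_ts;
	sb->last_ts = base_ts;
	sb->nr_events = 0;
	sb->steps = 0;
}

/* Walk the whole page and re-add every delta: O(nr_events) per call. */
static int toy_validate(struct toy_subbuf *sb, uint64_t expected_ts)
{
	uint64_t ts = sb->base_ts;
	size_t i;

	for (i = 0; i < sb->nr_events; i++) {
		ts += sb->delta[i];
		sb->steps++;
	}
	return ts == expected_ts;
}

/* Append one event, then validate it. Returns 1 if the sums agree. */
static int toy_write_event(struct toy_subbuf *sb, uint64_t now)
{
	assert(sb->nr_events < TOY_SUBBUF_MAX_EVENTS);
	assert(now >= sb->last_ts);

	sb->delta[sb->nr_events++] = now - sb->last_ts;
	sb->last_ts = now;
	return toy_validate(sb, now);
}

/* Write n events 10 time units apart; return the total additions done. */
static uint64_t toy_fill_and_count(size_t n)
{
	struct toy_subbuf sb;
	size_t i;

	toy_init(&sb, 1000);
	for (i = 0; i < n; i++) {
		if (!toy_write_event(&sb, 1000 + 10 * (uint64_t)(i + 1)))
			return 0;
	}
	return sb.steps;
}
```

In this model, writing n events costs n(n+1)/2 additions, so
toy_fill_and_count(10) returns 55 and doubling to 20 events returns
210. That growth is the O(n^2) behavior mentioned above.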

With the above overhead I do consider this one of those "Doctor, it
hurts me when I do this." Doctor: "Then don't do that." moments. But this
test has been running for years with no issues except for catching
cases where the timestamp did get out of sync. Hence, I don't want to
stop testing this. But if I can find the culprit, I can modify the test
to avoid failing due to it.

-- Steve

Thread overview: 9+ messages
2026-04-15 17:27 [WARNING] RCU stall in sock_def_readable() Steven Rostedt
2026-04-17  0:16 ` Paul E. McKenney
2026-04-17 12:43   ` Steven Rostedt
2026-04-17 13:30     ` Steven Rostedt [this message]
2026-04-17 19:03       ` Paul E. McKenney
2026-04-18 22:36         ` Steven Rostedt
2026-04-18 23:01           ` Paul E. McKenney
2026-04-18 23:26             ` Steven Rostedt
2026-04-19  0:09               ` Steven Rostedt
