Date: Sat, 18 Apr 2026 18:36:57 -0400
From: Steven Rostedt <rostedt@goodmis.org>
To: "Paul E. McKenney"
Cc: LKML, Frederic Weisbecker, Joel Fernandes, Eric Dumazet,
 Kuniyuki Iwashima, Paolo Abeni, Willem de Bruijn, Yao Kai,
 Peter Zijlstra, Thomas Gleixner
Subject: Re: [WARNING] RCU stall in sock_def_readable()
Message-ID: <20260418183657.19c00883@fedora>
In-Reply-To: <61264010-608d-46dd-884c-2448b65a57ed@paulmck-laptop>
References: <20260415132722.788bbdcf@fedora>
 <20260417084313.010864e8@fedora>
 <20260417093025.38faf68d@fedora>
 <61264010-608d-46dd-884c-2448b65a57ed@paulmck-laptop>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 17 Apr 2026 12:03:22 -0700
"Paul E. McKenney" wrote:

> Huh. One other possibility is that CONFIG_PREEMPT_NONE=y and its friend
> CONFIG_PREEMPT_VOLUNTARY=y responded to RCU calls for help at any call
> to cond_resched() or (in the voluntary case) might_sleep(). Isn't it
> the case that with CONFIG_PREEMPT_LAZY=y, nothing happens until some
> other task shows up on that CPU? Maybe RCU needs to be more aggressive
> about invoking resched_cpu() when the grace period starts getting a bit
> long in the tooth?

Perhaps. Note, I was still able to trigger the issue with PREEMPT_FULL.

But anyway, I think I may have found the culprit. It looks to be a
__wake_up_common() that is waking up too many tasks:

 hackbench_64-4695 [001] .....  328.224508: function: consume_skb <-- unix_stream_read_generic
 hackbench_64-4695 [001] .....  328.224508: function: skb_release_head_state <-- consume_skb
 hackbench_64-4695 [001] .....  328.224508: function: unix_destruct_scm <-- skb_release_head_state
 hackbench_64-4695 [001] .....  328.224508: function: put_pid <-- unix_destruct_scm
 hackbench_64-4695 [001] .....  328.224508: function: sock_wfree <-- unix_destruct_scm
 hackbench_64-4695 [001] .....  328.224509: function: unix_write_space <-- sock_wfree
 hackbench_64-4695 [001] .....  328.224509: function: __rcu_read_lock <-- unix_write_space
 hackbench_64-4695 [001] .....  328.224509: function: __wake_up_sync_key <-- unix_write_space
 hackbench_64-4695 [001] .....  328.224510: function: _raw_spin_lock_irqsave <-- __wake_up_sync_key
 hackbench_64-4695 [001] d....  328.224510: function: preempt_count_add <-- _raw_spin_lock_irqsave
 hackbench_64-4695 [001] d..1.  328.224510: function: do_raw_spin_lock <-- _raw_spin_lock_irqsave
 hackbench_64-4695 [001] d..1.  328.224511: function: __wake_up_common <-- __wake_up_sync_key
 hackbench_64-4695 [001] d..1.  328.224511: function: autoremove_wake_function <-- __wake_up_common
 hackbench_64-4695 [001] d..1.  328.224511: function: default_wake_function <-- autoremove_wake_function
 hackbench_64-4695 [001] d..1.  328.224512: function: try_to_wake_up <-- autoremove_wake_function

The above __wake_up_common() goes into a very long loop, so much so that
LAZY_NEED_RESCHED is triggered (look at the 'L' in the flags field of
the trace):

 hackbench_64-4695 [001] d..4.  328.224582: function: vruntime_eligible <-- __pick_eevdf
 hackbench_64-4695 [001] d..4.  328.224582: function: vruntime_eligible <-- __pick_eevdf
 hackbench_64-4695 [001] d..4.  328.224584: function: resched_curr_lazy <-- wakeup_preempt
 hackbench_64-4695 [001] d..4.  328.224584: function: __resched_curr <-- wakeup_preempt
 hackbench_64-4695 [001] dL.4.  328.224584: function: __srcu_check_read_flavor <-- ttwu_do_activate
 hackbench_64-4695 [001] dL.4.  328.224585: function: __srcu_check_read_flavor <-- ttwu_do_activate
 hackbench_64-4695 [001] dL.4.  328.224585: function: _raw_spin_unlock <-- try_to_wake_up
 hackbench_64-4695 [001] dL.4.  328.224585: function: do_raw_spin_unlock <-- _raw_spin_unlock
 hackbench_64-4695 [001] dL.4.  328.224586: function: preempt_count_sub <-- _raw_spin_unlock
 hackbench_64-4695 [001] dL.3.  328.224586: function: _raw_spin_unlock_irqrestore <-- try_to_wake_up
 hackbench_64-4695 [001] dL.3.  328.224586: function: do_raw_spin_unlock <-- _raw_spin_unlock_irqrestore
 hackbench_64-4695 [001] dL.3.  328.224586: function: preempt_count_sub <-- _raw_spin_unlock_irqrestore
 hackbench_64-4695 [001] dL.2.  328.224586: function: preempt_count_sub <-- try_to_wake_up
 hackbench_64-4695 [001] dL.1.  328.224587: function: autoremove_wake_function <-- __wake_up_common
 hackbench_64-4695 [001] dL.1.  328.224587: function: default_wake_function <-- autoremove_wake_function
 hackbench_64-4695 [001] dL.1.  328.224587: function: try_to_wake_up <-- autoremove_wake_function
 hackbench_64-4695 [001] dL.1.  328.224588: function: preempt_count_add <-- try_to_wake_up
 hackbench_64-4695 [001] dL.2.  328.224588: function: _raw_spin_lock_irqsave <-- try_to_wake_up
 hackbench_64-4695 [001] dL.2.  328.224588: function: preempt_count_add <-- _raw_spin_lock_irqsave
 hackbench_64-4695 [001] dL.3.  328.224588: function: do_raw_spin_lock <-- _raw_spin_lock_irqsave
 hackbench_64-4695 [001] dL.3.  328.224589: function: select_task_rq_fair <-- try_to_wake_up
 hackbench_64-4695 [001] dL.3.  328.224589: function: __rcu_read_lock <-- select_task_rq_fair
 hackbench_64-4695 [001] dL.3.  328.224589: function: __rcu_read_unlock <-- select_task_rq_fair
 hackbench_64-4695 [001] dL.3.  328.224590: function: ttwu_queue_wakelist <-- try_to_wake_up
 hackbench_64-4695 [001] dL.3.  328.224590: function: __smp_call_single_queue <-- ttwu_queue_wakelist

The above goes on for a very long time, until the next tick is hit,
which causes a full NEED_RESCHED to happen:

 hackbench_64-4695 [001] dLh1.  328.229455: function: sched_tick <-- update_process_times
 hackbench_64-4695 [001] dLh1.  328.229456: function: arch_scale_freq_tick <-- sched_tick
 hackbench_64-4695 [001] dLh1.  328.229456: function: preempt_count_add <-- sched_tick
 hackbench_64-4695 [001] dLh2.  328.229457: function: _raw_spin_lock_nested <-- sched_tick
 hackbench_64-4695 [001] dLh2.  328.229457: function: preempt_count_add <-- _raw_spin_lock_nested
 hackbench_64-4695 [001] dLh3.  328.229458: function: do_raw_spin_lock <-- sched_tick
 hackbench_64-4695 [001] dLh3.  328.229470: function: preempt_count_sub <-- sched_tick
 hackbench_64-4695 [001] dLh2.  328.229470: function: update_rq_clock <-- sched_tick
 hackbench_64-4695 [001] dLh2.  328.229471: function: arch_scale_cpu_capacity <-- update_rq_clock
 hackbench_64-4695 [001] dLh2.  328.229472: function: __resched_curr <-- sched_tick
 hackbench_64-4695 [001] dNh2.  328.229472: function: task_tick_fair <-- sched_tick
 hackbench_64-4695 [001] dNh2.  328.229473: function: update_curr <-- task_tick_fair
 hackbench_64-4695 [001] dNh2.  328.229473: function: update_se <-- update_curr
 hackbench_64-4695 [001] dNh2.  328.229474: function: cpuacct_charge <-- update_se
 hackbench_64-4695 [001] dNh2.  328.229474: function: __cgroup_account_cputime <-- update_se
 hackbench_64-4695 [001] dNh2.  328.229475: function: preempt_count_add <-- __cgroup_account_cputime
 hackbench_64-4695 [001] dNh3.  328.229476: function: css_rstat_updated <-- __cgroup_account_cputime
 hackbench_64-4695 [001] dNh3.  328.229476: function: preempt_count_sub <-- __cgroup_account_cputime
 hackbench_64-4695 [001] dNh2.  328.229477: function: dl_server_update <-- update_curr
 hackbench_64-4695 [001] dNh2.  328.229477: function: update_curr_dl_se <-- update_curr
 hackbench_64-4695 [001] dNh2.  328.229478: function: resched_curr_lazy <-- update_curr
 hackbench_64-4695 [001] dNh2.  328.229478: function: __resched_curr <-- update_curr

So this task finally gets scheduled out (while holding the RCU read lock):

static void unix_write_space(struct sock *sk)
{
	struct socket_wq *wq;

	rcu_read_lock();
	if (unix_writable(sk, READ_ONCE(sk->sk_state))) {
		wq = rcu_dereference(sk->sk_wq);
		if (skwq_has_sleeper(wq))
			wake_up_interruptible_sync_poll(&wq->wait,
				EPOLLOUT | EPOLLWRNORM | EPOLLWRBAND);
		sk_wake_async_rcu(sk, SOCK_WAKE_SPACE, POLL_OUT);
	}
	rcu_read_unlock();
}

And because the system is very busy, it doesn't get scheduled back in for
a long time, and RCU complains about a stall:

[  357.679707] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[  357.810980] rcu: 	Tasks blocked on level-0 rcu_node (CPUs 0-3): P4695/1:b..l
[  357.813601] rcu: 	(detected by 2, t=6535 jiffies, g=4125, q=100 ncpus=4)
[  357.816256] task:hackbench_64 state:R running task stack:0 pid:4695 tgid:4695 ppid:4690 task_flags:0x400000 flags:0x00080000
[  357.821176] Call Trace:
[  357.822382]
[  357.823474]  __schedule+0x4ac/0x12f0
[  357.825104]  preempt_schedule_common+0x26/0xe0
[  357.826988]  ? preempt_schedule_thunk+0x16/0x30
[  357.828930]  preempt_schedule_thunk+0x16/0x30
[  357.830788]  ? _raw_spin_unlock_irqrestore+0x39/0x70
[  357.832901]  _raw_spin_unlock_irqrestore+0x5d/0x70
[  357.834884]  sock_def_readable+0x9c/0x2b0
[  357.836633]  unix_stream_sendmsg+0x2d7/0x710
[  357.838423]  sock_write_iter+0x185/0x190
[  357.840084]  vfs_write+0x457/0x5b0
[  357.841568]  ksys_write+0xc8/0xf0
[  357.842914]  do_syscall_64+0x117/0x1660
[  357.844413]  ? irqentry_exit+0xd9/0x690

Now I guess the question is, how do we fix this? Before PREEMPT_LAZY, an
rcu_read_lock() would never schedule out. Now, long loops under RCU can
schedule out, which can be flagged as an RCU stall. Should we not
schedule out when rcu_read_lock() is held?

-- Steve