From: Jiri Slaby <jirislaby@kernel.org>
To: Peter Zijlstra <peterz@infradead.org>, Thomas Gleixner <tglx@kernel.org>
Cc: "Matthieu Baerts" <matttbe@kernel.org>,
"Stefan Hajnoczi" <stefanha@redhat.com>,
"Stefano Garzarella" <sgarzare@redhat.com>,
kvm@vger.kernel.org, virtualization@lists.linux.dev,
Netdev <netdev@vger.kernel.org>,
rcu@vger.kernel.org, "MPTCP Linux" <mptcp@lists.linux.dev>,
"Linux Kernel" <linux-kernel@vger.kernel.org>,
"Shinichiro Kawasaki" <shinichiro.kawasaki@wdc.com>,
"Paul E. McKenney" <paulmck@kernel.org>,
"Dave Hansen" <dave.hansen@linux.intel.com>,
"luto@kernel.org" <luto@kernel.org>,
"Michal Koutný" <MKoutny@suse.com>,
"Waiman Long" <longman@redhat.com>
Subject: Re: Stalls when starting a VSOCK listening socket: soft lockups, RCU stalls, timeout
Date: Thu, 5 Mar 2026 12:53:31 +0100 [thread overview]
Message-ID: <a2b573b4-af61-4b84-a7d1-012ed6bb23c9@kernel.org> (raw)
In-Reply-To: <717310d8-6274-4b7f-8a19-561c45f5f565@kernel.org>
On 05. 03. 26, 8:00, Jiri Slaby wrote:
> On 02. 03. 26, 12:46, Peter Zijlstra wrote:
>> On Mon, Mar 02, 2026 at 06:28:38AM +0100, Jiri Slaby wrote:
>>
>>> The state of the lock:
>>>
>>> crash> struct rq.__lock -x ffff8d1a6fd35dc0
>>>   __lock = {
>>>     raw_lock = {
>>>       {
>>>         val = {
>>>           counter = 0x40003
>>>         },
>>>         {
>>>           locked = 0x3,
>>>           pending = 0x0
>>>         },
>>>         {
>>>           locked_pending = 0x3,
>>>           tail = 0x4
>>>         }
>>>       }
>>>     }
>>>   },
>>>
>>
>>
>> That had me remember the below patch that never quite made it. I've
>> rebased it to something more recent so it applies.
>>
>> If you stick that in, we might get a clue as to who is owning that lock.
>> Provided it all wants to reproduce well enough.
>
> Thanks, I applied it, but to date it has still not been accepted:
> https://build.opensuse.org/requests/1335893
OK, I have a first dump with the patch applied:
  __lock = {
    raw_lock = {
      {
        val = {
          counter = 0x2c0003
        },
        {
          locked = 0x3,
          pending = 0x0
        },
        {
          locked_pending = 0x3,
          tail = 0x2c
        }
      }
    }
  },
I am not sure whether this is of any help.
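For what it is worth, here is how I read those counter values by hand. This
assumes the standard qspinlock layout for NR_CPUS < 16K from
asm-generic/qspinlock_types.h (bits 0-7 locked byte, bits 8-15 pending byte,
bits 16-17 tail index, bits 18+ tail CPU encoded as cpu + 1); just a
throwaway user-space decoder, nothing authoritative:

/*
 * Decode a qspinlock 'val' from a crash dump. Assumes the layout used
 * when NR_CPUS < 16K: bits 0-7 locked, bits 8-15 pending, bits 16-17
 * tail index (MCS node per context), bits 18+ tail CPU + 1.
 */
#include <stdio.h>
#include <stdint.h>

static void decode_qspinlock(uint32_t val)
{
        unsigned int locked   = val & 0xff;
        unsigned int pending  = (val >> 8) & 0xff;
        unsigned int tail     = val >> 16;
        unsigned int tail_idx = tail & 0x3;
        int tail_cpu          = (int)(tail >> 2) - 1;

        printf("val=%#x locked=%#x pending=%#x tail_idx=%u tail_cpu=%d\n",
               val, locked, pending, tail_idx, tail_cpu);
}

int main(void)
{
        decode_qspinlock(0x40003);      /* earlier dump -> tail is CPU 0  */
        decode_qspinlock(0x2c0003);     /* this dump    -> tail is CPU 10 */
        return 0;
}

If that decoding is right, 0x2c0003 says the MCS queue tail is CPU 10, and
the locked byte being 0x3 instead of 0x1 would be _Q_SLOW_VAL from the
paravirt slow path (if I read kernel/locking/qspinlock_paravirt.h right;
this is a guest, so PV spinlocks are presumably in use). Not sure that
tells us anything new, though.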
BUT: I have another dump with LOCKDEP (but NOT the patch above). The
kernel is again spinning in mm_get_cid(), presumably waiting for a free
bit in the map as before [1]:
[ 162.660584] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
...
[ 162.661378] Sending NMI from CPU 3 to CPUs 1:
[ 162.661398] NMI backtrace for cpu 1
...
[ 162.661411] RIP: 0010:mm_get_cid+0x54/0xc0
PID 7680 is active on CPU 1:
PID: 7680 TASK: ffff8cc4038525c0 CPU: 1 COMMAND: "asm"
CPU3 is waiting for CPU1's rq_lock:
RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff8cc72fcb8500
...
#3 [ffffd2e9c0083da0] raw_spin_rq_lock_nested+0x20 at ffffffff9339e700
crash> struct rq.__lock -x ffff8cc72fcb8500
  __lock = {
    raw_lock = {
      {
        val = {
          counter = 0x100003
        },
        {
          locked = 0x3,
          pending = 0x0
        },
        {
          locked_pending = 0x3,
          tail = 0x10
        }
      }
    },
    magic = 0xdead4ead,
    owner_cpu = 0x1,
    owner = 0xffff8cc4038b8000,
    dep_map = {
      key = 0xffffffff96245970 <__key.7>,
      class_cache = {0xffffffff9644b488 <lock_classes+10600>, 0x0},
      name = 0xffffffff94ba3ab3 "&rq->__lock",
      wait_type_outer = 0x0,
      wait_type_inner = 0x2,
      lock_type = 0x0
    }
  },
owner_cpu is 1, owner is:
PID: 7508 TASK: ffff8cc4038b8000 CPU: 1 COMMAND: "compile"
But as you can see above, CPU1 is occupied with a different task:
crash> bt -sxc 1
PID: 7680 TASK: ffff8cc4038525c0 CPU: 1 COMMAND: "asm"
It is spinning in mm_get_cid(), as I wrote. See the objdump of mm_get_cid() below.
[1] https://bugzilla.suse.com/show_bug.cgi?id=1258936#c17
> ffffffff8139cd40 <mm_get_cid>:
> mm_get_cid():
> include/linux/cpumask.h:1020
> ffffffff8139cd40: 8b 05 9a d7 40 02 mov 0x240d79a(%rip),%eax # ffffffff837aa4e0 <nr_cpu_ids>
> kernel/sched/sched.h:3779
> ffffffff8139cd46: 55 push %rbp
> ffffffff8139cd47: 53 push %rbx
> include/linux/mm_types.h:1477
> ffffffff8139cd48: 48 8d 9f 80 0b 00 00 lea 0xb80(%rdi),%rbx
> kernel/sched/sched.h:3780 (discriminator 2)
> ffffffff8139cd4f: 8b b7 0c 01 00 00 mov 0x10c(%rdi),%esi
> include/linux/cpumask.h:1020
> ffffffff8139cd55: 83 c0 3f add $0x3f,%eax
> ffffffff8139cd58: c1 e8 03 shr $0x3,%eax
> kernel/sched/sched.h:3780 (discriminator 2)
> ffffffff8139cd5b: 48 89 f5 mov %rsi,%rbp
> include/linux/mm_types.h:1479 (discriminator 1)
> ffffffff8139cd5e: 25 f8 ff ff 1f and $0x1ffffff8,%eax
> include/linux/mm_types.h:1489 (discriminator 1)
> ffffffff8139cd63: 48 8d 3c 43 lea (%rbx,%rax,2),%rdi
> include/linux/find.h:393
> ffffffff8139cd67: e8 44 d8 6e 00 call ffffffff81a8a5b0 <_find_first_zero_bit>
> kernel/sched/sched.h:3771
> ffffffff8139cd6c: 39 e8 cmp %ebp,%eax
> ffffffff8139cd6e: 73 7c jae ffffffff8139cdec <mm_get_cid+0xac>
> ffffffff8139cd70: 89 c1 mov %eax,%ecx
> kernel/sched/sched.h:3773 (discriminator 1)
> ffffffff8139cd72: 89 c2 mov %eax,%edx
> include/linux/cpumask.h:1020
> ffffffff8139cd74: 8b 05 66 d7 40 02 mov 0x240d766(%rip),%eax # ffffffff837aa4e0 <nr_cpu_ids>
> ffffffff8139cd7a: 83 c0 3f add $0x3f,%eax
> ffffffff8139cd7d: c1 e8 03 shr $0x3,%eax
> include/linux/mm_types.h:1479 (discriminator 1)
> ffffffff8139cd80: 25 f8 ff ff 1f and $0x1ffffff8,%eax
> include/linux/mm_types.h:1489 (discriminator 1)
> ffffffff8139cd85: 48 8d 04 43 lea (%rbx,%rax,2),%rax
> arch/x86/include/asm/bitops.h:136
> ffffffff8139cd89: f0 48 0f ab 10 lock bts %rdx,(%rax)
> kernel/sched/sched.h:3773 (discriminator 2)
> ffffffff8139cd8e: 73 4b jae ffffffff8139cddb <mm_get_cid+0x9b>
> ffffffff8139cd90: eb 5a jmp ffffffff8139cdec <mm_get_cid+0xac>
> arch/x86/include/asm/vdso/processor.h:13
> ffffffff8139cd92: f3 90 pause
> include/linux/cpumask.h:1020
> ffffffff8139cd94: 8b 05 46 d7 40 02 mov 0x240d746(%rip),%eax # ffffffff837aa4e0 <nr_cpu_ids>
The CPU1 was caught by the NMI here ^^^^^^^^^^^^^^^^^^^^.
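To make sure we are talking about the same loop, this is roughly how I read
the disassembly above; a simplified C sketch, not the literal 6.19 source
(the code past +0x9b/+0xac is not in the snippet, so I am guessing it just
loops back to the pause, and mm_cidmask()/the max-CID argument are
approximations based on current mm_types.h):

#include <linux/bitops.h>
#include <linux/find.h>
#include <linux/mm_types.h>
#include <linux/processor.h>

/* Simplified sketch of the loop the NMI interrupted; not the real source. */
static int mm_get_cid_sketch(struct mm_struct *mm, unsigned int max_cids)
{
        /* the 'lea 0xb80(%rdi)' above: per-mm CID bitmap */
        unsigned long *cidmask = mm_cidmask(mm);
        unsigned int cid;

        for (;;) {
                /* the call to _find_first_zero_bit at +0x27 */
                cid = find_first_zero_bit(cidmask, max_cids);
                if (cid < max_cids) {
                        /* the 'lock bts' at +0x49: claim the bit, retry if we raced */
                        if (!test_and_set_bit(cid, cidmask))
                                return cid;
                }
                /* no free CID (or lost the race): spin until one is dropped */
                cpu_relax();    /* the 'pause' at +0x52; NMI hit at +0x54 */
        }
}

So with all CIDs taken (Michal's 0xf/0xf words below), a task that enters
this with the rq lock held just spins there until someone else can run and
drop a CID, which matches the livelock wording in 47ee94efccf6 quoted below.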
> In the meantime, Michal K. and I did some digging into the qemu dumps.
> Details at (and a couple previous comments):
> https://bugzilla.suse.com/show_bug.cgi?id=1258936#c17
>
> tl;dr:
>
> In one of the dumps, one process sits in
> context_switch
> -> mm_get_cid (before switch_to())
>
> > 65 kworker/1:1 SP= 0xffffcf82c022fd98 -> __schedule+0x16ee (ffffffff820f162e) -> call mm_get_cid
>
> Michal extracted the vCPU's RIP and it turned out:
> > Hm, I'd say the CPU could be spinning in mm_get_cid() waiting for a
> > free CID.
> > ...
> > ffff8a88458137c0: 000000000000000f 000000000000000f
> > ^
> > Hm, so indeed CIDs for all four CPUs are occupied.
>
> To me (I don't know what a CID is either), this points to Thomas'
> "sched/mmcid: Cure mode transition woes" series [1] as a possible culprit.
>
> Funnily enough, 47ee94efccf6 ("sched/mmcid: Protect transition on weakly
> ordered systems") says:
> > As a consequence the task will not drop the CID when scheduling out
> > before the fixup is completed, which means the CID space can be
> > exhausted and the next task scheduling in will loop in mm_get_cid()
> > and the fixup thread can livelock on the held runqueue lock as above.
>
> Which sounds like exactly what happens here. Except the patch is from
> the series above, so it is obviously already in 6.19.
>
>
> I noticed there is also a 7.0-rc1 fix:
> 1e83ccd5921a sched/mmcid: Don't assume CID is CPU owned on mode switch
> But that got into 6.19.1 already (we are at 6.19.3), so it does not
> improve the situation.
>
> Any ideas?
>
>
>
> [1] https://lore.kernel.org/all/20260201192234.380608594@kernel.org/
>
> thanks,
--
js
suse labs