From: Jiri Slaby <jirislaby@kernel.org>
To: Peter Zijlstra <peterz@infradead.org>, Thomas Gleixner <tglx@kernel.org>
Cc: "Matthieu Baerts" <matttbe@kernel.org>,
"Stefan Hajnoczi" <stefanha@redhat.com>,
"Stefano Garzarella" <sgarzare@redhat.com>,
kvm@vger.kernel.org, virtualization@lists.linux.dev,
Netdev <netdev@vger.kernel.org>,
rcu@vger.kernel.org, "MPTCP Linux" <mptcp@lists.linux.dev>,
"Linux Kernel" <linux-kernel@vger.kernel.org>,
"Shinichiro Kawasaki" <shinichiro.kawasaki@wdc.com>,
"Paul E. McKenney" <paulmck@kernel.org>,
"Dave Hansen" <dave.hansen@linux.intel.com>,
"luto@kernel.org" <luto@kernel.org>,
"Michal Koutný" <MKoutny@suse.com>,
"Waiman Long" <longman@redhat.com>
Subject: Re: Stalls when starting a VSOCK listening socket: soft lockups, RCU stalls, timeout
Date: Thu, 5 Mar 2026 13:20:38 +0100
Message-ID: <ba067933-bf3b-476d-a0bb-53eda56996ca@kernel.org>
In-Reply-To: <a2b573b4-af61-4b84-a7d1-012ed6bb23c9@kernel.org>
On 05. 03. 26, 12:53, Jiri Slaby wrote:
> On 05. 03. 26, 8:00, Jiri Slaby wrote:
>> On 02. 03. 26, 12:46, Peter Zijlstra wrote:
>>> On Mon, Mar 02, 2026 at 06:28:38AM +0100, Jiri Slaby wrote:
>>>
>>>> The state of the lock:
>>>>
>>>> crash> struct rq.__lock -x ffff8d1a6fd35dc0
>>>> __lock = {
>>>> raw_lock = {
>>>> {
>>>> val = {
>>>> counter = 0x40003
>>>> },
>>>> {
>>>> locked = 0x3,
>>>> pending = 0x0
>>>> },
>>>> {
>>>> locked_pending = 0x3,
>>>> tail = 0x4
>>>> }
>>>> }
>>>> }
>>>> },
>>>>
>>>
>>>
>>> That had me remember the below patch that never quite made it. I've
>>> rebased it to something more recent so it applies.
>>>
>>> If you stick that in, we might get a clue as to who is owning that lock.
>>> Provided it all wants to reproduce well enough.
>>
>> Thanks, I applied it, but to date it is still not accepted yet:
>> https://build.opensuse.org/requests/1335893
>
> OK, I have a first dump with the patch applied:
> __lock = {
> raw_lock = {
> {
> val = {
> counter = 0x2c0003
> },
> {
> locked = 0x3,
> pending = 0x0
> },
> {
> locked_pending = 0x3,
> tail = 0x2c
> }
> }
> }
> },
>
> I am not sure if it is of any help?
>
>
>
>
> BUT: I have another dump with LOCKDEP (but NOT the patch above). The
> kernel is again spinning in mm_get_cid(), presumably waiting for a free
> bit in the map as before [1]:
>
>
> [ 162.660584] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> ...
> [ 162.661378] Sending NMI from CPU 3 to CPUs 1:
> [ 162.661398] NMI backtrace for cpu 1
> ...
> [ 162.661411] RIP: 0010:mm_get_cid+0x54/0xc0
>
>
> 7680 is active on CPU 1:
> PID: 7680 TASK: ffff8cc4038525c0 CPU: 1 COMMAND: "asm"
>
>
> CPU3 is waiting for the CPU1's rq_lock:
> RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff8cc72fcb8500
> ...
> #3 [ffffd2e9c0083da0] raw_spin_rq_lock_nested+0x20 at ffffffff9339e700
>
> crash> struct rq.__lock -x ffff8cc72fcb8500
> __lock = {
> raw_lock = {
> {
> val = {
> counter = 0x100003
> },
> {
> locked = 0x3,
> pending = 0x0
> },
> {
> locked_pending = 0x3,
> tail = 0x10
> }
> }
> },
> magic = 0xdead4ead,
> owner_cpu = 0x1,
> owner = 0xffff8cc4038b8000,
> dep_map = {
> key = 0xffffffff96245970 <__key.7>,
> class_cache = {0xffffffff9644b488 <lock_classes+10600>, 0x0},
> name = 0xffffffff94ba3ab3 "&rq->__lock",
> wait_type_outer = 0x0,
> wait_type_inner = 0x2,
> lock_type = 0x0
> }
> },
>
> owner_cpu is 1, owner is:
> PID: 7508 TASK: ffff8cc4038b8000 CPU: 1 COMMAND: "compile"
>
> But as you can see above, CPU1 is occupied with a different task:
> crash> bt -sxc 1
> PID: 7680 TASK: ffff8cc4038525c0 CPU: 1 COMMAND: "asm"
>
> spinning in mm_get_cid() as I wrote. See the objdump of mm_get_cid below.
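For reference, the raw lock words above can be split by hand. The following is only a minimal sketch, assuming the qspinlock layout used when NR_CPUS < 16K (locked byte in bits 0-7, pending byte in bits 8-15, tail index in bits 16-17, tail CPU + 1 in bits 18-31); the helper names are made up for illustration:

```python
from typing import NamedTuple


class QSpinLock(NamedTuple):
    locked: int    # bits 0-7
    pending: int   # bits 8-15
    tail_idx: int  # bits 16-17: per-CPU queue node index
    tail_cpu: int  # bits 18-31 store CPU + 1; -1 here means "no queued waiter"


def decode_qspinlock(val: int) -> QSpinLock:
    """Split a 32-bit qspinlock word (assumed NR_CPUS < 16K layout)."""
    tail = val >> 16
    return QSpinLock(
        locked=val & 0xFF,
        pending=(val >> 8) & 0xFF,
        tail_idx=tail & 0x3,
        tail_cpu=(tail >> 2) - 1,
    )


if __name__ == "__main__":
    # The three counter values from the dumps in this thread.
    for raw in (0x40003, 0x2C0003, 0x100003):
        print(hex(raw), decode_qspinlock(raw))
```

Under that assumption, 0x100003 decodes to a tail CPU of 3, which agrees with CPU3 being the queued waiter. As far as I know, locked = 0x3 is the slow-path value used by paravirt qspinlocks, but take that as an assumption too.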
You might also be interested in the mm_cid dumps:
====== PID 7508 (sleeping, holding the rq lock) ======
crash> task -R mm_cid -x 7508
PID: 7508 TASK: ffff8cc4038b8000 CPU: 1 COMMAND: "compile"
mm_cid = {
active = 0x1,
cid = 0x40000003
},
crash> p ((struct task_struct *)(0xffff8cc4038b8000))->mm->mm_cid|head -4
$6 = {
pcpu = 0x66222619df40,
mode = 1073741824,
max_cids = 4,
====== PID 7680 (spinning in mm_get_cid()) ======
crash> task -R mm_cid -x 7680
PID: 7680 TASK: ffff8cc4038525c0 CPU: 1 COMMAND: "asm"
mm_cid = {
active = 0x1,
cid = 0x80000000
},
crash> p ((struct task_struct *)(0xffff8cc4038b8000))->mm->mm_cid|head -4
$8 = {
pcpu = 0x66222619df40,
mode = 1073741824,
max_cids = 4,
====== per-cpu for CPU1 ======
crash> struct mm_cid_pcpu -x fffff2e9bfc89f40
struct mm_cid_pcpu {
cid = 0x40000003
}
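To make the cid values above easier to read, here is a small sketch that splits them into flag bits and the CID number. The flag names and bit positions (bit 31 = unset, bit 30 = owned by the CPU) are my assumption about the 6.19 encoding, not something visible in the dumps; decode_cid() is a hypothetical helper:

```python
# Assumed encoding; names and positions are NOT taken from the dumps.
MM_CID_UNSET = 1 << 31  # assumed: no CID assigned
MM_CID_ONCPU = 1 << 30  # assumed: CID is owned by the CPU, not by the task
FLAGS = (("UNSET", MM_CID_UNSET), ("ONCPU", MM_CID_ONCPU))


def decode_cid(raw: int):
    """Return (cid or None, [flag names]) for a raw mm_cid value."""
    flags = [name for name, bit in FLAGS if raw & bit]
    if raw & MM_CID_UNSET:
        return None, flags
    return raw & ~(MM_CID_UNSET | MM_CID_ONCPU), flags


if __name__ == "__main__":
    print(decode_cid(0x40000003))  # PID 7508 and the CPU1 per-cpu slot
    print(decode_cid(0x80000000))  # PID 7680
    print(hex(1073741824))         # the "mode" field, printed in decimal above
```

Read that way, PID 7508 and the CPU1 per-cpu slot both carry CID 3 with the bit-30 flag set, PID 7680 has no CID yet, and mode is 0x40000000, i.e. the same bit 30.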
Do you need mm_cid dumps of any other tasks?
> [1] https://bugzilla.suse.com/show_bug.cgi?id=1258936#c17
>
>
>> ffffffff8139cd40 <mm_get_cid>:
>> mm_get_cid():
>> include/linux/cpumask.h:1020
>> ffffffff8139cd40: 8b 05 9a d7 40 02 mov 0x240d79a(%rip),%eax # ffffffff837aa4e0 <nr_cpu_ids>
>> kernel/sched/sched.h:3779
>> ffffffff8139cd46: 55 push %rbp
>> ffffffff8139cd47: 53 push %rbx
>> include/linux/mm_types.h:1477
>> ffffffff8139cd48: 48 8d 9f 80 0b 00 00 lea 0xb80(%rdi),%rbx
>> kernel/sched/sched.h:3780 (discriminator 2)
>> ffffffff8139cd4f: 8b b7 0c 01 00 00 mov 0x10c(%rdi),%esi
>> include/linux/cpumask.h:1020
>> ffffffff8139cd55: 83 c0 3f add $0x3f,%eax
>> ffffffff8139cd58: c1 e8 03 shr $0x3,%eax
>> kernel/sched/sched.h:3780 (discriminator 2)
>> ffffffff8139cd5b: 48 89 f5 mov %rsi,%rbp
>> include/linux/mm_types.h:1479 (discriminator 1)
>> ffffffff8139cd5e: 25 f8 ff ff 1f and $0x1ffffff8,%eax
>> include/linux/mm_types.h:1489 (discriminator 1)
>> ffffffff8139cd63: 48 8d 3c 43 lea (%rbx,%rax,2),%rdi
>> include/linux/find.h:393
>> ffffffff8139cd67: e8 44 d8 6e 00 call ffffffff81a8a5b0 <_find_first_zero_bit>
>> kernel/sched/sched.h:3771
>> ffffffff8139cd6c: 39 e8 cmp %ebp,%eax
>> ffffffff8139cd6e: 73 7c jae ffffffff8139cdec <mm_get_cid+0xac>
>> ffffffff8139cd70: 89 c1 mov %eax,%ecx
>> kernel/sched/sched.h:3773 (discriminator 1)
>> ffffffff8139cd72: 89 c2 mov %eax,%edx
>> include/linux/cpumask.h:1020
>> ffffffff8139cd74: 8b 05 66 d7 40 02 mov 0x240d766(%rip),%eax # ffffffff837aa4e0 <nr_cpu_ids>
>> ffffffff8139cd7a: 83 c0 3f add $0x3f,%eax
>> ffffffff8139cd7d: c1 e8 03 shr $0x3,%eax
>> include/linux/mm_types.h:1479 (discriminator 1)
>> ffffffff8139cd80: 25 f8 ff ff 1f and $0x1ffffff8,%eax
>> include/linux/mm_types.h:1489 (discriminator 1)
>> ffffffff8139cd85: 48 8d 04 43 lea (%rbx,%rax,2),%rax
>> arch/x86/include/asm/bitops.h:136
>> ffffffff8139cd89: f0 48 0f ab 10 lock bts %rdx,(%rax)
>> kernel/sched/sched.h:3773 (discriminator 2)
>> ffffffff8139cd8e: 73 4b jae ffffffff8139cddb <mm_get_cid+0x9b>
>> ffffffff8139cd90: eb 5a jmp ffffffff8139cdec <mm_get_cid+0xac>
>> arch/x86/include/asm/vdso/processor.h:13
>> ffffffff8139cd92: f3 90 pause
>> include/linux/cpumask.h:1020
>> ffffffff8139cd94: 8b 05 46 d7 40 02 mov 0x240d746(%rip),%eax # ffffffff837aa4e0 <nr_cpu_ids>
>
> The CPU1 was caught by the NMI here ^^^^^^^^^^^^^^^^^^^^.
>
>
>
>
>> In the meantime, me and Michal K. did some digging into qemu dumps.
>> Details at (and a couple previous comments):
>> https://bugzilla.suse.com/show_bug.cgi?id=1258936#c17
>>
>> tl;dr:
>>
>> In one of the dumps, one process sits in
>> context_switch
>> -> mm_get_cid (before switch_to())
>>
>> > 65 kworker/1:1 SP= 0xffffcf82c022fd98 -> __schedule+0x16ee (ffffffff820f162e) -> call mm_get_cid
>>
>> Michal extracted the vCPU's RIP and it turned out:
>> > Hm, I'd say the CPU could be spinning in mm_get_cid() waiting for a free CID.
>> > ...
>> > ffff8a88458137c0: 000000000000000f 000000000000000f
>> > ^
>> > Hm, so indeed CIDs for all four CPUs are occupied.
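Why a full bitmap means spinning forever can be shown with a rough model of what the mm_get_cid() disassembly in this mail appears to do: find the first zero bit below max_cids, try to set it atomically, and otherwise pause and retry. This is only a sketch of my reading of the objdump, not the kernel source; the function names are hypothetical, and the spin limit exists only so that the example terminates:

```python
def find_first_zero_bit(bitmap: int, size: int) -> int:
    """Index of the lowest clear bit below size, or size if none is clear."""
    for bit in range(size):
        if not bitmap & (1 << bit):
            return bit
    return size


def try_get_cid(bitmap: int, max_cids: int):
    """One attempt: (cid, new_bitmap) on success, (None, bitmap) if exhausted."""
    cid = find_first_zero_bit(bitmap, max_cids)
    if cid >= max_cids:
        return None, bitmap
    # Stands in for the atomic "lock bts" in the disassembly.
    return cid, bitmap | (1 << cid)


def get_cid(bitmap: int, max_cids: int, max_spins: int):
    """Retry like the pause loop; max_spins is an artificial bound."""
    for spins in range(max_spins):
        cid, bitmap = try_get_cid(bitmap, max_cids)
        if cid is not None:
            return cid, spins
        # cpu_relax() / "pause" would go here; nothing else frees a bit.
    return None, max_spins


if __name__ == "__main__":
    print(get_cid(0x7, 4, 1000))  # one free slot: succeeds immediately
    print(get_cid(0xF, 4, 1000))  # all four CIDs taken: never succeeds
```

With the bitmap at 0xf and max_cids = 4, no attempt can succeed until some other task drops its CID, which cannot happen while that task needs the runqueue lock held by the spinning CPU.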
>>
>> To me (I don't know what CID is either), this might point to Thomas'
>> "sched/mmcid: Cure mode transition woes" [1] as a possible culprit.
>>
>> Funnily enough, 47ee94efccf6 ("sched/mmcid: Protect transition on
>> weakly ordered systems") spells:
>> > As a consequence the task will
>> > not drop the CID when scheduling out before the fixup is completed, which
>> > means the CID space can be exhausted and the next task scheduling in will
>> > loop in mm_get_cid() and the fixup thread can livelock on the held runqueue
>> > lock as above.
>>
>> Which sounds like exactly what happens here. Except the patch is from
>> the series above, so it is obviously already in 6.19.
>>
>>
>> I noticed there is also a 7.0-rc1 fix:
>> 1e83ccd5921a sched/mmcid: Don't assume CID is CPU owned on mode switch
>> But that got into 6.19.1 already (we are at 6.19.3). So does not
>> improve the situation.
>>
>> Any ideas?
>>
>>
>>
>> [1] https://lore.kernel.org/all/20260201192234.380608594@kernel.org/
>>
>> thanks,
>
--
js
suse labs