From: Jiri Slaby <jirislaby@kernel.org>
To: Peter Zijlstra <peterz@infradead.org>, Thomas Gleixner <tglx@kernel.org>
Cc: "Matthieu Baerts" <matttbe@kernel.org>,
"Stefan Hajnoczi" <stefanha@redhat.com>,
"Stefano Garzarella" <sgarzare@redhat.com>,
kvm@vger.kernel.org, virtualization@lists.linux.dev,
Netdev <netdev@vger.kernel.org>,
rcu@vger.kernel.org, "MPTCP Linux" <mptcp@lists.linux.dev>,
"Linux Kernel" <linux-kernel@vger.kernel.org>,
"Shinichiro Kawasaki" <shinichiro.kawasaki@wdc.com>,
"Paul E. McKenney" <paulmck@kernel.org>,
"Dave Hansen" <dave.hansen@linux.intel.com>,
"luto@kernel.org" <luto@kernel.org>,
"Michal Koutný" <MKoutny@suse.com>,
"Waiman Long" <longman@redhat.com>
Subject: Re: Stalls when starting a VSOCK listening socket: soft lockups, RCU stalls, timeout
Date: Thu, 5 Mar 2026 12:53:31 +0100 [thread overview]
Message-ID: <a2b573b4-af61-4b84-a7d1-012ed6bb23c9@kernel.org> (raw)
In-Reply-To: <717310d8-6274-4b7f-8a19-561c45f5f565@kernel.org>
On 05. 03. 26, 8:00, Jiri Slaby wrote:
> On 02. 03. 26, 12:46, Peter Zijlstra wrote:
>> On Mon, Mar 02, 2026 at 06:28:38AM +0100, Jiri Slaby wrote:
>>
>>> The state of the lock:
>>>
>>> crash> struct rq.__lock -x ffff8d1a6fd35dc0
>>>   __lock = {
>>>     raw_lock = {
>>>       {
>>>         val = {
>>>           counter = 0x40003
>>>         },
>>>         {
>>>           locked = 0x3,
>>>           pending = 0x0
>>>         },
>>>         {
>>>           locked_pending = 0x3,
>>>           tail = 0x4
>>>         }
>>>       }
>>>     }
>>>   },
>>>
>>
>>
>> That had me remember the below patch that never quite made it. I've
>> rebased it to something more recent so it applies.
>>
>> If you stick that in, we might get a clue as to who is owning that lock.
>> Provided it all wants to reproduce well enough.
>
> Thanks, I applied it, but to date it has still not been accepted:
> https://build.opensuse.org/requests/1335893
OK, I have a first dump with the patch applied:
  __lock = {
    raw_lock = {
      {
        val = {
          counter = 0x2c0003
        },
        {
          locked = 0x3,
          pending = 0x0
        },
        {
          locked_pending = 0x3,
          tail = 0x2c
        }
      }
    }
  },
I am not sure whether this is of any help.
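For what it is worth, here is how I read those counter values by hand. This
assumes the standard qspinlock layout for NR_CPUS < 16K from
asm-generic/qspinlock_types.h (bits 0-7 locked byte, bits 8-15 pending byte,
bits 16-17 tail index, bits 18+ tail CPU encoded as cpu + 1); just a
throwaway user-space decoder, nothing authoritative:

/*
 * Decode a qspinlock 'val' from a crash dump. Assumes the layout used
 * when NR_CPUS < 16K: bits 0-7 locked, bits 8-15 pending, bits 16-17
 * tail index (MCS node per context), bits 18+ tail CPU + 1.
 */
#include <stdio.h>
#include <stdint.h>

static void decode_qspinlock(uint32_t val)
{
        unsigned int locked   = val & 0xff;
        unsigned int pending  = (val >> 8) & 0xff;
        unsigned int tail     = val >> 16;
        unsigned int tail_idx = tail & 0x3;
        int tail_cpu          = (int)(tail >> 2) - 1;

        printf("val=%#x locked=%#x pending=%#x tail_idx=%u tail_cpu=%d\n",
               val, locked, pending, tail_idx, tail_cpu);
}

int main(void)
{
        decode_qspinlock(0x40003);      /* earlier dump -> tail is CPU 0  */
        decode_qspinlock(0x2c0003);     /* this dump    -> tail is CPU 10 */
        return 0;
}

If that decoding is right, 0x2c0003 says the MCS queue tail is CPU 10, and
the locked byte being 0x3 instead of 0x1 would be _Q_SLOW_VAL from the
paravirt slow path (if I read kernel/locking/qspinlock_paravirt.h right;
this is a guest, so PV spinlocks are presumably in use). Not sure that
tells us anything new, though.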
BUT: I have another dump with LOCKDEP (but NOT the patch above). The
kernel is again spinning in mm_get_cid(), presumably waiting for a free
bit in the map as before [1]:
[ 162.660584] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
...
[ 162.661378] Sending NMI from CPU 3 to CPUs 1:
[ 162.661398] NMI backtrace for cpu 1
...
[ 162.661411] RIP: 0010:mm_get_cid+0x54/0xc0
PID 7680 is active on CPU 1:
PID: 7680 TASK: ffff8cc4038525c0 CPU: 1 COMMAND: "asm"
CPU3 is waiting for CPU1's rq_lock:
RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff8cc72fcb8500
...
#3 [ffffd2e9c0083da0] raw_spin_rq_lock_nested+0x20 at ffffffff9339e700
crash> struct rq.__lock -x ffff8cc72fcb8500
  __lock = {
    raw_lock = {
      {
        val = {
          counter = 0x100003
        },
        {
          locked = 0x3,
          pending = 0x0
        },
        {
          locked_pending = 0x3,
          tail = 0x10
        }
      }
    },
    magic = 0xdead4ead,
    owner_cpu = 0x1,
    owner = 0xffff8cc4038b8000,
    dep_map = {
      key = 0xffffffff96245970 <__key.7>,
      class_cache = {0xffffffff9644b488 <lock_classes+10600>, 0x0},
      name = 0xffffffff94ba3ab3 "&rq->__lock",
      wait_type_outer = 0x0,
      wait_type_inner = 0x2,
      lock_type = 0x0
    }
  },
owner_cpu is 1, owner is:
PID: 7508 TASK: ffff8cc4038b8000 CPU: 1 COMMAND: "compile"
But as you can see above, CPU1 is occupied with a different task:
crash> bt -sxc 1
PID: 7680 TASK: ffff8cc4038525c0 CPU: 1 COMMAND: "asm"
It is spinning in mm_get_cid(), as I wrote. See the objdump of mm_get_cid() below.
[1] https://bugzilla.suse.com/show_bug.cgi?id=1258936#c17
> ffffffff8139cd40 <mm_get_cid>:
> mm_get_cid():
> include/linux/cpumask.h:1020
> ffffffff8139cd40: 8b 05 9a d7 40 02 mov 0x240d79a(%rip),%eax # ffffffff837aa4e0 <nr_cpu_ids>
> kernel/sched/sched.h:3779
> ffffffff8139cd46: 55 push %rbp
> ffffffff8139cd47: 53 push %rbx
> include/linux/mm_types.h:1477
> ffffffff8139cd48: 48 8d 9f 80 0b 00 00 lea 0xb80(%rdi),%rbx
> kernel/sched/sched.h:3780 (discriminator 2)
> ffffffff8139cd4f: 8b b7 0c 01 00 00 mov 0x10c(%rdi),%esi
> include/linux/cpumask.h:1020
> ffffffff8139cd55: 83 c0 3f add $0x3f,%eax
> ffffffff8139cd58: c1 e8 03 shr $0x3,%eax
> kernel/sched/sched.h:3780 (discriminator 2)
> ffffffff8139cd5b: 48 89 f5 mov %rsi,%rbp
> include/linux/mm_types.h:1479 (discriminator 1)
> ffffffff8139cd5e: 25 f8 ff ff 1f and $0x1ffffff8,%eax
> include/linux/mm_types.h:1489 (discriminator 1)
> ffffffff8139cd63: 48 8d 3c 43 lea (%rbx,%rax,2),%rdi
> include/linux/find.h:393
> ffffffff8139cd67: e8 44 d8 6e 00 call ffffffff81a8a5b0 <_find_first_zero_bit>
> kernel/sched/sched.h:3771
> ffffffff8139cd6c: 39 e8 cmp %ebp,%eax
> ffffffff8139cd6e: 73 7c jae ffffffff8139cdec <mm_get_cid+0xac>
> ffffffff8139cd70: 89 c1 mov %eax,%ecx
> kernel/sched/sched.h:3773 (discriminator 1)
> ffffffff8139cd72: 89 c2 mov %eax,%edx
> include/linux/cpumask.h:1020
> ffffffff8139cd74: 8b 05 66 d7 40 02 mov 0x240d766(%rip),%eax # ffffffff837aa4e0 <nr_cpu_ids>
> ffffffff8139cd7a: 83 c0 3f add $0x3f,%eax
> ffffffff8139cd7d: c1 e8 03 shr $0x3,%eax
> include/linux/mm_types.h:1479 (discriminator 1)
> ffffffff8139cd80: 25 f8 ff ff 1f and $0x1ffffff8,%eax
> include/linux/mm_types.h:1489 (discriminator 1)
> ffffffff8139cd85: 48 8d 04 43 lea (%rbx,%rax,2),%rax
> arch/x86/include/asm/bitops.h:136
> ffffffff8139cd89: f0 48 0f ab 10 lock bts %rdx,(%rax)
> kernel/sched/sched.h:3773 (discriminator 2)
> ffffffff8139cd8e: 73 4b jae ffffffff8139cddb <mm_get_cid+0x9b>
> ffffffff8139cd90: eb 5a jmp ffffffff8139cdec <mm_get_cid+0xac>
> arch/x86/include/asm/vdso/processor.h:13
> ffffffff8139cd92: f3 90 pause
> include/linux/cpumask.h:1020
> ffffffff8139cd94: 8b 05 46 d7 40 02 mov 0x240d746(%rip),%eax # ffffffff837aa4e0 <nr_cpu_ids>
The CPU1 was caught by the NMI here ^^^^^^^^^^^^^^^^^^^^.
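To make sure we are talking about the same loop, this is roughly how I read
the disassembly above; a simplified C sketch, not the literal 6.19 source
(the code past +0x9b/+0xac is not in the snippet, so I am guessing it just
loops back to the pause, and mm_cidmask()/the max-CID argument are
approximations based on current mm_types.h):

#include <linux/bitops.h>
#include <linux/find.h>
#include <linux/mm_types.h>
#include <linux/processor.h>

/* Simplified sketch of the loop the NMI interrupted; not the real source. */
static int mm_get_cid_sketch(struct mm_struct *mm, unsigned int max_cids)
{
        /* the 'lea 0xb80(%rdi)' above: per-mm CID bitmap */
        unsigned long *cidmask = mm_cidmask(mm);
        unsigned int cid;

        for (;;) {
                /* the call to _find_first_zero_bit at +0x27 */
                cid = find_first_zero_bit(cidmask, max_cids);
                if (cid < max_cids) {
                        /* the 'lock bts' at +0x49: claim the bit, retry if we raced */
                        if (!test_and_set_bit(cid, cidmask))
                                return cid;
                }
                /* no free CID (or lost the race): spin until one is dropped */
                cpu_relax();    /* the 'pause' at +0x52; NMI hit at +0x54 */
        }
}

So with all CIDs taken (Michal's 0xf/0xf words below), a task that enters
this with the rq lock held just spins there until someone else can run and
drop a CID, which matches the livelock wording in 47ee94efccf6 quoted below.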
> In the meantime, Michal K. and I did some digging into the qemu dumps.
> Details at (and a couple previous comments):
> https://bugzilla.suse.com/show_bug.cgi?id=1258936#c17
>
> tl;dr:
>
> In one of the dumps, one process sits in
> context_switch
> -> mm_get_cid (before switch_to())
>
> > 65 kworker/1:1 SP= 0xffffcf82c022fd98 -> __schedule+0x16ee (ffffffff820f162e) -> call mm_get_cid
>
> Michal extracted the vCPU's RIP and it turned out:
> > Hm, I'd say the CPU could be spinning in mm_get_cid() waiting for a
> > free CID.
> > ...
> > ffff8a88458137c0: 000000000000000f 000000000000000f
> > ^
> > Hm, so indeed CIDs for all four CPUs are occupied.
>
> To me (I don't know what a CID is either), this points to Thomas'
> "sched/mmcid: Cure mode transition woes" series [1] as a possible culprit.
>
> Funnily enough, 47ee94efccf6 ("sched/mmcid: Protect transition on weakly
> ordered systems") says:
> > As a consequence the task will not drop the CID when scheduling out
> > before the fixup is completed, which means the CID space can be
> > exhausted and the next task scheduling in will loop in mm_get_cid()
> > and the fixup thread can livelock on the held runqueue lock as above.
>
> Which sounds like exactly what happens here. Except the patch is from
> the series above, so it is obviously already in 6.19.
>
>
> I noticed there is also a 7.0-rc1 fix:
> 1e83ccd5921a sched/mmcid: Don't assume CID is CPU owned on mode switch
> But that got into 6.19.1 already (we are at 6.19.3), so it does not
> improve the situation.
>
> Any ideas?
>
>
>
> [1] https://lore.kernel.org/all/20260201192234.380608594@kernel.org/
>
> thanks,
--
js
suse labs