From: Jiri Slaby <jirislaby@kernel.org>
To: Peter Zijlstra <peterz@infradead.org>, Thomas Gleixner <tglx@kernel.org>
Cc: "Matthieu Baerts" <matttbe@kernel.org>,
	"Stefan Hajnoczi" <stefanha@redhat.com>,
	"Stefano Garzarella" <sgarzare@redhat.com>,
	kvm@vger.kernel.org, virtualization@lists.linux.dev,
	Netdev <netdev@vger.kernel.org>,
	rcu@vger.kernel.org, "MPTCP Linux" <mptcp@lists.linux.dev>,
	"Linux Kernel" <linux-kernel@vger.kernel.org>,
	"Shinichiro Kawasaki" <shinichiro.kawasaki@wdc.com>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	"Dave Hansen" <dave.hansen@linux.intel.com>,
	"luto@kernel.org" <luto@kernel.org>,
	"Michal Koutný" <MKoutny@suse.com>,
	"Waiman Long" <longman@redhat.com>
Subject: Re: Stalls when starting a VSOCK listening socket: soft lockups, RCU stalls, timeout
Date: Thu, 5 Mar 2026 13:20:38 +0100	[thread overview]
Message-ID: <ba067933-bf3b-476d-a0bb-53eda56996ca@kernel.org> (raw)
In-Reply-To: <a2b573b4-af61-4b84-a7d1-012ed6bb23c9@kernel.org>

On 05. 03. 26, 12:53, Jiri Slaby wrote:
> On 05. 03. 26, 8:00, Jiri Slaby wrote:
>> On 02. 03. 26, 12:46, Peter Zijlstra wrote:
>>> On Mon, Mar 02, 2026 at 06:28:38AM +0100, Jiri Slaby wrote:
>>>
>>>> The state of the lock:
>>>>
>>>> crash> struct rq.__lock -x ffff8d1a6fd35dc0
>>>>    __lock = {
>>>>      raw_lock = {
>>>>        {
>>>>          val = {
>>>>            counter = 0x40003
>>>>          },
>>>>          {
>>>>            locked = 0x3,
>>>>            pending = 0x0
>>>>          },
>>>>          {
>>>>            locked_pending = 0x3,
>>>>            tail = 0x4
>>>>          }
>>>>        }
>>>>      }
>>>>    },
>>>>
>>>
>>>
>>> That reminded me of the patch below, which never quite made it. I've
>>> rebased it onto something more recent so it applies.
>>>
>>> If you stick that in, we might get a clue as to who owns that lock.
>>> Provided it reproduces reliably enough.
>>
>> Thanks, I applied it, but to date it has still not been accepted:
>> https://build.opensuse.org/requests/1335893
> 
> OK, I have a first dump with the patch applied:
>    __lock = {
>      raw_lock = {
>        {
>          val = {
>            counter = 0x2c0003
>          },
>          {
>            locked = 0x3,
>            pending = 0x0
>          },
>          {
>            locked_pending = 0x3,
>            tail = 0x2c
>          }
>        }
>      }
>    },
> 
> I am not sure whether it is of any help.
> 
> 
> 
> 
> BUT: I have another dump with LOCKDEP (but NOT the patch above). The 
> kernel is again spinning in mm_get_cid(), presumably waiting for a free 
> bit in the map as before [1]:
> 
> 
> [  162.660584] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> ...
> [  162.661378] Sending NMI from CPU 3 to CPUs 1:
> [  162.661398] NMI backtrace for cpu 1
> ...
> [  162.661411] RIP: 0010:mm_get_cid+0x54/0xc0
> 
> 
> 7680 is active on CPU 1:
> PID: 7680     TASK: ffff8cc4038525c0  CPU: 1    COMMAND: "asm"
> 
> 
> CPU3 is waiting for the CPU1's rq_lock:
> RDX: 0000000000000000  RSI: 0000000000000003  RDI: ffff8cc72fcb8500
> ...
>   #3 [ffffd2e9c0083da0] raw_spin_rq_lock_nested+0x20 at ffffffff9339e700
> 
> crash> struct rq.__lock -x ffff8cc72fcb8500
>    __lock = {
>      raw_lock = {
>        {
>          val = {
>            counter = 0x100003
>          },
>          {
>            locked = 0x3,
>            pending = 0x0
>          },
>          {
>            locked_pending = 0x3,
>            tail = 0x10
>          }
>        }
>      },
>      magic = 0xdead4ead,
>      owner_cpu = 0x1,
>      owner = 0xffff8cc4038b8000,
>      dep_map = {
>        key = 0xffffffff96245970 <__key.7>,
>        class_cache = {0xffffffff9644b488 <lock_classes+10600>, 0x0},
>        name = 0xffffffff94ba3ab3 "&rq->__lock",
>        wait_type_outer = 0x0,
>        wait_type_inner = 0x2,
>        lock_type = 0x0
>      }
>    },
> 
> owner_cpu is 1, owner is:
> PID: 7508     TASK: ffff8cc4038b8000  CPU: 1    COMMAND: "compile"
> 
> But as you can see above, CPU1 is occupied with a different task:
> crash> bt -sxc 1
> PID: 7680     TASK: ffff8cc4038525c0  CPU: 1    COMMAND: "asm"
> 
> spinning in mm_get_cid() as I wrote. See the objdump of mm_get_cid below.

You might be interested in the mm_cid dumps:

====== PID 7508 (sleeping, holding the rq lock) ======

crash> task -R mm_cid -x 7508
PID: 7508     TASK: ffff8cc4038b8000  CPU: 1    COMMAND: "compile"
   mm_cid = {
     active = 0x1,
     cid = 0x40000003
   },

crash> p ((struct task_struct *)(0xffff8cc4038b8000))->mm->mm_cid|head -4
$6 = {
   pcpu = 0x66222619df40,
   mode = 1073741824,
   max_cids = 4,


====== PID 7680 (spinning in mm_get_cid()) ======

crash> task -R mm_cid -x 7680
PID: 7680     TASK: ffff8cc4038525c0  CPU: 1    COMMAND: "asm"
   mm_cid = {
     active = 0x1,
     cid = 0x80000000
   },

crash> p ((struct task_struct *)(0xffff8cc4038b8000))->mm->mm_cid|head -4
$8 = {
   pcpu = 0x66222619df40,
   mode = 1073741824,
   max_cids = 4,


====== per-cpu for CPU1 ======

crash> struct mm_cid_pcpu -x fffff2e9bfc89f40
struct mm_cid_pcpu {
   cid = 0x40000003
}



Do you need dumps of any other tasks' mm_cid?

> [1] https://bugzilla.suse.com/show_bug.cgi?id=1258936#c17
> 
> 
>> ffffffff8139cd40 <mm_get_cid>:
>> mm_get_cid():
>> include/linux/cpumask.h:1020
>> ffffffff8139cd40:       8b 05 9a d7 40 02       mov    0x240d79a(%rip),%eax        # ffffffff837aa4e0 <nr_cpu_ids>
>> kernel/sched/sched.h:3779
>> ffffffff8139cd46:       55                      push   %rbp
>> ffffffff8139cd47:       53                      push   %rbx
>> include/linux/mm_types.h:1477
>> ffffffff8139cd48:       48 8d 9f 80 0b 00 00    lea    0xb80(%rdi),%rbx
>> kernel/sched/sched.h:3780 (discriminator 2)
>> ffffffff8139cd4f:       8b b7 0c 01 00 00       mov    0x10c(%rdi),%esi
>> include/linux/cpumask.h:1020
>> ffffffff8139cd55:       83 c0 3f                add    $0x3f,%eax
>> ffffffff8139cd58:       c1 e8 03                shr    $0x3,%eax
>> kernel/sched/sched.h:3780 (discriminator 2)
>> ffffffff8139cd5b:       48 89 f5                mov    %rsi,%rbp
>> include/linux/mm_types.h:1479 (discriminator 1)
>> ffffffff8139cd5e:       25 f8 ff ff 1f          and    $0x1ffffff8,%eax
>> include/linux/mm_types.h:1489 (discriminator 1)
>> ffffffff8139cd63:       48 8d 3c 43             lea    (%rbx,%rax,2),%rdi
>> include/linux/find.h:393
>> ffffffff8139cd67:       e8 44 d8 6e 00          call   ffffffff81a8a5b0 <_find_first_zero_bit>
>> kernel/sched/sched.h:3771
>> ffffffff8139cd6c:       39 e8                   cmp    %ebp,%eax
>> ffffffff8139cd6e:       73 7c                   jae    ffffffff8139cdec <mm_get_cid+0xac>
>> ffffffff8139cd70:       89 c1                   mov    %eax,%ecx
>> kernel/sched/sched.h:3773 (discriminator 1)
>> ffffffff8139cd72:       89 c2                   mov    %eax,%edx
>> include/linux/cpumask.h:1020
>> ffffffff8139cd74:       8b 05 66 d7 40 02       mov    0x240d766(%rip),%eax        # ffffffff837aa4e0 <nr_cpu_ids>
>> ffffffff8139cd7a:       83 c0 3f                add    $0x3f,%eax
>> ffffffff8139cd7d:       c1 e8 03                shr    $0x3,%eax
>> include/linux/mm_types.h:1479 (discriminator 1)
>> ffffffff8139cd80:       25 f8 ff ff 1f          and    $0x1ffffff8,%eax
>> include/linux/mm_types.h:1489 (discriminator 1)
>> ffffffff8139cd85:       48 8d 04 43             lea    (%rbx,%rax,2),%rax
>> arch/x86/include/asm/bitops.h:136
>> ffffffff8139cd89:       f0 48 0f ab 10          lock bts %rdx,(%rax)
>> kernel/sched/sched.h:3773 (discriminator 2)
>> ffffffff8139cd8e:       73 4b                   jae    ffffffff8139cddb <mm_get_cid+0x9b>
>> ffffffff8139cd90:       eb 5a                   jmp    ffffffff8139cdec <mm_get_cid+0xac>
>> arch/x86/include/asm/vdso/processor.h:13
>> ffffffff8139cd92:       f3 90                   pause
>> include/linux/cpumask.h:1020
>> ffffffff8139cd94:       8b 05 46 d7 40 02       mov    0x240d746(%rip),%eax        # ffffffff837aa4e0 <nr_cpu_ids>
> 
> The CPU1 was caught by the NMI here ^^^^^^^^^^^^^^^^^^^^.
> 
> 
> 
> 
>> In the meantime, Michal K. and I did some digging into the qemu dumps.
>> Details (and a couple of previous comments) at:
>> https://bugzilla.suse.com/show_bug.cgi?id=1258936#c17
>>
>> tl;dr:
>>
>> In one of the dumps, one process sits in
>>    context_switch
>>      -> mm_get_cid (before switch_to())
>>
>>  > 65 kworker/1:1 SP= 0xffffcf82c022fd98 -> __schedule+0x16ee 
>> (ffffffff820f162e) -> call mm_get_cid
>>
>> Michal extracted the vCPU's RIP and it turned out:
>>  > Hm, I'd say the CPU could be spinning in mm_get_cid() waiting for a 
>> free CID.
>>  > ...
>>  > ffff8a88458137c0:  000000000000000f 000000000000000f
>>  >                                                    ^
>>  > Hm, so indeed CIDs for all four CPUs are occupied.
>>
>> To me (I don't know what a CID is either), this might point to Thomas'
>> "sched/mmcid: Cure mode transition woes" [1] as a possible culprit.
>>
>> Funnily enough, 47ee94efccf6 ("sched/mmcid: Protect transition on
>> weakly ordered systems") says:
>>  >     As a consequence the task will
>>  >     not drop the CID when scheduling out before the fixup is 
>> completed, which
>>  >     means the CID space can be exhausted and the next task 
>> scheduling in will
>>  >     loop in mm_get_cid() and the fixup thread can livelock on the 
>> held runqueue
>>  >     lock as above.
>>
>> Which sounds exactly like what happens here. Except the patch is from
>> the series above, so it is obviously already in 6.19.
>>
>>
>> I noticed there is also a 7.0-rc1 fix:
>>    1e83ccd5921a sched/mmcid: Don't assume CID is CPU owned on mode switch
>> But that got into 6.19.1 already (we are at 6.19.3), so it does not
>> improve the situation.
>>
>> Any ideas?
>>
>>
>>
>> [1] https://lore.kernel.org/all/20260201192234.380608594@kernel.org/
>>
>> thanks,
> 

-- 
js
suse labs


Thread overview: 45+ messages
2026-02-06 11:54 Stalls when starting a VSOCK listening socket: soft lockups, RCU stalls, timeout Matthieu Baerts
2026-02-06 16:38 ` Stefano Garzarella
2026-02-06 17:13   ` Matthieu Baerts
2026-02-26 10:37 ` Jiri Slaby
2026-03-02  5:28   ` Jiri Slaby
2026-03-02 11:46     ` Peter Zijlstra
2026-03-02 14:30       ` Waiman Long
2026-03-05  7:00       ` Jiri Slaby
2026-03-05 11:53         ` Jiri Slaby
2026-03-05 12:20           ` Jiri Slaby [this message]
2026-03-05 16:16             ` Thomas Gleixner
2026-03-05 17:33               ` Jiri Slaby
2026-03-05 19:25                 ` Thomas Gleixner
2026-03-06  5:48                   ` Jiri Slaby
2026-03-06  9:57                     ` Thomas Gleixner
2026-03-06 10:16                       ` Jiri Slaby
2026-03-06 16:28                         ` Thomas Gleixner
2026-03-06 11:06                       ` Matthieu Baerts
2026-03-06 16:57                         ` Matthieu Baerts
2026-03-06 18:31                           ` Jiri Slaby
2026-03-06 18:44                             ` Matthieu Baerts
2026-03-06 21:40                           ` Matthieu Baerts
2026-03-06 15:24                       ` Peter Zijlstra
2026-03-07  9:01                         ` Thomas Gleixner
2026-03-07 22:29                           ` Thomas Gleixner
2026-03-08  9:15                             ` Thomas Gleixner
2026-03-08 16:55                               ` Jiri Slaby
2026-03-08 16:58                               ` Thomas Gleixner
2026-03-08 17:23                                 ` Matthieu Baerts
2026-03-09  8:43                                   ` Thomas Gleixner
2026-03-09 12:23                                     ` Matthieu Baerts
2026-03-10  8:09                                       ` Thomas Gleixner
2026-03-10  8:20                                         ` Thomas Gleixner
2026-03-10  8:56                                         ` Jiri Slaby
2026-03-10  9:00                                           ` Jiri Slaby
2026-03-10 10:03                                             ` Thomas Gleixner
2026-03-10 10:06                                               ` Thomas Gleixner
2026-03-10 11:24                                                 ` Matthieu Baerts
2026-03-10 11:54                                                   ` Peter Zijlstra
2026-03-10 12:28                                                     ` Thomas Gleixner
2026-03-10 13:40                                                       ` Matthieu Baerts
2026-03-10 13:47                                                         ` Thomas Gleixner
2026-03-10 15:51                                                           ` Matthieu Baerts
2026-03-03 13:23   ` Matthieu Baerts
2026-03-05  6:46     ` Jiri Slaby
