public inbox for virtualization@lists.linux-foundation.org
From: Jiri Slaby <jirislaby@kernel.org>
To: Peter Zijlstra <peterz@infradead.org>, Thomas Gleixner <tglx@kernel.org>
Cc: "Matthieu Baerts" <matttbe@kernel.org>,
	"Stefan Hajnoczi" <stefanha@redhat.com>,
	"Stefano Garzarella" <sgarzare@redhat.com>,
	kvm@vger.kernel.org, virtualization@lists.linux.dev,
	Netdev <netdev@vger.kernel.org>,
	rcu@vger.kernel.org, "MPTCP Linux" <mptcp@lists.linux.dev>,
	"Linux Kernel" <linux-kernel@vger.kernel.org>,
	"Shinichiro Kawasaki" <shinichiro.kawasaki@wdc.com>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	"Dave Hansen" <dave.hansen@linux.intel.com>,
	"luto@kernel.org" <luto@kernel.org>,
	"Michal Koutný" <MKoutny@suse.com>,
	"Waiman Long" <longman@redhat.com>
Subject: Re: Stalls when starting a VSOCK listening socket: soft lockups, RCU stalls, timeout
Date: Thu, 5 Mar 2026 08:00:15 +0100	[thread overview]
Message-ID: <717310d8-6274-4b7f-8a19-561c45f5f565@kernel.org> (raw)
In-Reply-To: <20260302114636.GL606826@noisy.programming.kicks-ass.net>

On 02. 03. 26, 12:46, Peter Zijlstra wrote:
> On Mon, Mar 02, 2026 at 06:28:38AM +0100, Jiri Slaby wrote:
> 
>> The state of the lock:
>>
>> crash> struct rq.__lock -x ffff8d1a6fd35dc0
>>    __lock = {
>>      raw_lock = {
>>        {
>>          val = {
>>            counter = 0x40003
>>          },
>>          {
>>            locked = 0x3,
>>            pending = 0x0
>>          },
>>          {
>>            locked_pending = 0x3,
>>            tail = 0x4
>>          }
>>        }
>>      }
>>    },
>>
> 
> 
> That had me remember the below patch that never quite made it. I've
> rebased it to something more recent so it applies.
> 
> If you stick that in, we might get a clue as to who is owning that lock.
> Provided it all wants to reproduce well enough.

Thanks, I applied it, but to date it has still not been accepted:
https://build.opensuse.org/requests/1335893


In the meantime, Michal K. and I did some digging into the qemu dumps.
Details (and a couple of earlier comments) at:
https://bugzilla.suse.com/show_bug.cgi?id=1258936#c17

tl;dr:

In one of the dumps, one process sits in
   context_switch
     -> mm_get_cid (before switch_to())

 > 65 kworker/1:1 SP= 0xffffcf82c022fd98 -> __schedule+0x16ee (ffffffff820f162e) -> call mm_get_cid

Michal extracted the vCPU's RIP and it turned out:
 > Hm, I'd say the CPU could be spinning in mm_get_cid() waiting for a
 > free CID.
 > ...
 > ffff8a88458137c0:  000000000000000f 000000000000000f
 >                                                    ^
 > Hm, so indeed CIDs for all four CPUs are occupied.

To me (I don't know what a CID is either), this points to Thomas'
"sched/mmcid: Cure mode transition woes" series [1] as a possible culprit.

Funnily enough, 47ee94efccf6 ("sched/mmcid: Protect transition on weakly
ordered systems") spells:
 >     As a consequence the task will not drop the CID when scheduling
 >     out before the fixup is completed, which means the CID space can
 >     be exhausted and the next task scheduling in will loop in
 >     mm_get_cid() and the fixup thread can livelock on the held
 >     runqueue lock as above.

Which sounds exactly like what happens here. Except that patch is part of
the series above, so it is obviously already in 6.19.


I noticed there is also a 7.0-rc1 fix:
   1e83ccd5921a sched/mmcid: Don't assume CID is CPU owned on mode switch
But that already made it into 6.19.1 (we are on 6.19.3), so it does not
improve the situation.

Any ideas?



[1] https://lore.kernel.org/all/20260201192234.380608594@kernel.org/

thanks,
-- 
js
suse labs


Thread overview: 45+ messages
2026-02-06 11:54 Stalls when starting a VSOCK listening socket: soft lockups, RCU stalls, timeout Matthieu Baerts
2026-02-06 16:38 ` Stefano Garzarella
2026-02-06 17:13   ` Matthieu Baerts
2026-02-26 10:37 ` Jiri Slaby
2026-03-02  5:28   ` Jiri Slaby
2026-03-02 11:46     ` Peter Zijlstra
2026-03-02 14:30       ` Waiman Long
2026-03-05  7:00       ` Jiri Slaby [this message]
2026-03-05 11:53         ` Jiri Slaby
2026-03-05 12:20           ` Jiri Slaby
2026-03-05 16:16             ` Thomas Gleixner
2026-03-05 17:33               ` Jiri Slaby
2026-03-05 19:25                 ` Thomas Gleixner
2026-03-06  5:48                   ` Jiri Slaby
2026-03-06  9:57                     ` Thomas Gleixner
2026-03-06 10:16                       ` Jiri Slaby
2026-03-06 16:28                         ` Thomas Gleixner
2026-03-06 11:06                       ` Matthieu Baerts
2026-03-06 16:57                         ` Matthieu Baerts
2026-03-06 18:31                           ` Jiri Slaby
2026-03-06 18:44                             ` Matthieu Baerts
2026-03-06 21:40                           ` Matthieu Baerts
2026-03-06 15:24                       ` Peter Zijlstra
2026-03-07  9:01                         ` Thomas Gleixner
2026-03-07 22:29                           ` Thomas Gleixner
2026-03-08  9:15                             ` Thomas Gleixner
2026-03-08 16:55                               ` Jiri Slaby
2026-03-08 16:58                               ` Thomas Gleixner
2026-03-08 17:23                                 ` Matthieu Baerts
2026-03-09  8:43                                   ` Thomas Gleixner
2026-03-09 12:23                                     ` Matthieu Baerts
2026-03-10  8:09                                       ` Thomas Gleixner
2026-03-10  8:20                                         ` Thomas Gleixner
2026-03-10  8:56                                         ` Jiri Slaby
2026-03-10  9:00                                           ` Jiri Slaby
2026-03-10 10:03                                             ` Thomas Gleixner
2026-03-10 10:06                                               ` Thomas Gleixner
2026-03-10 11:24                                                 ` Matthieu Baerts
2026-03-10 11:54                                                   ` Peter Zijlstra
2026-03-10 12:28                                                     ` Thomas Gleixner
2026-03-10 13:40                                                       ` Matthieu Baerts
2026-03-10 13:47                                                         ` Thomas Gleixner
2026-03-10 15:51                                                           ` Matthieu Baerts
2026-03-03 13:23   ` Matthieu Baerts
2026-03-05  6:46     ` Jiri Slaby
