From: Matthieu Baerts <matttbe@kernel.org>
To: Stefano Garzarella <sgarzare@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>,
kvm@vger.kernel.org, virtualization@lists.linux.dev,
Netdev <netdev@vger.kernel.org>,
rcu@vger.kernel.org, MPTCP Linux <mptcp@lists.linux.dev>,
Linux Kernel <linux-kernel@vger.kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Thomas Gleixner <tglx@kernel.org>,
Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>,
"Paul E. McKenney" <paulmck@kernel.org>
Subject: Re: Stalls when starting a VSOCK listening socket: soft lockups, RCU stalls, timeout
Date: Fri, 6 Feb 2026 18:13:44 +0100 [thread overview]
Message-ID: <d7724321-3e04-4fc3-be64-e19a6103de64@kernel.org> (raw)
In-Reply-To: <aYYWQHiu8j_Zlu3v@sgarzare-redhat>
Hi Stefano,
Thank you for your reply!
On 06/02/2026 17:38, Stefano Garzarella wrote:
> On Fri, Feb 06, 2026 at 12:54:13PM +0100, Matthieu Baerts wrote:
>> Hi Stefan, Stefano, + VM, RCU, sched people,
>
> Hi Matt,
>
>>
>> First, I'm sorry to cc a few MLs, but I'm still trying to locate the
>> origin of the issue I'm seeing.
>>
>> Our CI for the MPTCP subsystem is now regularly hitting various stalls
>> before even starting the MPTCP test suite. These issues are visible on
>> top of the latest net and net-next trees, which have been sync with
>> Linus' tree yesterday. All these issues have been seen on a "public CI"
>> using GitHub-hosted runners with KVM support, where the tested kernel is
>> launched in a nested (I suppose) VM. I can see the issue with or without
>
> Just to be sure I'm on the same page, the issue is in the most nested
> guest, right? (the last VM started)
That's correct. From what I see [1], each GitHub-hosted runner is a new
VM, and I'm launching QEmu from there.
[1]
https://docs.github.com/en/actions/concepts/runners/github-hosted-runners
>> debug.config. According to the logs, it might have started around
>> v6.19-rc0, but I was unavailable for a few weeks, and I couldn't react
>> quicker, sorry for that. Unfortunately, I cannot reproduce this locally,
>> and the CI doesn't currently have the ability to execute bisections.
>>
>> The stalls happen before starting the MPTCP test suite. The init program
>> creates a VSOCK listening socket via socat [1], and different hangs are
>> then visible: RCU stalls followed by a soft lockup [2], only a soft
>> lockup [3], sometimes the soft lockup comes with a delay [4] [5], or
>> there is no RCU stalls or soft lockups detected after one minute, but VM
>> is stalled [6]. In the last case, the VM is stopped after having
>> launched GDB to get more details about what was being executed.
>>
>> It feels like the issue is not directly caused by the VSOCK listening
>> socket, but the stalls always happen after having started the socat
>> command [1] in the background.
>>
>> One last thing: I thought my issue was linked to another one seen on XFS
>> side and reported by Shinichiro Kawasaki [7], but apparently not.
>> Indeed, Paul McKenney mentioned Shinichiro's issue is probably fixed by
>> Thomas Gleixner's series called "sched/mmcid: Cure mode transition woes"
>> [8]. I applied these patches from Peter Zijlstra's tree from
>> tip/sched/urgent [9], and my issue is still present.
>>
>> Any idea what could cause that, where to look at, or what could help to
>> find the root cause?
>
> Mmm, nothing comes to mind at the vsock side :-(
That's OK, thank you for having checked! I hope someone else in CC can
help me finding the root cause!
> I understand that bisection can't be done in the CI env, but can you
> confirm in some way that 6.18 is working right with the same userspace?
Yes, I can confirm that. We run the tests on both the dev ("export") and
fixes ("export-net") branches, but also on stable versions:
https://ci-results.mptcp.dev/flakes.html
(The "critical issues" have their headers red)
We don't see such issues in v6.18 and old kernels.
> That could help to try to identify at least if there is anything in
> AF_VSOCK we merged recently that can trigger that.
Our dev branch is on top of net-next, I guess I would have seen issues
directly related to AF_VSOCK earlier than after the net-next freeze in
January. Here, it looks like the first issues came during Linus' merge
window from the beginning of December, e.g. [2] is from the 4th of
December, on top of 'net' which was at commit 8f7aa3d3c732 ("Merge tag
'net-next-6.19' of
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next") from
Linus tree.
[2]
https://github.com/multipath-tcp/mptcp_net-next/actions/runs/19919313666/job/57104626001#step:7:5052
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
next prev parent reply other threads:[~2026-02-06 17:13 UTC|newest]
Thread overview: 45+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-06 11:54 Stalls when starting a VSOCK listening socket: soft lockups, RCU stalls, timeout Matthieu Baerts
2026-02-06 16:38 ` Stefano Garzarella
2026-02-06 17:13 ` Matthieu Baerts [this message]
2026-02-26 10:37 ` Jiri Slaby
2026-03-02 5:28 ` Jiri Slaby
2026-03-02 11:46 ` Peter Zijlstra
2026-03-02 14:30 ` Waiman Long
2026-03-05 7:00 ` Jiri Slaby
2026-03-05 11:53 ` Jiri Slaby
2026-03-05 12:20 ` Jiri Slaby
2026-03-05 16:16 ` Thomas Gleixner
2026-03-05 17:33 ` Jiri Slaby
2026-03-05 19:25 ` Thomas Gleixner
2026-03-06 5:48 ` Jiri Slaby
2026-03-06 9:57 ` Thomas Gleixner
2026-03-06 10:16 ` Jiri Slaby
2026-03-06 16:28 ` Thomas Gleixner
2026-03-06 11:06 ` Matthieu Baerts
2026-03-06 16:57 ` Matthieu Baerts
2026-03-06 18:31 ` Jiri Slaby
2026-03-06 18:44 ` Matthieu Baerts
2026-03-06 21:40 ` Matthieu Baerts
2026-03-06 15:24 ` Peter Zijlstra
2026-03-07 9:01 ` Thomas Gleixner
2026-03-07 22:29 ` Thomas Gleixner
2026-03-08 9:15 ` Thomas Gleixner
2026-03-08 16:55 ` Jiri Slaby
2026-03-08 16:58 ` Thomas Gleixner
2026-03-08 17:23 ` Matthieu Baerts
2026-03-09 8:43 ` Thomas Gleixner
2026-03-09 12:23 ` Matthieu Baerts
2026-03-10 8:09 ` Thomas Gleixner
2026-03-10 8:20 ` Thomas Gleixner
2026-03-10 8:56 ` Jiri Slaby
2026-03-10 9:00 ` Jiri Slaby
2026-03-10 10:03 ` Thomas Gleixner
2026-03-10 10:06 ` Thomas Gleixner
2026-03-10 11:24 ` Matthieu Baerts
2026-03-10 11:54 ` Peter Zijlstra
2026-03-10 12:28 ` Thomas Gleixner
2026-03-10 13:40 ` Matthieu Baerts
2026-03-10 13:47 ` Thomas Gleixner
2026-03-10 15:51 ` Matthieu Baerts
2026-03-03 13:23 ` Matthieu Baerts
2026-03-05 6:46 ` Jiri Slaby
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=d7724321-3e04-4fc3-be64-e19a6103de64@kernel.org \
--to=matttbe@kernel.org \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mptcp@lists.linux.dev \
--cc=netdev@vger.kernel.org \
--cc=paulmck@kernel.org \
--cc=peterz@infradead.org \
--cc=rcu@vger.kernel.org \
--cc=sgarzare@redhat.com \
--cc=shinichiro.kawasaki@wdc.com \
--cc=stefanha@redhat.com \
--cc=tglx@kernel.org \
--cc=virtualization@lists.linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.