public inbox for netdev@vger.kernel.org
From: Matthieu Baerts <matttbe@kernel.org>
To: Stefano Garzarella <sgarzare@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>,
	kvm@vger.kernel.org, virtualization@lists.linux.dev,
	Netdev <netdev@vger.kernel.org>,
	rcu@vger.kernel.org, MPTCP Linux <mptcp@lists.linux.dev>,
	Linux Kernel <linux-kernel@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@kernel.org>,
	Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>,
	"Paul E. McKenney" <paulmck@kernel.org>
Subject: Re: Stalls when starting a VSOCK listening socket: soft lockups, RCU stalls, timeout
Date: Fri, 6 Feb 2026 18:13:44 +0100
Message-ID: <d7724321-3e04-4fc3-be64-e19a6103de64@kernel.org>
In-Reply-To: <aYYWQHiu8j_Zlu3v@sgarzare-redhat>

Hi Stefano,

Thank you for your reply!

On 06/02/2026 17:38, Stefano Garzarella wrote:
> On Fri, Feb 06, 2026 at 12:54:13PM +0100, Matthieu Baerts wrote:
>> Hi Stefan, Stefano, + VM, RCU, sched people,
> 
> Hi Matt,
> 
>>
>> First, I'm sorry to cc a few MLs, but I'm still trying to locate the
>> origin of the issue I'm seeing.
>>
>> Our CI for the MPTCP subsystem is now regularly hitting various stalls
>> before even starting the MPTCP test suite. These issues are visible on
>> top of the latest net and net-next trees, which were synced with
>> Linus' tree yesterday. All these issues have been seen on a "public CI"
>> using GitHub-hosted runners with KVM support, where the tested kernel is
>> launched in a nested (I suppose) VM. I can see the issue with or without
> 
> Just to be sure I'm on the same page, the issue is in the most nested
> guest, right? (the last VM started)

That's correct. From what I see [1], each GitHub-hosted runner is a new
VM, and I'm launching QEMU from there.

[1]
https://docs.github.com/en/actions/concepts/runners/github-hosted-runners

>> debug.config. According to the logs, it might have started around
>> v6.19-rc0, but I was unavailable for a few weeks, and I couldn't react
>> sooner, sorry for that. Unfortunately, I cannot reproduce this locally,
>> and the CI doesn't currently have the ability to execute bisections.
>>
>> The stalls happen before starting the MPTCP test suite. The init program
>> creates a VSOCK listening socket via socat [1], and different hangs are
>> then visible: RCU stalls followed by a soft lockup [2], only a soft
>> lockup [3], sometimes the soft lockup comes with a delay [4] [5], or
>> no RCU stalls or soft lockups are detected after one minute, but the VM
>> is stalled [6]. In the last case, the VM is stopped after GDB has been
>> launched to get more details about what was being executed.
>>
>> It feels like the issue is not directly caused by the VSOCK listening
>> socket, but the stalls always happen after having started the socat
>> command [1] in the background.
>>
>> One last thing: I thought my issue was linked to another one seen on the
>> XFS side and reported by Shinichiro Kawasaki [7], but apparently not.
>> Indeed, Paul McKenney mentioned Shinichiro's issue is probably fixed by
>> Thomas Gleixner's series called "sched/mmcid: Cure mode transition woes"
>> [8]. I applied these patches from Peter Zijlstra's tree from
>> tip/sched/urgent [9], and my issue is still present.
>>
>> Any idea what could cause that, where to look, or what could help to
>> find the root cause?
> 
> Mmm, nothing comes to mind on the vsock side :-(

That's OK, thank you for checking! I hope someone else in Cc can
help me find the root cause!
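For readers less familiar with AF_VSOCK: a VSOCK listening socket like the one mentioned above should roughly boil down to socket(), then bind() on VMADDR_CID_ANY, then listen(). Below is a minimal C sketch of that sequence. It assumes a plain SOCK_STREAM listener. The helper names (make_vsock_addr, open_vsock_listener) and the backlog of 5 are hypothetical and are not taken from the actual socat command used by the init program.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/vm_sockets.h>

/* Hypothetical helper: VSOCK address accepting connections from any CID. */
static struct sockaddr_vm make_vsock_addr(unsigned int port)
{
	struct sockaddr_vm addr;

	memset(&addr, 0, sizeof(addr));
	addr.svm_family = AF_VSOCK;
	addr.svm_cid = VMADDR_CID_ANY;
	addr.svm_port = port;
	return addr;
}

/*
 * Hypothetical helper: socket() + bind() + listen() on the given port.
 * Returns the listening fd, or -1 with errno set on failure.
 * The backlog of 5 is an arbitrary, assumed value.
 */
static int open_vsock_listener(unsigned int port)
{
	struct sockaddr_vm addr = make_vsock_addr(port);
	int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

	if (fd < 0)
		return -1; /* e.g. EAFNOSUPPORT if no vsock support is available */

	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    listen(fd, 5) < 0) {
		int saved = errno; /* keep the bind()/listen() error across close() */

		close(fd);
		errno = saved;
		return -1;
	}
	return fd;
}
```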

> I understand that bisection can't be done in the CI env, but can you
> confirm in some way that 6.18 is working right with the same userspace?

Yes, I can confirm that. We run the tests on both the dev ("export") and
fixes ("export-net") branches, but also on stable versions:

  https://ci-results.mptcp.dev/flakes.html

(The headers of the "critical issues" are in red.)

We don't see such issues in v6.18 and older kernels.

> That could help to try to identify at least if there is anything in
> AF_VSOCK we merged recently that can trigger that.

Our dev branch is on top of net-next, so I guess I would have seen issues
directly related to AF_VSOCK earlier than after the net-next freeze in
January. Here, it looks like the first issues appeared during Linus' merge
window at the beginning of December, e.g. [2] is from the 4th of December,
on top of 'net', which was at commit 8f7aa3d3c732 ("Merge tag
'net-next-6.19' of
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next") from
Linus' tree.

[2]
https://github.com/multipath-tcp/mptcp_net-next/actions/runs/19919313666/job/57104626001#step:7:5052

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.

