From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <0d7cc7d0-defc-40eb-8c0d-78888ff273bc@kernel.org>
Date: Thu, 5 Mar 2026 07:46:09 +0100
X-Mailing-List: netdev@vger.kernel.org
Subject: Re: Stalls when starting a VSOCK listening socket: soft lockups, RCU stalls, timeout
To: Matthieu Baerts
Cc: kvm@vger.kernel.org, virtualization@lists.linux.dev, Netdev, rcu@vger.kernel.org, MPTCP Linux, Linux Kernel, Peter Zijlstra, Thomas Gleixner, Shinichiro Kawasaki, "Paul E. McKenney", Dave Hansen, Stefan Hajnoczi, "luto@kernel.org", Stefano Garzarella
References: <7f3e74d7-67dc-48d7-99d2-0b87f671651b@kernel.org> <29d38308-e8ae-42aa-8eeb-1c3b347c284b@kernel.org>
From: Jiri Slaby
In-Reply-To:
 <29d38308-e8ae-42aa-8eeb-1c3b347c284b@kernel.org>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit

Hi,

On 03. 03. 26, 14:23, Matthieu Baerts wrote:
> On 26/02/2026 11:37, Jiri Slaby wrote:
>> On 06. 02. 26, 12:54, Matthieu Baerts wrote:
>>> Our CI for the MPTCP subsystem is now regularly hitting various stalls
>>> before even starting the MPTCP test suite. These issues are visible on
>>> top of the latest net and net-next trees, which have been sync with
>>> Linus' tree yesterday. All these issues have been seen on a "public CI"
>>> using GitHub-hosted runners with KVM support, where the tested kernel is
>>> launched in a nested (I suppose) VM. I can see the issue with or without
>>> debug.config. According to the logs, it might have started around
>>> v6.19-rc0, but I was unavailable for a few weeks, and I couldn't react
>>> quicker, sorry for that. Unfortunately, I cannot reproduce this locally,
>>> and the CI doesn't currently have the ability to execute bisections.
>>
>> Hmm, after the switch of the qemu guest kernels to 6.19, our (opensuse)
>> build service is stalling in smp_call_function_many_cond() randomly too:
>> https://bugzilla.suse.com/show_bug.cgi?id=1258936
>>
>> The attachment from there contains sysrq-t logs too:
>> https://bugzilla.suse.com/attachment.cgi?id=888612
>
> I'm glad I'm not the only one with this issue :)
>
> In your case, do you also have nested VMs with KVM support?

No, it's KVM directly on bare metal.

> Are you able to easily reproduce the issue and change the guest kernel
> in your build service?

Unfortunately no and no.

> On my side, any debugging steps need to be automated. Lately, it looks
> like the issue is more easily triggered on a stable 6.19 kernel, than on
> the last RC.
>
>>> The stalls happen before starting the MPTCP test suite. The init program
>>> creates a VSOCK listening socket via socat [1], and different hangs are
>>> then visible: RCU stalls followed by a soft lockup [2], only a soft
>>> lockup [3], sometimes the soft lockup comes with a delay [4] [5], or
>>> there is no RCU stalls or soft lockups detected after one minute, but VM
>>> is stalled [6]. In the last case, the VM is stopped after having
>>> launched GDB to get more details about what was being executed.
>>>
>>> It feels like the issue is not directly caused by the VSOCK listening
>>> socket, but the stalls always happen after having started the socat
>>> command [1] in the background.
>>
>> It fails randomly while building random packages (go, libreoffice,
>> bayle, ...). I don't think it is VSOCK related in those cases, but who
>> knows what the builds do...
>
> Indeed, unlikely to be VSOCK then.
>
>> I cannot reproduce locally either.
>>
>> I came across:
>>   614da1d3d4cd x86: make page fault handling disable interrupts properly
>> but I have no idea if it could have impact on this at all.
>
> Did it help to revert it?

We haven't tried; it is unlikely to be the cause.

-- 
js
suse labs