From: Peter Zijlstra <peterz@infradead.org>
To: Matthieu Baerts <matttbe@kernel.org>
Cc: "Thomas Gleixner" <tglx@kernel.org>,
	"Jiri Slaby" <jirislaby@kernel.org>,
	"Stefan Hajnoczi" <stefanha@redhat.com>,
	"Stefano Garzarella" <sgarzare@redhat.com>,
	kvm@vger.kernel.org, virtualization@lists.linux.dev,
	Netdev <netdev@vger.kernel.org>,
	rcu@vger.kernel.org, "MPTCP Linux" <mptcp@lists.linux.dev>,
	"Linux Kernel" <linux-kernel@vger.kernel.org>,
	"Shinichiro Kawasaki" <shinichiro.kawasaki@wdc.com>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	"Dave Hansen" <dave.hansen@linux.intel.com>,
	luto@kernel.org, "Michal Koutný" <MKoutny@suse.com>,
	"Waiman Long" <longman@redhat.com>,
	"Marco Elver" <elver@google.com>
Subject: Re: Stalls when starting a VSOCK listening socket: soft lockups, RCU stalls, timeout
Date: Tue, 10 Mar 2026 12:54:33 +0100
Message-ID: <20260310115433.GV1282955@noisy.programming.kicks-ass.net>
In-Reply-To: <4cad1b9a-e157-427b-9896-cf54cf6fba36@kernel.org>

On Tue, Mar 10, 2026 at 12:24:02PM +0100, Matthieu Baerts wrote:

> Just did. Output is available there:
> 
>   https://github.com/user-attachments/files/25867817/issue-617-debug-20260310.txt.gz
> 
> Only 7.7k lines this time.

Same damn thing again...

[    2.533811] virtme-n-1         3d..1. 849756us : mmcid_user_add: pid=1 users=1 mm=000000002b3f8459
[    4.523998] virtme-n-1         3d..1. 1115085us : mmcid_user_add: pid=71 users=2 mm=000000002b3f8459
[    4.529065] virtme-n-1         3d..1. 1115937us : mmcid_user_add: pid=72 users=3 mm=000000002b3f8459

[    4.529448] virtme-n-71        2d..1. 1115969us : mmcid_user_add: pid=73 users=4 mm=000000002b3f8459         <=== missing!
[    4.529946] virtme-n-71        2d..1. 1115971us : mmcid_getcid: mm=000000002b3f8459 cid=00000003

71 spawns 73, assigns cid 3

[    4.530573]   <idle>-0         1d..2. 1115991us : sched_switch: prev_comm=swapper/1 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=virtme-ng-init next_pid=73 next_prio=120
[    4.530865]   <idle>-0         1d..2. 1115993us : mmcid_cpu_update: cpu=1 cid=00000003 mm=000000002b3f8459

It gets scheduled on CPU-1, sets CID...

[    4.531038] virtme-n-1         3d..1. 1116013us : mmcid_user_add: pid=74 users=5 mm=000000002b3f8459

Then 1 spawns 74 on CPU 3. This is the 5th task, so we initiate a
task->cpu cid transition:

[    4.531203] virtme-n-1         3d..1. 1116014us : mmcid_task_update: pid=1 cid=20000000 mm=000000002b3f8459
[    4.531369] virtme-n-1         3d..1. 1116014us : mmcid_cpu_update: cpu=3 cid=20000000 mm=000000002b3f8459

Task 1

[    4.531530] virtme-n-1         3..... 1116014us : mmcid_fixup_task: pid=71 cid=00000001 active=1 users=4 mm=000000002b3f8459
[    4.531790] virtme-n-1         3d..2. 1116015us : mmcid_task_update: pid=71 cid=80000000 mm=000000002b3f8459
[    4.532000] virtme-n-1         3d..2. 1116015us : mmcid_putcid: mm=000000002b3f8459 cid=00000001

Task 71

[    4.532169] virtme-n-1         3..... 1116015us : mmcid_fixup_task: pid=72 cid=00000002 active=1 users=3 mm=000000002b3f8459
[    4.532362] virtme-n-1         3d..2. 1116016us : mmcid_task_update: pid=72 cid=20000002 mm=000000002b3f8459
[    4.532514] virtme-n-1         3d..2. 1116016us : mmcid_cpu_update: cpu=0 cid=20000002 mm=000000002b3f8459

Task 72

[    4.532649] virtme-n-1         3..... 1116016us : mmcid_fixup_task: pid=74 cid=80000000 active=1 users=2 mm=000000002b3f8459

Task 74. Note the glaring lack of 73, which all this time is running
on CPU 1. Since it got scheduled, it must be on the tasklist; since 1
spawns 74 after it on CPU 3, we must observe any prior tasklist
changes; and since it got a cid, ->active must be set. WTF!

That said, we set active after tasklist_lock now, so it might be
possible we simply miss that store, observe the 'old' 0 and skip over
it?

Let me stare hard at that...


[    4.532912] virtme-n-1         3..... 1116017us : mmcid_fixup_task: pid=71 cid=80000000 active=1 users=1 mm=000000002b3f8459
[    4.533386] virtme-n-1         3d..2. 1116041us : mmcid_cpu_update: cpu=3 cid=40000000 mm=000000002b3f8459

I *think* this is the for_each_process_thread() hitting 71 again.

[    4.533805]   <idle>-0         2d..2. 1116043us : mmcid_getcid: mm=000000002b3f8459 cid=00000001
[    4.533980]   <idle>-0         2d..2. 1116044us : mmcid_cpu_update: cpu=2 cid=40000001 mm=000000002b3f8459
[    4.534156]   <idle>-0         2d..2. 1116044us : mmcid_task_update: pid=74 cid=40000001 mm=000000002b3f8459
[    4.534579] virtme-n-72        0d..2. 1116046us : mmcid_cpu_update: cpu=0 cid=40000002 mm=000000002b3f8459

[    4.535803] virtme-n-73        1d..2. 1116179us : sched_switch: prev_comm=virtme-ng-init prev_pid=73 prev_prio=120 prev_state=S ==> next_comm=swapper/1 next_pid=0 next_prio=120

And then, after all that, 73 blocks, not having been marked TRANSIT or
anything, and thus holding on to the CID, leading to all this trouble.



