From: Sean Christopherson <seanjc@google.com>
To: Juri Lelli <juri.lelli@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Ranguvar <ranguvar@ranguvar.io>,
	 Juri Lelli <juri.lelli@gmail.com>,
	 "regressions@lists.linux.dev" <regressions@lists.linux.dev>,
	 "regressions@leemhuis.info" <regressions@leemhuis.info>,
	 "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>
Subject: Re: [REGRESSION][BISECTED] from bd9bbc96e835: cannot boot Win11 KVM guest
Date: Mon, 16 Dec 2024 08:50:45 -0800
Message-ID: <Z2BaZSKtaAPGSCqb@google.com>
In-Reply-To: <gvam6amt25mlvpxlpcra2caesdfpr5a75cba3e4n373tzqld3k@ciutribtvmjj>

On Mon, Dec 16, 2024, Juri Lelli wrote:
> On 14/12/24 19:52, Peter Zijlstra wrote:
> > On Sat, Dec 14, 2024 at 06:32:57AM +0000, Ranguvar wrote:
> > > Hello, all,
> > > 
> > > Any assistance with proper format and process is appreciated as I am new
> > > to these lists.  After the commit bd9bbc96e835 "sched: Rework dl_server"
> > > I am no longer able to boot my Windows 11 23H2 guest using
> > > pinned/exclusive CPU cores and passing a PCIe graphics card.  This setup
> > > worked for me since at least 5.10, likely earlier, with minimal changes.
> > > 
> > > Most or all cores assigned to guest VM report 100% usage, and many tasks
> > > on the host hang indefinitely (10min+) until the guest is forcibly
> > > stopped.  This happens only once the Windows kernel begins loading - its
> > > spinner appears and freezes.
> > > 
> > > Still broken on 6.13-rc2, as well as 6.12.4 from Arch's repository.  When
> > > testing these, the failure is similar, but tasks on the host are slow to
> > > execute instead of stalling indefinitely, and hung tasks are not reported
> > > in dmesg. Only one guest core may show 100% utilization instead of many
> > > or all of them. This seems to be due to a separate regression which also
> > > impacts my usecase [0].  After patching it [1], I then find the same
> > > behavior as bd9bbc96e835, with hung tasks on host.
> > > 
> > > git bisect log: [2]
> > > dmesg from 6.11.0-rc1-1-git-00057-gbd9bbc96e835, with decoded hung task backtraces: [3]
> > > dmesg from arch 6.12.4: [4]
> > > dmesg from arch 6.12.4 patched for svm.c regression, has hung tasks, backtraces could not be decoded: [5]
> > > config for 6.11.0-rc1-1-git-00057-gbd9bbc96e835: [6]
> > > config for arch 6.12.4: [7]
> > > 
> > > If it helps, my host uses an AMD Ryzen 5950X CPU with latest UEFI and AMD
> > > WX 5100 (Polaris, GCN 4.0) PCIe graphics.  I use libvirt 10.10 and qemu
> > > 9.1.2, and I am passing three PCIe devices each from dedicated IOMMU
> > > groups: NVIDIA RTX 3090 graphics, a Renesas uPD720201 USB controller, and
> > > a Samsung 970 EVO NVMe disk.
> > > 
> > > I have in kernel cmdline `iommu=pt isolcpus=1-7,17-23 rcu_nocbs=1-7,17-23
> > > nohz_full=1-7,17-23`.  Removing iommu=pt does not produce a change, and
> > > dropping the core isolation freezes the host on VM startup.

As in, dropping all of isolcpus, rcu_nocbs, and nohz_full?  Or just dropping
isolcpus?

> > > Enabling/disabling kvm_amd.nested or kvm.enable_virt_at_load did not
> > > produce a change.
> > > 
> > > Thank you for your attention.
> > > - Devin
> > > 
> > > #regzbot introduced: bd9bbc96e8356886971317f57994247ca491dbf1
> > > 
> > > [0]: https://lore.kernel.org/regressions/52914da7-a97b-45ad-86a0-affdf8266c61@mailbox.org/
> > > [1]: https://lore.kernel.org/regressions/376c445a-9437-4bdd-9b67-e7ce786ae2c4@mailbox.org/
> > > [2]: https://ranguvar.io/pub/paste/linux-6.12-vm-regression/bisect.log
> > > [3]: https://ranguvar.io/pub/paste/linux-6.12-vm-regression/dmesg-6.11.0-rc1-1-git-00057-gbd9bbc96e835-decoded.log
> > 
> > Hmm, this has:
> > 
> > [  978.035637] sched: DL replenish lagged too much
> > 
> > Juri, have we seen that before?
> 
> Not in the context of dl_server. Hummm, looks like replenishment wasn't
> able to catch up with the clock or something like that (e.g.
> replenishment didn't happen for a long time).

I don't see anything in the logs that suggests KVM is doing something funky.  My
guess is that the issue is related to isolcpus+rcu_nocbs+nohz_full, and that KVM
setups are one of the more common use cases for such configurations.  But that's
just a wild guess on my part.

The hang from [3] occurs because KVM can't complete a memslot update.  Given that
this shows up with GPU passthrough, odds are good the guest is trying to relocate
a GPU BAR and the relocation hangs because the KVM-side update hangs.

There are some interesting/unique paths in KVM's memslot code, but this is a
simple hang on SRCU synchronization.

   INFO: task CPU 0/KVM:2134 blocked for more than 122 seconds.
         Not tainted 6.11.0-rc1-1-git-00057-gbd9bbc96e835 #12
   "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
   task:CPU 0/KVM       state:D stack:0     pid:2134  tgid:2114  ppid:1      flags:0x00004002
   Call Trace:
    <TASK>
   __schedule (kernel/sched/core.c:5258 kernel/sched/core.c:6594) 
   schedule (./arch/x86/include/asm/preempt.h:84 (discriminator 13) kernel/sched/core.c:6672 (discriminator 13) kernel/sched/core.c:6686 (discriminator 13)) 
   schedule_timeout (kernel/time/timer.c:2558) 
   wait_for_completion (kernel/sched/completion.c:96 kernel/sched/completion.c:116 kernel/sched/completion.c:127 kernel/sched/completion.c:148) 
   __synchronize_srcu (kernel/rcu/srcutree.c:1408) 
   kvm_swap_active_memslots+0x133/0x180 kvm
   kvm_set_memslot+0x3de/0x680 kvm
   kvm_vm_ioctl+0x11da/0x18d0 kvm
   __x64_sys_ioctl (fs/ioctl.c:52 fs/ioctl.c:907 fs/ioctl.c:893 fs/ioctl.c:893) 
   do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1) arch/x86/entry/common.c:83 (discriminator 1)) 

And in [5], the host hang that first pops is also on wait_for_completion(), in
code that is potentially trying to queue work on all CPUs (I've no idea if
cpu_needs_drain() can be true on the isolated CPUs).

	cpumask_clear(&has_work);
	for_each_online_cpu(cpu) {
		struct work_struct *work = &per_cpu(lru_add_drain_work, cpu);

		if (cpu_needs_drain(cpu)) {
			INIT_WORK(work, lru_add_drain_per_cpu);
			queue_work_on(cpu, mm_percpu_wq, work);
			__cpumask_set_cpu(cpu, &has_work);
		}
	}

	for_each_cpu(cpu, &has_work)
		flush_work(&per_cpu(lru_add_drain_work, cpu));

  sched: DL replenish lagged too much
  systemd[1]: Starting Cleanup of Temporary Directories...
  systemd[1]: systemd-tmpfiles-clean.service: Deactivated successfully.
  systemd[1]: Finished Cleanup of Temporary Directories.
  systemd[1]: systemd-journald.service: State 'stop-watchdog' timed out. Killing.
  systemd[1]: systemd-journald.service: Killing process 647 (systemd-journal) with signal SIGKILL.
  systemd[1]: Starting system activity accounting tool...
  systemd[1]: sysstat-collect.service: Deactivated successfully.
  systemd[1]: Finished system activity accounting tool.
  INFO: task khugepaged:263 blocked for more than 122 seconds.
        Not tainted 6.12.4-arch1-1 #1
  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  task:khugepaged      state:D stack:0     pid:263   tgid:263   ppid:2      flags:0x00004000
  Call Trace:
   <TASK>
   __schedule+0x3b0/0x12b0
   schedule+0x27/0xf0
   schedule_timeout+0x12f/0x160
   wait_for_completion+0x86/0x170
   __flush_work+0x1bf/0x2c0
   __lru_add_drain_all+0x13e/0x1e0
   khugepaged+0x66/0x930
   kthread+0xd2/0x100
   ret_from_fork+0x34/0x50
   ret_from_fork_asm+0x1a/0x30
   </TASK>
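
To make the shape of the second hang concrete, here is a toy, single-threaded
model of the queue-then-flush pattern quoted above.  It is only a sketch under
assumptions: struct sim_cpu, sim_queue_drain(), sim_tick() and sim_drain_all()
are hypothetical names invented for illustration, not kernel APIs, and the
"starved" flag simply assumes that a CPU's worker never gets to run, which is
not something the logs prove.  The tick limit is also artificial; the real
flush_work() has no bound and just blocks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy per-CPU state.  All names are hypothetical, not kernel APIs. */
struct sim_cpu {
	bool needs_drain;   /* stand-in for cpu_needs_drain(cpu) */
	bool starved;       /* assumption: this CPU's worker never runs */
	bool work_pending;  /* set when work is queued, cleared when it runs */
};

/* Queue work only on CPUs that need draining; return how many were queued. */
static int sim_queue_drain(struct sim_cpu *cpus, size_t n)
{
	int queued = 0;

	for (size_t i = 0; i < n; i++) {
		if (cpus[i].needs_drain) {
			cpus[i].work_pending = true;
			queued++;
		}
	}
	return queued;
}

/* One scheduling pass: every non-starved CPU runs its pending work. */
static void sim_tick(struct sim_cpu *cpus, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (cpus[i].work_pending && !cpus[i].starved)
			cpus[i].work_pending = false;
	}
}

/* True if any CPU still has queued work that has not run. */
static bool sim_any_pending(const struct sim_cpu *cpus, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (cpus[i].work_pending)
			return true;
	}
	return false;
}

/*
 * Bounded analog of queue + flush.  Returns 0 if nothing was queued, the
 * number of ticks needed for all queued work to finish, or -1 if work is
 * still pending after max_ticks (the case where a real flush would hang).
 */
static int sim_drain_all(struct sim_cpu *cpus, size_t n, int max_ticks)
{
	assert(cpus != NULL);

	if (sim_queue_drain(cpus, n) == 0)
		return 0;

	for (int tick = 1; tick <= max_ticks; tick++) {
		sim_tick(cpus, n);
		if (!sim_any_pending(cpus, n))
			return tick;
	}
	return -1;
}
```

In this model the waiter only gets stuck when a CPU both needs draining and
never runs its worker; a starved CPU with needs_drain == false is never queued
and so never blocks the flush.  That is why the cpu_needs_drain() question
matters for the isolated CPUs.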
