* [Bug 218259] New: High latency in KVM guests
@ 2023-12-12 16:37 bugzilla-daemon
2023-12-13 16:54 ` Sean Christopherson
` (12 more replies)
0 siblings, 13 replies; 18+ messages in thread
From: bugzilla-daemon @ 2023-12-12 16:37 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=218259
Bug ID: 218259
Summary: High latency in KVM guests
Product: Virtualization
Version: unspecified
Hardware: Intel
OS: Linux
Status: NEW
Severity: normal
Priority: P3
Component: kvm
Assignee: virtualization_kvm@kernel-bugs.osdl.org
Reporter: kernelbugs2012@joern-heissler.de
Regression: No
Hello,
some of my guests are experiencing heavy latency, e.g.:
* SSH sessions get stuck, sometimes for several seconds, then continue.
* PING replies can take several seconds.
* t0 = time(); sleep(1); t1 = time(); print(t1 - t0); can show several seconds.
* Various services with small timeouts throw errors.
* guest system clock appears to work correctly.
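The sleep check in the list above can be scripted as a crude probe (a sketch, not from the report; the 3-iteration count is kept small for illustration and should be raised for real monitoring):

```shell
#!/bin/sh
# Measure how far sleep(1) overshoots one second; on an affected guest
# the overshoot can reach several seconds.
max=0
for i in 1 2 3; do
    t0=$(date +%s%3N)           # milliseconds since the epoch (GNU date)
    sleep 1
    t1=$(date +%s%3N)
    d=$((t1 - t0 - 1000))       # overshoot in ms
    [ "$d" -gt "$max" ] && max=$d
done
echo "max sleep(1) overshoot: ${max}ms"
```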
Sometimes this happens once or twice an hour, sometimes not for many hours. Usually
the lag is way below a second, which is why I didn't notice it earlier.
On highly affected hosts this may happen much more often, and often in clusters,
e.g. many times during a span of a few minutes.
The affected hosts run Debian 12; until Debian 11 there was no trouble.
I git-bisected the kernel, and the commit which appears to somehow cause the trouble is:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f47e5bbbc92f5d234bbab317523c64a65b6ac4e2
CPU model: Intel(R) Xeon(R) CPU E5-2695 v4 (See below for /proc/cpuinfo)
Host kernels:
* Debian, linux-image-6.1.0-15-amd64 (6.1.66-1), affected
* Debian, linux-image-6.5.0-0.deb12.4-amd64 (6.5.10-1~bpo12+1), affected
* Vanilla, v6.7-rc5, affected
* Vanilla, v5.17-rc3-349-gf47e5bbbc92f, first affected commit
* Vanilla, v5.17-rc3-348-ga80ced6ea514, last non-affected commit
* Vanilla, several other versions during bisecting
Host kernel arch: x86_64
Host RAM: 512 GiB.
Host storage: HW-Raid6 over spinning disks.
Guest: Debian 11, x86_64. Debian Kernels linux-image-5.10.0-26-amd64
(5.10.197-1) and linux-image-6.1.0-0.deb11.13-amd64 (6.1.55-1~bpo11+1).
Qemu command line: See below.
Problem does *not* go away when appending "kernel_irqchip=off" to the -machine parameter.
Problem *does* go away with "-accel tcg", even though the guest becomes much slower.
All affected guests run Kubernetes with various workloads: mostly Java, databases
like postgres, and a few legacy 32-bit containers.
The best method I found to manually trigger the problem was to drain other
Kubernetes nodes, causing many pods to start at the same time on the affected
guest. But the problem still occurs even after the initial load has settled,
when there is little I/O and the guest is about 80% idle.
The problem occurs whether the host runs only a single guest or lots of other
(non-Kubernetes) guests.
Other (i.e. non-Kubernetes) guests don't appear to be affected, but they have
far fewer resources and usually less load.
I adjusted several qemu parameters, e.g. passing through the host CPU, a
different SMP layout (2 sockets, 4 cores each, 2 threads each), removing the
memory balloon, adding an I/O thread for the disk, and setting the latest
supported machine type. None of those resolved the problem.
There are no kernel logs in the host or the guest, and no userspace logs in the
host.
AMD hosts with SSDs seem to be less severely affected, but they still are.
Sadly I couldn't think of any good way to trigger the problem on, say, a blank
Debian guest.
If I can provide additional information or run additional tests, please let me know!
Many thanks in advance!
Jörn Heissler
---------
/usr/bin/qemu-system-x86_64
-name guest=k8s-worker6,debug-threads=on
-S
-object
{"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain-1-k8s-worker6/master-key.aes"}
-machine pc-i440fx-3.1,usb=off,dump-guest-core=off,memory-backend=pc.ram
-accel kvm
-cpu qemu64
-m 131072
-object {"qom-type":"memory-backend-ram","id":"pc.ram","size":137438953472}
-overcommit mem-lock=off
-smp 16,sockets=16,cores=1,threads=1
-uuid 2c220b5b-9d0a-4d41-a13e-cd78c5551b35
-no-user-config
-nodefaults
-chardev socket,id=charmonitor,fd=30,server=on,wait=off
-mon chardev=charmonitor,id=monitor,mode=control
-rtc base=utc
-no-shutdown
-boot menu=on,strict=on
-device {"driver":"piix3-usb-uhci","id":"usb","bus":"pci.0","addr":"0x1.0x2"}
-blockdev
{"driver":"host_device","filename":"/dev/vg-kvm/k8s-worker6_root","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}
-blockdev
{"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-1-storage"}
-device
{"driver":"virtio-blk-pci","bus":"pci.0","addr":"0x4","drive":"libvirt-1-format","id":"virtio-disk0","bootindex":1,"write-cache":"on"}
-netdev {"type":"tap","fd":"32","vhost":true,"vhostfd":"34","id":"hostnet0"}
-device
{"driver":"virtio-net-pci","netdev":"hostnet0","id":"net0","mac":"52:54:00:e7:56:ae","bus":"pci.0","addr":"0x3"}
-chardev pty,id=charserial0
-device
{"driver":"isa-serial","chardev":"charserial0","id":"serial0","index":0}
-audiodev {"id":"audio1","driver":"none"}
-vnc 127.0.0.1:0,audiodev=audio1
-device {"driver":"cirrus-vga","id":"video0","bus":"pci.0","addr":"0x2"}
-incoming defer
-device
{"driver":"virtio-balloon-pci","id":"balloon0","bus":"pci.0","addr":"0x5"}
-sandbox
on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny
-msg timestamp=on
---------
processor : 71
vendor_id : GenuineIntel
cpu family : 6
model : 79
model name : Intel(R) Xeon(R) CPU E5-2695 v4 @ 2.10GHz
stepping : 1
microcode : 0xb000040
cpu MHz : 3300.000
cache size : 46080 KB
physical id : 1
siblings : 36
core id : 27
cpu cores : 18
apicid : 119
initial apicid : 119
fpu : yes
fpu_exception : yes
cpuid level : 20
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb
rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology
nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est
tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt
tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch
cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd ibrs ibpb
stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle
avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap intel_pt xsaveopt
cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts
md_clear flush_l1d
vmx flags : vnmi preemption_timer posted_intr invvpid ept_x_only ept_ad
ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid
unrestricted_guest vapic_reg vid ple shadow_vmcs pml
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds
swapgs taa itlb_multihit mmio_stale_data
bogomips : 4199.23
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:
--
You may reply to this email to add a comment.
You are receiving this mail because:
You are watching the assignee of the bug.
* Re: [Bug 218259] New: High latency in KVM guests
2023-12-12 16:37 [Bug 218259] New: High latency in KVM guests bugzilla-daemon
@ 2023-12-13 16:54 ` Sean Christopherson
2023-12-13 16:54 ` [Bug 218259] " bugzilla-daemon
` (11 subsequent siblings)
12 siblings, 0 replies; 18+ messages in thread
From: Sean Christopherson @ 2023-12-13 16:54 UTC (permalink / raw)
To: bugzilla-daemon; +Cc: kvm
On Tue, Dec 12, 2023, bugzilla-daemon@kernel.org wrote:
> The affected hosts run Debian 12; until Debian 11 there was no trouble.
> I git-bisected the kernel and the commit which appears to somehow cause the
> trouble is:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f47e5bbbc92f5d234bbab317523c64a65b6ac4e2
Huh. That commit makes it so that KVM keeps non-leaf SPTEs, i.e. upper level page
table structures, when zapping/unmapping a guest memory range. The idea is that
preserving paging structures will allow for faster unmapping (less work) and faster
repopulation if/when the guest faults the memory back in (again, less work to create
a valid mapping).
The only downside that comes to mind is that keeping upper level paging structures
will make it more costly to handle future invalidations as KVM will have to walk
deeper into the page tables before discovering more work that needs to be done.
> Qemu command line: See below.
> Problem does *not* go away when appending "kernel_irqchip=off" to the -machine
> parameter
> Problem *does* go away with "-accel tcg", even though the guest becomes much
> slower.
Yeah, that's expected, as that completely takes KVM out of the picture.
> All affected guests run kubernetes with various workloads, mostly Java,
> databases like postgres und a few legacy 32 bit containers.
>
> Best method to manually trigger the problem I found was to drain other
> kubernetes nodes, causing many pods to start at the same time on the affected
> guest. But even when the initial load settled, there's little I/O and the
> guest is like 80% idle, the problem still occurs.
>
> The problem occurs whether the host runs only a single guest or lots of other
> (non-kubernetes) guests.
>
> Other (i.e. not kubernetes) guests don't appear to be affected, but those got
> way less resources and usually less load.
The affected flows are used only for handling mmu_notifier invalidations and for
edge cases related to non-coherent DMA. I don't see any passthrough devices in
your setup, so that rules out the non-coherent DMA side of things.
A few things to try:
1. Disable KSM (if enabled)
echo 0 > /sys/kernel/mm/ksm/run
2. Disable NUMA autobalancing (if enabled):
echo 0 > /proc/sys/kernel/numa_balancing
3. Disable KVM's TDP MMU. On pre-v6.3 kernels, this can be done without having
to reload KVM (or reboot the kernel if KVM is built in).
echo N > /sys/module/kvm/parameters/tdp_mmu
On v6.3 and later kernels, tdp_mmu is a read-only module param and so needs
to be disabled when loading kvm.ko or when booting the kernel.
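For reference, the three knobs above as a single sketch (not from the original mail; root required, and the paths/param names should be verified against your kernel version):

```shell
# 1. KSM off (runtime toggle)
echo 0 > /sys/kernel/mm/ksm/run
# 2. NUMA autobalancing off (runtime; persist via a sysctl.d drop-in if it helps)
echo 0 > /proc/sys/kernel/numa_balancing
# 3. TDP MMU off. Pre-v6.3: runtime toggle (only affects VMs started afterwards)
echo N > /sys/module/kvm/parameters/tdp_mmu
#    v6.3+: the param is read-only at runtime, so set it at load or boot time,
#    e.g. "options kvm tdp_mmu=N" in /etc/modprobe.d/kvm.conf, or
#    "kvm.tdp_mmu=N" on the kernel command line.
```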
There are plenty more things that can be tried, but the above are relatively easy
and will hopefully narrow down the search significantly.
Oh, and one question: is your host kernel preemptible?
* [Bug 218259] High latency in KVM guests
2023-12-12 16:37 [Bug 218259] New: High latency in KVM guests bugzilla-daemon
2023-12-13 16:54 ` Sean Christopherson
2023-12-13 16:54 ` [Bug 218259] " bugzilla-daemon
@ 2023-12-14 7:14 ` bugzilla-daemon
2023-12-18 17:07 ` Sean Christopherson
2023-12-14 7:15 ` bugzilla-daemon
` (9 subsequent siblings)
12 siblings, 1 reply; 18+ messages in thread
From: bugzilla-daemon @ 2023-12-14 7:14 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=218259
--- Comment #2 from Joern Heissler (kernelbugs2012@joern-heissler.de) ---
Hi,
1. KSM is already disabled. Didn't try to enable it.
2. NUMA autobalancing was enabled on the host (value 1), not in the guest. When
disabled, I can't see the issue anymore.
3. tdp_mmu was "Y", disabling it seems to make no difference.
So it might be related to NUMA. On older kernels, the flag is 1 as well.
There's one difference in the kernel messages that I hadn't noticed before. The
newer one prints "pci_bus 0000:7f: Unknown NUMA node; performance will be
reduced" (same with ff again). The older ones don't. No idea what this means or
whether it's important, and I can't find any information on the web about it.
I think the kernel is preemptible:
"uname -a" shows: "Linux vm123 6.1.0-15-amd64 #1 SMP PREEMPT_DYNAMIC Debian
6.1.66-1 (2023-12-09) x86_64 GNU/Linux"
"grep -i" on the config shows:
CONFIG_PREEMPT_BUILD=y
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
CONFIG_PREEMPT_COUNT=y
CONFIG_PREEMPTION=y
CONFIG_PREEMPT_DYNAMIC=y
CONFIG_PREEMPT_RCU=y
CONFIG_HAVE_PREEMPT_DYNAMIC=y
CONFIG_HAVE_PREEMPT_DYNAMIC_CALL=y
CONFIG_PREEMPT_NOTIFIERS=y
CONFIG_DRM_I915_PREEMPT_TIMEOUT=640
# CONFIG_DEBUG_PREEMPT is not set
# CONFIG_PREEMPT_TRACER is not set
# CONFIG_PREEMPTIRQ_DELAY_TEST is not set
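With CONFIG_PREEMPT_DYNAMIC, the effective preemption mode is chosen at boot rather than fixed at build time; one way to check it on recent kernels (debugfs path, root required; an assumption here, not something from the report):

```shell
# Prints the available modes with the active one marked, e.g. "none (voluntary) full"
cat /sys/kernel/debug/sched/preempt
# The mode can be overridden with preempt=none|voluntary|full on the kernel cmdline
```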
Attaching output of "dmesg" and "lspci -v". Perhaps there's something useful in
there.
* [Bug 218259] High latency in KVM guests
2023-12-12 16:37 [Bug 218259] New: High latency in KVM guests bugzilla-daemon
` (2 preceding siblings ...)
2023-12-14 7:14 ` bugzilla-daemon
@ 2023-12-14 7:15 ` bugzilla-daemon
2023-12-14 7:16 ` bugzilla-daemon
` (8 subsequent siblings)
12 siblings, 0 replies; 18+ messages in thread
From: bugzilla-daemon @ 2023-12-14 7:15 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=218259
--- Comment #3 from Joern Heissler (kernelbugs2012@joern-heissler.de) ---
Created attachment 305596
--> https://bugzilla.kernel.org/attachment.cgi?id=305596&action=edit
dmesg
* [Bug 218259] High latency in KVM guests
2023-12-12 16:37 [Bug 218259] New: High latency in KVM guests bugzilla-daemon
` (3 preceding siblings ...)
2023-12-14 7:15 ` bugzilla-daemon
@ 2023-12-14 7:16 ` bugzilla-daemon
2023-12-18 17:07 ` bugzilla-daemon
` (7 subsequent siblings)
12 siblings, 0 replies; 18+ messages in thread
From: bugzilla-daemon @ 2023-12-14 7:16 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=218259
--- Comment #4 from Joern Heissler (kernelbugs2012@joern-heissler.de) ---
Created attachment 305597
--> https://bugzilla.kernel.org/attachment.cgi?id=305597&action=edit
lspci -v
* Re: [Bug 218259] High latency in KVM guests
2023-12-14 7:14 ` bugzilla-daemon
@ 2023-12-18 17:07 ` Sean Christopherson
0 siblings, 0 replies; 18+ messages in thread
From: Sean Christopherson @ 2023-12-18 17:07 UTC (permalink / raw)
To: bugzilla-daemon; +Cc: kvm
On Thu, Dec 14, 2023, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=218259
>
> --- Comment #2 from Joern Heissler (kernelbugs2012@joern-heissler.de) ---
> Hi,
>
> 1. KSM is already disabled. Didn't try to enable it.
> 2. NUMA autobalancing was enabled on the host (value 1), not in the guest. When
> disabled, I can't see the issue anymore.
This is likely/hopefully the same thing Yan encountered[1]. If you are able to
test patches, the proposed fix[2] applies cleanly on v6.6 (note, I need to post a
refreshed version of the series regardless); any feedback you can provide would
be much appreciated.
KVM changes aside, I highly recommend evaluating whether or not NUMA
autobalancing is a net positive for your environment. The interactions between
autobalancing and KVM are often less than stellar, and disabling autobalancing
is sometimes a completely legitimate option/solution.
[1] https://lore.kernel.org/all/ZNnPF4W26ZbAyGto@yzhao56-desk.sh.intel.com
[2] https://lore.kernel.org/all/20230825020733.2849862-1-seanjc@google.com
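For anyone wanting to test: one way (a sketch, not verified here) to fetch and apply the series from [2] on top of v6.6, using kernel.org's b4 tool, with the message-id taken from the link above:

```shell
# Requires b4 and a kernel checkout with the v6.6 tag available
git checkout -b tdp-mmu-fix v6.6
b4 am -o - '20230825020733.2849862-1-seanjc@google.com' | git am
```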
> 3. tdp_mmu was "Y", disabling it seems to make no difference.
Hrm, that's odd. The commit blamed by bisection was purely a TDP MMU change.
Did you relaunch VMs after disabling the module params? While the module param
is writable, it's effectively snapshotted by each VM during creation, i.e. toggling
it won't affect running VMs.
> So might be related to NUMA. On older kernels, the flag is 1 as well.
>
> There's one difference in the kernel messages that I hadn't noticed before. The
> newer one prints "pci_bus 0000:7f: Unknown NUMA node; performance will be
> reduced" (same with ff again). The older ones don't. No idea what this means,
> if it's important, and can't find info on the web regarding it.
That was a new message added by commit ad5086108b9f ("PCI: Warn if no host bridge
NUMA node info"), which was first released in v5.5. AFAICT, that warning is only
complaining about the driver code for PCI devices possibly running on the wrong
node.
However, if you are seeing that error on v6.1 or v6.6, but not v5.17, i.e. if the
message started showing up well after the printk was added, then it might be a
symptom of an underlying problem, e.g. maybe the kernel is botching parsing of
ACPI tables?
> I think the kernel is preemptible:
Ya, not fully preemptible (voluntary only), but the important part is that KVM
will drop mmu_lock if there is contention (which is a "requirement" for the bug
that Yan encountered).
* [Bug 218259] High latency in KVM guests
2023-12-12 16:37 [Bug 218259] New: High latency in KVM guests bugzilla-daemon
` (5 preceding siblings ...)
2023-12-18 17:07 ` bugzilla-daemon
@ 2023-12-19 14:09 ` bugzilla-daemon
2023-12-19 15:16 ` Sean Christopherson
2023-12-19 15:16 ` bugzilla-daemon
` (5 subsequent siblings)
12 siblings, 1 reply; 18+ messages in thread
From: bugzilla-daemon @ 2023-12-19 14:09 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=218259
--- Comment #6 from Joern Heissler (kernelbugs2012@joern-heissler.de) ---
(In reply to Sean Christopherson from comment #5)
> This is likely/hopefully the same thing Yan encountered[1]. If you are able
> to
> test patches, the proposed fix[2] applies cleanly on v6.6 (note, I need to
> post a
> refreshed version of the series regardless), any feedback you can provide
> would
> be much appreciated.
>
> [1] https://lore.kernel.org/all/ZNnPF4W26ZbAyGto@yzhao56-desk.sh.intel.com
> [2] https://lore.kernel.org/all/20230825020733.2849862-1-seanjc@google.com
I admit that I don't understand most of what's written in those threads.
I applied the two patches from [2] (excluding [3]) to v6.6, and it appears to
solve the problem.
However, I haven't measured how/if any of the changes/flags affect performance,
or whether any other problems are caused. After about 1 hour of uptime it appears
to be okay.
[3] https://lore.kernel.org/all/ZPtVF5KKxLhMj58n@google.com/
> KVM changes aside, I highly recommend evaluating whether or not NUMA
> autobalancing is a net positive for your environment. The interactions
> between
> autobalancing and KVM are often less than stellar, and disabling
> autobalancing
> is sometimes a completely legitimate option/solution.
I'll have to evaluate multiple options for my production environment.
Patching and building the kernel myself would only be a last resort, and it will
probably take a while until Debian ships a patch for the issue. So maybe
disable NUMA balancing, or perhaps try to pin a VM's memory+CPU to a single
NUMA node.
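A sketch of the pinning option with libvirt (the domain name is taken from the qemu command line earlier in the thread; the node id and cpulists are placeholders for this two-socket host and must be adjusted to the real topology, e.g. from `lscpu` or `numactl --hardware`):

```shell
# Bind guest memory to NUMA node 0 (takes effect on the next guest start)
virsh numatune k8s-worker6 --mode strict --nodeset 0 --config
# Pin the guest's 16 vCPUs to node 0's CPUs (example cpulist)
for vcpu in $(seq 0 15); do
    virsh vcpupin k8s-worker6 "$vcpu" 0-17,36-53 --config
done
```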
> > 3. tdp_mmu was "Y", disabling it seems to make no difference.
>
> Hrm, that's odd. The commit blamed by bisection was purely a TDP MMU change.
> Did you relaunch VMs after disabling the module params? While the module
> param
> is writable, it's effectively snapshotted by each VM during creation, i.e.
> toggling
> it won't affect running VMs.
It's quite possible that I did not restart the VM afterwards. I tried again,
this time paying attention. Setting it to "N" *does* seem to eliminate the
issue.
> > The newer one prints "pci_bus 0000:7f: Unknown NUMA node; performance will
> be
> > reduced" (same with ff again). The older ones don't.
>
> That was a new message added by commit ad5086108b9f ("PCI: Warn if no host
> bridge
> NUMA node info"), which was first released in v5.5.
It seems I looked at systems running older (< v5.5) kernels. On the ones running
v5.10, the message is printed too.
Thanks a lot so far, I believe I've now got enough options to consider for my
production environment.
* Re: [Bug 218259] High latency in KVM guests
2023-12-19 14:09 ` bugzilla-daemon
@ 2023-12-19 15:16 ` Sean Christopherson
0 siblings, 0 replies; 18+ messages in thread
From: Sean Christopherson @ 2023-12-19 15:16 UTC (permalink / raw)
To: bugzilla-daemon; +Cc: kvm
On Tue, Dec 19, 2023, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=218259
>
> --- Comment #6 from Joern Heissler (kernelbugs2012@joern-heissler.de) ---
> (In reply to Sean Christopherson from comment #5)
>
> > This is likely/hopefully the same thing Yan encountered[1]. If you are able
> > to
> > test patches, the proposed fix[2] applies cleanly on v6.6 (note, I need to
> > post a
> > refreshed version of the series regardless), any feedback you can provide
> > would
> > be much appreciated.
> >
> > [1] https://lore.kernel.org/all/ZNnPF4W26ZbAyGto@yzhao56-desk.sh.intel.com
> > [2] https://lore.kernel.org/all/20230825020733.2849862-1-seanjc@google.com
>
> I admit that I don't understand most of what's written in the those threads.
LOL, no worries, sometimes none of us understand what's written either ;-)
> I applied the two patches from [2] (excluding [3]) to v6.6 and it appears to
> solve the problem.
>
> However I haven't measured how/if any of the changes/flags affect performance
> or if any other problems are caused. After about 1 hour uptime it appears to be
> okay.
Don't worry too much about additional testing. Barring a straight up bug (knock
wood), the changes in those patches have a very, very low probability of
introducing unwanted side effects.
> > KVM changes aside, I highly recommend evaluating whether or not NUMA
> > autobalancing is a net positive for your environment. The interactions
> > between
> > autobalancing and KVM are often less than stellar, and disabling
> > autobalancing
> > is sometimes a completely legitimate option/solution.
>
> I'll have to evaluate multiple options for my production environment.
> Patching+Building the kernel myself would only be a last resort. And it will
> probably take a while until Debian ships a patch for the issue. So maybe
> disable the NUMA balancing, or perhaps try to pin a VM's memory+cpu to a single
> NUMA node.
Another viable option is to disable the TDP MMU, at least until the above patches
land and are picked up by Debian. You could even reference commit 7e546bd08943
("Revert "KVM: x86: enable TDP MMU by default"") from the v5.15 stable tree if
you want a paper trail that provides some justification as to why it's ok to revert
back to the "old" MMU.
Quoting from that:
: As far as what is lost by disabling the TDP MMU, the main selling point of
: the TDP MMU is its ability to service page fault VM-Exits in parallel,
: i.e. the main benefactors of the TDP MMU are deployments of large VMs
: (hundreds of vCPUs), and in particular delployments that live-migrate such
: VMs and thus need to fault-in huge amounts of memory on many vCPUs after
: restarting the VM after migration.
In other words, the old MMU is not broken, e.g. it didn't suddenly become unusable
after 15+ years of use. We enabled the newfangled TDP MMU by default because it
is the long-term replacement, e.g. it can scale to support use cases that the old
MMU falls over on, and we want to put the old MMU into maintenance-only mode.
But we are still ironing out some wrinkles in the TDP MMU, particularly for host
kernels that support preemption (the kernel has lock contention logic that is
unique to preemptible kernels). And in the meantime, for most KVM use cases, the
old MMU is still perfectly serviceable.
* [Bug 218259] High latency in KVM guests
2023-12-12 16:37 [Bug 218259] New: High latency in KVM guests bugzilla-daemon
` (6 preceding siblings ...)
2023-12-19 14:09 ` bugzilla-daemon
@ 2023-12-19 15:16 ` bugzilla-daemon
2024-01-11 15:55 ` bugzilla-daemon
` (4 subsequent siblings)
12 siblings, 0 replies; 18+ messages in thread
From: bugzilla-daemon @ 2023-12-19 15:16 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=218259
--- Comment #7 from Sean Christopherson (seanjc@google.com) ---
On Tue, Dec 19, 2023, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=218259
>
> --- Comment #6 from Joern Heissler (kernelbugs2012@joern-heissler.de) ---
> (In reply to Sean Christopherson from comment #5)
>
> > This is likely/hopefully the same thing Yan encountered[1]. If you are
> able
> > to
> > test patches, the proposed fix[2] applies cleanly on v6.6 (note, I need to
> > post a
> > refreshed version of the series regardless), any feedback you can provide
> > would
> > be much appreciated.
> >
> > [1] https://lore.kernel.org/all/ZNnPF4W26ZbAyGto@yzhao56-desk.sh.intel.com
> > [2] https://lore.kernel.org/all/20230825020733.2849862-1-seanjc@google.com
>
> I admit that I don't understand most of what's written in the those threads.
LOL, no worries, sometimes none of us understand what's written either ;-)
> I applied the two patches from [2] (excluding [3]) to v6.6 and it appears to
> solve the problem.
>
> However I haven't measured how/if any of the changes/flags affect performance
> or if any other problems are caused. After about 1 hour uptime it appears to
> be okay.
Don't worry too much about additional testing. Barring a straight up bug (knock
wood), the changes in those patches have a very, very low probability of
introducing unwanted side effects.
> > KVM changes aside, I highly recommend evaluating whether or not NUMA
> > autobalancing is a net positive for your environment. The interactions
> > between autobalancing and KVM are often less than stellar, and disabling
> > autobalancing is sometimes a completely legitimate option/solution.
>
> I'll have to evaluate multiple options for my production environment.
> Patching+Building the kernel myself would only be a last resort. And it will
> probably take a while until Debian ships a patch for the issue. So maybe
> disable the NUMA balancing, or perhaps try to pin a VM's memory+cpu to a
> single NUMA node.
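For reference, the two mitigations weighed in the paragraph above can be
sketched as follows. The sysctl is the standard kernel.numa_balancing knob; the
libvirt XML fragment and the cpuset values in it are illustrative assumptions,
not taken from this thread:

```shell
# Sketch: disable automatic NUMA balancing at runtime (standard sysctl knob).
sysctl -w kernel.numa_balancing=0

# Make it persistent across reboots (file name is an arbitrary choice):
echo 'kernel.numa_balancing = 0' > /etc/sysctl.d/99-numa-balancing.conf

# Alternative: pin a libvirt guest's vCPUs and memory to one NUMA node.
# Assumed, illustrative fragment for `virsh edit <domain>`:
#   <vcpu placement='static' cpuset='0-17'>18</vcpu>
#   <numatune>
#     <memory mode='strict' nodeset='0'/>
#   </numatune>
```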
Another viable option is to disable the TDP MMU, at least until the above
patches land and are picked up by Debian. You could even reference commit
7e546bd08943 ("Revert "KVM: x86: enable TDP MMU by default"") from the v5.15
stable tree if you want a paper trail that provides some justification as to
why it's ok to revert back to the "old" MMU.
Quoting from that:
: As far as what is lost by disabling the TDP MMU, the main selling point of
: the TDP MMU is its ability to service page fault VM-Exits in parallel,
: i.e. the main benefactors of the TDP MMU are deployments of large VMs
: (hundreds of vCPUs), and in particular deployments that live-migrate such
: VMs and thus need to fault-in huge amounts of memory on many vCPUs after
: restarting the VM after migration.
In other words, the old MMU is not broken, e.g. it didn't suddenly become
unusable after 15+ years of use. We enabled the newfangled TDP MMU by default
because it is the long-term replacement, e.g. it can scale to support use cases
that the old MMU falls over on, and we want to put the old MMU into
maintenance-only mode. But we are still ironing out some wrinkles in the TDP
MMU, particularly for host kernels that support preemption (the kernel has lock
contention logic that is unique to preemptible kernels). And in the meantime,
for most KVM use cases, the old MMU is still perfectly serviceable.
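Disabling the TDP MMU as suggested above could be sketched like this. The
parameter name (tdp_mmu on the kvm module) matches the discussion in this
thread; the modprobe.d file name is an arbitrary assumption:

```shell
# Sketch: disable the TDP MMU globally. Only affects VMs created after the
# kvm module is (re)loaded with the new value.

# One-off, on the kernel command line (e.g. in GRUB_CMDLINE_LINUX):
#   kvm.tdp_mmu=0

# Persistent, via modprobe configuration:
echo 'options kvm tdp_mmu=0' > /etc/modprobe.d/kvm-tdp-mmu.conf

# Verify once the kvm module has been reloaded:
cat /sys/module/kvm/parameters/tdp_mmu   # expect N
```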
--
You may reply to this email to add a comment.
You are receiving this mail because:
You are watching the assignee of the bug.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Bug 218259] High latency in KVM guests
2023-12-18 17:07 ` bugzilla-daemon
@ 2024-01-11 15:55 ` Sean Christopherson
0 siblings, 0 replies; 18+ messages in thread
From: Sean Christopherson @ 2024-01-11 15:55 UTC (permalink / raw)
To: bugzilla-daemon; +Cc: kvm
On Mon, Dec 18, 2023, bugzilla-daemon@kernel.org wrote:
> > I think the kernel is preemptible:
>
> Ya, not fully preemptible (voluntary only), but the important part is that KVM
> will drop mmu_lock if there is contention (which is a "requirement" for the bug
> that Yan encountered).
For posterity, the above is wrong. Voluntary preemption isn't _supposed_ to enable
yielding of contended spinlocks/rwlocks, but due to a bug with dynamic preemption,
the behavior got enabled for all preempt models:
https://lore.kernel.org/all/20240110214723.695930-1-seanjc@google.com
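For anyone wanting to check which preempt model a kernel with dynamic
preemption is actually running, a sketch (assumes debugfs is mounted at the
usual location and CONFIG_PREEMPT_DYNAMIC=y):

```shell
# Sketch: on CONFIG_PREEMPT_DYNAMIC=y kernels the active preempt model is
# exposed in debugfs; the entry in parentheses is the one in effect.
cat /sys/kernel/debug/sched/preempt
# e.g.: none (voluntary) full

# Cross-check the build-time configuration of the running kernel:
grep -E '^CONFIG_PREEMPT' "/boot/config-$(uname -r)"
```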
^ permalink raw reply [flat|nested] 18+ messages in thread
* [Bug 218259] High latency in KVM guests
2023-12-12 16:37 [Bug 218259] New: High latency in KVM guests bugzilla-daemon
` (7 preceding siblings ...)
2023-12-19 15:16 ` bugzilla-daemon
@ 2024-01-11 15:55 ` bugzilla-daemon
2024-01-16 13:29 ` bugzilla-daemon
` (3 subsequent siblings)
12 siblings, 0 replies; 18+ messages in thread
From: bugzilla-daemon @ 2024-01-11 15:55 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=218259
--- Comment #8 from Sean Christopherson (seanjc@google.com) ---
On Mon, Dec 18, 2023, bugzilla-daemon@kernel.org wrote:
> > I think the kernel is preemptible:
>
> Ya, not fully preemptible (voluntary only), but the important part is that
> KVM will drop mmu_lock if there is contention (which is a "requirement" for
> the bug that Yan encountered).
For posterity, the above is wrong. Voluntary preemption isn't _supposed_ to
enable yielding of contended spinlocks/rwlocks, but due to a bug with dynamic
preemption, the behavior got enabled for all preempt models:
https://lore.kernel.org/all/20240110214723.695930-1-seanjc@google.com
^ permalink raw reply [flat|nested] 18+ messages in thread
* [Bug 218259] High latency in KVM guests
2023-12-12 16:37 [Bug 218259] New: High latency in KVM guests bugzilla-daemon
` (8 preceding siblings ...)
2024-01-11 15:55 ` bugzilla-daemon
@ 2024-01-16 13:29 ` bugzilla-daemon
2024-01-17 1:05 ` Sean Christopherson
2024-01-17 1:05 ` bugzilla-daemon
` (2 subsequent siblings)
12 siblings, 1 reply; 18+ messages in thread
From: bugzilla-daemon @ 2024-01-16 13:29 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=218259
--- Comment #9 from Joern Heissler (kernelbugs2012@joern-heissler.de) ---
(In reply to Sean Christopherson from comment #5)
> On Thu, Dec 14, 2023, bugzilla-daemon@kernel.org wrote:
> While the [tdp_mmu] module param is writable, it's effectively snapshotted by
> each VM during creation, i.e. toggling it won't affect running VMs.
How can I see which MMU a running VM is using?
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Bug 218259] High latency in KVM guests
2024-01-16 13:29 ` bugzilla-daemon
@ 2024-01-17 1:05 ` Sean Christopherson
0 siblings, 0 replies; 18+ messages in thread
From: Sean Christopherson @ 2024-01-17 1:05 UTC (permalink / raw)
To: bugzilla-daemon; +Cc: kvm
On Tue, Jan 16, 2024, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=218259
>
> --- Comment #9 from Joern Heissler (kernelbugs2012@joern-heissler.de) ---
> (In reply to Sean Christopherson from comment #5)
> > On Thu, Dec 14, 2023, bugzilla-daemon@kernel.org wrote:
>
> > While the [tdp_mmu] module param is writable, it's effectively snapshotted by
> > each VM during creation, i.e. toggling it won't affect running VMs.
>
> How can I see which MMU a running VM is using?
You can't, which in hindsight was a rather stupid thing for us to not make visible
somewhere. As of v6.3, the module param is read-only, i.e. it's either enabled or
disabled for all VMs, so sadly it's unlikely that older kernels will see any kind
of "fix".
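What can be inspected is the global module parameter, with the caveat above
that on pre-v6.3 kernels it says nothing about VMs created before a toggle; a
sketch:

```shell
# Sketch: read the global tdp_mmu setting; Y means newly created VMs use the
# TDP MMU, N means they use the "old" shadow-paging MMU.
cat /sys/module/kvm/parameters/tdp_mmu
```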
^ permalink raw reply [flat|nested] 18+ messages in thread
* [Bug 218259] High latency in KVM guests
2023-12-12 16:37 [Bug 218259] New: High latency in KVM guests bugzilla-daemon
` (9 preceding siblings ...)
2024-01-16 13:29 ` bugzilla-daemon
@ 2024-01-17 1:05 ` bugzilla-daemon
2024-01-17 14:04 ` bugzilla-daemon
2024-11-05 0:27 ` bugzilla-daemon
12 siblings, 0 replies; 18+ messages in thread
From: bugzilla-daemon @ 2024-01-17 1:05 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=218259
--- Comment #10 from Sean Christopherson (seanjc@google.com) ---
On Tue, Jan 16, 2024, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=218259
>
> --- Comment #9 from Joern Heissler (kernelbugs2012@joern-heissler.de) ---
> (In reply to Sean Christopherson from comment #5)
> > On Thu, Dec 14, 2023, bugzilla-daemon@kernel.org wrote:
>
> > While the [tdp_mmu] module param is writable, it's effectively snapshotted
> > by each VM during creation, i.e. toggling it won't affect running VMs.
>
> How can I see which MMU a running VM is using?
You can't, which in hindsight was a rather stupid thing for us to not make
visible somewhere. As of v6.3, the module param is read-only, i.e. it's either
enabled or disabled for all VMs, so sadly it's unlikely that older kernels will
see any kind of "fix".
^ permalink raw reply [flat|nested] 18+ messages in thread
* [Bug 218259] High latency in KVM guests
2023-12-12 16:37 [Bug 218259] New: High latency in KVM guests bugzilla-daemon
` (10 preceding siblings ...)
2024-01-17 1:05 ` bugzilla-daemon
@ 2024-01-17 14:04 ` bugzilla-daemon
2024-11-05 0:27 ` bugzilla-daemon
12 siblings, 0 replies; 18+ messages in thread
From: bugzilla-daemon @ 2024-01-17 14:04 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=218259
Friedrich Weber (f.weber@proxmox.com) changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |f.weber@proxmox.com
--- Comment #11 from Friedrich Weber (f.weber@proxmox.com) ---
For the record: I was seeing a variant of this issue in combination with KSM,
see [1] for more details and reproducer.
[1] https://lore.kernel.org/kvm/832697b9-3652-422d-a019-8c0574a188ac@proxmox.com/
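For the KSM variant, KSM can be stopped (and already-merged pages unshared) at
runtime via the standard sysfs interface; a sketch:

```shell
# Sketch: standard KSM control file.
# 0 = stop merging, 1 = run, 2 = stop and unmerge all previously merged pages.
echo 2 > /sys/kernel/mm/ksm/run

# Confirm nothing is shared any more:
cat /sys/kernel/mm/ksm/pages_shared   # expect 0
```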
^ permalink raw reply [flat|nested] 18+ messages in thread
* [Bug 218259] High latency in KVM guests
2023-12-12 16:37 [Bug 218259] New: High latency in KVM guests bugzilla-daemon
` (11 preceding siblings ...)
2024-01-17 14:04 ` bugzilla-daemon
@ 2024-11-05 0:27 ` bugzilla-daemon
12 siblings, 0 replies; 18+ messages in thread
From: bugzilla-daemon @ 2024-11-05 0:27 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=218259
Roland Kletzing (devzero@web.de) changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |devzero@web.de
--- Comment #12 from Roland Kletzing (devzero@web.de) ---
Also keep an eye on https://bugzilla.kernel.org/show_bug.cgi?id=199727, as I/O
may also be a cause of severe VM latency.
^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2024-11-05 0:27 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-12-12 16:37 [Bug 218259] New: High latency in KVM guests bugzilla-daemon
2023-12-13 16:54 ` Sean Christopherson
2023-12-13 16:54 ` [Bug 218259] " bugzilla-daemon
2023-12-14 7:14 ` bugzilla-daemon
2023-12-18 17:07 ` Sean Christopherson
2023-12-14 7:15 ` bugzilla-daemon
2023-12-14 7:16 ` bugzilla-daemon
2023-12-18 17:07 ` bugzilla-daemon
2024-01-11 15:55 ` Sean Christopherson
2023-12-19 14:09 ` bugzilla-daemon
2023-12-19 15:16 ` Sean Christopherson
2023-12-19 15:16 ` bugzilla-daemon
2024-01-11 15:55 ` bugzilla-daemon
2024-01-16 13:29 ` bugzilla-daemon
2024-01-17 1:05 ` Sean Christopherson
2024-01-17 1:05 ` bugzilla-daemon
2024-01-17 14:04 ` bugzilla-daemon
2024-11-05 0:27 ` bugzilla-daemon
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox