* [Bug 217796] New: latest Zen Inception fixes breaks nested kvm virtualization on AMD
@ 2023-08-15 14:55 bugzilla-daemon
2023-08-15 17:53 ` [Bug 217796] " bugzilla-daemon
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: bugzilla-daemon @ 2023-08-15 14:55 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=217796
Bug ID: 217796
Summary: latest Zen Inception fixes breaks nested kvm
virtualization on AMD
Product: Virtualization
Version: unspecified
Hardware: AMD
OS: Linux
Status: NEW
Severity: blocking
Priority: P3
Component: kvm
Assignee: virtualization_kvm@kernel-bugs.osdl.org
Reporter: sonst+kernel@o-oberst.de
Regression: No
Hi all,
Today I updated to 6.4.10 on Arch Linux. This broke my setup, which runs nested
KVM virtualization inside a KVM VM. The problem seems to be related to the kernel
update rather than distribution specific, since others report the same issue on a
totally different setup:
https://forum.proxmox.com/threads/amd-incpetion-fixes-cause-qemu-kvm-memory-leak.132057/#post-581207
Issue:
1. Start a KVM VM ("hostVM") with 60GB memory assigned -> all works.
2. Within that hostVM I start a nestedVM with 5GB memory assigned.
3. Memory consumption of the qemu process within the hostVM grows beyond the
available memory. The nestedVM then gets OOM killed before it has even started,
after using more than the 60GB + swap.
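The growth described in step 3 can be watched from inside the hostVM. The following is a minimal sketch, assuming the qemu PID is known and that /proc/<pid>/status has the usual "VmRSS: <n> kB" line; the function names are hypothetical and not part of the original report.

```python
import re
import time
from pathlib import Path
from typing import Optional


def parse_vm_rss_kib(status_text: str) -> Optional[int]:
    """Return the VmRSS value (in KiB) from /proc/<pid>/status text, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def watch_rss(pid: int, interval_s: float = 1.0, samples: int = 10) -> None:
    """Print the resident set size of `pid` a fixed number of times."""
    status_path = Path(f"/proc/{pid}/status")
    for _ in range(samples):
        rss = parse_vm_rss_kib(status_path.read_text())
        print(f"pid {pid}: VmRSS = {rss} KiB")
        time.sleep(interval_s)
```

Calling watch_rss() with the PID of the qemu process while the nestedVM starts should show whether resident memory keeps climbing past the 5GB assigned to the nestedVM.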
I tried to set up fresh nestedVMs with no luck, same problem.
Reverting to an earlier kernel (6.4.7 on Arch Linux) lets everything work
again.
host kernel: 6.4.10-arch1 (this induces the problems, rest was unchanged)
hostVM kernel: 5.15.107+truenas
nestedVM kernel: 5.15.0-78-generic
Logs from the hostVM when OOM happens:
Aug 15 10:59:41 truenas kernel: CPU 0/KVM invoked oom-killer: gfp_mask=0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), order=0, oom_score_adj=0
Aug 15 10:59:42 truenas kernel: CPU: 9 PID: 7079 Comm: CPU 0/KVM Tainted: P OE 5.15.107+truenas #1
Aug 15 10:59:43 truenas kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 2/2/2022
Aug 15 10:59:43 truenas kernel: Call Trace:
Aug 15 10:59:43 truenas kernel: <TASK>
Aug 15 10:59:43 truenas kernel: dump_stack_lvl+0x46/0x5e
Aug 15 10:59:43 truenas kernel: dump_header+0x4a/0x1f4
Aug 15 10:59:43 truenas kernel: oom_kill_process.cold+0xb/0x10
Aug 15 10:59:43 truenas kernel: out_of_memory+0x1bd/0x4f0
Aug 15 10:59:43 truenas kernel: __alloc_pages_slowpath.constprop.0+0xc30/0xd00
Aug 15 10:59:44 truenas kernel: __alloc_pages+0x1e9/0x220
Aug 15 10:59:44 truenas kernel: __get_free_pages+0xd/0x40
Aug 15 10:59:44 truenas kernel: kvm_mmu_topup_memory_cache+0x56/0x80 [kvm]
Aug 15 10:59:44 truenas kernel: mmu_topup_memory_caches+0x39/0x70 [kvm]
Aug 15 10:59:44 truenas kernel: direct_page_fault+0x3d9/0xbb0 [kvm]
Aug 15 10:59:44 truenas kernel: ? kvm_mtrr_check_gfn_range_consistency+0x61/0x120 [kvm]
Aug 15 10:59:44 truenas kernel: kvm_mmu_page_fault+0x7a/0x730 [kvm]
Aug 15 10:59:44 truenas kernel: ? ktime_get+0x38/0xa0
Aug 15 10:59:44 truenas kernel: ? lock_timer_base+0x61/0x80
Aug 15 10:59:44 truenas kernel: ? __svm_vcpu_run+0x5f/0xf0 [kvm_amd]
Aug 15 10:59:44 truenas kernel: ? __svm_vcpu_run+0x59/0xf0 [kvm_amd]
Aug 15 10:59:44 truenas kernel: ? __svm_vcpu_run+0xaa/0xf0 [kvm_amd]
Aug 15 10:59:44 truenas kernel: ? load_fixmap_gdt+0x22/0x30
Aug 15 10:59:44 truenas kernel: ? native_load_tr_desc+0x67/0x70
Aug 15 10:59:44 truenas kernel: ? x86_virt_spec_ctrl+0x43/0xb0
Aug 15 10:59:44 truenas kernel: kvm_arch_vcpu_ioctl_run+0xbff/0x1750 [kvm]
Aug 15 10:59:44 truenas kernel: kvm_vcpu_ioctl+0x278/0x660 [kvm]
Aug 15 10:59:44 truenas kernel: ? __seccomp_filter+0x385/0x5c0
Aug 15 10:59:44 truenas kernel: __x64_sys_ioctl+0x8b/0xc0
Aug 15 10:59:44 truenas kernel: do_syscall_64+0x3b/0xc0
Aug 15 10:59:44 truenas kernel: entry_SYSCALL_64_after_hwframe+0x61/0xcb
Aug 15 10:59:44 truenas kernel: RIP: 0033:0x7f29eee166b7
Aug 15 10:59:45 truenas kernel: Code: Unable to access opcode bytes at RIP 0x7f29eee1668d.
Aug 15 10:59:45 truenas kernel: RSP: 002b:00007f27f35fd4c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Aug 15 10:59:45 truenas kernel: RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007f29eee166b7
Aug 15 10:59:45 truenas kernel: RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000015
Aug 15 10:59:45 truenas kernel: RBP: 00005558a87d3f00 R08: 00005558a7e52848 R09: 00005558a827c580
Aug 15 10:59:45 truenas kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
Aug 15 10:59:45 truenas kernel: R13: 00005558a8298bc0 R14: 00007f27f35fd780 R15: 0000000000802000
Aug 15 10:59:45 truenas kernel: </TASK>
Aug 15 10:59:45 truenas kernel: Mem-Info:
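To see which part of the stack requested the memory, the frames in a trace like the one above can be pulled out programmatically. This is a rough sketch, assuming each frame sits on a single journal line in the "symbol+0xoff/0xlen [module]" form shown in the log; the Frame type and parse_frames() are hypothetical helpers, not an existing tool.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Matches e.g. "? __svm_vcpu_run+0x5f/0xf0 [kvm_amd]"; a leading "?" marks
# a frame the unwinder considers unreliable.
FRAME_RE = re.compile(
    r"(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)


@dataclass
class Frame:
    symbol: str
    module: Optional[str]  # None for symbols built into the kernel image
    reliable: bool


def parse_frames(lines: Iterable[str]) -> List[Frame]:
    """Extract call-trace frames from kernel log lines, skipping other lines."""
    frames = []
    for line in lines:
        match = FRAME_RE.search(line)
        if match:
            unreliable, symbol, module = match.groups()
            frames.append(Frame(symbol, module, unreliable is None))
    return frames
```

Filtering the result for reliable frames whose module is "kvm" leaves the MMU page-fault path (kvm_mmu_page_fault, direct_page_fault, mmu_topup_memory_caches, kvm_mmu_topup_memory_cache), which is where the failing allocation originates in this trace.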
--
You may reply to this email to add a comment.
You are receiving this mail because:
You are watching the assignee of the bug.
* [Bug 217796] latest Zen Inception fixes breaks nested kvm virtualization on AMD
From: bugzilla-daemon @ 2023-08-15 17:53 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=217796
--- Comment #1 from sonst+kernel@o-oberst.de ---
Note, adding spec_rstack_overflow=off to the kernel command line makes the nested
VM boot properly again without problems:
https://bugs.archlinux.org/task/79384
So, spec_rstack_overflow=safe-ret is breaking nested KVM virtualization.
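Which setting a given host is actually running with can be checked from userspace. The sketch below assumes the kernel exposes the SRSO status at /sys/devices/system/cpu/vulnerabilities/spec_rstack_overflow (kernels carrying the Inception fixes should) and that the last spec_rstack_overflow= token on the command line is the one that takes effect; the function names are hypothetical.

```python
from pathlib import Path
from typing import Optional

# sysfs file reporting the active SRSO mitigation on kernels with the fixes
SYSFS_STATUS = Path("/sys/devices/system/cpu/vulnerabilities/spec_rstack_overflow")


def srso_cmdline_mode(cmdline: str) -> Optional[str]:
    """Return the value of the last spec_rstack_overflow= option, or None."""
    mode = None
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "spec_rstack_overflow" and sep:
            mode = value  # assumption: a later occurrence overrides earlier ones
    return mode


def srso_sysfs_status() -> Optional[str]:
    """Return the kernel's reported mitigation status, or None if unavailable."""
    if SYSFS_STATUS.exists():
        return SYSFS_STATUS.read_text().strip()
    return None
```

Passing the contents of /proc/cmdline to srso_cmdline_mode() returns None when the option is absent, in which case the kernel uses its default (safe-ret, as noted above).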
* [Bug 217796] latest Zen Inception fixes breaks nested kvm virtualization on AMD
From: bugzilla-daemon @ 2023-08-15 20:30 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=217796
Sean Christopherson (seanjc@google.com) changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |seanjc@google.com
--- Comment #2 from Sean Christopherson (seanjc@google.com) ---
This is going to sound completely ridiculous, but can you try the fix for the
guest RFLAGS corruption issue in the return thunk? It's definitely unlikely
that the _only_ symptom is an unexpected OOM, but it's theoretically possible,
e.g. if your setup only triggers KVM (bare metal host) emulation in a handful
of flows, and one of those flows just happens to send a single Jcc in the wrong
direction.
https://lore.kernel.org/all/20230811155255.250835-1-seanjc@google.com
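As a toy illustration of why one flipped flag is enough: a conditional jump such as JZ consults only the Zero Flag (bit 6 of RFLAGS), so if that bit is clobbered before the branch is evaluated, the emulated flow goes the other way. This is a simplified model and not KVM's emulator; which flag is affected and the branch labels are assumptions made for the example.

```python
ZF = 1 << 6  # Zero Flag: bit 6 of RFLAGS on x86


def jz_target(rflags: int, taken: str, not_taken: str) -> str:
    """Toy JZ: go to `taken` when ZF is set, otherwise fall through."""
    return taken if rflags & ZF else not_taken


# 0x246 has ZF set (plus IF, PF and the always-one bit 1).
guest_rflags = 0x246
# Assumed corruption for illustration: ZF cleared, everything else intact.
corrupted_rflags = guest_rflags & ~ZF

# Hypothetical labels standing in for the two sides of the branch.
print(jz_target(guest_rflags, "expected_path", "wrong_path"))
print(jz_target(corrupted_rflags, "expected_path", "wrong_path"))
```

With the intact flags the branch is taken; with ZF cleared the same instruction falls through, which is the kind of single misdirected Jcc described above.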
* [Bug 217796] latest Zen Inception fixes breaks nested kvm virtualization on AMD
From: bugzilla-daemon @ 2023-08-19 10:39 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=217796
--- Comment #3 from sonst+kernel@o-oberst.de ---
Sean,
it does sound ridiculous, but it isn't. I tested the fix you suggested and it
works now with that patch applied. In the meantime I switched to a different
machine to be able to test your fix, and I could also confirm the problem
there on a 6.4.11 kernel:
Test machine setup:
Gentoo, (vanilla) Kernel 6.4.11
Without the patch and with spec_rstack_overflow at its default, meaning
spec_rstack_overflow=safe-ret, my nested VMs do not start on this system either
and get OOM-killed.
I then applied the patch from your link, Sean, and it works now.
Cheers,
Oliver