* [Bug 209079] CPU 0/KVM: page allocation failure on 5.8 kernel
2020-08-30 15:22 [Bug 209079] New: CPU 0/KVM: page allocation failure on 5.8 kernel bugzilla-daemon
@ 2020-09-09 6:00 ` bugzilla-daemon
2020-09-09 6:41 ` bugzilla-daemon
` (5 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: bugzilla-daemon @ 2020-09-09 6:00 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=209079
Wanpeng Li (wanpeng.li@hotmail.com) changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |wanpeng.li@hotmail.com
--- Comment #1 from Wanpeng Li (wanpeng.li@hotmail.com) ---
It would be appreciated if you could bisect.
--
You are receiving this mail because:
You are watching the assignee of the bug.
* [Bug 209079] CPU 0/KVM: page allocation failure on 5.8 kernel
From: bugzilla-daemon @ 2020-09-09 6:41 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=209079
Sean Christopherson (sean.j.christopherson@intel.com) changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |sean.j.christopherson@intel.com
--- Comment #2 from Sean Christopherson (sean.j.christopherson@intel.com) ---
Are you disabling NPT (via KVM module param)? You're obviously running a
64-bit kernel, and presumably that CPU supports NPT, so the only way KVM should
reach the failing allocation is if NPT is being explicitly disabled. There's
nothing wrong with using shadow paging, it's just uncommon these days.
NPT aside, the interesting part of the failing allocation is that it uses
GFP_DMA32. I did a quick test to force that allocation on my system and
nothing exploded. Odds are good the bug is outside of KVM, which means a
bisection is probably necessary.
* [Bug 209079] CPU 0/KVM: page allocation failure on 5.8 kernel
From: bugzilla-daemon @ 2020-09-10 15:33 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=209079
Martin Schrodt (kernel@martin.schrodt.org) changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution|--- |OBSOLETE
--- Comment #3 from Martin Schrodt (kernel@martin.schrodt.org) ---
Damn.
I made some changes to the VM in the last few days to make it support AVIC, and
that made me change the kvm module parameters without remembering what they
were before. They are now:
> options kvm ignore_msrs=1 report_ignored_msrs=0
> options kvm_amd nested=0 avic=1 npt=1
After Sean's post mentioning that NPT has to be disabled for the bug to occur, I
updated the kernel again (to 5.8.7), and voilà, the VM works.
So I have to concur that it really was disabled before, but I can't remember
why I did so, maybe because of some bug that only existed when I set up the VM
sometime in 2018.
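One way to avoid guessing which parameters a loaded module is actually using is to read them back from sysfs. The following is a minimal Python sketch, assuming the usual /sys/module/<module>/parameters layout; the helper name read_module_params and its sysfs_root argument are made up for illustration and are not part of any tool mentioned in this thread.

```python
from pathlib import Path


def read_module_params(module, sysfs_root="/sys/module"):
    """Return {parameter: value} for a loaded module, or {} if it is not loaded."""
    params_dir = Path(sysfs_root) / module / "parameters"
    if not params_dir.is_dir():
        return {}
    result = {}
    for entry in sorted(params_dir.iterdir()):
        try:
            result[entry.name] = entry.read_text().strip()
        except OSError:
            # Some parameters are not readable by the current user; skip them.
            continue
    return result


if __name__ == "__main__":
    # Print the effective kvm_amd parameters (for example npt, avic, nested).
    for name, value in read_module_params("kvm_amd").items():
        print(f"{name}={value}")
```

Saving this output before and after editing the modprobe options would give a record of what changed.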
Regarding GFP_DMA32, I don't know what it really means. Might be related to me
passing through a GPU, an NVME drive and a USB controller to the VM.
So I guess I'll leave learning how to bisect to my next incident...
Thank you guys for all the work you do - Linux forever!
* [Bug 209079] CPU 0/KVM: page allocation failure on 5.8 kernel
From: bugzilla-daemon @ 2020-09-10 16:21 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=209079
--- Comment #4 from Sean Christopherson (sean.j.christopherson@intel.com) ---
GFP_DMA32 is a flag that forces a memory allocation to use physical memory that
is 32-bit addressable, i.e. below the 4 GiB boundary. Using GFP_DMA32 is
relatively uncommon, e.g. KVM uses that flag if and only if KVM is using or
shadowing 32-bit PAE paging. The latter case (shadowing) is what is triggered
if NPT is disabled.
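To make the "32-bit addressable" constraint concrete, here is a small Python sketch of the address check only. It is not how the kernel's zone allocator works internally; the function name fits_below_4g and the 4096-byte page size are assumptions chosen for illustration.

```python
FOUR_GIB = 1 << 32   # first physical address that no longer fits in 32 bits
PAGE_SIZE = 4096     # assumed x86 base page size


def fits_below_4g(phys_addr, size=PAGE_SIZE):
    """Return True if [phys_addr, phys_addr + size) lies entirely below 4 GiB."""
    return phys_addr >= 0 and size > 0 and phys_addr + size <= FOUR_GIB


if __name__ == "__main__":
    last_page = FOUR_GIB - PAGE_SIZE
    print(fits_below_4g(last_page))   # last page below the boundary
    print(fits_below_4g(FOUR_GIB))    # first page above the boundary
```

An allocation made with GFP_DMA32 has to come from memory for which a check like this holds, which is why it can fail even when plenty of higher memory is free.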
Can you try running with "kvm_amd nested=0 avic=1 npt=0" and/or "kvm_amd
nested=0 npt=0" on v5.8.7? I'd like to at least confirm that whatever was
breaking your setup was fixed between v5.8.0 and v5.8.7, even if we don't
bisect to identify exactly what patch fixed the bug.
* [Bug 209079] CPU 0/KVM: page allocation failure on 5.8 kernel
From: bugzilla-daemon @ 2020-09-10 21:03 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=209079
--- Comment #5 from Martin Schrodt (kernel@martin.schrodt.org) ---
Strange things happen sometimes...
What I did (I only unloaded and reloaded the module after config changes, hoping
this would suffice):
- running with "kvm_amd nested=0 avic=1 npt=0" and "kvm_amd nested=0 npt=0" on
5.8.7, all working fine.
- rolling back to the 5.8.5 kernel I had the bug with, and trying the above
combinations -> working fine
- rolling the VM back to a state before changing it to AVIC (reasonably sure
it's the same) -> working fine, on both 5.8.7 and 5.8.5.
Heisenbugs, here they come.
Trying to come up with things that I changed since then but did not roll back
yet:
I have a qemu hook, which did the following:
1) drop caches,
2) compact memory
3) create a cpuset for the host and move all tasks there to free the cores
assigned to the VM (which included a flag for memory migration, so that the
processes would have their memory moved to the non VM node)
4) then let qemu allocate memory
Since then I changed this to run the compaction step after the move step (my
thought was that *after* moving the memory from node 1 to node 0, there is more
free space on node 1, so compaction should yield better results).
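The two orderings of the hook described above can be sketched as follows. This is a hedged Python illustration, not the actual hook: the drop_caches and compact_memory sysctl paths are the standard ones, but the cpuset step is left as a placeholder because the thread does not say how the cpuset is created, and the names hook_steps and run are hypothetical.

```python
# Each step is (description, sysctl path or None, value to write or None).
DROP_CACHES = ("drop caches", "/proc/sys/vm/drop_caches", "3")
COMPACT = ("compact memory", "/proc/sys/vm/compact_memory", "1")
# Placeholder: creating the host cpuset and migrating tasks depends on the
# cgroup setup, which is not described in the thread.
MOVE_TASKS = ("move host tasks", None, None)


def hook_steps(compact_after_move=True):
    """Return the hook steps in the new order (default) or the old order."""
    if compact_after_move:
        return [DROP_CACHES, MOVE_TASKS, COMPACT]
    return [DROP_CACHES, COMPACT, MOVE_TASKS]


def run(steps, dry_run=True):
    """Execute the steps; writing the sysctls requires root, so dry_run is the default."""
    for description, path, value in steps:
        if dry_run or path is None:
            print(f"[not executed] {description}")
            continue
        with open(path, "w") as sysctl:
            sysctl.write(value)


if __name__ == "__main__":
    run(hook_steps(compact_after_move=True))
```

After these steps, qemu allocates the guest memory as in step 4.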
Does the error I initially got say anything about *why* the allocation failed?
* [Bug 209079] CPU 0/KVM: page allocation failure on 5.8 kernel
From: bugzilla-daemon @ 2020-09-11 16:19 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=209079
--- Comment #6 from Sean Christopherson (sean.j.christopherson@intel.com) ---
Nope, the failure path is common so we can't even glean anything from the
offsets in the stack trace.
In your data dump, both nodes show 10 GB+ of free memory, so there's plenty of
space for the measly 4 KiB that KVM is trying to allocate. My best guess is that
the combination of nodemask/cpuset settings resulted in a set of constraints that
was impossible to satisfy.
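As a toy illustration of how such constraints could be unsatisfiable despite plenty of free memory, consider the Python sketch below. It is only a model of the guess above, not the kernel's allocator, and it does not claim to show what actually happened on the reporter's machine. The node layout, the page counts, and the assumption that only node 0 has memory below 4 GiB are all invented for the example.

```python
# Hypothetical two-node machine: free pages per zone on each node.
# Assumption: all memory below 4 GiB belongs to node 0.
EXAMPLE_NODES = {
    0: {"DMA32": 500_000, "Normal": 2_500_000},
    1: {"Normal": 3_000_000},
}

# Zones from which a GFP_DMA32-style request may be served in this model.
DMA32_ZONES = {"DMA", "DMA32"}


def can_allocate(nodes, allowed_nodes, allowed_zones, pages=1):
    """Return True if some allowed node has an allowed zone with enough free pages."""
    for node_id, zones in nodes.items():
        if node_id not in allowed_nodes:
            continue
        for zone, free_pages in zones.items():
            if zone in allowed_zones and free_pages >= pages:
                return True
    return False


if __name__ == "__main__":
    # Task confined to node 1: lots of free memory, but none of it below 4 GiB.
    print(can_allocate(EXAMPLE_NODES, {1}, DMA32_ZONES))
    # Task allowed on both nodes: node 0 can serve the request.
    print(can_allocate(EXAMPLE_NODES, {0, 1}, DMA32_ZONES))
```

In this model a single-page request fails when the task is restricted to node 1, even though that node has millions of free pages.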
At this point, I'd say just chalk it up to a bad configuration unless you want
to pursue this further. If there's a kernel bug lurking, then odds are someone
will run into it again.
* [Bug 209079] CPU 0/KVM: page allocation failure on 5.8 kernel
From: bugzilla-daemon @ 2020-09-20 9:17 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=209079
--- Comment #7 from Martin Schrodt (kernel@martin.schrodt.org) ---
Fully agree. Thanks for your assistance!