* [Bug 196717] CPU: 0 PID: 5405 at arch/x86/kvm/mmu.c:717 mmu_spte_clear_track_bits+0xe7/0x100
2017-08-21 9:18 [Bug 196717] New: CPU: 0 PID: 5405 at arch/x86/kvm/mmu.c:717 mmu_spte_clear_track_bits+0xe7/0x100 bugzilla-daemon
@ 2017-08-22 10:20 ` bugzilla-daemon
2017-08-22 15:03 ` bugzilla-daemon
` (4 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: bugzilla-daemon @ 2017-08-22 10:20 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=196717
Dmitrii Tcvetkov (demfloro@demfloro.ru) changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |demfloro@demfloro.ru
--- Comment #1 from Dmitrii Tcvetkov (demfloro@demfloro.ru) ---
Created attachment 258043
--> https://bugzilla.kernel.org/attachment.cgi?id=258043&action=edit
demfloro's dmesg
I have the same issue on Linux 4.13-rc5. It is reproducible by starting
qemu-system-x86_64 and occurs while the guest OS (Arch Linux) is booting.
The guest hard drive is an LVM volume.
--
You are receiving this mail because:
You are watching the assignee of the bug.
* [Bug 196717] CPU: 0 PID: 5405 at arch/x86/kvm/mmu.c:717 mmu_spte_clear_track_bits+0xe7/0x100
From: bugzilla-daemon @ 2017-08-22 15:03 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=196717
Jeff Cook (jeff@jeffcook.io) changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |jeff@jeffcook.io
--- Comment #2 from Jeff Cook (jeff@jeffcook.io) ---
I've seen this on 4.12 and now on 4.13-rc6. On 4.12, the system-wide impact is
very significant: the guests and host slow down or stop responding
altogether. On 4.13-rc6, only one guest crashes and the rest of the system seems
to continue to operate as expected.
I initially get:
[68470.767034] ------------[ cut here ]------------
[68470.767064] WARNING: CPU: 30 PID: 239 at arch/x86/kvm/mmu.c:717 mmu_spte_clear_track_bits+0xf0/0x100 [kvm]
[...]
[68470.767237] CPU: 30 PID: 239 Comm: khugepaged Tainted: P O 4.13.0-rc6-g14ccee78fc82 #5
[68470.767246] Hardware name: Supermicro SYS-7038A-I/X10DAI, BIOS 2.0a 11/09/2016
[68470.767255] task: ffff88085b6bc9c0 task.stack: ffffc90006b80000
[68470.767267] RIP: 0010:mmu_spte_clear_track_bits+0xf0/0x100 [kvm]
[68470.767271] RSP: 0018:ffffc90006b83bd8 EFLAGS: 00010246
[68470.767275] RAX: 0000000000000000 RBX: 0000000117f13f77 RCX: dead0000000000ff
[68470.767279] RDX: 0000000000000000 RSI: ffff8802caff9140 RDI: ffffea00045fc4c0
[68470.767283] RBP: ffffc90006b83bf0 R08: 0000000000000001 R09: 0000000000000000
[68470.767287] R10: ffff8803ed9d0008 R11: ffff8803ed9d0000 R12: 0000000000117f13
[68470.767291] R13: ffff880401f10000 R14: ffffffffa07a91f0 R15: ffff8803ed9d0008
[68470.767296] FS: 0000000000000000(0000) GS:ffff88105d580000(0000) knlGS:0000000000000000
[68470.767302] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[68470.767306] CR2: 000000000d070004 CR3: 0000000001a09000 CR4: 00000000003426e0
[68470.767310] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[68470.767314] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[68470.767318] Call Trace:
[68470.767332] drop_spte+0x1a/0xb0 [kvm]
[68470.767342] kvm_zap_rmapp+0x3b/0x70 [kvm]
[68470.767352] kvm_unmap_rmapp+0xe/0x20 [kvm]
[68470.767361] kvm_handle_hva_range+0x139/0x1b0 [kvm]
[68470.767373] kvm_unmap_hva_range+0x17/0x20 [kvm]
[68470.767382] kvm_mmu_notifier_invalidate_range_start+0x52/0x90 [kvm]
[68470.767389] __mmu_notifier_invalidate_range_start+0x55/0x80
[68470.767395] khugepaged+0x1eb7/0x1ee0
[68470.767403] ? wait_woken+0x80/0x80
[68470.767408] kthread+0x125/0x140
[68470.767413] ? khugepaged_scan_abort.part.6+0x60/0x60
[68470.767417] ? kthread_create_on_node+0x70/0x70
[68470.767423] ret_from_fork+0x25/0x30
[68470.767427] Code: 5f 04 00 48 85 c0 75 1c 4c 89 e7 e8 9b 2d fe ff 48 8b 05 d4 5f 04 00 48 85 c0 74 be 48 85 c3 0f 95 c3 eb bc 48 85 c3 74 e7 eb dd <0f> ff eb 9b 4c 89 e7 e8 74 2d fe ff eb a1 66 90 0f 1f 44 00 00
[68470.767463] ---[ end trace 249e3dbfe7765567 ]---
[68470.767478] ------------[ cut here ]------------
Followed by many messages like this:
[68627.864783] BUG: Bad page state in process khugepaged pfn:126aa9
[68627.864795] page:ffffea00049aaa40 count:0 mapcount:0 mapping: (null) index:0x1
[68627.864802] flags: 0x17fff0000000014(referenced|dirty)
[68627.864808] raw: 017fff0000000014 0000000000000000 0000000000000001 00000000ffffffff
[68627.864814] raw: dead000000000100 dead000000000200 0000000000000000 0000000000000000
[68627.864819] page dumped because: PAGE_FLAGS_CHECK_AT_PREP flag set
[68627.864823] bad because of flags: 0x14(referenced|dirty)
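For readers decoding the flags field by hand, here is a minimal C sketch of how the low bits of 0x17fff0000000014 map to "referenced|dirty". It assumes the bit order at the start of enum pageflags in kernels of this era (locked, error, referenced, uptodate, dirty) and models only those five bits. The function name decode_low_page_flags is hypothetical, not a kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Names of the five lowest page-flag bits. ASSUMPTION: this is the ordering
 * at the start of enum pageflags in 4.12/4.13-era kernels; higher bits
 * (including the zone/node bits at the top of the word) are ignored. */
static const char *const low_flag_names[] = {
    "locked", "error", "referenced", "uptodate", "dirty",
};

/* Hypothetical helper: write a '|'-separated list of the low flags set in
 * `flags` into `buf` (capacity `len` bytes) and return `buf`. */
char *decode_low_page_flags(unsigned long long flags, char *buf, size_t len)
{
    size_t n = sizeof(low_flag_names) / sizeof(low_flag_names[0]);

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (size_t bit = 0; bit < n; bit++) {
        if (!(flags & (1ULL << bit)))
            continue;
        /* strncat always NUL-terminates; leave room for the terminator. */
        if (buf[0] != '\0')
            strncat(buf, "|", len - strlen(buf) - 1);
        strncat(buf, low_flag_names[bit], len - strlen(buf) - 1);
    }
    return buf;
}
```

Called with the value from the log, the helper reports bits 2 and 4, which matches the kernel's own "0x14(referenced|dirty)" annotation.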
Full dmesg on rc6 forthcoming.
* [Bug 196717] CPU: 0 PID: 5405 at arch/x86/kvm/mmu.c:717 mmu_spte_clear_track_bits+0xe7/0x100
From: bugzilla-daemon @ 2017-08-22 15:05 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=196717
--- Comment #3 from Jeff Cook (jeff@jeffcook.io) ---
Also, see potentially related thread at https://lkml.org/lkml/2016/12/13/667.
* [Bug 196717] CPU: 0 PID: 5405 at arch/x86/kvm/mmu.c:717 mmu_spte_clear_track_bits+0xe7/0x100
From: bugzilla-daemon @ 2017-08-22 15:09 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=196717
--- Comment #4 from Jeff Cook (jeff@jeffcook.io) ---
Created attachment 258053
--> https://bugzilla.kernel.org/attachment.cgi?id=258053&action=edit
full dmesg from boot
This is the dmesg from boot onward; it demonstrates the error condition. I've
run into this multiple times on 4.12.3, 4.12.5, 4.13-rc5+git and 4.13-rc6. It
does not seem to occur on 4.11. On 4.13-rc5+git and 4.13-rc6, the system impact
is greatly reduced (one QEMU guest remains hung/broken, instead of the entire
system).
I am using PCI passthrough, etc., as the dmesg shows.
* [Bug 196717] CPU: 0 PID: 5405 at arch/x86/kvm/mmu.c:717 mmu_spte_clear_track_bits+0xe7/0x100
From: bugzilla-daemon @ 2017-08-22 15:14 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=196717
--- Comment #5 from Jeff Cook (jeff@jeffcook.io) ---
More similarities:
1. This seems to affect my Arch Linux guest, as the OP also noted. Ubuntu 16.04
and Windows guests remain unharmed on 4.13-rc6. It does not occur on boot for
me, but may involve some interaction between KVM and later kernels.
2. I am hosting my VM out of an LVM thin volume.
Sorry to spam the bug.
* [Bug 196717] CPU: 0 PID: 5405 at arch/x86/kvm/mmu.c:717 mmu_spte_clear_track_bits+0xe7/0x100
From: bugzilla-daemon @ 2017-08-23 1:39 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=196717
--- Comment #6 from Jeff Cook (jeff@jeffcook.io) ---
After a little bit of digging, it seems that the patchset "KVM: nVMX: nested
EPT improvements and A/D bits, RDRAND and RDSEED exits" (described at
https://lkml.org/lkml/2017/3/8/586) is the likely origin point of this error.
This was first merged in 4.12, which is when I first started encountering this
error.
Something about the "accessed_dirty" flag causes the warning on line 717 of
arch/x86/kvm/mmu.c to trigger:
WARN_ON(!kvm_is_reserved_pfn(pfn) && !page_count(pfn_to_page(pfn)));
and then later, the system refuses to allocate the memory requested because
when checking if the page is safe to use, it encounters a dirty flag:
[94449.442437] BUG: Bad page state in process makepkg pfn:2a401a
[94449.442447] flags: 0x17fff0000000014(referenced|dirty)
[...]
[94449.442462] page dumped because: PAGE_FLAGS_CHECK_AT_PREP flag set
[94449.442465] bad because of flags: 0x14(referenced|dirty)
[...]
[94449.442615] Call Trace:
[...]
[94449.442907] bad_page+0xce/0x130
[94449.442912] check_new_page_bad+0x67/0x80
[94449.442916] get_page_from_freelist+0x979/0xad0
[...]
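To make the quoted WARN_ON condition easier to reason about, here is a small userspace model in C. It is only a sketch: struct page_model and spte_clear_would_warn are hypothetical names, and the two fields stand in for what kvm_is_reserved_pfn() and page_count() would report in the kernel. It does not reproduce how KVM actually looks up the page.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-in for the page state the check reads. */
struct page_model {
    bool reserved;  /* stands in for kvm_is_reserved_pfn(pfn) */
    int refcount;   /* stands in for page_count(pfn_to_page(pfn)) */
};

/* Mirrors the condition inside
 *   WARN_ON(!kvm_is_reserved_pfn(pfn) && !page_count(pfn_to_page(pfn)));
 * The warning fires when a non-reserved page still mapped by an SPTE
 * already has a reference count of zero. */
bool spte_clear_would_warn(const struct page_model *p)
{
    return !p->reserved && p->refcount == 0;
}
```

Under this model, a normal page with refcount 0 triggers the warning, while a reserved page or a page that still holds a reference does not. That is consistent with the "count:0" field in the "Bad page state" dumps earlier in this thread.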
As the diff for kvm/mmu.c between 4.11 and 4.12 is quite small, it seems likely
that this change is the origin point of the bug.
Perhaps something along the way has not been updated to account for this
flag? I would revert to test, but it appears that this patchset has grown
several dependents. If someone wants to suggest a series of commits to revert
or a patch to test, I am happy to try that.