From: bugzilla-daemon@bugzilla.kernel.org
To: kvm@vger.kernel.org
Subject: [Bug 207389] New: Regression in nested SVM from 5.7-rc1, starting L2 guest locks up L1
Date: Tue, 21 Apr 2020 13:23:33 +0000
Message-ID: <bug-207389-28872@https.bugzilla.kernel.org/>
https://bugzilla.kernel.org/show_bug.cgi?id=207389
Bug ID: 207389
Summary: Regression in nested SVM from 5.7-rc1, starting L2 guest locks up L1
Product: Virtualization
Version: unspecified
Kernel Version: 5.7-rc1
Hardware: x86-64
OS: Linux
Tree: Mainline
Status: NEW
Severity: normal
Priority: P1
Component: kvm
Assignee: virtualization_kvm@kernel-bugs.osdl.org
Reporter: s.reiter@proxmox.com
Regression: No
With 5.7-rc1 and later on L0 (also tested on 5.7-rc2), starting an L2 guest
from an L1 (running kernel 5.4.30) almost immediately hangs the L1 with CPU
lockups (example below; the stack traces in the guest differ every time, but
always seem to end at load_new_mm_cr3 or some TLB flushing). The host sees
nothing in its logs; killing the VM and restarting it works.
I've successfully bisected the bug to this commit:
b518ba9fa691 KVM: nSVM: implement check_nested_events for interrupts
Consistently reproducible on my system (AMD 3960X) with L0/L1 as described
above. The choice of L2 guest doesn't matter; the lockup usually already
happens while L2 is still in SeaBIOS.
# grep -H '' /sys/module/kvm/parameters/*
/sys/module/kvm/parameters/enable_vmware_backdoor:N
/sys/module/kvm/parameters/force_emulation_prefix:N
/sys/module/kvm/parameters/halt_poll_ns:0
/sys/module/kvm/parameters/halt_poll_ns_grow:2
/sys/module/kvm/parameters/halt_poll_ns_grow_start:10000
/sys/module/kvm/parameters/halt_poll_ns_shrink:0
/sys/module/kvm/parameters/ignore_msrs:Y
/sys/module/kvm/parameters/kvmclock_periodic_sync:Y
/sys/module/kvm/parameters/lapic_timer_advance_ns:-1
/sys/module/kvm/parameters/min_timer_period_us:200
/sys/module/kvm/parameters/mmu_audit:N
/sys/module/kvm/parameters/nx_huge_pages:N
/sys/module/kvm/parameters/nx_huge_pages_recovery_ratio:60
/sys/module/kvm/parameters/pi_inject_timer:1
/sys/module/kvm/parameters/report_ignored_msrs:N
/sys/module/kvm/parameters/tsc_tolerance_ppm:250
/sys/module/kvm/parameters/vector_hashing:Y
# grep -H '' /sys/module/kvm_amd/parameters/*
/sys/module/kvm_amd/parameters/avic:1
# (Note: Disabling AVIC doesn't change the outcome; I tried this after seeing
# that the commit was interrupt-related.)
/sys/module/kvm_amd/parameters/dump_invalid_vmcb:N
/sys/module/kvm_amd/parameters/nested:1
/sys/module/kvm_amd/parameters/npt:1
/sys/module/kvm_amd/parameters/nrips:1
/sys/module/kvm_amd/parameters/pause_filter_count:3000
/sys/module/kvm_amd/parameters/pause_filter_count_grow:2
/sys/module/kvm_amd/parameters/pause_filter_count_max:65535
/sys/module/kvm_amd/parameters/pause_filter_count_shrink:0
/sys/module/kvm_amd/parameters/pause_filter_thresh:128
/sys/module/kvm_amd/parameters/sev:0
/sys/module/kvm_amd/parameters/vgif:1
/sys/module/kvm_amd/parameters/vls:1
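For anyone trying to compare their setup against the parameters above, here is
a minimal sketch that prints the parameters of several modules in one go, in
the same "path:value" form as the grep output. The function name
dump_module_params and the SYSFS_MODULE_DIR override are hypothetical and only
for illustration; they are not part of the original report or of any kernel
tooling.

```shell
# Hypothetical helper, not part of the original report.
# Prints each readable parameter of the named modules as "path:value".
dump_module_params() {
    # SYSFS_MODULE_DIR is an assumed override (handy for testing);
    # it defaults to the real sysfs location.
    base="${SYSFS_MODULE_DIR:-/sys/module}"
    for mod in "$@"; do
        dir="$base/$mod/parameters"
        if [ ! -d "$dir" ]; then
            echo "$mod: no parameters directory (module not loaded?)" >&2
            continue
        fi
        for f in "$dir"/*; do
            # Skip parameters the current user cannot read.
            if [ -r "$f" ]; then
                printf '%s:%s\n' "$f" "$(cat "$f")"
            fi
        done
    done
    return 0
}

# Usage: dump_module_params kvm kvm_amd
```

Run once on L0 and once inside L1, then diff the two outputs to spot settings
that differ between the levels.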
Log example from L1 (the traces of two CPUs, 3 and 6, are interleaved):
[ 228.112063] watchdog: BUG: soft lockup - CPU#6 stuck for 22s! [kvm:4676]
[ 228.112190] CPU: 3 PID: 1811 Comm: kworker/3:5 Tainted: P OE
5.4.30-1-pve #1
[ 228.112771] Modules linked in: veth(E) ceph(E) libceph(E) fscache(E)
ebtable_filter(E) ebtables(E) ip_set(E) ip6table_raw(E) iptable_raw(E)
ip6table_filter(E) ip6_tables(E) sctp(E) iptable_filter(E) bpfilter(E)
softdog(E) nfnetlink_log(E) nfnetlink(E) kvm_amd(E) ccp(E) kvm(E)
crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) qxl(E)
aesni_intel(E) ttm(E) crypto_simd(E) cryptd(E) drm_kms_helper(E) input_leds(E)
glue_helper(E) joydev(E) pcspkr(E) serio_raw(E) drm(E) fb_sys_fops(E)
syscopyarea(E) sysfillrect(E) sysimgblt(E) qemu_fw_cfg(E) mac_hid(E)
vhost_net(E) vhost(E) tap(E) ib_iser(E) rdma_cm(E) iw_cm(E) ib_cm(E) ib_core(E)
sunrpc(E) iscsi_tcp(E) libiscsi_tcp(E) libiscsi(E) scsi_transport_iscsi(E)
virtio_rng(E) ip_tables(E) x_tables(E) autofs4(E) zfs(POE) zunicode(POE)
zlua(POE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) btrfs(E) xor(E)
zstd_compress(E) raid6_pq(E) libcrc32c(E) hid_generic(E) usbhid(E) hid(E)
ahci(E) libahci(E) psmouse(E) i2c_i801(E) virtio_net(E)
[ 228.112801] net_failover(E)
[ 228.113333] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0
02/06/2015
[ 228.117425] lpc_ich(E) virtio_scsi(E) failover(E)
[ 228.118061] Workqueue: events drm_fb_helper_dirty_work [drm_kms_helper]
[ 228.118686] CPU: 6 PID: 4676 Comm: kvm Tainted: P OE
5.4.30-1-pve #1
[ 228.119321] RIP: 0010:smp_call_function_many+0x200/0x270
[ 228.119942] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0
02/06/2015
[ 228.119947] RIP: 0010:smp_call_function_single+0xe4/0x110
[ 228.120583] Code: c7 e8 b4 aa 98 00 3b 05 02 16 70 01 0f 83 9a fe ff ff 48
63 d0 48 8b 0b 48 03 0c d5 20 69 05 b2 8b 51 18 83 e2 01 74 0a f3 90 <8b> 51 18
83 e2 01 75 f6 eb c7 48 c7 c2 a0 ca 44 b2 4c 89 fe 89 df
[ 228.121209] Code: 8b 4c 24 38 65 48 33 0c 25 28 00 00 00 75 42 c9 c3 48 89
d1 48 89 f2 48 89 e6 e8 57 fe ff ff 8b 54 24 18 83 e2 01 74 0b f3 90 <8b> 54 24
18 83 e2 01 75 f5 eb ca 8b 05 db 67 a4 01 85 c0 0f 85 72
[ 228.121845] RSP: 0018:ffffbb7d4e6ebca0 EFLAGS: 00000202 ORIG_RAX:
ffffffffffffff13
[ 228.122563] RSP: 0018:ffffbb7d43977ba0 EFLAGS: 00000202 ORIG_RAX:
ffffffffffffff13
[ 228.123792] RAX: 0000000000000000 RBX: ffff96806faebcc0 RCX:
ffff96806fa320a0
[ 228.123793] RDX: 0000000000000001 RSI: 0000000000000000 RDI:
ffff96806dc29df8
[ 228.125042] RAX: 0000000000000000 RBX: 0000000000000006 RCX:
0000000000000830
[ 228.125718] RBP: ffffbb7d4e6ebcd8 R08: 0000000000000006 R09:
000000000000003b
[ 228.125719] R10: 0000000000000000 R11: 0000000000000008 R12:
ffffffffb0c87700
[ 228.126383] RDX: 0000000000000001 RSI: 00000000000000fb RDI:
0000000000000830
[ 228.126384] RBP: ffffbb7d43977bf0 R08: 0000000000000000 R09:
ffff96806dc07b80
[ 228.127057] R13: 0000000000000000 R14: 0000000000000001 R15:
0000000000000008
[ 228.127060] FS: 0000000000000000(0000) GS:ffff96806fac0000(0000)
knlGS:0000000000000000
[ 228.127732] R10: ffff96806d157e90 R11: 0000000000000000 R12:
ffffffffb0c87ef0
[ 228.128401] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 228.129074] R13: ffff96806fbaa680 R14: 0000000000000001 R15:
ffff96806d157e90
[ 228.129748] CR2: 0000558be4e47000 CR3: 000000044ac44000 CR4:
0000000000340ee0
[ 228.129751] Call Trace:
[ 228.130428] FS: 00007fb5a05fe700(0000) GS:ffff96806fb80000(0000)
knlGS:0000000000000000
[ 228.131102] ? load_new_mm_cr3+0xe0/0xe0
[ 228.131773] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 228.131774] CR2: 00005575d4137378 CR3: 00000003e6056000 CR4:
0000000000340ee0
[ 228.132459] on_each_cpu+0x2d/0x60
[ 228.133155] Call Trace:
[ 228.133849] flush_tlb_kernel_range+0x38/0x90
[ 228.134553] ? flush_tlb_func_common.constprop.9+0x230/0x230
[ 228.135249] __purge_vmap_area_lazy+0x7c/0x6d0
[ 228.135939] ? cpumask_next_and+0x1e/0x20
[ 228.136614] ? vunmap_page_range+0x208/0x390
[ 228.136615] free_vmap_area_noflush+0xe1/0xf0
[ 228.137283] smp_call_function_many+0x235/0x270
[ 228.137950] remove_vm_area+0x95/0xa0
[ 228.138614] ? flush_tlb_func_common.constprop.9+0x230/0x230
[ 228.139255] iounmap+0x86/0xb0
[ 228.139259] ttm_bo_kunmap+0x4a/0xd0 [ttm]
[ 228.139895] on_each_cpu_mask+0x28/0x70
[ 228.140522] qxl_bo_kunmap+0x31/0x40 [qxl]
[ 228.141141] ? x86_configure_nx+0x50/0x50
[ 228.141752] qxl_gem_prime_vunmap+0xe/0x10 [qxl]
[ 228.142359] on_each_cpu_cond_mask+0xab/0x140
[ 228.142963] drm_gem_vunmap+0x43/0x50 [drm]
[ 228.143538] ? flush_tlb_func_common.constprop.9+0x230/0x230
[ 228.144115] drm_client_buffer_vunmap+0x1a/0x30 [drm]
[ 228.144671] native_flush_tlb_others+0x62/0x180
[ 228.145223] drm_fb_helper_dirty_work+0x17e/0x190 [drm_kms_helper]
[ 228.145760] kvm_flush_tlb_others+0x8c/0xa0
[ 228.146287] process_one_work+0x20f/0x3d0
[ 228.146794] flush_tlb_mm_range+0xb1/0xe0
[ 228.147295] worker_thread+0x34/0x400
[ 228.147789] tlb_flush_mmu+0xb7/0x140
[ 228.148274] kthread+0x120/0x140
[ 228.148755] tlb_finish_mmu+0x41/0x80
[ 228.149231] ? process_one_work+0x3d0/0x3d0
[ 228.149232] ? kthread_park+0x90/0x90
[ 228.149713] zap_page_range+0x154/0x180
[ 228.150189] ret_from_fork+0x22/0x40
[ 228.150653] ? find_vma+0x1b/0x70
[ 228.158814] __do_sys_madvise+0x82c/0xaa0
[ 228.159410] ? __schedule+0x2ee/0x6f0
[ 228.159999] __x64_sys_madvise+0x1a/0x20
[ 228.160581] ? __x64_sys_madvise+0x1a/0x20
[ 228.161167] do_syscall_64+0x57/0x190
[ 228.161731] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 228.162294] RIP: 0033:0x7fb62e6c32d7
[ 228.162853] Code: ff ff ff ff c3 48 8b 15 b7 6b 0c 00 f7 d8 64 89 02 b8 ff
ff ff ff eb bc 66 2e 0f 1f 84 00 00 00 00 00 90 b8 1c 00 00 00 0f 05 <48> 3d 01
f0 ff ff 73 01 c3 48 8b 0d 89 6b 0c 00 f7 d8 64 89 01 48
[ 228.164049] RSP: 002b:00007fb5a05f8688 EFLAGS: 00000246 ORIG_RAX:
000000000000001c
[ 228.164654] RAX: ffffffffffffffda RBX: 00007fb623665b4c RCX:
00007fb62e6c32d7
[ 228.165254] RDX: 0000000000000004 RSI: 0000000000001000 RDI:
00007fb5d748a000
[ 228.165849] RBP: 0000000000001000 R08: 00007fb5a05faa40 R09:
0000000000001000
[ 228.166437] R10: 0000000000001000 R11: 0000000000000246 R12:
00000000ffffffff
[ 228.167025] R13: 00007fb5d748a000 R14: 0000000035c8a000 R15:
00000000ffffffff