From: Chris Friesen <chris.friesen@windriver.com>
To: <kvm@vger.kernel.org>, Paolo Bonzini <pbonzini@redhat.com>,
<rkrcmar@redhat.com>
Subject: hitting a BUG in kvm kernel module during live migration stress test
Date: Fri, 7 Oct 2016 17:22:46 -0600 [thread overview]
Message-ID: <57F82E46.7030006@windriver.com> (raw)
Hi,
I'm running an x86_64 CentOS 7 kernel (3.10.0-327.28.3.7, slightly modified). We have not modified the kvm module.
This machine is acting as an OpenStack compute node, using libvirt/qemu. While doing a live-migration stress test
(where I just repeatedly live migrate instances between two compute nodes) I can reliably trigger the BUG below.
I realize this is CentOS and not mainline, but I was wondering if anyone had seen something like this and fixed it
in mainline.
Based on the Oops, the value of %rdi is 0x1af08000, which is obviously not a valid kernel pointer for x86_64.
I *think* it's choking while dereferencing "_spte_" in the second line of the for_each_rmap_spte() macro, which
is called by __rmap_clear_dirty(). However, I don't have the context to know why this would be corrupted.
Anyone have any ideas?
The initial assembly in __rmap_clear_dirty() looks like this. The line at 0xffffffffa134b0ea is the one
corresponding to __rmap_clear_dirty+0x4a. (This is from a different run, so the addresses are slightly different
from the Oops below.)
0xffffffffa134b0a0: nopl 0x0(%rax,%rax,1)
0xffffffffa134b0a5: push %rbp
0xffffffffa134b0a6: mov %rsp,%rbp
0xffffffffa134b0a9: push %rbx
0xffffffffa134b0aa: sub $0x18,%rsp
0xffffffffa134b0ae: mov %gs:0x28,%rax
0xffffffffa134b0b7: mov %rax,-0x10(%rbp)
0xffffffffa134b0bb: xor %eax,%eax
0xffffffffa134b0bd: mov (%rsi),%rdi # rdi is *rmapp
0xffffffffa134b0c0: test %rdi,%rdi # "if (!rmap)" in rmap_get_first()
0xffffffffa134b0c3: je 0xffffffffa134b158
0xffffffffa134b0c9: test $0x1,%dil # "if (!(rmap & 1))" in rmap_get_first()
0xffffffffa134b0cd: je 0xffffffffa134b178
0xffffffffa134b0d3: and $0xfffffffffffffffe,%rdi # "iter->desc = (struct pte_list_desc *)(rmap & ~1ul);" in rmap_get_first()
0xffffffffa134b0d7: movl $0x0,-0x18(%rbp)
0xffffffffa134b0de: mov %rdi,-0x20(%rbp)
0xffffffffa134b0e2: mov (%rdi),%rdi # "return iter->desc->sptes[iter->pos];" in rmap_get_first()
0xffffffffa134b0e5: test %rdi,%rdi # bail out if _spte_ is NULL?
0xffffffffa134b0e8: je 0xffffffffa134b158
0xffffffffa134b0ea: mov (%rdi),%rdx # calculating *_spte_ in the 2nd line of for_each_rmap_spte(), I think. This chokes.
0xffffffffa134b0ed: test $0x1,%dl
0xffffffffa134b0f0: je 0xffffffffa134b14f
Here's the BUG/Oops:
2016-10-06T23:57:39.264 controller-1 kernel: alert [260915.368163] BUG: unable to handle kernel paging request at 000000001af08000
2016-10-06T23:57:39.264 controller-1 kernel: alert [260915.376067] IP: [<ffffffffa12cb0ea>] __rmap_clear_dirty+0x4a/0xf0 [kvm]
2016-10-06T23:57:39.264 controller-1 kernel: warning [260915.383590] PGD 0
2016-10-06T23:57:39.264 controller-1 kernel: warning [260915.385947] Oops: 0000 [#1] PREEMPT SMP
2016-10-06T23:57:39.264 controller-1 kernel: warning [260915.390461] Modules linked in: target_core_pscsi target_core_file target_core_iblock iscsi_target_mod target_core_mod dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio iptable_raw xt_CHECKSUM xt_connmark iptable_mangle nbd ebtable_filter ebtables igb_uio(OE) uio drbd(OE) libcrc32c nf_log_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_conntrack iptable_filter xt_nat xt_comment xt_multiport iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack virtio_net veth nfsv3 nfs fscache 8021q garp stp mrp llc cls_u32 sch_sfq sch_htb dm_mod iTCO_wdt iTCO_vendor_support ipmi_devintf intel_powerclamp coretemp kvm_intel kvm crc32_pclmul ghash_clmulni_intel aes
ni_intel glue_helper lrw gf128mul ablk_helper cryptd lpc_ich mfd_core mei_me mei i2c_i801 ipmi_si ipmi_msghandler acpi_power_meter wrs_avp(OE) nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ex
t4 jbd2 sd_mod crc_t10dif crct10dif_generic crct10dif_pclmul crct10dif_common crc32c_intel ixgbe(OE) ahci libahci libata igb i2c_algo_bit i2c_core dca i40e(OE) vxlan ip6_udp_tunnel udp_tunnel ptp pps_core
2016-10-06T23:57:39.264 controller-1 kernel: warning [260915.503652] CPU: 0 PID: 134880 Comm: qemu-kvm Tainted: G OE ------------ 3.10.0-327.28.3.7.tis.x86_64 #1
2016-10-06T23:57:39.264 controller-1 kernel: warning [260915.515605] Hardware name: Intel Corporation S2600WT2R/S2600WT2R, BIOS SE5C610.86B.01.01.0016.033120161139 03/31/2016
2016-10-06T23:57:39.264 controller-1 kernel: warning [260915.527559] task: ffff880334019420 ti: ffff8803540cc000 task.ti: ffff8803540cc000
2016-10-06T23:57:39.264 controller-1 kernel: warning [260915.536018] RIP: 0010:[<ffffffffa12cb0ea>] [<ffffffffa12cb0ea>] __rmap_clear_dirty+0x4a/0xf0 [kvm]
2016-10-06T23:57:39.264 controller-1 kernel: warning [260915.546264] RSP: 0018:ffff8803540cfbb8 EFLAGS: 00010206
2016-10-06T23:57:39.282 controller-1 kernel: warning [260915.552286] RAX: 0000000000000000 RBX: ffff8803572ac000 RCX: 0000000000000000
2016-10-06T23:57:39.282 controller-1 kernel: warning [260915.560357] RDX: 0000000000000ef6 RSI: ffffc9002bab1ea0 RDI: 000000001af08000
2016-10-06T23:57:39.282 controller-1 kernel: warning [260915.568432] RBP: ffff8803540cfbd8 R08: 0000000000000000 R09: 0000000000000001
2016-10-06T23:57:39.282 controller-1 kernel: warning [260915.568433] R10: 000000000000bbe8 R11: 0000000000000001 R12: 0000000000000000
2016-10-06T23:57:39.282 controller-1 kernel: warning [260915.568433] R13: ffffffffa12cb0a0 R14: ffff8803540cffd8 R15: 0000000000000000
2016-10-06T23:57:39.282 controller-1 kernel: warning [260915.568435] FS: 00007fb05e1ff700(0000) GS:ffff88103f200000(0000) knlGS:0000000000000000
2016-10-06T23:57:39.282 controller-1 kernel: warning [260915.568436] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2016-10-06T23:57:39.282 controller-1 kernel: warning [260915.568436] CR2: 000000001af08000 CR3: 0000000359e03000 CR4: 00000000003427f0
2016-10-06T23:57:39.282 controller-1 kernel: warning [260915.568437] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
2016-10-06T23:57:39.282 controller-1 kernel: warning [260915.568438] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
2016-10-06T23:57:39.282 controller-1 kernel: warning [260915.568439] Stack:
2016-10-06T23:57:39.282 controller-1 kernel: warning [260915.568446] 0000000000000000 ffff8803540cfbd8 000000004c224769 ffff8803572ac000
2016-10-06T23:57:39.282 controller-1 kernel: warning [260915.568452] ffff8803540cfc60 ffffffffa12ca9c8 000000008108f82a ffff8803540cfd60
2016-10-06T23:57:39.282 controller-1 kernel: warning [260915.568458] 0000000000000100 000000000001ffff 0000000100000001 00000000000038d4
2016-10-06T23:57:39.282 controller-1 kernel: warning [260915.568459] Call Trace:
2016-10-06T23:57:39.282 controller-1 kernel: warning [260915.568481] [<ffffffffa12ca9c8>] slot_handle_level_range+0x98/0x110 [kvm]
2016-10-06T23:57:39.282 controller-1 kernel: warning [260915.568495] [<ffffffffa12cbc04>] kvm_mmu_slot_leaf_clear_dirty+0x54/0x80 [kvm]
2016-10-06T23:57:39.282 controller-1 kernel: warning [260915.568502] [<ffffffffa14901b7>] vmx_slot_enable_log_dirty+0x17/0x30 [kvm_intel]
2016-10-06T23:57:39.282 controller-1 kernel: warning [260915.568515] [<ffffffffa12c7778>] kvm_arch_commit_memory_region+0x58/0xe0 [kvm]
2016-10-06T23:57:39.282 controller-1 kernel: warning [260915.568525] [<ffffffffa12af9eb>] __kvm_set_memory_region+0x60b/0x980 [kvm]
2016-10-06T23:57:39.283 controller-1 kernel: warning [260915.568536] [<ffffffffa12afd8b>] kvm_set_memory_region+0x2b/0x40 [kvm]
2016-10-06T23:57:39.283 controller-1 kernel: warning [260915.568547] [<ffffffffa12b02e4>] kvm_vm_ioctl+0x544/0x760 [kvm]
2016-10-06T23:57:39.283 controller-1 kernel: warning [260915.568553] [<ffffffff811b6295>] do_vfs_ioctl+0x2e5/0x4c0
2016-10-06T23:57:39.283 controller-1 kernel: warning [260915.568558] [<ffffffff8104b040>] ? __do_page_fault+0x1a0/0x3a0
2016-10-06T23:57:39.283 controller-1 kernel: warning [260915.568562] [<ffffffff811b6511>] SyS_ioctl+0xa1/0xc0
2016-10-06T23:57:39.283 controller-1 kernel: warning [260915.568566] [<ffffffff81678f89>] system_call_fastpath+0x16/0x1b
2016-10-06T23:57:39.283 controller-1 kernel: warning [260915.568588] Code: 3e 48 85 ff 0f 84 8f 00 00 00 40 f6 c7 01 0f 84 a5 00 00 00 48 83 e7 fe c7 45 e8 00 00 00 00 48 89 7d e0 48 8b 3f 48 85 ff 74 6e <48> 8b 17 f6 c2 01 74 5d 48 8b 05 37 ae 03 00 48 89 d1 31 db 48
2016-10-06T23:57:39.283 controller-1 kernel: alert [260915.568600] RIP [<ffffffffa12cb0ea>] __rmap_clear_dirty+0x4a/0xf0 [kvm]
2016-10-06T23:57:39.283 controller-1 kernel: warning [260915.568601] RSP <ffff8803540cfbb8>
2016-10-06T23:57:39.283 controller-1 kernel: warning [260915.568601] CR2: 000000001af08000
2016-10-06T23:57:39.284 controller-1 kernel: warning [260915.569212] ---[ end trace 7c95db025611e651 ]---
2016-10-06T23:57:39.284 controller-1 kernel: info [260915.627598] note: qemu-kvm[134880] exited with preempt_count 1
Thanks,
Chris
reply other threads:[~2016-10-07 23:22 UTC|newest]
Thread overview: [no followups] expand[flat|nested] mbox.gz Atom feed
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=57F82E46.7030006@windriver.com \
--to=chris.friesen@windriver.com \
--cc=kvm@vger.kernel.org \
--cc=pbonzini@redhat.com \
--cc=rkrcmar@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.