* [Bug 198205] New: infinite loop in vmx_vcpu_run on x86_64
@ 2017-12-19 18:37 bugzilla-daemon
From: bugzilla-daemon @ 2017-12-19 18:37 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=198205
Bug ID: 198205
Summary: infinite loop in vmx_vcpu_run on x86_64
Product: Virtualization
Version: unspecified
Kernel Version: 4.14.6
Hardware: x86-64
OS: Linux
Tree: Mainline
Status: NEW
Severity: high
Priority: P1
Component: kvm
Assignee: virtualization_kvm@kernel-bugs.osdl.org
Reporter: aeden@csail.mit.edu
Regression: No
When I try to ssh into VM, a thread in QEMU 2.10.1 blocks on
ioctl(cpu->kvm_fd, KVM_RUN, arg);
and I can see that a kernel thread is running at 100% constantly in htop(1).
The ssh command never succeeds, and all of my other ssh sessions in the VM
become unresponsive. The only thing I can do is reboot the VM via libvirt.
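For reference, here's a minimal sketch of that call site against the raw
KVM API. This is not QEMU's actual code: kvm_fd/vcpu_fd are assumed to
come from the usual open("/dev/kvm") / KVM_CREATE_VM / KVM_CREATE_VCPU
sequence, and the helper names are illustrative.

#include <stddef.h>
#include <linux/kvm.h>
#include <sys/ioctl.h>
#include <sys/mman.h>

/* Map the kvm_run page the kernel shares with userspace; it is
 * filled in whenever KVM_RUN returns with an exit reason. */
static struct kvm_run *attach_run_area(int kvm_fd, int vcpu_fd)
{
    long sz = ioctl(kvm_fd, KVM_GET_VCPU_MMAP_SIZE, 0);
    void *p = mmap(NULL, sz, PROT_READ | PROT_WRITE, MAP_SHARED,
                   vcpu_fd, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* The vCPU thread sits inside this ioctl while the guest runs;
 * on Intel the kernel re-enters the guest through vmx_vcpu_run
 * on every VM entry. A hung vCPU therefore looks like a thread
 * blocked in ioctl() while one host CPU spins at 100%. */
static int enter_guest(int vcpu_fd)
{
    return ioctl(vcpu_fd, KVM_RUN, 0);
}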
So far I've only seen this happen twice while running a VM on this kernel
version (4.14.6-1-ARCH) every day. This time, I got a kernel backtrace via
echo l > /proc/sysrq-trigger
To me, kvm_intel doesn't look like it's making progress. Here's the stack
trace taken from dmesg(1):
[96492.268071] sysrq: SysRq : Show backtrace of all active CPUs
[96492.268072] NMI backtrace for cpu 3
[96492.268073] CPU: 3 PID: 26664 Comm: bash Tainted: G C 4.14.6-1-ARCH #1
[96492.268074] Hardware name: Gigabyte Technology Co., Ltd. Z170X-Gaming 7/Z170X-Gaming 7, BIOS F22f 06/28/2017
[96492.268074] Call Trace:
[96492.268084] dump_stack+0x5c/0x85
[96492.268085] nmi_cpu_backtrace+0xbf/0xd0
[96492.268086] ? irq_force_complete_move+0x100/0x100
[96492.268087] nmi_trigger_cpumask_backtrace+0xea/0x130
[96492.268089] ? __handle_sysrq+0x83/0x140
[96492.268089] __handle_sysrq+0x83/0x140
[96492.268091] write_sysrq_trigger+0x2b/0x30
[96492.268092] proc_reg_write+0x3d/0x60
[96492.268093] __vfs_write+0x33/0x170
[96492.268095] ? set_close_on_exec+0x30/0x70
[96492.268096] vfs_write+0xad/0x1a0
[96492.268096] SyS_write+0x52/0xc0
[96492.268098] entry_SYSCALL_64_fastpath+0x1a/0xa5
[96492.268099] RIP: 0033:0x7fb7f123dab4
[96492.268099] RSP: 002b:00007fffe871cb18 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[96492.268100] RAX: ffffffffffffffda RBX: 0000000000000134 RCX: 00007fb7f123dab4
[96492.268100] RDX: 0000000000000002 RSI: 0000000002531260 RDI: 0000000000000001
[96492.268100] RBP: 00000000000000c8 R08: 000000000000000a R09: 000000000255ecb0
[96492.268101] R10: 000000000000000a R11: 0000000000000246 R12: 0000000000000000
[96492.268101] R13: 00007fffe871bf70 R14: 00007fffe871c100 R15: 00007fffe871bf70
[96492.268102] Sending NMI from CPU 3 to CPUs 0-2:
[96492.268105] NMI backtrace for cpu 0
[96492.268106] CPU: 0 PID: 1457 Comm: CPU 1/KVM Tainted: G C 4.14.6-1-ARCH #1
[96492.268106] Hardware name: Gigabyte Technology Co., Ltd. Z170X-Gaming 7/Z170X-Gaming 7, BIOS F22f 06/28/2017
[96492.268106] task: ffff984898c06ac0 task.stack: ffffb61b82734000
[96492.268109] RIP: 0010:vmx_complete_atomic_exit+0x77/0xd0 [kvm_intel]
[96492.268110] RSP: 0018:ffffb61b82737cf0 EFLAGS: 00000046
[96492.268110] RAX: 0000000080000202 RBX: 0000000000000000 RCX: ffff98493b9a8000
[96492.268111] RDX: 0000000000004404 RSI: 0000000000000001 RDI: ffff98493b9a8000
[96492.268111] RBP: ffff98493b9a8000 R08: 0000000000000000 R09: 0000000000000202
[96492.268111] R10: ffffffff9ad171b0 R11: ffffffff9b13c36d R12: 0000000080000200
[96492.268111] R13: 0000000080000202 R14: 0000000000000050 R15: 0000000000000000
[96492.268112] FS: 00007f18a3f63700(0000) GS:ffff98494ec00000(0000) knlGS:0000000000000000
[96492.268112] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[96492.268112] CR2: 000000000001d001 CR3: 00000003f7c5f003 CR4: 00000000003626f0
[96492.268113] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[96492.268113] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[96492.268113] Call Trace:
[96492.268116] vmx_vcpu_run+0x305/0x4a0 [kvm_intel]
[96492.268131] ? kvm_arch_vcpu_ioctl_run+0x538/0x1630 [kvm]
[96492.268136] ? kvm_arch_vcpu_load+0x69/0x230 [kvm]
[96492.268142] ? load_fixmap_gdt+0x30/0x40
[96492.268143] ? __vmx_load_host_state.part.84+0x120/0x1f0 [kvm_intel]
[96492.268148] ? kvm_arch_vcpu_load+0x84/0x230 [kvm]
[96492.268152] ? kvm_vcpu_ioctl+0x27e/0x5e0 [kvm]
[96492.268156] ? kvm_vcpu_ioctl+0x27e/0x5e0 [kvm]
[96492.268157] ? do_futex+0x429/0xa90
[96492.268158] ? do_vfs_ioctl+0xa1/0x610
[96492.268159] ? __fget+0x49/0xb0
[96492.268160] ? SyS_ioctl+0x74/0x80
[96492.268161] ? exit_to_usermode_loop+0x94/0xa0
[96492.268162] ? entry_SYSCALL_64_fastpath+0x1a/0xa5
[96492.268162] Code: 74 66 41 81 e4 00 07 00 80 66 83 fb 29 74 45 41 81 fd 12 03 00 80 74 3c 41 81 fc 00 02 00 80 75 af 48 89 ef e8 bb a7 d0 ff cd 02 <5b> 48 89 ef 5d 41 5c 41 5d e9 bb a7 d0 ff 66 83 fb 29 c7 87 8c
[96492.268170] NMI backtrace for cpu 1
[96492.268171] CPU: 1 PID: 14521 Comm: Xorg Tainted: G C 4.14.6-1-ARCH #1
[96492.268171] Hardware name: Gigabyte Technology Co., Ltd. Z170X-Gaming 7/Z170X-Gaming 7, BIOS F22f 06/28/2017
[96492.268172] task: ffff98470c54db80 task.stack: ffffb61b81908000
[96492.268186] RIP: 0010:fw_domains_get+0x11e/0x1d0 [i915]
[96492.268187] RSP: 0018:ffff98494ec83ec0 EFLAGS: 00000086
[96492.268187] RAX: 000057c25b5d24fb RBX: ffff98493bc88000 RCX: 000000000000001f
[96492.268187] RDX: 0000000000000000 RSI: fffffffa093f71b2 RDI: 000000001fefa611
[96492.268188] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000001
[96492.268188] R10: ffff98494ec83ea0 R11: ffffb61b8190baf8 R12: ffff98493bc88728
[96492.268188] R13: 000057c25b5bf754 R14: 0000000000000000 R15: 00000000ffffffff
[96492.268189] FS: 00007fc598d14940(0000) GS:ffff98494ec80000(0000) knlGS:0000000000000000
[96492.268189] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[96492.268189] CR2: 00007f5ea7ef93b8 CR3: 000000042d4c8002 CR4: 00000000003626e0
[96492.268190] Call Trace:
[96492.268191] <IRQ>
[96492.268201] intel_uncore_forcewake_get.part.8+0x41/0xa0 [i915]
[96492.268210] intel_lrc_irq_handler+0x2c/0x370 [i915]
[96492.268211] ? hrtimer_wakeup+0x1e/0x30
[96492.268212] ? ktime_get+0x3b/0x90
[96492.268213] tasklet_hi_action+0x59/0x110
[96492.268214] __do_softirq+0xd9/0x2da
[96492.268215] do_softirq_own_stack+0x2a/0x40
[96492.268216] </IRQ>
[96492.268216] do_softirq.part.17+0x49/0x50
[96492.268217] __local_bh_enable_ip+0x67/0x70
[96492.268225] i915_gem_do_execbuffer+0x794/0x10e0 [i915]
[96492.268227] ? ___slab_alloc+0xf3/0x4b0
[96492.268228] ? ___slab_alloc+0xf3/0x4b0
[96492.268228] ? kmem_cache_free+0x1d1/0x1f0
[96492.268235] ? i915_gem_execbuffer2+0x5d/0x390 [i915]
[96492.268242] ? i915_gem_execbuffer2+0x5d/0x390 [i915]
[96492.268249] i915_gem_execbuffer2+0x1b7/0x390 [i915]
[96492.268255] ? i915_gem_execbuffer+0x2d0/0x2d0 [i915]
[96492.268265] drm_ioctl_kernel+0x59/0xb0 [drm]
[96492.268269] drm_ioctl+0x2d5/0x370 [drm]
[96492.268276] ? i915_gem_execbuffer+0x2d0/0x2d0 [i915]
[96492.268276] ? hrtimer_start_range_ns+0x1b3/0x330
[96492.268278] do_vfs_ioctl+0xa1/0x610
[96492.268279] ? __sys_recvmsg+0x4e/0x90
[96492.268279] ? __sys_recvmsg+0x7d/0x90
[96492.268280] SyS_ioctl+0x74/0x80
[96492.268281] entry_SYSCALL_64_fastpath+0x1a/0xa5
[96492.268281] RIP: 0033:0x7fc596b11337
[96492.268282] RSP: 002b:00007fff81f349e8 EFLAGS: 00003246 ORIG_RAX: 0000000000000010
[96492.268282] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fc596b11337
[96492.268282] RDX: 00007fff81f34a20 RSI: 0000000040406469 RDI: 000000000000000d
[96492.268283] RBP: 000055c3b68ed610 R08: 0000000000000000 R09: 00007fc598ddeee0
[96492.268283] R10: 0000000000003fb0 R11: 0000000000003246 R12: 000055c3b62eab10
[96492.268283] R13: 000055c3b65806c0 R14: 000000000000001d R15: 0000000000000000
[96492.268284] Code: 89 c5 eb 0d 4c 29 e8 48 3d 7f f0 fa 02 77 77 f3 90 65 8b 3d 7d 74 51 3f e8 40 98 5b df 41 8b 54 24 54 48 03 93 90 06 00 00 8b 12 <83> e2 01 74 d4 45 85 f6 75 95 8b 44 24 04 09 83 18 07 00 00 48
[96492.268292] NMI backtrace for cpu 2 skipped: idling at pc 0xffffffffa0678b05
Here's the QEMU stack trace after attaching with gdb:
#0 0x00007f18b08f0337 in ioctl () at /usr/lib/libc.so.6
#1 0x000055f01bb9589b in kvm_vcpu_ioctl ()
#2 0x000055f01bb959e2 in kvm_cpu_exec ()
#3 0x000055f01bb72fa5 in ()
#4 0x00007f18b0bc208a in start_thread () at /usr/lib/libpthread.so.0
#5 0x00007f18b08f942f in clone () at /usr/lib/libc.so.6
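Frames #1 and #2 are QEMU's vCPU thread loop, which re-enters the guest
until KVM reports an exit reason userspace must handle. A simplified
sketch of that kind of loop follows (not QEMU's actual kvm_cpu_exec; the
exit handling shown is an illustrative assumption):

#include <linux/kvm.h>
#include <sys/ioctl.h>

int vcpu_exec_loop(int vcpu_fd, struct kvm_run *run)
{
    for (;;) {
        /* In this bug the ioctl apparently never returns: the
         * vCPU keeps VM-entering and VM-exiting inside the
         * kernel, so the dispatch below is never reached. */
        if (ioctl(vcpu_fd, KVM_RUN, 0) < 0)
            return -1;

        switch (run->exit_reason) {
        case KVM_EXIT_IO:
        case KVM_EXIT_MMIO:
            break;              /* emulate the access, re-enter */
        case KVM_EXIT_HLT:
        case KVM_EXIT_SHUTDOWN:
            return 0;
        default:
            return -1;
        }
    }
}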
* [Bug 198205] infinite loop in vmx_vcpu_run on x86_64
@ 2017-12-19 20:05 bugzilla-daemon
From: bugzilla-daemon @ 2017-12-19 20:05 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=198205
--- Comment #1 from Anthony L. Eden (aeden@csail.mit.edu) ---
Erratum in my earlier description: htop(1) showed a QEMU thread (not a
kernel thread) using 100% CPU continuously.
* [Bug 198205] infinite loop in vmx_vcpu_run on x86_64
@ 2018-01-07 13:54 bugzilla-daemon
From: bugzilla-daemon @ 2018-01-07 13:54 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=198205
--- Comment #2 from Mark Mielke (mark.mielke@gmail.com) ---
I believe I have the same symptoms as you.
I run 4 virtual machines on a desktop Intel i7-4790K (Haswell) that acts as my
server at home. After upgrading from Fedora 26 to Fedora 27, I started to
encounter symptoms like these:
1) Guests locked up when the host was under load and the otherwise idle guests
were woken up (SSH, yes, that has been a trigger for me too, but sometimes just
virt-top and the act of observing seemed to do it).
2) Often when one guest was affected, one or more of the other three would
follow in succession, usually when they were woken up as well.
3) The guests lock up solid. I couldn't get a kernel backtrace in the
guests, and I don't know how to send a SysRq key through virt-manager or
virt-viewer to confirm.
I think I can make it happen easily within minutes on Linux 4.13, Linux
4.14, and Linux 4.15-rc6 (I just tried 4.15-rc6 last night, and it still
happens). I was not able to get a kernel backtrace. I also tried QEMU 2.9.1,
QEMU 2.10.1, and QEMU 2.11.0.
I have reverted to Linux 4.12.14, and I have been unable to make any of the
guests lock up so far. I previously had a stable setup with Linux 4.12.14
and QEMU 2.9.1. We'll see if it stays stable with Linux 4.12.14 and
QEMU 2.11.0.
Please let me know what I can do to help identify the cause of this problem and
get it fixed. Thank you.
* [Bug 198205] infinite loop in vmx_vcpu_run on x86_64
@ 2018-01-07 14:13 bugzilla-daemon
From: bugzilla-daemon @ 2018-01-07 14:13 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=198205
--- Comment #3 from Mark Mielke (mark.mielke@gmail.com) ---
I should add some other clarifications:
1) The guests usually don't lock up on an idle host. To make it fail, I
normally need to busy the network and storage. This has the side effect of
busying the CPU, but I haven't seen CPU load as the primary trigger.
2) The guests that lock up include both a CentOS 7 guest and Fedora 27 guests.
The Fedora 27 guests had both Linux 4.13 and Linux 4.14; they are currently
running Linux 4.14.11, and they locked up. The CentOS 7 guest is running
Linux 3.10.0-514.21.1.el7.x86_64. There is no particular order to the failures;
the CentOS 7 guest seems to fail with the same probability and triggers as the
Fedora 27 guests.
3) The host uses Open vSwitch 2.8.1, and the guests are attached to an OVS
bridge. vhost-net is active. The host also uses LVM thin volumes on Intel SSD
750 NVMe drives as backing store for the guest volumes.
4) Force Reset, or Force Off followed by Force On, usually clears it for a
particular guest for a time. The guest can also be suspended and resumed. I
did play with setting up libvirt to run "qemu -s" and attaching a remote GDB
session to the guests, but this was too new to me to make complete sense of. I
was able to see that it was spinning inside the guest (not on the host), but I
wasn't able to see what it was doing in the guest, or why.
* [Bug 198205] infinite loop in vmx_vcpu_run on x86_64
@ 2018-01-11 21:16 bugzilla-daemon
From: bugzilla-daemon @ 2018-01-11 21:16 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=198205
Mike Marshall (hubcap@omnibond.com) changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |hubcap@omnibond.com
--- Comment #4 from Mike Marshall (hubcap@omnibond.com) ---
I have a 3rd generation X1 Carbon. It had Fedora 25 on
it until the other day; I figured I'd better upgrade to
26 or 27, since those are the ones with the Meltdown updates.
I had been using qemu-kvm VMs on it; now when I try to
virt-install one, virt-viewer is unresponsive when the VM turns on.
When I open Virtual Machine Manager, I can see that the
CPU is pegged at 100% in the not-yet-installed running VM.
Anywho... it seems the same with Fedora 27. I've been looking
around for what to do, with no luck, and then I found this infinite-loop
thread.
I have a 1st generation ThinkPad as well; it has Fedora 26 on it,
and VMs work great there... perhaps there's some combination of
hardware on the 3rd generation one that is the problem...
-Mike
Thread overview: 5+ messages
2017-12-19 18:37 [Bug 198205] New: infinite loop in vmx_vcpu_run on x86_64 bugzilla-daemon
2017-12-19 20:05 ` [Bug 198205] " bugzilla-daemon
2018-01-07 13:54 ` bugzilla-daemon
2018-01-07 14:13 ` bugzilla-daemon
2018-01-11 21:16 ` bugzilla-daemon