From: Paolo Bonzini <pbonzini@redhat.com>
To: Hannes Reinecke <hare@suse.de>, Jim Minter <jminter@redhat.com>,
qemu-devel <qemu-devel@nongnu.org>
Subject: Re: [Qemu-devel] sda abort with virtio-scsi
Date: Thu, 4 Feb 2016 12:27:00 +0100
Message-ID: <56B33584.6070405@redhat.com>
In-Reply-To: <56B2F6EE.6030205@suse.de>

On 04/02/2016 07:59, Hannes Reinecke wrote:
> On 02/04/2016 12:19 AM, Paolo Bonzini wrote:
>>
>>
>> On 03/02/2016 22:46, Jim Minter wrote:
>>> I am hitting the following VM lockup issue running a VM with latest
>>> RHEL7 kernel on a host also running latest RHEL7 kernel. FWIW I'm using
>>> virtio-scsi because I want to use discard=unmap. I ran the VM as follows:
>>>
>>> /usr/libexec/qemu-kvm -nodefaults \
>>> -cpu host \
>>> -smp 4 \
>>> -m 8192 \
>>> -drive discard=unmap,file=vm.qcow2,id=disk1,if=none,cache=unsafe \
>>> -device virtio-scsi-pci \
>>> -device scsi-disk,drive=disk1 \
>>> -netdev bridge,id=net0,br=br0 \
>>> -device virtio-net-pci,netdev=net0,mac=$(utils/random-mac.py) \
>>> -chardev socket,id=chan0,path=/tmp/rhev.sock,server,nowait \
>>> -chardev socket,id=chan1,path=/tmp/qemu.sock,server,nowait \
>>> -monitor unix:tmp/vm.sock,server,nowait \
>>> -device virtio-serial-pci \
>>> -device virtserialport,chardev=chan0,name=com.redhat.rhevm.vdsm \
>>> -device virtserialport,chardev=chan1,name=org.qemu.guest_agent.0 \
>>> -device cirrus-vga \
>>> -vnc none \
>>> -usbdevice tablet
>>>
>>> The host was busyish at the time, but not excessively (IMO). Nothing
>>> untoward in the host's kernel log; host storage subsystem is fine. I
>>> didn't get any qemu logs this time around, but I will when the issue
>>> next recurs. The VM's full kernel log is attached; here are the
>>> highlights:
>>
>> Hannes, were you going to send a patch to disable time outs?
>>
> Rah. Didn't I do it already?
> Seems like I didn't; will be doing so shortly.
>
>>>
>>> INFO: rcu_sched detected stalls on CPUs/tasks: { 3} (detected by 2, t=60002 jiffies, g=5253, c=5252, q=0)
>>> sending NMI to all CPUs:
>>> NMI backtrace for cpu 1
>>> CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.10.0-327.4.5.el7.x86_64 #1
>>> Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
>>> task: ffff88023417d080 ti: ffff8802341a4000 task.ti: ffff8802341a4000
>>> RIP: 0010:[<ffffffff81058e96>] [<ffffffff81058e96>] native_safe_halt+0x6/0x10
>>> RSP: 0018:ffff8802341a7e98 EFLAGS: 00000286
>>> RAX: 00000000ffffffed RBX: ffff8802341a4000 RCX: 0100000000000000
>>> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000046
>>> RBP: ffff8802341a7e98 R08: 0000000000000000 R09: 0000000000000000
>>> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
>>> R13: ffff8802341a4000 R14: ffff8802341a4000 R15: 0000000000000000
>>> FS: 0000000000000000(0000) GS:ffff88023fc80000(0000) knlGS:0000000000000000
>>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> CR2: 00007f4978587008 CR3: 000000003645e000 CR4: 00000000003407e0
>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>> Stack:
>>> ffff8802341a7eb8 ffffffff8101dbcf ffff8802341a4000 ffffffff81a68260
>>> ffff8802341a7ec8 ffffffff8101e4d6 ffff8802341a7f20 ffffffff810d62e5
>>> ffff8802341a7fd8 ffff8802341a4000 2581685d70de192c 7ba58fdb3a3bc8d4
>>> Call Trace:
>>> [<ffffffff8101dbcf>] default_idle+0x1f/0xc0
>>> [<ffffffff8101e4d6>] arch_cpu_idle+0x26/0x30
>>> [<ffffffff810d62e5>] cpu_startup_entry+0x245/0x290
>>> [<ffffffff810475fa>] start_secondary+0x1ba/0x230
>>> Code: 00 00 00 00 00 55 48 89 e5 fa 5d c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb 5d c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb f4 <5d> c3 0f 1f 84 00 00 00 00 00 55 48 89 e5 f4 5d c3 66 0f 1f 84
>>> NMI backtrace for cpu 0
>>
>> This is the NMI watchdog firing; the CPU got stuck for 20 seconds. The
>> issue was not a busy host, but a busy storage (could it be a network
>> partition if the disk was hosted on NFS???)
>>
>> Firing the NMI watchdog is fixed in more recent QEMU, which has
>> asynchronous cancellation, assuming you're running RHEL's QEMU 1.5.3
>> (try /usr/libexec/qemu-kvm --version, or rpm -qf /usr/libexec/qemu-kvm).
>>
> Actually, you still cannot do _real_ async cancellation of I/O; the
> linux aio subsystem implements io_cancel(), but the cancellation
> just aborts the (internal) waitqueue element, not the I/O itself.

Right, but at least the TMF (task management function) is handled
asynchronously. A synchronous TMF keeps the VCPUs inside QEMU for many
seconds, and that is what causes the watchdog to fire.
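
To make the difference concrete, here is a toy model in Python. It is
only a sketch, not QEMU code: the function names, the 0.3 s delay that
stands in for a stuck request, and the use of threads are all
assumptions made for illustration.

```python
import threading
import time

# Hypothetical stand-in for the "many seconds" a stuck request needs to drain.
IO_DRAIN_SECONDS = 0.3


def drain_stuck_io(done):
    """Wait for the in-flight request to finish, then signal completion."""
    time.sleep(IO_DRAIN_SECONDS)
    done.set()


def synchronous_tmf():
    """The calling (vCPU) thread waits itself; returns the time it was blocked."""
    start = time.monotonic()
    done = threading.Event()
    drain_stuck_io(done)  # the caller is stuck here for the whole drain
    return time.monotonic() - start


def asynchronous_tmf():
    """Hand the wait to another thread; the caller returns right away.

    Returns (time the caller was blocked, event set once the request drains).
    """
    start = time.monotonic()
    done = threading.Event()
    threading.Thread(target=drain_stuck_io, args=(done,), daemon=True).start()
    return time.monotonic() - start, done


if __name__ == "__main__":
    print("sync  blocked for %.3f s" % synchronous_tmf())
    blocked, done = asynchronous_tmf()
    print("async blocked for %.3f s" % blocked)
    done.wait()  # completion is reported later, off the caller's path
```

In this model synchronous_tmf() holds the caller for the whole drain
time, while asynchronous_tmf() returns almost immediately and the event
is set later. In neither case is the underlying I/O actually aborted,
which matches Hannes's point about io_cancel().
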
Paolo