From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:58609)
  by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
  id 1aRK9v-0008Ug-NZ for qemu-devel@nongnu.org;
  Thu, 04 Feb 2016 08:41:20 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
  (envelope-from ) id 1aRK9r-0006Ot-Gr for qemu-devel@nongnu.org;
  Thu, 04 Feb 2016 08:41:19 -0500
Received: from mail-wm0-f50.google.com ([74.125.82.50]:38075)
  by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
  id 1aRK9r-0006OX-Au for qemu-devel@nongnu.org;
  Thu, 04 Feb 2016 08:41:15 -0500
Received: by mail-wm0-f50.google.com with SMTP id p63so117708906wmp.1
  for ; Thu, 04 Feb 2016 05:41:15 -0800 (PST)
References: <56B2754B.7030809@redhat.com> <56B28B1C.7060202@redhat.com>
  <56B28E8B.1030107@redhat.com> <56B326B4.1020407@redhat.com>
From: Jim Minter
Message-ID: <56B3550C.1010003@redhat.com>
Date: Thu, 4 Feb 2016 13:41:32 +0000
MIME-Version: 1.0
In-Reply-To: <56B326B4.1020407@redhat.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] sda abort with virtio-scsi
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: Paolo Bonzini , qemu-devel , Hannes Reinecke

FWIW, I've now done:

echo 300 >/sys/block/sda/device/timeout

Not entirely sure whether it would help or not, but so far I haven't had
a recurrence.

Cheers,

Jim

--
Jim Minter
Principal Solution Architect, Red Hat UK
e: jminter@redhat.com
m: +44 (0)7906 098697
cal: https://www.google.com/calendar/embed?src=jminter@redhat.com&ctz=Europe/London&mode=week

On 04/02/16 10:23, Paolo Bonzini wrote:
>
>
> On 04/02/2016 00:34, Jim Minter wrote:
>> I was worried there was
>> some way in which the contention could cause an abort and perhaps thence
>> the lockup (which does not seem to recover when the host load goes down).
>
> I don't know... It's not the most tested code, but it is not very
> complicated either.
>
> The certain points that can be extracted from the kernel messages are:
> 1) there was a cancellation request that took a long time, >20 seconds;
> 2) despite taking a long time, it _did_ recover sooner or later because
> otherwise you'd not have the lockup splat either.
>
> Paolo
>
>>> Firing the NMI watchdog is fixed in more recent QEMU, which has
>>> asynchronous cancellation, assuming you're running RHEL's QEMU 1.5.3
>>> (try /usr/libexec/qemu-kvm --version, or rpm -qf /usr/libexec/qemu-kvm).
>>
>> /usr/libexec/qemu-kvm --version reports QEMU emulator version 1.5.3
>> (qemu-kvm-1.5.3-105.el7_2.3)
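
For anyone wanting to repeat the timeout change mentioned at the top of this message, here is a minimal sketch of checking the current value before raising it. The function name set_scsi_timeout and the SYSFS_ROOT override are hypothetical, introduced only for illustration (the override lets the function be tried against a scratch directory rather than the real /sys); the only thing taken from the message itself is the /sys/block/sda/device/timeout path and the value 300.

```shell
#!/bin/sh
# Sketch only: show, and optionally change, the SCSI command timeout of a disk.
# set_scsi_timeout and SYSFS_ROOT are hypothetical names, not from the thread.
set_scsi_timeout() {
    dev=$1    # block device name, e.g. sda
    new=$2    # optional new timeout in seconds, e.g. 300
    file="${SYSFS_ROOT:-/sys}/block/$dev/device/timeout"

    # Bail out if the device does not expose a timeout attribute.
    if [ ! -f "$file" ]; then
        echo "no timeout attribute for $dev at $file" >&2
        return 1
    fi

    echo "$dev: current timeout $(cat "$file")s"

    # Only write when a new value was supplied.
    if [ -n "$new" ]; then
        echo "$new" > "$file" || return 1
        echo "$dev: timeout now $(cat "$file")s"
    fi
}
```

Run as root inside the guest, "set_scsi_timeout sda 300" has the same effect as the echo above. The value lives in sysfs, so it does not survive a reboot and would need to be reapplied at boot.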