From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:60622)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1aRH4t-00040t-V0 for qemu-devel@nongnu.org;
	Thu, 04 Feb 2016 05:23:56 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from ) id 1aRH4q-0007cj-NE for qemu-devel@nongnu.org;
	Thu, 04 Feb 2016 05:23:55 -0500
Received: from mx1.redhat.com ([209.132.183.28]:57434)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1aRH4q-0007cf-HB for qemu-devel@nongnu.org;
	Thu, 04 Feb 2016 05:23:52 -0500
References: <56B2754B.7030809@redhat.com> <56B28B1C.7060202@redhat.com>
	<56B28E8B.1030107@redhat.com>
From: Paolo Bonzini
Message-ID: <56B326B4.1020407@redhat.com>
Date: Thu, 4 Feb 2016 11:23:48 +0100
MIME-Version: 1.0
In-Reply-To: <56B28E8B.1030107@redhat.com>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] sda abort with virtio-scsi
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: Jim Minter , qemu-devel , Hannes Reinecke

On 04/02/2016 00:34, Jim Minter wrote:
> I was worried there was
> some way in which the contention could cause an abort and perhaps thence
> the lockup (which does not seem to recover when the host load goes down).

I don't know... It's not the most tested code, but it is not very
complicated either.

The certain points that can be extracted from the kernel messages are:
1) there was a cancellation request that took a long time, >20 seconds;
2) despite taking a long time, it _did_ recover sooner or later because
otherwise you'd not have the lockup splat either.

Paolo

>> Firing the NMI watchdog is fixed in more recent QEMU, which has
>> asynchronous cancellation, assuming you're running RHEL's QEMU 1.5.3
>> (try /usr/libexec/qemu-kvm --version, or rpm -qf /usr/libexec/qemu-kvm).
>
> /usr/libexec/qemu-kvm --version reports QEMU emulator version 1.5.3
> (qemu-kvm-1.5.3-105.el7_2.3)