From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:60622)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <pbonzini@redhat.com>) id 1aRH4t-00040t-V0
	for qemu-devel@nongnu.org; Thu, 04 Feb 2016 05:23:56 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <pbonzini@redhat.com>) id 1aRH4q-0007cj-NE
	for qemu-devel@nongnu.org; Thu, 04 Feb 2016 05:23:55 -0500
Received: from mx1.redhat.com ([209.132.183.28]:57434)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <pbonzini@redhat.com>) id 1aRH4q-0007cf-HB
	for qemu-devel@nongnu.org; Thu, 04 Feb 2016 05:23:52 -0500
References: <56B2754B.7030809@redhat.com> <56B28B1C.7060202@redhat.com>
	<56B28E8B.1030107@redhat.com>
From: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <56B326B4.1020407@redhat.com>
Date: Thu, 4 Feb 2016 11:23:48 +0100
MIME-Version: 1.0
In-Reply-To: <56B28E8B.1030107@redhat.com>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] sda abort with virtio-scsi
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Jim Minter <jminter@redhat.com>, qemu-devel <qemu-devel@nongnu.org>, Hannes Reinecke <hare@suse.de>


On 04/02/2016 00:34, Jim Minter wrote:
> I was worried there was
> some way in which the contention could cause an abort and perhaps thence
> the lockup (which does not seem to recover when the host load goes down).

I don't know... It's not the most tested code, but it is not very
complicated either.

The points that can be established with certainty from the kernel messages are:
1) there was a cancellation request that took a long time, >20 seconds;
2) despite taking a long time, it _did_ recover sooner or later, because
otherwise you would not have gotten the lockup splat either.

Paolo

>> Firing the NMI watchdog is fixed in more recent QEMU, which has
>> asynchronous cancellation, assuming you're running RHEL's QEMU 1.5.3
>> (try /usr/libexec/qemu-kvm --version, or rpm -qf /usr/libexec/qemu-kvm).
> 
> /usr/libexec/qemu-kvm --version reports QEMU emulator version 1.5.3
> (qemu-kvm-1.5.3-105.el7_2.3)