From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:58609)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <jminter@redhat.com>) id 1aRK9v-0008Ug-NZ
	for qemu-devel@nongnu.org; Thu, 04 Feb 2016 08:41:20 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <jminter@redhat.com>) id 1aRK9r-0006Ot-Gr
	for qemu-devel@nongnu.org; Thu, 04 Feb 2016 08:41:19 -0500
Received: from mail-wm0-f50.google.com ([74.125.82.50]:38075)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <jminter@redhat.com>) id 1aRK9r-0006OX-Au
	for qemu-devel@nongnu.org; Thu, 04 Feb 2016 08:41:15 -0500
Received: by mail-wm0-f50.google.com with SMTP id p63so117708906wmp.1
	for <qemu-devel@nongnu.org>; Thu, 04 Feb 2016 05:41:15 -0800 (PST)
References: <56B2754B.7030809@redhat.com> <56B28B1C.7060202@redhat.com>
	<56B28E8B.1030107@redhat.com> <56B326B4.1020407@redhat.com>
From: Jim Minter <jminter@redhat.com>
Message-ID: <56B3550C.1010003@redhat.com>
Date: Thu, 4 Feb 2016 13:41:32 +0000
MIME-Version: 1.0
In-Reply-To: <56B326B4.1020407@redhat.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] sda abort with virtio-scsi
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Paolo Bonzini <pbonzini@redhat.com>, qemu-devel <qemu-devel@nongnu.org>, Hannes Reinecke <hare@suse.de>

FWIW, I've now done:

echo 300 >/sys/block/sda/device/timeout

I'm not entirely sure whether it will help, but so far I haven't had 
a recurrence.
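
In case it is useful to anyone else, here is a minimal sketch of the same 
change wrapped in a shell function that prints the value before and after. 
The function name set_scsi_timeout and the SYSFS_ROOT override are made up 
for illustration (the override only exists so the function can be tried 
against a scratch directory); the sysfs path is the one used above. The 
value is in seconds, and because it lives in sysfs it does not survive a 
reboot.

```shell
#!/bin/sh
# Sketch only: set the SCSI command timeout (seconds) for one block device.
# set_scsi_timeout and SYSFS_ROOT are hypothetical names, not part of any tool.
set_scsi_timeout() {
    dev=$1      # block device name, e.g. sda
    secs=$2     # new timeout in seconds, e.g. 300
    file="${SYSFS_ROOT:-/sys}/block/$dev/device/timeout"

    # Bail out if the attribute is missing or not writable (needs root on a real system).
    if [ ! -w "$file" ]; then
        echo "cannot write $file" >&2
        return 1
    fi

    echo "old timeout: $(cat "$file")"
    echo "$secs" > "$file"
    echo "new timeout: $(cat "$file")"
}
```

Run as root, "set_scsi_timeout sda 300" would be equivalent to the echo 
above, with the previous value printed first so it can be restored later.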

Cheers,

Jim

-- 
Jim Minter
Principal Solution Architect, Red Hat UK
e: jminter@redhat.com
m: +44 (0)7906 098697
cal: https://www.google.com/calendar/embed?src=jminter@redhat.com&ctz=Europe/London&mode=week

On 04/02/16 10:23, Paolo Bonzini wrote:
>
>
> On 04/02/2016 00:34, Jim Minter wrote:
>> I was worried there was
>> some way in which the contention could cause an abort and perhaps thence
>> the lockup (which does not seem to recover when the host load goes down).
>
> I don't know... It's not the most tested code, but it is not very
> complicated either.
>
> The certain points that can be extracted from the kernel messages are:
> 1) there was a cancellation request that took a long time, >20 seconds;
> 2) despite taking a long time, it _did_ recover sooner or later because
> otherwise you'd not have the lockup splat either.
>
> Paolo
>
>>> Firing the NMI watchdog is fixed in more recent QEMU, which has
>>> asynchronous cancellation, assuming you're running RHEL's QEMU 1.5.3
>>> (try /usr/libexec/qemu-kvm --version, or rpm -qf /usr/libexec/qemu-kvm).
>>
>> /usr/libexec/qemu-kvm --version reports QEMU emulator version 1.5.3
>> (qemu-kvm-1.5.3-105.el7_2.3)