From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:41614)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <jminter@redhat.com>) id 1aSs0h-0004Dl-7k
	for qemu-devel@nongnu.org; Mon, 08 Feb 2016 15:02:11 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <jminter@redhat.com>) id 1aSs0d-0004xT-SU
	for qemu-devel@nongnu.org; Mon, 08 Feb 2016 15:02:11 -0500
Received: from mail-wm0-f49.google.com ([74.125.82.49]:37627)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <jminter@redhat.com>) id 1aSs0d-0004xA-N1
	for qemu-devel@nongnu.org; Mon, 08 Feb 2016 15:02:07 -0500
Received: by mail-wm0-f49.google.com with SMTP id g62so132096615wme.0
	for <qemu-devel@nongnu.org>; Mon, 08 Feb 2016 12:02:07 -0800 (PST)
References: <56B2754B.7030809@redhat.com> <56B28B1C.7060202@redhat.com>
	<56B28E8B.1030107@redhat.com> <56B326B4.1020407@redhat.com>
	<56B3550C.1010003@redhat.com>
From: Jim Minter <jminter@redhat.com>
Message-ID: <56B8F438.9090907@redhat.com>
Date: Mon, 8 Feb 2016 20:02:00 +0000
MIME-Version: 1.0
In-Reply-To: <56B3550C.1010003@redhat.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] sda abort with virtio-scsi
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Paolo Bonzini <pbonzini@redhat.com>, qemu-devel <qemu-devel@nongnu.org>, Hannes Reinecke <hare@suse.de>

Again FWIW:

No recurrence of the SCSI abort notices since increasing the timeout, 
but I'm still getting guest userspace lockups.  Guest kernel logs show 
RCU "detected stalls" messages, with NMIs triggered across the CPUs.  
These consistently show CPU 2 sitting in the CFS scheduler via the timer 
interrupt, appearing to make some progress (i.e. RIP changes over time), 
while the other CPUs all sit idle.  Although the guest kernel keeps 
going and keeps logging these issues, none of the guest userspace 
processes make any progress at all over several hours.
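
For reference, a minimal sketch of how the timeout change could be 
wrapped so that the previous value is printed before it is overwritten.  
The function name set_scsi_timeout and its optional third argument (an 
alternative sysfs root, there only so the function can be tried against 
a scratch directory) are hypothetical; the sysfs path is the one from 
the earlier message quoted below.  Writing to the real file needs root, 
and the value does not persist across reboots.

```shell
# Hypothetical helper: set the SCSI command timeout (in seconds) for one
# block device and report the value before and after the change.
# Usage: set_scsi_timeout <device> <seconds> [sysfs-root]
set_scsi_timeout() {
    dev="$1"
    secs="$2"
    root="${3:-/sys/block}"   # assumption: overridable root, for dry runs
    f="$root/$dev/device/timeout"

    # Refuse to continue if the timeout attribute is missing or read-only.
    if [ ! -w "$f" ]; then
        echo "cannot write $f" >&2
        return 1
    fi

    echo "old timeout: $(cat "$f")s"
    echo "$secs" > "$f"
    echo "new timeout: $(cat "$f")s"
}
```

With that defined, "set_scsi_timeout sda 300" would do the same as the 
plain echo, just with the old value shown first.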

I'm upgrading to the QEMU version shipped with RHEV 
(qemu-kvm-rhev-2.3.0-31.el7_2.7) to see if that helps - so far so good.  
My best guess is that the RHEL 7 qemu 1.5.3 codebase is missing a bugfix 
that is present upstream and in the RHEV QEMU release.

Cheers,

Jim

On 04/02/16 13:41, Jim Minter wrote:
> FWIW, I've now done:
>
> echo 300 >/sys/block/sda/device/timeout
>
> Not entirely sure whether it would help or not, but so far I haven't had
> a recurrence.
>
> Cheers,
>
> Jim
>