From: Jim Minter
Date: Mon, 8 Feb 2016 20:02:00 +0000
Subject: Re: [Qemu-devel] sda abort with virtio-scsi
To: Paolo Bonzini, qemu-devel, Hannes Reinecke
Message-ID: <56B8F438.9090907@redhat.com>
In-Reply-To: <56B3550C.1010003@redhat.com>
References: <56B2754B.7030809@redhat.com> <56B28B1C.7060202@redhat.com> <56B28E8B.1030107@redhat.com> <56B326B4.1020407@redhat.com> <56B3550C.1010003@redhat.com>

Again FWIW:

No recurrence of the SCSI abort notices since increasing the timeout, but I'm still getting guest userspace lockups. The guest kernel logs show RCU "detected stalls" messages and NMIs being triggered across the CPUs. These consistently show CPU 2 in the CFS scheduler, entered via the timer interrupt and apparently making some progress (i.e. RIP changes over time), while the other CPUs all sit idle. Although the guest kernel keeps running and logging these issues, none of the guest userspace processes makes any progress at all over several hours.

I'm upgrading to the QEMU version shipped with RHEV (qemu-kvm-rhev-2.3.0-31.el7_2.7) to see if that helps - so far so good.
My best guess is that the RHEL 7 qemu 1.5.3 codebase is missing a bugfix that is present upstream and in the RHEV QEMU release.

Cheers,

Jim

On 04/02/16 13:41, Jim Minter wrote:
> FWIW, I've now done:
>
>    echo 300 >/sys/block/sda/device/timeout
>
> Not entirely sure whether it would help or not, but so far I haven't had
> a recurrence.
>
> Cheers,
>
> Jim
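
For anyone wanting to repeat the timeout change quoted above, here is a minimal sketch of it as a shell helper that prints the old value before writing the new one. The function name `set_scsi_timeout` and the `SYSFS_ROOT` override are hypothetical, introduced only for illustration and dry runs; the sysfs path itself is the one from the quoted command. The write needs root in the guest, and values written to sysfs do not survive a reboot.

```shell
#!/bin/sh
# Sketch only: show and raise the SCSI command timeout (seconds) for one disk.
# SYSFS_ROOT is a hypothetical override so the helper can be tried against a
# scratch directory; on a real guest it stays at /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

set_scsi_timeout() {
    dev="$1"     # block device name, e.g. sda
    secs="$2"    # new timeout in seconds, e.g. 300
    f="$SYSFS_ROOT/block/$dev/device/timeout"

    # Bail out if the attribute is missing or not writable (e.g. not root).
    [ -w "$f" ] || { echo "cannot write $f" >&2; return 1; }

    echo "old timeout: $(cat "$f")"
    echo "$secs" > "$f"
    echo "new timeout: $(cat "$f")"
}
```

Calling `set_scsi_timeout sda 300` as root should be equivalent to the `echo 300 >/sys/block/sda/device/timeout` above, with the previous value logged for reference.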