From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:54891)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from )
	id 1UmMxM-0001bM-Eg
	for qemu-devel@nongnu.org; Tue, 11 Jun 2013 07:41:46 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from )
	id 1UmMxL-0003vd-Ag
	for qemu-devel@nongnu.org; Tue, 11 Jun 2013 07:41:44 -0400
Received: from cantor2.suse.de ([195.135.220.15]:43189 helo=mx2.suse.de)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from )
	id 1UmMxL-0003v4-4v
	for qemu-devel@nongnu.org; Tue, 11 Jun 2013 07:41:43 -0400
Message-ID: <51B70CF2.1020306@suse.de>
Date: Tue, 11 Jun 2013 13:41:38 +0200
From: Hannes Reinecke
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Subject: [Qemu-devel] virtio-scsi and error handling
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: Stefan Hajnoczi
Cc: Paolo Bonzini, Alexander Graf, "qemu-devel@nongnu.org"

Hi Stefan,

I'm currently playing around with improving SCSI EH, optimizing
command aborts and the like. Supposing it to be a nice testbed,
I tried to make things work with virtio_scsi.

However, looking at the code there, I've found that virtscsi_tmf()
just uses 'wait_for_completion', with no timeout specified.
So in effect any abort might stall forever.

Wouldn't it be more sensible to use 'wait_for_completion_timeout'
here, to allow the error escalation to continue?
This would be especially useful when running with multipathing,
as the underlying device might stall, and aio_cancel() doesn't work
reliably, if at all.

Also, I've found that there is no host reset. Currently the virtio
semantics seem to require reliable communication, i.e. for every
command sent there _has_ to be a response.
Long and painful experience with RAID HBAs has shown that this model
works okay for the lower-level escalations, but you absolutely need
a host reset to restore communication.
In the case of virtio I would think that a virtio-level reset for
host_reset would be a sensible idea.

Any opinions from your side?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)