From: Paolo Bonzini <pbonzini@redhat.com>
To: Stefan Hajnoczi <stefanha@gmail.com>
Cc: "qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	Asias He <asias@redhat.com>, Hannes Reinecke <hare@suse.de>,
	Alexander Graf <agraf@suse.de>
Subject: Re: [Qemu-devel] virtio-scsi and error handling
Date: Wed, 12 Jun 2013 16:19:14 -0400
Message-ID: <51B8D7C2.3040907@redhat.com>
In-Reply-To: <20130612075620.GD946@stefanha-thinkpad.muc.redhat.com>

On 12/06/2013 03:56, Stefan Hajnoczi wrote:
> On Tue, Jun 11, 2013 at 01:41:38PM +0200, Hannes Reinecke wrote:
>> I'm currently playing around with improving SCSI EH, optimizing
>> command aborts and the like.
>>
>> And, supposing it to be a nice testbed, tried to make things work
>> with virtio_scsi.
>>
>> However, looking at the code there I've found virtscsi_tmf() just
>> uses 'wait_for_completion', with no timeout specified. So in effect
>> any abort might stall forever.
>>
>> Wouldn't it be more sensible to use 'wait_for_completion_timeout'
>> here, to allow the error escalation to continue?
>> This would especially be useful when running with multipathing,
>> as the underlying device might stall, and aio_cancel() doesn't work
>> reliably, if at all.
> 
> Hi,
> I agree that we need a timeout.  bdrv_aio_cancel() is not guaranteed to
> complete in bounded time.

I also agree that we need a timeout, but note that a host reset might
also fail to complete in bounded time if I/O doesn't terminate in the host.

Last time I checked, the io_cancel system call was basically a no-op (for
aio=native), and for aio=threads the worker might stay in D state
(uninterruptible sleep) for an unbounded time too.

Paolo

>> Also I've found that there is no host reset. Currently the virtio
>> semantics seem to require reliable communication, i.e. for every
>> command sent there _has_ to be a response.
>>
>> Long and painful experience with RAID HBAs has shown that this model
>> works okay for the lower-level escalations, but you absolutely need
>> a host reset to restore communication.
>> In the case of virtio I would think that a virtio-level reset for
>> host_reset would be a sensible idea.
> 
> One thing to watch out for is that a virtio-scsi reset will likely hang
> too because it resets all pending requests.
> 
> Paolo Bonzini has done the lion's share of virtio-scsi work over the
> past year (or two?).  He might have some more thoughts.
> 
> Stefan
> 
