All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Halil Pasic <pasic@linux.vnet.ibm.com>, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH 1/1] virtio: fail device if set_event_notifier fails
Date: Thu, 23 Mar 2017 19:09:31 +0200	[thread overview]
Message-ID: <20170323190609-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <20170306170441.2a8f8b27.cornelia.huck@de.ibm.com>

On Mon, Mar 06, 2017 at 05:04:41PM +0100, Cornelia Huck wrote:
> On Mon, 6 Mar 2017 16:21:13 +0100
> Halil Pasic <pasic@linux.vnet.ibm.com> wrote:
> 
> > On 03/06/2017 03:56 PM, Cornelia Huck wrote:
> > > On Fri, 3 Mar 2017 14:08:37 +0100
> > > Halil Pasic <pasic@linux.vnet.ibm.com> wrote:
> > > 
> > >> On 03/03/2017 01:50 PM, Cornelia Huck wrote:
> > >>> On Fri, 3 Mar 2017 13:43:32 +0100
> > >>> Halil Pasic <pasic@linux.vnet.ibm.com> wrote:
> > >>>
> > >>>> On 03/03/2017 01:21 PM, Cornelia Huck wrote:
> > >>>>> On Thu,  2 Mar 2017 19:59:42 +0100
> > >>>>> Halil Pasic <pasic@linux.vnet.ibm.com> wrote:
> > [...]
> > >> I admit, I did not investigate this thoroughly, also because the patch
> > >> is flawed regarding multi-thread anyway. After a quick investigation
> > >> it seems the linux guest won't auto-reset the device so the guest should
> > >> end up with a not working device. I think it's pretty likely that the
> > >> admin will check the logs if the device was important.
> > > 
> > > Thinking a bit more about this, it seems setting the device broken is
> > > not the right solution for exactly that reason. Setting the virtio
> > > device broken is a way to signal the guest to 'you did something
> > > broken; please reset the device and start anew' (and that's how current
> > > callers use it). In our case, this is not the guest's fault.
> > 
> > Do we have something to just say stuff broken without blaming the guest?
> > And device reset might not be that stupid at all in the given situation,
> > if we want to save what can be saved from the perspective of the guest.
> > (After reset stuff should work again until we hit the race again -- and
> > since turning ioeventfd on/off should not happen that often during normal
> > operation it could help limit damage suffered -- e.g. controlled shutdown).
> 
> Checking again, the spec says
> 
> DEVICE_NEEDS_RESET (64) Indicates that the device has experienced an
> error from which it can’t recover.
> 
> Nothing about 'guest error'.
> 
> The only problem is that legacy devices don't have that state, which
> means they'll have a broken device through no fault of their own.
> 
> > 
> > > 
> > > Maybe go back to the assert 'solution'? But I'm not sure that's enough
> > > if production builds disable asserts...
> > > 
> > 
> > I will wait a bit, maybe other virtio folks are going to have an 
> > opinion too.
> > 
> > My concern about the assert solution is that for production it is
> > either too rigorous (kill off, hopefully with a dump) or not
> > enough (as you have mentioned, if NDEBUG assert does nothing).
> > 
> > 
> > I think there are setups where a loss of device does not have to be
> > fatal, and I would not like to be the one who makes it fatal (for the
> > guest).
> 
> Basically, it's a host bug (and not a bug specific to a certain
> device). Moving the device which was impacted to a broken state may be
> a useful mitigation.
> 
> But yes, let's hear some other opinions.

We don't support NDEBUG really so I think an assert is fine for now.
Handling unexpected errors more gracefully is laudable but I think we
want a more systematic approach than just open-coding it in
this specific place.


-- 
MST

  reply	other threads:[~2017-03-23 17:09 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-03-02 18:59 [Qemu-devel] [PATCH 1/1] virtio: fail device if set_event_notifier fails Halil Pasic
2017-03-03 12:21 ` Cornelia Huck
2017-03-03 12:43   ` Halil Pasic
2017-03-03 12:50     ` Cornelia Huck
2017-03-03 13:08       ` Halil Pasic
2017-03-06 14:56         ` Cornelia Huck
2017-03-06 15:21           ` Halil Pasic
2017-03-06 16:04             ` Cornelia Huck
2017-03-23 17:09               ` Michael S. Tsirkin [this message]
2017-03-29 11:12                 ` Halil Pasic

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170323190609-mutt-send-email-mst@kernel.org \
    --to=mst@redhat.com \
    --cc=cornelia.huck@de.ibm.com \
    --cc=pasic@linux.vnet.ibm.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.