qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Cornelia Huck <cornelia.huck@de.ibm.com>
To: Halil Pasic <pasic@linux.vnet.ibm.com>
Cc: qemu-devel@nongnu.org, "Michael S. Tsirkin" <mst@redhat.com>
Subject: Re: [Qemu-devel] [PATCH 1/1] virtio: fail device if set_event_notifier fails
Date: Mon, 6 Mar 2017 15:56:25 +0100	[thread overview]
Message-ID: <20170306155625.5c426793.cornelia.huck@de.ibm.com> (raw)
In-Reply-To: <d00802cc-150a-2263-604d-9c2eaa5c19d1@linux.vnet.ibm.com>

On Fri, 3 Mar 2017 14:08:37 +0100
Halil Pasic <pasic@linux.vnet.ibm.com> wrote:

> On 03/03/2017 01:50 PM, Cornelia Huck wrote:
> > On Fri, 3 Mar 2017 13:43:32 +0100
> > Halil Pasic <pasic@linux.vnet.ibm.com> wrote:
> > 
> >> On 03/03/2017 01:21 PM, Cornelia Huck wrote:
> >>> On Thu,  2 Mar 2017 19:59:42 +0100
> >>> Halil Pasic <pasic@linux.vnet.ibm.com> wrote:
> >>>
> >>>> The function virtio_notify_irqfd used to ignore the return code of
> >>>> event_notifier_set. Let's fail the device should this occur.
> >>>
> >>> I'm wondering if there are reasons for event_notifier_set() to fail
> >>> beyond "we've hit an internal race and should make an effort to fix
> >>> that one, or else we have completely messed up in qemu". Marking the
> >>> device broken tells the guest that there's something wrong with the
> >>> device, but I think we want qemu bug reports when there's something
> >>> broken with the irqfd.
> >>>
> >>
> >> That's why the error is logged. I understand virtio_error like something
> >> suitable for indicating bugs.
> >>
> >> What do you suggest? Forcing a dump? I would rather leave it to the
> >> user to figure out how important is the state sitting in the machine
> >> and the device, and how much effort does (s)he want to put into recovering
> >> from the failure. 
> > 
> > How likely are those logged messages being brought to attention of the
> > admin? Does any management software flag machines with such error
> > messages? (that's more of a general question)
> > 
> 
> I admit, I did not investigate this thoroughly, also because the patch
> is flawed regarding multi-thread anyway. After a quick investigation
> it seems the linux guest won't auto-reset the device so the guest should
> end up with a not working device. I think it's pretty likely that the
> admin will check the logs if the device was important.

Thinking a bit more about this, it seems setting the device broken is
not the right solution for exactly that reason. Setting the virtio
device broken is a way to signal the guest to 'you did something
broken; please reset the device and start anew' (and that's how current
callers use it). In our case, this is not the guest's fault.

Maybe go back to the assert 'solution'? But I'm not sure that's enough
if production builds disable asserts...

> 
> I agree fully that it's a very general question, and I do not feel
> competent for answering it.
> 
> > I'd like to have some kind of trigger that rings an alarm bell so that
> > the admin might consider reporting this, but I don't have a good idea
> > on how to do that either...
> > 
> 
> There are tools for aggregating and processing logs, and triggering
> alarm bells too (for example ELK= logstash + Kibana + Elasticsearch).
> AFAIK logs are the most common way to deal with such stuff. But I'm far
> form being an expert. Of course logs are only as good as the messages
> landing in them...

Let's hope this works properly, then.

  reply	other threads:[~2017-03-06 14:56 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-03-02 18:59 [Qemu-devel] [PATCH 1/1] virtio: fail device if set_event_notifier fails Halil Pasic
2017-03-03 12:21 ` Cornelia Huck
2017-03-03 12:43   ` Halil Pasic
2017-03-03 12:50     ` Cornelia Huck
2017-03-03 13:08       ` Halil Pasic
2017-03-06 14:56         ` Cornelia Huck [this message]
2017-03-06 15:21           ` Halil Pasic
2017-03-06 16:04             ` Cornelia Huck
2017-03-23 17:09               ` Michael S. Tsirkin
2017-03-29 11:12                 ` Halil Pasic

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170306155625.5c426793.cornelia.huck@de.ibm.com \
    --to=cornelia.huck@de.ibm.com \
    --cc=mst@redhat.com \
    --cc=pasic@linux.vnet.ibm.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).