qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Halil Pasic <pasic@linux.vnet.ibm.com>
To: "Michael S. Tsirkin" <mst@redhat.com>,
	Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH 1/1] virtio: fail device if set_event_notifier fails
Date: Wed, 29 Mar 2017 13:12:35 +0200	[thread overview]
Message-ID: <a286a86e-cc7f-27f5-98f1-a2836e0898d4@linux.vnet.ibm.com> (raw)
In-Reply-To: <20170323190609-mutt-send-email-mst@kernel.org>



On 03/23/2017 06:09 PM, Michael S. Tsirkin wrote:
> On Mon, Mar 06, 2017 at 05:04:41PM +0100, Cornelia Huck wrote:
>> On Mon, 6 Mar 2017 16:21:13 +0100
>> Halil Pasic <pasic@linux.vnet.ibm.com> wrote:
>>
>>> On 03/06/2017 03:56 PM, Cornelia Huck wrote:
>>>> On Fri, 3 Mar 2017 14:08:37 +0100
>>>> Halil Pasic <pasic@linux.vnet.ibm.com> wrote:
>>>>
>>>>> On 03/03/2017 01:50 PM, Cornelia Huck wrote:
>>>>>> On Fri, 3 Mar 2017 13:43:32 +0100
>>>>>> Halil Pasic <pasic@linux.vnet.ibm.com> wrote:
>>>>>>
>>>>>>> On 03/03/2017 01:21 PM, Cornelia Huck wrote:
>>>>>>>> On Thu,  2 Mar 2017 19:59:42 +0100
>>>>>>>> Halil Pasic <pasic@linux.vnet.ibm.com> wrote:
>>> [...]
>>>>> I admit, I did not investigate this thoroughly, also because the patch
>>>>> is flawed regarding multi-thread anyway. After a quick investigation
>>>>> it seems the linux guest won't auto-reset the device so the guest should
>>>>> end up with a not working device. I think it's pretty likely that the
>>>>> admin will check the logs if the device was important.
>>>>
>>>> Thinking a bit more about this, it seems setting the device broken is
>>>> not the right solution for exactly that reason. Setting the virtio
>>>> device broken is a way to signal the guest to 'you did something
>>>> broken; please reset the device and start anew' (and that's how current
>>>> callers use it). In our case, this is not the guest's fault.
>>>
>>> Do we have something to just say stuff broken without blaming the guest?
>>> And device reset might not be that stupid at all in the given situation,
>>> if we want to save what can be saved from the perspective of the guest.
>>> (After reset stuff should work again until we hit the race again -- and
>>> since turning ioeventfd on/off should not happen that often during normal
>>> operation it could help limit damage suffered -- e.g. controlled shutdown).
>>
>> Checking again, the spec says
>>
>> DEVICE_NEEDS_RESET (64) Indicates that the device has experienced an
>> error from which it can’t recover.
>>
>> Nothing about 'guest error'.
>>
>> The only problem is that legacy devices don't have that state, which
>> means they'll have a broken device through no fault of their own.
>>
>>>
>>>>
>>>> Maybe go back to the assert 'solution'? But I'm not sure that's enough
>>>> if production builds disable asserts...
>>>>
>>>
>>> I will wait a bit, maybe other virtio folks are going to have an 
>>> opinion too.
>>>
>>> My concern about the assert solution is that for production it is
>>> either too rigorous (kill off, hopefully with a dump) or not
>>> enough (as you have mentioned, if NDEBUG assert does nothing).
>>>
>>>
>>> I think there are setups where a loss of device does not have to be
>>> fatal, and I would not like to be the one who makes it fatal (for the
>>> guest).
>>
>> Basically, it's a host bug (and not a bug specific to a certain
>> device). Moving the device which was impacted to a broken state may be
>> a useful mitigation.
>>
>> But yes, let's hear some other opinions.
> 
> We don't support NDEBUG really so I think an assert is fine for now.

Thanks for your reply! You are right, virtio.c makes sure we do not have
NDEBUG defined, but at the very same  place a todo comment suggests that
not supporting NDEBUG is a temporary measure (and a so called technical
debt).

If my understanding is correct, I should use assert instead of
virtio_error for the next version. I that right?

> Handling unexpected errors more gracefully is laudable but I think we
> want a more systematic approach than just open-coding it in
> this specific place.
> 

Sorry, I mistook virtio_error for the systematic approach for handling
(expected and unexpected) errors in virtio. Should I create a patch which
documents what is considered an (in)appropriate usage of virtio_error?

AFAIU (based on Connie's comments) virtio_error should not be used for
reporting QEMU errors (or should only be used to report guest errors).

By the way I have read the exchange on '[PATCH 1/5] virtio: Error object
based virtio_error()'. The things you suggest there do make a lot of
sense to me. Unfortunately I can't tell if I'm going to have the time to
come up with an appropriate patch(set), so I won't volunteer for now.

Regards,

Halil

      reply	other threads:[~2017-03-29 11:12 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-03-02 18:59 [Qemu-devel] [PATCH 1/1] virtio: fail device if set_event_notifier fails Halil Pasic
2017-03-03 12:21 ` Cornelia Huck
2017-03-03 12:43   ` Halil Pasic
2017-03-03 12:50     ` Cornelia Huck
2017-03-03 13:08       ` Halil Pasic
2017-03-06 14:56         ` Cornelia Huck
2017-03-06 15:21           ` Halil Pasic
2017-03-06 16:04             ` Cornelia Huck
2017-03-23 17:09               ` Michael S. Tsirkin
2017-03-29 11:12                 ` Halil Pasic [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a286a86e-cc7f-27f5-98f1-a2836e0898d4@linux.vnet.ibm.com \
    --to=pasic@linux.vnet.ibm.com \
    --cc=cornelia.huck@de.ibm.com \
    --cc=mst@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).