From: Roman Kagan <rvkagan@yandex-team.ru>
To: Markus Armbruster <armbru@redhat.com>
Cc: "Konstantin Khlebnikov" <khlebnikov@yandex-team.ru>,
qemu-devel@nongnu.org, yc-core@yandex-team.ru,
"Paolo Bonzini" <pbonzini@redhat.com>,
"Daniel P. Berrangé" <berrange@redhat.com>,
"Eduardo Habkost" <ehabkost@gmail.com>,
"Eric Blake" <eblake@redhat.com>
Subject: Re: [PATCH 1/4] qdev: add DEVICE_RUNTIME_ERROR event
Date: Tue, 21 Jun 2022 15:02:56 +0300
Message-ID: <YrGzcCPp1kb6RaLl@rvkaganb>
In-Reply-To: <87y1xqs02a.fsf@pond.sub.org>

On Tue, Jun 21, 2022 at 01:55:25PM +0200, Markus Armbruster wrote:
> Roman Kagan <rvkagan@yandex-team.ru> writes:
>
> > On Mon, May 30, 2022 at 06:04:32PM +0300, Roman Kagan wrote:
> >> On Mon, May 30, 2022 at 01:28:17PM +0200, Markus Armbruster wrote:
> >> > Roman Kagan <rvkagan@yandex-team.ru> writes:
> >> >
> >> > > On Wed, May 25, 2022 at 12:54:47PM +0200, Markus Armbruster wrote:
> >> > >> Konstantin Khlebnikov <khlebnikov@yandex-team.ru> writes:
> >> > >>
> >> > >> > This event represents device runtime errors to give time and
> >> > >> > reason why device is broken.
> >> > >>
> >> > >> Can you give one or more examples of the "device runtime errors" you have
> >> > >> in mind?
> >> > >
> >> > > Initially we wanted to address a situation when a vhost device
> >> > > discovered an inconsistency during virtqueue processing and silently
> >> > > stopped the virtqueue. This resulted in device stall (partial for
> >> > > multiqueue devices) and we were the last to notice that.
> >> > >
> >> > > The solution appeared to be to employ errfd and, upon receiving a
> >> > > notification through it, to emit a QMP event which is actionable in the
> >> > > management layer or further up the stack.
> >> > >
> >> > > Then we observed that virtio (non-vhost) devices suffer from the same
> >> > > issue: they only log the error but don't signal it to the management
> >> > > layer. The case was very similar so we thought it would make sense to
> >> > > share the infrastructure and the QMP event between virtio and vhost.
> >> > >
> >> > > Then Konstantin went a bit further and generalized the concept into
> >> > > generic "device runtime error". I'm personally not completely convinced
> >> > > this generalization is appropriate here; we'd appreciate the opinions
> >> > > from the community on the matter.
> >> >
> >> > "Device emulation sending an event on entering certain error states, so
> >> > that a management application can do something about it" feels
> >> > reasonable enough to me as a general concept.
> >> >
> >> > The key point is of course "can do something": the event needs to be
> >> > actionable. Can you describe possible actions for the cases you
> >> > implement?
> >>
> >> The first one that we had in mind was informational, like triggering an
> >> alert in the monitoring system and/or painting the VM as malfunctioning
> >> in the owner's UI.
> >>
> >> There can be more advanced scenarios like autorecovery by resetting the
> >> faulty VM, or fencing it if it's a cluster member.
> >
> > The discussion kind of stalled here.
>
> My apologies...
>
> > Do you think the approach makes
> > sense or not? Should we try and resubmit the series with a proper cover
> > letter and possibly other improvements or is it a dead end?
>
> As QAPI schema maintainer, my concern is interface design. To sell this
> interface to me (so to speak), you have to show it's useful and
> reasonably general. Reasonably general, because we don't want to
> accumulate one-offs, even if they have their uses.
>
> I think this is mostly a matter of commit message(s) and documentation
> here. Explain your intended use cases. Maybe hand-wave at other use
> cases you can think of. Document that you're implementing the event
> only for the specific errors you need, but that it could be implemented
> more widely as needed. "Complete" feels impractical, though.
>
> Makes sense?

Absolutely. We'll rework and resubmit the series addressing the issues
you've noted, and we'll see how it goes.

Thanks,
Roman.