qemu-devel.nongnu.org archive mirror
From: Roman Kagan <rvkagan@yandex-team.ru>
To: Markus Armbruster <armbru@redhat.com>
Cc: "Konstantin Khlebnikov" <khlebnikov@yandex-team.ru>,
	qemu-devel@nongnu.org, yc-core@yandex-team.ru,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Daniel P. Berrangé" <berrange@redhat.com>,
	"Eduardo Habkost" <ehabkost@gmail.com>,
	"Eric Blake" <eblake@redhat.com>
Subject: Re: [PATCH 1/4] qdev: add DEVICE_RUNTIME_ERROR event
Date: Mon, 20 Jun 2022 16:49:06 +0300
Message-ID: <YrB60nlxNeelb6r0@rvkaganb>
In-Reply-To: <YpTdAPAo8RGD735z@rvkaganb>

On Mon, May 30, 2022 at 06:04:32PM +0300, Roman Kagan wrote:
> On Mon, May 30, 2022 at 01:28:17PM +0200, Markus Armbruster wrote:
> > Roman Kagan <rvkagan@yandex-team.ru> writes:
> > 
> > > On Wed, May 25, 2022 at 12:54:47PM +0200, Markus Armbruster wrote:
> > >> Konstantin Khlebnikov <khlebnikov@yandex-team.ru> writes:
> > >> 
> > >> > This event represents device runtime errors to give time and
> > >> > reason why device is broken.
> > >> 
> > >> Can you give one or more examples of the "device runtime errors" you have
> > >> in mind?
> > >
> > > Initially we wanted to address a situation when a vhost device
> > > discovered an inconsistency during virtqueue processing and silently
> > > stopped the virtqueue.  This resulted in device stall (partial for
> > > multiqueue devices) and we were the last to notice that.
> > >
> > > The solution appeared to be to employ errfd and, upon receiving a
> > > notification through it, to emit a QMP event which is actionable in the
> > > management layer or further up the stack.
> > >
> > > Then we observed that virtio (non-vhost) devices suffer from the same
> > > issue: they only log the error but don't signal it to the management
> > > layer.  The case was very similar so we thought it would make sense to
> > > share the infrastructure and the QMP event between virtio and vhost.
> > >
> > > Then Konstantin went a bit further and generalized the concept into
> > > generic "device runtime error".  I'm personally not completely convinced
> > > this generalization is appropriate here; we'd appreciate the opinions
> > > from the community on the matter.
> > 
> > "Device emulation sending an event on entering certain error states, so
> > that a management application can do something about it" feels
> > reasonable enough to me as a general concept.
> > 
> > The key point is of course "can do something": the event needs to be
> > actionable.  Can you describe possible actions for the cases you
> > implement?
> 
> The first one that we had in mind was informational, like triggering an
> alert in the monitoring system and/or painting the VM as malfunctioning
> in the owner's UI.
> 
> There can be more advanced scenarios like autorecovery by resetting the
> faulty VM, or fencing it if it's a cluster member.

The discussion kind of stalled here.  Do you think the approach makes
sense or not?  Should we try and resubmit the series with a proper cover
letter and possibly other improvements or is it a dead end?

Thanks,
Roman.



Thread overview: 14+ messages
2022-05-19 14:19 [PATCH 1/4] qdev: add DEVICE_RUNTIME_ERROR event Konstantin Khlebnikov
2022-05-19 14:19 ` [PATCH 2/4] virtio: forward errors into qdev_report_runtime_error() Konstantin Khlebnikov
2022-05-24 19:25   ` Vladimir Sementsov-Ogievskiy
2022-05-19 14:19 ` [PATCH 3/4] vhost: add method vhost_set_vring_err Konstantin Khlebnikov
2022-05-19 14:19 ` [PATCH 4/4] vhost: forward vring errors into virtio device Konstantin Khlebnikov
2022-05-24 19:04 ` [PATCH 1/4] qdev: add DEVICE_RUNTIME_ERROR event Vladimir Sementsov-Ogievskiy
2022-05-25  8:26   ` Konstantin Khlebnikov
2022-05-25 10:54 ` Markus Armbruster
2022-05-27 12:49   ` Roman Kagan
2022-05-30 11:28     ` Markus Armbruster
2022-05-30 15:04       ` Roman Kagan
2022-06-20 13:49         ` Roman Kagan [this message]
2022-06-21 11:55           ` Markus Armbruster
2022-06-21 12:02             ` Roman Kagan
