qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Roman Kagan <rvkagan@yandex-team.ru>
To: Markus Armbruster <armbru@redhat.com>
Cc: "Konstantin Khlebnikov" <khlebnikov@yandex-team.ru>,
	qemu-devel@nongnu.org, yc-core@yandex-team.ru,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Daniel P. Berrangé" <berrange@redhat.com>,
	"Eduardo Habkost" <ehabkost@gmail.com>,
	"Eric Blake" <eblake@redhat.com>
Subject: Re: [PATCH 1/4] qdev: add DEVICE_RUNTIME_ERROR event
Date: Mon, 30 May 2022 18:04:32 +0300	[thread overview]
Message-ID: <YpTdAPAo8RGD735z@rvkaganb> (raw)
In-Reply-To: <87sforb6pa.fsf@pond.sub.org>

On Mon, May 30, 2022 at 01:28:17PM +0200, Markus Armbruster wrote:
> Roman Kagan <rvkagan@yandex-team.ru> writes:
> 
> > On Wed, May 25, 2022 at 12:54:47PM +0200, Markus Armbruster wrote:
> >> Konstantin Khlebnikov <khlebnikov@yandex-team.ru> writes:
> >> 
> >> > This event represents device runtime errors to give time and
> >> > reason why device is broken.
> >> 
> >> Can you give an or more examples of the "device runtime errors" you have
> >> in mind?
> >
> > Initially we wanted to address a situation when a vhost device
> > discovered an inconsistency during virtqueue processing and silently
> > stopped the virtqueue.  This resulted in device stall (partial for
> > multiqueue devices) and we were the last to notice that.
> >
> > The solution appeared to be to employ errfd and, upon receiving a
> > notification through it, to emit a QMP event which is actionable in the
> > management layer or further up the stack.
> >
> > Then we observed that virtio (non-vhost) devices suffer from the same
> > issue: they only log the error but don't signal it to the management
> > layer.  The case was very similar so we thought it would make sense to
> > share the infrastructure and the QMP event between virtio and vhost.
> >
> > Then Konstantin went a bit further and generalized the concept into
> > generic "device runtime error".  I'm personally not completely convinced
> > this generalization is appropriate here; we'd appreciate the opinions
> > from the community on the matter.
> 
> "Device emulation sending an even on entering certain error states, so
> that a management application can do something about it" feels
> reasonable enough to me as a general concept.
> 
> The key point is of course "can do something": the event needs to be
> actionable.  Can you describe possible actions for the cases you
> implement?

The first one that we had in mind was informational, like triggering an
alert in the monitoring system and/or painting the VM as malfunctioning
in the owner's UI.

There can be more advanced scenarios like autorecovery by resetting the
faulty VM, or fencing it if it's a cluster member.

Thanks,
Roman.


  reply	other threads:[~2022-05-30 15:06 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-19 14:19 [PATCH 1/4] qdev: add DEVICE_RUNTIME_ERROR event Konstantin Khlebnikov
2022-05-19 14:19 ` [PATCH 2/4] virtio: forward errors into qdev_report_runtime_error() Konstantin Khlebnikov
2022-05-24 19:25   ` Vladimir Sementsov-Ogievskiy
2022-05-19 14:19 ` [PATCH 3/4] vhost: add method vhost_set_vring_err Konstantin Khlebnikov
2022-05-19 14:19 ` [PATCH 4/4] vhost: forward vring errors into virtio device Konstantin Khlebnikov
2022-05-24 19:04 ` [PATCH 1/4] qdev: add DEVICE_RUNTIME_ERROR event Vladimir Sementsov-Ogievskiy
2022-05-25  8:26   ` Konstantin Khlebnikov
2022-05-25 10:54 ` Markus Armbruster
2022-05-27 12:49   ` Roman Kagan
2022-05-30 11:28     ` Markus Armbruster
2022-05-30 15:04       ` Roman Kagan [this message]
2022-06-20 13:49         ` Roman Kagan
2022-06-21 11:55           ` Markus Armbruster
2022-06-21 12:02             ` Roman Kagan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YpTdAPAo8RGD735z@rvkaganb \
    --to=rvkagan@yandex-team.ru \
    --cc=armbru@redhat.com \
    --cc=berrange@redhat.com \
    --cc=eblake@redhat.com \
    --cc=ehabkost@gmail.com \
    --cc=khlebnikov@yandex-team.ru \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=yc-core@yandex-team.ru \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).