Re: [Qemu-devel] Error handling in realize() methods

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Markus Armbruster <armbru@redhat.com>
Cc: "Peter Maydell" <peter.maydell@linaro.org>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Peter Crosthwaite" <crosthwaitepeter@gmail.com>,
	qemu-devel@nongnu.org, "Andreas Färber" <afaerber@suse.de>
Subject: Re: [Qemu-devel] Error handling in realize() methods
Date: Tue, 8 Dec 2015 14:19:39 +0000	[thread overview]
Message-ID: <20151208141938.GB2593@work-vm> (raw)
In-Reply-To: <87a8pl9hmt.fsf@blackfin.pond.sub.org>

* Markus Armbruster (armbru@redhat.com) wrote:
> In general, code running withing a realize() method should not exit() on
> error.  Instad, errors should be propagated through the realize()
> method.  Additionally, the realize() method should fail cleanly,
> i.e. carefully undo its side effects such as wiring of interrupts,
> mapping of memory, and so forth.  Tedious work, but necessary to make
> hot plug safe.
> 
> Quite a few devices don't do that.
> 
> Some of them can be usefully hot-plugged, and for them unclean failures
> are simply bugs.  I'm going to mark the ones I can find.
> 
> Others are used only as onboard devices, and if their realize() fails,
> the machine's init() will exit()[*].  In an ideal world, we'd start with
> an empty board and cold-plugg devices, and there, clean failure may be
> useful.  In the world we live in, making these devices fail cleanly is a
> lot of tedious work for no immediate gain.
> 
> Example: machine "kzm" and device "fsl,imx31".  fsl_imx31_realize()
> returns without cleanup on error.  kzm_init() exit(1)s when realize
> fails, so the lack of cleanup is a non-issue.
> 
> I think this is basically okay for now, but I'd like us to mark these
> devices cannot_instantiate_with_device_add_yet, with /* Reason:
> realize() method fails uncleanly */.
> 
> Opinions?
> 
> Next, let's consider the special case "out of memory".
> 
> Our general approach is to treat it as immediately fatal.  This makes
> sense, because when a smallish allocation fails, the process is almost
> certainly doomed anyway.  Moreover, memory allocation is frequent, and
> attempting to recover from failed memory allocation adds loads of
> hard-to-test error paths.  These are *dangerous* unless carefully tested
> (and we don't).
> 
> Certain important allocations we handle more gracefully.  For instance,
> we don't want to die when the user tries to hot-plug more memory than we
> can allocate, or tries to open a QCOW2 image with a huge L1 table.
> 
> Guest memory allocation used to have the "immediately fatal" policy
> baked in at a fairly low level, but it's since been lifted into callers;
> see commit c261d77..fc7a580 and fixups 4f96676..0bdaa3a.  During review
> of the latter, Peter Crosthwaite called out the &error_fatal in the
> realize methods and their supporting code.  I agreed with him back then
> that the errors should really be propagated.  But now I've changed my
> mind: I think we should treat these memory allocation failures like
> we've always treated them, namely report and exit(1).  Except for
> "large" allocations, where we have a higher probability of failure, and
> a more realistic chance to recover safely.
> 
> Can we agree that passing &error_fatal to memory_region_init_ram() &
> friends is basically okay even in realize() methods and their supporting
> code?

I'd say it depends if they can be hotplugged; I think anything that we really
want to hotplug onto real running machines (as opposed to where we're just
hotplugging during setup) we should propagate errors properly.

And tbh I don't buy the small allocation argument; I think we should handle them
all; in my utopian world a guest wouldn't die unless there was no way out.

Dave

> 
> [*] Well, the ones that bother to check for errors, but that's a
> separate problem.
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

next prev parent reply	other threads:[~2015-12-08 14:19 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-12-08 13:47 [Qemu-devel] Error handling in realize() methods Markus Armbruster
2015-12-08 14:19 ` Dr. David Alan Gilbert [this message]
2015-12-09  9:30   ` Markus Armbruster
2015-12-09 10:29     ` Dr. David Alan Gilbert
2015-12-09 11:10       ` Laszlo Ersek
2015-12-10  9:22         ` Markus Armbruster
2015-12-10 11:10           ` Laszlo Ersek
2015-12-09 11:47       ` Peter Maydell
2015-12-09 12:25         ` Laszlo Ersek
2015-12-09 13:21         ` Dr. David Alan Gilbert
2015-12-10  9:27       ` Markus Armbruster
2015-12-09 13:09     ` Paolo Bonzini
2015-12-09 13:12       ` Dr. David Alan Gilbert
2015-12-09 13:43         ` Paolo Bonzini
2015-12-10 11:06       ` Markus Armbruster
2015-12-10 11:21         ` Dr. David Alan Gilbert
2015-12-10 11:22           ` Paolo Bonzini
2015-12-10 11:26         ` Paolo Bonzini
2015-12-10 12:25           ` Markus Armbruster

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20151208141938.GB2593@work-vm \
    --to=dgilbert@redhat.com \
    --cc=afaerber@suse.de \
    --cc=armbru@redhat.com \
    --cc=crosthwaitepeter@gmail.com \
    --cc=pbonzini@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.