From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:52198)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <armbru@redhat.com>) id 1a6z3R-0004E6-8X
	for qemu-devel@nongnu.org; Thu, 10 Dec 2015 06:06:34 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <armbru@redhat.com>) id 1a6z3N-00014M-7p
	for qemu-devel@nongnu.org; Thu, 10 Dec 2015 06:06:33 -0500
Received: from mx1.redhat.com ([209.132.183.28]:42458)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <armbru@redhat.com>) id 1a6z3N-00014G-0L
	for qemu-devel@nongnu.org; Thu, 10 Dec 2015 06:06:29 -0500
From: Markus Armbruster <armbru@redhat.com>
References: <87a8pl9hmt.fsf@blackfin.pond.sub.org>
	<20151208141938.GB2593@work-vm> <87io480y0n.fsf@blackfin.pond.sub.org>
	<566827FC.4080701@redhat.com>
Date: Thu, 10 Dec 2015 12:06:25 +0100
In-Reply-To: <566827FC.4080701@redhat.com> (Paolo Bonzini's message of "Wed, 9
	Dec 2015 14:09:16 +0100")
Message-ID: <87a8piimvi.fsf@blackfin.pond.sub.org>
MIME-Version: 1.0
Content-Type: text/plain
Subject: Re: [Qemu-devel] Error handling in realize() methods
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Maydell <peter.maydell@linaro.org>, Peter Crosthwaite <crosthwaitepeter@gmail.com>, "Dr. David Alan Gilbert" <dgilbert@redhat.com>, Andreas =?utf-8?Q?F=C3=A4rber?= <afaerber@suse.de>, qemu-devel@nongnu.org

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 09/12/2015 10:30, Markus Armbruster wrote:
>> My current working assumption is that passing &error_fatal to
>> memory_region_init_ram() & friends is okay even in realize() methods and
>> their supporting code, except when the allocation can be large.
>
> I suspect a lot of memory_region_init_ram()s could be considered
> potentially large (at least in the 16-64 megabytes range).  Propagation
> of memory_region_init_ram() failures is easy enough, thanks to Error**,
> that we should just do it.

Propagating an out-of-memory error right in realize() is easy.  What's
not so easy is making realize() fail cleanly (all side effects undone;
we get that wrong in many places), and finding and propagating
out-of-memory errors hiding deeper in the call tree.

However, genuinely "large" allocations should be relatively few, and
handling them gracefully in hot-pluggable devices is probably feasible.

I doubt ensuring *all* allocations on behalf of a hot-pluggable device
are handled gracefully is a good use of our reseources, or even
feasible.

Likewise, graceful error handling for devices that cannot be hot-plugged
feels like a waste of resources we can ill afford.  I think we should
simply document their non-gracefulness by either setting hotpluggable =
false or cannot_instantiate_with_device_add_yet = true with a suitable
comment.

> Even if we don't, we should use &error_abort, not &error_fatal
> (programmer error---due to laziness---rather than user error).
> &error_fatal should really be restricted to code that is running very
> close to main().

"Very close to main" is a question of dynamic context.

Consider a device that can only be created during machine initialization
(cannot_instantiate_with_device_add_yet = true or hotpluggable = false).
&error_fatal is perfectly adequate there.  &error_abort would be
misleading, because when it fails, it's almost certainly because the
user tried to create too big a machine.

Now consider a hot-pluggable device.  Its recognized "large" allocations
all fail gracefully.  What about its other allocations?  Two kinds: the
ones visible in the device model code, and the ones hiding elsewhere,
which include "a few" of the 2300+ uses of GLib memory allocation.  The
latter exit().  Why should the former abort()?

Now use that hot-pluggable device during machine initialization.
abort() is again misleading.

Let's avoid a fruitless debate on when to exit() and when to abort() on
out-of-memory, and just stick to exit().  We don't need a core dump to
tell a developer to fix his lazy error handling.