From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:58223)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <pbonzini@redhat.com>) id 1a6zMy-0005sq-ST
	for qemu-devel@nongnu.org; Thu, 10 Dec 2015 06:26:50 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <pbonzini@redhat.com>) id 1a6zMt-0006pv-Mi
	for qemu-devel@nongnu.org; Thu, 10 Dec 2015 06:26:44 -0500
Received: from mx1.redhat.com ([209.132.183.28]:34338)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <pbonzini@redhat.com>) id 1a6zMt-0006pV-FZ
	for qemu-devel@nongnu.org; Thu, 10 Dec 2015 06:26:39 -0500
References: <87a8pl9hmt.fsf@blackfin.pond.sub.org>
	<20151208141938.GB2593@work-vm> <87io480y0n.fsf@blackfin.pond.sub.org>
	<566827FC.4080701@redhat.com> <87a8piimvi.fsf@blackfin.pond.sub.org>
From: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <56696164.8000009@redhat.com>
Date: Thu, 10 Dec 2015 12:26:28 +0100
MIME-Version: 1.0
In-Reply-To: <87a8piimvi.fsf@blackfin.pond.sub.org>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: quoted-printable
Subject: Re: [Qemu-devel] Error handling in realize() methods
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Markus Armbruster <armbru@redhat.com>
Cc: Peter Maydell <peter.maydell@linaro.org>, Peter Crosthwaite <crosthwaitepeter@gmail.com>, "Dr. David Alan Gilbert" <dgilbert@redhat.com>, =?UTF-8?Q?Andreas_F=c3=a4rber?= <afaerber@suse.de>, qemu-devel@nongnu.org


On 10/12/2015 12:06, Markus Armbruster wrote:
> Paolo Bonzini <pbonzini@redhat.com> writes:
>=20
>> On 09/12/2015 10:30, Markus Armbruster wrote:
>>> My current working assumption is that passing &error_fatal to
>>> memory_region_init_ram() & friends is okay even in realize() methods =
and
>>> their supporting code, except when the allocation can be large.
>>
>> I suspect a lot of memory_region_init_ram()s could be considered
>> potentially large (at least in the 16-64 megabytes range).  Propagatio=
n
>> of memory_region_init_ram() failures is easy enough, thanks to Error**=
,
>> that we should just do it.
>=20
> Propagating an out-of-memory error right in realize() is easy.  What's
> not so easy is making realize() fail cleanly (all side effects undone;
> we get that wrong in many places), and finding and propagating
> out-of-memory errors hiding deeper in the call tree.

grep is your friend.  We're talking of a subset of these:

hw/block/onenand.c:    memory_region_init_ram(&s->ram, OBJECT(s), "onenan=
d.ram",
hw/block/pflash_cfi01.c:    memory_region_init_rom_device(
hw/block/pflash_cfi02.c:    memory_region_init_rom_device(&pfl->orig_mem,=
 OBJECT(pfl), pfl->be ?
hw/display/cg3.c:    memory_region_init_ram(&s->rom, obj, "cg3.prom", FCO=
DE_MAX_ROM_SIZE,
hw/display/cg3.c:    memory_region_init_ram(&s->vram_mem, NULL, "cg3.vram=
", s->vram_size,
hw/display/g364fb.c:    memory_region_init_ram_ptr(&s->mem_vram, NULL, "v=
ram",
hw/display/qxl.c:    memory_region_init_ram(&qxl->rom_bar, OBJECT(qxl), "=
qxl.vrom",
hw/display/qxl.c:    memory_region_init_ram(&qxl->vram_bar, OBJECT(qxl), =
"qxl.vram",
hw/display/qxl.c:    memory_region_init_ram(&qxl->vga.vram, OBJECT(dev), =
"qxl.vgavram",
hw/display/sm501.c:    memory_region_init_ram(&s->local_mem_region, NULL,=
 "sm501.local",
hw/display/tc6393xb.c:    memory_region_init_ram(&s->vram, NULL, "tc6393x=
b.vram", 0x100000,
hw/display/tcx.c:    memory_region_init_ram(&s->rom, obj, "tcx.prom", FCO=
DE_MAX_ROM_SIZE,
hw/display/tcx.c:    memory_region_init_ram(&s->vram_mem, OBJECT(s), "tcx=
.vram",
hw/display/vga.c:    memory_region_init_ram(&s->vram, obj, "vga.vram", s-=
>vram_size,
hw/display/vmware_vga.c:    memory_region_init_ram(&s->fifo_ram, NULL, "v=
msvga.fifo", s->fifo_size,
hw/dma/rc4030.c:    memory_region_init_rom_device(&s->dma_tt, o,
hw/i386/pci-assign-load-rom.c:    memory_region_init_ram(&dev->rom, owner=
, name, st.st_size, &error_abort);
hw/input/milkymist-softusb.c:    memory_region_init_ram(&s->pmem, OBJECT(=
s), "milkymist-softusb.pmem",
hw/input/milkymist-softusb.c:    memory_region_init_ram(&s->dmem, OBJECT(=
s), "milkymist-softusb.dmem",
hw/net/dp8393x.c:    memory_region_init_ram(&s->prom, OBJECT(dev),
hw/net/milkymist-minimac2.c:    memory_region_init_ram(&s->buffers, OBJEC=
T(dev), "milkymist-minimac2.buffers",
hw/pci/pci.c:    memory_region_init_ram(&pdev->rom, OBJECT(pdev), name, s=
ize, &error_fatal);
hw/s390x/sclp.c:            memory_region_init_ram(standby_ram, NULL, id,=
 this_subregion_size,

Some of them are in sysbus devices and can thus be ignored.  Some of the
others may indeed need introduction of additional error propagation or ma=
y
be non-trivial.

> I doubt ensuring *all* allocations on behalf of a hot-pluggable device
> are handled gracefully is a good use of our reseources, or even
> feasible.

I agree.  I don't think we have large allocations in device models,
other than from memory region allocation (e.g. VRAM).

> Consider a device that can only be created during machine initializatio=
n
> (cannot_instantiate_with_device_add_yet =3D true or hotpluggable =3D fa=
lse).
> &error_fatal is perfectly adequate there.  &error_abort would be
> misleading, because when it fails, it's almost certainly because the
> user tried to create too big a machine.

I agree here.

> Now consider a hot-pluggable device.  Its recognized "large" allocation=
s
> all fail gracefully.  What about its other allocations?  Two kinds: the
> ones visible in the device model code, and the ones hiding elsewhere,
> which include "a few" of the 2300+ uses of GLib memory allocation.  The
> latter exit().  Why should the former abort()?

We've already ruled that the latter simply won't happen, so we can say at
the same time that g_malloc() does not exit at all. :)

&error_fatal is a user error, &error_abort is an assertion failure.  This
is much closer to an assertion failure.

> Now use that hot-pluggable device during machine initialization.
> abort() is again misleading.

Perhaps, but---because it's hotpluggable---what happens during machine
initialization is much less important.  It doesn't lead to lost data.
An abort() is much more likely to catch the developers' and the users'
attention and much more likely to be reported (compare "huh, it went
away?!?" with "abrt is asking me to open a Fedora bug, interesting").

> Let's avoid a fruitless debate on when to exit() and when to abort() on
> out-of-memory, and just stick to exit().  We don't need a core dump to
> tell a developer to fix his lazy error handling.

Perhaps we do...

Paolo