From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:51266) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZgUZW-0006rD-Tb for qemu-devel@nongnu.org; Mon, 28 Sep 2015 05:18:15 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ZgUZS-0005iW-TC for qemu-devel@nongnu.org; Mon, 28 Sep 2015 05:18:10 -0400
References: <1443121042-3409-1-git-send-email-armbru@redhat.com> <1443121042-3409-7-git-send-email-armbru@redhat.com> <56054E5E.3090005@redhat.com> <87y4fu1t3j.fsf@blackfin.pond.sub.org> <560590A6.3030408@redhat.com> <87io6vm08l.fsf@blackfin.pond.sub.org>
From: Thomas Huth
Message-ID: <560905C5.2030209@redhat.com>
Date: Mon, 28 Sep 2015 11:17:57 +0200
MIME-Version: 1.0
In-Reply-To: <87io6vm08l.fsf@blackfin.pond.sub.org>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Subject: Re: [Qemu-devel] [PATCH v3 6/7] qdev: Protect device-list-properties against broken devices
To: Markus Armbruster
Cc: Peter Maydell , ehabkost@redhat.com, Peter Crosthwaite , qemu-devel@nongnu.org, qemu-stable@nongnu.org, Christian Borntraeger , Alexander Graf , qemu-ppc@nongnu.org, Antony Pavlov , stefanha@redhat.com, Cornelia Huck , Paolo Bonzini , Alistair Francis , afaerber@suse.de, Li Guang , Richard Henderson

On 28/09/15 10:11, Markus Armbruster wrote:
> Thomas Huth writes:
>
>> On 25/09/15 16:17, Markus Armbruster wrote:
>>> Thomas Huth writes:
>>>
>>>> On 24/09/15 20:57, Markus Armbruster wrote:
>>>>> Several devices don't survive object_unref(object_new(T)): they crash
>>>>> or hang during cleanup, or they leave dangling pointers behind.
>>>>>
>>>>> This breaks at least device-list-properties, because
>>>>> qmp_device_list_properties() needs to create a device to find its
>>>>> properties.
>>>>> Broken in commit f4eb32b "qmp: show QOM properties in
>>>>> device-list-properties", v2.1. Example reproducer:
>>>>>
>>>>> $ qemu-system-aarch64 -nodefaults -display none -machine none
>>>>> -S -qmp stdio
>>>>> {"QMP": {"version": {"qemu": {"micro": 50, "minor": 4,
>>>>> "major": 2}, "package": ""}, "capabilities": []}}
>>>>> { "execute": "qmp_capabilities" }
>>>>> {"return": {}}
>>>>> { "execute": "device-list-properties", "arguments": {
>>>>> "typename": "pxa2xx-pcmcia" } }
>>>>> qemu-system-aarch64: /home/armbru/work/qemu/memory.c:1307:
>>>>> memory_region_finalize: Assertion `((&mr->subregions)->tqh_first
>>>>> == ((void *)0))' failed.
>>>>> Aborted (core dumped)
>>>>> [Exit 134 (SIGABRT)]
>>>>>
>>>>> Unfortunately, I can't fix the problems in these devices right now.
>>>>> Instead, add DeviceClass member cannot_even_create_with_object_new_yet
>>>>> to mark them:
>> ...
>>>>> static void pxa2xx_pcmcia_register_types(void)
>>>>> diff --git a/hw/ppc/spapr_rng.c b/hw/ppc/spapr_rng.c
>>>>> index ed43d5e..e1b115d 100644
>>>>> --- a/hw/ppc/spapr_rng.c
>>>>> +++ b/hw/ppc/spapr_rng.c
>>>>> @@ -169,6 +169,11 @@ static void spapr_rng_class_init(ObjectClass *oc, void *data)
>>>>>      dc->realize = spapr_rng_realize;
>>>>>      set_bit(DEVICE_CATEGORY_MISC, dc->categories);
>>>>>      dc->props = spapr_rng_properties;
>>>>> +
>>>>> +    /*
>>>>> +     * Reason: crashes device-introspect-test for unknown reason.
>>>>> +     */
>>>>> +    dc->cannot_even_create_with_object_new_yet = true;
>>>>>  }
>>>>
>>>> Please don't do that! That breaks the help output from
>>>> "-device spapr-rng,?" which should help the user to see how to use this
>>>> device!
>>>
>>> Well, device-introspection-test makes qemu crash, with the backtrace
>>> pointing squarely to this device. Stands to reason that device
>>> introspection could crash in normal usage, too. Until the crash is
>>> debugged, we better disable introspection of this device.
>>>
>>> I quite agree that disabling introspection hurts users.
>>> Just not as much as crashes :)
>>>
>>>> I tried to debug why this device breaks the test, but the test
>>>> environment is giving me a hard time ... how do you best hook a gdb into
>>>> that framework, so you can trace such problems?
>>>> Anyway, with some trial and error, I found out that it seems like the
>>>>
>>>> object_resolve_path_type("", TYPE_SPAPR_RNG, NULL)
>>>>
>>>> in spapr_rng_instance_init() is causing the problems. Could it be that
>>>> object_resolve_path_type is not working with the test environment?
>>>
>>> I tried to figure out why this device breaks under this test, but
>>> couldn't, so I posted with the "for unknown reason" comment.
>>
>> I've debugged this now for a while (thanks for the tip with
>> MALLOC_PERTURB, by the way!) and it seems to me that the problem is in
>> the macio object than in spapr-rng - the latter is just the victim of
>> some memory corruption caused by the first one: The
>> object_resolve_path_type() crashes while trying to go through the macio
>> object.
>>
>> So could you please add the "dc->cannot_even_create_with_object_new_yet
>> = true;" to macio_class_init() instead? ... that seems to fix the crash
>> for me, too, and is likely the better place.
>
> Hmm.
>
> For most of the devices my patch marks, we have a pretty good idea on
> what's wrong with them. spapr-rng is among the exceptions. You believe
> it's actually "the macio object". Which one? "macio" is abstract...
>
> You report introspecting "spapr-rng" crashes "while trying to go through
> the macio object". I wonder how omitting introspection of macio objects
> (that's what marking them does to this test) could affect the object
> we're going through when we crash.

I have to correct myself: it's not going through the macio object. The
problem is actually the "macio[0]" property that is created during
memory_region_init() with object_property_add_child() ... the property
points to a free()d object when the crash happens.

>> Or maybe we could get this also fixed? The problem could be the
>> memory_region_init(&s->bar, NULL, "macio", 0x80000) in
>> macio_instance_init() ... is this ok here? Or does this rather have to
>> go to the realize() function instead?
>
> Hmm, does creating and destroying a macio object leave the memory region
> behind?
>
> Paolo, is calling memory_region_init() in an instance_init() method
> okay?

As Paolo mentioned, we likely need to pass an "owner" to
memory_region_init(), or the macio memory region will get attached to
"/unattached" instead and then leave a dangling link property behind
when the original macio object is destroyed.

By the way, there are some more spots like this in the code, e.g. in
pxa2xx_fir_instance_init() in hw/arm/pxa2xx.c ...

 Thomas
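
P.S.: To make the owner issue a bit more concrete, here is a toy model in
plain C. It is only a sketch and not QEMU code: ToyObject, ToyDevice,
toy_unattached and all toy_* functions are made up for this mail, and the
real QOM and memory APIs do a lot more (reference counting, named
properties). It only models the bookkeeping problem: a region that is
initialized with a NULL owner gets linked from a global container, and that
link is still there after the device embedding the region has been freed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins only: none of these are the real QEMU types or functions. */
#define TOY_MAX_LINKS 8

typedef struct ToyObject {
    struct ToyObject *links[TOY_MAX_LINKS]; /* models child link properties */
    size_t nlinks;
} ToyObject;

typedef struct ToyDevice {
    ToyObject parent_obj; /* the device itself */
    ToyObject bar;        /* embedded region, like the "bar" member in macio */
} ToyDevice;

/* Models the "/unattached" container that adopts owner-less regions. */
static ToyObject toy_unattached;

/* Models the owner argument: a NULL owner falls back to toy_unattached. */
static void toy_region_init(ToyObject *region, ToyObject *owner)
{
    ToyObject *parent = owner ? owner : &toy_unattached;

    region->nlinks = 0;
    assert(parent->nlinks < TOY_MAX_LINKS);
    parent->links[parent->nlinks++] = region;
}

/* Models instance_init(): the region is set up when the device is created. */
static ToyDevice *toy_device_new(int pass_owner)
{
    ToyDevice *dev = calloc(1, sizeof(*dev));

    assert(dev);
    toy_region_init(&dev->bar, pass_owner ? &dev->parent_obj : NULL);
    return dev;
}

/* Destroying a device drops only the links that the device itself holds. */
static void toy_device_free(ToyDevice *dev)
{
    dev->parent_obj.nlinks = 0;
    free(dev); /* this frees the storage of the embedded region as well */
}

/* Returns how many links in toy_unattached outlive the device (stale ones). */
size_t toy_stale_links_after_new_and_free(int pass_owner)
{
    toy_unattached.nlinks = 0;
    toy_device_free(toy_device_new(pass_owner));
    return toy_unattached.nlinks;
}
```

In this model, toy_stale_links_after_new_and_free(0) leaves one stale link
in toy_unattached, while toy_stale_links_after_new_and_free(1) leaves none,
because the link then lives in the device and goes away together with it.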