From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:58411) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YVfIa-0003wQ-GO for qemu-devel@nongnu.org; Wed, 11 Mar 2015 07:59:41 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1YVfIW-00029e-Ea for qemu-devel@nongnu.org; Wed, 11 Mar 2015 07:59:40 -0400 Received: from cantor2.suse.de ([195.135.220.15]:45241 helo=mx2.suse.de) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YVfIW-00029Q-4t for qemu-devel@nongnu.org; Wed, 11 Mar 2015 07:59:36 -0400 Message-ID: <55002E25.8070704@suse.de> Date: Wed, 11 Mar 2015 12:59:33 +0100 From: =?windows-1252?Q?Andreas_F=E4rber?= MIME-Version: 1.0 References: <1426024655-17561-1-git-send-email-ehabkost@redhat.com> <1426024655-17561-2-git-send-email-ehabkost@redhat.com> <54FF739D.9030102@suse.de> <20150311111118.GG3513@thinpad.lan.raisama.net> In-Reply-To: <20150311111118.GG3513@thinpad.lan.raisama.net> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable Subject: Re: [Qemu-devel] [PATCH v2 1/1] target-i386: Remove icc_bridge parameter from cpu_x86_create() List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Eduardo Habkost Cc: zhugh.fnst@cn.fujitsu.com, "Michael S. Tsirkin" , Markus Armbruster , qemu-devel@nongnu.org, tangchen@cn.fujitsu.com, chen.fan.fnst@cn.fujitsu.com, isimatu.yasuaki@jp.fujitsu.com, Paolo Bonzini , Gu Zheng , Igor Mammedov , anshul.makkar@profitbricks.com Am 11.03.2015 um 12:11 schrieb Eduardo Habkost: > On Tue, Mar 10, 2015 at 11:43:41PM +0100, Andreas F=E4rber wrote: >> Am 10.03.2015 um 22:57 schrieb Eduardo Habkost: >>> Instead of passing icc_bridge from the PC initialization code to >>> cpu_x86_create(), make the PC initialization code attach the CPU to >>> icc_bridge. >>> >>> The only difference here is that icc_bridge attachment will now be do= ne >>> after x86_cpu_parse_featurestr() is called. But this shouldn't make a= ny >>> difference, as property setters shouldn't depend on icc_bridge. >>> >>> Signed-off-by: Eduardo Habkost >>> --- >>> Changes v1 -> v2: >>> * Keep existing check for NULL icc_bridge and error reporting, inste= ad >>> of assing assert(icc_bridge) >>> --- >>> hw/i386/pc.c | 13 +++++++++++-- >>> target-i386/cpu.c | 14 ++------------ >>> target-i386/cpu.h | 3 +-- >>> 3 files changed, 14 insertions(+), 16 deletions(-) >>> >>> diff --git a/hw/i386/pc.c b/hw/i386/pc.c >>> index b5b2aad..a26e0ec 100644 >>> --- a/hw/i386/pc.c >>> +++ b/hw/i386/pc.c >>> @@ -992,18 +992,27 @@ void pc_acpi_smi_interrupt(void *opaque, int ir= q, int level) >>> static X86CPU *pc_new_cpu(const char *cpu_model, int64_t apic_id, >>> DeviceState *icc_bridge, Error **errp) >>> { >>> - X86CPU *cpu; >>> + X86CPU *cpu =3D NULL; >>> Error *local_err =3D NULL; >>> =20 >>> - cpu =3D cpu_x86_create(cpu_model, icc_bridge, &local_err); >>> + if (icc_bridge =3D=3D NULL) { >>> + error_setg(&local_err, "Invalid icc-bridge value"); >>> + goto out; >>> + } >>> + >>> + cpu =3D cpu_x86_create(cpu_model, &local_err); >> >> We had previously discussed reference counting. Here I would expect: >=20 > I will try to extend the analysis with ownership of each reference: >=20 >> >> OBJECT(cpu)->ref =3D=3D 1 >=20 > And the owner of the reference is pc_new_cpu() (cpu variable). >=20 >> >>> if (local_err !=3D NULL) { >>> error_propagate(errp, local_err); >>> return NULL; >>> } >>> =20 >>> + qdev_set_parent_bus(DEVICE(cpu), qdev_get_child_bus(icc_bridge, = "icc")); >> >> OBJECT(cpu)->ref =3D=3D 2 >=20 > And the owners are: pc_new_cpu()/cpu and icc_bridge. >=20 > Now, what if we error out and destroy the CPU after we already added th= e > CPU to icc_bridge? Is icc_bridge going to keep pointing to the dead > object, or is there some bus-detach magic somewhere that will make it > work? Lowering the ref count to 0 will trigger object_finalize(), which via object_property_del_all() triggers unparenting via object_finalize_child_property(), which in the device case causes unrealization of the device and unparenting of its owned buses (qdev.c:device_unparent()). >>> + object_unref(OBJECT(cpu)); >> >> OBJECT(cpu)->ref =3D=3D 1 >=20 > Here pc_new_cpu() is dropping its reference too early! In practice it i= s > now borrowing the reference owned by icc_bridge, and I don't think we > should do that. >=20 > I just kept the object_unref() call here because I didn't want to chang= e > any behavior when moving code, but I think it doesn't belong here. Agree. In my patches I placed it after the last usage of cpu in this function, but actually since it is the return value it would even need to go into the callers. pc_cpus_init() is currently ignoring the return value. For using the CPU as a child<> of a CPU core object that changes in my series, so I'll fix that. >>> + >>> object_property_set_int(OBJECT(cpu), apic_id, "apic-id", &local_= err); >>> object_property_set_bool(OBJECT(cpu), true, "realized", &local_e= rr); >> >> OBJECT(cpu)->ref =3D=3D 1 or 2 depending on DeviceClass::realize :) >=20 > If it's 2, it won't be our problem as we don't own the extra reference. > It's the responsibility of whoever grabbed the extra reference. >=20 > But I assume the property setters above MUST not add any extra referenc= e > in case they return an error. Correct? So, the extra reference would be the one added by device_set_realized() for /machine/unattached parent. That happens as very first thing on false -> true transition. If an error occurs either in DeviceClass::realize or later in device_set_realized() then that reference will remain, but it is re-entrant in only doing so once. I have refreshing and combining the two competing qom-tree HMP patches on my radar and am starting to think that reference counting information may be a third interesting mode of output... >>> =20 >>> +out: >>> if (local_err) { >>> error_propagate(errp, local_err); >>> object_unref(OBJECT(cpu)); >> >=20 > And here we have something that was already broken: X86CPU instance_ini= t > calls cpu_exec_init(), the CPU is added to the global CPU list without > increasing reference counting, and the global list will point to a > destroyed object if we enter the error path. >=20 > In other words, if anything fails after cpu_exec_init() is called, we'r= e > screwed. In the future it will be on realize, but we probably need to > move it closer to the end of realize. Yeah, most of the error handling kind of assumes that when an error occurs we report the Error* upwards and exit QEMU, with the user restarting from scratch. With CPU hotplug that is a nasty assumption. >> object_unref(NULL) looks unusual but is valid. >=20 > Yes. Makes the code simpler. :) >=20 >> >> Should we change the return NULL to jump here, too, then? >=20 > Yes, the cpu_x86_create() error check could just do a "goto out". I assume you are planning to apply this patch through your tree? Will you submit a v3? Otherwise can you tweak when applying and let me know so I can cherry-pick? Thanks. >> OBJECT(cpu)->ref =3D=3D 0 or 1 >> >> I wonder whether we need another object_unref(OBJECT(cpu)) for the >> non-error case, either here or in the callers? Out of scope for this >> patch, of course. >=20 > So, how I see it: if we are returning a reference to the object, now it > belongs to the caller, and it should be the caller responsibility to > call object_unref(). So the non-error object_unref() after > qdev_set_parent_bus() is not supposed to be in pc_new_cpus(), but in th= e > callers. Yeah, agree. > Either way we choose, we should document it so callers know who > owns the reference they get. >=20 > Alternatively, we could simply change pc_new_cpu() to _not_ return a > pointer to the CPU, and make pc_cpus_init() deal with the APIC MMIO > mapping using some another approach. No, that won't work for me. But my question was rather whether changes for the error case will be needed? If the reference for /machine/unassigned/device[n] remains around, then our local object_unref() won't unparent/finalize the device, meaning it stays around. If we do an additional unref then unparenting would make the ref count go to -1. So we may need to unparent it explicitly here in this error path? Hard to test. Regards, Andreas --=20 SUSE Linux GmbH, Maxfeldstr. 5, 90409 N=FCrnberg, Germany GF: Felix Imend=F6rffer, Jane Smithard, Jennifer Guild, Dilip Upmanyu, Graham Norton; HRB 21284 (AG N=FCrnberg)