From: Xiao Guangrong
Date: Tue, 1 Nov 2016 21:40:46 +0800
Subject: Re: [Qemu-devel] [PULL 37/47] nvdimm acpi: introduce fit buffer
To: Igor Mammedov
Cc: "Michael S. Tsirkin" , Peter Maydell , Richard Henderson ,
 qemu-devel@nongnu.org, Eduardo Habkost , Paolo Bonzini

On 11/01/2016 06:35 PM, Igor Mammedov wrote:
> On Tue, 1 Nov 2016 11:30:46 +0800
> Xiao Guangrong wrote:
>
>> On 10/31/2016 07:09 PM, Igor Mammedov wrote:
>>> On Mon, 31 Oct 2016 17:52:24 +0800
>>> Xiao Guangrong wrote:
>>>
>>>> On 10/31/2016 05:45 PM, Igor Mammedov wrote:
>>>>> On Sun, 30 Oct 2016 23:25:05 +0200
>>>>> "Michael S.
Tsirkin" wrote:
>>>>>
>>>>>> From: Xiao Guangrong
>>>>>>
>>>>>> The buffer is used to save the FIT info for all the presented nvdimm
>>>>>> devices which is updated after the nvdimm device is plugged or
>>>>>> unplugged. In the later patch, it will be used to construct NVDIMM
>>>>>> ACPI _FIT method which reflects the presented nvdimm devices after
>>>>>> nvdimm hotplug
>>>>>>
>>>>>> As the FIT buffer cannot be completely mapped into guest address space,
>>>>>> OSPM will exit to QEMU multiple times, however, there is the race
>>>>>> condition - FIT may be changed during these multiple exits, so that
>>>>>> some rules are introduced:
>>>>>> 1) the user should hold the @lock to access the buffer and
>>> Could you explain why lock is needed?
>>
>> Yes. There are two things that need to be synced between QEMU and OSPM:
>> - the FIT buffer: QEMU produces it and the VM consumes it at the same time.
>> - the dirty indicator: QEMU sets it and OSPM clears it, which means
>>   both sides modify it.
>
> I understand why 'dirty indicator' is necessary but
> could you describe a concrete call flow in detail
> that would cause concurrent access and need extra
> lock protection.
>
> Note that there is a global lock (dubbed BQL) in QEMU,
> which is taken every time the guest accesses an IO port or
> a QMP/monitor control event happens.

Ah, okay. If there is already a lock protecting the VM-exit handler and the
monitor handler, I think it is okay to drop the extra lock.

>
>>>>>> 2) mark @dirty whenever the buffer is updated.
>>>>>>
>>>>>> @dirty is cleared the first time OSPM gets the fit buffer; if
>>>>>> dirty is detected in a later access, OSPM will restart the
>>>>>> access
>>>>>>
>>>>>> As fit should be updated after nvdimm device is successfully realized
>>>>>> so that a new hotplug callback, post_hotplug, is introduced
>>>>>>
>>>>>> Signed-off-by: Xiao Guangrong
>>>>>> Reviewed-by: Michael S. Tsirkin
>>>>>> Signed-off-by: Michael S.
Tsirkin
>>>>>> ---
>>>>>>  include/hw/hotplug.h    | 10 +++++++++
>>>>>>  include/hw/mem/nvdimm.h | 26 +++++++++++++++++++++-
>>>>>>  hw/acpi/nvdimm.c        | 59 +++++++++++++++++++++++++++++++++++--------------
>>>>>>  hw/core/hotplug.c       | 11 +++++++++
>>>>>>  hw/core/qdev.c          | 20 +++++++++++++----
>>>>>>  hw/i386/acpi-build.c    |  2 +-
>>>>>>  hw/i386/pc.c            | 19 ++++++++++++++++
>>>>>>  7 files changed, 124 insertions(+), 23 deletions(-)
>>>>>>
>>>>>> diff --git a/include/hw/hotplug.h b/include/hw/hotplug.h
>>>>>> index c0db869..10ca5b6 100644
>>>>>> --- a/include/hw/hotplug.h
>>>>>> +++ b/include/hw/hotplug.h
>>>>>> @@ -47,6 +47,7 @@ typedef void (*hotplug_fn)(HotplugHandler *plug_handler,
>>>>>>   * @parent: Opaque parent interface.
>>>>>>   * @pre_plug: pre plug callback called at start of device.realize(true)
>>>>>>   * @plug: plug callback called at end of device.realize(true).
>>>>>> + * @post_pug: post plug callback called after device is successfully plugged.
>>>>> this doesn't seem to be necessary,
>>>>> why isn't hotplug_handler_plug() sufficient?
>>>>
>>>> At the point of hotplug_handler_plug(), the device is not realized (realized == 0);
>>>> however, nvdimm_get_plugged_device_list() works on realized nvdimm devices.
>>>
>>> I suggest, instead of adding a redundant hook, reusing hotplug_handler_plug(),
>>> which is called after device::realize() successfully completed, and amending
>>> nvdimm_plugged_device_list() as follows:
>>>
>>> diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
>>> index e486128..c6d8cbb 100644
>>> --- a/hw/acpi/nvdimm.c
>>> +++ b/hw/acpi/nvdimm.c
>>> @@ -38,11 +38,7 @@ static int nvdimm_plugged_device_list(Object *obj, void *opaque)
>>>      GSList **list = opaque;
>>>
>>>      if (object_dynamic_cast(obj, TYPE_NVDIMM)) {
>>> -        DeviceState *dev = DEVICE(obj);
>>> -
>>> -        if (dev->realized) { /* only realized NVDIMMs matter */
>>> -            *list = g_slist_append(*list, DEVICE(obj));
>>> -        }
>>> +        *list = g_slist_append(*list, obj);
>>>      }
>>>
>>>      object_child_foreach(obj, nvdimm_plugged_device_list, opaque);
>>>
>>> that should count not only already present nvdimms but also the one that's
>>> being hotplugged.
>>
>> It is not good as the device which failed to initialize
> See device_set_realized():
> [...]
>     if (dc->realize) {
>         dc->realize(dev, &local_err);
>     }
>
>     if (local_err != NULL) {
>         goto fail;
>     }
>
>     DEVICE_LISTENER_CALL(realize, Forward, dev);
>
>     if (hotplug_ctrl) {
>         hotplug_handler_plug(hotplug_ctrl, dev, &local_err);
>     }
>
> i.e. control reaches hotplug_handler_plug() only if the
> dc->realize(dev, &local_err) call is successful.
>

I mean, for example: if there are two devices, the first one failed to be
realized and the second one was initialized successfully, can
nvdimm_plugged_device_list() then get both of these devices? Based on the
code of pc_dimm_built_list(), I guess yes.

>> or is not being plugged
>> (consider two nvdimm devices directly appended on the QEMU command line: when we plug
>> the first one, both nvdimm devices will be counted in this list) is also counted in
>> this list...
> nope,
> qdev_device_add() is called sequentially (one time for each -device/device_add)
> so nvdimm_plugged_device_list() sees only devices that have been added
> by a previous -device/device_add, plus one extra device that is
> being added by the in-progress qdev_device_add() call.
>
> so for "-device nvdimm,id=nv1 -device nvdimm,id=nv2" the call sequence
> would look like:
> 1:
> qdev_device_add("nvdimm,id=nv1") {
>     -> set parent // device becomes visible to nvdimm_get_plugged_device_list()
>                   // this is the only time where device->realized == false
>                   // could be observed, all other call chains would either
>                   // see device->realized == true or not see device at all
>     -> realize()
>        -> device_set_realized()
>           -> nvdimm->realize()
>           -> hotplug_handler_plug()
>              nvdimm_get_plugged_device_list()
>              // would return list with 1 item (nv1)
>              // where nv1->realized == false
> }
> 2:
> qdev_device_add("nvdimm,id=nv2") {
>     -> set parent // device becomes visible to nvdimm_get_plugged_device_list()
>     -> realize()
>        -> device_set_realized()
>           -> nvdimm->realize()
>           -> hotplug_handler_plug()
>              nvdimm_get_plugged_device_list()
>              // would return list with 2 items (nv1 and nv2)
>              // where nv1->realized == true and nv2->realized == false
> }

Okay.