From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:55342) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ateN4-0005hL-IV for qemu-devel@nongnu.org; Fri, 22 Apr 2016 12:56:00 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ateN1-0002fG-9f for qemu-devel@nongnu.org; Fri, 22 Apr 2016 12:55:58 -0400 Received: from e35.co.us.ibm.com ([32.97.110.153]:47389) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ateN1-0002f5-2L for qemu-devel@nongnu.org; Fri, 22 Apr 2016 12:55:55 -0400 Received: from localhost by e35.co.us.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Fri, 22 Apr 2016 10:55:53 -0600 References: <1460752385-13259-1-git-send-email-duanj@linux.vnet.ibm.com> <1460752385-13259-5-git-send-email-duanj@linux.vnet.ibm.com> <20160420051433.GG1133@voom> <57190C42.5080605@linux.vnet.ibm.com> <20160422042845.GE15176@voom.fritz.box> From: Jianjun Duan Message-ID: <571A5791.7080106@linux.vnet.ibm.com> Date: Fri, 22 Apr 2016 09:55:45 -0700 MIME-Version: 1.0 In-Reply-To: <20160422042845.GE15176@voom.fritz.box> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH 4/5] Migration: migrate ccs_list in spapr state List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: David Gibson Cc: qemu-ppc@nongnu.org, qemu-devel@nongnu.org, mdroth@linux.vnet.ibm.com On 04/21/2016 09:28 PM, David Gibson wrote: > On Thu, Apr 21, 2016 at 10:22:10AM -0700, Jianjun Duan wrote: >> >> >> On 04/19/2016 10:14 PM, David Gibson wrote: >>> On Fri, Apr 15, 2016 at 01:33:04PM -0700, Jianjun Duan wrote: >>>> ccs_list in spapr state maintains the device tree related >>>> information on the rtas side for hotplugged devices. In racing >>>> situations between hotplug events and migration operation, a rtas >>>> hotplug event could be migrated from the source guest to target >>>> guest, or the source guest could have not yet finished fetching >>>> the device tree when migration is started, the target will try >>>> to finish fetching the device tree. By migrating ccs_list, the >>>> target can fetch the device tree properly. >>>> >>>> We tracked the size of ccs_list queue, and introduced a dynamic >>>> cache for ccs_list to get around having to create VMSD for the >>>> queue. There also existence tests in place for the newly added >>>> fields in the spapr state VMSD to make sure forward migration >>>> is not broken. >>>> >>>> Signed-off-by: Jianjun Duan >>> So there's problems here at a couple of levels. >>> >>> 1. Even more so that the indicators, this is transitional state, which >>> it would be really nice if we can avoid migrating. At the very least >>> the new state should go into a subsection with an is_needed function >>> which will skip it if we don't have a configure connector in progress. >>> That means we'll be able to backwards migrate as long as we're not in >>> the middle of a hotplug, which won't be possible with this version. >>> >>> Again, if there's some way we can defer or retry the operation instead >>> of migrating the interim state, that would be even better. >> I am not sure why transitional state should not be migrated. > > Because every extra piece of state to migrate is something which can > go wrong. Getting migration working properly is a difficult and > fragile process as it is - every extra bit of state we add makes it > harder. > Migrating this and pending_events does fix the racing. The alternative such as delay or throttling involves timing issues, which IMHO are even trickier to get it right. >> I think the >> fact that it changes is the reason why it should be migrated. As for >> backward migration, I thought about it, but decided to leave it later >> to beat the time. We can do it later, or do it this time and delayed the >> patches more. I agree that subsection seems to be the solution here. > > Leaving backwards migration until later - after you've already > introduced a new stream version - will make implementing it much, much > more difficult. > I will use subsection to fix the backward migration. >>> 2. Having to copy the list of elements out into an array just for >>> migration is pretty horrible. I'm almost include to suggest we should >>> add list migration into the savevm core rather than this. >> I thought about a general approach of adding something to savevm to >> handle the queue. But the queue used here uses macro and doesn't really >> support polymorphism. And we need to use QTAILQ_FOREACH(var, head, field) >> to access it. It is not easy as we may need to modify the macro definition >> to carry >> type name of the elements of the queue. >> >> Also I am following Alexey's code here ([PATCH qemu v15 07/17] spapr_iommu: >> Migrate full state). >> It was reviewed by you. > > Yeah. It's not incorrect, but it's ugly and every extra time I see > it, the impetus to find a better way increases. > I agree it is not elegant. I will look into it to see if I can create something similar to QTAILQ_FOREACH to get around the type issue. >>>> --- >>>> hw/ppc/spapr.c | 60 ++++++++++++++++++++++++++++++++++++++++++++- >>>> hw/ppc/spapr_rtas.c | 2 ++ >>>> include/hw/ppc/spapr.h | 11 +++++++++ >>>> include/migration/vmstate.h | 8 +++++- >>>> 4 files changed, 79 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c >>>> index af4745c..eab95f0 100644 >>>> --- a/hw/ppc/spapr.c >>>> +++ b/hw/ppc/spapr.c >>>> @@ -1245,10 +1245,31 @@ static bool spapr_vga_init(PCIBus *pci_bus, Error **errp) >>>> } >>>> } >>>> +static void spapr_pre_save(void *opaque) >>>> +{ >>>> + sPAPRMachineState *spapr = (sPAPRMachineState *)opaque; >>>> + sPAPRConfigureConnectorState *ccs; >>>> + sPAPRConfigureConnectorStateCache *ccs_cache; >>>> + >>>> + /* Copy ccs_list to ccs_list_cache */ >>>> + spapr->ccs_list_cache = g_new0(sPAPRConfigureConnectorStateCache, >>>> + spapr->ccs_list_num); >>>> + ccs_cache = spapr->ccs_list_cache; >>>> + QTAILQ_FOREACH(ccs, &spapr->ccs_list, next) { >>>> + ccs_cache->drc_index = ccs->drc_index; >>>> + ccs_cache->fdt_offset = ccs->fdt_offset; >>>> + ccs_cache->fdt_depth = ccs->fdt_depth; >>>> + ccs_cache++; >>>> + } >>>> +} >>>> + >>>> static int spapr_post_load(void *opaque, int version_id) >>>> { >>>> sPAPRMachineState *spapr = (sPAPRMachineState *)opaque; >>>> int err = 0; >>>> + sPAPRConfigureConnectorState *ccs; >>>> + sPAPRConfigureConnectorStateCache *ccs_cache = spapr->ccs_list_cache; >>>> + int index = 0; >>>> /* In earlier versions, there was no separate qdev for the PAPR >>>> * RTC, so the RTC offset was stored directly in sPAPREnvironment. >>>> @@ -1258,6 +1279,19 @@ static int spapr_post_load(void *opaque, int version_id) >>>> err = spapr_rtc_import_offset(spapr->rtc, spapr->rtc_offset); >>>> } >>>> + if (version_id < 4) { >>>> + return err; >>>> + } >>>> + /* Copy ccs_list_cache to ccs_list */ >>>> + for (index = 0; index < spapr->ccs_list_num; index ++) { >>>> + ccs = g_new0(sPAPRConfigureConnectorState, 1); >>>> + ccs->drc_index = (ccs_cache + index)->drc_index; >>>> + ccs->fdt_offset = (ccs_cache + index)->fdt_offset; >>>> + ccs->fdt_depth = (ccs_cache + index)->fdt_depth; >>>> + QTAILQ_INSERT_TAIL(&spapr->ccs_list, ccs, next); >>>> + } >>>> + g_free(spapr->ccs_list_cache); >>>> + >>>> return err; >>>> } >>>> @@ -1266,10 +1300,28 @@ static bool version_before_3(void *opaque, int version_id) >>>> return version_id < 3; >>>> } >>>> +static bool version_ge_4(void *opaque, int version_id) >>>> +{ >>>> + return version_id >= 4; >>>> +} >>>> + >>>> +static const VMStateDescription vmstate_spapr_ccs_cache = { >>>> + .name = "spaprconfigureconnectorstate", >>>> + .version_id = 1, >>>> + .minimum_version_id = 1, >>>> + .fields = (VMStateField[]) { >>>> + VMSTATE_UINT32(drc_index, sPAPRConfigureConnectorStateCache), >>>> + VMSTATE_INT32(fdt_offset, sPAPRConfigureConnectorStateCache), >>>> + VMSTATE_INT32(fdt_depth, sPAPRConfigureConnectorStateCache), >>>> + VMSTATE_END_OF_LIST() >>>> + }, >>>> +}; >>>> + >>>> static const VMStateDescription vmstate_spapr = { >>>> .name = "spapr", >>>> - .version_id = 3, >>>> + .version_id = 4, >>>> .minimum_version_id = 1, >>>> + .pre_save = spapr_pre_save, >>>> .post_load = spapr_post_load, >>>> .fields = (VMStateField[]) { >>>> /* used to be @next_irq */ >>>> @@ -1279,6 +1331,12 @@ static const VMStateDescription vmstate_spapr = { >>>> VMSTATE_UINT64_TEST(rtc_offset, sPAPRMachineState, version_before_3), >>>> VMSTATE_PPC_TIMEBASE_V(tb, sPAPRMachineState, 2), >>>> + /* RTAS state */ >>>> + VMSTATE_INT32_TEST(ccs_list_num, sPAPRMachineState, version_ge_4), >>> You don't generally need to write your own version test functions for specific-version. Instead you can just use VMSTATE_INT32_V. >> I agree. I realized that after the code was posted here. > > Ok. > >>>> + VMSTATE_STRUCT_VARRAY_ALLOC_TEST(ccs_list_cache, sPAPRMachineState, >>>> + version_ge_4, ccs_list_num, 1, >>>> + vmstate_spapr_ccs_cache, >>>> + sPAPRConfigureConnectorStateCache), >>>> VMSTATE_END_OF_LIST() >>>> }, >>>> }; >>>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c >>>> index f073258..9cfd559 100644 >>>> --- a/hw/ppc/spapr_rtas.c >>>> +++ b/hw/ppc/spapr_rtas.c >>>> @@ -70,6 +70,7 @@ static void spapr_ccs_add(sPAPRMachineState *spapr, >>>> { >>>> g_assert(!spapr_ccs_find(spapr, ccs->drc_index)); >>>> QTAILQ_INSERT_HEAD(&spapr->ccs_list, ccs, next); >>>> + spapr->ccs_list_num++; >>>> } >>>> static void spapr_ccs_remove(sPAPRMachineState *spapr, >>>> @@ -77,6 +78,7 @@ static void spapr_ccs_remove(sPAPRMachineState *spapr, >>>> { >>>> QTAILQ_REMOVE(&spapr->ccs_list, ccs, next); >>>> g_free(ccs); >>>> + spapr->ccs_list_num--; >>>> } >>>> void spapr_ccs_reset_hook(void *opaque) >>>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h >>>> index 815d5ee..c8be926 100644 >>>> --- a/include/hw/ppc/spapr.h >>>> +++ b/include/hw/ppc/spapr.h >>>> @@ -11,6 +11,8 @@ struct VIOsPAPRBus; >>>> struct sPAPRPHBState; >>>> struct sPAPRNVRAM; >>>> typedef struct sPAPRConfigureConnectorState sPAPRConfigureConnectorState; >>>> +typedef struct sPAPRConfigureConnectorStateCache >>>> + sPAPRConfigureConnectorStateCache; >>>> typedef struct sPAPREventLogEntry sPAPREventLogEntry; >>>> #define HPTE64_V_HPTE_DIRTY 0x0000000000000040ULL >>>> @@ -75,6 +77,9 @@ struct sPAPRMachineState { >>>> /* RTAS state */ >>>> QTAILQ_HEAD(, sPAPRConfigureConnectorState) ccs_list; >>>> + /* Temporary cache for migration purposes */ >>>> + int32_t ccs_list_num; >>>> + sPAPRConfigureConnectorStateCache *ccs_list_cache; >>>> /*< public >*/ >>>> char *kvm_type; >>>> @@ -589,6 +594,12 @@ struct sPAPRConfigureConnectorState { >>>> QTAILQ_ENTRY(sPAPRConfigureConnectorState) next; >>>> }; >>>> +struct sPAPRConfigureConnectorStateCache { >>>> + uint32_t drc_index; >>>> + int fdt_offset; >>>> + int fdt_depth; >>>> +}; >>>> + >>>> void spapr_ccs_reset_hook(void *opaque); >>>> #define TYPE_SPAPR_RTC "spapr-rtc" >>>> diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h >>>> index 1622638..7966979 100644 >>>> --- a/include/migration/vmstate.h >>>> +++ b/include/migration/vmstate.h >>>> @@ -549,9 +549,10 @@ extern const VMStateInfo vmstate_info_bitmap; >>>> .offset = offsetof(_state, _field), \ >>>> } >>>> -#define VMSTATE_STRUCT_VARRAY_ALLOC(_field, _state, _field_num, _version, _vmsd, _type) {\ >>>> +#define VMSTATE_STRUCT_VARRAY_ALLOC_TEST(_field, _state, _test, _field_num, _version, _vmsd, _type) { \ >>>> .name = (stringify(_field)), \ >>>> .version_id = (_version), \ >>>> + .field_exists = (_test), \ >>>> .vmsd = &(_vmsd), \ >>>> .num_offset = vmstate_offset_value(_state, _field_num, int32_t), \ >>>> .size = sizeof(_type), \ >>>> @@ -677,6 +678,11 @@ extern const VMStateInfo vmstate_info_bitmap; >>>> VMSTATE_STRUCT_ARRAY_TEST(_field, _state, _num, NULL, _version, \ >>>> _vmsd, _type) >>>> +#define VMSTATE_STRUCT_VARRAY_ALLOC(_field, _state, _field_num, _version, \ >>>> + _vmsd, _type) \ >>>> + VMSTATE_STRUCT_VARRAY_ALLOC_TEST(_field, _state, NULL, _field_num, \ >>>> + _version, _vmsd, _type) >>>> + >>>> #define VMSTATE_BUFFER_UNSAFE_INFO(_field, _state, _version, _info, _size) \ >>>> VMSTATE_BUFFER_UNSAFE_INFO_TEST(_field, _state, NULL, _version, _info, \ >>>> _size) >> >