From mboxrd@z Thu Jan 1 00:00:00 1970
From: Hailiang Zhang
To: "Dr. David Alan Gilbert"
Cc: Kevin Wolf, amit.shah@redhat.com, qemu-devel@nongnu.org,
 qemu-block@nongnu.org, quintela@redhat.com
Subject: Re: [Qemu-devel] [PATCH] migration: re-active images when migration fails to complete
Date: Wed, 11 Jan 2017 13:22:40 +0800
Message-ID: <5875C120.7040908@huawei.com>
In-Reply-To: <585B40C5.9020603@huawei.com>
References: <1479555831-30960-1-git-send-email-zhang.zhanghailiang@huawei.com>
 <20161206134211.GD4990@noname.str.redhat.com>
 <20161206152415.GF2125@work-vm>
 <5848F139.1020402@huawei.com>
 <20161208200230.GH2054@work-vm>
 <585B40C5.9020603@huawei.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"; format=flowed
Content-Transfer-Encoding: 8bit

Ping?
Any comments? Or should I send a formal patch?

On 2016/12/22 10:56, Hailiang Zhang wrote:
> On 2016/12/9 4:02, Dr. David Alan Gilbert wrote:
>> * Hailiang Zhang (zhang.zhanghailiang@huawei.com) wrote:
>>> Hi,
>>>
>>> On 2016/12/6 23:24, Dr. David Alan Gilbert wrote:
>>>> * Kevin Wolf (kwolf@redhat.com) wrote:
>>>>> On 19.11.2016 at 12:43, zhanghailiang wrote:
>>>>>> Commit fe904ea8242cbae2d7e69c052c754b8f5f1ba1d6 fixed a case in
>>>>>> which migration aborted QEMU because it didn't regain control of
>>>>>> the images when an error happened.
>>>>>>
>>>>>> There is another case in that error path that aborts QEMU for the
>>>>>> same reason:
>>>>>>
>>>>>>   migration_thread()
>>>>>>     migration_completion()
>>>>>>       bdrv_inactivate_all() -------------------> inactivate images
>>>>>>       qemu_savevm_state_complete_precopy()
>>>>>>         socket_writev_buffer() ----------------> error because destination fails
>>>>>>           qemu_fflush() -----------------------> set error on migration stream
>>>>>>       qemu_mutex_unlock_iothread() ------------> unlock
>>>>>>   qmp_migrate_cancel() ------------------------> user cancelled migration
>>>>>>     migrate_set_state() ----------------------> set migrate CANCELLING
>>>>>
>>>>> Important to note here: qmp_migrate_cancel() is executed by a concurrent
>>>>> thread, it doesn't depend on any code paths in migration_completion().
>>>>>
>>>>>>   migration_completion() ----------------------> go on to fail_invalidate
>>>>>>     if (s->state == MIGRATION_STATUS_ACTIVE) ---> skip this branch
>>>>>>   migration_thread() --------------------------> break migration loop
>>>>>>   vm_start() ----------------------------------> restart guest with inactive
>>>>>>                                                   images
>>>>>>
>>>>>> We failed to regain control of the images because we only regain it
>>>>>> while the migration state is "active", but here the user cancelled the
>>>>>> migration after noticing that something went wrong (for example, the
>>>>>> libvirtd daemon was shut down unexpectedly on the destination).
>>>>>>
>>>>>> Signed-off-by: zhanghailiang
>>>>>> ---
>>>>>>  migration/migration.c | 3 ++-
>>>>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/migration/migration.c b/migration/migration.c
>>>>>> index f498ab8..0c1ee6d 100644
>>>>>> --- a/migration/migration.c
>>>>>> +++ b/migration/migration.c
>>>>>> @@ -1752,7 +1752,8 @@ fail_invalidate:
>>>>>>      /* If not doing postcopy, vm_start() will be called: let's regain
>>>>>>       * control on images.
>>>>>>       */
>>>>>> -    if (s->state == MIGRATION_STATUS_ACTIVE) {
>>>>>
>>>>> This if condition tries to check whether we ran the code path that
>>>>> called bdrv_inactivate_all(), so that we only try to reactivate the
>>>>> images if we really inactivated them first.
>>>>>
>>>>> The problem with it is that it ignores a possible concurrent
>>>>> modification of s->state.
>>>>>
>>>>>> +    if (s->state == MIGRATION_STATUS_ACTIVE ||
>>>>>> +        s->state == MIGRATION_STATUS_CANCELLING) {
>>>>>
>>>>> This adds another state that we could end up with after a concurrent
>>>>> modification, so that even in this case we undo the inactivation.
>>>>>
>>>>> However, it is no longer limited to the cases where we inactivated the
>>>>> images. It also applies to other code paths (like the postcopy one)
>>>>> where we didn't inactivate images.
>>>>>
>>>>> What saves the patch is that bdrv_invalidate_cache() is a no-op for
>>>>> block devices that aren't inactivated, so calling it more often than
>>>>> necessary is okay.
>>>>>
>>>>> But then, if we're going to rely on this, it would be much better to
>>>>> just remove the if altogether. I can't say whether there are any other
>>>>> possible values of s->state that we should consider, and by removing
>>>>> the if we would be guaranteed to catch all of them.
>>>>>
>>>>> If we don't want to rely on it, just keep a local bool that remembers
>>>>> whether we inactivated the images and check that here.
>>>>>
>>>>>>          Error *local_err = NULL;
>>>>>>
>>>>>>          bdrv_invalidate_cache_all(&local_err);
>>>>>
>>>>> So in summary, this is a horrible patch because it checks the wrong
>>>>> thing, and I can't really say whether it covers everything it needs to
>>>>> cover, but arguably it happens to correctly fix the outcome of a
>>>>> previously failing case.
>>>>>
>>>>> Normally I would reject such a patch and require a clean solution, but
>>>>> then we're on the day of -rc3, so if you can't send a v2 right away, we
>>>>> might not have time for it.
>>>>>
>>>>> Tough call...
>>>>
>>>> Hmm, this case is messy; I created this function when I split it out
>>>> of the main loop a couple of years back, but it has become messier
>>>> with more s->state checks. As far as I can tell it has always done
>>>> the transition to COMPLETED at the end, well after the locked section,
>>>> so there has always been the chance that a cancellation sneaks in just
>>>> before or just after the locked section.
>>>>
>>>> Some of the bad cases that can happen:
>>>>   a) A cancel sneaks in after the ACTIVE check, before or after the
>>>>      locked section; should we reactivate the disks? Well, that depends
>>>>      on whether the destination actually got the full migration
>>>>      stream - we don't know!
>>>>      If the destination actually starts running, we must not reactivate
>>>>      the disks on the source even if the CPU is stopped.
>>>>
>>>
>>> Yes, we have no mechanism to know exactly whether or not the VM on the
>>> destination was received intact.
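
As an aside, here is a minimal, self-contained sketch of the "local bool"
idea Kevin describes above: gate the reactivation on whether this code path
actually inactivated the images, rather than on s->state, which a concurrent
qmp_migrate_cancel() can change at any time. The bdrv_*/savevm functions
below are stand-in stubs for illustration only, not the real QEMU APIs:

    #include <stdbool.h>
    #include <stdio.h>

    typedef enum {
        MIGRATION_STATUS_ACTIVE,
        MIGRATION_STATUS_CANCELLING,
        MIGRATION_STATUS_FAILED,
        MIGRATION_STATUS_COMPLETED,
    } MigrationStatus;

    typedef struct MigrationState {
        MigrationStatus state;          /* may be changed by a concurrent thread */
    } MigrationState;

    /* Stand-in stubs, not the real QEMU block/savevm functions. */
    static int bdrv_inactivate_all_stub(void)        { return 0; }
    static void bdrv_invalidate_cache_all_stub(void) { printf("images reactivated\n"); }
    static int savevm_complete_stub(void)            { return -1; /* simulate stream error */ }

    static void migration_completion_sketch(MigrationState *s)
    {
        bool inactivated = false;       /* local flag: did *we* inactivate the images? */

        if (bdrv_inactivate_all_stub() == 0) {
            inactivated = true;
        }
        if (savevm_complete_stub() < 0) {
            goto fail_invalidate;
        }
        s->state = MIGRATION_STATUS_COMPLETED;
        return;

    fail_invalidate:
        /* Reactivate iff we inactivated, regardless of what s->state says now:
         * a concurrent qmp_migrate_cancel() may already have set CANCELLING. */
        if (inactivated) {
            bdrv_invalidate_cache_all_stub();
        }
        s->state = MIGRATION_STATUS_FAILED;
    }

    int main(void)
    {
        MigrationState s = { .state = MIGRATION_STATUS_ACTIVE };
        migration_completion_sketch(&s);
        return 0;
    }
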
>>>
>>>>   b) If bdrv_inactivate_all() fails for one device but the others are
>>>>      fine, we go down the fail: label and don't reactivate, so the
>>>>      source dies even though it might have been mostly OK.
>>>>
>>>> We can move the _lock to before the check of s->state at the top,
>>>> which would stop the cancel sneaking in early.
>>>> In the case where postcopy was never enabled (!migrate_postcopy_ram())
>>>> we can move the COMPLETED transition into the lock as well; I think we
>>>> then become more or less safe.
>>>> In the case where postcopy was enabled, I think we can do the COMPLETED
>>>> transition before waiting for the return path to close - I think so,
>>>> but I need to think more about that.
>>>> And there seem to be some dodgy cases where we call the invalidate
>>>> there after a late postcopy failure; that's bad, we shouldn't
>>>> reactivate the source disks after going into postcopy.
>>>>
>>>> So, in summary, this function is a mess - it needs a much bigger fix
>>>> than this patch.
>>>>
>>>
>>> So what's the conclusion?
>>> Will you send a patch to fix it, or should we fix it step by step?
>>> I think Kevin's suggestion to just remove the *if* check is OK.
>>
>> I don't see any of the simple solutions as easy, so I plan to look at
>> it properly, but I am not sure when; if you have time to do it then
>> feel free.
>>
>
> Hmm, we still have a gap between bdrv_inactivate_all() and the final
> exit of the migration thread in which a migration cancel can slip in.
> We did catch that case after migration_completion() had finished but
> before the cleanup work began (which is protected by the global lock).
> The related code is:
>
>                 migration_completion(s, current_active_state,
>                                      &old_vm_running, &start_time);
>                 break;
>             }
>         }
>         ------------------------------------------------> gap begin
>         if (qemu_file_get_error(s->to_dst_file)) {
>             migrate_set_state(&s->state, current_active_state,
>                               MIGRATION_STATUS_FAILED);
>             trace_migration_thread_file_err();
>             break;
>         }
>         current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
>         if (current_time >= initial_time + BUFFER_DELAY) {
>             uint64_t transferred_bytes = qemu_ftell(s->to_dst_file) -
>                                          initial_bytes;
>             uint64_t time_spent = current_time - initial_time;
>             double bandwidth = (double)transferred_bytes / time_spent;
>             max_size = bandwidth * s->parameters.downtime_limit;
>
>             s->mbps = (((double) transferred_bytes * 8.0) /
>                        ((double) time_spent / 1000.0)) / 1000.0 / 1000.0;
>
>             trace_migrate_transferred(transferred_bytes, time_spent,
>                                       bandwidth, max_size);
>             /* if we haven't sent anything, we don't want to recalculate
>                10000 is a small enough number for our purposes */
>             if (s->dirty_bytes_rate && transferred_bytes > 10000) {
>                 s->expected_downtime = s->dirty_bytes_rate / bandwidth;
>             }
>
>             qemu_file_reset_rate_limit(s->to_dst_file);
>             initial_time = current_time;
>             initial_bytes = qemu_ftell(s->to_dst_file);
>         }
>         if (qemu_file_rate_limit(s->to_dst_file)) {
>             /* usleep expects microseconds */
>             g_usleep((initial_time + BUFFER_DELAY - current_time)*1000);
>         }
>     }
>
>     trace_migration_thread_after_loop();
>     /* If we enabled cpu throttling for auto-converge, turn it off. */
>     cpu_throttle_stop();
>
>     end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
>     ----------------------------------------------------> gap end
>     qemu_mutex_lock_iothread();
>     /*
>
> So maybe we should call bdrv_invalidate_cache_all() directly in
> qmp_migrate_cancel()? I think that would cover all the cases.
> (This also covers another potential case: QEMU has finished the
> migration process, but libvirtd decides to cancel the migration before
> shutting it down.)
>
> The thing I'm not sure about is: is it OK to call
> bdrv_invalidate_cache_all() without bdrv_inactivate_all() having been
> called? If not, we need to record whether we have inactivated all block
> devices before calling bdrv_invalidate_cache_all().
>
> Any ideas?
>
> Hailiang
>
>> Dave
>>
>>>
>>> Thanks,
>>> Hailiang
>>>
>>>> Dave
>>>>
>>>>> Kevin
>>>>>
>>>> --
>>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>>>
>>>> .
>>>>
>>>
>> --
>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>
>> .
>>
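
And if it turns out that calling bdrv_invalidate_cache_all() without a prior
bdrv_inactivate_all() is not acceptable, recording the inactivation could look
roughly like the toy sketch below, with a hypothetical block_inactive flag on
MigrationState (stand-in types and names for illustration only, not a tested
QEMU change):

    #include <stdbool.h>
    #include <stdio.h>

    typedef struct MigrationState {
        bool block_inactive;            /* hypothetical: images were inactivated */
    } MigrationState;

    /* Called from the migration thread after a successful bdrv_inactivate_all(). */
    static void record_images_inactive(MigrationState *s)
    {
        s->block_inactive = true;
    }

    /* Called from qmp_migrate_cancel(): reactivate only if we inactivated
     * earlier, so the call is safe even when the completion path never ran. */
    static void cancel_reactivate_images(MigrationState *s)
    {
        if (s->block_inactive) {
            /* bdrv_invalidate_cache_all(&local_err) would go here in real code */
            printf("reactivating images on cancel\n");
            s->block_inactive = false;
        }
    }

    int main(void)
    {
        MigrationState s = { .block_inactive = false };

        record_images_inactive(&s);     /* migration thread inactivated the images */
        cancel_reactivate_images(&s);   /* a late cancel regains control of them   */
        return 0;
    }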