From: John Snow
Date: Tue, 5 Jan 2016 15:21:31 -0500
Subject: Re: [Qemu-devel] [Qemu-block] [PATCH 05/10] block: Inactivate BDS when migration completes
Message-ID: <568C25CB.7050808@redhat.com>
In-Reply-To: <5679B5DE.5070205@redhat.com>
References: <1450802786-20893-1-git-send-email-kwolf@redhat.com> <1450802786-20893-6-git-send-email-kwolf@redhat.com> <5679B5DE.5070205@redhat.com>
To: Eric Blake, Kevin Wolf, qemu-block@nongnu.org
Cc: qemu-devel@nongnu.org, mreitz@redhat.com

On 12/22/2015 03:43 PM, Eric Blake wrote:
> On 12/22/2015 09:46 AM, Kevin Wolf wrote:
>> So far, live migration with shared storage meant that the image is in a
>> not-really-ready don't-touch-me state on the destination while the
>> source is still actively using it, but after completing the migration,
>> the image was fully opened on both sides. This is bad.
>>
>> This patch adds a block driver callback to inactivate images on the
>> source before completing the migration. Inactivation means that it goes
>> to a state as if it had just been live migrated to the qemu instance on
>> the source (i.e. BDRV_O_INCOMING is set). You're then supposed to
>> continue either on the source or on the destination, which takes
>> ownership of the image.
>>
>> A typical migration now looks like this with respect to disk images:
>>
>> 1. Destination qemu is started, the image is opened with
>>    BDRV_O_INCOMING. The image is fully opened on the source.
>>
>> 2. Migration is about to complete. The source flushes the image and
>>    inactivates it. Now both sides have the image opened with
>>    BDRV_O_INCOMING and are expecting the other side to still modify it.
>
> The name BDRV_O_INCOMING now doesn't quite match semantics on the
> source, but I don't have any better suggestions. BDRV_O_LIMITED_USE?
> BDRV_O_HANDOFF? At any rate, I fully agree with your logic of locking
> things down on the source to mark that the destination is about to take
> over write access to the file.
>

INCOMING is handy as it keeps the code simple, even if it's weird to
read. Is it worth adding the extra ifs/case statements everywhere just
to introduce a separate BDRV_O_HANDOFF?

Maybe in the future someone will use BDRV_O_INCOMING to mean something
more specific (data is actually incoming, not just in the process of
being handed off), which could cause problems. Maybe even just renaming
BDRV_O_INCOMING to BDRV_O_HANDOFF right now would accomplish the
semantics we want on both source and destination without needing two
flags.

Follow your dreams, go with what you feel.
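To make sure I'm picturing the same lifecycle you are, here it is as a
toy sketch (standalone C, not QEMU code; TOY_O_HANDOFF and the two
helper functions below are just stand-ins for BDRV_O_INCOMING or a
renamed BDRV_O_HANDOFF and whatever the real callbacks end up being):

    /* Standalone toy model of the handoff; names are stand-ins, not the
     * real QEMU API. */
    #include <stdio.h>

    #define TOY_O_HANDOFF (1 << 0)  /* hypothetical rename of BDRV_O_INCOMING */

    typedef struct ToyBDS {
        const char *name;
        int open_flags;
    } ToyBDS;

    /* Source side, just before migration completes: flush, then re-enter
     * the "don't touch me" state. */
    static void toy_inactivate(ToyBDS *bs)
    {
        /* a real implementation would flush the image here */
        bs->open_flags |= TOY_O_HANDOFF;
        printf("%s: inactivated, writing is now forbidden\n", bs->name);
    }

    /* Whichever side resumes (normally the destination) takes ownership
     * back by clearing the flag. */
    static void toy_invalidate(ToyBDS *bs)
    {
        bs->open_flags &= ~TOY_O_HANDOFF;
        printf("%s: activated, this instance owns the image\n", bs->name);
    }

    int main(void)
    {
        ToyBDS src = { "source", 0 };                  /* fully open */
        ToyBDS dst = { "destination", TOY_O_HANDOFF }; /* started with -incoming */

        toy_inactivate(&src);  /* step 2: both sides now have the flag set */
        toy_invalidate(&dst);  /* step 3: destination resumes and takes over */
        return 0;
    }

Whatever we end up calling the flag, the invariant is the same: at most
one side ever has it cleared.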
>>
>> 3. One side (the destination on success) continues and calls
>>    bdrv_invalidate_all() in order to take ownership of the image
>>    again. This removes BDRV_O_INCOMING on the resuming side; the flag
>>    remains set on the other side.
>>
>> This ensures that the same image isn't written to by both instances
>> (unless both are resumed, but then you get what you deserve). This is
>> important because .bdrv_close for non-BDRV_O_INCOMING images could
>> write to the image file, which is definitely forbidden while another
>> host is using the image.
>
> And indeed, this is a prereq to your patch that modifies the file on
> close to clear the new 'open-for-writing' flag :)
>
>>
>> Signed-off-by: Kevin Wolf
>> ---
>>  block.c                   | 34 ++++++++++++++++++++++++++++++++++
>>  include/block/block.h     |  1 +
>>  include/block/block_int.h |  1 +
>>  migration/migration.c     |  7 +++++++
>>  qmp.c                     | 12 ++++++++++++
>>  5 files changed, 55 insertions(+)
>>
>
>> @@ -1536,6 +1540,9 @@ static void migration_completion(MigrationState *s, int current_active_state,
>>      if (!ret) {
>>          ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
>>          if (ret >= 0) {
>> +            ret = bdrv_inactivate_all();
>> +        }
>> +        if (ret >= 0) {
>>              qemu_file_set_rate_limit(s->file, INT64_MAX);
>
> Isn't the point of the rate limit change to allow any pending
> operations to flush without artificially slow limits? Will
> inactivating the device be too slow if rate limits are still in
> effect?
>

This sets the rate limit for the migration pipe, doesn't it? My reading
was that this removes any artificial limits for the sake of post-copy,
but we shouldn't be flushing any writes to disk at this point, so I
think this order won't interfere with anything.

> But offhand, I don't have any strong proof that a different order is
> required, so yours makes sense to me.
>
> You may want a second opinion, but I'm okay if you add:
> Reviewed-by: Eric Blake
>

Reviewed-by: John Snow
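
P.S. For anyone skimming the thread later, this is the completion
ordering I was describing above, written out as a toy sketch with
stand-in stub functions rather than the real QEMU calls:

    /* Toy paraphrase of the completion ordering discussed above; the
     * stub functions stand in for the real QEMU calls. */
    #include <stdio.h>
    #include <stdint.h>

    static int stop_guest(void)
    {
        printf("1. stop guest CPUs\n");
        return 0;
    }

    static int inactivate_images(void)
    {
        printf("2. flush and inactivate all block devices\n");
        return 0;
    }

    static void lift_rate_limit(int64_t limit)
    {
        printf("3. raise migration stream rate limit to %lld\n",
               (long long)limit);
    }

    static int send_final_state(void)
    {
        printf("4. send remaining RAM and device state\n");
        return 0;
    }

    int main(void)
    {
        int ret = stop_guest();
        if (ret >= 0) {
            /* the new step from this patch: the source gives up its
             * images before the destination is told to take over */
            ret = inactivate_images();
        }
        if (ret >= 0) {
            /* only the migration pipe is unthrottled here; no disk
             * writes should be pending anymore, so the order relative
             * to the inactivation shouldn't matter */
            lift_rate_limit(INT64_MAX);
            ret = send_final_state();
        }
        return ret < 0 ? 1 : 0;
    }

i.e. the rate limit change only affects the migration stream; any disk
flushing has already happened by the time we get there.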