Message-ID: <4EBAED97.2000100@codemonkey.ws>
Date: Wed, 09 Nov 2011 15:16:07 -0600
From: Anthony Liguori <anthony@codemonkey.ws>
MIME-Version: 1.0
References: <cover.1320865627.git.quintela@redhat.com>	<087d3dc42c667ea146edc73492b0f4afdd3a911d.1320865627.git.quintela@redhat.com>	<4EBADBC0.8000201@codemonkey.ws>
	<m3zkg56lzn.fsf@neno.neno>
In-Reply-To: <m3zkg56lzn.fsf@neno.neno>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH 1/2] Reopen files after migration
To: quintela@redhat.com
Cc: qemu-devel@nongnu.org

On 11/09/2011 03:10 PM, Juan Quintela wrote:
> Anthony Liguori <anthony@codemonkey.ws> wrote:
>> On 11/09/2011 01:16 PM, Juan Quintela wrote:
>>> We need to invalidate the read cache on the destination; otherwise we
>>> get corruption.  An easy way to reproduce it is:
>>>
>>> - create a qcow2 image
>>> - start qemu on the destination of migration (qemu .... -incoming tcp:...)
>>> - start qemu on the source of migration and do an install.
>>> - migrate at the end of the install (when a lot of disk I/O has happened).
>>>
>>> The destination of migration has a local copy of the L1/L2 tables that existed
>>> at the beginning, before the install started.  At this point we have disk
>>> corruption.  The solution (for NFS) is to just reopen the file.  Operations
>>> have to happen in this order:
>>>
>>> - source of migration: flush()
>>> - destination: close(file)
>>> - destination: open(file)
>>>
>>> It is not necessary for the source of migration to close the file.
>>>
>>> Signed-off-by: Juan Quintela <quintela@redhat.com>
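A minimal sketch of the destination-side half of that ordering, using plain POSIX calls. `reopen_image()` is a hypothetical helper, not code from the patch; it assumes the image is a single file named by `path` and that the source has already flushed, so that under NFS close-to-open consistency the fresh open() sees the source's writes. It only refreshes the file descriptor: a format driver such as qcow2 would still have to discard its in-memory L1/L2 copies.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: close the destination's descriptor and open the
 * image again by name ("destination: close(file)", then
 * "destination: open(file)").  `flags` must not include O_CREAT.
 * Returns 0 on success, or -errno on failure, in which case *fd is -1. */
int reopen_image(const char *path, int flags, int *fd)
{
    if (*fd >= 0) {
        close(*fd);
        *fd = -1;
    }

    int new_fd = open(path, flags);
    if (new_fd < 0) {
        return -errno;
    }

    *fd = new_fd;
    return 0;
}
```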
>>
>> Couple thoughts:
>>
>> 1) I'm pretty sure this would break -snapshot.  I do test migration with
>> -snapshot, so please don't break it.
>
> Can you give me an example?  I don't know how to use -snapshot with migration.

This is totally unsafe but has always worked for me.  On the same box:

$ qemu -hda foo.img -snapshot

$ qemu -hda foo.img -snapshot -incoming tcp:localhost:1025

This is not the *only* way I test migration, but it's very convenient for sniff 
testing.  The problem with your patch is that it assumes that once you've opened 
a file, the name still exists.  But that is not universally true.  It needs to 
degrade in a useful way.
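One way to read "degrade in a useful way", sketched under assumptions: try to reopen by name, and if the name is gone (for example, a temporary file that was unlinked after it was opened), keep the descriptor already held instead of failing. `reopen_or_keep()` is a hypothetical helper. Unlike the close-then-open order in the patch description, it opens the new descriptor first so the old one is still available as a fallback.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: try to reopen `path`; on success close old_fd and
 * return the new descriptor.  If the name no longer exists (ENOENT) but
 * old_fd is still valid, keep using old_fd rather than failing.  On any
 * other error return -1 and leave old_fd open for the caller. */
int reopen_or_keep(const char *path, int flags, int old_fd)
{
    int new_fd = open(path, flags);
    if (new_fd >= 0) {
        if (old_fd >= 0) {
            close(old_fd);
        }
        return new_fd;
    }

    if (errno == ENOENT && old_fd >= 0) {
        /* Degraded mode: no fresh open, so no cache revalidation either. */
        return old_fd;
    }
    return -1;
}
```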

I think just deferring open is probably the best strategy.
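Deferred open could look roughly like this: record the path at startup and open the file only on first use, for example once incoming migration has completed. `DeferredImage` and its functions are hypothetical names for illustration, not an existing QEMU interface.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical deferred-open handle: the path is recorded at startup,
 * but the file is not opened until it is first needed. */
typedef struct {
    const char *path;
    int flags;
    int fd;   /* -1 until opened */
} DeferredImage;

void deferred_image_init(DeferredImage *img, const char *path, int flags)
{
    img->path = path;
    img->flags = flags;
    img->fd = -1;
}

bool deferred_image_is_open(const DeferredImage *img)
{
    return img->fd >= 0;
}

/* Opens the file on first use; returns the descriptor or -errno. */
int deferred_image_get_fd(DeferredImage *img)
{
    if (img->fd < 0) {
        int fd = open(img->path, img->flags);
        if (fd < 0) {
            return -errno;
        }
        img->fd = fd;
    }
    return img->fd;
}

void deferred_image_close(DeferredImage *img)
{
    if (img->fd >= 0) {
        close(img->fd);
        img->fd = -1;
    }
}
```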

>
>> 2) I don't think this is going to work very well with encrypted drives.
>
> To be honest, no clue.

Deferring open addresses this in a nice way, I think.

>> Perhaps we could do something like:
>>
>> http://mid.gmane.org/1284213896-12705-2-git-send-email-aliguori@us.ibm.com
>
> That is the kind of thing I wanted to know.
>
>> And use reopen as the default implementation.  That way we don't have to
>> reopen formats that don't need it (raw)
>
> Kevin told me that now that we allow online resize, we should also
> update that for raw, but I haven't tested it to be sure one way or the other.
>
>> or can flush caches without reopening the file (qed).
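The suggestion above (a per-format hook with reopen as the default) might be sketched as a driver table with an optional callback. All names here are hypothetical; the raw no-op and the qed cache drop mirror the discussion (Juan notes raw may also need updating for online resize), and qcow2 falls back to the generic reopen because, per Juan, its cache-flush code is not there yet. The generic path only counts calls instead of actually reopening a file.

```c
#include <stddef.h>

typedef struct Image Image;

/* Hypothetical per-format driver table.  invalidate_cache may be NULL,
 * in which case the generic reopen path is used. */
typedef struct {
    const char *name;
    int (*invalidate_cache)(Image *img);
} ImageFormat;

struct Image {
    const ImageFormat *fmt;
    int reopen_count;   /* times the generic reopen path ran */
    int cache_epoch;    /* bumped when a driver drops its own caches */
};

/* Generic default: stands in for close() + open() of the image file. */
static int generic_reopen(Image *img)
{
    img->reopen_count++;
    return 0;
}

int image_invalidate_cache(Image *img)
{
    if (img->fmt->invalidate_cache) {
        return img->fmt->invalidate_cache(img);
    }
    return generic_reopen(img);
}

/* raw: assumed to keep no metadata cache, so nothing to do. */
static int raw_invalidate(Image *img) { (void)img; return 0; }

/* qed-like: drop in-memory metadata without reopening the file. */
static int qed_invalidate(Image *img) { img->cache_epoch++; return 0; }

static const ImageFormat fmt_raw   = { "raw",   raw_invalidate };
static const ImageFormat fmt_qed   = { "qed",   qed_invalidate };
static const ImageFormat fmt_qcow2 = { "qcow2", NULL };  /* uses default */
```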
>
> qcow2 could be told to flush caches; it is just that the code is not there.
> It shouldn't be _that_ difficult.  But I no longer understand the
> block_open <-> block_file_open relationship.
>
>> It doesn't fix NFS close-to-open, but I think the right way to do that
>> is to defer the open, not to reopen.
>
> Fully agree here; that would be another way to fix it.  As I showed in my
> other answer, Markus already has problems with ide + cmos,
> so I think that we should have:

I've posted patches that delay the geometry guess until the device model is 
initialized.  That avoids this particular problem.

Regards,

Anthony Liguori

>
> - initialization done before we open files/block/<whatever you call it>
> - open files/block/...
> - late initialization that uses that (almost nothing needs to be here
>    and should be easy to audit).
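The three phases above, sketched as a startup sequence that records the order in which steps run. The phase functions are hypothetical placeholders, and placing incoming migration between early initialization and opening the block devices is an assumption based on the deferred-open approach discussed earlier in the thread.

```c
#include <string.h>

#define MAX_STEPS 8

/* Order in which the (hypothetical) phases ran, for checking. */
static const char *steps[MAX_STEPS];
static int nsteps;

static void log_step(const char *s)
{
    if (nsteps < MAX_STEPS) {
        steps[nsteps++] = s;
    }
}

/* Must not read disk contents (e.g. no geometry guessing here). */
static void early_init(void) { log_step("early"); }

static void open_block_devices(void) { log_step("open"); }

/* Small, auditable set of steps that may use disk contents. */
static void late_init(void) { log_step("late"); }

void machine_start(int incoming)
{
    early_init();
    if (incoming) {
        log_step("migrate");   /* assumed: receive state before opening disks */
    }
    open_block_devices();
    late_init();
}
```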
>
> About NFS, iSCSI and FC, my understanding is that if you use anything
> other than cache=none you are playing with fire, and will get burned
> sooner or later (it took quite a while for Christoph to make me understand
> that, but now I fully agree with him).
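Why cache=none matters here, as a sketch under stated assumptions: with cache=none, QEMU opens the image with O_DIRECT, so reads bypass the host page cache and cannot return pages cached before migration. The mapping below (none bypasses the cache, writethrough adds synchronous writes, writeback adds neither) is an assumption about the host-side flags, expressed as booleans rather than real open(2) flags to keep it portable; `CachePolicy` and the helper functions are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* Assumed meaning: bypass_host_cache ~ O_DIRECT, sync_writes ~ O_DSYNC. */
typedef struct {
    bool bypass_host_cache;
    bool sync_writes;
} CachePolicy;

/* Parses a cache= mode; returns false for an unknown mode. */
bool parse_cache_mode(const char *mode, CachePolicy *out)
{
    if (strcmp(mode, "none") == 0) {
        *out = (CachePolicy){ .bypass_host_cache = true, .sync_writes = false };
    } else if (strcmp(mode, "writethrough") == 0) {
        *out = (CachePolicy){ .bypass_host_cache = false, .sync_writes = true };
    } else if (strcmp(mode, "writeback") == 0) {
        *out = (CachePolicy){ .bypass_host_cache = false, .sync_writes = false };
    } else {
        return false;
    }
    return true;
}

/* Only a mode that bypasses the host page cache keeps the destination
 * from reading pages that were cached before migration. */
bool safe_for_shared_storage_migration(const CachePolicy *p)
{
    return p->bypass_host_cache;
}
```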
>
> Later, Juan.