From: Anthony Liguori <anthony@codemonkey.ws>
To: Avi Kivity <avi@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>,
qemu-devel@nongnu.org,
Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>,
Juan Quintela <quintela@redhat.com>
Subject: Re: [Qemu-devel] [RFC][PATCH 0/3] Fix caching issues with live migration
Date: Sun, 12 Sep 2010 08:12:15 -0500
Message-ID: <4C8CD1AF.3060904@codemonkey.ws>
In-Reply-To: <4C8CAF9C.8090903@redhat.com>

On 09/12/2010 05:46 AM, Avi Kivity wrote:
> On 09/11/2010 05:04 PM, Anthony Liguori wrote:
>> Today, live migration only works when using shared storage that is fully
>> cache coherent using raw images.
>>
>> The failure case with weakly coherent storage (e.g. NFS) is subtle but
>> nonetheless still exists. NFS only guarantees close-to-open coherence and
>> when performing a live migration, we do an open on the source and an open
>> on the destination. We fsync() on the source before launching the
>> destination but since we have two simultaneous opens, we're not guaranteed
>> coherence.
>>
>> This is not necessarily a problem except that we are a bit gratuitous in
>> reading from the disk before launching a guest. This means that as things
>> stand today, we're guaranteed to read the first 64k of the disk and as
>> such, if a client writes to that region during live migration, corruption
>> will result.
>>
>> The second failure condition has to do with image files (such as qcow2).
>> Today, we aggressively cache metadata in all image formats and that cache
>> is definitely not coherent even with fully coherent shared storage.
>>
>> In all image formats, we prefetch at least the L1 table in open() which
>> means that if there is a write operation that causes a modification to an
>> L1 table, corruption will ensue.
>>
>> This series attempts to address both of these issues. Technically, if an
>> NFS client aggressively prefetches, this solution is not enough but in
>> practice, Linux doesn't do that.
>
> I think it is unlikely that it will, but I prefer to be on the right
> side of the standards.

I've been asking around about this, and one suggestion was to acquire a
file lock, since NFS requires that acquiring a lock drops any client-side
cache for the file. I need to understand this better, so it's step #2.

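To make the lock idea concrete, here is a minimal sketch of taking a shared
whole-file lock before reading. The helper name read_after_lock() is
hypothetical and is not part of QEMU or of this series. It assumes a POSIX
system and relies on the claim above that an NFS client revalidates its cache
when a lock is acquired; whether that is sufficient in practice is exactly the
part that still needs to be understood.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: take a shared (read) lock on the whole file, then
 * read from it. Returns the number of bytes read, or -1 on error.
 */
ssize_t read_after_lock(const char *path, void *buf, size_t len, off_t offset)
{
    assert(path != NULL && buf != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        perror("open");
        return -1;
    }

    struct flock fl = {
        .l_type = F_RDLCK,    /* shared lock is enough for a reader */
        .l_whence = SEEK_SET,
        .l_start = 0,
        .l_len = 0,           /* 0 means "to end of file" */
    };

    /* Block until the lock is granted. */
    if (fcntl(fd, F_SETLKW, &fl) < 0) {
        perror("fcntl(F_SETLKW)");
        close(fd);
        return -1;
    }

    ssize_t n = pread(fd, buf, len, offset);

    /* Release the lock explicitly; close() would also drop it. */
    fl.l_type = F_UNLCK;
    fcntl(fd, F_SETLK, &fl);
    close(fd);
    return n;
}
```

On a local filesystem the lock changes nothing about what is read; the sketch
only shows where the lock acquisition would sit relative to the first read.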
> Why not delay image open until after migration completes? I know
> your concern about the image not being there, but we can verify that
> with access(). If the image is deleted between access() and open()
> then the user has much bigger problems.

3/3 would still be needed because if we delay the open, we obviously can't
do a read until after the open.

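A rough sketch of the delayed-open approach, assuming POSIX access(), open()
and pread(). The deferred_image struct and its functions are hypothetical
names for illustration, not QEMU's block layer. The point is that any read
path has to cope with the image not being open yet.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical state for an image whose open is deferred. */
struct deferred_image {
    const char *path;
    int fd;               /* -1 until deferred_image_open() succeeds */
};

/*
 * Before migration: only verify that the file exists and is accessible.
 * Returns 0 on success or a negative errno value.
 */
int deferred_image_check(struct deferred_image *img, const char *path)
{
    assert(img != NULL && path != NULL);
    img->path = path;
    img->fd = -1;
    return access(path, R_OK | W_OK) == 0 ? 0 : -errno;
}

/* After migration completes: open the file for real. */
int deferred_image_open(struct deferred_image *img)
{
    img->fd = open(img->path, O_RDWR);
    return img->fd >= 0 ? 0 : -errno;
}

/* Reads cannot be served until the deferred open has happened. */
ssize_t deferred_image_read(struct deferred_image *img, void *buf,
                            size_t len, off_t offset)
{
    if (img->fd < 0) {
        return -EBADF;
    }
    ssize_t n = pread(img->fd, buf, len, offset);
    return n >= 0 ? n : -errno;
}
```

There is still a window between access() and open() in which the file can
disappear, which is the case discussed in the quoted text above.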
So it's really only a choice between invalidate_cache and delaying the
open. Just doing invalidate_cache is a far less invasive change, though,
and it has some nice properties.

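As a toy model of the invalidate_cache idea: the image prefetches some
metadata at open time (standing in for an L1 table), and an invalidate call
forces the next access to re-read it from the file. Only the name
invalidate_cache comes from this thread; the toy_image struct, its functions
and the 16-byte "header" are assumptions made up for illustration and do not
reflect the actual patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define TOY_HEADER_SIZE 16

/* Hypothetical image that caches its first bytes, like prefetched metadata. */
struct toy_image {
    int fd;
    unsigned char header[TOY_HEADER_SIZE];
    int header_valid;
};

/* (Re)load the cached header from the file. Returns 0 or -1. */
static int toy_image_load_header(struct toy_image *img)
{
    ssize_t n = pread(img->fd, img->header, TOY_HEADER_SIZE, 0);
    if (n != TOY_HEADER_SIZE) {
        return -1;
    }
    img->header_valid = 1;
    return 0;
}

/* Open the image and prefetch the header, as image formats do in open(). */
int toy_image_open(struct toy_image *img, const char *path)
{
    assert(img != NULL && path != NULL);
    memset(img, 0, sizeof(*img));
    img->fd = open(path, O_RDONLY);
    if (img->fd < 0) {
        return -1;
    }
    if (toy_image_load_header(img) < 0) {
        close(img->fd);
        img->fd = -1;
        return -1;
    }
    return 0;
}

/* Drop the cached metadata; the next access re-reads it from the file. */
void toy_image_invalidate_cache(struct toy_image *img)
{
    img->header_valid = 0;
}

/* Return the header, reloading it first if the cache was invalidated. */
const unsigned char *toy_image_header(struct toy_image *img)
{
    if (!img->header_valid && toy_image_load_header(img) < 0) {
        return NULL;
    }
    return img->header;
}
```

In this model the destination would call toy_image_invalidate_cache() once
the source has stopped writing, so the early open can stay where it is.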
Regards,
Anthony Liguori
> Note that on NFS, removing (and I think chmod'ing) a file after it is
> opened will cause subsequent data access to fail, unlike POSIX.
>