[Qemu-devel] Storage requirements for live migration

From: Anthony Liguori @ 2011-11-11  0:11 UTC
  To: qemu-devel
  Cc: Kevin Wolf, Juan Quintela, Christoph Hellwig, Avi Kivity,
	Stefan Hajnoczi

I did a brain dump of my understanding of the various storage requirements for
live migration.  I think it's accurate, but I may have misunderstood some
details, so I would appreciate review.

I think given sections (1) and (2), the only viable thing is to require 
cache=none unless we get new interfaces to flush caches.

Section (3) talks about image formats.  As I mentioned elsewhere in the thread,
I think the best we can do right now is have a block layer interface to quiesce
the image format.  I think reopen may be a viable short-term strategy for
qcow2, but I think for raw, we should just make the quiesce operation a nop.
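
To make the quiesce idea concrete, here is a rough sketch of what such a block
layer interface could look like.  The names (bdrv_quiesce, the BlockDriver
hook) are illustrative, not existing QEMU API:

/* Hypothetical sketch of a block layer quiesce hook; names are
 * illustrative, not actual QEMU API. */
typedef struct BlockDriver BlockDriver;

typedef struct BlockDriverState {
    BlockDriver *drv;
    /* ... format-specific state ... */
} BlockDriverState;

struct BlockDriver {
    const char *format_name;
    /* Drop all cached metadata so the next access rereads it from the
     * underlying file.  NULL means the format caches nothing. */
    int (*bdrv_quiesce)(BlockDriverState *bs);
};

/* Called on the destination before the guest starts running. */
int bdrv_quiesce(BlockDriverState *bs)
{
    if (bs->drv->bdrv_quiesce) {
        return bs->drv->bdrv_quiesce(bs);  /* qcow2: e.g. close + reopen */
    }
    return 0;                              /* raw: quiesce is a nop */
}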

http://wiki.qemu.org/Migration/Storage

Inlined below for ease of review.

Regards,

Anthony Liguori

Migration in QEMU is designed assuming cache coherent shared storage and raw
format block devices.  There are some cases where live migration will also work
with more weakly coherent shared storage.  This wiki page attempts to outline
those scenarios.  It also attempts to enumerate the reasons why various image
formats do not support migration even with shared storage.

== NFS ==

=== Background ===

NFS only offers close-to-open cache coherence.  This means that the only
guarantee provided by the protocol is that if you close a file on client A and
then open the same file on another client B, client B will see client A's
changes.
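
In other words, ignoring error handling, close-to-open coherence only promises
the ordering below (a sketch, not a complete program):

/* Close-to-open coherence: B is only guaranteed to see A's write
 * because B's open() happens after A's close().  Error handling
 * omitted for brevity. */
#include <fcntl.h>
#include <unistd.h>

void client_a(const char *path)          /* runs on NFS client A */
{
    int fd = open(path, O_WRONLY);
    write(fd, "new data", 8);
    close(fd);   /* flushes A's dirty pages back to the server */
}

void client_b(const char *path)          /* runs on NFS client B, later */
{
    char buf[8];
    int fd = open(path, O_RDONLY);       /* revalidates B's cached data */
    read(fd, buf, sizeof(buf));          /* guaranteed to see A's write */
    close(fd);
}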

The way migration works in QEMU, the source stops the guest after it sends all
of the required data but does not immediately free any resources.  This makes
migration more reliable since it avoids the Two Generals Problem, allowing a
reliable third node to make the final decision about whether migration was
successful.

As soon as the destination receives all of the data, it immediately starts the 
guest.  This means that the reliable third node is not in the critical path of 
migration downtime but can still recover a failed migration.

Since the source never knows that the destination is okay, the only way to 
support NFS robustly would be to close all files on the source before sending 
the last chunk of migration data.  This would mean that if any failure occurred 
after this point, the VM would be lost.
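
The ordering constraint on the source would look something like this (a
sketch; the helper functions are made-up names, not QEMU's migration code):

/* Hypothetical source-side sequence for robust migration over NFS.
 * All helpers are made-up names standing in for the real steps. */
void stop_guest(void);                /* halt the VCPUs, no more writes   */
void fsync_all_disk_images(void);     /* make all writes stable           */
void close_all_disk_images(void);     /* close-to-open: publish to server */
void send_final_migration_chunk(void);

void migrate_source_finish_over_nfs(void)
{
    stop_guest();
    fsync_all_disk_images();
    close_all_disk_images();      /* required for NFS coherence...       */
    send_final_migration_chunk(); /* ...but if anything fails after the
                                   * close, the source cannot safely
                                   * resume the guest: the VM is lost.   */
}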

=== In Practice ===

A Linux NFS server that exports with 'sync' offers stronger coherency than NFS
guarantees.  As far as I know, this is an implementation detail, not a
guarantee.  If a client sends a read request, any data that has been
acknowledged as written with a stable write by any other client will be
returned without the need to close and reopen the file.

A file opened with O_DIRECT with the Linux NFS client code will always issue a
protocol read operation for a userspace read() call.  This means that if you
issue stable writes (fsync) on the source and then use O_DIRECT to read on the
destination, you can safely access the same file without reopening it.
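
For example, a read like the following goes over the wire to the server on
every call rather than being satisfied from the client's page cache (sketch,
error handling omitted):

/* O_DIRECT read through the Linux NFS client: every read() becomes a
 * protocol READ, bypassing the client's page cache. */
#define _GNU_SOURCE   /* for O_DIRECT */
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    void *buf;
    int fd = open(argv[1], O_RDONLY | O_DIRECT);

    /* O_DIRECT requires aligned buffers; 4096 covers common cases. */
    posix_memalign(&buf, 4096, 4096);
    pread(fd, buf, 4096, 0);   /* hits the NFS server, not the cache */

    free(buf);
    close(fd);
    return 0;
}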

=== Conclusion ===

Migration with QEMU is, in practice, safe when using Linux as both the NFS
server and client, provided the source and destination both use cache=none for
the disks and the images are raw files.

== iSCSI/Direct Attached Storage ==

iSCSI has a similar cache coherency guarantee to direct attached storage (via 
fibre channel).  Any read request will return data that has been acknowledged as 
written by another client.

Since QEMU issues read() requests in userspace, Linux normally uses the page 
cache.  The Linux page cache is not coherent across multiple nodes so the only 
way to safely access storage coherently is to bypass the Linux page cache via 
cache=none.
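
Concretely, cache=none means QEMU opens the image with O_DIRECT; roughly:

/* Rough sketch of what cache=none implies on the host side: the image
 * is opened with O_DIRECT, so guest I/O bypasses the host page cache
 * and reads always reach the shared storage. */
#define _GNU_SOURCE   /* for O_DIRECT */
#include <fcntl.h>

int open_image_cache_none(const char *filename)
{
    return open(filename, O_RDWR | O_DIRECT);
}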

=== Conclusion ===

iSCSI, FC, or other forms of direct attached storage are only safe to use with 
live migration if you use cache=none and a raw image.

== Clustered File Systems ==

Clustered file systems such as GPFS, Ceph, Glusterfs, or GFS2 are safe to use
with live migration regardless of the caching option used, as long as raw
images are used.

== Image Formats ==

Image formats are not safe to use with live migration.  The reason is that QEMU
caches data for image formats and does not have a mechanism to flush those
caches.  The following sections describe the issues with the various formats.

=== QCOW2 ===

QCOW2 caches two forms of data, cluster metadata (L1/L2 data, refcount table, 
etc) and mutable header information (file size, snapshot entries, etc).

This data needs to be discarded on the destination after migration completes
but before the guest starts running.
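
Roughly, "discard" means dropping every piece of cached state so that it is
reread from the (now updated) file.  The struct and field names below are
illustrative, not qcow2's actual internals:

/* Illustrative sketch of the qcow2 state that must be thrown away on
 * the destination; names do not match qcow2's actual internals. */
#include <stdint.h>
#include <stdlib.h>

typedef struct Qcow2CachedState {
    void    *l1_table;        /* cached L1 table            */
    void    *l2_cache;        /* cached L2 tables           */
    void    *refcount_cache;  /* cached refcount blocks     */
    uint64_t file_size;       /* cached mutable header info */
} Qcow2CachedState;

void qcow2_discard_caches(Qcow2CachedState *s)
{
    free(s->l1_table);       s->l1_table = NULL;
    free(s->l2_cache);       s->l2_cache = NULL;
    free(s->refcount_cache); s->refcount_cache = NULL;
    s->file_size = 0;        /* force a reread from the header */
}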

=== QED ===

QED caches similar data to QCOW2.  In addition, the QED header has a dirty flag 
that must be handled specially in the case of live migration.
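
The wrinkle is that a running guest legitimately keeps the dirty bit set, so
the destination will see it set while the source is still writing.  Something
like the following, deferred until the source has stopped, is needed (a
sketch; the flag value is from my reading of the QED spec and the helper is a
made-up name):

/* Hypothetical destination-side handling of the QED dirty flag after
 * migration. */
#include <stdint.h>

#define QED_F_NEED_CHECK 0x02     /* per the QED spec; verify before use */

int qed_check_and_repair(void);   /* made-up helper: rebuild consistency */

int qed_activate_after_migration(uint64_t header_features)
{
    if (header_features & QED_F_NEED_CHECK) {
        /* Safe only now: the source has stopped writing.  Running this
         * check during migration would race with the source. */
        return qed_check_and_repair();
    }
    return 0;
}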

=== Raw Files ===

Technically, the file size of a raw file is mutable metadata that QEMU caches. 
This is only applicable when using online image resizing.  If you avoid online 
image resizing during live migration, raw files are completely safe provided the 
storage used meets the above requirements.
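
If you do want to be strict about it, the cached size is trivial to refresh;
something like this sketch:

/* The only mutable metadata raw caches is the file size; it can be
 * refreshed after migration with a single lseek(). */
#include <sys/types.h>
#include <unistd.h>

off_t raw_refresh_file_size(int fd)
{
    return lseek(fd, 0, SEEK_END);  /* reread the size from the fs */
}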
