From mboxrd@z Thu Jan 1 00:00:00 1970
From: Anthony Liguori
Date: Sat, 11 Sep 2010 09:04:53 -0500
Message-Id: <1284213896-12705-1-git-send-email-aliguori@us.ibm.com>
Subject: [Qemu-devel] [RFC][PATCH 0/3] Fix caching issues with live migration
List-Id: qemu-devel.nongnu.org
To: qemu-devel@nongnu.org
Cc: Kevin Wolf, Stefan Hajnoczi, Juan Quintela

Today, live migration only works with raw images on shared storage that is
fully cache coherent.

The failure case with weakly coherent storage (e.g. NFS) is subtle but
nonetheless exists. NFS only guarantees close-to-open coherence, and when
performing a live migration we do an open on the source and an open on the
destination. We fsync() on the source before launching the destination, but
since we have two simultaneous opens, we're not guaranteed coherence.

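For reference, the ordering that close-to-open coherence does cover can be
sketched as below. This is only an illustration, not QEMU code: the function
names source_flush_and_close() and dest_open_and_read() are hypothetical, and
the sketch assumes the reader opens the file strictly after the writer has
closed it, which is exactly the ordering live migration does not follow today.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical source side: write, flush, and CLOSE before the hand-over.
 * Close-to-open coherence only covers data written before this close(). */
int source_flush_and_close(const char *path, const void *buf, size_t len)
{
    assert(path && buf);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) {
        return -1;
    }
    /* Treat a short write as an error to keep the sketch small. */
    if (write(fd, buf, len) != (ssize_t)len || fsync(fd) < 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}

/* Hypothetical destination side: open only AFTER the source has closed.
 * An open() issued while the source still holds the file open gets no
 * such guarantee from a close-to-open protocol. */
ssize_t dest_open_and_read(const char *path, void *buf, size_t len)
{
    assert(path && buf);
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        return -1;
    }
    ssize_t n = read(fd, buf, len);
    close(fd);
    return n;
}
```

With two simultaneous opens there is no point at which the destination's
open() is ordered after a close() on the source, so the fsync() alone does not
give the destination a coherent view.
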
This is not necessarily a problem except that we are a bit gratuitous in
reading from the disk before launching a guest. As things stand today, we're
guaranteed to read the first 64k of the disk, so if a client writes to that
region during live migration, corruption will result.

The second failure condition has to do with image files (such as qcow2).
Today, we aggressively cache metadata in all image formats, and that cache is
definitely not coherent even with fully coherent shared storage. In all image
formats, we prefetch at least the L1 table in open(), which means that if a
write operation modifies an L1 table, corruption will ensue.

This series attempts to address both of these issues. Technically, if an NFS
client aggressively prefetches, this solution is not enough, but in practice
Linux doesn't do that.

I need some help with the qcow2 metadata invalidation. We need to delay the
loading of the L1 table and the reference count table, but we only do this
synchronously today. I think we can just do this on demand, but I'd still like
a second opinion.
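
To make the on-demand idea concrete, here is a minimal sketch of a lazily
loaded, invalidatable table. It is not the qcow2 code and not what the series
implements: LazyTable, lazy_table_get(), lazy_table_invalidate(), and the
FakeImage loader are all hypothetical names, and an in-memory array stands in
for the image file.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical loader callback: fills `table` with `n` entries from storage. */
typedef int (*load_fn)(void *opaque, uint64_t *table, size_t n);

typedef struct LazyTable {
    uint64_t *entries;  /* cached copy, NULL until the first lookup */
    size_t n;           /* number of entries */
    bool valid;         /* false means the cache must be (re)loaded */
    load_fn load;
    void *opaque;
} LazyTable;

/* open() time: record how to load the table, but read nothing yet. */
void lazy_table_init(LazyTable *t, size_t n, load_fn load, void *opaque)
{
    t->entries = NULL;
    t->n = n;
    t->valid = false;
    t->load = load;
    t->opaque = opaque;
}

/* Drop the cached contents; the next lookup reloads from storage. */
void lazy_table_invalidate(LazyTable *t)
{
    t->valid = false;
}

/* Look up one entry, loading the whole table on first use. */
int lazy_table_get(LazyTable *t, size_t idx, uint64_t *out)
{
    assert(t && out);
    if (idx >= t->n) {
        return -1;
    }
    if (!t->valid) {
        if (!t->entries) {
            t->entries = malloc(t->n * sizeof(*t->entries));
            if (!t->entries) {
                return -1;
            }
        }
        if (t->load(t->opaque, t->entries, t->n) < 0) {
            return -1;
        }
        t->valid = true;
    }
    *out = t->entries[idx];
    return 0;
}

void lazy_table_free(LazyTable *t)
{
    free(t->entries);
    t->entries = NULL;
    t->valid = false;
}

/* Stand-in for the image file: a 4-entry table plus a load counter. */
typedef struct FakeImage {
    uint64_t l1[4];
    int loads;
} FakeImage;

int fake_load(void *opaque, uint64_t *table, size_t n)
{
    FakeImage *img = opaque;
    if (n > 4) {
        return -1;
    }
    memcpy(table, img->l1, n * sizeof(*table));
    img->loads++;
    return 0;
}
```

The point of the sketch is that nothing is read at open time, and the
destination could call the invalidate hook once the source has stopped
writing. Whether a synchronous load on first access is acceptable in the I/O
path is the part I would still like a second opinion on.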