From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:48247)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from )
	id 1VHVyn-0000Wn-Nw
	for qemu-devel@nongnu.org; Thu, 05 Sep 2013 05:36:03 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from )
	id 1VHVyh-0005lk-K0
	for qemu-devel@nongnu.org; Thu, 05 Sep 2013 05:35:57 -0400
Received: from mx1.redhat.com ([209.132.183.28]:52675)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from )
	id 1VHVyh-0005lS-C6
	for qemu-devel@nongnu.org; Thu, 05 Sep 2013 05:35:51 -0400
Date: Thu, 5 Sep 2013 11:35:43 +0200
From: Stefan Hajnoczi
Message-ID: <20130905093543.GC12293@stefanha-thinkpad.redhat.com>
References: <1378215952-7151-1-git-send-email-kwolf@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1378215952-7151-1-git-send-email-kwolf@redhat.com>
Subject: Re: [Qemu-devel] [RFC] qcow2 journalling draft
To: Kevin Wolf
Cc: benoit.canet@irqsave.net, jcody@redhat.com, famz@redhat.com, qemu-devel@nongnu.org, mreitz@redhat.com

On Tue, Sep 03, 2013 at 03:45:52PM +0200, Kevin Wolf wrote:
> This contains an extension of the qcow2 spec that introduces journalling
> to the image format, plus some preliminary type definitions and
> function prototypes in the qcow2 code.
>
> Journalling functionality is a crucial feature for the design of data
> deduplication, and it will improve the core part of qcow2 by avoiding
> cluster leaks on crashes as well as provide an easier way to get a
> reliable implementation of performance features like Delayed COW.
>
> At this point of the RFC, it would be most important to review the
> on-disk structure. Once we're confident that it can do everything we
> want, we can start going into more detail on the qemu side of things.
>
> Signed-off-by: Kevin Wolf
> ---
>  block/Makefile.objs   |   2 +-
>  block/qcow2-journal.c |  55 ++++++++++++++
>  block/qcow2.h         |  78 +++++++++++++++++++
>  docs/specs/qcow2.txt  | 204 +++++++++++++++++++++++++++++++++++++++++++++++++-
>  4 files changed, 337 insertions(+), 2 deletions(-)
>  create mode 100644 block/qcow2-journal.c

Although we are still discussing details of the on-disk layout, the
general design is clear enough to discuss how the journal will be used.

Today qcow2 uses Qcow2Cache to do lazy, ordered metadata updates.  The
performance is pretty good with two exceptions that I can think of:

1. The delayed CoW problem that Kevin has been working on.  Guests
   perform sequential writes that are smaller than a qcow2 cluster.  The
   first write triggers a copy-on-write of the full cluster.  Later
   writes then overwrite the copied data.  It would be more efficient to
   anticipate sequential writes and hold off on CoW where possible.

2. Lazy metadata updates lead to bursty behavior and expensive flushes.
   We do not take advantage of disk bandwidth since metadata updates
   stay in the Qcow2Cache until the last possible second.  When the
   guest issues a flush we must write out dirty Qcow2Cache entries and
   possibly fsync between them if dependencies have been set (e.g.
   refcount before L2).

How will the journal change this situation?  Writes that go through the
journal are doubled - they must first be written to the journal and
fsynced, and only then can they be applied to the actual image.

How do we benefit by using the journal?

Stefan