Message-ID: <4E316EFD.6080304@redhat.com>
Date: Thu, 28 Jul 2011 16:15:25 +0200
From: Kevin Wolf
MIME-Version: 1.0
References: <1311861017-13425-1-git-send-email-freddy77@gmail.com>
In-Reply-To: <1311861017-13425-1-git-send-email-freddy77@gmail.com>
Content-Type: text/plain; charset=ISO-8859-15
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH] [RFC] qcow2: group refcount updates during cow
To: Frediano Ziglio
Cc: qemu-devel@nongnu.org

On 28.07.2011 15:50, Frediano Ziglio wrote:
> Well, I think this is the first real improve patch.
> Is more a RFC than a patch. Yes, some lines are terrible!
> It collapses refcount decrement during cow.
> From a first check time executing 015 test passed from about 600 seconds
> to 70.
> This at least prove that refcount updates counts!
> Some doubt:
> 1- place the code in qcow2-refcount.c as it update only refcount and not
> cluster?
> 2- allow some sort of "begin transaction" / "commit" / "rollback" like
> databases instead?
> 3- allow changing tables from different coroutines?
>
> 1) If you have a sequence like (1, 2, 4) probably these clusters are all in
> the same l2 table but with this code you get two write instead of one.
> I'm thinking about a function in qcow2-refcount.c that accept an array of cluster
> instead of a start + len.
>
> Signed-off-by: Frediano Ziglio

I think what you're seeing is actually just one special case of a more
general problem. The problem is that we're interpreting writethrough
more strictly than required. The semantics that we really need is that
on completion of a request, all of its data and metadata must be flushed
to disk. There is no requirement that we flush all intermediate states.

My recent update to qcow2_update_snapshot_refcount() is just another
case of the same problem. I think the solution should be similar to what
I did there, i.e. switch the cache to writeback mode while we're
operating on it and switch back when we're done.

We should probably have functions that make both of these a one-liner (I
think here we have some similarity to your begin/commit idea). With the
right functions, this could become as easy as this (might need better
function names, but you get the idea):

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 882f50a..45b67b1 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -612,6 +612,8 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
     if (m->nb_clusters == 0)
         return 0;
 
+    qcow2_cache_disable_writethrough(bs);
+
     old_cluster = qemu_malloc(m->nb_clusters * sizeof(uint64_t));
 
     /* copy content of unmodified sectors */
@@ -683,6 +685,7 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
 
     ret = 0;
 err:
+    qcow2_cache_restore_writethrough(bs);
     qemu_free(old_cluster);
     return ret;
 }

Kevin
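
P.S. To make the intended semantics concrete, here is a toy model of the
disable/restore pair. It is only a sketch: ToyCache and all toy_* names
are made up for illustration and are not the real qcow2 cache code. The
assumption is that restoring writethrough flushes whatever became dirty
in between, so the request still completes with everything on disk, but
the intermediate states are never flushed one by one.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the qcow2 metadata cache (hypothetical, not Qcow2Cache). */
typedef struct ToyCache {
    bool writethrough;       /* current mode */
    bool saved_writethrough; /* mode to go back to on restore */
    bool bracket_active;     /* true between disable and restore */
    int dirty;               /* entries modified but not yet flushed */
    int flushes;             /* number of flushes issued so far */
} ToyCache;

void toy_cache_init(ToyCache *c)
{
    c->writethrough = true;
    c->saved_writethrough = true;
    c->bracket_active = false;
    c->dirty = 0;
    c->flushes = 0;
}

/* Write out all dirty entries; counts as one flush if there was any work. */
void toy_cache_flush(ToyCache *c)
{
    if (c->dirty > 0) {
        c->dirty = 0;
        c->flushes++;
    }
}

/* Modify one cached entry, e.g. a single refcount decrement. */
void toy_cache_update(ToyCache *c)
{
    c->dirty++;
    if (c->writethrough) {
        toy_cache_flush(c);
    }
}

/* Temporarily switch to writeback; nesting is not supported in this toy. */
void toy_cache_disable_writethrough(ToyCache *c)
{
    assert(!c->bracket_active);
    c->saved_writethrough = c->writethrough;
    c->writethrough = false;
    c->bracket_active = true;
}

/* Switch back and, if writethrough is on again, flush the batched updates. */
void toy_cache_restore_writethrough(ToyCache *c)
{
    assert(c->bracket_active);
    c->writethrough = c->saved_writethrough;
    c->bracket_active = false;
    if (c->writethrough) {
        toy_cache_flush(c);
    }
}

/* Number of flushes needed for n updates, with or without the bracket. */
int toy_count_flushes(int n, bool batched)
{
    ToyCache c;
    int i;

    toy_cache_init(&c);
    if (batched) {
        toy_cache_disable_writethrough(&c);
    }
    for (i = 0; i < n; i++) {
        toy_cache_update(&c);
    }
    if (batched) {
        toy_cache_restore_writethrough(&c);
    }
    return c.flushes;
}
```

In this model, toy_count_flushes(8, false) issues one flush per update,
while toy_count_flushes(8, true) issues a single flush at the end, which
is the effect the two added lines in the diff are meant to have.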