Message-ID: <4D7E2A91.3040300@codemonkey.ws>
Date: Mon, 14 Mar 2011 09:47:45 -0500
From: Anthony Liguori
Subject: Re: [Qemu-devel] Re: Strategic decision: COW format
References: <4D5BC467.4070804@redhat.com> <4D5E4271.80501@redhat.com> <20110220221357.GO4580@hall.aurel32.net> <4D62295E.1030504@redhat.com> <4D7D036B.4050706@codemonkey.ws> <4D7E167A.1020509@codemonkey.ws> <4D7E22FF.3090803@redhat.com>
In-Reply-To: <4D7E22FF.3090803@redhat.com>
List-Id: qemu-devel.nongnu.org
To: Kevin Wolf
Cc: Chunqiang Tang, Stefan Hajnoczi, Markus Armbruster, Aurelien Jarno, qemu-devel@nongnu.org

On 03/14/2011 09:15 AM, Kevin Wolf wrote:
>> The file system can keep a lot of these things around pretty easily but
>> with your proposal, it seems like there can only be one. If you support
>> many of them, I think you'll degenerate to something as complex as a
>> reference count table.
> IIUC, he already uses a refcount table.

Well, he needs a separate mechanism to make trim/discard work, but for
the snapshot discussion, a reference count table is avoided.
The bitmap only covers whether the guest has accessed a block or not.
Then there is a separate table that maps guest offsets to offsets within
the file. I haven't thought hard about it, but my guess is that there is
an ordering constraint between these two pieces of metadata, which is why
the journal is necessary. I get worried about the complexity of a
journal even more than a reference count table.

> Actually, I think that a
> refcount table is a requirement to provide the interesting properties
> that internal snapshots have (see my other mail).

Well, the trick here AFAICT is that you're basically storing external
snapshots internally. So it's sort of like a bunch of FVD formats
embedded into a single image.

> Refcount tables aren't a very complex thing either. In fact, it makes a
> format much simpler to have one concept like refcount tables instead of
> adding another different mechanism for each new feature that would be
> natural with refcount tables.

I think it's a reasonable design goal to minimize any metadata updates
in the fast path. If we can write 1 piece of metadata versus writing 2,
then it's worth exploring IMHO.

> The only problem with them is that they are metadata that must be
> updated. However, I think we have discussed enough how to avoid the
> greatest part of that cost.

Maybe I missed it, but in the WCE=0 mode, is it really possible to avoid
the writes for the refcount table?

>> On the other hand, I think it's reasonable to just avoid the CoW overlay
>> entirely and say that moving to a previous snapshot destroys any of its
>> children. I think this ends up being a simplifying assumption that is
>> worth investigating further.
>>
>> From the use-cases that I'm aware of (backup and RAS), I think these
>> semantics are okay.
> I don't think this semantics would be expected. And anyway, would this
> really allow simplification of the format?

I don't know, I'm really just trying to separate out the implementation
of the format from the use-cases we're trying to address.
Even if we're talking about qcow3, then if we only really care about
read-only snapshots, perhaps we can add a feature bit for this and take
advantage of this to make the WCE=0 case much faster.
But the fundamental question is, does this satisfy the use-cases we care
about?

Regards,

Anthony Liguori

> I'm afraid that you would go
> for complicated solutions with odd semantics just because of an
> arbitrary dislike of refcounts.
>
> Kevin
>
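
The point above about writing 1 piece of metadata versus 2 in the fast
path can be illustrated with a toy model. This is only a sketch under
stated assumptions: ToyImage, CLUSTER_SIZE and the dictionary-based
tables are hypothetical names invented for illustration and do not
reflect the FVD or qcow2 on-disk layout. It simply counts how many
metadata structures a guest write would have to update when the format
keeps only a guest-to-file mapping, compared with a mapping plus a
refcount table.

```python
from dataclasses import dataclass, field

CLUSTER_SIZE = 64 * 1024  # assumed cluster size, chosen only for illustration


@dataclass
class ToyImage:
    """Toy model of an image file (hypothetical, not a real format)."""

    use_refcounts: bool
    mapping: dict = field(default_factory=dict)    # guest cluster -> file cluster
    refcounts: dict = field(default_factory=dict)  # file cluster -> reference count
    next_free: int = 0        # next unallocated cluster in the file
    metadata_writes: int = 0  # running total of metadata updates

    def write(self, guest_offset: int) -> int:
        """Model a guest write; return how many metadata updates it caused."""
        before = self.metadata_writes
        guest_cluster = guest_offset // CLUSTER_SIZE
        if guest_cluster not in self.mapping:
            # First write to this cluster: allocate space in the file.
            file_cluster = self.next_free
            self.next_free += 1
            if self.use_refcounts:
                # Second piece of metadata: mark the cluster as referenced.
                self.refcounts[file_cluster] = 1
                self.metadata_writes += 1
            # The mapping update is needed in both designs.
            self.mapping[guest_cluster] = file_cluster
            self.metadata_writes += 1
        return self.metadata_writes - before


plain = ToyImage(use_refcounts=False)
counted = ToyImage(use_refcounts=True)
# First write to a cluster: 1 update without refcounts, 2 with them.
print(plain.write(0), counted.write(0))
```

In this model a rewrite of an already-allocated cluster costs no
metadata updates in either design, so the difference only shows up on
allocating writes. Whether that difference matters in practice depends
on how the updates are batched or ordered, which the model does not try
to capture.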