Message-ID: <4D6517E4.4050300@codemonkey.ws>
Date: Wed, 23 Feb 2011 08:21:24 -0600
From: Anthony Liguori
Subject: Re: [Qemu-devel] Re: Strategic decision: COW format
References: <4D5BC467.4070804@redhat.com> <4D5E4271.80501@redhat.com> <4D5E8031.5020402@codemonkey.ws> <4D637A20.9020307@redhat.com> <4D63DCD6.50801@codemonkey.ws> <4D63E13C.4090705@redhat.com> <4D63FE02.3040406@codemonkey.ws> <4D64CFB4.9030108@redhat.com>
In-Reply-To: <4D64CFB4.9030108@redhat.com>
To: Kevin Wolf
Cc: Chunqiang Tang, qemu-devel@nongnu.org, Markus Armbruster, Stefan Hajnoczi

On 02/23/2011 03:13 AM, Kevin Wolf wrote:
> On 22.02.2011 19:18, Anthony Liguori wrote:
>
>> On 02/22/2011 10:15 AM, Kevin Wolf wrote:
>>
>>> On 22.02.2011 16:57, Anthony Liguori wrote:
>>>
>>>> On 02/22/2011 02:56 AM, Kevin Wolf wrote:
>>>>
>>>>> *sigh*
>>>>>
>>>>> It starts to get annoying, but if you really insist, I can repeat it
>>>>> once more: These features that you don't need (this is the correct
>>>>> description for what you call "misfeatures") _are_ implemented in a way
>>>>> that they don't impact the "normal" case.
>>>>>
>>>> Except that they require a refcount table that adds additional metadata
>>>> that needs to be updated in the fast path. I consider that impacting
>>>> the normal case.
>>>>
>>> Like it or not, this requirement exists anyway, without any of your
>>> "misfeatures".
>>>
>>> You chose to use the dirty flag in QED in order to avoid having to flush
>>> metadata too often, which is an approach that any other format, even one
>>> using refcounts, can take as well.
>>>
>> It's a minor detail, but flushing and the amount of metadata are
>> separate points.
>>
> I agree that they are separate...
>
>> The dirty flag prevents metadata from being flushed to disk very often
>> but the use of a refcount table adds additional metadata.
>>
>> A refcount table is definitely not required even if you claim the
>> requirement exists for other features. I assume you mean to implement
>> trim/discard support but instead of a refcount table, a free list would
>> work just as well and would leave the metadata update out of the fast
>> path (allocating writes) and instead only be in the slow path
>> (trim/discard).
>>
> ...but here you're arguing about writing metadata out in the fast path,
> so you're actually not interested in the amount of metadata but in the
> overhead of flushing it. Which is a problem that's solved.
>

I'm interested in both. An extra write is always going to be an extra
write. The flush just makes it very painful.

> A refcount table is essential for internal snapshots and compression,
> it's useful for discard and for running on block devices, it's necessary
> for avoiding the dirty flag and fsck on startup.
>

No, as designed today, qcow2 still needs a dirty flag to avoid leaking
blocks.

> These are five use cases that I can enumerate without thinking a lot
> about it, there might be more. You propose using three different
> mechanisms for allowing normal allocations (use the file size), block
> devices (add a size field into the header) and discard (free list), and
> the other three features, for which you can't think of a hack, you
> declare "misfeatures".
>

No, I only label compression and internal snapshots as misfeatures.
Encryption is a completely reasonable feature.

So even with qcow3, what's the expectation of snapshots? Are we going
to scale to images with over 1000 snapshots? I believe snapshot support
in qcow2 is not a feature that has been designed with any serious
thought. If we truly want to support internal snapshots, let's design
it correctly.

>> As a format feature, a refcount table really only makes sense if the
>> refcount is required to be greater than a single bit. There are more
>> optimal data structures that can be used if the refcount of a block is
>> fixed to 1-bit (like a free list) which is what the fundamental design
>> difference between qcow2 and qed is.
>>
> Okay, so even assuming that there's something like misfeatures that we
> can kick out (with which I strongly disagree), what's the crucial
> advantage of free lists that would make you switch the image format?
>

Performance. One thing we haven't tested with qcow2 is O_SYNC
performance in the guest but my suspicion is that an O_SYNC workload is
going to perform poorly even with cache=none.

Starting with a simple format that we don't have to jump through
tremendous hoops to get reasonable performance out of has a lot of virtues.

Regards,

Anthony Liguori
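
To make the fast-path versus slow-path point above concrete, here is a rough toy model in Python. It is only a sketch under stated assumptions: the class names, the `metadata_writes` counter, and the in-memory lists are all hypothetical and do not reflect the real qcow2 or QED on-disk layouts or any QEMU code. It counts allocator metadata updates only (mapping-table updates, which both approaches need, are left out), and it assumes a free-list design grows the file for new clusters and touches the list only on discard or reuse.

```python
class RefcountAllocator:
    """Toy model: every allocation and discard updates a refcount entry."""

    def __init__(self):
        self.refcounts = []        # index = cluster number (hypothetical layout)
        self.metadata_writes = 0   # allocator metadata updates only

    def allocate(self):
        # Reuse the first cluster whose refcount dropped to zero, if any.
        for cluster, count in enumerate(self.refcounts):
            if count == 0:
                self.refcounts[cluster] = 1
                self.metadata_writes += 1
                return cluster
        # Otherwise grow the image; the new refcount entry is still a write.
        self.refcounts.append(1)
        self.metadata_writes += 1
        return len(self.refcounts) - 1

    def discard(self, cluster):
        self.refcounts[cluster] -= 1
        self.metadata_writes += 1


class FreeListAllocator:
    """Toy model: growth uses the file size, the free list is only
    touched on discard and when a freed cluster is reused."""

    def __init__(self):
        self.file_size = 0         # image size in clusters
        self.free_list = []        # clusters released by discard
        self.metadata_writes = 0   # allocator metadata updates only

    def allocate(self):
        if self.free_list:
            # Reusing a freed cluster removes it from the list: one update.
            self.metadata_writes += 1
            return self.free_list.pop()
        # Plain allocating write: append at end of file, no list update.
        cluster = self.file_size
        self.file_size += 1
        return cluster

    def discard(self, cluster):
        self.free_list.append(cluster)
        self.metadata_writes += 1


if __name__ == "__main__":
    rc, fl = RefcountAllocator(), FreeListAllocator()
    for _ in range(4):
        rc.allocate()
        fl.allocate()
    print("allocator metadata writes after 4 allocating writes:",
          rc.metadata_writes, "(refcount) vs", fl.metadata_writes, "(free list)")
```

In this model a workload with no discards never touches the free list, while the refcount variant pays one extra metadata update per allocating write. Whether that difference matters in practice depends on how the updates are batched and flushed, which is exactly the point under dispute in the thread.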