From: Anthony Liguori <anthony@codemonkey.ws>
To: Kevin Wolf <kwolf@redhat.com>
Cc: Chunqiang Tang <ctang@us.ibm.com>,
qemu-devel@nongnu.org, Markus Armbruster <armbru@redhat.com>,
Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Subject: Re: [Qemu-devel] Re: Strategic decision: COW format
Date: Wed, 23 Feb 2011 08:21:24 -0600
Message-ID: <4D6517E4.4050300@codemonkey.ws>
In-Reply-To: <4D64CFB4.9030108@redhat.com>

On 02/23/2011 03:13 AM, Kevin Wolf wrote:
> On 22.02.2011 19:18, Anthony Liguori wrote:
>
>> On 02/22/2011 10:15 AM, Kevin Wolf wrote:
>>
>>> On 22.02.2011 16:57, Anthony Liguori wrote:
>>>
>>>
>>>> On 02/22/2011 02:56 AM, Kevin Wolf wrote:
>>>>
>>>>
>>>>> *sigh*
>>>>>
>>>>> It starts to get annoying, but if you really insist, I can repeat it
>>>>> once more: These features that you don't need (this is the correct
>>>>> description for what you call "misfeatures") _are_ implemented in a way
>>>>> that they don't impact the "normal" case.
>>>>>
>>>>>
>>>> Except that they require a refcount table that adds additional metadata
>>>> that needs to be updated in the fast path. I consider that impacting
>>>> the normal case.
>>>>
>>>>
>>> Like it or not, this requirement exists anyway, without any of your
>>> "misfeatures".
>>>
>>> You chose to use the dirty flag in QED in order to avoid having to flush
>>> metadata too often, which is an approach that any other format, even one
>>> using refcounts, can take as well.
>>>
>>>
>> It's a minor detail, but flushing and the amount of metadata are
>> separate points.
>>
> I agree that they are separate...
>
>
>> The dirty flag prevents metadata from being flushed to disk very often
>> but the use of a refcount table adds additional metadata.
>>
>> A refcount table is definitely not required even if you claim the
>> requirement exists for other features. I assume you mean to implement
>> trim/discard support but instead of a refcount table, a free list would
>> work just as well and would leave the metadata update out of the fast
>> path (allocating writes) and instead only be in the slow path
>> (trim/discard).
>>
> ...but here you're arguing about writing metadata out in the fast path,
> so you're actually not interested in the amount of metadata but in the
> overhead of flushing it. Which is a problem that's solved.
>
I'm interested in both. An extra write is always going to be an extra
write. The flush just makes it very painful.

> A refcount table is essential for internal snapshots and compression,
> it's useful for discard and for running on block devices, it's necessary
> for avoiding the dirty flag and fsck on startup.
>
No, as designed today, qcow2 still needs a dirty flag to avoid leaking
blocks.

> These are five use cases that I can enumerate without thinking a lot
> about it, there might be more. You propose using three different
> mechanisms for allowing normal allocations (use the file size), block
> devices (add a size field into the header) and discard (free list), and
> the other three features, for which you can't think of a hack, you
> declare "misfeatures".
>
No, I only label compression and internal snapshots as misfeatures.
Encryption is a completely reasonable feature.

So even with qcow3, what's the expectation of snapshots? Are we going
to scale to images with over 1000 snapshots? I believe snapshot support
in qcow2 is not a feature that has been designed with any serious
thought. If we truly want to support internal snapshots, let's design
it correctly.

>> As a format feature, a refcount table really only makes sense if the
>> refcount is required to be greater than a single bit. There are more
>> optimal data structures that can be used if the refcount of a block is
>> fixed to 1-bit (like a free list) which is what the fundamental design
>> difference between qcow2 and qed is.
>>
> Okay, so even assuming that there's something like misfeatures that we
> can kick out (with which I strongly disagree), what's the crucial
> advantage of free lists that would make you switch the image format?
>
Performance. One thing we haven't tested with qcow2 is O_SYNC
performance in the guest but my suspicion is that an O_SYNC workload is
going to perform poorly even with cache=none.

Starting with a simple format that we don't have to jump through
tremendous hoops to get reasonable performance out of has a lot of virtues.

Regards,
Anthony Liguori
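
The fast-path argument in this message (a refcount table adds a metadata
update to every allocating write, while a free list only needs updating on
trim/discard) can be illustrated with a toy model. This is a rough sketch,
not QEMU code: the class names, the metadata_writes counter, and the
assumption that reusing a discarded cluster costs one free-list update are
all hypothetical. It also ignores the L2/mapping table update that both
designs would need, and says nothing about flush cost.

```python
# Toy model only: counts allocator metadata updates per operation.
# All names here are hypothetical and do not correspond to qcow2 or QED code.


class RefcountAllocator:
    """Each cluster has a refcount entry that changes on every allocation."""

    def __init__(self, num_clusters):
        self.refcount = [0] * num_clusters
        self.metadata_writes = 0

    def allocate(self):
        # Fast path: pick a free cluster and update its refcount entry.
        idx = self.refcount.index(0)  # raises ValueError if no cluster is free
        self.refcount[idx] = 1
        self.metadata_writes += 1
        return idx

    def discard(self, idx):
        # Slow path: drop the reference.
        self.refcount[idx] -= 1
        self.metadata_writes += 1


class FreeListAllocator:
    """Allocation normally grows the file; only discard touches the list."""

    def __init__(self):
        self.file_end = 0  # next never-used cluster, implied by the file size
        self.free_list = []
        self.metadata_writes = 0

    def allocate(self):
        if self.free_list:
            # Assumption: reusing a discarded cluster updates the free list.
            self.metadata_writes += 1
            return self.free_list.pop()
        # Fast path: append at end of file, no allocator metadata to update.
        idx = self.file_end
        self.file_end += 1
        return idx

    def discard(self, idx):
        # Slow path: remember the cluster for later reuse.
        self.free_list.append(idx)
        self.metadata_writes += 1


rc = RefcountAllocator(8)
fl = FreeListAllocator()
for _ in range(4):
    rc.allocate()
    fl.allocate()
print(rc.metadata_writes, fl.metadata_writes)
```

In this model the refcount allocator performs one allocator-metadata update
per allocating write, while the free-list allocator performs none until a
discard happens. Whether that difference matters in practice depends on how
the updates are batched and flushed, which is the point under dispute in the
thread.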