From: Kevin Wolf <kwolf@redhat.com>
To: Anthony Liguori <anthony@codemonkey.ws>
Cc: Chunqiang Tang <ctang@us.ibm.com>,
qemu-devel@nongnu.org, Markus Armbruster <armbru@redhat.com>,
Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Subject: Re: [Qemu-devel] Re: Strategic decision: COW format
Date: Wed, 23 Feb 2011 15:55:12 +0100
Message-ID: <4D651FD0.20704@redhat.com>
In-Reply-To: <4D6517E4.4050300@codemonkey.ws>

On 23.02.2011 15:21, Anthony Liguori wrote:
> On 02/23/2011 03:13 AM, Kevin Wolf wrote:
>> On 22.02.2011 19:18, Anthony Liguori wrote:
>>
>>> On 02/22/2011 10:15 AM, Kevin Wolf wrote:
>>>
>>>> On 22.02.2011 16:57, Anthony Liguori wrote:
>>>>
>>>>
>>>>> On 02/22/2011 02:56 AM, Kevin Wolf wrote:
>>>>>
>>>>>
>>>>>> *sigh*
>>>>>>
>>>>>> It starts to get annoying, but if you really insist, I can repeat it
>>>>>> once more: These features that you don't need (this is the correct
>>>>>> description for what you call "misfeatures") _are_ implemented in a way
>>>>>> that they don't impact the "normal" case.
>>>>>>
>>>>>>
>>>>> Except that they require a refcount table that adds additional metadata
>>>>> that needs to be updated in the fast path. I consider that impacting
>>>>> the normal case.
>>>>>
>>>>>
>>>> Like it or not, this requirement exists anyway, without any of your
>>>> "misfeatures".
>>>>
>>>> You chose to use the dirty flag in QED in order to avoid having to flush
>>>> metadata too often, which is an approach that any other format, even one
>>>> using refcounts, can take as well.
>>>>
>>>>
>>> It's a minor detail, but flushing and the amount of metadata are
>>> separate points.
>>>
>> I agree that they are separate...
>>
>>
>>> The dirty flag prevents metadata from being flushed to disk very often
>>> but the use of a refcount table adds additional metadata.
>>>
>>> A refcount table is definitely not required even if you claim the
>>> requirement exists for other features. I assume you mean to implement
>>> trim/discard support but instead of a refcount table, a free list would
>>> work just as well and would leave the metadata update out of the fast
>>> path (allocating writes) and instead only be in the slow path
>>> (trim/discard).
>>>
>> ...but here you're arguing about writing metadata out in the fast path,
>> so you're actually not interested in the amount of metadata but in the
>> overhead of flushing it. Which is a problem that's solved.
>>
>
> I'm interested in both. An extra write is always going to be an extra
> write. The flush just makes it very painful.

One extra write of 64k every 2 GB. Hardly relevant.

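The 64k-per-2-GB figure is consistent with a qcow2 layout that uses 64 KiB clusters and 16-bit refcount entries. Both values are assumptions here, since the message does not state them. A quick check of the arithmetic, with illustrative names:

```python
# Assumed parameters (not stated in the message): 64 KiB clusters and
# 16-bit (2-byte) refcount entries.
CLUSTER_SIZE = 64 * 1024
REFCOUNT_ENTRY_SIZE = 2


def bytes_covered_per_refcount_block(cluster_size=CLUSTER_SIZE,
                                     entry_size=REFCOUNT_ENTRY_SIZE):
    """Return how many bytes of image data one refcount block describes.

    A refcount block occupies one cluster and holds one entry per
    data cluster, so it covers entries * cluster_size bytes.
    """
    entries_per_block = cluster_size // entry_size
    return entries_per_block * cluster_size


covered = bytes_covered_per_refcount_block()
print(covered // 2**30, "GiB per", CLUSTER_SIZE // 1024, "KiB refcount block")
```

Under these assumptions a new refcount block is only needed once per 2 GiB of newly allocated data.
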
>> A refcount table is essential for internal snapshots and compression,
>> it's useful for discard and for running on block devices, it's necessary
>> for avoiding the dirty flag and fsck on startup.
>>
>
> No, as designed today, qcow2 still needs a dirty flag to avoid leaking
> blocks.

I know that this is your opinion and I do respect that; it is one of
the reasons why there is the suggestion to add the dirty flag for you.
On the other hand, it would be about time for you to accept that there
are people who think differently about it and who don't want the same as
you. This is why using the dirty flag should be optional.

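The trade-off behind an optional dirty flag can be sketched with a toy model. Everything below is hypothetical (class name, fields, and the worst-case rule of one metadata flush per allocating write when the flag is off); it is not QEMU code, and real implementations batch metadata updates:

```python
class ToyImage:
    """Toy in-memory stand-in for an image header (hypothetical names)."""

    def __init__(self, use_dirty_flag):
        self.use_dirty_flag = use_dirty_flag
        self.dirty_on_disk = False   # flag as it would be stored in the header
        self.metadata_flushes = 0    # flushes issued for allocation metadata
        self.needs_check = False     # True if open() found the flag set

    def open(self):
        # A flag left set means the last session did not close cleanly,
        # so allocation metadata must be checked before use.
        self.needs_check = self.dirty_on_disk
        if self.use_dirty_flag:
            # The header write for the flag itself is not counted here.
            self.dirty_on_disk = True

    def allocating_write(self):
        if not self.use_dirty_flag:
            # Worst-case assumption: metadata is flushed on every allocation.
            self.metadata_flushes += 1

    def close(self):
        if self.use_dirty_flag:
            self.metadata_flushes += 1   # single flush at clean shutdown
            self.dirty_on_disk = False


def simulate(use_dirty_flag, crash, writes=1000):
    """Run one session, optionally crash, reopen, and report the cost."""
    image = ToyImage(use_dirty_flag)
    image.open()
    for _ in range(writes):
        image.allocating_write()
    if not crash:
        image.close()
    image.open()
    return image.metadata_flushes, image.needs_check


for flag in (True, False):
    for crash in (False, True):
        print("dirty flag:", flag, "crash:", crash, "->", simulate(flag, crash))
```

In this model the flag trades metadata flushes for a check after an unclean shutdown, which is why a user might reasonably want either behaviour.
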
>> These are five use cases that I can enumerate without thinking a lot
>> about it, there might be more. You propose using three different
>> mechanisms for allowing normal allocations (use the file size), block
>> devices (add a size field into the header) and discard (free list), and
>> the other three features, for which you can't think of a hack, you
>> declare "misfeatures".
>>
>
> No, I only label compression and internal snapshots as misfeatures.
> Encryption is a completely reasonable feature.

I didn't even mention encryption. It's obvious that it's a "reasonable
feature" and not a "misfeature", because it fits relatively easily in
your QED design. :-)

The three features you don't like because they don't fit are
compression, internal snapshots and not having to fsck (thanks for
proving the latter above).

> So even with qcow3, what's the expectation of snapshots? Are we going
> to scale to images with over 1000 snapshots? I believe snapshot support
> in qcow2 is not a feature that has been designed with any serious
> thought. If we truly want to support internal snapshots, let's design
> it correctly.

So what would be the key differences between your design and qcow2's? We
can always check if there's room to improve.

>>> As a format feature, a refcount table really only makes sense if the
>>> refcount is required to be greater than a single bit. There are more
>>> optimal data structures that can be used if the refcount of a block is
>>> fixed to 1-bit (like a free list) which is what the fundamental design
>>> difference between qcow2 and qed is.
>>>
>> Okay, so even assuming that there's something like misfeatures that we
>> can kick out (with which I strongly disagree), what's the crucial
>> advantage of free lists that would make you switch the image format?
>
> Performance. One thing we haven't tested with qcow2 is O_SYNC
> performance in the guest but my suspicion is that an O_SYNC workload is
> going to perform poorly even with cache=none.

But aren't you the one who wants to use the dirty flag in any case? The
refcounts aren't even written then.

> Starting with a simple format that we don't have to jump through
> tremendous hoops to get reasonable performance out of has a lot of virtues.

I know that you don't mean it the way I read it, but it's entirely true:
You're _starting_ with a simple format, but once you add features you're
going to get something much more complex than qcow2 because you just
don't have proper cluster allocation infrastructure and need to invent
new hacks every time.
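
As a rough illustration of what a single cluster allocation mechanism could look like, here is a toy in-memory refcount table. The class and method names are hypothetical, and this is neither the qcow2 on-disk format nor QEMU code. It only shows one structure answering the questions raised in this thread: which cluster is free, what discard does, and when a snapshot forces copy-on-write.

```python
class RefcountAllocator:
    """Toy refcount table: one counter per cluster (hypothetical API)."""

    def __init__(self, num_clusters):
        self.refcount = [0] * num_clusters

    def alloc(self):
        """Allocate the first free cluster (refcount 0) and return its index."""
        index = self.refcount.index(0)  # raises ValueError when the image is full
        self.refcount[index] = 1
        return index

    def ref(self, index):
        """Add a reference, e.g. when a snapshot starts sharing the cluster."""
        self.refcount[index] += 1

    def unref(self, index):
        """Drop a reference, e.g. on discard or when a snapshot is deleted."""
        if self.refcount[index] == 0:
            raise ValueError("cluster is already free")
        self.refcount[index] -= 1

    def needs_cow(self, index):
        """A cluster shared by more than one user must be copied before a write."""
        return self.refcount[index] > 1


table = RefcountAllocator(4)
first = table.alloc()
second = table.alloc()
table.ref(first)       # a snapshot now shares the first cluster
table.unref(second)    # the guest discards the second cluster
reused = table.alloc() # the discarded cluster is handed out again
print(first, second, reused, table.needs_cow(first), table.refcount)
```

A free list covers only the allocation and discard cases, since it cannot record that a cluster has more than one user.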

Kevin