From: Avi Kivity <avi@redhat.com>
To: Anthony Liguori <anthony@codemonkey.ws>
Cc: Kevin Wolf <kwolf@redhat.com>,
	Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>,
	qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [RFC] qed: Add QEMU Enhanced Disk format
Date: Fri, 10 Sep 2010 14:14:25 +0300
Message-ID: <4C8A1311.8070903@redhat.com>
In-Reply-To: <4C88D7CC.5000806@codemonkey.ws>

On 09/09/2010 03:49 PM, Anthony Liguori wrote:
> On 09/09/2010 01:45 AM, Avi Kivity wrote:
>> Loading very large L2 tables on demand will result in very long 
>> latencies.  Increasing cluster size will result in very long first 
>> write latencies.  Adding an extra level results in an extra random 
>> write every 4TB.
>
> It would be trivially easy to add another level of tables as a feature 
> bit so let's delay the decision.

It means that you'll need to upgrade qemu to read certain images, but okay.

>>>
>>> qed is very careful about ensuring that we don't need to do syncs 
>>> and we don't get corruption because of data loss.  I don't 
>>> necessarily buy your checksumming argument.
>>
>> The requirement for checksumming comes from a different place.  For 
>> decades we've enjoyed very low undetected bit error rates.  However 
>> the actual amount of data is increasing to the point that it makes an 
>> undetectable bit error likely, just by throwing a huge amount of bits 
>> at storage.  Write ordering doesn't address this issue.
>
> I don't think we should optimize an image format for cheap disks and 
> an old file system.
>
> We should optimize for the future.  That means a btrfs file system

I wouldn't use an image format at all with btrfs.

> and/or enterprise storage.

That doesn't eliminate undiscovered errors (they can still come from the 
transport).

>
> The point of an image format is not to recreate btrfs in software.  
> It's to provide a mechanism to allow users to move images around
> reasonably, but once an image is present on a reasonable filesystem, we
> should more or less get the heck out of the way.

You can achieve exactly the same thing with qcow2.  Yes, it's more work, 
but it's also less disruptive to users.

>>
>>> By creating two code paths within qcow2. 
>>
>> You're creating two code paths for users.
>
> No, I'm creating a single path: QED.
>
> There are already two code paths: raw and qcow2.  qcow2 has had such a 
> bad history that for a lot of users, it's not even a choice.

qcow2 exists, people use it, and by the time qed is offered on distros 
(even more on enterprise distros), there will be a lot more qcow2 
images.  Not everyone runs qemu.git HEAD.

What will you tell those people?  Upgrade your image?  They may still 
want to share it with older installations.  What if they use features 
not present in qed?  Bad luck?

qcow2 is going to live forever no matter what we do.

>
> Today, users have to choose between performance and reliability or 
> features.  QED offers an opportunity to be able to tell users to just 
> always use QED as an image format and forget about 
> raw/qcow2/everything else.

raw will always be needed for direct volume access and shared storage.  
qcow2 will always be needed for old images.

>
> You can say, let's just make qcow2 better, but we've been trying that 
> for years and we have an existence proof that we can do it in a
> straightforward fashion with QED.

When you don't use the extra qcow2 features, it has the same performance 
characteristics as qed.  You need to batch allocation and freeing, but 
that's fairly straightforward.

Yes, qcow2 has a long and tortured history and qed is perfect.  Starting 
from scratch is always easier and more fun.  Except for the users.

> A new format doesn't introduce much additional complexity.  We provide an
> image conversion tool and we can almost certainly provide an in-place
> conversion tool that makes the process very fast.

It introduces a lot of complexity for the users who aren't qed experts.  
They need to make a decision.  What's the impact of the change?  Are the 
features that we lose important to us?  Do we know what they are?  Is 
there any risk?  Can we make the change online or do we have to schedule 
downtime?  Do all our hosts support qed?

Improving qcow2 will be very complicated for Kevin, who already looks
old beyond his years [1], but very simple for users.

>>
>> It requires users to make a decision.  By the time qed is ready for 
>> mass deployment, 1-2 years will have passed.  How many qcow2 images 
>> will be in the wild then?  How much scheduled downtime will be needed?
>
> Zero if we're smart.  You can do QED stream + live migration to do a 
> live conversion from raw to QED.
>

Not all installations use live migration (say, desktop users).

>>   How much user confusion will be caused?
>
> User confusion is reduced if we can make strong, clear statements: all 
> users should use QED even if they care about performance.  Today, 
> there's mass confusion because of the poor state of qcow2.

If we improve qcow2 and make the same strong, clear statement we'll have 
the same results.

>
>> Virtualization is about compatibility.  In-guest compatibility first, 
>> but keeping the external environment stable is also important.  We 
>> really need to exhaust the possibilities with qcow2 before giving up 
>> on it.
>
> IMHO, we're long past exhausting the possibilities with qcow2.  We 
> still haven't decided what we're going to do for 0.13.0. 

Sorry, I disagree 100%.  How can you say that, when no one has yet 
tried, for example, batching allocations and frees?  Or properly 
threaded it?

What we've done is make qcow2 safe and a bit more parallel than it was.  But
"exhaust all possibilities"?  Not even close.

> Are we going to ship qcow2 with awful performance (a 15 minute 
> operation taking hours) or with compromised data integrity?

We're going to fix it.

>
> It's been this way for every release since qcow2 existed.  Let's not 
> let sunk cost cloud our judgement here.

Yes, new and shiny is always better.

>
> qcow2 is not a properly designed image format.  It was a weekend 
> hacking session from Fabrice that he dropped in the code base and 
> never really finished doing what he originally intended.  The 
> improvements that have been made to it are almost at the heroic level 
> but we're only hurting our users by not moving on to something better.
>

I don't like qcow2 either.  But from a performance perspective, it can 
be made equivalent to qed with some effort.  It is worthwhile to expend 
that effort rather than push the burden to users.

> Regards,
>
> Anthony Liguori
>
>

[1] okay, maybe not.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
