From: Anthony Liguori <anthony@codemonkey.ws>
To: Kevin Wolf <kwolf@redhat.com>
Cc: stefanha@gmail.com, avi@redhat.com, mjt@tls.msk.ru,
	qemu-devel@nongnu.org, hch@lst.de
Subject: [Qemu-devel] Re: [RFC][STABLE 0.13] Revert "qcow2: Use bdrv_(p)write_sync for metadata writes"
Date: Tue, 24 Aug 2010 08:29:10 -0500	[thread overview]
Message-ID: <4C73C926.3010901@codemonkey.ws> (raw)
In-Reply-To: <4C73C622.7080808@redhat.com>

On 08/24/2010 08:16 AM, Kevin Wolf wrote:
> Am 24.08.2010 15:01, schrieb Anthony Liguori:
>    
>> On 08/24/2010 05:40 AM, Kevin Wolf wrote:
>>      
>>> This reverts commit 8b3b720620a1137a1b794fc3ed64734236f94e06.
>>>
>>> This fix has caused severe slowdowns on recent kernels that actually do flush
>>> when they are told to. Reverting this patch hurts correctness and means that we
>>> could get corrupted images in case of a host crash. This means that qcow2 might
>>> not be an option for some people without this fix. On the other hand, I get
>>> reports that the slowdown is so massive that not reverting it would mean that
>>> people can't use it either because it just takes ages to complete stuff. It
>>> probably can be fixed, but not in time for 0.13.0.
>>>
>>> Usually, if there's a possible tradeoff between correctness and performance, I
>>> tend to choose correctness, but I'm not so sure in this case. I'm not sure with
>>> reverting either, which is why I post this as an RFC only.
>>>
>>> I hope to get some more comments on how to proceed here for 0.13.
>>>
>>>        
>> How fundamental an issue is this?  Is this something we think we know
>> how to fix and we just don't think there's time to fix it for 0.13?
>>      
> I think we can improve things basically by trying to batch metadata
> writes and do them in parallel while already processing the next requests.
>
> I'm not sure what the numbers are going to look like with something like
> this in place, I need to try it. It's definitely not something that I
> want to go into 0.13 at this point.
>    
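Kevin's idea of batching metadata writes can be sketched as follows. This is a hypothetical illustration, not QEMU's qcow2 code: the names meta_batch, batch_add, batch_flush and open_scratch_file are invented, plain POSIX pwrite() and fdatasync() stand in for the block layer, and the writes are issued sequentially, so the "in parallel while processing the next requests" part is not modelled. What it shows is the cost structure: queued updates share one flush instead of paying one flush each.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_PENDING 64

/* Hypothetical queued metadata update: bytes to write at an offset. */
struct meta_write {
    off_t off;
    char buf[16];
    size_t len;
};

/* Hypothetical batch of metadata writes that share a single flush. */
struct meta_batch {
    struct meta_write pending[MAX_PENDING];
    size_t count;
};

/* Hypothetical helper: unlinked scratch file for trying the sketch. */
static int open_scratch_file(void)
{
    char path[] = "/tmp/qcow2-sketch-XXXXXX";
    int fd = mkstemp(path);
    if (fd >= 0)
        unlink(path);
    return fd;
}

/* Queue an update instead of writing and flushing it immediately.
 * Returns -1 if the batch is full or the payload does not fit. */
static int batch_add(struct meta_batch *b, off_t off, const char *s)
{
    size_t len = strlen(s);
    if (b->count == MAX_PENDING || len > sizeof b->pending[0].buf)
        return -1;
    struct meta_write *w = &b->pending[b->count++];
    w->off = off;
    memcpy(w->buf, s, len);
    w->len = len;
    return 0;
}

/* Issue every queued write, then one fdatasync() for the whole batch.
 * Returns the number of writes flushed, or -1 on error. */
static int batch_flush(struct meta_batch *b, int fd)
{
    assert(b->count <= MAX_PENDING);
    for (size_t i = 0; i < b->count; i++) {
        struct meta_write *w = &b->pending[i];
        if (pwrite(fd, w->buf, w->len, w->off) != (ssize_t)w->len)
            return -1;
    }
    if (b->count > 0 && fdatasync(fd) < 0)
        return -1;
    int n = (int)b->count;
    b->count = 0;
    return n;
}
```

With N queued updates, batch_flush() performs N writes but only one flush, which is where the savings would come from if flushes dominate the slowdown.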

I'm not sure this patch is needed in the first place.

If you have a sequence of operations like:

0) receive guest write request Z
1) submit write A
2) write A completes
3) submit write B
4) write B completes
5) report guest write Z complete
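The sequence above can be sketched as a C function. This is a hypothetical illustration, not QEMU's qcow2 code: the names handle_guest_write, write_at and open_scratch_file are invented, plain POSIX pwrite() stands in for the block layer, A is taken to be the guest data and B the dependent metadata update, and the sync_b flag marks where the extra flush discussed in this message (step 4.5) would go.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: unlinked scratch file for trying the sketch. */
static int open_scratch_file(void)
{
    char path[] = "/tmp/qcow2-sketch-XXXXXX";
    int fd = mkstemp(path);
    if (fd >= 0)
        unlink(path);
    return fd;
}

/* Write all of buf at off. pwrite() returning means the write has
 * "completed" (visible to later reads), not that it is on disk. */
static int write_at(int fd, const char *buf, off_t off)
{
    assert(buf != NULL);
    size_t len = strlen(buf);
    return pwrite(fd, buf, len, off) == (ssize_t)len ? 0 : -1;
}

/* Hypothetical handler for guest write request Z (step 0). */
static int handle_guest_write(int fd, const char *a, off_t a_off,
                              const char *b, off_t b_off, bool sync_b)
{
    if (write_at(fd, a, a_off) < 0)      /* steps 1-2: write A */
        return -1;
    if (write_at(fd, b, b_off) < 0)      /* steps 3-4: write B after A */
        return -1;
    if (sync_b && fdatasync(fd) < 0)     /* step 4.5: the extra flush */
        return -1;
    return 0;                            /* step 5: report Z complete */
}
```

Calling handle_guest_write(fd, "DATA", 4096, "META", 0, false) follows steps 0 to 5 as listed; passing true for sync_b adds the flush in question.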

You're adding a:

4.5) sync write B

Which is ultimately unnecessary if what you care about is avoiding 
reordering of steps (2) and (4).  When a write() request completes, 
you're guaranteed that a subsequent read() request will return the 
written data.  That's always true.  If I could do a write(A) followed by 
a write(B) and then read()=A, no software would actually function correctly.
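The read-after-write guarantee described above can be checked with a small C sketch. It assumes both writes target the same offset of a regular file and uses plain POSIX pwrite()/pread() with no fsync() or fdatasync() in between; read_after_two_writes and open_scratch_file are hypothetical names. It exercises only visibility to later reads within a running system; it does not simulate a host crash.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: unlinked scratch file for trying the sketch. */
static int open_scratch_file(void)
{
    char path[] = "/tmp/qcow2-sketch-XXXXXX";
    int fd = mkstemp(path);
    if (fd >= 0)
        unlink(path);
    return fd;
}

/* write(A), then write(B) to the same offset, then read() with no
 * flush in between. Returns the byte observed, or 0 on error. */
static char read_after_two_writes(int fd, off_t off)
{
    char c = 0;
    assert(fd >= 0);
    if (pwrite(fd, "A", 1, off) != 1)
        return 0;
    if (pwrite(fd, "B", 1, off) != 1)
        return 0;
    if (pread(fd, &c, 1, off) != 1)
        return 0;
    return c; /* the later write(B) is what a subsequent read sees */
}
```

On a regular file the read returns 'B': once write(B) has returned, no read can observe the older 'A' at that position.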

It's important to make sure that you don't get image corruption if (2) 
happens but not (4).  But I think that's okay in qcow2 today.

Regards,

Anthony Liguori

>> And with this fix in place, how much confidence do we have in qcow2 with
>> respect to data integrity on power loss?
>>      
> How do you measure confidence? :-)
>
> There are power failure tests, and I don't have any bugs open in that
> respect. I'm not sure how intensively it's tested, though.
>
>    
>> We've shipped every version of QEMU since qcow2 was introduced with
>> known data corruption bugs.  It sucks big time.  I think either
>> building an image format is a really hard problem akin to making a file
>> system and we shouldn't be in that business, or qcow2 is a bad image
>> format, which makes this all harder than it should be.
>>      
> I tend to say that it's just hard to get right. Most of the problems
> that were fixed in qcow2 over the last year are probably present in our
> VMDK implementation as well, just to pick one example.
>
> Kevin
>    
