public inbox for kvm@vger.kernel.org
From: Anthony Liguori <anthony@codemonkey.ws>
To: Christoph Hellwig <hch@infradead.org>
Cc: Michael Tokarev <mjt@tls.msk.ru>, KVM list <kvm@vger.kernel.org>,
	Kevin Wolf <kwolf@redhat.com>
Subject: Re: JFYI: ext4 bug triggerable by kvm
Date: Tue, 17 Aug 2010 09:54:07 -0500
Message-ID: <4C6AA28F.1000605@codemonkey.ws>
In-Reply-To: <20100817144507.GA10280@infradead.org>

On 08/17/2010 09:45 AM, Christoph Hellwig wrote:
> On Tue, Aug 17, 2010 at 09:39:15AM -0500, Anthony Liguori wrote:
>    
>> The type of cache we present to the guest should relate only to how
>> the hypervisor caches the storage.  It should be independent of how
>> data is cached by the disk.
>>      
> It is.
>
>    
>> There can be many levels of caching in a storage hierarchy, and each
>> level is cached independently of the next.
>>
>> If the user has a disk with a writeback cache, if we expose a
>> writethrough cache to the guest, it's not our responsibility to make
>> sure that we break through the writeback cache on the disk.
>>      
> The user doesn't know or have to care about the caching.  The
> user uses O_SYNC/fsync to say it wants data on disk, and it's the
> operating system's job to make that happen.   The situation with qemu
> is the same - if we tell the guest that we do not have a volatile write
> cache that needs explicit management the guest can rely on the fact
> that it does not have to do manual cache management.
>    

This is simply unrealistic.  O_SYNC might force data to be on a platter 
when using a directly attached disk, but many NASes actually do writeback 
caching and rely on having a UPS to preserve data integrity.  
There's really no way in the general case to ensure that data is 
actually on a platter once you've involved a complex storage setup or 
you assume FUA.
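
For reference, a minimal sketch of the two application-side requests 
being discussed, using Python's os module.  The helper name 
durable_write is hypothetical.  Either path only asks the operating 
system to push the data down; whether it ends up on a platter or in a 
controller's or NAS's cache is decided further down the stack.

```python
import os


def durable_write(path, data, use_o_sync=False):
    """Write data and ask the OS to push it to stable storage.

    Hypothetical helper for illustration only. It shows the request
    an application makes, not what the storage stack guarantees.
    """
    # O_SYNC is not defined on every platform; fall back to fsync if absent.
    sync_flag = getattr(os, "O_SYNC", 0) if use_o_sync else 0
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | sync_flag
    fd = os.open(path, flags, 0o600)
    try:
        written = 0
        while written < len(data):
            # os.write may write fewer bytes than requested, so loop.
            written += os.write(fd, data[written:])
        if not sync_flag:
            # Explicit flush request when the file was not opened O_SYNC.
            os.fsync(fd)
    finally:
        os.close(fd)
    return written
```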

Let me put it another way.  If an admin knows the disks on a machine 
have battery-backed cache, he's likely to leave writeback caching enabled.

We are currently giving the admin two choices with QEMU: either ignore 
the fact that the disk is battery backed and do writethrough caching of 
the disk, or do writeback caching in the host, which expands the disk 
cache from something very small and non-volatile (the on-disk cache) to 
something very large and volatile (the page cache).  To make the page 
cache non-volatile, you would need to have a UPS for the hypervisor 
with enough power to flush the page cache.

So basically, we're not presenting a model that makes sensible use of 
reliable disks.

cache=none does the right thing here but doesn't benefit from the host's 
page cache for reads.  This is really the missing behavior.
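
To make the gap concrete, here is a rough sketch that tabulates the 
three cache modes as characterized in this message and looks for one 
with the missing combination.  The field names and the 
matching_modes helper are hypothetical, and the table is a 
simplification of the description above, not QEMU's documented 
semantics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheMode:
    name: str
    host_read_cache: bool         # reads can be served from the host page cache
    dirty_data_in_host_ram: bool  # completed writes may sit only in volatile host RAM
    forces_past_disk_cache: bool  # writes are pushed through the disk's own cache


# Assumption: values follow the characterization in this message.
MODES = [
    CacheMode("writethrough", True, False, True),
    CacheMode("writeback", True, True, False),
    CacheMode("none", False, False, False),
]


def matching_modes(host_read_cache, dirty_data_in_host_ram, forces_past_disk_cache):
    """Return the names of the modes that have exactly these properties."""
    wanted = (host_read_cache, dirty_data_in_host_ram, forces_past_disk_cache)
    return [
        m.name
        for m in MODES
        if (m.host_read_cache, m.dirty_data_in_host_ram, m.forces_past_disk_cache)
        == wanted
    ]


# Cached reads, no dirty data in host RAM, trust the disk's cache:
print(matching_modes(True, False, False))   # prints []
# The same without the host read cache:
print(matching_modes(False, False, False))  # prints ['none']
```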

Regards,

Anthony Liguori


