From: Kevin Wolf <kwolf@redhat.com>
To: Anthony Liguori <anthony@codemonkey.ws>
Cc: Christoph Hellwig <hch@lst.de>, qemu-devel <qemu-devel@nongnu.org>
Subject: [Qemu-devel] Re: Caching modes
Date: Tue, 21 Sep 2010 10:15:56 +0200
Message-ID: <4C9869BC.500@redhat.com>
In-Reply-To: <4C97F9C6.60501@codemonkey.ws>
On 21.09.2010 02:18, Anthony Liguori wrote:
> On 09/20/2010 06:17 PM, Christoph Hellwig wrote:
>> On Mon, Sep 20, 2010 at 03:11:31PM -0500, Anthony Liguori wrote:
>>
>>>>> All read and write requests SHOULD avoid any type of caching in the
>>>>> host. Any write request MUST complete after the next level of storage
>>>>> reports that the write request has completed. A flush from the guest
>>>>> MUST complete after all pending I/O requests for the guest have been
>>>>> completed.
>>>>>
>>>>> As an implementation detail, with the raw format, these guarantees are
>>>>> only in place for preallocated images. Sparse images do not provide as
>>>>> strong of a guarantee.
>>>>>
>>>>>
>>>> That's not how cache=none ever worked nor works currently.
>>>>
>>>>
>>> How does it work today compared to what I wrote above?
>>>
>> From the guest's point of view it works exactly as you describe
>> cache=writeback. There are no ordering or cache flushing guarantees. By
>> using O_DIRECT we do bypass the host file cache, but we don't even try
>> on the others (disk cache, committing metadata transactions that are
>> required to actually see the committed data for sparse, preallocated or
>> growing images).
>>
>
> O_DIRECT alone to a pre-allocated file on a normal file system should
> result in the data being visible without any additional metadata
> transactions.
>
> The only time when that isn't true is when dealing with CoW or other
> special filesystem features.

I think preallocated files are the exception; usually people use sparse
files. And even with preallocation, the disk cache still remains.

>> What you describe above is the equivalent of O_DSYNC|O_DIRECT which
>> doesn't exist in current qemu, except that O_DSYNC|O_DIRECT also
>> guarantees the semantics for sparse images. Sparse images really aren't
>> special in any way - preallocation using posix_fallocate or COW
>> filesystems like btrfs, nilfs2 or zfs have exactly the same issues.
>>
>>
>>>>                       | WC enable | WC disable
>>>> -----------------------------------------------
>>>> direct                |           |
>>>> buffer                |           |
>>>> buffer + ignore flush |           |
>>>>
>>>> currently we only have:
>>>>
>>>> cache=none            direct + WC enable
>>>> cache=writeback       buffer + WC enable
>>>> cache=writethrough    buffer + WC disable
>>>> cache=unsafe          buffer + ignore flush + WC enable
>>>>
>>>>
>>> Where does O_DSYNC fit into this chart?
>>>
>> O_DSYNC is used for all WC disable modes.
>>
>>
>>> Do all modern filesystems implement O_DSYNC without generating
>>> additional barriers per request?
>>>
>>> Having a barrier per-write request is ultimately not the right semantic
>>> for any of the modes. However, without the use of O_DSYNC (or
>>> sync_file_range(), which I know you dislike), I don't see how we can
>>> have reasonable semantics without always implementing write back caching
>>> in the host.
>>>
>> Barriers are a Linux-specific implementation detail that is in the
>> process of going away, probably in Linux 2.6.37. But if you want
>> O_DSYNC semantics with a volatile disk write cache there is no way
>> around using a cache flush or the FUA bit on all I/O caused by it.
>
> If you have a volatile disk write cache, then we don't need O_DSYNC
> semantics.

What do the semantics of a qemu option have to do with the host disk
write cache? We always need to provide the same semantics. If anything,
we can take advantage of a host providing write-through/no caches so
that we don't have to issue the flushes ourselves.

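
To make the mode table quoted above concrete, here is a minimal sketch of
how each cache= value could be described in terms of open(2) flags and
what the guest is told. This is not qemu code: struct cache_mode,
lookup_cache_mode() and write_must_be_stable() are names made up for
illustration, and the flag mapping only restates the table plus
Christoph's remark that O_DSYNC is used for all WC disable modes.

```c
#define _GNU_SOURCE          /* exposes O_DIRECT on Linux/glibc */
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor for one cache= mode; not a qemu structure. */
struct cache_mode {
    const char *name;
    int open_flags;       /* extra flags passed to open(2) */
    bool guest_sees_wc;   /* true: WC enable, false: WC disable */
    bool ignore_flush;    /* guest flush requests are dropped */
};

/* One row per mode from the quoted table. */
static const struct cache_mode cache_modes[] = {
    { "none",         O_DIRECT, true,  false },  /* direct + WC enable */
    { "writeback",    0,        true,  false },  /* buffer + WC enable */
    { "writethrough", O_DSYNC,  false, false },  /* buffer + WC disable */
    { "unsafe",       0,        true,  true  },  /* buffer + ignore flush */
};

/* Returns the table entry for a mode name, or NULL if it is unknown. */
const struct cache_mode *lookup_cache_mode(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof(cache_modes) / sizeof(cache_modes[0]); i++) {
        if (strcmp(cache_modes[i].name, name) == 0)
            return &cache_modes[i];
    }
    return NULL;
}

/*
 * The semantics depend only on what the guest was told, not on the host
 * disk: if no write cache is advertised, every completed write has to be
 * stable already.
 */
bool write_must_be_stable(const struct cache_mode *m)
{
    return !m->guest_sees_wc;
}
```

Note that write_must_be_stable() takes no "host cache" argument on
purpose. A host with a write-through or absent disk cache only makes the
stable write cheaper; it does not change the answer.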
>> We
>> currently use the cache flush, and although I plan to experiment a bit
>> more with the FUA bit for O_DIRECT | O_DSYNC writes I would be very
>> surprised if they actually are any faster.
>>
>
> The thing I struggle with understanding is that if the guest is sending
> us a write request, why are we sending the underlying disk a write +
> flush request? That doesn't seem logical at all to me.
>
> Even if we advertise WC disable, it should be up to the guest to decide
> when to issue flushes.
Why should a guest ever flush a cache when it's told that this cache
doesn't exist?
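
As a rough illustration of that point, the sketch below shows a write
path where the host, not the guest, is responsible for stability when WC
is disabled. It is a simplified model under stated assumptions, not the
qemu block layer: struct disk, disk_open(), guest_write() and
guest_flush() are hypothetical names, and an explicit fdatasync(2) after
each write stands in for opening the image with O_DSYNC.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-disk state; names are illustrative, not qemu's. */
struct disk {
    int fd;
    bool guest_sees_wc;   /* true: WC enable, false: WC disable */
    bool ignore_flush;    /* cache=unsafe style behaviour */
};

/* Opens (or creates) the image file. Returns 0 on success, -1 on error. */
int disk_open(struct disk *d, const char *path, bool guest_sees_wc)
{
    d->fd = open(path, O_RDWR | O_CREAT, 0600);
    d->guest_sees_wc = guest_sees_wc;
    d->ignore_flush = false;
    return d->fd < 0 ? -1 : 0;
}

/* Completes a guest write. Returns 0 on success, -1 on error. */
int guest_write(struct disk *d, const void *buf, size_t len, off_t off)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = pwrite(d->fd, p, len, off);
        if (n <= 0)
            return -1;            /* no retry logic in this sketch */
        p += n;
        len -= (size_t)n;
        off += n;
    }

    /*
     * WC disable: the guest was told there is no cache, so it will never
     * send a flush. The data must be stable before we report completion.
     */
    if (!d->guest_sees_wc)
        return fdatasync(d->fd);

    /* WC enable: completion may be reported now; the guest flushes later. */
    return 0;
}

/* Handles an explicit flush request from the guest. */
int guest_flush(struct disk *d)
{
    if (d->ignore_flush)
        return 0;
    return fdatasync(d->fd);
}
```

With guest_sees_wc set to false, guest_flush() is effectively never
called by a well-behaved guest, which is why the flush has to happen
inside guest_write().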
Kevin