From: Avi Kivity <avi@qumranet.com>
To: Andrea Arcangeli <andrea@qumranet.com>
Cc: Blue Swirl <blauwirbel@gmail.com>,
Laurent Vivier <Laurent.Vivier@bull.net>,
qemu-devel@nongnu.org, Paul Brook <paul@codesourcery.com>
Subject: Re: [Qemu-devel] Re: [PATCH][v2] Align file accesses with cache=off (O_DIRECT)
Date: Wed, 21 May 2008 21:29:01 +0300
Message-ID: <483469ED.1050408@qumranet.com>
In-Reply-To: <20080521174754.GG22488@duo.random>

Andrea Arcangeli wrote:
> On Wed, May 21, 2008 at 08:18:01PM +0300, Avi Kivity wrote:
>
>> Yes, that's the reason. Here zerocopy is not the motivation; instead, we
>> have host-cached pages that are used directly in the guest. So we get both
>> reduced memory footprint, and host caching. O_DIRECT reduces the memory
>> footprint but kills host caching.
>>
>
> Sure. So MAP_SHARED+remap_file_pages should work just fine to achieve
> zerocopy I/O.
>
>
No: with MAP_SHARED, guest writes to memory would reach the disk, which
doesn't happen with normal memory writes. MAP_PRIVATE is needed.

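A minimal sketch of that difference, assuming a Linux host and using
Python's mmap module, where ACCESS_WRITE corresponds to MAP_SHARED and
ACCESS_COPY to MAP_PRIVATE. The temporary file stands in for a disk image,
and the helper names are illustrative only, not anything from QEMU:

```python
import mmap
import os
import tempfile


def write_through_mapping(path, private):
    """Overwrite the first bytes via a mapping; return (in-memory view, file content)."""
    # ACCESS_COPY -> MAP_PRIVATE (copy-on-write); ACCESS_WRITE -> MAP_SHARED.
    access = mmap.ACCESS_COPY if private else mmap.ACCESS_WRITE
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0, access=access) as m:
            m[:5] = b"GUEST"          # stands in for a guest write to RAM
            in_memory = bytes(m[:5])
    with open(path, "rb") as f:
        on_disk = f.read(5)           # what the image file now contains
    return in_memory, on_disk


def demo():
    """Run the same write against a shared and a private mapping."""
    results = {}
    for private in (False, True):
        fd, path = tempfile.mkstemp()
        try:
            os.write(fd, b"IMAGE" + b"\0" * 4091)  # hypothetical 4 KiB "image"
            os.close(fd)
            key = "private" if private else "shared"
            results[key] = write_through_mapping(path, private)
        finally:
            os.unlink(path)
    return results


if __name__ == "__main__":
    print(demo())
```

With the shared mapping the file itself is modified; with the private one
only the in-memory copy changes and the image stays intact.
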
>> The scenario is desktop/laptop use. For server use O_DIRECT is clearly
>> preferred due to much reduced overhead.
>>
>
> Well in some ways there's more overhead with O_DIRECT because O_DIRECT
> has to call get_user_pages and walk pagetables in software before
> every I/O operation. MAP_SHARED walks them in hardware and it can take
> advantage of the CPU tlb too.
>
> The primary problem of MAP_SHARED isn't the overhead of the operation
> itself, which in fact will be lower with MAP_SHARED after the cache is
> allocated, but the write throttling and garbage collection of the host
> caches. If you've a 250G guest image, MAP_SHARED will allocate as much
> as 250G of cache and a cp /dev/zero /dev/hdb in the guest will mark
> 100% of guest RAM dirty. The mkclean methods and write throttling for
> MAP_SHARED introduced in reasonably recent kernels can avoid filling
> 100% of host ram with dirty pages, but it still requires write
> throttling and it'll pollute the host caches and it can result in
> large ram allocations in the host having to block before the ram is
> available, the same way as buffered writes in the host (current KVM
> default).
>
I'd do writes via the normal write path, not mmap().

> I think O_DIRECT is the best solution and MAP_SHARED could become a
> secondary option just for certain guest workloads with light I/O where
> fairness isn't even a variable worth considering.
>
> The cost of garbage collection of the mapped caches on the host isn't
> trivial, and I don't mean because the nonlinear rmap logic has to scan
> all pagetables, that's a minor cost compared to shrinking the host
> caches before try_to_unmap is ever invoked etc... Leaving the host
> caches purely for host usage is surely fairer: it won't ever lead to
> one guest doing heavy I/O thrashing the host caches and leaving all
> other guests and host tasks hanging. If they will hang
> for a few msec with O_DIRECT it'll be because they're waiting for I/O
> and the elevator put them on the queue to wait for the disk to return
> ready. But it won't be because of some write throttling during writes,
> or in alloc_pages shrink methods that are calling ->writepage on the
> dirty pages.
>
> The other significant advantage of O_DIRECT is that you won't have to
> call msync to provide journaling.
>
> I think O_DIRECT will work best for all usages, and it looks higher
> priority to me to have than MAP_SHARED. MAP_SHARED will surely result
> in better benchmarks for certain workloads though, imagine 'dd
> if=/dev/hda of=/dev/zero iflag=direct bs=1M count=100' run on the
> guest: it'll read from cache and it will do zero I/O starting from
> the second run with MAP_SHARED ;).
>
> If it was me I'd prefer O_DIRECT by default.
>
>
Certainly O_DIRECT is the normal path. We're considering mmap() as a
way to get host caching while avoiding double-caching.

> For full disclosure you may also want to read this but I strongly
> disagree with those statements. http://kerneltrap.org/node/7563
>
I disagree with them strongly too. For general-purpose applications you
want to avoid O_DIRECT, but for special-purpose applications that do
their own caching (databases, virtualization, streaming servers),
O_DIRECT is critical.

The kernel's cache management algorithms simply cannot compete with a
specially tuned application, not to mention the additional overhead that
comes from crossing a protection boundary.

[I've worked on a userspace filesystem that took every possible measure
to get the OS out of the way: user-level threads, O_DIRECT, aio, large
pages]

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.