From: Andrea Arcangeli <andrea@qumranet.com>
To: Avi Kivity <avi@qumranet.com>
Cc: Blue Swirl <blauwirbel@gmail.com>,
Laurent Vivier <Laurent.Vivier@bull.net>,
qemu-devel@nongnu.org, Paul Brook <paul@codesourcery.com>
Subject: Re: [Qemu-devel] Re: [PATCH][v2] Align file accesses with cache=off (O_DIRECT)
Date: Wed, 21 May 2008 19:47:54 +0200
Message-ID: <20080521174754.GG22488@duo.random>
In-Reply-To: <48345949.4050903@qumranet.com>

On Wed, May 21, 2008 at 08:18:01PM +0300, Avi Kivity wrote:
> Yes, that's the reason. Here zerocopy is not the motivation; instead, we
> have host-cached pages that are used directly in the guest. So we get both
> reduced memory footprint, and host caching. O_DIRECT reduces the memory
> footprint but kills host caching.

Sure. So MAP_SHARED+remap_file_pages should work just fine to achieve
zerocopy I/O.

> The scenario is desktop/laptop use. For server use O_DIRECT is clearly
> preferred due to much reduced overhead.

Well, in some ways there's more overhead with O_DIRECT, because O_DIRECT
has to call get_user_pages and walk the pagetables in software before
every I/O operation. MAP_SHARED walks them in hardware and it can take
advantage of the CPU TLB too.
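
To make the two access paths concrete, here is a minimal userspace sketch
(not QEMU code) that reads one block both ways. The helper names, the use
of the page size as the alignment unit, and the fallback to a buffered
open when the filesystem rejects O_DIRECT are all assumptions made for
illustration; the real alignment requirement depends on the device and
filesystem.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* exposes O_DIRECT on Linux/glibc */
#endif
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0              /* assumption: degrade to buffered I/O if unavailable */
#endif

/* Assumed block size: one page, which also satisfies mmap offset alignment. */
static size_t blk_size(void)
{
    return (size_t)sysconf(_SC_PAGESIZE);
}

/* Hypothetical helper: read block `blkno` with O_DIRECT into `out`
 * (which must hold blk_size() bytes). Returns 0 on success, -1 on error. */
int read_block_direct(const char *path, off_t blkno, unsigned char *out)
{
    size_t blk = blk_size();
    void *buf = NULL;
    ssize_t n;

    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0 && errno == EINVAL)      /* some filesystems reject O_DIRECT */
        fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* O_DIRECT wants the buffer, the file offset and the length aligned. */
    if (posix_memalign(&buf, blk, blk) != 0) {
        close(fd);
        return -1;
    }
    n = pread(fd, buf, blk, blkno * (off_t)blk);
    if (n == (ssize_t)blk)
        memcpy(out, buf, blk);
    free(buf);
    close(fd);
    return n == (ssize_t)blk ? 0 : -1;
}

/* Hypothetical helper: read the same block through a MAP_SHARED mapping,
 * i.e. straight out of the host page cache. */
int read_block_mapped(const char *path, off_t blkno, unsigned char *out)
{
    size_t blk = blk_size();
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    void *p = mmap(NULL, blk, PROT_READ, MAP_SHARED, fd, blkno * (off_t)blk);
    close(fd);                          /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return -1;
    memcpy(out, p, blk);                /* page faults populate the mapping */
    munmap(p, blk);
    return 0;
}
```

Both helpers return the same bytes; the difference is only in which path
the kernel takes to get them, which is the overhead being compared above.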

The primary problem of MAP_SHARED isn't the overhead of the operation
itself, which in fact will be lower with MAP_SHARED once the cache is
allocated, but the write throttling and garbage collection of the host
caches. If you have a 250G guest image, MAP_SHARED will allocate as much
as 250G of cache, and a cp /dev/zero /dev/hdb in the guest will mark
100% of guest RAM dirty. The mkclean methods and write throttling for
MAP_SHARED introduced in reasonably recent kernels can avoid filling
100% of host RAM with dirty pages, but it still requires write
throttling, it'll pollute the host caches, and it can result in
large RAM allocations in the host having to block before the RAM is
available, the same way as buffered writes in the host (the current KVM
default).

I think O_DIRECT is the best solution and MAP_SHARED could become a
secondary option just for certain guest workloads with light I/O where
fairness isn't even a variable worth considering.

The cost of garbage collection of the mapped caches on the host isn't
trivial, and I don't mean because the nonlinear rmap logic has to scan
all pagetables; that's a minor cost compared to shrinking the host
caches before try_to_unmap is ever invoked, etc. Leaving the host
caches purely for host usage is surely fairer: it won't ever
lead to one guest doing heavy I/O thrashing the host caches and
leading to all other guests and host tasks hanging. If they hang
for a few msec with O_DIRECT, it'll be because they're waiting for I/O
and the elevator put them on the queue to wait for the disk to return
ready. It won't be because of write throttling during writes,
or because of the alloc_pages shrink methods calling ->writepage on the
dirty pages.

The other significant advantage of O_DIRECT is that you won't have to
call msync to provide journaling.
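
As a rough illustration of the msync point: a store into a MAP_SHARED
mapping only dirties the host page cache, so a write that the guest's
journal must treat as completed has to be followed by an msync. The
helper below is a hypothetical sketch, not QEMU code; it assumes fd was
opened read-write and that the file already covers the written range
(otherwise the store would raise SIGBUS).

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy `len` bytes to file offset `off` through a
 * MAP_SHARED mapping, then wait for writeback of the touched pages.
 * Returns 0 on success, -1 on error. */
int mapped_write_sync(int fd, off_t off, const void *src, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    off_t start = off - (off % page);        /* mmap offsets must be page-aligned */
    size_t span = (size_t)(off - start) + len;
    int rc;

    unsigned char *map = mmap(NULL, span, PROT_READ | PROT_WRITE,
                              MAP_SHARED, fd, start);
    if (map == MAP_FAILED)
        return -1;

    memcpy(map + (off - start), src, len);   /* only dirties the page cache */
    rc = msync(map, span, MS_SYNC);          /* blocks until writeback completes */
    munmap(map, span);
    return rc;
}
```

A caller would invoke this once per guest write that needs ordering,
which is the extra step the O_DIRECT path avoids according to the
paragraph above.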

I think O_DIRECT will work best for all usages, and it looks higher
priority to me than MAP_SHARED. MAP_SHARED will surely result
in better benchmarks for certain workloads though: imagine 'dd
if=/dev/hda of=/dev/zero iflag=direct bs=1M count=100' run on the
guest. It'll read from cache and it will do zero I/O starting from
the second run with MAP_SHARED ;).

If it was me I'd prefer O_DIRECT by default.

For full disclosure, you may also want to read this, but I strongly
disagree with those statements: http://kerneltrap.org/node/7563