Re: [Qemu-devel] Re: [PATCH][v2] Align file accesses with cache=off (O_DIRECT)

From: Avi Kivity <avi@qumranet.com>
To: Andrea Arcangeli <andrea@qumranet.com>
Cc: Blue Swirl <blauwirbel@gmail.com>,
	Laurent Vivier <Laurent.Vivier@bull.net>,
	qemu-devel@nongnu.org, Paul Brook <paul@codesourcery.com>
Subject: Re: [Qemu-devel] Re: [PATCH][v2] Align file accesses with cache=off (O_DIRECT)
Date: Wed, 21 May 2008 21:29:01 +0300
Message-ID: <483469ED.1050408@qumranet.com>
In-Reply-To: <20080521174754.GG22488@duo.random>

Andrea Arcangeli wrote:
> On Wed, May 21, 2008 at 08:18:01PM +0300, Avi Kivity wrote:
>   
>> Yes, that's the reason.  Here zerocopy is not the motivation; instead, we 
>> have host-cached pages that are used directly in the guest.  So we get both 
>> reduced memory footprint, and host caching.  O_DIRECT reduces the memory 
>> footprint but kills host caching.
>>     
>
> Sure. So MAP_SHARED+remap_file_pages should work just fine to achieve
> zerocopy I/O.
>
>   

No.  With MAP_SHARED, when the guest writes to memory it will affect the 
disk, which must not happen with normal memory writes.  MAP_PRIVATE is 
needed.
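
[Illustration: a minimal sketch of the MAP_SHARED vs. MAP_PRIVATE 
difference described above. The helper name and the temporary file are 
hypothetical and not taken from QEMU. It stores one byte through a file 
mapping, then re-reads the file with pread() to see whether the store 
reached it.]

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: create a temp file holding "disk", store 'X' into
 * its first byte through a mapping created with `flags` (MAP_SHARED or
 * MAP_PRIVATE), then return the file's first byte as seen by pread().
 * Returns -1 on error. */
int first_byte_after_mapped_write(int flags)
{
    char path[] = "/tmp/mapdemoXXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                       /* file disappears once fd is closed */

    const char init[] = "disk";
    if (write(fd, init, sizeof(init)) != (ssize_t)sizeof(init)) {
        close(fd);
        return -1;
    }

    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    char *p = mmap(NULL, page, PROT_READ | PROT_WRITE, flags, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    assert(p[0] == 'd');                /* mapping shows the file contents */
    p[0] = 'X';                         /* the "guest" writes to its memory */
    msync(p, page, MS_SYNC);            /* flush shared pages; no effect on private copies */

    char c = 0;
    ssize_t n = pread(fd, &c, 1, 0);    /* what the file itself now holds */
    munmap(p, page);
    close(fd);
    return n == 1 ? c : -1;
}
```

[With MAP_SHARED the helper should return 'X' (the store reached the 
file); with MAP_PRIVATE it should return 'd', because the store only 
modified a copy-on-write page private to the process.]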

>> The scenario is desktop/laptop use.  For server use O_DIRECT is clearly 
>> preferred due to much reduced overhead.
>>     
>
> Well in some ways there's more overhead with O_DIRECT because O_DIRECT
> has to call get_user_pages and walk pagetables in software before
> every I/O operation. MAP_SHARED walks them in hardware and it can take
> advantage of the CPU tlb too.
>
> The primary problem of MAP_SHARED isn't the overhead of the operation
> itself, which in fact will be lower with MAP_SHARED after the cache is
> allocated, but the write throttling and garbage collection of the host
> caches. If you've a 250G guest image, MAP_SHARED will allocate as much
> as 250G of cache and a cp /dev/zero /dev/hdb in the guest will mark
> 100% of guest RAM dirty. The mkclean methods and write throttling for
> MAP_SHARED introduced in reasonably recent kernels can avoid filling
> 100% of host ram with dirty pages, but it still requires write
> throttling and it'll pollute the host caches and it can result in
> large ram allocations in the host having to block before the ram is
> available, the same way as buffered writes in the host (current KVM
> default).
>   

I'd do writes via the normal write path, not mmap().

> I think O_DIRECT is the best solution and MAP_SHARED could become a
> secondary option just for certain guest workloads with light I/O where
> fairness isn't even a variable worth considering.
>
> The cost of garbage collection of the mapped caches on the host isn't
> trivial, and I don't mean because the nonlinear rmap logic has to scan
> all pagetables, that's a minor cost compared to shrinking the host
> caches before try_to_unmap is ever invoked etc... Leaving the host
> caches purely for the host usage is surely fairer: it won't ever
> lead to one guest doing heavy I/O thrashing the host caches and
> leading to all other guests and host tasks hanging. If they will hang
> for a few msec with O_DIRECT it'll be because they're waiting for I/O
> and the elevator put them on the queue to wait for the disk to return
> ready. But it won't be because of some write throttling during writes,
> or in alloc_pages shrink methods that are calling ->writepage on the
> dirty pages.
>
> The other significant advantage of O_DIRECT is that you won't have to
> call msync to provide journaling.
>
> I think O_DIRECT will work best for all usages, and it looks higher
> priority to me to have than MAP_SHARED. MAP_SHARED will surely result
> in better benchmarks for certain workloads though, imagine 'dd
> if=/dev/hda of=/dev/zero iflag=direct bs=1M count=100' run on the
> guest, it'll read from cache and it will do zero I/O starting from
> the second run with MAP_SHARED ;).
>
> If it was me I'd prefer O_DIRECT by default.
>
>   

Certainly O_DIRECT is the normal path.  We're considering mmap() as a 
way to get host caching while avoiding double-caching.

> For full disclosure you may also want to read this but I strongly
> disagree with those statements. http://kerneltrap.org/node/7563
>   

I disagree with them strongly too.  For general purpose applications you 
want to avoid O_DIRECT, but for special purpose applications that do 
their own caching (databases, virtualization, streaming servers), 
O_DIRECT is critical.

The kernel's cache management algorithms simply cannot compete with a 
specially tuned application, not to mention the additional overhead that 
comes from crossing a protection boundary.
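
[Illustration: a minimal sketch of what an application using O_DIRECT 
has to do itself, namely keep file offset, length and buffer address 
aligned. The names dio_align_range and dio_pread and the 4096-byte 
DIO_ALIGN are assumptions for this example; the real alignment 
requirement depends on the kernel, filesystem and device, and this is 
not the code from the patch under discussion.]

```c
#define _GNU_SOURCE             /* exposes O_DIRECT in <fcntl.h> on Linux */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define DIO_ALIGN 4096          /* assumed alignment, see note above */

/* Widen the byte range [off, off + len) outward to DIO_ALIGN boundaries. */
void dio_align_range(off_t off, size_t len, off_t *a_off, size_t *a_len)
{
    assert(off >= 0);
    off_t start = off - (off % DIO_ALIGN);              /* round down */
    off_t end = off + (off_t)len;
    off_t a_end = (end + DIO_ALIGN - 1) / DIO_ALIGN * DIO_ALIGN; /* round up */
    *a_off = start;
    *a_len = (size_t)(a_end - start);
}

/* Read `len` bytes at `off` into `dst`, which may be unaligned, by going
 * through an aligned bounce buffer. Returns bytes copied, or -1 on error. */
ssize_t dio_pread(int fd, void *dst, size_t len, off_t off)
{
    off_t a_off;
    size_t a_len;
    void *bounce = NULL;

    if (len == 0)
        return 0;
    dio_align_range(off, len, &a_off, &a_len);
    if (posix_memalign(&bounce, DIO_ALIGN, a_len) != 0)
        return -1;

    ssize_t n = pread(fd, bounce, a_len, a_off);
    size_t skip = (size_t)(off - a_off);    /* bytes before the wanted data */
    ssize_t got = -1;
    if (n >= 0) {
        size_t avail = (size_t)n > skip ? (size_t)n - skip : 0;
        if (avail > len)
            avail = len;
        memcpy(dst, (char *)bounce + skip, avail);
        got = (ssize_t)avail;
    }
    free(bounce);
    return got;
}
```

[The intended use is with a descriptor from open(path, O_RDONLY | 
O_DIRECT), but dio_pread also works on an ordinary descriptor. The 
extra memcpy through the bounce buffer is the price of accepting 
unaligned requests; a caller that already has aligned buffers can pass 
them to pread() directly.]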

[I've worked on a userspace filesystem that took every possible measure 
to get the OS out of the way: user level threads, O_DIRECT, aio, large 
pages]

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
