From: Avi Kivity <avi@qumranet.com>
To: Andrea Arcangeli <andrea@qumranet.com>
Cc: Blue Swirl <blauwirbel@gmail.com>,
	Laurent Vivier <Laurent.Vivier@bull.net>,
	qemu-devel@nongnu.org, Paul Brook <paul@codesourcery.com>
Subject: Re: [Qemu-devel] Re: [PATCH][v2] Align file accesses with cache=off (O_DIRECT)
Date: Wed, 21 May 2008 21:29:01 +0300
Message-ID: <483469ED.1050408@qumranet.com>
In-Reply-To: <20080521174754.GG22488@duo.random>

Andrea Arcangeli wrote:
> On Wed, May 21, 2008 at 08:18:01PM +0300, Avi Kivity wrote:
>   
>> Yes, that's the reason.  Here zerocopy is not the motivation; instead, we 
>> have host-cached pages that are used directly in the guest.  So we get both 
>> reduced memory footprint, and host caching.  O_DIRECT reduces the memory 
>> footprint but kills host caching.
>>     
>
> Sure. So MAP_SHARED+remap_file_pages should work just fine to achieve
> zerocopy I/O.
>
>   

No, when the guest writes to memory, it will affect the disk, which 
doesn't happen with normal memory writes.  MAP_PRIVATE is needed.
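
To make the difference concrete, here is a minimal sketch (not QEMU code;
the helper name and the temporary file path are made up for illustration)
that stores into a file mapping and then reads the file back through the
normal read path. It assumes Linux, where the page cache is shared between
mmap() and pread().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

/* Hypothetical helper: map a scratch file with the given flags, store into
 * the mapping the way a guest would store into its RAM, then read the file
 * back with pread().
 * Returns 1 if the store reached the file, 0 if not, -1 on error. */
int mapping_write_reaches_file(int map_flags)
{
    char path[] = "/tmp/mapdemoXXXXXX";   /* assumed scratch location */
    const char orig[] = "disk";
    char back[sizeof(orig)] = {0};
    int result = -1;

    assert(map_flags == MAP_PRIVATE || map_flags == MAP_SHARED);

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                         /* file lives until close() */

    if (pwrite(fd, orig, sizeof(orig), 0) != (ssize_t)sizeof(orig))
        goto out;

    char *p = mmap(NULL, sizeof(orig), PROT_READ | PROT_WRITE,
                   map_flags, fd, 0);
    if (p == MAP_FAILED)
        goto out;

    memcpy(p, "gues", 4);                 /* the "guest" memory write */

    if (pread(fd, back, sizeof(orig), 0) == (ssize_t)sizeof(orig))
        result = memcmp(back, orig, sizeof(orig)) != 0;

    munmap(p, sizeof(orig));
out:
    close(fd);
    return result;
}
```

Called with MAP_SHARED the store shows up in the file; called with
MAP_PRIVATE the page is copied on write and the file keeps its contents,
which is the behaviour guest RAM needs.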

>> The scenario is desktop/laptop use.  For server use O_DIRECT is clearly 
>> preferred due to much reduced overhead.
>>     
>
> Well in some ways there's more overhead with O_DIRECT because O_DIRECT
> has to call get_user_pages and walk pagetables in software before
> every I/O operation. MAP_SHARED walks them in hardware and it can take
> advantage of the CPU tlb too.
>
> The primary problem of MAP_SHARED isn't the overhead of the operation
> itself, which in fact will be lower with MAP_SHARED after the cache is
> allocated, but the write throttling and garbage collection of the host
> caches. If you've a 250G guest image, MAP_SHARED will allocate as much
> as 250G of cache and a cp /dev/zero /dev/hdb in the guest will mark
> 100% of guest RAM dirty. The mkclean methods and write throttling for
> MAP_SHARED introduced in reasonably recent kernels can avoid filling
> 100% of host ram with dirty pages, but it still requires write
> throttling and it'll pollute the host caches and it can result in
> large ram allocations in the host having to block before the ram is
> available, the same way as buffered writes in the host (current KVM
> default).
>   

I'd do writes via the normal write path, not mmap().

> I think O_DIRECT is the best solution and MAP_SHARED could become a
> secondary option just for certain guest workloads with light I/O where
> fairness isn't even a variable worth considering.
>
> The cost of garbage collection of the mapped caches on the host isn't
> trivial, and I don't mean because the nonlinear rmap logic has to scan
> all pagetables, that's a minor cost compared to shrinking the host
> caches before try_to_unmap is ever invoked etc... Leaving the host
> caches purely for the host usage is surely fairer: it won't ever
> lead to one guest doing heavy I/O thrashing the host caches and
> leading to all other guests and host tasks hanging. If they will hang
> for a few msec with O_DIRECT it'll be because they're waiting for I/O
> and the elevator put them on the queue to wait for the disk to return
> ready. But it won't be because of some write throttling during writes,
> or in alloc_pages shrink methods that are calling ->writepage on the
> dirty pages.
>
> The other significant advantage of O_DIRECT is that you won't have to
> call msync to provide journaling.
>
> I think O_DIRECT will work best for all usages, and it looks higher
> priority to me to have than MAP_SHARED. MAP_SHARED will surely result
> in better benchmarks for certain workloads though, imagine 'dd
> if=/dev/hda of=/dev/zero iflag=direct bs=1M count=100' run on the
> guest, it'll read from cache and it will do zero I/O starting from
> the second run with MAP_SHARED ;).
>
> If it was me I'd prefer O_DIRECT by default.
>
>   

Certainly O_DIRECT is the normal path.  We're considering mmap() as a 
way to get host caching while avoiding double-caching.

> For full disclosure you may also want to read this but I strongly
> disagree with those statements. http://kerneltrap.org/node/7563
>   

I disagree with them strongly too.  For general-purpose applications you 
want to avoid O_DIRECT, but for special-purpose applications that do their 
own caching (databases, virtualization, streaming servers), O_DIRECT is 
critical.

The kernel's cache management algorithms simply cannot compete with a 
specially tuned application, not to mention the additional overhead that 
comes from crossing a protection boundary.

[I've worked on a userspace filesystem that took every possible measure 
to get the OS out of the way: user level threads, O_DIRECT, aio, large 
pages]

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
