From: Anthony Liguori <anthony@codemonkey.ws>
To: Andrea Arcangeli <andrea@qumranet.com>
Cc: Laurent Vivier <Laurent.Vivier@bull.net>,
Dave Hansen <haveblue@us.ibm.com>,
qemu-devel@nongnu.org, Blue Swirl <blauwirbel@gmail.com>,
Paul Brook <paul@codesourcery.com>
Subject: Re: [Qemu-devel] Re: [PATCH][v2] Align file accesses with cache=off (O_DIRECT)
Date: Wed, 21 May 2008 13:25:59 -0500
Message-ID: <48346937.80408@codemonkey.ws>
In-Reply-To: <20080521180852.GI22488@duo.random>

Andrea Arcangeli wrote:
> On Wed, May 21, 2008 at 12:53:52PM -0500, Anthony Liguori wrote:
>
>> MAP_SHARED cannot be done transparently to the guest, that's the motivating
>> reason behind MAP_PRIVATE.
>>
>
> Could you elaborate on what 'done transparently' means? The only
> difference is for writes. When guest writes MAP_PRIVATE will
> copy-on-write. How can it be good if guest generates many
> copy-on-writes and eliminates the cache from the mapping and replaces
> it with anonymous memory?
>
I think we're talking about different things. What I'm talking about is
the following:

The guest issues a DMA read from disk at offset N of size M to physical
address X. Today, we essentially read from the backing disk image at
offset N into a temporary buffer of size M, and then memcpy() it to
physical address X.
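
A minimal sketch of that copy path, for illustration only. It assumes guest RAM is a plain host buffer indexed by guest physical address; the helper name dma_read_bounce() and its parameters are hypothetical and are not QEMU's actual block-layer code.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: service a guest DMA read of `len` bytes at image
 * offset `off` into guest physical address `gpa`, going through a
 * temporary bounce buffer. Returns 0 on success, -1 on any error or on a
 * short read (EINTR is not retried in this sketch).
 */
static int dma_read_bounce(int fd, off_t off, size_t len,
                           uint8_t *guest_ram, size_t ram_size, size_t gpa)
{
    assert(guest_ram != NULL);

    /* Reject transfers that would run past the end of guest RAM. */
    if (gpa > ram_size || len > ram_size - gpa)
        return -1;
    if (len == 0)
        return 0;

    uint8_t *tmp = malloc(len);          /* temporary buffer of size M */
    if (tmp == NULL)
        return -1;

    size_t done = 0;
    while (done < len) {                 /* pread() may return fewer bytes */
        ssize_t n = pread(fd, tmp + done, len - done, off + (off_t)done);
        if (n <= 0) {
            free(tmp);
            return -1;
        }
        done += (size_t)n;
    }

    memcpy(guest_ram + gpa, tmp, len);   /* the extra copy into guest RAM */
    free(tmp);
    return 0;
}
```

Every read pays for one copy out of the host page cache into the temporary buffer and a second copy into guest memory, and each guest ends up holding its own copy of the data.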

What I would like to do, if N and M are multiples of PAGE_SIZE, is
replace the memory at guest physical address X with the host's page
cache pages for offset N, size M. The guest is unaware of this, though,
and it may decide to reuse that memory for something else. When this
happens, we need to unmap guest physical address X and replace it with
normal memory (essentially, CoW'ing).

The effect of this would be that if multiple guests are using the same
disk image, they would end up sharing memory transparently.

With MMU notifiers, this is possible by just using mmap(MAP_PRIVATE |
MAP_FIXED), assuming we fix gfn_to_pfn() to take a 'write' parameter.
Right now we always write-fault CoW mappings because we unconditionally
call get_user_pages() with write=1.
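
The host-side half of this can be sketched in userspace as below. This only illustrates the mmap() semantics, assuming guest RAM is an ordinary page-aligned host mapping; the helper name map_image_private() is hypothetical, and the KVM side (gfn_to_pfn() and the MMU notifiers) is not modelled.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: replace `len` bytes of guest RAM at host address
 * `dst` with a private, file-backed mapping of the image at offset `off`.
 * Reads are served from the host page cache; the first write to a page
 * makes the kernel give this process its own copy (CoW), and the image
 * file is never modified. Returns 0 on success, -1 on error.
 */
static int map_image_private(int fd, off_t off, size_t len, void *dst)
{
    long page = sysconf(_SC_PAGESIZE);

    if (page <= 0 || len == 0 || off < 0)
        return -1;

    /* Only page-aligned, page-sized requests can be remapped in place. */
    if ((uintptr_t)dst % (uintptr_t)page || off % page || len % (size_t)page)
        return -1;

    /* MAP_FIXED atomically replaces whatever was mapped at dst. */
    void *p = mmap(dst, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_FIXED, fd, off);
    if (p == MAP_FAILED)
        return -1;

    assert(p == dst);
    return 0;
}
```

Two processes that map the same image range this way share the underlying page cache pages until one of them writes. Each call also creates a separate vma inside the guest RAM region.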

As has been pointed out, this is probably not ideal since it would cause
heavy vma fragmentation. We may be able to simulate this using the
slots API, although slots are quite similar to vmas in that we optimize
for a small number of them.

I'm not really sure what's the best approach.

Regards,

Anthony Liguori

> I can't see how MAP_PRIVATE could replace O_DIRECT, there's no way to
> write anything to disk with MAP_PRIVATE, msync on a MAP_PRIVATE is a
> pure overhead noop for example, only MAP_SHARED has a chance to modify
> any bit present on disk and it'll require msync at least every time
> the host OS waits for I/O completion and assumes the journal
> metadata/data is written on disk.
>
> The real good thing I see of MAP_PRIVATE/MAP_SHARED vs O_DIRECT, is
> that the guest would boot the second time without triggering reads
> from disks. But after guest is booted, the runtime of the guest is
> likely going to be better with O_DIRECT, the guest has its own
> filesystem caches in the guest memory, replicating them shouldn't pay
> off significantly for the guest runtime even on a laptop, and it
> provides disadvantages in the host by polluting host caches already
> existing in the guest, and it'll decrease fairness of the system,
> without mentioning the need of msync for journaling. So besides the
> initial boot time I don't see many advantages for
> MAP_PRIVATE/MAP_SHARED at least unless you're running msdos ;).
>