From: Andrea Arcangeli <andrea@qumranet.com>
To: Anthony Liguori <anthony@codemonkey.ws>
Cc: Laurent Vivier <Laurent.Vivier@bull.net>,
	Dave Hansen <haveblue@us.ibm.com>,
	qemu-devel@nongnu.org, Blue Swirl <blauwirbel@gmail.com>,
	Paul Brook <paul@codesourcery.com>
Subject: Re: [Qemu-devel] Re: [PATCH][v2] Align file accesses with cache=off (O_DIRECT)
Date: Wed, 21 May 2008 22:13:35 +0200	[thread overview]
Message-ID: <20080521201335.GK22488@duo.random> (raw)
In-Reply-To: <48346937.80408@codemonkey.ws>

On Wed, May 21, 2008 at 01:25:59PM -0500, Anthony Liguori wrote:
> I think we're talking about different things.  What I'm talking about is 
> the following:
>
> Guest issues DMA read from disk at offset N of size M to physical address 
> X.   Today, we essentially read from the backing disk image from offset N 
> into a temporary buffer of size M, and then memcpy() to physical address X.
>
> What I would like to do, if N and M are multiples of PAGE_SIZE, is replace 
> the memory at guest physical address X, with the host's page cache for N, 
> M.  The guest is unaware of this though and it may decide to reclaim that 
> memory for something else.  When this happens, we need to unmap guest 
> physical address X and replace it with normal memory (essentially, 
> CoW'ing).
>
> The effect of this would be that if multiple guests are using the same disk 
> image, they would end up sharing memory transparently.
>
> With MMU notifiers, this is possible by just using mmap(MAP_PRIVATE | 
> MAP_FIXED) assuming we fix gfn_to_pfn() to take a 'write' parameter, right 
> now we always write fault CoW mappings because we unconditionally call 
> get_user_pages with write=1.

Ok, now I see exactly what you're after. It would save memory, yes,
but only with -snapshot... And it would be zerocopy, yes, but it would
need to flush the TLB of all cpus (both the regular ptes and the sptes
too) with IPIs for every pte overwritten, because the old pte could
still be cached in the TLB even when no further writes to the cache
are needed. IPIs are likely more costly than a local memcpy of a 4k
region. It is one thing to call get_user_pages in O_DIRECT just to
learn which physical page the DMA should be directed to (in our case
the anonymous page pointed to by the gpa); it is quite another to
mangle ptes and have to update the TLBs for each emulated DMA
operation etc...
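The contrast in the paragraph above — locating the destination page so the device can DMA straight into it, versus bouncing through a temporary buffer — has a rough userspace analogue: an O_DIRECT read lands directly in the caller's buffer, while a buffered read goes through the page cache and is copied out. A hedged sketch, assuming a hypothetical `read_direct()` helper; O_DIRECT has alignment requirements and is refused by some filesystems (e.g. tmpfs), so the sketch falls back to a buffered read:

```c
#define _GNU_SOURCE   /* for O_DIRECT on glibc */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Read len bytes from the start of path straight into dst, trying
 * O_DIRECT first so the kernel DMAs into our buffer with no bounce
 * copy.  dst, len and the file offset must honour the usual O_DIRECT
 * alignment (page alignment is safe).  Falls back to a buffered read
 * when the filesystem rejects O_DIRECT.  Returns bytes read or -1. */
static ssize_t read_direct(const char *path, void *dst, size_t len)
{
    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0)
        fd = open(path, O_RDONLY);   /* buffered fallback */
    if (fd < 0)
        return -1;
    ssize_t n = read(fd, dst, len);
    close(fd);
    return n;
}
```

This is the cheap half of the story: the kernel's O_DIRECT path uses get_user_pages internally only to pin and locate the destination pages, with no pte rewriting and no cross-cpu TLB flushes.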

> As has been pointed out, this is probably not ideal since it would cause 
> heavy vma fragmentation.  We may be able to simulate this using the slots 
> API although slots are quite similar to vma's in that we optimize for a 
> small number of them.

I'm quite sure remap_file_pages could be extended to work on
MAP_PRIVATE. But I don't see a big benefit in sharing the ram between
host and guest: having it in the guest is enough, this only works for
reads anyway, and it can only share ram among different guests with
-snapshot.

So while it sounds like a clever trick, I doubt it's a worthwhile
optimization. It has downsides, and the worst is that I don't see how
we could extend this logic to work for writes, because the guest's
pagecache can't be written to disk before the dma is explicitly
started by the guest.

