From: Avi Kivity <avi@redhat.com>
To: Anthony Liguori <anthony@codemonkey.ws>
Cc: Chris Wright <chrisw@redhat.com>,
Mark McLoughlin <markmc@redhat.com>,
kvm-devel <kvm-devel@lists.sourceforge.net>,
Laurent Vivier <Laurent.Vivier@bull.net>,
qemu-devel@nongnu.org, Ryan Harper <ryanh@us.ibm.com>
Subject: Re: [Qemu-devel] [RFC] Disk integrity in QEMU
Date: Sun, 12 Oct 2008 20:34:08 +0200
Message-ID: <48F24320.9010201@redhat.com>
In-Reply-To: <48F23AF1.2000104@codemonkey.ws>
Anthony Liguori wrote:
>>
>> Getting good performance because we have a huge amount of free memory
>> in the host is not a good benchmark. Under most circumstances, the
>> free memory will be used either for more guests, or will be given to
>> the existing guests, which can utilize it more efficiently than the
>> host.
>
> There are two arguments for O_DIRECT. The first is that you can avoid
> bringing data into the CPU cache. This requires zero-copy in QEMU, but
> ignoring that, using the page cache doesn't necessarily prevent
> us from achieving this.
>
> In the future, most systems will have a DMA offload engine. This is a
> pretty obvious thing to accelerate with such an engine, which would
> prevent cache pollution.
But it would increase latency, memory bus utilization, and CPU overhead.
In the cases where the page cache buys us something (host page cache
significantly larger than guest size), that's understandable. But for
the other cases, why bother? Especially when many systems don't have
this today.
Let me phrase this another way: is there an argument against O_DIRECT?
In a significant fraction of deployments it will be both simpler and faster.
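For readers less familiar with the flag, here is a minimal Python sketch of what opening an image with O_DIRECT involves on Linux: the open can fail with EINVAL on filesystems that do not support it (tmpfs on older kernels, for example), and reads must use an aligned buffer, offset and length. The helper names open_image and read_block, the 4096-byte alignment, and the fall-back-to-buffered policy are assumptions for illustration, not QEMU's code.

```python
import errno
import mmap
import os

BLOCK = 4096  # assumed alignment; real code should query the device's logical block size


def open_image(path):
    """Open a disk image bypassing the host page cache when possible.

    Returns (fd, direct), where direct says whether O_DIRECT is in effect.
    Falls back to buffered I/O if the platform lacks O_DIRECT or the
    filesystem rejects it with EINVAL.
    """
    o_direct = getattr(os, "O_DIRECT", 0)  # only defined on Linux (and a few others)
    if o_direct:
        try:
            return os.open(path, os.O_RDWR | o_direct), True
        except OSError as e:
            if e.errno != errno.EINVAL:
                raise
    return os.open(path, os.O_RDWR), False


def read_block(fd, index):
    """Read one BLOCK-sized, BLOCK-aligned chunk of the image.

    An anonymous mmap is page-aligned, which satisfies O_DIRECT's
    buffer-alignment requirement for a 4096-byte block.
    """
    buf = mmap.mmap(-1, BLOCK)
    try:
        n = os.preadv(fd, [buf], index * BLOCK)
        return buf[:n]  # slicing an mmap returns a bytes copy
    finally:
        buf.close()
```

With O_DIRECT the device transfers straight into buf; without it, the kernel fills the page cache first and then copies into buf.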
> Another possibility is to directly map the host's page cache into the
> guest's memory space.
>
Doesn't work with large pages.
> The latter is a bit tricky but is so much more interesting, especially
> if you have a strong storage backend that is capable of
> deduplication (you get memory compaction for free).
>
It's not free at all. Replacing a guest memory page involves IPIs and
TLB flushes. It only works on small pages, and only if the host page cache
and guest page cache are aligned with each other. And with current
Linux memory management, I don't see a way to do it that doesn't involve
creating a vma for every page, which is prohibitively expensive.
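Some back-of-the-envelope arithmetic makes "prohibitively expensive" concrete. This Python sketch assumes the worst case, where every guest page is backed by a page-cache page at a non-contiguous file offset so no two vmas can be merged, and compares the count with Linux's default vm.max_map_count of 65530. The helper names are hypothetical.

```python
SMALL_PAGE = 4 * 1024          # x86 base page size
LARGE_PAGE = 2 * 1024 * 1024   # x86-64 2 MiB large page
DEFAULT_MAX_MAP_COUNT = 65530  # Linux default for /proc/sys/vm/max_map_count


def vmas_for_direct_mapping(guest_bytes):
    """Worst case: one vma per guest page, since the backing page-cache
    pages are assumed not to be adjacent in the file (no vma merging)."""
    return guest_bytes // SMALL_PAGE


def small_pages_per_large_page():
    """A large page has to be split into this many small pages before
    individual page-cache pages can be mapped into it."""
    return LARGE_PAGE // SMALL_PAGE
```

For a 4 GiB guest, vmas_for_direct_mapping(4 << 30) is 1,048,576, about 16 times the default per-process limit, and each 2 MiB large page would have to be broken into 512 small ones.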
> I also have my doubts that the amount of memory saved by using
> O_DIRECT will have a noticeable impact on performance considering that
> guest memory and page cache memory are entirely reclaimable.
O_DIRECT is not about saving memory; it is about saving CPU utilization,
cache utilization, and memory bandwidth.
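A rough model of where the bandwidth and CPU time go on a read. The sketch below is a simplification (an assumption, not a measurement): it ignores CPU cache hits, write-allocate reads, readahead and metadata, and only counts the extra copy that buffered I/O performs from the host page cache into the guest buffer.

```python
def memory_traffic(n_bytes, o_direct):
    """Approximate cost of reading n_bytes from disk into a guest buffer.

    Returns (bytes moved over the memory bus, bytes copied by the CPU).
    - O_DIRECT: the device DMAs straight into the guest buffer.
    - Buffered: the device DMAs into the host page cache, then the CPU
      reads each byte from there and writes it into the guest buffer,
      pulling it through the CPU cache on the way.
    """
    if o_direct:
        return n_bytes, 0
    return 3 * n_bytes, n_bytes
```

Under this model a buffered 4 KiB read moves three times as many bytes over the memory bus as an O_DIRECT one and costs the CPU a 4 KiB copy, regardless of how much free memory the host has.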
> An LRU should make the best decisions about whether memory is more
> valuable for the guests or for the host page cache.
>
LRU typically makes fairly bad decisions since it throws away most of the
information it has. I recommend looking up LRU-K and similar
algorithms, just to get a feel for this; it is basically the simplest
possible algorithm short of random selection.
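To illustrate what LRU throws away, here is a toy Python comparison of plain LRU with a simplified LRU-2 on a trace where a small hot set is interleaved with one-shot scans (the kind of access a disk-heavy workload produces). The LRU-K variant is simplified: unlike the published algorithm, it discards a page's history when the page is evicted. The class names and the trace are hypothetical.

```python
from collections import OrderedDict


class LRU:
    """Plain LRU: evict the page whose most recent access is oldest."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # least recently used first
        self.hits = 0

    def access(self, page):
        if page in self.pages:
            self.hits += 1
            self.pages.move_to_end(page)
            return
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)  # evict the LRU page
        self.pages[page] = True


class LRUK:
    """Simplified LRU-K: evict the page whose K-th most recent access is oldest.

    Pages referenced fewer than K times count as infinitely old and go
    first (ties broken by last access). History is dropped on eviction.
    """

    def __init__(self, capacity, k=2):
        self.capacity = capacity
        self.k = k
        self.history = {}  # page -> its last k access times, oldest first
        self.clock = 0
        self.hits = 0

    def _key(self, page):
        h = self.history[page]
        kth = h[0] if len(h) >= self.k else float("-inf")
        return (kth, h[-1])

    def access(self, page):
        self.clock += 1
        if page in self.history:
            self.hits += 1
        elif len(self.history) >= self.capacity:
            del self.history[min(self.history, key=self._key)]
        h = self.history.setdefault(page, [])
        h.append(self.clock)
        del h[:-self.k]  # keep only the last k timestamps


def hot_plus_scan(rounds, hot=(0, 1, 2, 3), scan_len=5):
    """Hypothetical trace: a hot set touched twice per round, then a one-shot scan."""
    trace = []
    for r in range(rounds):
        trace += list(hot) * 2
        trace += [1000 + r * scan_len + i for i in range(scan_len)]
    return trace
```

With a capacity of 5 pages and hot_plus_scan(10), plain LRU scores 40 hits because every scan flushes the hot set, while LRU-2 scores 76: the scan pages are referenced only once, so they are evicted before the hot pages.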
Note that Linux doesn't even have an LRU; it has to approximate one since it
can't sample all of the pages all of the time. With a hypervisor that
uses Intel's EPT, it's even worse since we don't have an accessed bit.
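Linux's reclaim code is more elaborate (it keeps active and inactive lists), but the textbook CLOCK (second-chance) algorithm shows the basic idea of approximating LRU with a hardware accessed bit, and what is lost without one. This is a toy Python sketch; the class and trace are hypothetical, and a newly faulted-in page starts with its bit clear as a simplification.

```python
class Clock:
    """Second-chance (CLOCK) replacement driven by one accessed bit per frame.

    has_accessed_bit=False models hardware that never reports accesses:
    every bit stays clear and the algorithm degenerates to FIFO.
    """

    def __init__(self, capacity, has_accessed_bit=True):
        self.capacity = capacity
        self.has_accessed_bit = has_accessed_bit
        self.frames = []      # page held by each frame
        self.referenced = []  # accessed bit per frame
        self.where = {}       # page -> frame index
        self.hand = 0
        self.hits = 0

    def access(self, page):
        if page in self.where:
            self.hits += 1
            if self.has_accessed_bit:
                self.referenced[self.where[page]] = True  # models the MMU setting the bit
            return
        if len(self.frames) < self.capacity:
            self.where[page] = len(self.frames)
            self.frames.append(page)
            self.referenced.append(False)
            return
        # Give referenced frames a second chance: clear the bit and move on.
        while self.referenced[self.hand]:
            self.referenced[self.hand] = False
            self.hand = (self.hand + 1) % self.capacity
        del self.where[self.frames[self.hand]]
        self.frames[self.hand] = page
        self.referenced[self.hand] = False
        self.where[page] = self.hand
        self.hand = (self.hand + 1) % self.capacity
```

On a trace that touches page 0 before each of nine one-shot pages, with three frames, the accessed bit lets CLOCK keep page 0 resident (8 hits), while the bitless variant periodically evicts it (6 hits).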
On silly benchmarks that just exercise the disk and touch no memory, and
if you tune the host very aggressively, LRU will win on long-running
guests since it will eventually page out all unused guest memory (with
Linux guests, it will never even page guest memory in). On real-life
applications, I don't think it stands much of a chance.
--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.