From: Avi Kivity <avi@qumranet.com>
To: Andrea Arcangeli <andrea@qumranet.com>
Cc: "David S. Ahern" <daahern@cisco.com>, kvm@vger.kernel.org
Subject: Re: [kvm-devel] performance with guests running 2.4 kernels (specifically RHEL3)
Date: Thu, 29 May 2008 13:01:06 +0300
Message-ID: <483E7EE2.8010508@qumranet.com>
In-Reply-To: <20080528170410.GC8086@duo.random>

Andrea Arcangeli wrote:
>
>>> - set up kmap to point at pte
>>> - test_and_clear_bit(pte)
>>> - kunmap
>>>
>>> From kvm's point of view this looks like
>>>
>>> - several accesses to set up the kmap
>>>
>
> Hmm, the kmap establishment takes a single guest operation in the
> fixmap area. That's a single write to the pte, to write a pte_t 8/4
> byte large region (PAE/non-PAE). The same pte_t is then cleared and
> flushed out of the tlb with a cpu-local invlpg during kunmap_atomic.
>
> I count 1 write here so far.
>
>

No, two:

	static inline void set_pte(pte_t *ptep, pte_t pte)
	{
		ptep->pte_high = pte.pte_high;
		smp_wmb();
		ptep->pte_low = pte.pte_low;
	}

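To make the count concrete, here is a minimal sketch of why a PAE pte
update shows up as two separate writes when the page holding the pte is
write-protected. It is a userspace model only, not kvm or guest kernel
code: struct pae_pte, trapped_store() and trapped_writes are
hypothetical names standing in for the guest pte and the write faults
the hypervisor would see.

```c
#include <stdint.h>

/* Hypothetical model of a 64-bit PAE pte stored as two 32-bit halves. */
struct pae_pte {
	uint32_t pte_low;
	uint32_t pte_high;
};

/* Hypothetical counter: one increment per store the hypervisor would trap. */
static unsigned int trapped_writes;

/* Each 32-bit store to a write-protected page is a separate trap. */
static void trapped_store(uint32_t *dst, uint32_t val)
{
	*dst = val;
	trapped_writes++;
}

/* Same store order as set_pte() above: high half first, then low half. */
static void model_set_pte(struct pae_pte *ptep, uint64_t val)
{
	trapped_store(&ptep->pte_high, (uint32_t)(val >> 32));
	trapped_store(&ptep->pte_low, (uint32_t)val);
}
```

Under this model a single set_pte() costs two trapped writes, which is
where the "two" above comes from.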
>>> - if these accesses trigger flooding, we will have to tear down the
>>> shadow for this page, only to set it up again soon
>>>
>
> So the shadow mapping the fixmap area would be torn down by the
> flooding.
>

Before we started patching this, yes.

> Or is the shadow corresponding to the real user pte pointed by the
> fixmap, that is unshadowed by the flooding, or both/all?
>
>

After we started patching this, no, but with per-page-pte-history, yes
(correctly).

>>> - an access to the pte (emulated)
>>>
>
> Here I count the second write and this isn't done on the fixmap area
> like the first write above, but this is a write to the real user pte,
> pointed by the fixmap. So if this is emulated it means the shadow of
> the user pte pointing to the real data page is still active.
>

Right. But if we are scanning a page table linearly, it should be
unshadowed.

>
>>> - if this access _doesn't_ trigger flooding, we will have 512 unneeded
>>> emulations. The pte is worthless anyway since the accessed bit is clear
>>> (so we can't set up a shadow pte for it)
>>> - this bug was fixed
>>>
>
> You mean the accessed bit on fixmap pte used by kmap? Or the user pte
> pointed by the fixmap pte?
>

The user pte. After guest code runs test_and_clear_bit(accessed_bit,
ptep), we can't shadow that pte (all shadowed ptes must have the
accessed bit set in the corresponding guest pte, similar to how a tlb
entry can only exist if the accessed bit is set).

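The rule can be sketched as a small predicate. This is an illustrative
model, not the kvm mmu code: model_can_shadow() and
model_test_and_clear_accessed() are hypothetical helpers, the clear is
deliberately non-atomic, and only the bit positions (P is bit 0, A is
bit 5 of an x86 pte) are taken from the architecture.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_PRESENT  (1u << 0)	/* x86 P bit */
#define PTE_ACCESSED (1u << 5)	/* x86 A bit */

/* A guest pte may only be shadowed while it is present and accessed. */
static bool model_can_shadow(uint32_t guest_pte)
{
	return (guest_pte & PTE_PRESENT) && (guest_pte & PTE_ACCESSED);
}

/* Non-atomic stand-in for the guest's test_and_clear_bit() on the A bit. */
static bool model_test_and_clear_accessed(uint32_t *ptep)
{
	bool was_set = (*ptep & PTE_ACCESSED) != 0;

	*ptep &= ~PTE_ACCESSED;
	return was_set;
}
```

Once the guest has cleared the accessed bit, model_can_shadow() returns
false for that pte until the guest touches the page again, so emulating
the write bought us nothing.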
>
>>> - an access to tear down the kmap
>>>
>
> Yep, pte_clear on the fixmap pte_t followed by an invlpg (if that
> matters).
>

Looking at the code, that only happens if CONFIG_HIGHMEM_DEBUG is set.

> I think what we should aim for is to quickly reach this condition:
>
> 1) always keep the fixmap/kmap pte_t shadowed and emulate the
> kmap/kunmap access so the test_and_clear_young done on the user pte
> doesn't require to re-establish the spte representing the fixmap
> virtual address. If we don't emulate fixmap we'll have to
> re-establish the spte during the write to the user pte, and
> tear it down again during kunmap_atomic. So there's not much doubt
> fixmap access emulation is worth it.
>

That is what current HEAD does; commit
418c6952ba9fd379059ed325ea5a3efe904fb7fd is responsible.

Note that there is an alternative: allow the kmap pte to be unshadowed,
and instead emulate the access through that pte (i.e. emulate the btr
instruction). I don't think it's worth it though, because it hurts other
users of the fixmap page.

> 2) get rid of the user pte shadow mapping pointing to the user data so
> the test_and_clear of the young bitflag on the user pte will not be
> emulated and it'll run at full CPU speed through the shadow pte
> mapping corresponding to the fixmap virtual address
>

That's what per-page-pte-history is supposed to do. The first few
accesses are emulated; subsequent ones run natively.

It's still not full speed, as the kmap setup has to be emulated (twice).

One possible optimization: if we see the first part of the kmap
instantiation, we emulate a few more instructions before returning to
the guest. Xen does this, IIRC.

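The "first few emulated, the rest native" behaviour can be pictured as a
per-page counter. The sketch below is an assumption-laden illustration,
not the per-page-pte-history patch itself: struct shadowed_page,
guest_pte_write() and the threshold of 3 are all made up for the
example.

```c
#include <stdbool.h>

/* Hypothetical threshold; the real heuristic may use a different value. */
#define EMULATE_THRESHOLD 3

/* Hypothetical per-page state for a guest page table page. */
struct shadowed_page {
	unsigned int emulated_writes;	/* writes emulated so far */
	bool shadowed;			/* still write-protected? */
};

enum write_path { WRITE_EMULATED, WRITE_NATIVE };

/* Decide how one guest write to a pte on this page is handled. */
static enum write_path guest_pte_write(struct shadowed_page *pg)
{
	if (!pg->shadowed)
		return WRITE_NATIVE;

	/* Too many emulated writes: unshadow so later writes run natively. */
	if (++pg->emulated_writes >= EMULATE_THRESHOLD)
		pg->shadowed = false;

	return WRITE_EMULATED;
}
```

With these numbers the first three writes to a page are emulated and
every later write goes straight through, which is the pattern we want
for a linear scan of a page table.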
> kscand pattern is the same as running mprotect on a 32bit 2.6
> kernel so it sounds worth optimizing for it, even if kscand may be
> unfixable without killall -STOP kscand or VM fixes to guest.
>
>

I'm no longer sure the access pattern is sequential, since I see that
kmap_atomic() will not recreate the pte if its value has not changed
(unless HIGHMEM_DEBUG is set).

--
error compiling committee.c: too many arguments to function