From: Marcelo Tosatti <mtosatti@redhat.com>
To: Andrea Arcangeli <andrea@qumranet.com>
Cc: Avi Kivity <avi@qumranet.com>, kvm-devel <kvm@vger.kernel.org>
Subject: Re: KVM: MMU: rmap_write_protect() hugepage iteration bug
Date: Sun, 8 Jun 2008 01:48:53 -0300 [thread overview]
Message-ID: <20080608044853.GA1408@dmt.cnet> (raw)
In-Reply-To: <20080608015452.GC8321@duo.random>
Hi Andrea,
On Sun, Jun 08, 2008 at 03:54:52AM +0200, Andrea Arcangeli wrote:
> On Sat, Jun 07, 2008 at 09:27:36PM -0300, Marcelo Tosatti wrote:
> >
> > rmap_next() expects the "spte" argument to be NULL whenever there's only
> > one remaining entry in the descriptor. That is, it was not designed to
> > handle changes in the chain while iterating.
> >
> > This bug cripples rmap_write_protect() so that it won't nuke all
> > writable large mappings to a particular gfn.
>
> Better than before but the patched version will still not nuke all
> writeable sptes (I did exactly the same mistake once).
Why not posting it at the time?
> I found out the trouble the hard way with a trace of ksm kprobe
> wp_notifier faults, and that's this reordering here:
>
> desc->shadow_ptes[i] = desc->shadow_ptes[j];
> desc->shadow_ptes[j] = NULL;
You're right.
> In short there's no way to mix rmap_next and rmap_remove in the same
> loop as rmap_remove will reorder stuff in the array, so using the
> next_spte as pointer will lead to missing the sptes after the
> next_spte that have been reordered in place of the spte just before
> the next_spte. This was quite subtle issue as it's not immediately
> evident the first time you read rmap_remove internals.
>
> rmap_remove invocation requires to restart from scratch (or
> alternatively to avoid restarting from scratch, we'd need to extend
> the rmap methods to provide that functionality, something like
> rmap_next_safe that would be robust against rmap_remove if used in the
> same way you did now with rmap_next).
>
> For now this will fix it for real.
This makes no difference since rmap_next() still can't handle the case
where rmap_remove() left a single entry in the array and "spte" has been
passed as non-NULL:
while (desc) {
for (i = 0; i < RMAP_EXT && desc->shadow_ptes[i]; ++i) {
if (prev_spte == spte)
return desc->shadow_ptes[i];
prev_spte = desc->shadow_ptes[i];
}
desc = desc->more;
}
Other than the BUG_ON()'s at the beginning of the while() loop.
How bad you think restarting is? Unless there are workloads with very
large hugepage chains most of the data should be cached in L1. Also the
fact that a 4K page has just been shadowed inside a large region makes
it unlikely for the same chain to be examined again soon.
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index aaccc40..d01d098 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -640,6 +640,7 @@ static void rmap_write_protect(struct kvm *kvm, u64
gfn)
rmap_remove(kvm, spte);
--kvm->stat.lpages;
set_shadow_pte(spte,shadow_trap_nonpresent_pte);
+ spte = NULL;
write_protected = 1;
}
spte = rmap_next(kvm, rmapp, spte);
next prev parent reply other threads:[~2008-06-08 4:49 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-06-08 0:27 KVM: MMU: rmap_write_protect() hugepage iteration bug Marcelo Tosatti
2008-06-08 1:54 ` Andrea Arcangeli
2008-06-08 4:48 ` Marcelo Tosatti [this message]
2008-06-08 7:30 ` Avi Kivity
2008-06-08 8:03 ` Avi Kivity
2008-06-08 8:04 ` Avi Kivity
2008-06-08 18:31 ` Andrea Arcangeli
2008-06-08 19:52 ` Marcelo Tosatti
2008-06-08 20:30 ` Andrea Arcangeli
2008-06-09 0:20 ` Marcelo Tosatti
2008-06-09 0:55 ` Andrea Arcangeli
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20080608044853.GA1408@dmt.cnet \
--to=mtosatti@redhat.com \
--cc=andrea@qumranet.com \
--cc=avi@qumranet.com \
--cc=kvm@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox