kvm.vger.kernel.org archive mirror
From: Andrea Arcangeli <aarcange@redhat.com>
To: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>,
	kvm@vger.kernel.org, Glauber Costa <glommer@redhat.com>
Subject: Re: [PATCH 2/2] KVM: Prevent internal slots from being COWed
Date: Tue, 6 Jul 2010 16:45:19 +0200
Message-ID: <20100706144519.GC16195@random.random>
In-Reply-To: <4C209BD8.8030104@redhat.com>

On Tue, Jun 22, 2010 at 02:17:44PM +0300, Avi Kivity wrote:
> On 06/21/2010 11:23 PM, Marcelo Tosatti wrote:
> > On Mon, Jun 21, 2010 at 11:18:13AM +0300, Avi Kivity wrote:
> >    
> >> If a process with a memory slot is COWed, the page will change its address
> >> (despite having an elevated reference count).  This breaks internal memory
> >> slots which have their physical addresses loaded into vmcs registers (see
> >> the APIC access memory slot).
> >>
> >> Signed-off-by: Avi Kivity <avi@redhat.com>
> >> ---
> >>   arch/x86/kvm/x86.c |    5 +++++
> >>   1 files changed, 5 insertions(+), 0 deletions(-)
> >>
> >> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> >> index 33156a3..d9a33e6 100644
> >> --- a/arch/x86/kvm/x86.c
> >> +++ b/arch/x86/kvm/x86.c
> >> @@ -5633,6 +5633,11 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
> >>   				int user_alloc)
> >>   {
> >>   	int npages = memslot->npages;
> >> +	int map_flags = MAP_PRIVATE | MAP_ANONYMOUS;
> >> +
> >> +	/* Prevent internal slot pages from being moved by fork()/COW. */
> >> +	if (memslot->id >= KVM_MEMORY_SLOTS)
> >> +		map_flags = MAP_SHARED | MAP_ANONYMOUS;
> >>
> >>   	/*To keep backward compatibility with older userspace,
> >>   	 *x86 needs to hanlde !user_alloc case.
> >>      
> > Forgot to use map_flags below.
> >
> >    
> 
> Ouch, corrected and applied.
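
A minimal user-space sketch, assuming Linux and glibc, of the fork()
semantics the quoted patch relies on. It is not KVM code, and
child_write_visible() is a hypothetical helper. After fork(), whichever
process writes first to a MAP_PRIVATE | MAP_ANONYMOUS page gets its own
copy at a new physical address, while MAP_SHARED | MAP_ANONYMOUS keeps a
single page shared by both processes.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: map one anonymous page with share_flag
 * (MAP_PRIVATE or MAP_SHARED), fork, let the child write to it and
 * report whether the parent sees that write.
 * Returns 1 if visible, 0 if not, -1 on error.
 */
static int child_write_visible(int share_flag)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            share_flag | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    p[0] = 1; /* populate the page before fork() */

    pid_t pid = fork();
    if (pid < 0) {
        munmap(p, len);
        return -1;
    }
    if (pid == 0) {
        /* MAP_PRIVATE: this write breaks COW, so the child now writes
         * to a fresh copy at a different physical address.
         * MAP_SHARED: both processes keep using the same page. */
        p[0] = 2;
        _exit(0);
    }

    int status;
    int rc = -1;
    if (waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        rc = (p[0] == 2);
    munmap(p, len);
    return rc;
}
```

child_write_visible(MAP_SHARED) returns 1 and
child_write_visible(MAP_PRIVATE) returns 0.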

I think I tracked down the corruption during swapping with THP enabled
to this bug. The real bug is that the mmu notifier fires (fork is
covered by the mmu notifier too) but KVM ignores it and keeps writing
to the old location. Shared pages can also be swapped out, and if the
dirty bit on the spte isn't set before the page has been written out,
the page can be relocated. Basically, suppose do_swap_page decides to
make a copy of the page (as in the ksm-swapin case, which current
upstream now triggers erratically even for non-ksm pages because of a
bug in the new anon-vma code that I have already fixed in aa.git), and
the dirty bit on the spte is ignored because of lumpy reclaim (which I
have now removed as well; that alone also makes the bug stop
triggering). Then the page is eventually unmapped and, during swapin,
relocated to a different page.

So the bug really is in KVM: it ignores mmu_notifier_invalidate_page
and keeps using the old page.
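
To make the failure mode concrete, a toy model in plain C (every name
is hypothetical and nothing here is kernel code): a "secondary MMU"
consumer caches the frame it resolved, the page is relocated the way a
swap-out/swap-in copy would move it, and the invalidate notifier fires.
A consumer that ignores the notifier keeps writing into the old frame,
so the process never sees the write.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: frames[] stands in for physical memory and pte for
 * the host page table entry of a single user address. */
#define TOY_NFRAMES 4

struct toy_mm {
    int frames[TOY_NFRAMES]; /* contents of each physical frame */
    int pte;                 /* frame currently mapped at the address */
};

/* A secondary-MMU user that caches the frame it resolved earlier,
 * like a physical address loaded into a VMCS field. */
struct toy_consumer {
    int cached_frame;        /* -1 means "not resolved" */
    bool honours_notifier;   /* false models the bug */
};

/* mmu notifier invalidate callback. */
static void toy_invalidate(struct toy_consumer *c)
{
    if (c->honours_notifier)
        c->cached_frame = -1;
}

/* Swap-out followed by a swap-in that copies the data to new_frame. */
static void toy_relocate(struct toy_mm *mm, struct toy_consumer *c,
                         int new_frame)
{
    toy_invalidate(c);
    mm->frames[new_frame] = mm->frames[mm->pte];
    mm->pte = new_frame;
}

/* Consumer write; re-resolves through the pte if its cache was dropped. */
static void toy_consumer_write(struct toy_mm *mm, struct toy_consumer *c,
                               int value)
{
    if (c->cached_frame < 0)
        c->cached_frame = mm->pte;
    mm->frames[c->cached_frame] = value;
}

/* What the process itself sees through its page table. */
static int toy_user_read(const struct toy_mm *mm)
{
    return mm->frames[mm->pte];
}
```

Writing 7, relocating to frame 1, then writing 9: with honours_notifier
false the second write lands in the old frame and the process still
reads 7; with it true the write is re-resolved and the process reads 9.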

It should have rung a bell that fork was breaking anything... fork
must not break anything, since KVM is mmu notifier capable.
MADV_DONTFORK should only be a performance optimization now. So the
above change should be unnecessary (and I doubt it really fixes the
swapping case, as tmpfs pages can be swapped out too, at least unless
the page is pinned).
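
A small user-space sketch, assuming Linux and glibc, of what
MADV_DONTFORK does: the marked range is simply absent in the child, so
there is no COW sharing left to break. mapped_in_child() is a
hypothetical helper; it uses mincore(), which fails with ENOMEM on an
unmapped range, so the child never has to fault on the address.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: map one private anonymous page, optionally mark
 * it MADV_DONTFORK, fork, and ask the child whether the range exists.
 * Returns 1 if mapped in the child, 0 if not, -1 on error.
 */
static int mapped_in_child(int dontfork)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    p[0] = 1;
    if (dontfork && madvise(p, len, MADV_DONTFORK) != 0) {
        munmap(p, len);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        munmap(p, len);
        return -1;
    }
    if (pid == 0) {
        /* mincore() fails with ENOMEM on an unmapped range, so the
         * child can check without touching the address. */
        unsigned char vec;
        _exit(mincore(p, len, &vec) == 0 ? 0 : 1);
    }

    int status;
    int rc = -1;
    if (waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        rc = (WEXITSTATUS(status) == 0);
    munmap(p, len);
    return rc;
}
```

mapped_in_child(0) returns 1 and mapped_in_child(1) returns 0.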

The way I'd like to fix it is to allocate those magic pages by hand,
without adding them to the LRU and with page->mapping left NULL. Then
they will remain pinned in the pte, and all these problems will go
away.
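
A deliberately simplified toy model of why this works (hypothetical
names, not kernel code): in this model reclaim only scans pages that
are on the LRU, so a page that was never added to the LRU is never
picked for swap-out and keeps its frame.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model, not kernel code: reclaim only ever sees LRU pages. */
struct toy_page {
    int pfn;      /* current physical frame */
    bool on_lru;  /* false for pages allocated "by hand" */
};

/* Toy reclaim pass: every page it can see is swapped out and later
 * swapped back in at a new frame, starting at next_free_pfn. */
static void toy_reclaim(struct toy_page *pages, size_t n, int next_free_pfn)
{
    for (size_t i = 0; i < n; i++) {
        if (!pages[i].on_lru)
            continue; /* never scanned, so never relocated */
        pages[i].pfn = next_free_pfn++;
    }
}
```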

The other way would be a lookup hashtable: when the mmu notifier
invalidate fires, we look up the hash and call a method to make KVM
stop using the page. Then something is also needed in the page fault
path: if the gfn in the hash is paged in, another method is called to
point the magic host user address at the new pfn.
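
A rough sketch of the bookkeeping for the hashtable alternative,
assuming the table is keyed by host virtual page number and that both
the invalidate path and the fault path know that key. The structure,
callbacks and function names are all hypothetical, and a real version
would need locking against concurrent notifier and fault paths.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: all names are illustrative, not KVM's. */
#define MAGIC_HASH_BUCKETS 16
#define PFN_INVALID UINT64_MAX

struct magic_page {
    uint64_t hvpn;  /* key: host virtual page number */
    uint64_t pfn;   /* frame currently in use, or PFN_INVALID */
    void (*stop_using)(struct magic_page *mp);
    void (*set_pfn)(struct magic_page *mp, uint64_t pfn);
    struct magic_page *next; /* bucket chain */
};

struct magic_hash {
    struct magic_page *buckets[MAGIC_HASH_BUCKETS];
};

static struct magic_page **magic_bucket(struct magic_hash *h, uint64_t hvpn)
{
    return &h->buckets[hvpn % MAGIC_HASH_BUCKETS];
}

static void magic_add(struct magic_hash *h, struct magic_page *mp)
{
    struct magic_page **b = magic_bucket(h, mp->hvpn);
    mp->next = *b;
    *b = mp;
}

static struct magic_page *magic_lookup(struct magic_hash *h, uint64_t hvpn)
{
    for (struct magic_page *mp = *magic_bucket(h, hvpn); mp; mp = mp->next)
        if (mp->hvpn == hvpn)
            return mp;
    return NULL;
}

/* mmu notifier invalidate path: stop using the page. */
static void magic_invalidate(struct magic_hash *h, uint64_t hvpn)
{
    struct magic_page *mp = magic_lookup(h, hvpn);
    if (mp)
        mp->stop_using(mp);
}

/* Page fault path: the page is resident again at pfn. */
static void magic_fault(struct magic_hash *h, uint64_t hvpn, uint64_t pfn)
{
    struct magic_page *mp = magic_lookup(h, hvpn);
    if (mp)
        mp->set_pfn(mp, pfn);
}

/* Example callbacks; a real one would also reprogram the hardware. */
static void example_stop(struct magic_page *mp)
{
    mp->pfn = PFN_INVALID;
}

static void example_set(struct magic_page *mp, uint64_t pfn)
{
    mp->pfn = pfn;
}
```

Faults on addresses that are not in the table are simply ignored, so
only the magic pages pay for the extra lookup.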

I think pinning the pages and allocating them by hand is simpler;
hopefully we can do it in a way that lets munmap collect them
automatically, as it does now.


Thread overview: 7+ messages
2010-06-21  8:18 [PATCH 0/2] Fix failures caused by fork() interaction with internal slots Avi Kivity
2010-06-21  8:18 ` [PATCH 1/2] KVM: Keep slot ID in memory slot structure Avi Kivity
2010-06-21  8:18 ` [PATCH 2/2] KVM: Prevent internal slots from being COWed Avi Kivity
2010-06-21 20:23   ` Marcelo Tosatti
2010-06-22 11:17     ` Avi Kivity
2010-07-06 14:45       ` Andrea Arcangeli [this message]
2010-07-06 14:53         ` Avi Kivity
