From: Marcelo Tosatti <marcelo@kvack.org>
To: Andrea Arcangeli <andrea@qumranet.com>
Cc: Marcelo Tosatti <marcelo@kvack.org>,
kvm-devel <kvm-devel@lists.sourceforge.net>,
Avi Kivity <avi@qumranet.com>
Subject: Re: KVM: MMU: add KVM_ZAP_GFN ioctl
Date: Fri, 21 Mar 2008 18:23:41 -0300 [thread overview]
Message-ID: <20080321212341.GA3897@dmt> (raw)
In-Reply-To: <20080321155650.GG7822@v2.random>
On Fri, Mar 21, 2008 at 04:56:50PM +0100, Andrea Arcangeli wrote:
> On Fri, Mar 21, 2008 at 10:37:00AM -0300, Marcelo Tosatti wrote:
> > This is not the final put_page().
> >
> > Remote TLB's are flushed here, after rmap_remove:
> >
> > +	if (nuked)
> > +		kvm_flush_remote_tlbs(kvm);
> >
> > This ioctl is called before zap_page_range() is executed through
> > sys_madvise(MADV_DONTNEED) to remove the page in question.
> >
> > We know that the guest will not attempt to fault in the gfn because
> > the virtio balloon driver is synchronous (it will only attempt to
> > release that page back to the guest OS once rmap_nuke+zap_page_range has
> > finished).
> >
> > Can you be more verbose?
>
> Sure.
>
> 1) even if you run madvise(MADV_DONTNEED) after KVM_ZAP_GFN, the anon
> page can be released by the VM at any time without any kvm-aware
> lock (there will be a swap reference to it but no more page_count
> references, leading to memory corruption in the host in the presence
> of memory pressure). This is purely theoretical of course; not sure
> if timings or probabilities allow for reproducing this in real life.
If there are any active shadow mappings to a page, there is a guarantee
that a valid linux pte still points at it. So page_count ==
1 + nr_sptes.
So the theoretical race you're talking about is:
    CPU0                                    CPU1

    spte = rmap_next(kvm, rmapp, NULL);
    while (spte) {
            BUG_ON(!spte);
            BUG_ON(!(*spte & PT_PRESENT_MASK));
            rmap_printk("rmap_nuke: spte %p %llx\n", spte, *spte);
            rmap_remove(kvm, spte);
            set_shadow_pte(spte, shadow_trap_nonpresent_pte);
            nuked = 1;
            spte = rmap_next(kvm, rmapp, spte);
    }
                                            <---------- try_to_unmap_one()
                                            page is now free
                                            page allocated for other
                                            purposes
    if (nuked)
            kvm_flush_remote_tlbs(kvm);
And some other VCPU, with the translation still cached in its TLB,
writes to the now-freed (and possibly reallocated) page.
This case is safe because the path that frees a pte and subsequently
a page takes care of flushing the TLBs of any remote CPUs that may
have it cached (before freeing the page, of course):
ptep_clear_flush->flush_tlb_page.
Am I missing something?
> 2) not sure what you mean by synchronous, do you mean single
> threaded? I can't see how it can be single threaded (does
> ballooning stop all other vcpus?).
No, I mean synchronous as in that no other vcpu will attempt to fault
that _particular gfn_ in between KVM_ZAP_GFN and madvise.
> Why are you taking the mmu_lock around rmap_nuke if no other vcpu
> can take any page fault and call into get_user_pages in between
> KVM_ZAP_GFN and madvise?
Other vcpus can take page faults and call into get_user_pages, but not
for the gfn KVM_ZAP_GFN is operating on, because that page has been
allocated by the balloon driver.
So we need mmu_lock to protect against concurrent shadow page and rmap
operations.
> As far as I
> can tell the only possible safe ordering is madvise; KVM_ZAP_GFN,
> which is emulating the mmu notifier behavior incidentally.
>
> Note that the rmap_remove smp race (also note here smp race means
> smp-host race, it will trigger even if guest is UP) might be a generic
> issue with the rmap_remove logic. I didn't analyze all the possible
> rmap_remove callers yet (this was in my todo list), I just made sure
> that my code would be smp safe.
As detailed above, we have a guarantee that there is a live linux pte
by the time rmap_remove() nukes a shadow pte.
> > By the way, I don't see invalidate_begin/invalidate_end hooks in the KVM
> > part of MMU notifiers V9 patch? (meaning that zap_page_range will not zap
> > the spte's for the pages in question).
>
> range_begin isn't needed. range_begin is needed only by secondary mmu
> drivers that aren't reference counting the pages. The _end callback is
> below. It could be improved to skip the whole range in a single pass
> over the memslots instead of browsing them for each page in the range.
> The mmu notifiers aren't merged and this code may still require changes
> in terms of API if EMM is merged instead of #v9 (hope not), so I tried
> to keep it simple.
Oh, I missed that. Nice.
Thread overview: 10+ messages
2008-03-12 22:29 KVM: MMU: add KVM_ZAP_GFN ioctl Marcelo Tosatti
2008-03-20 12:09 ` Avi Kivity
2008-03-21 11:42 ` Andrea Arcangeli
2008-03-21 13:37 ` Marcelo Tosatti
2008-03-21 15:56 ` Andrea Arcangeli
2008-03-21 21:23 ` Marcelo Tosatti [this message]
2008-03-24 6:54 ` Andrea Arcangeli
2008-03-26 12:31 ` Andrea Arcangeli
2008-03-23 8:50 ` Avi Kivity
2008-03-23 20:23 ` Andrea Arcangeli