Re: KVM: MMU: add KVM_ZAP_GFN ioctl

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Marcelo Tosatti <marcelo@kvack.org>
To: Andrea Arcangeli <andrea@qumranet.com>
Cc: Marcelo Tosatti <marcelo@kvack.org>,
	kvm-devel <kvm-devel@lists.sourceforge.net>,
	Avi Kivity <avi@qumranet.com>
Subject: Re: KVM: MMU: add KVM_ZAP_GFN ioctl
Date: Fri, 21 Mar 2008 18:23:41 -0300	[thread overview]
Message-ID: <20080321212341.GA3897@dmt> (raw)
In-Reply-To: <20080321155650.GG7822@v2.random>

On Fri, Mar 21, 2008 at 04:56:50PM +0100, Andrea Arcangeli wrote:
> On Fri, Mar 21, 2008 at 10:37:00AM -0300, Marcelo Tosatti wrote:
> > This is not the final put_page().
> > 
> > Remote TLB's are flushed here, after rmap_remove:
> > 
> > +       if (nuked)
> > +               kvm_flush_remote_tlbs(kvm);
> > 
> > This ioctl is called before zap_page_range() is executed through
> > sys_madvise(MADV_DONTNEED) to remove the page in question.
> > 
> > We know that the guest will not attempt to fault in the gfn because
> > the virtio balloon driver is synchronous (it will only attempt to
> > release that page back to the guest OS once rmap_nuke+zap_page_range has
> > finished).
> > 
> > Can you be more verbose?
> 
> Sure.
> 
> 1) even if you run madvise(MADV_DONTNEED) after KVM_ZAP_GFN, the anon
>    page can be released by the VM at any time without any kvm-aware
>    lock (there will be a swap reference to it, no any more page_count
>    references leading to memory corruption in the host in presence of
>    memory pressure). This is purely theoretical of course, not sure if
>    timings or probabilities allows for reproducing this in real life.

If there are any active shadow mappings to a page there is a guarantee
that there is a valid linux pte mapping pointing at it. So page_count ==
1 + nr_sptes.

So the theoretical race you're talking about is:


    CPU0							CPU1

    spte = rmap_next(kvm, rmapp, NULL);
    while (spte) {
            BUG_ON(!spte);
            BUG_ON(!(*spte & PT_PRESENT_MASK));
            rmap_printk("rmap_nuke: spte %p %llx\n", spte, *spte);
            rmap_remove(kvm, spte);
            set_shadow_pte(spte, shadow_trap_nonpresent_pte);
            nuked = 1;
            spte = rmap_next(kvm, rmapp, spte);
    }
    							<---------- try_to_unmap_one()
								    page is now free
								    page allocated for other
								    purposes
   if (nuked)
       kvm_flush_remote_tlbs(kvm);

And some other VCPU with the TLB cached writes to the now freed (and
possibly allocated to another purpose) page.

This case is safe because the path that frees a pte and subsequently
a page will take care of flushing the TLB of any remote CPU's that
possibly have it cached (before freeing the page, of course).
ptep_clear_flush->flush_tlb_page.

Am I missing something?

> 2) not sure what you mean with synchronous, do you mean single
>    threaded? I can't see how it can be single threaded (does
>    ballooning stops all other vcpus?). 

No, I mean synchronous as in that no other vcpu will attempt to fault
that _particular gfn_ in between KVM_ZAP_GFN and madvise.

>  Why are you taking the mmu_lock around rmap_nuke if no other vcpu
>  can take any page fault and call into get_user_pages in between
>  KVM_ZAP_GFN and madvise?

Other vcpu's can take page faults and call into get_user_pages, but not
for the gfn KVM_ZAP_GFN is operating on, because it has been allocated
by the balloon driver.

So we need mmu_lock to protect against concurrent shadow page and rmap
operations.

> As far as I
>    can tell the only possible safe ordering is madvise; KVM_ZAP_GFN,
>    which is emulating the mmu notifier behavior incidentally.
> 
> Note that the rmap_remove smp race (also note here smp race means
> smp-host race, it will trigger even if guest is UP) might be a generic
> issue with the rmap_remove logic. I didn't analyze all the possible
> rmap_remove callers yet (this was in my todo list), I just made sure
> that my code would be smp safe.

As detailed above, we have a guarantee that there is a live linux pte
by the time rmap_remove() nukes a shadow pte.

> > By the way, I don't see invalidate_begin/invalidate_end hooks in the KVM 
> > part of MMU notifiers V9 patch? (meaning that zap_page_range will not zap
> > the spte's for the pages in question).
> 
> range_begin isn't needed. range_begin is needed only by secondary mmu
> drivers that aren't reference counting the pages. The _end callback is
> below. It could be improved to skip the whole range in a single browse
> of the memslots instead of browsing it for each page in the range. The
> mmu notifiers aren't merged and this code may still require changes in
> terms of API if EMM is merged instead of #v9 (hope not), so I tried to
> keep it simple.

Oh, I missed that. Nice.

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/

next prev parent reply	other threads:[~2008-03-21 21:23 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-03-12 22:29 KVM: MMU: add KVM_ZAP_GFN ioctl Marcelo Tosatti
2008-03-20 12:09 ` Avi Kivity
2008-03-21 11:42   ` Andrea Arcangeli
2008-03-21 13:37     ` Marcelo Tosatti
2008-03-21 15:56       ` Andrea Arcangeli
2008-03-21 21:23         ` Marcelo Tosatti [this message]
2008-03-24  6:54           ` Andrea Arcangeli
2008-03-26 12:31             ` Andrea Arcangeli
2008-03-23  8:50       ` Avi Kivity
2008-03-23 20:23         ` Andrea Arcangeli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080321212341.GA3897@dmt \
    --to=marcelo@kvack.org \
    --cc=andrea@qumranet.com \
    --cc=avi@qumranet.com \
    --cc=kvm-devel@lists.sourceforge.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.