All of lore.kernel.org
 help / color / mirror / Atom feed
From: Marcelo Tosatti <mtosatti@redhat.com>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>,
	avi@redhat.com, kvm@vger.kernel.org,
	xiaoguangrong@linux.vnet.ibm.com
Subject: Re: [PATCH 1/2] KVM: mmu_notifier: Flush TLBs before releasing mmu_lock
Date: Tue, 14 Feb 2012 17:43:42 -0200	[thread overview]
Message-ID: <20120214194342.GA24117@amt.cnet> (raw)
In-Reply-To: <20120214185356.GQ9440@redhat.com>

On Tue, Feb 14, 2012 at 07:53:56PM +0100, Andrea Arcangeli wrote:
> On Tue, Feb 14, 2012 at 03:29:47PM -0200, Marcelo Tosatti wrote:
> > The problem the patch is fixing is not related to page freeing, but
> > rmap_write_protect. From 8bf3f6f06fcdfd097b6c6ec51531d8292fa0d81d
> 
> Can't find the commit on kvm.git.

Sorry, we got kvm.git out of sync. But you can see an equivalent below.

> 
> > (replace "A (get_dirty_log)" with "mmu_notifier_invalidate_page"):
> > 
> > 
> > During protecting pages for dirty logging, other threads may also try
> > to protect a page in mmu_sync_children() or kvm_mmu_get_page().
> > 
> > In such a case, because get_dirty_log releases mmu_lock before flushing
> > TLB's, the following race condition can happen:
> > 
> >   A (get_dirty_log)     B (another thread)
> > 
> >   lock(mmu_lock)
> >   clear pte.w
> >   unlock(mmu_lock)
> >                         lock(mmu_lock)
> >                         pte.w is already cleared
> >                         unlock(mmu_lock)
> >                         skip TLB flush
> 
> Not sure which tree it is, but in kvm and upstream I see an
> unconditional tlb flush here, not skip (both
> kvm_mmu_slot_remove_write_access and kvm_mmu_rmap_write_protect). So I
> assume this has been updated in your tree to eb conditional.

        if (!direct) {
                if (rmap_write_protect(vcpu->kvm, gfn))
                        kvm_flush_remote_tlbs(vcpu->kvm);


> Also note kvm_mmu_rmap_write_protect, flushes outside of the mmu_lock
> in the kvm_mmu_rmap_write_protect case (like in quoted description),
> so two write_protect_slot in parallel against each other may not be
> ok, but that may be enforced by design if qemu won't ever call that
> ioctl from two different userland threads (it doesn't sounds security
> related so it should be ok to enforce its safety by userland design).

Yes, here is the fix:

http://git.kernel.org/?p=virt/kvm/kvm.git;a=commit;h=02b48d00d7f1853bdf8a06da19ca5413ebe334c6

This is an equivalent of 8bf3f6f06fcdfd097b6c6ec51531d8292fa0d81d.

> 
> >                         return
> >   ...
> >   TLB flush
> > 
> > Though thread B assumes the page has already been protected when it
> > returns, the remaining TLB entry will break that assumption.
> 
> Now I get the question of why not running the TLB flush inside the
> mmu_lock only if the spte was writable :).
> 
> kvm_mmu_get_page as long as it only runs in the context of a kvm page
> fault is ok, because the page fault would be inhibited by the mmu
> notifier invalidates, so maybe it's safe.

Ah, perhaps, but this was not taken into account before. Can you confirm
this is the case so we can revert the invalidate_page patch?

> mmu_sync_children seems to have a problem instead, in your tree
> get_dirty_log also has an issue if it has been updated to skip the
> flush on readonly sptes, like I guess.
> 
> Interesting how the spte is already non present, the page is just
> being freed shortly later, but yet we still need to trigger write
> faults synchronously and prevent other CPUs in guest mode to further
> modify the page to avoid losing dirty bits updates or updates on
> pagetables that maps pagetables in the not NPT/EPT case. If the page
> was really only going to be freed it would be ok if the other cpus
> would still write to it for a little longer until the page was freed.
> 
> Like I wrote in previous email, I was thinking if we'd change the mmu
> notifier methods to do an unconditional flush, then every other flush
> could also run outside of the mmu_lock. But then I didn't think enough
> about this to be sure. My guess is we could move all flushes outside
> the mmu_lock if we stop flushling the tlb conditonally to the current
> spte values. It'd clearly be slower for an UP guest though :). Large
> SMP guests might benefit, if that is feasible at all... It depends how
> problematic the mmu_lock is on the large SMP guests and how much we're
> saving by doing conditional TLB flushes.

Also it should not be necessary for these flushes to be inside mmu_lock
on EPT/NPT case (since there is no write protection there). But it would
be awkward to differentiate the unlock position based on EPT/NPT.


  reply	other threads:[~2012-02-14 19:46 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-02-10  6:28 [PATCH 1/2] KVM: mmu_notifier: Flush TLBs before releasing mmu_lock Takuya Yoshikawa
2012-02-10  6:29 ` [PATCH 2/2] KVM: MMU: Flush TLBs only once in invlpg() " Takuya Yoshikawa
2012-02-10  6:55   ` Xiao Guangrong
2012-02-10  7:21     ` Takuya Yoshikawa
2012-02-10  7:42       ` Xiao Guangrong
2012-02-14  4:36         ` Takuya Yoshikawa
2012-02-14  4:56           ` Takuya Yoshikawa
2012-02-14 17:21             ` Andrea Arcangeli
2012-02-10  7:52 ` [PATCH 1/2] KVM: mmu_notifier: Flush TLBs " Xiao Guangrong
2012-02-13  6:00   ` Takuya Yoshikawa
2012-02-14 17:27   ` Andrea Arcangeli
2012-02-10 17:26 ` Marcelo Tosatti
2012-02-14 17:10 ` Andrea Arcangeli
2012-02-14 17:29   ` Marcelo Tosatti
2012-02-14 18:53     ` Andrea Arcangeli
2012-02-14 19:43       ` Marcelo Tosatti [this message]
2012-02-15  9:18         ` Avi Kivity
2012-02-15  9:47           ` Avi Kivity
2012-02-15 11:37             ` Xiao Guangrong
2012-02-15 14:07               ` Avi Kivity
2012-02-15 19:16                 ` Andrea Arcangeli
2012-02-16  4:50                 ` Xiao Guangrong
2012-02-16 11:57                   ` Avi Kivity
2012-02-17  2:36                     ` Xiao Guangrong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120214194342.GA24117@amt.cnet \
    --to=mtosatti@redhat.com \
    --cc=aarcange@redhat.com \
    --cc=avi@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=xiaoguangrong@linux.vnet.ibm.com \
    --cc=yoshikawa.takuya@oss.ntt.co.jp \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.