From: Marcelo Tosatti <mtosatti@redhat.com>
To: Avi Kivity <avi@redhat.com>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>,
kvm@vger.kernel.org, takuya.yoshikawa@gmail.com
Subject: Re: [PATCH 1/4] KVM: Add APIs for unlocked TLB flush
Date: Mon, 7 May 2012 23:25:13 -0300 [thread overview]
Message-ID: <20120508022513.GA26114@amt.cnet> (raw)
In-Reply-To: <4FA291F5.4090408@redhat.com>
On Thu, May 03, 2012 at 05:11:01PM +0300, Avi Kivity wrote:
> On 05/03/2012 04:23 PM, Xiao Guangrong wrote:
> > On 05/03/2012 07:22 PM, Avi Kivity wrote:
> >
> > > Currently we flush the TLB while holding mmu_lock. This
> > > increases the lock hold time by the IPI round-trip time, increasing
> > > contention, and makes dropping the lock (for latency reasons) harder.
> > >
> > > This patch changes TLB management to be usable locklessly, introducing
> > > the following APIs:
> > >
> > > kvm_mark_tlb_dirty() - mark the TLB as containing stale entries
> > > kvm_cond_flush_remote_tlbs() - flush the TLB if it was marked as
> > > dirty
> > >
> > > These APIs can be used without holding mmu_lock (though if the TLB
> > > became stale due to shadow page table modifications, typically it
> > > will need to be called with the lock held to prevent other threads
> > > from seeing the modified page tables with the TLB unmarked and unflushed)/
> > >
> > > Signed-off-by: Avi Kivity <avi@redhat.com>
> > > ---
> > > Documentation/virtual/kvm/locking.txt | 14 ++++++++++++++
> > > arch/x86/kvm/paging_tmpl.h | 4 ++--
> > > include/linux/kvm_host.h | 22 +++++++++++++++++++++-
> > > virt/kvm/kvm_main.c | 29 ++++++++++++++++++++---------
> > > 4 files changed, 57 insertions(+), 12 deletions(-)
> > >
> > > diff --git a/Documentation/virtual/kvm/locking.txt b/Documentation/virtual/kvm/locking.txt
> > > index 3b4cd3b..f6c90479 100644
> > > --- a/Documentation/virtual/kvm/locking.txt
> > > +++ b/Documentation/virtual/kvm/locking.txt
> > > @@ -23,3 +23,17 @@ Arch: x86
> > > Protects: - kvm_arch::{last_tsc_write,last_tsc_nsec,last_tsc_offset}
> > > - tsc offset in vmcb
> > > Comment: 'raw' because updating the tsc offsets must not be preempted.
> > > +
> > > +3. TLB control
> > > +--------------
> > > +
> > > +The following APIs should be used for TLB control:
> > > +
> > > + - kvm_mark_tlb_dirty() - indicates that the TLB is out of sync wrt
> > > + either guest or host page tables.
> > > + - kvm_flush_remote_tlbs() - unconditionally flush the tlbs
> > > + - kvm_cond_flush_remote_tlbs() - flush the TLBs if previously marked
> > > +
> > > +These may be used without mmu_lock, though kvm_mark_tlb_dirty() needs to be
> > > +used while holding mmu_lock if it is called due to host page table changes
> > > +(contrast to guest page table changes).
> >
> >
> > In these patches, it seems that kvm_mark_tlb_dirty is always used
> > under the protection of mmu-lock, yes?
>
> Correct. It's possible we'll find a use outside mmu_lock, but this
> isn't likely.
>
> > If both kvm_mark_tlb_dirty and kvm_cond_flush_remote_tlbs are use
> > out of mmu-lock, i think we can use kvm_flush_remote_tlbs instead.
> >
> > If it is so, dirtied_count/flushed_count need not be atomic.
>
> But we always mark with mmu_lock held.
>
> >
> > And, it seems there is a bug:
> >
> > VCPU 0 VCPU 1
> >
> > hold mmu-lock
> > zap spte which points to 'gfn'
> > mark_tlb_dirty
> > release mmu-lock
> > hold mmu-lock
> > rmap_write-protect:
> > see no spte pointing to gfn
> > tlb did not be flushed
> > release mmu-lock
> >
> > write gfn by guest
> > OOPS!!!
> >
> > kvm_cond_flush_remote_tlbs
> >
> > We need call kvm_cond_flush_remote_tlbs in rmap_write-protect
> > unconditionally?
>
> Yes, that's the point. We change
>
> spin_lock(mmu_lock)
> conditionally flush the tlb
> spin_unlock(mmu_lock)
>
> to
>
> spin_lock(mmu_lock)
> conditionally mark the tlb as dirty
> spin_unlock(mmu_lock)
> kvm_cond_flush_remote_tlbs()
>
> but that means the entire codebase has to be converted.
Is there any other site which expects sptes and TLBs to be in sync,
other than rmap_write_protect?
Please convert the flush_remote_tlbs at the end of set_spte (RW->RO) to
mark_dirty.
Looks good in general (patchset is incomplete though). One thing that
is annoying is that there is no guarantee of progress for flushed_count
increment (it can, in theory, always race with a mark_dirty). But since
that would be only a performance, and not correctness, aspect, it is
fine.
It has the advantage that it requires code to explicitly document where
the TLB must be flushed and the sites which expect sptes to be in sync
with TLBs.
next prev parent reply other threads:[~2012-05-08 2:29 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-05-03 11:22 [PATCH 0/4] Unlocked TLB flush Avi Kivity
2012-05-03 11:22 ` [PATCH 1/4] KVM: Add APIs for unlocked " Avi Kivity
2012-05-03 13:23 ` Xiao Guangrong
2012-05-03 14:11 ` Avi Kivity
2012-05-07 7:06 ` Xiao Guangrong
2012-05-07 7:59 ` Avi Kivity
2012-05-08 1:55 ` Marcelo Tosatti
2012-05-08 9:09 ` Avi Kivity
2012-05-08 13:50 ` Marcelo Tosatti
2012-05-08 9:09 ` Avi Kivity
2012-05-08 2:25 ` Marcelo Tosatti [this message]
2012-05-08 12:39 ` Avi Kivity
2012-05-09 21:03 ` Marcelo Tosatti
2012-05-21 14:45 ` Avi Kivity
2012-05-03 11:23 ` [PATCH 2/4] KVM: Flush TLB in mmu notifier without holding mmu_lock Avi Kivity
2012-05-03 11:23 ` [PATCH 3/4] KVM: Flush TLB in FNAME(invlpg) " Avi Kivity
2012-05-03 11:23 ` [PATCH 4/4] KVM: Flush TLB in change_pte mmu notifier " Avi Kivity
2012-05-08 1:25 ` [PATCH 0/4] Unlocked TLB flush Marcelo Tosatti
2012-05-08 1:27 ` Marcelo Tosatti
2012-05-08 10:51 ` Avi Kivity
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20120508022513.GA26114@amt.cnet \
--to=mtosatti@redhat.com \
--cc=avi@redhat.com \
--cc=kvm@vger.kernel.org \
--cc=takuya.yoshikawa@gmail.com \
--cc=xiaoguangrong@linux.vnet.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).