Re: [RFC] KVM: x86: conditionally acquire/release slots_lock on entry/exit

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Marcelo Tosatti <mtosatti@redhat.com>
To: Avi Kivity <avi@redhat.com>
Cc: kvm@vger.kernel.org, Gleb Natapov <gleb@redhat.com>
Subject: Re: [RFC] KVM: x86: conditionally acquire/release slots_lock on entry/exit
Date: Thu, 10 Sep 2009 19:30:03 -0300	[thread overview]
Message-ID: <20090910223003.GA658@amt.cnet> (raw)
In-Reply-To: <4A977E3C.7050604@redhat.com>

On Fri, Aug 28, 2009 at 09:50:36AM +0300, Avi Kivity wrote:
> On 08/28/2009 01:59 AM, Marcelo Tosatti wrote:
>> On Thu, Aug 27, 2009 at 07:27:48PM +0300, Avi Kivity wrote:
>>    
>>> On 08/27/2009 06:54 PM, Marcelo Tosatti wrote:
>>>      
>>>> perf report shows heavy overhead from down/up of slots_lock.
>>>>
>>>> Attempted to remove slots_lock by having vcpus stop on a synchronization
>>>> point, but this introduced further complexity (a vcpu can be scheduled
>>>> out before reaching the synchronization point, and can sched back in at
>>>> points which are slots_lock protected, etc).
>>>>
>>>> This patch changes vcpu_enter_guest to conditionally release/acquire
>>>> slots_lock in case a vcpu state bit is set.
>>>>
>>>> vmexit performance improves by 5-10% on UP guest.
>>>>
>>>>        
>>> Sorry, it looks pretty complex.
>>>      
>> Why?
>>    
>
> There's a new locking protocol in there.  I prefer sticking with the  
> existing kernel plumbing, or it gets more and more difficult knowing who  
> protects what and in what order you can do things.
>
>>> Have you considered using srcu? It seems to me down/up_read() can
>>> be replaced by srcu_read_lock/unlock(), and after proper conversion
>>> of memslots and io_bus to rcu_assign_pointer(), we can just add
>>> synchronize_srcu() immediately after changing stuff (of course
>>> mmu_lock still needs to be held when updating slots).
>>>      
>> I don't see RCU as being suitable because in certain operations you
>> want to stop writers (on behalf of vcpus), do something, and let them
>> continue afterwards. The dirty log, for example. Or any operation that
>> wants to modify lockless vcpu specific data.
>>    
>
> kvm_flush_remote_tlbs() (which you'd call after mmu operations), will  
> get cpus out of guest mode, and synchronize_srcu() will wait for them to  
> drop the srcu "read lock".  So it really happens naturally: do an RCU  
> update, send some request to all vcpus, synchronize_srcu(), done.
>
>> Also, synchronize_srcu() is limited to preemptible sections.
>>
>> io_bus could use RCU, but I think being able to stop vcpus is also a
>> different requirement. Do you have any suggestion on how to do it in a
>> better way?
>>    
>
> We don't need to stop vcpus, just kick them out of guest mode to let  
> them notice the new data.  SRCU does that well.

Two problems:

1. The removal of memslots/aliases and zapping of mmu (to remove any
shadow pages with stale sp->gfn) must be atomic from the POV of the
pagefault path. Otherwise something like this can happen:

fault path					set_memory_region

walk_addr fetches and validates
table_gfns
						delete memslot
						synchronize_srcu

fetch creates shadow
page with nonexistant sp->gfn

OR

mmu_alloc_roots path				set_memory_region
						
						delete memslot
root_gfn = vcpu->arch.cr3 << PAGE_SHIFT
mmu_check_root(root_gfn)			synchronize_rcu
kvm_mmu_get_page()
						kvm_mmu_zap_all

Accesses between kvm_mmu_get_page and kvm_mmu_zap_all window can see
shadow pages with stale gfn.

But, if you still think its worthwhile to use RCU, at least handling
gfn_to_memslot / unalias_gfn errors _and_ adding mmu_notifier_retry
invalidation to set_memory_region path will be necessary (so that
gfn_to_pfn validation, in the fault path, is discarded in case
of memslot/alias update).

2. Another complication is that memslot creation and kvm_iommu_map_pages
are not atomic.

create memslot
synchronize_srcu
				<----- vcpu grabs gfn reference without
				       iommu mapping.
kvm_iommu_map_pages

Which can be solved by changing kvm_iommu_map_pages (and new gfn_to_pfn
helper) to use base_gfn,npages,hva information from somewhere else other
than visible kvm->memslots (so that when the slot becomes visible its
already iommu mapped).

So it appears to me using RCU introduces more complications / subtle
details than mutual exclusion here. The new request bit which the
original patch introduces is limited to enabling/disabling conditional
acquision of slots_lock (calling it a "new locking protocol" is unfair)
to improve write acquision latency.

Better ideas/directions welcome.

next prev parent reply	other threads:[~2009-09-10 22:30 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-08-27  1:20 [patch 0/5] unify remote request and kvm_vcpu_kick IPI mechanism Marcelo Tosatti
2009-08-27  1:20 ` [patch 1/5] KVM: move kvm_vcpu_kick to virt/kvm/kvm_main.c Marcelo Tosatti
2009-08-27  1:20 ` [patch 2/5] KVM: reintroduce guest mode bit and unify remote request code Marcelo Tosatti
2009-08-27  8:15   ` Avi Kivity
2009-08-27 12:45     ` Marcelo Tosatti
2009-08-27 13:24       ` Avi Kivity
2009-08-27 14:07         ` Marcelo Tosatti
2009-08-28  7:06           ` Avi Kivity
2009-08-28  7:22             ` Avi Kivity
2009-08-27  8:25   ` Avi Kivity
2009-08-27 12:58     ` Marcelo Tosatti
2009-08-27  1:20 ` [patch 3/5] KVM: switch REQ_TLB_FLUSH/REQ_MMU_RELOAD to kvm_vcpus_request Marcelo Tosatti
2009-08-27  1:20 ` [patch 4/5] KVM: remove make_all_cpus_request Marcelo Tosatti
2009-08-27  1:20 ` [patch 5/5] KVM: x86: drop duplicat kvm_flush_remote_tlbs Marcelo Tosatti
2009-08-27 15:54 ` [RFC] KVM: x86: conditionally acquire/release slots_lock on entry/exit Marcelo Tosatti
2009-08-27 16:27   ` Avi Kivity
2009-08-27 22:59     ` Marcelo Tosatti
2009-08-28  6:50       ` Avi Kivity
2009-09-10 22:30         ` Marcelo Tosatti [this message]
2009-09-13 15:42           ` Avi Kivity
2009-09-13 16:26             ` Paul E. McKenney
2009-09-13 22:49               ` Marcelo Tosatti
2009-09-14  5:03                 ` Avi Kivity
2009-09-14  7:17                   ` Avi Kivity

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090910223003.GA658@amt.cnet \
    --to=mtosatti@redhat.com \
    --cc=avi@redhat.com \
    --cc=gleb@redhat.com \
    --cc=kvm@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.