Re: [PATCH RFC] KVM: MMU: Don't use RCU for lockless shadow walking

From: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
To: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>, kvm@vger.kernel.org
Subject: Re: [PATCH RFC] KVM: MMU: Don't use RCU for lockless shadow walking
Date: Tue, 24 Apr 2012 14:37:32 +0800
Message-ID: <4F964A2C.7050106@linux.vnet.ibm.com>
In-Reply-To: <1335197812-32064-1-git-send-email-avi@redhat.com>

On 04/24/2012 12:16 AM, Avi Kivity wrote:

> Using RCU for lockless shadow walking can increase the amount of memory
> in use by the system, since RCU grace periods are unpredictable.  We also
> have an unconditional write to a shared variable (reader_counter), which
> isn't good for scaling.
> 
> Replace that with a scheme similar to x86's get_user_pages_fast(): disable
> interrupts during lockless shadow walk to force the freer
> (kvm_mmu_commit_zap_page()) to wait for the TLB flush IPI to find the
> processor with interrupts enabled.
> 
> We also add a new vcpu->mode, READING_SHADOW_PAGE_TABLES, to prevent
> kvm_flush_remote_tlbs() from avoiding the IPI.
> 
> Signed-off-by: Avi Kivity <avi@redhat.com>
> ---
> 
> Turned out to be simpler than expected.  However, I think there's a problem
> with make_all_cpus_request() possible reading an incorrect vcpu->cpu.


It seems possible.

Can we fix it by reading vcpu->cpu only when the vcpu is in IN_GUEST_MODE or
EXITING_GUEST_MODE (IIRC, interrupts are disabled in these modes)?

Like:

	if (kvm_vcpu_exiting_guest_mode(vcpu) != OUTSIDE_GUEST_MODE)
		cpumask_set_cpu(vcpu->cpu, cpus);
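
To make the idea concrete, here is a small userspace model of the loop that
builds the IPI mask. It is only a sketch, not kernel code: struct vcpu_model,
exiting_guest_mode() and build_ipi_mask() are made-up names, a uint64_t stands
in for the cpumask, and a C11 compare-exchange stands in for the cmpxchg() that
(as far as I remember) kvm_vcpu_exiting_guest_mode() is built on. The point it
illustrates is that ->cpu is read only after the mode check has passed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

enum vcpu_mode { OUTSIDE_GUEST_MODE, IN_GUEST_MODE, EXITING_GUEST_MODE };

/* Hypothetical stand-in for the two struct kvm_vcpu fields involved. */
struct vcpu_model {
	_Atomic int mode;
	int cpu;		/* physical CPU the vcpu runs on */
};

/*
 * Model of kvm_vcpu_exiting_guest_mode(): try to move the vcpu from
 * IN_GUEST_MODE to EXITING_GUEST_MODE and return the mode that was
 * observed before the attempt.
 */
static int exiting_guest_mode(struct vcpu_model *v)
{
	int seen = IN_GUEST_MODE;

	/* On failure, 'seen' is overwritten with the current mode. */
	atomic_compare_exchange_strong(&v->mode, &seen, EXITING_GUEST_MODE);
	return seen;
}

/* Collect the CPUs that need an IPI; bit N of the result means CPU N. */
static uint64_t build_ipi_mask(struct vcpu_model *vcpus, int n)
{
	uint64_t mask = 0;

	for (int i = 0; i < n; i++) {
		if (exiting_guest_mode(&vcpus[i]) != OUTSIDE_GUEST_MODE) {
			/* Read ->cpu only after the mode check has passed. */
			int cpu = vcpus[i].cpu;

			assert(cpu >= 0 && cpu < 64);
			mask |= UINT64_C(1) << cpu;
		}
	}
	return mask;
}
```

A vcpu that is OUTSIDE_GUEST_MODE never has its ->cpu looked at, so a stale
value there cannot end up in the mask. Whether the read is really stable in the
other modes depends on interrupts being disabled there, which the model does
not capture.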

> 
>  arch/x86/include/asm/kvm_host.h |    4 ---
>  arch/x86/kvm/mmu.c              |   61 +++++++++++----------------------------
>  include/linux/kvm_host.h        |    3 +-
>  3 files changed, 19 insertions(+), 49 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index f624ca7..67e66e6 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -237,8 +237,6 @@ struct kvm_mmu_page {
>  #endif
> 
>  	int write_flooding_count;
> -
> -	struct rcu_head rcu;
>  };
> 
>  struct kvm_pio_request {
> @@ -536,8 +534,6 @@ struct kvm_arch {
>  	u64 hv_guest_os_id;
>  	u64 hv_hypercall;
> 
> -	atomic_t reader_counter;
> -
>  	#ifdef CONFIG_KVM_MMU_AUDIT
>  	int audit_point;
>  	#endif
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 07424cf..903af5e 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -551,19 +551,23 @@ static u64 mmu_spte_get_lockless(u64 *sptep)
> 
>  static void walk_shadow_page_lockless_begin(struct kvm_vcpu *vcpu)
>  {
> -	rcu_read_lock();
> -	atomic_inc(&vcpu->kvm->arch.reader_counter);
> -
> -	/* Increase the counter before walking shadow page table */
> -	smp_mb__after_atomic_inc();
> +	/*
> +	 * Prevent page table teardown by making any free-er wait during
> +	 * kvm_flush_remote_tlbs() IPI to all active vcpus.
> +	 */
> +	local_irq_disable();
> +	vcpu->mode = READING_SHADOW_PAGE_TABLES;
> +	/*
> +	 * wmb: advertise vcpu->mode change
> +	 * rmb: make sure we see updated sptes
> +	 */
> +	smp_mb();
>  }
> 
>  static void walk_shadow_page_lockless_end(struct kvm_vcpu *vcpu)
>  {
> -	/* Decrease the counter after walking shadow page table finished */
> -	smp_mb__before_atomic_dec();
> -	atomic_dec(&vcpu->kvm->arch.reader_counter);
> -	rcu_read_unlock();


Don't we need a memory barrier here, so that the store to vcpu->mode cannot be
reordered ahead of the spte reads/writes? (It is safe on x86, but it needs a
comment at least.)
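
Roughly what I have in mind, written as a userspace model since the new body of
walk_shadow_page_lockless_end() is not quoted above. This is a sketch under
assumptions: struct vcpu_model, walk_begin() and walk_end() are made-up names,
atomic_thread_fence(memory_order_seq_cst) stands in for smp_mb(), and the
local_irq_disable()/local_irq_enable() pair is left out because it has no
userspace equivalent.

```c
#include <assert.h>
#include <stdatomic.h>

enum vcpu_mode {
	OUTSIDE_GUEST_MODE,
	IN_GUEST_MODE,
	EXITING_GUEST_MODE,
	READING_SHADOW_PAGE_TABLES,
};

/* Hypothetical stand-in for the vcpu->mode field. */
struct vcpu_model {
	_Atomic int mode;
};

static void walk_begin(struct vcpu_model *v)
{
	atomic_store_explicit(&v->mode, READING_SHADOW_PAGE_TABLES,
			      memory_order_relaxed);
	/* Stands in for smp_mb(): publish the mode before touching sptes. */
	atomic_thread_fence(memory_order_seq_cst);
}

static void walk_end(struct vcpu_model *v)
{
	/* begin/end must be paired */
	assert(atomic_load_explicit(&v->mode, memory_order_relaxed) ==
	       READING_SHADOW_PAGE_TABLES);
	/*
	 * The barrier in question: every spte read/write of the walk must
	 * complete before the mode store below becomes visible. Otherwise
	 * the freer could see OUTSIDE_GUEST_MODE while we are still walking.
	 */
	atomic_thread_fence(memory_order_seq_cst);
	atomic_store_explicit(&v->mode, OUTSIDE_GUEST_MODE,
			      memory_order_relaxed);
}
```

On x86 a store is not reordered ahead of earlier loads or stores, so the fence
in walk_end() costs nothing in terms of correctness there, but spelling it out
(or at least commenting on it) documents the ordering the scheme relies on.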

Otherwise it looks good to me; I will measure it later.
