All of lore.kernel.org
 help / color / mirror / Atom feed
From: Gleb Natapov <gleb@redhat.com>
To: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: avi.kivity@gmail.com, mtosatti@redhat.com, pbonzini@redhat.com,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Subject: Re: [PATCH v2 09/15] KVM: MMU: introduce pte-list lockless walker
Date: Mon, 16 Sep 2013 15:42:13 +0300	[thread overview]
Message-ID: <20130916124213.GR17294@redhat.com> (raw)
In-Reply-To: <1378376958-27252-10-git-send-email-xiaoguangrong@linux.vnet.ibm.com>

On Thu, Sep 05, 2013 at 06:29:12PM +0800, Xiao Guangrong wrote:
> The basic idea is from nulls list which uses a nulls to indicate
> whether the desc is moved to different pte-list
> 
> Note, we should do bottom-up walk in the desc since we always move
> the bottom entry to the deleted position. A desc only has 3 entries
> in the current code so it is not a problem now, but the issue will
> be triggered if we expend the size of desc in the further development
> 
> Thanks to SLAB_DESTROY_BY_RCU, the desc can be quickly reused
> 
> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
> ---
>  arch/x86/kvm/mmu.c | 57 ++++++++++++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 53 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index c5f1b27..3e1432f 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -975,6 +975,10 @@ static int count_spte_number(struct pte_list_desc *desc)
>  	return first_free + desc_num * PTE_LIST_EXT;
>  }
>  
> +#define rcu_assign_pte_list(pte_list_p, value)				\
> +	rcu_assign_pointer(*(unsigned long __rcu **)(pte_list_p),	\
> +			(unsigned long *)(value))
> +
>  /*
>   * Pte mapping structures:
>   *
> @@ -994,7 +998,7 @@ static int pte_list_add(struct kvm_vcpu *vcpu, u64 *spte,
>  
>  	if (!*pte_list) {
>  		rmap_printk("pte_list_add: %p %llx 0->1\n", spte, *spte);
> -		*pte_list = (unsigned long)spte;
> +		rcu_assign_pte_list(pte_list, spte);
>  		return 0;
>  	}
>  
> @@ -1004,7 +1008,7 @@ static int pte_list_add(struct kvm_vcpu *vcpu, u64 *spte,
>  		desc->sptes[0] = (u64 *)*pte_list;
>  		desc->sptes[1] = spte;
>  		desc_mark_nulls(pte_list, desc);
> -		*pte_list = (unsigned long)desc | 1;
> +		rcu_assign_pte_list(pte_list, (unsigned long)desc | 1);
>  		return 1;
>  	}
>  
> @@ -1017,7 +1021,7 @@ static int pte_list_add(struct kvm_vcpu *vcpu, u64 *spte,
>  		new_desc = mmu_alloc_pte_list_desc(vcpu);
>  		new_desc->more = desc;
>  		desc = new_desc;
> -		*pte_list = (unsigned long)desc | 1;
> +		rcu_assign_pte_list(pte_list, (unsigned long)desc | 1);
>  	}
>  
>  	free_pos = find_first_free(desc);
> @@ -1125,6 +1129,51 @@ static void pte_list_walk(unsigned long *pte_list, pte_list_walk_fn fn)
>  	WARN_ON(desc_get_nulls_value(desc) != pte_list);
>  }
>  
> +/* The caller should hold rcu lock. */
> +static void pte_list_walk_lockless(unsigned long *pte_list,
> +				   pte_list_walk_fn fn)
> +{
> +	struct pte_list_desc *desc;
> +	unsigned long pte_list_value;
> +	int i;
> +
> +restart:
> +	/*
> +	 * Force the pte_list to be reloaded.
> +	 *
> +	 * See the comments in hlist_nulls_for_each_entry_rcu().
> +	 */
> +	barrier();
> +	pte_list_value = *rcu_dereference(pte_list);
> +	if (!pte_list_value)
> +		return;
> +
> +	if (!(pte_list_value & 1))
> +		return fn((u64 *)pte_list_value);
> +
> +	desc = (struct pte_list_desc *)(pte_list_value & ~1ul);
> +	while (!desc_is_a_nulls(desc)) {
> +		/*
> +		 * We should do top-down walk since we always use the higher
> +		 * indices to replace the deleted entry if only one desc is
> +		 * used in the rmap when a spte is removed. Otherwise the
> +		 * moved entry will be missed.
> +		 */
> +		for (i = PTE_LIST_EXT - 1; i >= 0; i--)
> +			if (desc->sptes[i])
> +				fn(desc->sptes[i]);
> +
> +		desc = rcu_dereference(desc->more);
> +
> +		/* It is being initialized. */
> +		if (unlikely(!desc))
> +			goto restart;
> +	}
> +
> +	if (unlikely(desc_get_nulls_value(desc) != pte_list))
> +		goto restart;
> +}
> +
>  static unsigned long *__gfn_to_rmap(gfn_t gfn, int level,
>  				    struct kvm_memory_slot *slot)
>  {
> @@ -4651,7 +4700,7 @@ int kvm_mmu_module_init(void)
>  {
>  	pte_list_desc_cache = kmem_cache_create("pte_list_desc",
>  					    sizeof(struct pte_list_desc),
> -					    0, 0, NULL);
> +					    0, SLAB_DESTROY_BY_RCU, NULL);
Haven't we agreed that constructor is needed for the cache?

>  	if (!pte_list_desc_cache)
>  		goto nomem;
>  
> -- 
> 1.8.1.4

--
			Gleb.

  parent reply	other threads:[~2013-09-16 12:42 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-09-05 10:29 [PATCH v2 00/15] KVM: MMU: locklessly wirte-protect Xiao Guangrong
2013-09-05 10:29 ` [PATCH v2 01/15] KVM: MMU: fix the count of spte number Xiao Guangrong
2013-09-08 12:19   ` Gleb Natapov
2013-09-08 13:55     ` Xiao Guangrong
2013-09-08 14:01       ` Gleb Natapov
2013-09-08 14:24         ` Xiao Guangrong
2013-09-08 14:26           ` Gleb Natapov
2013-09-05 10:29 ` [PATCH v2 02/15] KVM: MMU: properly check last spte in fast_page_fault() Xiao Guangrong
2013-09-30 21:23   ` Marcelo Tosatti
2013-10-03  6:16     ` Xiao Guangrong
2013-09-05 10:29 ` [PATCH v2 03/15] KVM: MMU: lazily drop large spte Xiao Guangrong
2013-09-30 22:39   ` Marcelo Tosatti
2013-10-03  6:29     ` Xiao Guangrong
2013-09-05 10:29 ` [PATCH v2 04/15] KVM: MMU: flush tlb if the spte can be locklessly modified Xiao Guangrong
2013-09-05 10:29 ` [PATCH v2 05/15] KVM: MMU: flush tlb out of mmu lock when write-protect the sptes Xiao Guangrong
2013-09-30 23:05   ` Marcelo Tosatti
2013-10-03  6:46     ` Xiao Guangrong
2013-09-05 10:29 ` [PATCH v2 06/15] KVM: MMU: update spte and add it into rmap before dirty log Xiao Guangrong
2013-09-05 10:29 ` [PATCH v2 07/15] KVM: MMU: redesign the algorithm of pte_list Xiao Guangrong
2013-09-05 10:29 ` [PATCH v2 08/15] KVM: MMU: introduce nulls desc Xiao Guangrong
2013-09-05 10:29 ` [PATCH v2 09/15] KVM: MMU: introduce pte-list lockless walker Xiao Guangrong
2013-09-08 12:03   ` Xiao Guangrong
2013-09-16 12:42   ` Gleb Natapov [this message]
2013-09-16 13:52     ` Xiao Guangrong
2013-09-16 15:04       ` Gleb Natapov
2013-09-05 10:29 ` [PATCH v2 10/15] KVM: MMU: initialize the pointers in pte_list_desc properly Xiao Guangrong
2013-09-05 10:29 ` [PATCH v2 11/15] KVM: MMU: reintroduce kvm_mmu_isolate_page() Xiao Guangrong
2013-09-05 10:29 ` [PATCH v2 12/15] KVM: MMU: allow locklessly access shadow page table out of vcpu thread Xiao Guangrong
2013-10-08  1:23   ` Marcelo Tosatti
2013-10-08  4:02     ` Xiao Guangrong
2013-10-09  1:56       ` Marcelo Tosatti
2013-10-09 10:45         ` Xiao Guangrong
2013-10-10  1:47           ` Marcelo Tosatti
2013-10-10 12:08             ` Gleb Natapov
2013-10-10 16:42               ` Marcelo Tosatti
2013-10-10 19:16                 ` Gleb Natapov
2013-10-10 21:03                   ` Marcelo Tosatti
2013-10-11  5:38                     ` Gleb Natapov
2013-10-11 20:30                       ` Marcelo Tosatti
2013-10-12  5:53                         ` Gleb Natapov
2013-10-14 19:29                           ` Marcelo Tosatti
2013-10-15  3:57                             ` Gleb Natapov
2013-10-15 22:21                               ` Marcelo Tosatti
2013-10-16  0:41                                 ` Xiao Guangrong
2013-10-16  9:12                                 ` Gleb Natapov
2013-10-16 20:43                                   ` Marcelo Tosatti
2013-09-05 10:29 ` [PATCH v2 13/15] KVM: MMU: locklessly write-protect the page Xiao Guangrong
2013-09-05 10:29 ` [PATCH v2 14/15] KVM: MMU: clean up spte_write_protect Xiao Guangrong
2013-09-05 10:29 ` [PATCH v2 15/15] KVM: MMU: use rcu functions to access the pointer Xiao Guangrong
2013-09-15 10:26 ` [PATCH v2 00/15] KVM: MMU: locklessly wirte-protect Gleb Natapov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130916124213.GR17294@redhat.com \
    --to=gleb@redhat.com \
    --cc=avi.kivity@gmail.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mtosatti@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=xiaoguangrong@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.