From: Marcelo Tosatti <mtosatti@redhat.com>
To: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: gleb@redhat.com, avi.kivity@gmail.com, pbonzini@redhat.com,
linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Subject: Re: [PATCH v2 12/15] KVM: MMU: allow locklessly access shadow page table out of vcpu thread
Date: Mon, 7 Oct 2013 22:23:55 -0300 [thread overview]
Message-ID: <20131008012355.GA3588@amt.cnet> (raw)
In-Reply-To: <1378376958-27252-13-git-send-email-xiaoguangrong@linux.vnet.ibm.com>
On Thu, Sep 05, 2013 at 06:29:15PM +0800, Xiao Guangrong wrote:
> It is easy if the handler is in the vcpu context, in that case we can use
> walk_shadow_page_lockless_begin() and walk_shadow_page_lockless_end() that
> disable interrupt to stop shadow page being freed. But we are on the ioctl
> context and the paths we are optimizing for have heavy workload, disabling
> interrupt is not good for the system performance
>
> We add a indicator into kvm struct (kvm->arch.rcu_free_shadow_page), then
> use call_rcu() to free the shadow page if that indicator is set. Set/Clear the
> indicator are protected by slot-lock, so it need not be atomic and does not
> hurt the performance and the scalability
>
> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
> ---
> arch/x86/include/asm/kvm_host.h | 6 +++++-
> arch/x86/kvm/mmu.c | 32 ++++++++++++++++++++++++++++++++
> arch/x86/kvm/mmu.h | 22 ++++++++++++++++++++++
> 3 files changed, 59 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index c76ff74..8e4ca0d 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -226,7 +226,10 @@ struct kvm_mmu_page {
> /* The page is obsolete if mmu_valid_gen != kvm->arch.mmu_valid_gen. */
> unsigned long mmu_valid_gen;
>
> - DECLARE_BITMAP(unsync_child_bitmap, 512);
> + union {
> + DECLARE_BITMAP(unsync_child_bitmap, 512);
> + struct rcu_head rcu;
> + };
>
> #ifdef CONFIG_X86_32
> /*
> @@ -554,6 +557,7 @@ struct kvm_arch {
> */
> struct list_head active_mmu_pages;
> struct list_head zapped_obsolete_pages;
> + bool rcu_free_shadow_page;
>
> struct list_head assigned_dev_head;
> struct iommu_domain *iommu_domain;
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 2bf450a..f551fc7 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -2355,6 +2355,30 @@ static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
> return ret;
> }
>
> +static void kvm_mmu_isolate_pages(struct list_head *invalid_list)
> +{
> + struct kvm_mmu_page *sp;
> +
> + list_for_each_entry(sp, invalid_list, link)
> + kvm_mmu_isolate_page(sp);
> +}
> +
> +static void free_pages_rcu(struct rcu_head *head)
> +{
> + struct kvm_mmu_page *next, *sp;
> +
> + sp = container_of(head, struct kvm_mmu_page, rcu);
> + while (sp) {
> + if (!list_empty(&sp->link))
> + next = list_first_entry(&sp->link,
> + struct kvm_mmu_page, link);
> + else
> + next = NULL;
> + kvm_mmu_free_page(sp);
> + sp = next;
> + }
> +}
> +
> static void kvm_mmu_commit_zap_page(struct kvm *kvm,
> struct list_head *invalid_list)
> {
> @@ -2375,6 +2399,14 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm,
> */
> kvm_flush_remote_tlbs(kvm);
>
> + if (kvm->arch.rcu_free_shadow_page) {
> + kvm_mmu_isolate_pages(invalid_list);
> + sp = list_first_entry(invalid_list, struct kvm_mmu_page, link);
> + list_del_init(invalid_list);
> + call_rcu(&sp->rcu, free_pages_rcu);
> + return;
> + }
This is unbounded (there was a similar problem with early fast page fault
implementations):
>From RCU/checklist.txt:
" An especially important property of the synchronize_rcu()
primitive is that it automatically self-limits: if grace periods
are delayed for whatever reason, then the synchronize_rcu()
primitive will correspondingly delay updates. In contrast,
code using call_rcu() should explicitly limit update rate in
cases where grace periods are delayed, as failing to do so can
result in excessive realtime latencies or even OOM conditions.
"
Moreover, freeing pages differently depending on some state should
be avoided.
Alternatives:
- Disable interrupts at write protect sites.
- Rate limit the number of pages freed via call_rcu
per grace period.
- Some better alternative.
next prev parent reply other threads:[~2013-10-08 1:24 UTC|newest]
Thread overview: 50+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-09-05 10:29 [PATCH v2 00/15] KVM: MMU: locklessly wirte-protect Xiao Guangrong
2013-09-05 10:29 ` [PATCH v2 01/15] KVM: MMU: fix the count of spte number Xiao Guangrong
2013-09-08 12:19 ` Gleb Natapov
2013-09-08 13:55 ` Xiao Guangrong
2013-09-08 14:01 ` Gleb Natapov
2013-09-08 14:24 ` Xiao Guangrong
2013-09-08 14:26 ` Gleb Natapov
2013-09-05 10:29 ` [PATCH v2 02/15] KVM: MMU: properly check last spte in fast_page_fault() Xiao Guangrong
2013-09-30 21:23 ` Marcelo Tosatti
2013-10-03 6:16 ` Xiao Guangrong
2013-09-05 10:29 ` [PATCH v2 03/15] KVM: MMU: lazily drop large spte Xiao Guangrong
2013-09-30 22:39 ` Marcelo Tosatti
2013-10-03 6:29 ` Xiao Guangrong
2013-09-05 10:29 ` [PATCH v2 04/15] KVM: MMU: flush tlb if the spte can be locklessly modified Xiao Guangrong
2013-09-05 10:29 ` [PATCH v2 05/15] KVM: MMU: flush tlb out of mmu lock when write-protect the sptes Xiao Guangrong
2013-09-30 23:05 ` Marcelo Tosatti
2013-10-03 6:46 ` Xiao Guangrong
2013-09-05 10:29 ` [PATCH v2 06/15] KVM: MMU: update spte and add it into rmap before dirty log Xiao Guangrong
2013-09-05 10:29 ` [PATCH v2 07/15] KVM: MMU: redesign the algorithm of pte_list Xiao Guangrong
2013-09-05 10:29 ` [PATCH v2 08/15] KVM: MMU: introduce nulls desc Xiao Guangrong
2013-09-05 10:29 ` [PATCH v2 09/15] KVM: MMU: introduce pte-list lockless walker Xiao Guangrong
2013-09-08 12:03 ` Xiao Guangrong
2013-09-16 12:42 ` Gleb Natapov
2013-09-16 13:52 ` Xiao Guangrong
2013-09-16 15:04 ` Gleb Natapov
2013-09-05 10:29 ` [PATCH v2 10/15] KVM: MMU: initialize the pointers in pte_list_desc properly Xiao Guangrong
2013-09-05 10:29 ` [PATCH v2 11/15] KVM: MMU: reintroduce kvm_mmu_isolate_page() Xiao Guangrong
2013-09-05 10:29 ` [PATCH v2 12/15] KVM: MMU: allow locklessly access shadow page table out of vcpu thread Xiao Guangrong
2013-10-08 1:23 ` Marcelo Tosatti [this message]
2013-10-08 4:02 ` Xiao Guangrong
2013-10-09 1:56 ` Marcelo Tosatti
2013-10-09 10:45 ` Xiao Guangrong
2013-10-10 1:47 ` Marcelo Tosatti
2013-10-10 12:08 ` Gleb Natapov
2013-10-10 16:42 ` Marcelo Tosatti
2013-10-10 19:16 ` Gleb Natapov
2013-10-10 21:03 ` Marcelo Tosatti
2013-10-11 5:38 ` Gleb Natapov
2013-10-11 20:30 ` Marcelo Tosatti
2013-10-12 5:53 ` Gleb Natapov
2013-10-14 19:29 ` Marcelo Tosatti
2013-10-15 3:57 ` Gleb Natapov
2013-10-15 22:21 ` Marcelo Tosatti
2013-10-16 0:41 ` Xiao Guangrong
2013-10-16 9:12 ` Gleb Natapov
2013-10-16 20:43 ` Marcelo Tosatti
2013-09-05 10:29 ` [PATCH v2 13/15] KVM: MMU: locklessly write-protect the page Xiao Guangrong
2013-09-05 10:29 ` [PATCH v2 14/15] KVM: MMU: clean up spte_write_protect Xiao Guangrong
2013-09-05 10:29 ` [PATCH v2 15/15] KVM: MMU: use rcu functions to access the pointer Xiao Guangrong
2013-09-15 10:26 ` [PATCH v2 00/15] KVM: MMU: locklessly wirte-protect Gleb Natapov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20131008012355.GA3588@amt.cnet \
--to=mtosatti@redhat.com \
--cc=avi.kivity@gmail.com \
--cc=gleb@redhat.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=pbonzini@redhat.com \
--cc=xiaoguangrong@linux.vnet.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.