From: Sean Christopherson <sean.j.christopherson@intel.com>
To: Zhuangyanying <ann.zhuangyanying@huawei.com>
Cc: liu.jinsong@huawei.com, kvm@vger.kernel.org,
wangxinxin.wang@huawei.com, xiaoguangrong@tencent.com,
qemu-devel@nongnu.org, arei.gonglei@huawei.com,
pbonzini@redhat.com
Subject: Re: [PATCH 4/4] KVM: MMU: fast cleanup D bit based on fast write protect
Date: Thu, 17 Jan 2019 08:32:08 -0800 [thread overview]
Message-ID: <20190117163208.GD22169@linux.intel.com> (raw)
In-Reply-To: <1547733331-16140-5-git-send-email-ann.zhuangyanying@huawei.com>
On Thu, Jan 17, 2019 at 01:55:31PM +0000, Zhuangyanying wrote:
> From: Zhuang Yanying <ann.zhuangyanying@huawei.com>
>
> When live-migration with large-memory guests, vcpu may hang for a long
> time while starting migration, such as 9s for 2T
> (linux-5.0.0-rc2+qemu-3.1.0).
> The reason is memory_global_dirty_log_start() taking too long, and the
> vcpu is waiting for BQL. The page-by-page D bit clearup is the main time
> consumption. I think that the idea of "KVM: MMU: fast write protect" by
> xiaoguangrong, especially the function kvm_mmu_write_protect_all_pages(),
> is very helpful. After a little modifcation, on his patch, can solve
> this problem, 9s to 0.5s.
>
> At the beginning of live migration, write protection is only applied to the
> top-level SPTE. Then the write from vm trigger the EPT violation, with
> for_each_shadow_entry write protection is performed at dirct_map.
> Finally the Dirty bit of the target page(at level 1 page table) is
> cleared, and the dirty page tracking is started. Of coure, the page where
> GPA is located is marked dirty when mmu_set_spte.
> A similar implementation on xen, just emt instead of write protection.
>
> Signed-off-by: Zhuang Yanying <ann.zhuangyanying@huawei.com>
> ---
> arch/x86/kvm/mmu.c | 8 +++++---
> arch/x86/kvm/vmx/vmx.c | 3 +--
> 2 files changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 047b897..a18bcc0 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -3257,7 +3257,10 @@ static bool mmu_load_shadow_page(struct kvm *kvm, struct kvm_mmu_page *sp)
> break;
>
> if (is_last_spte(spte, sp->role.level)) {
> - flush |= spte_write_protect(sptep, false);
> + if (sp->role.level == PT_PAGE_TABLE_LEVEL)
> + flush |= spte_clear_dirty(sptep);
> + else
> + flush |= spte_write_protect(sptep, false);
> continue;
> }
>
> @@ -6114,7 +6117,6 @@ void kvm_mmu_write_protect_all_pages(struct kvm *kvm, bool write_protect)
> {
> u64 wp_all_indicator, kvm_wp_all_gen;
>
> - mutex_lock(&kvm->slots_lock);
> wp_all_indicator = get_write_protect_all_indicator(kvm);
> kvm_wp_all_gen = get_write_protect_all_gen(wp_all_indicator);
>
> @@ -6134,8 +6136,8 @@ void kvm_mmu_write_protect_all_pages(struct kvm *kvm, bool write_protect)
> */
> if (write_protect)
> kvm_reload_remote_mmus(kvm);
> - mutex_unlock(&kvm->slots_lock);
Why is the lock removed? And why was it added in the first place?
> }
> +EXPORT_SYMBOL_GPL(kvm_mmu_write_protect_all_pages);
>
> static unsigned long
> mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index f6915f1..5236a07 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7180,8 +7180,7 @@ static void vmx_sched_in(struct kvm_vcpu *vcpu, int cpu)
> static void vmx_slot_enable_log_dirty(struct kvm *kvm,
> struct kvm_memory_slot *slot)
> {
> - kvm_mmu_slot_leaf_clear_dirty(kvm, slot);
> - kvm_mmu_slot_largepage_remove_write_access(kvm, slot);
> + kvm_mmu_write_protect_all_pages(kvm, true);
What's the purpose of having @write_protect if
kvm_mmu_write_protect_all_pages() is only ever called to enable
protection? If there's no known scenario where write protection is
explicitly disabled then there's no need for WP_ALL_ENABLE_MASK, i.e. a
non-zero generation would indicate write protection is enabled. That'd
simplify the code and clean up the atomic usage.
> }
>
> static void vmx_slot_disable_log_dirty(struct kvm *kvm,
> --
> 1.8.3.1
>
>
WARNING: multiple messages have this Message-ID (diff)
From: Sean Christopherson <sean.j.christopherson@intel.com>
To: Zhuangyanying <ann.zhuangyanying@huawei.com>
Cc: xiaoguangrong@tencent.com, pbonzini@redhat.com,
arei.gonglei@huawei.com, qemu-devel@nongnu.org,
kvm@vger.kernel.org, wangxinxin.wang@huawei.com,
liu.jinsong@huawei.com
Subject: Re: [Qemu-devel] [PATCH 4/4] KVM: MMU: fast cleanup D bit based on fast write protect
Date: Thu, 17 Jan 2019 08:32:08 -0800 [thread overview]
Message-ID: <20190117163208.GD22169@linux.intel.com> (raw)
In-Reply-To: <1547733331-16140-5-git-send-email-ann.zhuangyanying@huawei.com>
On Thu, Jan 17, 2019 at 01:55:31PM +0000, Zhuangyanying wrote:
> From: Zhuang Yanying <ann.zhuangyanying@huawei.com>
>
> When live-migration with large-memory guests, vcpu may hang for a long
> time while starting migration, such as 9s for 2T
> (linux-5.0.0-rc2+qemu-3.1.0).
> The reason is memory_global_dirty_log_start() taking too long, and the
> vcpu is waiting for BQL. The page-by-page D bit clearup is the main time
> consumption. I think that the idea of "KVM: MMU: fast write protect" by
> xiaoguangrong, especially the function kvm_mmu_write_protect_all_pages(),
> is very helpful. After a little modifcation, on his patch, can solve
> this problem, 9s to 0.5s.
>
> At the beginning of live migration, write protection is only applied to the
> top-level SPTE. Then the write from vm trigger the EPT violation, with
> for_each_shadow_entry write protection is performed at dirct_map.
> Finally the Dirty bit of the target page(at level 1 page table) is
> cleared, and the dirty page tracking is started. Of coure, the page where
> GPA is located is marked dirty when mmu_set_spte.
> A similar implementation on xen, just emt instead of write protection.
>
> Signed-off-by: Zhuang Yanying <ann.zhuangyanying@huawei.com>
> ---
> arch/x86/kvm/mmu.c | 8 +++++---
> arch/x86/kvm/vmx/vmx.c | 3 +--
> 2 files changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 047b897..a18bcc0 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -3257,7 +3257,10 @@ static bool mmu_load_shadow_page(struct kvm *kvm, struct kvm_mmu_page *sp)
> break;
>
> if (is_last_spte(spte, sp->role.level)) {
> - flush |= spte_write_protect(sptep, false);
> + if (sp->role.level == PT_PAGE_TABLE_LEVEL)
> + flush |= spte_clear_dirty(sptep);
> + else
> + flush |= spte_write_protect(sptep, false);
> continue;
> }
>
> @@ -6114,7 +6117,6 @@ void kvm_mmu_write_protect_all_pages(struct kvm *kvm, bool write_protect)
> {
> u64 wp_all_indicator, kvm_wp_all_gen;
>
> - mutex_lock(&kvm->slots_lock);
> wp_all_indicator = get_write_protect_all_indicator(kvm);
> kvm_wp_all_gen = get_write_protect_all_gen(wp_all_indicator);
>
> @@ -6134,8 +6136,8 @@ void kvm_mmu_write_protect_all_pages(struct kvm *kvm, bool write_protect)
> */
> if (write_protect)
> kvm_reload_remote_mmus(kvm);
> - mutex_unlock(&kvm->slots_lock);
Why is the lock removed? And why was it added in the first place?
> }
> +EXPORT_SYMBOL_GPL(kvm_mmu_write_protect_all_pages);
>
> static unsigned long
> mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index f6915f1..5236a07 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7180,8 +7180,7 @@ static void vmx_sched_in(struct kvm_vcpu *vcpu, int cpu)
> static void vmx_slot_enable_log_dirty(struct kvm *kvm,
> struct kvm_memory_slot *slot)
> {
> - kvm_mmu_slot_leaf_clear_dirty(kvm, slot);
> - kvm_mmu_slot_largepage_remove_write_access(kvm, slot);
> + kvm_mmu_write_protect_all_pages(kvm, true);
What's the purpose of having @write_protect if
kvm_mmu_write_protect_all_pages() is only ever called to enable
protection? If there's no known scenario where write protection is
explicitly disabled then there's no need for WP_ALL_ENABLE_MASK, i.e. a
non-zero generation would indicate write protection is enabled. That'd
simplify the code and clean up the atomic usage.
> }
>
> static void vmx_slot_disable_log_dirty(struct kvm *kvm,
> --
> 1.8.3.1
>
>
next prev parent reply other threads:[~2019-01-17 16:32 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-01-17 13:55 [PATCH 0/4] KVM: MMU: fast cleanup D bit based on fast write protect Zhuangyanying
2019-01-17 13:55 ` [Qemu-devel] " Zhuangyanying
2019-01-17 13:55 ` [PATCH 1/4] KVM: MMU: correct the behavior of mmu_spte_update_no_track Zhuangyanying
2019-01-17 13:55 ` [Qemu-devel] " Zhuangyanying
2019-01-17 15:44 ` Sean Christopherson
2019-01-17 15:44 ` [Qemu-devel] " Sean Christopherson
2019-01-17 13:55 ` [PATCH 2/4] KVM: MMU: introduce possible_writable_spte_bitmap Zhuangyanying
2019-01-17 13:55 ` [Qemu-devel] " Zhuangyanying
2019-01-17 15:55 ` Sean Christopherson
2019-01-17 15:55 ` [Qemu-devel] " Sean Christopherson
2019-01-17 13:55 ` [PATCH 3/4] KVM: MMU: introduce kvm_mmu_write_protect_all_pages Zhuangyanying
2019-01-17 13:55 ` [Qemu-devel] " Zhuangyanying
2019-01-17 16:09 ` Sean Christopherson
2019-01-17 16:09 ` [Qemu-devel] " Sean Christopherson
2019-01-17 13:55 ` [PATCH 4/4] KVM: MMU: fast cleanup D bit based on fast write protect Zhuangyanying
2019-01-17 13:55 ` [Qemu-devel] " Zhuangyanying
2019-01-17 16:32 ` Sean Christopherson [this message]
2019-01-17 16:32 ` Sean Christopherson
2019-01-21 6:37 ` Zhuangyanying
2019-01-21 6:37 ` [Qemu-devel] " Zhuangyanying
2019-01-22 15:17 ` Sean Christopherson
2019-01-22 15:17 ` [Qemu-devel] " Sean Christopherson
2019-01-23 18:38 ` Zhuangyanying
2019-01-23 18:38 ` [Qemu-devel] " Zhuangyanying
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190117163208.GD22169@linux.intel.com \
--to=sean.j.christopherson@intel.com \
--cc=ann.zhuangyanying@huawei.com \
--cc=arei.gonglei@huawei.com \
--cc=kvm@vger.kernel.org \
--cc=liu.jinsong@huawei.com \
--cc=pbonzini@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=wangxinxin.wang@huawei.com \
--cc=xiaoguangrong@tencent.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.