From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
To: Kautuk Consul <kconsul@linux.vnet.ibm.com>,
Paolo Bonzini <pbonzini@redhat.com>,
Jonathan Corbet <corbet@lwn.net>,
Michael Ellerman <mpe@ellerman.id.au>,
Nicholas Piggin <npiggin@gmail.com>,
Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Kautuk Consul <kconsul@linux.vnet.ibm.com>,
linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
kvm@vger.kernel.org, linux-doc@vger.kernel.org
Subject: Re: [PATCH v2] KVM: ppc64: Enable ring-based dirty memory tracking on ppc64: enable config options and implement relevant functions
Date: Mon, 17 Jul 2023 13:36:58 +0530 [thread overview]
Message-ID: <87pm4rarml.fsf@linux.ibm.com> (raw)
In-Reply-To: <20230717071208.1134783-1-kconsul@linux.vnet.ibm.com>
Kautuk Consul <kconsul@linux.vnet.ibm.com> writes:
> - Enable CONFIG_HAVE_KVM_DIRTY_RING_ACQ_REL as ppc64 is weakly
> ordered.
> - Enable CONFIG_NEED_KVM_DIRTY_RING_WITH_BITMAP because the
> kvmppc_xive_native_set_attr is called in the context of an ioctl
> syscall and will call kvmppc_xive_native_eq_sync for setting the
> KVM_DEV_XIVE_EQ_SYNC attribute which will call mark_dirty_page()
> when there isn't a running vcpu. Implemented the
> kvm_arch_allow_write_without_running_vcpu to always return true
> to allow mark_page_dirty_in_slot to mark the page dirty in the
> memslot->dirty_bitmap in this case.
> - Set KVM_DIRTY_LOG_PAGE_OFFSET for the ring buffer's physical page
> offset.
> - Implement the kvm_arch_mmu_enable_log_dirty_pt_masked function required
> for the generic KVM code to call.
> - Add a check to kvmppc_vcpu_run_hv for checking whether the dirty
> ring is soft full.
> - Implement the kvm_arch_flush_remote_tlbs_memslot function to support
> the CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT config option.
>
> Test Results
> ============
> On testing with live migration it was found that there is around
> 150-180 ms improvment in overall migration time with this patch.
>
> Bare Metal P9 testing with patch:
> --------------------------------
> (qemu) info migrate
> globals:
> store-global-state: on
> only-migratable: off
> send-configuration: on
> send-section-footer: on
> decompress-error-check: on
> clear-bitmap-shift: 18
> Migration status: completed
> total time: 20694 ms
> downtime: 73 ms
> setup: 23 ms
> transferred ram: 2604370 kbytes
> throughput: 1033.55 mbps
> remaining ram: 0 kbytes
> total ram: 16777216 kbytes
> duplicate: 3555398 pages
> skipped: 0 pages
> normal: 642026 pages
> normal bytes: 2568104 kbytes
> dirty sync count: 3
> page size: 4 kbytes
> multifd bytes: 0 kbytes
> pages-per-second: 32455
> precopy ram: 2581549 kbytes
> downtime ram: 22820 kbytes
>
> Bare Metal P9 testing without patch:
> -----------------------------------
> (qemu) info migrate
> globals:
> store-global-state: on
> only-migratable: off
> send-configuration: on
> send-section-footer: on
> decompress-error-check: on
> clear-bitmap-shift: 18
> Migration status: completed
> total time: 20873 ms
> downtime: 62 ms
> setup: 19 ms
> transferred ram: 2612900 kbytes
> throughput: 1027.83 mbps
> remaining ram: 0 kbytes
> total ram: 16777216 kbytes
> duplicate: 3553329 pages
> skipped: 0 pages
> normal: 644159 pages
> normal bytes: 2576636 kbytes
> dirty sync count: 4
> page size: 4 kbytes
> multifd bytes: 0 kbytes
> pages-per-second: 88297
> precopy ram: 2603645 kbytes
> downtime ram: 9254 kbytes
>
> Signed-off-by: Kautuk Consul <kconsul@linux.vnet.ibm.com>
> ---
> Documentation/virt/kvm/api.rst | 2 +-
> arch/powerpc/include/uapi/asm/kvm.h | 2 ++
> arch/powerpc/kvm/Kconfig | 2 ++
> arch/powerpc/kvm/book3s.c | 46 +++++++++++++++++++++++++++++
> arch/powerpc/kvm/book3s_hv.c | 3 ++
> include/linux/kvm_dirty_ring.h | 5 ++++
> virt/kvm/dirty_ring.c | 1 +
> 7 files changed, 60 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index c0ddd3035462..84c180ccd178 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -8114,7 +8114,7 @@ regardless of what has actually been exposed through the CPUID leaf.
> 8.29 KVM_CAP_DIRTY_LOG_RING/KVM_CAP_DIRTY_LOG_RING_ACQ_REL
> ----------------------------------------------------------
>
> -:Architectures: x86, arm64
> +:Architectures: x86, arm64, ppc64
> :Parameters: args[0] - size of the dirty log ring
>
> KVM is capable of tracking dirty memory using ring buffers that are
> diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
> index 9f18fa090f1f..f722309ed7fb 100644
> --- a/arch/powerpc/include/uapi/asm/kvm.h
> +++ b/arch/powerpc/include/uapi/asm/kvm.h
> @@ -33,6 +33,8 @@
> /* Not always available, but if it is, this is the correct offset. */
> #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
>
> +#define KVM_DIRTY_LOG_PAGE_OFFSET 64
> +
> struct kvm_regs {
> __u64 pc;
> __u64 cr;
> diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
> index 902611954200..c93354ec3bd5 100644
> --- a/arch/powerpc/kvm/Kconfig
> +++ b/arch/powerpc/kvm/Kconfig
> @@ -26,6 +26,8 @@ config KVM
> select IRQ_BYPASS_MANAGER
> select HAVE_KVM_IRQ_BYPASS
> select INTERVAL_TREE
> + select HAVE_KVM_DIRTY_RING_ACQ_REL
> + select NEED_KVM_DIRTY_RING_WITH_BITMAP
>
> config KVM_BOOK3S_HANDLER
> bool
> diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
> index 686d8d9eda3e..01aa4fe2c424 100644
> --- a/arch/powerpc/kvm/book3s.c
> +++ b/arch/powerpc/kvm/book3s.c
> @@ -32,6 +32,7 @@
> #include <asm/mmu_context.h>
> #include <asm/page.h>
> #include <asm/xive.h>
> +#include <asm/book3s/64/radix.h>
>
> #include "book3s.h"
> #include "trace.h"
> @@ -1070,6 +1071,51 @@ int kvm_irq_map_chip_pin(struct kvm *kvm, unsigned irqchip, unsigned pin)
>
> #endif /* CONFIG_KVM_XICS */
>
> +/*
> + * kvm_arch_mmu_enable_log_dirty_pt_masked - enable dirty logging for selected
> + * dirty pages.
> + *
> + * It write protects selected pages to enable dirty logging for them.
> + */
> +void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
> + struct kvm_memory_slot *slot,
> + gfn_t gfn_offset,
> + unsigned long mask)
> +{
> + phys_addr_t base_gfn = slot->base_gfn + gfn_offset;
> + phys_addr_t start = (base_gfn + __ffs(mask)) << PAGE_SHIFT;
> + phys_addr_t end = (base_gfn + __fls(mask) + 1) << PAGE_SHIFT;
> +
> + while (start < end) {
> + pte_t *ptep;
> + unsigned int shift;
> +
> + ptep = find_kvm_secondary_pte(kvm, start, &shift);
> +
> + if (radix_enabled())
> + __radix_pte_update(ptep, _PAGE_WRITE, 0);
> + else
> + *ptep = __pte(pte_val(*ptep) & ~(_PAGE_WRITE));
> +
> + start += PAGE_SIZE;
> + }
>
I am not sure about that. You are walking partition scoped table here
and you are checking for hypervisor translation mode and doing pte
updates. That doesn't look correct.
-aneesh
WARNING: multiple messages have this Message-ID (diff)
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
To: Kautuk Consul <kconsul@linux.vnet.ibm.com>,
Paolo Bonzini <pbonzini@redhat.com>,
Jonathan Corbet <corbet@lwn.net>,
Michael Ellerman <mpe@ellerman.id.au>,
Nicholas Piggin <npiggin@gmail.com>,
Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: linux-doc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
Kautuk Consul <kconsul@linux.vnet.ibm.com>,
kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] KVM: ppc64: Enable ring-based dirty memory tracking on ppc64: enable config options and implement relevant functions
Date: Mon, 17 Jul 2023 13:36:58 +0530 [thread overview]
Message-ID: <87pm4rarml.fsf@linux.ibm.com> (raw)
In-Reply-To: <20230717071208.1134783-1-kconsul@linux.vnet.ibm.com>
Kautuk Consul <kconsul@linux.vnet.ibm.com> writes:
> - Enable CONFIG_HAVE_KVM_DIRTY_RING_ACQ_REL as ppc64 is weakly
> ordered.
> - Enable CONFIG_NEED_KVM_DIRTY_RING_WITH_BITMAP because the
> kvmppc_xive_native_set_attr is called in the context of an ioctl
> syscall and will call kvmppc_xive_native_eq_sync for setting the
> KVM_DEV_XIVE_EQ_SYNC attribute which will call mark_dirty_page()
> when there isn't a running vcpu. Implemented the
> kvm_arch_allow_write_without_running_vcpu to always return true
> to allow mark_page_dirty_in_slot to mark the page dirty in the
> memslot->dirty_bitmap in this case.
> - Set KVM_DIRTY_LOG_PAGE_OFFSET for the ring buffer's physical page
> offset.
> - Implement the kvm_arch_mmu_enable_log_dirty_pt_masked function required
> for the generic KVM code to call.
> - Add a check to kvmppc_vcpu_run_hv for checking whether the dirty
> ring is soft full.
> - Implement the kvm_arch_flush_remote_tlbs_memslot function to support
> the CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT config option.
>
> Test Results
> ============
> On testing with live migration it was found that there is around
> 150-180 ms improvment in overall migration time with this patch.
>
> Bare Metal P9 testing with patch:
> --------------------------------
> (qemu) info migrate
> globals:
> store-global-state: on
> only-migratable: off
> send-configuration: on
> send-section-footer: on
> decompress-error-check: on
> clear-bitmap-shift: 18
> Migration status: completed
> total time: 20694 ms
> downtime: 73 ms
> setup: 23 ms
> transferred ram: 2604370 kbytes
> throughput: 1033.55 mbps
> remaining ram: 0 kbytes
> total ram: 16777216 kbytes
> duplicate: 3555398 pages
> skipped: 0 pages
> normal: 642026 pages
> normal bytes: 2568104 kbytes
> dirty sync count: 3
> page size: 4 kbytes
> multifd bytes: 0 kbytes
> pages-per-second: 32455
> precopy ram: 2581549 kbytes
> downtime ram: 22820 kbytes
>
> Bare Metal P9 testing without patch:
> -----------------------------------
> (qemu) info migrate
> globals:
> store-global-state: on
> only-migratable: off
> send-configuration: on
> send-section-footer: on
> decompress-error-check: on
> clear-bitmap-shift: 18
> Migration status: completed
> total time: 20873 ms
> downtime: 62 ms
> setup: 19 ms
> transferred ram: 2612900 kbytes
> throughput: 1027.83 mbps
> remaining ram: 0 kbytes
> total ram: 16777216 kbytes
> duplicate: 3553329 pages
> skipped: 0 pages
> normal: 644159 pages
> normal bytes: 2576636 kbytes
> dirty sync count: 4
> page size: 4 kbytes
> multifd bytes: 0 kbytes
> pages-per-second: 88297
> precopy ram: 2603645 kbytes
> downtime ram: 9254 kbytes
>
> Signed-off-by: Kautuk Consul <kconsul@linux.vnet.ibm.com>
> ---
> Documentation/virt/kvm/api.rst | 2 +-
> arch/powerpc/include/uapi/asm/kvm.h | 2 ++
> arch/powerpc/kvm/Kconfig | 2 ++
> arch/powerpc/kvm/book3s.c | 46 +++++++++++++++++++++++++++++
> arch/powerpc/kvm/book3s_hv.c | 3 ++
> include/linux/kvm_dirty_ring.h | 5 ++++
> virt/kvm/dirty_ring.c | 1 +
> 7 files changed, 60 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index c0ddd3035462..84c180ccd178 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -8114,7 +8114,7 @@ regardless of what has actually been exposed through the CPUID leaf.
> 8.29 KVM_CAP_DIRTY_LOG_RING/KVM_CAP_DIRTY_LOG_RING_ACQ_REL
> ----------------------------------------------------------
>
> -:Architectures: x86, arm64
> +:Architectures: x86, arm64, ppc64
> :Parameters: args[0] - size of the dirty log ring
>
> KVM is capable of tracking dirty memory using ring buffers that are
> diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
> index 9f18fa090f1f..f722309ed7fb 100644
> --- a/arch/powerpc/include/uapi/asm/kvm.h
> +++ b/arch/powerpc/include/uapi/asm/kvm.h
> @@ -33,6 +33,8 @@
> /* Not always available, but if it is, this is the correct offset. */
> #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
>
> +#define KVM_DIRTY_LOG_PAGE_OFFSET 64
> +
> struct kvm_regs {
> __u64 pc;
> __u64 cr;
> diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
> index 902611954200..c93354ec3bd5 100644
> --- a/arch/powerpc/kvm/Kconfig
> +++ b/arch/powerpc/kvm/Kconfig
> @@ -26,6 +26,8 @@ config KVM
> select IRQ_BYPASS_MANAGER
> select HAVE_KVM_IRQ_BYPASS
> select INTERVAL_TREE
> + select HAVE_KVM_DIRTY_RING_ACQ_REL
> + select NEED_KVM_DIRTY_RING_WITH_BITMAP
>
> config KVM_BOOK3S_HANDLER
> bool
> diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
> index 686d8d9eda3e..01aa4fe2c424 100644
> --- a/arch/powerpc/kvm/book3s.c
> +++ b/arch/powerpc/kvm/book3s.c
> @@ -32,6 +32,7 @@
> #include <asm/mmu_context.h>
> #include <asm/page.h>
> #include <asm/xive.h>
> +#include <asm/book3s/64/radix.h>
>
> #include "book3s.h"
> #include "trace.h"
> @@ -1070,6 +1071,51 @@ int kvm_irq_map_chip_pin(struct kvm *kvm, unsigned irqchip, unsigned pin)
>
> #endif /* CONFIG_KVM_XICS */
>
> +/*
> + * kvm_arch_mmu_enable_log_dirty_pt_masked - enable dirty logging for selected
> + * dirty pages.
> + *
> + * It write protects selected pages to enable dirty logging for them.
> + */
> +void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
> + struct kvm_memory_slot *slot,
> + gfn_t gfn_offset,
> + unsigned long mask)
> +{
> + phys_addr_t base_gfn = slot->base_gfn + gfn_offset;
> + phys_addr_t start = (base_gfn + __ffs(mask)) << PAGE_SHIFT;
> + phys_addr_t end = (base_gfn + __fls(mask) + 1) << PAGE_SHIFT;
> +
> + while (start < end) {
> + pte_t *ptep;
> + unsigned int shift;
> +
> + ptep = find_kvm_secondary_pte(kvm, start, &shift);
> +
> + if (radix_enabled())
> + __radix_pte_update(ptep, _PAGE_WRITE, 0);
> + else
> + *ptep = __pte(pte_val(*ptep) & ~(_PAGE_WRITE));
> +
> + start += PAGE_SIZE;
> + }
>
I am not sure about that. You are walking partition scoped table here
and you are checking for hypervisor translation mode and doing pte
updates. That doesn't look correct.
-aneesh
next prev parent reply other threads:[~2023-07-17 8:08 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-07-17 7:12 [PATCH v2] KVM: ppc64: Enable ring-based dirty memory tracking on ppc64: enable config options and implement relevant functions Kautuk Consul
2023-07-17 7:12 ` Kautuk Consul
2023-07-17 8:06 ` Aneesh Kumar K.V [this message]
2023-07-17 8:06 ` Aneesh Kumar K.V
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87pm4rarml.fsf@linux.ibm.com \
--to=aneesh.kumar@linux.ibm.com \
--cc=christophe.leroy@csgroup.eu \
--cc=corbet@lwn.net \
--cc=kconsul@linux.vnet.ibm.com \
--cc=kvm@vger.kernel.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=mpe@ellerman.id.au \
--cc=npiggin@gmail.com \
--cc=pbonzini@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.