* [PATCH 1/8] arm64: add an interface for stage-2 page tracking
2024-09-18 15:27 [PATCH 0/8] *** RFC: ARM KVM dirty tracking device *** Lilit Janpoladyan
@ 2024-09-18 15:28 ` Lilit Janpoladyan
2024-09-18 15:28 ` [PATCH 2/8] KVM: arm64: add page tracking device as a capability Lilit Janpoladyan
` (7 subsequent siblings)
8 siblings, 0 replies; 16+ messages in thread
From: Lilit Janpoladyan @ 2024-09-18 15:28 UTC
To: kvm, maz, oliver.upton, james.morse, suzuki.poulose, yuzenghui,
nh-open-source, lilitj
Add an interface for tracking stage-2 page accesses. The interface
can be implemented by a driver for a device that has the required
capabilities, e.g. the AWS Graviton Page Tracking Agent accelerator.
When a device implementing the page_tracking_device interface is
available, KVM will use it to accelerate dirty logging. The initial
version of the interface supports dirty logging only, but it can be
extended to other use cases, such as a WSS calculation.

page_tracking_device supports tracking stage-2 translations by VMID
and by CPU ID. The VMID filter is required; the CPU ID filter is
optional, and a CPU ID of -1 denotes any CPU. Similarly,
page_tracking_device allows reading the pages logged either for a
particular CPU or for all CPUs. KVM can use a CPU ID of -1 to populate
dirty bitmaps and a specific CPU ID for per-vCPU dirty rings.
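
The filtering rules above can be illustrated with a small userspace
sketch. It is not part of the patch and not the device's actual
behaviour: struct mock_log_entry, PT_ANY_CPU and mock_read_dirty_pages()
are hypothetical names, and gpa_t is a local typedef standing in for
the kernel type. The sketch only shows that the VMID filter always
applies while the CPU filter is skipped for CPU ID -1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t gpa_t;        /* stand-in for the kernel's gpa_t */

#define PT_ANY_CPU (-1)        /* hypothetical name for the "any CPU" wildcard */

/* Hypothetical record a tracking device might log for each dirtied page. */
struct mock_log_entry {
	uint32_t vmid;         /* VMID the stage-2 translation belongs to */
	int cpu;               /* CPU that performed the write */
	gpa_t gpa;             /* guest physical address of the page */
};

/*
 * Copy up to @max page addresses logged for @vmid into @pages.
 * The VMID filter always applies; the CPU filter applies only when
 * @cpu is not PT_ANY_CPU. Returns the number of pages copied.
 */
int mock_read_dirty_pages(const struct mock_log_entry *log, size_t n,
			  uint32_t vmid, int cpu,
			  gpa_t *pages, uint32_t max)
{
	uint32_t found = 0;
	size_t i;

	for (i = 0; i < n && found < max; i++) {
		if (log[i].vmid != vmid)
			continue;       /* required filter */
		if (cpu != PT_ANY_CPU && log[i].cpu != cpu)
			continue;       /* optional filter */
		pages[found++] = log[i].gpa;
	}
	return (int)found;
}
```

With this model, a dirty bitmap would be filled by a call with
PT_ANY_CPU, while a per-vCPU dirty ring would pass the CPU the vCPU
runs on.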
Signed-off-by: Lilit Janpoladyan <lilitj@amazon.com>
---
arch/arm64/include/asm/page_tracking.h | 79 +++++++++++++
arch/arm64/kvm/Kconfig | 12 ++
arch/arm64/kvm/Makefile | 1 +
arch/arm64/kvm/page_tracking.c | 158 +++++++++++++++++++++++++
4 files changed, 250 insertions(+)
create mode 100644 arch/arm64/include/asm/page_tracking.h
create mode 100644 arch/arm64/kvm/page_tracking.c
diff --git a/arch/arm64/include/asm/page_tracking.h b/arch/arm64/include/asm/page_tracking.h
new file mode 100644
index 000000000000..5162fb5b648e
--- /dev/null
+++ b/arch/arm64/include/asm/page_tracking.h
@@ -0,0 +1,79 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ARM64_PAGE_TRACKING_DEVICE_H
+#define _ARM64_PAGE_TRACKING_DEVICE_H
+
+#include <linux/types.h>
+#include <linux/kvm_types.h>
+
+/* Page tracking mode */
+enum pt_mode {
+ dirty_pages,
+};
+
+/* Configuration of a per-VM page tracker */
+struct pt_config {
+ enum pt_mode mode; /* Tracking mode */
+ u32 vmid; /* VMID to track */
+};
+
+/* Interface provided by the page tracking device */
+struct page_tracking_device {
+
+ /* Allocates a per-VM tracker, returns tracking context */
+ void* (*allocate_tracker)(struct pt_config config);
+
+ /* Releases a per-VM tracker */
+ int (*release_tracker)(void *ctx);
+
+ /*
+ * Enables tracking for the specified @ctx and the specified @cpu,
+ * @cpu = -1 enables tracking for all cpus
+ *
+ * The function may be called for the same @ctx and @cpu multiple
+ * times and the implementation has to do reference counting to
+ * correctly disable the tracking.
+ * @returns 0 on success, negative errno in case of a failure
+ */
+ int (*enable_tracking)(void *ctx, int cpu);
+
+ /*
+ * Disables tracking for the @ctx and the @cpu
+ *
+ * Tracking of the @ctx and the @cpu is actually disabled only when
+ * the number of disable calls matches the number of enable calls,
+ * i.e. when the reference counter reaches 0. @returns 0 in this case,
+ * -EBUSY while reference counter > 0, negative errno on failure
+ */
+ int (*disable_tracking)(void *ctx, int cpu);
+
+ /*
+ * Flushes any tracking data available for the @ctx,
+ * @returns 0 on success, negative errno in case of a failure
+ */
+ int (*flush)(void *ctx);
+
+ /*
+ * Reads up to @max dirty pages available for the @ctx
+ * In case @cpu id is not -1, reads only pages dirtied by the specified cpu
+ * @returns number of read pages, or -errno in case of a failure
+ */
+ int (*read_dirty_pages)(void *ctx,
+ int cpu,
+ gpa_t *pages,
+ u32 max);
+};
+
+/* Page tracking device tear-down, bring-up and existence checks */
+void page_tracking_device_unregister(struct page_tracking_device *pt_dev);
+int page_tracking_device_register(struct page_tracking_device *pt_dev);
+int page_tracking_device_registered(void);
+
+/* Page tracking device wrappers */
+void *page_tracking_allocate(struct pt_config config);
+int page_tracking_release(void *ctx);
+int page_tracking_enable(void *ctx, int cpu);
+int page_tracking_disable(void *ctx, int cpu);
+int page_tracking_flush(void *ctx);
+int page_tracking_read_dirty_pages(void *ctx, int cpu, gpa_t *pages, u32 max);
+
+#endif /* _ARM64_PAGE_TRACKING_DEVICE_H */
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 8304eb342be9..33844658279b 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -66,4 +66,16 @@ config PROTECTED_NVHE_STACKTRACE
If unsure, or not using protected nVHE (pKVM), say N.
+config HAVE_KVM_PAGE_TRACKING_DEVICE
+ bool "Support for hardware accelerated dirty tracking"
+ default n
+ help
+ Say Y to enable hardware accelerated dirty tracking
+
+ Adds support for hardware accelerated dirty tracking during live
+ migration of a virtual machine. Requires a hardware accelerator.
+
+ If there is no required hardware, say N.
+
+
endif # VIRTUALIZATION
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 86a629aaf0a1..4e4f5c63baf2 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -18,6 +18,7 @@ kvm-y += arm.o mmu.o mmio.o psci.o hypercalls.o pvtime.o \
guest.o debug.o reset.o sys_regs.o stacktrace.o \
vgic-sys-reg-v3.o fpsimd.o pkvm.o \
arch_timer.o trng.o vmid.o emulate-nested.o nested.o \
+ page_tracking.o \
vgic/vgic.o vgic/vgic-init.o \
vgic/vgic-irqfd.o vgic/vgic-v2.o \
vgic/vgic-v3.o vgic/vgic-v4.o \
diff --git a/arch/arm64/kvm/page_tracking.c b/arch/arm64/kvm/page_tracking.c
new file mode 100644
index 000000000000..a81c917d4faa
--- /dev/null
+++ b/arch/arm64/kvm/page_tracking.c
@@ -0,0 +1,158 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <asm/page_tracking.h>
+#include <linux/mutex.h>
+#include <linux/rcupdate.h>
+
+#ifndef CONFIG_HAVE_KVM_PAGE_TRACKING_DEVICE
+
+int page_tracking_device_register(struct page_tracking_device *dev) { return 0; }
+void page_tracking_device_unregister(struct page_tracking_device *dev) {}
+int page_tracking_device_registered(void) { return 0; }
+void *page_tracking_allocate(struct pt_config config) { return NULL; }
+int page_tracking_release(void *ctx) { return 0; }
+int page_tracking_enable(void *ctx, int cpu) { return 0; }
+int page_tracking_disable(void *ctx, int cpu) { return 0; }
+int page_tracking_flush(void *ctx) { return 0; }
+int page_tracking_read_dirty_pages(void *ctx, int cpu, gpa_t *pages, u32 max) { return 0; }
+
+#else
+
+static DEFINE_MUTEX(page_tracking_device_mutex);
+static struct page_tracking_device __rcu *pt_dev __read_mostly;
+
+int page_tracking_device_register(struct page_tracking_device *dev)
+{
+ int rc = 0;
+
+ mutex_lock(&page_tracking_device_mutex);
+
+ if (rcu_dereference_protected(pt_dev, lockdep_is_held(&page_tracking_device_mutex))) {
+ rc = -EBUSY;
+ goto out;
+ }
+ rcu_assign_pointer(pt_dev, dev);
+out:
+ mutex_unlock(&page_tracking_device_mutex);
+ return rc;
+}
+EXPORT_SYMBOL_GPL(page_tracking_device_register);
+
+void page_tracking_device_unregister(struct page_tracking_device *dev)
+{
+ mutex_lock(&page_tracking_device_mutex);
+
+ if (dev == rcu_dereference_protected(pt_dev,
+ lockdep_is_held(&page_tracking_device_mutex))) {
+ /* Disable page tracking device */
+ RCU_INIT_POINTER(pt_dev, NULL);
+ synchronize_rcu();
+ }
+ mutex_unlock(&page_tracking_device_mutex);
+}
+EXPORT_SYMBOL_GPL(page_tracking_device_unregister);
+
+int page_tracking_device_registered(void)
+{
+ bool registered;
+
+ rcu_read_lock();
+ registered = (rcu_dereference(pt_dev) != NULL);
+ rcu_read_unlock();
+ return registered;
+}
+EXPORT_SYMBOL_GPL(page_tracking_device_registered);
+
+/* Allocates a per-VM tracker, returns tracking context */
+void *page_tracking_allocate(struct pt_config config)
+{
+ struct page_tracking_device *dev;
+ void *ctx = NULL;
+
+ rcu_read_lock();
+ dev = rcu_dereference(pt_dev);
+ if (likely(dev))
+ ctx = dev->allocate_tracker(config);
+ rcu_read_unlock();
+ return ctx;
+}
+EXPORT_SYMBOL_GPL(page_tracking_allocate);
+
+/* Releases a per-VM tracker */
+int page_tracking_release(void *ctx)
+{
+ int r = -ENODEV;
+ struct page_tracking_device *dev;
+
+ rcu_read_lock();
+ dev = rcu_dereference(pt_dev);
+ if (likely(dev))
+ r = dev->release_tracker(ctx);
+ rcu_read_unlock();
+ return r;
+}
+EXPORT_SYMBOL_GPL(page_tracking_release);
+
+/* Enables tracking for the specified @ctx and @cpu (-1 for all cpus) */
+int page_tracking_enable(void *ctx, int cpu)
+{
+ int r = -ENODEV;
+ struct page_tracking_device *dev;
+
+ rcu_read_lock();
+ dev = rcu_dereference(pt_dev);
+ if (likely(dev))
+ r = dev->enable_tracking(ctx, cpu);
+ rcu_read_unlock();
+ return r;
+}
+EXPORT_SYMBOL_GPL(page_tracking_enable);
+
+/* Disables tracking for the @ctx and @cpu */
+int page_tracking_disable(void *ctx, int cpu)
+{
+ int r = -ENODEV;
+ struct page_tracking_device *dev;
+
+ rcu_read_lock();
+ dev = rcu_dereference(pt_dev);
+ if (likely(dev))
+ r = dev->disable_tracking(ctx, cpu);
+ rcu_read_unlock();
+ return r;
+}
+EXPORT_SYMBOL_GPL(page_tracking_disable);
+
+/* Flushes any available data */
+int page_tracking_flush(void *ctx)
+{
+ int r = -ENODEV;
+ struct page_tracking_device *dev;
+
+ rcu_read_lock();
+ dev = rcu_dereference(pt_dev);
+ if (likely(dev))
+ r = dev->flush(ctx);
+ rcu_read_unlock();
+ return r;
+}
+EXPORT_SYMBOL_GPL(page_tracking_flush);
+
+/*
+ * Reads up to @max dirty pages available for the @ctx and @cpu (-1 for all cpus)
+ * @returns number of read pages, or -errno in case of error
+ */
+int page_tracking_read_dirty_pages(void *ctx, int cpu, gpa_t *pages, u32 max)
+{
+ int r = -ENODEV;
+ struct page_tracking_device *dev;
+
+ rcu_read_lock();
+ dev = rcu_dereference(pt_dev);
+ if (likely(dev))
+ r = dev->read_dirty_pages(ctx, cpu, pages, max);
+ rcu_read_unlock();
+ return r;
+}
+EXPORT_SYMBOL_GPL(page_tracking_read_dirty_pages);
+
+#endif
--
2.40.1
* [PATCH 2/8] KVM: arm64: add page tracking device as a capability
2024-09-18 15:27 [PATCH 0/8] *** RFC: ARM KVM dirty tracking device *** Lilit Janpoladyan
2024-09-18 15:28 ` [PATCH 1/8] arm64: add an interface for stage-2 page tracking Lilit Janpoladyan
@ 2024-09-18 15:28 ` Lilit Janpoladyan
2024-09-18 15:28 ` [PATCH 3/8] KVM: arm64: use page tracking interface to enable dirty logging Lilit Janpoladyan
` (6 subsequent siblings)
8 siblings, 0 replies; 16+ messages in thread
From: Lilit Janpoladyan @ 2024-09-18 15:28 UTC
To: kvm, maz, oliver.upton, james.morse, suzuki.poulose, yuzenghui,
nh-open-source, lilitj
Add a new capability, KVM_CAP_ARM_PAGE_TRACKING_DEVICE, to use a page
tracking device for dirty logging. The capability can be enabled only
if the platform provides such a device, i.e. when the
KVM_CAP_ARM_PAGE_TRACKING_DEVICE extension is reported as supported.
Until dirty ring support is added, the new capability is incompatible
with the use of the dirty ring.

When the page tracking device is in use, KVM does not log dirty pages
on faults; instead it collects the list of dirty pages from the device
when userspace reads the dirty bitmap.
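
From the userspace side, turning the acceleration on could look
roughly like the sketch below. It assumes the capability number
proposed in this RFC (239), which is not in released uapi headers and
may change; pt_fill_enable_cap() and pt_set_page_tracking() are
hypothetical helper names. KVM_CHECK_EXTENSION and KVM_ENABLE_CAP are
the existing KVM ioctls.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Value proposed by this RFC; an assumption, not a released uapi constant. */
#ifndef KVM_CAP_ARM_PAGE_TRACKING_DEVICE
#define KVM_CAP_ARM_PAGE_TRACKING_DEVICE 239
#endif

/* Fill a KVM_ENABLE_CAP request; a non-zero args[0] turns acceleration on. */
void pt_fill_enable_cap(struct kvm_enable_cap *cap, int enable)
{
	memset(cap, 0, sizeof(*cap));
	cap->cap = KVM_CAP_ARM_PAGE_TRACKING_DEVICE;
	cap->args[0] = enable ? 1 : 0;
}

/*
 * Enable or disable device-assisted dirty logging on the VM file
 * descriptor @vm_fd. Returns 0 on success and -1 with errno set on
 * failure, including the case where the extension is not reported.
 */
int pt_set_page_tracking(int vm_fd, int enable)
{
	struct kvm_enable_cap cap;
	int ret;

	ret = ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_ARM_PAGE_TRACKING_DEVICE);
	if (ret < 0)
		return -1;
	if (ret == 0) {
		errno = EOPNOTSUPP;     /* no page tracking device registered */
		return -1;
	}

	pt_fill_enable_cap(&cap, enable);
	return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}
```

A VMM would call pt_set_page_tracking(vm_fd, 1) before setting
KVM_MEM_LOG_DIRTY_PAGES on its memslots, and only when it is not using
the dirty ring.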
Signed-off-by: Lilit Janpoladyan <lilitj@amazon.com>
---
Documentation/virt/kvm/api.rst | 17 +++++++++++++++++
arch/arm64/include/asm/kvm_host.h | 2 ++
arch/arm64/kvm/arm.c | 17 +++++++++++++++++
include/uapi/linux/kvm.h | 1 +
4 files changed, 37 insertions(+)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index b3be87489108..989d5dd886fb 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -8950,6 +8950,23 @@ Do not use KVM_X86_SW_PROTECTED_VM for "real" VMs, and especially not in
production. The behavior and effective ABI for software-protected VMs is
unstable.
+8.42 KVM_CAP_ARM_PAGE_TRACKING_DEVICE
+_____________________________________
+
+:Capability: KVM_CAP_ARM_PAGE_TRACKING_DEVICE
+:Architecture: arm64
+:Type: vm
+:Parameters: args[0] is non-zero to enable the feature, zero to disable it
+:Returns: 0 on success, -errno on failure
+
+This capability enables or disables hardware assistance for dirty page logging.
+
+If a page tracking device is available (i.e. if the host supports the
+KVM_CAP_ARM_PAGE_TRACKING_DEVICE extension), the device can be used to accelerate
+dirty logging. This capability turns the acceleration on and off.
+
+Not compatible with KVM_CAP_DIRTY_LOG_RING/KVM_CAP_DIRTY_LOG_RING_ACQ_REL.
+
9. Known KVM API problems
=========================
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index a33f5996ca9f..5b5e3647fbda 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -326,6 +326,8 @@ struct kvm_arch {
#define KVM_ARCH_FLAG_ID_REGS_INITIALIZED 7
/* Fine-Grained UNDEF initialised */
#define KVM_ARCH_FLAG_FGU_INITIALIZED 8
+ /* Page tracking device enabled */
+#define KVM_ARCH_FLAG_PAGE_TRACKING_DEVICE_ENABLED 9
unsigned long flags;
/* VM-wide vCPU feature set */
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 9bef7638342e..aea56df8ac04 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -40,6 +40,7 @@
#include <asm/kvm_nested.h>
#include <asm/kvm_pkvm.h>
#include <asm/kvm_ptrauth.h>
+#include <asm/page_tracking.h>
#include <asm/sections.h>
#include <kvm/arm_hypercalls.h>
@@ -149,6 +150,19 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
}
mutex_unlock(&kvm->slots_lock);
break;
+ case KVM_CAP_ARM_PAGE_TRACKING_DEVICE:
+ if (page_tracking_device_registered() &&
+ !kvm->dirty_ring_size /* Does not support dirty ring yet */) {
+
+ r = 0;
+ if (cap->args[0])
+ set_bit(KVM_ARCH_FLAG_PAGE_TRACKING_DEVICE_ENABLED,
+ &kvm->arch.flags);
+ else
+ clear_bit(KVM_ARCH_FLAG_PAGE_TRACKING_DEVICE_ENABLED,
+ &kvm->arch.flags);
+ }
+ break;
default:
break;
}
@@ -416,6 +430,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_ARM_SUPPORTED_REG_MASK_RANGES:
r = BIT(0);
break;
+ case KVM_CAP_ARM_PAGE_TRACKING_DEVICE:
+ r = page_tracking_device_registered();
+ break;
default:
r = 0;
}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 637efc055145..552ebede3f9d 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -933,6 +933,7 @@ struct kvm_enable_cap {
#define KVM_CAP_PRE_FAULT_MEMORY 236
#define KVM_CAP_X86_APIC_BUS_CYCLES_NS 237
#define KVM_CAP_X86_GUEST_MODE 238
+#define KVM_CAP_ARM_PAGE_TRACKING_DEVICE 239
struct kvm_irq_routing_irqchip {
__u32 irqchip;
--
2.40.1
* [PATCH 3/8] KVM: arm64: use page tracking interface to enable dirty logging
2024-09-18 15:27 [PATCH 0/8] *** RFC: ARM KVM dirty tracking device *** Lilit Janpoladyan
2024-09-18 15:28 ` [PATCH 1/8] arm64: add an interface for stage-2 page tracking Lilit Janpoladyan
2024-09-18 15:28 ` [PATCH 2/8] KVM: arm64: add page tracking device as a capability Lilit Janpoladyan
@ 2024-09-18 15:28 ` Lilit Janpoladyan
2024-09-22 7:31 ` Sean Christopherson
2024-09-18 15:28 ` [PATCH 4/8] KVM: return value from kvm_arch_sync_dirty_log Lilit Janpoladyan
` (5 subsequent siblings)
8 siblings, 1 reply; 16+ messages in thread
From: Lilit Janpoladyan @ 2024-09-18 15:28 UTC
To: kvm, maz, oliver.upton, james.morse, suzuki.poulose, yuzenghui,
nh-open-source, lilitj
If a page tracking device is available, use it to enable and disable
hardware dirty tracking. Allocate a tracking context when dirty logging
is first enabled (for the first memslot) and release the context when
dirty logging is turned off for the VMID.

Allocation and use of the context are not synchronized, as both are
done from the VM ioctls.
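
The context lifecycle described above can be modelled in userspace as
follows. This is a sketch only: struct mock_tracker, struct mock_vm and
the mock_* functions are hypothetical, and the reference count stands in
for whatever a real driver does behind enable_tracking() and
disable_tracking(). It mirrors the -EBUSY convention of the interface,
where the context is kept while other memslots still log.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical per-VM tracker; a real driver would hold device state here. */
struct mock_tracker {
	int refcount;   /* enable calls not yet matched by a disable call */
};

struct mock_vm {
	struct mock_tracker *ctx;   /* NULL until the first memslot enables logging */
};

static int mock_enable(struct mock_tracker *t)
{
	t->refcount++;
	return 0;
}

/* Returns -EBUSY while other users remain, 0 once tracking really stops. */
static int mock_disable(struct mock_tracker *t)
{
	if (t->refcount == 0)
		return -EINVAL;
	return --t->refcount ? -EBUSY : 0;
}

/* Called once per memslot that turns dirty logging on. */
int mock_vm_enable_logging(struct mock_vm *vm)
{
	if (!vm->ctx) {
		vm->ctx = calloc(1, sizeof(*vm->ctx));
		if (!vm->ctx)
			return -ENOMEM;
	}
	return mock_enable(vm->ctx);
}

/* Called once per memslot that turns dirty logging off. */
int mock_vm_disable_logging(struct mock_vm *vm)
{
	int r;

	if (!vm->ctx)
		return -ENOENT;

	r = mock_disable(vm->ctx);
	if (r == -EBUSY)
		return 0;       /* other memslots still log; keep the context */

	free(vm->ctx);          /* last user gone (or error): drop the context */
	vm->ctx = NULL;
	return r;
}
```

Enabling logging on two memslots and then disabling it on both leaves
the context allocated after the first disable and frees it after the
second.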
Signed-off-by: Lilit Janpoladyan <lilitj@amazon.com>
---
arch/arm64/include/asm/kvm_host.h | 5 +++
arch/arm64/kvm/arm.c | 54 +++++++++++++++++++++++++++++++
arch/mips/kvm/mips.c | 10 ++++++
arch/powerpc/kvm/book3s.c | 10 ++++++
arch/powerpc/kvm/booke.c | 10 ++++++
arch/s390/kvm/kvm-s390.c | 10 ++++++
arch/x86/kvm/x86.c | 10 ++++++
include/linux/kvm_host.h | 2 ++
virt/kvm/kvm_main.c | 19 +++++++----
9 files changed, 123 insertions(+), 7 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 5b5e3647fbda..db9bf42123e1 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -377,6 +377,11 @@ struct kvm_arch {
* the associated pKVM instance in the hypervisor.
*/
struct kvm_protected_vm pkvm;
+
+ /*
+ * Stores page tracking context if page tracking device is in use
+ */
+ void *page_tracking_ctx;
};
struct kvm_vcpu_fault_info {
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index aea56df8ac04..c8dcf719ee99 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -267,6 +267,9 @@ static void kvm_destroy_mpidr_data(struct kvm *kvm)
*/
void kvm_arch_destroy_vm(struct kvm *kvm)
{
+ if (kvm->arch.page_tracking_ctx)
+ page_tracking_release(kvm->arch.page_tracking_ctx);
+
bitmap_free(kvm->arch.pmu_filter);
free_cpumask_var(kvm->arch.supported_cpus);
@@ -1816,6 +1819,57 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
return r;
}
+int kvm_arch_enable_dirty_logging(struct kvm *kvm, const struct kvm_memory_slot *memslot)
+{
+ void *ctx = NULL;
+ struct pt_config config;
+ int r;
+
+ if (!page_tracking_device_registered())
+ return 0;
+
+ if (!kvm->arch.page_tracking_ctx) {
+ config.vmid = (u32)kvm->arch.mmu.vmid.id.counter;
+ config.mode = dirty_pages;
+ ctx = page_tracking_allocate(config);
+ if (!ctx)
+ return -ENOENT;
+
+ kvm->arch.page_tracking_ctx = ctx;
+ }
+
+ r = page_tracking_enable(kvm->arch.page_tracking_ctx, -1);
+
+ if (r) {
+ if (ctx) {
+ page_tracking_release(ctx);
+ kvm->arch.page_tracking_ctx = NULL;
+ }
+ }
+ return r;
+}
+
+int kvm_arch_disable_dirty_logging(struct kvm *kvm, const struct kvm_memory_slot *memslot)
+{
+ int r = 0;
+
+ if (!page_tracking_device_registered())
+ return 0;
+
+ if (!kvm->arch.page_tracking_ctx)
+ return -ENOENT;
+
+ r = page_tracking_disable(kvm->arch.page_tracking_ctx, -1);
+
+ if (r == -EBUSY) {
+ r = 0;
+ } else {
+ page_tracking_release(kvm->arch.page_tracking_ctx);
+ kvm->arch.page_tracking_ctx = NULL;
+ }
+ return r;
+}
+
void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
{
diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index b5de770b092e..edc6f473af4e 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -974,6 +974,16 @@ long kvm_arch_vcpu_ioctl(struct file *filp, unsigned int ioctl,
return r;
}
+int kvm_arch_enable_dirty_logging(struct kvm *kvm, const struct kvm_memory_slot *memslot)
+{
+ return 0;
+}
+
+int kvm_arch_disable_dirty_logging(struct kvm *kvm, const struct kvm_memory_slot *memslot)
+{
+ return 0;
+}
+
void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
{
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index ff6c38373957..4c4a3ecc301c 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -844,6 +844,16 @@ int kvmppc_core_check_requests(struct kvm_vcpu *vcpu)
return vcpu->kvm->arch.kvm_ops->check_requests(vcpu);
}
+int kvm_arch_enable_dirty_logging(struct kvm *kvm, const struct kvm_memory_slot *memslot)
+{
+ return 0;
+}
+
+int kvm_arch_disable_dirty_logging(struct kvm *kvm, const struct kvm_memory_slot *memslot)
+{
+ return 0;
+}
+
void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
{
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 6a5be025a8af..f263ebc8fa49 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -1814,6 +1814,16 @@ int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
return r;
}
+int kvm_arch_enable_dirty_logging(struct kvm *kvm, const struct kvm_memory_slot *memslot)
+{
+ return -EOPNOTSUPP;
+}
+
+int kvm_arch_disable_dirty_logging(struct kvm *kvm, const struct kvm_memory_slot *memslot)
+{
+ return -EOPNOTSUPP;
+}
+
void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
{
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 0fd96860fc45..d6a8f7dbc644 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -667,6 +667,16 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
return r;
}
+int kvm_arch_enable_dirty_logging(struct kvm *kvm, const struct kvm_memory_slot *memslot)
+{
+ return 0;
+}
+
+int kvm_arch_disable_dirty_logging(struct kvm *kvm, const struct kvm_memory_slot *memslot)
+{
+ return 0;
+}
+
void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
{
int i;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c983c8e434b8..1be8bacfe2bd 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6488,6 +6488,16 @@ static int kvm_vm_ioctl_reinject(struct kvm *kvm,
return 0;
}
+int kvm_arch_enable_dirty_logging(struct kvm *kvm, const struct kvm_memory_slot *memslot)
+{
+ return 0;
+}
+
+int kvm_arch_disable_dirty_logging(struct kvm *kvm, const struct kvm_memory_slot *memslot)
+{
+ return 0;
+}
+
void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
{
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 0d5125a3e31a..ae905f54ec47 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1475,6 +1475,8 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
struct kvm_memory_slot *slot,
gfn_t gfn_offset,
unsigned long mask);
+int kvm_arch_enable_dirty_logging(struct kvm *kvm, const struct kvm_memory_slot *memslot);
+int kvm_arch_disable_dirty_logging(struct kvm *kvm, const struct kvm_memory_slot *memslot);
void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot);
#ifndef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index cb2b78e92910..1fd5e234c188 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1689,11 +1689,12 @@ static int kvm_prepare_memory_region(struct kvm *kvm,
return r;
}
-static void kvm_commit_memory_region(struct kvm *kvm,
- struct kvm_memory_slot *old,
- const struct kvm_memory_slot *new,
- enum kvm_mr_change change)
+static int kvm_commit_memory_region(struct kvm *kvm,
+ struct kvm_memory_slot *old,
+ const struct kvm_memory_slot *new,
+ enum kvm_mr_change change)
{
+ int r = 0;
int old_flags = old ? old->flags : 0;
int new_flags = new ? new->flags : 0;
/*
@@ -1709,6 +1710,10 @@ static void kvm_commit_memory_region(struct kvm *kvm,
int change = (new_flags & KVM_MEM_LOG_DIRTY_PAGES) ? 1 : -1;
atomic_set(&kvm->nr_memslots_dirty_logging,
atomic_read(&kvm->nr_memslots_dirty_logging) + change);
+ if (change > 0)
+ r = kvm_arch_enable_dirty_logging(kvm, new);
+ else
+ r = kvm_arch_disable_dirty_logging(kvm, new);
}
kvm_arch_commit_memory_region(kvm, old, new, change);
@@ -1740,6 +1745,8 @@ static void kvm_commit_memory_region(struct kvm *kvm,
default:
BUG();
}
+
+ return r;
}
/*
@@ -1954,9 +1961,7 @@ static int kvm_set_memslot(struct kvm *kvm,
* will directly hit the final, active memslot. Architectures are
* responsible for knowing that new->arch may be stale.
*/
- kvm_commit_memory_region(kvm, old, new, change);
-
- return 0;
+ return kvm_commit_memory_region(kvm, old, new, change);
}
static bool kvm_check_memslot_overlap(struct kvm_memslots *slots, int id,
--
2.40.1
* Re: [PATCH 3/8] KVM: arm64: use page tracking interface to enable dirty logging
2024-09-18 15:28 ` [PATCH 3/8] KVM: arm64: use page tracking interface to enable dirty logging Lilit Janpoladyan
@ 2024-09-22 7:31 ` Sean Christopherson
0 siblings, 0 replies; 16+ messages in thread
From: Sean Christopherson @ 2024-09-22 7:31 UTC
To: Lilit Janpoladyan
Cc: kvm, maz, oliver.upton, james.morse, suzuki.poulose, yuzenghui,
nh-open-source
On Wed, Sep 18, 2024, Lilit Janpoladyan wrote:
> +static int kvm_commit_memory_region(struct kvm *kvm,
> + struct kvm_memory_slot *old,
> + const struct kvm_memory_slot *new,
> + enum kvm_mr_change change)
> {
> + int r;
> int old_flags = old ? old->flags : 0;
> int new_flags = new ? new->flags : 0;
> /*
> @@ -1709,6 +1710,10 @@ static void kvm_commit_memory_region(struct kvm *kvm,
> int change = (new_flags & KVM_MEM_LOG_DIRTY_PAGES) ? 1 : -1;
> atomic_set(&kvm->nr_memslots_dirty_logging,
> atomic_read(&kvm->nr_memslots_dirty_logging) + change);
> + if (change > 0)
> + r = kvm_arch_enable_dirty_logging(kvm, new);
> + else
> + r = kvm_arch_disable_dirty_logging(kvm, new);
There's zero reason to add new arch callbacks, the entire reason
kvm_arch_commit_memory_region() exists is to let arch code react to memslot
changes. As evidenced by the fact that multiple architectures already handle
dirty logging changes in their commit hooks, it's trivial to detect changes, i.e.
not worth moving to generic code.
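
As an illustration of the point above, detecting a dirty logging
change inside an arch commit hook could be sketched as below. This is
a userspace model, not kernel code: struct mock_memslot,
enum dirty_log_change and mock_dirty_log_change() are hypothetical, and
only the flag value is taken from the KVM uapi. A missing old or new
slot is treated as having logging off.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define KVM_MEM_LOG_DIRTY_PAGES (1UL << 0)  /* same value as the KVM uapi flag */

/* Reduced stand-in for struct kvm_memory_slot; only the flags matter here. */
struct mock_memslot {
	uint32_t flags;
};

enum dirty_log_change {
	DIRTY_LOG_UNCHANGED,
	DIRTY_LOG_ENABLED,
	DIRTY_LOG_DISABLED,
};

/*
 * Compare the dirty logging flag of the old and new memslot, the way an
 * arch commit hook could before enabling or disabling hardware tracking.
 * Either pointer may be NULL (slot created or deleted).
 */
enum dirty_log_change mock_dirty_log_change(const struct mock_memslot *old_slot,
					    const struct mock_memslot *new_slot)
{
	bool was_on = old_slot && (old_slot->flags & KVM_MEM_LOG_DIRTY_PAGES);
	bool is_on = new_slot && (new_slot->flags & KVM_MEM_LOG_DIRTY_PAGES);

	if (was_on == is_on)
		return DIRTY_LOG_UNCHANGED;
	return is_on ? DIRTY_LOG_ENABLED : DIRTY_LOG_DISABLED;
}
```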
* [PATCH 4/8] KVM: return value from kvm_arch_sync_dirty_log
2024-09-18 15:27 [PATCH 0/8] *** RFC: ARM KVM dirty tracking device *** Lilit Janpoladyan
` (2 preceding siblings ...)
2024-09-18 15:28 ` [PATCH 3/8] KVM: arm64: use page tracking interface to enable dirty logging Lilit Janpoladyan
@ 2024-09-18 15:28 ` Lilit Janpoladyan
2024-09-19 1:50 ` kernel test robot
2024-09-19 2:32 ` kernel test robot
2024-09-18 15:28 ` [PATCH 5/8] KVM: arm64: get dirty pages from the page tracking device Lilit Janpoladyan
` (4 subsequent siblings)
8 siblings, 2 replies; 16+ messages in thread
From: Lilit Janpoladyan @ 2024-09-18 15:28 UTC
To: kvm, maz, oliver.upton, james.morse, suzuki.poulose, yuzenghui,
nh-open-source, lilitj
Make kvm_arch_sync_dirty_log return a value, which is needed to
propagate errors that could happen when getting dirty pages via
a page tracking device.
Signed-off-by: Lilit Janpoladyan <lilitj@amazon.com>
---
arch/loongarch/kvm/mmu.c | 3 ++-
arch/mips/kvm/mips.c | 4 ++--
arch/powerpc/kvm/book3s.c | 4 ++--
arch/powerpc/kvm/booke.c | 4 ++--
arch/riscv/kvm/mmu.c | 3 ++-
arch/s390/kvm/kvm-s390.c | 3 ++-
arch/x86/kvm/x86.c | 11 ++++++-----
include/linux/kvm_host.h | 2 +-
virt/kvm/kvm_main.c | 15 ++++++++++++---
9 files changed, 31 insertions(+), 18 deletions(-)
diff --git a/arch/loongarch/kvm/mmu.c b/arch/loongarch/kvm/mmu.c
index 28681dfb4b85..825c60d35529 100644
--- a/arch/loongarch/kvm/mmu.c
+++ b/arch/loongarch/kvm/mmu.c
@@ -943,8 +943,9 @@ int kvm_handle_mm_fault(struct kvm_vcpu *vcpu, unsigned long gpa, bool write)
return 0;
}
-void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
+int kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
{
+ return 0;
}
void kvm_arch_flush_remote_tlbs_memslot(struct kvm *kvm,
diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index edc6f473af4e..4326b8c721e9 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -984,9 +984,9 @@ int kvm_arch_disable_dirty_logging(struct kvm *kvm, const struct kvm_memory_slot
return 0;
}
-void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
+int kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
{
-
+ return 0;
}
int kvm_arch_flush_remote_tlbs(struct kvm *kvm)
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 4c4a3ecc301c..aab6f5c62aee 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -854,9 +854,9 @@ int kvm_arch_disable_dirty_logging(struct kvm *kvm, const struct kvm_memory_slot
return 0;
}
-void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
+int kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
{
-
+ return 0;
}
int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index f263ebc8fa49..60629a320222 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -1824,9 +1824,9 @@ int kvm_arch_disable_dirty_logging(struct kvm *kvm, const struct kvm_memory_slot
return -EOPNOTSUPP;
}
-void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
+int kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
{
-
+ return -EOPNOTSUPP;
}
int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
index b63650f9b966..53ad23432b31 100644
--- a/arch/riscv/kvm/mmu.c
+++ b/arch/riscv/kvm/mmu.c
@@ -402,8 +402,9 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
gstage_wp_range(kvm, start, end);
}
-void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
+int kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
{
+ return 0;
}
void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free)
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index d6a8f7dbc644..5f1bb4bd4121 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -677,7 +677,7 @@ int kvm_arch_disable_dirty_logging(struct kvm *kvm, const struct kvm_memory_slot
return 0;
}
-void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
+int kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
{
int i;
gfn_t cur_gfn, last_gfn;
@@ -705,6 +705,7 @@ void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
return;
cond_resched();
}
+ return 0;
}
/* Section: vm related */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1be8bacfe2bd..e95e070c9bf3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6498,7 +6498,7 @@ int kvm_arch_disable_dirty_logging(struct kvm *kvm, const struct kvm_memory_slot
return 0;
}
-void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
+int kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
{
/*
@@ -6510,11 +6510,12 @@ void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
struct kvm_vcpu *vcpu;
unsigned long i;
- if (!kvm_x86_ops.cpu_dirty_log_size)
- return;
+ if (kvm_x86_ops.cpu_dirty_log_size) {
+ kvm_for_each_vcpu(i, vcpu, kvm)
+ kvm_vcpu_kick(vcpu);
+ }
- kvm_for_each_vcpu(i, vcpu, kvm)
- kvm_vcpu_kick(vcpu);
+ return 0;
}
int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_event,
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ae905f54ec47..245b4172a7fb 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1477,7 +1477,7 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
unsigned long mask);
int kvm_arch_enable_dirty_logging(struct kvm *kvm, const struct kvm_memory_slot *memslot);
int kvm_arch_disable_dirty_logging(struct kvm *kvm, const struct kvm_memory_slot *memslot);
-void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot);
+int kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot);
#ifndef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 1fd5e234c188..d55d92f599b0 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2145,6 +2145,7 @@ int kvm_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log,
int i, as_id, id;
unsigned long n;
unsigned long any = 0;
+ int r;
/* Dirty ring tracking may be exclusive to dirty log tracking */
if (!kvm_use_dirty_bitmap(kvm))
@@ -2163,7 +2164,9 @@ int kvm_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log,
if (!(*memslot) || !(*memslot)->dirty_bitmap)
return -ENOENT;
- kvm_arch_sync_dirty_log(kvm, *memslot);
+ r = kvm_arch_sync_dirty_log(kvm, *memslot);
+ if (r)
+ return r;
n = kvm_dirty_bitmap_bytes(*memslot);
@@ -2210,6 +2213,7 @@ static int kvm_get_dirty_log_protect(struct kvm *kvm, struct kvm_dirty_log *log)
unsigned long *dirty_bitmap;
unsigned long *dirty_bitmap_buffer;
bool flush;
+ int r;
/* Dirty ring tracking may be exclusive to dirty log tracking */
if (!kvm_use_dirty_bitmap(kvm))
@@ -2227,7 +2231,9 @@ static int kvm_get_dirty_log_protect(struct kvm *kvm, struct kvm_dirty_log *log)
dirty_bitmap = memslot->dirty_bitmap;
- kvm_arch_sync_dirty_log(kvm, memslot);
+ r = kvm_arch_sync_dirty_log(kvm, memslot);
+ if (r)
+ return r;
n = kvm_dirty_bitmap_bytes(memslot);
flush = false;
@@ -2322,6 +2328,7 @@ static int kvm_clear_dirty_log_protect(struct kvm *kvm,
unsigned long *dirty_bitmap;
unsigned long *dirty_bitmap_buffer;
bool flush;
+ int r;
/* Dirty ring tracking may be exclusive to dirty log tracking */
if (!kvm_use_dirty_bitmap(kvm))
@@ -2349,7 +2356,9 @@ static int kvm_clear_dirty_log_protect(struct kvm *kvm,
(log->num_pages < memslot->npages - log->first_page && (log->num_pages & 63)))
return -EINVAL;
- kvm_arch_sync_dirty_log(kvm, memslot);
+ r = kvm_arch_sync_dirty_log(kvm, memslot);
+ if (r)
+ return r;
flush = false;
dirty_bitmap_buffer = kvm_second_dirty_bitmap(memslot);
--
2.40.1
* Re: [PATCH 4/8] KVM: return value from kvm_arch_sync_dirty_log
2024-09-18 15:28 ` [PATCH 4/8] KVM: return value from kvm_arch_sync_dirty_log Lilit Janpoladyan
@ 2024-09-19 1:50 ` kernel test robot
2024-09-19 2:32 ` kernel test robot
1 sibling, 0 replies; 16+ messages in thread
From: kernel test robot @ 2024-09-19 1:50 UTC (permalink / raw)
To: Lilit Janpoladyan, kvm, maz, oliver.upton, james.morse,
suzuki.poulose, yuzenghui, nh-open-source
Cc: oe-kbuild-all
Hi Lilit,
kernel test robot noticed the following build errors:
[auto build test ERROR on powerpc/topic/ppc-kvm]
[also build test ERROR on v6.11]
[cannot apply to kvmarm/next kvm/queue linus/master kvm/linux-next next-20240918]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Lilit-Janpoladyan/arm64-add-an-interface-for-stage-2-page-tracking/20240918-233004
base: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git topic/ppc-kvm
patch link: https://lore.kernel.org/r/20240918152807.25135-5-lilitj%40amazon.com
patch subject: [PATCH 4/8] KVM: return value from kvm_arch_sync_dirty_log
config: s390-allyesconfig (https://download.01.org/0day-ci/archive/20240919/202409190941.BQaCqQSk-lkp@intel.com/config)
compiler: s390-linux-gcc (GCC) 14.1.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240919/202409190941.BQaCqQSk-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202409190941.BQaCqQSk-lkp@intel.com/
All errors (new ones prefixed by >>):
arch/s390/kvm/kvm-s390.c: In function 'kvm_arch_sync_dirty_log':
>> arch/s390/kvm/kvm-s390.c:705:25: error: 'return' with no value, in function returning non-void [-Wreturn-mismatch]
705 | return;
| ^~~~~~
arch/s390/kvm/kvm-s390.c:680:5: note: declared here
680 | int kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
| ^~~~~~~~~~~~~~~~~~~~~~~
vim +/return +705 arch/s390/kvm/kvm-s390.c
5b5865e81387b9 Lilit Janpoladyan 2024-09-18 679
522a3b6f0285f5 Lilit Janpoladyan 2024-09-18 680 int kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
15f36ebd34b5b2 Jason J. Herne 2012-08-02 681 {
0959e168678d2d Janosch Frank 2018-07-17 682 int i;
15f36ebd34b5b2 Jason J. Herne 2012-08-02 683 gfn_t cur_gfn, last_gfn;
0959e168678d2d Janosch Frank 2018-07-17 684 unsigned long gaddr, vmaddr;
15f36ebd34b5b2 Jason J. Herne 2012-08-02 685 struct gmap *gmap = kvm->arch.gmap;
0959e168678d2d Janosch Frank 2018-07-17 686 DECLARE_BITMAP(bitmap, _PAGE_ENTRIES);
15f36ebd34b5b2 Jason J. Herne 2012-08-02 687
0959e168678d2d Janosch Frank 2018-07-17 688 /* Loop over all guest segments */
0959e168678d2d Janosch Frank 2018-07-17 689 cur_gfn = memslot->base_gfn;
15f36ebd34b5b2 Jason J. Herne 2012-08-02 690 last_gfn = memslot->base_gfn + memslot->npages;
0959e168678d2d Janosch Frank 2018-07-17 691 for (; cur_gfn <= last_gfn; cur_gfn += _PAGE_ENTRIES) {
0959e168678d2d Janosch Frank 2018-07-17 692 gaddr = gfn_to_gpa(cur_gfn);
0959e168678d2d Janosch Frank 2018-07-17 693 vmaddr = gfn_to_hva_memslot(memslot, cur_gfn);
0959e168678d2d Janosch Frank 2018-07-17 694 if (kvm_is_error_hva(vmaddr))
0959e168678d2d Janosch Frank 2018-07-17 695 continue;
0959e168678d2d Janosch Frank 2018-07-17 696
0959e168678d2d Janosch Frank 2018-07-17 697 bitmap_zero(bitmap, _PAGE_ENTRIES);
0959e168678d2d Janosch Frank 2018-07-17 698 gmap_sync_dirty_log_pmd(gmap, bitmap, gaddr, vmaddr);
0959e168678d2d Janosch Frank 2018-07-17 699 for (i = 0; i < _PAGE_ENTRIES; i++) {
0959e168678d2d Janosch Frank 2018-07-17 700 if (test_bit(i, bitmap))
0959e168678d2d Janosch Frank 2018-07-17 701 mark_page_dirty(kvm, cur_gfn + i);
0959e168678d2d Janosch Frank 2018-07-17 702 }
15f36ebd34b5b2 Jason J. Herne 2012-08-02 703
1763f8d09d522b Christian Borntraeger 2016-02-03 704 if (fatal_signal_pending(current))
1763f8d09d522b Christian Borntraeger 2016-02-03 @705 return;
70c88a00fbf659 Christian Borntraeger 2016-02-02 706 cond_resched();
15f36ebd34b5b2 Jason J. Herne 2012-08-02 707 }
522a3b6f0285f5 Lilit Janpoladyan 2024-09-18 708 return 0;
15f36ebd34b5b2 Jason J. Herne 2012-08-02 709 }
15f36ebd34b5b2 Jason J. Herne 2012-08-02 710
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
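Note on the error reported above: once kvm_arch_sync_dirty_log() returns
int, every exit path has to return a value, including the early exit taken
when a fatal signal is pending. The following is a minimal userspace sketch
of that loop shape, not the s390 fix itself. All sketch_* names are
hypothetical stand-ins, and returning -EINTR is an assumption; the thread
does not say which value the s390 code should return, and returning 0 would
keep the previous behaviour of silently stopping.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's fatal_signal_pending(current). */
bool sketch_signal_pending;

/* Counts how many segments the loop processed (for illustration only). */
int sketch_segments_synced;

/*
 * Loop shape after the void -> int conversion: every exit path returns a
 * value. -EINTR on a fatal signal is an assumption, not taken from the
 * thread.
 */
int sketch_sync_dirty_log(int nr_segments)
{
	assert(nr_segments >= 0);

	for (int seg = 0; seg < nr_segments; seg++) {
		sketch_segments_synced++;	/* "sync" one segment */

		if (sketch_signal_pending)
			return -EINTR;		/* was a bare "return;" */
	}
	return 0;
}
```

With this shape the generic callers in virt/kvm/kvm_main.c would see a
non-zero result and stop before copying a partially synced bitmap to
userspace.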
* Re: [PATCH 4/8] KVM: return value from kvm_arch_sync_dirty_log
2024-09-18 15:28 ` [PATCH 4/8] KVM: return value from kvm_arch_sync_dirty_log Lilit Janpoladyan
2024-09-19 1:50 ` kernel test robot
@ 2024-09-19 2:32 ` kernel test robot
1 sibling, 0 replies; 16+ messages in thread
From: kernel test robot @ 2024-09-19 2:32 UTC (permalink / raw)
To: Lilit Janpoladyan, kvm, maz, oliver.upton, james.morse,
suzuki.poulose, yuzenghui, nh-open-source
Cc: llvm, oe-kbuild-all
Hi Lilit,
kernel test robot noticed the following build warnings:
[auto build test WARNING on powerpc/topic/ppc-kvm]
[also build test WARNING on v6.11]
[cannot apply to kvmarm/next kvm/queue linus/master kvm/linux-next next-20240918]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Lilit-Janpoladyan/arm64-add-an-interface-for-stage-2-page-tracking/20240918-233004
base: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git topic/ppc-kvm
patch link: https://lore.kernel.org/r/20240918152807.25135-5-lilitj%40amazon.com
patch subject: [PATCH 4/8] KVM: return value from kvm_arch_sync_dirty_log
config: s390-allmodconfig (https://download.01.org/0day-ci/archive/20240919/202409191039.OFrXIvns-lkp@intel.com/config)
compiler: clang version 20.0.0git (https://github.com/llvm/llvm-project 8663a75fa2f31299ab8d1d90288d9df92aadee88)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240919/202409191039.OFrXIvns-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202409191039.OFrXIvns-lkp@intel.com/
All warnings (new ones prefixed by >>):
In file included from arch/s390/kvm/kvm-s390.c:22:
In file included from include/linux/kvm_host.h:16:
In file included from include/linux/mm.h:2228:
include/linux/vmstat.h:500:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
500 | return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
| ~~~~~~~~~~~~~~~~~~~~~ ^
501 | item];
| ~~~~
include/linux/vmstat.h:507:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
507 | return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
| ~~~~~~~~~~~~~~~~~~~~~ ^
508 | NR_VM_NUMA_EVENT_ITEMS +
| ~~~~~~~~~~~~~~~~~~~~~~
include/linux/vmstat.h:514:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
514 | return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
| ~~~~~~~~~~~ ^ ~~~
include/linux/vmstat.h:519:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
519 | return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
| ~~~~~~~~~~~~~~~~~~~~~ ^
520 | NR_VM_NUMA_EVENT_ITEMS +
| ~~~~~~~~~~~~~~~~~~~~~~
include/linux/vmstat.h:528:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
528 | return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
| ~~~~~~~~~~~~~~~~~~~~~ ^
529 | NR_VM_NUMA_EVENT_ITEMS +
| ~~~~~~~~~~~~~~~~~~~~~~
In file included from arch/s390/kvm/kvm-s390.c:22:
In file included from include/linux/kvm_host.h:19:
In file included from include/linux/msi.h:24:
In file included from include/linux/irq.h:20:
In file included from include/linux/io.h:14:
In file included from arch/s390/include/asm/io.h:93:
include/asm-generic/io.h:548:31: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
548 | val = __raw_readb(PCI_IOBASE + addr);
| ~~~~~~~~~~ ^
include/asm-generic/io.h:561:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
561 | val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr));
| ~~~~~~~~~~ ^
include/uapi/linux/byteorder/big_endian.h:37:59: note: expanded from macro '__le16_to_cpu'
37 | #define __le16_to_cpu(x) __swab16((__force __u16)(__le16)(x))
| ^
include/uapi/linux/swab.h:102:54: note: expanded from macro '__swab16'
102 | #define __swab16(x) (__u16)__builtin_bswap16((__u16)(x))
| ^
In file included from arch/s390/kvm/kvm-s390.c:22:
In file included from include/linux/kvm_host.h:19:
In file included from include/linux/msi.h:24:
In file included from include/linux/irq.h:20:
In file included from include/linux/io.h:14:
In file included from arch/s390/include/asm/io.h:93:
include/asm-generic/io.h:574:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
574 | val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
| ~~~~~~~~~~ ^
include/uapi/linux/byteorder/big_endian.h:35:59: note: expanded from macro '__le32_to_cpu'
35 | #define __le32_to_cpu(x) __swab32((__force __u32)(__le32)(x))
| ^
include/uapi/linux/swab.h:115:54: note: expanded from macro '__swab32'
115 | #define __swab32(x) (__u32)__builtin_bswap32((__u32)(x))
| ^
In file included from arch/s390/kvm/kvm-s390.c:22:
In file included from include/linux/kvm_host.h:19:
In file included from include/linux/msi.h:24:
In file included from include/linux/irq.h:20:
In file included from include/linux/io.h:14:
In file included from arch/s390/include/asm/io.h:93:
include/asm-generic/io.h:585:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
585 | __raw_writeb(value, PCI_IOBASE + addr);
| ~~~~~~~~~~ ^
include/asm-generic/io.h:595:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
595 | __raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr);
| ~~~~~~~~~~ ^
include/asm-generic/io.h:605:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
605 | __raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr);
| ~~~~~~~~~~ ^
include/asm-generic/io.h:693:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
693 | readsb(PCI_IOBASE + addr, buffer, count);
| ~~~~~~~~~~ ^
include/asm-generic/io.h:701:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
701 | readsw(PCI_IOBASE + addr, buffer, count);
| ~~~~~~~~~~ ^
include/asm-generic/io.h:709:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
709 | readsl(PCI_IOBASE + addr, buffer, count);
| ~~~~~~~~~~ ^
include/asm-generic/io.h:718:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
718 | writesb(PCI_IOBASE + addr, buffer, count);
| ~~~~~~~~~~ ^
include/asm-generic/io.h:727:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
727 | writesw(PCI_IOBASE + addr, buffer, count);
| ~~~~~~~~~~ ^
include/asm-generic/io.h:736:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
736 | writesl(PCI_IOBASE + addr, buffer, count);
| ~~~~~~~~~~ ^
>> arch/s390/kvm/kvm-s390.c:705:4: warning: non-void function 'kvm_arch_sync_dirty_log' should return a value [-Wreturn-mismatch]
705 | return;
| ^
18 warnings generated.
vim +/kvm_arch_sync_dirty_log +705 arch/s390/kvm/kvm-s390.c
5b5865e81387b9 Lilit Janpoladyan 2024-09-18 679
522a3b6f0285f5 Lilit Janpoladyan 2024-09-18 680 int kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
15f36ebd34b5b2 Jason J. Herne 2012-08-02 681 {
0959e168678d2d Janosch Frank 2018-07-17 682 int i;
15f36ebd34b5b2 Jason J. Herne 2012-08-02 683 gfn_t cur_gfn, last_gfn;
0959e168678d2d Janosch Frank 2018-07-17 684 unsigned long gaddr, vmaddr;
15f36ebd34b5b2 Jason J. Herne 2012-08-02 685 struct gmap *gmap = kvm->arch.gmap;
0959e168678d2d Janosch Frank 2018-07-17 686 DECLARE_BITMAP(bitmap, _PAGE_ENTRIES);
15f36ebd34b5b2 Jason J. Herne 2012-08-02 687
0959e168678d2d Janosch Frank 2018-07-17 688 /* Loop over all guest segments */
0959e168678d2d Janosch Frank 2018-07-17 689 cur_gfn = memslot->base_gfn;
15f36ebd34b5b2 Jason J. Herne 2012-08-02 690 last_gfn = memslot->base_gfn + memslot->npages;
0959e168678d2d Janosch Frank 2018-07-17 691 for (; cur_gfn <= last_gfn; cur_gfn += _PAGE_ENTRIES) {
0959e168678d2d Janosch Frank 2018-07-17 692 gaddr = gfn_to_gpa(cur_gfn);
0959e168678d2d Janosch Frank 2018-07-17 693 vmaddr = gfn_to_hva_memslot(memslot, cur_gfn);
0959e168678d2d Janosch Frank 2018-07-17 694 if (kvm_is_error_hva(vmaddr))
0959e168678d2d Janosch Frank 2018-07-17 695 continue;
0959e168678d2d Janosch Frank 2018-07-17 696
0959e168678d2d Janosch Frank 2018-07-17 697 bitmap_zero(bitmap, _PAGE_ENTRIES);
0959e168678d2d Janosch Frank 2018-07-17 698 gmap_sync_dirty_log_pmd(gmap, bitmap, gaddr, vmaddr);
0959e168678d2d Janosch Frank 2018-07-17 699 for (i = 0; i < _PAGE_ENTRIES; i++) {
0959e168678d2d Janosch Frank 2018-07-17 700 if (test_bit(i, bitmap))
0959e168678d2d Janosch Frank 2018-07-17 701 mark_page_dirty(kvm, cur_gfn + i);
0959e168678d2d Janosch Frank 2018-07-17 702 }
15f36ebd34b5b2 Jason J. Herne 2012-08-02 703
1763f8d09d522b Christian Borntraeger 2016-02-03 704 if (fatal_signal_pending(current))
1763f8d09d522b Christian Borntraeger 2016-02-03 @705 return;
70c88a00fbf659 Christian Borntraeger 2016-02-02 706 cond_resched();
15f36ebd34b5b2 Jason J. Herne 2012-08-02 707 }
522a3b6f0285f5 Lilit Janpoladyan 2024-09-18 708 return 0;
15f36ebd34b5b2 Jason J. Herne 2012-08-02 709 }
15f36ebd34b5b2 Jason J. Herne 2012-08-02 710
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* [PATCH 5/8] KVM: arm64: get dirty pages from the page tracking device
2024-09-18 15:27 [PATCH 0/8] *** RFC: ARM KVM dirty tracking device *** Lilit Janpoladyan
` (3 preceding siblings ...)
2024-09-18 15:28 ` [PATCH 4/8] KVM: return value from kvm_arch_sync_dirty_log Lilit Janpoladyan
@ 2024-09-18 15:28 ` Lilit Janpoladyan
2024-09-18 15:28 ` [PATCH 6/8] KVM: arm64: flush dirty logging data Lilit Janpoladyan
` (3 subsequent siblings)
8 siblings, 0 replies; 16+ messages in thread
From: Lilit Janpoladyan @ 2024-09-18 15:28 UTC (permalink / raw)
To: kvm, maz, oliver.upton, james.morse, suzuki.poulose, yuzenghui,
nh-open-source, lilitj
If a page tracking device is available, use it in
kvm_arch_sync_dirty_log() to read the device dirty log and mark the
logged pages as dirty. Allocate one page to use as a buffer for reading
the device dirty log. Allocation of and access to this page are not
synchronized: the code assumes that userspace does not enable or
disable dirty logging while it is reading the dirty log.
Signed-off-by: Lilit Janpoladyan <lilitj@amazon.com>
---
arch/arm64/include/asm/kvm_host.h | 3 +-
arch/arm64/kvm/arm.c | 46 ++++++++++++++++++++++++++++++-
2 files changed, 47 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index db9bf42123e1..a76f25d4d2bc 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -379,9 +379,10 @@ struct kvm_arch {
struct kvm_protected_vm pkvm;
/*
- * Stores page tracking context if page tracking device is in use
+ * Stores page tracking context and buffer if page tracking device is in use
*/
void *page_tracking_ctx;
+ gpa_t *page_tracking_pg;
};
struct kvm_vcpu_fault_info {
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index c8dcf719ee99..139d7e929266 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1822,6 +1822,7 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
int kvm_arch_enable_dirty_logging(struct kvm *kvm, const struct kvm_memory_slot *memslot)
{
void *ctx = NULL;
+ unsigned long read_buffer = 0;
struct pt_config config;
int r;
@@ -1836,16 +1837,31 @@ int kvm_arch_enable_dirty_logging(struct kvm *kvm, const struct kvm_memory_slot
return -ENOENT;
kvm->arch.page_tracking_ctx = ctx;
+
+ read_buffer = __get_free_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+ if (!read_buffer) {
+ r = -ENOMEM;
+ goto out_free;
+ }
+
+ kvm->arch.page_tracking_pg = (gpa_t *)read_buffer;
}
r = page_tracking_enable(kvm->arch.page_tracking_ctx, -1);
+out_free:
if (r) {
if (ctx) {
page_tracking_release(ctx);
kvm->arch.page_tracking_ctx = NULL;
}
+
+ if (read_buffer) {
+ free_page(read_buffer);
+ kvm->arch.page_tracking_pg = NULL;
+ }
}
+
return r;
}
@@ -1866,13 +1882,41 @@ int kvm_arch_disable_dirty_logging(struct kvm *kvm, const struct kvm_memory_slot
} else {
page_tracking_release(kvm->arch.page_tracking_ctx);
kvm->arch.page_tracking_ctx = NULL;
+
+ if (kvm->arch.page_tracking_pg) {
+ free_page((unsigned long)kvm->arch.page_tracking_pg);
+ kvm->arch.page_tracking_pg = NULL;
+ }
}
return r;
}
-void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
+int kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
{
+ int r;
+ u32 i;
+ u32 max_pages = PAGE_SIZE/sizeof(gpa_t);
+ if (!kvm->arch.page_tracking_ctx)
+ return 0;
+
+ if (!kvm->arch.page_tracking_pg)
+ return -ENOENT;
+
+ r = page_tracking_read_dirty_pages(kvm->arch.page_tracking_ctx, -1,
+ kvm->arch.page_tracking_pg, max_pages);
+
+ while (r > 0) {
+ for (i = 0; i < r; ++i) {
+ gfn_t gfn = kvm->arch.page_tracking_pg[i] >> PAGE_SHIFT;
+
+ memslot = gfn_to_memslot(kvm, gfn);
+ mark_page_dirty_in_slot(kvm, memslot, gfn);
+ }
+ r = page_tracking_read_dirty_pages(kvm->arch.page_tracking_ctx, -1,
+ kvm->arch.page_tracking_pg, max_pages);
+ }
+ return r < 0 ? r : 0;
}
static int kvm_vm_ioctl_set_device_addr(struct kvm *kvm,
--
2.40.1
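For readers following the series: the core of kvm_arch_sync_dirty_log() in
this patch is a drain loop that reads batches of guest physical addresses
into a one-page buffer until the device reports nothing left, marking each
page dirty along the way. Below is a minimal userspace sketch of that
pattern under stated assumptions. The sketch_* names, the in-memory "device
log" and the tiny buffer size are all hypothetical;
sketch_read_dirty_pages() only imitates the contract that
page_tracking_read_dirty_pages() appears to have in the patch (positive
count of entries, 0 when drained, negative on error).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT  12
#define SKETCH_BUF_ENTRIES 4	/* deliberately small to force several batches */
#define SKETCH_MAX_GFN     64

/* Hypothetical device-side log of dirtied guest physical addresses. */
uint64_t sketch_dev_log[16];
size_t sketch_dev_len, sketch_dev_pos;

/* One byte per guest frame; stands in for the memslot dirty bitmap. */
unsigned char sketch_dirty[SKETCH_MAX_GFN];

/* Copy up to max logged addresses into buf; return how many were copied. */
int sketch_read_dirty_pages(uint64_t *buf, uint32_t max)
{
	uint32_t n = 0;

	while (n < max && sketch_dev_pos < sketch_dev_len)
		buf[n++] = sketch_dev_log[sketch_dev_pos++];
	return (int)n;
}

/* Drain the device log batch by batch and mark every logged frame dirty. */
int sketch_sync_dirty_log(void)
{
	uint64_t buf[SKETCH_BUF_ENTRIES];
	int r;

	while ((r = sketch_read_dirty_pages(buf, SKETCH_BUF_ENTRIES)) > 0) {
		for (int i = 0; i < r; i++) {
			/* Keep the frame number 64-bit wide to avoid truncation. */
			uint64_t gfn = buf[i] >> SKETCH_PAGE_SHIFT;

			assert(gfn < SKETCH_MAX_GFN);
			sketch_dirty[gfn] = 1;
		}
	}
	return r < 0 ? r : 0;	/* propagate a device error, else success */
}
```

The loop ends only when a read returns 0 or an error, so a log larger than
one buffer is still consumed completely in a single sync call.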
* [PATCH 6/8] KVM: arm64: flush dirty logging data
2024-09-18 15:27 [PATCH 0/8] *** RFC: ARM KVM dirty tracking device *** Lilit Janpoladyan
` (4 preceding siblings ...)
2024-09-18 15:28 ` [PATCH 5/8] KVM: arm64: get dirty pages from the page tracking device Lilit Janpoladyan
@ 2024-09-18 15:28 ` Lilit Janpoladyan
2024-09-18 15:28 ` [PATCH 7/8] KVM: arm64: enable hardware dirty state management for stage-2 Lilit Janpoladyan
` (2 subsequent siblings)
8 siblings, 0 replies; 16+ messages in thread
From: Lilit Janpoladyan @ 2024-09-18 15:28 UTC (permalink / raw)
To: kvm, maz, oliver.upton, james.morse, suzuki.poulose, yuzenghui,
nh-open-source, lilitj
Flush the page tracking device after disabling dirty logging so that
the last dirty pages are not missed. Flush only when dirty logging is
actually disabled, i.e. when page_tracking_disable() returns 0.
Signed-off-by: Lilit Janpoladyan <lilitj@amazon.com>
---
arch/arm64/kvm/arm.c | 22 +++++++++++++---------
1 file changed, 13 insertions(+), 9 deletions(-)
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 139d7e929266..5ed049accb3e 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1877,17 +1877,21 @@ int kvm_arch_disable_dirty_logging(struct kvm *kvm, const struct kvm_memory_slot
r = page_tracking_disable(kvm->arch.page_tracking_ctx, -1);
- if (r == -EBUSY) {
- r = 0;
- } else {
- page_tracking_release(kvm->arch.page_tracking_ctx);
- kvm->arch.page_tracking_ctx = NULL;
+ if (r == -EBUSY)
+ return 0;
- if (kvm->arch.page_tracking_pg) {
- free_page((unsigned long)kvm->arch.page_tracking_pg);
- kvm->arch.page_tracking_pg = NULL;
- }
+ /* Flush only when dirty tracking is disabled */
+ if (!r)
+ r = page_tracking_flush(kvm->arch.page_tracking_ctx);
+
+ /* But release resources anyway */
+ page_tracking_release(kvm->arch.page_tracking_ctx);
+ kvm->arch.page_tracking_ctx = NULL;
+ if (kvm->arch.page_tracking_pg) {
+ free_page((unsigned long)kvm->arch.page_tracking_pg);
+ kvm->arch.page_tracking_pg = NULL;
}
+
return r;
}
--
2.40.1
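The ordering this patch establishes in kvm_arch_disable_dirty_logging() can
be summarised as: disable, return early on -EBUSY, flush only if the disable
succeeded, then release resources regardless. A minimal userspace sketch of
that control flow follows. The struct and the sketch_* helpers are
hypothetical mocks, not the page_tracking_* API; reading -EBUSY as "tracking
is still in use elsewhere" is an assumption based on the patch keeping the
context alive in that case.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mock of a tracking context with scripted results. */
struct sketch_ctx {
	int disable_result;	/* what the mock "disable" call returns */
	int flush_result;	/* what the mock "flush" call returns */
	bool flushed;
	bool released;
};

int sketch_disable(struct sketch_ctx *c) { return c->disable_result; }
int sketch_flush(struct sketch_ctx *c) { c->flushed = true; return c->flush_result; }
void sketch_release(struct sketch_ctx *c) { c->released = true; }

int sketch_disable_dirty_logging(struct sketch_ctx *c)
{
	int r;

	assert(c != NULL);

	r = sketch_disable(c);
	if (r == -EBUSY)
		return 0;		/* still in use: keep everything */

	if (!r)
		r = sketch_flush(c);	/* flush only after a real disable */

	sketch_release(c);		/* but release resources anyway */
	return r;
}
```

One consequence worth noting: if the flush itself fails, the error is
returned to the caller, but the context has already been released, so the
flush cannot be retried.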
* [PATCH 7/8] KVM: arm64: enable hardware dirty state management for stage-2
2024-09-18 15:27 [PATCH 0/8] *** RFC: ARM KVM dirty tracking device *** Lilit Janpoladyan
` (5 preceding siblings ...)
2024-09-18 15:28 ` [PATCH 6/8] KVM: arm64: flush dirty logging data Lilit Janpoladyan
@ 2024-09-18 15:28 ` Lilit Janpoladyan
2024-09-18 15:28 ` [PATCH 8/8] KVM: arm64: make hardware manage dirty state after write faults Lilit Janpoladyan
2024-09-19 9:11 ` [PATCH 0/8] *** RFC: ARM KVM dirty tracking device *** Oliver Upton
8 siblings, 0 replies; 16+ messages in thread
From: Lilit Janpoladyan @ 2024-09-18 15:28 UTC (permalink / raw)
To: kvm, maz, oliver.upton, james.morse, suzuki.poulose, yuzenghui,
nh-open-source, lilitj
Enable hardware management of dirty state for stage-2 translations
by setting VTCR_EL2.HD unconditionally. On its own this does not make
hardware manage dirty state: a page descriptor is a candidate for
hardware dirty state updates only when its DBM bit (bit 51) is set,
and by default page descriptors are created with DBM = 0.
Signed-off-by: Lilit Janpoladyan <lilitj@amazon.com>
---
arch/arm64/kvm/hyp/pgtable.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 9e2bbee77491..d507931ab10c 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -658,7 +658,7 @@ u64 kvm_get_vtcr(u64 mmfr0, u64 mmfr1, u32 phys_shift)
#ifdef CONFIG_ARM64_HW_AFDBM
/*
- * Enable the Hardware Access Flag management, unconditionally
+ * Enable the Hardware Access Flag and Dirty State management, unconditionally
* on all CPUs. In systems that have asymmetric support for the feature
* this allows KVM to leverage hardware support on the subset of cores
* that implement the feature.
@@ -669,8 +669,10 @@ u64 kvm_get_vtcr(u64 mmfr0, u64 mmfr1, u32 phys_shift)
* happen to be running on a design that has unadvertised support for
* HAFDBS. Here be dragons.
*/
- if (!cpus_have_final_cap(ARM64_WORKAROUND_AMPERE_AC03_CPU_38))
+ if (!cpus_have_final_cap(ARM64_WORKAROUND_AMPERE_AC03_CPU_38)) {
vtcr |= VTCR_EL2_HA;
+ vtcr |= VTCR_EL2_HD;
+ }
#endif /* CONFIG_ARM64_HW_AFDBM */
if (kvm_lpa2_is_enabled())
--
2.40.1
* [PATCH 8/8] KVM: arm64: make hardware manage dirty state after write faults
2024-09-18 15:27 [PATCH 0/8] *** RFC: ARM KVM dirty tracking device *** Lilit Janpoladyan
` (6 preceding siblings ...)
2024-09-18 15:28 ` [PATCH 7/8] KVM: arm64: enable hardware dirty state management for stage-2 Lilit Janpoladyan
@ 2024-09-18 15:28 ` Lilit Janpoladyan
2024-09-19 9:11 ` [PATCH 0/8] *** RFC: ARM KVM dirty tracking device *** Oliver Upton
8 siblings, 0 replies; 16+ messages in thread
From: Lilit Janpoladyan @ 2024-09-18 15:28 UTC (permalink / raw)
To: kvm, maz, oliver.upton, james.morse, suzuki.poulose, yuzenghui,
nh-open-source, lilitj
With hardware dirty logging, fault in pages with their dirty state
managed by hardware. Further writes to the faulted-in pages are then
logged by the page tracking device. The first write is still logged
on the write fault. Avoiding faults on first writes requires setting
the DBM bit when eagerly splitting huge pages (to be added).
Add KVM_PTE_LEAF_ATTR_HI_S2_DBM for the hardware DBM flag and
KVM_PGTABLE_PROT_HWDBM as a software page protection flag.
Hardware dirty state management changes how
KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W is interpreted: pages whose dirty state
is managed by the hardware are always writable, and the
KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W bit denotes their dirty state.
Signed-off-by: Lilit Janpoladyan <lilitj@amazon.com>
---
arch/arm64/include/asm/kvm_pgtable.h | 1 +
arch/arm64/kvm/hyp/pgtable.c | 23 ++++++++++++++++++++---
arch/arm64/kvm/mmu.c | 8 ++++++++
3 files changed, 29 insertions(+), 3 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 19278dfe7978..d3b81d7e923b 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -210,6 +210,7 @@ enum kvm_pgtable_prot {
KVM_PGTABLE_PROT_DEVICE = BIT(3),
KVM_PGTABLE_PROT_NORMAL_NC = BIT(4),
+ KVM_PGTABLE_PROT_HWDBM = BIT(5),
KVM_PGTABLE_PROT_SW0 = BIT(55),
KVM_PGTABLE_PROT_SW1 = BIT(56),
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index d507931ab10c..c4d654e7189c 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -46,6 +46,8 @@
#define KVM_PTE_LEAF_ATTR_HI_S1_GP BIT(50)
+#define KVM_PTE_LEAF_ATTR_HI_S2_DBM BIT(51)
+
#define KVM_PTE_LEAF_ATTR_S2_PERMS (KVM_PTE_LEAF_ATTR_LO_S2_S2AP_R | \
KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W | \
KVM_PTE_LEAF_ATTR_HI_S2_XN)
@@ -746,7 +748,13 @@ static int stage2_set_prot_attr(struct kvm_pgtable *pgt, enum kvm_pgtable_prot p
if (prot & KVM_PGTABLE_PROT_R)
attr |= KVM_PTE_LEAF_ATTR_LO_S2_S2AP_R;
- if (prot & KVM_PGTABLE_PROT_W)
+ /*
+ * If hardware dirty state management is enabled then S2AP_W is interpreted
+ * as dirty state, don't set S2AP_W in this case
+ */
+ if (prot & KVM_PGTABLE_PROT_HWDBM)
+ attr |= KVM_PTE_LEAF_ATTR_HI_S2_DBM;
+ else if (prot & KVM_PGTABLE_PROT_W)
attr |= KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W;
if (!kvm_lpa2_is_enabled())
@@ -768,7 +776,10 @@ enum kvm_pgtable_prot kvm_pgtable_stage2_pte_prot(kvm_pte_t pte)
if (pte & KVM_PTE_LEAF_ATTR_LO_S2_S2AP_R)
prot |= KVM_PGTABLE_PROT_R;
- if (pte & KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W)
+
+ if (pte & KVM_PTE_LEAF_ATTR_HI_S2_DBM)
+ prot |= KVM_PGTABLE_PROT_HWDBM | KVM_PGTABLE_PROT_W;
+ else if (pte & KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W)
prot |= KVM_PGTABLE_PROT_W;
if (!(pte & KVM_PTE_LEAF_ATTR_HI_S2_XN))
prot |= KVM_PGTABLE_PROT_X;
@@ -1367,7 +1378,13 @@ int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
if (prot & KVM_PGTABLE_PROT_R)
set |= KVM_PTE_LEAF_ATTR_LO_S2_S2AP_R;
- if (prot & KVM_PGTABLE_PROT_W)
+ /*
+ * If hardware dirty state management is enabled then S2AP_W is interpreted
+ * as dirty state, don't set S2AP_W in this case
+ */
+ if (prot & KVM_PGTABLE_PROT_HWDBM)
+ set |= KVM_PTE_LEAF_ATTR_HI_S2_DBM;
+ else if (prot & KVM_PGTABLE_PROT_W)
set |= KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W;
if (prot & KVM_PGTABLE_PROT_X)
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index a509b63bd4dd..a5bcc7f11083 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1418,6 +1418,11 @@ static bool kvm_vma_mte_allowed(struct vm_area_struct *vma)
return vma->vm_flags & VM_MTE_ALLOWED;
}
+static bool is_hw_logging_enabled(struct kvm *kvm)
+{
+ return kvm->arch.page_tracking_ctx != NULL;
+}
+
static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
struct kvm_s2_trans *nested,
struct kvm_memory_slot *memslot, unsigned long hva,
@@ -1658,6 +1663,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
if (writable)
prot |= KVM_PGTABLE_PROT_W;
+ if (is_hw_logging_enabled(kvm))
+ prot |= KVM_PGTABLE_PROT_HWDBM;
+
if (exec_fault)
prot |= KVM_PGTABLE_PROT_X;
--
2.40.1
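To make the changed meaning of S2AP_W concrete, here is a minimal userspace
sketch of the two mappings this patch touches: software protection flags to
descriptor attributes, and a descriptor back to protection flags. It is a
sketch under stated assumptions, not the kernel code. DBM = bit 51 and
HWDBM = BIT(5) come from the patch; the bit positions used for S2AP_R,
S2AP_W and for the R/W protection flags are assumptions, and execute
permission is left out for brevity. All SKETCH_/sketch_ names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_BIT(n)		(UINT64_C(1) << (n))

/* Descriptor attribute bits (S2AP positions are assumptions). */
#define SKETCH_S2AP_R		SKETCH_BIT(6)
#define SKETCH_S2AP_W		SKETCH_BIT(7)
#define SKETCH_S2_DBM		SKETCH_BIT(51)	/* from the patch */

/* Software protection flags (R/W positions are assumptions). */
#define SKETCH_PROT_W		SKETCH_BIT(1)
#define SKETCH_PROT_R		SKETCH_BIT(2)
#define SKETCH_PROT_HWDBM	SKETCH_BIT(5)	/* from the patch */

/* Build descriptor attributes; with HWDBM, S2AP_W is left clear (clean). */
uint64_t sketch_prot_to_attr(uint64_t prot)
{
	uint64_t attr = 0;

	if (prot & SKETCH_PROT_R)
		attr |= SKETCH_S2AP_R;

	if (prot & SKETCH_PROT_HWDBM)
		attr |= SKETCH_S2_DBM;
	else if (prot & SKETCH_PROT_W)
		attr |= SKETCH_S2AP_W;

	/* A freshly built descriptor never has both DBM and S2AP_W set. */
	assert(!((attr & SKETCH_S2_DBM) && (attr & SKETCH_S2AP_W)));
	return attr;
}

/* Recover protection flags; a DBM page always reports as writable. */
uint64_t sketch_pte_to_prot(uint64_t pte)
{
	uint64_t prot = 0;

	if (pte & SKETCH_S2AP_R)
		prot |= SKETCH_PROT_R;

	if (pte & SKETCH_S2_DBM)
		prot |= SKETCH_PROT_HWDBM | SKETCH_PROT_W;
	else if (pte & SKETCH_S2AP_W)
		prot |= SKETCH_PROT_W;
	return prot;
}

/* With DBM set, S2AP_W records whether the page has been written. */
bool sketch_pte_is_hw_dirty(uint64_t pte)
{
	return (pte & SKETCH_S2_DBM) && (pte & SKETCH_S2AP_W);
}
```

In this reading a DBM descriptor starts out clean and hardware is expected
to set S2AP_W on the first write, which is why the mapping code leaves that
bit alone.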
* Re: [PATCH 0/8] *** RFC: ARM KVM dirty tracking device ***
2024-09-18 15:27 [PATCH 0/8] *** RFC: ARM KVM dirty tracking device *** Lilit Janpoladyan
` (7 preceding siblings ...)
2024-09-18 15:28 ` [PATCH 8/8] KVM: arm64: make hardware manage dirty state after write faults Lilit Janpoladyan
@ 2024-09-19 9:11 ` Oliver Upton
2024-09-20 10:12 ` Janpoladyan, Lilit
2024-09-26 10:00 ` David Woodhouse
8 siblings, 2 replies; 16+ messages in thread
From: Oliver Upton @ 2024-09-19 9:11 UTC (permalink / raw)
To: Lilit Janpoladyan
Cc: kvm, maz, james.morse, suzuki.poulose, yuzenghui, nh-open-source,
kvmarm
Hi Lilit,
+cc kvmarm mailing list, get_maintainer is your friend :)
On Wed, Sep 18, 2024 at 03:27:59PM +0000, Lilit Janpoladyan wrote:
> An example of a device that tracks accesses to stage-2 translations and will
> implement page_tracking_device interface is AWS Graviton Page Tracking Agent
> (PTA). We'll be posting code for the Graviton PTA device driver in a separate
> series of patches.
In order to actually review these patches, we need to see an
implementation of such a page tracking device. Otherwise it's hard to
tell that the interface accomplishes the right abstractions.
Beyond that, I have some reservations about maintaining support for
features that cannot actually be tested outside of your own environment.
> When ARM architectural solution (FEAT_HDBSS feature) is available, we intend to
> use it via the same interface most likely with adaptations.
Will the PTA stuff eventually get retired once you get support for FEAT_HDBSS
in hardware?
I think the best way forward here is to implement the architecture, and
hopefully after that your legacy driver can be made to fit the
interface. The FVP implements FEAT_HDBSS, so there's some (slow)
reference hardware to test against.
This is a very interesting feature, so hopefully we can move towards
something workable.
--
Thanks,
Oliver
* Re: [PATCH 0/8] *** RFC: ARM KVM dirty tracking device ***
2024-09-19 9:11 ` [PATCH 0/8] *** RFC: ARM KVM dirty tracking device *** Oliver Upton
@ 2024-09-20 10:12 ` Janpoladyan, Lilit
2024-09-26 10:00 ` David Woodhouse
1 sibling, 0 replies; 16+ messages in thread
From: Janpoladyan, Lilit @ 2024-09-20 10:12 UTC (permalink / raw)
To: Oliver Upton
Cc: kvm@vger.kernel.org, maz@kernel.org, james.morse@arm.com,
suzuki.poulose@arm.com, yuzenghui@huawei.com,
nh-open-source@amazon.com, kvmarm@lists.linux.dev, Zamler, Shiran
Hi Oliver,
On 19.09.24, 11:12, "Oliver Upton" <oliver.upton@linux.dev> wrote:
> Hi Lilit,
> +cc kvmarm mailing list, get_maintainer is your friend :)
> On Wed, Sep 18, 2024 at 03:27:59PM +0000, Lilit Janpoladyan wrote:
> > An example of a device that tracks accesses to stage-2 translations and will
> > implement page_tracking_device interface is AWS Graviton Page Tracking Agent
> > (PTA). We'll be posting code for the Graviton PTA device driver in a separate
> > series of patches.
> In order to actually review these patches, we need to see an
> implementation of such a page tracking device. Otherwise it's hard to
> tell that the interface accomplishes the right abstractions.
We'll be posting the driver patches in the coming weeks; they should explain
the device functionality.
> Beyond that, I have some reservations about maintaining support for
> features that cannot actually be tested outside of your own environment.
I understand. We'll see how we can emulate the functionality and make the
interface testable.
> > When ARM architectural solution (FEAT_HDBSS feature) is available, we intend to
> > use it via the same interface most likely with adaptations.
> Will the PTA stuff eventually get retired once you get support for FEAT_HDBSS
> in hardware?
We'd need to keep the interface for as long as hardware without FEAT_HDBSS
but with PTA is in use, hence the attempt at generalisation.
> I think the best way forward here is to implement the architecture, and
> hopefully after that your legacy driver can be made to fit the
> interface. The FVP implements FEAT_HDBSS, so there's some (slow)
> reference hardware to test against.
Thanks for the idea. We'll test with the FVP, but we'd need FEAT_HDBSS
documentation for that. I don't think it's available yet, is it?
> This is a very interesting feature, so hopefully we can move towards
> something workable.
> --
> Thanks,
> Oliver
Thanks for the feedback; we'll be working on the points discussed,
Lilit
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 0/8] *** RFC: ARM KVM dirty tracking device ***
2024-09-19 9:11 ` [PATCH 0/8] *** RFC: ARM KVM dirty tracking device *** Oliver Upton
2024-09-20 10:12 ` Janpoladyan, Lilit
@ 2024-09-26 10:00 ` David Woodhouse
2024-09-30 17:33 ` Oliver Upton
1 sibling, 1 reply; 16+ messages in thread
From: David Woodhouse @ 2024-09-26 10:00 UTC (permalink / raw)
To: Oliver Upton, Lilit Janpoladyan
Cc: kvm, maz, james.morse, suzuki.poulose, yuzenghui, nh-open-source,
kvmarm
On Thu, 2024-09-19 at 02:11 -0700, Oliver Upton wrote:
> Hi Lilit,
>
> +cc kvmarm mailing list, get_maintainer is your friend :)
>
> On Wed, Sep 18, 2024 at 03:27:59PM +0000, Lilit Janpoladyan wrote:
> > An example of a device that tracks accesses to stage-2 translations and will
> > implement page_tracking_device interface is AWS Graviton Page Tracking Agent
> > (PTA). We'll be posting code for the Graviton PTA device driver in a separate
> > series of patches.
>
> In order to actually review these patches, we need to see an
> implementation of such a page tracking device. Otherwise it's hard to
> tell that the interface accomplishes the right abstractions.
Absolutely. That one is coming soon, but I was chasing the team to post
the API and KVM glue parts as early as possible to kick-start the
discussion, especially about the upcoming architectural solution.
> Beyond that, I have some reservations about maintaining support for
> features that cannot actually be tested outside of your own environment.
That's more about the hardware driver itself which will follow, than
the core API posted here.
I understand the reservation, but I think it's fine. In general, Linux
does support esoteric hardware that not everyone can test every
time. We do sweeping changes across all Ethernet drivers, for example,
and some of those barely even exist any more.
This particular device should be available on bare metal EC2 instances,
of course, but perhaps we should also implement it in QEMU. That would
actually be beneficial for our internal testing anyway, as it would
allow us to catch regressions much earlier in our own development
process.
> > When ARM architectural solution (FEAT_HDBSS feature) is available, we intend to
> > use it via the same interface most likely with adaptations.
>
> Will the PTA stuff eventually get retired once you get support for FEAT_HDBSS
> in hardware?
I don't think there is a definitive answer to that which is ready to
tape out, but it certainly seems possible that future generations will
eventually move to FEAT_HDBSS, maybe even reaching production by the
end of the decade, at the earliest? And then a decade or two later, the
existing hardware generations might even get retired, yes¹.
¹ #include <forward-looking statement.disclaimer>
> I think the best way forward here is to implement the architecture, and
> hopefully after that your legacy driver can be made to fit the
> interface. The FVP implements FEAT_HDBSS, so there's some (slow)
> reference hardware to test against.
Is there actually any documentation available about FEAT_HDBSS? We've
been asking, but haven't received it. I can find one or two mentions
e.g. https://arm.jonpalmisc.com/2023_09_sysreg/AArch64-hdbssbr_el2 but
nothing particularly useful.
The main reason for posting this series early is to make sure we do all
we can to accommodate FEAT_HDBSS. It's not the *end* of the world if
the kernel-internal API has to be tweaked slightly when FEAT_HDBSS
actually becomes reality in future, but obviously we'd prefer to
support it right from the start.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 0/8] *** RFC: ARM KVM dirty tracking device ***
2024-09-26 10:00 ` David Woodhouse
@ 2024-09-30 17:33 ` Oliver Upton
0 siblings, 0 replies; 16+ messages in thread
From: Oliver Upton @ 2024-09-30 17:33 UTC (permalink / raw)
To: David Woodhouse
Cc: Lilit Janpoladyan, kvm, maz, james.morse, suzuki.poulose,
yuzenghui, nh-open-source, kvmarm
On Thu, Sep 26, 2024 at 11:00:39AM +0100, David Woodhouse wrote:
> > Beyond that, I have some reservations about maintaining support for
> > features that cannot actually be tested outside of your own environment.
>
> That's more about the hardware driver itself which will follow, than
> the core API posted here.
>
> I understand the reservation, but I think it's fine. In general, Linux
> does support esoteric hardware that not everyone can test every
> time. We do sweeping changes across all Ethernet drivers, for example,
> and some of those barely even exist any more.
Of course, but I think it is also reasonable to say that ethernet
support in the kernel is rather mature with a good variety of hardware.
By comparison, what we have here is a brand new driver interface with
architecture / KVM code, which is pretty rare, with a single
implementation.
I'm perfectly happy to tinker on page tracking interface(s) in the
future w/o testing everything, but I must insist that we have *some*
way of testing the initial infrastructure before even considering taking
it.
> This particular device should be available on bare metal EC2 instances,
> of course, but perhaps we should also implement it in QEMU. That would
> actually be beneficial for our internal testing anyway, as it would
> allow us to catch regressions much earlier in our own development
> process.
QEMU would be interesting, but hardware is always welcome too ;-)
> > > When ARM architectural solution (FEAT_HDBSS feature) is available, we intend to
> > > use it via the same interface most likely with adaptations.
> >
> > Will the PTA stuff eventually get retired once you get support for FEAT_HDBSS
> > in hardware?
>
> I don't think there is a definitive answer to that which is ready to
> tape out, but it certainly seems possible that future generations will
> eventually move to FEAT_HDBSS, maybe even reaching production by the
> end of the decade, at the earliest? And then a decade or two later, the
> existing hardware generations might even get retired, yes¹.
>
> ¹ #include <forward-looking statement.disclaimer>
Well, hopefully that means you folks will look after it then :)
> > I think the best way forward here is to implement the architecture, and
> > hopefully after that your legacy driver can be made to fit the
> > interface. The FVP implements FEAT_HDBSS, so there's some (slow)
> > reference hardware to test against.
>
> Is there actually any documentation available about FEAT_HDBSS? We've
> been asking, but haven't received it. I can find one or two mentions
> e.g. https://arm.jonpalmisc.com/2023_09_sysreg/AArch64-hdbssbr_el2 but
> nothing particularly useful.
Annoyingly no, the Arm ARM tends to lag the architecture by quite a bit.
The sysreg XML (from which I think this website is derived) gets updated
much more frequently.
> The main reason for posting this series early is to make sure we do all
> we can to accommodate FEAT_HDBSS. It's not the *end* of the world if
> the kernel-internal API has to be tweaked slightly when FEAT_HDBSS
> actually becomes reality in future, but obviously we'd prefer to
> support it right from the start.
Jury is still out on how FEAT_HDBSS is gonna fit with this PTA stuff.
I'm guessing your hardware has some way of disambiguating dirtied
addresses by VMID.
The architected solution, OTOH, is tied to a particular stage-2 MMU
configuration. KVM proper might need to manage the dirty tracking
hardware in that case as it'll need to be context switched on the
vcpu_load() / vcpu_put() boundary.
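A toy sketch of that distinction, not kernel code: it assumes one tracking
register per physical CPU and saved state per stage-2 MMU, and every name in
it (`hw_tracking_reg`, `toy_s2_mmu`, `toy_vcpu_load()`, `toy_vcpu_put()`) is
hypothetical. A VMID-filtered device could be programmed once per VM, whereas
state tied to the stage-2 MMU has to be swapped whenever a vCPU is scheduled
in or out:

```c
#include <stdint.h>

/* Hypothetical stand-in for a per-CPU dirty-tracking register. */
static uint64_t hw_tracking_reg;

/* Hypothetical per-stage-2-MMU context: the tracking state saved while
 * no vCPU using this MMU is loaded on the CPU. */
struct toy_s2_mmu {
	uint64_t saved_tracking_state;
};

struct toy_vcpu {
	struct toy_s2_mmu *mmu;
};

/* Scheduling a vCPU in: install its MMU's tracking state on this CPU. */
static void toy_vcpu_load(struct toy_vcpu *vcpu)
{
	hw_tracking_reg = vcpu->mmu->saved_tracking_state;
}

/* Scheduling a vCPU out: save the state back and disable tracking. */
static void toy_vcpu_put(struct toy_vcpu *vcpu)
{
	vcpu->mmu->saved_tracking_state = hw_tracking_reg;
	hw_tracking_reg = 0;
}
```

In this model a driver sitting behind a generic device interface never sees
the load/put events, which is why the management might have to live in KVM
proper.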
--
Thanks,
Oliver
^ permalink raw reply [flat|nested] 16+ messages in thread