* [PATCH v14 1/7] KVM: Add architecture-defined TLB flush support
From: Mario Smarduch @ 2014-11-14 1:57 UTC (permalink / raw)
To: linux-arm-kernel
Allow architectures to override the generic kvm_flush_remote_tlbs()
function via HAVE_KVM_ARCH_TLB_FLUSH_ALL. ARMv7 will need this to
provide its own TLB flush interface.
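For illustration, this is roughly what the arch side looks like once the
symbol is selected (a sketch only; patch 4/7 below adds the real ARMv7
version, which is where the hyp helper name comes from):

/*
 * Sketch: the arch Kconfig selects HAVE_KVM_ARCH_TLB_FLUSH_ALL, which
 * compiles out the generic kvm_flush_remote_tlbs() in kvm_main.c, and
 * the architecture supplies its own definition, e.g. as a static
 * inline in its asm/kvm_host.h.
 */
static inline void kvm_flush_remote_tlbs(struct kvm *kvm)
{
	/* Flush all stage-2 TLB entries for this VM */
	kvm_call_hyp(__kvm_tlb_flush_vmid, kvm);
}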
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
---
virt/kvm/Kconfig | 3 +++
virt/kvm/kvm_main.c | 2 ++
2 files changed, 5 insertions(+)
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index fc0c5e6..3796a21 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -37,3 +37,6 @@ config HAVE_KVM_CPU_RELAX_INTERCEPT
config KVM_VFIO
bool
+
+config HAVE_KVM_ARCH_TLB_FLUSH_ALL
+ bool
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 33712fb..887df87 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -184,6 +184,7 @@ static bool make_all_cpus_request(struct kvm *kvm, unsigned int req)
return called;
}
+#ifndef CONFIG_HAVE_KVM_ARCH_TLB_FLUSH_ALL
void kvm_flush_remote_tlbs(struct kvm *kvm)
{
long dirty_count = kvm->tlbs_dirty;
@@ -194,6 +195,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm)
cmpxchg(&kvm->tlbs_dirty, dirty_count, 0);
}
EXPORT_SYMBOL_GPL(kvm_flush_remote_tlbs);
+#endif
void kvm_reload_remote_mmus(struct kvm *kvm)
{
--
1.7.9.5
* [PATCH v14 2/7] KVM: Add generic support for dirty page logging
From: Mario Smarduch @ 2014-11-14 1:57 UTC (permalink / raw)
To: linux-arm-kernel
kvm_get_dirty_log() provides generic handling of the dirty bitmap and is
currently reused by several architectures. Building on that, we introduce
kvm_get_dirty_log_protect(), which in addition write protects the pages
reported dirty, so that writes done before the next KVM_GET_DIRTY_LOG ioctl
call from user space are caught and logged again.
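For context, the bitmap filled in here is what user space already retrieves
through the KVM_GET_DIRTY_LOG ioctl; a rough sketch of that side (error
handling omitted, vm_fd and the memslot are assumed to be set up elsewhere):

#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Sketch: fetch the dirty log for one memslot. A set bit means the guest
 * wrote to that page since the previous call; on return the kernel has
 * cleared its copy and write protected the dirty pages again, so further
 * writes are caught for the next call.
 */
static int fetch_dirty_log(int vm_fd, __u32 slot, void *bitmap)
{
	struct kvm_dirty_log log;

	memset(&log, 0, sizeof(log));
	log.slot = slot;
	log.dirty_bitmap = bitmap;	/* one bit per page, long aligned */

	return ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log);
}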
Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
---
include/linux/kvm_host.h | 9 ++++++
virt/kvm/Kconfig | 6 ++++
virt/kvm/kvm_main.c | 80 ++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 95 insertions(+)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index a4c33b3..c55dd75 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -590,6 +590,15 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext);
int kvm_get_dirty_log(struct kvm *kvm,
struct kvm_dirty_log *log, int *is_dirty);
+
+int kvm_get_dirty_log_protect(struct kvm *kvm,
+ struct kvm_dirty_log *log, bool *is_dirty);
+
+void kvm_arch_mmu_write_protect_pt_masked(struct kvm *kvm,
+ struct kvm_memory_slot *slot,
+ gfn_t gfn_offset,
+ unsigned long mask);
+
int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
struct kvm_dirty_log *log);
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 3796a21..314950c 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -40,3 +40,9 @@ config KVM_VFIO
config HAVE_KVM_ARCH_TLB_FLUSH_ALL
bool
+
+config HAVE_KVM_ARCH_DIRTY_LOG_PROTECT
+ bool
+
+config KVM_GENERIC_DIRTYLOG_READ_PROTECT
+ bool
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 887df87..f50909c 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -981,6 +981,86 @@ out:
}
EXPORT_SYMBOL_GPL(kvm_get_dirty_log);
+#ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
+/**
+ * kvm_get_dirty_log_protect - get a snapshot of dirty pages, and if any pages
+ * are dirty write protect them for next write.
+ * @kvm: pointer to kvm instance
+ * @log: slot id and address to which we copy the log
+ * @is_dirty: flag set if any page is dirty
+ *
+ * We need to keep it in mind that VCPU threads can write to the bitmap
+ * concurrently. So, to avoid losing track of dirty pages we keep the
+ * following order:
+ *
+ * 1. Take a snapshot of the bit and clear it if needed.
+ * 2. Write protect the corresponding page.
+ * 3. Copy the snapshot to the userspace.
+ * 4. Upon return caller flushes TLB's if needed.
+ *
+ * Between 2 and 4, the guest may write to the page using the remaining TLB
+ * entry. This is not a problem because the page is reported dirty using
+ * the snapshot taken before and step 4 ensures that writes done after
+ * exiting to userspace will be logged for the next call.
+ *
+ */
+int kvm_get_dirty_log_protect(struct kvm *kvm,
+ struct kvm_dirty_log *log, bool *is_dirty)
+{
+ struct kvm_memory_slot *memslot;
+ int r, i;
+ unsigned long n;
+ unsigned long *dirty_bitmap;
+ unsigned long *dirty_bitmap_buffer;
+
+ r = -EINVAL;
+ if (log->slot >= KVM_USER_MEM_SLOTS)
+ goto out;
+
+ memslot = id_to_memslot(kvm->memslots, log->slot);
+
+ dirty_bitmap = memslot->dirty_bitmap;
+ r = -ENOENT;
+ if (!dirty_bitmap)
+ goto out;
+
+ n = kvm_dirty_bitmap_bytes(memslot);
+
+ dirty_bitmap_buffer = dirty_bitmap + n / sizeof(long);
+ memset(dirty_bitmap_buffer, 0, n);
+
+ spin_lock(&kvm->mmu_lock);
+ *is_dirty = false;
+ for (i = 0; i < n / sizeof(long); i++) {
+ unsigned long mask;
+ gfn_t offset;
+
+ if (!dirty_bitmap[i])
+ continue;
+
+ *is_dirty = true;
+
+ mask = xchg(&dirty_bitmap[i], 0);
+ dirty_bitmap_buffer[i] = mask;
+
+ offset = i * BITS_PER_LONG;
+ kvm_arch_mmu_write_protect_pt_masked(kvm, memslot, offset,
+ mask);
+ }
+
+ spin_unlock(&kvm->mmu_lock);
+
+ r = -EFAULT;
+ if (copy_to_user(log->dirty_bitmap, dirty_bitmap_buffer, n))
+ goto out;
+
+ r = 0;
+out:
+ return r;
+}
+EXPORT_SYMBOL_GPL(kvm_get_dirty_log_protect);
+#endif
+
bool kvm_largepages_enabled(void)
{
return largepages_enabled;
--
1.7.9.5
* [PATCH v14 3/7] KVM: x86: switch to kvm_get_dirty_log_protect
From: Mario Smarduch @ 2014-11-14 1:57 UTC (permalink / raw)
To: linux-arm-kernel
From: Paolo Bonzini <pbonzini@redhat.com>
We now have a generic function that does most of the work of
kvm_vm_ioctl_get_dirty_log, now use it.
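After the conversion the x86 ioctl reduces to roughly the following
(reconstructed from the diff below; the bitmap walk and copy_to_user now
live in kvm_get_dirty_log_protect()):

int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
{
	bool is_dirty = false;
	int r;

	mutex_lock(&kvm->slots_lock);

	/* Snapshot and clear the bitmap, write protect, copy to user space */
	r = kvm_get_dirty_log_protect(kvm, log, &is_dirty);

	/* TLBs can be flushed outside the mmu lock */
	lockdep_assert_held(&kvm->slots_lock);
	if (is_dirty)
		kvm_flush_remote_tlbs(kvm);

	mutex_unlock(&kvm->slots_lock);
	return r;
}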
Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
---
arch/x86/include/asm/kvm_host.h | 3 --
arch/x86/kvm/Kconfig | 1 +
arch/x86/kvm/mmu.c | 4 +--
arch/x86/kvm/x86.c | 64 ++++++---------------------------------
4 files changed, 12 insertions(+), 60 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7c492ed..934dc24 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -805,9 +805,6 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
void kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot);
-void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
- struct kvm_memory_slot *slot,
- gfn_t gfn_offset, unsigned long mask);
void kvm_mmu_zap_all(struct kvm *kvm);
void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm);
unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm);
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index f9d16ff..d073594 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -39,6 +39,7 @@ config KVM
select PERF_EVENTS
select HAVE_KVM_MSI
select HAVE_KVM_CPU_RELAX_INTERCEPT
+ select KVM_GENERIC_DIRTYLOG_READ_PROTECT
select KVM_VFIO
---help---
Support hosting fully virtualized guest machines using hardware
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 9314678..bf6b82c 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1224,7 +1224,7 @@ static bool __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp,
}
/**
- * kvm_mmu_write_protect_pt_masked - write protect selected PT level pages
+ * kvm_arch_mmu_write_protect_pt_masked - write protect selected PT level pages
* @kvm: kvm instance
* @slot: slot to protect
* @gfn_offset: start of the BITS_PER_LONG pages we care about
@@ -1233,7 +1233,7 @@ static bool __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp,
* Used when we do not need to care about huge page mappings: e.g. during dirty
* logging we do not have any such mappings.
*/
-void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
+void kvm_arch_mmu_write_protect_pt_masked(struct kvm *kvm,
struct kvm_memory_slot *slot,
gfn_t gfn_offset, unsigned long mask)
{
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8f1e22d..9f8ae9a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3606,77 +3606,31 @@ static int kvm_vm_ioctl_reinject(struct kvm *kvm,
*
* 1. Take a snapshot of the bit and clear it if needed.
* 2. Write protect the corresponding page.
- * 3. Flush TLB's if needed.
- * 4. Copy the snapshot to the userspace.
+ * 3. Copy the snapshot to the userspace.
+ * 4. Flush TLB's if needed.
*
- * Between 2 and 3, the guest may write to the page using the remaining TLB
- * entry. This is not a problem because the page will be reported dirty at
- * step 4 using the snapshot taken before and step 3 ensures that successive
- * writes will be logged for the next call.
+ * Between 2 and 4, the guest may write to the page using the remaining TLB
+ * entry. This is not a problem because the page is reported dirty using
+ * the snapshot taken before and step 4 ensures that writes done after
+ * exiting to userspace will be logged for the next call.
*/
int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
{
- int r;
- struct kvm_memory_slot *memslot;
- unsigned long n, i;
- unsigned long *dirty_bitmap;
- unsigned long *dirty_bitmap_buffer;
bool is_dirty = false;
+ int r;
mutex_lock(&kvm->slots_lock);
- r = -EINVAL;
- if (log->slot >= KVM_USER_MEM_SLOTS)
- goto out;
-
- memslot = id_to_memslot(kvm->memslots, log->slot);
-
- dirty_bitmap = memslot->dirty_bitmap;
- r = -ENOENT;
- if (!dirty_bitmap)
- goto out;
-
- n = kvm_dirty_bitmap_bytes(memslot);
-
- dirty_bitmap_buffer = dirty_bitmap + n / sizeof(long);
- memset(dirty_bitmap_buffer, 0, n);
-
- spin_lock(&kvm->mmu_lock);
-
- for (i = 0; i < n / sizeof(long); i++) {
- unsigned long mask;
- gfn_t offset;
-
- if (!dirty_bitmap[i])
- continue;
-
- is_dirty = true;
-
- mask = xchg(&dirty_bitmap[i], 0);
- dirty_bitmap_buffer[i] = mask;
-
- offset = i * BITS_PER_LONG;
- kvm_mmu_write_protect_pt_masked(kvm, memslot, offset, mask);
- }
-
- spin_unlock(&kvm->mmu_lock);
-
- /* See the comments in kvm_mmu_slot_remove_write_access(). */
- lockdep_assert_held(&kvm->slots_lock);
+ r = kvm_get_dirty_log_protect(kvm, log, &is_dirty);
/*
* All the TLBs can be flushed out of mmu lock, see the comments in
* kvm_mmu_slot_remove_write_access().
*/
+ lockdep_assert_held(&kvm->slots_lock);
if (is_dirty)
kvm_flush_remote_tlbs(kvm);
- r = -EFAULT;
- if (copy_to_user(log->dirty_bitmap, dirty_bitmap_buffer, n))
- goto out;
-
- r = 0;
-out:
mutex_unlock(&kvm->slots_lock);
return r;
}
--
1.7.9.5
* [PATCH v14 3/7] KVM: x86: switch to kvm_get_dirty_log_protect
From: Mario Smarduch @ 2014-11-14 2:06 UTC (permalink / raw)
To: linux-arm-kernel
Hi Paolo,
I changed your patch a little to use a Kconfig symbol,
hope that's fine with you.
- Mario
On 11/13/2014 05:57 PM, Mario Smarduch wrote:
> From: Paolo Bonzini <pbonzini@redhat.com>
>
> We now have a generic function that does most of the work of
> kvm_vm_ioctl_get_dirty_log, now use it.
>
> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
> ---
> arch/x86/include/asm/kvm_host.h | 3 --
> arch/x86/kvm/Kconfig | 1 +
> arch/x86/kvm/mmu.c | 4 +--
> arch/x86/kvm/x86.c | 64 ++++++---------------------------------
> 4 files changed, 12 insertions(+), 60 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 7c492ed..934dc24 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -805,9 +805,6 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
>
> void kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
> void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot);
> -void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
> - struct kvm_memory_slot *slot,
> - gfn_t gfn_offset, unsigned long mask);
> void kvm_mmu_zap_all(struct kvm *kvm);
> void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm);
> unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm);
> diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
> index f9d16ff..d073594 100644
> --- a/arch/x86/kvm/Kconfig
> +++ b/arch/x86/kvm/Kconfig
> @@ -39,6 +39,7 @@ config KVM
> select PERF_EVENTS
> select HAVE_KVM_MSI
> select HAVE_KVM_CPU_RELAX_INTERCEPT
> + select KVM_GENERIC_DIRTYLOG_READ_PROTECT
> select KVM_VFIO
> ---help---
> Support hosting fully virtualized guest machines using hardware
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 9314678..bf6b82c 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -1224,7 +1224,7 @@ static bool __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp,
> }
>
> /**
> - * kvm_mmu_write_protect_pt_masked - write protect selected PT level pages
> + * kvm_arch_mmu_write_protect_pt_masked - write protect selected PT level pages
> * @kvm: kvm instance
> * @slot: slot to protect
> * @gfn_offset: start of the BITS_PER_LONG pages we care about
> @@ -1233,7 +1233,7 @@ static bool __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp,
> * Used when we do not need to care about huge page mappings: e.g. during dirty
> * logging we do not have any such mappings.
> */
> -void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
> +void kvm_arch_mmu_write_protect_pt_masked(struct kvm *kvm,
> struct kvm_memory_slot *slot,
> gfn_t gfn_offset, unsigned long mask)
> {
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 8f1e22d..9f8ae9a 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3606,77 +3606,31 @@ static int kvm_vm_ioctl_reinject(struct kvm *kvm,
> *
> * 1. Take a snapshot of the bit and clear it if needed.
> * 2. Write protect the corresponding page.
> - * 3. Flush TLB's if needed.
> - * 4. Copy the snapshot to the userspace.
> + * 3. Copy the snapshot to the userspace.
> + * 4. Flush TLB's if needed.
> *
> - * Between 2 and 3, the guest may write to the page using the remaining TLB
> - * entry. This is not a problem because the page will be reported dirty at
> - * step 4 using the snapshot taken before and step 3 ensures that successive
> - * writes will be logged for the next call.
> + * Between 2 and 4, the guest may write to the page using the remaining TLB
> + * entry. This is not a problem because the page is reported dirty using
> + * the snapshot taken before and step 4 ensures that writes done after
> + * exiting to userspace will be logged for the next call.
> */
> int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
> {
> - int r;
> - struct kvm_memory_slot *memslot;
> - unsigned long n, i;
> - unsigned long *dirty_bitmap;
> - unsigned long *dirty_bitmap_buffer;
> bool is_dirty = false;
> + int r;
>
> mutex_lock(&kvm->slots_lock);
>
> - r = -EINVAL;
> - if (log->slot >= KVM_USER_MEM_SLOTS)
> - goto out;
> -
> - memslot = id_to_memslot(kvm->memslots, log->slot);
> -
> - dirty_bitmap = memslot->dirty_bitmap;
> - r = -ENOENT;
> - if (!dirty_bitmap)
> - goto out;
> -
> - n = kvm_dirty_bitmap_bytes(memslot);
> -
> - dirty_bitmap_buffer = dirty_bitmap + n / sizeof(long);
> - memset(dirty_bitmap_buffer, 0, n);
> -
> - spin_lock(&kvm->mmu_lock);
> -
> - for (i = 0; i < n / sizeof(long); i++) {
> - unsigned long mask;
> - gfn_t offset;
> -
> - if (!dirty_bitmap[i])
> - continue;
> -
> - is_dirty = true;
> -
> - mask = xchg(&dirty_bitmap[i], 0);
> - dirty_bitmap_buffer[i] = mask;
> -
> - offset = i * BITS_PER_LONG;
> - kvm_mmu_write_protect_pt_masked(kvm, memslot, offset, mask);
> - }
> -
> - spin_unlock(&kvm->mmu_lock);
> -
> - /* See the comments in kvm_mmu_slot_remove_write_access(). */
> - lockdep_assert_held(&kvm->slots_lock);
> + r = kvm_get_dirty_log_protect(kvm, log, &is_dirty);
>
> /*
> * All the TLBs can be flushed out of mmu lock, see the comments in
> * kvm_mmu_slot_remove_write_access().
> */
> + lockdep_assert_held(&kvm->slots_lock);
> if (is_dirty)
> kvm_flush_remote_tlbs(kvm);
>
> - r = -EFAULT;
> - if (copy_to_user(log->dirty_bitmap, dirty_bitmap_buffer, n))
> - goto out;
> -
> - r = 0;
> -out:
> mutex_unlock(&kvm->slots_lock);
> return r;
> }
>
* [PATCH v14 3/7] KVM: x86: switch to kvm_get_dirty_log_protect
From: Paolo Bonzini @ 2014-11-14 10:03 UTC (permalink / raw)
To: linux-arm-kernel
On 14/11/2014 03:06, Mario Smarduch wrote:
> Hi Paolo,
>
> I changed your patch a little to use a Kconfig symbol,
> hope that's fine with you.
Of course, thanks.
Paolo
> - Mario
>
> On 11/13/2014 05:57 PM, Mario Smarduch wrote:
>> From: Paolo Bonzini <pbonzini@redhat.com>
>>
>> We now have a generic function that does most of the work of
>> kvm_vm_ioctl_get_dirty_log, now use it.
>>
>> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
>> ---
>> arch/x86/include/asm/kvm_host.h | 3 --
>> arch/x86/kvm/Kconfig | 1 +
>> arch/x86/kvm/mmu.c | 4 +--
>> arch/x86/kvm/x86.c | 64 ++++++---------------------------------
>> 4 files changed, 12 insertions(+), 60 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index 7c492ed..934dc24 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -805,9 +805,6 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
>>
>> void kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
>> void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot);
>> -void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
>> - struct kvm_memory_slot *slot,
>> - gfn_t gfn_offset, unsigned long mask);
>> void kvm_mmu_zap_all(struct kvm *kvm);
>> void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm);
>> unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm);
>> diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
>> index f9d16ff..d073594 100644
>> --- a/arch/x86/kvm/Kconfig
>> +++ b/arch/x86/kvm/Kconfig
>> @@ -39,6 +39,7 @@ config KVM
>> select PERF_EVENTS
>> select HAVE_KVM_MSI
>> select HAVE_KVM_CPU_RELAX_INTERCEPT
>> + select KVM_GENERIC_DIRTYLOG_READ_PROTECT
>> select KVM_VFIO
>> ---help---
>> Support hosting fully virtualized guest machines using hardware
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index 9314678..bf6b82c 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -1224,7 +1224,7 @@ static bool __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp,
>> }
>>
>> /**
>> - * kvm_mmu_write_protect_pt_masked - write protect selected PT level pages
>> + * kvm_arch_mmu_write_protect_pt_masked - write protect selected PT level pages
>> * @kvm: kvm instance
>> * @slot: slot to protect
>> * @gfn_offset: start of the BITS_PER_LONG pages we care about
>> @@ -1233,7 +1233,7 @@ static bool __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp,
>> * Used when we do not need to care about huge page mappings: e.g. during dirty
>> * logging we do not have any such mappings.
>> */
>> -void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
>> +void kvm_arch_mmu_write_protect_pt_masked(struct kvm *kvm,
>> struct kvm_memory_slot *slot,
>> gfn_t gfn_offset, unsigned long mask)
>> {
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 8f1e22d..9f8ae9a 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -3606,77 +3606,31 @@ static int kvm_vm_ioctl_reinject(struct kvm *kvm,
>> *
>> * 1. Take a snapshot of the bit and clear it if needed.
>> * 2. Write protect the corresponding page.
>> - * 3. Flush TLB's if needed.
>> - * 4. Copy the snapshot to the userspace.
>> + * 3. Copy the snapshot to the userspace.
>> + * 4. Flush TLB's if needed.
>> *
>> - * Between 2 and 3, the guest may write to the page using the remaining TLB
>> - * entry. This is not a problem because the page will be reported dirty at
>> - * step 4 using the snapshot taken before and step 3 ensures that successive
>> - * writes will be logged for the next call.
>> + * Between 2 and 4, the guest may write to the page using the remaining TLB
>> + * entry. This is not a problem because the page is reported dirty using
>> + * the snapshot taken before and step 4 ensures that writes done after
>> + * exiting to userspace will be logged for the next call.
>> */
>> int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
>> {
>> - int r;
>> - struct kvm_memory_slot *memslot;
>> - unsigned long n, i;
>> - unsigned long *dirty_bitmap;
>> - unsigned long *dirty_bitmap_buffer;
>> bool is_dirty = false;
>> + int r;
>>
>> mutex_lock(&kvm->slots_lock);
>>
>> - r = -EINVAL;
>> - if (log->slot >= KVM_USER_MEM_SLOTS)
>> - goto out;
>> -
>> - memslot = id_to_memslot(kvm->memslots, log->slot);
>> -
>> - dirty_bitmap = memslot->dirty_bitmap;
>> - r = -ENOENT;
>> - if (!dirty_bitmap)
>> - goto out;
>> -
>> - n = kvm_dirty_bitmap_bytes(memslot);
>> -
>> - dirty_bitmap_buffer = dirty_bitmap + n / sizeof(long);
>> - memset(dirty_bitmap_buffer, 0, n);
>> -
>> - spin_lock(&kvm->mmu_lock);
>> -
>> - for (i = 0; i < n / sizeof(long); i++) {
>> - unsigned long mask;
>> - gfn_t offset;
>> -
>> - if (!dirty_bitmap[i])
>> - continue;
>> -
>> - is_dirty = true;
>> -
>> - mask = xchg(&dirty_bitmap[i], 0);
>> - dirty_bitmap_buffer[i] = mask;
>> -
>> - offset = i * BITS_PER_LONG;
>> - kvm_mmu_write_protect_pt_masked(kvm, memslot, offset, mask);
>> - }
>> -
>> - spin_unlock(&kvm->mmu_lock);
>> -
>> - /* See the comments in kvm_mmu_slot_remove_write_access(). */
>> - lockdep_assert_held(&kvm->slots_lock);
>> + r = kvm_get_dirty_log_protect(kvm, log, &is_dirty);
>>
>> /*
>> * All the TLBs can be flushed out of mmu lock, see the comments in
>> * kvm_mmu_slot_remove_write_access().
>> */
>> + lockdep_assert_held(&kvm->slots_lock);
>> if (is_dirty)
>> kvm_flush_remote_tlbs(kvm);
>>
>> - r = -EFAULT;
>> - if (copy_to_user(log->dirty_bitmap, dirty_bitmap_buffer, n))
>> - goto out;
>> -
>> - r = 0;
>> -out:
>> mutex_unlock(&kvm->slots_lock);
>> return r;
>> }
>>
>
* [PATCH v14 3/7] KVM: x86: switch to kvm_get_dirty_log_protect
From: Christoffer Dall @ 2014-11-22 19:19 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, Nov 13, 2014 at 05:57:44PM -0800, Mario Smarduch wrote:
> From: Paolo Bonzini <pbonzini@redhat.com>
>
> We now have a generic function that does most of the work of
> kvm_vm_ioctl_get_dirty_log, now use it.
>
> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
> ---
> arch/x86/include/asm/kvm_host.h | 3 --
> arch/x86/kvm/Kconfig | 1 +
> arch/x86/kvm/mmu.c | 4 +--
> arch/x86/kvm/x86.c | 64 ++++++---------------------------------
> 4 files changed, 12 insertions(+), 60 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 7c492ed..934dc24 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -805,9 +805,6 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
>
> void kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
> void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot);
> -void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
> - struct kvm_memory_slot *slot,
> - gfn_t gfn_offset, unsigned long mask);
> void kvm_mmu_zap_all(struct kvm *kvm);
> void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm);
> unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm);
> diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
> index f9d16ff..d073594 100644
> --- a/arch/x86/kvm/Kconfig
> +++ b/arch/x86/kvm/Kconfig
> @@ -39,6 +39,7 @@ config KVM
> select PERF_EVENTS
> select HAVE_KVM_MSI
> select HAVE_KVM_CPU_RELAX_INTERCEPT
> + select KVM_GENERIC_DIRTYLOG_READ_PROTECT
> select KVM_VFIO
> ---help---
> Support hosting fully virtualized guest machines using hardware
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 9314678..bf6b82c 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -1224,7 +1224,7 @@ static bool __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp,
> }
>
> /**
> - * kvm_mmu_write_protect_pt_masked - write protect selected PT level pages
> + * kvm_arch_mmu_write_protect_pt_masked - write protect selected PT level pages
> * @kvm: kvm instance
> * @slot: slot to protect
> * @gfn_offset: start of the BITS_PER_LONG pages we care about
> @@ -1233,7 +1233,7 @@ static bool __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp,
> * Used when we do not need to care about huge page mappings: e.g. during dirty
> * logging we do not have any such mappings.
> */
> -void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
> +void kvm_arch_mmu_write_protect_pt_masked(struct kvm *kvm,
> struct kvm_memory_slot *slot,
> gfn_t gfn_offset, unsigned long mask)
> {
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 8f1e22d..9f8ae9a 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3606,77 +3606,31 @@ static int kvm_vm_ioctl_reinject(struct kvm *kvm,
> *
> * 1. Take a snapshot of the bit and clear it if needed.
> * 2. Write protect the corresponding page.
> - * 3. Flush TLB's if needed.
> - * 4. Copy the snapshot to the userspace.
> + * 3. Copy the snapshot to the userspace.
> + * 4. Flush TLB's if needed.
> *
> - * Between 2 and 3, the guest may write to the page using the remaining TLB
> - * entry. This is not a problem because the page will be reported dirty at
> - * step 4 using the snapshot taken before and step 3 ensures that successive
> - * writes will be logged for the next call.
> + * Between 2 and 4, the guest may write to the page using the remaining TLB
> + * entry. This is not a problem because the page is reported dirty using
> + * the snapshot taken before and step 4 ensures that writes done after
> + * exiting to userspace will be logged for the next call.
> */
this seems to duplicate the comment in virt/kvm/kvm_main.c, but
whatever.
FWIW:
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
* [PATCH v14 3/7] KVM: x86: switch to kvm_get_dirty_log_protect
From: Mario Smarduch @ 2014-11-24 18:35 UTC (permalink / raw)
To: linux-arm-kernel
On 11/22/2014 11:19 AM, Christoffer Dall wrote:
> On Thu, Nov 13, 2014 at 05:57:44PM -0800, Mario Smarduch wrote:
>> From: Paolo Bonzini <pbonzini@redhat.com>
>>
>> We now have a generic function that does most of the work of
>> kvm_vm_ioctl_get_dirty_log, now use it.
>>
>> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
>> ---
>> arch/x86/include/asm/kvm_host.h | 3 --
>> arch/x86/kvm/Kconfig | 1 +
>> arch/x86/kvm/mmu.c | 4 +--
>> arch/x86/kvm/x86.c | 64 ++++++---------------------------------
>> 4 files changed, 12 insertions(+), 60 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index 7c492ed..934dc24 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -805,9 +805,6 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
>>
>> void kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
>> void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot);
>> -void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
>> - struct kvm_memory_slot *slot,
>> - gfn_t gfn_offset, unsigned long mask);
>> void kvm_mmu_zap_all(struct kvm *kvm);
>> void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm);
>> unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm);
>> diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
>> index f9d16ff..d073594 100644
>> --- a/arch/x86/kvm/Kconfig
>> +++ b/arch/x86/kvm/Kconfig
>> @@ -39,6 +39,7 @@ config KVM
>> select PERF_EVENTS
>> select HAVE_KVM_MSI
>> select HAVE_KVM_CPU_RELAX_INTERCEPT
>> + select KVM_GENERIC_DIRTYLOG_READ_PROTECT
>> select KVM_VFIO
>> ---help---
>> Support hosting fully virtualized guest machines using hardware
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index 9314678..bf6b82c 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -1224,7 +1224,7 @@ static bool __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp,
>> }
>>
>> /**
>> - * kvm_mmu_write_protect_pt_masked - write protect selected PT level pages
>> + * kvm_arch_mmu_write_protect_pt_masked - write protect selected PT level pages
>> * @kvm: kvm instance
>> * @slot: slot to protect
>> * @gfn_offset: start of the BITS_PER_LONG pages we care about
>> @@ -1233,7 +1233,7 @@ static bool __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp,
>> * Used when we do not need to care about huge page mappings: e.g. during dirty
>> * logging we do not have any such mappings.
>> */
>> -void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
>> +void kvm_arch_mmu_write_protect_pt_masked(struct kvm *kvm,
>> struct kvm_memory_slot *slot,
>> gfn_t gfn_offset, unsigned long mask)
>> {
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 8f1e22d..9f8ae9a 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -3606,77 +3606,31 @@ static int kvm_vm_ioctl_reinject(struct kvm *kvm,
>> *
>> * 1. Take a snapshot of the bit and clear it if needed.
>> * 2. Write protect the corresponding page.
>> - * 3. Flush TLB's if needed.
>> - * 4. Copy the snapshot to the userspace.
>> + * 3. Copy the snapshot to the userspace.
>> + * 4. Flush TLB's if needed.
>> *
>> - * Between 2 and 3, the guest may write to the page using the remaining TLB
>> - * entry. This is not a problem because the page will be reported dirty at
>> - * step 4 using the snapshot taken before and step 3 ensures that successive
>> - * writes will be logged for the next call.
>> + * Between 2 and 4, the guest may write to the page using the remaining TLB
>> + * entry. This is not a problem because the page is reported dirty using
>> + * the snapshot taken before and step 4 ensures that writes done after
>> + * exiting to userspace will be logged for the next call.
>> */
>
> this seems to duplicate the comment in virt/kvm/kvm_main.c, but
> whatever.
It reuses most of that text but differs slightly; the _protect version
is a subset of this one.
>
> FWIW:
> Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
>
Thanks.
* [PATCH v14 3/7] KVM: x86: switch to kvm_get_dirty_log_protect
From: Mario Smarduch @ 2014-12-08 23:12 UTC (permalink / raw)
To: linux-arm-kernel
Hi Paolo,
I took a closer look at Christoffer's comment: the _log description in
x86.c is a repeat of the _protect description in kvm_main.c.
I'm wondering if the description below would be acceptable, or perhaps
you had a reason for leaving it as is.
I would word the ARM variant the same. Please advise.
Thanks.
"
/**
* kvm_vm_ioctl_get_dirty_log - get and clear the log of dirty pages in a slot
* @kvm: kvm instance
* @log: slot id and address to which we copy the log
*
* Steps 1-4 below provide general overview of dirty page logging. See
* kvm_get_dirty_log_protect() function description for additional details.
*
* We call kvm_get_dirty_log_protect() to handle steps 1-3, upon return we
* always flush the TLB (step 4) even if 'protect' failed and dirty bitmap
* may be corrupt. Regardless of previous outcome KVM logging API does not
* preclude user space subsequent dirty log read. Flushing TLB insures writes
* will be marked dirty for next log read.
*
* 1. Take a snapshot of the bit and clear it if needed.
* 2. Write protect the corresponding page.
* 3. Copy the snapshot to the userspace.
* 4. Flush TLB's if needed.
*/
"
On 11/22/2014 11:19 AM, Christoffer Dall wrote:
> On Thu, Nov 13, 2014 at 05:57:44PM -0800, Mario Smarduch wrote:
>> From: Paolo Bonzini <pbonzini@redhat.com>
>>
>> We now have a generic function that does most of the work of
>> kvm_vm_ioctl_get_dirty_log, now use it.
>>
>> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
>> ---
>> arch/x86/include/asm/kvm_host.h | 3 --
>> arch/x86/kvm/Kconfig | 1 +
>> arch/x86/kvm/mmu.c | 4 +--
>> arch/x86/kvm/x86.c | 64 ++++++---------------------------------
>> 4 files changed, 12 insertions(+), 60 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index 7c492ed..934dc24 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -805,9 +805,6 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
>>
>> void kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
>> void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot);
>> -void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
>> - struct kvm_memory_slot *slot,
>> - gfn_t gfn_offset, unsigned long mask);
>> void kvm_mmu_zap_all(struct kvm *kvm);
>> void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm);
>> unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm);
>> diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
>> index f9d16ff..d073594 100644
>> --- a/arch/x86/kvm/Kconfig
>> +++ b/arch/x86/kvm/Kconfig
>> @@ -39,6 +39,7 @@ config KVM
>> select PERF_EVENTS
>> select HAVE_KVM_MSI
>> select HAVE_KVM_CPU_RELAX_INTERCEPT
>> + select KVM_GENERIC_DIRTYLOG_READ_PROTECT
>> select KVM_VFIO
>> ---help---
>> Support hosting fully virtualized guest machines using hardware
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index 9314678..bf6b82c 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -1224,7 +1224,7 @@ static bool __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp,
>> }
>>
>> /**
>> - * kvm_mmu_write_protect_pt_masked - write protect selected PT level pages
>> + * kvm_arch_mmu_write_protect_pt_masked - write protect selected PT level pages
>> * @kvm: kvm instance
>> * @slot: slot to protect
>> * @gfn_offset: start of the BITS_PER_LONG pages we care about
>> @@ -1233,7 +1233,7 @@ static bool __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp,
>> * Used when we do not need to care about huge page mappings: e.g. during dirty
>> * logging we do not have any such mappings.
>> */
>> -void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
>> +void kvm_arch_mmu_write_protect_pt_masked(struct kvm *kvm,
>> struct kvm_memory_slot *slot,
>> gfn_t gfn_offset, unsigned long mask)
>> {
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 8f1e22d..9f8ae9a 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -3606,77 +3606,31 @@ static int kvm_vm_ioctl_reinject(struct kvm *kvm,
>> *
>> * 1. Take a snapshot of the bit and clear it if needed.
>> * 2. Write protect the corresponding page.
>> - * 3. Flush TLB's if needed.
>> - * 4. Copy the snapshot to the userspace.
>> + * 3. Copy the snapshot to the userspace.
>> + * 4. Flush TLB's if needed.
>> *
>> - * Between 2 and 3, the guest may write to the page using the remaining TLB
>> - * entry. This is not a problem because the page will be reported dirty at
>> - * step 4 using the snapshot taken before and step 3 ensures that successive
>> - * writes will be logged for the next call.
>> + * Between 2 and 4, the guest may write to the page using the remaining TLB
>> + * entry. This is not a problem because the page is reported dirty using
>> + * the snapshot taken before and step 4 ensures that writes done after
>> + * exiting to userspace will be logged for the next call.
>> */
>
> this seems to duplicate the comment in virt/kvm/kvm_main.c, but
> whatever.
>
> FWIW:
> Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
>
* [PATCH v14 3/7] KVM: x86: switch to kvm_get_dirty_log_protect
From: Paolo Bonzini @ 2014-12-09 19:42 UTC (permalink / raw)
To: linux-arm-kernel
On 09/12/2014 00:12, Mario Smarduch wrote:
> Hi Paolo,
> I took a closer look at Christoffer's comment: the _log description in
> x86.c is a repeat of the _protect description in kvm_main.c.
>
> I'm wondering if the description below would be acceptable, or perhaps
> you had a reason for leaving it as is.
Yes, it's okay.
> For the ARM variant I would word same. Please advise.
>
> Thanks.
>
> "
> /**
> * kvm_vm_ioctl_get_dirty_log - get and clear the log of dirty pages in
> a slot
> * @kvm: kvm instance
> * @log: slot id and address to which we copy the log
> *
> * Steps 1-4 below provide general overview of dirty page logging. See
> * kvm_get_dirty_log_protect() function description for additional details.
> *
> * We call kvm_get_dirty_log_protect() to handle steps 1-3, upon return we
> * always flush the TLB (step 4) even if 'protect' failed and dirty bitmap
even if a previous step failed and the dirty bitmap may be corrupt.
> * may be corrupt. Regardless of previous outcome KVM logging API does not
... the KVM logging API ...
> * preclude user space subsequent dirty log read. Flushing TLB insures
s/insures/ensures/
Paolo
> writes
> * will be marked dirty for next log read.
> *
> * 1. Take a snapshot of the bit and clear it if needed.
> * 2. Write protect the corresponding page.
> * 3. Copy the snapshot to the userspace.
> * 4. Flush TLB's if needed.
> */
> "
>
> On 11/22/2014 11:19 AM, Christoffer Dall wrote:
>> On Thu, Nov 13, 2014 at 05:57:44PM -0800, Mario Smarduch wrote:
>>> From: Paolo Bonzini <pbonzini@redhat.com>
>>>
>>> We now have a generic function that does most of the work of
>>> kvm_vm_ioctl_get_dirty_log, now use it.
>>>
>>> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
>>> ---
>>> arch/x86/include/asm/kvm_host.h | 3 --
>>> arch/x86/kvm/Kconfig | 1 +
>>> arch/x86/kvm/mmu.c | 4 +--
>>> arch/x86/kvm/x86.c | 64 ++++++---------------------------------
>>> 4 files changed, 12 insertions(+), 60 deletions(-)
>>>
>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>>> index 7c492ed..934dc24 100644
>>> --- a/arch/x86/include/asm/kvm_host.h
>>> +++ b/arch/x86/include/asm/kvm_host.h
>>> @@ -805,9 +805,6 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
>>>
>>> void kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
>>> void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot);
>>> -void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
>>> - struct kvm_memory_slot *slot,
>>> - gfn_t gfn_offset, unsigned long mask);
>>> void kvm_mmu_zap_all(struct kvm *kvm);
>>> void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm);
>>> unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm);
>>> diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
>>> index f9d16ff..d073594 100644
>>> --- a/arch/x86/kvm/Kconfig
>>> +++ b/arch/x86/kvm/Kconfig
>>> @@ -39,6 +39,7 @@ config KVM
>>> select PERF_EVENTS
>>> select HAVE_KVM_MSI
>>> select HAVE_KVM_CPU_RELAX_INTERCEPT
>>> + select KVM_GENERIC_DIRTYLOG_READ_PROTECT
>>> select KVM_VFIO
>>> ---help---
>>> Support hosting fully virtualized guest machines using hardware
>>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>>> index 9314678..bf6b82c 100644
>>> --- a/arch/x86/kvm/mmu.c
>>> +++ b/arch/x86/kvm/mmu.c
>>> @@ -1224,7 +1224,7 @@ static bool __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp,
>>> }
>>>
>>> /**
>>> - * kvm_mmu_write_protect_pt_masked - write protect selected PT level pages
>>> + * kvm_arch_mmu_write_protect_pt_masked - write protect selected PT level pages
>>> * @kvm: kvm instance
>>> * @slot: slot to protect
>>> * @gfn_offset: start of the BITS_PER_LONG pages we care about
>>> @@ -1233,7 +1233,7 @@ static bool __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp,
>>> * Used when we do not need to care about huge page mappings: e.g. during dirty
>>> * logging we do not have any such mappings.
>>> */
>>> -void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
>>> +void kvm_arch_mmu_write_protect_pt_masked(struct kvm *kvm,
>>> struct kvm_memory_slot *slot,
>>> gfn_t gfn_offset, unsigned long mask)
>>> {
>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>> index 8f1e22d..9f8ae9a 100644
>>> --- a/arch/x86/kvm/x86.c
>>> +++ b/arch/x86/kvm/x86.c
>>> @@ -3606,77 +3606,31 @@ static int kvm_vm_ioctl_reinject(struct kvm *kvm,
>>> *
>>> * 1. Take a snapshot of the bit and clear it if needed.
>>> * 2. Write protect the corresponding page.
>>> - * 3. Flush TLB's if needed.
>>> - * 4. Copy the snapshot to the userspace.
>>> + * 3. Copy the snapshot to the userspace.
>>> + * 4. Flush TLB's if needed.
>>> *
>>> - * Between 2 and 3, the guest may write to the page using the remaining TLB
>>> - * entry. This is not a problem because the page will be reported dirty at
>>> - * step 4 using the snapshot taken before and step 3 ensures that successive
>>> - * writes will be logged for the next call.
>>> + * Between 2 and 4, the guest may write to the page using the remaining TLB
>>> + * entry. This is not a problem because the page is reported dirty using
>>> + * the snapshot taken before and step 4 ensures that writes done after
>>> + * exiting to userspace will be logged for the next call.
>>> */
>>
>> this seems to duplicate the comment in virt/kvm/kvm_main.c, but
>> whatever.
>>
>> FWIW:
>> Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
>>
>
* [PATCH v14 4/7] KVM: arm: Add ARMv7 API to flush TLBs
From: Mario Smarduch @ 2014-11-14 1:57 UTC (permalink / raw)
To: linux-arm-kernel
This patch adds ARMv7 architecture TLB Flush function.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
---
arch/arm/include/asm/kvm_asm.h | 1 +
arch/arm/include/asm/kvm_host.h | 12 ++++++++++++
arch/arm/kvm/Kconfig | 1 +
arch/arm/kvm/interrupts.S | 11 +++++++++++
4 files changed, 25 insertions(+)
diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
index 3a67bec..25410b2 100644
--- a/arch/arm/include/asm/kvm_asm.h
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -96,6 +96,7 @@ extern char __kvm_hyp_code_end[];
extern void __kvm_flush_vm_context(void);
extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
+extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
#endif
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 6dfb404..3da6ea7 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -220,6 +220,18 @@ static inline void __cpu_init_hyp_mode(phys_addr_t boot_pgd_ptr,
kvm_call_hyp((void*)hyp_stack_ptr, vector_ptr, pgd_ptr);
}
+/**
+ * kvm_flush_remote_tlbs() - flush all VM TLB entries
+ * @kvm: pointer to kvm structure.
+ *
+ * Interface to HYP function to flush all VM TLB entries without address
+ * parameter.
+ */
+static inline void kvm_flush_remote_tlbs(struct kvm *kvm)
+{
+ kvm_call_hyp(__kvm_tlb_flush_vmid, kvm);
+}
+
static inline int kvm_arch_dev_ioctl_check_extension(long ext)
{
return 0;
diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
index 466bd29..f27f336 100644
--- a/arch/arm/kvm/Kconfig
+++ b/arch/arm/kvm/Kconfig
@@ -21,6 +21,7 @@ config KVM
select PREEMPT_NOTIFIERS
select ANON_INODES
select HAVE_KVM_CPU_RELAX_INTERCEPT
+ select HAVE_KVM_ARCH_TLB_FLUSH_ALL
select KVM_MMIO
select KVM_ARM_HOST
depends on ARM_VIRT_EXT && ARM_LPAE
diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
index 01dcb0e..79caf79 100644
--- a/arch/arm/kvm/interrupts.S
+++ b/arch/arm/kvm/interrupts.S
@@ -66,6 +66,17 @@ ENTRY(__kvm_tlb_flush_vmid_ipa)
bx lr
ENDPROC(__kvm_tlb_flush_vmid_ipa)
+/**
+ * void __kvm_tlb_flush_vmid(struct kvm *kvm) - Flush per-VMID TLBs
+ *
+ * Reuses __kvm_tlb_flush_vmid_ipa() for ARMv7, without passing address
+ * parameter
+ */
+
+ENTRY(__kvm_tlb_flush_vmid)
+ b __kvm_tlb_flush_vmid_ipa
+ENDPROC(__kvm_tlb_flush_vmid)
+
/********************************************************************
* Flush TLBs and instruction caches of all CPUs inside the inner-shareable
* domain, for all VMIDs
--
1.7.9.5
* [PATCH v14 5/7] KVM: arm: Add initial dirty page locking support
From: Mario Smarduch @ 2014-11-14 1:57 UTC (permalink / raw)
To: linux-arm-kernel
Add support for initial write protection of VM memslots. This patch
series assumes that huge PUDs will not be used in 2nd stage tables, which is
always valid on ARMv7.
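For context, the write protection below is triggered when user space turns
on dirty logging for a memslot; a rough sketch of that call (vm_fd, slot_id,
gpa, size and hva are placeholders, error handling omitted). On the kernel
side, the kvm_arch_commit_memory_region() hunk in this patch then calls
kvm_mmu_wp_memory_region() to write protect the whole slot:

#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Sketch: re-register an existing memslot with dirty logging enabled.
 * After this ioctl every present stage-2 mapping in the slot is write
 * protected, so subsequent guest writes fault and can be logged (later
 * patches in this series handle the fault and mark the page dirty).
 */
static int enable_dirty_logging(int vm_fd, __u32 slot_id, __u64 gpa,
				__u64 size, void *hva)
{
	struct kvm_userspace_memory_region region = {
		.slot = slot_id,
		.flags = KVM_MEM_LOG_DIRTY_PAGES,
		.guest_phys_addr = gpa,
		.memory_size = size,
		.userspace_addr = (__u64)(unsigned long)hva,
	};

	return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);
}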
Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
---
arch/arm/include/asm/kvm_host.h | 2 +
arch/arm/include/asm/kvm_mmu.h | 20 +++++
arch/arm/include/asm/pgtable-3level.h | 1 +
arch/arm/kvm/mmu.c | 138 +++++++++++++++++++++++++++++++++
4 files changed, 161 insertions(+)
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 3da6ea7..8fa6238 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -245,4 +245,6 @@ static inline void vgic_arch_setup(const struct vgic_params *vgic)
int kvm_perf_init(void);
int kvm_perf_teardown(void);
+void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
+
#endif /* __ARM_KVM_HOST_H__ */
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 5cc0b0f..08ab5e8 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -114,6 +114,26 @@ static inline void kvm_set_s2pmd_writable(pmd_t *pmd)
pmd_val(*pmd) |= L_PMD_S2_RDWR;
}
+static inline void kvm_set_s2pte_readonly(pte_t *pte)
+{
+ pte_val(*pte) = (pte_val(*pte) & ~L_PTE_S2_RDWR) | L_PTE_S2_RDONLY;
+}
+
+static inline bool kvm_s2pte_readonly(pte_t *pte)
+{
+ return (pte_val(*pte) & L_PTE_S2_RDWR) == L_PTE_S2_RDONLY;
+}
+
+static inline void kvm_set_s2pmd_readonly(pmd_t *pmd)
+{
+ pmd_val(*pmd) = (pmd_val(*pmd) & ~L_PMD_S2_RDWR) | L_PMD_S2_RDONLY;
+}
+
+static inline bool kvm_s2pmd_readonly(pmd_t *pmd)
+{
+ return (pmd_val(*pmd) & L_PMD_S2_RDWR) == L_PMD_S2_RDONLY;
+}
+
/* Open coded p*d_addr_end that can deal with 64bit addresses */
#define kvm_pgd_addr_end(addr, end) \
({ u64 __boundary = ((addr) + PGDIR_SIZE) & PGDIR_MASK; \
diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
index 06e0bc0..d29c880 100644
--- a/arch/arm/include/asm/pgtable-3level.h
+++ b/arch/arm/include/asm/pgtable-3level.h
@@ -130,6 +130,7 @@
#define L_PTE_S2_RDONLY (_AT(pteval_t, 1) << 6) /* HAP[1] */
#define L_PTE_S2_RDWR (_AT(pteval_t, 3) << 6) /* HAP[2:1] */
+#define L_PMD_S2_RDONLY (_AT(pmdval_t, 1) << 6) /* HAP[1] */
#define L_PMD_S2_RDWR (_AT(pmdval_t, 3) << 6) /* HAP[2:1] */
/*
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 16e7994..1e8b6a9 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -45,6 +45,7 @@ static phys_addr_t hyp_idmap_vector;
#define pgd_order get_order(PTRS_PER_PGD * sizeof(pgd_t))
#define kvm_pmd_huge(_x) (pmd_huge(_x) || pmd_trans_huge(_x))
+#define kvm_pud_huge(_x) pud_huge(_x)
static void kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
{
@@ -746,6 +747,131 @@ static bool transparent_hugepage_adjust(pfn_t *pfnp, phys_addr_t *ipap)
return false;
}
+#ifdef CONFIG_ARM
+/**
+ * stage2_wp_ptes - write protect PMD range
+ * @pmd: pointer to pmd entry
+ * @addr: range start address
+ * @end: range end address
+ */
+static void stage2_wp_ptes(pmd_t *pmd, phys_addr_t addr, phys_addr_t end)
+{
+ pte_t *pte;
+
+ pte = pte_offset_kernel(pmd, addr);
+ do {
+ if (!pte_none(*pte)) {
+ if (!kvm_s2pte_readonly(pte))
+ kvm_set_s2pte_readonly(pte);
+ }
+ } while (pte++, addr += PAGE_SIZE, addr != end);
+}
+
+/**
+ * stage2_wp_pmds - write protect PUD range
+ * @pud: pointer to pud entry
+ * @addr: range start address
+ * @end: range end address
+ */
+static void stage2_wp_pmds(pud_t *pud, phys_addr_t addr, phys_addr_t end)
+{
+ pmd_t *pmd;
+ phys_addr_t next;
+
+ pmd = pmd_offset(pud, addr);
+
+ do {
+ next = kvm_pmd_addr_end(addr, end);
+ if (!pmd_none(*pmd)) {
+ if (kvm_pmd_huge(*pmd)) {
+ if (!kvm_s2pmd_readonly(pmd))
+ kvm_set_s2pmd_readonly(pmd);
+ } else {
+ stage2_wp_ptes(pmd, addr, next);
+ }
+ }
+ } while (pmd++, addr = next, addr != end);
+}
+
+/**
+ * stage2_wp_puds - write protect PGD range
+ * @pgd: pointer to pgd entry
+ * @addr: range start address
+ * @end: range end address
+ *
+ * Process PUD entries, for a huge PUD we cause a panic.
+ */
+static void stage2_wp_puds(pgd_t *pgd, phys_addr_t addr, phys_addr_t end)
+{
+ pud_t *pud;
+ phys_addr_t next;
+
+ pud = pud_offset(pgd, addr);
+ do {
+ next = kvm_pud_addr_end(addr, end);
+ if (!pud_none(*pud)) {
+ /* TODO:PUD not supported, revisit later if supported */
+ BUG_ON(kvm_pud_huge(*pud));
+ stage2_wp_pmds(pud, addr, next);
+ }
+ } while (pud++, addr = next, addr != end);
+}
+
+/**
+ * stage2_wp_range() - write protect stage2 memory region range
+ * @kvm: The KVM pointer
+ * @addr: Start address of range
+ * @end: End address of range
+ */
+static void stage2_wp_range(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
+{
+ pgd_t *pgd;
+ phys_addr_t next;
+
+ pgd = kvm->arch.pgd + pgd_index(addr);
+ do {
+ /*
+ * Release kvm_mmu_lock periodically if the memory region is
+ * large. Otherwise, we may see kernel panics with
+ * CONFIG_DETECT_HUNG_TASK, CONFIG_LOCK_DETECTOR,
+ * CONFIG_LOCK_DEP. Additionally, holding the lock too long
+ * will also starve other vCPUs.
+ */
+ if (need_resched() || spin_needbreak(&kvm->mmu_lock))
+ cond_resched_lock(&kvm->mmu_lock);
+
+ next = kvm_pgd_addr_end(addr, end);
+ if (pgd_present(*pgd))
+ stage2_wp_puds(pgd, addr, next);
+ } while (pgd++, addr = next, addr != end);
+}
+
+/**
+ * kvm_mmu_wp_memory_region() - write protect stage 2 entries for memory slot
+ * @kvm: The KVM pointer
+ * @slot: The memory slot to write protect
+ *
+ * Called to start logging dirty pages after the KVM_MEM_LOG_DIRTY_PAGES
+ * flag is set on a memory region. After this function returns, all present
+ * PMDs and PTEs in the memory region are write protected. Afterwards the
+ * dirty page log can be read.
+ *
+ * Acquires kvm_mmu_lock. Called with kvm->slots_lock mutex acquired,
+ * serializing operations for VM memory regions.
+ */
+void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot)
+{
+ struct kvm_memory_slot *memslot = id_to_memslot(kvm->memslots, slot);
+ phys_addr_t start = memslot->base_gfn << PAGE_SHIFT;
+ phys_addr_t end = (memslot->base_gfn + memslot->npages) << PAGE_SHIFT;
+
+ spin_lock(&kvm->mmu_lock);
+ stage2_wp_range(kvm, start, end);
+ spin_unlock(&kvm->mmu_lock);
+ kvm_flush_remote_tlbs(kvm);
+}
+#endif
+
static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
struct kvm_memory_slot *memslot,
unsigned long fault_status)
@@ -1129,6 +1255,18 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
unmap_stage2_range(kvm, gpa, size);
spin_unlock(&kvm->mmu_lock);
}
+
+#ifdef CONFIG_ARM
+ /*
+ * At this point memslot has been committed and there is an
+ * allocated dirty_bitmap[], dirty pages will be tracked while the
+ * memory slot is write protected.
+ */
+ if (change != KVM_MR_DELETE && change != KVM_MR_MOVE &&
+ mem->flags & KVM_MEM_LOG_DIRTY_PAGES)
+ kvm_mmu_wp_memory_region(kvm, mem->slot);
+#endif
+
}
int kvm_arch_prepare_memory_region(struct kvm *kvm,
--
1.7.9.5
* [PATCH v14 5/7] KVM: arm: Add initial dirty page locking support
From: Christoffer Dall @ 2014-11-22 19:33 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, Nov 13, 2014 at 05:57:46PM -0800, Mario Smarduch wrote:
> Add support for initial write protection of VM memslots. This patch
> series assumes that huge PUDs will not be used in 2nd stage tables, which is
> always valid on ARMv7.
>
> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
> ---
> arch/arm/include/asm/kvm_host.h | 2 +
> arch/arm/include/asm/kvm_mmu.h | 20 +++++
> arch/arm/include/asm/pgtable-3level.h | 1 +
> arch/arm/kvm/mmu.c | 138 +++++++++++++++++++++++++++++++++
> 4 files changed, 161 insertions(+)
>
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 3da6ea7..8fa6238 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -245,4 +245,6 @@ static inline void vgic_arch_setup(const struct vgic_params *vgic)
> int kvm_perf_init(void);
> int kvm_perf_teardown(void);
>
> +void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
> +
> #endif /* __ARM_KVM_HOST_H__ */
> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
> index 5cc0b0f..08ab5e8 100644
> --- a/arch/arm/include/asm/kvm_mmu.h
> +++ b/arch/arm/include/asm/kvm_mmu.h
> @@ -114,6 +114,26 @@ static inline void kvm_set_s2pmd_writable(pmd_t *pmd)
> pmd_val(*pmd) |= L_PMD_S2_RDWR;
> }
>
> +static inline void kvm_set_s2pte_readonly(pte_t *pte)
> +{
> + pte_val(*pte) = (pte_val(*pte) & ~L_PTE_S2_RDWR) | L_PTE_S2_RDONLY;
> +}
> +
> +static inline bool kvm_s2pte_readonly(pte_t *pte)
> +{
> + return (pte_val(*pte) & L_PTE_S2_RDWR) == L_PTE_S2_RDONLY;
> +}
> +
> +static inline void kvm_set_s2pmd_readonly(pmd_t *pmd)
> +{
> + pmd_val(*pmd) = (pmd_val(*pmd) & ~L_PMD_S2_RDWR) | L_PMD_S2_RDONLY;
> +}
> +
> +static inline bool kvm_s2pmd_readonly(pmd_t *pmd)
> +{
> + return (pmd_val(*pmd) & L_PMD_S2_RDWR) == L_PMD_S2_RDONLY;
> +}
> +
> /* Open coded p*d_addr_end that can deal with 64bit addresses */
> #define kvm_pgd_addr_end(addr, end) \
> ({ u64 __boundary = ((addr) + PGDIR_SIZE) & PGDIR_MASK; \
> diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
> index 06e0bc0..d29c880 100644
> --- a/arch/arm/include/asm/pgtable-3level.h
> +++ b/arch/arm/include/asm/pgtable-3level.h
> @@ -130,6 +130,7 @@
> #define L_PTE_S2_RDONLY (_AT(pteval_t, 1) << 6) /* HAP[1] */
> #define L_PTE_S2_RDWR (_AT(pteval_t, 3) << 6) /* HAP[2:1] */
>
> +#define L_PMD_S2_RDONLY (_AT(pmdval_t, 1) << 6) /* HAP[1] */
> #define L_PMD_S2_RDWR (_AT(pmdval_t, 3) << 6) /* HAP[2:1] */
>
> /*
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 16e7994..1e8b6a9 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -45,6 +45,7 @@ static phys_addr_t hyp_idmap_vector;
> #define pgd_order get_order(PTRS_PER_PGD * sizeof(pgd_t))
>
> #define kvm_pmd_huge(_x) (pmd_huge(_x) || pmd_trans_huge(_x))
> +#define kvm_pud_huge(_x) pud_huge(_x)
>
> static void kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
> {
> @@ -746,6 +747,131 @@ static bool transparent_hugepage_adjust(pfn_t *pfnp, phys_addr_t *ipap)
> return false;
> }
>
> +#ifdef CONFIG_ARM
> +/**
> + * stage2_wp_ptes - write protect PMD range
> + * @pmd: pointer to pmd entry
> + * @addr: range start address
> + * @end: range end address
> + */
> +static void stage2_wp_ptes(pmd_t *pmd, phys_addr_t addr, phys_addr_t end)
> +{
> + pte_t *pte;
> +
> + pte = pte_offset_kernel(pmd, addr);
> + do {
> + if (!pte_none(*pte)) {
> + if (!kvm_s2pte_readonly(pte))
> + kvm_set_s2pte_readonly(pte);
> + }
incorrect indentation of the closing brace
> + } while (pte++, addr += PAGE_SIZE, addr != end);
> +}
> +
> +/**
> + * stage2_wp_pmds - write protect PUD range
> + * @pud: pointer to pud entry
> + * @addr: range start address
> + * @end: range end address
> + */
> +static void stage2_wp_pmds(pud_t *pud, phys_addr_t addr, phys_addr_t end)
> +{
> + pmd_t *pmd;
> + phys_addr_t next;
> +
> + pmd = pmd_offset(pud, addr);
> +
> + do {
> + next = kvm_pmd_addr_end(addr, end);
> + if (!pmd_none(*pmd)) {
> + if (kvm_pmd_huge(*pmd)) {
> + if (!kvm_s2pmd_readonly(pmd))
> + kvm_set_s2pmd_readonly(pmd);
> + } else {
> + stage2_wp_ptes(pmd, addr, next);
> + }
> + }
> + } while (pmd++, addr = next, addr != end);
> +}
> +
> +/**
> + * stage2_wp_puds - write protect PGD range
> + * @pgd: pointer to pgd entry
> + * @addr: range start address
> + * @end: range end address
> + *
> + * Process PUD entries, for a huge PUD we cause a panic.
> + */
> +static void stage2_wp_puds(pgd_t *pgd, phys_addr_t addr, phys_addr_t end)
> +{
> + pud_t *pud;
> + phys_addr_t next;
> +
> + pud = pud_offset(pgd, addr);
> + do {
> + next = kvm_pud_addr_end(addr, end);
> + if (!pud_none(*pud)) {
> + /* TODO:PUD not supported, revisit later if supported */
> + BUG_ON(kvm_pud_huge(*pud));
> + stage2_wp_pmds(pud, addr, next);
> + }
> + } while (pud++, addr = next, addr != end);
> +}
> +
> +/**
> + * stage2_wp_range() - write protect stage2 memory region range
> + * @kvm: The KVM pointer
> + * @addr: Start address of range
> + * @end: End address of range
> + */
> +static void stage2_wp_range(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
> +{
> + pgd_t *pgd;
> + phys_addr_t next;
> +
> + pgd = kvm->arch.pgd + pgd_index(addr);
> + do {
> + /*
> + * Release kvm_mmu_lock periodically if the memory region is
> + * large. Otherwise, we may see kernel panics with
> + * CONFIG_DETECT_HUNG_TASK, CONFIG_LOCKUP_DETECTOR,
> + * CONFIG_LOCKDEP. Additionally, holding the lock too long
> + * will also starve other vCPUs.
> + */
> + if (need_resched() || spin_needbreak(&kvm->mmu_lock))
> + cond_resched_lock(&kvm->mmu_lock);
> +
> + next = kvm_pgd_addr_end(addr, end);
> + if (pgd_present(*pgd))
> + stage2_wp_puds(pgd, addr, next);
> + } while (pgd++, addr = next, addr != end);
> +}
> +
> +/**
> + * kvm_mmu_wp_memory_region() - write protect stage 2 entries for memory slot
> + * @kvm: The KVM pointer
> + * @slot: The memory slot to write protect
> + *
> + * Called to start logging dirty pages after memory region
> + * KVM_MEM_LOG_DIRTY_PAGES operation is called. After this function returns
> + * all present PMD and PTEs are write protected in the memory region.
> + * Afterwards read of dirty page log can be called.
> + *
> + * Acquires kvm_mmu_lock. Called with kvm->slots_lock mutex acquired,
> + * serializing operations for VM memory regions.
> + */
> +void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot)
> +{
> + struct kvm_memory_slot *memslot = id_to_memslot(kvm->memslots, slot);
> + phys_addr_t start = memslot->base_gfn << PAGE_SHIFT;
> + phys_addr_t end = (memslot->base_gfn + memslot->npages) << PAGE_SHIFT;
> +
> + spin_lock(&kvm->mmu_lock);
> + stage2_wp_range(kvm, start, end);
> + spin_unlock(&kvm->mmu_lock);
> + kvm_flush_remote_tlbs(kvm);
> +}
> +#endif
> +
> static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> struct kvm_memory_slot *memslot,
> unsigned long fault_status)
> @@ -1129,6 +1255,18 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
> unmap_stage2_range(kvm, gpa, size);
> spin_unlock(&kvm->mmu_lock);
> }
> +
> +#ifdef CONFIG_ARM
> + /*
> + * At this point memslot has been committed and there is an
> + * allocated dirty_bitmap[], dirty pages will be tracked while the
> + * memory slot is write protected.
> + */
> + if (change != KVM_MR_DELETE && change != KVM_MR_MOVE &&
> + mem->flags & KVM_MEM_LOG_DIRTY_PAGES)
> + kvm_mmu_wp_memory_region(kvm, mem->slot);
What happens if the user moves a memregion and sets the LOG_DIRTY_PAGES
flag?
> +#endif
> +
> }
>
> int kvm_arch_prepare_memory_region(struct kvm *kvm,
> --
> 1.7.9.5
>
Thanks,
-Christoffer
^ permalink raw reply [flat|nested] 26+ messages in thread
* [PATCH v14 5/7] KVM: arm: Add initial dirty page locking support
2014-11-22 19:33 ` Christoffer Dall
@ 2014-11-24 18:44 ` Mario Smarduch
0 siblings, 0 replies; 26+ messages in thread
From: Mario Smarduch @ 2014-11-24 18:44 UTC (permalink / raw)
To: linux-arm-kernel
On 11/22/2014 11:33 AM, Christoffer Dall wrote:
> On Thu, Nov 13, 2014 at 05:57:46PM -0800, Mario Smarduch wrote:
>> Add support for initial write protection of VM memslots. This patch
>> series assumes that huge PUDs will not be used in 2nd stage tables, which is
>> always valid on ARMv7.
>>
>> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
>> ---
>> arch/arm/include/asm/kvm_host.h | 2 +
>> arch/arm/include/asm/kvm_mmu.h | 20 +++++
>> arch/arm/include/asm/pgtable-3level.h | 1 +
>> arch/arm/kvm/mmu.c | 138 +++++++++++++++++++++++++++++++++
>> 4 files changed, 161 insertions(+)
>>
>> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
>> index 3da6ea7..8fa6238 100644
>> --- a/arch/arm/include/asm/kvm_host.h
>> +++ b/arch/arm/include/asm/kvm_host.h
>> @@ -245,4 +245,6 @@ static inline void vgic_arch_setup(const struct vgic_params *vgic)
>> int kvm_perf_init(void);
>> int kvm_perf_teardown(void);
>>
>> +void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
>> +
>> #endif /* __ARM_KVM_HOST_H__ */
>> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
>> index 5cc0b0f..08ab5e8 100644
>> --- a/arch/arm/include/asm/kvm_mmu.h
>> +++ b/arch/arm/include/asm/kvm_mmu.h
>> @@ -114,6 +114,26 @@ static inline void kvm_set_s2pmd_writable(pmd_t *pmd)
>> pmd_val(*pmd) |= L_PMD_S2_RDWR;
>> }
>>
>> +static inline void kvm_set_s2pte_readonly(pte_t *pte)
>> +{
>> + pte_val(*pte) = (pte_val(*pte) & ~L_PTE_S2_RDWR) | L_PTE_S2_RDONLY;
>> +}
>> +
>> +static inline bool kvm_s2pte_readonly(pte_t *pte)
>> +{
>> + return (pte_val(*pte) & L_PTE_S2_RDWR) == L_PTE_S2_RDONLY;
>> +}
>> +
>> +static inline void kvm_set_s2pmd_readonly(pmd_t *pmd)
>> +{
>> + pmd_val(*pmd) = (pmd_val(*pmd) & ~L_PMD_S2_RDWR) | L_PMD_S2_RDONLY;
>> +}
>> +
>> +static inline bool kvm_s2pmd_readonly(pmd_t *pmd)
>> +{
>> + return (pmd_val(*pmd) & L_PMD_S2_RDWR) == L_PMD_S2_RDONLY;
>> +}
>> +
>> /* Open coded p*d_addr_end that can deal with 64bit addresses */
>> #define kvm_pgd_addr_end(addr, end) \
>> ({ u64 __boundary = ((addr) + PGDIR_SIZE) & PGDIR_MASK; \
>> diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
>> index 06e0bc0..d29c880 100644
>> --- a/arch/arm/include/asm/pgtable-3level.h
>> +++ b/arch/arm/include/asm/pgtable-3level.h
>> @@ -130,6 +130,7 @@
>> #define L_PTE_S2_RDONLY (_AT(pteval_t, 1) << 6) /* HAP[1] */
>> #define L_PTE_S2_RDWR (_AT(pteval_t, 3) << 6) /* HAP[2:1] */
>>
>> +#define L_PMD_S2_RDONLY (_AT(pmdval_t, 1) << 6) /* HAP[1] */
>> #define L_PMD_S2_RDWR (_AT(pmdval_t, 3) << 6) /* HAP[2:1] */
>>
>> /*
>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index 16e7994..1e8b6a9 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -45,6 +45,7 @@ static phys_addr_t hyp_idmap_vector;
>> #define pgd_order get_order(PTRS_PER_PGD * sizeof(pgd_t))
>>
>> #define kvm_pmd_huge(_x) (pmd_huge(_x) || pmd_trans_huge(_x))
>> +#define kvm_pud_huge(_x) pud_huge(_x)
>>
>> static void kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
>> {
>> @@ -746,6 +747,131 @@ static bool transparent_hugepage_adjust(pfn_t *pfnp, phys_addr_t *ipap)
>> return false;
>> }
>>
>> +#ifdef CONFIG_ARM
>> +/**
>> + * stage2_wp_ptes - write protect PMD range
>> + * @pmd: pointer to pmd entry
>> + * @addr: range start address
>> + * @end: range end address
>> + */
>> +static void stage2_wp_ptes(pmd_t *pmd, phys_addr_t addr, phys_addr_t end)
>> +{
>> + pte_t *pte;
>> +
>> + pte = pte_offset_kernel(pmd, addr);
>> + do {
>> + if (!pte_none(*pte)) {
>> + if (!kvm_s2pte_readonly(pte))
>> + kvm_set_s2pte_readonly(pte);
>> + }
>
> incorrect indentation of the closing brace
got it.
>
>> + } while (pte++, addr += PAGE_SIZE, addr != end);
>> +}
>> +
>> +/**
>> + * stage2_wp_pmds - write protect PUD range
>> + * @pud: pointer to pud entry
>> + * @addr: range start address
>> + * @end: range end address
>> + */
>> +static void stage2_wp_pmds(pud_t *pud, phys_addr_t addr, phys_addr_t end)
>> +{
>> + pmd_t *pmd;
>> + phys_addr_t next;
>> +
>> + pmd = pmd_offset(pud, addr);
>> +
>> + do {
>> + next = kvm_pmd_addr_end(addr, end);
>> + if (!pmd_none(*pmd)) {
>> + if (kvm_pmd_huge(*pmd)) {
>> + if (!kvm_s2pmd_readonly(pmd))
>> + kvm_set_s2pmd_readonly(pmd);
>> + } else {
>> + stage2_wp_ptes(pmd, addr, next);
>> + }
>> + }
>> + } while (pmd++, addr = next, addr != end);
>> +}
>> +
>> +/**
>> + * stage2_wp_puds - write protect PGD range
>> + * @pgd: pointer to pgd entry
>> + * @addr: range start address
>> + * @end: range end address
>> + *
>> + * Process PUD entries, for a huge PUD we cause a panic.
>> + */
>> +static void stage2_wp_puds(pgd_t *pgd, phys_addr_t addr, phys_addr_t end)
>> +{
>> + pud_t *pud;
>> + phys_addr_t next;
>> +
>> + pud = pud_offset(pgd, addr);
>> + do {
>> + next = kvm_pud_addr_end(addr, end);
>> + if (!pud_none(*pud)) {
>> + /* TODO:PUD not supported, revisit later if supported */
>> + BUG_ON(kvm_pud_huge(*pud));
>> + stage2_wp_pmds(pud, addr, next);
>> + }
>> + } while (pud++, addr = next, addr != end);
>> +}
>> +
>> +/**
>> + * stage2_wp_range() - write protect stage2 memory region range
>> + * @kvm: The KVM pointer
>> + * @addr: Start address of range
>> + * @end: End address of range
>> + */
>> +static void stage2_wp_range(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
>> +{
>> + pgd_t *pgd;
>> + phys_addr_t next;
>> +
>> + pgd = kvm->arch.pgd + pgd_index(addr);
>> + do {
>> + /*
>> + * Release kvm_mmu_lock periodically if the memory region is
>> + * large. Otherwise, we may see kernel panics with
>> + * CONFIG_DETECT_HUNG_TASK, CONFIG_LOCKUP_DETECTOR,
>> + * CONFIG_LOCKDEP. Additionally, holding the lock too long
>> + * will also starve other vCPUs.
>> + */
>> + if (need_resched() || spin_needbreak(&kvm->mmu_lock))
>> + cond_resched_lock(&kvm->mmu_lock);
>> +
>> + next = kvm_pgd_addr_end(addr, end);
>> + if (pgd_present(*pgd))
>> + stage2_wp_puds(pgd, addr, next);
>> + } while (pgd++, addr = next, addr != end);
>> +}
>> +
>> +/**
>> + * kvm_mmu_wp_memory_region() - write protect stage 2 entries for memory slot
>> + * @kvm: The KVM pointer
>> + * @slot: The memory slot to write protect
>> + *
>> + * Called to start logging dirty pages after memory region
>> + * KVM_MEM_LOG_DIRTY_PAGES operation is called. After this function returns
>> + * all present PMD and PTEs are write protected in the memory region.
>> + * Afterwards read of dirty page log can be called.
>> + *
>> + * Acquires kvm_mmu_lock. Called with kvm->slots_lock mutex acquired,
>> + * serializing operations for VM memory regions.
>> + */
>> +void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot)
>> +{
>> + struct kvm_memory_slot *memslot = id_to_memslot(kvm->memslots, slot);
>> + phys_addr_t start = memslot->base_gfn << PAGE_SHIFT;
>> + phys_addr_t end = (memslot->base_gfn + memslot->npages) << PAGE_SHIFT;
>> +
>> + spin_lock(&kvm->mmu_lock);
>> + stage2_wp_range(kvm, start, end);
>> + spin_unlock(&kvm->mmu_lock);
>> + kvm_flush_remote_tlbs(kvm);
>> +}
>> +#endif
>> +
>> static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>> struct kvm_memory_slot *memslot,
>> unsigned long fault_status)
>> @@ -1129,6 +1255,18 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
>> unmap_stage2_range(kvm, gpa, size);
>> spin_unlock(&kvm->mmu_lock);
>> }
>> +
>> +#ifdef CONFIG_ARM
>> + /*
>> + * At this point memslot has been committed and there is an
>> + * allocated dirty_bitmap[], dirty pages will be tracked while the
>> + * memory slot is write protected.
>> + */
>> + if (change != KVM_MR_DELETE && change != KVM_MR_MOVE &&
>> + mem->flags & KVM_MEM_LOG_DIRTY_PAGES)
>> + kvm_mmu_wp_memory_region(kvm, mem->slot);
>
> What happens if the user moves a memregion and sets the LOG_DIRTY_PAGES
> flag?
Yes, I misinterpreted KVM_MR_MOVE; we need to handle the initial write protect
as well if logging is enabled, since the GPA range has moved. Reading the qemu
code led me to believe the scenario was not possible, but yes, that's user
space policy and of no concern here. Will correct.
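Concretely, the check could simply stop special-casing KVM_MR_MOVE, along
these lines (an untested sketch; the actual resend may differ):

#ifdef CONFIG_ARM
	/*
	 * Write protect the slot whenever logging is enabled, including
	 * after a move, since the new GPA range starts out without any
	 * write-protected stage-2 entries.
	 */
	if (change != KVM_MR_DELETE && mem->flags & KVM_MEM_LOG_DIRTY_PAGES)
		kvm_mmu_wp_memory_region(kvm, mem->slot);
#endif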
Thanks.
>
>> +#endif
>> +
>> }
>>
>> int kvm_arch_prepare_memory_region(struct kvm *kvm,
>> --
>> 1.7.9.5
>>
>
> Thanks,
> -Christoffer
>
^ permalink raw reply [flat|nested] 26+ messages in thread
* [PATCH v14 6/7] KVM: arm: dirty logging write protect support
2014-11-14 1:57 [PATCH v14 0/7] KVM/arm/x86: dirty page logging for ARMv7 (3.17.0-rc1) Mario Smarduch
` (4 preceding siblings ...)
2014-11-14 1:57 ` [PATCH v14 5/7] KVM: arm: Add initial dirty page locking support Mario Smarduch
@ 2014-11-14 1:57 ` Mario Smarduch
2014-11-22 19:40 ` Christoffer Dall
2014-11-14 1:57 ` [PATCH v14 7/7] KVM: arm: page logging 2nd stage fault handling Mario Smarduch
` (2 subsequent siblings)
8 siblings, 1 reply; 26+ messages in thread
From: Mario Smarduch @ 2014-11-14 1:57 UTC (permalink / raw)
To: linux-arm-kernel
Add support to track dirty pages between user space KVM_GET_DIRTY_LOG ioctl
calls. We call the kvm_get_dirty_log_protect() function to do most of the work.
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
---
arch/arm/kvm/Kconfig | 1 +
arch/arm/kvm/arm.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
arch/arm/kvm/mmu.c | 22 ++++++++++++++++++++++
3 files changed, 69 insertions(+)
diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
index f27f336..a8d1ace 100644
--- a/arch/arm/kvm/Kconfig
+++ b/arch/arm/kvm/Kconfig
@@ -24,6 +24,7 @@ config KVM
select HAVE_KVM_ARCH_TLB_FLUSH_ALL
select KVM_MMIO
select KVM_ARM_HOST
+ select KVM_GENERIC_DIRTYLOG_READ_PROTECT
depends on ARM_VIRT_EXT && ARM_LPAE
---help---
Support hosting virtualized guest machines. You will also
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index a99e0cd..040c0f3 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -737,9 +737,55 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
}
}
+/**
+ * kvm_vm_ioctl_get_dirty_log - get and clear the log of dirty pages in a slot
+ * @kvm: kvm instance
+ * @log: slot id and address to which we copy the log
+ *
+ * We need to keep it in mind that VCPU threads can write to the bitmap
+ * concurrently. So, to avoid losing data, we keep the following order for
+ * each bit:
+ *
+ * 1. Take a snapshot of the bit and clear it if needed.
+ * 2. Write protect the corresponding page.
+ * 3. Copy the snapshot to the userspace.
+ * 4. Flush TLB's if needed.
+ *
+ * Steps 1,2,3 are handled by kvm_get_dirty_log_protect().
+ * Between 2 and 4, the guest may write to the page using the remaining TLB
+ * entry. This is not a problem because the page is reported dirty using
+ * the snapshot taken before and step 4 ensures that writes done after
+ * exiting to userspace will be logged for the next call.
+ */
int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
{
+#ifdef CONFIG_ARM
+ int r;
+ bool is_dirty = false;
+
+ mutex_lock(&kvm->slots_lock);
+
+ r = kvm_get_dirty_log_protect(kvm, log, &is_dirty);
+ if (r)
+ goto out;
+
+ /*
+ * kvm_get_dirty_log_protect() may fail and we may skip TLB flush
+ * leaving few stale spte TLB entries which is harmless, given we're
+ * just write protecting spte's, so few stale TLB's will be left in
+ * original R/W state. And since the bitmap is corrupt userspace will
+ * error out anyway (i.e. during migration or dirty page loging for
+ * other reasons) terminating dirty page logging.
+ */
+ if (is_dirty)
+ kvm_flush_remote_tlbs(kvm);
+out:
+ mutex_unlock(&kvm->slots_lock);
+
+ return r;
+#else /* ARM64 */
return -EINVAL;
+#endif
}
static int kvm_vm_ioctl_set_device_addr(struct kvm *kvm,
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 1e8b6a9..8137455 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -870,6 +870,28 @@ void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot)
spin_unlock(&kvm->mmu_lock);
kvm_flush_remote_tlbs(kvm);
}
+
+/**
+ * kvm_arch_mmu_write_protect_pt_masked() - write protect dirty pages
+ * @kvm: The KVM pointer
+ * @slot: The memory slot associated with mask
+ * @gfn_offset: The gfn offset in memory slot
+ * @mask: The mask of dirty pages at offset 'gfn_offset' in this memory
+ * slot to be write protected
+ *
+ * Walks the bits set in mask and write protects the associated pte's. Caller
+ * must acquire kvm_mmu_lock.
+ */
+void kvm_arch_mmu_write_protect_pt_masked(struct kvm *kvm,
+ struct kvm_memory_slot *slot,
+ gfn_t gfn_offset, unsigned long mask)
+{
+ phys_addr_t base_gfn = slot->base_gfn + gfn_offset;
+ phys_addr_t start = (base_gfn + __ffs(mask)) << PAGE_SHIFT;
+ phys_addr_t end = (base_gfn + __fls(mask) + 1) << PAGE_SHIFT;
+
+ stage2_wp_range(kvm, start, end);
+}
#endif
static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
--
1.7.9.5
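For completeness, a hypothetical sketch of the userspace consumer of this
ioctl follows; the helper name and error handling are illustrative, and the
bitmap size assumes one bit per page in the slot, rounded up to a multiple of
64 bits as the KVM API expects.

#include <err.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Fetch the dirty bitmap for one slot that was registered with
 * KVM_MEM_LOG_DIRTY_PAGES; returns a buffer with one bit per page.
 */
static unsigned long *get_dirty_log(int vm_fd, uint32_t slot_id,
				    size_t slot_pages)
{
	size_t bytes = ((slot_pages + 63) / 64) * 8;	/* one bit per page */
	unsigned long *bitmap = calloc(1, bytes);
	struct kvm_dirty_log log = { .slot = slot_id };

	if (!bitmap)
		err(1, "calloc");

	log.dirty_bitmap = bitmap;
	if (ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log) < 0)
		err(1, "KVM_GET_DIRTY_LOG");

	/* set bits mark pages dirtied since the previous call */
	return bitmap;
}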
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [PATCH v14 6/7] KVM: arm: dirty logging write protect support
2014-11-14 1:57 ` [PATCH v14 6/7] KVM: arm: dirty logging write protect support Mario Smarduch
@ 2014-11-22 19:40 ` Christoffer Dall
2014-11-24 18:47 ` Mario Smarduch
0 siblings, 1 reply; 26+ messages in thread
From: Christoffer Dall @ 2014-11-22 19:40 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, Nov 13, 2014 at 05:57:47PM -0800, Mario Smarduch wrote:
> Add support to track dirty pages between user space KVM_GET_DIRTY_LOG ioctl
> calls. We call kvm_get_dirty_log_protect() function to do most of the work.
>
> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
> ---
> arch/arm/kvm/Kconfig | 1 +
> arch/arm/kvm/arm.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
> arch/arm/kvm/mmu.c | 22 ++++++++++++++++++++++
> 3 files changed, 69 insertions(+)
>
> diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
> index f27f336..a8d1ace 100644
> --- a/arch/arm/kvm/Kconfig
> +++ b/arch/arm/kvm/Kconfig
> @@ -24,6 +24,7 @@ config KVM
> select HAVE_KVM_ARCH_TLB_FLUSH_ALL
> select KVM_MMIO
> select KVM_ARM_HOST
> + select KVM_GENERIC_DIRTYLOG_READ_PROTECT
> depends on ARM_VIRT_EXT && ARM_LPAE
> ---help---
> Support hosting virtualized guest machines. You will also
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index a99e0cd..040c0f3 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -737,9 +737,55 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
> }
> }
>
> +/**
> + * kvm_vm_ioctl_get_dirty_log - get and clear the log of dirty pages in a slot
> + * @kvm: kvm instance
> + * @log: slot id and address to which we copy the log
> + *
> + * We need to keep it in mind that VCPU threads can write to the bitmap
> + * concurrently. So, to avoid losing data, we keep the following order for
> + * each bit:
> + *
> + * 1. Take a snapshot of the bit and clear it if needed.
> + * 2. Write protect the corresponding page.
> + * 3. Copy the snapshot to the userspace.
> + * 4. Flush TLB's if needed.
> + *
> + * Steps 1,2,3 are handled by kvm_get_dirty_log_protect().
> + * Between 2 and 4, the guest may write to the page using the remaining TLB
> + * entry. This is not a problem because the page is reported dirty using
> + * the snapshot taken before and step 4 ensures that writes done after
> + * exiting to userspace will be logged for the next call.
> + */
> int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
> {
> +#ifdef CONFIG_ARM
> + int r;
> + bool is_dirty = false;
> +
> + mutex_lock(&kvm->slots_lock);
> +
> + r = kvm_get_dirty_log_protect(kvm, log, &is_dirty);
> + if (r)
> + goto out;
> +
> + /*
> + * kvm_get_dirty_log_protect() may fail and we may skip TLB flush
> + * leaving few stale spte TLB entries which is harmless, given we're
> + * just write protecting spte's, so few stale TLB's will be left in
> + * original R/W state. And since the bitmap is corrupt userspace will
> + * error out anyway (i.e. during migration or dirty page loging for
s/loging/logging/
Hmmm, where is this behavior specified in the ABI? If you call
KVM_GET_DIRTY_LOG subsequently, you will now potentially have unreported
dirty pages, which can be completely avoided by removing the
if-statement and the goto above. Why not simply do that and get rid of
this comment?
> + * other reasons) terminating dirty page logging.
> + */
> + if (is_dirty)
> + kvm_flush_remote_tlbs(kvm);
> +out:
> + mutex_unlock(&kvm->slots_lock);
> +
> + return r;
> +#else /* ARM64 */
> return -EINVAL;
> +#endif
> }
>
> static int kvm_vm_ioctl_set_device_addr(struct kvm *kvm,
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 1e8b6a9..8137455 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -870,6 +870,28 @@ void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot)
> spin_unlock(&kvm->mmu_lock);
> kvm_flush_remote_tlbs(kvm);
> }
> +
> +/**
> + * kvm_arch_mmu_write_protect_pt_masked() - write protect dirty pages
> + * @kvm: The KVM pointer
> + * @slot: The memory slot associated with mask
> + * @gfn_offset: The gfn offset in memory slot
> + * @mask: The mask of dirty pages at offset 'gfn_offset' in this memory
> + * slot to be write protected
> + *
> + * Walks bits set in mask write protects the associated pte's. Caller must
> + * acquire kvm_mmu_lock.
> + */
> +void kvm_arch_mmu_write_protect_pt_masked(struct kvm *kvm,
> + struct kvm_memory_slot *slot,
> + gfn_t gfn_offset, unsigned long mask)
> +{
> + phys_addr_t base_gfn = slot->base_gfn + gfn_offset;
> + phys_addr_t start = (base_gfn + __ffs(mask)) << PAGE_SHIFT;
> + phys_addr_t end = (base_gfn + __fls(mask) + 1) << PAGE_SHIFT;
> +
> + stage2_wp_range(kvm, start, end);
> +}
> #endif
>
> static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> --
> 1.7.9.5
>
^ permalink raw reply [flat|nested] 26+ messages in thread
* [PATCH v14 6/7] KVM: arm: dirty logging write protect support
2014-11-22 19:40 ` Christoffer Dall
@ 2014-11-24 18:47 ` Mario Smarduch
0 siblings, 0 replies; 26+ messages in thread
From: Mario Smarduch @ 2014-11-24 18:47 UTC (permalink / raw)
To: linux-arm-kernel
On 11/22/2014 11:40 AM, Christoffer Dall wrote:
> On Thu, Nov 13, 2014 at 05:57:47PM -0800, Mario Smarduch wrote:
>> Add support to track dirty pages between user space KVM_GET_DIRTY_LOG ioctl
>> calls. We call kvm_get_dirty_log_protect() function to do most of the work.
>>
>> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
>> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
>> ---
>> arch/arm/kvm/Kconfig | 1 +
>> arch/arm/kvm/arm.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
>> arch/arm/kvm/mmu.c | 22 ++++++++++++++++++++++
>> 3 files changed, 69 insertions(+)
>>
>> diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
>> index f27f336..a8d1ace 100644
>> --- a/arch/arm/kvm/Kconfig
>> +++ b/arch/arm/kvm/Kconfig
>> @@ -24,6 +24,7 @@ config KVM
>> select HAVE_KVM_ARCH_TLB_FLUSH_ALL
>> select KVM_MMIO
>> select KVM_ARM_HOST
>> + select KVM_GENERIC_DIRTYLOG_READ_PROTECT
>> depends on ARM_VIRT_EXT && ARM_LPAE
>> ---help---
>> Support hosting virtualized guest machines. You will also
>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>> index a99e0cd..040c0f3 100644
>> --- a/arch/arm/kvm/arm.c
>> +++ b/arch/arm/kvm/arm.c
>> @@ -737,9 +737,55 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
>> }
>> }
>>
>> +/**
>> + * kvm_vm_ioctl_get_dirty_log - get and clear the log of dirty pages in a slot
>> + * @kvm: kvm instance
>> + * @log: slot id and address to which we copy the log
>> + *
>> + * We need to keep it in mind that VCPU threads can write to the bitmap
>> + * concurrently. So, to avoid losing data, we keep the following order for
>> + * each bit:
>> + *
>> + * 1. Take a snapshot of the bit and clear it if needed.
>> + * 2. Write protect the corresponding page.
>> + * 3. Copy the snapshot to the userspace.
>> + * 4. Flush TLB's if needed.
>> + *
>> + * Steps 1,2,3 are handled by kvm_get_dirty_log_protect().
>> + * Between 2 and 4, the guest may write to the page using the remaining TLB
>> + * entry. This is not a problem because the page is reported dirty using
>> + * the snapshot taken before and step 4 ensures that writes done after
>> + * exiting to userspace will be logged for the next call.
>> + */
>> int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
>> {
>> +#ifdef CONFIG_ARM
>> + int r;
>> + bool is_dirty = false;
>> +
>> + mutex_lock(&kvm->slots_lock);
>> +
>> + r = kvm_get_dirty_log_protect(kvm, log, &is_dirty);
>> + if (r)
>> + goto out;
>> +
>> + /*
>> + * kvm_get_dirty_log_protect() may fail and we may skip TLB flush
>> + * leaving few stale spte TLB entries which is harmless, given we're
>> + * just write protecting spte's, so few stale TLB's will be left in
>> + * original R/W state. And since the bitmap is corrupt userspace will
>> + * error out anyway (i.e. during migration or dirty page loging for
>
> s/loging/logging/
>
> Hmmm, where is this behavior specified in the ABI? If you call
> KVM_GET_DIRTY_LOG subsequently, you will now potentially have unreported
> dirty pages, which can be completely avoided by removing the
> if-statement and the goto above. Why not simply do that and get rid of
> this comment?
Yeah, that makes sense; the comment is overkill for these few lines.
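So the body could shrink to roughly the following (sketch only; the flush
still keys off is_dirty, but a failure in kvm_get_dirty_log_protect() no
longer skips it):

	int r;
	bool is_dirty = false;

	mutex_lock(&kvm->slots_lock);

	r = kvm_get_dirty_log_protect(kvm, log, &is_dirty);

	if (is_dirty)
		kvm_flush_remote_tlbs(kvm);

	mutex_unlock(&kvm->slots_lock);
	return r;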
>
>> + * other reasons) terminating dirty page logging.
>> + */
>> + if (is_dirty)
>> + kvm_flush_remote_tlbs(kvm);
>> +out:
>> + mutex_unlock(&kvm->slots_lock);
>> +
>> + return r;
>> +#else /* ARM64 */
>> return -EINVAL;
>> +#endif
>> }
>>
>> static int kvm_vm_ioctl_set_device_addr(struct kvm *kvm,
>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index 1e8b6a9..8137455 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -870,6 +870,28 @@ void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot)
>> spin_unlock(&kvm->mmu_lock);
>> kvm_flush_remote_tlbs(kvm);
>> }
>> +
>> +/**
>> + * kvm_arch_mmu_write_protect_pt_masked() - write protect dirty pages
>> + * @kvm: The KVM pointer
>> + * @slot: The memory slot associated with mask
>> + * @gfn_offset: The gfn offset in memory slot
>> + * @mask: The mask of dirty pages at offset 'gfn_offset' in this memory
>> + * slot to be write protected
>> + *
>> + * Walks bits set in mask write protects the associated pte's. Caller must
>> + * acquire kvm_mmu_lock.
>> + */
>> +void kvm_arch_mmu_write_protect_pt_masked(struct kvm *kvm,
>> + struct kvm_memory_slot *slot,
>> + gfn_t gfn_offset, unsigned long mask)
>> +{
>> + phys_addr_t base_gfn = slot->base_gfn + gfn_offset;
>> + phys_addr_t start = (base_gfn + __ffs(mask)) << PAGE_SHIFT;
>> + phys_addr_t end = (base_gfn + __fls(mask) + 1) << PAGE_SHIFT;
>> +
>> + stage2_wp_range(kvm, start, end);
>> +}
>> #endif
>>
>> static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>> --
>> 1.7.9.5
>>
^ permalink raw reply [flat|nested] 26+ messages in thread
* [PATCH v14 7/7] KVM: arm: page logging 2nd stage fault handling
2014-11-14 1:57 [PATCH v14 0/7] KVM/arm/x86: dirty page logging for ARMv7 (3.17.0-rc1) Mario Smarduch
` (5 preceding siblings ...)
2014-11-14 1:57 ` [PATCH v14 6/7] KVM: arm: dirty logging write protect support Mario Smarduch
@ 2014-11-14 1:57 ` Mario Smarduch
2014-11-14 16:45 ` Marc Zyngier
2014-11-14 8:06 ` [PATCH v14 0/7] KVM/arm/x86: dirty page logging for ARMv7 (3.17.0-rc1) Cornelia Huck
2014-11-25 10:22 ` Christoffer Dall
8 siblings, 1 reply; 26+ messages in thread
From: Mario Smarduch @ 2014-11-14 1:57 UTC (permalink / raw)
To: linux-arm-kernel
This patch adds support for handling 2nd stage page faults during migration;
it disables faulting in huge pages and dissolves existing huge pages into page
tables. If migration is canceled, huge pages are used again.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
---
arch/arm/kvm/mmu.c | 56 ++++++++++++++++++++++++++++++++++++++++++++--------
1 file changed, 48 insertions(+), 8 deletions(-)
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 8137455..ff88e5b 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -47,6 +47,20 @@ static phys_addr_t hyp_idmap_vector;
#define kvm_pmd_huge(_x) (pmd_huge(_x) || pmd_trans_huge(_x))
#define kvm_pud_huge(_x) pud_huge(_x)
+#define IOMAP_ATTR 0x1
+#define LOGGING_ACTIVE 0x2
+#define SET_SPTE_FLAGS(l, i) ((l) << (LOGGING_ACTIVE - 1) | \
+ (i) << (IOMAP_ATTR - 1))
+
+static bool kvm_get_logging_state(struct kvm_memory_slot *memslot)
+{
+#ifdef CONFIG_ARM
+ return !!memslot->dirty_bitmap;
+#else
+ return false;
+#endif
+}
+
static void kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
{
/*
@@ -626,10 +640,13 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
}
static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
- phys_addr_t addr, const pte_t *new_pte, bool iomap)
+ phys_addr_t addr, const pte_t *new_pte,
+ unsigned long flags)
{
pmd_t *pmd;
pte_t *pte, old_pte;
+ bool iomap = flags & IOMAP_ATTR;
+ bool logging_active = flags & LOGGING_ACTIVE;
/* Create stage-2 page table mapping - Level 1 */
pmd = stage2_get_pmd(kvm, cache, addr);
@@ -641,6 +658,18 @@ static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
return 0;
}
+ /*
+ * While dirty memory logging is active, clear the PMD entry for the huge
+ * page and split it into smaller pages, to track dirty memory at page
+ * granularity.
+ */
+ if (logging_active && kvm_pmd_huge(*pmd)) {
+ phys_addr_t ipa = pmd_pfn(*pmd) << PAGE_SHIFT;
+
+ pmd_clear(pmd);
+ kvm_tlb_flush_vmid_ipa(kvm, ipa);
+ put_page(virt_to_page(pmd));
+ }
+
/* Create stage-2 page mappings - Level 2 */
if (pmd_none(*pmd)) {
if (!cache)
@@ -693,7 +722,7 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
if (ret)
goto out;
spin_lock(&kvm->mmu_lock);
- ret = stage2_set_pte(kvm, &cache, addr, &pte, true);
+ ret = stage2_set_pte(kvm, &cache, addr, &pte, IOMAP_ATTR);
spin_unlock(&kvm->mmu_lock);
if (ret)
goto out;
@@ -908,6 +937,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
struct vm_area_struct *vma;
pfn_t pfn;
pgprot_t mem_type = PAGE_S2;
+ bool logging_active = kvm_get_logging_state(memslot);
write_fault = kvm_is_write_fault(kvm_vcpu_get_hsr(vcpu));
if (fault_status == FSC_PERM && !write_fault) {
@@ -918,7 +948,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
/* Let's check if we will get back a huge page backed by hugetlbfs */
down_read(&current->mm->mmap_sem);
vma = find_vma_intersection(current->mm, hva, hva + 1);
- if (is_vm_hugetlb_page(vma)) {
+ if (is_vm_hugetlb_page(vma) && !logging_active) {
hugetlb = true;
gfn = (fault_ipa & PMD_MASK) >> PAGE_SHIFT;
} else {
@@ -964,7 +994,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
spin_lock(&kvm->mmu_lock);
if (mmu_notifier_retry(kvm, mmu_seq))
goto out_unlock;
- if (!hugetlb && !force_pte)
+ if (!hugetlb && !force_pte && !logging_active)
hugetlb = transparent_hugepage_adjust(&pfn, &fault_ipa);
if (hugetlb) {
@@ -978,16 +1008,18 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
ret = stage2_set_pmd_huge(kvm, memcache, fault_ipa, &new_pmd);
} else {
pte_t new_pte = pfn_pte(pfn, mem_type);
+ unsigned long flags = SET_SPTE_FLAGS(logging_active,
+ mem_type == PAGE_S2_DEVICE);
if (writable) {
kvm_set_s2pte_writable(&new_pte);
kvm_set_pfn_dirty(pfn);
}
coherent_cache_guest_page(vcpu, hva, PAGE_SIZE);
- ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte,
- mem_type == PAGE_S2_DEVICE);
+ ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, flags);
}
-
+ if (write_fault)
+ mark_page_dirty(kvm, gfn);
out_unlock:
spin_unlock(&kvm->mmu_lock);
kvm_release_pfn_clean(pfn);
@@ -1137,7 +1169,15 @@ static void kvm_set_spte_handler(struct kvm *kvm, gpa_t gpa, void *data)
{
pte_t *pte = (pte_t *)data;
- stage2_set_pte(kvm, NULL, gpa, pte, false);
+ /*
+ * We can always call stage2_set_pte with logging_active == false,
+ * because MMU notifiers will have unmapped a huge PMD before calling
+ * ->change_pte() (which in turn calls kvm_set_spte_hva()) and therefore
+ * stage2_set_pte() never needs to clear out a huge PMD through this
+ * calling path.
+ */
+
+ stage2_set_pte(kvm, NULL, gpa, pte, 0);
}
--
1.7.9.5
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [PATCH v14 7/7] KVM: arm: page logging 2nd stage fault handling
2014-11-14 1:57 ` [PATCH v14 7/7] KVM: arm: page logging 2nd stage fault handling Mario Smarduch
@ 2014-11-14 16:45 ` Marc Zyngier
2014-11-14 18:53 ` Mario Smarduch
0 siblings, 1 reply; 26+ messages in thread
From: Marc Zyngier @ 2014-11-14 16:45 UTC (permalink / raw)
To: linux-arm-kernel
Hi Mario,
On 14/11/14 01:57, Mario Smarduch wrote:
> This patch adds support for handling 2nd stage page faults during migration,
> it disables faulting in huge pages, and dissolves huge pages to page tables.
> In case migration is canceled huge pages are used again.
>
> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
> ---
> arch/arm/kvm/mmu.c | 56 ++++++++++++++++++++++++++++++++++++++++++++--------
> 1 file changed, 48 insertions(+), 8 deletions(-)
>
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 8137455..ff88e5b 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -47,6 +47,20 @@ static phys_addr_t hyp_idmap_vector;
> #define kvm_pmd_huge(_x) (pmd_huge(_x) || pmd_trans_huge(_x))
> #define kvm_pud_huge(_x) pud_huge(_x)
>
> +#define IOMAP_ATTR 0x1
> +#define LOGGING_ACTIVE 0x2
> +#define SET_SPTE_FLAGS(l, i) ((l) << (LOGGING_ACTIVE - 1) | \
> + (i) << (IOMAP_ATTR - 1))
I suffered a minor brain haemorrhage on this one. How about something
that is more in line with what's done almost everywhere else:
#define KVM_S2PTE_FLAG_IS_IOMAP (1UL << 0)
#define KVM_S2PTE_FLAG_LOGGING_ACTIVE (1UL << 1)
> +static bool kvm_get_logging_state(struct kvm_memory_slot *memslot)
> +{
> +#ifdef CONFIG_ARM
> + return !!memslot->dirty_bitmap;
> +#else
> + return false;
> +#endif
> +}
> +
> static void kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
> {
> /*
> @@ -626,10 +640,13 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
> }
>
> static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
> - phys_addr_t addr, const pte_t *new_pte, bool iomap)
> + phys_addr_t addr, const pte_t *new_pte,
> + unsigned long flags)
> {
> pmd_t *pmd;
> pte_t *pte, old_pte;
> + bool iomap = flags & IOMAP_ATTR;
> + bool logging_active = flags & LOGGING_ACTIVE;
If you convert these two variables to be 'unsigned long'...
>
> /* Create stage-2 page table mapping - Level 1 */
> pmd = stage2_get_pmd(kvm, cache, addr);
> @@ -641,6 +658,18 @@ static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
> return 0;
> }
>
> + /*
> + * While dirty memory logging, clear PMD entry for huge page and split
> + * into smaller pages, to track dirty memory at page granularity.
> + */
> + if (logging_active && kvm_pmd_huge(*pmd)) {
> + phys_addr_t ipa = pmd_pfn(*pmd) << PAGE_SHIFT;
> +
> + pmd_clear(pmd);
> + kvm_tlb_flush_vmid_ipa(kvm, ipa);
> + put_page(virt_to_page(pmd));
> + }
> +
> /* Create stage-2 page mappings - Level 2 */
> if (pmd_none(*pmd)) {
> if (!cache)
> @@ -693,7 +722,7 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
> if (ret)
> goto out;
> spin_lock(&kvm->mmu_lock);
> - ret = stage2_set_pte(kvm, &cache, addr, &pte, true);
> + ret = stage2_set_pte(kvm, &cache, addr, &pte, IOMAP_ATTR);
> spin_unlock(&kvm->mmu_lock);
> if (ret)
> goto out;
> @@ -908,6 +937,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> struct vm_area_struct *vma;
> pfn_t pfn;
> pgprot_t mem_type = PAGE_S2;
> + bool logging_active = kvm_get_logging_state(memslot);
and this to be:
unsigned long logging_active = 0;
if (kvm_get_logging_state(memslot))
logging_active = KVM_S2PTE_FLAG_LOGGING_ACTIVE;
>
> write_fault = kvm_is_write_fault(kvm_vcpu_get_hsr(vcpu));
> if (fault_status == FSC_PERM && !write_fault) {
> @@ -918,7 +948,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> /* Let's check if we will get back a huge page backed by hugetlbfs */
> down_read(&current->mm->mmap_sem);
> vma = find_vma_intersection(current->mm, hva, hva + 1);
> - if (is_vm_hugetlb_page(vma)) {
> + if (is_vm_hugetlb_page(vma) && !logging_active) {
> hugetlb = true;
> gfn = (fault_ipa & PMD_MASK) >> PAGE_SHIFT;
> } else {
> @@ -964,7 +994,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> spin_lock(&kvm->mmu_lock);
> if (mmu_notifier_retry(kvm, mmu_seq))
> goto out_unlock;
> - if (!hugetlb && !force_pte)
> + if (!hugetlb && !force_pte && !logging_active)
> hugetlb = transparent_hugepage_adjust(&pfn, &fault_ipa);
>
> if (hugetlb) {
> @@ -978,16 +1008,18 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> ret = stage2_set_pmd_huge(kvm, memcache, fault_ipa, &new_pmd);
> } else {
> pte_t new_pte = pfn_pte(pfn, mem_type);
> + unsigned long flags = SET_SPTE_FLAGS(logging_active,
> + mem_type == PAGE_S2_DEVICE);
... you can convert this to:
unsigned long flags = logging_active;
if (mem_type == PAGE_S2_DEVICE)
flags |= KVM_S2PTE_FLAG_IS_IOMAP;
Yes, this is more verbose, but I can decipher it without any effort.
Call me lazy! ;-)
> if (writable) {
> kvm_set_s2pte_writable(&new_pte);
> kvm_set_pfn_dirty(pfn);
> }
> coherent_cache_guest_page(vcpu, hva, PAGE_SIZE);
> - ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte,
> - mem_type == PAGE_S2_DEVICE);
> + ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, flags);
> }
>
> -
> + if (write_fault)
> + mark_page_dirty(kvm, gfn);
> out_unlock:
> spin_unlock(&kvm->mmu_lock);
> kvm_release_pfn_clean(pfn);
> @@ -1137,7 +1169,15 @@ static void kvm_set_spte_handler(struct kvm *kvm, gpa_t gpa, void *data)
> {
> pte_t *pte = (pte_t *)data;
>
> - stage2_set_pte(kvm, NULL, gpa, pte, false);
> + /*
> + * We can always call stage2_set_pte with logging_active == false,
> + * because MMU notifiers will have unmapped a huge PMD before calling
> + * ->change_pte() (which in turn calls kvm_set_spte_hva()) and therefore
> + * stage2_set_pte() never needs to clear out a huge PMD through this
> + * calling path.
> + */
> +
> + stage2_set_pte(kvm, NULL, gpa, pte, 0);
> }
Other than this:
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Thanks,
M.
--
Jazz is not dead. It just smells funny...
^ permalink raw reply [flat|nested] 26+ messages in thread
* [PATCH v14 7/7] KVM: arm: page logging 2nd stage fault handling
2014-11-14 16:45 ` Marc Zyngier
@ 2014-11-14 18:53 ` Mario Smarduch
0 siblings, 0 replies; 26+ messages in thread
From: Mario Smarduch @ 2014-11-14 18:53 UTC (permalink / raw)
To: linux-arm-kernel
On 11/14/2014 08:45 AM, Marc Zyngier wrote:
> Hi Mario,
>
> On 14/11/14 01:57, Mario Smarduch wrote:
>> This patch adds support for handling 2nd stage page faults during migration,
>> it disables faulting in huge pages, and dissolves huge pages to page tables.
>> In case migration is canceled huge pages are used again.
>>
>> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
>> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
>> ---
>> arch/arm/kvm/mmu.c | 56 ++++++++++++++++++++++++++++++++++++++++++++--------
>> 1 file changed, 48 insertions(+), 8 deletions(-)
>>
>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index 8137455..ff88e5b 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -47,6 +47,20 @@ static phys_addr_t hyp_idmap_vector;
>> #define kvm_pmd_huge(_x) (pmd_huge(_x) || pmd_trans_huge(_x))
>> #define kvm_pud_huge(_x) pud_huge(_x)
>>
>> +#define IOMAP_ATTR 0x1
>> +#define LOGGING_ACTIVE 0x2
>> +#define SET_SPTE_FLAGS(l, i) ((l) << (LOGGING_ACTIVE - 1) | \
>> + (i) << (IOMAP_ATTR - 1))
>
> I suffered a minor brain haemorrhage on this one. How about something
> that is more in line with what's done almost everywhere else:
Hope it wasn't that serious; that would be a huge blow to the project and to jazz :)
I tried to optimize a little, but I see your point; I'll resend this one.
Thanks for the tag.
(Btw, email server issues; hopefully this is the only response.)
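Applied to user_mem_abort(), that suggestion would look roughly like this
(untested sketch using the KVM_S2PTE_FLAG_* names above; the ioremap path
would then pass KVM_S2PTE_FLAG_IS_IOMAP directly):

	unsigned long flags = 0;

	if (logging_active)
		flags |= KVM_S2PTE_FLAG_LOGGING_ACTIVE;
	if (mem_type == PAGE_S2_DEVICE)
		flags |= KVM_S2PTE_FLAG_IS_IOMAP;

	ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, flags);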
- Mario
>
> #define KVM_S2PTE_FLAG_IS_IOMAP (1UL << 0)
> #define KVM_S2PTE_FLAG_LOGGING_ACTIVE (1UL << 1)
>
>> +static bool kvm_get_logging_state(struct kvm_memory_slot *memslot)
>> +{
>> +#ifdef CONFIG_ARM
>> + return !!memslot->dirty_bitmap;
>> +#else
>> + return false;
>> +#endif
>> +}
>> +
>> static void kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
>> {
>> /*
>> @@ -626,10 +640,13 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
>> }
>>
>> static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
>> - phys_addr_t addr, const pte_t *new_pte, bool iomap)
>> + phys_addr_t addr, const pte_t *new_pte,
>> + unsigned long flags)
>> {
>> pmd_t *pmd;
>> pte_t *pte, old_pte;
>> + bool iomap = flags & IOMAP_ATTR;
>> + bool logging_active = flags & LOGGING_ACTIVE;
>
> If you convert these two variables to be 'unsigned long'...
>
>>
>> /* Create stage-2 page table mapping - Level 1 */
>> pmd = stage2_get_pmd(kvm, cache, addr);
>> @@ -641,6 +658,18 @@ static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
>> return 0;
>> }
>>
>> + /*
>> + * While dirty memory logging, clear PMD entry for huge page and split
>> + * into smaller pages, to track dirty memory at page granularity.
>> + */
>> + if (logging_active && kvm_pmd_huge(*pmd)) {
>> + phys_addr_t ipa = pmd_pfn(*pmd) << PAGE_SHIFT;
>> +
>> + pmd_clear(pmd);
>> + kvm_tlb_flush_vmid_ipa(kvm, ipa);
>> + put_page(virt_to_page(pmd));
>> + }
>> +
>> /* Create stage-2 page mappings - Level 2 */
>> if (pmd_none(*pmd)) {
>> if (!cache)
>> @@ -693,7 +722,7 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
>> if (ret)
>> goto out;
>> spin_lock(&kvm->mmu_lock);
>> - ret = stage2_set_pte(kvm, &cache, addr, &pte, true);
>> + ret = stage2_set_pte(kvm, &cache, addr, &pte, IOMAP_ATTR);
>> spin_unlock(&kvm->mmu_lock);
>> if (ret)
>> goto out;
>> @@ -908,6 +937,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>> struct vm_area_struct *vma;
>> pfn_t pfn;
>> pgprot_t mem_type = PAGE_S2;
>> + bool logging_active = kvm_get_logging_state(memslot);
>
> and this to be:
> unsigned long logging_active = 0;
> if (kvm_get_logging_state(memslot))
> logging_active = KVM_S2PTE_FLAG_LOGGING_ACTIVE;
>
>>
>> write_fault = kvm_is_write_fault(kvm_vcpu_get_hsr(vcpu));
>> if (fault_status == FSC_PERM && !write_fault) {
>> @@ -918,7 +948,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>> /* Let's check if we will get back a huge page backed by hugetlbfs */
>> down_read(&current->mm->mmap_sem);
>> vma = find_vma_intersection(current->mm, hva, hva + 1);
>> - if (is_vm_hugetlb_page(vma)) {
>> + if (is_vm_hugetlb_page(vma) && !logging_active) {
>> hugetlb = true;
>> gfn = (fault_ipa & PMD_MASK) >> PAGE_SHIFT;
>> } else {
>> @@ -964,7 +994,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>> spin_lock(&kvm->mmu_lock);
>> if (mmu_notifier_retry(kvm, mmu_seq))
>> goto out_unlock;
>> - if (!hugetlb && !force_pte)
>> + if (!hugetlb && !force_pte && !logging_active)
>> hugetlb = transparent_hugepage_adjust(&pfn, &fault_ipa);
>>
>> if (hugetlb) {
>> @@ -978,16 +1008,18 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>> ret = stage2_set_pmd_huge(kvm, memcache, fault_ipa, &new_pmd);
>> } else {
>> pte_t new_pte = pfn_pte(pfn, mem_type);
>> + unsigned long flags = SET_SPTE_FLAGS(logging_active,
>> + mem_type == PAGE_S2_DEVICE);
>
> ... you can convert this to:
>
> unsigned long flags = logging_active;
> if (mem_type == PAGE_S2_DEVICE)
> flags |= KVM_S2PTE_FLAG_IS_IOMAP;
>
> Yes, this is more verbose, but I can decipher it without any effort.
> Call me lazy! ;-)
>
>> if (writable) {
>> kvm_set_s2pte_writable(&new_pte);
>> kvm_set_pfn_dirty(pfn);
>> }
>> coherent_cache_guest_page(vcpu, hva, PAGE_SIZE);
>> - ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte,
>> - mem_type == PAGE_S2_DEVICE);
>> + ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, flags);
>> }
>>
>> -
>> + if (write_fault)
>> + mark_page_dirty(kvm, gfn);
>> out_unlock:
>> spin_unlock(&kvm->mmu_lock);
>> kvm_release_pfn_clean(pfn);
>> @@ -1137,7 +1169,15 @@ static void kvm_set_spte_handler(struct kvm *kvm, gpa_t gpa, void *data)
>> {
>> pte_t *pte = (pte_t *)data;
>>
>> - stage2_set_pte(kvm, NULL, gpa, pte, false);
>> + /*
>> + * We can always call stage2_set_pte with logging_active == false,
>> + * because MMU notifiers will have unmapped a huge PMD before calling
>> + * ->change_pte() (which in turn calls kvm_set_spte_hva()) and therefore
>> + * stage2_set_pte() never needs to clear out a huge PMD through this
>> + * calling path.
>> + */
>> +
>> + stage2_set_pte(kvm, NULL, gpa, pte, 0);
>> }
>
>
> Other than this:
>
> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
>
> Thanks,
>
> M.
>
^ permalink raw reply [flat|nested] 26+ messages in thread
* [PATCH v14 0/7] KVM/arm/x86: dirty page logging for ARMv7 (3.17.0-rc1)
2014-11-14 1:57 [PATCH v14 0/7] KVM/arm/x86: dirty page logging for ARMv7 (3.17.0-rc1) Mario Smarduch
` (6 preceding siblings ...)
2014-11-14 1:57 ` [PATCH v14 7/7] KVM: arm: page logging 2nd stage fault handling Mario Smarduch
@ 2014-11-14 8:06 ` Cornelia Huck
2014-11-14 18:57 ` Mario Smarduch
2014-11-25 10:22 ` Christoffer Dall
8 siblings, 1 reply; 26+ messages in thread
From: Cornelia Huck @ 2014-11-14 8:06 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, 13 Nov 2014 17:57:41 -0800
Mario Smarduch <m.smarduch@samsung.com> wrote:
> This patch series adds ARMv7 and generic dirty page logging support. As we
> move towards generic dirty page logging, additional logic is moved to generic
> code. Initially x86 and armv7 KVM_GET_DIRTY_LOG reuse the generic code,
> shortly followed by armv8 patches.
FWIW: s390 (with qemu/master) seems to work as before.
^ permalink raw reply [flat|nested] 26+ messages in thread
* [PATCH v14 0/7] KVM/arm/x86: dirty page logging for ARMv7 (3.17.0-rc1)
2014-11-14 1:57 [PATCH v14 0/7] KVM/arm/x86: dirty page logging for ARMv7 (3.17.0-rc1) Mario Smarduch
` (7 preceding siblings ...)
2014-11-14 8:06 ` [PATCH v14 0/7] KVM/arm/x86: dirty page logging for ARMv7 (3.17.0-rc1) Cornelia Huck
@ 2014-11-25 10:22 ` Christoffer Dall
2014-11-25 21:57 ` Mario Smarduch
8 siblings, 1 reply; 26+ messages in thread
From: Christoffer Dall @ 2014-11-25 10:22 UTC (permalink / raw)
To: linux-arm-kernel
Hi Mario,
On Thu, Nov 13, 2014 at 05:57:41PM -0800, Mario Smarduch wrote:
> This patch series adds ARMv7 and generic dirty page logging support. As we
> move towards generic dirty page logging, additional logic is moved to generic
> code. Initially x86 and armv7 KVM_GET_DIRTY_LOG reuse the generic code,
> shortly followed by armv8 patches.
>
So given the timing of this and the last-minute fixes to this series, it
is too late to merge for 3.19.
I would like it if we could get one complete patch series with both v7 and
v8 support in there ready for right after the merge window closes, so
that Marc and I can queue this early for -next and then merge it for
3.20.
Thanks,
-Christoffer
^ permalink raw reply [flat|nested] 26+ messages in thread
* [PATCH v14 0/7] KVM/arm/x86: dirty page logging for ARMv7 (3.17.0-rc1)
2014-11-25 10:22 ` Christoffer Dall
@ 2014-11-25 21:57 ` Mario Smarduch
0 siblings, 0 replies; 26+ messages in thread
From: Mario Smarduch @ 2014-11-25 21:57 UTC (permalink / raw)
To: linux-arm-kernel
On 11/25/2014 02:22 AM, Christoffer Dall wrote:
> Hi Mario,
>
> On Thu, Nov 13, 2014 at 05:57:41PM -0800, Mario Smarduch wrote:
>> This patch series adds ARMv7 and generic dirty page logging support. As we
>> move towards generic dirty page logging, additional logic is moved to generic
>> code. Initially x86 and armv7 KVM_GET_DIRTY_LOG reuse the generic code,
>> shortly followed by armv8 patches.
>>
> So given the timing of this and the last-minute fixes to this series, it
> is too late to merge for 3.19.
>
> I would like if we could get one complete patch series with both v7 and
> v8 support in there ready for right after the merge window closes, so
> that Marc and I can queue this early for -next and then merge it for
> 3.20.
>
> Thanks,
> -Christoffer
>
Hi Christoffer,
got it.
Thanks.
^ permalink raw reply [flat|nested] 26+ messages in thread