* Re: [PATCH 1/3] arm/arm64: KVM: Allow handle_hva_to_gpa to return a value
[not found] ` <1421865733-20034-2-git-send-email-marc.zyngier@arm.com>
@ 2015-03-12 11:40 ` Christoffer Dall
0 siblings, 0 replies; 6+ messages in thread
From: Christoffer Dall @ 2015-03-12 11:40 UTC (permalink / raw)
To: Marc Zyngier; +Cc: kvm, kvmarm, Steve Capper
On Wed, Jan 21, 2015 at 06:42:11PM +0000, Marc Zyngier wrote:
> So far, handle_hva_to_gpa was never required to return a value.
> As we prepare to age pages at Stage-2, we need to be able to
> return a value from the iterator (kvm_test_age_hva).
>
> Adapt the code to handle this situation. No semantic change.
>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 2/3] arm/arm64: KVM: Implement Stage-2 page aging
[not found] ` <1421865733-20034-3-git-send-email-marc.zyngier@arm.com>
@ 2015-03-12 11:40 ` Christoffer Dall
2015-03-12 14:50 ` Marc Zyngier
0 siblings, 1 reply; 6+ messages in thread
From: Christoffer Dall @ 2015-03-12 11:40 UTC (permalink / raw)
To: Marc Zyngier; +Cc: kvm, kvmarm, Steve Capper
On Wed, Jan 21, 2015 at 06:42:12PM +0000, Marc Zyngier wrote:
> Until now, KVM/arm didn't care much for page aging (who was swapping
> anyway?), and simply provided empty hooks to the core KVM code. With
> server-type systems now being available, things are quite different.
>
> This patch implements very simple support for page aging, by clearing
> the Access flag in the Stage-2 page tables. On access fault, the current
> fault handling will write the PTE or PMD again, putting the Access flag
> back on.
>
> It should be possible to implement a much faster handling for Access
> faults, but that's left for a later patch.
>
> With this in place, performance in VMs is degraded much more gracefully.
>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
> arch/arm/include/asm/kvm_host.h | 13 ++-------
> arch/arm/kvm/mmu.c | 59 ++++++++++++++++++++++++++++++++++++++-
> arch/arm/kvm/trace.h | 33 ++++++++++++++++++++++
> arch/arm64/include/asm/kvm_arm.h | 1 +
> arch/arm64/include/asm/kvm_host.h | 13 ++-------
> 5 files changed, 96 insertions(+), 23 deletions(-)
>
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 04b4ea0..d6b5b85 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -163,19 +163,10 @@ void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
>
> unsigned long kvm_arm_num_regs(struct kvm_vcpu *vcpu);
> int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 __user *indices);
> +int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end);
> +int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
>
> /* We do not have shadow page tables, hence the empty hooks */
> -static inline int kvm_age_hva(struct kvm *kvm, unsigned long start,
> - unsigned long end)
> -{
> - return 0;
> -}
> -
> -static inline int kvm_test_age_hva(struct kvm *kvm, unsigned long hva)
> -{
> - return 0;
> -}
> -
> static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
> unsigned long address)
> {
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index e163a45..ffe89a0 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -1068,6 +1068,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>
> out_unlock:
> spin_unlock(&kvm->mmu_lock);
> + kvm_set_pfn_accessed(pfn);
> kvm_release_pfn_clean(pfn);
> return ret;
> }
> @@ -1102,7 +1103,8 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
>
> /* Check the stage-2 fault is trans. fault or write fault */
> fault_status = kvm_vcpu_trap_get_fault_type(vcpu);
> - if (fault_status != FSC_FAULT && fault_status != FSC_PERM) {
> + if (fault_status != FSC_FAULT && fault_status != FSC_PERM &&
> + fault_status != FSC_ACCESS) {
> kvm_err("Unsupported FSC: EC=%#x xFSC=%#lx ESR_EL2=%#lx\n",
> kvm_vcpu_trap_get_class(vcpu),
> (unsigned long)kvm_vcpu_trap_get_fault(vcpu),
> @@ -1237,6 +1239,61 @@ void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
> handle_hva_to_gpa(kvm, hva, end, &kvm_set_spte_handler, &stage2_pte);
> }
>
> +static int kvm_age_hva_handler(struct kvm *kvm, gpa_t gpa, void *data)
> +{
> + pmd_t *pmd;
> + pte_t *pte;
> +
> + pmd = stage2_get_pmd(kvm, NULL, gpa);
> + if (!pmd || pmd_none(*pmd)) /* Nothing there */
> + return 0;
> +
> + if (kvm_pmd_huge(*pmd)) { /* THP, HugeTLB */
> + *pmd = pmd_mkold(*pmd);
> + goto tlbi;
so in this case we'll loop over a huge pmd on a page-by-page basis,
invalidating the tlb each time, right?
Would it be worth checking of the access flag is already clear
(!pmd_young()) and in that case exit without doing tlb invalidation?
In fact, shouldn't we only return 1 if the pmd is indeed young and then
the tlb invalidation will be done once by kvm_flush_remote_tlbs() in
kvm_mmu_notifier_clear_flush_young() ?
I got a little lost looking at how the core mm code uses the return
value, but if I read the x86 and powerpc code correctly, they use it the
way I suggest. Did I get this all wrong?
> + }
> +
> + pte = pte_offset_kernel(pmd, gpa);
> + if (pte_none(*pte))
> + return 0;
> +
> + *pte = pte_mkold(*pte); /* Just a page... */
same with checking if it's young or not... ?
> +tlbi:
> + kvm_tlb_flush_vmid_ipa(kvm, gpa);
> + return 1;
> +}
> +
> +static int kvm_test_age_hva_handler(struct kvm *kvm, gpa_t gpa, void *data)
> +{
> + pmd_t *pmd;
> + pte_t *pte;
> +
> + pmd = stage2_get_pmd(kvm, NULL, gpa);
> + if (!pmd || pmd_none(*pmd)) /* Nothing there */
> + return 0;
> +
> + if (kvm_pmd_huge(*pmd)) /* THP, HugeTLB */
> + return pmd_young(*pmd);
> +
> + pte = pte_offset_kernel(pmd, gpa);
> + if (!pte_none(*pte)) /* Just a page... */
> + return pte_young(*pte);
> +
> + return 0;
> +}
> +
> +int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end)
> +{
> + trace_kvm_age_hva(start, end);
> + return handle_hva_to_gpa(kvm, start, end, kvm_age_hva_handler, NULL);
> +}
> +
> +int kvm_test_age_hva(struct kvm *kvm, unsigned long hva)
> +{
> + trace_kvm_test_age_hva(hva);
> + return handle_hva_to_gpa(kvm, hva, hva, kvm_test_age_hva_handler, NULL);
> +}
> +
> void kvm_mmu_free_memory_caches(struct kvm_vcpu *vcpu)
> {
> mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
> diff --git a/arch/arm/kvm/trace.h b/arch/arm/kvm/trace.h
> index b6a6e71..364b5382 100644
> --- a/arch/arm/kvm/trace.h
> +++ b/arch/arm/kvm/trace.h
> @@ -203,6 +203,39 @@ TRACE_EVENT(kvm_set_spte_hva,
> TP_printk("mmu notifier set pte hva: %#08lx", __entry->hva)
> );
>
> +TRACE_EVENT(kvm_age_hva,
> + TP_PROTO(unsigned long start, unsigned long end),
> + TP_ARGS(start, end),
> +
> + TP_STRUCT__entry(
> + __field( unsigned long, start )
> + __field( unsigned long, end )
> + ),
> +
> + TP_fast_assign(
> + __entry->start = start;
> + __entry->end = end;
> + ),
> +
> + TP_printk("mmu notifier age hva: %#08lx -- %#08lx",
> + __entry->start, __entry->end)
> +);
> +
> +TRACE_EVENT(kvm_test_age_hva,
> + TP_PROTO(unsigned long hva),
> + TP_ARGS(hva),
> +
> + TP_STRUCT__entry(
> + __field( unsigned long, hva )
> + ),
> +
> + TP_fast_assign(
> + __entry->hva = hva;
> + ),
> +
> + TP_printk("mmu notifier test age hva: %#08lx", __entry->hva)
> +);
> +
> TRACE_EVENT(kvm_hvc,
> TP_PROTO(unsigned long vcpu_pc, unsigned long r0, unsigned long imm),
> TP_ARGS(vcpu_pc, r0, imm),
> diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
> index 8afb863..0d738f2 100644
> --- a/arch/arm64/include/asm/kvm_arm.h
> +++ b/arch/arm64/include/asm/kvm_arm.h
> @@ -212,6 +212,7 @@
>
>
> #define FSC_FAULT (0x04)
> +#define FSC_ACCESS (0x08)
> #define FSC_PERM (0x0c)
>
> /* Hyp Prefetch Fault Address Register (HPFAR/HDFAR) */
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index acd101a..b831710 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -173,19 +173,10 @@ int kvm_unmap_hva(struct kvm *kvm, unsigned long hva);
> int kvm_unmap_hva_range(struct kvm *kvm,
> unsigned long start, unsigned long end);
> void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
> +int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end);
> +int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
>
> /* We do not have shadow page tables, hence the empty hooks */
> -static inline int kvm_age_hva(struct kvm *kvm, unsigned long start,
> - unsigned long end)
> -{
> - return 0;
> -}
> -
> -static inline int kvm_test_age_hva(struct kvm *kvm, unsigned long hva)
> -{
> - return 0;
> -}
> -
> static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
> unsigned long address)
> {
> --
> 2.1.4
>
Otherwise, this looks good.
Thanks,
-Christoffer
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 3/3] arm/arm64: KVM: Optimize handling of Access Flag faults
[not found] ` <1421865733-20034-4-git-send-email-marc.zyngier@arm.com>
@ 2015-03-12 11:40 ` Christoffer Dall
2015-03-12 15:04 ` Marc Zyngier
2015-03-12 15:07 ` Marc Zyngier
0 siblings, 2 replies; 6+ messages in thread
From: Christoffer Dall @ 2015-03-12 11:40 UTC (permalink / raw)
To: Marc Zyngier; +Cc: kvm, kvmarm, Steve Capper
On Wed, Jan 21, 2015 at 06:42:13PM +0000, Marc Zyngier wrote:
> Now that we have page aging in Stage-2, it becomes obvious that
> we're doing way too much work handling the fault.
>
> The page is not going anywhere (it is still mapped), the page
> tables are already allocated, and all we want is to flip a bit
> in the PMD or PTE. Also, we can avoid any form of TLB invalidation,
> since a page with the AF bit off is not allowed to be cached.
>
> An obvious solution is to have a separate handler for FSC_ACCESS,
> where we pride ourselves to only do the very minimum amount of
> work.
>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
> arch/arm/kvm/mmu.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
> arch/arm/kvm/trace.h | 15 +++++++++++++++
> 2 files changed, 61 insertions(+)
>
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index ffe89a0..112bae1 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -1073,6 +1073,46 @@ out_unlock:
> return ret;
> }
>
> +/*
> + * Resolve the access fault by making the page young again.
> + * Note that because the faulting entry is guaranteed not to be
> + * cached in the TLB, we don't need to invalidate anything.
> + */
> +static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
> +{
> + pmd_t *pmd;
> + pte_t *pte;
> + pfn_t pfn;
> + bool pfn_valid = false;
I think you can just initialize pfn to KVM_PFN_ERR_FAULT and use
is_error_pfn() if you like, not sure if it's cleaner.
> +
> + trace_kvm_access_fault(fault_ipa);
> +
> + spin_lock(&vcpu->kvm->mmu_lock);
> +
> + pmd = stage2_get_pmd(vcpu->kvm, NULL, fault_ipa);
> + if (!pmd || pmd_none(*pmd)) /* Nothing there */
> + goto out;
> +
> + if (kvm_pmd_huge(*pmd)) { /* THP, HugeTLB */
> + *pmd = pmd_mkyoung(*pmd);
> + pfn = pmd_pfn(*pmd);
> + pfn_valid = true;
> + goto out;
> + }
> +
> + pte = pte_offset_kernel(pmd, fault_ipa);
> + if (pte_none(*pte)) /* Nothing there either */
> + goto out;
> +
> + *pte = pte_mkyoung(*pte); /* Just a page... */
> + pfn = pte_pfn(*pte);
> + pfn_valid = true;
> +out:
> + spin_unlock(&vcpu->kvm->mmu_lock);
do you have a race here if the page is swapped out before you go and set
pfn accessed? Does that cause any harm?
> + if (pfn_valid)
> + kvm_set_pfn_accessed(pfn);
> +}
> +
> /**
> * kvm_handle_guest_abort - handles all 2nd stage aborts
> * @vcpu: the VCPU pointer
> @@ -1140,6 +1180,12 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
> /* Userspace should not be able to register out-of-bounds IPAs */
> VM_BUG_ON(fault_ipa >= KVM_PHYS_SIZE);
>
> + if (fault_status == FSC_ACCESS) {
> + handle_access_fault(vcpu, fault_ipa);
> + ret = 1;
> + goto out_unlock;
> + }
> +
> ret = user_mem_abort(vcpu, fault_ipa, memslot, hva, fault_status);
> if (ret == 0)
> ret = 1;
> diff --git a/arch/arm/kvm/trace.h b/arch/arm/kvm/trace.h
> index 364b5382..5665a16 100644
> --- a/arch/arm/kvm/trace.h
> +++ b/arch/arm/kvm/trace.h
> @@ -64,6 +64,21 @@ TRACE_EVENT(kvm_guest_fault,
> __entry->hxfar, __entry->vcpu_pc)
> );
>
> +TRACE_EVENT(kvm_access_fault,
> + TP_PROTO(unsigned long ipa),
> + TP_ARGS(ipa),
> +
> + TP_STRUCT__entry(
> + __field( unsigned long, ipa )
> + ),
> +
> + TP_fast_assign(
> + __entry->ipa = ipa;
> + ),
> +
> + TP_printk("IPA: %lx", __entry->ipa)
> +);
> +
> TRACE_EVENT(kvm_irq_line,
> TP_PROTO(unsigned int type, int vcpu_idx, int irq_num, int level),
> TP_ARGS(type, vcpu_idx, irq_num, level),
> --
> 2.1.4
>
Thanks,
-Christoffer
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 2/3] arm/arm64: KVM: Implement Stage-2 page aging
2015-03-12 11:40 ` [PATCH 2/3] arm/arm64: KVM: Implement Stage-2 page aging Christoffer Dall
@ 2015-03-12 14:50 ` Marc Zyngier
0 siblings, 0 replies; 6+ messages in thread
From: Marc Zyngier @ 2015-03-12 14:50 UTC (permalink / raw)
To: Christoffer Dall; +Cc: kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org
On Thu, 12 Mar 2015 11:40:17 +0000
Christoffer Dall <christoffer.dall@linaro.org> wrote:
> On Wed, Jan 21, 2015 at 06:42:12PM +0000, Marc Zyngier wrote:
> > Until now, KVM/arm didn't care much for page aging (who was swapping
> > anyway?), and simply provided empty hooks to the core KVM code. With
> > server-type systems now being available, things are quite different.
> >
> > This patch implements very simple support for page aging, by
> > clearing the Access flag in the Stage-2 page tables. On access
> > fault, the current fault handling will write the PTE or PMD again,
> > putting the Access flag back on.
> >
> > It should be possible to implement a much faster handling for Access
> > faults, but that's left for a later patch.
> >
> > With this in place, performance in VMs is degraded much more
> > gracefully.
> >
> > Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> > ---
> > arch/arm/include/asm/kvm_host.h | 13 ++-------
> > arch/arm/kvm/mmu.c | 59
> > ++++++++++++++++++++++++++++++++++++++-
> > arch/arm/kvm/trace.h | 33 ++++++++++++++++++++++
> > arch/arm64/include/asm/kvm_arm.h | 1 +
> > arch/arm64/include/asm/kvm_host.h | 13 ++------- 5 files changed,
> > 96 insertions(+), 23 deletions(-)
> >
> > diff --git a/arch/arm/include/asm/kvm_host.h
> > b/arch/arm/include/asm/kvm_host.h index 04b4ea0..d6b5b85 100644
> > --- a/arch/arm/include/asm/kvm_host.h
> > +++ b/arch/arm/include/asm/kvm_host.h
> > @@ -163,19 +163,10 @@ void kvm_set_spte_hva(struct kvm *kvm,
> > unsigned long hva, pte_t pte);
> > unsigned long kvm_arm_num_regs(struct kvm_vcpu *vcpu);
> > int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 __user
> > *indices); +int kvm_age_hva(struct kvm *kvm, unsigned long start,
> > unsigned long end); +int kvm_test_age_hva(struct kvm *kvm, unsigned
> > long hva);
> > /* We do not have shadow page tables, hence the empty hooks */
> > -static inline int kvm_age_hva(struct kvm *kvm, unsigned long start,
> > - unsigned long end)
> > -{
> > - return 0;
> > -}
> > -
> > -static inline int kvm_test_age_hva(struct kvm *kvm, unsigned long
> > hva) -{
> > - return 0;
> > -}
> > -
> > static inline void kvm_arch_mmu_notifier_invalidate_page(struct
> > kvm *kvm, unsigned long address)
> > {
> > diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> > index e163a45..ffe89a0 100644
> > --- a/arch/arm/kvm/mmu.c
> > +++ b/arch/arm/kvm/mmu.c
> > @@ -1068,6 +1068,7 @@ static int user_mem_abort(struct kvm_vcpu
> > *vcpu, phys_addr_t fault_ipa,
> > out_unlock:
> > spin_unlock(&kvm->mmu_lock);
> > + kvm_set_pfn_accessed(pfn);
> > kvm_release_pfn_clean(pfn);
> > return ret;
> > }
> > @@ -1102,7 +1103,8 @@ int kvm_handle_guest_abort(struct kvm_vcpu
> > *vcpu, struct kvm_run *run)
> > /* Check the stage-2 fault is trans. fault or write fault
> > */ fault_status = kvm_vcpu_trap_get_fault_type(vcpu);
> > - if (fault_status != FSC_FAULT && fault_status != FSC_PERM)
> > {
> > + if (fault_status != FSC_FAULT && fault_status != FSC_PERM
> > &&
> > + fault_status != FSC_ACCESS) {
> > kvm_err("Unsupported FSC: EC=%#x xFSC=%#lx
> > ESR_EL2=%#lx\n", kvm_vcpu_trap_get_class(vcpu),
> > (unsigned
> > long)kvm_vcpu_trap_get_fault(vcpu), @@ -1237,6 +1239,61 @@ void
> > kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
> > handle_hva_to_gpa(kvm, hva, end, &kvm_set_spte_handler,
> > &stage2_pte); }
> > +static int kvm_age_hva_handler(struct kvm *kvm, gpa_t gpa, void
> > *data) +{
> > + pmd_t *pmd;
> > + pte_t *pte;
> > +
> > + pmd = stage2_get_pmd(kvm, NULL, gpa);
> > + if (!pmd || pmd_none(*pmd)) /* Nothing there */
> > + return 0;
> > +
> > + if (kvm_pmd_huge(*pmd)) { /* THP, HugeTLB */
> > + *pmd = pmd_mkold(*pmd);
> > + goto tlbi;
>
> so in this case we'll loop over a huge pmd on a page-by-page basis,
> invalidating the tlb each time, right?
>
> Would it be worth checking of the access flag is already clear
> (!pmd_young()) and in that case exit without doing tlb invalidation?
>
> In fact, shouldn't we only return 1 if the pmd is indeed young and
> then the tlb invalidation will be done once by
> kvm_flush_remote_tlbs() in kvm_mmu_notifier_clear_flush_young() ?
>
> I got a little lost looking at how the core mm code uses the return
> value, but if I read the x86 and powerpc code correctly, they use it
> the way I suggest. Did I get this all wrong?
That's a very good point. I wonder if we could actually optimize
handle_hva_to_gpa to actually respect the mapping boundaries instead of
stupidly iterating over single pages, but that would be a further
improvement.
In the meantime, testing the access flag and doing an early out if
clear seems like the right thing to do.
> > + }
> > +
> > + pte = pte_offset_kernel(pmd, gpa);
> > + if (pte_none(*pte))
> > + return 0;
> > +
> > + *pte = pte_mkold(*pte); /* Just a page... */
>
> same with checking if it's young or not... ?
Yes, same logic.
> > +tlbi:
> > + kvm_tlb_flush_vmid_ipa(kvm, gpa);
> > + return 1;
> > +}
> > +
> > +static int kvm_test_age_hva_handler(struct kvm *kvm, gpa_t gpa,
> > void *data) +{
> > + pmd_t *pmd;
> > + pte_t *pte;
> > +
> > + pmd = stage2_get_pmd(kvm, NULL, gpa);
> > + if (!pmd || pmd_none(*pmd)) /* Nothing there */
> > + return 0;
> > +
> > + if (kvm_pmd_huge(*pmd)) /* THP, HugeTLB */
> > + return pmd_young(*pmd);
> > +
> > + pte = pte_offset_kernel(pmd, gpa);
> > + if (!pte_none(*pte)) /* Just a page... */
> > + return pte_young(*pte);
> > +
> > + return 0;
> > +}
> > +
> > +int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned
> > long end) +{
> > + trace_kvm_age_hva(start, end);
> > + return handle_hva_to_gpa(kvm, start, end,
> > kvm_age_hva_handler, NULL); +}
> > +
> > +int kvm_test_age_hva(struct kvm *kvm, unsigned long hva)
> > +{
> > + trace_kvm_test_age_hva(hva);
> > + return handle_hva_to_gpa(kvm, hva, hva,
> > kvm_test_age_hva_handler, NULL); +}
> > +
> > void kvm_mmu_free_memory_caches(struct kvm_vcpu *vcpu)
> > {
> > mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
> > diff --git a/arch/arm/kvm/trace.h b/arch/arm/kvm/trace.h
> > index b6a6e71..364b5382 100644
> > --- a/arch/arm/kvm/trace.h
> > +++ b/arch/arm/kvm/trace.h
> > @@ -203,6 +203,39 @@ TRACE_EVENT(kvm_set_spte_hva,
> > TP_printk("mmu notifier set pte hva: %#08lx", __entry->hva)
> > );
> >
> > +TRACE_EVENT(kvm_age_hva,
> > + TP_PROTO(unsigned long start, unsigned long end),
> > + TP_ARGS(start, end),
> > +
> > + TP_STRUCT__entry(
> > + __field( unsigned long,
> > start )
> > + __field( unsigned long,
> > end )
> > + ),
> > +
> > + TP_fast_assign(
> > + __entry->start = start;
> > + __entry->end = end;
> > + ),
> > +
> > + TP_printk("mmu notifier age hva: %#08lx -- %#08lx",
> > + __entry->start, __entry->end)
> > +);
> > +
> > +TRACE_EVENT(kvm_test_age_hva,
> > + TP_PROTO(unsigned long hva),
> > + TP_ARGS(hva),
> > +
> > + TP_STRUCT__entry(
> > + __field( unsigned long,
> > hva )
> > + ),
> > +
> > + TP_fast_assign(
> > + __entry->hva = hva;
> > + ),
> > +
> > + TP_printk("mmu notifier test age hva: %#08lx",
> > __entry->hva) +);
> > +
> > TRACE_EVENT(kvm_hvc,
> > TP_PROTO(unsigned long vcpu_pc, unsigned long r0, unsigned
> > long imm), TP_ARGS(vcpu_pc, r0, imm),
> > diff --git a/arch/arm64/include/asm/kvm_arm.h
> > b/arch/arm64/include/asm/kvm_arm.h index 8afb863..0d738f2 100644
> > --- a/arch/arm64/include/asm/kvm_arm.h
> > +++ b/arch/arm64/include/asm/kvm_arm.h
> > @@ -212,6 +212,7 @@
> >
> >
> > #define FSC_FAULT (0x04)
> > +#define FSC_ACCESS (0x08)
> > #define FSC_PERM (0x0c)
> >
> > /* Hyp Prefetch Fault Address Register (HPFAR/HDFAR) */
> > diff --git a/arch/arm64/include/asm/kvm_host.h
> > b/arch/arm64/include/asm/kvm_host.h index acd101a..b831710 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -173,19 +173,10 @@ int kvm_unmap_hva(struct kvm *kvm, unsigned
> > long hva); int kvm_unmap_hva_range(struct kvm *kvm,
> > unsigned long start, unsigned long end);
> > void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t
> > pte); +int kvm_age_hva(struct kvm *kvm, unsigned long start,
> > unsigned long end); +int kvm_test_age_hva(struct kvm *kvm, unsigned
> > long hva);
> > /* We do not have shadow page tables, hence the empty hooks */
> > -static inline int kvm_age_hva(struct kvm *kvm, unsigned long start,
> > - unsigned long end)
> > -{
> > - return 0;
> > -}
> > -
> > -static inline int kvm_test_age_hva(struct kvm *kvm, unsigned long
> > hva) -{
> > - return 0;
> > -}
> > -
> > static inline void kvm_arch_mmu_notifier_invalidate_page(struct
> > kvm *kvm, unsigned long address)
> > {
> > --
> > 2.1.4
> >
> Otherwise, this looks good.
Thanks for having had a look!
M.
--
Jazz is not dead. It just smells funny.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 3/3] arm/arm64: KVM: Optimize handling of Access Flag faults
2015-03-12 11:40 ` [PATCH 3/3] arm/arm64: KVM: Optimize handling of Access Flag faults Christoffer Dall
@ 2015-03-12 15:04 ` Marc Zyngier
2015-03-12 15:07 ` Marc Zyngier
1 sibling, 0 replies; 6+ messages in thread
From: Marc Zyngier @ 2015-03-12 15:04 UTC (permalink / raw)
To: Christoffer Dall
Cc: kvm@vger.kernel.org, kvmarm@lists.cs.columbia.edu, Steve Capper
On Thu, 12 Mar 2015 11:40:24 +0000
Christoffer Dall <christoffer.dall@linaro.org> wrote:
> On Wed, Jan 21, 2015 at 06:42:13PM +0000, Marc Zyngier wrote:
> > Now that we have page aging in Stage-2, it becomes obvious that
> > we're doing way too much work handling the fault.
> >
> > The page is not going anywhere (it is still mapped), the page
> > tables are already allocated, and all we want is to flip a bit
> > in the PMD or PTE. Also, we can avoid any form of TLB invalidation,
> > since a page with the AF bit off is not allowed to be cached.
> >
> > An obvious solution is to have a separate handler for FSC_ACCESS,
> > where we pride ourselves to only do the very minimum amount of
> > work.
> >
> > Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> > ---
> > arch/arm/kvm/mmu.c | 46
> > ++++++++++++++++++++++++++++++++++++++++++++++ arch/arm/kvm/trace.h
> > | 15 +++++++++++++++ 2 files changed, 61 insertions(+)
> >
> > diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> > index ffe89a0..112bae1 100644
> > --- a/arch/arm/kvm/mmu.c
> > +++ b/arch/arm/kvm/mmu.c
> > @@ -1073,6 +1073,46 @@ out_unlock:
> > return ret;
> > }
> >
> > +/*
> > + * Resolve the access fault by making the page young again.
> > + * Note that because the faulting entry is guaranteed not to be
> > + * cached in the TLB, we don't need to invalidate anything.
> > + */
> > +static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t
> > fault_ipa) +{
> > + pmd_t *pmd;
> > + pte_t *pte;
> > + pfn_t pfn;
> > + bool pfn_valid = false;
>
> I think you can just initialize pfn to KVM_PFN_ERR_FAULT and use
> is_error_pfn() if you like, not sure if it's cleaner.
I can have a look...
> > +
> > + trace_kvm_access_fault(fault_ipa);
> > +
> > + spin_lock(&vcpu->kvm->mmu_lock);
> > +
> > + pmd = stage2_get_pmd(vcpu->kvm, NULL, fault_ipa);
> > + if (!pmd || pmd_none(*pmd)) /* Nothing there */
> > + goto out;
> > +
> > + if (kvm_pmd_huge(*pmd)) { /* THP, HugeTLB */
> > + *pmd = pmd_mkyoung(*pmd);
> > + pfn = pmd_pfn(*pmd);
> > + pfn_valid = true;
> > + goto out;
> > + }
> > +
> > + pte = pte_offset_kernel(pmd, fault_ipa);
> > + if (pte_none(*pte)) /* Nothing there either
> > */
> > + goto out;
> > +
> > + *pte = pte_mkyoung(*pte); /* Just a page... */
> > + pfn = pte_pfn(*pte);
> > + pfn_valid = true;
> > +out:
> > + spin_unlock(&vcpu->kvm->mmu_lock);
>
> do you have a race here if the page is swapped out before you go and
> set pfn accessed? Does that cause any harm?
I don't think it really matters. The physical page is still there, and
is actually being accessed. The fact that the data is being evicted is
an interesting side effect... ;-)
> > + if (pfn_valid)
> > + kvm_set_pfn_accessed(pfn);
> > +}
> > +
> > /**
> > * kvm_handle_guest_abort - handles all 2nd stage aborts
> > * @vcpu: the VCPU pointer
> > @@ -1140,6 +1180,12 @@ int kvm_handle_guest_abort(struct kvm_vcpu
> > *vcpu, struct kvm_run *run) /* Userspace should not be able to
> > register out-of-bounds IPAs */ VM_BUG_ON(fault_ipa >=
> > KVM_PHYS_SIZE);
> > + if (fault_status == FSC_ACCESS) {
> > + handle_access_fault(vcpu, fault_ipa);
> > + ret = 1;
> > + goto out_unlock;
> > + }
> > +
> > ret = user_mem_abort(vcpu, fault_ipa, memslot, hva,
> > fault_status); if (ret == 0)
> > ret = 1;
> > diff --git a/arch/arm/kvm/trace.h b/arch/arm/kvm/trace.h
> > index 364b5382..5665a16 100644
> > --- a/arch/arm/kvm/trace.h
> > +++ b/arch/arm/kvm/trace.h
> > @@ -64,6 +64,21 @@ TRACE_EVENT(kvm_guest_fault,
> > __entry->hxfar, __entry->vcpu_pc)
> > );
> >
> > +TRACE_EVENT(kvm_access_fault,
> > + TP_PROTO(unsigned long ipa),
> > + TP_ARGS(ipa),
> > +
> > + TP_STRUCT__entry(
> > + __field( unsigned long,
> > ipa )
> > + ),
> > +
> > + TP_fast_assign(
> > + __entry->ipa = ipa;
> > + ),
> > +
> > + TP_printk("IPA: %lx", __entry->ipa)
> > +);
> > +
> > TRACE_EVENT(kvm_irq_line,
> > TP_PROTO(unsigned int type, int vcpu_idx, int irq_num, int
> > level), TP_ARGS(type, vcpu_idx, irq_num, level),
Thanks,
M.
--
Jazz is not dead. It just smells funny.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 3/3] arm/arm64: KVM: Optimize handling of Access Flag faults
2015-03-12 11:40 ` [PATCH 3/3] arm/arm64: KVM: Optimize handling of Access Flag faults Christoffer Dall
2015-03-12 15:04 ` Marc Zyngier
@ 2015-03-12 15:07 ` Marc Zyngier
1 sibling, 0 replies; 6+ messages in thread
From: Marc Zyngier @ 2015-03-12 15:07 UTC (permalink / raw)
To: Christoffer Dall
Cc: kvm@vger.kernel.org, kvmarm@lists.cs.columbia.edu, Steve Capper
On Thu, 12 Mar 2015 11:40:24 +0000
Christoffer Dall <christoffer.dall@linaro.org> wrote:
> On Wed, Jan 21, 2015 at 06:42:13PM +0000, Marc Zyngier wrote:
> > Now that we have page aging in Stage-2, it becomes obvious that
> > we're doing way too much work handling the fault.
> >
> > The page is not going anywhere (it is still mapped), the page
> > tables are already allocated, and all we want is to flip a bit
> > in the PMD or PTE. Also, we can avoid any form of TLB invalidation,
> > since a page with the AF bit off is not allowed to be cached.
> >
> > An obvious solution is to have a separate handler for FSC_ACCESS,
> > where we pride ourselves to only do the very minimum amount of
> > work.
> >
> > Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> > ---
> > arch/arm/kvm/mmu.c | 46
> > ++++++++++++++++++++++++++++++++++++++++++++++ arch/arm/kvm/trace.h
> > | 15 +++++++++++++++ 2 files changed, 61 insertions(+)
> >
> > diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> > index ffe89a0..112bae1 100644
> > --- a/arch/arm/kvm/mmu.c
> > +++ b/arch/arm/kvm/mmu.c
> > @@ -1073,6 +1073,46 @@ out_unlock:
> > return ret;
> > }
> >
> > +/*
> > + * Resolve the access fault by making the page young again.
> > + * Note that because the faulting entry is guaranteed not to be
> > + * cached in the TLB, we don't need to invalidate anything.
> > + */
> > +static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t
> > fault_ipa) +{
> > + pmd_t *pmd;
> > + pte_t *pte;
> > + pfn_t pfn;
> > + bool pfn_valid = false;
>
> I think you can just initialize pfn to KVM_PFN_ERR_FAULT and use
> is_error_pfn() if you like, not sure if it's cleaner.
I can have a look...
> > +
> > + trace_kvm_access_fault(fault_ipa);
> > +
> > + spin_lock(&vcpu->kvm->mmu_lock);
> > +
> > + pmd = stage2_get_pmd(vcpu->kvm, NULL, fault_ipa);
> > + if (!pmd || pmd_none(*pmd)) /* Nothing there */
> > + goto out;
> > +
> > + if (kvm_pmd_huge(*pmd)) { /* THP, HugeTLB */
> > + *pmd = pmd_mkyoung(*pmd);
> > + pfn = pmd_pfn(*pmd);
> > + pfn_valid = true;
> > + goto out;
> > + }
> > +
> > + pte = pte_offset_kernel(pmd, fault_ipa);
> > + if (pte_none(*pte)) /* Nothing there either
> > */
> > + goto out;
> > +
> > + *pte = pte_mkyoung(*pte); /* Just a page... */
> > + pfn = pte_pfn(*pte);
> > + pfn_valid = true;
> > +out:
> > + spin_unlock(&vcpu->kvm->mmu_lock);
>
> do you have a race here if the page is swapped out before you go and
> set pfn accessed? Does that cause any harm?
I don't think it really matters. The physical page is still there, and
is actually being accessed. The fact that the data is being evicted is
an interesting side effect... ;-)
> > + if (pfn_valid)
> > + kvm_set_pfn_accessed(pfn);
> > +}
> > +
> > /**
> > * kvm_handle_guest_abort - handles all 2nd stage aborts
> > * @vcpu: the VCPU pointer
> > @@ -1140,6 +1180,12 @@ int kvm_handle_guest_abort(struct kvm_vcpu
> > *vcpu, struct kvm_run *run) /* Userspace should not be able to
> > register out-of-bounds IPAs */ VM_BUG_ON(fault_ipa >=
> > KVM_PHYS_SIZE);
> > + if (fault_status == FSC_ACCESS) {
> > + handle_access_fault(vcpu, fault_ipa);
> > + ret = 1;
> > + goto out_unlock;
> > + }
> > +
> > ret = user_mem_abort(vcpu, fault_ipa, memslot, hva,
> > fault_status); if (ret == 0)
> > ret = 1;
> > diff --git a/arch/arm/kvm/trace.h b/arch/arm/kvm/trace.h
> > index 364b5382..5665a16 100644
> > --- a/arch/arm/kvm/trace.h
> > +++ b/arch/arm/kvm/trace.h
> > @@ -64,6 +64,21 @@ TRACE_EVENT(kvm_guest_fault,
> > __entry->hxfar, __entry->vcpu_pc)
> > );
> >
> > +TRACE_EVENT(kvm_access_fault,
> > + TP_PROTO(unsigned long ipa),
> > + TP_ARGS(ipa),
> > +
> > + TP_STRUCT__entry(
> > + __field( unsigned long,
> > ipa )
> > + ),
> > +
> > + TP_fast_assign(
> > + __entry->ipa = ipa;
> > + ),
> > +
> > + TP_printk("IPA: %lx", __entry->ipa)
> > +);
> > +
> > TRACE_EVENT(kvm_irq_line,
> > TP_PROTO(unsigned int type, int vcpu_idx, int irq_num, int
> > level), TP_ARGS(type, vcpu_idx, irq_num, level),
Thanks,
M.
--
Jazz is not dead. It just smells funny.
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2015-03-12 15:07 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <1421865733-20034-1-git-send-email-marc.zyngier@arm.com>
[not found] ` <1421865733-20034-2-git-send-email-marc.zyngier@arm.com>
2015-03-12 11:40 ` [PATCH 1/3] arm/arm64: KVM: Allow handle_hva_to_gpa to return a value Christoffer Dall
[not found] ` <1421865733-20034-3-git-send-email-marc.zyngier@arm.com>
2015-03-12 11:40 ` [PATCH 2/3] arm/arm64: KVM: Implement Stage-2 page aging Christoffer Dall
2015-03-12 14:50 ` Marc Zyngier
[not found] ` <1421865733-20034-4-git-send-email-marc.zyngier@arm.com>
2015-03-12 11:40 ` [PATCH 3/3] arm/arm64: KVM: Optimize handling of Access Flag faults Christoffer Dall
2015-03-12 15:04 ` Marc Zyngier
2015-03-12 15:07 ` Marc Zyngier
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox