[PATCH v2 1/3] arm64: KVM: Implement 48 VA support for KVM EL2 and Stage-2

All of lore.kernel.org
 help / color / mirror / Atom feed

From: marc.zyngier@arm.com (Marc Zyngier)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH v2 1/3] arm64: KVM: Implement 48 VA support for KVM EL2 and Stage-2
Date: Wed, 08 Oct 2014 11:27:39 +0100	[thread overview]
Message-ID: <5435119B.1010404@arm.com> (raw)
In-Reply-To: <20141008094744.GL3717@cbox>

On 08/10/14 10:47, Christoffer Dall wrote:
> On Wed, Oct 08, 2014 at 10:34:31AM +0100, Marc Zyngier wrote:
>> On 07/10/14 20:39, Christoffer Dall wrote:
>>> On Tue, Oct 07, 2014 at 02:28:43PM +0100, Marc Zyngier wrote:
>>>> On 07/10/14 11:48, Catalin Marinas wrote:
>>>>> On Mon, Oct 06, 2014 at 09:30:25PM +0100, Christoffer Dall wrote:
>>>>>> +/**
>>>>>> + * kvm_prealloc_hwpgd - allocate inital table for VTTBR
>>>>>> + * @kvm:       The KVM struct pointer for the VM.
>>>>>> + * @pgd:       The kernel pseudo pgd
>>>>>> + *
>>>>>> + * When the kernel uses more levels of page tables than the guest, we allocate
>>>>>> + * a fake PGD and pre-populate it to point to the next-level page table, which
>>>>>> + * will be the real initial page table pointed to by the VTTBR.
>>>>>> + *
>>>>>> + * When KVM_PREALLOC_LEVEL==2, we allocate a single page for the PMD and
>>>>>> + * the kernel will use folded pud.  When KVM_PREALLOC_LEVEL==1, we
>>>>>> + * allocate 2 consecutive PUD pages.
>>>>>> + */
>>>>>> +#if defined(CONFIG_ARM64_64K_PAGES) && CONFIG_ARM64_PGTABLE_LEVELS == 3
>>>>>> +#define KVM_PREALLOC_LEVEL     2
>>>>>> +#define PTRS_PER_S2_PGD                1
>>>>>> +#define S2_PGD_ORDER           get_order(PTRS_PER_S2_PGD * sizeof(pgd_t))
>>>>>
>>>>> I agree that my magic equation wasn't readable ;) (I had troubles
>>>>> re-understanding it as well), but you also have some constants here that
>>>>> are not immediately obvious where you got to them from. IIUC,
>>>>> KVM_PREALLOC_LEVEL == 2 here means that the hardware only understands
>>>>> stage 2 pmd and pte. I guess you could look into the ARM ARM tables but
>>>>> it's still not clear.
>>>>>
>>>>> Let's look at PTRS_PER_S2_PGD as I think it's simpler. My proposal was:
>>>>>
>>>>> #if PGDIR_SHIFT > KVM_PHYS_SHIFT
>>>>> #define PTRS_PER_S2_PGD                     (1)
>>>>> #else
>>>>> #define PTRS_PER_S2_PGD                     (1 << (KVM_PHYS_SHIFT - PGDIR_SHIFT))
>>>>> #endif
>>>>>
>>>>> In this case PGDIR_SHIFT is 42, so we get PTRS_PER_S2_PGD == 1. The 4K
>>>>> and 4 levels case below is also correct.
>>>>>
>>>>> The KVM start level calculation, we could assume that KVM needs either
>>>>> host levels or host levels - 1 (unless we go for some weirdly small
>>>>> KVM_PHYS_SHIFT). So we could define them KVM_PREALLOC_LEVEL as:
>>>>>
>>>>> #if PTRS_PER_S2_PGD <= 16
>>>>> #define KVM_PREALLOC_LEVEL  (4 - CONFIG_ARM64_PGTABLE_LEVELS + 1)
>>>>> #else
>>>>> #define KVM_PREALLOC_LEVEL  (0)
>>>>> #endif
>>>>>
>>>>> Basically if you can concatenate 16 or less pages at the level below the
>>>>> top, the architecture does not allow a small top level. In this case,
>>>>> (4 - CONFIG_ARM64_PGTABLE_LEVELS) represents the first level for the
>>>>> host and we add 1 to go to the next level for KVM stage 2 when
>>>>> PTRS_PER_S2_PGD is 16 or less. We use 0 when we don't need to
>>>>> preallocate.
>>>>
>>>> I think this makes the whole thing clearer (at least for me), as it
>>>> makes the relationship between KVM_PREALLOC_LEVEL and
>>>> CONFIG_ARM64_PGTABLE_LEVELS explicit (it wasn't completely obvious to me
>>>> initially).
>>>
>>> Agreed.
>>>
>>>>
>>>>>> +static inline int kvm_prealloc_hwpgd(struct kvm *kvm, pgd_t *pgd)
>>>>>> +{
>>>>>> +       pud_t *pud;
>>>>>> +       pmd_t *pmd;
>>>>>> +
>>>>>> +       pud = pud_offset(pgd, 0);
>>>>>> +       pmd = (pmd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, 0);
>>>>>> +
>>>>>> +       if (!pmd)
>>>>>> +               return -ENOMEM;
>>>>>> +       pud_populate(NULL, pud, pmd);
>>>>>> +
>>>>>> +       return 0;
>>>>>> +}
>>>>>> +
>>>>>> +static inline void kvm_free_hwpgd(struct kvm *kvm)
>>>>>> +{
>>>>>> +       pgd_t *pgd = kvm->arch.pgd;
>>>>>> +       pud_t *pud = pud_offset(pgd, 0);
>>>>>> +       pmd_t *pmd = pmd_offset(pud, 0);
>>>>>> +       free_pages((unsigned long)pmd, 0);
>>>>>> +}
>>>>>> +
>>>>>> +static inline phys_addr_t kvm_get_hwpgd(struct kvm *kvm)
>>>>>> +{
>>>>>> +       pgd_t *pgd = kvm->arch.pgd;
>>>>>> +       pud_t *pud = pud_offset(pgd, 0);
>>>>>> +       pmd_t *pmd = pmd_offset(pud, 0);
>>>>>> +       return virt_to_phys(pmd);
>>>>>> +
>>>>>> +}
>>>>>> +#elif defined(CONFIG_ARM64_4K_PAGES) && CONFIG_ARM64_PGTABLE_LEVELS == 4
>>>>>> +#define KVM_PREALLOC_LEVEL     1
>>>>>> +#define PTRS_PER_S2_PGD                2
>>>>>> +#define S2_PGD_ORDER           get_order(PTRS_PER_S2_PGD * sizeof(pgd_t))
>>>>>
>>>>> Here PGDIR_SHIFT is 39, so we get PTRS_PER_S2_PGD == (1 << (40 - 39))
>>>>> which is 2 and KVM_PREALLOC_LEVEL == 1.
>>>>>
>>>>>> +static inline int kvm_prealloc_hwpgd(struct kvm *kvm, pgd_t *pgd)
>>>>>> +{
>>>>>> +       pud_t *pud;
>>>>>> +
>>>>>> +       pud = (pud_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, 1);
>>>>>> +       if (!pud)
>>>>>> +               return -ENOMEM;
>>>>>> +       pgd_populate(NULL, pgd, pud);
>>>>>> +       pgd_populate(NULL, pgd + 1, pud + PTRS_PER_PUD);
>>>>>> +
>>>>>> +       return 0;
>>>>>> +}
>>>>>
>>>>> You still need to define these functions but you can make their
>>>>> implementation dependent solely on the KVM_PREALLOC_LEVEL rather than
>>>>> 64K/4K and levels combinations. If it is KVM_PREALLOC_LEVEL is 1, you
>>>>> allocate pud and populate the pgds (in a loop based on the
>>>>> PTRS_PER_S2_PGD). If it is 2, you allocate the pmd and populate the pud
>>>>> (still in a loop though it would probably be 1 iteration). We know based
>>>>> on the assumption above that you can't get KVM_PREALLOC_LEVEL == 2 and
>>>>> CONFIG_ARM64_PGTABLE_LEVELS == 4.
>>>>>
>>>>
>>>> Also agreed. Most of what you wrote here could also be gathered as
>>>> comments in the patch.
>>>>
>>> Yes, I reworded some of the text slightly as comments for the next
>>> version of the patch.
>>>
>>> However, I'm not sure I have a clear idea of how you'd like these
>>> functions to look like.
>>>
>>> I came up with the following based on your feedback, but I personally
>>> don't find it a lot easier to read than what I had already.  Suggestions
>>> are welcome:
>>>
>>> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
>>> index a030d16..7941a51 100644
>>> --- a/arch/arm64/include/asm/kvm_mmu.h
>>> +++ b/arch/arm64/include/asm/kvm_mmu.h
>>> @@ -41,6 +41,18 @@
>>>   */
>>>  #define TRAMPOLINE_VA          (HYP_PAGE_OFFSET_MASK & PAGE_MASK)
>>>
>>> +/*
>>> + * KVM_MMU_CACHE_MIN_PAGES is the number of stage2 page table translation
>>> + * levels in addition to the PGD and potentially the PUD which are
>>> + * pre-allocated (we pre-allocate the fake PGD and the PUD when the Stage-2
>>> + * tables use one level of tables less than the kernel.
>>> + */
>>> +#ifdef CONFIG_ARM64_64K_PAGES
>>> +#define KVM_MMU_CACHE_MIN_PAGES        1
>>> +#else
>>> +#define KVM_MMU_CACHE_MIN_PAGES        2
>>> +#endif
>>> +
>>>  #ifdef __ASSEMBLY__
>>>
>>>  /*
>>> @@ -53,6 +65,7 @@
>>>
>>>  #else
>>>
>>> +#include <asm/pgalloc.h>
>>>  #include <asm/cachetype.h>
>>>  #include <asm/cacheflush.h>
>>>
>>> @@ -65,10 +78,6 @@
>>>  #define KVM_PHYS_SIZE  (1UL << KVM_PHYS_SHIFT)
>>>  #define KVM_PHYS_MASK  (KVM_PHYS_SIZE - 1UL)
>>>
>>> -/* Make sure we get the right size, and thus the right alignment */
>>> -#define PTRS_PER_S2_PGD (1 << (KVM_PHYS_SHIFT - PGDIR_SHIFT))
>>> -#define S2_PGD_ORDER   get_order(PTRS_PER_S2_PGD * sizeof(pgd_t))
>>> -
>>>  int create_hyp_mappings(void *from, void *to);
>>>  int create_hyp_io_mappings(void *from, void *to, phys_addr_t);
>>>  void free_boot_hyp_pgd(void);
>>> @@ -93,6 +102,7 @@ void kvm_clear_hyp_idmap(void);
>>>  #define        kvm_set_pmd(pmdp, pmd)          set_pmd(pmdp, pmd)
>>>
>>>  static inline void kvm_clean_pgd(pgd_t *pgd) {}
>>> +static inline void kvm_clean_pmd(pmd_t *pmd) {}
>>>  static inline void kvm_clean_pmd_entry(pmd_t *pmd) {}
>>>  static inline void kvm_clean_pte(pte_t *pte) {}
>>>  static inline void kvm_clean_pte_entry(pte_t *pte) {}
>>> @@ -118,13 +128,115 @@ static inline bool kvm_page_empty(void *ptr)
>>>  }
>>>
>>>  #define kvm_pte_table_empty(ptep) kvm_page_empty(ptep)
>>> -#ifndef CONFIG_ARM64_64K_PAGES
>>> -#define kvm_pmd_table_empty(pmdp) kvm_page_empty(pmdp)
>>> -#else
>>> +
>>> +#ifdef __PAGETABLE_PMD_FOLDED
>>>  #define kvm_pmd_table_empty(pmdp) (0)
>>> +#else
>>> +#define kvm_pmd_table_empty(pmdp) kvm_page_empty(pmdp)
>>>  #endif
>>> +
>>> +#ifdef __PAGETABLE_PUD_FOLDED
>>>  #define kvm_pud_table_empty(pudp) (0)
>>> +#else
>>> +#define kvm_pud_table_empty(pudp) kvm_page_empty(pudp)
>>> +#endif
>>> +
>>> +/*
>>> + * In the case where PGDIR_SHIFT is larger than KVM_PHYS_SHIFT, we can address
>>> + * the entire IPA input range with a single pgd entry, and we would only need
>>> + * one pgd entry.
>>> + */
>>> +#if PGDIR_SHIFT > KVM_PHYS_SHIFT
>>> +#define PTRS_PER_S2_PGD                (1)
>>> +#else
>>> +#define PTRS_PER_S2_PGD                (1 << (KVM_PHYS_SHIFT - PGDIR_SHIFT))
>>> +#endif
>>> +#define S2_PGD_ORDER           get_order(PTRS_PER_S2_PGD * sizeof(pgd_t))
>>>
>>> +/*
>>> + * If we are concatenating first level stage-2 page tables, we would have less
>>> + * than or equal to 16 pointers in the fake PGD, because that's what the
>>> + * architecture allows.  In this case, (4 - CONFIG_ARM64_PGTABLE_LEVELS)
>>> + * represents the first level for the host, and we add 1 to go to the next
>>> + * level (which uses contatenation) for the stage-2 tables.
>>> + */
>>> +#if PTRS_PER_S2_PGD <= 16
>>> +#define KVM_PREALLOC_LEVEL     (4 - CONFIG_ARM64_PGTABLE_LEVELS + 1)
>>> +#else
>>> +#define KVM_PREALLOC_LEVEL     (0)
>>> +#endif
>>> +
>>> +/**
>>> + * kvm_prealloc_hwpgd - allocate inital table for VTTBR
>>> + * @kvm:       The KVM struct pointer for the VM.
>>> + * @pgd:       The kernel pseudo pgd
>>> + *
>>> + * When the kernel uses more levels of page tables than the guest, we allocate
>>> + * a fake PGD and pre-populate it to point to the next-level page table, which
>>> + * will be the real initial page table pointed to by the VTTBR.
>>> + *
>>> + * When KVM_PREALLOC_LEVEL==2, we allocate a single page for the PMD and
>>> + * the kernel will use folded pud.  When KVM_PREALLOC_LEVEL==1, we
>>> + * allocate 2 consecutive PUD pages.
>>> + */
>>> +static inline int kvm_prealloc_hwpgd(struct kvm *kvm, pgd_t *pgd)
>>> +{
>>> +       pud_t *pud;
>>> +       pmd_t *pmd;
>>> +       unsigned int order, i;
>>> +       unsigned long hwpgd;
>>> +
>>> +       if (KVM_PREALLOC_LEVEL == 0)
>>> +               return 0;
>>> +
>>> +       order = get_order(PTRS_PER_S2_PGD);
>>
>> S2_PGD_ORDER instead? Otherwise, that doesn't seem quite right...
>>
> 
> no, S2_PGD_ORDER is always the order of the PGD (in linux
> macro-world-pgd-terms) that we allocate currently.  Of course, we could
> rework that like this:
> 
> #if PTRS_PER_S2_PGD <= 16
> #define KVM_PREALLOC_LEVEL     (4 - CONFIG_ARM64_PGTABLE_LEVELS + 1)
> #define S2_PGD_ORDER           get_order(PTRS_PER_S2_PGD)
> #else
> #define KVM_PREALLOC_LEVEL     (0)
> #define S2_PGD_ORDER           get_order(PTRS_PER_S2_PGD * sizeof(pgd_t))
> #endif

I see. Got confused by the various PGD...

> 
>>> +       hwpgd = __get_free_pages(GFP_KERNEL | __GFP_ZERO, order);
>>> +       if (!hwpgd)
>>> +               return -ENOMEM;
>>> +
>>> +       if (KVM_PREALLOC_LEVEL == 1) {
>>> +               pud = (pud_t *)hwpgd;
>>> +               for (i = 0; i < PTRS_PER_S2_PGD; i++)
>>> +                       pgd_populate(NULL, pgd + i, pud + i * PTRS_PER_PUD);
>>> +       } else if (KVM_PREALLOC_LEVEL == 2) {
>>> +               pud = pud_offset(pgd, 0);
>>> +               pmd = (pmd_t *)hwpgd;
>>> +               for (i = 0; i < PTRS_PER_S2_PGD; i++)
>>> +                       pud_populate(NULL, pud + i, pmd + i * PTRS_PER_PMD);
>>> +       }
>>> +
>>> +       return 0;
>>
>> Shouldn't we return an error here instead? Or BUG()?
>>
> 
> yes, should never happen.
> 
>>> +}
>>> +
>>> +static inline void *kvm_get_hwpgd(struct kvm *kvm)
>>> +{
>>> +       pgd_t *pgd = kvm->arch.pgd;
>>> +       pud_t *pud;
>>> +       pmd_t *pmd;
>>> +
>>> +       switch (KVM_PREALLOC_LEVEL) {
>>> +       case 0:
>>> +               return pgd;
>>> +       case 1:
>>> +               pud = pud_offset(pgd, 0);
>>> +               return pud;
>>> +       case 2:
>>> +               pud = pud_offset(pgd, 0);
>>> +               pmd = pmd_offset(pud, 0);
>>> +               return pmd;
>>> +       default:
>>> +               BUG();
>>> +               return NULL;
>>> +       }
>>> +}
>>> +
>>> +static inline void kvm_free_hwpgd(struct kvm *kvm)
>>> +{
>>> +       if (KVM_PREALLOC_LEVEL > 0) {
>>> +               unsigned long hwpgd = (unsigned long)kvm_get_hwpgd(kvm);
>>> +               free_pages(hwpgd, get_order(S2_PGD_ORDER));
>>
>> Isn't the get_order() a bit wrong here? I'd expect S2_PGD_ORDER to be
>> what we need already...
>>
> 
> yikes!  gonzo coding.
> 
> what it should be is
>         free_pages(hwpgd, get_order(PTRS_PER_S2_PGD));
> 
> but it ties into the discussion above.

Agreed.

>>> +       }
>>> +}
>>
>> I personally like this version more (Catalin may have a different
>> opinion ;-).
>>
> I'm fine with this, but I wasn't sure if you guys had something more
> clever/beautiful in mind....?

See Catalin's reply, but I'm happy either way.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

WARNING: multiple messages have this Message-ID (diff)

From: Marc Zyngier <marc.zyngier@arm.com>
To: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Catalin Marinas <Catalin.Marinas@arm.com>,
	"kvmarm@lists.cs.columbia.edu" <kvmarm@lists.cs.columbia.edu>,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"jungseoklee85@gmail.com" <jungseoklee85@gmail.com>,
	Joel Schopp <joel.schopp@amd.com>
Subject: Re: [PATCH v2 1/3] arm64: KVM: Implement 48 VA support for KVM EL2 and Stage-2
Date: Wed, 08 Oct 2014 11:27:39 +0100	[thread overview]
Message-ID: <5435119B.1010404@arm.com> (raw)
In-Reply-To: <20141008094744.GL3717@cbox>

On 08/10/14 10:47, Christoffer Dall wrote:
> On Wed, Oct 08, 2014 at 10:34:31AM +0100, Marc Zyngier wrote:
>> On 07/10/14 20:39, Christoffer Dall wrote:
>>> On Tue, Oct 07, 2014 at 02:28:43PM +0100, Marc Zyngier wrote:
>>>> On 07/10/14 11:48, Catalin Marinas wrote:
>>>>> On Mon, Oct 06, 2014 at 09:30:25PM +0100, Christoffer Dall wrote:
>>>>>> +/**
>>>>>> + * kvm_prealloc_hwpgd - allocate inital table for VTTBR
>>>>>> + * @kvm:       The KVM struct pointer for the VM.
>>>>>> + * @pgd:       The kernel pseudo pgd
>>>>>> + *
>>>>>> + * When the kernel uses more levels of page tables than the guest, we allocate
>>>>>> + * a fake PGD and pre-populate it to point to the next-level page table, which
>>>>>> + * will be the real initial page table pointed to by the VTTBR.
>>>>>> + *
>>>>>> + * When KVM_PREALLOC_LEVEL==2, we allocate a single page for the PMD and
>>>>>> + * the kernel will use folded pud.  When KVM_PREALLOC_LEVEL==1, we
>>>>>> + * allocate 2 consecutive PUD pages.
>>>>>> + */
>>>>>> +#if defined(CONFIG_ARM64_64K_PAGES) && CONFIG_ARM64_PGTABLE_LEVELS == 3
>>>>>> +#define KVM_PREALLOC_LEVEL     2
>>>>>> +#define PTRS_PER_S2_PGD                1
>>>>>> +#define S2_PGD_ORDER           get_order(PTRS_PER_S2_PGD * sizeof(pgd_t))
>>>>>
>>>>> I agree that my magic equation wasn't readable ;) (I had troubles
>>>>> re-understanding it as well), but you also have some constants here that
>>>>> are not immediately obvious where you got to them from. IIUC,
>>>>> KVM_PREALLOC_LEVEL == 2 here means that the hardware only understands
>>>>> stage 2 pmd and pte. I guess you could look into the ARM ARM tables but
>>>>> it's still not clear.
>>>>>
>>>>> Let's look at PTRS_PER_S2_PGD as I think it's simpler. My proposal was:
>>>>>
>>>>> #if PGDIR_SHIFT > KVM_PHYS_SHIFT
>>>>> #define PTRS_PER_S2_PGD                     (1)
>>>>> #else
>>>>> #define PTRS_PER_S2_PGD                     (1 << (KVM_PHYS_SHIFT - PGDIR_SHIFT))
>>>>> #endif
>>>>>
>>>>> In this case PGDIR_SHIFT is 42, so we get PTRS_PER_S2_PGD == 1. The 4K
>>>>> and 4 levels case below is also correct.
>>>>>
>>>>> The KVM start level calculation, we could assume that KVM needs either
>>>>> host levels or host levels - 1 (unless we go for some weirdly small
>>>>> KVM_PHYS_SHIFT). So we could define them KVM_PREALLOC_LEVEL as:
>>>>>
>>>>> #if PTRS_PER_S2_PGD <= 16
>>>>> #define KVM_PREALLOC_LEVEL  (4 - CONFIG_ARM64_PGTABLE_LEVELS + 1)
>>>>> #else
>>>>> #define KVM_PREALLOC_LEVEL  (0)
>>>>> #endif
>>>>>
>>>>> Basically if you can concatenate 16 or less pages at the level below the
>>>>> top, the architecture does not allow a small top level. In this case,
>>>>> (4 - CONFIG_ARM64_PGTABLE_LEVELS) represents the first level for the
>>>>> host and we add 1 to go to the next level for KVM stage 2 when
>>>>> PTRS_PER_S2_PGD is 16 or less. We use 0 when we don't need to
>>>>> preallocate.
>>>>
>>>> I think this makes the whole thing clearer (at least for me), as it
>>>> makes the relationship between KVM_PREALLOC_LEVEL and
>>>> CONFIG_ARM64_PGTABLE_LEVELS explicit (it wasn't completely obvious to me
>>>> initially).
>>>
>>> Agreed.
>>>
>>>>
>>>>>> +static inline int kvm_prealloc_hwpgd(struct kvm *kvm, pgd_t *pgd)
>>>>>> +{
>>>>>> +       pud_t *pud;
>>>>>> +       pmd_t *pmd;
>>>>>> +
>>>>>> +       pud = pud_offset(pgd, 0);
>>>>>> +       pmd = (pmd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, 0);
>>>>>> +
>>>>>> +       if (!pmd)
>>>>>> +               return -ENOMEM;
>>>>>> +       pud_populate(NULL, pud, pmd);
>>>>>> +
>>>>>> +       return 0;
>>>>>> +}
>>>>>> +
>>>>>> +static inline void kvm_free_hwpgd(struct kvm *kvm)
>>>>>> +{
>>>>>> +       pgd_t *pgd = kvm->arch.pgd;
>>>>>> +       pud_t *pud = pud_offset(pgd, 0);
>>>>>> +       pmd_t *pmd = pmd_offset(pud, 0);
>>>>>> +       free_pages((unsigned long)pmd, 0);
>>>>>> +}
>>>>>> +
>>>>>> +static inline phys_addr_t kvm_get_hwpgd(struct kvm *kvm)
>>>>>> +{
>>>>>> +       pgd_t *pgd = kvm->arch.pgd;
>>>>>> +       pud_t *pud = pud_offset(pgd, 0);
>>>>>> +       pmd_t *pmd = pmd_offset(pud, 0);
>>>>>> +       return virt_to_phys(pmd);
>>>>>> +
>>>>>> +}
>>>>>> +#elif defined(CONFIG_ARM64_4K_PAGES) && CONFIG_ARM64_PGTABLE_LEVELS == 4
>>>>>> +#define KVM_PREALLOC_LEVEL     1
>>>>>> +#define PTRS_PER_S2_PGD                2
>>>>>> +#define S2_PGD_ORDER           get_order(PTRS_PER_S2_PGD * sizeof(pgd_t))
>>>>>
>>>>> Here PGDIR_SHIFT is 39, so we get PTRS_PER_S2_PGD == (1 << (40 - 39))
>>>>> which is 2 and KVM_PREALLOC_LEVEL == 1.
>>>>>
>>>>>> +static inline int kvm_prealloc_hwpgd(struct kvm *kvm, pgd_t *pgd)
>>>>>> +{
>>>>>> +       pud_t *pud;
>>>>>> +
>>>>>> +       pud = (pud_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, 1);
>>>>>> +       if (!pud)
>>>>>> +               return -ENOMEM;
>>>>>> +       pgd_populate(NULL, pgd, pud);
>>>>>> +       pgd_populate(NULL, pgd + 1, pud + PTRS_PER_PUD);
>>>>>> +
>>>>>> +       return 0;
>>>>>> +}
>>>>>
>>>>> You still need to define these functions but you can make their
>>>>> implementation dependent solely on the KVM_PREALLOC_LEVEL rather than
>>>>> 64K/4K and levels combinations. If it is KVM_PREALLOC_LEVEL is 1, you
>>>>> allocate pud and populate the pgds (in a loop based on the
>>>>> PTRS_PER_S2_PGD). If it is 2, you allocate the pmd and populate the pud
>>>>> (still in a loop though it would probably be 1 iteration). We know based
>>>>> on the assumption above that you can't get KVM_PREALLOC_LEVEL == 2 and
>>>>> CONFIG_ARM64_PGTABLE_LEVELS == 4.
>>>>>
>>>>
>>>> Also agreed. Most of what you wrote here could also be gathered as
>>>> comments in the patch.
>>>>
>>> Yes, I reworded some of the text slightly as comments for the next
>>> version of the patch.
>>>
>>> However, I'm not sure I have a clear idea of how you'd like these
>>> functions to look like.
>>>
>>> I came up with the following based on your feedback, but I personally
>>> don't find it a lot easier to read than what I had already.  Suggestions
>>> are welcome:
>>>
>>> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
>>> index a030d16..7941a51 100644
>>> --- a/arch/arm64/include/asm/kvm_mmu.h
>>> +++ b/arch/arm64/include/asm/kvm_mmu.h
>>> @@ -41,6 +41,18 @@
>>>   */
>>>  #define TRAMPOLINE_VA          (HYP_PAGE_OFFSET_MASK & PAGE_MASK)
>>>
>>> +/*
>>> + * KVM_MMU_CACHE_MIN_PAGES is the number of stage2 page table translation
>>> + * levels in addition to the PGD and potentially the PUD which are
>>> + * pre-allocated (we pre-allocate the fake PGD and the PUD when the Stage-2
>>> + * tables use one level of tables less than the kernel.
>>> + */
>>> +#ifdef CONFIG_ARM64_64K_PAGES
>>> +#define KVM_MMU_CACHE_MIN_PAGES        1
>>> +#else
>>> +#define KVM_MMU_CACHE_MIN_PAGES        2
>>> +#endif
>>> +
>>>  #ifdef __ASSEMBLY__
>>>
>>>  /*
>>> @@ -53,6 +65,7 @@
>>>
>>>  #else
>>>
>>> +#include <asm/pgalloc.h>
>>>  #include <asm/cachetype.h>
>>>  #include <asm/cacheflush.h>
>>>
>>> @@ -65,10 +78,6 @@
>>>  #define KVM_PHYS_SIZE  (1UL << KVM_PHYS_SHIFT)
>>>  #define KVM_PHYS_MASK  (KVM_PHYS_SIZE - 1UL)
>>>
>>> -/* Make sure we get the right size, and thus the right alignment */
>>> -#define PTRS_PER_S2_PGD (1 << (KVM_PHYS_SHIFT - PGDIR_SHIFT))
>>> -#define S2_PGD_ORDER   get_order(PTRS_PER_S2_PGD * sizeof(pgd_t))
>>> -
>>>  int create_hyp_mappings(void *from, void *to);
>>>  int create_hyp_io_mappings(void *from, void *to, phys_addr_t);
>>>  void free_boot_hyp_pgd(void);
>>> @@ -93,6 +102,7 @@ void kvm_clear_hyp_idmap(void);
>>>  #define        kvm_set_pmd(pmdp, pmd)          set_pmd(pmdp, pmd)
>>>
>>>  static inline void kvm_clean_pgd(pgd_t *pgd) {}
>>> +static inline void kvm_clean_pmd(pmd_t *pmd) {}
>>>  static inline void kvm_clean_pmd_entry(pmd_t *pmd) {}
>>>  static inline void kvm_clean_pte(pte_t *pte) {}
>>>  static inline void kvm_clean_pte_entry(pte_t *pte) {}
>>> @@ -118,13 +128,115 @@ static inline bool kvm_page_empty(void *ptr)
>>>  }
>>>
>>>  #define kvm_pte_table_empty(ptep) kvm_page_empty(ptep)
>>> -#ifndef CONFIG_ARM64_64K_PAGES
>>> -#define kvm_pmd_table_empty(pmdp) kvm_page_empty(pmdp)
>>> -#else
>>> +
>>> +#ifdef __PAGETABLE_PMD_FOLDED
>>>  #define kvm_pmd_table_empty(pmdp) (0)
>>> +#else
>>> +#define kvm_pmd_table_empty(pmdp) kvm_page_empty(pmdp)
>>>  #endif
>>> +
>>> +#ifdef __PAGETABLE_PUD_FOLDED
>>>  #define kvm_pud_table_empty(pudp) (0)
>>> +#else
>>> +#define kvm_pud_table_empty(pudp) kvm_page_empty(pudp)
>>> +#endif
>>> +
>>> +/*
>>> + * In the case where PGDIR_SHIFT is larger than KVM_PHYS_SHIFT, we can address
>>> + * the entire IPA input range with a single pgd entry, and we would only need
>>> + * one pgd entry.
>>> + */
>>> +#if PGDIR_SHIFT > KVM_PHYS_SHIFT
>>> +#define PTRS_PER_S2_PGD                (1)
>>> +#else
>>> +#define PTRS_PER_S2_PGD                (1 << (KVM_PHYS_SHIFT - PGDIR_SHIFT))
>>> +#endif
>>> +#define S2_PGD_ORDER           get_order(PTRS_PER_S2_PGD * sizeof(pgd_t))
>>>
>>> +/*
>>> + * If we are concatenating first level stage-2 page tables, we would have less
>>> + * than or equal to 16 pointers in the fake PGD, because that's what the
>>> + * architecture allows.  In this case, (4 - CONFIG_ARM64_PGTABLE_LEVELS)
>>> + * represents the first level for the host, and we add 1 to go to the next
>>> + * level (which uses contatenation) for the stage-2 tables.
>>> + */
>>> +#if PTRS_PER_S2_PGD <= 16
>>> +#define KVM_PREALLOC_LEVEL     (4 - CONFIG_ARM64_PGTABLE_LEVELS + 1)
>>> +#else
>>> +#define KVM_PREALLOC_LEVEL     (0)
>>> +#endif
>>> +
>>> +/**
>>> + * kvm_prealloc_hwpgd - allocate inital table for VTTBR
>>> + * @kvm:       The KVM struct pointer for the VM.
>>> + * @pgd:       The kernel pseudo pgd
>>> + *
>>> + * When the kernel uses more levels of page tables than the guest, we allocate
>>> + * a fake PGD and pre-populate it to point to the next-level page table, which
>>> + * will be the real initial page table pointed to by the VTTBR.
>>> + *
>>> + * When KVM_PREALLOC_LEVEL==2, we allocate a single page for the PMD and
>>> + * the kernel will use folded pud.  When KVM_PREALLOC_LEVEL==1, we
>>> + * allocate 2 consecutive PUD pages.
>>> + */
>>> +static inline int kvm_prealloc_hwpgd(struct kvm *kvm, pgd_t *pgd)
>>> +{
>>> +       pud_t *pud;
>>> +       pmd_t *pmd;
>>> +       unsigned int order, i;
>>> +       unsigned long hwpgd;
>>> +
>>> +       if (KVM_PREALLOC_LEVEL == 0)
>>> +               return 0;
>>> +
>>> +       order = get_order(PTRS_PER_S2_PGD);
>>
>> S2_PGD_ORDER instead? Otherwise, that doesn't seem quite right...
>>
> 
> no, S2_PGD_ORDER is always the order of the PGD (in linux
> macro-world-pgd-terms) that we allocate currently.  Of course, we could
> rework that like this:
> 
> #if PTRS_PER_S2_PGD <= 16
> #define KVM_PREALLOC_LEVEL     (4 - CONFIG_ARM64_PGTABLE_LEVELS + 1)
> #define S2_PGD_ORDER           get_order(PTRS_PER_S2_PGD)
> #else
> #define KVM_PREALLOC_LEVEL     (0)
> #define S2_PGD_ORDER           get_order(PTRS_PER_S2_PGD * sizeof(pgd_t))
> #endif

I see. Got confused by the various PGD...

> 
>>> +       hwpgd = __get_free_pages(GFP_KERNEL | __GFP_ZERO, order);
>>> +       if (!hwpgd)
>>> +               return -ENOMEM;
>>> +
>>> +       if (KVM_PREALLOC_LEVEL == 1) {
>>> +               pud = (pud_t *)hwpgd;
>>> +               for (i = 0; i < PTRS_PER_S2_PGD; i++)
>>> +                       pgd_populate(NULL, pgd + i, pud + i * PTRS_PER_PUD);
>>> +       } else if (KVM_PREALLOC_LEVEL == 2) {
>>> +               pud = pud_offset(pgd, 0);
>>> +               pmd = (pmd_t *)hwpgd;
>>> +               for (i = 0; i < PTRS_PER_S2_PGD; i++)
>>> +                       pud_populate(NULL, pud + i, pmd + i * PTRS_PER_PMD);
>>> +       }
>>> +
>>> +       return 0;
>>
>> Shouldn't we return an error here instead? Or BUG()?
>>
> 
> yes, should never happen.
> 
>>> +}
>>> +
>>> +static inline void *kvm_get_hwpgd(struct kvm *kvm)
>>> +{
>>> +       pgd_t *pgd = kvm->arch.pgd;
>>> +       pud_t *pud;
>>> +       pmd_t *pmd;
>>> +
>>> +       switch (KVM_PREALLOC_LEVEL) {
>>> +       case 0:
>>> +               return pgd;
>>> +       case 1:
>>> +               pud = pud_offset(pgd, 0);
>>> +               return pud;
>>> +       case 2:
>>> +               pud = pud_offset(pgd, 0);
>>> +               pmd = pmd_offset(pud, 0);
>>> +               return pmd;
>>> +       default:
>>> +               BUG();
>>> +               return NULL;
>>> +       }
>>> +}
>>> +
>>> +static inline void kvm_free_hwpgd(struct kvm *kvm)
>>> +{
>>> +       if (KVM_PREALLOC_LEVEL > 0) {
>>> +               unsigned long hwpgd = (unsigned long)kvm_get_hwpgd(kvm);
>>> +               free_pages(hwpgd, get_order(S2_PGD_ORDER));
>>
>> Isn't the get_order() a bit wrong here? I'd expect S2_PGD_ORDER to be
>> what we need already...
>>
> 
> yikes!  gonzo coding.
> 
> what it should be is
>         free_pages(hwpgd, get_order(PTRS_PER_S2_PGD));
> 
> but it ties into the discussion above.

Agreed.

>>> +       }
>>> +}
>>
>> I personally like this version more (Catalin may have a different
>> opinion ;-).
>>
> I'm fine with this, but I wasn't sure if you guys had something more
> clever/beautiful in mind....?

See Catalin's reply, but I'm happy either way.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

next prev parent reply	other threads:[~2014-10-08 10:27 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-10-06 20:30 [PATCH v2 0/3] arm/arm64: KVM: Host 48-bit VA support and IPA limits Christoffer Dall
2014-10-06 20:30 ` Christoffer Dall
2014-10-06 20:30 ` [PATCH v2 1/3] arm64: KVM: Implement 48 VA support for KVM EL2 and Stage-2 Christoffer Dall
2014-10-06 20:30   ` Christoffer Dall
2014-10-07 10:48   ` Catalin Marinas
2014-10-07 10:48     ` Catalin Marinas
2014-10-07 13:28     ` Marc Zyngier
2014-10-07 13:28       ` Marc Zyngier
2014-10-07 19:39       ` Christoffer Dall
2014-10-07 19:39         ` Christoffer Dall
2014-10-08  9:34         ` Marc Zyngier
2014-10-08  9:34           ` Marc Zyngier
2014-10-08  9:47           ` Christoffer Dall
2014-10-08  9:47             ` Christoffer Dall
2014-10-08 10:27             ` Marc Zyngier [this message]
2014-10-08 10:27               ` Marc Zyngier
2014-10-08  9:47         ` Catalin Marinas
2014-10-08  9:47           ` Catalin Marinas
2014-10-09 11:01           ` Christoffer Dall
2014-10-09 11:01             ` Christoffer Dall
2014-10-09 13:36             ` Catalin Marinas
2014-10-09 13:36               ` Catalin Marinas
2014-10-10  8:16               ` Christoffer Dall
2014-10-10  8:16                 ` Christoffer Dall
2014-10-07 13:40   ` Marc Zyngier
2014-10-07 13:40     ` Marc Zyngier
2014-10-08  9:48     ` Christoffer Dall
2014-10-08  9:48       ` Christoffer Dall
2014-10-06 20:30 ` [PATCH v2 2/3] arm/arm64: KVM: Ensure memslots are within KVM_PHYS_SIZE Christoffer Dall
2014-10-06 20:30   ` Christoffer Dall
2014-10-06 20:30 ` [PATCH v2 3/3] arm64: Allow 48-bits VA space without ARM_SMMU Christoffer Dall
2014-10-06 20:30   ` Christoffer Dall
2014-10-07  9:24 ` [PATCH v2 0/3] arm/arm64: KVM: Host 48-bit VA support and IPA limits Catalin Marinas
2014-10-07  9:24   ` Catalin Marinas
2014-10-07  9:36   ` Christoffer Dall
2014-10-07  9:36     ` Christoffer Dall

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5435119B.1010404@arm.com \
    --to=marc.zyngier@arm.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.