[PATCH v2 1/3] arm64: KVM: Implement 48 VA support for KVM EL2 and Stage-2

All of lore.kernel.org
 help / color / mirror / Atom feed

From: christoffer.dall@linaro.org (Christoffer Dall)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH v2 1/3] arm64: KVM: Implement 48 VA support for KVM EL2 and Stage-2
Date: Thu, 9 Oct 2014 13:01:37 +0200	[thread overview]
Message-ID: <20141009110137.GX3717@cbox> (raw)
In-Reply-To: <20141008094703.GB5906@e104818-lin.cambridge.arm.com>

On Wed, Oct 08, 2014 at 10:47:04AM +0100, Catalin Marinas wrote:
> On Tue, Oct 07, 2014 at 08:39:54PM +0100, Christoffer Dall wrote:
> > I came up with the following based on your feedback, but I personally
> > don't find it a lot easier to read than what I had already.  Suggestions
> > are welcome:
> 
> At least PTRS_PER_S2_PGD and KVM_PREALLOC_LEVEL are clearer to me as
> formulas than the magic numbers.
> 

Agreed.

> > diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> > index a030d16..7941a51 100644
> > --- a/arch/arm64/include/asm/kvm_mmu.h
> > +++ b/arch/arm64/include/asm/kvm_mmu.h
> [...]
> > +/*
> > + * In the case where PGDIR_SHIFT is larger than KVM_PHYS_SHIFT, we can address
> > + * the entire IPA input range with a single pgd entry, and we would only need
> > + * one pgd entry.
> > + */
> 
> It may be worth here stating that this pgd is actually fake (covered
> below as well). Maybe something like "single (fake) pgd entry".
> 

Yes.

> > +#if PGDIR_SHIFT > KVM_PHYS_SHIFT
> > +#define PTRS_PER_S2_PGD                (1)
> > +#else
> > +#define PTRS_PER_S2_PGD                (1 << (KVM_PHYS_SHIFT - PGDIR_SHIFT))
> > +#endif
> > +#define S2_PGD_ORDER           get_order(PTRS_PER_S2_PGD * sizeof(pgd_t))
> > 
> > +/*
> > + * If we are concatenating first level stage-2 page tables, we would have less
> > + * than or equal to 16 pointers in the fake PGD, because that's what the
> > + * architecture allows.  In this case, (4 - CONFIG_ARM64_PGTABLE_LEVELS)
> > + * represents the first level for the host, and we add 1 to go to the next
> > + * level (which uses contatenation) for the stage-2 tables.
> > + */
> > +#if PTRS_PER_S2_PGD <= 16
> > +#define KVM_PREALLOC_LEVEL     (4 - CONFIG_ARM64_PGTABLE_LEVELS + 1)
> > +#else
> > +#define KVM_PREALLOC_LEVEL     (0)
> > +#endif
> > +
> > +/**
> > + * kvm_prealloc_hwpgd - allocate inital table for VTTBR
> > + * @kvm:       The KVM struct pointer for the VM.
> > + * @pgd:       The kernel pseudo pgd
> > + *
> > + * When the kernel uses more levels of page tables than the guest, we allocate
> > + * a fake PGD and pre-populate it to point to the next-level page table, which
> > + * will be the real initial page table pointed to by the VTTBR.
> > + *
> > + * When KVM_PREALLOC_LEVEL==2, we allocate a single page for the PMD and
> > + * the kernel will use folded pud.  When KVM_PREALLOC_LEVEL==1, we
> > + * allocate 2 consecutive PUD pages.
> > + */
> 
> I don't have a strong preference here, if you find the code easier to
> read as separate kvm_prealloc_hwpgd() functions, use those, as per your
> original patch. My point was to no longer define the functions based on
> #if 64K && 3-levels etc. but only on KVM_PREALLOC_LEVEL.
> 
> Anyway, I think the code below looks ok, with some fixes.
> 

I think it's nicer too once I got used to it.

> > +static inline int kvm_prealloc_hwpgd(struct kvm *kvm, pgd_t *pgd)
> > +{
> > +       pud_t *pud;
> > +       pmd_t *pmd;
> > +       unsigned int order, i;
> > +       unsigned long hwpgd;
> > +
> > +       if (KVM_PREALLOC_LEVEL == 0)
> > +               return 0;
> > +
> > +       order = get_order(PTRS_PER_S2_PGD);
> 
> Isn't order always 0 here? Based on our IRC discussion, PTRS_PER_S2_PGD
> is 16 or less and the order should not be used.
> 

no, if the kernel has 4K pages and 4 levels, then PGDIR_SHIFT is 39, and
KVM_PHYS_SHIFT stays 40, so that means PTRS_PER_S2_PGD becomes 2, which
means we concatenate two first level stage-2 page tables, which means we
need to allocate two consecutive pages, giving us an order of 1, not 0.

That's exactly why we use get_order(PTRS_PER_S2_PGD) instead of
S2_PGD_ORDER, which is only used when we're not doing the fake PGD trick
(see my response to Marc's mail).

> > +       hwpgd = __get_free_pages(GFP_KERNEL | __GFP_ZERO, order);
> 
> I assume you need __get_free_pages() for alignment.
> 

yes, would you prefer a comment to that fact?

> > +       if (!hwpgd)
> > +               return -ENOMEM;
> > +
> > +       if (KVM_PREALLOC_LEVEL == 1) {
> > +               pud = (pud_t *)hwpgd;
> > +               for (i = 0; i < PTRS_PER_S2_PGD; i++)
> > +                       pgd_populate(NULL, pgd + i, pud + i * PTRS_PER_PUD);
> > +       } else if (KVM_PREALLOC_LEVEL == 2) {
> > +               pud = pud_offset(pgd, 0);
> > +               pmd = (pmd_t *)hwpgd;
> > +               for (i = 0; i < PTRS_PER_S2_PGD; i++)
> > +                       pud_populate(NULL, pud + i, pmd + i * PTRS_PER_PMD);
> > +       }
> 
> It could be slightly shorter as (I can't guarantee clearer ;)):
> 
> 	for (i = 0; i < PTRS_PER_S2_PGD; i++) {
> 		if (KVM_PREALLOC_LEVEL == 1)
> 			pgd_populate(NULL, pgd + i,
> 				     (pud_t *)hwpgd + i * PTRS_PER_PUD);
> 		else if (KVM_PREALLOC_LEVEL == 2)
> 			pud_populate(NULL, pud_offset(pgd, 0) + i,
> 				     (pmd_t *)hwpgd + i * PTRS_PER_PMD)
> 	}
> 
> Or you could write a kvm_populate_swpgd() to handle the ifs and casting.
> 

I actually quite like this, let's see how it looks in the next revision
and if people really dislike it, we can look at factoring it out
further.

> > +
> > +       return 0;
> > +}
> > +
> > +static inline void *kvm_get_hwpgd(struct kvm *kvm)
> > +{
> > +       pgd_t *pgd = kvm->arch.pgd;
> > +       pud_t *pud;
> > +       pmd_t *pmd;
> > +
> > +       switch (KVM_PREALLOC_LEVEL) {
> > +       case 0:
> > +               return pgd;
> > +       case 1:
> > +               pud = pud_offset(pgd, 0);
> > +               return pud;
> > +       case 2:
> > +               pud = pud_offset(pgd, 0);
> > +               pmd = pmd_offset(pud, 0);
> > +               return pmd;
> > +       default:
> > +               BUG();
> > +               return NULL;
> > +       }
> 
> 	/* not needed? Use BUG_ON or BUILD_BUG_ON */
> 	if (KVM_PREALLOC_LEVEL == 0)
> 		return pgd;
> 
> 	pud = pud_offset(pgd, 0);
> 	if (KVM_PREALLOC_LEVEL == 1)
> 		return pud;
> 
> 	return pmd_offset(pud, 0);

I like this, but...

> 
> You don't need KVM_PREALLOC_LEVEL == 0 case since this function wouldn't
> be called. So you could do with some (BUILD_)BUG_ON and 4 lines after.
> 
It is needed and it is called from arch/arm/kvm/arm.c in update_vttbr().

Thanks!
-Christoffer

WARNING: multiple messages have this Message-ID (diff)

From: Christoffer Dall <christoffer.dall@linaro.org>
To: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marc Zyngier <Marc.Zyngier@arm.com>,
	"kvmarm@lists.cs.columbia.edu" <kvmarm@lists.cs.columbia.edu>,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"jungseoklee85@gmail.com" <jungseoklee85@gmail.com>,
	Joel Schopp <joel.schopp@amd.com>
Subject: Re: [PATCH v2 1/3] arm64: KVM: Implement 48 VA support for KVM EL2 and Stage-2
Date: Thu, 9 Oct 2014 13:01:37 +0200	[thread overview]
Message-ID: <20141009110137.GX3717@cbox> (raw)
In-Reply-To: <20141008094703.GB5906@e104818-lin.cambridge.arm.com>

On Wed, Oct 08, 2014 at 10:47:04AM +0100, Catalin Marinas wrote:
> On Tue, Oct 07, 2014 at 08:39:54PM +0100, Christoffer Dall wrote:
> > I came up with the following based on your feedback, but I personally
> > don't find it a lot easier to read than what I had already.  Suggestions
> > are welcome:
> 
> At least PTRS_PER_S2_PGD and KVM_PREALLOC_LEVEL are clearer to me as
> formulas than the magic numbers.
> 

Agreed.

> > diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> > index a030d16..7941a51 100644
> > --- a/arch/arm64/include/asm/kvm_mmu.h
> > +++ b/arch/arm64/include/asm/kvm_mmu.h
> [...]
> > +/*
> > + * In the case where PGDIR_SHIFT is larger than KVM_PHYS_SHIFT, we can address
> > + * the entire IPA input range with a single pgd entry, and we would only need
> > + * one pgd entry.
> > + */
> 
> It may be worth here stating that this pgd is actually fake (covered
> below as well). Maybe something like "single (fake) pgd entry".
> 

Yes.

> > +#if PGDIR_SHIFT > KVM_PHYS_SHIFT
> > +#define PTRS_PER_S2_PGD                (1)
> > +#else
> > +#define PTRS_PER_S2_PGD                (1 << (KVM_PHYS_SHIFT - PGDIR_SHIFT))
> > +#endif
> > +#define S2_PGD_ORDER           get_order(PTRS_PER_S2_PGD * sizeof(pgd_t))
> > 
> > +/*
> > + * If we are concatenating first level stage-2 page tables, we would have less
> > + * than or equal to 16 pointers in the fake PGD, because that's what the
> > + * architecture allows.  In this case, (4 - CONFIG_ARM64_PGTABLE_LEVELS)
> > + * represents the first level for the host, and we add 1 to go to the next
> > + * level (which uses contatenation) for the stage-2 tables.
> > + */
> > +#if PTRS_PER_S2_PGD <= 16
> > +#define KVM_PREALLOC_LEVEL     (4 - CONFIG_ARM64_PGTABLE_LEVELS + 1)
> > +#else
> > +#define KVM_PREALLOC_LEVEL     (0)
> > +#endif
> > +
> > +/**
> > + * kvm_prealloc_hwpgd - allocate inital table for VTTBR
> > + * @kvm:       The KVM struct pointer for the VM.
> > + * @pgd:       The kernel pseudo pgd
> > + *
> > + * When the kernel uses more levels of page tables than the guest, we allocate
> > + * a fake PGD and pre-populate it to point to the next-level page table, which
> > + * will be the real initial page table pointed to by the VTTBR.
> > + *
> > + * When KVM_PREALLOC_LEVEL==2, we allocate a single page for the PMD and
> > + * the kernel will use folded pud.  When KVM_PREALLOC_LEVEL==1, we
> > + * allocate 2 consecutive PUD pages.
> > + */
> 
> I don't have a strong preference here, if you find the code easier to
> read as separate kvm_prealloc_hwpgd() functions, use those, as per your
> original patch. My point was to no longer define the functions based on
> #if 64K && 3-levels etc. but only on KVM_PREALLOC_LEVEL.
> 
> Anyway, I think the code below looks ok, with some fixes.
> 

I think it's nicer too once I got used to it.

> > +static inline int kvm_prealloc_hwpgd(struct kvm *kvm, pgd_t *pgd)
> > +{
> > +       pud_t *pud;
> > +       pmd_t *pmd;
> > +       unsigned int order, i;
> > +       unsigned long hwpgd;
> > +
> > +       if (KVM_PREALLOC_LEVEL == 0)
> > +               return 0;
> > +
> > +       order = get_order(PTRS_PER_S2_PGD);
> 
> Isn't order always 0 here? Based on our IRC discussion, PTRS_PER_S2_PGD
> is 16 or less and the order should not be used.
> 

no, if the kernel has 4K pages and 4 levels, then PGDIR_SHIFT is 39, and
KVM_PHYS_SHIFT stays 40, so that means PTRS_PER_S2_PGD becomes 2, which
means we concatenate two first level stage-2 page tables, which means we
need to allocate two consecutive pages, giving us an order of 1, not 0.

That's exactly why we use get_order(PTRS_PER_S2_PGD) instead of
S2_PGD_ORDER, which is only used when we're not doing the fake PGD trick
(see my response to Marc's mail).

> > +       hwpgd = __get_free_pages(GFP_KERNEL | __GFP_ZERO, order);
> 
> I assume you need __get_free_pages() for alignment.
> 

yes, would you prefer a comment to that fact?

> > +       if (!hwpgd)
> > +               return -ENOMEM;
> > +
> > +       if (KVM_PREALLOC_LEVEL == 1) {
> > +               pud = (pud_t *)hwpgd;
> > +               for (i = 0; i < PTRS_PER_S2_PGD; i++)
> > +                       pgd_populate(NULL, pgd + i, pud + i * PTRS_PER_PUD);
> > +       } else if (KVM_PREALLOC_LEVEL == 2) {
> > +               pud = pud_offset(pgd, 0);
> > +               pmd = (pmd_t *)hwpgd;
> > +               for (i = 0; i < PTRS_PER_S2_PGD; i++)
> > +                       pud_populate(NULL, pud + i, pmd + i * PTRS_PER_PMD);
> > +       }
> 
> It could be slightly shorter as (I can't guarantee clearer ;)):
> 
> 	for (i = 0; i < PTRS_PER_S2_PGD; i++) {
> 		if (KVM_PREALLOC_LEVEL == 1)
> 			pgd_populate(NULL, pgd + i,
> 				     (pud_t *)hwpgd + i * PTRS_PER_PUD);
> 		else if (KVM_PREALLOC_LEVEL == 2)
> 			pud_populate(NULL, pud_offset(pgd, 0) + i,
> 				     (pmd_t *)hwpgd + i * PTRS_PER_PMD)
> 	}
> 
> Or you could write a kvm_populate_swpgd() to handle the ifs and casting.
> 

I actually quite like this, let's see how it looks in the next revision
and if people really dislike it, we can look at factoring it out
further.

> > +
> > +       return 0;
> > +}
> > +
> > +static inline void *kvm_get_hwpgd(struct kvm *kvm)
> > +{
> > +       pgd_t *pgd = kvm->arch.pgd;
> > +       pud_t *pud;
> > +       pmd_t *pmd;
> > +
> > +       switch (KVM_PREALLOC_LEVEL) {
> > +       case 0:
> > +               return pgd;
> > +       case 1:
> > +               pud = pud_offset(pgd, 0);
> > +               return pud;
> > +       case 2:
> > +               pud = pud_offset(pgd, 0);
> > +               pmd = pmd_offset(pud, 0);
> > +               return pmd;
> > +       default:
> > +               BUG();
> > +               return NULL;
> > +       }
> 
> 	/* not needed? Use BUG_ON or BUILD_BUG_ON */
> 	if (KVM_PREALLOC_LEVEL == 0)
> 		return pgd;
> 
> 	pud = pud_offset(pgd, 0);
> 	if (KVM_PREALLOC_LEVEL == 1)
> 		return pud;
> 
> 	return pmd_offset(pud, 0);

I like this, but...

> 
> You don't need KVM_PREALLOC_LEVEL == 0 case since this function wouldn't
> be called. So you could do with some (BUILD_)BUG_ON and 4 lines after.
> 
It is needed and it is called from arch/arm/kvm/arm.c in update_vttbr().

Thanks!
-Christoffer

next prev parent reply	other threads:[~2014-10-09 11:01 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-10-06 20:30 [PATCH v2 0/3] arm/arm64: KVM: Host 48-bit VA support and IPA limits Christoffer Dall
2014-10-06 20:30 ` Christoffer Dall
2014-10-06 20:30 ` [PATCH v2 1/3] arm64: KVM: Implement 48 VA support for KVM EL2 and Stage-2 Christoffer Dall
2014-10-06 20:30   ` Christoffer Dall
2014-10-07 10:48   ` Catalin Marinas
2014-10-07 10:48     ` Catalin Marinas
2014-10-07 13:28     ` Marc Zyngier
2014-10-07 13:28       ` Marc Zyngier
2014-10-07 19:39       ` Christoffer Dall
2014-10-07 19:39         ` Christoffer Dall
2014-10-08  9:34         ` Marc Zyngier
2014-10-08  9:34           ` Marc Zyngier
2014-10-08  9:47           ` Christoffer Dall
2014-10-08  9:47             ` Christoffer Dall
2014-10-08 10:27             ` Marc Zyngier
2014-10-08 10:27               ` Marc Zyngier
2014-10-08  9:47         ` Catalin Marinas
2014-10-08  9:47           ` Catalin Marinas
2014-10-09 11:01           ` Christoffer Dall [this message]
2014-10-09 11:01             ` Christoffer Dall
2014-10-09 13:36             ` Catalin Marinas
2014-10-09 13:36               ` Catalin Marinas
2014-10-10  8:16               ` Christoffer Dall
2014-10-10  8:16                 ` Christoffer Dall
2014-10-07 13:40   ` Marc Zyngier
2014-10-07 13:40     ` Marc Zyngier
2014-10-08  9:48     ` Christoffer Dall
2014-10-08  9:48       ` Christoffer Dall
2014-10-06 20:30 ` [PATCH v2 2/3] arm/arm64: KVM: Ensure memslots are within KVM_PHYS_SIZE Christoffer Dall
2014-10-06 20:30   ` Christoffer Dall
2014-10-06 20:30 ` [PATCH v2 3/3] arm64: Allow 48-bits VA space without ARM_SMMU Christoffer Dall
2014-10-06 20:30   ` Christoffer Dall
2014-10-07  9:24 ` [PATCH v2 0/3] arm/arm64: KVM: Host 48-bit VA support and IPA limits Catalin Marinas
2014-10-07  9:24   ` Catalin Marinas
2014-10-07  9:36   ` Christoffer Dall
2014-10-07  9:36     ` Christoffer Dall

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20141009110137.GX3717@cbox \
    --to=christoffer.dall@linaro.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.