* [PATCH v2 0/7] ARM: KVM: Revamping the HYP init code for fun and profit
@ 2013-04-08 15:36 Marc Zyngier
  2013-04-08 15:36 ` [PATCH v2 1/7] ARM: KVM: simplify HYP mapping population Marc Zyngier
                   ` (7 more replies)
  0 siblings, 8 replies; 13+ messages in thread
From: Marc Zyngier @ 2013-04-08 15:36 UTC (permalink / raw)
  To: linux-arm-kernel

Over the past few weeks, I've gradually realized how broken our HYP
idmap code is. Badly broken.

The main problem is about supporting CPU hotplug. Imagine a CPU being
initialized normally, running VMs, and then being powered down. So
far, so good. Now mentally bring it back online. The CPU will come
back via the secondary CPU boot path, and then what? We cannot use it
anymore, because we need an idmap which is long gone, and because our
page tables are now live, containing the world-switch code, VM
structures, and other bits and pieces.

Another fun issue is that we don't have any TLB invalidation in the
HYP init code. And guess what? We cannot do it! HYP TLB invalidation
has to occur in HYP, and once we've installed the runtime page tables,
it is already too late. It is actually fairly easy to construct a
scenario where idmap and runtime pages have colliding translations.

The nail in the coffin was provided by Catalin Marinas who told me how
much he disliked the arm64 HYP idmap code, and made me realize that we
already have all the necessary code in arch/arm/kvm/mmu.c. It just
needs a tiny bit of care and affection. With a chainsaw.

The solution to the first two issues is a bit tricky, but doesn't
involve a lot of code. The hotplug problem mandates that we keep two
sets of page tables (boot and runtime). The TLB problem mandates that
we're able to transition from one PGD to another while in HYP,
invalidating the TLBs in the process.

To be able to do this, we need to share a page between the two page
tables. A page that will have the same VA in both configurations. All
we need is a VA that has the following properties:
- This VA can't be used to represent a kernel mapping.
- This VA will not conflict with the physical address of the kernel
  text

The vectors page VA seems to satisfy this requirement:
- The kernel never maps anything else there
- The kernel text being copied at the beginning of the physical
  memory, it is unlikely to use the last 64kB (I doubt we'll ever
  support KVM on a system with something like 4MB of RAM, but patches
  are very welcome).

Let's call this VA the trampoline VA.
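
In the series this ends up as a single definition in kvm_mmu.h (added by
patch 6 below), reusing the vectors page base the kernel already reserves:

  #define TRAMPOLINE_VA		UL(CONFIG_VECTORS_BASE)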

Now, we map our init page at 3 locations (see the sketch after this list):
- idmap in the boot pgd
- trampoline VA in the boot pgd
- trampoline VA in the runtime pgd
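
In terms of the mapping helper reworked by this series, that is just three
calls (a condensed view of what kvm_mmu_init() does by the end of the
series, error handling omitted):

  /* idmap: VA == PA for the init code, in the boot pgd */
  __create_hyp_mappings(boot_hyp_pgd, hyp_idmap_start, hyp_idmap_end,
                        __phys_to_pfn(hyp_idmap_start), PAGE_HYP);

  /* the same physical page at the trampoline VA, still in the boot pgd */
  __create_hyp_mappings(boot_hyp_pgd, TRAMPOLINE_VA, TRAMPOLINE_VA + PAGE_SIZE,
                        __phys_to_pfn(hyp_idmap_start), PAGE_HYP);

  /* and once more at the trampoline VA, this time in the runtime pgd */
  __create_hyp_mappings(hyp_pgd, TRAMPOLINE_VA, TRAMPOLINE_VA + PAGE_SIZE,
                        __phys_to_pfn(hyp_idmap_start), PAGE_HYP);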

The init scenario is now the following:
- We jump in HYP with four parameters: boot HYP pgd, runtime HYP pgd,
  runtime stack, runtime vectors
- Enable the MMU with the boot pgd
- Jump to a target into the trampoline page (remember, this is the
  same physical page!)
- Now switch to the runtime pgd (same VA, and still the same physical
  page!)
- Invalidate TLBs
- Set stack and vectors
- Profit! (or eret, if you only care about the code).

Once we have this infrastructure in place, supporting CPU hot-plug is
a piece of cake. Just wire a cpu-notifier in the existing code.
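
For reference, the cpu-notifier wiring in patch 7 really is just this
(condensed from the diff):

  static int hyp_init_cpu_notify(struct notifier_block *self,
                                 unsigned long action, void *cpu)
  {
          switch (action) {
          case CPU_STARTING:
          case CPU_STARTING_FROZEN:
                  /* runs on the incoming CPU itself, with interrupts off */
                  cpu_init_hyp_mode(NULL);
                  break;
          }

          return NOTIFY_OK;
  }

  static struct notifier_block hyp_init_cpu_nb = {
          .notifier_call = hyp_init_cpu_notify,
  };

plus a register_cpu_notifier(&hyp_init_cpu_nb) call in kvm_arch_init().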

This has been tested on both arm (VE TC2) and arm64 (Foundation Model).

From v1:
- Reduce the mandatory alignment of the init code page, and use a
  bounce page if the code is crossing a page boundary.
- Add some more comments to the code (as requested by Christoffer)
- Fixed compilation issue with older toolchains (spotted by Geoff)

Marc Zyngier (7):
  ARM: KVM: simplify HYP mapping population
  ARM: KVM: fix HYP mapping limitations around zero
  ARM: KVM: move to a KVM provided HYP idmap
  ARM: KVM: enforce maximum size for identity mapped code
  ARM: KVM: parametrize HYP page table freeing
  ARM: KVM: switch to a dual-step HYP init code
  ARM: KVM: perform HYP initialization for hotplugged CPUs

 arch/arm/include/asm/idmap.h    |   1 -
 arch/arm/include/asm/kvm_host.h |  18 ++-
 arch/arm/include/asm/kvm_mmu.h  |  27 +++-
 arch/arm/kernel/vmlinux.lds.S   |   7 +-
 arch/arm/kvm/arm.c              |  58 ++++----
 arch/arm/kvm/init.S             |  44 ++++++-
 arch/arm/kvm/mmu.c              | 286 ++++++++++++++++++++++++----------------
 arch/arm/mm/idmap.c             |  32 +----
 8 files changed, 294 insertions(+), 179 deletions(-)

-- 
1.8.1.4

* [PATCH v2 1/7] ARM: KVM: simplify HYP mapping population
  2013-04-08 15:36 [PATCH v2 0/7] ARM: KVM: Revamping the HYP init code for fun and profit Marc Zyngier
@ 2013-04-08 15:36 ` Marc Zyngier
  2013-04-08 15:36 ` [PATCH v2 2/7] ARM: KVM: fix HYP mapping limitations around zero Marc Zyngier
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Marc Zyngier @ 2013-04-08 15:36 UTC (permalink / raw)
  To: linux-arm-kernel

The way we populate HYP mappings is a bit convoluted, to say the least.
Passing a pointer around to keep track of the current PFN is quite
odd, and we end up having two different PTE accessors for no good
reason.

Simplify the whole thing by unifying the two PTE accessors, passing
a pgprot_t around, and moving the various validity checks to the
upper layers.
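
The net effect on the two entry points (beyond the validity checks they
now carry themselves) is that they differ only in the protection
attributes and in where the PFN comes from, roughly (see the diff for
the exact form):

  /* kernel memory: the HYP VA mirrors the kernel VA */
  __create_hyp_mappings(hyp_pgd, start, end,
                        __phys_to_pfn(virt_to_phys(from)), PAGE_HYP);

  /* I/O memory: the caller provides the physical base */
  __create_hyp_mappings(hyp_pgd, start, end,
                        __phys_to_pfn(phys_addr), PAGE_HYP_DEVICE);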

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/kvm/mmu.c | 102 ++++++++++++++++++++++-------------------------------
 1 file changed, 42 insertions(+), 60 deletions(-)

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 2f12e40..3003321 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -125,54 +125,34 @@ void free_hyp_pmds(void)
 }
 
 static void create_hyp_pte_mappings(pmd_t *pmd, unsigned long start,
-				    unsigned long end)
+				    unsigned long end, unsigned long pfn,
+				    pgprot_t prot)
 {
 	pte_t *pte;
 	unsigned long addr;
-	struct page *page;
 
-	for (addr = start & PAGE_MASK; addr < end; addr += PAGE_SIZE) {
-		unsigned long hyp_addr = KERN_TO_HYP(addr);
-
-		pte = pte_offset_kernel(pmd, hyp_addr);
-		BUG_ON(!virt_addr_valid(addr));
-		page = virt_to_page(addr);
-		kvm_set_pte(pte, mk_pte(page, PAGE_HYP));
-	}
-}
-
-static void create_hyp_io_pte_mappings(pmd_t *pmd, unsigned long start,
-				       unsigned long end,
-				       unsigned long *pfn_base)
-{
-	pte_t *pte;
-	unsigned long addr;
-
-	for (addr = start & PAGE_MASK; addr < end; addr += PAGE_SIZE) {
-		unsigned long hyp_addr = KERN_TO_HYP(addr);
-
-		pte = pte_offset_kernel(pmd, hyp_addr);
-		BUG_ON(pfn_valid(*pfn_base));
-		kvm_set_pte(pte, pfn_pte(*pfn_base, PAGE_HYP_DEVICE));
-		(*pfn_base)++;
+	for (addr = start; addr < end; addr += PAGE_SIZE) {
+		pte = pte_offset_kernel(pmd, addr);
+		kvm_set_pte(pte, pfn_pte(pfn, prot));
+		pfn++;
 	}
 }
 
 static int create_hyp_pmd_mappings(pud_t *pud, unsigned long start,
-				   unsigned long end, unsigned long *pfn_base)
+				   unsigned long end, unsigned long pfn,
+				   pgprot_t prot)
 {
 	pmd_t *pmd;
 	pte_t *pte;
 	unsigned long addr, next;
 
 	for (addr = start; addr < end; addr = next) {
-		unsigned long hyp_addr = KERN_TO_HYP(addr);
-		pmd = pmd_offset(pud, hyp_addr);
+		pmd = pmd_offset(pud, addr);
 
 		BUG_ON(pmd_sect(*pmd));
 
 		if (pmd_none(*pmd)) {
-			pte = pte_alloc_one_kernel(NULL, hyp_addr);
+			pte = pte_alloc_one_kernel(NULL, addr);
 			if (!pte) {
 				kvm_err("Cannot allocate Hyp pte\n");
 				return -ENOMEM;
@@ -182,25 +162,17 @@ static int create_hyp_pmd_mappings(pud_t *pud, unsigned long start,
 
 		next = pmd_addr_end(addr, end);
 
-		/*
-		 * If pfn_base is NULL, we map kernel pages into HYP with the
-		 * virtual address. Otherwise, this is considered an I/O
-		 * mapping and we map the physical region starting at
-		 * *pfn_base to [start, end[.
-		 */
-		if (!pfn_base)
-			create_hyp_pte_mappings(pmd, addr, next);
-		else
-			create_hyp_io_pte_mappings(pmd, addr, next, pfn_base);
+		create_hyp_pte_mappings(pmd, addr, next, pfn, prot);
+		pfn += (next - addr) >> PAGE_SHIFT;
 	}
 
 	return 0;
 }
 
-static int __create_hyp_mappings(void *from, void *to, unsigned long *pfn_base)
+static int __create_hyp_mappings(pgd_t *pgdp,
+				 unsigned long start, unsigned long end,
+				 unsigned long pfn, pgprot_t prot)
 {
-	unsigned long start = (unsigned long)from;
-	unsigned long end = (unsigned long)to;
 	pgd_t *pgd;
 	pud_t *pud;
 	pmd_t *pmd;
@@ -209,21 +181,14 @@ static int __create_hyp_mappings(void *from, void *to, unsigned long *pfn_base)
 
 	if (start >= end)
 		return -EINVAL;
-	/* Check for a valid kernel memory mapping */
-	if (!pfn_base && (!virt_addr_valid(from) || !virt_addr_valid(to - 1)))
-		return -EINVAL;
-	/* Check for a valid kernel IO mapping */
-	if (pfn_base && (!is_vmalloc_addr(from) || !is_vmalloc_addr(to - 1)))
-		return -EINVAL;
 
 	mutex_lock(&kvm_hyp_pgd_mutex);
-	for (addr = start; addr < end; addr = next) {
-		unsigned long hyp_addr = KERN_TO_HYP(addr);
-		pgd = hyp_pgd + pgd_index(hyp_addr);
-		pud = pud_offset(pgd, hyp_addr);
+	for (addr = start & PAGE_MASK; addr < end; addr = next) {
+		pgd = pgdp + pgd_index(addr);
+		pud = pud_offset(pgd, addr);
 
 		if (pud_none_or_clear_bad(pud)) {
-			pmd = pmd_alloc_one(NULL, hyp_addr);
+			pmd = pmd_alloc_one(NULL, addr);
 			if (!pmd) {
 				kvm_err("Cannot allocate Hyp pmd\n");
 				err = -ENOMEM;
@@ -233,9 +198,10 @@ static int __create_hyp_mappings(void *from, void *to, unsigned long *pfn_base)
 		}
 
 		next = pgd_addr_end(addr, end);
-		err = create_hyp_pmd_mappings(pud, addr, next, pfn_base);
+		err = create_hyp_pmd_mappings(pud, addr, next, pfn, prot);
 		if (err)
 			goto out;
+		pfn += (next - addr) >> PAGE_SHIFT;
 	}
 out:
 	mutex_unlock(&kvm_hyp_pgd_mutex);
@@ -255,22 +221,38 @@ out:
  */
 int create_hyp_mappings(void *from, void *to)
 {
-	return __create_hyp_mappings(from, to, NULL);
+	unsigned long phys_addr = virt_to_phys(from);
+	unsigned long start = KERN_TO_HYP((unsigned long)from);
+	unsigned long end = KERN_TO_HYP((unsigned long)to);
+
+	/* Check for a valid kernel memory mapping */
+	if (!virt_addr_valid(from) || !virt_addr_valid(to - 1))
+		return -EINVAL;
+
+	return __create_hyp_mappings(hyp_pgd, start, end,
+				     __phys_to_pfn(phys_addr), PAGE_HYP);
 }
 
 /**
  * create_hyp_io_mappings - duplicate a kernel IO mapping into Hyp mode
  * @from:	The kernel start VA of the range
  * @to:		The kernel end VA of the range (exclusive)
- * @addr:	The physical start address which gets mapped
+ * @phys_addr:	The physical start address which gets mapped
  *
  * The resulting HYP VA is the same as the kernel VA, modulo
  * HYP_PAGE_OFFSET.
  */
-int create_hyp_io_mappings(void *from, void *to, phys_addr_t addr)
+int create_hyp_io_mappings(void *from, void *to, phys_addr_t phys_addr)
 {
-	unsigned long pfn = __phys_to_pfn(addr);
-	return __create_hyp_mappings(from, to, &pfn);
+	unsigned long start = KERN_TO_HYP((unsigned long)from);
+	unsigned long end = KERN_TO_HYP((unsigned long)to);
+
+	/* Check for a valid kernel IO mapping */
+	if (!is_vmalloc_addr(from) || !is_vmalloc_addr(to - 1))
+		return -EINVAL;
+
+	return __create_hyp_mappings(hyp_pgd, start, end,
+				     __phys_to_pfn(phys_addr), PAGE_HYP_DEVICE);
 }
 
 /**
-- 
1.8.1.4

* [PATCH v2 2/7] ARM: KVM: fix HYP mapping limitations around zero
  2013-04-08 15:36 [PATCH v2 0/7] ARM: KVM: Revamping the HYP init code for fun and profit Marc Zyngier
  2013-04-08 15:36 ` [PATCH v2 1/7] ARM: KVM: simplify HYP mapping population Marc Zyngier
@ 2013-04-08 15:36 ` Marc Zyngier
  2013-04-08 15:36 ` [PATCH v2 3/7] ARM: KVM: move to a KVM provided HYP idmap Marc Zyngier
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Marc Zyngier @ 2013-04-08 15:36 UTC (permalink / raw)
  To: linux-arm-kernel

The current code for creating HYP mappings doesn't like to wrap
around zero, which prevents us from mapping anything into the last
page of the virtual address space.

It doesn't take much effort to remove this limitation, making
the code more consistent with the rest of the kernel in the process.
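
The limitation is purely down to the loop condition: with an exclusive end
address of 0 (a range finishing at the very top of the address space),
"addr < end" is false on the first iteration and nothing gets mapped.
A do/while on "addr != end", as used by the other page table walkers in
the kernel, handles the wrap. A minimal sketch (map_one_page() is a
stand-in for the PTE setter, assuming 4kB pages):

  unsigned long start = 0xfffff000UL;  /* last page of the 32-bit VA space */
  unsigned long end = 0;               /* exclusive end, wrapped past zero */
  unsigned long addr;

  /* old loop: "addr < end" never holds, the page is silently skipped */
  for (addr = start; addr < end; addr += PAGE_SIZE)
          map_one_page(addr);

  /* new loop: maps 0xfffff000, then stops once addr wraps to "end" */
  addr = start;
  do {
          map_one_page(addr);
  } while (addr += PAGE_SIZE, addr != end);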

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/kvm/mmu.c | 21 ++++++++++-----------
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 3003321..96d61da 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -131,11 +131,12 @@ static void create_hyp_pte_mappings(pmd_t *pmd, unsigned long start,
 	pte_t *pte;
 	unsigned long addr;
 
-	for (addr = start; addr < end; addr += PAGE_SIZE) {
+	addr = start;
+	do {
 		pte = pte_offset_kernel(pmd, addr);
 		kvm_set_pte(pte, pfn_pte(pfn, prot));
 		pfn++;
-	}
+	} while (addr += PAGE_SIZE, addr != end);
 }
 
 static int create_hyp_pmd_mappings(pud_t *pud, unsigned long start,
@@ -146,7 +147,8 @@ static int create_hyp_pmd_mappings(pud_t *pud, unsigned long start,
 	pte_t *pte;
 	unsigned long addr, next;
 
-	for (addr = start; addr < end; addr = next) {
+	addr = start;
+	do {
 		pmd = pmd_offset(pud, addr);
 
 		BUG_ON(pmd_sect(*pmd));
@@ -164,7 +166,7 @@ static int create_hyp_pmd_mappings(pud_t *pud, unsigned long start,
 
 		create_hyp_pte_mappings(pmd, addr, next, pfn, prot);
 		pfn += (next - addr) >> PAGE_SHIFT;
-	}
+	} while (addr = next, addr != end);
 
 	return 0;
 }
@@ -179,11 +181,10 @@ static int __create_hyp_mappings(pgd_t *pgdp,
 	unsigned long addr, next;
 	int err = 0;
 
-	if (start >= end)
-		return -EINVAL;
-
 	mutex_lock(&kvm_hyp_pgd_mutex);
-	for (addr = start & PAGE_MASK; addr < end; addr = next) {
+	addr = start & PAGE_MASK;
+	end = PAGE_ALIGN(end);
+	do {
 		pgd = pgdp + pgd_index(addr);
 		pud = pud_offset(pgd, addr);
 
@@ -202,7 +203,7 @@ static int __create_hyp_mappings(pgd_t *pgdp,
 		if (err)
 			goto out;
 		pfn += (next - addr) >> PAGE_SHIFT;
-	}
+	} while (addr = next, addr != end);
 out:
 	mutex_unlock(&kvm_hyp_pgd_mutex);
 	return err;
@@ -216,8 +217,6 @@ out:
  * The same virtual address as the kernel virtual address is also used
  * in Hyp-mode mapping (modulo HYP_PAGE_OFFSET) to the same underlying
  * physical pages.
- *
- * Note: Wrapping around zero in the "to" address is not supported.
  */
 int create_hyp_mappings(void *from, void *to)
 {
-- 
1.8.1.4

* [PATCH v2 3/7] ARM: KVM: move to a KVM provided HYP idmap
  2013-04-08 15:36 [PATCH v2 0/7] ARM: KVM: Revamping the HYP init code for fun and profit Marc Zyngier
  2013-04-08 15:36 ` [PATCH v2 1/7] ARM: KVM: simplify HYP mapping population Marc Zyngier
  2013-04-08 15:36 ` [PATCH v2 2/7] ARM: KVM: fix HYP mapping limitations around zero Marc Zyngier
@ 2013-04-08 15:36 ` Marc Zyngier
  2013-04-08 15:36 ` [PATCH v2 4/7] ARM: KVM: enforce maximum size for identity mapped code Marc Zyngier
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Marc Zyngier @ 2013-04-08 15:36 UTC (permalink / raw)
  To: linux-arm-kernel

After the HYP page table rework, it is pretty easy to let the KVM
code provide its own idmap, rather than expecting the kernel to
provide it. It actually takes less code to do so.
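
In practice, kvm_mmu_init() now derives the idmap range from the linker
symbols and installs a VA == PA mapping for the init code into the PGD it
allocates, roughly (simplified; error handling omitted, see the diff below
for the exact form):

  unsigned long hyp_idmap_start = virt_to_phys(__hyp_idmap_text_start);
  unsigned long hyp_idmap_end = virt_to_phys(__hyp_idmap_text_end);

  hyp_pgd = kzalloc(PTRS_PER_PGD * sizeof(pgd_t), GFP_KERNEL);

  /* VA == PA for the init code, so it survives the MMU being turned on */
  err = __create_hyp_mappings(hyp_pgd, hyp_idmap_start, hyp_idmap_end,
                              __phys_to_pfn(hyp_idmap_start), PAGE_HYP);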

Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/include/asm/idmap.h   |  1 -
 arch/arm/include/asm/kvm_mmu.h |  1 -
 arch/arm/kvm/mmu.c             | 24 +++++++++++++++++++++++-
 arch/arm/mm/idmap.c            | 32 +-------------------------------
 4 files changed, 24 insertions(+), 34 deletions(-)

diff --git a/arch/arm/include/asm/idmap.h b/arch/arm/include/asm/idmap.h
index 1a66f907..bf863ed 100644
--- a/arch/arm/include/asm/idmap.h
+++ b/arch/arm/include/asm/idmap.h
@@ -8,7 +8,6 @@
 #define __idmap __section(.idmap.text) noinline notrace
 
 extern pgd_t *idmap_pgd;
-extern pgd_t *hyp_pgd;
 
 void setup_mm_for_reboot(void);
 
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 970f3b5..3c71a1d 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -21,7 +21,6 @@
 
 #include <asm/cacheflush.h>
 #include <asm/pgalloc.h>
-#include <asm/idmap.h>
 
 /*
  * We directly use the kernel VA for the HYP, as we can directly share
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 96d61da..bfc5927 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -32,6 +32,7 @@
 
 extern char  __hyp_idmap_text_start[], __hyp_idmap_text_end[];
 
+static pgd_t *hyp_pgd;
 static DEFINE_MUTEX(kvm_hyp_pgd_mutex);
 
 static void kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
@@ -715,12 +716,33 @@ phys_addr_t kvm_mmu_get_httbr(void)
 
 int kvm_mmu_init(void)
 {
+	unsigned long hyp_idmap_start = virt_to_phys(__hyp_idmap_text_start);
+	unsigned long hyp_idmap_end = virt_to_phys(__hyp_idmap_text_end);
+	int err;
+
+	hyp_pgd = kzalloc(PTRS_PER_PGD * sizeof(pgd_t), GFP_KERNEL);
 	if (!hyp_pgd) {
 		kvm_err("Hyp mode PGD not allocated\n");
-		return -ENOMEM;
+		err = -ENOMEM;
+		goto out;
+	}
+
+	/* Create the idmap in the boot page tables */
+	err = 	__create_hyp_mappings(boot_hyp_pgd,
+				      hyp_idmap_start, hyp_idmap_end,
+				      __phys_to_pfn(hyp_idmap_start),
+				      PAGE_HYP);
+
+	if (err) {
+		kvm_err("Failed to idmap %lx-%lx\n",
+			hyp_idmap_start, hyp_idmap_end);
+		goto out;
 	}
 
 	return 0;
+out:
+	kfree(hyp_pgd);
+	return err;
 }
 
 /**
diff --git a/arch/arm/mm/idmap.c b/arch/arm/mm/idmap.c
index 5ee505c..83cb3ac 100644
--- a/arch/arm/mm/idmap.c
+++ b/arch/arm/mm/idmap.c
@@ -8,7 +8,6 @@
 #include <asm/pgtable.h>
 #include <asm/sections.h>
 #include <asm/system_info.h>
-#include <asm/virt.h>
 
 pgd_t *idmap_pgd;
 
@@ -83,37 +82,10 @@ static void identity_mapping_add(pgd_t *pgd, const char *text_start,
 	} while (pgd++, addr = next, addr != end);
 }
 
-#if defined(CONFIG_ARM_VIRT_EXT) && defined(CONFIG_ARM_LPAE)
-pgd_t *hyp_pgd;
-
-extern char  __hyp_idmap_text_start[], __hyp_idmap_text_end[];
-
-static int __init init_static_idmap_hyp(void)
-{
-	hyp_pgd = kzalloc(PTRS_PER_PGD * sizeof(pgd_t), GFP_KERNEL);
-	if (!hyp_pgd)
-		return -ENOMEM;
-
-	pr_info("Setting up static HYP identity map for 0x%p - 0x%p\n",
-		__hyp_idmap_text_start, __hyp_idmap_text_end);
-	identity_mapping_add(hyp_pgd, __hyp_idmap_text_start,
-			     __hyp_idmap_text_end, PMD_SECT_AP1);
-
-	return 0;
-}
-#else
-static int __init init_static_idmap_hyp(void)
-{
-	return 0;
-}
-#endif
-
 extern char  __idmap_text_start[], __idmap_text_end[];
 
 static int __init init_static_idmap(void)
 {
-	int ret;
-
 	idmap_pgd = pgd_alloc(&init_mm);
 	if (!idmap_pgd)
 		return -ENOMEM;
@@ -123,12 +95,10 @@ static int __init init_static_idmap(void)
 	identity_mapping_add(idmap_pgd, __idmap_text_start,
 			     __idmap_text_end, 0);
 
-	ret = init_static_idmap_hyp();
-
 	/* Flush L1 for the hardware to see this page table content */
 	flush_cache_louis();
 
-	return ret;
+	return 0;
 }
 early_initcall(init_static_idmap);
 
-- 
1.8.1.4

* [PATCH v2 4/7] ARM: KVM: enforce maximum size for identity mapped code
  2013-04-08 15:36 [PATCH v2 0/7] ARM: KVM: Revamping the HYP init code for fun and profit Marc Zyngier
                   ` (2 preceding siblings ...)
  2013-04-08 15:36 ` [PATCH v2 3/7] ARM: KVM: move to a KVM provided HYP idmap Marc Zyngier
@ 2013-04-08 15:36 ` Marc Zyngier
  2013-04-08 15:36 ` [PATCH v2 5/7] ARM: KVM: parametrize HYP page table freeing Marc Zyngier
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Marc Zyngier @ 2013-04-08 15:36 UTC (permalink / raw)
  To: linux-arm-kernel

We're about to move to an init procedure where we rely on the
fact that the init code fits in a single page. Make sure we
align the idmap text to the vector alignment (32 bytes), and that the
code is not bigger than a single page.
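
This single-page guarantee is what the two-step init in patch 6 relies on:
at runtime it only has to check whether the init text happens to straddle
a page boundary (and copy it to a bounce page if it does), which is a
simple test on the page bits of the start and end addresses:

  /* start and end live in different pages: the code straddles a boundary */
  if ((hyp_idmap_start ^ hyp_idmap_end) & PAGE_MASK) {
          /* fall back to a page-aligned bounce page (see patch 6) */
  }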

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/kernel/vmlinux.lds.S | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
index b571484..a871b8e 100644
--- a/arch/arm/kernel/vmlinux.lds.S
+++ b/arch/arm/kernel/vmlinux.lds.S
@@ -20,7 +20,7 @@
 	VMLINUX_SYMBOL(__idmap_text_start) = .;				\
 	*(.idmap.text)							\
 	VMLINUX_SYMBOL(__idmap_text_end) = .;				\
-	ALIGN_FUNCTION();						\
+	. = ALIGN(32);							\
 	VMLINUX_SYMBOL(__hyp_idmap_text_start) = .;			\
 	*(.hyp.idmap.text)						\
 	VMLINUX_SYMBOL(__hyp_idmap_text_end) = .;
@@ -315,3 +315,8 @@ SECTIONS
  */
 ASSERT((__proc_info_end - __proc_info_begin), "missing CPU support")
 ASSERT((__arch_info_end - __arch_info_begin), "no machine record defined")
+/*
+ * The HYP init code can't be more than a page long.
+ * The above comment applies as well.
+ */
+ASSERT(((__hyp_idmap_text_end - __hyp_idmap_text_start) <= PAGE_SIZE), "HYP init code too big")
-- 
1.8.1.4

* [PATCH v2 5/7] ARM: KVM: parametrize HYP page table freeing
  2013-04-08 15:36 [PATCH v2 0/7] ARM: KVM: Revamping the HYP init code for fun and profit Marc Zyngier
                   ` (3 preceding siblings ...)
  2013-04-08 15:36 ` [PATCH v2 4/7] ARM: KVM: enforce maximum size for identity mapped code Marc Zyngier
@ 2013-04-08 15:36 ` Marc Zyngier
  2013-04-08 15:36 ` [PATCH v2 6/7] ARM: KVM: switch to a dual-step HYP init code Marc Zyngier
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Marc Zyngier @ 2013-04-08 15:36 UTC (permalink / raw)
  To: linux-arm-kernel

In order to prepare for having to deal with multiple HYP page tables,
pass the PGD parameter to the function performing the freeing of the
page tables.

Also move the freeing of the PGD itself there, and rename
free_hyp_pmds to free_hyp_pgds.
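
This is what lets the next patch tear down the (tiny) boot page tables
with the same walker:

  /* from patch 6: the boot pgd only maps the init code and the trampoline */
  free_hyp_pgd_entry(boot_hyp_pgd, hyp_idmap_start);
  free_hyp_pgd_entry(boot_hyp_pgd, TRAMPOLINE_VA);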

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/include/asm/kvm_mmu.h |  2 +-
 arch/arm/kvm/arm.c             |  2 +-
 arch/arm/kvm/mmu.c             | 30 +++++++++++++++++-------------
 3 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 3c71a1d..92eb20d 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -32,7 +32,7 @@
 
 int create_hyp_mappings(void *from, void *to);
 int create_hyp_io_mappings(void *from, void *to, phys_addr_t);
-void free_hyp_pmds(void);
+void free_hyp_pgds(void);
 
 int kvm_alloc_stage2_pgd(struct kvm *kvm);
 void kvm_free_stage2_pgd(struct kvm *kvm);
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 47cfcc3..8d44ef4 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -937,7 +937,7 @@ static int init_hyp_mode(void)
 out_free_context:
 	free_percpu(kvm_host_cpu_state);
 out_free_mappings:
-	free_hyp_pmds();
+	free_hyp_pgds();
 out_free_stack_pages:
 	for_each_possible_cpu(cpu)
 		free_page(per_cpu(kvm_arm_hyp_stack_page, cpu));
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index bfc5927..c4236e4 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -86,42 +86,46 @@ static void free_ptes(pmd_t *pmd, unsigned long addr)
 	}
 }
 
-static void free_hyp_pgd_entry(unsigned long addr)
+static void free_hyp_pgd_entry(pgd_t *pgdp, unsigned long addr)
 {
 	pgd_t *pgd;
 	pud_t *pud;
 	pmd_t *pmd;
-	unsigned long hyp_addr = KERN_TO_HYP(addr);
 
-	pgd = hyp_pgd + pgd_index(hyp_addr);
-	pud = pud_offset(pgd, hyp_addr);
+	pgd = pgdp + pgd_index(addr);
+	pud = pud_offset(pgd, addr);
 
 	if (pud_none(*pud))
 		return;
 	BUG_ON(pud_bad(*pud));
 
-	pmd = pmd_offset(pud, hyp_addr);
+	pmd = pmd_offset(pud, addr);
 	free_ptes(pmd, addr);
 	pmd_free(NULL, pmd);
 	pud_clear(pud);
 }
 
 /**
- * free_hyp_pmds - free a Hyp-mode level-2 tables and child level-3 tables
+ * free_hyp_pgds - free Hyp-mode page tables
  *
- * Assumes this is a page table used strictly in Hyp-mode and therefore contains
+ * Assumes hyp_pgd is a page table used strictly in Hyp-mode and therefore contains
  * either mappings in the kernel memory area (above PAGE_OFFSET), or
  * device mappings in the vmalloc range (from VMALLOC_START to VMALLOC_END).
  */
-void free_hyp_pmds(void)
+void free_hyp_pgds(void)
 {
 	unsigned long addr;
 
 	mutex_lock(&kvm_hyp_pgd_mutex);
-	for (addr = PAGE_OFFSET; virt_addr_valid(addr); addr += PGDIR_SIZE)
-		free_hyp_pgd_entry(addr);
-	for (addr = VMALLOC_START; is_vmalloc_addr((void*)addr); addr += PGDIR_SIZE)
-		free_hyp_pgd_entry(addr);
+
+	if (hyp_pgd) {
+		for (addr = PAGE_OFFSET; virt_addr_valid(addr); addr += PGDIR_SIZE)
+			free_hyp_pgd_entry(hyp_pgd, KERN_TO_HYP(addr));
+		for (addr = VMALLOC_START; is_vmalloc_addr((void*)addr); addr += PGDIR_SIZE)
+			free_hyp_pgd_entry(hyp_pgd, KERN_TO_HYP(addr));
+		kfree(hyp_pgd);
+	}
+
 	mutex_unlock(&kvm_hyp_pgd_mutex);
 }
 
@@ -741,7 +745,7 @@ int kvm_mmu_init(void)
 
 	return 0;
 out:
-	kfree(hyp_pgd);
+	free_hyp_pgds();
 	return err;
 }
 
-- 
1.8.1.4

* [PATCH v2 6/7] ARM: KVM: switch to a dual-step HYP init code
  2013-04-08 15:36 [PATCH v2 0/7] ARM: KVM: Revamping the HYP init code for fun and profit Marc Zyngier
                   ` (4 preceding siblings ...)
  2013-04-08 15:36 ` [PATCH v2 5/7] ARM: KVM: parametrize HYP page table freeing Marc Zyngier
@ 2013-04-08 15:36 ` Marc Zyngier
  2013-04-09  9:18   ` Will Deacon
  2013-04-08 15:36 ` [PATCH v2 7/7] ARM: KVM: perform HYP initialization for hotplugged CPUs Marc Zyngier
  2013-04-09  5:42 ` [PATCH v2 0/7] ARM: KVM: Revamping the HYP init code for fun and profit Christoffer Dall
  7 siblings, 1 reply; 13+ messages in thread
From: Marc Zyngier @ 2013-04-08 15:36 UTC (permalink / raw)
  To: linux-arm-kernel

Our HYP init code suffers from two major design issues:
- it cannot support CPU hotplug, as we tear down the idmap very early
- it cannot perform a TLB invalidation when switching from init to
  runtime mappings, as pages are manipulated from PL1 exclusively

The hotplug problem mandates that we keep two sets of page tables
(boot and runtime). The TLB problem mandates that we're able to
transition from one PGD to another while in HYP, invalidating the TLBs
in the process.

To be able to do this, we need to share a page between the two page
tables. A page that will have the same VA in both configurations. All we
need is a VA that has the following properties:
- This VA can't be used to represent a kernel mapping.
- This VA will not conflict with the physical address of the kernel text

The vectors page seems to satisfy this requirement:
- The kernel never maps anything else there
- The kernel text being copied at the beginning of the physical memory,
  it is unlikely to use the last 64kB (I doubt we'll ever support KVM
  on a system with something like 4MB of RAM, but patches are very
  welcome).

Let's call this VA the trampoline VA.

Now, we map our init page at 3 locations:
- idmap in the boot pgd
- trampoline VA in the boot pgd
- trampoline VA in the runtime pgd

The init scenario is now the following:
- We jump in HYP with four parameters: boot HYP pgd, runtime HYP pgd,
  runtime stack, runtime vectors
- Enable the MMU with the boot pgd
- Jump to a target into the trampoline page (remember, this is the same
  physical page!)
- Now switch to the runtime pgd (same VA, and still the same physical
  page!)
- Invalidate TLBs
- Set stack and vectors
- Profit! (or eret, if you only care about the code).

Note that we keep the boot mapping permanently (it is not strictly an
idmap anymore) to allow for CPU hotplug in later patches.
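
The "jump to a target into the trampoline page" step works because all
three mappings point at the same physical page, so only the offset within
the page matters. The bfi sequence in init.S below computes, in C terms
(target_pa standing for the address the code is currently running from):

  /* keep the page offset of the target, retarget it at the trampoline VA */
  jump_va = (TRAMPOLINE_VA & PAGE_MASK) | (target_pa & ~PAGE_MASK);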

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/include/asm/kvm_host.h |  18 ++++--
 arch/arm/include/asm/kvm_mmu.h  |  24 +++++++-
 arch/arm/kvm/arm.c              |  11 ++--
 arch/arm/kvm/init.S             |  44 +++++++++++++-
 arch/arm/kvm/mmu.c              | 129 ++++++++++++++++++++++++++++------------
 5 files changed, 173 insertions(+), 53 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index a856cc2..9fe22ee 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -190,22 +190,32 @@ int kvm_arm_coproc_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *);
 int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
 		int exception_index);
 
-static inline void __cpu_init_hyp_mode(unsigned long long pgd_ptr,
+static inline void __cpu_init_hyp_mode(unsigned long long boot_pgd_ptr,
+				       unsigned long long pgd_ptr,
 				       unsigned long hyp_stack_ptr,
 				       unsigned long vector_ptr)
 {
 	unsigned long pgd_low, pgd_high;
 
-	pgd_low = (pgd_ptr & ((1ULL << 32) - 1));
-	pgd_high = (pgd_ptr >> 32ULL);
+	pgd_low = (boot_pgd_ptr & ((1ULL << 32) - 1));
+	pgd_high = (boot_pgd_ptr >> 32ULL);
 
 	/*
 	 * Call initialization code, and switch to the full blown
 	 * HYP code. The init code doesn't need to preserve these registers as
-	 * r1-r3 and r12 are already callee save according to the AAPCS.
+	 * r1-r3 and r12 are already callee saved according to the AAPCS.
 	 * Note that we slightly misuse the prototype by casing the pgd_low to
 	 * a void *.
+	 *
+	 * We don't have enough registers to perform the full init in one go.
+	 * Install the boot PGD first, and then install the runtime PGD,
+	 * stack pointer and vectors.
 	 */
+	kvm_call_hyp((void *)pgd_low, pgd_high, 0, 0);
+
+	pgd_low = (pgd_ptr & ((1ULL << 32) - 1));
+	pgd_high = (pgd_ptr >> 32ULL);
+
 	kvm_call_hyp((void *)pgd_low, pgd_high, hyp_stack_ptr, vector_ptr);
 }
 
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 92eb20d..24b767a 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -19,17 +19,29 @@
 #ifndef __ARM_KVM_MMU_H__
 #define __ARM_KVM_MMU_H__
 
-#include <asm/cacheflush.h>
-#include <asm/pgalloc.h>
+#include <asm/memory.h>
+#include <asm/page.h>
 
 /*
  * We directly use the kernel VA for the HYP, as we can directly share
  * the mapping (HTTBR "covers" TTBR1).
  */
-#define HYP_PAGE_OFFSET_MASK	(~0UL)
+#define HYP_PAGE_OFFSET_MASK	UL(~0)
 #define HYP_PAGE_OFFSET		PAGE_OFFSET
 #define KERN_TO_HYP(kva)	(kva)
 
+/*
+ * Our virtual mapping for the boot-time MMU-enable code. Must be
+ * shared across all the page-tables. Conveniently, we use the vectors
+ * page, where no kernel data will ever be shared with HYP.
+ */
+#define TRAMPOLINE_VA		UL(CONFIG_VECTORS_BASE)
+
+#ifndef __ASSEMBLY__
+
+#include <asm/cacheflush.h>
+#include <asm/pgalloc.h>
+
 int create_hyp_mappings(void *from, void *to);
 int create_hyp_io_mappings(void *from, void *to, phys_addr_t);
 void free_hyp_pgds(void);
@@ -44,6 +56,8 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run);
 void kvm_mmu_free_memory_caches(struct kvm_vcpu *vcpu);
 
 phys_addr_t kvm_mmu_get_httbr(void);
+phys_addr_t kvm_mmu_get_boot_httbr(void);
+phys_addr_t kvm_get_idmap_vector(void);
 int kvm_mmu_init(void);
 void kvm_clear_hyp_idmap(void);
 
@@ -113,4 +127,8 @@ static inline void coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
 	}
 }
 
+#define kvm_flush_dcache_to_poc(a,l)	__cpuc_flush_dcache_area((a), (l))
+
+#endif	/* !__ASSEMBLY__ */
+
 #endif /* __ARM_KVM_MMU_H__ */
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 8d44ef4..9010a12 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -796,20 +796,22 @@ long kvm_arch_vm_ioctl(struct file *filp,
 
 static void cpu_init_hyp_mode(void *vector)
 {
+	unsigned long long boot_pgd_ptr;
 	unsigned long long pgd_ptr;
 	unsigned long hyp_stack_ptr;
 	unsigned long stack_page;
 	unsigned long vector_ptr;
 
 	/* Switch from the HYP stub to our own HYP init vector */
-	__hyp_set_vectors((unsigned long)vector);
+	__hyp_set_vectors(kvm_get_idmap_vector());
 
+	boot_pgd_ptr = (unsigned long long)kvm_mmu_get_boot_httbr();
 	pgd_ptr = (unsigned long long)kvm_mmu_get_httbr();
 	stack_page = __get_cpu_var(kvm_arm_hyp_stack_page);
 	hyp_stack_ptr = stack_page + PAGE_SIZE;
 	vector_ptr = (unsigned long)__kvm_hyp_vector;
 
-	__cpu_init_hyp_mode(pgd_ptr, hyp_stack_ptr, vector_ptr);
+	__cpu_init_hyp_mode(boot_pgd_ptr, pgd_ptr, hyp_stack_ptr, vector_ptr);
 }
 
 /**
@@ -863,11 +865,6 @@ static int init_hyp_mode(void)
 	}
 
 	/*
-	 * Unmap the identity mapping
-	 */
-	kvm_clear_hyp_idmap();
-
-	/*
 	 * Map the Hyp-code called directly from the host
 	 */
 	err = create_hyp_mappings(__kvm_hyp_code_start, __kvm_hyp_code_end);
diff --git a/arch/arm/kvm/init.S b/arch/arm/kvm/init.S
index 9f37a79..0ca054f 100644
--- a/arch/arm/kvm/init.S
+++ b/arch/arm/kvm/init.S
@@ -21,6 +21,7 @@
 #include <asm/asm-offsets.h>
 #include <asm/kvm_asm.h>
 #include <asm/kvm_arm.h>
+#include <asm/kvm_mmu.h>
 
 /********************************************************************
  * Hypervisor initialization
@@ -28,6 +29,25 @@
  *       r0,r1 = Hypervisor pgd pointer
  *       r2 = top of Hyp stack (kernel VA)
  *       r3 = pointer to hyp vectors
+ *
+ * The init scenario is:
+ * - We jump in HYP with four parameters: boot HYP pgd, runtime HYP pgd,
+ *   runtime stack, runtime vectors
+ * - Enable the MMU with the boot pgd
+ * - Jump to a target into the trampoline page (remember, this is the same
+ *   physical page!)
+ * - Now switch to the runtime pgd (same VA, and still the same physical
+ *   page!)
+ * - Invalidate TLBs
+ * - Set stack and vectors
+ * - Profit! (or eret, if you only care about the code).
+ *
+ * As we only have four registers available to pass parameters (and we
+ * need six), we split the init in two phases:
+ * - Phase 1: r0,r1 contain the boot PGD, r2 = 0, r3 = 0.
+ *   Provides the basic HYP init, and enable the MMU.
+ * - Phase 2: r0,r1 contain the runtime PGD, r2 = ToS, r3 = vectors
+ *   Switches to the runtime PGD, set stack and vectors.
  */
 
 	.text
@@ -47,6 +67,9 @@ __kvm_hyp_init:
 	W(b)	.
 
 __do_hyp_init:
+	cmp	r2, #0			@ We have a SP?
+	bne	phase2			@ Yes, second stage init
+
 	@ Set the HTTBR to point to the hypervisor PGD pointer passed
 	mcrr	p15, 4, r0, r1, c2
 
@@ -96,14 +119,31 @@ __do_hyp_init:
 	orr	r0, r0, r1
 	isb
 	mcr	p15, 4, r0, c1, c0, 0	@ HSCR
-	isb
 
-	@ Set stack pointer and return to the kernel
+	@ End of init phase-1
+	eret
+
+phase2:
+	@ Set stack pointer
 	mov	sp, r2
 
 	@ Set HVBAR to point to the HYP vectors
 	mcr	p15, 4, r3, c12, c0, 0	@ HVBAR
 
+	@ Jump to the trampoline page
+	ldr	r2, =TRAMPOLINE_VA
+	adr	r3, target
+	bfi	r2, r3, #0, #PAGE_SHIFT
+	mov	pc, r2
+
+target:	@ We're now in the trampoline code, switch page tables
+	mcrr	p15, 4, r0, r1, c2
+	isb
+
+	@ Invalidate the old TLBs
+	mcr	p15, 4, r0, c8, c7, 0	@ TLBIALLH
+	dsb
+
 	eret
 
 	.ltorg
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index c4236e4..b20eff2 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -32,9 +32,15 @@
 
 extern char  __hyp_idmap_text_start[], __hyp_idmap_text_end[];
 
+static pgd_t *boot_hyp_pgd;
 static pgd_t *hyp_pgd;
 static DEFINE_MUTEX(kvm_hyp_pgd_mutex);
 
+static void *init_bounce_page;
+static unsigned long hyp_idmap_start;
+static unsigned long hyp_idmap_end;
+static phys_addr_t hyp_idmap_vector;
+
 static void kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
 {
 	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, kvm, ipa);
@@ -108,9 +114,12 @@ static void free_hyp_pgd_entry(pgd_t *pgdp, unsigned long addr)
 /**
  * free_hyp_pgds - free Hyp-mode page tables
  *
- * Assumes hyp_pgd is a page table used strictly in Hyp-mode and therefore contains
- * either mappings in the kernel memory area (above PAGE_OFFSET), or
- * device mappings in the vmalloc range (from VMALLOC_START to VMALLOC_END).
+ * Assumes hyp_pgd is a page table used strictly in Hyp-mode and
+ * therefore contains either mappings in the kernel memory area (above
+ * PAGE_OFFSET), or device mappings in the vmalloc range (from
+ * VMALLOC_START to VMALLOC_END).
+ *
+ * boot_hyp_pgd should only map two pages for the init code.
  */
 void free_hyp_pgds(void)
 {
@@ -118,6 +127,12 @@ void free_hyp_pgds(void)
 
 	mutex_lock(&kvm_hyp_pgd_mutex);
 
+	if (boot_hyp_pgd) {
+		free_hyp_pgd_entry(boot_hyp_pgd, hyp_idmap_start);
+		free_hyp_pgd_entry(boot_hyp_pgd, TRAMPOLINE_VA);
+		kfree(boot_hyp_pgd);
+	}
+
 	if (hyp_pgd) {
 		for (addr = PAGE_OFFSET; virt_addr_valid(addr); addr += PGDIR_SIZE)
 			free_hyp_pgd_entry(hyp_pgd, KERN_TO_HYP(addr));
@@ -126,6 +141,7 @@ void free_hyp_pgds(void)
 		kfree(hyp_pgd);
 	}
 
+	kfree(init_bounce_page);
 	mutex_unlock(&kvm_hyp_pgd_mutex);
 }
 
@@ -718,14 +734,62 @@ phys_addr_t kvm_mmu_get_httbr(void)
 	return virt_to_phys(hyp_pgd);
 }
 
+phys_addr_t kvm_mmu_get_boot_httbr(void)
+{
+	VM_BUG_ON(!virt_addr_valid(boot_hyp_pgd));
+	return virt_to_phys(boot_hyp_pgd);
+}
+
+phys_addr_t kvm_get_idmap_vector(void)
+{
+	return hyp_idmap_vector;
+}
+
 int kvm_mmu_init(void)
 {
-	unsigned long hyp_idmap_start = virt_to_phys(__hyp_idmap_text_start);
-	unsigned long hyp_idmap_end = virt_to_phys(__hyp_idmap_text_end);
 	int err;
 
+	hyp_idmap_start = virt_to_phys(__hyp_idmap_text_start);
+	hyp_idmap_end = virt_to_phys(__hyp_idmap_text_end);
+	hyp_idmap_vector = virt_to_phys(__kvm_hyp_init);
+
+	if ((hyp_idmap_start ^ hyp_idmap_end) & PAGE_MASK) {
+		/*
+		 * Our init code is crossing a page boundary. Allocate
+		 * a bounce page, copy the code over and use that.
+		 */
+		size_t len = __hyp_idmap_text_end - __hyp_idmap_text_start;
+		phys_addr_t phys_base;
+
+		init_bounce_page = kzalloc(PAGE_SIZE, GFP_KERNEL);
+		if (!init_bounce_page) {
+			kvm_err("Couldn't allocate HYP init bounce page\n");
+			err = -ENOMEM;
+			goto out;
+		}
+
+		memcpy(init_bounce_page, __hyp_idmap_text_start, len);
+		/*
+		 * Warning: the code we just copied to the bounce page
+		 * must be flushed to the point of coherency.
+		 * Otherwise, the data may be sitting in L2, and HYP
+		 * mode won't be able to observe it as it runs with
+		 * caches off at that point.
+		 */
+		kvm_flush_dcache_to_poc(init_bounce_page, len);
+
+		phys_base = virt_to_phys(init_bounce_page);
+		hyp_idmap_vector += phys_base - hyp_idmap_start;
+		hyp_idmap_start = phys_base;
+		hyp_idmap_end = phys_base + len;
+
+		kvm_info("Using HYP init bounce page @%lx\n",
+			 (unsigned long)phys_base);
+	}
+
 	hyp_pgd = kzalloc(PTRS_PER_PGD * sizeof(pgd_t), GFP_KERNEL);
-	if (!hyp_pgd) {
+	boot_hyp_pgd = kzalloc(PTRS_PER_PGD * sizeof(pgd_t), GFP_KERNEL);
+	if (!hyp_pgd || !boot_hyp_pgd) {
 		kvm_err("Hyp mode PGD not allocated\n");
 		err = -ENOMEM;
 		goto out;
@@ -743,39 +807,30 @@ int kvm_mmu_init(void)
 		goto out;
 	}
 
+	/* Map the very same page at the trampoline VA */
+	err = 	__create_hyp_mappings(boot_hyp_pgd,
+				      TRAMPOLINE_VA, TRAMPOLINE_VA + PAGE_SIZE,
+				      __phys_to_pfn(hyp_idmap_start),
+				      PAGE_HYP);
+	if (err) {
+		kvm_err("Failed to map trampoline @%lx into boot HYP pgd\n",
+			TRAMPOLINE_VA);
+		goto out;
+	}
+
+	/* Map the same page again into the runtime page tables */
+	err = 	__create_hyp_mappings(hyp_pgd,
+				      TRAMPOLINE_VA, TRAMPOLINE_VA + PAGE_SIZE,
+				      __phys_to_pfn(hyp_idmap_start),
+				      PAGE_HYP);
+	if (err) {
+		kvm_err("Failed to map trampoline @%lx into runtime HYP pgd\n",
+			TRAMPOLINE_VA);
+		goto out;
+	}
+
 	return 0;
 out:
 	free_hyp_pgds();
 	return err;
 }
-
-/**
- * kvm_clear_idmap - remove all idmaps from the hyp pgd
- *
- * Free the underlying pmds for all pgds in range and clear the pgds (but
- * don't free them) afterwards.
- */
-void kvm_clear_hyp_idmap(void)
-{
-	unsigned long addr, end;
-	unsigned long next;
-	pgd_t *pgd = hyp_pgd;
-	pud_t *pud;
-	pmd_t *pmd;
-
-	addr = virt_to_phys(__hyp_idmap_text_start);
-	end = virt_to_phys(__hyp_idmap_text_end);
-
-	pgd += pgd_index(addr);
-	do {
-		next = pgd_addr_end(addr, end);
-		if (pgd_none_or_clear_bad(pgd))
-			continue;
-		pud = pud_offset(pgd, addr);
-		pmd = pmd_offset(pud, addr);
-
-		pud_clear(pud);
-		kvm_clean_pmd_entry(pmd);
-		pmd_free(NULL, (pmd_t *)((unsigned long)pmd & PAGE_MASK));
-	} while (pgd++, addr = next, addr < end);
-}
-- 
1.8.1.4

* [PATCH v2 7/7] ARM: KVM: perform HYP initialization for hotplugged CPUs
  2013-04-08 15:36 [PATCH v2 0/7] ARM: KVM: Revamping the HYP init code for fun and profit Marc Zyngier
                   ` (5 preceding siblings ...)
  2013-04-08 15:36 ` [PATCH v2 6/7] ARM: KVM: switch to a dual-step HYP init code Marc Zyngier
@ 2013-04-08 15:36 ` Marc Zyngier
  2013-04-09  5:42 ` [PATCH v2 0/7] ARM: KVM: Revamping the HYP init code for fun and profit Christoffer Dall
  7 siblings, 0 replies; 13+ messages in thread
From: Marc Zyngier @ 2013-04-08 15:36 UTC (permalink / raw)
  To: linux-arm-kernel

Now that we have the necessary infrastructure to boot a hotplugged CPU
at any point in time, wire a CPU notifier that will perform the HYP
init for the incoming CPU.

Note that this depends on the platform code and/or firmware to boot the
incoming CPU with HYP mode enabled and return to the kernel by following
the normal boot path (HYP stub installed).
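
For the incoming CPU, the notifier ends up replaying exactly what the boot
CPUs did, i.e. (condensed from this patch and patch 6):

  /* runs on the hotplugged CPU, in CPU_STARTING context */
  static void cpu_init_hyp_mode(void *dummy)
  {
          /* switch from the HYP stub left by the boot path to our init vector */
          __hyp_set_vectors(kvm_get_idmap_vector());

          /* phase 1: install the boot pgd; phase 2: runtime pgd, stack, vectors */
          __cpu_init_hyp_mode(kvm_mmu_get_boot_httbr(), kvm_mmu_get_httbr(),
                              __get_cpu_var(kvm_arm_hyp_stack_page) + PAGE_SIZE,
                              (unsigned long)__kvm_hyp_vector);
  }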

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/kvm/arm.c | 45 ++++++++++++++++++++++++++++++---------------
 1 file changed, 30 insertions(+), 15 deletions(-)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 9010a12..8bee0bb 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -16,6 +16,7 @@
  * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
  */
 
+#include <linux/cpu.h>
 #include <linux/errno.h>
 #include <linux/err.h>
 #include <linux/kvm_host.h>
@@ -794,7 +795,7 @@ long kvm_arch_vm_ioctl(struct file *filp,
 	}
 }
 
-static void cpu_init_hyp_mode(void *vector)
+static void cpu_init_hyp_mode(void *dummy)
 {
 	unsigned long long boot_pgd_ptr;
 	unsigned long long pgd_ptr;
@@ -814,12 +815,28 @@ static void cpu_init_hyp_mode(void *vector)
 	__cpu_init_hyp_mode(boot_pgd_ptr, pgd_ptr, hyp_stack_ptr, vector_ptr);
 }
 
+static int hyp_init_cpu_notify(struct notifier_block *self,
+			       unsigned long action, void *cpu)
+{
+	switch (action) {
+	case CPU_STARTING:
+	case CPU_STARTING_FROZEN:
+		cpu_init_hyp_mode(NULL);
+		break;
+	}
+
+	return NOTIFY_OK;
+}
+
+static struct notifier_block hyp_init_cpu_nb = {
+	.notifier_call = hyp_init_cpu_notify,
+};
+
 /**
  * Inits Hyp-mode on all online CPUs
  */
 static int init_hyp_mode(void)
 {
-	phys_addr_t init_phys_addr;
 	int cpu;
 	int err = 0;
 
@@ -852,19 +869,6 @@ static int init_hyp_mode(void)
 	}
 
 	/*
-	 * Execute the init code on each CPU.
-	 *
-	 * Note: The stack is not mapped yet, so don't do anything else than
-	 * initializing the hypervisor mode on each CPU using a local stack
-	 * space for temporary storage.
-	 */
-	init_phys_addr = virt_to_phys(__kvm_hyp_init);
-	for_each_online_cpu(cpu) {
-		smp_call_function_single(cpu, cpu_init_hyp_mode,
-					 (void *)(long)init_phys_addr, 1);
-	}
-
-	/*
 	 * Map the Hyp-code called directly from the host
 	 */
 	err = create_hyp_mappings(__kvm_hyp_code_start, __kvm_hyp_code_end);
@@ -909,6 +913,11 @@ static int init_hyp_mode(void)
 	}
 
 	/*
+	 * Execute the init code on each CPU.
+	 */
+	on_each_cpu(cpu_init_hyp_mode, NULL, 1);
+
+	/*
 	 * Init HYP view of VGIC
 	 */
 	err = kvm_vgic_hyp_init();
@@ -964,6 +973,12 @@ int kvm_arch_init(void *opaque)
 	if (err)
 		goto out_err;
 
+	err = register_cpu_notifier(&hyp_init_cpu_nb);
+	if (err) {
+		kvm_err("Cannot register HYP init CPU notifier (%d)\n", err);
+		goto out_err;
+	}
+
 	kvm_coproc_table_init();
 	return 0;
 out_err:
-- 
1.8.1.4

* [PATCH v2 0/7] ARM: KVM: Revamping the HYP init code for fun and profit
  2013-04-08 15:36 [PATCH v2 0/7] ARM: KVM: Revamping the HYP init code for fun and profit Marc Zyngier
                   ` (6 preceding siblings ...)
  2013-04-08 15:36 ` [PATCH v2 7/7] ARM: KVM: perform HYP initialization for hotplugged CPUs Marc Zyngier
@ 2013-04-09  5:42 ` Christoffer Dall
  7 siblings, 0 replies; 13+ messages in thread
From: Christoffer Dall @ 2013-04-09  5:42 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Apr 8, 2013 at 8:36 AM, Marc Zyngier <marc.zyngier@arm.com> wrote:
> Over the past few weeks, I've gradually realized how broken our HYP
> idmap code is. Badly broken.
>
> The main problem is about supporting CPU hotplug. Imagine a CPU being
> initialized normally, running VMs, and then being powered down. So
> far, so good. Now mentally bring it back online. The CPU will come
> back via the secondary CPU boot path, and then what? We cannot use it
> anymore, because we need an idmap which is long gone, and because our
> page tables are now live, containing the world-switch code, VM
> structures, and other bits and pieces.
>
> Another fun issue is that we don't have any TLB invalidation in the
> HYP init code. And guess what? We cannot do it! HYP TLB invalidation
> has to occur in HYP, and once we've installed the runtime page tables,
> it is already too late. It is actually fairly easy to construct a
> scenario where idmap and runtime pages have colliding translations.
>
> The nail in the coffin was provided by Catalin Marinas who told me how
> much he disliked the arm64 HYP idmap code, and made me realize that we
> already have all the necessary code in arch/arm/kvm/mmu.c. It just
> needs a tiny bit of care and affection. With a chainsaw.
>
> The solution to the first two issues is a bit tricky, but doesn't
> involve a lot of code. The hotplug problem mandates that we keep two
> sets of page tables (boot and runtime). The TLB problem mandates that
> we're able to transition from one PGD to another while in HYP,
> invalidating the TLBs in the process.
>
> To be able to do this, we need to share a page between the two page
> tables. A page that will have the same VA in both configurations. All
> we need is a VA that has the following properties:
> - This VA can't be used to represent a kernel mapping.
> - This VA will not conflict with the physical address of the kernel
>   text
>
> The vectors page VA seems to satisfy this requirement:
> - The kernel never maps anything else there
> - The kernel text being copied at the beginning of the physical
>   memory, it is unlikely to use the last 64kB (I doubt we'll ever
>   support KVM on a system with something like 4MB of RAM, but patches
>   are very welcome).
>
> Let's call this VA the trampoline VA.
>
> Now, we map our init page at 3 locations:
> - idmap in the boot pgd
> - trampoline VA in the boot pgd
> - trampoline VA in the runtime pgd
>
> The init scenario is now the following:
> - We jump in HYP with four parameters: boot HYP pgd, runtime HYP pgd,
>   runtime stack, runtime vectors
> - Enable the MMU with the boot pgd
> - Jump to a target into the trampoline page (remember, this is the
>   same physical page!)
> - Now switch to the runtime pgd (same VA, and still the same physical
>   page!)
> - Invalidate TLBs
> - Set stack and vectors
> - Profit! (or eret, if you only care about the code).
>
> Once we have this infrastructure in place, supporting CPU hot-plug is
> a piece of cake. Just wire a cpu-notifier in the existing code.
>
> This has been tested on both arm (VE TC2) and arm64 (Foundation Model).
>
> From v1:
> - Reduce the mandatory alignment of the init code page, and use a
>   bounce page if the code is crossing a page boundary.
> - Add some more comments to the code (as requested by Christoffer)
> - Fixed compilation issue with older toolchains (spotted by Geoff)
>

The whole thing looks good to me,
-Christoffer

* [PATCH v2 6/7] ARM: KVM: switch to a dual-step HYP init code
  2013-04-08 15:36 ` [PATCH v2 6/7] ARM: KVM: switch to a dual-step HYP init code Marc Zyngier
@ 2013-04-09  9:18   ` Will Deacon
  2013-04-09 10:42     ` Marc Zyngier
  0 siblings, 1 reply; 13+ messages in thread
From: Will Deacon @ 2013-04-09  9:18 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Apr 08, 2013 at 04:36:42PM +0100, Marc Zyngier wrote:
> Our HYP init code suffers from two major design issues:
> - it cannot support CPU hotplug, as we tear down the idmap very early
> - it cannot perform a TLB invalidation when switching from init to
>   runtime mappings, as pages are manipulated from PL1 exclusively
> 
> The hotplug problem mandates that we keep two sets of page tables
> (boot and runtime). The TLB problem mandates that we're able to
> transition from one PGD to another while in HYP, invalidating the TLBs
> in the process.
> 
> To be able to do this, we need to share a page between the two page
> tables. A page that will have the same VA in both configurations. All we
> need is a VA that has the following properties:
> - This VA can't be used to represent a kernel mapping.
> - This VA will not conflict with the physical address of the kernel text
> 
> The vectors page seems to satisfy this requirement:
> - The kernel never maps anything else there
> - The kernel text being copied at the beginning of the physical memory,
>   it is unlikely to use the last 64kB (I doubt we'll ever support KVM
>   on a system with something like 4MB of RAM, but patches are very
>   welcome).
> 
> Let's call this VA the trampoline VA.
> 
> Now, we map our init page at 3 locations:
> - idmap in the boot pgd
> - trampoline VA in the boot pgd
> - trampoline VA in the runtime pgd
> 
> The init scenario is now the following:
> - We jump in HYP with four parameters: boot HYP pgd, runtime HYP pgd,
>   runtime stack, runtime vectors
> - Enable the MMU with the boot pgd
> - Jump to a target into the trampoline page (remember, this is the same
>   physical page!)
> - Now switch to the runtime pgd (same VA, and still the same physical
>   page!)
> - Invalidate TLBs
> - Set stack and vectors
> - Profit! (or eret, if you only care about the code).
> 
> Note that we keep the boot mapping permanently (it is not strictly an
> idmap anymore) to allow for CPU hotplug in later patches.
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm/include/asm/kvm_host.h |  18 ++++--
>  arch/arm/include/asm/kvm_mmu.h  |  24 +++++++-
>  arch/arm/kvm/arm.c              |  11 ++--
>  arch/arm/kvm/init.S             |  44 +++++++++++++-
>  arch/arm/kvm/mmu.c              | 129 ++++++++++++++++++++++++++++------------
>  5 files changed, 173 insertions(+), 53 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index a856cc2..9fe22ee 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -190,22 +190,32 @@ int kvm_arm_coproc_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *);
>  int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
>                 int exception_index);
> 
> -static inline void __cpu_init_hyp_mode(unsigned long long pgd_ptr,
> +static inline void __cpu_init_hyp_mode(unsigned long long boot_pgd_ptr,
> +                                      unsigned long long pgd_ptr,
>                                        unsigned long hyp_stack_ptr,
>                                        unsigned long vector_ptr)
>  {
>         unsigned long pgd_low, pgd_high;
> 
> -       pgd_low = (pgd_ptr & ((1ULL << 32) - 1));
> -       pgd_high = (pgd_ptr >> 32ULL);
> +       pgd_low = (boot_pgd_ptr & ((1ULL << 32) - 1));
> +       pgd_high = (boot_pgd_ptr >> 32ULL);
> 
>         /*
>          * Call initialization code, and switch to the full blown
>          * HYP code. The init code doesn't need to preserve these registers as
> -        * r1-r3 and r12 are already callee save according to the AAPCS.
> +        * r1-r3 and r12 are already callee saved according to the AAPCS.
>          * Note that we slightly misuse the prototype by casing the pgd_low to
>          * a void *.
> +        *
> +        * We don't have enough registers to perform the full init in one go.
> +        * Install the boot PGD first, and then install the runtime PGD,
> +        * stack pointer and vectors.
>          */
> +       kvm_call_hyp((void *)pgd_low, pgd_high, 0, 0);
> +
> +       pgd_low = (pgd_ptr & ((1ULL << 32) - 1));
> +       pgd_high = (pgd_ptr >> 32ULL);

It might be worth macro-ising the low/high extraction operations now that
you're using them twice. Hell, you could even make them big-endian clean!
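
One possible shape for such helpers (purely illustrative, the names below
are made up rather than taken from the thread):

  #define lo32(x)  ((u32)((u64)(x) & 0xffffffffULL))
  #define hi32(x)  ((u32)((u64)(x) >> 32))

  pgd_low  = lo32(boot_pgd_ptr);
  pgd_high = hi32(boot_pgd_ptr);

Operating on the 64-bit value arithmetically (rather than through a
pointer or union) keeps the result independent of the host endianness.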

>         kvm_call_hyp((void *)pgd_low, pgd_high, hyp_stack_ptr, vector_ptr);
>  }
> 
> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
> index 92eb20d..24b767a 100644
> --- a/arch/arm/include/asm/kvm_mmu.h
> +++ b/arch/arm/include/asm/kvm_mmu.h
> @@ -19,17 +19,29 @@
>  #ifndef __ARM_KVM_MMU_H__
>  #define __ARM_KVM_MMU_H__
> 
> -#include <asm/cacheflush.h>
> -#include <asm/pgalloc.h>
> +#include <asm/memory.h>
> +#include <asm/page.h>
> 
>  /*
>   * We directly use the kernel VA for the HYP, as we can directly share
>   * the mapping (HTTBR "covers" TTBR1).
>   */
> -#define HYP_PAGE_OFFSET_MASK   (~0UL)
> +#define HYP_PAGE_OFFSET_MASK   UL(~0)
>  #define HYP_PAGE_OFFSET                PAGE_OFFSET
>  #define KERN_TO_HYP(kva)       (kva)
> 
> +/*
> + * Our virtual mapping for the boot-time MMU-enable code. Must be
> + * shared across all the page-tables. Conveniently, we use the vectors
> + * page, where no kernel data will ever be shared with HYP.
> + */
> +#define TRAMPOLINE_VA          UL(CONFIG_VECTORS_BASE)
> +
> +#ifndef __ASSEMBLY__
> +
> +#include <asm/cacheflush.h>
> +#include <asm/pgalloc.h>
> +
>  int create_hyp_mappings(void *from, void *to);
>  int create_hyp_io_mappings(void *from, void *to, phys_addr_t);
>  void free_hyp_pgds(void);
> @@ -44,6 +56,8 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run);
>  void kvm_mmu_free_memory_caches(struct kvm_vcpu *vcpu);
> 
>  phys_addr_t kvm_mmu_get_httbr(void);
> +phys_addr_t kvm_mmu_get_boot_httbr(void);
> +phys_addr_t kvm_get_idmap_vector(void);
>  int kvm_mmu_init(void);
>  void kvm_clear_hyp_idmap(void);
> 
> @@ -113,4 +127,8 @@ static inline void coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
>         }
>  }
> 
> +#define kvm_flush_dcache_to_poc(a,l)   __cpuc_flush_dcache_area((a), (l))
> +
> +#endif /* !__ASSEMBLY__ */
> +
>  #endif /* __ARM_KVM_MMU_H__ */
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 8d44ef4..9010a12 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -796,20 +796,22 @@ long kvm_arch_vm_ioctl(struct file *filp,
> 
>  static void cpu_init_hyp_mode(void *vector)
>  {
> +       unsigned long long boot_pgd_ptr;
>         unsigned long long pgd_ptr;
>         unsigned long hyp_stack_ptr;
>         unsigned long stack_page;
>         unsigned long vector_ptr;
> 
>         /* Switch from the HYP stub to our own HYP init vector */
> -       __hyp_set_vectors((unsigned long)vector);
> +       __hyp_set_vectors(kvm_get_idmap_vector());
> 
> +       boot_pgd_ptr = (unsigned long long)kvm_mmu_get_boot_httbr();
>         pgd_ptr = (unsigned long long)kvm_mmu_get_httbr();

Strictly speaking, could you avoid using two sets of tables for this? That
is, have code in the trampoline page which remaps the rest of the address
space using the current tables? Not saying it's feasible, just interested.

>         stack_page = __get_cpu_var(kvm_arm_hyp_stack_page);
>         hyp_stack_ptr = stack_page + PAGE_SIZE;
>         vector_ptr = (unsigned long)__kvm_hyp_vector;
> 
> -       __cpu_init_hyp_mode(pgd_ptr, hyp_stack_ptr, vector_ptr);
> +       __cpu_init_hyp_mode(boot_pgd_ptr, pgd_ptr, hyp_stack_ptr, vector_ptr);
>  }
> 
>  /**
> @@ -863,11 +865,6 @@ static int init_hyp_mode(void)
>         }
> 
>         /*
> -        * Unmap the identity mapping
> -        */
> -       kvm_clear_hyp_idmap();
> -
> -       /*
>          * Map the Hyp-code called directly from the host
>          */
>         err = create_hyp_mappings(__kvm_hyp_code_start, __kvm_hyp_code_end);
> diff --git a/arch/arm/kvm/init.S b/arch/arm/kvm/init.S
> index 9f37a79..0ca054f 100644
> --- a/arch/arm/kvm/init.S
> +++ b/arch/arm/kvm/init.S
> @@ -21,6 +21,7 @@
>  #include <asm/asm-offsets.h>
>  #include <asm/kvm_asm.h>
>  #include <asm/kvm_arm.h>
> +#include <asm/kvm_mmu.h>
> 
>  /********************************************************************
>   * Hypervisor initialization
> @@ -28,6 +29,25 @@
>   *       r0,r1 = Hypervisor pgd pointer
>   *       r2 = top of Hyp stack (kernel VA)
>   *       r3 = pointer to hyp vectors
> + *
> + * The init scenario is:
> + * - We jump in HYP with four parameters: boot HYP pgd, runtime HYP pgd,
> + *   runtime stack, runtime vectors
> + * - Enable the MMU with the boot pgd
> + * - Jump to a target into the trampoline page (remember, this is the same
> + *   physical page!)
> + * - Now switch to the runtime pgd (same VA, and still the same physical
> + *   page!)
> + * - Invalidate TLBs
> + * - Set stack and vectors
> + * - Profit! (or eret, if you only care about the code).
> + *
> + * As we only have four registers available to pass parameters (and we
> + * need six), we split the init in two phases:
> + * - Phase 1: r0,r1 contain the boot PGD, r2 = 0, r3 = 0.
> + *   Provides the basic HYP init, and enable the MMU.
> + * - Phase 2: r0,r1 contain the runtime PGD, r2 = ToS, r3 = vectors
> + *   Switches to the runtime PGD, set stack and vectors.
>   */
> 
>         .text
> @@ -47,6 +67,9 @@ __kvm_hyp_init:
>         W(b)    .
> 
>  __do_hyp_init:
> +       cmp     r2, #0                  @ We have a SP?
> +       bne     phase2                  @ Yes, second stage init
> +
>         @ Set the HTTBR to point to the hypervisor PGD pointer passed
>         mcrr    p15, 4, r0, r1, c2
> 
> @@ -96,14 +119,31 @@ __do_hyp_init:
>         orr     r0, r0, r1
>         isb
>         mcr     p15, 4, r0, c1, c0, 0   @ HSCR
> -       isb
> 
> -       @ Set stack pointer and return to the kernel
> +       @ End of init phase-1
> +       eret
> +
> +phase2:
> +       @ Set stack pointer
>         mov     sp, r2
> 
>         @ Set HVBAR to point to the HYP vectors
>         mcr     p15, 4, r3, c12, c0, 0  @ HVBAR
> 
> +       @ Jump to the trampoline page
> +       ldr     r2, =TRAMPOLINE_VA
> +       adr     r3, target
> +       bfi     r2, r3, #0, #PAGE_SHIFT
> +       mov     pc, r2
> +
> +target:        @ We're now in the trampoline code, switch page tables
> +       mcrr    p15, 4, r0, r1, c2
> +       isb
> +
> +       @ Invalidate the old TLBs
> +       mcr     p15, 4, r0, c8, c7, 0   @ TLBIALLH
> +       dsb
> +
>         eret
> 
>         .ltorg
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index c4236e4..b20eff2 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -32,9 +32,15 @@
> 
>  extern char  __hyp_idmap_text_start[], __hyp_idmap_text_end[];
> 
> +static pgd_t *boot_hyp_pgd;
>  static pgd_t *hyp_pgd;
>  static DEFINE_MUTEX(kvm_hyp_pgd_mutex);
> 
> +static void *init_bounce_page;
> +static unsigned long hyp_idmap_start;
> +static unsigned long hyp_idmap_end;
> +static phys_addr_t hyp_idmap_vector;
> +
>  static void kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
>  {
>         kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, kvm, ipa);
> @@ -108,9 +114,12 @@ static void free_hyp_pgd_entry(pgd_t *pgdp, unsigned long addr)
>  /**
>   * free_hyp_pgds - free Hyp-mode page tables
>   *
> - * Assumes hyp_pgd is a page table used strictly in Hyp-mode and therefore contains
> - * either mappings in the kernel memory area (above PAGE_OFFSET), or
> - * device mappings in the vmalloc range (from VMALLOC_START to VMALLOC_END).
> + * Assumes hyp_pgd is a page table used strictly in Hyp-mode and
> + * therefore contains either mappings in the kernel memory area (above
> + * PAGE_OFFSET), or device mappings in the vmalloc range (from
> + * VMALLOC_START to VMALLOC_END).
> + *
> + * boot_hyp_pgd should only map two pages for the init code.
>   */
>  void free_hyp_pgds(void)
>  {
> @@ -118,6 +127,12 @@ void free_hyp_pgds(void)
> 
>         mutex_lock(&kvm_hyp_pgd_mutex);
> 
> +       if (boot_hyp_pgd) {
> +               free_hyp_pgd_entry(boot_hyp_pgd, hyp_idmap_start);
> +               free_hyp_pgd_entry(boot_hyp_pgd, TRAMPOLINE_VA);
> +               kfree(boot_hyp_pgd);
> +       }
> +
>         if (hyp_pgd) {
>                 for (addr = PAGE_OFFSET; virt_addr_valid(addr); addr += PGDIR_SIZE)
>                         free_hyp_pgd_entry(hyp_pgd, KERN_TO_HYP(addr));
> @@ -126,6 +141,7 @@ void free_hyp_pgds(void)
>                 kfree(hyp_pgd);
>         }
> 
> +       kfree(init_bounce_page);
>         mutex_unlock(&kvm_hyp_pgd_mutex);
>  }
> 
> @@ -718,14 +734,62 @@ phys_addr_t kvm_mmu_get_httbr(void)
>         return virt_to_phys(hyp_pgd);
>  }
> 
> +phys_addr_t kvm_mmu_get_boot_httbr(void)
> +{
> +       VM_BUG_ON(!virt_addr_valid(boot_hyp_pgd));

Seems a bit OTT -- it's always going to be in lowmem.

> +       return virt_to_phys(boot_hyp_pgd);
> +}
> +
> +phys_addr_t kvm_get_idmap_vector(void)
> +{
> +       return hyp_idmap_vector;
> +}
> +
>  int kvm_mmu_init(void)
>  {
> -       unsigned long hyp_idmap_start = virt_to_phys(__hyp_idmap_text_start);
> -       unsigned long hyp_idmap_end = virt_to_phys(__hyp_idmap_text_end);
>         int err;
> 
> +       hyp_idmap_start = virt_to_phys(__hyp_idmap_text_start);
> +       hyp_idmap_end = virt_to_phys(__hyp_idmap_text_end);
> +       hyp_idmap_vector = virt_to_phys(__kvm_hyp_init);
> +
> +       if ((hyp_idmap_start ^ hyp_idmap_end) & PAGE_MASK) {
> +               /*
> +                * Our init code is crossing a page boundary. Allocate
> +                * a bounce page, copy the code over and use that.
> +                */
> +               size_t len = __hyp_idmap_text_end - __hyp_idmap_text_start;
> +               phys_addr_t phys_base;
> +
> +               init_bounce_page = kzalloc(PAGE_SIZE, GFP_KERNEL);
> +               if (!init_bounce_page) {
> +                       kvm_err("Couldn't allocate HYP init bounce page\n");
> +                       err = -ENOMEM;
> +                       goto out;
> +               }

Given that you don't really need a lowmem page for the bounce page, this
might be better expressed using alloc_page and kmap for the memcpy.

I also wonder whether or not you can eventually free the page and
corresponding mappings if !CONFIG_HOTPLUG_CPU?

> +               memcpy(init_bounce_page, __hyp_idmap_text_start, len);
> +               /*
> +                * Warning: the code we just copied to the bounce page
> +                * must be flushed to the point of coherency.
> +                * Otherwise, the data may be sitting in L2, and HYP
> +                * mode won't be able to observe it as it runs with
> +                * caches off at that point.
> +                */
> +               kvm_flush_dcache_to_poc(init_bounce_page, len);
> +
> +               phys_base = virt_to_phys(init_bounce_page);
> +               hyp_idmap_vector += phys_base - hyp_idmap_start;
> +               hyp_idmap_start = phys_base;
> +               hyp_idmap_end = phys_base + len;
> +
> +               kvm_info("Using HYP init bounce page @%lx\n",
> +                        (unsigned long)phys_base);
> +       }
> +
>         hyp_pgd = kzalloc(PTRS_PER_PGD * sizeof(pgd_t), GFP_KERNEL);
> -       if (!hyp_pgd) {
> +       boot_hyp_pgd = kzalloc(PTRS_PER_PGD * sizeof(pgd_t), GFP_KERNEL);
> +       if (!hyp_pgd || !boot_hyp_pgd) {
>                 kvm_err("Hyp mode PGD not allocated\n");
>                 err = -ENOMEM;
>                 goto out;
> @@ -743,39 +807,30 @@ int kvm_mmu_init(void)
>                 goto out;
>         }
> 
> +       /* Map the very same page at the trampoline VA */
> +       err =   __create_hyp_mappings(boot_hyp_pgd,
> +                                     TRAMPOLINE_VA, TRAMPOLINE_VA + PAGE_SIZE,
> +                                     __phys_to_pfn(hyp_idmap_start),
> +                                     PAGE_HYP);
> +       if (err) {
> +               kvm_err("Failed to map trampoline @%lx into boot HYP pgd\n",
> +                       TRAMPOLINE_VA);
> +               goto out;
> +       }
> +
> +       /* Map the same page again into the runtime page tables */
> +       err =   __create_hyp_mappings(hyp_pgd,
> +                                     TRAMPOLINE_VA, TRAMPOLINE_VA + PAGE_SIZE,
> +                                     __phys_to_pfn(hyp_idmap_start),
> +                                     PAGE_HYP);
> +       if (err) {
> +               kvm_err("Failed to map trampoline @%lx into runtime HYP pgd\n",
> +                       TRAMPOLINE_VA);
> +               goto out;
> +       }

I'm probably just missing it, but where are the updated page tables flushed
to memory?

Will

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v2 6/7] ARM: KVM: switch to a dual-step HYP init code
  2013-04-09  9:18   ` Will Deacon
@ 2013-04-09 10:42     ` Marc Zyngier
  2013-04-09 18:28       ` Christoffer Dall
  0 siblings, 1 reply; 13+ messages in thread
From: Marc Zyngier @ 2013-04-09 10:42 UTC (permalink / raw)
  To: linux-arm-kernel

On 09/04/13 10:18, Will Deacon wrote:
> On Mon, Apr 08, 2013 at 04:36:42PM +0100, Marc Zyngier wrote:
>> Our HYP init code suffers from two major design issues:
>> - it cannot support CPU hotplug, as we tear down the idmap very early
>> - it cannot perform a TLB invalidation when switching from init to
>>   runtime mappings, as pages are manipulated from PL1 exclusively
>>
>> The hotplug problem mandates that we keep two sets of page tables
>> (boot and runtime). The TLB problem mandates that we're able to
>> transition from one PGD to another while in HYP, invalidating the TLBs
>> in the process.
>>
>> To be able to do this, we need to share a page between the two page
>> tables. A page that will have the same VA in both configurations. All we
>> need is a VA that has the following properties:
>> - This VA can't be used to represent a kernel mapping.
>> - This VA will not conflict with the physical address of the kernel text
>>
>> The vectors page seems to satisfy this requirement:
>> - The kernel never maps anything else there
>> - The kernel text being copied at the beginning of the physical memory,
>>   it is unlikely to use the last 64kB (I doubt we'll ever support KVM
>>   on a system with something like 4MB of RAM, but patches are very
>>   welcome).
>>
>> Let's call this VA the trampoline VA.
>>
>> Now, we map our init page at 3 locations:
>> - idmap in the boot pgd
>> - trampoline VA in the boot pgd
>> - trampoline VA in the runtime pgd
>>
>> The init scenario is now the following:
>> - We jump in HYP with four parameters: boot HYP pgd, runtime HYP pgd,
>>   runtime stack, runtime vectors
>> - Enable the MMU with the boot pgd
>> - Jump to a target into the trampoline page (remember, this is the same
>>   physical page!)
>> - Now switch to the runtime pgd (same VA, and still the same physical
>>   page!)
>> - Invalidate TLBs
>> - Set stack and vectors
>> - Profit! (or eret, if you only care about the code).
>>
>> Note that we keep the boot mapping permanently (it is not strictly an
>> idmap anymore) to allow for CPU hotplug in later patches.
>>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> ---
>>  arch/arm/include/asm/kvm_host.h |  18 ++++--
>>  arch/arm/include/asm/kvm_mmu.h  |  24 +++++++-
>>  arch/arm/kvm/arm.c              |  11 ++--
>>  arch/arm/kvm/init.S             |  44 +++++++++++++-
>>  arch/arm/kvm/mmu.c              | 129 ++++++++++++++++++++++++++++------------
>>  5 files changed, 173 insertions(+), 53 deletions(-)
>>
>> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
>> index a856cc2..9fe22ee 100644
>> --- a/arch/arm/include/asm/kvm_host.h
>> +++ b/arch/arm/include/asm/kvm_host.h
>> @@ -190,22 +190,32 @@ int kvm_arm_coproc_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *);
>>  int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
>>                 int exception_index);
>>
>> -static inline void __cpu_init_hyp_mode(unsigned long long pgd_ptr,
>> +static inline void __cpu_init_hyp_mode(unsigned long long boot_pgd_ptr,
>> +                                      unsigned long long pgd_ptr,
>>                                        unsigned long hyp_stack_ptr,
>>                                        unsigned long vector_ptr)
>>  {
>>         unsigned long pgd_low, pgd_high;
>>
>> -       pgd_low = (pgd_ptr & ((1ULL << 32) - 1));
>> -       pgd_high = (pgd_ptr >> 32ULL);
>> +       pgd_low = (boot_pgd_ptr & ((1ULL << 32) - 1));
>> +       pgd_high = (boot_pgd_ptr >> 32ULL);
>>
>>         /*
>>          * Call initialization code, and switch to the full blown
>>          * HYP code. The init code doesn't need to preserve these registers as
>> -        * r1-r3 and r12 are already callee save according to the AAPCS.
>> +        * r1-r3 and r12 are already callee saved according to the AAPCS.
>>          * Note that we slightly misuse the prototype by casing the pgd_low to
>>          * a void *.
>> +        *
>> +        * We don't have enough registers to perform the full init in one go.
>> +        * Install the boot PGD first, and then install the runtime PGD,
>> +        * stack pointer and vectors.
>>          */
>> +       kvm_call_hyp((void *)pgd_low, pgd_high, 0, 0);
>> +
>> +       pgd_low = (pgd_ptr & ((1ULL << 32) - 1));
>> +       pgd_high = (pgd_ptr >> 32ULL);
> 
> It might be worth macro-ising the low/high extraction operations now that
> you're using them twice. Hell, you could even make them big-endian clean!

Now you're talking! ;-)

I actually wonder if we could rearrange the code to let the compiler do
the low/high split... Quickly looking through the PCS, it looks like
this would actually work.
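
Something like this, purely as a sketch (none of it is in the posted
patch): if kvm_call_hyp took a 64-bit first argument, the AAPCS would
already place it in r0/r1 for us and the manual split could go away:

	/* hypothetical prototype -- the compiler handles the r0/r1 split */
	u64 kvm_call_hyp(u64 pgd_ptr, unsigned long r2, unsigned long r3);

	kvm_call_hyp(boot_pgd_ptr, 0, 0);
	kvm_call_hyp(pgd_ptr, hyp_stack_ptr, vector_ptr);

Failing that, the extraction macros you suggest could be as simple as:

	/* hypothetical helpers, endianness left as an exercise */
	#define PGD_LOW(x)	((unsigned long)((x) & 0xffffffffULL))
	#define PGD_HIGH(x)	((unsigned long)((x) >> 32))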

>>         kvm_call_hyp((void *)pgd_low, pgd_high, hyp_stack_ptr, vector_ptr);
>>  }
>>
>> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
>> index 92eb20d..24b767a 100644
>> --- a/arch/arm/include/asm/kvm_mmu.h
>> +++ b/arch/arm/include/asm/kvm_mmu.h
>> @@ -19,17 +19,29 @@
>>  #ifndef __ARM_KVM_MMU_H__
>>  #define __ARM_KVM_MMU_H__
>>
>> -#include <asm/cacheflush.h>
>> -#include <asm/pgalloc.h>
>> +#include <asm/memory.h>
>> +#include <asm/page.h>
>>
>>  /*
>>   * We directly use the kernel VA for the HYP, as we can directly share
>>   * the mapping (HTTBR "covers" TTBR1).
>>   */
>> -#define HYP_PAGE_OFFSET_MASK   (~0UL)
>> +#define HYP_PAGE_OFFSET_MASK   UL(~0)
>>  #define HYP_PAGE_OFFSET                PAGE_OFFSET
>>  #define KERN_TO_HYP(kva)       (kva)
>>
>> +/*
>> + * Our virtual mapping for the boot-time MMU-enable code. Must be
>> + * shared across all the page-tables. Conveniently, we use the vectors
>> + * page, where no kernel data will ever be shared with HYP.
>> + */
>> +#define TRAMPOLINE_VA          UL(CONFIG_VECTORS_BASE)
>> +
>> +#ifndef __ASSEMBLY__
>> +
>> +#include <asm/cacheflush.h>
>> +#include <asm/pgalloc.h>
>> +
>>  int create_hyp_mappings(void *from, void *to);
>>  int create_hyp_io_mappings(void *from, void *to, phys_addr_t);
>>  void free_hyp_pgds(void);
>> @@ -44,6 +56,8 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run);
>>  void kvm_mmu_free_memory_caches(struct kvm_vcpu *vcpu);
>>
>>  phys_addr_t kvm_mmu_get_httbr(void);
>> +phys_addr_t kvm_mmu_get_boot_httbr(void);
>> +phys_addr_t kvm_get_idmap_vector(void);
>>  int kvm_mmu_init(void);
>>  void kvm_clear_hyp_idmap(void);
>>
>> @@ -113,4 +127,8 @@ static inline void coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
>>         }
>>  }
>>
>> +#define kvm_flush_dcache_to_poc(a,l)   __cpuc_flush_dcache_area((a), (l))
>> +
>> +#endif /* !__ASSEMBLY__ */
>> +
>>  #endif /* __ARM_KVM_MMU_H__ */
>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>> index 8d44ef4..9010a12 100644
>> --- a/arch/arm/kvm/arm.c
>> +++ b/arch/arm/kvm/arm.c
>> @@ -796,20 +796,22 @@ long kvm_arch_vm_ioctl(struct file *filp,
>>
>>  static void cpu_init_hyp_mode(void *vector)
>>  {
>> +       unsigned long long boot_pgd_ptr;
>>         unsigned long long pgd_ptr;
>>         unsigned long hyp_stack_ptr;
>>         unsigned long stack_page;
>>         unsigned long vector_ptr;
>>
>>         /* Switch from the HYP stub to our own HYP init vector */
>> -       __hyp_set_vectors((unsigned long)vector);
>> +       __hyp_set_vectors(kvm_get_idmap_vector());
>>
>> +       boot_pgd_ptr = (unsigned long long)kvm_mmu_get_boot_httbr();
>>         pgd_ptr = (unsigned long long)kvm_mmu_get_httbr();
> 
> Strictly speaking, could you avoid using two sets of tables for this? That
> is, have code in the trampoline page which remaps the rest of the address
> space using the current tables? Not saying it's feasible, just interested.

If you do that, you lose the ability to bring in a hotplugged CPU, as
your idmap and your runtime page tables may have conflicting
translations. Or am I missing something obvious?

>>         stack_page = __get_cpu_var(kvm_arm_hyp_stack_page);
>>         hyp_stack_ptr = stack_page + PAGE_SIZE;
>>         vector_ptr = (unsigned long)__kvm_hyp_vector;
>>
>> -       __cpu_init_hyp_mode(pgd_ptr, hyp_stack_ptr, vector_ptr);
>> +       __cpu_init_hyp_mode(boot_pgd_ptr, pgd_ptr, hyp_stack_ptr, vector_ptr);
>>  }
>>
>>  /**
>> @@ -863,11 +865,6 @@ static int init_hyp_mode(void)
>>         }
>>
>>         /*
>> -        * Unmap the identity mapping
>> -        */
>> -       kvm_clear_hyp_idmap();
>> -
>> -       /*
>>          * Map the Hyp-code called directly from the host
>>          */
>>         err = create_hyp_mappings(__kvm_hyp_code_start, __kvm_hyp_code_end);
>> diff --git a/arch/arm/kvm/init.S b/arch/arm/kvm/init.S
>> index 9f37a79..0ca054f 100644
>> --- a/arch/arm/kvm/init.S
>> +++ b/arch/arm/kvm/init.S
>> @@ -21,6 +21,7 @@
>>  #include <asm/asm-offsets.h>
>>  #include <asm/kvm_asm.h>
>>  #include <asm/kvm_arm.h>
>> +#include <asm/kvm_mmu.h>
>>
>>  /********************************************************************
>>   * Hypervisor initialization
>> @@ -28,6 +29,25 @@
>>   *       r0,r1 = Hypervisor pgd pointer
>>   *       r2 = top of Hyp stack (kernel VA)
>>   *       r3 = pointer to hyp vectors
>> + *
>> + * The init scenario is:
>> + * - We jump in HYP with four parameters: boot HYP pgd, runtime HYP pgd,
>> + *   runtime stack, runtime vectors
>> + * - Enable the MMU with the boot pgd
>> + * - Jump to a target into the trampoline page (remember, this is the same
>> + *   physical page!)
>> + * - Now switch to the runtime pgd (same VA, and still the same physical
>> + *   page!)
>> + * - Invalidate TLBs
>> + * - Set stack and vectors
>> + * - Profit! (or eret, if you only care about the code).
>> + *
>> + * As we only have four registers available to pass parameters (and we
>> + * need six), we split the init in two phases:
>> + * - Phase 1: r0,r1 contain the boot PGD, r2 = 0, r3 = 0.
>> + *   Provides the basic HYP init, and enable the MMU.
>> + * - Phase 2: r0,r1 contain the runtime PGD, r2 = ToS, r3 = vectors
>> + *   Switches to the runtime PGD, set stack and vectors.
>>   */
>>
>>         .text
>> @@ -47,6 +67,9 @@ __kvm_hyp_init:
>>         W(b)    .
>>
>>  __do_hyp_init:
>> +       cmp     r2, #0                  @ We have a SP?
>> +       bne     phase2                  @ Yes, second stage init
>> +
>>         @ Set the HTTBR to point to the hypervisor PGD pointer passed
>>         mcrr    p15, 4, r0, r1, c2
>>
>> @@ -96,14 +119,31 @@ __do_hyp_init:
>>         orr     r0, r0, r1
>>         isb
>>         mcr     p15, 4, r0, c1, c0, 0   @ HSCR
>> -       isb
>>
>> -       @ Set stack pointer and return to the kernel
>> +       @ End of init phase-1
>> +       eret
>> +
>> +phase2:
>> +       @ Set stack pointer
>>         mov     sp, r2
>>
>>         @ Set HVBAR to point to the HYP vectors
>>         mcr     p15, 4, r3, c12, c0, 0  @ HVBAR
>>
>> +       @ Jump to the trampoline page
>> +       ldr     r2, =TRAMPOLINE_VA
>> +       adr     r3, target
>> +       bfi     r2, r3, #0, #PAGE_SHIFT
>> +       mov     pc, r2
>> +
>> +target:        @ We're now in the trampoline code, switch page tables
>> +       mcrr    p15, 4, r0, r1, c2
>> +       isb
>> +
>> +       @ Invalidate the old TLBs
>> +       mcr     p15, 4, r0, c8, c7, 0   @ TLBIALLH
>> +       dsb
>> +
>>         eret
>>
>>         .ltorg
>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index c4236e4..b20eff2 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -32,9 +32,15 @@
>>
>>  extern char  __hyp_idmap_text_start[], __hyp_idmap_text_end[];
>>
>> +static pgd_t *boot_hyp_pgd;
>>  static pgd_t *hyp_pgd;
>>  static DEFINE_MUTEX(kvm_hyp_pgd_mutex);
>>
>> +static void *init_bounce_page;
>> +static unsigned long hyp_idmap_start;
>> +static unsigned long hyp_idmap_end;
>> +static phys_addr_t hyp_idmap_vector;
>> +
>>  static void kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
>>  {
>>         kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, kvm, ipa);
>> @@ -108,9 +114,12 @@ static void free_hyp_pgd_entry(pgd_t *pgdp, unsigned long addr)
>>  /**
>>   * free_hyp_pgds - free Hyp-mode page tables
>>   *
>> - * Assumes hyp_pgd is a page table used strictly in Hyp-mode and therefore contains
>> - * either mappings in the kernel memory area (above PAGE_OFFSET), or
>> - * device mappings in the vmalloc range (from VMALLOC_START to VMALLOC_END).
>> + * Assumes hyp_pgd is a page table used strictly in Hyp-mode and
>> + * therefore contains either mappings in the kernel memory area (above
>> + * PAGE_OFFSET), or device mappings in the vmalloc range (from
>> + * VMALLOC_START to VMALLOC_END).
>> + *
>> + * boot_hyp_pgd should only map two pages for the init code.
>>   */
>>  void free_hyp_pgds(void)
>>  {
>> @@ -118,6 +127,12 @@ void free_hyp_pgds(void)
>>
>>         mutex_lock(&kvm_hyp_pgd_mutex);
>>
>> +       if (boot_hyp_pgd) {
>> +               free_hyp_pgd_entry(boot_hyp_pgd, hyp_idmap_start);
>> +               free_hyp_pgd_entry(boot_hyp_pgd, TRAMPOLINE_VA);
>> +               kfree(boot_hyp_pgd);
>> +       }
>> +
>>         if (hyp_pgd) {
>>                 for (addr = PAGE_OFFSET; virt_addr_valid(addr); addr += PGDIR_SIZE)
>>                         free_hyp_pgd_entry(hyp_pgd, KERN_TO_HYP(addr));
>> @@ -126,6 +141,7 @@ void free_hyp_pgds(void)
>>                 kfree(hyp_pgd);
>>         }
>>
>> +       kfree(init_bounce_page);
>>         mutex_unlock(&kvm_hyp_pgd_mutex);
>>  }
>>
>> @@ -718,14 +734,62 @@ phys_addr_t kvm_mmu_get_httbr(void)
>>         return virt_to_phys(hyp_pgd);
>>  }
>>
>> +phys_addr_t kvm_mmu_get_boot_httbr(void)
>> +{
>> +       VM_BUG_ON(!virt_addr_valid(boot_hyp_pgd));
> 
> Seems a bit OTT -- it's always going to be in lowmem.

Indeed.

>> +       return virt_to_phys(boot_hyp_pgd);
>> +}
>> +
>> +phys_addr_t kvm_get_idmap_vector(void)
>> +{
>> +       return hyp_idmap_vector;
>> +}
>> +
>>  int kvm_mmu_init(void)
>>  {
>> -       unsigned long hyp_idmap_start = virt_to_phys(__hyp_idmap_text_start);
>> -       unsigned long hyp_idmap_end = virt_to_phys(__hyp_idmap_text_end);
>>         int err;
>>
>> +       hyp_idmap_start = virt_to_phys(__hyp_idmap_text_start);
>> +       hyp_idmap_end = virt_to_phys(__hyp_idmap_text_end);
>> +       hyp_idmap_vector = virt_to_phys(__kvm_hyp_init);
>> +
>> +       if ((hyp_idmap_start ^ hyp_idmap_end) & PAGE_MASK) {
>> +               /*
>> +                * Our init code is crossing a page boundary. Allocate
>> +                * a bounce page, copy the code over and use that.
>> +                */
>> +               size_t len = __hyp_idmap_text_end - __hyp_idmap_text_start;
>> +               phys_addr_t phys_base;
>> +
>> +               init_bounce_page = kzalloc(PAGE_SIZE, GFP_KERNEL);
>> +               if (!init_bounce_page) {
>> +                       kvm_err("Couldn't allocate HYP init bounce page\n");
>> +                       err = -ENOMEM;
>> +                       goto out;
>> +               }
> 
> Given that you don't really need a lowmem page for the bounce page, this
> might be better expressed using alloc_page and kmap for the memcpy.

I'm a bit dubious about that. We have to make sure that the memory is
within the 4GB range, and the only flag I can spot for alloc_page is
GFP_DMA32, which is not exactly what we want, even if it may work.

And yes, we have a problem with platforms having *all* their memory
above 4GB.
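
For the record, the alloc_page/kmap variant you describe would look
roughly like the following (just a sketch, and it assumes GFP_DMA32
actually keeps the page below 4GB, which is exactly the part I'm
unsure about):

	struct page *page = alloc_page(GFP_KERNEL | GFP_DMA32);
	void *va;

	if (!page) {
		err = -ENOMEM;
		goto out;
	}

	va = kmap(page);
	memcpy(va, __hyp_idmap_text_start, len);
	kvm_flush_dcache_to_poc(va, len);	/* HYP runs with caches off */
	kunmap(page);
	phys_base = page_to_phys(page);

and it still wouldn't help platforms with all their memory above 4GB.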

> I also wonder whether or not you can eventually free the page and
> corresponding mappings if !CONFIG_HOTPLUG_CPU?

This is indeed possible, as we're sure nothing will use the boot page
tables once all CPUs have switched to the runtime page tables.
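
If we want to go there, something along these lines would do (untested
sketch, probably run as a late initcall), reusing what free_hyp_pgds()
already does for the boot entries:

	#ifndef CONFIG_HOTPLUG_CPU
		/* no CPU will ever run the init code again */
		free_hyp_pgd_entry(boot_hyp_pgd, hyp_idmap_start);
		free_hyp_pgd_entry(boot_hyp_pgd, TRAMPOLINE_VA);
		kfree(boot_hyp_pgd);
		boot_hyp_pgd = NULL;
		kfree(init_bounce_page);
		init_bounce_page = NULL;
	#endif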

>> +               memcpy(init_bounce_page, __hyp_idmap_text_start, len);
>> +               /*
>> +                * Warning: the code we just copied to the bounce page
>> +                * must be flushed to the point of coherency.
>> +                * Otherwise, the data may be sitting in L2, and HYP
>> +                * mode won't be able to observe it as it runs with
>> +                * caches off at that point.
>> +                */
>> +               kvm_flush_dcache_to_poc(init_bounce_page, len);
>> +
>> +               phys_base = virt_to_phys(init_bounce_page);
>> +               hyp_idmap_vector += phys_base - hyp_idmap_start;
>> +               hyp_idmap_start = phys_base;
>> +               hyp_idmap_end = phys_base + len;
>> +
>> +               kvm_info("Using HYP init bounce page @%lx\n",
>> +                        (unsigned long)phys_base);
>> +       }
>> +
>>         hyp_pgd = kzalloc(PTRS_PER_PGD * sizeof(pgd_t), GFP_KERNEL);
>> -       if (!hyp_pgd) {
>> +       boot_hyp_pgd = kzalloc(PTRS_PER_PGD * sizeof(pgd_t), GFP_KERNEL);
>> +       if (!hyp_pgd || !boot_hyp_pgd) {
>>                 kvm_err("Hyp mode PGD not allocated\n");
>>                 err = -ENOMEM;
>>                 goto out;
>> @@ -743,39 +807,30 @@ int kvm_mmu_init(void)
>>                 goto out;
>>         }
>>
>> +       /* Map the very same page at the trampoline VA */
>> +       err =   __create_hyp_mappings(boot_hyp_pgd,
>> +                                     TRAMPOLINE_VA, TRAMPOLINE_VA + PAGE_SIZE,
>> +                                     __phys_to_pfn(hyp_idmap_start),
>> +                                     PAGE_HYP);
>> +       if (err) {
>> +               kvm_err("Failed to map trampoline @%lx into boot HYP pgd\n",
>> +                       TRAMPOLINE_VA);
>> +               goto out;
>> +       }
>> +
>> +       /* Map the same page again into the runtime page tables */
>> +       err =   __create_hyp_mappings(hyp_pgd,
>> +                                     TRAMPOLINE_VA, TRAMPOLINE_VA + PAGE_SIZE,
>> +                                     __phys_to_pfn(hyp_idmap_start),
>> +                                     PAGE_HYP);
>> +       if (err) {
>> +               kvm_err("Failed to map trampoline @%lx into runtime HYP pgd\n",
>> +                       TRAMPOLINE_VA);
>> +               goto out;
>> +       }
> 
> I'm probably just missing it, but where are the updated page tables flushed
> to memory?

Mumble mumble... We have some partial flush, but clearly not enough of
it to be completely safe. Good spotting.
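
The obvious shape of the fix (sketch only, exact placement to be sorted
out for v3) is to clean the tables to the PoC once they've been
populated, for example:

	/* HYP runs with caches off during init, make the tables visible */
	kvm_flush_dcache_to_poc(boot_hyp_pgd, PTRS_PER_PGD * sizeof(pgd_t));
	kvm_flush_dcache_to_poc(hyp_pgd, PTRS_PER_PGD * sizeof(pgd_t));

with the lower-level tables needing the same treatment.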

I'll update the series and send out a v3.

Thanks for reviewing.

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v2 6/7] ARM: KVM: switch to a dual-step HYP init code
  2013-04-09 10:42     ` Marc Zyngier
@ 2013-04-09 18:28       ` Christoffer Dall
  2013-04-10  8:09         ` Marc Zyngier
  0 siblings, 1 reply; 13+ messages in thread
From: Christoffer Dall @ 2013-04-09 18:28 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Apr 9, 2013 at 3:42 AM, Marc Zyngier <marc.zyngier@arm.com> wrote:
> On 09/04/13 10:18, Will Deacon wrote:
>> On Mon, Apr 08, 2013 at 04:36:42PM +0100, Marc Zyngier wrote:
>>> Our HYP init code suffers from two major design issues:
>>> - it cannot support CPU hotplug, as we tear down the idmap very early
>>> - it cannot perform a TLB invalidation when switching from init to
>>>   runtime mappings, as pages are manipulated from PL1 exclusively
>>>
>>> The hotplug problem mandates that we keep two sets of page tables
>>> (boot and runtime). The TLB problem mandates that we're able to
>>> transition from one PGD to another while in HYP, invalidating the TLBs
>>> in the process.
>>>
>>> To be able to do this, we need to share a page between the two page
>>> tables. A page that will have the same VA in both configurations. All we
>>> need is a VA that has the following properties:
>>> - This VA can't be used to represent a kernel mapping.
>>> - This VA will not conflict with the physical address of the kernel text
>>>
>>> The vectors page seems to satisfy this requirement:
>>> - The kernel never maps anything else there
>>> - The kernel text being copied at the beginning of the physical memory,
>>>   it is unlikely to use the last 64kB (I doubt we'll ever support KVM
>>>   on a system with something like 4MB of RAM, but patches are very
>>>   welcome).
>>>
>>> Let's call this VA the trampoline VA.
>>>
>>> Now, we map our init page at 3 locations:
>>> - idmap in the boot pgd
>>> - trampoline VA in the boot pgd
>>> - trampoline VA in the runtime pgd
>>>
>>> The init scenario is now the following:
>>> - We jump in HYP with four parameters: boot HYP pgd, runtime HYP pgd,
>>>   runtime stack, runtime vectors
>>> - Enable the MMU with the boot pgd
>>> - Jump to a target into the trampoline page (remember, this is the same
>>>   physical page!)
>>> - Now switch to the runtime pgd (same VA, and still the same physical
>>>   page!)
>>> - Invalidate TLBs
>>> - Set stack and vectors
>>> - Profit! (or eret, if you only care about the code).
>>>
>>> Note that we keep the boot mapping permanently (it is not strictly an
>>> idmap anymore) to allow for CPU hotplug in later patches.
>>>
>>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>>> ---
>>>  arch/arm/include/asm/kvm_host.h |  18 ++++--
>>>  arch/arm/include/asm/kvm_mmu.h  |  24 +++++++-
>>>  arch/arm/kvm/arm.c              |  11 ++--
>>>  arch/arm/kvm/init.S             |  44 +++++++++++++-
>>>  arch/arm/kvm/mmu.c              | 129 ++++++++++++++++++++++++++++------------
>>>  5 files changed, 173 insertions(+), 53 deletions(-)
>>>
>>> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
>>> index a856cc2..9fe22ee 100644
>>> --- a/arch/arm/include/asm/kvm_host.h
>>> +++ b/arch/arm/include/asm/kvm_host.h
>>> @@ -190,22 +190,32 @@ int kvm_arm_coproc_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *);
>>>  int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
>>>                 int exception_index);
>>>
>>> -static inline void __cpu_init_hyp_mode(unsigned long long pgd_ptr,
>>> +static inline void __cpu_init_hyp_mode(unsigned long long boot_pgd_ptr,
>>> +                                      unsigned long long pgd_ptr,
>>>                                        unsigned long hyp_stack_ptr,
>>>                                        unsigned long vector_ptr)
>>>  {
>>>         unsigned long pgd_low, pgd_high;
>>>
>>> -       pgd_low = (pgd_ptr & ((1ULL << 32) - 1));
>>> -       pgd_high = (pgd_ptr >> 32ULL);
>>> +       pgd_low = (boot_pgd_ptr & ((1ULL << 32) - 1));
>>> +       pgd_high = (boot_pgd_ptr >> 32ULL);
>>>
>>>         /*
>>>          * Call initialization code, and switch to the full blown
>>>          * HYP code. The init code doesn't need to preserve these registers as
>>> -        * r1-r3 and r12 are already callee save according to the AAPCS.
>>> +        * r1-r3 and r12 are already callee saved according to the AAPCS.
>>>          * Note that we slightly misuse the prototype by casing the pgd_low to
>>>          * a void *.
>>> +        *
>>> +        * We don't have enough registers to perform the full init in one go.
>>> +        * Install the boot PGD first, and then install the runtime PGD,
>>> +        * stack pointer and vectors.
>>>          */
>>> +       kvm_call_hyp((void *)pgd_low, pgd_high, 0, 0);
>>> +
>>> +       pgd_low = (pgd_ptr & ((1ULL << 32) - 1));
>>> +       pgd_high = (pgd_ptr >> 32ULL);
>>
>> It might be worth macro-ising the low/high extraction operations now that
>> you're using them twice. Hell, you could even make them big-endian clean!
>
> Now you're talking! ;-)
>
> I actually wonder if we could rearrange the code to let the compiler do
> the low/high split... Quickly looking through the PCS, it looks like
> this would actually work.
>
>>>         kvm_call_hyp((void *)pgd_low, pgd_high, hyp_stack_ptr, vector_ptr);
>>>  }
>>>
>>> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
>>> index 92eb20d..24b767a 100644
>>> --- a/arch/arm/include/asm/kvm_mmu.h
>>> +++ b/arch/arm/include/asm/kvm_mmu.h
>>> @@ -19,17 +19,29 @@
>>>  #ifndef __ARM_KVM_MMU_H__
>>>  #define __ARM_KVM_MMU_H__
>>>
>>> -#include <asm/cacheflush.h>
>>> -#include <asm/pgalloc.h>
>>> +#include <asm/memory.h>
>>> +#include <asm/page.h>
>>>
>>>  /*
>>>   * We directly use the kernel VA for the HYP, as we can directly share
>>>   * the mapping (HTTBR "covers" TTBR1).
>>>   */
>>> -#define HYP_PAGE_OFFSET_MASK   (~0UL)
>>> +#define HYP_PAGE_OFFSET_MASK   UL(~0)
>>>  #define HYP_PAGE_OFFSET                PAGE_OFFSET
>>>  #define KERN_TO_HYP(kva)       (kva)
>>>
>>> +/*
>>> + * Our virtual mapping for the boot-time MMU-enable code. Must be
>>> + * shared across all the page-tables. Conveniently, we use the vectors
>>> + * page, where no kernel data will ever be shared with HYP.
>>> + */
>>> +#define TRAMPOLINE_VA          UL(CONFIG_VECTORS_BASE)
>>> +
>>> +#ifndef __ASSEMBLY__
>>> +
>>> +#include <asm/cacheflush.h>
>>> +#include <asm/pgalloc.h>
>>> +
>>>  int create_hyp_mappings(void *from, void *to);
>>>  int create_hyp_io_mappings(void *from, void *to, phys_addr_t);
>>>  void free_hyp_pgds(void);
>>> @@ -44,6 +56,8 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run);
>>>  void kvm_mmu_free_memory_caches(struct kvm_vcpu *vcpu);
>>>
>>>  phys_addr_t kvm_mmu_get_httbr(void);
>>> +phys_addr_t kvm_mmu_get_boot_httbr(void);
>>> +phys_addr_t kvm_get_idmap_vector(void);
>>>  int kvm_mmu_init(void);
>>>  void kvm_clear_hyp_idmap(void);
>>>
>>> @@ -113,4 +127,8 @@ static inline void coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
>>>         }
>>>  }
>>>
>>> +#define kvm_flush_dcache_to_poc(a,l)   __cpuc_flush_dcache_area((a), (l))
>>> +
>>> +#endif /* !__ASSEMBLY__ */
>>> +
>>>  #endif /* __ARM_KVM_MMU_H__ */
>>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>>> index 8d44ef4..9010a12 100644
>>> --- a/arch/arm/kvm/arm.c
>>> +++ b/arch/arm/kvm/arm.c
>>> @@ -796,20 +796,22 @@ long kvm_arch_vm_ioctl(struct file *filp,
>>>
>>>  static void cpu_init_hyp_mode(void *vector)
>>>  {
>>> +       unsigned long long boot_pgd_ptr;
>>>         unsigned long long pgd_ptr;
>>>         unsigned long hyp_stack_ptr;
>>>         unsigned long stack_page;
>>>         unsigned long vector_ptr;
>>>
>>>         /* Switch from the HYP stub to our own HYP init vector */
>>> -       __hyp_set_vectors((unsigned long)vector);
>>> +       __hyp_set_vectors(kvm_get_idmap_vector());
>>>
>>> +       boot_pgd_ptr = (unsigned long long)kvm_mmu_get_boot_httbr();
>>>         pgd_ptr = (unsigned long long)kvm_mmu_get_httbr();
>>
>> Strictly speaking, could you avoid using two sets of tables for this? That
>> is, have code in the trampoline page which remaps the rest of the address
>> space using the current tables? Not saying it's feasible, just interested.
>
> If you do that, you lose the ability to bring in a hotplugged CPU, as
> your idmap and your runtime page tables may have conflicting
> translations. Or am I missing something obvious?
>

I don't think you're missing anything. There are two requirements: the
idmap and the context switch, and as long as we keep the same VA in Hyp
as in kernel mode (modulo the offset for arm64), I don't think there's
any way around having those two tables.

>>>         stack_page = __get_cpu_var(kvm_arm_hyp_stack_page);
>>>         hyp_stack_ptr = stack_page + PAGE_SIZE;
>>>         vector_ptr = (unsigned long)__kvm_hyp_vector;
>>>
>>> -       __cpu_init_hyp_mode(pgd_ptr, hyp_stack_ptr, vector_ptr);
>>> +       __cpu_init_hyp_mode(boot_pgd_ptr, pgd_ptr, hyp_stack_ptr, vector_ptr);
>>>  }
>>>
>>>  /**
>>> @@ -863,11 +865,6 @@ static int init_hyp_mode(void)
>>>         }
>>>
>>>         /*
>>> -        * Unmap the identity mapping
>>> -        */
>>> -       kvm_clear_hyp_idmap();
>>> -
>>> -       /*
>>>          * Map the Hyp-code called directly from the host
>>>          */
>>>         err = create_hyp_mappings(__kvm_hyp_code_start, __kvm_hyp_code_end);
>>> diff --git a/arch/arm/kvm/init.S b/arch/arm/kvm/init.S
>>> index 9f37a79..0ca054f 100644
>>> --- a/arch/arm/kvm/init.S
>>> +++ b/arch/arm/kvm/init.S
>>> @@ -21,6 +21,7 @@
>>>  #include <asm/asm-offsets.h>
>>>  #include <asm/kvm_asm.h>
>>>  #include <asm/kvm_arm.h>
>>> +#include <asm/kvm_mmu.h>
>>>
>>>  /********************************************************************
>>>   * Hypervisor initialization
>>> @@ -28,6 +29,25 @@
>>>   *       r0,r1 = Hypervisor pgd pointer
>>>   *       r2 = top of Hyp stack (kernel VA)
>>>   *       r3 = pointer to hyp vectors
>>> + *
>>> + * The init scenario is:
>>> + * - We jump in HYP with four parameters: boot HYP pgd, runtime HYP pgd,
>>> + *   runtime stack, runtime vectors
>>> + * - Enable the MMU with the boot pgd
>>> + * - Jump to a target into the trampoline page (remember, this is the same
>>> + *   physical page!)
>>> + * - Now switch to the runtime pgd (same VA, and still the same physical
>>> + *   page!)
>>> + * - Invalidate TLBs
>>> + * - Set stack and vectors
>>> + * - Profit! (or eret, if you only care about the code).
>>> + *
>>> + * As we only have four registers available to pass parameters (and we
>>> + * need six), we split the init in two phases:
>>> + * - Phase 1: r0,r1 contain the boot PGD, r2 = 0, r3 = 0.
>>> + *   Provides the basic HYP init, and enable the MMU.
>>> + * - Phase 2: r0,r1 contain the runtime PGD, r2 = ToS, r3 = vectors
>>> + *   Switches to the runtime PGD, set stack and vectors.
>>>   */
>>>
>>>         .text
>>> @@ -47,6 +67,9 @@ __kvm_hyp_init:
>>>         W(b)    .
>>>
>>>  __do_hyp_init:
>>> +       cmp     r2, #0                  @ We have a SP?
>>> +       bne     phase2                  @ Yes, second stage init
>>> +
>>>         @ Set the HTTBR to point to the hypervisor PGD pointer passed
>>>         mcrr    p15, 4, r0, r1, c2
>>>
>>> @@ -96,14 +119,31 @@ __do_hyp_init:
>>>         orr     r0, r0, r1
>>>         isb
>>>         mcr     p15, 4, r0, c1, c0, 0   @ HSCR
>>> -       isb
>>>
>>> -       @ Set stack pointer and return to the kernel
>>> +       @ End of init phase-1
>>> +       eret
>>> +
>>> +phase2:
>>> +       @ Set stack pointer
>>>         mov     sp, r2
>>>
>>>         @ Set HVBAR to point to the HYP vectors
>>>         mcr     p15, 4, r3, c12, c0, 0  @ HVBAR
>>>
>>> +       @ Jump to the trampoline page
>>> +       ldr     r2, =TRAMPOLINE_VA
>>> +       adr     r3, target
>>> +       bfi     r2, r3, #0, #PAGE_SHIFT
>>> +       mov     pc, r2
>>> +
>>> +target:        @ We're now in the trampoline code, switch page tables
>>> +       mcrr    p15, 4, r0, r1, c2
>>> +       isb
>>> +
>>> +       @ Invalidate the old TLBs
>>> +       mcr     p15, 4, r0, c8, c7, 0   @ TLBIALLH
>>> +       dsb
>>> +
>>>         eret
>>>
>>>         .ltorg
>>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>>> index c4236e4..b20eff2 100644
>>> --- a/arch/arm/kvm/mmu.c
>>> +++ b/arch/arm/kvm/mmu.c
>>> @@ -32,9 +32,15 @@
>>>
>>>  extern char  __hyp_idmap_text_start[], __hyp_idmap_text_end[];
>>>
>>> +static pgd_t *boot_hyp_pgd;
>>>  static pgd_t *hyp_pgd;
>>>  static DEFINE_MUTEX(kvm_hyp_pgd_mutex);
>>>
>>> +static void *init_bounce_page;
>>> +static unsigned long hyp_idmap_start;
>>> +static unsigned long hyp_idmap_end;
>>> +static phys_addr_t hyp_idmap_vector;
>>> +
>>>  static void kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
>>>  {
>>>         kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, kvm, ipa);
>>> @@ -108,9 +114,12 @@ static void free_hyp_pgd_entry(pgd_t *pgdp, unsigned long addr)
>>>  /**
>>>   * free_hyp_pgds - free Hyp-mode page tables
>>>   *
>>> - * Assumes hyp_pgd is a page table used strictly in Hyp-mode and therefore contains
>>> - * either mappings in the kernel memory area (above PAGE_OFFSET), or
>>> - * device mappings in the vmalloc range (from VMALLOC_START to VMALLOC_END).
>>> + * Assumes hyp_pgd is a page table used strictly in Hyp-mode and
>>> + * therefore contains either mappings in the kernel memory area (above
>>> + * PAGE_OFFSET), or device mappings in the vmalloc range (from
>>> + * VMALLOC_START to VMALLOC_END).
>>> + *
>>> + * boot_hyp_pgd should only map two pages for the init code.
>>>   */
>>>  void free_hyp_pgds(void)
>>>  {
>>> @@ -118,6 +127,12 @@ void free_hyp_pgds(void)
>>>
>>>         mutex_lock(&kvm_hyp_pgd_mutex);
>>>
>>> +       if (boot_hyp_pgd) {
>>> +               free_hyp_pgd_entry(boot_hyp_pgd, hyp_idmap_start);
>>> +               free_hyp_pgd_entry(boot_hyp_pgd, TRAMPOLINE_VA);
>>> +               kfree(boot_hyp_pgd);
>>> +       }
>>> +
>>>         if (hyp_pgd) {
>>>                 for (addr = PAGE_OFFSET; virt_addr_valid(addr); addr += PGDIR_SIZE)
>>>                         free_hyp_pgd_entry(hyp_pgd, KERN_TO_HYP(addr));
>>> @@ -126,6 +141,7 @@ void free_hyp_pgds(void)
>>>                 kfree(hyp_pgd);
>>>         }
>>>
>>> +       kfree(init_bounce_page);
>>>         mutex_unlock(&kvm_hyp_pgd_mutex);
>>>  }
>>>
>>> @@ -718,14 +734,62 @@ phys_addr_t kvm_mmu_get_httbr(void)
>>>         return virt_to_phys(hyp_pgd);
>>>  }
>>>
>>> +phys_addr_t kvm_mmu_get_boot_httbr(void)
>>> +{
>>> +       VM_BUG_ON(!virt_addr_valid(boot_hyp_pgd));
>>
>> Seems a bit OTT -- it's always going to be in lowmem.
>
> Indeed.
>
>>> +       return virt_to_phys(boot_hyp_pgd);
>>> +}
>>> +
>>> +phys_addr_t kvm_get_idmap_vector(void)
>>> +{
>>> +       return hyp_idmap_vector;
>>> +}
>>> +
>>>  int kvm_mmu_init(void)
>>>  {
>>> -       unsigned long hyp_idmap_start = virt_to_phys(__hyp_idmap_text_start);
>>> -       unsigned long hyp_idmap_end = virt_to_phys(__hyp_idmap_text_end);
>>>         int err;
>>>
>>> +       hyp_idmap_start = virt_to_phys(__hyp_idmap_text_start);
>>> +       hyp_idmap_end = virt_to_phys(__hyp_idmap_text_end);
>>> +       hyp_idmap_vector = virt_to_phys(__kvm_hyp_init);
>>> +
>>> +       if ((hyp_idmap_start ^ hyp_idmap_end) & PAGE_MASK) {
>>> +               /*
>>> +                * Our init code is crossing a page boundary. Allocate
>>> +                * a bounce page, copy the code over and use that.
>>> +                */
>>> +               size_t len = __hyp_idmap_text_end - __hyp_idmap_text_start;
>>> +               phys_addr_t phys_base;
>>> +
>>> +               init_bounce_page = kzalloc(PAGE_SIZE, GFP_KERNEL);
>>> +               if (!init_bounce_page) {
>>> +                       kvm_err("Couldn't allocate HYP init bounce page\n");
>>> +                       err = -ENOMEM;
>>> +                       goto out;
>>> +               }
>>
>> Given that you don't really need a lowmem page for the bounce page, this
>> might be better expressed using alloc_page and kmap for the memcpy.
>
> I'm a bit dubious about that. We have to make sure that the memory is
> within the 4GB range, and the only flag I can spot for alloc_page is
> GFP_DMA32, which is not exactly what we want, even if it may work.
>
> And yes, we have a problem with platforms having *all* their memory
> above 4GB.
>

Now that we're picking at this, do we really need to memset an entire
page to zero? I know it's nice for debugging, but it is really
unnecessary and would slow down boot ever so slightly, no?

>> I also wonder whether or not you can eventually free the page and
>> corresponding mappings if !CONFIG_HOTPLUG_CPU?
>
> This is indeed possible, as we're sure nothing will use the boot page
> tables once all CPUs have switched to the runtime page tables.
>
>>> +               memcpy(init_bounce_page, __hyp_idmap_text_start, len);
>>> +               /*
>>> +                * Warning: the code we just copied to the bounce page
>>> +                * must be flushed to the point of coherency.
>>> +                * Otherwise, the data may be sitting in L2, and HYP
>>> +                * mode won't be able to observe it as it runs with
>>> +                * caches off at that point.
>>> +                */
>>> +               kvm_flush_dcache_to_poc(init_bounce_page, len);
>>> +
>>> +               phys_base = virt_to_phys(init_bounce_page);
>>> +               hyp_idmap_vector += phys_base - hyp_idmap_start;
>>> +               hyp_idmap_start = phys_base;
>>> +               hyp_idmap_end = phys_base + len;
>>> +
>>> +               kvm_info("Using HYP init bounce page @%lx\n",
>>> +                        (unsigned long)phys_base);
>>> +       }
>>> +
>>>         hyp_pgd = kzalloc(PTRS_PER_PGD * sizeof(pgd_t), GFP_KERNEL);
>>> -       if (!hyp_pgd) {
>>> +       boot_hyp_pgd = kzalloc(PTRS_PER_PGD * sizeof(pgd_t), GFP_KERNEL);
>>> +       if (!hyp_pgd || !boot_hyp_pgd) {
>>>                 kvm_err("Hyp mode PGD not allocated\n");
>>>                 err = -ENOMEM;
>>>                 goto out;
>>> @@ -743,39 +807,30 @@ int kvm_mmu_init(void)
>>>                 goto out;
>>>         }
>>>
>>> +       /* Map the very same page at the trampoline VA */
>>> +       err =   __create_hyp_mappings(boot_hyp_pgd,
>>> +                                     TRAMPOLINE_VA, TRAMPOLINE_VA + PAGE_SIZE,
>>> +                                     __phys_to_pfn(hyp_idmap_start),
>>> +                                     PAGE_HYP);
>>> +       if (err) {
>>> +               kvm_err("Failed to map trampoline @%lx into boot HYP pgd\n",
>>> +                       TRAMPOLINE_VA);
>>> +               goto out;
>>> +       }
>>> +
>>> +       /* Map the same page again into the runtime page tables */
>>> +       err =   __create_hyp_mappings(hyp_pgd,
>>> +                                     TRAMPOLINE_VA, TRAMPOLINE_VA + PAGE_SIZE,
>>> +                                     __phys_to_pfn(hyp_idmap_start),
>>> +                                     PAGE_HYP);
>>> +       if (err) {
>>> +               kvm_err("Failed to map trampoline @%lx into runtime HYP pgd\n",
>>> +                       TRAMPOLINE_VA);
>>> +               goto out;
>>> +       }
>>
>> I'm probably just missing it, but where are the updated page tables flushed
>> to memory?
>
> Mumble mumble... We have some partial flush, but clearly not enough of
> it to be completely safe. Good spotting.
>
> I'll update the series and send out a v3.
>
> Thanks for reviewing.
>
>         M.
> --
> Jazz is not dead. It just smells funny...
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v2 6/7] ARM: KVM: switch to a dual-step HYP init code
  2013-04-09 18:28       ` Christoffer Dall
@ 2013-04-10  8:09         ` Marc Zyngier
  0 siblings, 0 replies; 13+ messages in thread
From: Marc Zyngier @ 2013-04-10  8:09 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, 9 Apr 2013 11:28:07 -0700, Christoffer Dall <cdall@cs.columbia.edu> wrote:
> On Tue, Apr 9, 2013 at 3:42 AM, Marc Zyngier <marc.zyngier@arm.com> wrote:
>> On 09/04/13 10:18, Will Deacon wrote:
>>> On Mon, Apr 08, 2013 at 04:36:42PM +0100, Marc Zyngier wrote:
>>>> Our HYP init code suffers from two major design issues:
>>>> - it cannot support CPU hotplug, as we tear down the idmap very early
>>>> - it cannot perform a TLB invalidation when switching from init to
>>>>   runtime mappings, as pages are manipulated from PL1 exclusively
>>>>
>>>> The hotplug problem mandates that we keep two sets of page tables
>>>> (boot and runtime). The TLB problem mandates that we're able to
>>>> transition from one PGD to another while in HYP, invalidating the TLBs
>>>> in the process.
>>>>
>>>> To be able to do this, we need to share a page between the two page
>>>> tables. A page that will have the same VA in both configurations. All we
>>>> need is a VA that has the following properties:
>>>> - This VA can't be used to represent a kernel mapping.
>>>> - This VA will not conflict with the physical address of the kernel
>>>> text
>>>>
>>>> The vectors page seems to satisfy this requirement:
>>>> - The kernel never maps anything else there
>>>> - The kernel text being copied at the beginning of the physical memory,
>>>>   it is unlikely to use the last 64kB (I doubt we'll ever support KVM
>>>>   on a system with something like 4MB of RAM, but patches are very
>>>>   welcome).
>>>>
>>>> Let's call this VA the trampoline VA.
>>>>
>>>> Now, we map our init page at 3 locations:
>>>> - idmap in the boot pgd
>>>> - trampoline VA in the boot pgd
>>>> - trampoline VA in the runtime pgd
>>>>
>>>> The init scenario is now the following:
>>>> - We jump in HYP with four parameters: boot HYP pgd, runtime HYP pgd,
>>>>   runtime stack, runtime vectors
>>>> - Enable the MMU with the boot pgd
>>>> - Jump to a target into the trampoline page (remember, this is the same
>>>>   physical page!)
>>>> - Now switch to the runtime pgd (same VA, and still the same physical
>>>>   page!)
>>>> - Invalidate TLBs
>>>> - Set stack and vectors
>>>> - Profit! (or eret, if you only care about the code).
>>>>
>>>> Note that we keep the boot mapping permanently (it is not strictly an
>>>> idmap anymore) to allow for CPU hotplug in later patches.
>>>>
>>>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>>>> ---

[...]

>>>>  int kvm_mmu_init(void)
>>>>  {
>>>> -       unsigned long hyp_idmap_start = virt_to_phys(__hyp_idmap_text_start);
>>>> -       unsigned long hyp_idmap_end = virt_to_phys(__hyp_idmap_text_end);
>>>>         int err;
>>>>
>>>> +       hyp_idmap_start = virt_to_phys(__hyp_idmap_text_start);
>>>> +       hyp_idmap_end = virt_to_phys(__hyp_idmap_text_end);
>>>> +       hyp_idmap_vector = virt_to_phys(__kvm_hyp_init);
>>>> +
>>>> +       if ((hyp_idmap_start ^ hyp_idmap_end) & PAGE_MASK) {
>>>> +               /*
>>>> +                * Our init code is crossing a page boundary. Allocate
>>>> +                * a bounce page, copy the code over and use that.
>>>> +                */
>>>> +               size_t len = __hyp_idmap_text_end - __hyp_idmap_text_start;
>>>> +               phys_addr_t phys_base;
>>>> +
>>>> +               init_bounce_page = kzalloc(PAGE_SIZE, GFP_KERNEL);
>>>> +               if (!init_bounce_page) {
>>>> +                       kvm_err("Couldn't allocate HYP init bounce page\n");
>>>> +                       err = -ENOMEM;
>>>> +                       goto out;
>>>> +               }
>>>
>>> Given that you don't really need a lowmem page for the bounce page, this
>>> might be better expressed using alloc_page and kmap for the memcpy.
>>
>> I'm a bit dubious about that. We have to make sure that the memory is
>> within the 4GB range, and the only flag I can spot for alloc_page is
>> GFP_DMA32, which is not exactly what we want, even if it may work.
>>
>> And yes, we have a problem with platforms having *all* their memory
>> above 4GB.
>>
> 
> Now that we're picking at this, do we really need to memset an entire
> page to zero? I know it's nice for debugging, but it is really
> unnecessary and would slow down boot ever so slightly, no?

Sure, we don't need the page clearing. Not that it would appear anywhere
on the radar, but I'll turn it into a plain kmalloc call.
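
i.e. nothing fancier than (roughly):

	init_bounce_page = kmalloc(PAGE_SIZE, GFP_KERNEL);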

Thanks,

        M.
-- 
Fast, cheap, reliable. Pick two.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2013-04-10  8:09 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-04-08 15:36 [PATCH v2 0/7] ARM: KVM: Revamping the HYP init code for fun and profit Marc Zyngier
2013-04-08 15:36 ` [PATCH v2 1/7] ARM: KVM: simplify HYP mapping population Marc Zyngier
2013-04-08 15:36 ` [PATCH v2 2/7] ARM: KVM: fix HYP mapping limitations around zero Marc Zyngier
2013-04-08 15:36 ` [PATCH v2 3/7] ARM: KVM: move to a KVM provided HYP idmap Marc Zyngier
2013-04-08 15:36 ` [PATCH v2 4/7] ARM: KVM: enforce maximum size for identity mapped code Marc Zyngier
2013-04-08 15:36 ` [PATCH v2 5/7] ARM: KVM: parametrize HYP page table freeing Marc Zyngier
2013-04-08 15:36 ` [PATCH v2 6/7] ARM: KVM: switch to a dual-step HYP init code Marc Zyngier
2013-04-09  9:18   ` Will Deacon
2013-04-09 10:42     ` Marc Zyngier
2013-04-09 18:28       ` Christoffer Dall
2013-04-10  8:09         ` Marc Zyngier
2013-04-08 15:36 ` [PATCH v2 7/7] ARM: KVM: perform HYP initilization for hotplugged CPUs Marc Zyngier
2013-04-09  5:42 ` [PATCH v2 0/7] ARM: KVM: Revamping the HYP init code for fun and profit Christoffer Dall
