linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH v3 0/2] flag contiguous PTEs in linear mapping
@ 2016-02-19 17:46 Jeremy Linton
  2016-02-19 17:46 ` [PATCH v3 1/2] arm64: mm: Enable CONT_SIZE aligned sections for 64k page kernels Jeremy Linton
  2016-02-19 17:46 ` [PATCH v3 2/2] arm64: Mark kernel page ranges contiguous Jeremy Linton
  0 siblings, 2 replies; 9+ messages in thread
From: Jeremy Linton @ 2016-02-19 17:46 UTC (permalink / raw)
  To: linux-arm-kernel

This is a rebase of the previous contiguous PTEs in linear map patches
on top of Mark Rutland's fixmap changes. Those changes appear to be sufficient
to allow this patch set to boot on JunoR2, Seattle and the xgene/m400, all
of them ACPI systems. I've also done basic testing with RODATA turned on.

This patch set also adds the ability to align 64k kernels on the 2M CONT
boundary, which helps ensure that a number of the sections are completely
mapped with CONT bits. There remain a number of holes due to smaller
remapping operations that don't actually change the page protection state.
Those could be worked around in a number of cases if the code were smart
enough to detect that the break doesn't result in an actual change in
permissions. Some of these are visible in the included example, where it's
not initially obvious why a contiguous 2M region isn't marked as such until
digging into it.
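
For reference, the contiguous-range geometry behind the 2M figure is roughly
as follows. This is illustrative arithmetic with made-up EX_* names; the
kernel's own CONT_* definitions are the authoritative values.

/*
 * Contiguous-range sizes at the PTE level, per granule:
 *
 *   granule   contiguous PTEs   range covered by one TLB entry
 *   4K        16                16 * 4K   = 64K
 *   16K       128               128 * 16K = 2M
 *   64K       32                32 * 64K  = 2M
 */
#define EX_PAGE_SIZE	(64UL * 1024)			/* 64K granule */
#define EX_CONT_PTES	32UL
#define EX_CONT_SIZE	(EX_CONT_PTES * EX_PAGE_SIZE)	/* 2M */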

Changed v1->v2:
    16k kernels now use CONT_SIZE rather than SECTION_SIZE for alignment.
Changed v2->v3:
    Updated commit log and Kconfig description to explicitly state 2M alignment.
    Added Ard's Reviewed-by.

With 64k pages and section alignment enabled, the kernel mapping looks like:
---[ Kernel Mapping ]---
0xfffffe0000000000-0xfffffe0000200000           2M     RW NX SHD AF    CON     UXN MEM/NORMAL
0xfffffe0000200000-0xfffffe0001200000          16M     ro x  SHD AF    CON     UXN MEM/NORMAL
0xfffffe0001200000-0xfffffe0001400000           2M     RW NX SHD AF    CON     UXN MEM/NORMAL
0xfffffe0001400000-0xfffffe0001600000           2M     RW NX SHD AF            UXN MEM/NORMAL
0xfffffe0001600000-0xfffffe0002600000          16M     RW NX SHD AF    CON     UXN MEM/NORMAL
0xfffffe0002600000-0xfffffe0002800000           2M     RW NX SHD AF            UXN MEM/NORMAL
0xfffffe0002800000-0xfffffe0020000000         472M     RW NX SHD AF    CON     UXN MEM/NORMAL
0xfffffe0020000000-0xfffffe0060000000           1G     RW NX SHD AF        BLK UXN MEM/NORMAL
0xfffffe00600f0000-0xfffffe0060200000        1088K     RW NX SHD AF            UXN MEM/NORMAL
0xfffffe0060200000-0xfffffe0076400000         354M     RW NX SHD AF    CON     UXN MEM/NORMAL
0xfffffe0076400000-0xfffffe0076600000           2M     RW NX SHD AF            UXN MEM/NORMAL
0xfffffe0076600000-0xfffffe0078e00000          40M     RW NX SHD AF    CON     UXN MEM/NORMAL
0xfffffe00793b0000-0xfffffe0079400000         320K     RW NX SHD AF            UXN MEM/NORMAL
0xfffffe0079400000-0xfffffe007e200000          78M     RW NX SHD AF    CON     UXN MEM/NORMAL
0xfffffe007e200000-0xfffffe007e3d0000        1856K     RW NX SHD AF            UXN MEM/NORMAL
0xfffffe007e420000-0xfffffe007e600000        1920K     RW NX SHD AF            UXN MEM/NORMAL
0xfffffe007e600000-0xfffffe007f000000          10M     RW NX SHD AF    CON     UXN MEM/NORMAL
0xfffffe0800000000-0xfffffe0980000000           6G     RW NX SHD AF        BLK UXN MEM/NORMAL

With 4k pages:
---[ Kernel Mapping ]---
0xffffffc000000000-0xffffffc000200000           2M     RW NX SHD AF        BLK UXN MEM/NORMAL
0xffffffc000200000-0xffffffc001200000          16M     ro x  SHD AF        BLK UXN MEM/NORMAL
0xffffffc001200000-0xffffffc001400000           2M     RW NX SHD AF        BLK UXN MEM/NORMAL
0xffffffc001400000-0xffffffc0015c0000        1792K     RW NX SHD AF    CON     UXN MEM/NORMAL
0xffffffc0015c0000-0xffffffc0015d0000          64K     RW NX SHD AF            UXN MEM/NORMAL
0xffffffc0015d0000-0xffffffc001600000         192K     RW NX SHD AF    CON     UXN MEM/NORMAL
0xffffffc001600000-0xffffffc002600000          16M     RW NX SHD AF        BLK UXN MEM/NORMAL
0xffffffc002600000-0xffffffc002620000         128K     RW NX SHD AF    CON     UXN MEM/NORMAL
0xffffffc002620000-0xffffffc002630000          64K     RW NX SHD AF            UXN MEM/NORMAL
0xffffffc002630000-0xffffffc002800000        1856K     RW NX SHD AF    CON     UXN MEM/NORMAL
0xffffffc002800000-0xffffffc060000000        1496M     RW NX SHD AF        BLK UXN MEM/NORMAL
0xffffffc0600f0000-0xffffffc060200000        1088K     RW NX SHD AF    CON     UXN MEM/NORMAL
0xffffffc060200000-0xffffffc076400000         354M     RW NX SHD AF        BLK UXN MEM/NORMAL
0xffffffc076400000-0xffffffc076590000        1600K     RW NX SHD AF    CON     UXN MEM/NORMAL
0xffffffc076590000-0xffffffc076595000          20K     RW NX SHD AF            UXN MEM/NORMAL
0xffffffc076596000-0xffffffc0765a0000          40K     RW NX SHD AF            UXN MEM/NORMAL
0xffffffc0765a0000-0xffffffc076600000         384K     RW NX SHD AF    CON     UXN MEM/NORMAL
0xffffffc076600000-0xffffffc078e00000          40M     RW NX SHD AF        BLK UXN MEM/NORMAL
0xffffffc0793b0000-0xffffffc079400000         320K     RW NX SHD AF    CON     UXN MEM/NORMAL
0xffffffc079400000-0xffffffc07e200000          78M     RW NX SHD AF        BLK UXN MEM/NORMAL
0xffffffc07e200000-0xffffffc07e3d0000        1856K     RW NX SHD AF    CON     UXN MEM/NORMAL
0xffffffc07e420000-0xffffffc07e600000        1920K     RW NX SHD AF    CON     UXN MEM/NORMAL
0xffffffc07e600000-0xffffffc07f000000          10M     RW NX SHD AF        BLK UXN MEM/NORMAL
0xffffffc800000000-0xffffffc980000000           6G     RW NX SHD AF        BLK UXN MEM/NORMAL

Jeremy Linton (2):
  arm64: mm: Enable CONT_SIZE aligned sections for 64k page kernels.
  arm64: Mark kernel page ranges contiguous

 arch/arm64/Kconfig.debug        | 13 +++++----
 arch/arm64/kernel/vmlinux.lds.S | 11 +++----
 arch/arm64/mm/mmu.c             | 64 +++++++++++++++++++++++++++++++++++++----
 3 files changed, 71 insertions(+), 17 deletions(-)

-- 
2.4.3

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v3 1/2] arm64: mm: Enable CONT_SIZE aligned sections for 64k page kernels.
  2016-02-19 17:46 [PATCH v3 0/2] flag contiguous PTEs in linear mapping Jeremy Linton
@ 2016-02-19 17:46 ` Jeremy Linton
  2016-02-19 17:46 ` [PATCH v3 2/2] arm64: Mark kernel page ranges contiguous Jeremy Linton
  1 sibling, 0 replies; 9+ messages in thread
From: Jeremy Linton @ 2016-02-19 17:46 UTC (permalink / raw)
  To: linux-arm-kernel

This change allows ALIGN_RODATA for 64k and 16k kernels.
In the case of 64k/16k kernels it aligns to CONT_SIZE (2M)
rather than SECTION_SIZE (512M/32M respectively). This makes the option
generally more useful on contiguous-hint enabled kernels.
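
As a rough picture of what ALIGN_DEBUG_RO_MIN() works out to per
configuration with this change (a sketch from the granule arithmetic, not
generated linker output):

/*
 * Effective section alignment with CONFIG_DEBUG_ALIGN_RODATA=y:
 *
 *   4K  pages: ALIGN(SECTION_SIZE) = 2M
 *   16K pages: ALIGN(CONT_SIZE)    = 2M   (SECTION_SIZE would be 32M)
 *   64K pages: ALIGN(CONT_SIZE)    = 2M   (SECTION_SIZE would be 512M)
 *
 * i.e. every configuration now rounds the RO/NX section boundaries
 * up to 2M.
 */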

Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
---
 arch/arm64/Kconfig.debug        | 13 +++++++------
 arch/arm64/kernel/vmlinux.lds.S | 11 ++++++-----
 2 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/Kconfig.debug b/arch/arm64/Kconfig.debug
index e13c4bf..9d33d5e 100644
--- a/arch/arm64/Kconfig.debug
+++ b/arch/arm64/Kconfig.debug
@@ -59,15 +59,16 @@ config DEBUG_RODATA
           If in doubt, say Y
 
 config DEBUG_ALIGN_RODATA
-	depends on DEBUG_RODATA && ARM64_4K_PAGES
+	depends on DEBUG_RODATA
 	bool "Align linker sections up to SECTION_SIZE"
 	help
 	  If this option is enabled, sections that may potentially be marked as
-	  read only or non-executable will be aligned up to the section size of
-	  the kernel. This prevents sections from being split into pages and
-	  avoids a potential TLB penalty. The downside is an increase in
-	  alignment and potentially wasted space. Turn on this option if
-	  performance is more important than memory pressure.
+	  read only or non-executable will be aligned up to the section size
+	  or contiguous hint size of the kernel (2 MiB). This prevents sections
+	  from being split into pages and avoids a potential TLB penalty. The
+	  downside is an increase in alignment and potentially wasted space.
+	  Turn on this option if performance is more important than memory
+	  pressure.
 
 	  If in doubt, say N
 
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index b78a3c7..8f4fc2c 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -63,13 +63,14 @@ PECOFF_FILE_ALIGNMENT = 0x200;
 #endif
 
 #if defined(CONFIG_DEBUG_ALIGN_RODATA)
-#define ALIGN_DEBUG_RO			. = ALIGN(1<<SECTION_SHIFT);
-#define ALIGN_DEBUG_RO_MIN(min)		ALIGN_DEBUG_RO
+#if defined(CONFIG_ARM64_4K_PAGES)
+#define ALIGN_DEBUG_RO_MIN(min)		. = ALIGN(SECTION_SIZE);
+#else
+#define ALIGN_DEBUG_RO_MIN(min)		. = ALIGN(CONT_SIZE);
+#endif
 #elif defined(CONFIG_DEBUG_RODATA)
-#define ALIGN_DEBUG_RO			. = ALIGN(1<<PAGE_SHIFT);
-#define ALIGN_DEBUG_RO_MIN(min)		ALIGN_DEBUG_RO
+#define ALIGN_DEBUG_RO_MIN(min)		. = ALIGN(PAGE_SIZE);
 #else
-#define ALIGN_DEBUG_RO
 #define ALIGN_DEBUG_RO_MIN(min)		. = ALIGN(min);
 #endif
 
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v3 2/2] arm64: Mark kernel page ranges contiguous
  2016-02-19 17:46 [PATCH v3 0/2] flag contiguous PTEs in linear mapping Jeremy Linton
  2016-02-19 17:46 ` [PATCH v3 1/2] arm64: mm: Enable CONT_SIZE aligned sections for 64k page kernels Jeremy Linton
@ 2016-02-19 17:46 ` Jeremy Linton
  2016-02-22 10:28   ` Ard Biesheuvel
  2016-02-25 16:16   ` Will Deacon
  1 sibling, 2 replies; 9+ messages in thread
From: Jeremy Linton @ 2016-02-19 17:46 UTC (permalink / raw)
  To: linux-arm-kernel

With 64k pages, the next larger segment size is 512M. The Linux
kernel also uses different protection flags to cover its code and data.
Because of these requirements, the vast majority of the kernel code and
data structures end up being mapped with 64k pages instead of the larger
pages common with a 4k page kernel.

Recent ARM processors support a contiguous bit in the
page tables which allows a TLB to cover a range larger than a
single PTE if that range is mapped into physically contiguous
RAM.

So, for the kernel it's a good idea to set this flag. Some basic
micro-benchmarks show it can significantly reduce the number of
L1 dTLB refills.
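
The heart of the change below is an alignment check on the virtual and
physical addresses; as a standalone sketch of that predicate, using assumed
EX_* constants for a 64K granule and a hypothetical helper name:

#include <stdbool.h>

#define EX_CONT_SIZE	(32UL * 64 * 1024)	/* 2M for a 64K granule */
#define EX_CONT_MASK	(~(EX_CONT_SIZE - 1))

/* true if [addr, next) can safely carry the contiguous hint */
static bool ex_can_mark_cont(unsigned long addr, unsigned long next,
			     unsigned long phys)
{
	/*
	 * The virtual start, virtual end and physical start must all
	 * sit on a CONT_SIZE boundary, so the chunk covers exactly one
	 * full block of contiguous, physically consecutive PTEs.
	 */
	return ((addr | next | phys) & ~EX_CONT_MASK) == 0;
}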

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
---
 arch/arm64/mm/mmu.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 58 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 7711554..ab69a99 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1,3 +1,4 @@
+
 /*
  * Based on arch/arm/mm/mmu.c
  *
@@ -103,17 +104,49 @@ static void split_pmd(pmd_t *pmd, pte_t *pte)
 		 * Need to have the least restrictive permissions available
 		 * permissions will be fixed up later
 		 */
-		set_pte(pte, pfn_pte(pfn, PAGE_KERNEL_EXEC));
+		set_pte(pte, pfn_pte(pfn, PAGE_KERNEL_EXEC_CONT));
 		pfn++;
 	} while (pte++, i++, i < PTRS_PER_PTE);
 }
 
+static void clear_cont_pte_range(pte_t *pte, unsigned long addr)
+{
+	int i;
+
+	pte -= CONT_RANGE_OFFSET(addr);
+	for (i = 0; i < CONT_PTES; i++) {
+		if (pte_cont(*pte))
+			set_pte(pte, pte_mknoncont(*pte));
+		pte++;
+	}
+	flush_tlb_all();
+}
+
+/*
+ * Given a range of PTEs set the pfn and provided page protection flags
+ */
+static void __populate_init_pte(pte_t *pte, unsigned long addr,
+			       unsigned long end, phys_addr_t phys,
+			       pgprot_t prot)
+{
+	unsigned long pfn = __phys_to_pfn(phys);
+
+	do {
+		/* clear all the bits except the pfn, then apply the prot */
+		set_pte(pte, pfn_pte(pfn, prot));
+		pte++;
+		pfn++;
+		addr += PAGE_SIZE;
+	} while (addr != end);
+}
+
 static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
-				  unsigned long end, unsigned long pfn,
+				  unsigned long end, phys_addr_t phys,
 				  pgprot_t prot,
 				  phys_addr_t (*pgtable_alloc)(void))
 {
 	pte_t *pte;
+	unsigned long next;
 
 	if (pmd_none(*pmd) || pmd_sect(*pmd)) {
 		phys_addr_t pte_phys = pgtable_alloc();
@@ -127,10 +160,29 @@ static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
 	BUG_ON(pmd_bad(*pmd));
 
 	pte = pte_set_fixmap_offset(pmd, addr);
+
 	do {
-		set_pte(pte, pfn_pte(pfn, prot));
-		pfn++;
-	} while (pte++, addr += PAGE_SIZE, addr != end);
+		next = min(end, (addr + CONT_SIZE) & CONT_MASK);
+		if (((addr | next | phys) & ~CONT_MASK) == 0) {
+			/* a block of CONT_PTES	 */
+			__populate_init_pte(pte, addr, next, phys,
+					    prot | __pgprot(PTE_CONT));
+		} else {
+			/*
+			 * If the range being split is already inside of a
+			 * contiguous range but this PTE isn't going to be
+			 * contiguous, then we want to unmark the adjacent
+			 * ranges, then update the portion of the range we
+			 * are interested in.
+			 */
+			clear_cont_pte_range(pte, addr);
+			__populate_init_pte(pte, addr, next, phys, prot);
+		}
+
+		pte += (next - addr) >> PAGE_SHIFT;
+		phys += next - addr;
+		addr = next;
+	} while (addr != end);
 
 	pte_clear_fixmap();
 }
@@ -194,7 +246,7 @@ static void alloc_init_pmd(pud_t *pud, unsigned long addr, unsigned long end,
 				}
 			}
 		} else {
-			alloc_init_pte(pmd, addr, next, __phys_to_pfn(phys),
+			alloc_init_pte(pmd, addr, next, phys,
 				       prot, pgtable_alloc);
 		}
 		phys += next - addr;
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v3 2/2] arm64: Mark kernel page ranges contiguous
  2016-02-19 17:46 ` [PATCH v3 2/2] arm64: Mark kernel page ranges contiguous Jeremy Linton
@ 2016-02-22 10:28   ` Ard Biesheuvel
  2016-02-22 15:39     ` Jeremy Linton
  2016-02-25 16:16   ` Will Deacon
  1 sibling, 1 reply; 9+ messages in thread
From: Ard Biesheuvel @ 2016-02-22 10:28 UTC (permalink / raw)
  To: linux-arm-kernel

On 19 February 2016 at 18:46, Jeremy Linton <jeremy.linton@arm.com> wrote:
> With 64k pages, the next larger segment size is 512M. The linux
> kernel also uses different protection flags to cover its code and data.
> Because of these requirements, the vast majority of the kernel code and
> data structures end up being mapped with 64k pages instead of the larger
> pages common with a 4k page kernel.
>
> Recent ARM processors support a contiguous bit in the
> page tables which allows a TLB to cover a range larger than a
> single PTE if that range is mapped into physically contiguous
> RAM.
>
> So, for the kernel its a good idea to set this flag. Some basic
> micro benchmarks show it can significantly reduce the number of
> L1 dTLB refills.
>
> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>

Tested on 4k/3 levels, and the page tables look correct to me

Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

You didn't answer my question though, regarding implementing this at
the PMD level for 16k pages kernels, so that we can use 1 GB chunks
for mapping system RAM. Would that just be a matter of doing the exact
same thing at the PMD level? Or is it more complicated than that?
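
For reference, the 1 GB figure follows from the PMD-level contiguous
geometry; a sketch based on the ARMv8 contiguous-entry counts (illustrative,
not kernel constants):

/*
 *   16K granule: PMD block = 32M,  32 contiguous PMDs -> 32 * 32M  = 1G
 *   64K granule: PMD block = 512M, 32 contiguous PMDs -> 32 * 512M = 16G
 *   4K  granule: PMD block = 2M,   16 contiguous PMDs -> 16 * 2M   = 32M
 */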

-- 
Ard.

> ---
>  arch/arm64/mm/mmu.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 58 insertions(+), 6 deletions(-)
>
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 7711554..ab69a99 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -1,3 +1,4 @@
> +
>  /*
>   * Based on arch/arm/mm/mmu.c
>   *
> @@ -103,17 +104,49 @@ static void split_pmd(pmd_t *pmd, pte_t *pte)
>                  * Need to have the least restrictive permissions available
>                  * permissions will be fixed up later
>                  */
> -               set_pte(pte, pfn_pte(pfn, PAGE_KERNEL_EXEC));
> +               set_pte(pte, pfn_pte(pfn, PAGE_KERNEL_EXEC_CONT));
>                 pfn++;
>         } while (pte++, i++, i < PTRS_PER_PTE);
>  }
>
> +static void clear_cont_pte_range(pte_t *pte, unsigned long addr)
> +{
> +       int i;
> +
> +       pte -= CONT_RANGE_OFFSET(addr);
> +       for (i = 0; i < CONT_PTES; i++) {
> +               if (pte_cont(*pte))
> +                       set_pte(pte, pte_mknoncont(*pte));
> +               pte++;
> +       }
> +       flush_tlb_all();
> +}
> +
> +/*
> + * Given a range of PTEs set the pfn and provided page protection flags
> + */
> +static void __populate_init_pte(pte_t *pte, unsigned long addr,
> +                              unsigned long end, phys_addr_t phys,
> +                              pgprot_t prot)
> +{
> +       unsigned long pfn = __phys_to_pfn(phys);
> +
> +       do {
> +               /* clear all the bits except the pfn, then apply the prot */
> +               set_pte(pte, pfn_pte(pfn, prot));
> +               pte++;
> +               pfn++;
> +               addr += PAGE_SIZE;
> +       } while (addr != end);
> +}
> +
>  static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
> -                                 unsigned long end, unsigned long pfn,
> +                                 unsigned long end, phys_addr_t phys,
>                                   pgprot_t prot,
>                                   phys_addr_t (*pgtable_alloc)(void))
>  {
>         pte_t *pte;
> +       unsigned long next;
>
>         if (pmd_none(*pmd) || pmd_sect(*pmd)) {
>                 phys_addr_t pte_phys = pgtable_alloc();
> @@ -127,10 +160,29 @@ static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
>         BUG_ON(pmd_bad(*pmd));
>
>         pte = pte_set_fixmap_offset(pmd, addr);
> +
>         do {
> -               set_pte(pte, pfn_pte(pfn, prot));
> -               pfn++;
> -       } while (pte++, addr += PAGE_SIZE, addr != end);
> +               next = min(end, (addr + CONT_SIZE) & CONT_MASK);
> +               if (((addr | next | phys) & ~CONT_MASK) == 0) {
> +                       /* a block of CONT_PTES  */
> +                       __populate_init_pte(pte, addr, next, phys,
> +                                           prot | __pgprot(PTE_CONT));
> +               } else {
> +                       /*
> +                        * If the range being split is already inside of a
> +                        * contiguous range but this PTE isn't going to be
> +                        * contiguous, then we want to unmark the adjacent
> +                        * ranges, then update the portion of the range we
> +                        * are interrested in.
> +                        */
> +                       clear_cont_pte_range(pte, addr);
> +                       __populate_init_pte(pte, addr, next, phys, prot);
> +               }
> +
> +               pte += (next - addr) >> PAGE_SHIFT;
> +               phys += next - addr;
> +               addr = next;
> +       } while (addr != end);
>
>         pte_clear_fixmap();
>  }
> @@ -194,7 +246,7 @@ static void alloc_init_pmd(pud_t *pud, unsigned long addr, unsigned long end,
>                                 }
>                         }
>                 } else {
> -                       alloc_init_pte(pmd, addr, next, __phys_to_pfn(phys),
> +                       alloc_init_pte(pmd, addr, next, phys,
>                                        prot, pgtable_alloc);
>                 }
>                 phys += next - addr;
> --
> 2.4.3
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v3 2/2] arm64: Mark kernel page ranges contiguous
  2016-02-22 10:28   ` Ard Biesheuvel
@ 2016-02-22 15:39     ` Jeremy Linton
  0 siblings, 0 replies; 9+ messages in thread
From: Jeremy Linton @ 2016-02-22 15:39 UTC (permalink / raw)
  To: linux-arm-kernel

On 02/22/2016 04:28 AM, Ard Biesheuvel wrote:
> On 19 February 2016 at 18:46, Jeremy Linton <jeremy.linton@arm.com> wrote:
>> With 64k pages, the next larger segment size is 512M. The linux
>> kernel also uses different protection flags to cover its code and data.
>> Because of these requirements, the vast majority of the kernel code and
>> data structures end up being mapped with 64k pages instead of the larger
>> pages common with a 4k page kernel.
>>
>> Recent ARM processors support a contiguous bit in the
>> page tables which allows a TLB to cover a range larger than a
>> single PTE if that range is mapped into physically contiguous
>> RAM.
>>
>> So, for the kernel its a good idea to set this flag. Some basic
>> micro benchmarks show it can significantly reduce the number of
>> L1 dTLB refills.
>>
>> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
>
> Tested on 4k/3 levels, and the page tables look correct to me
>
> Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

Thanks,

>
> You didn't answer my question though, regarding implementing this at
> the PMD level for 16k pages kernels, so that we can use 1 GB chunks
> for mapping system RAM. Would that just be a matter of doing the exact
> same thing at the PMD level? Or is it more complicated than that?
>

AFAIK, yes. It should be similar with the PMD ranges being unmarked on 
permission change or PTE allocation.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v3 2/2] arm64: Mark kernel page ranges contiguous
  2016-02-19 17:46 ` [PATCH v3 2/2] arm64: Mark kernel page ranges contiguous Jeremy Linton
  2016-02-22 10:28   ` Ard Biesheuvel
@ 2016-02-25 16:16   ` Will Deacon
  2016-02-25 20:46     ` Jeremy Linton
  1 sibling, 1 reply; 9+ messages in thread
From: Will Deacon @ 2016-02-25 16:16 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Feb 19, 2016 at 11:46:23AM -0600, Jeremy Linton wrote:
> With 64k pages, the next larger segment size is 512M. The linux
> kernel also uses different protection flags to cover its code and data.
> Because of these requirements, the vast majority of the kernel code and
> data structures end up being mapped with 64k pages instead of the larger
> pages common with a 4k page kernel.
> 
> Recent ARM processors support a contiguous bit in the
> page tables which allows a TLB to cover a range larger than a
> single PTE if that range is mapped into physically contiguous
> RAM.
> 
> So, for the kernel its a good idea to set this flag. Some basic
> micro benchmarks show it can significantly reduce the number of
> L1 dTLB refills.
> 
> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
> ---
>  arch/arm64/mm/mmu.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 58 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 7711554..ab69a99 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -1,3 +1,4 @@
> +
>  /*
>   * Based on arch/arm/mm/mmu.c
>   *
> @@ -103,17 +104,49 @@ static void split_pmd(pmd_t *pmd, pte_t *pte)
>  		 * Need to have the least restrictive permissions available
>  		 * permissions will be fixed up later
>  		 */
> -		set_pte(pte, pfn_pte(pfn, PAGE_KERNEL_EXEC));
> +		set_pte(pte, pfn_pte(pfn, PAGE_KERNEL_EXEC_CONT));
>  		pfn++;
>  	} while (pte++, i++, i < PTRS_PER_PTE);
>  }
>  
> +static void clear_cont_pte_range(pte_t *pte, unsigned long addr)
> +{
> +	int i;
> +
> +	pte -= CONT_RANGE_OFFSET(addr);
> +	for (i = 0; i < CONT_PTES; i++) {
> +		if (pte_cont(*pte))
> +			set_pte(pte, pte_mknoncont(*pte));
> +		pte++;
> +	}
> +	flush_tlb_all();

Do you still need this invalidation? I thought the tables weren't even
live at this point?

> +}
> +
> +/*
> + * Given a range of PTEs set the pfn and provided page protection flags
> + */
> +static void __populate_init_pte(pte_t *pte, unsigned long addr,
> +			       unsigned long end, phys_addr_t phys,
> +			       pgprot_t prot)
> +{
> +	unsigned long pfn = __phys_to_pfn(phys);
> +
> +	do {
> +		/* clear all the bits except the pfn, then apply the prot */
> +		set_pte(pte, pfn_pte(pfn, prot));
> +		pte++;
> +		pfn++;
> +		addr += PAGE_SIZE;
> +	} while (addr != end);
> +}
> +
>  static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
> -				  unsigned long end, unsigned long pfn,
> +				  unsigned long end, phys_addr_t phys,
>  				  pgprot_t prot,
>  				  phys_addr_t (*pgtable_alloc)(void))
>  {
>  	pte_t *pte;
> +	unsigned long next;
>  
>  	if (pmd_none(*pmd) || pmd_sect(*pmd)) {
>  		phys_addr_t pte_phys = pgtable_alloc();
> @@ -127,10 +160,29 @@ static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
>  	BUG_ON(pmd_bad(*pmd));
>  
>  	pte = pte_set_fixmap_offset(pmd, addr);
> +
>  	do {
> -		set_pte(pte, pfn_pte(pfn, prot));
> -		pfn++;
> -	} while (pte++, addr += PAGE_SIZE, addr != end);
> +		next = min(end, (addr + CONT_SIZE) & CONT_MASK);
> +		if (((addr | next | phys) & ~CONT_MASK) == 0) {
> +			/* a block of CONT_PTES	 */
> +			__populate_init_pte(pte, addr, next, phys,
> +					    prot | __pgprot(PTE_CONT));
> +		} else {
> +			/*
> +			 * If the range being split is already inside of a
> +			 * contiguous range but this PTE isn't going to be
> +			 * contiguous, then we want to unmark the adjacent
> +			 * ranges, then update the portion of the range we
> +			 * are interrested in.
> +			 */
> +			clear_cont_pte_range(pte, addr);
> +			__populate_init_pte(pte, addr, next, phys, prot);

I don't understand the comment or the code here... the splitting is now
done separately, and I can't think of a scenario where you need to clear
the cont hint explicitly for adjacent ptes.

What am I missing?

Will

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v3 2/2] arm64: Mark kernel page ranges contiguous
  2016-02-25 16:16   ` Will Deacon
@ 2016-02-25 20:46     ` Jeremy Linton
  2016-02-26 17:28       ` Will Deacon
  0 siblings, 1 reply; 9+ messages in thread
From: Jeremy Linton @ 2016-02-25 20:46 UTC (permalink / raw)
  To: linux-arm-kernel

On 02/25/2016 10:16 AM, Will Deacon wrote:
> On Fri, Feb 19, 2016 at 11:46:23AM -0600, Jeremy Linton wrote:

(trimming)

>> +static void clear_cont_pte_range(pte_t *pte, unsigned long addr)
>> +{
>> +	int i;
>> +
>> +	pte -= CONT_RANGE_OFFSET(addr);
>> +	for (i = 0; i < CONT_PTES; i++) {
>> +		if (pte_cont(*pte))
>> +			set_pte(pte, pte_mknoncont(*pte));
>> +		pte++;
>> +	}
>> +	flush_tlb_all();
>
> Do you still need this invalidation? I thought the table weren't even
> live at this point?

Well it continues to match the calls in alloc_init_p*.
	
I guess the worry is the extra flush that happens at 
create_mapping_late(), if mapping ranges aren't cont aligned? (because 
the loop won't actually be doing any set_pte's)

If this and the alloc_init_p* flushes are to be removed, there should 
probably be a way to detect any cases where the splits are happening 
after the tables have been activated. This might be a little less 
straightforward given efi_create_mapping().
>
>> +}
>> +
>> +/*
>> + * Given a range of PTEs set the pfn and provided page protection flags
>> + */
>> +static void __populate_init_pte(pte_t *pte, unsigned long addr,
>> +			       unsigned long end, phys_addr_t phys,
>> +			       pgprot_t prot)
>> +{
>> +	unsigned long pfn = __phys_to_pfn(phys);
>> +
>> +	do {
>> +		/* clear all the bits except the pfn, then apply the prot */
>> +		set_pte(pte, pfn_pte(pfn, prot));
>> +		pte++;
>> +		pfn++;
>> +		addr += PAGE_SIZE;
>> +	} while (addr != end);
>> +}
>> +
(trimming)
>> +
>>   	do {
>> -		set_pte(pte, pfn_pte(pfn, prot));
>> -		pfn++;
>> -	} while (pte++, addr += PAGE_SIZE, addr != end);
>> +		next = min(end, (addr + CONT_SIZE) & CONT_MASK);
>> +		if (((addr | next | phys) & ~CONT_MASK) == 0) {
>> +			/* a block of CONT_PTES	 */
>> +			__populate_init_pte(pte, addr, next, phys,
>> +					    prot | __pgprot(PTE_CONT));
>> +		} else {
>> +			/*
>> +			 * If the range being split is already inside of a
>> +			 * contiguous range but this PTE isn't going to be
>> +			 * contiguous, then we want to unmark the adjacent
>> +			 * ranges, then update the portion of the range we
>> +			 * are interested in.
>> +			 */
>> +			clear_cont_pte_range(pte, addr);
>> +			__populate_init_pte(pte, addr, next, phys, prot);
>
> I don't understand the comment or the code here... the splitting is now
> done seperately, and I can't think of a scenario where you need to clear
> the cont hint explicitly for adjacent ptes.

	
My understanding is that splitting initially runs through this code path
(via map_kernel_chunk, and then again via create_mapping_late, where splits
won't happen). So split_pmd() is creating cont ranges, and when those ranges
aren't sufficiently aligned this path wipes out the cont marking immediately
after their creation.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v3 2/2] arm64: Mark kernel page ranges contiguous
  2016-02-25 20:46     ` Jeremy Linton
@ 2016-02-26 17:28       ` Will Deacon
  2016-03-16 18:20         ` Catalin Marinas
  0 siblings, 1 reply; 9+ messages in thread
From: Will Deacon @ 2016-02-26 17:28 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Feb 25, 2016 at 02:46:54PM -0600, Jeremy Linton wrote:
> On 02/25/2016 10:16 AM, Will Deacon wrote:
> >On Fri, Feb 19, 2016 at 11:46:23AM -0600, Jeremy Linton wrote:
> 
> (trimming)
> 
> >>+static void clear_cont_pte_range(pte_t *pte, unsigned long addr)
> >>+{
> >>+	int i;
> >>+
> >>+	pte -= CONT_RANGE_OFFSET(addr);
> >>+	for (i = 0; i < CONT_PTES; i++) {
> >>+		if (pte_cont(*pte))
> >>+			set_pte(pte, pte_mknoncont(*pte));
> >>+		pte++;
> >>+	}
> >>+	flush_tlb_all();
> >
> >Do you still need this invalidation? I thought the table weren't even
> >live at this point?
> 
> Well it continues to match the calls in alloc_init_p*.

Ok, but if it's not needed (and I don't think it is), then we should
remove the invalidation from there too rather than add more of it.

> I guess the worry is the extra flush that happens at create_mapping_late(),
> if mapping ranges aren't cont aligned? (because the loop won't actually be
> doing any set_pte's)

I'm just concerned with trying to make this code understandable! I doubt
there's a performance argument to be made.

> If this and the alloc_init_p* flushes are to be removed, there should
> probably be a way to detect any cases where the splits are happening after
> the tables have been activated. This might be a little less straightforward
> given efi_create_mapping().

I think that's a separate issue, since splitting a live page table is
dodgy regardless of the flushing. But yes, it would be nice to detect
that case and scream about it.
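
One possible shape for such a check, purely as a sketch with assumed names
rather than existing kernel symbols:

#include <linux/bug.h>
#include <linux/types.h>

/* would be set by the early MMU code once the kernel tables go live */
static bool ex_kernel_pgtables_live;

static void ex_check_late_split(void)
{
	WARN_ONCE(ex_kernel_pgtables_live,
		  "splitting live kernel page tables\n");
}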

> >
> >>+}
> >>+
> >>+/*
> >>+ * Given a range of PTEs set the pfn and provided page protection flags
> >>+ */
> >>+static void __populate_init_pte(pte_t *pte, unsigned long addr,
> >>+			       unsigned long end, phys_addr_t phys,
> >>+			       pgprot_t prot)
> >>+{
> >>+	unsigned long pfn = __phys_to_pfn(phys);
> >>+
> >>+	do {
> >>+		/* clear all the bits except the pfn, then apply the prot */
> >>+		set_pte(pte, pfn_pte(pfn, prot));
> >>+		pte++;
> >>+		pfn++;
> >>+		addr += PAGE_SIZE;
> >>+	} while (addr != end);
> >>+}
> >>+
> (trimming)
> >>+
> >>  	do {
> >>-		set_pte(pte, pfn_pte(pfn, prot));
> >>-		pfn++;
> >>-	} while (pte++, addr += PAGE_SIZE, addr != end);
> >>+		next = min(end, (addr + CONT_SIZE) & CONT_MASK);
> >>+		if (((addr | next | phys) & ~CONT_MASK) == 0) {
> >>+			/* a block of CONT_PTES	 */
> >>+			__populate_init_pte(pte, addr, next, phys,
> >>+					    prot | __pgprot(PTE_CONT));
> >>+		} else {
> >>+			/*
> >>+			 * If the range being split is already inside of a
> >>+			 * contiguous range but this PTE isn't going to be
> >>+			 * contiguous, then we want to unmark the adjacent
> >>+			 * ranges, then update the portion of the range we
> >>+			 * are interested in.
> >>+			 */
> >>+			clear_cont_pte_range(pte, addr);
> >>+			__populate_init_pte(pte, addr, next, phys, prot);
> >
> >I don't understand the comment or the code here... the splitting is now
> >done seperately, and I can't think of a scenario where you need to clear
> >the cont hint explicitly for adjacent ptes.
> 
> 	
> My understanding is that splitting is initially running this code path (via
> map_kernel_chunk, then again via create_mapping_late where splits won't
> happen). So, split_pmd() is creating cont ranges. When the ranges aren't
> sufficiently aligned then this is wiping out the cont mapping immediately
> after their creation.

Gotcha, thanks for the explanation (I somehow overlooked the initial
page table creation).

Will

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v3 2/2] arm64: Mark kernel page ranges contiguous
  2016-02-26 17:28       ` Will Deacon
@ 2016-03-16 18:20         ` Catalin Marinas
  0 siblings, 0 replies; 9+ messages in thread
From: Catalin Marinas @ 2016-03-16 18:20 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Feb 26, 2016 at 05:28:26PM +0000, Will Deacon wrote:
> On Thu, Feb 25, 2016 at 02:46:54PM -0600, Jeremy Linton wrote:
> > On 02/25/2016 10:16 AM, Will Deacon wrote:
> > >On Fri, Feb 19, 2016 at 11:46:23AM -0600, Jeremy Linton wrote:
> > >>+static void clear_cont_pte_range(pte_t *pte, unsigned long addr)
> > >>+{
> > >>+	int i;
> > >>+
> > >>+	pte -= CONT_RANGE_OFFSET(addr);
> > >>+	for (i = 0; i < CONT_PTES; i++) {
> > >>+		if (pte_cont(*pte))
> > >>+			set_pte(pte, pte_mknoncont(*pte));
> > >>+		pte++;
> > >>+	}
> > >>+	flush_tlb_all();
> > >
> > >Do you still need this invalidation? I thought the table weren't even
> > >live at this point?
> > 
> > Well it continues to match the calls in alloc_init_p*.
> 
> Ok, but if it's not needed (and I don't think it is), then we should
> remove the invalidation from there too rather than add more of it.

I agree. Talking to Mark R, I think all the create_pgd_mapping code
should be trimmed of p*d splitting entirely (with some warnings left in
place).

The efi_create_mapping() case is done on efi_mm anyway with a low
address and non-active TTBR0_EL1. As Mark suggested, dropping p*d
splitting should work for this case as well by ignoring the next
consecutive mapping if it overlaps since it's a linear mapping (possibly
some permissions may need to be upgraded). In this case, the TLBI would
be done by efi_create_mapping() or even skipped altogether assuming that
no page tables are changed (only going from invalid to valid).

My preference, however, would be for efi_create_mapping() to use a
different mechanism from create_pgd_mapping() (which is aimed at the very
early memmap), like remap_pfn_range(), but I haven't looked in detail at
whether this is feasible.

Once we have this clean-up in place, we can rebase the contiguous PTE
patches on top and remove unnecessary TLBI.

-- 
Catalin

^ permalink raw reply	[flat|nested] 9+ messages in thread

Thread overview: 9+ messages
2016-02-19 17:46 [PATCH v3 0/2] flag contiguous PTEs in linear mapping Jeremy Linton
2016-02-19 17:46 ` [PATCH v3 1/2] arm64: mm: Enable CONT_SIZE aligned sections for 64k page kernels Jeremy Linton
2016-02-19 17:46 ` [PATCH v3 2/2] arm64: Mark kernel page ranges contiguous Jeremy Linton
2016-02-22 10:28   ` Ard Biesheuvel
2016-02-22 15:39     ` Jeremy Linton
2016-02-25 16:16   ` Will Deacon
2016-02-25 20:46     ` Jeremy Linton
2016-02-26 17:28       ` Will Deacon
2016-03-16 18:20         ` Catalin Marinas
