* [RFC PATCH 0/6] ARM: mm: HugeTLB + THP support.
From: Steve Capper @ 2012-10-18 16:15 UTC
To: linux-arm-kernel
Hello,
The following patches bring both HugeTLB support and Transparent HugePage (THP)
support to ARM.
Both short descriptors (non-LPAE) and long descriptors (LPAE) are supported.
The non-LPAE HugeTLB code is based on patches by Bill Carson [1], but instead of
allocating extra memory to store "Linux PTEs", it re-purposes the domain bits
of section descriptors and constructs huge Linux PTEs on demand.
As PMDs are walked directly by the kernel THP functions (there are no
huge_pmd_offset-style functions), any "Linux PMD"/"hardware PMD" distinction
would require some re-working of the ARM PMD/PTE code. Use of the domain bits
allows for a more straightforward THP implementation.
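To make the domain-bit trick concrete, here is a minimal sketch (it reuses the
PMD_DSECT_* constants introduced in patch 4 of this series; the rest of the
attribute translation is elided) of how the domain field of a section
descriptor can double as storage for the accessed/dirty state of a huge Linux
PTE:

/*
 * Sketch only: bits [8:5] of a short section descriptor normally hold
 * the domain number. With the CPU made a client of all 16 domains, the
 * hardware ignores them and they can carry Linux PTE state instead.
 */
#define PMD_DSECT_DIRTY	(_AT(pmdval_t, 1) << 5)	/* software dirty bit */
#define PMD_DSECT_AF	(_AT(pmdval_t, 1) << 6)	/* software accessed flag */

static pte_t sketch_make_huge_pte(pmd_t pmd)
{
	pteval_t val = pmd_val(pmd) & HPAGE_MASK;	/* physical base address */

	if (pmd_val(pmd) & PMD_DSECT_AF)
		val |= L_PTE_YOUNG;
	if (pmd_val(pmd) & PMD_DSECT_DIRTY)
		val |= L_PTE_DIRTY;
	/* ... permission and memory type translation elided ... */
	return __pte(val);
}

The full translation, in both directions, is in huge_ptep_get and
set_huge_pte_at in patch 4.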
Some general HugeTLB code relating to huge page migration on memory failure
(CONFIG_MEMORY_FAILURE) dereferences huge pte_t pointers directly rather than using
the huge_ptep_get and set_huge_pte_at functions. Thus this config option is
incompatible with non-LPAE hugepages. At the moment I can only see x86 using
CONFIG_MEMORY_FAILURE though.
For non-LPAE, I make an assumption about how the memory type is mapped between
the Linux PTE and the section descriptor. Ideally I would like to look this
information up, possibly from get_mem_type(MT_MEMORY). Comments on an elegant way of
achieving this are welcome.
Non-LPAE code was tested on a Versatile Express (V2P-CA15_A7 Cortex A15 tile),
Tegra 2 TrimSlice and RealView ARM11MPCore.
The LPAE code manipulates the hardware page tables directly as the long
descriptors are wide enough to contain all the Linux PTE information.
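(As a sketch of why no domain-bit trick is needed here: the 64-bit long
descriptors have upper bits that the hardware ignores, so the Linux-only state
sits directly in them; the positions below are the ones used in patches 1
and 5.)

/* Sketch: software-defined state in the upper bits of an LPAE entry. */
#define L_PTE_DIRTY		(_AT(pteval_t, 1) << 55)	/* software */
#define L_PTE_SPECIAL		(_AT(pteval_t, 1) << 56)	/* software */
#define PMD_SECT_SPLITTING	(_AT(pmdval_t, 1) << 57)	/* software */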
The LPAE code has been tested on a Versatile Express: V2F-2XV6 Cortex A15 and
V2P-CA15_A7 Cortex A15 tiles.
This patch set requires the following to be applied on top of 3.7-rc1:
"ARM: mm: introduce L_PTE_VALID for page table entries"
(PROT_NONE series, posted by Will on linux-arm-kernel)
- http://lists.infradead.org/pipermail/linux-arm-kernel/2012-October/126130.html
"mm: thp: Set the accessed flag for old pages on access fault."
(posted by Will on linux-mm)
- http://marc.info/?l=linux-kernel&m=135048927416117&w=2
Cheers,
--
Steve
[1] - http://lists.infradead.org/pipermail/linux-arm-kernel/2012-February/084359.html
Catalin Marinas (2):
ARM: mm: HugeTLB support for LPAE systems.
ARM: mm: Transparent huge page support for LPAE systems.
Steve Capper (4):
ARM: mm: correct pte_same behaviour for LPAE.
ARM: mm: Add support for flushing HugeTLB pages.
ARM: mm: HugeTLB support for non-LPAE systems.
ARM: mm: Transparent huge page support for non-LPAE systems.
arch/arm/Kconfig | 8 ++
arch/arm/include/asm/hugetlb-2level.h | 71 ++++++++++
arch/arm/include/asm/hugetlb-3level.h | 61 +++++++++
arch/arm/include/asm/hugetlb.h | 87 ++++++++++++
arch/arm/include/asm/pgtable-2level.h | 152 ++++++++++++++++++++-
arch/arm/include/asm/pgtable-3level-hwdef.h | 2 +
arch/arm/include/asm/pgtable-3level.h | 77 +++++++++++
arch/arm/include/asm/pgtable.h | 34 ++++-
arch/arm/include/asm/tlb.h | 16 ++-
arch/arm/include/asm/tlbflush.h | 2 +
arch/arm/kernel/head.S | 10 +-
arch/arm/mm/Makefile | 6 +
arch/arm/mm/dma-mapping.c | 2 +-
arch/arm/mm/fault.c | 6 +-
arch/arm/mm/flush.c | 25 ++--
arch/arm/mm/fsr-3level.c | 2 +-
arch/arm/mm/hugetlbpage-2level.c | 115 ++++++++++++++++
arch/arm/mm/hugetlbpage-3level.c | 190 +++++++++++++++++++++++++++
arch/arm/mm/hugetlbpage.c | 65 +++++++++
19 files changed, 909 insertions(+), 22 deletions(-)
create mode 100644 arch/arm/include/asm/hugetlb-2level.h
create mode 100644 arch/arm/include/asm/hugetlb-3level.h
create mode 100644 arch/arm/include/asm/hugetlb.h
create mode 100644 arch/arm/mm/hugetlbpage-2level.c
create mode 100644 arch/arm/mm/hugetlbpage-3level.c
create mode 100644 arch/arm/mm/hugetlbpage.c
--
1.7.9.5
* [RFC PATCH 1/6] ARM: mm: correct pte_same behaviour for LPAE.
From: Steve Capper @ 2012-10-18 16:15 UTC
To: linux-arm-kernel
For 3 levels of paging the PTE_EXT_NG bit will be set for user address ptes
that are written to a page table but not for ptes created with mk_pte.
This can cause some comparison tests made by pte_same to fail spuriously and
lead to other problems.
To correct this behaviour, we mask off PTE_EXT_NG for any pte that is
present before running the comparison.
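To illustrate the failure mode (a sketch, not part of the patch), consider a
pte read back from a user page table and one built in software for the same
page and protection:

/*
 * Sketch: on LPAE the stored pte carries PTE_EXT_NG (set when it was
 * written via set_pte_at), while the freshly made one does not.
 */
static int sketch_compare(pte_t *ptep, struct page *page, pgprot_t prot)
{
	pte_t stored = *ptep;			/* PTE_EXT_NG is set */
	pte_t fresh = mk_pte(page, prot);	/* PTE_EXT_NG is clear */

	/* A raw bitwise comparison mismatches although the mappings agree: */
	if (pte_val(stored) == pte_val(fresh))
		return 1;	/* not reached without this patch */

	/* pte_same() below masks PTE_EXT_NG off present ptes first: */
	return pte_same(stored, fresh);	/* 1, as intended */
}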
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Steve Capper <steve.capper@arm.com>
---
arch/arm/include/asm/pgtable-2level.h | 5 +++++
arch/arm/include/asm/pgtable-3level.h | 5 +++++
arch/arm/include/asm/pgtable.h | 23 +++++++++++++++++++++++
3 files changed, 33 insertions(+)
diff --git a/arch/arm/include/asm/pgtable-2level.h b/arch/arm/include/asm/pgtable-2level.h
index 2317a71..662a00e 100644
--- a/arch/arm/include/asm/pgtable-2level.h
+++ b/arch/arm/include/asm/pgtable-2level.h
@@ -125,6 +125,11 @@
#define L_PTE_SHARED (_AT(pteval_t, 1) << 10) /* shared(v6), coherent(xsc3) */
/*
+ * for 2 levels of paging we don't mask off any bits when comparing present ptes
+ */
+#define L_PTE_CMP_MASKOFF 0
+
+/*
* These are the memory types, defined to be compatible with
* pre-ARMv6 CPUs cacheable and bufferable bits: XXCB
*/
diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
index b249035..0eaeb55 100644
--- a/arch/arm/include/asm/pgtable-3level.h
+++ b/arch/arm/include/asm/pgtable-3level.h
@@ -84,6 +84,11 @@
#define L_PTE_DIRTY_HIGH (1 << (55 - 32))
/*
+ * we need to mask off PTE_EXT_NG when comparing present ptes.
+ */
+#define L_PTE_CMP_MASKOFF PTE_EXT_NG
+
+/*
* AttrIndx[2:0] encoding (mapping attributes defined in the MAIR* registers).
*/
#define L_PTE_MT_UNCACHED (_AT(pteval_t, 0) << 2) /* strongly ordered */
diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index 08c1231..c35bf46 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -248,6 +248,29 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
}
/*
+ * For 3 levels of paging the PTE_EXT_NG bit will be set for user address ptes
+ * that are written to a page table but not for ptes created with mk_pte.
+ *
+ * This can cause some comparison tests made by pte_same to fail spuriously and
+ * lead to other problems.
+ *
+ * To correct this behaviour, we mask off PTE_EXT_NG for any pte that is
+ * present before running the comparison.
+ */
+#define __HAVE_ARCH_PTE_SAME
+static inline int pte_same(pte_t pte_a, pte_t pte_b)
+{
+ pteval_t vala = pte_val(pte_a), valb = pte_val(pte_b);
+ if (pte_present(pte_a))
+ vala &= ~L_PTE_CMP_MASKOFF;
+
+ if (pte_present(pte_b))
+ valb &= ~L_PTE_CMP_MASKOFF;
+
+ return vala == valb;
+}
+
+/*
* Encode and decode a swap entry. Swap entries are stored in the Linux
* page tables as follows:
*
--
1.7.9.5
* [RFC PATCH 2/6] ARM: mm: Add support for flushing HugeTLB pages.
From: Steve Capper @ 2012-10-18 16:15 UTC
To: linux-arm-kernel
On ARM we use the __flush_dcache_page function to flush the dcache of pages
when needed, usually when the PG_dcache_clean bit is unset and we are setting a
PTE.
A HugeTLB page is represented as a compound page consisting of an array of
pages. Thus to flush the dcache of a HugeTLB page, one must flush more than a
single page.
This patch modifies __flush_dcache_page such that all constituent pages of a
HugeTLB page are flushed.
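As a rough worked example (assuming 4K base pages and the 2MB huge pages
enabled later in this series):

/* Sketch: flush extents for a 2MB compound page. */
unsigned int order = compound_order(page);	/* 9 for a 2MB huge page */
size_t bytes = PAGE_SIZE << order;		/* 4K << 9 == 2MB, one call (lowmem) */
unsigned long subpages = 1UL << order;		/* 512 pages, one at a time (highmem) */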
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Steve Capper <steve.capper@arm.com>
---
arch/arm/mm/flush.c | 25 +++++++++++++++----------
1 file changed, 15 insertions(+), 10 deletions(-)
diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c
index 1c8f7f5..0a69cb8 100644
--- a/arch/arm/mm/flush.c
+++ b/arch/arm/mm/flush.c
@@ -17,6 +17,7 @@
#include <asm/highmem.h>
#include <asm/smp_plat.h>
#include <asm/tlbflush.h>
+#include <linux/hugetlb.h>
#include "mm.h"
@@ -168,17 +169,21 @@ void __flush_dcache_page(struct address_space *mapping, struct page *page)
* coherent with the kernels mapping.
*/
if (!PageHighMem(page)) {
- __cpuc_flush_dcache_area(page_address(page), PAGE_SIZE);
+ __cpuc_flush_dcache_area(page_address(page), (PAGE_SIZE << compound_order(page)));
} else {
- void *addr = kmap_high_get(page);
- if (addr) {
- __cpuc_flush_dcache_area(addr, PAGE_SIZE);
- kunmap_high(page);
- } else if (cache_is_vipt()) {
- /* unmapped pages might still be cached */
- addr = kmap_atomic(page);
- __cpuc_flush_dcache_area(addr, PAGE_SIZE);
- kunmap_atomic(addr);
+ unsigned long i;
+ for(i = 0; i < (1 << compound_order(page)); i++) {
+ struct page *cpage = page + i;
+ void *addr = kmap_high_get(cpage);
+ if (addr) {
+ __cpuc_flush_dcache_area(addr, PAGE_SIZE);
+ kunmap_high(cpage);
+ } else if (cache_is_vipt()) {
+ /* unmapped pages might still be cached */
+ addr = kmap_atomic(cpage);
+ __cpuc_flush_dcache_area(addr, PAGE_SIZE);
+ kunmap_atomic(addr);
+ }
}
}
--
1.7.9.5
* [RFC PATCH 3/6] ARM: mm: HugeTLB support for LPAE systems.
From: Steve Capper @ 2012-10-18 16:15 UTC
To: linux-arm-kernel
From: Catalin Marinas <catalin.marinas@arm.com>
This patch adds support for hugetlbfs based on the x86 implementation.
It allows mapping of 2MB sections (see Documentation/vm/hugetlbpage.txt
for usage). The 64K pages configuration is not supported (section size
is 512MB in this case).
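As a userspace illustration (a hedged sketch, not part of this series: it
assumes huge pages have been reserved first, e.g. via
/proc/sys/vm/nr_hugepages, and MAP_HUGETLB support in the host kernel; see the
documentation referenced above for the canonical usage):

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

#define LENGTH (4UL * 2 * 1024 * 1024)	/* four 2MB huge pages */

int main(void)
{
	/* Anonymous mapping backed by huge pages. */
	char *p = mmap(NULL, LENGTH, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return EXIT_FAILURE;
	}
	memset(p, 0x5a, LENGTH);	/* touch all four pages */
	munmap(p, LENGTH);
	return 0;
}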
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
[steve.capper@arm.com: symbolic constants replace numbers in places.
Split up into multiple files, to simplify future non-LPAE support].
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Steve Capper <steve.capper@arm.com>
---
arch/arm/Kconfig | 4 +
arch/arm/include/asm/hugetlb-3level.h | 61 +++++++++++
arch/arm/include/asm/hugetlb.h | 83 ++++++++++++++
arch/arm/include/asm/pgtable-3level.h | 13 +++
arch/arm/mm/Makefile | 2 +
arch/arm/mm/dma-mapping.c | 2 +-
arch/arm/mm/hugetlbpage-3level.c | 190 +++++++++++++++++++++++++++++++++
arch/arm/mm/hugetlbpage.c | 65 +++++++++++
8 files changed, 419 insertions(+), 1 deletion(-)
create mode 100644 arch/arm/include/asm/hugetlb-3level.h
create mode 100644 arch/arm/include/asm/hugetlb.h
create mode 100644 arch/arm/mm/hugetlbpage-3level.c
create mode 100644 arch/arm/mm/hugetlbpage.c
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 73067ef..d863781 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1767,6 +1767,10 @@ config HW_PERF_EVENTS
Enable hardware performance counter support for perf events. If
disabled, perf events will use software events only.
+config SYS_SUPPORTS_HUGETLBFS
+ def_bool y
+ depends on ARM_LPAE
+
source "mm/Kconfig"
config FORCE_MAX_ZONEORDER
diff --git a/arch/arm/include/asm/hugetlb-3level.h b/arch/arm/include/asm/hugetlb-3level.h
new file mode 100644
index 0000000..4868064
--- /dev/null
+++ b/arch/arm/include/asm/hugetlb-3level.h
@@ -0,0 +1,61 @@
+/*
+ * arch/arm/include/asm/hugetlb-3level.h
+ *
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * Based on arch/x86/include/asm/hugetlb.h.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#ifndef _ASM_ARM_HUGETLB_3LEVEL_H
+#define _ASM_ARM_HUGETLB_3LEVEL_H
+
+static inline pte_t huge_ptep_get(pte_t *ptep)
+{
+ return *ptep;
+}
+
+static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
+ pte_t *ptep, pte_t pte)
+{
+ set_pte_at(mm, addr, ptep, pte);
+}
+
+static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
+ unsigned long addr, pte_t *ptep)
+{
+ ptep_clear_flush(vma, addr, ptep);
+}
+
+static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
+ unsigned long addr, pte_t *ptep)
+{
+ ptep_set_wrprotect(mm, addr, ptep);
+}
+
+static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
+ unsigned long addr, pte_t *ptep)
+{
+ return ptep_get_and_clear(mm, addr, ptep);
+}
+
+static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
+ unsigned long addr, pte_t *ptep,
+ pte_t pte, int dirty)
+{
+ return ptep_set_access_flags(vma, addr, ptep, pte, dirty);
+}
+
+#endif /* _ASM_ARM_HUGETLB_3LEVEL_H */
diff --git a/arch/arm/include/asm/hugetlb.h b/arch/arm/include/asm/hugetlb.h
new file mode 100644
index 0000000..7af9cf6
--- /dev/null
+++ b/arch/arm/include/asm/hugetlb.h
@@ -0,0 +1,83 @@
+/*
+ * arch/arm/include/asm/hugetlb.h
+ *
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * Based on arch/x86/include/asm/hugetlb.h
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#ifndef _ASM_ARM_HUGETLB_H
+#define _ASM_ARM_HUGETLB_H
+
+#include <asm/page.h>
+
+#include <asm/hugetlb-3level.h>
+
+static inline void hugetlb_free_pgd_range(struct mmu_gather *tlb,
+ unsigned long addr, unsigned long end,
+ unsigned long floor,
+ unsigned long ceiling)
+{
+ free_pgd_range(tlb, addr, end, floor, ceiling);
+}
+
+
+static inline int is_hugepage_only_range(struct mm_struct *mm,
+ unsigned long addr, unsigned long len)
+{
+ return 0;
+}
+
+static inline int prepare_hugepage_range(struct file *file,
+ unsigned long addr, unsigned long len)
+{
+ struct hstate *h = hstate_file(file);
+ if (len & ~huge_page_mask(h))
+ return -EINVAL;
+ if (addr & ~huge_page_mask(h))
+ return -EINVAL;
+ return 0;
+}
+
+static inline void hugetlb_prefault_arch_hook(struct mm_struct *mm)
+{
+}
+
+static inline int huge_pte_none(pte_t pte)
+{
+ return pte_none(pte);
+}
+
+static inline pte_t huge_pte_wrprotect(pte_t pte)
+{
+ return pte_wrprotect(pte);
+}
+
+static inline int arch_prepare_hugepage(struct page *page)
+{
+ return 0;
+}
+
+static inline void arch_release_hugepage(struct page *page)
+{
+}
+
+static inline void arch_clear_hugepage_flags(struct page *page)
+{
+ clear_bit(PG_dcache_clean, &page->flags);
+}
+
+#endif /* _ASM_ARM_HUGETLB_H */
diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
index 0eaeb55..d086f61 100644
--- a/arch/arm/include/asm/pgtable-3level.h
+++ b/arch/arm/include/asm/pgtable-3level.h
@@ -62,6 +62,14 @@
#define USER_PTRS_PER_PGD (PAGE_OFFSET / PGDIR_SIZE)
/*
+ * Hugetlb definitions.
+ */
+#define HPAGE_SHIFT PMD_SHIFT
+#define HPAGE_SIZE (_AC(1, UL) << HPAGE_SHIFT)
+#define HPAGE_MASK (~(HPAGE_SIZE - 1))
+#define HUGETLB_PAGE_ORDER (HPAGE_SHIFT - PAGE_SHIFT)
+
+/*
* "Linux" PTE definitions for LPAE.
*
* These bits overlap with the hardware bits but the naming is preserved for
@@ -153,6 +161,11 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
#define set_pte_ext(ptep,pte,ext) cpu_set_pte_ext(ptep,__pte(pte_val(pte)|(ext)))
+#define pte_huge(pte) ((pte_val(pte) & PMD_TYPE_MASK) == PMD_TYPE_SECT)
+
+#define pte_mkhuge(pte) (__pte((pte_val(pte) & ~PMD_TYPE_MASK) | PMD_TYPE_SECT))
+
+
#endif /* __ASSEMBLY__ */
#endif /* _ASM_PGTABLE_3LEVEL_H */
diff --git a/arch/arm/mm/Makefile b/arch/arm/mm/Makefile
index 8a9c4cb..1560bbc 100644
--- a/arch/arm/mm/Makefile
+++ b/arch/arm/mm/Makefile
@@ -16,6 +16,8 @@ obj-$(CONFIG_MODULES) += proc-syms.o
obj-$(CONFIG_ALIGNMENT_TRAP) += alignment.o
obj-$(CONFIG_HIGHMEM) += highmem.o
+obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
+obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage-3level.o
obj-$(CONFIG_CPU_ABRT_NOMMU) += abort-nommu.o
obj-$(CONFIG_CPU_ABRT_EV4) += abort-ev4.o
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 477a2d2..3ced228 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -241,7 +241,7 @@ static void __dma_free_buffer(struct page *page, size_t size)
#ifdef CONFIG_MMU
#ifdef CONFIG_HUGETLB_PAGE
-#error ARM Coherent DMA allocator does not (yet) support huge TLB
+#warning ARM Coherent DMA allocator does not (yet) support huge TLB
#endif
static void *__alloc_from_contiguous(struct device *dev, size_t size,
diff --git a/arch/arm/mm/hugetlbpage-3level.c b/arch/arm/mm/hugetlbpage-3level.c
new file mode 100644
index 0000000..86474f0
--- /dev/null
+++ b/arch/arm/mm/hugetlbpage-3level.c
@@ -0,0 +1,190 @@
+/*
+ * arch/arm/mm/hugetlbpage-3level.c
+ *
+ * Copyright (C) 2002, Rohit Seth <rohit.seth@intel.com>
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * Based on arch/x86/mm/hugetlbpage.c
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#include <linux/init.h>
+#include <linux/fs.h>
+#include <linux/mm.h>
+#include <linux/hugetlb.h>
+#include <linux/pagemap.h>
+#include <linux/err.h>
+#include <linux/sysctl.h>
+#include <asm/mman.h>
+#include <asm/tlb.h>
+#include <asm/tlbflush.h>
+#include <asm/pgalloc.h>
+
+static unsigned long page_table_shareable(struct vm_area_struct *svma,
+ struct vm_area_struct *vma,
+ unsigned long addr, pgoff_t idx)
+{
+ unsigned long saddr = ((idx - svma->vm_pgoff) << PAGE_SHIFT) +
+ svma->vm_start;
+ unsigned long sbase = saddr & PUD_MASK;
+ unsigned long s_end = sbase + PUD_SIZE;
+
+ /* Allow segments to share if only one is marked locked */
+ unsigned long vm_flags = vma->vm_flags & ~VM_LOCKED;
+ unsigned long svm_flags = svma->vm_flags & ~VM_LOCKED;
+
+ /*
+ * match the virtual addresses, permission and the alignment of the
+ * page table page.
+ */
+ if (pmd_index(addr) != pmd_index(saddr) ||
+ vm_flags != svm_flags ||
+ sbase < svma->vm_start || svma->vm_end < s_end)
+ return 0;
+
+ return saddr;
+}
+
+static int vma_shareable(struct vm_area_struct *vma, unsigned long addr)
+{
+ unsigned long base = addr & PUD_MASK;
+ unsigned long end = base + PUD_SIZE;
+
+ /*
+ * check on proper vm_flags and page table alignment
+ */
+ if (vma->vm_flags & VM_MAYSHARE &&
+ vma->vm_start <= base && end <= vma->vm_end)
+ return 1;
+ return 0;
+}
+
+/*
+ * search for a shareable pmd page for hugetlb.
+ */
+static pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr,
+ pud_t *pud)
+{
+ struct vm_area_struct *vma = find_vma(mm, addr);
+ struct address_space *mapping = vma->vm_file->f_mapping;
+ pgoff_t idx = ((addr - vma->vm_start) >> PAGE_SHIFT) +
+ vma->vm_pgoff;
+ struct vm_area_struct *svma;
+ unsigned long saddr;
+ pte_t *spte = NULL;
+ pte_t *pte;
+
+ if (!vma_shareable(vma, addr))
+ return (pte_t *)pmd_alloc(mm, pud, addr);
+
+ mutex_lock(&mapping->i_mmap_mutex);
+ vma_interval_tree_foreach(svma, &mapping->i_mmap, idx, idx) {
+ if (svma == vma)
+ continue;
+
+ saddr = page_table_shareable(svma, vma, addr, idx);
+ if (saddr) {
+ spte = huge_pte_offset(svma->vm_mm, saddr);
+ if (spte) {
+ get_page(virt_to_page(spte));
+ break;
+ }
+ }
+ }
+
+ if (!spte)
+ goto out;
+
+ spin_lock(&mm->page_table_lock);
+ if (pud_none(*pud))
+ pud_populate(mm, pud, (pmd_t *)((unsigned long)spte & PAGE_MASK));
+ else
+ put_page(virt_to_page(spte));
+ spin_unlock(&mm->page_table_lock);
+out:
+ pte = (pte_t *)pmd_alloc(mm, pud, addr);
+ mutex_unlock(&mapping->i_mmap_mutex);
+ return pte;
+}
+
+/*
+ * unmap huge page backed by shared pte.
+ *
+ * Hugetlb pte page is ref counted at the time of mapping. If pte is shared
+ * indicated by page_count > 1, unmap is achieved by clearing pud and
+ * decrementing the ref count. If count == 1, the pte page is not shared.
+ *
+ * called with vma->vm_mm->page_table_lock held.
+ *
+ * returns: 1 successfully unmapped a shared pte page
+ * 0 the underlying pte page is not shared, or it is the last user
+ */
+int huge_pmd_unshare(struct mm_struct *mm, unsigned long *addr, pte_t *ptep)
+{
+ pgd_t *pgd = pgd_offset(mm, *addr);
+ pud_t *pud = pud_offset(pgd, *addr);
+
+ BUG_ON(page_count(virt_to_page(ptep)) == 0);
+ if (page_count(virt_to_page(ptep)) == 1)
+ return 0;
+
+ pud_clear(pud);
+ put_page(virt_to_page(ptep));
+ *addr = ALIGN(*addr, HPAGE_SIZE * PTRS_PER_PTE) - HPAGE_SIZE;
+ return 1;
+}
+
+pte_t *huge_pte_alloc(struct mm_struct *mm,
+ unsigned long addr, unsigned long sz)
+{
+ pgd_t *pgd;
+ pud_t *pud;
+ pte_t *pte = NULL;
+
+ pgd = pgd_offset(mm, addr);
+ pud = pud_alloc(mm, pgd, addr);
+ if (pud) {
+ BUG_ON(sz != PMD_SIZE);
+ if (pud_none(*pud))
+ pte = huge_pmd_share(mm, addr, pud);
+ else
+ pte = (pte_t *)pmd_alloc(mm, pud, addr);
+ }
+ BUG_ON(pte && !pte_none(*pte) && !pte_huge(*pte));
+
+ return pte;
+}
+
+struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
+ pmd_t *pmd, int write)
+{
+ struct page *page;
+
+ page = pte_page(*(pte_t *)pmd);
+ if (page)
+ page += ((address & ~PMD_MASK) >> PAGE_SHIFT);
+ return page;
+}
+
+struct page *follow_huge_pud(struct mm_struct *mm, unsigned long address,
+ pud_t *pud, int write)
+{
+ struct page *page;
+
+ page = pte_page(*(pte_t *)pud);
+ if (page)
+ page += ((address & ~PUD_MASK) >> PAGE_SHIFT);
+ return page;
+}
diff --git a/arch/arm/mm/hugetlbpage.c b/arch/arm/mm/hugetlbpage.c
new file mode 100644
index 0000000..32fe7fd
--- /dev/null
+++ b/arch/arm/mm/hugetlbpage.c
@@ -0,0 +1,65 @@
+/*
+ * arch/arm/mm/hugetlbpage.c
+ *
+ * Copyright (C) 2002, Rohit Seth <rohit.seth@intel.com>
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * Based on arch/x86/mm/hugetlbpage.c
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#include <linux/init.h>
+#include <linux/fs.h>
+#include <linux/mm.h>
+#include <linux/hugetlb.h>
+#include <linux/pagemap.h>
+#include <linux/err.h>
+#include <linux/sysctl.h>
+#include <asm/mman.h>
+#include <asm/tlb.h>
+#include <asm/tlbflush.h>
+#include <asm/pgalloc.h>
+
+pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
+{
+ pgd_t *pgd;
+ pud_t *pud;
+ pmd_t *pmd = NULL;
+
+ pgd = pgd_offset(mm, addr);
+ if (pgd_present(*pgd)) {
+ pud = pud_offset(pgd, addr);
+ if (pud_present(*pud))
+ pmd = pmd_offset(pud, addr);
+ }
+
+ return (pte_t *)pmd;
+}
+
+struct page *follow_huge_addr(struct mm_struct *mm, unsigned long address,
+ int write)
+{
+ return ERR_PTR(-EINVAL);
+}
+
+int pmd_huge(pmd_t pmd)
+{
+ return (pmd_val(pmd) & PMD_TYPE_MASK) == PMD_TYPE_SECT;
+}
+
+int pud_huge(pud_t pud)
+{
+ return 0;
+}
--
1.7.9.5
* [RFC PATCH 4/6] ARM: mm: HugeTLB support for non-LPAE systems.
From: Steve Capper @ 2012-10-18 16:15 UTC
To: linux-arm-kernel
This is based on Bill Carson's HugeTLB patch, with the big difference being in
the way PTEs are passed back to the memory manager. Rather than storing a
"Linux Huge PTE" separately, we make one up on the fly in huge_ptep_get. Also,
rather than considering 16M supersections, we focus solely on 2x1M sections.
To construct a huge PTE on the fly we need additional information (such as the
accessed flag and dirty bit), which we choose to store in the domain bits of the
short section descriptor. In order to use these domain bits for storage, we need
to make ourselves a client for all 16 domains; this is done in head.S.
Storing extra information in the domain bits also makes it a lot easier to
implement Transparent Huge Pages, and some of the code in pgtable-2level.h is
arranged to facilitate THP support in a later patch.
Non-LPAE HugeTLB pages are incompatible with the huge page migration code
(enabled when CONFIG_MEMORY_FAILURE is selected) as that code dereferences PTEs
directly, rather than calling huge_ptep_get and set_huge_pte_at.
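To spell out the domain access value loaded in head.S below (a sketch; the
kernel's domain_val() helpers express the same thing): the DACR holds a 2-bit
field per domain and "client" is encoding 0b01, so a client of all 16 domains
is the constant 0x55555555:

/* Sketch: building the DACR value used in head.S. */
#define DOMAIN_CLIENT	0x1	/* accesses checked against permission bits */

static unsigned int sketch_dacr_all_client(void)
{
	unsigned int val = 0;
	int d;

	for (d = 0; d < 16; d++)	/* 2 bits per domain */
		val |= DOMAIN_CLIENT << (2 * d);

	return val;	/* 0x55555555 */
}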
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Steve Capper <steve.capper@arm.com>
---
arch/arm/Kconfig | 2 +-
arch/arm/include/asm/hugetlb-2level.h | 71 ++++++++++++++++++++
arch/arm/include/asm/hugetlb.h | 4 ++
arch/arm/include/asm/pgtable-2level.h | 79 +++++++++++++++++++++-
arch/arm/include/asm/tlb.h | 10 ++-
arch/arm/kernel/head.S | 10 ++-
arch/arm/mm/Makefile | 4 ++
arch/arm/mm/fault.c | 6 +-
arch/arm/mm/hugetlbpage-2level.c | 115 +++++++++++++++++++++++++++++++++
9 files changed, 293 insertions(+), 8 deletions(-)
create mode 100644 arch/arm/include/asm/hugetlb-2level.h
create mode 100644 arch/arm/mm/hugetlbpage-2level.c
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index d863781..dd0a230 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1769,7 +1769,7 @@ config HW_PERF_EVENTS
config SYS_SUPPORTS_HUGETLBFS
def_bool y
- depends on ARM_LPAE
+ depends on ARM_LPAE || (!CPU_USE_DOMAINS && !MEMORY_FAILURE)
source "mm/Kconfig"
diff --git a/arch/arm/include/asm/hugetlb-2level.h b/arch/arm/include/asm/hugetlb-2level.h
new file mode 100644
index 0000000..3532b54
--- /dev/null
+++ b/arch/arm/include/asm/hugetlb-2level.h
@@ -0,0 +1,71 @@
+/*
+ * arch/arm/include/asm/hugetlb-2level.h
+ *
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * Based on arch/x86/include/asm/hugetlb.h and Bill Carson's patches
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#ifndef _ASM_ARM_HUGETLB_2LEVEL_H
+#define _ASM_ARM_HUGETLB_2LEVEL_H
+
+
+pte_t huge_ptep_get(pte_t *ptep);
+
+void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
+ pte_t *ptep, pte_t pte);
+
+static inline pte_t pte_mkhuge(pte_t pte) { return pte; }
+
+static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
+ unsigned long addr, pte_t *ptep)
+{
+ flush_tlb_range(vma, addr, addr + HPAGE_SIZE);
+}
+
+static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
+ unsigned long addr, pte_t *ptep)
+{
+ pmd_t *pmdp = (pmd_t *) ptep;
+ set_pmd_at(mm, addr, pmdp, pmd_wrprotect(*pmdp));
+}
+
+
+static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
+ unsigned long addr, pte_t *ptep)
+{
+ pmd_t *pmdp = (pmd_t *)ptep;
+ pte_t pte = huge_ptep_get(ptep);
+ pmd_clear(pmdp);
+
+ return pte;
+}
+
+static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
+ unsigned long addr, pte_t *ptep,
+ pte_t pte, int dirty)
+{
+ int changed = !pte_same(huge_ptep_get(ptep), pte);
+
+ if (changed) {
+ set_huge_pte_at(vma->vm_mm, addr, ptep, pte);
+ huge_ptep_clear_flush(vma, addr, &pte);
+ }
+
+ return changed;
+}
+
+#endif /* _ASM_ARM_HUGETLB_2LEVEL_H */
diff --git a/arch/arm/include/asm/hugetlb.h b/arch/arm/include/asm/hugetlb.h
index 7af9cf6..1e92975 100644
--- a/arch/arm/include/asm/hugetlb.h
+++ b/arch/arm/include/asm/hugetlb.h
@@ -24,7 +24,11 @@
#include <asm/page.h>
+#ifdef CONFIG_ARM_LPAE
#include <asm/hugetlb-3level.h>
+#else
+#include <asm/hugetlb-2level.h>
+#endif
static inline void hugetlb_free_pgd_range(struct mmu_gather *tlb,
unsigned long addr, unsigned long end,
diff --git a/arch/arm/include/asm/pgtable-2level.h b/arch/arm/include/asm/pgtable-2level.h
index 662a00e..fd1d9be 100644
--- a/arch/arm/include/asm/pgtable-2level.h
+++ b/arch/arm/include/asm/pgtable-2level.h
@@ -163,7 +163,7 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
return (pmd_t *)pud;
}
-#define pmd_bad(pmd) (pmd_val(pmd) & 2)
+#define pmd_bad(pmd) ((pmd_val(pmd) & PMD_TYPE_MASK) == PMD_TYPE_FAULT)
#define copy_pmd(pmdpd,pmdps) \
do { \
@@ -184,6 +184,83 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
#define set_pte_ext(ptep,pte,ext) cpu_set_pte_ext(ptep,pte,ext)
+
+#ifdef CONFIG_SYS_SUPPORTS_HUGETLBFS
+
+/*
+ * now follows some of the definitions to allow huge page support, we can't put
+ * these in the hugetlb source files as they are also required for transparent
+ * hugepage support.
+ */
+
+#define HPAGE_SHIFT PMD_SHIFT
+#define HPAGE_SIZE (_AC(1, UL) << HPAGE_SHIFT)
+#define HPAGE_MASK (~(HPAGE_SIZE - 1))
+#define HUGETLB_PAGE_ORDER (HPAGE_SHIFT - PAGE_SHIFT)
+
+#define HUGE_LINUX_PTE_COUNT (PAGE_OFFSET >> HPAGE_SHIFT)
+#define HUGE_LINUX_PTE_SIZE (HUGE_LINUX_PTE_COUNT * sizeof(pte_t *))
+#define HUGE_LINUX_PTE_INDEX(addr) (addr >> HPAGE_SHIFT)
+
+/*
+ * We re-purpose the following domain bits in the section descriptor
+ */
+#define PMD_DSECT_DIRTY (_AT(pmdval_t, 1) << 5)
+#define PMD_DSECT_AF (_AT(pmdval_t, 1) << 6)
+
+#define PMD_BIT_FUNC(fn,op) \
+static inline pmd_t pmd_##fn(pmd_t pmd) { pmd_val(pmd) op; return pmd; }
+
+PMD_BIT_FUNC(wrprotect, &= ~PMD_SECT_AP_WRITE);
+
+static inline void set_pmd_at(struct mm_struct *mm, unsigned long addr,
+ pmd_t *pmdp, pmd_t pmd)
+{
+ /*
+ * we can sometimes be passed a pmd pointing to a level 2 descriptor
+ * from collapse_huge_page.
+ */
+ if ((pmd_val(pmd) & PMD_TYPE_MASK) == PMD_TYPE_TABLE) {
+ pmdp[0] = __pmd(pmd_val(pmd));
+ pmdp[1] = __pmd(pmd_val(pmd) + 256 * sizeof(pte_t));
+ } else {
+ pmdp[0] = __pmd(pmd_val(pmd)); /* first 1M section */
+ pmdp[1] = __pmd(pmd_val(pmd) + SECTION_SIZE); /* second 1M section */
+ }
+
+ flush_pmd_entry(pmdp);
+}
+
+#define HPMD_XLATE(res, cmp, from, to) do { if (cmp & from) res |= to; \
+ else res &= ~to; \
+ } while (0)
+
+static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
+{
+ pmdval_t pmdval = pmd_val(pmd);
+ pteval_t newprotval = pgprot_val(newprot);
+
+ HPMD_XLATE(pmdval, newprotval, L_PTE_XN, PMD_SECT_XN);
+ HPMD_XLATE(pmdval, newprotval, L_PTE_SHARED, PMD_SECT_S);
+ HPMD_XLATE(pmdval, newprotval, L_PTE_YOUNG, PMD_DSECT_AF);
+ HPMD_XLATE(pmdval, newprotval, L_PTE_DIRTY, PMD_DSECT_DIRTY);
+
+ /* preserve bits C & B */
+ pmdval |= (newprotval & (3 << 2));
+
+ /* Linux PTE bit 4 corresponds to PMD TEX bit 0 */
+ HPMD_XLATE(pmdval, newprotval, 1 << 4, PMD_SECT_TEX(1));
+
+ if (newprotval & L_PTE_RDONLY)
+ pmdval &= ~PMD_SECT_AP_WRITE;
+ else
+ pmdval |= PMD_SECT_AP_WRITE;
+
+ return __pmd(pmdval);
+}
+
+#endif /* CONFIG_SYS_SUPPORTS_HUGETLBFS */
+
#endif /* __ASSEMBLY__ */
#endif /* _ASM_PGTABLE_2LEVEL_H */
diff --git a/arch/arm/include/asm/tlb.h b/arch/arm/include/asm/tlb.h
index 99a1951..685e9e87 100644
--- a/arch/arm/include/asm/tlb.h
+++ b/arch/arm/include/asm/tlb.h
@@ -92,10 +92,16 @@ static inline void tlb_flush(struct mmu_gather *tlb)
static inline void tlb_add_flush(struct mmu_gather *tlb, unsigned long addr)
{
if (!tlb->fullmm) {
+ unsigned long size = PAGE_SIZE;
+
if (addr < tlb->range_start)
tlb->range_start = addr;
- if (addr + PAGE_SIZE > tlb->range_end)
- tlb->range_end = addr + PAGE_SIZE;
+
+ if (tlb->vma && is_vm_hugetlb_page(tlb->vma))
+ size = HPAGE_SIZE;
+
+ if (addr + size > tlb->range_end)
+ tlb->range_end = addr + size;
}
}
diff --git a/arch/arm/kernel/head.S b/arch/arm/kernel/head.S
index 4eee351..860f08e 100644
--- a/arch/arm/kernel/head.S
+++ b/arch/arm/kernel/head.S
@@ -410,13 +410,21 @@ __enable_mmu:
mov r5, #0
mcrr p15, 0, r4, r5, c2 @ load TTBR0
#else
+#ifndef CONFIG_SYS_SUPPORTS_HUGETLBFS
mov r5, #(domain_val(DOMAIN_USER, DOMAIN_MANAGER) | \
domain_val(DOMAIN_KERNEL, DOMAIN_MANAGER) | \
domain_val(DOMAIN_TABLE, DOMAIN_MANAGER) | \
domain_val(DOMAIN_IO, DOMAIN_CLIENT))
+#else
+ @ set ourselves as the client in all domains
+ @ this allows us to then use the 4 domain bits in the
+ @ section descriptors in our transparent huge pages
+ ldr r5, =0x55555555
+#endif /* CONFIG_SYS_SUPPORTS_HUGETLBFS */
+
mcr p15, 0, r5, c3, c0, 0 @ load domain access register
mcr p15, 0, r4, c2, c0, 0 @ load page table pointer
-#endif
+#endif /* CONFIG_ARM_LPAE */
b __turn_mmu_on
ENDPROC(__enable_mmu)
diff --git a/arch/arm/mm/Makefile b/arch/arm/mm/Makefile
index 1560bbc..adf0b19 100644
--- a/arch/arm/mm/Makefile
+++ b/arch/arm/mm/Makefile
@@ -17,7 +17,11 @@ obj-$(CONFIG_MODULES) += proc-syms.o
obj-$(CONFIG_ALIGNMENT_TRAP) += alignment.o
obj-$(CONFIG_HIGHMEM) += highmem.o
obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
+ifeq ($(CONFIG_ARM_LPAE),y)
obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage-3level.o
+else
+obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage-2level.o
+endif
obj-$(CONFIG_CPU_ABRT_NOMMU) += abort-nommu.o
obj-$(CONFIG_CPU_ABRT_EV4) += abort-ev4.o
diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index 5dbf13f..0884936 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -488,13 +488,13 @@ do_translation_fault(unsigned long addr, unsigned int fsr,
#endif /* CONFIG_MMU */
/*
- * Some section permission faults need to be handled gracefully.
- * They can happen due to a __{get,put}_user during an oops.
+ * A fault in a section will likely be due to a huge page, treat it
+ * as a page fault.
*/
static int
do_sect_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
{
- do_bad_area(addr, fsr, regs);
+ do_page_fault(addr, fsr, regs);
return 0;
}
diff --git a/arch/arm/mm/hugetlbpage-2level.c b/arch/arm/mm/hugetlbpage-2level.c
new file mode 100644
index 0000000..4b2b38c
--- /dev/null
+++ b/arch/arm/mm/hugetlbpage-2level.c
@@ -0,0 +1,115 @@
+/*
+ * arch/arm/mm/hugetlbpage-2level.c
+ *
+ * Copyright (C) 2002, Rohit Seth <rohit.seth@intel.com>
+ * Copyright (C) 2012 ARM Ltd
+ * Copyright (C) 2012 Bill Carson.
+ *
+ * Based on arch/x86/include/asm/hugetlb.h and Bill Carson's patches
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#include <linux/init.h>
+#include <linux/fs.h>
+#include <linux/mm.h>
+#include <linux/hugetlb.h>
+#include <linux/pagemap.h>
+#include <linux/err.h>
+#include <linux/sysctl.h>
+#include <asm/mman.h>
+#include <asm/tlb.h>
+#include <asm/tlbflush.h>
+#include <asm/pgalloc.h>
+
+int huge_pmd_unshare(struct mm_struct *mm, unsigned long *addr, pte_t *ptep)
+{
+ return 0;
+}
+
+pte_t *huge_pte_alloc(struct mm_struct *mm,
+ unsigned long addr, unsigned long sz)
+{
+ pgd_t *pgd;
+ pud_t *pud;
+ pmd_t *pmd;
+
+ pgd = pgd_offset(mm, addr);
+ pud = pud_offset(pgd, addr);
+ pmd = pmd_offset(pud, addr);
+
+ return (pte_t *)pmd; /* our huge pte is actually a pmd */
+}
+
+struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
+ pmd_t *pmd, int write)
+{
+ struct page *page;
+ unsigned long pfn;
+
+ BUG_ON((pmd_val(*pmd) & PMD_TYPE_MASK) != PMD_TYPE_SECT);
+ pfn = ((pmd_val(*pmd) & HPAGE_MASK) >> PAGE_SHIFT);
+ page = pfn_to_page(pfn);
+ return page;
+}
+
+pte_t huge_ptep_get(pte_t *ptep)
+{
+ pmd_t *pmdp = (pmd_t*)ptep;
+ pmdval_t pmdval = pmd_val(*pmdp);
+ pteval_t retval;
+
+ if (!pmdval)
+ return __pte(0);
+
+ retval = (pteval_t) (pmdval & HPAGE_MASK);
+ HPMD_XLATE(retval, pmdval, PMD_SECT_XN, L_PTE_XN);
+ HPMD_XLATE(retval, pmdval, PMD_SECT_S, L_PTE_SHARED);
+ HPMD_XLATE(retval, pmdval, PMD_DSECT_AF, L_PTE_YOUNG);
+ HPMD_XLATE(retval, pmdval, PMD_DSECT_DIRTY, L_PTE_DIRTY);
+
+ /* preserve bits C & B */
+ retval |= (pmdval & (3 << 2));
+
+ /* PMD TEX bit 0 corresponds to Linux PTE bit 4 */
+ HPMD_XLATE(retval, pmdval, PMD_SECT_TEX(1), 1 << 4);
+
+ if (pmdval & PMD_SECT_AP_WRITE)
+ retval &= ~L_PTE_RDONLY;
+ else
+ retval |= L_PTE_RDONLY;
+
+ if ((pmdval & PMD_TYPE_MASK) == PMD_TYPE_SECT)
+ retval |= L_PTE_VALID;
+
+ /* we assume all hugetlb pages are user */
+ retval |= L_PTE_USER;
+
+ return __pte(retval);
+}
+
+void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
+ pte_t *ptep, pte_t pte)
+{
+ pmdval_t pmdval = (pmdval_t) pte_val(pte);
+ pmd_t *pmdp = (pmd_t*) ptep;
+
+ pmdval &= HPAGE_MASK;
+ pmdval |= PMD_SECT_AP_READ | PMD_SECT_nG | PMD_TYPE_SECT;
+ pmdval = pmd_val(pmd_modify(__pmd(pmdval), __pgprot(pte_val(pte))));
+
+ __sync_icache_dcache(pte);
+
+ set_pmd_at(mm, addr, pmdp, __pmd(pmdval));
+}
--
1.7.9.5
* [RFC PATCH 5/6] ARM: mm: Transparent huge page support for LPAE systems.
From: Steve Capper @ 2012-10-18 16:15 UTC
To: linux-arm-kernel
From: Catalin Marinas <catalin.marinas@arm.com>
The patch adds support for THP (transparent huge pages) to LPAE systems. When
this feature is enabled, the kernel tries to map anonymous pages as 2MB
sections where possible.
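For completeness, a userspace sketch (an illustration only, not from this
series: it assumes THP is enabled at least in "madvise" mode) of how an
application hints that a region should be backed by these 2MB sections:

#define _GNU_SOURCE
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

#define SZ_2M (2UL * 1024 * 1024)

int main(void)
{
	void *p;

	/* A 2MB-aligned anonymous region is a candidate for THP. */
	if (posix_memalign(&p, SZ_2M, 4 * SZ_2M))
		return EXIT_FAILURE;

	madvise(p, 4 * SZ_2M, MADV_HUGEPAGE);	/* a hint, not a guarantee */
	memset(p, 0, 4 * SZ_2M);		/* fault the region in */

	free(p);
	return 0;
}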
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
[steve.capper@arm.com: symbolic constants used, value of PMD_SECT_SPLITTING
adjusted, tlbflush.h included in pgtable.h]
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Steve Capper <steve.capper@arm.com>
---
arch/arm/Kconfig | 4 ++
arch/arm/include/asm/pgtable-2level.h | 2 +
arch/arm/include/asm/pgtable-3level-hwdef.h | 2 +
arch/arm/include/asm/pgtable-3level.h | 57 +++++++++++++++++++++++++++
arch/arm/include/asm/pgtable.h | 4 +-
arch/arm/include/asm/tlb.h | 6 +++
arch/arm/include/asm/tlbflush.h | 2 +
arch/arm/mm/fsr-3level.c | 2 +-
8 files changed, 77 insertions(+), 2 deletions(-)
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index dd0a230..9621d5f 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1771,6 +1771,10 @@ config SYS_SUPPORTS_HUGETLBFS
def_bool y
depends on ARM_LPAE || (!CPU_USE_DOMAINS && !MEMORY_FAILURE)
+config HAVE_ARCH_TRANSPARENT_HUGEPAGE
+ def_bool y
+ depends on ARM_LPAE
+
source "mm/Kconfig"
config FORCE_MAX_ZONEORDER
diff --git a/arch/arm/include/asm/pgtable-2level.h b/arch/arm/include/asm/pgtable-2level.h
index fd1d9be..34f4775 100644
--- a/arch/arm/include/asm/pgtable-2level.h
+++ b/arch/arm/include/asm/pgtable-2level.h
@@ -182,6 +182,8 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
/* we don't need complex calculations here as the pmd is folded into the pgd */
#define pmd_addr_end(addr,end) (end)
+#define pmd_present(pmd) ((pmd_val(pmd) & PMD_TYPE_MASK) != PMD_TYPE_FAULT)
+
#define set_pte_ext(ptep,pte,ext) cpu_set_pte_ext(ptep,pte,ext)
diff --git a/arch/arm/include/asm/pgtable-3level-hwdef.h b/arch/arm/include/asm/pgtable-3level-hwdef.h
index d795282..53c7f67 100644
--- a/arch/arm/include/asm/pgtable-3level-hwdef.h
+++ b/arch/arm/include/asm/pgtable-3level-hwdef.h
@@ -38,6 +38,8 @@
*/
#define PMD_SECT_BUFFERABLE (_AT(pmdval_t, 1) << 2)
#define PMD_SECT_CACHEABLE (_AT(pmdval_t, 1) << 3)
+#define PMD_SECT_USER (_AT(pmdval_t, 1) << 6) /* AP[1] */
+#define PMD_SECT_RDONLY (_AT(pmdval_t, 1) << 7) /* AP[2] */
#define PMD_SECT_S (_AT(pmdval_t, 3) << 8)
#define PMD_SECT_AF (_AT(pmdval_t, 1) << 10)
#define PMD_SECT_nG (_AT(pmdval_t, 1) << 11)
diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
index d086f61..31c071f 100644
--- a/arch/arm/include/asm/pgtable-3level.h
+++ b/arch/arm/include/asm/pgtable-3level.h
@@ -85,6 +85,9 @@
#define L_PTE_DIRTY (_AT(pteval_t, 1) << 55) /* unused */
#define L_PTE_SPECIAL (_AT(pteval_t, 1) << 56) /* unused */
+#define PMD_SECT_DIRTY (_AT(pmdval_t, 1) << 55)
+#define PMD_SECT_SPLITTING (_AT(pmdval_t, 1) << 57)
+
/*
* To be used in assembly code with the upper page attributes.
*/
@@ -166,6 +169,60 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
#define pte_mkhuge(pte) (__pte((pte_val(pte) & ~PMD_TYPE_MASK) | PMD_TYPE_SECT))
+#define pmd_present(pmd) ((pmd_val(pmd) & PMD_TYPE_MASK) != PMD_TYPE_FAULT)
+#define pmd_young(pmd) (pmd_val(pmd) & PMD_SECT_AF)
+
+#define __HAVE_ARCH_PMD_WRITE
+#define pmd_write(pmd) (!(pmd_val(pmd) & PMD_SECT_RDONLY))
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#define pmd_trans_huge(pmd) ((pmd_val(pmd) & PMD_TYPE_MASK) == PMD_TYPE_SECT)
+#define pmd_trans_splitting(pmd) (pmd_val(pmd) & PMD_SECT_SPLITTING)
+#endif
+
+#define PMD_BIT_FUNC(fn,op) \
+static inline pmd_t pmd_##fn(pmd_t pmd) { pmd_val(pmd) op; return pmd; }
+
+PMD_BIT_FUNC(wrprotect, |= PMD_SECT_RDONLY);
+PMD_BIT_FUNC(mkold, &= ~PMD_SECT_AF);
+PMD_BIT_FUNC(mksplitting, |= PMD_SECT_SPLITTING);
+PMD_BIT_FUNC(mkwrite, &= ~PMD_SECT_RDONLY);
+PMD_BIT_FUNC(mkdirty, |= PMD_SECT_DIRTY);
+PMD_BIT_FUNC(mkyoung, |= PMD_SECT_AF);
+PMD_BIT_FUNC(mknotpresent, &= ~PMD_TYPE_MASK);
+
+#define pmd_mkhuge(pmd) (__pmd((pmd_val(pmd) & ~PMD_TYPE_MASK) | PMD_TYPE_SECT))
+
+#define pmd_pfn(pmd) ((pmd_val(pmd) & PHYS_MASK) >> PAGE_SHIFT)
+#define pfn_pmd(pfn,prot) (__pmd(((phys_addr_t)(pfn) << PAGE_SHIFT) | pgprot_val(prot)))
+#define mk_pmd(page,prot) pfn_pmd(page_to_pfn(page),prot)
+
+static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
+{
+ const pmdval_t mask = PMD_SECT_USER | PMD_SECT_XN | PMD_SECT_RDONLY;
+ pmd_val(pmd) = (pmd_val(pmd) & ~mask) | (pgprot_val(newprot) & mask);
+ return pmd;
+}
+
+static inline void set_pmd(pmd_t *pmdp, pmd_t pmd)
+{
+ *pmdp = pmd;
+}
+
+static inline void set_pmd_at(struct mm_struct *mm, unsigned long addr,
+ pmd_t *pmdp, pmd_t pmd)
+{
+ BUG_ON(addr >= TASK_SIZE);
+ pmd = __pmd(pmd_val(pmd) | PMD_SECT_nG);
+ set_pmd(pmdp, pmd);
+ flush_pmd_entry(pmdp);
+}
+
+static inline int has_transparent_hugepage(void)
+{
+ return 1;
+}
+
#endif /* __ASSEMBLY__ */
#endif /* _ASM_PGTABLE_3LEVEL_H */
diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index c35bf46..767aa7c 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -24,6 +24,9 @@
#include <asm/memory.h>
#include <asm/pgtable-hwdef.h>
+
+#include <asm/tlbflush.h>
+
#ifdef CONFIG_ARM_LPAE
#include <asm/pgtable-3level.h>
#else
@@ -163,7 +166,6 @@ extern pgd_t swapper_pg_dir[PTRS_PER_PGD];
#define pgd_offset_k(addr) pgd_offset(&init_mm, addr)
#define pmd_none(pmd) (!pmd_val(pmd))
-#define pmd_present(pmd) (pmd_val(pmd))
static inline pte_t *pmd_page_vaddr(pmd_t pmd)
{
diff --git a/arch/arm/include/asm/tlb.h b/arch/arm/include/asm/tlb.h
index 685e9e87..0fc2d9d 100644
--- a/arch/arm/include/asm/tlb.h
+++ b/arch/arm/include/asm/tlb.h
@@ -229,6 +229,12 @@ static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp,
#endif
}
+static inline void
+tlb_remove_pmd_tlb_entry(struct mmu_gather *tlb, pmd_t *pmdp, unsigned long addr)
+{
+ tlb_add_flush(tlb, addr);
+}
+
#define pte_free_tlb(tlb, ptep, addr) __pte_free_tlb(tlb, ptep, addr)
#define pmd_free_tlb(tlb, pmdp, addr) __pmd_free_tlb(tlb, pmdp, addr)
#define pud_free_tlb(tlb, pudp, addr) pud_free((tlb)->mm, pudp)
diff --git a/arch/arm/include/asm/tlbflush.h b/arch/arm/include/asm/tlbflush.h
index 6e924d3..907cede 100644
--- a/arch/arm/include/asm/tlbflush.h
+++ b/arch/arm/include/asm/tlbflush.h
@@ -505,6 +505,8 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,
}
#endif
+#define update_mmu_cache_pmd(vma, address, pmd) do { } while (0)
+
#endif
#endif /* CONFIG_MMU */
diff --git a/arch/arm/mm/fsr-3level.c b/arch/arm/mm/fsr-3level.c
index 05a4e94..47f4c6f 100644
--- a/arch/arm/mm/fsr-3level.c
+++ b/arch/arm/mm/fsr-3level.c
@@ -9,7 +9,7 @@ static struct fsr_info fsr_info[] = {
{ do_page_fault, SIGSEGV, SEGV_MAPERR, "level 3 translation fault" },
{ do_bad, SIGBUS, 0, "reserved access flag fault" },
{ do_bad, SIGSEGV, SEGV_ACCERR, "level 1 access flag fault" },
- { do_bad, SIGSEGV, SEGV_ACCERR, "level 2 access flag fault" },
+ { do_page_fault, SIGSEGV, SEGV_ACCERR, "level 2 access flag fault" },
{ do_page_fault, SIGSEGV, SEGV_ACCERR, "level 3 access flag fault" },
{ do_bad, SIGBUS, 0, "reserved permission fault" },
{ do_bad, SIGSEGV, SEGV_ACCERR, "level 1 permission fault" },
--
1.7.9.5
* [RFC PATCH 6/6] ARM: mm: Transparent huge page support for non-LPAE systems.
From: Steve Capper @ 2012-10-18 16:15 UTC
To: linux-arm-kernel
Much of the required code for THP has been implemented in the earlier non-LPAE
HugeTLB patch.
One more domain bit is used (to store whether or not the THP is splitting).
Some THP helper functions are defined, and we have to redefine pmd_page such
that it distinguishes between page tables and sections.
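The distinction pmd_page has to make turns on the descriptor type field (a
sketch, using the standard short-descriptor encodings):

/* Sketch: bits [1:0] of a first-level descriptor give its type. */
static inline int sketch_pmd_is_huge_section(pmd_t pmd)
{
	/* 0b01 = pointer to a page table, 0b10 = (huge) section */
	return (pmd_val(pmd) & PMD_TYPE_MASK) == PMD_TYPE_SECT;
}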
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Steve Capper <steve.capper@arm.com>
---
arch/arm/Kconfig | 2 +-
arch/arm/include/asm/pgtable-2level.h | 68 ++++++++++++++++++++++++++++++++-
arch/arm/include/asm/pgtable-3level.h | 2 +
arch/arm/include/asm/pgtable.h | 7 +++-
4 files changed, 75 insertions(+), 4 deletions(-)
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 9621d5f..d459673 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1773,7 +1773,7 @@ config SYS_SUPPORTS_HUGETLBFS
config HAVE_ARCH_TRANSPARENT_HUGEPAGE
def_bool y
- depends on ARM_LPAE
+ depends on SYS_SUPPORTS_HUGETLBFS
source "mm/Kconfig"
diff --git a/arch/arm/include/asm/pgtable-2level.h b/arch/arm/include/asm/pgtable-2level.h
index 34f4775..67eabb4 100644
--- a/arch/arm/include/asm/pgtable-2level.h
+++ b/arch/arm/include/asm/pgtable-2level.h
@@ -179,6 +179,13 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
clean_pmd_entry(pmdp); \
} while (0)
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#define _PMD_HUGE(pmd) ((pmd_val(pmd) & PMD_TYPE_MASK) == PMD_TYPE_SECT)
+#else
+#define _PMD_HUGE(pmd) (0)
+#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+
/* we don't need complex calculations here as the pmd is folded into the pgd */
#define pmd_addr_end(addr,end) (end)
@@ -197,7 +204,6 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
#define HPAGE_SHIFT PMD_SHIFT
#define HPAGE_SIZE (_AC(1, UL) << HPAGE_SHIFT)
-#define HPAGE_MASK (~(HPAGE_SIZE - 1))
#define HUGETLB_PAGE_ORDER (HPAGE_SHIFT - PAGE_SHIFT)
#define HUGE_LINUX_PTE_COUNT (PAGE_OFFSET >> HPAGE_SHIFT)
@@ -209,6 +215,7 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
*/
#define PMD_DSECT_DIRTY (_AT(pmdval_t, 1) << 5)
#define PMD_DSECT_AF (_AT(pmdval_t, 1) << 6)
+#define PMD_DSECT_SPLITTING (_AT(pmdval_t, 1) << 7)
#define PMD_BIT_FUNC(fn,op) \
static inline pmd_t pmd_##fn(pmd_t pmd) { pmd_val(pmd) op; return pmd; }
@@ -261,8 +268,67 @@ static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
return __pmd(pmdval);
}
+#else
+#define HPAGE_SIZE 0
#endif /* CONFIG_SYS_SUPPORTS_HUGETLBFS */
+#define HPAGE_MASK (~(HPAGE_SIZE - 1))
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#define pmd_mkhuge(pmd) (__pmd((pmd_val(pmd) & ~PMD_TYPE_MASK) | PMD_TYPE_SECT))
+
+PMD_BIT_FUNC(mkold, &= ~PMD_DSECT_AF);
+PMD_BIT_FUNC(mksplitting, |= PMD_DSECT_SPLITTING);
+PMD_BIT_FUNC(mkdirty, |= PMD_DSECT_DIRTY);
+PMD_BIT_FUNC(mkyoung, |= PMD_DSECT_AF);
+PMD_BIT_FUNC(mkwrite, |= PMD_SECT_AP_WRITE);
+PMD_BIT_FUNC(mknotpresent, &= ~PMD_TYPE_MASK);
+
+#define pmd_trans_splitting(pmd) (pmd_val(pmd) & PMD_DSECT_SPLITTING)
+#define pmd_young(pmd) (pmd_val(pmd) & PMD_DSECT_AF)
+#define pmd_write(pmd) (pmd_val(pmd) & PMD_SECT_AP_WRITE)
+#define pmd_trans_huge(pmd) ((pmd_val(pmd) & PMD_TYPE_MASK) == PMD_TYPE_SECT)
+
+static inline unsigned long pmd_pfn(pmd_t pmd)
+{
+ /*
+ * for a section, we need to mask off more of the pmd
+ * before looking up the pfn
+ */
+ if ((pmd_val(pmd) & PMD_TYPE_MASK) == PMD_TYPE_SECT)
+ return __phys_to_pfn(pmd_val(pmd) & HPAGE_MASK);
+ else
+ return __phys_to_pfn(pmd_val(pmd) & PHYS_MASK);
+}
+
+static inline pmd_t pfn_pmd(unsigned long pfn, pgprot_t prot)
+{
+ pmd_t pmd = __pmd(__pfn_to_phys(pfn) | PMD_SECT_AP_READ | PMD_SECT_nG);
+
+ return pmd_modify(pmd, prot);
+}
+
+#define mk_pmd(page,prot) pfn_pmd(page_to_pfn(page),prot);
+
+static inline int has_transparent_hugepage(void)
+{
+ return 1;
+}
+
+#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+
+static inline struct page *pmd_page(pmd_t pmd)
+{
+ /*
+ * for a section, we need to mask off more of the pmd
+ * before looking up the page as it is a section descriptor.
+ */
+ if (_PMD_HUGE(pmd))
+ return phys_to_page(pmd_val(pmd) & HPAGE_MASK);
+
+ return phys_to_page(pmd_val(pmd) & PHYS_MASK);
+}
+
#endif /* __ASSEMBLY__ */
#endif /* _ASM_PGTABLE_2LEVEL_H */
diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
index 31c071f..8360814 100644
--- a/arch/arm/include/asm/pgtable-3level.h
+++ b/arch/arm/include/asm/pgtable-3level.h
@@ -197,6 +197,8 @@ PMD_BIT_FUNC(mknotpresent, &= ~PMD_TYPE_MASK);
#define pfn_pmd(pfn,prot) (__pmd(((phys_addr_t)(pfn) << PAGE_SHIFT) | pgprot_val(prot)))
#define mk_pmd(page,prot) pfn_pmd(page_to_pfn(page),prot)
+#define pmd_page(pmd) pfn_to_page(__phys_to_pfn(pmd_val(pmd) & PHYS_MASK))
+
static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
{
const pmdval_t mask = PMD_SECT_USER | PMD_SECT_XN | PMD_SECT_RDONLY;
diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index 767aa7c..2d96381 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -169,11 +169,14 @@ extern pgd_t swapper_pg_dir[PTRS_PER_PGD];
static inline pte_t *pmd_page_vaddr(pmd_t pmd)
{
+#ifdef SYS_SUPPORTS_HUGETLBFS
+ if ((pmd_val(pmd) & PMD_TYPE_MASK) == PMD_TYPE_SECT)
+ return __va(pmd_val(pmd) & HPAGE_MASK);
+#endif
+
return __va(pmd_val(pmd) & PHYS_MASK & (s32)PAGE_MASK);
}
-#define pmd_page(pmd) pfn_to_page(__phys_to_pfn(pmd_val(pmd) & PHYS_MASK))
-
#ifndef CONFIG_HIGHPTE
#define __pte_map(pmd) pmd_page_vaddr(*(pmd))
#define __pte_unmap(pte) do { } while (0)
--
1.7.9.5
* [RFC PATCH 0/6] ARM: mm: HugeTLB + THP support.
From: Gregory CLEMENT @ 2012-12-21 13:41 UTC
To: linux-arm-kernel
On 10/18/2012 06:15 PM, Steve Capper wrote:
> Hello,
Hi Steve,
I am resurrecting this old thread to ask for news on this subject.
We are interested in getting this support for the Armada 370/XP based boards.
I am surprised that there wasn't any feedback on this series. We are
willing to help get this feature merged into the kernel.
Would testing your series on the boards we have help you?
Thanks
> The following patches bring both HugeTLB support and Transparent HugePage (THP)
> support to ARM.
>
> Both short descriptors (non-LPAE) and long descriptors (LPAE) are supported.
>
> The non-LPAE HugeTLB code is based on patches by Bill Carson [1], but instead of
> allocating extra memory to store "Linux PTEs", it re-purposes the domain bits
> of section descriptors and constructs huge Linux PTEs on demand.
>
> As PMDs are walked directly by the kernel THP functions (there are no
> huge_pmd_offset-style functions), any "Linux PMD"/"hardware PMD" distinction
> would require some re-working of the ARM PMD/PTE code. Use of the domain bits
> allows for a more straightforward THP implementation.
>
> Some general HugeTLB code relating to huge page migration on memory failure
> (CONFIG_MEMORY_FAILURE) dereferences huge pte_t pointers directly rather than using
> the huge_ptep_get and set_huge_pte_at functions. Thus this config option is
> incompatible with non-LPAE hugepages. At the moment I can only see x86 using
> CONFIG_MEMORY_FAILURE though.
>
> For non-LPAE, I make an assumption about how the memory type is mapped between
> the Linux PTE and the section descriptor. Ideally I would like to look this
> information up, possibly from get_mem_type(MT_MEMORY). Comments on an elegant way of
> achieving this are welcome.
>
> Non-LPAE code was tested on a Versatile Express (V2P-CA15_A7 Cortex A15 tile),
> Tegra 2 TrimSlice and RealView ARM11MPCore.
>
> The LPAE code manipulates the hardware page tables directly as the long
> descriptors are wide enough to contain all the Linux PTE information.
>
> The LPAE code has been tested on a Versatile Express: V2F-2XV6 Cortex A15 and
> V2P-CA15_A7 Cortex A15 tiles.
>
> This patch set requires the following to be applied on top of 3.7-rc1:
> "ARM: mm: introduce L_PTE_VALID for page table entries"
> (PROT_NONE series, posted by Will on linux-arm-kernel)
> - http://lists.infradead.org/pipermail/linux-arm-kernel/2012-October/126130.html
>
> "mm: thp: Set the accessed flag for old pages on access fault."
> (posted by Will on linux-mm)
> - http://marc.info/?l=linux-kernel&m=135048927416117&w=2
>
> Cheers,
>
--
Gregory Clement, Free Electrons
Kernel, drivers, real-time and embedded Linux
development, consulting, training and support.
http://free-electrons.com
* [RFC PATCH 0/6] ARM: mm: HugeTLB + THP support.
2012-12-21 13:41 ` [RFC PATCH 0/6] ARM: mm: HugeTLB + THP support Gregory CLEMENT
@ 2012-12-23 11:11 ` Will Deacon
0 siblings, 0 replies; 25+ messages in thread
From: Will Deacon @ 2012-12-23 11:11 UTC (permalink / raw)
To: linux-arm-kernel
On Fri, Dec 21, 2012 at 01:41:51PM +0000, Gregory CLEMENT wrote:
> On 10/18/2012 06:15 PM, Steve Capper wrote:
> > Hello,
> Hi Steve,
Hi Gregory,
> I am resurrecting this old thread to ask for news on this subject.
> We are interested in getting this support for the Armada 370/XP based boards.
>
> I am surprised that there wasn't any feedback on this series. We are
> willing to help get this feature merged into the kernel.
>
> Would testing your series on the boards we have be of help to you?
Whilst testing is always welcome, the thing this code *really* needs is some
proper review, especially of the implementation for 3-level page tables. We
tested pretty extensively on v6, v7 and v7+lpae platforms ourselves but, of
course, if you do uncover an issue then please shout.
Will
* [RFC PATCH 1/6] ARM: mm: correct pte_same behaviour for LPAE.
2012-10-18 16:15 ` [RFC PATCH 1/6] ARM: mm: correct pte_same behaviour for LPAE Steve Capper
@ 2013-01-04 5:03 ` Christoffer Dall
2013-01-08 17:56 ` Steve Capper
0 siblings, 1 reply; 25+ messages in thread
From: Christoffer Dall @ 2013-01-04 5:03 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, Oct 18, 2012 at 12:15 PM, Steve Capper <steve.capper@arm.com> wrote:
> For 3 levels of paging the PTE_EXT_NG bit will be set for user address ptes
> that are written to a page table but not for ptes created with mk_pte.
>
> This can cause some comparison tests made by pte_same to fail spuriously and
> lead to other problems.
>
> To correct this behaviour, we mask off PTE_EXT_NG for any pte that is
> present before running the comparison.
>
> Signed-off-by: Will Deacon <will.deacon@arm.com>
> Signed-off-by: Steve Capper <steve.capper@arm.com>
> ---
> arch/arm/include/asm/pgtable-2level.h | 5 +++++
> arch/arm/include/asm/pgtable-3level.h | 5 +++++
> arch/arm/include/asm/pgtable.h | 23 +++++++++++++++++++++++
> 3 files changed, 33 insertions(+)
>
> diff --git a/arch/arm/include/asm/pgtable-2level.h b/arch/arm/include/asm/pgtable-2level.h
> index 2317a71..662a00e 100644
> --- a/arch/arm/include/asm/pgtable-2level.h
> +++ b/arch/arm/include/asm/pgtable-2level.h
> @@ -125,6 +125,11 @@
> #define L_PTE_SHARED (_AT(pteval_t, 1) << 10) /* shared(v6), coherent(xsc3) */
>
> /*
> + * for 2 levels of paging we don't mask off any bits when comparing present ptes
> + */
> +#define L_PTE_CMP_MASKOFF 0
> +
> +/*
> * These are the memory types, defined to be compatible with
> * pre-ARMv6 CPUs cacheable and bufferable bits: XXCB
> */
> diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
> index b249035..0eaeb55 100644
> --- a/arch/arm/include/asm/pgtable-3level.h
> +++ b/arch/arm/include/asm/pgtable-3level.h
> @@ -84,6 +84,11 @@
> #define L_PTE_DIRTY_HIGH (1 << (55 - 32))
>
> /*
> + * we need to mask off PTE_EXT_NG when comparing present ptes.
> + */
> +#define L_PTE_CMP_MASKOFF PTE_EXT_NG
> +
> +/*
> * AttrIndx[2:0] encoding (mapping attributes defined in the MAIR* registers).
> */
> #define L_PTE_MT_UNCACHED (_AT(pteval_t, 0) << 2) /* strongly ordered */
> diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
> index 08c1231..c35bf46 100644
> --- a/arch/arm/include/asm/pgtable.h
> +++ b/arch/arm/include/asm/pgtable.h
> @@ -248,6 +248,29 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
> }
>
> /*
> + * For 3 levels of paging the PTE_EXT_NG bit will be set for user address ptes
> + * that are written to a page table but not for ptes created with mk_pte.
> + *
Why is this not the case for 2 levels of paging as well?
Is that because it's always checked against the Linux version, or?
> + * This can cause some comparison tests made by pte_same to fail spuriously and
> + * lead to other problems.
> + *
> + * To correct this behaviour, we mask off PTE_EXT_NG for any pte that is
> + * present before running the comparison.
nit: This comment doesn't really explain the rationale. I'm assuming
that pte_same is used to compare only which page gets mapped, assuming
the attributes etc. remain the same? Or should the attributes also be
the same, and mk_pte just sets all of them except the NG bit?
> + */
> +#define __HAVE_ARCH_PTE_SAME
> +static inline int pte_same(pte_t pte_a, pte_t pte_b)
> +{
> + pteval_t vala = pte_val(pte_a), valb = pte_val(pte_b);
> + if (pte_present(pte_a))
> + vala &= ~L_PTE_CMP_MASKOFF;
> +
> + if (pte_present(pte_b))
> + valb &= ~L_PTE_CMP_MASKOFF;
> +
> + return vala == valb;
> +}
> +
> +/*
> * Encode and decode a swap entry. Swap entries are stored in the Linux
> * page tables as follows:
> *
> --
> 1.7.9.5
>
* [RFC PATCH 2/6] ARM: mm: Add support for flushing HugeTLB pages.
2012-10-18 16:15 ` [RFC PATCH 2/6] ARM: mm: Add support for flushing HugeTLB pages Steve Capper
@ 2013-01-04 5:03 ` Christoffer Dall
2013-01-08 17:56 ` Steve Capper
0 siblings, 1 reply; 25+ messages in thread
From: Christoffer Dall @ 2013-01-04 5:03 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, Oct 18, 2012 at 12:15 PM, Steve Capper <steve.capper@arm.com> wrote:
> On ARM we use the __flush_dcache_page function to flush the dcache of pages
> when needed; usually when the PG_dcache_clean bit is unset and we are setting a
> PTE.
>
> A HugeTLB page is represented as a compound page consisting of an array of
> pages. Thus to flush the dcache of a HugeTLB page, one must flush more than a
> single page.
>
> This patch modifies __flush_dcache_page such that all constituent pages of a
> HugeTLB page are flushed.
>
> Signed-off-by: Will Deacon <will.deacon@arm.com>
> Signed-off-by: Steve Capper <steve.capper@arm.com>
> ---
> arch/arm/mm/flush.c | 25 +++++++++++++++----------
> 1 file changed, 15 insertions(+), 10 deletions(-)
>
> diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c
> index 1c8f7f5..0a69cb8 100644
> --- a/arch/arm/mm/flush.c
> +++ b/arch/arm/mm/flush.c
> @@ -17,6 +17,7 @@
> #include <asm/highmem.h>
> #include <asm/smp_plat.h>
> #include <asm/tlbflush.h>
> +#include <linux/hugetlb.h>
>
> #include "mm.h"
>
> @@ -168,17 +169,21 @@ void __flush_dcache_page(struct address_space *mapping, struct page *page)
> * coherent with the kernels mapping.
> */
I think it would be good to have a VM_BUG_ON(PageTail(page)) here.
> if (!PageHighMem(page)) {
> - __cpuc_flush_dcache_area(page_address(page), PAGE_SIZE);
> + __cpuc_flush_dcache_area(page_address(page), (PAGE_SIZE << compound_order(page)));
I think 98 characters is a stretch. You could do:
size_t page_size = PAGE_SIZE << compound_order(page);
__cpuc_flush_dcache_area(page_address(page), page_size);
> } else {
> - void *addr = kmap_high_get(page);
> - if (addr) {
> - __cpuc_flush_dcache_area(addr, PAGE_SIZE);
> - kunmap_high(page);
> - } else if (cache_is_vipt()) {
> - /* unmapped pages might still be cached */
> - addr = kmap_atomic(page);
> - __cpuc_flush_dcache_area(addr, PAGE_SIZE);
> - kunmap_atomic(addr);
> + unsigned long i;
> + for(i = 0; i < (1 << compound_order(page)); i++) {
> + struct page *cpage = page + i;
> + void *addr = kmap_high_get(cpage);
> + if (addr) {
> + __cpuc_flush_dcache_area(addr, PAGE_SIZE);
> + kunmap_high(cpage);
> + } else if (cache_is_vipt()) {
> + /* unmapped pages might still be cached */
> + addr = kmap_atomic(cpage);
> + __cpuc_flush_dcache_area(addr, PAGE_SIZE);
> + kunmap_atomic(addr);
> + }
> }
> }
>
> --
> 1.7.9.5
>
otherwise it looks good to me.
-Christoffer
* [RFC PATCH 3/6] ARM: mm: HugeTLB support for LPAE systems.
2012-10-18 16:15 ` [RFC PATCH 3/6] ARM: mm: HugeTLB support for LPAE systems Steve Capper
@ 2013-01-04 5:03 ` Christoffer Dall
2013-01-08 17:57 ` Steve Capper
0 siblings, 1 reply; 25+ messages in thread
From: Christoffer Dall @ 2013-01-04 5:03 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, Oct 18, 2012 at 12:15 PM, Steve Capper <steve.capper@arm.com> wrote:
> From: Catalin Marinas <catalin.marinas@arm.com>
>
> This patch adds support for hugetlbfs based on the x86 implementation.
> It allows mapping of 2MB sections (see Documentation/vm/hugetlbpage.txt
> for usage). The 64K pages configuration is not supported (section size
> is 512MB in this case).
>
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> [steve.capper at arm.com: symbolic constants replace numbers in places.
> Split up into multiple files, to simplify future non-LPAE support].
> Signed-off-by: Will Deacon <will.deacon@arm.com>
> Signed-off-by: Steve Capper <steve.capper@arm.com>
> ---
> arch/arm/Kconfig | 4 +
> arch/arm/include/asm/hugetlb-3level.h | 61 +++++++++++
> arch/arm/include/asm/hugetlb.h | 83 ++++++++++++++
> arch/arm/include/asm/pgtable-3level.h | 13 +++
> arch/arm/mm/Makefile | 2 +
> arch/arm/mm/dma-mapping.c | 2 +-
> arch/arm/mm/hugetlbpage-3level.c | 190 +++++++++++++++++++++++++++++++++
> arch/arm/mm/hugetlbpage.c | 65 +++++++++++
> 8 files changed, 419 insertions(+), 1 deletion(-)
> create mode 100644 arch/arm/include/asm/hugetlb-3level.h
> create mode 100644 arch/arm/include/asm/hugetlb.h
> create mode 100644 arch/arm/mm/hugetlbpage-3level.c
> create mode 100644 arch/arm/mm/hugetlbpage.c
>
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index 73067ef..d863781 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -1767,6 +1767,10 @@ config HW_PERF_EVENTS
> Enable hardware performance counter support for perf events. If
> disabled, perf events will use software events only.
>
> +config SYS_SUPPORTS_HUGETLBFS
> + def_bool y
> + depends on ARM_LPAE
> +
> source "mm/Kconfig"
>
> config FORCE_MAX_ZONEORDER
> diff --git a/arch/arm/include/asm/hugetlb-3level.h b/arch/arm/include/asm/hugetlb-3level.h
> new file mode 100644
> index 0000000..4868064
> --- /dev/null
> +++ b/arch/arm/include/asm/hugetlb-3level.h
> @@ -0,0 +1,61 @@
> +/*
> + * arch/arm/include/asm/hugetlb-3level.h
> + *
> + * Copyright (C) 2012 ARM Ltd.
> + *
> + * Based on arch/x86/include/asm/hugetlb.h.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
> + */
> +
> +#ifndef _ASM_ARM_HUGETLB_3LEVEL_H
> +#define _ASM_ARM_HUGETLB_3LEVEL_H
> +
> +static inline pte_t huge_ptep_get(pte_t *ptep)
> +{
> + return *ptep;
> +}
> +
> +static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
> + pte_t *ptep, pte_t pte)
> +{
> + set_pte_at(mm, addr, ptep, pte);
> +}
> +
> +static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
> + unsigned long addr, pte_t *ptep)
> +{
> + ptep_clear_flush(vma, addr, ptep);
> +}
> +
> +static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
> + unsigned long addr, pte_t *ptep)
> +{
> + ptep_set_wrprotect(mm, addr, ptep);
> +}
> +
> +static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
> + unsigned long addr, pte_t *ptep)
> +{
> + return ptep_get_and_clear(mm, addr, ptep);
> +}
> +
> +static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
> + unsigned long addr, pte_t *ptep,
> + pte_t pte, int dirty)
> +{
> + return ptep_set_access_flags(vma, addr, ptep, pte, dirty);
> +}
> +
> +#endif /* _ASM_ARM_HUGETLB_3LEVEL_H */
> diff --git a/arch/arm/include/asm/hugetlb.h b/arch/arm/include/asm/hugetlb.h
> new file mode 100644
> index 0000000..7af9cf6
> --- /dev/null
> +++ b/arch/arm/include/asm/hugetlb.h
> @@ -0,0 +1,83 @@
> +/*
> + * arch/arm/include/asm/hugetlb.h
> + *
> + * Copyright (C) 2012 ARM Ltd.
> + *
> + * Based on arch/x86/include/asm/hugetlb.h
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
> + */
> +
> +#ifndef _ASM_ARM_HUGETLB_H
> +#define _ASM_ARM_HUGETLB_H
> +
> +#include <asm/page.h>
> +
> +#include <asm/hugetlb-3level.h>
I feel like it wouldn't hurt anyone to put a comment here explaining
that these "ptes" are in fact pmd section descriptors disguised in pte
types.
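Something along these lines perhaps (wording is just a suggestion):

/*
 * The "huge ptes" handled below are really pmd section descriptors
 * that have been cast to pte_t *; on LPAE the descriptor layouts are
 * close enough that the generic pte accessors can operate on them
 * directly.
 */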
> +
> +static inline void hugetlb_free_pgd_range(struct mmu_gather *tlb,
> + unsigned long addr, unsigned long end,
> + unsigned long floor,
> + unsigned long ceiling)
> +{
> + free_pgd_range(tlb, addr, end, floor, ceiling);
> +}
> +
> +
> +static inline int is_hugepage_only_range(struct mm_struct *mm,
> + unsigned long addr, unsigned long len)
> +{
> + return 0;
> +}
> +
> +static inline int prepare_hugepage_range(struct file *file,
> + unsigned long addr, unsigned long len)
> +{
> + struct hstate *h = hstate_file(file);
> + if (len & ~huge_page_mask(h))
> + return -EINVAL;
> + if (addr & ~huge_page_mask(h))
> + return -EINVAL;
> + return 0;
> +}
> +
> +static inline void hugetlb_prefault_arch_hook(struct mm_struct *mm)
> +{
> +}
> +
> +static inline int huge_pte_none(pte_t pte)
> +{
> + return pte_none(pte);
> +}
> +
> +static inline pte_t huge_pte_wrprotect(pte_t pte)
> +{
> + return pte_wrprotect(pte);
> +}
> +
> +static inline int arch_prepare_hugepage(struct page *page)
> +{
> + return 0;
> +}
> +
> +static inline void arch_release_hugepage(struct page *page)
> +{
> +}
> +
> +static inline void arch_clear_hugepage_flags(struct page *page)
> +{
> + clear_bit(PG_dcache_clean, &page->flags);
why do we clear this bit here?
> +}
> +
> +#endif /* _ASM_ARM_HUGETLB_H */
> diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
> index 0eaeb55..d086f61 100644
> --- a/arch/arm/include/asm/pgtable-3level.h
> +++ b/arch/arm/include/asm/pgtable-3level.h
> @@ -62,6 +62,14 @@
> #define USER_PTRS_PER_PGD (PAGE_OFFSET / PGDIR_SIZE)
>
> /*
> + * Hugetlb definitions.
> + */
> +#define HPAGE_SHIFT PMD_SHIFT
> +#define HPAGE_SIZE (_AC(1, UL) << HPAGE_SHIFT)
> +#define HPAGE_MASK (~(HPAGE_SIZE - 1))
> +#define HUGETLB_PAGE_ORDER (HPAGE_SHIFT - PAGE_SHIFT)
> +
> +/*
> * "Linux" PTE definitions for LPAE.
> *
> * These bits overlap with the hardware bits but the naming is preserved for
> @@ -153,6 +161,11 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
>
> #define set_pte_ext(ptep,pte,ext) cpu_set_pte_ext(ptep,__pte(pte_val(pte)|(ext)))
>
> +#define pte_huge(pte) ((pte_val(pte) & PMD_TYPE_MASK) == PMD_TYPE_SECT)
> +
> +#define pte_mkhuge(pte) (__pte((pte_val(pte) & ~PMD_TYPE_MASK) | PMD_TYPE_SECT))
> +
> +
> #endif /* __ASSEMBLY__ */
>
> #endif /* _ASM_PGTABLE_3LEVEL_H */
> diff --git a/arch/arm/mm/Makefile b/arch/arm/mm/Makefile
> index 8a9c4cb..1560bbc 100644
> --- a/arch/arm/mm/Makefile
> +++ b/arch/arm/mm/Makefile
> @@ -16,6 +16,8 @@ obj-$(CONFIG_MODULES) += proc-syms.o
>
> obj-$(CONFIG_ALIGNMENT_TRAP) += alignment.o
> obj-$(CONFIG_HIGHMEM) += highmem.o
> +obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
> +obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage-3level.o
>
> obj-$(CONFIG_CPU_ABRT_NOMMU) += abort-nommu.o
> obj-$(CONFIG_CPU_ABRT_EV4) += abort-ev4.o
> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
> index 477a2d2..3ced228 100644
> --- a/arch/arm/mm/dma-mapping.c
> +++ b/arch/arm/mm/dma-mapping.c
> @@ -241,7 +241,7 @@ static void __dma_free_buffer(struct page *page, size_t size)
>
> #ifdef CONFIG_MMU
> #ifdef CONFIG_HUGETLB_PAGE
> -#error ARM Coherent DMA allocator does not (yet) support huge TLB
> +#warning ARM Coherent DMA allocator does not (yet) support huge TLB
> #endif
>
> static void *__alloc_from_contiguous(struct device *dev, size_t size,
> diff --git a/arch/arm/mm/hugetlbpage-3level.c b/arch/arm/mm/hugetlbpage-3level.c
> new file mode 100644
> index 0000000..86474f0
> --- /dev/null
> +++ b/arch/arm/mm/hugetlbpage-3level.c
> @@ -0,0 +1,190 @@
> +/*
> + * arch/arm/mm/hugetlbpage-3level.c
> + *
> + * Copyright (C) 2002, Rohit Seth <rohit.seth@intel.com>
> + * Copyright (C) 2012 ARM Ltd.
> + *
> + * Based on arch/x86/mm/hugetlbpage.c
> + *
this seems to be an almost 1-to-1 copy of the x86 code. Is it not
worth sharing it somehow? Possible?
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
> + */
> +
> +#include <linux/init.h>
> +#include <linux/fs.h>
> +#include <linux/mm.h>
> +#include <linux/hugetlb.h>
> +#include <linux/pagemap.h>
> +#include <linux/err.h>
> +#include <linux/sysctl.h>
> +#include <asm/mman.h>
> +#include <asm/tlb.h>
> +#include <asm/tlbflush.h>
> +#include <asm/pgalloc.h>
> +
> +static unsigned long page_table_shareable(struct vm_area_struct *svma,
> + struct vm_area_struct *vma,
> + unsigned long addr, pgoff_t idx)
> +{
> + unsigned long saddr = ((idx - svma->vm_pgoff) << PAGE_SHIFT) +
> + svma->vm_start;
> + unsigned long sbase = saddr & PUD_MASK;
> + unsigned long s_end = sbase + PUD_SIZE;
these are to check that the potential vma to steal the pmd from covers
the entire pud entry's address space, correct?
it's pretty confusing with the idx conversion back and forth,
especially given that mm/hugetlb.c uses idx as an index in units of
huge pages, whereas this idx is in units of regular pages, so I
would suggest some clear static conversion functions or a comment.
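e.g. something like (untested sketch, names made up):

static inline pgoff_t vma_page_idx(struct vm_area_struct *vma,
				   unsigned long addr)
{
	/* index into the file in units of regular pages */
	return ((addr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
}

static inline unsigned long vma_idx_addr(struct vm_area_struct *vma,
					 pgoff_t idx)
{
	/* inverse: the virtual address in vma of the given page index */
	return ((idx - vma->vm_pgoff) << PAGE_SHIFT) + vma->vm_start;
}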
> +
> + /* Allow segments to share if only one is marked locked */
exactly one or at most one? the code below checks that exactly one is
marked locked, if I read it correctly. Again, for me, the comment
would be more helpful if it stated *why* that's a requirement, not
just that it *is* a requirement.
> + unsigned long vm_flags = vma->vm_flags & ~VM_LOCKED;
> + unsigned long svm_flags = svma->vm_flags & ~VM_LOCKED;
> +
> + /*
> + * match the virtual addresses, permission and the alignment of the
> + * page table page.
> + */
> + if (pmd_index(addr) != pmd_index(saddr) ||
> + vm_flags != svm_flags ||
> + sbase < svma->vm_start || svma->vm_end < s_end)
> + return 0;
> +
> + return saddr;
> +}
> +
> +static int vma_shareable(struct vm_area_struct *vma, unsigned long addr)
> +{
> + unsigned long base = addr & PUD_MASK;
> + unsigned long end = base + PUD_SIZE;
> +
> + /*
> + * check on proper vm_flags and page table alignment
> + */
> + if (vma->vm_flags & VM_MAYSHARE &&
> + vma->vm_start <= base && end <= vma->vm_end)
> + return 1;
> + return 0;
> +}
> +
> +/*
> + * search for a shareable pmd page for hugetlb.
nit:
perhaps this is completely standard knowledge for your garden variety
mm hacker, but I needed to spend 5 minutes figuring out the purpose
here - if I get it right: multiple mappings the hugetlbfs file for the
same mm covering the same pud address range mapping the same data can
use the same pmd, right?
I would either rename the function to find_huge_pmd_share and get rid
of the comment or expand on the comment.
> + */
> +static pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr,
> + pud_t *pud)
> +{
> + struct vm_area_struct *vma = find_vma(mm, addr);
> + struct address_space *mapping = vma->vm_file->f_mapping;
> + pgoff_t idx = ((addr - vma->vm_start) >> PAGE_SHIFT) +
> + vma->vm_pgoff;
> + struct vm_area_struct *svma;
> + unsigned long saddr;
> + pte_t *spte = NULL;
> + pte_t *pte;
> +
> + if (!vma_shareable(vma, addr))
> + return (pte_t *)pmd_alloc(mm, pud, addr);
> +
> + mutex_lock(&mapping->i_mmap_mutex);
> + vma_interval_tree_foreach(svma, &mapping->i_mmap, idx, idx) {
> + if (svma == vma)
> + continue;
> +
> + saddr = page_table_shareable(svma, vma, addr, idx);
> + if (saddr) {
> + spte = huge_pte_offset(svma->vm_mm, saddr);
> + if (spte) {
> + get_page(virt_to_page(spte));
> + break;
> + }
> + }
> + }
> +
> + if (!spte)
> + goto out;
> +
> + spin_lock(&mm->page_table_lock);
> + if (pud_none(*pud))
> + pud_populate(mm, pud, (pmd_t *)((unsigned long)spte & PAGE_MASK));
> + else
> + put_page(virt_to_page(spte));
> + spin_unlock(&mm->page_table_lock);
> +out:
> + pte = (pte_t *)pmd_alloc(mm, pud, addr);
> + mutex_unlock(&mapping->i_mmap_mutex);
> + return pte;
> +}
> +
> +/*
> + * unmap huge page backed by shared pte.
> + *
> + * Hugetlb pte page is ref counted at the time of mapping. If pte is shared
> + * indicated by page_count > 1, unmap is achieved by clearing pud and
> + * decrementing the ref count. If count == 1, the pte page is not shared.
> + *
> + * called with vma->vm_mm->page_table_lock held.
> + *
> + * returns: 1 successfully unmapped a shared pte page
> + * 0 the underlying pte page is not shared, or it is the last user
> + */
> +int huge_pmd_unshare(struct mm_struct *mm, unsigned long *addr, pte_t *ptep)
> +{
> + pgd_t *pgd = pgd_offset(mm, *addr);
> + pud_t *pud = pud_offset(pgd, *addr);
> +
> + BUG_ON(page_count(virt_to_page(ptep)) == 0);
> + if (page_count(virt_to_page(ptep)) == 1)
> + return 0;
> +
> + pud_clear(pud);
> + put_page(virt_to_page(ptep));
> + *addr = ALIGN(*addr, HPAGE_SIZE * PTRS_PER_PTE) - HPAGE_SIZE;
huh? this hurts my brain. Why the minus HPAGE_SIZE?
> + return 1;
> +}
> +
> +pte_t *huge_pte_alloc(struct mm_struct *mm,
> + unsigned long addr, unsigned long sz)
> +{
> + pgd_t *pgd;
> + pud_t *pud;
> + pte_t *pte = NULL;
> +
> + pgd = pgd_offset(mm, addr);
> + pud = pud_alloc(mm, pgd, addr);
> + if (pud) {
> + BUG_ON(sz != PMD_SIZE);
is this really necessary?
VM_BUG_ON?
> + if (pud_none(*pud))
> + pte = huge_pmd_share(mm, addr, pud);
> + else
> + pte = (pte_t *)pmd_alloc(mm, pud, addr);
> + }
> + BUG_ON(pte && !pte_none(*pte) && !pte_huge(*pte));
> +
> + return pte;
> +}
> +
> +struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
> + pmd_t *pmd, int write)
> +{
> + struct page *page;
> +
> + page = pte_page(*(pte_t *)pmd);
> + if (page)
> + page += ((address & ~PMD_MASK) >> PAGE_SHIFT);
> + return page;
> +}
> +
> +struct page *follow_huge_pud(struct mm_struct *mm, unsigned long address,
> + pud_t *pud, int write)
> +{
> + struct page *page;
> +
> + page = pte_page(*(pte_t *)pud);
> + if (page)
> + page += ((address & ~PUD_MASK) >> PAGE_SHIFT);
> + return page;
why implement this? This should never be called, right? Shouldn't it
just be a BUG()?
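i.e. (sketch):

struct page *follow_huge_pud(struct mm_struct *mm, unsigned long address,
			     pud_t *pud, int write)
{
	BUG();	/* pud_huge() always returns 0 on ARM */
	return NULL;
}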
> +}
> diff --git a/arch/arm/mm/hugetlbpage.c b/arch/arm/mm/hugetlbpage.c
> new file mode 100644
> index 0000000..32fe7fd
> --- /dev/null
> +++ b/arch/arm/mm/hugetlbpage.c
> @@ -0,0 +1,65 @@
> +/*
> + * arch/arm/mm/hugetlbpage.c
> + *
> + * Copyright (C) 2002, Rohit Seth <rohit.seth@intel.com>
> + * Copyright (C) 2012 ARM Ltd.
> + *
> + * Based on arch/x86/mm/hugetlbpage.c
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
> + */
> +
> +#include <linux/init.h>
> +#include <linux/fs.h>
> +#include <linux/mm.h>
> +#include <linux/hugetlb.h>
> +#include <linux/pagemap.h>
> +#include <linux/err.h>
> +#include <linux/sysctl.h>
> +#include <asm/mman.h>
> +#include <asm/tlb.h>
> +#include <asm/tlbflush.h>
> +#include <asm/pgalloc.h>
> +
> +pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
> +{
> + pgd_t *pgd;
> + pud_t *pud;
> + pmd_t *pmd = NULL;
> +
> + pgd = pgd_offset(mm, addr);
> + if (pgd_present(*pgd)) {
> + pud = pud_offset(pgd, addr);
> + if (pud_present(*pud))
> + pmd = pmd_offset(pud, addr);
> + }
> +
> + return (pte_t *)pmd;
> +}
> +
> +struct page *follow_huge_addr(struct mm_struct *mm, unsigned long address,
> + int write)
> +{
> + return ERR_PTR(-EINVAL);
> +}
> +
> +int pmd_huge(pmd_t pmd)
> +{
> + return (pmd_val(pmd) & PMD_TYPE_MASK) == PMD_TYPE_SECT;
> +}
> +
> +int pud_huge(pud_t pud)
> +{
> + return 0;
> +}
> --
> 1.7.9.5
>
>
-Christoffer
* [RFC PATCH 4/6] ARM: mm: HugeTLB support for non-LPAE systems.
2012-10-18 16:15 ` [RFC PATCH 4/6] ARM: mm: HugeTLB support for non-LPAE systems Steve Capper
@ 2013-01-04 5:04 ` Christoffer Dall
2013-01-08 17:58 ` Steve Capper
0 siblings, 1 reply; 25+ messages in thread
From: Christoffer Dall @ 2013-01-04 5:04 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, Oct 18, 2012 at 12:15 PM, Steve Capper <steve.capper@arm.com> wrote:
> Based on Bill Carson's HugeTLB patch, with the big difference being in the way
> PTEs are passed back to the memory manager. Rather than store a "Linux Huge
> PTE" separately; we make one up on the fly in huge_ptep_get. Also rather than
> consider 16M supersections, we focus solely on 2x1M sections.
>
> To construct a huge PTE on the fly we need additional information (such as the
> accessed flag and dirty bit) which we choose to store in the domain bits of the
> short section descriptor. In order to use these domain bits for storage, we need
> to make ourselves a client for all 16 domains and this is done in head.S.
>
> Storing extra information in the domain bits also makes it a lot easier to
> implement Transparent Huge Pages, and some of the code in pgtable-2level.h is
> arranged to facilitate THP support in a later patch.
>
> Non-LPAE HugeTLB pages are incompatible with the huge page migration code
> (enabled when CONFIG_MEMORY_FAILURE is selected) as that code dereferences PTEs
> directly, rather than calling huge_ptep_get and set_huge_pte_at.
>
> Signed-off-by: Will Deacon <will.deacon@arm.com>
> Signed-off-by: Steve Capper <steve.capper@arm.com>
> ---
> arch/arm/Kconfig | 2 +-
> arch/arm/include/asm/hugetlb-2level.h | 71 ++++++++++++++++++++
> arch/arm/include/asm/hugetlb.h | 4 ++
> arch/arm/include/asm/pgtable-2level.h | 79 +++++++++++++++++++++-
> arch/arm/include/asm/tlb.h | 10 ++-
> arch/arm/kernel/head.S | 10 ++-
> arch/arm/mm/Makefile | 4 ++
> arch/arm/mm/fault.c | 6 +-
> arch/arm/mm/hugetlbpage-2level.c | 115 +++++++++++++++++++++++++++++++++
> 9 files changed, 293 insertions(+), 8 deletions(-)
> create mode 100644 arch/arm/include/asm/hugetlb-2level.h
> create mode 100644 arch/arm/mm/hugetlbpage-2level.c
>
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index d863781..dd0a230 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -1769,7 +1769,7 @@ config HW_PERF_EVENTS
>
> config SYS_SUPPORTS_HUGETLBFS
> def_bool y
> - depends on ARM_LPAE
> + depends on ARM_LPAE || (!CPU_USE_DOMAINS && !MEMORY_FAILURE)
>
> source "mm/Kconfig"
>
> diff --git a/arch/arm/include/asm/hugetlb-2level.h b/arch/arm/include/asm/hugetlb-2level.h
> new file mode 100644
> index 0000000..3532b54
> --- /dev/null
> +++ b/arch/arm/include/asm/hugetlb-2level.h
> @@ -0,0 +1,71 @@
> +/*
> + * arch/arm/include/asm/hugetlb-2level.h
> + *
> + * Copyright (C) 2012 ARM Ltd.
> + *
> + * Based on arch/x86/include/asm/hugetlb.h and Bill Carson's patches
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
> + */
> +
> +#ifndef _ASM_ARM_HUGETLB_2LEVEL_H
> +#define _ASM_ARM_HUGETLB_2LEVEL_H
> +
> +
> +pte_t huge_ptep_get(pte_t *ptep);
> +
> +void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
> + pte_t *ptep, pte_t pte);
> +
> +static inline pte_t pte_mkhuge(pte_t pte) { return pte; }
> +
> +static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
> + unsigned long addr, pte_t *ptep)
> +{
> + flush_tlb_range(vma, addr, addr + HPAGE_SIZE);
don't you need to clear the old TLB entry first here, otherwise
another CPU could put an entry to the old page in its TLB and access
it even after the page_cache_release(old_page) in hugetlb_cow() ?
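i.e. something like (untested sketch):

static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
					 unsigned long addr, pte_t *ptep)
{
	pmd_clear((pmd_t *)ptep);	/* drop the entry first... */
	flush_tlb_range(vma, addr, addr + HPAGE_SIZE);	/* ...then flush */
}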
> +}
> +
> +static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
> + unsigned long addr, pte_t *ptep)
> +{
> + pmd_t *pmdp = (pmd_t *) ptep;
> + set_pmd_at(mm, addr, pmdp, pmd_wrprotect(*pmdp));
> +}
> +
> +
> +static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
> + unsigned long addr, pte_t *ptep)
> +{
> + pmd_t *pmdp = (pmd_t *)ptep;
> + pte_t pte = huge_ptep_get(ptep);
> + pmd_clear(pmdp);
> +
> + return pte;
> +}
> +
> +static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
> + unsigned long addr, pte_t *ptep,
> + pte_t pte, int dirty)
> +{
> + int changed = !pte_same(huge_ptep_get(ptep), pte);
> +
> + if (changed) {
> + set_huge_pte_at(vma->vm_mm, addr, ptep, pte);
> + huge_ptep_clear_flush(vma, addr, &pte);
> + }
> +
> + return changed;
> +}
> +
> +#endif /* _ASM_ARM_HUGETLB_2LEVEL_H */
> diff --git a/arch/arm/include/asm/hugetlb.h b/arch/arm/include/asm/hugetlb.h
> index 7af9cf6..1e92975 100644
> --- a/arch/arm/include/asm/hugetlb.h
> +++ b/arch/arm/include/asm/hugetlb.h
> @@ -24,7 +24,11 @@
>
> #include <asm/page.h>
>
> +#ifdef CONFIG_ARM_LPAE
> #include <asm/hugetlb-3level.h>
> +#else
> +#include <asm/hugetlb-2level.h>
> +#endif
>
> static inline void hugetlb_free_pgd_range(struct mmu_gather *tlb,
> unsigned long addr, unsigned long end,
> diff --git a/arch/arm/include/asm/pgtable-2level.h b/arch/arm/include/asm/pgtable-2level.h
> index 662a00e..fd1d9be 100644
> --- a/arch/arm/include/asm/pgtable-2level.h
> +++ b/arch/arm/include/asm/pgtable-2level.h
> @@ -163,7 +163,7 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
> return (pmd_t *)pud;
> }
>
> -#define pmd_bad(pmd) (pmd_val(pmd) & 2)
> +#define pmd_bad(pmd) ((pmd_val(pmd) & PMD_TYPE_MASK) == PMD_TYPE_FAULT)
this changes the semantics of the macro - is that on purpose and safe?
(fault entries didn't use to be bad but now they are, and section
entries used to be bad but now aren't...)
>
> #define copy_pmd(pmdpd,pmdps) \
> do { \
> @@ -184,6 +184,83 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
>
> #define set_pte_ext(ptep,pte,ext) cpu_set_pte_ext(ptep,pte,ext)
>
> +
> +#ifdef CONFIG_SYS_SUPPORTS_HUGETLBFS
> +
> +/*
> + * now follows some of the definitions to allow huge page support, we can't put
> + * these in the hugetlb source files as they are also required for transparent
> + * hugepage support.
> + */
> +
> +#define HPAGE_SHIFT PMD_SHIFT
> +#define HPAGE_SIZE (_AC(1, UL) << HPAGE_SHIFT)
> +#define HPAGE_MASK (~(HPAGE_SIZE - 1))
> +#define HUGETLB_PAGE_ORDER (HPAGE_SHIFT - PAGE_SHIFT)
> +
> +#define HUGE_LINUX_PTE_COUNT (PAGE_OFFSET >> HPAGE_SHIFT)
> +#define HUGE_LINUX_PTE_SIZE (HUGE_LINUX_PTE_COUNT * sizeof(pte_t *))
> +#define HUGE_LINUX_PTE_INDEX(addr) (addr >> HPAGE_SHIFT)
> +
> +/*
> + * We re-purpose the following domain bits in the section descriptor
> + */
> +#define PMD_DSECT_DIRTY (_AT(pmdval_t, 1) << 5)
> +#define PMD_DSECT_AF (_AT(pmdval_t, 1) << 6)
> +
> +#define PMD_BIT_FUNC(fn,op) \
> +static inline pmd_t pmd_##fn(pmd_t pmd) { pmd_val(pmd) op; return pmd; }
> +
> +PMD_BIT_FUNC(wrprotect, &= ~PMD_SECT_AP_WRITE);
> +
> +static inline void set_pmd_at(struct mm_struct *mm, unsigned long addr,
> + pmd_t *pmdp, pmd_t pmd)
> +{
> + /*
> + * we can sometimes be passed a pmd pointing to a level 2 descriptor
> + * from collapse_huge_page.
> + */
> + if ((pmd_val(pmd) & PMD_TYPE_MASK) == PMD_TYPE_TABLE) {
> + pmdp[0] = __pmd(pmd_val(pmd));
> + pmdp[1] = __pmd(pmd_val(pmd) + 256 * sizeof(pte_t));
eh, if I get this right, this means that in the case where the pmd
points to a level 2 page table, the second entry just points to the
next group of 256 level 2 ptes, which directly follows the first group
because they share the same page. But then why do we need to set any
values here?
> + } else {
> + pmdp[0] = __pmd(pmd_val(pmd)); /* first 1M section */
> + pmdp[1] = __pmd(pmd_val(pmd) + SECTION_SIZE); /* second 1M section */
> + }
> +
> + flush_pmd_entry(pmdp);
> +}
> +
> +#define HPMD_XLATE(res, cmp, from, to) do { if (cmp & from) res |= to; \
> + else res &= ~to; \
> + } while (0)
> +
> +static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
> +{
> + pmdval_t pmdval = pmd_val(pmd);
> + pteval_t newprotval = pgprot_val(newprot);
> +
> + HPMD_XLATE(pmdval, newprotval, L_PTE_XN, PMD_SECT_XN);
> + HPMD_XLATE(pmdval, newprotval, L_PTE_SHARED, PMD_SECT_S);
> + HPMD_XLATE(pmdval, newprotval, L_PTE_YOUNG, PMD_DSECT_AF);
consider something akin to:
#define L_PMD_DSECT_YOUNG (PMD_DSECT_AF)
then you don't have to change several places if you decide to
rearrange the mappings for whatever reason, and it makes it slightly
easier to read this code.
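so that e.g.:

	HPMD_XLATE(pmdval, newprotval, L_PTE_YOUNG, L_PMD_DSECT_YOUNG);

and similarly for an L_PMD_DSECT_DIRTY alias, if you go that way.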
> + HPMD_XLATE(pmdval, newprotval, L_PTE_DIRTY, PMD_DSECT_DIRTY);
> +
> + /* preserve bits C & B */
> + pmdval |= (newprotval & (3 << 2));
this looks superfluous?
> +
> + /* Linux PTE bit 4 corresponds to PMD TEX bit 0 */
> + HPMD_XLATE(pmdval, newprotval, 1 << 4, PMD_SECT_TEX(1));
define L_PTE_TEX0 and group with the others above?
> +
> + if (newprotval & L_PTE_RDONLY)
> + pmdval &= ~PMD_SECT_AP_WRITE;
> + else
> + pmdval |= PMD_SECT_AP_WRITE;
> +
> + return __pmd(pmdval);
> +}
> +
> +#endif /* CONFIG_SYS_SUPPORTS_HUGETLBFS */
> +
> #endif /* __ASSEMBLY__ */
>
> #endif /* _ASM_PGTABLE_2LEVEL_H */
> diff --git a/arch/arm/include/asm/tlb.h b/arch/arm/include/asm/tlb.h
> index 99a1951..685e9e87 100644
> --- a/arch/arm/include/asm/tlb.h
> +++ b/arch/arm/include/asm/tlb.h
> @@ -92,10 +92,16 @@ static inline void tlb_flush(struct mmu_gather *tlb)
> static inline void tlb_add_flush(struct mmu_gather *tlb, unsigned long addr)
> {
> if (!tlb->fullmm) {
> + unsigned long size = PAGE_SIZE;
> +
> if (addr < tlb->range_start)
> tlb->range_start = addr;
> - if (addr + PAGE_SIZE > tlb->range_end)
> - tlb->range_end = addr + PAGE_SIZE;
> +
> + if (tlb->vma && is_vm_hugetlb_page(tlb->vma))
> + size = HPAGE_SIZE;
> +
> + if (addr + size > tlb->range_end)
> + tlb->range_end = addr + size;
> }
> }
>
> diff --git a/arch/arm/kernel/head.S b/arch/arm/kernel/head.S
> index 4eee351..860f08e 100644
> --- a/arch/arm/kernel/head.S
> +++ b/arch/arm/kernel/head.S
> @@ -410,13 +410,21 @@ __enable_mmu:
> mov r5, #0
> mcrr p15, 0, r4, r5, c2 @ load TTBR0
> #else
> +#ifndef CONFIG_SYS_SUPPORTS_HUGETLBFS
> mov r5, #(domain_val(DOMAIN_USER, DOMAIN_MANAGER) | \
> domain_val(DOMAIN_KERNEL, DOMAIN_MANAGER) | \
> domain_val(DOMAIN_TABLE, DOMAIN_MANAGER) | \
> domain_val(DOMAIN_IO, DOMAIN_CLIENT))
> +#else
> + @ set ourselves as the client in all domains
> + @ this allows us to then use the 4 domain bits in the
> + @ section descriptors in our transparent huge pages
> + ldr r5, =0x55555555
> +#endif /* CONFIG_SYS_SUPPORTS_HUGETLBFS */
> +
> mcr p15, 0, r5, c3, c0, 0 @ load domain access register
> mcr p15, 0, r4, c2, c0, 0 @ load page table pointer
> -#endif
> +#endif /* CONFIG_ARM_LPAE */
> b __turn_mmu_on
> ENDPROC(__enable_mmu)
>
> diff --git a/arch/arm/mm/Makefile b/arch/arm/mm/Makefile
> index 1560bbc..adf0b19 100644
> --- a/arch/arm/mm/Makefile
> +++ b/arch/arm/mm/Makefile
> @@ -17,7 +17,11 @@ obj-$(CONFIG_MODULES) += proc-syms.o
> obj-$(CONFIG_ALIGNMENT_TRAP) += alignment.o
> obj-$(CONFIG_HIGHMEM) += highmem.o
> obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
> +ifeq ($(CONFIG_ARM_LPAE),y)
> obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage-3level.o
> +else
> +obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage-2level.o
> +endif
>
> obj-$(CONFIG_CPU_ABRT_NOMMU) += abort-nommu.o
> obj-$(CONFIG_CPU_ABRT_EV4) += abort-ev4.o
> diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
> index 5dbf13f..0884936 100644
> --- a/arch/arm/mm/fault.c
> +++ b/arch/arm/mm/fault.c
> @@ -488,13 +488,13 @@ do_translation_fault(unsigned long addr, unsigned int fsr,
> #endif /* CONFIG_MMU */
>
> /*
> - * Some section permission faults need to be handled gracefully.
> - * They can happen due to a __{get,put}_user during an oops.
> + * A fault in a section will likely be due to a huge page, treat it
> + * as a page fault.
> */
> static int
> do_sect_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
> {
> - do_bad_area(addr, fsr, regs);
> + do_page_fault(addr, fsr, regs);
doesn't the previous patch require this as well?
(so it should strictly speaking be part of that patch)
> return 0;
> }
>
> diff --git a/arch/arm/mm/hugetlbpage-2level.c b/arch/arm/mm/hugetlbpage-2level.c
> new file mode 100644
> index 0000000..4b2b38c
> --- /dev/null
> +++ b/arch/arm/mm/hugetlbpage-2level.c
> @@ -0,0 +1,115 @@
> +/*
> + * arch/arm/mm/hugetlbpage-2level.c
> + *
> + * Copyright (C) 2002, Rohit Seth <rohit.seth@intel.com>
> + * Copyright (C) 2012 ARM Ltd
> + * Copyright (C) 2012 Bill Carson.
> + *
> + * Based on arch/x86/include/asm/hugetlb.h and Bill Carson's patches
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
> + */
> +
> +#include <linux/init.h>
> +#include <linux/fs.h>
> +#include <linux/mm.h>
> +#include <linux/hugetlb.h>
> +#include <linux/pagemap.h>
> +#include <linux/err.h>
> +#include <linux/sysctl.h>
> +#include <asm/mman.h>
> +#include <asm/tlb.h>
> +#include <asm/tlbflush.h>
> +#include <asm/pgalloc.h>
> +
> +int huge_pmd_unshare(struct mm_struct *mm, unsigned long *addr, pte_t *ptep)
> +{
> + return 0;
> +}
> +
> +pte_t *huge_pte_alloc(struct mm_struct *mm,
> + unsigned long addr, unsigned long sz)
> +{
> + pgd_t *pgd;
> + pud_t *pud;
> + pmd_t *pmd;
> +
> + pgd = pgd_offset(mm, addr);
> + pud = pud_offset(pgd, addr);
> + pmd = pmd_offset(pud, addr);
> +
> + return (pte_t *)pmd; /* our huge pte is actually a pmd */
> +}
> +
> +struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
> + pmd_t *pmd, int write)
> +{
> + struct page *page;
> + unsigned long pfn;
> +
> + BUG_ON((pmd_val(*pmd) & PMD_TYPE_MASK) != PMD_TYPE_SECT);
I could only see one caller who calls this only when this exact
condition is fulfilled, so unless we anticipate other callers, this
BUG_ON could go.
> + pfn = ((pmd_val(*pmd) & HPAGE_MASK) >> PAGE_SHIFT);
> + page = pfn_to_page(pfn);
> + return page;
> +}
> +
> +pte_t huge_ptep_get(pte_t *ptep)
> +{
> + pmd_t *pmdp = (pmd_t*)ptep;
> + pmdval_t pmdval = pmd_val(*pmdp);
> + pteval_t retval;
> +
> + if (!pmdval)
> + return __pte(0);
> +
> + retval = (pteval_t) (pmdval & HPAGE_MASK);
> + HPMD_XLATE(retval, pmdval, PMD_SECT_XN, L_PTE_XN);
> + HPMD_XLATE(retval, pmdval, PMD_SECT_S, L_PTE_SHARED);
> + HPMD_XLATE(retval, pmdval, PMD_DSECT_AF, L_PTE_YOUNG);
> + HPMD_XLATE(retval, pmdval, PMD_DSECT_DIRTY, L_PTE_DIRTY);
> +
> + /* preserve bits C & B */
> + retval |= (pmdval & (3 << 2));
> +
> + /* PMD TEX bit 0 corresponds to Linux PTE bit 4 */
> + HPMD_XLATE(retval, pmdval, PMD_SECT_TEX(1), 1 << 4);
> +
again, I would define the 1 << 4 as something and treat it like the others...
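e.g., picking up the L_PTE_TEX0 name suggested above (a made-up name,
not an existing define):

#define L_PTE_TEX0	(_AT(pteval_t, 1) << 4)	/* Linux PTE TEX[0] */

	/* in pmd_modify(): */
	HPMD_XLATE(pmdval, newprotval, L_PTE_TEX0, PMD_SECT_TEX(1));
	/* in huge_ptep_get(): */
	HPMD_XLATE(retval, pmdval, PMD_SECT_TEX(1), L_PTE_TEX0);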
> + if (pmdval & PMD_SECT_AP_WRITE)
> + retval &= ~L_PTE_RDONLY;
> + else
> + retval |= L_PTE_RDONLY;
> +
> + if ((pmdval & PMD_TYPE_MASK) == PMD_TYPE_SECT)
> + retval |= L_PTE_VALID;
> +
> + /* we assume all hugetlb pages are user */
> + retval |= L_PTE_USER;
> +
> + return __pte(retval);
> +}
> +
> +void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
> + pte_t *ptep, pte_t pte)
> +{
> + pmdval_t pmdval = (pmdval_t) pte_val(pte);
> + pmd_t *pmdp = (pmd_t*) ptep;
> +
> + pmdval &= HPAGE_MASK;
> + pmdval |= PMD_SECT_AP_READ | PMD_SECT_nG | PMD_TYPE_SECT;
> + pmdval = pmd_val(pmd_modify(__pmd(pmdval), __pgprot(pte_val(pte))));
> +
> + __sync_icache_dcache(pte);
> +
> + set_pmd_at(mm, addr, pmdp, __pmd(pmdval));
> +}
so this whole scheme where the caller expects ptes, but really gets
pmds, feels strange to me; but perhaps it makes more sense on other
architectures, and keeping this magic is preferable to changing the
callers?
-Christoffer
* [RFC PATCH 5/6] ARM: mm: Transparent huge page support for LPAE systems.
2012-10-18 16:15 ` [RFC PATCH 5/6] ARM: mm: Transparent huge page support for LPAE systems Steve Capper
@ 2013-01-04 5:04 ` Christoffer Dall
2013-01-08 17:59 ` Steve Capper
0 siblings, 1 reply; 25+ messages in thread
From: Christoffer Dall @ 2013-01-04 5:04 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, Oct 18, 2012 at 12:15 PM, Steve Capper <steve.capper@arm.com> wrote:
> From: Catalin Marinas <catalin.marinas@arm.com>
>
> The patch adds support for THP (transparent huge pages) to LPAE systems. When
> this feature is enabled, the kernel tries to map anonymous pages as 2MB
> sections where possible.
>
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> [steve.capper at arm.com: symbolic constants used, value of PMD_SECT_SPLITTING
> adjusted, tlbflush.h included in pgtable.h]
> Signed-off-by: Will Deacon <will.deacon@arm.com>
> Signed-off-by: Steve Capper <steve.capper@arm.com>
> ---
> arch/arm/Kconfig | 4 ++
> arch/arm/include/asm/pgtable-2level.h | 2 +
> arch/arm/include/asm/pgtable-3level-hwdef.h | 2 +
> arch/arm/include/asm/pgtable-3level.h | 57 +++++++++++++++++++++++++++
> arch/arm/include/asm/pgtable.h | 4 +-
> arch/arm/include/asm/tlb.h | 6 +++
> arch/arm/include/asm/tlbflush.h | 2 +
> arch/arm/mm/fsr-3level.c | 2 +-
> 8 files changed, 77 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index dd0a230..9621d5f 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -1771,6 +1771,10 @@ config SYS_SUPPORTS_HUGETLBFS
> def_bool y
> depends on ARM_LPAE || (!CPU_USE_DOMAINS && !MEMORY_FAILURE)
>
> +config HAVE_ARCH_TRANSPARENT_HUGEPAGE
> + def_bool y
> + depends on ARM_LPAE
> +
> source "mm/Kconfig"
>
> config FORCE_MAX_ZONEORDER
> diff --git a/arch/arm/include/asm/pgtable-2level.h b/arch/arm/include/asm/pgtable-2level.h
> index fd1d9be..34f4775 100644
> --- a/arch/arm/include/asm/pgtable-2level.h
> +++ b/arch/arm/include/asm/pgtable-2level.h
> @@ -182,6 +182,8 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
> /* we don't need complex calculations here as the pmd is folded into the pgd */
> #define pmd_addr_end(addr,end) (end)
>
> +#define pmd_present(pmd) ((pmd_val(pmd) & PMD_TYPE_MASK) != PMD_TYPE_FAULT)
> +
> #define set_pte_ext(ptep,pte,ext) cpu_set_pte_ext(ptep,pte,ext)
>
>
> diff --git a/arch/arm/include/asm/pgtable-3level-hwdef.h b/arch/arm/include/asm/pgtable-3level-hwdef.h
> index d795282..53c7f67 100644
> --- a/arch/arm/include/asm/pgtable-3level-hwdef.h
> +++ b/arch/arm/include/asm/pgtable-3level-hwdef.h
> @@ -38,6 +38,8 @@
> */
> #define PMD_SECT_BUFFERABLE (_AT(pmdval_t, 1) << 2)
> #define PMD_SECT_CACHEABLE (_AT(pmdval_t, 1) << 3)
> +#define PMD_SECT_USER (_AT(pmdval_t, 1) << 6) /* AP[1] */
> +#define PMD_SECT_RDONLY (_AT(pmdval_t, 1) << 7) /* AP[2] */
> #define PMD_SECT_S (_AT(pmdval_t, 3) << 8)
> #define PMD_SECT_AF (_AT(pmdval_t, 1) << 10)
> #define PMD_SECT_nG (_AT(pmdval_t, 1) << 11)
> diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
> index d086f61..31c071f 100644
> --- a/arch/arm/include/asm/pgtable-3level.h
> +++ b/arch/arm/include/asm/pgtable-3level.h
> @@ -85,6 +85,9 @@
> #define L_PTE_DIRTY (_AT(pteval_t, 1) << 55) /* unused */
> #define L_PTE_SPECIAL (_AT(pteval_t, 1) << 56) /* unused */
>
> +#define PMD_SECT_DIRTY (_AT(pmdval_t, 1) << 55)
> +#define PMD_SECT_SPLITTING (_AT(pmdval_t, 1) << 57)
> +
> /*
> * To be used in assembly code with the upper page attributes.
> */
> @@ -166,6 +169,60 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
> #define pte_mkhuge(pte) (__pte((pte_val(pte) & ~PMD_TYPE_MASK) | PMD_TYPE_SECT))
>
>
> +#define pmd_present(pmd) ((pmd_val(pmd) & PMD_TYPE_MASK) != PMD_TYPE_FAULT)
> +#define pmd_young(pmd) (pmd_val(pmd) & PMD_SECT_AF)
> +
> +#define __HAVE_ARCH_PMD_WRITE
> +#define pmd_write(pmd) (!(pmd_val(pmd) & PMD_SECT_RDONLY))
> +
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +#define pmd_trans_huge(pmd) ((pmd_val(pmd) & PMD_TYPE_MASK) == PMD_TYPE_SECT)
> +#define pmd_trans_splitting(pmd) (pmd_val(pmd) & PMD_SECT_SPLITTING)
> +#endif
> +
> +#define PMD_BIT_FUNC(fn,op) \
> +static inline pmd_t pmd_##fn(pmd_t pmd) { pmd_val(pmd) op; return pmd; }
> +
> +PMD_BIT_FUNC(wrprotect, |= PMD_SECT_RDONLY);
> +PMD_BIT_FUNC(mkold, &= ~PMD_SECT_AF);
> +PMD_BIT_FUNC(mksplitting, |= PMD_SECT_SPLITTING);
> +PMD_BIT_FUNC(mkwrite, &= ~PMD_SECT_RDONLY);
> +PMD_BIT_FUNC(mkdirty, |= PMD_SECT_DIRTY);
> +PMD_BIT_FUNC(mkyoung, |= PMD_SECT_AF);
> +PMD_BIT_FUNC(mknotpresent, &= ~PMD_TYPE_MASK);
personally I would prefer not to automate the prefixing of pmd_: it
doesn't really save a lot of characters, it doesn't improve
readability and it breaks grep/cscope.
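i.e. just spell them out, e.g. (same behaviour, untested):

static inline pmd_t pmd_wrprotect(pmd_t pmd)
{
	pmd_val(pmd) |= PMD_SECT_RDONLY;
	return pmd;
}

static inline pmd_t pmd_mkold(pmd_t pmd)
{
	pmd_val(pmd) &= ~PMD_SECT_AF;
	return pmd;
}

and so on; then grep and cscope can find the definitions.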
> +
> +#define pmd_mkhuge(pmd) (__pmd((pmd_val(pmd) & ~PMD_TYPE_MASK) | PMD_TYPE_SECT))
> +
> +#define pmd_pfn(pmd) ((pmd_val(pmd) & PHYS_MASK) >> PAGE_SHIFT)
the ARM ARM says UNK/SBZP, so we should be fine here, right? (no one is
crazy enough to try and squeeze some extra information into the spare
bits here, or something like that). For clarity, one could consider:
(((pmd_val(pmd) & PMD_MASK) & PHYS_MASK) >> PAGE_SHIFT)
> +#define pfn_pmd(pfn,prot) (__pmd(((phys_addr_t)(pfn) << PAGE_SHIFT) | pgprot_val(prot)))
> +#define mk_pmd(page,prot) pfn_pmd(page_to_pfn(page),prot)
> +
> +static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
> +{
> + const pmdval_t mask = PMD_SECT_USER | PMD_SECT_XN | PMD_SECT_RDONLY;
> + pmd_val(pmd) = (pmd_val(pmd) & ~mask) | (pgprot_val(newprot) & mask);
> + return pmd;
> +}
> +
> +static inline void set_pmd(pmd_t *pmdp, pmd_t pmd)
> +{
> + *pmdp = pmd;
> +}
why this level of indirection?
> +
> +static inline void set_pmd_at(struct mm_struct *mm, unsigned long addr,
> + pmd_t *pmdp, pmd_t pmd)
> +{
> + BUG_ON(addr >= TASK_SIZE);
> + pmd = __pmd(pmd_val(pmd) | PMD_SECT_nG);
why this side affect?
> + set_pmd(pmdp, pmd);
> + flush_pmd_entry(pmdp);
> +}
> +
> +static inline int has_transparent_hugepage(void)
> +{
> + return 1;
> +}
> +
> #endif /* __ASSEMBLY__ */
>
> #endif /* _ASM_PGTABLE_3LEVEL_H */
> diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
> index c35bf46..767aa7c 100644
> --- a/arch/arm/include/asm/pgtable.h
> +++ b/arch/arm/include/asm/pgtable.h
> @@ -24,6 +24,9 @@
> #include <asm/memory.h>
> #include <asm/pgtable-hwdef.h>
>
> +
> +#include <asm/tlbflush.h>
> +
> #ifdef CONFIG_ARM_LPAE
> #include <asm/pgtable-3level.h>
> #else
> @@ -163,7 +166,6 @@ extern pgd_t swapper_pg_dir[PTRS_PER_PGD];
> #define pgd_offset_k(addr) pgd_offset(&init_mm, addr)
>
> #define pmd_none(pmd) (!pmd_val(pmd))
> -#define pmd_present(pmd) (pmd_val(pmd))
>
> static inline pte_t *pmd_page_vaddr(pmd_t pmd)
> {
> diff --git a/arch/arm/include/asm/tlb.h b/arch/arm/include/asm/tlb.h
> index 685e9e87..0fc2d9d 100644
> --- a/arch/arm/include/asm/tlb.h
> +++ b/arch/arm/include/asm/tlb.h
> @@ -229,6 +229,12 @@ static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp,
> #endif
> }
>
> +static inline void
> +tlb_remove_pmd_tlb_entry(struct mmu_gather *tlb, pmd_t *pmdp, unsigned long addr)
> +{
> + tlb_add_flush(tlb, addr);
> +}
> +
> #define pte_free_tlb(tlb, ptep, addr) __pte_free_tlb(tlb, ptep, addr)
> #define pmd_free_tlb(tlb, pmdp, addr) __pmd_free_tlb(tlb, pmdp, addr)
> #define pud_free_tlb(tlb, pudp, addr) pud_free((tlb)->mm, pudp)
> diff --git a/arch/arm/include/asm/tlbflush.h b/arch/arm/include/asm/tlbflush.h
> index 6e924d3..907cede 100644
> --- a/arch/arm/include/asm/tlbflush.h
> +++ b/arch/arm/include/asm/tlbflush.h
> @@ -505,6 +505,8 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,
> }
> #endif
>
> +#define update_mmu_cache_pmd(vma, address, pmd) do { } while (0)
> +
> #endif
>
> #endif /* CONFIG_MMU */
> diff --git a/arch/arm/mm/fsr-3level.c b/arch/arm/mm/fsr-3level.c
> index 05a4e94..47f4c6f 100644
> --- a/arch/arm/mm/fsr-3level.c
> +++ b/arch/arm/mm/fsr-3level.c
> @@ -9,7 +9,7 @@ static struct fsr_info fsr_info[] = {
> { do_page_fault, SIGSEGV, SEGV_MAPERR, "level 3 translation fault" },
> { do_bad, SIGBUS, 0, "reserved access flag fault" },
> { do_bad, SIGSEGV, SEGV_ACCERR, "level 1 access flag fault" },
> - { do_bad, SIGSEGV, SEGV_ACCERR, "level 2 access flag fault" },
> + { do_page_fault, SIGSEGV, SEGV_ACCERR, "level 2 access flag fault" },
> { do_page_fault, SIGSEGV, SEGV_ACCERR, "level 3 access flag fault" },
> { do_bad, SIGBUS, 0, "reserved permission fault" },
> { do_bad, SIGSEGV, SEGV_ACCERR, "level 1 permission fault" },
> --
> 1.7.9.5
>
Besides the nits it looks fine to me. I've done quite extensive
testing with varied workloads on this code over the last couple of
months on the vexpress TC2 and on the ARNDALE board using KVM/ARM with
huge pages, and it gives a nice ~15% performance increase on average
and is completely stable.
-Christoffer
* [RFC PATCH 6/6] ARM: mm: Transparent huge page support for non-LPAE systems.
2012-10-18 16:15 ` [RFC PATCH 6/6] ARM: mm: Transparent huge page support for non-LPAE systems Steve Capper
@ 2013-01-04 5:04 ` Christoffer Dall
2013-01-08 17:59 ` Steve Capper
0 siblings, 1 reply; 25+ messages in thread
From: Christoffer Dall @ 2013-01-04 5:04 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, Oct 18, 2012 at 12:15 PM, Steve Capper <steve.capper@arm.com> wrote:
> Much of the required code for THP has been implemented in the earlier non-LPAE
> HugeTLB patch.
>
> One more domain bits is used (to store whether or not the THP is splitting).
s/bits/bit/
>
> Some THP helper functions are defined; and we have to re-define pmd_page such
> that it distinguishes between page tables and sections.
super nit: not sure the semi-colon is warranted here.
>
> Signed-off-by: Will Deacon <will.deacon@arm.com>
> Signed-off-by: Steve Capper <steve.capper@arm.com>
> ---
> arch/arm/Kconfig | 2 +-
> arch/arm/include/asm/pgtable-2level.h | 68 ++++++++++++++++++++++++++++++++-
> arch/arm/include/asm/pgtable-3level.h | 2 +
> arch/arm/include/asm/pgtable.h | 7 +++-
> 4 files changed, 75 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index 9621d5f..d459673 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -1773,7 +1773,7 @@ config SYS_SUPPORTS_HUGETLBFS
>
> config HAVE_ARCH_TRANSPARENT_HUGEPAGE
> def_bool y
> - depends on ARM_LPAE
> + depends on SYS_SUPPORTS_HUGETLBFS
>
> source "mm/Kconfig"
>
> diff --git a/arch/arm/include/asm/pgtable-2level.h b/arch/arm/include/asm/pgtable-2level.h
> index 34f4775..67eabb4 100644
> --- a/arch/arm/include/asm/pgtable-2level.h
> +++ b/arch/arm/include/asm/pgtable-2level.h
> @@ -179,6 +179,13 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
> clean_pmd_entry(pmdp); \
> } while (0)
>
> +
stray whitespace?
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +#define _PMD_HUGE(pmd) ((pmd_val(pmd) & PMD_TYPE_MASK) == PMD_TYPE_SECT)
> +#else
> +#define _PMD_HUGE(pmd) (0)
> +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
> +
> /* we don't need complex calculations here as the pmd is folded into the pgd */
> #define pmd_addr_end(addr,end) (end)
>
> @@ -197,7 +204,6 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
>
> #define HPAGE_SHIFT PMD_SHIFT
> #define HPAGE_SIZE (_AC(1, UL) << HPAGE_SHIFT)
> -#define HPAGE_MASK (~(HPAGE_SIZE - 1))
> #define HUGETLB_PAGE_ORDER (HPAGE_SHIFT - PAGE_SHIFT)
>
> #define HUGE_LINUX_PTE_COUNT (PAGE_OFFSET >> HPAGE_SHIFT)
> @@ -209,6 +215,7 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
> */
> #define PMD_DSECT_DIRTY (_AT(pmdval_t, 1) << 5)
> #define PMD_DSECT_AF (_AT(pmdval_t, 1) << 6)
> +#define PMD_DSECT_SPLITTING (_AT(pmdval_t, 1) << 7)
>
> #define PMD_BIT_FUNC(fn,op) \
> static inline pmd_t pmd_##fn(pmd_t pmd) { pmd_val(pmd) op; return pmd; }
> @@ -261,8 +268,67 @@ static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
> return __pmd(pmdval);
> }
>
> +#else
> +#define HPAGE_SIZE 0
why this and the conditional define of _PMD_HUGE? You could just do
as in pgtable.h and put the #ifdef around the condition in
pmd_page(pmd_t pmd).
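i.e. something like (untested sketch):

static inline struct page *pmd_page(pmd_t pmd)
{
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
	/* sections need more of the pmd masked off than page tables */
	if ((pmd_val(pmd) & PMD_TYPE_MASK) == PMD_TYPE_SECT)
		return phys_to_page(pmd_val(pmd) & HPAGE_MASK);
#endif
	return phys_to_page(pmd_val(pmd) & PHYS_MASK);
}

and then _PMD_HUGE and the dummy HPAGE_SIZE definition can go.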
> #endif /* CONFIG_SYS_SUPPORTS_HUGETLBFS */
>
> +#define HPAGE_MASK (~(HPAGE_SIZE - 1))
> +
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +#define pmd_mkhuge(pmd) (__pmd((pmd_val(pmd) & ~PMD_TYPE_MASK) | PMD_TYPE_SECT))
> +
> +PMD_BIT_FUNC(mkold, &= ~PMD_DSECT_AF);
> +PMD_BIT_FUNC(mksplitting, |= PMD_DSECT_SPLITTING);
> +PMD_BIT_FUNC(mkdirty, |= PMD_DSECT_DIRTY);
> +PMD_BIT_FUNC(mkyoung, |= PMD_DSECT_AF);
> +PMD_BIT_FUNC(mkwrite, |= PMD_SECT_AP_WRITE);
> +PMD_BIT_FUNC(mknotpresent, &= ~PMD_TYPE_MASK);
> +
> +#define pmd_trans_splitting(pmd) (pmd_val(pmd) & PMD_DSECT_SPLITTING)
> +#define pmd_young(pmd) (pmd_val(pmd) & PMD_DSECT_AF)
> +#define pmd_write(pmd) (pmd_val(pmd) & PMD_SECT_AP_WRITE)
> +#define pmd_trans_huge(pmd) ((pmd_val(pmd) & PMD_TYPE_MASK) == PMD_TYPE_SECT)
> +
> +static inline unsigned long pmd_pfn(pmd_t pmd)
> +{
> + /*
> + * for a section, we need to mask off more of the pmd
> + * before looking up the pfn
> + */
> + if ((pmd_val(pmd) & PMD_TYPE_MASK) == PMD_TYPE_SECT)
> + return __phys_to_pfn(pmd_val(pmd) & HPAGE_MASK);
> + else
> + return __phys_to_pfn(pmd_val(pmd) & PHYS_MASK);
> +}
> +
> +static inline pmd_t pfn_pmd(unsigned long pfn, pgprot_t prot)
> +{
> + pmd_t pmd = __pmd(__pfn_to_phys(pfn) | PMD_SECT_AP_READ | PMD_SECT_nG);
> +
> + return pmd_modify(pmd, prot);
> +}
> +
> +#define mk_pmd(page,prot) pfn_pmd(page_to_pfn(page),prot);
> +
> +static inline int has_transparent_hugepage(void)
> +{
> + return 1;
> +}
> +
> +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
> +
> +static inline struct page *pmd_page(pmd_t pmd)
> +{
> + /*
> + * for a section, we need to mask off more of the pmd
> + * before looking up the page as it is a section descriptor.
> + */
> + if (_PMD_HUGE(pmd))
> + return phys_to_page(pmd_val(pmd) & HPAGE_MASK);
> +
> + return phys_to_page(pmd_val(pmd) & PHYS_MASK);
> +}
> +
> #endif /* __ASSEMBLY__ */
>
> #endif /* _ASM_PGTABLE_2LEVEL_H */
> diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
> index 31c071f..8360814 100644
> --- a/arch/arm/include/asm/pgtable-3level.h
> +++ b/arch/arm/include/asm/pgtable-3level.h
> @@ -197,6 +197,8 @@ PMD_BIT_FUNC(mknotpresent, &= ~PMD_TYPE_MASK);
> #define pfn_pmd(pfn,prot) (__pmd(((phys_addr_t)(pfn) << PAGE_SHIFT) | pgprot_val(prot)))
> #define mk_pmd(page,prot) pfn_pmd(page_to_pfn(page),prot)
>
> +#define pmd_page(pmd) pfn_to_page(__phys_to_pfn(pmd_val(pmd) & PHYS_MASK))
> +
> static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
> {
> const pmdval_t mask = PMD_SECT_USER | PMD_SECT_XN | PMD_SECT_RDONLY;
> diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
> index 767aa7c..2d96381 100644
> --- a/arch/arm/include/asm/pgtable.h
> +++ b/arch/arm/include/asm/pgtable.h
> @@ -169,11 +169,14 @@ extern pgd_t swapper_pg_dir[PTRS_PER_PGD];
>
> static inline pte_t *pmd_page_vaddr(pmd_t pmd)
> {
> +#ifdef SYS_SUPPORTS_HUGETLBFS
> + if ((pmd_val(pmd) & PMD_TYPE_MASK) == PMD_TYPE_SECT)
> + return __va(pmd_val(pmd) & HPAGE_MASK);
> +#endif
> +
> return __va(pmd_val(pmd) & PHYS_MASK & (s32)PAGE_MASK);
> }
>
> -#define pmd_page(pmd) pfn_to_page(__phys_to_pfn(pmd_val(pmd) & PHYS_MASK))
> -
> #ifndef CONFIG_HIGHPTE
> #define __pte_map(pmd) pmd_page_vaddr(*(pmd))
> #define __pte_unmap(pte) do { } while (0)
> --
> 1.7.9.5
>
The whole series looks functionally correct to me:
Reviewed-by: Christoffer Dall <c.dall@virtualopensystems.com>
* [RFC PATCH 1/6] ARM: mm: correct pte_same behaviour for LPAE.
2013-01-04 5:03 ` Christoffer Dall
@ 2013-01-08 17:56 ` Steve Capper
0 siblings, 0 replies; 25+ messages in thread
From: Steve Capper @ 2013-01-08 17:56 UTC (permalink / raw)
To: linux-arm-kernel
On Fri, Jan 04, 2013 at 05:03:26AM +0000, Christoffer Dall wrote:
> On Thu, Oct 18, 2012 at 12:15 PM, Steve Capper <steve.capper@arm.com> wrote:
> >
> > /*
> > + * For 3 levels of paging the PTE_EXT_NG bit will be set for user address ptes
> > + * that are written to a page table but not for ptes created with mk_pte.
> > + *
>
> Why is this not the case for 2 levels of paging as well?
>
> Is that because it's always checked against the Linux version, or?
>
>
Yes, that's the case; I'll update the comment to reflect that.
> > + * This can cause some comparison tests made by pte_same to fail spuriously and
> > + * lead to other problems.
> > + *
> > + * To correct this behaviour, we mask off PTE_EXT_NG for any pte that is
> > + * present before running the comparison.
>
> nit: This comment doesn't really explain the rationale, I'm assuming
> that pte_same is used to compare only which page gets mapped, assuming
> the attributes etc. remain the same? or also the attributes should be
> the same, only mk_pte sets all of these except the NG bit.
>
I'll expand the comment to include the actual case. Essentially, hugetlb_no_page
calls mk_pte to create new_pte and passes it to hugetlb_cow, which then performs
a pte_same test against a pte that has already been written out to a page
table; the test fails erroneously because of the mismatch in the NG bit.
Unfortunately, this then causes a memory leak.
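To make the failure concrete, here is a minimal user-space model (the bit
positions and values are illustrative, not the real ARM layout):

#include <stdio.h>
#include <stdint.h>

#define PTE_PRESENT (1u << 0)
#define PTE_EXT_NG  (1u << 11)  /* only set once written to a page table */

static int pte_same_masked(uint32_t a, uint32_t b)
{
        if (a & PTE_PRESENT)
                a &= ~PTE_EXT_NG;
        if (b & PTE_PRESENT)
                b &= ~PTE_EXT_NG;
        return a == b;
}

int main(void)
{
        uint32_t new_pte   = 0x41000u | PTE_PRESENT; /* fresh, from mk_pte */
        uint32_t installed = new_pte | PTE_EXT_NG;   /* after set_huge_pte_at */

        printf("naive:  %d\n", new_pte == installed);  /* 0 -> leak */
        printf("masked: %d\n", pte_same_masked(new_pte, installed)); /* 1 */
        return 0;
}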
> > + */
> > +#define __HAVE_ARCH_PTE_SAME
> > +static inline int pte_same(pte_t pte_a, pte_t pte_b)
> > +{
> > + pteval_t vala = pte_val(pte_a), valb = pte_val(pte_b);
> > + if (pte_present(pte_a))
> > + vala &= ~L_PTE_CMP_MASKOFF;
> > +
> > + if (pte_present(pte_b))
> > + valb &= ~L_PTE_CMP_MASKOFF;
> > +
> > + return vala == valb;
> > +}
> > +
> > +/*
> > * Encode and decode a swap entry. Swap entries are stored in the Linux
> > * page tables as follows:
> > *
> > --
> > 1.7.9.5
> >
> >
> >
> > _______________________________________________
> > linux-arm-kernel mailing list
> > linux-arm-kernel at lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>
* [RFC PATCH 2/6] ARM: mm: Add support for flushing HugeTLB pages.
2013-01-04 5:03 ` Christoffer Dall
@ 2013-01-08 17:56 ` Steve Capper
0 siblings, 0 replies; 25+ messages in thread
From: Steve Capper @ 2013-01-08 17:56 UTC (permalink / raw)
To: linux-arm-kernel
On Fri, Jan 04, 2013 at 05:03:36AM +0000, Christoffer Dall wrote:
> On Thu, Oct 18, 2012 at 12:15 PM, Steve Capper <steve.capper@arm.com> wrote:
> > diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c
> > index 1c8f7f5..0a69cb8 100644
> > --- a/arch/arm/mm/flush.c
> > +++ b/arch/arm/mm/flush.c
> > @@ -17,6 +17,7 @@
> > #include <asm/highmem.h>
> > #include <asm/smp_plat.h>
> > #include <asm/tlbflush.h>
> > +#include <linux/hugetlb.h>
> >
> > #include "mm.h"
> >
> > @@ -168,17 +169,21 @@ void __flush_dcache_page(struct address_space *mapping, struct page *page)
> > * coherent with the kernels mapping.
> > */
>
> I think it would be good to have a VM_BUG_ON(PageTail(page)) here.
>
Yes, very much so :-).
> > if (!PageHighMem(page)) {
> > - __cpuc_flush_dcache_area(page_address(page), PAGE_SIZE);
> > + __cpuc_flush_dcache_area(page_address(page), (PAGE_SIZE << compound_order(page)));
>
> I think 98 characters is a stretch. You could do:
>
> size_t page_size = PAGE_SIZE << compound_order(page);
> __cpuc_flush_dcache_area(page_address(page), page_size);
>
>
Yes, thanks, that does look better.
> > } else {
> > - void *addr = kmap_high_get(page);
> > - if (addr) {
> > - __cpuc_flush_dcache_area(addr, PAGE_SIZE);
> > - kunmap_high(page);
> > - } else if (cache_is_vipt()) {
> > - /* unmapped pages might still be cached */
> > - addr = kmap_atomic(page);
> > - __cpuc_flush_dcache_area(addr, PAGE_SIZE);
> > - kunmap_atomic(addr);
> > + unsigned long i;
> > + for(i = 0; i < (1 << compound_order(page)); i++) {
> > + struct page *cpage = page + i;
> > + void *addr = kmap_high_get(cpage);
> > + if (addr) {
> > + __cpuc_flush_dcache_area(addr, PAGE_SIZE);
> > + kunmap_high(cpage);
> > + } else if (cache_is_vipt()) {
> > + /* unmapped pages might still be cached */
> > + addr = kmap_atomic(cpage);
> > + __cpuc_flush_dcache_area(addr, PAGE_SIZE);
> > + kunmap_atomic(addr);
> > + }
> > }
> > }
> >
> > --
> > 1.7.9.5
> >
>
> otherwise it looks good to me.
>
> -Christoffer
>
* [RFC PATCH 3/6] ARM: mm: HugeTLB support for LPAE systems.
2013-01-04 5:03 ` Christoffer Dall
@ 2013-01-08 17:57 ` Steve Capper
2013-01-08 18:10 ` Christoffer Dall
0 siblings, 1 reply; 25+ messages in thread
From: Steve Capper @ 2013-01-08 17:57 UTC (permalink / raw)
To: linux-arm-kernel
On Fri, Jan 04, 2013 at 05:03:59AM +0000, Christoffer Dall wrote:
> On Thu, Oct 18, 2012 at 12:15 PM, Steve Capper <steve.capper@arm.com> wrote:
> > +++ b/arch/arm/include/asm/hugetlb.h
> > @@ -0,0 +1,83 @@
> > +/*
> > + * arch/arm/include/asm/hugetlb.h
> > + *
> > + * Copyright (C) 2012 ARM Ltd.
> > + *
> > + * Based on arch/x86/include/asm/hugetlb.h
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License version 2 as
> > + * published by the Free Software Foundation.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > + * along with this program; if not, write to the Free Software
> > + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
> > + */
> > +
> > +#ifndef _ASM_ARM_HUGETLB_H
> > +#define _ASM_ARM_HUGETLB_H
> > +
> > +#include <asm/page.h>
> > +
> > +#include <asm/hugetlb-3level.h>
>
> I feel like it wouldn't hurt anyone to put a comment here explaining
> that these "ptes" are in fact pmd section descriptors disguised in pte
> types.
>
Yes, that does make sense. I'll put something in to that effect.
> > +
> > +static inline void hugetlb_free_pgd_range(struct mmu_gather *tlb,
> > + unsigned long addr, unsigned long end,
> > + unsigned long floor,
> > + unsigned long ceiling)
> > +{
> > + free_pgd_range(tlb, addr, end, floor, ceiling);
> > +}
> > +
> > +
> > +static inline int is_hugepage_only_range(struct mm_struct *mm,
> > + unsigned long addr, unsigned long len)
> > +{
> > + return 0;
> > +}
> > +
> > +static inline int prepare_hugepage_range(struct file *file,
> > + unsigned long addr, unsigned long len)
> > +{
> > + struct hstate *h = hstate_file(file);
> > + if (len & ~huge_page_mask(h))
> > + return -EINVAL;
> > + if (addr & ~huge_page_mask(h))
> > + return -EINVAL;
> > + return 0;
> > +}
> > +
> > +static inline void hugetlb_prefault_arch_hook(struct mm_struct *mm)
> > +{
> > +}
> > +
> > +static inline int huge_pte_none(pte_t pte)
> > +{
> > + return pte_none(pte);
> > +}
> > +
> > +static inline pte_t huge_pte_wrprotect(pte_t pte)
> > +{
> > + return pte_wrprotect(pte);
> > +}
> > +
> > +static inline int arch_prepare_hugepage(struct page *page)
> > +{
> > + return 0;
> > +}
> > +
> > +static inline void arch_release_hugepage(struct page *page)
> > +{
> > +}
> > +
> > +static inline void arch_clear_hugepage_flags(struct page *page)
> > +{
> > + clear_bit(PG_dcache_clean, &page->flags);
>
> why do we clear this bit here?
>
This is called when the huge page is freed, and it indicates that the
dcache needs to be flushed for this page. The mechanism was added by commit
5d3a551c28c6669dc43be40d8fafafbc2ec8f42b ("mm: hugetlb: add arch hook for
clearing page flags before entering pool").
> > diff --git a/arch/arm/mm/hugetlbpage-3level.c b/arch/arm/mm/hugetlbpage-3level.c
> > new file mode 100644
> > index 0000000..86474f0
> > --- /dev/null
> > +++ b/arch/arm/mm/hugetlbpage-3level.c
> > @@ -0,0 +1,190 @@
> > +/*
> > + * arch/arm/mm/hugetlbpage-3level.c
> > + *
> > + * Copyright (C) 2002, Rohit Seth <rohit.seth@intel.com>
> > + * Copyright (C) 2012 ARM Ltd.
> > + *
> > + * Based on arch/x86/mm/hugetlbpage.c
> > + *
>
> this seems to be an almost 1-to-1 copy of the x86 code. Is it not
> worth sharing it somehow? Possible?
>
Yeah, good point, I have a cunning plan though; please see below.
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License version 2 as
> > + * published by the Free Software Foundation.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > + * along with this program; if not, write to the Free Software
> > + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
> > + */
> > +
> > +#include <linux/init.h>
> > +#include <linux/fs.h>
> > +#include <linux/mm.h>
> > +#include <linux/hugetlb.h>
> > +#include <linux/pagemap.h>
> > +#include <linux/err.h>
> > +#include <linux/sysctl.h>
> > +#include <asm/mman.h>
> > +#include <asm/tlb.h>
> > +#include <asm/tlbflush.h>
> > +#include <asm/pgalloc.h>
> > +
> > +static unsigned long page_table_shareable(struct vm_area_struct *svma,
> > + struct vm_area_struct *vma,
> > + unsigned long addr, pgoff_t idx)
> > +{
> > + unsigned long saddr = ((idx - svma->vm_pgoff) << PAGE_SHIFT) +
> > + svma->vm_start;
> > + unsigned long sbase = saddr & PUD_MASK;
> > + unsigned long s_end = sbase + PUD_SIZE;
>
> these are to check that the potential vma to steal the pmd from covers
> the entire pud entry's address space, correct?
Yes, that's correct.
>
> it's pretty confusing with the idx conversion back and forward,
> especially given that mm/hugetlb.c uses idx to index into number of
> huge pages, where this idx is index into number of regular pages, so I
> would suggest some clear static conversion functions or a comment.
>
> > +
> > + /* Allow segments to share if only one is marked locked */
>
> exactly one or at most one? the code below checks that exactly one is
> marked locked, if I read it correctly. Again, for me, the comment
> would be more helpful if it stated *why* that's a requirement, not
> just that it *is* a requirement.
>
This originates from commit 32b154c0b0bae2879bf4e549d861caf1759a3546
("x86: ignore VM_LOCKED when determining if hugetlb-backed page tables can be
shared or not"), and the commit title is the clearer statement: VM_LOCKED is
masked off both vmas, so locking is simply ignored in the comparison.
> > + unsigned long vm_flags = vma->vm_flags & ~VM_LOCKED;
> > + unsigned long svm_flags = svma->vm_flags & ~VM_LOCKED;
> > +
> > + /*
> > + * match the virtual addresses, permission and the alignment of the
> > + * page table page.
> > + */
> > + if (pmd_index(addr) != pmd_index(saddr) ||
> > + vm_flags != svm_flags ||
> > + sbase < svma->vm_start || svma->vm_end < s_end)
> > + return 0;
> > +
> > + return saddr;
> > +}
> > +
> > +static int vma_shareable(struct vm_area_struct *vma, unsigned long addr)
> > +{
> > + unsigned long base = addr & PUD_MASK;
> > + unsigned long end = base + PUD_SIZE;
> > +
> > + /*
> > + * check on proper vm_flags and page table alignment
> > + */
> > + if (vma->vm_flags & VM_MAYSHARE &&
> > + vma->vm_start <= base && end <= vma->vm_end)
> > + return 1;
> > + return 0;
> > +}
> > +
> > +/*
> > + * search for a shareable pmd page for hugetlb.
>
> nit:
>
> perhaps this is completely standard knowledge for your garden variety
> mm hacker, but I needed to spend 5 minutes figuring out the purpose
> here - if I get it right: multiple mappings the hugetlbfs file for the
> same mm covering the same pud address range mapping the same data can
> use the same pmd, right?
>
Yes, that's correct. Multiple puds can point to the same block of 512 pmds,
which reduces cache pressure when the page tables are walked. This was
introduced in commit 39dde65c9940c97fcd178a3d2b1c57ed8b7b68aa
("[PATCH] shared page table for hugetlb page").
> I would either rename the function to find_huge_pmd_share and get rid
> of the comment or expand on the comment.
>
> > + */
> > +static pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr,
> > + pud_t *pud)
> > +{
> > + struct vm_area_struct *vma = find_vma(mm, addr);
> > + struct address_space *mapping = vma->vm_file->f_mapping;
> > + pgoff_t idx = ((addr - vma->vm_start) >> PAGE_SHIFT) +
> > + vma->vm_pgoff;
> > + struct vm_area_struct *svma;
> > + unsigned long saddr;
> > + pte_t *spte = NULL;
> > + pte_t *pte;
> > +
> > + if (!vma_shareable(vma, addr))
> > + return (pte_t *)pmd_alloc(mm, pud, addr);
> > +
> > + mutex_lock(&mapping->i_mmap_mutex);
> > + vma_interval_tree_foreach(svma, &mapping->i_mmap, idx, idx) {
> > + if (svma == vma)
> > + continue;
> > +
> > + saddr = page_table_shareable(svma, vma, addr, idx);
> > + if (saddr) {
> > + spte = huge_pte_offset(svma->vm_mm, saddr);
> > + if (spte) {
> > + get_page(virt_to_page(spte));
> > + break;
> > + }
> > + }
> > + }
> > +
> > + if (!spte)
> > + goto out;
> > +
> > + spin_lock(&mm->page_table_lock);
> > + if (pud_none(*pud))
> > + pud_populate(mm, pud, (pmd_t *)((unsigned long)spte & PAGE_MASK));
> > + else
> > + put_page(virt_to_page(spte));
> > + spin_unlock(&mm->page_table_lock);
> > +out:
> > + pte = (pte_t *)pmd_alloc(mm, pud, addr);
> > + mutex_unlock(&mapping->i_mmap_mutex);
> > + return pte;
> > +}
> > +
> > +/*
> > + * unmap huge page backed by shared pte.
> > + *
> > + * Hugetlb pte page is ref counted at the time of mapping. If pte is shared
> > + * indicated by page_count > 1, unmap is achieved by clearing pud and
> > + * decrementing the ref count. If count == 1, the pte page is not shared.
> > + *
> > + * called with vma->vm_mm->page_table_lock held.
> > + *
> > + * returns: 1 successfully unmapped a shared pte page
> > + * 0 the underlying pte page is not shared, or it is the last user
> > + */
> > +int huge_pmd_unshare(struct mm_struct *mm, unsigned long *addr, pte_t *ptep)
> > +{
> > + pgd_t *pgd = pgd_offset(mm, *addr);
> > + pud_t *pud = pud_offset(pgd, *addr);
> > +
> > + BUG_ON(page_count(virt_to_page(ptep)) == 0);
> > + if (page_count(virt_to_page(ptep)) == 1)
> > + return 0;
> > +
> > + pud_clear(pud);
> > + put_page(virt_to_page(ptep));
> > + *addr = ALIGN(*addr, HPAGE_SIZE * PTRS_PER_PTE) - HPAGE_SIZE;
>
> huh? this hurts my brain. Why the minus HPAGE_SIZE?
>
This is called in two places: hugetlb_change_protection and
__unmap_hugepage_range. In both cases it is called from a loop where *addr is
incremented by the huge page size when the loop advances. If huge_pmd_unshare
returns 1, the loop "continue"s, and the -HPAGE_SIZE cancels out the *addr
increment at the end of the loop, leaving *addr at the next pud boundary.
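A small user-space model of the rewind, assuming the LPAE sizes (2MB huge
pages, 512 pmds per pud, so 1GB regions):

#include <stdio.h>

#define HPAGE_SIZE   (2UL << 20)
#define PTRS_PER_PTE 512UL
#define ALIGN(x, a)  (((x) + (a) - 1) & ~((a) - 1))

int main(void)
{
        unsigned long addr = 0x40000000 + 3 * HPAGE_SIZE; /* inside a pud */

        /* huge_pmd_unshare(): skip to just before the next pud boundary */
        addr = ALIGN(addr, HPAGE_SIZE * PTRS_PER_PTE) - HPAGE_SIZE;

        /* the caller's loop increment that the -HPAGE_SIZE cancels */
        addr += HPAGE_SIZE;

        printf("resume at %#lx\n", addr);  /* 0x80000000: the next pud */
        return 0;
}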
> > + return 1;
> > +}
> > +
> > +pte_t *huge_pte_alloc(struct mm_struct *mm,
> > + unsigned long addr, unsigned long sz)
> > +{
> > + pgd_t *pgd;
> > + pud_t *pud;
> > + pte_t *pte = NULL;
> > +
> > + pgd = pgd_offset(mm, addr);
> > + pud = pud_alloc(mm, pgd, addr);
> > + if (pud) {
> > + BUG_ON(sz != PMD_SIZE);
>
> is this really necessary?
>
> VM_BUG_ON?
>
Thanks, I'll clean this up.
So, on to my cunning plan :-)...
Essentially the huge pmd sharing takes place under the following circumstances:
1) At least 1GB of huge memory must be requested in a vm_area block.
2) This must be VM_SHAREABLE; so mmap using MAP_SHARED on hugetlbfs, or shmget.
3) The mapping must be at a 1GB memory boundary (to kick off a new pud).
4) Another process must request the same backing file, and again this must be
mapped on a 1GB boundary in that process too.
(A sketch of a test case meeting these conditions is below.)
I've only been able to get this working with a 3GB user VM split, and my test
case needed to be statically compiled, as .so's were mmapped right where I
wanted to map the 1GB huge memory block (at 0x40000000).
Having thought more about this, I don't think the above conditions are likely
to crop up with a 2GB or 3GB user VM area in the wild. Thus I am tempted to
remove the huge pmd sharing from the LPAE hugetlb code and simplify things a
bit more. If I've missed a use case where huge pmd sharing may be useful,
please give me a shout.
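For illustration, a hedged user-space sketch of such a test when run in two
processes (the hugetlbfs mount point, file name and the fixed address are
assumptions):

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
        unsigned long len = 1UL << 30;          /* 1GB of huge pages */
        void *want = (void *)0x40000000;        /* 1GB-aligned boundary */
        int fd = open("/mnt/huge/shared", O_CREAT | O_RDWR, 0600);
        void *p;

        if (fd < 0) {
                perror("open");
                return 1;
        }

        p = mmap(want, len, PROT_READ | PROT_WRITE,
                 MAP_SHARED | MAP_FIXED, fd, 0);
        if (p == MAP_FAILED) {
                perror("mmap");
                return 1;
        }

        printf("mapped %lu bytes at %p\n", len, p);
        return 0;
}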
> > + if (pud_none(*pud))
> > + pte = huge_pmd_share(mm, addr, pud);
> > + else
> > + pte = (pte_t *)pmd_alloc(mm, pud, addr);
> > + }
> > + BUG_ON(pte && !pte_none(*pte) && !pte_huge(*pte));
> > +
> > + return pte;
> > +}
> > +
> > +struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
> > + pmd_t *pmd, int write)
> > +{
> > + struct page *page;
> > +
> > + page = pte_page(*(pte_t *)pmd);
> > + if (page)
> > + page += ((address & ~PMD_MASK) >> PAGE_SHIFT);
> > + return page;
> > +}
> > +
> > +struct page *follow_huge_pud(struct mm_struct *mm, unsigned long address,
> > + pud_t *pud, int write)
> > +{
> > + struct page *page;
> > +
> > + page = pte_page(*(pte_t *)pud);
> > + if (page)
> > + page += ((address & ~PUD_MASK) >> PAGE_SHIFT);
> > + return page;
>
> why implement this? this should never be called right? Shouldn't it
> just be a BUG();
>
Yes, thanks, it should be BUG().
> > +}
> > diff --git a/arch/arm/mm/hugetlbpage.c b/arch/arm/mm/hugetlbpage.c
> > new file mode 100644
> > index 0000000..32fe7fd
> > --- /dev/null
> > +++ b/arch/arm/mm/hugetlbpage.c
> > @@ -0,0 +1,65 @@
> > +/*
> > + * arch/arm/mm/hugetlbpage.c
> > + *
> > + * Copyright (C) 2002, Rohit Seth <rohit.seth@intel.com>
> > + * Copyright (C) 2012 ARM Ltd.
> > + *
> > + * Based on arch/x86/mm/hugetlbpage.c
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License version 2 as
> > + * published by the Free Software Foundation.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > + * along with this program; if not, write to the Free Software
> > + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
> > + */
> > +
> > +#include <linux/init.h>
> > +#include <linux/fs.h>
> > +#include <linux/mm.h>
> > +#include <linux/hugetlb.h>
> > +#include <linux/pagemap.h>
> > +#include <linux/err.h>
> > +#include <linux/sysctl.h>
> > +#include <asm/mman.h>
> > +#include <asm/tlb.h>
> > +#include <asm/tlbflush.h>
> > +#include <asm/pgalloc.h>
> > +
> > +pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
> > +{
> > + pgd_t *pgd;
> > + pud_t *pud;
> > + pmd_t *pmd = NULL;
> > +
> > + pgd = pgd_offset(mm, addr);
> > + if (pgd_present(*pgd)) {
> > + pud = pud_offset(pgd, addr);
> > + if (pud_present(*pud))
> > + pmd = pmd_offset(pud, addr);
> > + }
> > +
> > + return (pte_t *)pmd;
> > +}
> > +
> > +struct page *follow_huge_addr(struct mm_struct *mm, unsigned long address,
> > + int write)
> > +{
> > + return ERR_PTR(-EINVAL);
> > +}
> > +
> > +int pmd_huge(pmd_t pmd)
> > +{
> > + return (pmd_val(pmd) & PMD_TYPE_MASK) == PMD_TYPE_SECT;
> > +}
> > +
> > +int pud_huge(pud_t pud)
> > +{
> > + return 0;
> > +}
> > --
> > 1.7.9.5
> >
> >
>
> -Christoffer
>
* [RFC PATCH 4/6] ARM: mm: HugeTLB support for non-LPAE systems.
2013-01-04 5:04 ` Christoffer Dall
@ 2013-01-08 17:58 ` Steve Capper
2013-01-08 18:13 ` Christoffer Dall
0 siblings, 1 reply; 25+ messages in thread
From: Steve Capper @ 2013-01-08 17:58 UTC (permalink / raw)
To: linux-arm-kernel
On Fri, Jan 04, 2013 at 05:04:43AM +0000, Christoffer Dall wrote:
> On Thu, Oct 18, 2012 at 12:15 PM, Steve Capper <steve.capper@arm.com> wrote:
> > diff --git a/arch/arm/include/asm/hugetlb-2level.h b/arch/arm/include/asm/hugetlb-2level.h
> > new file mode 100644
> > index 0000000..3532b54
> > --- /dev/null
> > +++ b/arch/arm/include/asm/hugetlb-2level.h
> > @@ -0,0 +1,71 @@
> > +/*
> > + * arch/arm/include/asm/hugetlb-2level.h
> > + *
> > + * Copyright (C) 2012 ARM Ltd.
> > + *
> > + * Based on arch/x86/include/asm/hugetlb.h and Bill Carson's patches
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License version 2 as
> > + * published by the Free Software Foundation.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > + * along with this program; if not, write to the Free Software
> > + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
> > + */
> > +
> > +#ifndef _ASM_ARM_HUGETLB_2LEVEL_H
> > +#define _ASM_ARM_HUGETLB_2LEVEL_H
> > +
> > +
> > +pte_t huge_ptep_get(pte_t *ptep);
> > +
> > +void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
> > + pte_t *ptep, pte_t pte);
> > +
> > +static inline pte_t pte_mkhuge(pte_t pte) { return pte; }
> > +
> > +static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
> > + unsigned long addr, pte_t *ptep)
> > +{
> > + flush_tlb_range(vma, addr, addr + HPAGE_SIZE);
>
> don't you need to clear the old TLB entry first here, otherwise
> another CPU could put an entry to the old page in its TLB and access
> it even after the page_cache_release(old_page) in hugetlb_cow() ?
>
Yes I do, thanks.
> > +}
> > +
> > +static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
> > + unsigned long addr, pte_t *ptep)
> > +{
> > + pmd_t *pmdp = (pmd_t *) ptep;
> > + set_pmd_at(mm, addr, pmdp, pmd_wrprotect(*pmdp));
> > +}
> > +
> > +
> > +static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
> > + unsigned long addr, pte_t *ptep)
> > +{
> > + pmd_t *pmdp = (pmd_t *)ptep;
> > + pte_t pte = huge_ptep_get(ptep);
> > + pmd_clear(pmdp);
> > +
> > + return pte;
> > +}
> > +
> > +static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
> > + unsigned long addr, pte_t *ptep,
> > + pte_t pte, int dirty)
> > +{
> > + int changed = !pte_same(huge_ptep_get(ptep), pte);
> > +
> > + if (changed) {
> > + set_huge_pte_at(vma->vm_mm, addr, ptep, pte);
> > + huge_ptep_clear_flush(vma, addr, &pte);
> > + }
> > +
> > + return changed;
> > +}
> > +
> > +#endif /* _ASM_ARM_HUGETLB_2LEVEL_H */
> > diff --git a/arch/arm/include/asm/hugetlb.h b/arch/arm/include/asm/hugetlb.h
> > index 7af9cf6..1e92975 100644
> > --- a/arch/arm/include/asm/hugetlb.h
> > +++ b/arch/arm/include/asm/hugetlb.h
> > @@ -24,7 +24,11 @@
> >
> > #include <asm/page.h>
> >
> > +#ifdef CONFIG_ARM_LPAE
> > #include <asm/hugetlb-3level.h>
> > +#else
> > +#include <asm/hugetlb-2level.h>
> > +#endif
> >
> > static inline void hugetlb_free_pgd_range(struct mmu_gather *tlb,
> > unsigned long addr, unsigned long end,
> > diff --git a/arch/arm/include/asm/pgtable-2level.h b/arch/arm/include/asm/pgtable-2level.h
> > index 662a00e..fd1d9be 100644
> > --- a/arch/arm/include/asm/pgtable-2level.h
> > +++ b/arch/arm/include/asm/pgtable-2level.h
> > @@ -163,7 +163,7 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
> > return (pmd_t *)pud;
> > }
> >
> > -#define pmd_bad(pmd) (pmd_val(pmd) & 2)
> > +#define pmd_bad(pmd) ((pmd_val(pmd) & PMD_TYPE_MASK) == PMD_TYPE_FAULT)
>
> this changes the semantics of the macro - is that on purpose and safe?
>
> (fault entries didn't used to be bad, now they are...)
>
Yes, thanks, the semantics should be retained (they are for LPAE).
> >
> > #define copy_pmd(pmdpd,pmdps) \
> > do { \
> > @@ -184,6 +184,83 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
> >
> > #define set_pte_ext(ptep,pte,ext) cpu_set_pte_ext(ptep,pte,ext)
> >
> > +
> > +#ifdef CONFIG_SYS_SUPPORTS_HUGETLBFS
> > +
> > +/*
> > + * now follows some of the definitions to allow huge page support, we can't put
> > + * these in the hugetlb source files as they are also required for transparent
> > + * hugepage support.
> > + */
> > +
> > +#define HPAGE_SHIFT PMD_SHIFT
> > +#define HPAGE_SIZE (_AC(1, UL) << HPAGE_SHIFT)
> > +#define HPAGE_MASK (~(HPAGE_SIZE - 1))
> > +#define HUGETLB_PAGE_ORDER (HPAGE_SHIFT - PAGE_SHIFT)
> > +
> > +#define HUGE_LINUX_PTE_COUNT (PAGE_OFFSET >> HPAGE_SHIFT)
> > +#define HUGE_LINUX_PTE_SIZE (HUGE_LINUX_PTE_COUNT * sizeof(pte_t *))
> > +#define HUGE_LINUX_PTE_INDEX(addr) (addr >> HPAGE_SHIFT)
> > +
> > +/*
> > + * We re-purpose the following domain bits in the section descriptor
> > + */
> > +#define PMD_DSECT_DIRTY (_AT(pmdval_t, 1) << 5)
> > +#define PMD_DSECT_AF (_AT(pmdval_t, 1) << 6)
> > +
> > +#define PMD_BIT_FUNC(fn,op) \
> > +static inline pmd_t pmd_##fn(pmd_t pmd) { pmd_val(pmd) op; return pmd; }
> > +
> > +PMD_BIT_FUNC(wrprotect, &= ~PMD_SECT_AP_WRITE);
> > +
> > +static inline void set_pmd_at(struct mm_struct *mm, unsigned long addr,
> > + pmd_t *pmdp, pmd_t pmd)
> > +{
> > + /*
> > + * we can sometimes be passed a pmd pointing to a level 2 descriptor
> > + * from collapse_huge_page.
> > + */
> > + if ((pmd_val(pmd) & PMD_TYPE_MASK) == PMD_TYPE_TABLE) {
> > + pmdp[0] = __pmd(pmd_val(pmd));
> > + pmdp[1] = __pmd(pmd_val(pmd) + 256 * sizeof(pte_t));
>
> eh, if I get this right, this means that in the case where the pmd
> points to level 2 descriptor, all the pages are lined up to be a huge
> page, so just point to the next level 2 pte, which directly follows
> the next level 2 descriptor, because they share the same page. But
> then why do we need to set any values here?
>
This is a little weird.
The transparent huge page code will sometimes try to collapse a group of pages
into a huge page. As part of the collapse process, it will invalidate the pmd
before it copies the physical pages into a contiguous huge page. This ensures
that memory accesses to the area being collapsed keep faulting whilst the
collapse takes place. Sometimes the collapse process will be aborted after the
pmd has been invalidated, so the original pmd (which points to a page table)
needs to be put back as part of the rollback.
With 2 levels of paging, the pmds are arranged in pairs, so we put back a pair
of pmds.
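A toy user-space model of the pair-write (addresses are illustrative; the
256-pte offset matches the quoted code):

#include <stdio.h>
#include <stdint.h>

#define SECTION_SIZE  (1u << 20)                /* 1MB hardware section */
#define HW_PTE_OFFSET (256 * (uint32_t)sizeof(uint32_t)) /* second half-table */

int main(void)
{
        uint32_t pmdp[2];
        uint32_t table = 0x8000a000u;  /* rollback: level-2 table address */
        uint32_t sect  = 0x40000000u;  /* install: 2MB-aligned section */

        /* table case: second entry points 256 hardware ptes further in */
        pmdp[0] = table;
        pmdp[1] = table + HW_PTE_OFFSET;
        printf("table pair:   %#x %#x\n", pmdp[0], pmdp[1]);

        /* section case: second entry covers the following 1MB */
        pmdp[0] = sect;
        pmdp[1] = sect + SECTION_SIZE;
        printf("section pair: %#x %#x\n", pmdp[0], pmdp[1]);
        return 0;
}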
> > + } else {
> > + pmdp[0] = __pmd(pmd_val(pmd)); /* first 1M section */
> > + pmdp[1] = __pmd(pmd_val(pmd) + SECTION_SIZE); /* second 1M section */
> > + }
> > +
> > + flush_pmd_entry(pmdp);
> > +}
> > +
> > +#define HPMD_XLATE(res, cmp, from, to) do { if (cmp & from) res |= to; \
> > + else res &= ~to; \
> > + } while (0)
> > +
> > +static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
> > +{
> > + pmdval_t pmdval = pmd_val(pmd);
> > + pteval_t newprotval = pgprot_val(newprot);
> > +
> > + HPMD_XLATE(pmdval, newprotval, L_PTE_XN, PMD_SECT_XN);
> > + HPMD_XLATE(pmdval, newprotval, L_PTE_SHARED, PMD_SECT_S);
> > + HPMD_XLATE(pmdval, newprotval, L_PTE_YOUNG, PMD_DSECT_AF);
>
> consider something akin to:
>
> #define L_PMD_DSECT_YOUNG (PMD_DSECT_AF)
>
> then you don't have to change several places if you decide to
> rearrange the mappings for whatever reason at it makes it slightly
> easier to read this code.
>
Yeah, something along those lines may look better. I'll have a tinker.
> > + HPMD_XLATE(pmdval, newprotval, L_PTE_DIRTY, PMD_DSECT_DIRTY);
> > +
> > + /* preserve bits C & B */
> > + pmdval |= (newprotval & (3 << 2));
>
> this looks superfluous?
>
> > +
> > + /* Linux PTE bit 4 corresponds to PMD TEX bit 0 */
> > + HPMD_XLATE(pmdval, newprotval, 1 << 4, PMD_SECT_TEX(1));
>
> define L_PTE_TEX0 and group with the others above?
>
The mapping is not quite that simple. We have multiple memory types defined
in pgtable-{2,3}level.h, and these have different meanings depending on the
target processor. For v6 and v7 the above works, but ideally I should be able
to look up the memory type mapping. For instance, in arch/arm/mm/mmu.c we can
see cache policies that contain both Linux pte information and hardware pmd
information. I'll ponder this some more; if anyone has a neat way of handling
this then please let me know :-).
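As an illustration, a user-space model of that translation (HPMD_XLATE is
copied from the patch; PMD_SECT_TEX mirrors the 2-level hwdef definition, and
treating Linux pte bit 4 as TEX0 is, as above, a v6/v7 assumption):

#include <stdio.h>
#include <stdint.h>

#define HPMD_XLATE(res, cmp, from, to) do { if (cmp & from) res |= to; \
                                            else res &= ~to; \
                                       } while (0)

#define L_PTE_TEX0      (1u << 4)            /* Linux pte bit 4 (assumed TEX0) */
#define PMD_SECT_TEX(x) ((uint32_t)(x) << 12) /* section TEX field */

int main(void)
{
        uint32_t newprotval = L_PTE_TEX0 | (3u << 2); /* TEX0 plus C and B */
        uint32_t pmdval = 0;

        HPMD_XLATE(pmdval, newprotval, L_PTE_TEX0, PMD_SECT_TEX(1));
        pmdval |= newprotval & (3u << 2);             /* preserve bits C & B */

        printf("pmdval = %#x\n", pmdval);             /* 0x100c */
        return 0;
}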
> > +
> > + if (newprotval & L_PTE_RDONLY)
> > + pmdval &= ~PMD_SECT_AP_WRITE;
> > + else
> > + pmdval |= PMD_SECT_AP_WRITE;
> > +
> > + return __pmd(pmdval);
> > +}
> > +
> > +#endif /* CONFIG_SYS_SUPPORTS_HUGETLBFS */
> > +
> > #endif /* __ASSEMBLY__ */
> >
> > #endif /* _ASM_PGTABLE_2LEVEL_H */
> > diff --git a/arch/arm/include/asm/tlb.h b/arch/arm/include/asm/tlb.h
> > index 99a1951..685e9e87 100644
> > --- a/arch/arm/include/asm/tlb.h
> > +++ b/arch/arm/include/asm/tlb.h
> > @@ -92,10 +92,16 @@ static inline void tlb_flush(struct mmu_gather *tlb)
> > static inline void tlb_add_flush(struct mmu_gather *tlb, unsigned long addr)
> > {
> > if (!tlb->fullmm) {
> > + unsigned long size = PAGE_SIZE;
> > +
> > if (addr < tlb->range_start)
> > tlb->range_start = addr;
> > - if (addr + PAGE_SIZE > tlb->range_end)
> > - tlb->range_end = addr + PAGE_SIZE;
> > +
> > + if (tlb->vma && is_vm_hugetlb_page(tlb->vma))
> > + size = HPAGE_SIZE;
> > +
> > + if (addr + size > tlb->range_end)
> > + tlb->range_end = addr + size;
> > }
> > }
> >
> > diff --git a/arch/arm/kernel/head.S b/arch/arm/kernel/head.S
> > index 4eee351..860f08e 100644
> > --- a/arch/arm/kernel/head.S
> > +++ b/arch/arm/kernel/head.S
> > @@ -410,13 +410,21 @@ __enable_mmu:
> > mov r5, #0
> > mcrr p15, 0, r4, r5, c2 @ load TTBR0
> > #else
> > +#ifndef CONFIG_SYS_SUPPORTS_HUGETLBFS
> > mov r5, #(domain_val(DOMAIN_USER, DOMAIN_MANAGER) | \
> > domain_val(DOMAIN_KERNEL, DOMAIN_MANAGER) | \
> > domain_val(DOMAIN_TABLE, DOMAIN_MANAGER) | \
> > domain_val(DOMAIN_IO, DOMAIN_CLIENT))
> > +#else
> > + @ set ourselves as the client in all domains
> > + @ this allows us to then use the 4 domain bits in the
> > + @ section descriptors in our transparent huge pages
> > + ldr r5, =0x55555555
> > +#endif /* CONFIG_SYS_SUPPORTS_HUGETLBFS */
> > +
> > mcr p15, 0, r5, c3, c0, 0 @ load domain access register
> > mcr p15, 0, r4, c2, c0, 0 @ load page table pointer
> > -#endif
> > +#endif /* CONFIG_ARM_LPAE */
> > b __turn_mmu_on
> > ENDPROC(__enable_mmu)
> >
> > diff --git a/arch/arm/mm/Makefile b/arch/arm/mm/Makefile
> > index 1560bbc..adf0b19 100644
> > --- a/arch/arm/mm/Makefile
> > +++ b/arch/arm/mm/Makefile
> > @@ -17,7 +17,11 @@ obj-$(CONFIG_MODULES) += proc-syms.o
> > obj-$(CONFIG_ALIGNMENT_TRAP) += alignment.o
> > obj-$(CONFIG_HIGHMEM) += highmem.o
> > obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
> > +ifeq ($(CONFIG_ARM_LPAE),y)
> > obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage-3level.o
> > +else
> > +obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage-2level.o
> > +endif
> >
> > obj-$(CONFIG_CPU_ABRT_NOMMU) += abort-nommu.o
> > obj-$(CONFIG_CPU_ABRT_EV4) += abort-ev4.o
> > diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
> > index 5dbf13f..0884936 100644
> > --- a/arch/arm/mm/fault.c
> > +++ b/arch/arm/mm/fault.c
> > @@ -488,13 +488,13 @@ do_translation_fault(unsigned long addr, unsigned int fsr,
> > #endif /* CONFIG_MMU */
> >
> > /*
> > - * Some section permission faults need to be handled gracefully.
> > - * They can happen due to a __{get,put}_user during an oops.
> > + * A fault in a section will likely be due to a huge page, treat it
> > + * as a page fault.
> > */
> > static int
> > do_sect_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
> > {
> > - do_bad_area(addr, fsr, regs);
> > + do_page_fault(addr, fsr, regs);
>
> doesn't the previous patch require this as well?
>
> (so it should strictly speaking be part of that patch)
>
Yes, it does. Thanks, I'll clean this up by updating the fsr_info tables for
long and short descriptors, and remove the do_sect_fault->do_page_fault daisy
chaining.
> > return 0;
> > }
> >
> > diff --git a/arch/arm/mm/hugetlbpage-2level.c b/arch/arm/mm/hugetlbpage-2level.c
> > new file mode 100644
> > index 0000000..4b2b38c
> > --- /dev/null
> > +++ b/arch/arm/mm/hugetlbpage-2level.c
> > @@ -0,0 +1,115 @@
> > +/*
> > + * arch/arm/mm/hugetlbpage-2level.c
> > + *
> > + * Copyright (C) 2002, Rohit Seth <rohit.seth@intel.com>
> > + * Copyright (C) 2012 ARM Ltd
> > + * Copyright (C) 2012 Bill Carson.
> > + *
> > + * Based on arch/x86/include/asm/hugetlb.h and Bill Carson's patches
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License version 2 as
> > + * published by the Free Software Foundation.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > + * along with this program; if not, write to the Free Software
> > + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
> > + */
> > +
> > +#include <linux/init.h>
> > +#include <linux/fs.h>
> > +#include <linux/mm.h>
> > +#include <linux/hugetlb.h>
> > +#include <linux/pagemap.h>
> > +#include <linux/err.h>
> > +#include <linux/sysctl.h>
> > +#include <asm/mman.h>
> > +#include <asm/tlb.h>
> > +#include <asm/tlbflush.h>
> > +#include <asm/pgalloc.h>
> > +
> > +int huge_pmd_unshare(struct mm_struct *mm, unsigned long *addr, pte_t *ptep)
> > +{
> > + return 0;
> > +}
> > +
> > +pte_t *huge_pte_alloc(struct mm_struct *mm,
> > + unsigned long addr, unsigned long sz)
> > +{
> > + pgd_t *pgd;
> > + pud_t *pud;
> > + pmd_t *pmd;
> > +
> > + pgd = pgd_offset(mm, addr);
> > + pud = pud_offset(pgd, addr);
> > + pmd = pmd_offset(pud, addr);
> > +
> > + return (pte_t *)pmd; /* our huge pte is actually a pmd */
> > +}
> > +
> > +struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
> > + pmd_t *pmd, int write)
> > +{
> > + struct page *page;
> > + unsigned long pfn;
> > +
> > + BUG_ON((pmd_val(*pmd) & PMD_TYPE_MASK) != PMD_TYPE_SECT);
>
> I could only see one caller who calls this only when this exact
> condition is fulfilled, so unless we anticipate other callers, this
> BUG_ON could go.
>
Yes thanks, this can be scrubbed.
> > + pfn = ((pmd_val(*pmd) & HPAGE_MASK) >> PAGE_SHIFT);
> > + page = pfn_to_page(pfn);
> > + return page;
> > +}
> > +
> > +pte_t huge_ptep_get(pte_t *ptep)
> > +{
> > + pmd_t *pmdp = (pmd_t*)ptep;
> > + pmdval_t pmdval = pmd_val(*pmdp);
> > + pteval_t retval;
> > +
> > + if (!pmdval)
> > + return __pte(0);
> > +
> > + retval = (pteval_t) (pmdval & HPAGE_MASK);
> > + HPMD_XLATE(retval, pmdval, PMD_SECT_XN, L_PTE_XN);
> > + HPMD_XLATE(retval, pmdval, PMD_SECT_S, L_PTE_SHARED);
> > + HPMD_XLATE(retval, pmdval, PMD_DSECT_AF, L_PTE_YOUNG);
> > + HPMD_XLATE(retval, pmdval, PMD_DSECT_DIRTY, L_PTE_DIRTY);
> > +
> > + /* preserve bits C & B */
> > + retval |= (pmdval & (3 << 2));
> > +
> > + /* PMD TEX bit 0 corresponds to Linux PTE bit 4 */
> > + HPMD_XLATE(retval, pmdval, PMD_SECT_TEX(1), 1 << 4);
> > +
>
> again, I would define the 1 << 4 to something and treat like the others...
>
> > + if (pmdval & PMD_SECT_AP_WRITE)
> > + retval &= ~L_PTE_RDONLY;
> > + else
> > + retval |= L_PTE_RDONLY;
> > +
> > + if ((pmdval & PMD_TYPE_MASK) == PMD_TYPE_SECT)
> > + retval |= L_PTE_VALID;
> > +
> > + /* we assume all hugetlb pages are user */
> > + retval |= L_PTE_USER;
> > +
> > + return __pte(retval);
> > +}
> > +
> > +void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
> > + pte_t *ptep, pte_t pte)
> > +{
> > + pmdval_t pmdval = (pmdval_t) pte_val(pte);
> > + pmd_t *pmdp = (pmd_t*) ptep;
> > +
> > + pmdval &= HPAGE_MASK;
> > + pmdval |= PMD_SECT_AP_READ | PMD_SECT_nG | PMD_TYPE_SECT;
> > + pmdval = pmd_val(pmd_modify(__pmd(pmdval), __pgprot(pte_val(pte))));
> > +
> > + __sync_icache_dcache(pte);
> > +
> > + set_pmd_at(mm, addr, pmdp, __pmd(pmdval));
> > +}
>
> so this whole scheme where the caller expects ptes, but really gets
> pmds feels strange to me, but perhaps it makes more sense on other
> architectures as to not change the caller instead of this magic?
>
It is a little strange, but expected. We are operating one level up from
normal page table entries. The short descriptor case is made stranger by the
linux/hardware pte distinction. I wanted to re-purpose the domain bits and
translate huge Linux ptes to and from section descriptors on demand, as this
allows for a much simpler transparent huge page implementation.
I'll see if I can simplify some bits of the short descriptor hugetlb code.
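As a quick aside on the 0x55555555 in the head.S hunk above (hedged
illustration: the DACR holds 16 two-bit fields and client access is 0b01):

#include <stdio.h>

int main(void)
{
        unsigned int dacr = 0;
        int d;

        for (d = 0; d < 16; d++)
                dacr |= 1u << (2 * d);  /* DOMAIN_CLIENT = 0b01 per domain */

        printf("DACR = %#x\n", dacr);   /* prints 0x55555555 */
        return 0;
}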
> -Christoffer
>
* [RFC PATCH 5/6] ARM: mm: Transparent huge page support for LPAE systems.
2013-01-04 5:04 ` Christoffer Dall
@ 2013-01-08 17:59 ` Steve Capper
2013-01-08 18:15 ` Christoffer Dall
0 siblings, 1 reply; 25+ messages in thread
From: Steve Capper @ 2013-01-08 17:59 UTC (permalink / raw)
To: linux-arm-kernel
On Fri, Jan 04, 2013 at 05:04:50AM +0000, Christoffer Dall wrote:
> On Thu, Oct 18, 2012 at 12:15 PM, Steve Capper <steve.capper@arm.com> wrote:
> > diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
> > index d086f61..31c071f 100644
> > --- a/arch/arm/include/asm/pgtable-3level.h
> > +++ b/arch/arm/include/asm/pgtable-3level.h
> > @@ -85,6 +85,9 @@
> > #define L_PTE_DIRTY (_AT(pteval_t, 1) << 55) /* unused */
> > #define L_PTE_SPECIAL (_AT(pteval_t, 1) << 56) /* unused */
> >
> > +#define PMD_SECT_DIRTY (_AT(pmdval_t, 1) << 55)
> > +#define PMD_SECT_SPLITTING (_AT(pmdval_t, 1) << 57)
> > +
> > /*
> > * To be used in assembly code with the upper page attributes.
> > */
> > @@ -166,6 +169,60 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
> > #define pte_mkhuge(pte) (__pte((pte_val(pte) & ~PMD_TYPE_MASK) | PMD_TYPE_SECT))
> >
> >
> > +#define pmd_present(pmd) ((pmd_val(pmd) & PMD_TYPE_MASK) != PMD_TYPE_FAULT)
> > +#define pmd_young(pmd) (pmd_val(pmd) & PMD_SECT_AF)
> > +
> > +#define __HAVE_ARCH_PMD_WRITE
> > +#define pmd_write(pmd) (!(pmd_val(pmd) & PMD_SECT_RDONLY))
> > +
> > +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> > +#define pmd_trans_huge(pmd) ((pmd_val(pmd) & PMD_TYPE_MASK) == PMD_TYPE_SECT)
> > +#define pmd_trans_splitting(pmd) (pmd_val(pmd) & PMD_SECT_SPLITTING)
> > +#endif
> > +
> > +#define PMD_BIT_FUNC(fn,op) \
> > +static inline pmd_t pmd_##fn(pmd_t pmd) { pmd_val(pmd) op; return pmd; }
> > +
> > +PMD_BIT_FUNC(wrprotect, |= PMD_SECT_RDONLY);
> > +PMD_BIT_FUNC(mkold, &= ~PMD_SECT_AF);
> > +PMD_BIT_FUNC(mksplitting, |= PMD_SECT_SPLITTING);
> > +PMD_BIT_FUNC(mkwrite, &= ~PMD_SECT_RDONLY);
> > +PMD_BIT_FUNC(mkdirty, |= PMD_SECT_DIRTY);
> > +PMD_BIT_FUNC(mkyoung, |= PMD_SECT_AF);
> > +PMD_BIT_FUNC(mknotpresent, &= ~PMD_TYPE_MASK);
>
> personally I would prefer not to automate the prefixing of pmd_: it
> doesn't really save a lot of characters, it doesn't improve
> readability and it breaks grep/cscope.
>
This follows the pte bit functions to a degree.
> > +
> > +#define pmd_mkhuge(pmd) (__pmd((pmd_val(pmd) & ~PMD_TYPE_MASK) | PMD_TYPE_SECT))
> > +
> > +#define pmd_pfn(pmd) ((pmd_val(pmd) & PHYS_MASK) >> PAGE_SHIFT)
>
> the arm arm says UNK/SBZP, so we should be fine here right? (noone is
> crazy enough to try and squeeze some extra information in the extra
> bits here or something like that). For clarity, one could consider:
>
> (((pmd_val(pmd) & PMD_MASK) & PHYS_MASK) >> PAGE_SHIFT)
>
Thanks, yes, it's better to PMD_MASK the value too.
> > +#define pfn_pmd(pfn,prot) (__pmd(((phys_addr_t)(pfn) << PAGE_SHIFT) | pgprot_val(prot)))
> > +#define mk_pmd(page,prot) pfn_pmd(page_to_pfn(page),prot)
> > +
> > +static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
> > +{
> > + const pmdval_t mask = PMD_SECT_USER | PMD_SECT_XN | PMD_SECT_RDONLY;
> > + pmd_val(pmd) = (pmd_val(pmd) & ~mask) | (pgprot_val(newprot) & mask);
> > + return pmd;
> > +}
> > +
> > +static inline void set_pmd(pmd_t *pmdp, pmd_t pmd)
> > +{
> > + *pmdp = pmd;
> > +}
>
> why this level of indirection?
>
Over-manipulation in git :-); this can be scrubbed.
> > +
> > +static inline void set_pmd_at(struct mm_struct *mm, unsigned long addr,
> > + pmd_t *pmdp, pmd_t pmd)
> > +{
> > + BUG_ON(addr >= TASK_SIZE);
> > + pmd = __pmd(pmd_val(pmd) | PMD_SECT_nG);
>
> why this side affect?
>
This replicates the side effect found when placing ptes into page tables: user
mappings need the nG bit set so that their TLB entries are tagged with an ASID
rather than being global.
> > + set_pmd(pmdp, pmd);
> > + flush_pmd_entry(pmdp);
> > +}
> > +
> > +static inline int has_transparent_hugepage(void)
> > +{
> > + return 1;
> > +}
> > +
> > #endif /* __ASSEMBLY__ */
> >
> > #endif /* _ASM_PGTABLE_3LEVEL_H */
> > diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
> > index c35bf46..767aa7c 100644
> > --- a/arch/arm/include/asm/pgtable.h
> > +++ b/arch/arm/include/asm/pgtable.h
> > @@ -24,6 +24,9 @@
> > #include <asm/memory.h>
> > #include <asm/pgtable-hwdef.h>
> >
> > +
> > +#include <asm/tlbflush.h>
> > +
> > #ifdef CONFIG_ARM_LPAE
> > #include <asm/pgtable-3level.h>
> > #else
> > @@ -163,7 +166,6 @@ extern pgd_t swapper_pg_dir[PTRS_PER_PGD];
> > #define pgd_offset_k(addr) pgd_offset(&init_mm, addr)
> >
> > #define pmd_none(pmd) (!pmd_val(pmd))
> > -#define pmd_present(pmd) (pmd_val(pmd))
> >
> > static inline pte_t *pmd_page_vaddr(pmd_t pmd)
> > {
> > diff --git a/arch/arm/include/asm/tlb.h b/arch/arm/include/asm/tlb.h
> > index 685e9e87..0fc2d9d 100644
> > --- a/arch/arm/include/asm/tlb.h
> > +++ b/arch/arm/include/asm/tlb.h
> > @@ -229,6 +229,12 @@ static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp,
> > #endif
> > }
> >
> > +static inline void
> > +tlb_remove_pmd_tlb_entry(struct mmu_gather *tlb, pmd_t *pmdp, unsigned long addr)
> > +{
> > + tlb_add_flush(tlb, addr);
> > +}
> > +
> > #define pte_free_tlb(tlb, ptep, addr) __pte_free_tlb(tlb, ptep, addr)
> > #define pmd_free_tlb(tlb, pmdp, addr) __pmd_free_tlb(tlb, pmdp, addr)
> > #define pud_free_tlb(tlb, pudp, addr) pud_free((tlb)->mm, pudp)
> > diff --git a/arch/arm/include/asm/tlbflush.h b/arch/arm/include/asm/tlbflush.h
> > index 6e924d3..907cede 100644
> > --- a/arch/arm/include/asm/tlbflush.h
> > +++ b/arch/arm/include/asm/tlbflush.h
> > @@ -505,6 +505,8 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,
> > }
> > #endif
> >
> > +#define update_mmu_cache_pmd(vma, address, pmd) do { } while (0)
> > +
> > #endif
> >
> > #endif /* CONFIG_MMU */
> > diff --git a/arch/arm/mm/fsr-3level.c b/arch/arm/mm/fsr-3level.c
> > index 05a4e94..47f4c6f 100644
> > --- a/arch/arm/mm/fsr-3level.c
> > +++ b/arch/arm/mm/fsr-3level.c
> > @@ -9,7 +9,7 @@ static struct fsr_info fsr_info[] = {
> > { do_page_fault, SIGSEGV, SEGV_MAPERR, "level 3 translation fault" },
> > { do_bad, SIGBUS, 0, "reserved access flag fault" },
> > { do_bad, SIGSEGV, SEGV_ACCERR, "level 1 access flag fault" },
> > - { do_bad, SIGSEGV, SEGV_ACCERR, "level 2 access flag fault" },
> > + { do_page_fault, SIGSEGV, SEGV_ACCERR, "level 2 access flag fault" },
> > { do_page_fault, SIGSEGV, SEGV_ACCERR, "level 3 access flag fault" },
> > { do_bad, SIGBUS, 0, "reserved permission fault" },
> > { do_bad, SIGSEGV, SEGV_ACCERR, "level 1 permission fault" },
> > --
> > 1.7.9.5
> >
>
> Besides the nits it looks fine to me. I've done quite extensive
> testing with varied workloads on this code over the last couple of
> months on the vexpress TC2 and on the ARNDALE board using KVM/ARM with
> huge pages, and it gives a nice ~15% performance increase on average
> and is completely stable.
That's great to hear \o/.
Also, I've found a decent perf boost when running tools like xz backed by huge
pages. (One can use the LD_PRELOAD mechanism in libhugetlbfs to make malloc
allocations come from huge pages.)
>
> -Christoffer
>
* [RFC PATCH 6/6] ARM: mm: Transparent huge page support for non-LPAE systems.
2013-01-04 5:04 ` Christoffer Dall
@ 2013-01-08 17:59 ` Steve Capper
2013-01-08 18:17 ` Christoffer Dall
0 siblings, 1 reply; 25+ messages in thread
From: Steve Capper @ 2013-01-08 17:59 UTC (permalink / raw)
To: linux-arm-kernel
On Fri, Jan 04, 2013 at 05:04:57AM +0000, Christoffer Dall wrote:
> On Thu, Oct 18, 2012 at 12:15 PM, Steve Capper <steve.capper@arm.com> wrote:
> > Much of the required code for THP has been implemented in the earlier non-LPAE
> > HugeTLB patch.
> >
> > One more domain bits is used (to store whether or not the THP is splitting).
>
> s/bits/bit/
>
Thanks.
> >
> > Some THP helper functions are defined; and we have to re-define pmd_page such
> > that it distinguishes between page tables and sections.
>
> super nit: not sure the semi-colon is warranted here.
>
Cheers, it is a superfluous semicolon.
> >
> > Signed-off-by: Will Deacon <will.deacon@arm.com>
> > Signed-off-by: Steve Capper <steve.capper@arm.com>
> > ---
> > arch/arm/Kconfig | 2 +-
> > arch/arm/include/asm/pgtable-2level.h | 68 ++++++++++++++++++++++++++++++++-
> > arch/arm/include/asm/pgtable-3level.h | 2 +
> > arch/arm/include/asm/pgtable.h | 7 +++-
> > 4 files changed, 75 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> > index 9621d5f..d459673 100644
> > --- a/arch/arm/Kconfig
> > +++ b/arch/arm/Kconfig
> > @@ -1773,7 +1773,7 @@ config SYS_SUPPORTS_HUGETLBFS
> >
> > config HAVE_ARCH_TRANSPARENT_HUGEPAGE
> > def_bool y
> > - depends on ARM_LPAE
> > + depends on SYS_SUPPORTS_HUGETLBFS
> >
> > source "mm/Kconfig"
> >
> > diff --git a/arch/arm/include/asm/pgtable-2level.h b/arch/arm/include/asm/pgtable-2level.h
> > index 34f4775..67eabb4 100644
> > --- a/arch/arm/include/asm/pgtable-2level.h
> > +++ b/arch/arm/include/asm/pgtable-2level.h
> > @@ -179,6 +179,13 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
> > clean_pmd_entry(pmdp); \
> > } while (0)
> >
> > +
>
> stray whitespace?
>
Thanks.
> > +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> > +#define _PMD_HUGE(pmd) ((pmd_val(pmd) & PMD_TYPE_MASK) == PMD_TYPE_SECT)
> > +#else
> > +#define _PMD_HUGE(pmd) (0)
> > +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
> > +
> > /* we don't need complex calculations here as the pmd is folded into the pgd */
> > #define pmd_addr_end(addr,end) (end)
> >
> > @@ -197,7 +204,6 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
> >
> > #define HPAGE_SHIFT PMD_SHIFT
> > #define HPAGE_SIZE (_AC(1, UL) << HPAGE_SHIFT)
> > -#define HPAGE_MASK (~(HPAGE_SIZE - 1))
> > #define HUGETLB_PAGE_ORDER (HPAGE_SHIFT - PAGE_SHIFT)
> >
> > #define HUGE_LINUX_PTE_COUNT (PAGE_OFFSET >> HPAGE_SHIFT)
> > @@ -209,6 +215,7 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
> > */
> > #define PMD_DSECT_DIRTY (_AT(pmdval_t, 1) << 5)
> > #define PMD_DSECT_AF (_AT(pmdval_t, 1) << 6)
> > +#define PMD_DSECT_SPLITTING (_AT(pmdval_t, 1) << 7)
> >
> > #define PMD_BIT_FUNC(fn,op) \
> > static inline pmd_t pmd_##fn(pmd_t pmd) { pmd_val(pmd) op; return pmd; }
> > @@ -261,8 +268,67 @@ static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
> > return __pmd(pmdval);
> > }
> >
> > +#else
> > +#define HPAGE_SIZE 0
>
> why this and the conditional define of _PMD_HUGE? You could just do
> it like in pgtable.h and put the #ifdef around the condition in
> pmd_page(pmd_t pmd).
>
Thanks, I'll take a look at this.
> > #endif /* CONFIG_SYS_SUPPORTS_HUGETLBFS */
> >
> > +#define HPAGE_MASK (~(HPAGE_SIZE - 1))
> > +
> > +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> > +#define pmd_mkhuge(pmd) (__pmd((pmd_val(pmd) & ~PMD_TYPE_MASK) | PMD_TYPE_SECT))
> > +
> > +PMD_BIT_FUNC(mkold, &= ~PMD_DSECT_AF);
> > +PMD_BIT_FUNC(mksplitting, |= PMD_DSECT_SPLITTING);
> > +PMD_BIT_FUNC(mkdirty, |= PMD_DSECT_DIRTY);
> > +PMD_BIT_FUNC(mkyoung, |= PMD_DSECT_AF);
> > +PMD_BIT_FUNC(mkwrite, |= PMD_SECT_AP_WRITE);
> > +PMD_BIT_FUNC(mknotpresent, &= ~PMD_TYPE_MASK);
> > +
> > +#define pmd_trans_splitting(pmd) (pmd_val(pmd) & PMD_DSECT_SPLITTING)
> > +#define pmd_young(pmd) (pmd_val(pmd) & PMD_DSECT_AF)
> > +#define pmd_write(pmd) (pmd_val(pmd) & PMD_SECT_AP_WRITE)
> > +#define pmd_trans_huge(pmd) ((pmd_val(pmd) & PMD_TYPE_MASK) == PMD_TYPE_SECT)
> > +
> > +static inline unsigned long pmd_pfn(pmd_t pmd)
> > +{
> > + /*
> > + * for a section, we need to mask off more of the pmd
> > + * before looking up the pfn
> > + */
> > + if ((pmd_val(pmd) & PMD_TYPE_MASK) == PMD_TYPE_SECT)
> > + return __phys_to_pfn(pmd_val(pmd) & HPAGE_MASK);
> > + else
> > + return __phys_to_pfn(pmd_val(pmd) & PHYS_MASK);
> > +}
> > +
> > +static inline pmd_t pfn_pmd(unsigned long pfn, pgprot_t prot)
> > +{
> > + pmd_t pmd = __pmd(__pfn_to_phys(pfn) | PMD_SECT_AP_READ | PMD_SECT_nG);
> > +
> > + return pmd_modify(pmd, prot);
> > +}
> > +
> > +#define mk_pmd(page,prot) pfn_pmd(page_to_pfn(page),prot);
> > +
> > +static inline int has_transparent_hugepage(void)
> > +{
> > + return 1;
> > +}
> > +
> > +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
> > +
> > +static inline struct page *pmd_page(pmd_t pmd)
> > +{
> > + /*
> > + * for a section, we need to mask off more of the pmd
> > + * before looking up the page as it is a section descriptor.
> > + */
> > + if (_PMD_HUGE(pmd))
> > + return phys_to_page(pmd_val(pmd) & HPAGE_MASK);
> > +
> > + return phys_to_page(pmd_val(pmd) & PHYS_MASK);
> > +}
> > +
> > #endif /* __ASSEMBLY__ */
> >
> > #endif /* _ASM_PGTABLE_2LEVEL_H */
> > diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
> > index 31c071f..8360814 100644
> > --- a/arch/arm/include/asm/pgtable-3level.h
> > +++ b/arch/arm/include/asm/pgtable-3level.h
> > @@ -197,6 +197,8 @@ PMD_BIT_FUNC(mknotpresent, &= ~PMD_TYPE_MASK);
> > #define pfn_pmd(pfn,prot) (__pmd(((phys_addr_t)(pfn) << PAGE_SHIFT) | pgprot_val(prot)))
> > #define mk_pmd(page,prot) pfn_pmd(page_to_pfn(page),prot)
> >
> > +#define pmd_page(pmd) pfn_to_page(__phys_to_pfn(pmd_val(pmd) & PHYS_MASK))
> > +
> > static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
> > {
> > const pmdval_t mask = PMD_SECT_USER | PMD_SECT_XN | PMD_SECT_RDONLY;
> > diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
> > index 767aa7c..2d96381 100644
> > --- a/arch/arm/include/asm/pgtable.h
> > +++ b/arch/arm/include/asm/pgtable.h
> > @@ -169,11 +169,14 @@ extern pgd_t swapper_pg_dir[PTRS_PER_PGD];
> >
> > static inline pte_t *pmd_page_vaddr(pmd_t pmd)
> > {
> > +#ifdef SYS_SUPPORTS_HUGETLBFS
> > + if ((pmd_val(pmd) & PMD_TYPE_MASK) == PMD_TYPE_SECT)
> > + return __va(pmd_val(pmd) & HPAGE_MASK);
> > +#endif
> > +
> > return __va(pmd_val(pmd) & PHYS_MASK & (s32)PAGE_MASK);
> > }
> >
> > -#define pmd_page(pmd) pfn_to_page(__phys_to_pfn(pmd_val(pmd) & PHYS_MASK))
> > -
> > #ifndef CONFIG_HIGHPTE
> > #define __pte_map(pmd) pmd_page_vaddr(*(pmd))
> > #define __pte_unmap(pte) do { } while (0)
> > --
> > 1.7.9.5
> >
> The whole series looks functionally correct to me:
>
> Reviewed-by: Christoffer Dall <c.dall@virtualopensystems.com>
>
A big thank you for going through this, Christoffer.
I'm correcting/simplifying/testing the huge pages code and will send out
another version soon.
Cheers,
--
Steve
* [RFC PATCH 3/6] ARM: mm: HugeTLB support for LPAE systems.
2013-01-08 17:57 ` Steve Capper
@ 2013-01-08 18:10 ` Christoffer Dall
0 siblings, 0 replies; 25+ messages in thread
From: Christoffer Dall @ 2013-01-08 18:10 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, Jan 8, 2013 at 12:57 PM, Steve Capper <steve.capper@arm.com> wrote:
> On Fri, Jan 04, 2013 at 05:03:59AM +0000, Christoffer Dall wrote:
>> On Thu, Oct 18, 2012 at 12:15 PM, Steve Capper <steve.capper@arm.com> wrote:
>
>> > +++ b/arch/arm/include/asm/hugetlb.h
>> > @@ -0,0 +1,83 @@
>> > +/*
>> > + * arch/arm/include/asm/hugetlb.h
>> > + *
>> > + * Copyright (C) 2012 ARM Ltd.
>> > + *
>> > + * Based on arch/x86/include/asm/hugetlb.h
>> > + *
>> > + * This program is free software; you can redistribute it and/or modify
>> > + * it under the terms of the GNU General Public License version 2 as
>> > + * published by the Free Software Foundation.
>> > + *
>> > + * This program is distributed in the hope that it will be useful,
>> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>> > + * GNU General Public License for more details.
>> > + *
>> > + * You should have received a copy of the GNU General Public License
>> > + * along with this program; if not, write to the Free Software
>> > + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
>> > + */
>> > +
>> > +#ifndef _ASM_ARM_HUGETLB_H
>> > +#define _ASM_ARM_HUGETLB_H
>> > +
>> > +#include <asm/page.h>
>> > +
>> > +#include <asm/hugetlb-3level.h>
>>
>> I feel like it wouldn't hurt anyone to put a comment here explaining
>> that these "ptes" are in fact pmd section descriptors disguised in pte
>> types.
>>
>
> Yes, that does make sense. I'll put something in to that effect.
>
>> > +
>> > +static inline void hugetlb_free_pgd_range(struct mmu_gather *tlb,
>> > + unsigned long addr, unsigned long end,
>> > + unsigned long floor,
>> > + unsigned long ceiling)
>> > +{
>> > + free_pgd_range(tlb, addr, end, floor, ceiling);
>> > +}
>> > +
>> > +
>> > +static inline int is_hugepage_only_range(struct mm_struct *mm,
>> > + unsigned long addr, unsigned long len)
>> > +{
>> > + return 0;
>> > +}
>> > +
>> > +static inline int prepare_hugepage_range(struct file *file,
>> > + unsigned long addr, unsigned long len)
>> > +{
>> > + struct hstate *h = hstate_file(file);
>> > + if (len & ~huge_page_mask(h))
>> > + return -EINVAL;
>> > + if (addr & ~huge_page_mask(h))
>> > + return -EINVAL;
>> > + return 0;
>> > +}
>> > +
>> > +static inline void hugetlb_prefault_arch_hook(struct mm_struct *mm)
>> > +{
>> > +}
>> > +
>> > +static inline int huge_pte_none(pte_t pte)
>> > +{
>> > + return pte_none(pte);
>> > +}
>> > +
>> > +static inline pte_t huge_pte_wrprotect(pte_t pte)
>> > +{
>> > + return pte_wrprotect(pte);
>> > +}
>> > +
>> > +static inline int arch_prepare_hugepage(struct page *page)
>> > +{
>> > + return 0;
>> > +}
>> > +
>> > +static inline void arch_release_hugepage(struct page *page)
>> > +{
>> > +}
>> > +
>> > +static inline void arch_clear_hugepage_flags(struct page *page)
>> > +{
>> > + clear_bit(PG_dcache_clean, &page->flags);
>>
>> why do we clear this bit here?
>>
>
> This is called when the huge page is freed, and it indicates that the
> dcache needs to be flushed for this page. The mechanism was added by commit
> 5d3a551c28c6669dc43be40d8fafafbc2ec8f42b ("mm: hugetlb: add arch hook for
> clearing page flags before entering pool").
>
>> > diff --git a/arch/arm/mm/hugetlbpage-3level.c b/arch/arm/mm/hugetlbpage-3level.c
>> > new file mode 100644
>> > index 0000000..86474f0
>> > --- /dev/null
>> > +++ b/arch/arm/mm/hugetlbpage-3level.c
>> > @@ -0,0 +1,190 @@
>> > +/*
>> > + * arch/arm/mm/hugetlbpage-3level.c
>> > + *
>> > + * Copyright (C) 2002, Rohit Seth <rohit.seth@intel.com>
>> > + * Copyright (C) 2012 ARM Ltd.
>> > + *
>> > + * Based on arch/x86/mm/hugetlbpage.c
>> > + *
>>
>> this seems to be an almost 1-to-1 copy of the x86 code. Is it not
>> worth sharing it somehow? Possible?
>>
>
> Yeah, good point, I have a cunning plan though; please see below.
>
>> > + * This program is free software; you can redistribute it and/or modify
>> > + * it under the terms of the GNU General Public License version 2 as
>> > + * published by the Free Software Foundation.
>> > + *
>> > + * This program is distributed in the hope that it will be useful,
>> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>> > + * GNU General Public License for more details.
>> > + *
>> > + * You should have received a copy of the GNU General Public License
>> > + * along with this program; if not, write to the Free Software
>> > + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
>> > + */
>> > +
>> > +#include <linux/init.h>
>> > +#include <linux/fs.h>
>> > +#include <linux/mm.h>
>> > +#include <linux/hugetlb.h>
>> > +#include <linux/pagemap.h>
>> > +#include <linux/err.h>
>> > +#include <linux/sysctl.h>
>> > +#include <asm/mman.h>
>> > +#include <asm/tlb.h>
>> > +#include <asm/tlbflush.h>
>> > +#include <asm/pgalloc.h>
>> > +
>> > +static unsigned long page_table_shareable(struct vm_area_struct *svma,
>> > + struct vm_area_struct *vma,
>> > + unsigned long addr, pgoff_t idx)
>> > +{
>> > + unsigned long saddr = ((idx - svma->vm_pgoff) << PAGE_SHIFT) +
>> > + svma->vm_start;
>> > + unsigned long sbase = saddr & PUD_MASK;
>> > + unsigned long s_end = sbase + PUD_SIZE;
>>
>> these are to check that the potential vma to steal the pmd from covers
>> the entire pud entry's address space, correct?
>
> Yes that's correct.
>
>>
>> it's pretty confusing with the idx conversion back and forth,
>> especially given that mm/hugetlb.c uses idx as an index in units of
>> huge pages, whereas this idx is in units of regular pages, so I
>> would suggest some clear static conversion functions or a comment.
>>
>> > +
>> > + /* Allow segments to share if only one is marked locked */
>>
>> exactly one or at most one? the code below checks that exactly one is
>> marked locked, if I read it correctly. Again, for me, the comment
>> would be more helpful if it stated *why* that's a requirement, not
>> just that it *is* a requirement.
>>
>
> This originates from commit:
> 32b154c0b0bae2879bf4e549d861caf1759a3546
> "x86: ignore VM_LOCKED when determining if hugetlb-backed page tables can be shared or not"
> and the commit title is clearer.
>
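A sketch of the kind of helper being suggested (the name is hypothetical):
map a base-page index in the shared mapping back to a user virtual address
inside the candidate vma, mirroring the computation quoted above.

    static unsigned long vma_addr_for_idx(struct vm_area_struct *svma,
                                          pgoff_t idx)
    {
            return ((idx - svma->vm_pgoff) << PAGE_SHIFT) + svma->vm_start;
    }
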
>> > + unsigned long vm_flags = vma->vm_flags & ~VM_LOCKED;
>> > + unsigned long svm_flags = svma->vm_flags & ~VM_LOCKED;
>> > +
>> > + /*
>> > + * match the virtual addresses, permission and the alignment of the
>> > + * page table page.
>> > + */
>> > + if (pmd_index(addr) != pmd_index(saddr) ||
>> > + vm_flags != svm_flags ||
>> > + sbase < svma->vm_start || svma->vm_end < s_end)
>> > + return 0;
>> > +
>> > + return saddr;
>> > +}
>> > +
>> > +static int vma_shareable(struct vm_area_struct *vma, unsigned long addr)
>> > +{
>> > + unsigned long base = addr & PUD_MASK;
>> > + unsigned long end = base + PUD_SIZE;
>> > +
>> > + /*
>> > + * check on proper vm_flags and page table alignment
>> > + */
>> > + if (vma->vm_flags & VM_MAYSHARE &&
>> > + vma->vm_start <= base && end <= vma->vm_end)
>> > + return 1;
>> > + return 0;
>> > +}
>> > +
>> > +/*
>> > + * search for a shareable pmd page for hugetlb.
>>
>> nit:
>>
>> perhaps this is completely standard knowledge for your garden variety
>> mm hacker, but I needed to spend 5 minutes figuring out the purpose
>> here - if I get it right: multiple mappings the hugetlbfs file for the
>> same mm covering the same pud address range mapping the same data can
>> use the same pmd, right?
>>
>
> Yes, that's correct. Multiple puds can point to the same block of 512 pmds,
> which can then reduce cache usage when the page tables are walked.
>
> This is introduced in commit: 39dde65c9940c97fcd178a3d2b1c57ed8b7b68aa
> "[PATCH] shared page table for hugetlb page"
>
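Schematically (an illustration, not taken from the commit):

    process A: pud[i] --+
                        +--> one shared pmd page (512 pmds covering 1GB)
    process B: pud[j] --+
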
>> I would either rename the function to find_huge_pmd_share and get rid
>> of the comment or expand on the comment.
>>
>> > + */
>> > +static pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr,
>> > + pud_t *pud)
>> > +{
>> > + struct vm_area_struct *vma = find_vma(mm, addr);
>> > + struct address_space *mapping = vma->vm_file->f_mapping;
>> > + pgoff_t idx = ((addr - vma->vm_start) >> PAGE_SHIFT) +
>> > + vma->vm_pgoff;
>> > + struct vm_area_struct *svma;
>> > + unsigned long saddr;
>> > + pte_t *spte = NULL;
>> > + pte_t *pte;
>> > +
>> > + if (!vma_shareable(vma, addr))
>> > + return (pte_t *)pmd_alloc(mm, pud, addr);
>> > +
>> > + mutex_lock(&mapping->i_mmap_mutex);
>> > + vma_interval_tree_foreach(svma, &mapping->i_mmap, idx, idx) {
>> > + if (svma == vma)
>> > + continue;
>> > +
>> > + saddr = page_table_shareable(svma, vma, addr, idx);
>> > + if (saddr) {
>> > + spte = huge_pte_offset(svma->vm_mm, saddr);
>> > + if (spte) {
>> > + get_page(virt_to_page(spte));
>> > + break;
>> > + }
>> > + }
>> > + }
>> > +
>> > + if (!spte)
>> > + goto out;
>> > +
>> > + spin_lock(&mm->page_table_lock);
>> > + if (pud_none(*pud))
>> > + pud_populate(mm, pud, (pmd_t *)((unsigned long)spte & PAGE_MASK));
>> > + else
>> > + put_page(virt_to_page(spte));
>> > + spin_unlock(&mm->page_table_lock);
>> > +out:
>> > + pte = (pte_t *)pmd_alloc(mm, pud, addr);
>> > + mutex_unlock(&mapping->i_mmap_mutex);
>> > + return pte;
>> > +}
>> > +
>> > +/*
>> > + * unmap huge page backed by shared pte.
>> > + *
>> > + * Hugetlb pte page is ref counted at the time of mapping. If pte is shared
>> > + * indicated by page_count > 1, unmap is achieved by clearing pud and
>> > + * decrementing the ref count. If count == 1, the pte page is not shared.
>> > + *
>> > + * called with vma->vm_mm->page_table_lock held.
>> > + *
>> > + * returns: 1 successfully unmapped a shared pte page
>> > + * 0 the underlying pte page is not shared, or it is the last user
>> > + */
>> > +int huge_pmd_unshare(struct mm_struct *mm, unsigned long *addr, pte_t *ptep)
>> > +{
>> > + pgd_t *pgd = pgd_offset(mm, *addr);
>> > + pud_t *pud = pud_offset(pgd, *addr);
>> > +
>> > + BUG_ON(page_count(virt_to_page(ptep)) == 0);
>> > + if (page_count(virt_to_page(ptep)) == 1)
>> > + return 0;
>> > +
>> > + pud_clear(pud);
>> > + put_page(virt_to_page(ptep));
>> > + *addr = ALIGN(*addr, HPAGE_SIZE * PTRS_PER_PTE) - HPAGE_SIZE;
>>
>> huh? this hurts my brain. Why the minus HPAGE_SIZE?
>>
>
> This is called in two places: hugetlb_change_protection and
> __unmap_hugepage_range. In both cases it is called as part of a loop where
> *addr is incremented by the huge page size when the loop advances. If
> huge_pmd_unshare returns 1, then the loop "continue"s. The -HPAGE_SIZE cancels
> out the *addr increment at the end of the loop.
>
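A minimal sketch of the caller pattern being described (a paraphrase of the
__unmap_hugepage_range loop; the exact details are an assumption):

    static void unmap_range_sketch(struct mm_struct *mm,
                                   unsigned long start, unsigned long end)
    {
            unsigned long address;
            pte_t *ptep;

            for (address = start; address < end; address += HPAGE_SIZE) {
                    ptep = huge_pte_offset(mm, address);
                    if (!ptep)
                            continue;
                    /*
                     * On success, huge_pmd_unshare rewinds address so
                     * that the loop increment lands on the next pud
                     * boundary.
                     */
                    if (huge_pmd_unshare(mm, &address, ptep))
                            continue;
                    /* otherwise unmap the single huge page at address */
            }
    }
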
yuck! That logic should really be in the loop if at all possible -
this current design makes it hard to read both the caller and the
callee.
>> > + return 1;
>> > +}
>> > +
>> > +pte_t *huge_pte_alloc(struct mm_struct *mm,
>> > + unsigned long addr, unsigned long sz)
>> > +{
>> > + pgd_t *pgd;
>> > + pud_t *pud;
>> > + pte_t *pte = NULL;
>> > +
>> > + pgd = pgd_offset(mm, addr);
>> > + pud = pud_alloc(mm, pgd, addr);
>> > + if (pud) {
>> > + BUG_ON(sz != PMD_SIZE);
>>
>> is this really necessary?
>>
>> VM_BUG_ON?
>>
>
> Thanks, I'll clean this up.
>
> So, on to my cunning plan :-)....
>
> Essentially the huge pmd sharing takes place under the following circumstances:
> 1) At least 1GB of huge memory must be requested in a vm_area block.
> 2) This must be VM_SHAREABLE; so mmap using MAP_SHARED on hugetlbfs or shmget.
> 3) The mapping must be at a 1GB memory boundary (to kick off a new pud).
> 4) Another process must request this same backing file; but again, this must be
> mapped on a 1GB boundary in the other process too.
>
> I've only been able to get this working with a 3GB user VM split, and my test
> case needed to be statically compiled as .so's were mmap'd right where I wanted
> to mmap the 1GB huge memory block (at 0x40000000).
>
> Having thought more about this; I don't think the above conditions are likely to
> crop up with 2GB or 3GB user VM area in the wild. Thus I am tempted to remove
> the huge pmd sharing from the LPAE hugetlb code and simplify things a bit more.
>
> If I've missed a use case where huge pmd sharing may be useful, please give me
> a shout?
>
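A hypothetical user-space test along the lines of conditions 1-4 above: two
processes both map the same hugetlbfs file MAP_SHARED at a 1GB-aligned
address (the mount point and address below are assumptions for illustration
only):

    #include <fcntl.h>
    #include <sys/mman.h>

    int main(void)
    {
            int fd = open("/mnt/huge/shared", O_CREAT | O_RDWR, 0600);
            void *p = mmap((void *)0x40000000, 1UL << 30,
                           PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED,
                           fd, 0);

            /* run two instances; the second can then share the pmd page */
            return p == MAP_FAILED;
    }
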
I think your reasoning sounds completely sane.
>> > + if (pud_none(*pud))
>> > + pte = huge_pmd_share(mm, addr, pud);
>> > + else
>> > + pte = (pte_t *)pmd_alloc(mm, pud, addr);
>> > + }
>> > + BUG_ON(pte && !pte_none(*pte) && !pte_huge(*pte));
>> > +
>> > + return pte;
>> > +}
>> > +
>> > +struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
>> > + pmd_t *pmd, int write)
>> > +{
>> > + struct page *page;
>> > +
>> > + page = pte_page(*(pte_t *)pmd);
>> > + if (page)
>> > + page += ((address & ~PMD_MASK) >> PAGE_SHIFT);
>> > + return page;
>> > +}
>> > +
>> > +struct page *follow_huge_pud(struct mm_struct *mm, unsigned long address,
>> > + pud_t *pud, int write)
>> > +{
>> > + struct page *page;
>> > +
>> > + page = pte_page(*(pte_t *)pud);
>> > + if (page)
>> > + page += ((address & ~PUD_MASK) >> PAGE_SHIFT);
>> > + return page;
>>
>> why implement this? this should never be called right? Shouldn't it
>> just be a BUG();
>>
>
> Yes, thanks, it should be BUG().
>
>> > +}
>> > diff --git a/arch/arm/mm/hugetlbpage.c b/arch/arm/mm/hugetlbpage.c
>> > new file mode 100644
>> > index 0000000..32fe7fd
>> > --- /dev/null
>> > +++ b/arch/arm/mm/hugetlbpage.c
>> > @@ -0,0 +1,65 @@
>> > +/*
>> > + * arch/arm/mm/hugetlbpage.c
>> > + *
>> > + * Copyright (C) 2002, Rohit Seth <rohit.seth@intel.com>
>> > + * Copyright (C) 2012 ARM Ltd.
>> > + *
>> > + * Based on arch/x86/mm/hugetlbpage.c
>> > + *
>> > + * This program is free software; you can redistribute it and/or modify
>> > + * it under the terms of the GNU General Public License version 2 as
>> > + * published by the Free Software Foundation.
>> > + *
>> > + * This program is distributed in the hope that it will be useful,
>> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>> > + * GNU General Public License for more details.
>> > + *
>> > + * You should have received a copy of the GNU General Public License
>> > + * along with this program; if not, write to the Free Software
>> > + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
>> > + */
>> > +
>> > +#include <linux/init.h>
>> > +#include <linux/fs.h>
>> > +#include <linux/mm.h>
>> > +#include <linux/hugetlb.h>
>> > +#include <linux/pagemap.h>
>> > +#include <linux/err.h>
>> > +#include <linux/sysctl.h>
>> > +#include <asm/mman.h>
>> > +#include <asm/tlb.h>
>> > +#include <asm/tlbflush.h>
>> > +#include <asm/pgalloc.h>
>> > +
>> > +pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
>> > +{
>> > + pgd_t *pgd;
>> > + pud_t *pud;
>> > + pmd_t *pmd = NULL;
>> > +
>> > + pgd = pgd_offset(mm, addr);
>> > + if (pgd_present(*pgd)) {
>> > + pud = pud_offset(pgd, addr);
>> > + if (pud_present(*pud))
>> > + pmd = pmd_offset(pud, addr);
>> > + }
>> > +
>> > + return (pte_t *)pmd;
>> > +}
>> > +
>> > +struct page *follow_huge_addr(struct mm_struct *mm, unsigned long address,
>> > + int write)
>> > +{
>> > + return ERR_PTR(-EINVAL);
>> > +}
>> > +
>> > +int pmd_huge(pmd_t pmd)
>> > +{
>> > + return (pmd_val(pmd) & PMD_TYPE_MASK) == PMD_TYPE_SECT;
>> > +}
>> > +
>> > +int pud_huge(pud_t pud)
>> > +{
>> > + return 0;
>> > +}
>> > --
>> > 1.7.9.5
>> >
>> >
>
>>
>> -Christoffer
>>
>
* [RFC PATCH 4/6] ARM: mm: HugeTLB support for non-LPAE systems.
2013-01-08 17:58 ` Steve Capper
@ 2013-01-08 18:13 ` Christoffer Dall
0 siblings, 0 replies; 25+ messages in thread
From: Christoffer Dall @ 2013-01-08 18:13 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, Jan 8, 2013 at 12:58 PM, Steve Capper <steve.capper@arm.com> wrote:
> On Fri, Jan 04, 2013 at 05:04:43AM +0000, Christoffer Dall wrote:
>> On Thu, Oct 18, 2012 at 12:15 PM, Steve Capper <steve.capper@arm.com> wrote:
>
>> > diff --git a/arch/arm/include/asm/hugetlb-2level.h b/arch/arm/include/asm/hugetlb-2level.h
>> > new file mode 100644
>> > index 0000000..3532b54
>> > --- /dev/null
>> > +++ b/arch/arm/include/asm/hugetlb-2level.h
>> > @@ -0,0 +1,71 @@
>> > +/*
>> > + * arch/arm/include/asm/hugetlb-2level.h
>> > + *
>> > + * Copyright (C) 2012 ARM Ltd.
>> > + *
>> > + * Based on arch/x86/include/asm/hugetlb.h and Bill Carson's patches
>> > + *
>> > + * This program is free software; you can redistribute it and/or modify
>> > + * it under the terms of the GNU General Public License version 2 as
>> > + * published by the Free Software Foundation.
>> > + *
>> > + * This program is distributed in the hope that it will be useful,
>> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>> > + * GNU General Public License for more details.
>> > + *
>> > + * You should have received a copy of the GNU General Public License
>> > + * along with this program; if not, write to the Free Software
>> > + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
>> > + */
>> > +
>> > +#ifndef _ASM_ARM_HUGETLB_2LEVEL_H
>> > +#define _ASM_ARM_HUGETLB_2LEVEL_H
>> > +
>> > +
>> > +pte_t huge_ptep_get(pte_t *ptep);
>> > +
>> > +void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
>> > + pte_t *ptep, pte_t pte);
>> > +
>> > +static inline pte_t pte_mkhuge(pte_t pte) { return pte; }
>> > +
>> > +static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
>> > + unsigned long addr, pte_t *ptep)
>> > +{
>> > + flush_tlb_range(vma, addr, addr + HPAGE_SIZE);
>>
>> don't you need to clear the old TLB entry first here, otherwise
>> another CPU could put an entry to the old page in its TLB and access
>> it even after the page_cache_release(old_page) in hugetlb_cow() ?
>>
>
> Yes I do, thanks.
>
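One possible shape of the fix, as a sketch (not the actual respin): clear
the entry before invalidating the TLB, mirroring what ptep_clear_flush()
does for small pages.

    static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
                                             unsigned long addr, pte_t *ptep)
    {
            pmd_clear((pmd_t *)ptep);
            flush_tlb_range(vma, addr, addr + HPAGE_SIZE);
    }
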
>> > +}
>> > +
>> > +static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
>> > + unsigned long addr, pte_t *ptep)
>> > +{
>> > + pmd_t *pmdp = (pmd_t *) ptep;
>> > + set_pmd_at(mm, addr, pmdp, pmd_wrprotect(*pmdp));
>> > +}
>> > +
>> > +
>> > +static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
>> > + unsigned long addr, pte_t *ptep)
>> > +{
>> > + pmd_t *pmdp = (pmd_t *)ptep;
>> > + pte_t pte = huge_ptep_get(ptep);
>> > + pmd_clear(pmdp);
>> > +
>> > + return pte;
>> > +}
>> > +
>> > +static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
>> > + unsigned long addr, pte_t *ptep,
>> > + pte_t pte, int dirty)
>> > +{
>> > + int changed = !pte_same(huge_ptep_get(ptep), pte);
>> > +
>> > + if (changed) {
>> > + set_huge_pte_at(vma->vm_mm, addr, ptep, pte);
>> > + huge_ptep_clear_flush(vma, addr, &pte);
>> > + }
>> > +
>> > + return changed;
>> > +}
>> > +
>> > +#endif /* _ASM_ARM_HUGETLB_2LEVEL_H */
>> > diff --git a/arch/arm/include/asm/hugetlb.h b/arch/arm/include/asm/hugetlb.h
>> > index 7af9cf6..1e92975 100644
>> > --- a/arch/arm/include/asm/hugetlb.h
>> > +++ b/arch/arm/include/asm/hugetlb.h
>> > @@ -24,7 +24,11 @@
>> >
>> > #include <asm/page.h>
>> >
>> > +#ifdef CONFIG_ARM_LPAE
>> > #include <asm/hugetlb-3level.h>
>> > +#else
>> > +#include <asm/hugetlb-2level.h>
>> > +#endif
>> >
>> > static inline void hugetlb_free_pgd_range(struct mmu_gather *tlb,
>> > unsigned long addr, unsigned long end,
>> > diff --git a/arch/arm/include/asm/pgtable-2level.h b/arch/arm/include/asm/pgtable-2level.h
>> > index 662a00e..fd1d9be 100644
>> > --- a/arch/arm/include/asm/pgtable-2level.h
>> > +++ b/arch/arm/include/asm/pgtable-2level.h
>> > @@ -163,7 +163,7 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
>> > return (pmd_t *)pud;
>> > }
>> >
>> > -#define pmd_bad(pmd) (pmd_val(pmd) & 2)
>> > +#define pmd_bad(pmd) ((pmd_val(pmd) & PMD_TYPE_MASK) == PMD_TYPE_FAULT)
>>
>> this changes the semantics of the macro - is that on purpose and safe?
>>
>> (fault entries didn't used to be bad, now they are...)
>>
>
> Yes, thanks, the semantics should be retained (they are for LPAE).
>
>> >
>> > #define copy_pmd(pmdpd,pmdps) \
>> > do { \
>> > @@ -184,6 +184,83 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
>> >
>> > #define set_pte_ext(ptep,pte,ext) cpu_set_pte_ext(ptep,pte,ext)
>> >
>> > +
>> > +#ifdef CONFIG_SYS_SUPPORTS_HUGETLBFS
>> > +
>> > +/*
>> > + * now follows some of the definitions to allow huge page support, we can't put
>> > + * these in the hugetlb source files as they are also required for transparent
>> > + * hugepage support.
>> > + */
>> > +
>> > +#define HPAGE_SHIFT PMD_SHIFT
>> > +#define HPAGE_SIZE (_AC(1, UL) << HPAGE_SHIFT)
>> > +#define HPAGE_MASK (~(HPAGE_SIZE - 1))
>> > +#define HUGETLB_PAGE_ORDER (HPAGE_SHIFT - PAGE_SHIFT)
>> > +
>> > +#define HUGE_LINUX_PTE_COUNT (PAGE_OFFSET >> HPAGE_SHIFT)
>> > +#define HUGE_LINUX_PTE_SIZE (HUGE_LINUX_PTE_COUNT * sizeof(pte_t *))
>> > +#define HUGE_LINUX_PTE_INDEX(addr) (addr >> HPAGE_SHIFT)
>> > +
>> > +/*
>> > + * We re-purpose the following domain bits in the section descriptor
>> > + */
>> > +#define PMD_DSECT_DIRTY (_AT(pmdval_t, 1) << 5)
>> > +#define PMD_DSECT_AF (_AT(pmdval_t, 1) << 6)
>> > +
>> > +#define PMD_BIT_FUNC(fn,op) \
>> > +static inline pmd_t pmd_##fn(pmd_t pmd) { pmd_val(pmd) op; return pmd; }
>> > +
>> > +PMD_BIT_FUNC(wrprotect, &= ~PMD_SECT_AP_WRITE);
>> > +
>> > +static inline void set_pmd_at(struct mm_struct *mm, unsigned long addr,
>> > + pmd_t *pmdp, pmd_t pmd)
>> > +{
>> > + /*
>> > + * we can sometimes be passed a pmd pointing to a level 2 descriptor
>> > + * from collapse_huge_page.
>> > + */
>> > + if ((pmd_val(pmd) & PMD_TYPE_MASK) == PMD_TYPE_TABLE) {
>> > + pmdp[0] = __pmd(pmd_val(pmd));
>> > + pmdp[1] = __pmd(pmd_val(pmd) + 256 * sizeof(pte_t));
>>
>> eh, if I get this right, this means that in the case where the pmd
>> points to level 2 descriptor, all the pages are lined up to be a huge
>> page, so just point to the next level 2 pte, which directly follows
>> the next level 2 descriptor, because they share the same page. But
>> then why do we need to set any values here?
>>
>
> This is a little weird.
>
> The transparent huge page code will sometimes try to collapse a group of pages
> into a huge page. As part of the collapse process, it will invalidate the pmd
> before it copies the physical pages into a contiguous huge page. This ensures
> that memory accesses to the area being collapsed fault repeatedly whilst the
> collapse takes place. Sometimes the collapse process will be aborted after the
> pmd has been invalidated, so the original pmd (which points to a page table)
> needs to be put back as part of the rollback.
>
> With 2 levels of paging, the pmds are arranged in pairs so we put back a pair
> of pmds.
>
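The layout being described, for illustration:

    /*
     * With 2-level tables a Linux pmd entry spans 2MB and is written as
     * a pair of 1MB hardware entries, and a single page holds two
     * 256-entry hardware pte tables back to back -- hence the
     * "+ 256 * sizeof(pte_t)" when restoring the pointer to the second
     * table on rollback.
     */
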
tricky! I got it, thanks.
>> > + } else {
>> > + pmdp[0] = __pmd(pmd_val(pmd)); /* first 1M section */
>> > + pmdp[1] = __pmd(pmd_val(pmd) + SECTION_SIZE); /* second 1M section */
>> > + }
>> > +
>> > + flush_pmd_entry(pmdp);
>> > +}
>> > +
>> > +#define HPMD_XLATE(res, cmp, from, to) do { if (cmp & from) res |= to; \
>> > + else res &= ~to; \
>> > + } while (0)
>> > +
>> > +static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
>> > +{
>> > + pmdval_t pmdval = pmd_val(pmd);
>> > + pteval_t newprotval = pgprot_val(newprot);
>> > +
>> > + HPMD_XLATE(pmdval, newprotval, L_PTE_XN, PMD_SECT_XN);
>> > + HPMD_XLATE(pmdval, newprotval, L_PTE_SHARED, PMD_SECT_S);
>> > + HPMD_XLATE(pmdval, newprotval, L_PTE_YOUNG, PMD_DSECT_AF);
>>
>> consider something akin to:
>>
>> #define L_PMD_DSECT_YOUNG (PMD_DSECT_AF)
>>
>> then you don't have to change several places if you decide to
>> rearrange the mappings for whatever reason at it makes it slightly
>> easier to read this code.
>>
>
> Yeah, something along those lines may look better. I'll have a tinker.
>
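For example, a sketch of the suggested aliases (names hypothetical):

    #define L_PMD_SECT_YOUNG	PMD_DSECT_AF
    #define L_PMD_SECT_DIRTY	PMD_DSECT_DIRTY
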
>> > + HPMD_XLATE(pmdval, newprotval, L_PTE_DIRTY, PMD_DSECT_DIRTY);
>> > +
>> > + /* preserve bits C & B */
>> > + pmdval |= (newprotval & (3 << 2));
>>
>> this looks superfluous?
>>
>> > +
>> > + /* Linux PTE bit 4 corresponds to PMD TEX bit 0 */
>> > + HPMD_XLATE(pmdval, newprotval, 1 << 4, PMD_SECT_TEX(1));
>>
>> define L_PTE_TEX0 and group with the others above?
>>
>
> The mapping is not quite that simple. We have multiple memory types defined
> in pgtable-{23}level.h and these have different meanings depending on the
> target processor. For v6 and v7 the above works, but ideally, I should be able
> to look up the memory type mapping. For instance in arch/arm/mm/mmu.c, we can
> see cache policies that contain linux pte information and hardware pmd
> information. I'll ponder this some more; if anyone has a neat way of handling
> this then please let me know :-).
>
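A sketch of the lookup alluded to above, assuming the get_mem_type()
accessor from arch/arm/mm/mmu.c (the helper name is hypothetical):

    static pmdval_t pmd_apply_mem_type(pmdval_t pmdval)
    {
            const struct mem_type *mt = get_mem_type(MT_MEMORY);
            const pmdval_t cb = PMD_SECT_CACHEABLE | PMD_SECT_BUFFERABLE;

            /* copy the C & B bits from the kernel's MT_MEMORY policy */
            return (pmdval & ~cb) | (mt->prot_sect & cb);
    }
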
>> > +
>> > + if (newprotval & L_PTE_RDONLY)
>> > + pmdval &= ~PMD_SECT_AP_WRITE;
>> > + else
>> > + pmdval |= PMD_SECT_AP_WRITE;
>> > +
>> > + return __pmd(pmdval);
>> > +}
>> > +
>> > +#endif /* CONFIG_SYS_SUPPORTS_HUGETLBFS */
>> > +
>> > #endif /* __ASSEMBLY__ */
>> >
>> > #endif /* _ASM_PGTABLE_2LEVEL_H */
>> > diff --git a/arch/arm/include/asm/tlb.h b/arch/arm/include/asm/tlb.h
>> > index 99a1951..685e9e87 100644
>> > --- a/arch/arm/include/asm/tlb.h
>> > +++ b/arch/arm/include/asm/tlb.h
>> > @@ -92,10 +92,16 @@ static inline void tlb_flush(struct mmu_gather *tlb)
>> > static inline void tlb_add_flush(struct mmu_gather *tlb, unsigned long addr)
>> > {
>> > if (!tlb->fullmm) {
>> > + unsigned long size = PAGE_SIZE;
>> > +
>> > if (addr < tlb->range_start)
>> > tlb->range_start = addr;
>> > - if (addr + PAGE_SIZE > tlb->range_end)
>> > - tlb->range_end = addr + PAGE_SIZE;
>> > +
>> > + if (tlb->vma && is_vm_hugetlb_page(tlb->vma))
>> > + size = HPAGE_SIZE;
>> > +
>> > + if (addr + size > tlb->range_end)
>> > + tlb->range_end = addr + size;
>> > }
>> > }
>> >
>> > diff --git a/arch/arm/kernel/head.S b/arch/arm/kernel/head.S
>> > index 4eee351..860f08e 100644
>> > --- a/arch/arm/kernel/head.S
>> > +++ b/arch/arm/kernel/head.S
>> > @@ -410,13 +410,21 @@ __enable_mmu:
>> > mov r5, #0
>> > mcrr p15, 0, r4, r5, c2 @ load TTBR0
>> > #else
>> > +#ifndef CONFIG_SYS_SUPPORTS_HUGETLBFS
>> > mov r5, #(domain_val(DOMAIN_USER, DOMAIN_MANAGER) | \
>> > domain_val(DOMAIN_KERNEL, DOMAIN_MANAGER) | \
>> > domain_val(DOMAIN_TABLE, DOMAIN_MANAGER) | \
>> > domain_val(DOMAIN_IO, DOMAIN_CLIENT))
>> > +#else
>> > + @ set ourselves as the client in all domains
>> > + @ this allows us to then use the 4 domain bits in the
>> > + @ section descriptors in our transparent huge pages
>> > + ldr r5, =0x55555555
>> > +#endif /* CONFIG_SYS_SUPPORTS_HUGETLBFS */
>> > +
>> > mcr p15, 0, r5, c3, c0, 0 @ load domain access register
>> > mcr p15, 0, r4, c2, c0, 0 @ load page table pointer
>> > -#endif
>> > +#endif /* CONFIG_ARM_LPAE */
>> > b __turn_mmu_on
>> > ENDPROC(__enable_mmu)
>> >
>> > diff --git a/arch/arm/mm/Makefile b/arch/arm/mm/Makefile
>> > index 1560bbc..adf0b19 100644
>> > --- a/arch/arm/mm/Makefile
>> > +++ b/arch/arm/mm/Makefile
>> > @@ -17,7 +17,11 @@ obj-$(CONFIG_MODULES) += proc-syms.o
>> > obj-$(CONFIG_ALIGNMENT_TRAP) += alignment.o
>> > obj-$(CONFIG_HIGHMEM) += highmem.o
>> > obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
>> > +ifeq ($(CONFIG_ARM_LPAE),y)
>> > obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage-3level.o
>> > +else
>> > +obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage-2level.o
>> > +endif
>> >
>> > obj-$(CONFIG_CPU_ABRT_NOMMU) += abort-nommu.o
>> > obj-$(CONFIG_CPU_ABRT_EV4) += abort-ev4.o
>> > diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
>> > index 5dbf13f..0884936 100644
>> > --- a/arch/arm/mm/fault.c
>> > +++ b/arch/arm/mm/fault.c
>> > @@ -488,13 +488,13 @@ do_translation_fault(unsigned long addr, unsigned int fsr,
>> > #endif /* CONFIG_MMU */
>> >
>> > /*
>> > - * Some section permission faults need to be handled gracefully.
>> > - * They can happen due to a __{get,put}_user during an oops.
>> > + * A fault in a section will likely be due to a huge page, treat it
>> > + * as a page fault.
>> > */
>> > static int
>> > do_sect_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
>> > {
>> > - do_bad_area(addr, fsr, regs);
>> > + do_page_fault(addr, fsr, regs);
>>
>> doesn't the previous patch require this as well?
>>
>> (so it should strictly speaking be part of that patch)
>>
>
> Yes it does. Thanks, I'll clean this up by updating the fsr_info tables for
> long and short descriptors, and remove the do_sect_fault->do_page_fault
> daisy chaining.
>
>> > return 0;
>> > }
>> >
>> > diff --git a/arch/arm/mm/hugetlbpage-2level.c b/arch/arm/mm/hugetlbpage-2level.c
>> > new file mode 100644
>> > index 0000000..4b2b38c
>> > --- /dev/null
>> > +++ b/arch/arm/mm/hugetlbpage-2level.c
>> > @@ -0,0 +1,115 @@
>> > +/*
>> > + * arch/arm/mm/hugetlbpage-2level.c
>> > + *
>> > + * Copyright (C) 2002, Rohit Seth <rohit.seth@intel.com>
>> > + * Copyright (C) 2012 ARM Ltd
>> > + * Copyright (C) 2012 Bill Carson.
>> > + *
>> > + * Based on arch/x86/include/asm/hugetlb.h and Bill Carson's patches
>> > + *
>> > + * This program is free software; you can redistribute it and/or modify
>> > + * it under the terms of the GNU General Public License version 2 as
>> > + * published by the Free Software Foundation.
>> > + *
>> > + * This program is distributed in the hope that it will be useful,
>> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>> > + * GNU General Public License for more details.
>> > + *
>> > + * You should have received a copy of the GNU General Public License
>> > + * along with this program; if not, write to the Free Software
>> > + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
>> > + */
>> > +
>> > +#include <linux/init.h>
>> > +#include <linux/fs.h>
>> > +#include <linux/mm.h>
>> > +#include <linux/hugetlb.h>
>> > +#include <linux/pagemap.h>
>> > +#include <linux/err.h>
>> > +#include <linux/sysctl.h>
>> > +#include <asm/mman.h>
>> > +#include <asm/tlb.h>
>> > +#include <asm/tlbflush.h>
>> > +#include <asm/pgalloc.h>
>> > +
>> > +int huge_pmd_unshare(struct mm_struct *mm, unsigned long *addr, pte_t *ptep)
>> > +{
>> > + return 0;
>> > +}
>> > +
>> > +pte_t *huge_pte_alloc(struct mm_struct *mm,
>> > + unsigned long addr, unsigned long sz)
>> > +{
>> > + pgd_t *pgd;
>> > + pud_t *pud;
>> > + pmd_t *pmd;
>> > +
>> > + pgd = pgd_offset(mm, addr);
>> > + pud = pud_offset(pgd, addr);
>> > + pmd = pmd_offset(pud, addr);
>> > +
>> > + return (pte_t *)pmd; /* our huge pte is actually a pmd */
>> > +}
>> > +
>> > +struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
>> > + pmd_t *pmd, int write)
>> > +{
>> > + struct page *page;
>> > + unsigned long pfn;
>> > +
>> > + BUG_ON((pmd_val(*pmd) & PMD_TYPE_MASK) != PMD_TYPE_SECT);
>>
>> I could only see one caller who calls this only when this exact
>> condition is fulfilled, so unless we anticipate other callers, this
>> BUG_ON could go.
>>
>
> Yes thanks, this can be scrubbed.
>
>> > + pfn = ((pmd_val(*pmd) & HPAGE_MASK) >> PAGE_SHIFT);
>> > + page = pfn_to_page(pfn);
>> > + return page;
>> > +}
>> > +
>> > +pte_t huge_ptep_get(pte_t *ptep)
>> > +{
>> > + pmd_t *pmdp = (pmd_t*)ptep;
>> > + pmdval_t pmdval = pmd_val(*pmdp);
>> > + pteval_t retval;
>> > +
>> > + if (!pmdval)
>> > + return __pte(0);
>> > +
>> > + retval = (pteval_t) (pmdval & HPAGE_MASK);
>> > + HPMD_XLATE(retval, pmdval, PMD_SECT_XN, L_PTE_XN);
>> > + HPMD_XLATE(retval, pmdval, PMD_SECT_S, L_PTE_SHARED);
>> > + HPMD_XLATE(retval, pmdval, PMD_DSECT_AF, L_PTE_YOUNG);
>> > + HPMD_XLATE(retval, pmdval, PMD_DSECT_DIRTY, L_PTE_DIRTY);
>> > +
>> > + /* preserve bits C & B */
>> > + retval |= (pmdval & (3 << 2));
>> > +
>> > + /* PMD TEX bit 0 corresponds to Linux PTE bit 4 */
>> > + HPMD_XLATE(retval, pmdval, PMD_SECT_TEX(1), 1 << 4);
>> > +
>>
>> again, I would define the 1 << 4 to something and treat like the others...
>>
>> > + if (pmdval & PMD_SECT_AP_WRITE)
>> > + retval &= ~L_PTE_RDONLY;
>> > + else
>> > + retval |= L_PTE_RDONLY;
>> > +
>> > + if ((pmdval & PMD_TYPE_MASK) == PMD_TYPE_SECT)
>> > + retval |= L_PTE_VALID;
>> > +
>> > + /* we assume all hugetlb pages are user */
>> > + retval |= L_PTE_USER;
>> > +
>> > + return __pte(retval);
>> > +}
>> > +
>> > +void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
>> > + pte_t *ptep, pte_t pte)
>> > +{
>> > + pmdval_t pmdval = (pmdval_t) pte_val(pte);
>> > + pmd_t *pmdp = (pmd_t*) ptep;
>> > +
>> > + pmdval &= HPAGE_MASK;
>> > + pmdval |= PMD_SECT_AP_READ | PMD_SECT_nG | PMD_TYPE_SECT;
>> > + pmdval = pmd_val(pmd_modify(__pmd(pmdval), __pgprot(pte_val(pte))));
>> > +
>> > + __sync_icache_dcache(pte);
>> > +
>> > + set_pmd_at(mm, addr, pmdp, __pmd(pmdval));
>> > +}
>>
>> so this whole scheme where the caller expects ptes, but really gets
>> pmds, feels strange to me, but perhaps it makes more sense on other
>> architectures, where keeping the caller unchanged justifies the magic?
>>
>
> It is a little strange, but expected. We are considering one level up from
> normal page table entries. The short descriptor case is made stranger by the
> linux/hardware pte distinction. I wanted to re-purpose the domain bits and
> translate between the two on demand, as this allows for a much simpler
> transparent huge page implementation.
>
> I'll see if I can simplify some bits of the short descriptor hugetlb code.
>
>> -Christoffer
>>
>
* [RFC PATCH 5/6] ARM: mm: Transparent huge page support for LPAE systems.
2013-01-08 17:59 ` Steve Capper
@ 2013-01-08 18:15 ` Christoffer Dall
0 siblings, 0 replies; 25+ messages in thread
From: Christoffer Dall @ 2013-01-08 18:15 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, Jan 8, 2013 at 12:59 PM, Steve Capper <steve.capper@arm.com> wrote:
> On Fri, Jan 04, 2013 at 05:04:50AM +0000, Christoffer Dall wrote:
>> On Thu, Oct 18, 2012 at 12:15 PM, Steve Capper <steve.capper@arm.com> wrote:
>
>> > diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
>> > index d086f61..31c071f 100644
>> > --- a/arch/arm/include/asm/pgtable-3level.h
>> > +++ b/arch/arm/include/asm/pgtable-3level.h
>> > @@ -85,6 +85,9 @@
>> > #define L_PTE_DIRTY (_AT(pteval_t, 1) << 55) /* unused */
>> > #define L_PTE_SPECIAL (_AT(pteval_t, 1) << 56) /* unused */
>> >
>> > +#define PMD_SECT_DIRTY (_AT(pmdval_t, 1) << 55)
>> > +#define PMD_SECT_SPLITTING (_AT(pmdval_t, 1) << 57)
>> > +
>> > /*
>> > * To be used in assembly code with the upper page attributes.
>> > */
>> > @@ -166,6 +169,60 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
>> > #define pte_mkhuge(pte) (__pte((pte_val(pte) & ~PMD_TYPE_MASK) | PMD_TYPE_SECT))
>> >
>> >
>> > +#define pmd_present(pmd) ((pmd_val(pmd) & PMD_TYPE_MASK) != PMD_TYPE_FAULT)
>> > +#define pmd_young(pmd) (pmd_val(pmd) & PMD_SECT_AF)
>> > +
>> > +#define __HAVE_ARCH_PMD_WRITE
>> > +#define pmd_write(pmd) (!(pmd_val(pmd) & PMD_SECT_RDONLY))
>> > +
>> > +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>> > +#define pmd_trans_huge(pmd) ((pmd_val(pmd) & PMD_TYPE_MASK) == PMD_TYPE_SECT)
>> > +#define pmd_trans_splitting(pmd) (pmd_val(pmd) & PMD_SECT_SPLITTING)
>> > +#endif
>> > +
>> > +#define PMD_BIT_FUNC(fn,op) \
>> > +static inline pmd_t pmd_##fn(pmd_t pmd) { pmd_val(pmd) op; return pmd; }
>> > +
>> > +PMD_BIT_FUNC(wrprotect, |= PMD_SECT_RDONLY);
>> > +PMD_BIT_FUNC(mkold, &= ~PMD_SECT_AF);
>> > +PMD_BIT_FUNC(mksplitting, |= PMD_SECT_SPLITTING);
>> > +PMD_BIT_FUNC(mkwrite, &= ~PMD_SECT_RDONLY);
>> > +PMD_BIT_FUNC(mkdirty, |= PMD_SECT_DIRTY);
>> > +PMD_BIT_FUNC(mkyoung, |= PMD_SECT_AF);
>> > +PMD_BIT_FUNC(mknotpresent, &= ~PMD_TYPE_MASK);
>>
>> personally I would prefer not to automate the prefixing of pmd_: it
>> doesn't really save a lot of characters, it doesn't improve
>> readability and it breaks grep/cscope.
>>
>
> This follows the pte bit functions to a degree.
>
which is not really an argument to repeat a potentially problematic
approach, but whatever.
>> > +
>> > +#define pmd_mkhuge(pmd) (__pmd((pmd_val(pmd) & ~PMD_TYPE_MASK) | PMD_TYPE_SECT))
>> > +
>> > +#define pmd_pfn(pmd) ((pmd_val(pmd) & PHYS_MASK) >> PAGE_SHIFT)
>>
>> the ARM ARM says UNK/SBZP, so we should be fine here, right? (no one is
>> crazy enough to try and squeeze some extra information into the extra
>> bits here or something like that). For clarity, one could consider:
>>
>> (((pmd_val(pmd) & PMD_MASK) & PHYS_MASK) >> PAGE_SHIFT)
>>
>
> Thanks, yes, it's better to PMD_MASK the value too.
>
>> > +#define pfn_pmd(pfn,prot) (__pmd(((phys_addr_t)(pfn) << PAGE_SHIFT) | pgprot_val(prot)))
>> > +#define mk_pmd(page,prot) pfn_pmd(page_to_pfn(page),prot)
>> > +
>> > +static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
>> > +{
>> > + const pmdval_t mask = PMD_SECT_USER | PMD_SECT_XN | PMD_SECT_RDONLY;
>> > + pmd_val(pmd) = (pmd_val(pmd) & ~mask) | (pgprot_val(newprot) & mask);
>> > + return pmd;
>> > +}
>> > +
>> > +static inline void set_pmd(pmd_t *pmdp, pmd_t pmd)
>> > +{
>> > + *pmdp = pmd;
>> > +}
>>
>> why this level of indirection?
>>
>
> Over-manipulation in git :-); this can be scrubbed.
>
>> > +
>> > +static inline void set_pmd_at(struct mm_struct *mm, unsigned long addr,
>> > + pmd_t *pmdp, pmd_t pmd)
>> > +{
>> > + BUG_ON(addr >= TASK_SIZE);
>> > + pmd = __pmd(pmd_val(pmd) | PMD_SECT_nG);
>>
>> why this side affect?
>>
>
> This replicates the side effect found when placing ptes into page tables. We
> need the nG bit for user pages.
>
yeah, I got bit by this side effect for over a month tracking down a
horrible bug, so it hurts me and I really don't like it, but that's
the current design, so it's for another day to clean up, if ever. Just
couldn't stay silent :)
>> > + set_pmd(pmdp, pmd);
>> > + flush_pmd_entry(pmdp);
>> > +}
>> > +
>> > +static inline int has_transparent_hugepage(void)
>> > +{
>> > + return 1;
>> > +}
>> > +
>> > #endif /* __ASSEMBLY__ */
>> >
>> > #endif /* _ASM_PGTABLE_3LEVEL_H */
>> > diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
>> > index c35bf46..767aa7c 100644
>> > --- a/arch/arm/include/asm/pgtable.h
>> > +++ b/arch/arm/include/asm/pgtable.h
>> > @@ -24,6 +24,9 @@
>> > #include <asm/memory.h>
>> > #include <asm/pgtable-hwdef.h>
>> >
>> > +
>> > +#include <asm/tlbflush.h>
>> > +
>> > #ifdef CONFIG_ARM_LPAE
>> > #include <asm/pgtable-3level.h>
>> > #else
>> > @@ -163,7 +166,6 @@ extern pgd_t swapper_pg_dir[PTRS_PER_PGD];
>> > #define pgd_offset_k(addr) pgd_offset(&init_mm, addr)
>> >
>> > #define pmd_none(pmd) (!pmd_val(pmd))
>> > -#define pmd_present(pmd) (pmd_val(pmd))
>> >
>> > static inline pte_t *pmd_page_vaddr(pmd_t pmd)
>> > {
>> > diff --git a/arch/arm/include/asm/tlb.h b/arch/arm/include/asm/tlb.h
>> > index 685e9e87..0fc2d9d 100644
>> > --- a/arch/arm/include/asm/tlb.h
>> > +++ b/arch/arm/include/asm/tlb.h
>> > @@ -229,6 +229,12 @@ static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp,
>> > #endif
>> > }
>> >
>> > +static inline void
>> > +tlb_remove_pmd_tlb_entry(struct mmu_gather *tlb, pmd_t *pmdp, unsigned long addr)
>> > +{
>> > + tlb_add_flush(tlb, addr);
>> > +}
>> > +
>> > #define pte_free_tlb(tlb, ptep, addr) __pte_free_tlb(tlb, ptep, addr)
>> > #define pmd_free_tlb(tlb, pmdp, addr) __pmd_free_tlb(tlb, pmdp, addr)
>> > #define pud_free_tlb(tlb, pudp, addr) pud_free((tlb)->mm, pudp)
>> > diff --git a/arch/arm/include/asm/tlbflush.h b/arch/arm/include/asm/tlbflush.h
>> > index 6e924d3..907cede 100644
>> > --- a/arch/arm/include/asm/tlbflush.h
>> > +++ b/arch/arm/include/asm/tlbflush.h
>> > @@ -505,6 +505,8 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,
>> > }
>> > #endif
>> >
>> > +#define update_mmu_cache_pmd(vma, address, pmd) do { } while (0)
>> > +
>> > #endif
>> >
>> > #endif /* CONFIG_MMU */
>> > diff --git a/arch/arm/mm/fsr-3level.c b/arch/arm/mm/fsr-3level.c
>> > index 05a4e94..47f4c6f 100644
>> > --- a/arch/arm/mm/fsr-3level.c
>> > +++ b/arch/arm/mm/fsr-3level.c
>> > @@ -9,7 +9,7 @@ static struct fsr_info fsr_info[] = {
>> > { do_page_fault, SIGSEGV, SEGV_MAPERR, "level 3 translation fault" },
>> > { do_bad, SIGBUS, 0, "reserved access flag fault" },
>> > { do_bad, SIGSEGV, SEGV_ACCERR, "level 1 access flag fault" },
>> > - { do_bad, SIGSEGV, SEGV_ACCERR, "level 2 access flag fault" },
>> > + { do_page_fault, SIGSEGV, SEGV_ACCERR, "level 2 access flag fault" },
>> > { do_page_fault, SIGSEGV, SEGV_ACCERR, "level 3 access flag fault" },
>> > { do_bad, SIGBUS, 0, "reserved permission fault" },
>> > { do_bad, SIGSEGV, SEGV_ACCERR, "level 1 permission fault" },
>> > --
>> > 1.7.9.5
>> >
>>
>> Besides the nits it looks fine to me. I've done quite extensive
>> testing with varied workloads on this code over the last couple of
>> months on the vexpress TC2 and on the ARNDALE board using KVM/ARM with
>> huge pages, and it gives a nice ~15% performance increase on average
>> and is completely stable.
>
> That's great to hear \o/.
> Also I've found a decent perf boost when running tools like xz backed by huge pages.
> (One can use the LD_PRELOAD mechanism in libhugetlbfs to make mallocs point to
> huge pages).
>
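For reference, such an invocation looks roughly like this (treat the exact
variable names as an assumption about the libhugetlbfs version in use):

    $ LD_PRELOAD=libhugetlbfs.so HUGETLB_MORECORE=yes xz -9 some-large-file
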
cool!
* [RFC PATCH 6/6] ARM: mm: Transparent huge page support for non-LPAE systems.
2013-01-08 17:59 ` Steve Capper
@ 2013-01-08 18:17 ` Christoffer Dall
0 siblings, 0 replies; 25+ messages in thread
From: Christoffer Dall @ 2013-01-08 18:17 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, Jan 8, 2013 at 12:59 PM, Steve Capper <steve.capper@arm.com> wrote:
> On Fri, Jan 04, 2013 at 05:04:57AM +0000, Christoffer Dall wrote:
>> On Thu, Oct 18, 2012 at 12:15 PM, Steve Capper <steve.capper@arm.com> wrote:
>> > Much of the required code for THP has been implemented in the earlier non-LPAE
>> > HugeTLB patch.
>> >
>> > One more domain bits is used (to store whether or not the THP is splitting).
>>
>> s/bits/bit/
>>
> Thanks.
>
>> >
>> > Some THP helper functions are defined; and we have to re-define pmd_page such
>> > that it distinguishes between page tables and sections.
>>
>> super nit: not sure the semi-colon is warranted here.
>>
> Cheers, it is a superfluous semicolon.
>
>> >
>> > Signed-off-by: Will Deacon <will.deacon@arm.com>
>> > Signed-off-by: Steve Capper <steve.capper@arm.com>
>> > ---
>> > arch/arm/Kconfig | 2 +-
>> > arch/arm/include/asm/pgtable-2level.h | 68 ++++++++++++++++++++++++++++++++-
>> > arch/arm/include/asm/pgtable-3level.h | 2 +
>> > arch/arm/include/asm/pgtable.h | 7 +++-
>> > 4 files changed, 75 insertions(+), 4 deletions(-)
>> >
>> > diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
>> > index 9621d5f..d459673 100644
>> > --- a/arch/arm/Kconfig
>> > +++ b/arch/arm/Kconfig
>> > @@ -1773,7 +1773,7 @@ config SYS_SUPPORTS_HUGETLBFS
>> >
>> > config HAVE_ARCH_TRANSPARENT_HUGEPAGE
>> > def_bool y
>> > - depends on ARM_LPAE
>> > + depends on SYS_SUPPORTS_HUGETLBFS
>> >
>> > source "mm/Kconfig"
>> >
>> > diff --git a/arch/arm/include/asm/pgtable-2level.h b/arch/arm/include/asm/pgtable-2level.h
>> > index 34f4775..67eabb4 100644
>> > --- a/arch/arm/include/asm/pgtable-2level.h
>> > +++ b/arch/arm/include/asm/pgtable-2level.h
>> > @@ -179,6 +179,13 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
>> > clean_pmd_entry(pmdp); \
>> > } while (0)
>> >
>> > +
>>
>> stray whitespace?
>>
>
> Thanks.
>
>> > +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>> > +#define _PMD_HUGE(pmd) ((pmd_val(pmd) & PMD_TYPE_MASK) == PMD_TYPE_SECT)
>> > +#else
>> > +#define _PMD_HUGE(pmd) (0)
>> > +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>> > +
>> > /* we don't need complex calculations here as the pmd is folded into the pgd */
>> > #define pmd_addr_end(addr,end) (end)
>> >
>> > @@ -197,7 +204,6 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
>> >
>> > #define HPAGE_SHIFT PMD_SHIFT
>> > #define HPAGE_SIZE (_AC(1, UL) << HPAGE_SHIFT)
>> > -#define HPAGE_MASK (~(HPAGE_SIZE - 1))
>> > #define HUGETLB_PAGE_ORDER (HPAGE_SHIFT - PAGE_SHIFT)
>> >
>> > #define HUGE_LINUX_PTE_COUNT (PAGE_OFFSET >> HPAGE_SHIFT)
>> > @@ -209,6 +215,7 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
>> > */
>> > #define PMD_DSECT_DIRTY (_AT(pmdval_t, 1) << 5)
>> > #define PMD_DSECT_AF (_AT(pmdval_t, 1) << 6)
>> > +#define PMD_DSECT_SPLITTING (_AT(pmdval_t, 1) << 7)
>> >
>> > #define PMD_BIT_FUNC(fn,op) \
>> > static inline pmd_t pmd_##fn(pmd_t pmd) { pmd_val(pmd) op; return pmd; }
>> > @@ -261,8 +268,67 @@ static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
>> > return __pmd(pmdval);
>> > }
>> >
>> > +#else
>> > +#define HPAGE_SIZE 0
>>
>> why this and the conditional define of _PMD_HUGE, you could just do
>> like in pgtable.h and put the #ifdef around the condition in
>> pmd_page(pmt_t pmd).
>>
>
> Thanks, I'll take a look at this.
>
>> > #endif /* CONFIG_SYS_SUPPORTS_HUGETLBFS */
>> >
>> > +#define HPAGE_MASK (~(HPAGE_SIZE - 1))
>> > +
>> > +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>> > +#define pmd_mkhuge(pmd) (__pmd((pmd_val(pmd) & ~PMD_TYPE_MASK) | PMD_TYPE_SECT))
>> > +
>> > +PMD_BIT_FUNC(mkold, &= ~PMD_DSECT_AF);
>> > +PMD_BIT_FUNC(mksplitting, |= PMD_DSECT_SPLITTING);
>> > +PMD_BIT_FUNC(mkdirty, |= PMD_DSECT_DIRTY);
>> > +PMD_BIT_FUNC(mkyoung, |= PMD_DSECT_AF);
>> > +PMD_BIT_FUNC(mkwrite, |= PMD_SECT_AP_WRITE);
>> > +PMD_BIT_FUNC(mknotpresent, &= ~PMD_TYPE_MASK);
>> > +
>> > +#define pmd_trans_splitting(pmd) (pmd_val(pmd) & PMD_DSECT_SPLITTING)
>> > +#define pmd_young(pmd) (pmd_val(pmd) & PMD_DSECT_AF)
>> > +#define pmd_write(pmd) (pmd_val(pmd) & PMD_SECT_AP_WRITE)
>> > +#define pmd_trans_huge(pmd) ((pmd_val(pmd) & PMD_TYPE_MASK) == PMD_TYPE_SECT)
>> > +
>> > +static inline unsigned long pmd_pfn(pmd_t pmd)
>> > +{
>> > + /*
>> > + * for a section, we need to mask off more of the pmd
>> > + * before looking up the pfn
>> > + */
>> > + if ((pmd_val(pmd) & PMD_TYPE_MASK) == PMD_TYPE_SECT)
>> > + return __phys_to_pfn(pmd_val(pmd) & HPAGE_MASK);
>> > + else
>> > + return __phys_to_pfn(pmd_val(pmd) & PHYS_MASK);
>> > +}
>> > +
>> > +static inline pmd_t pfn_pmd(unsigned long pfn, pgprot_t prot)
>> > +{
>> > + pmd_t pmd = __pmd(__pfn_to_phys(pfn) | PMD_SECT_AP_READ | PMD_SECT_nG);
>> > +
>> > + return pmd_modify(pmd, prot);
>> > +}
>> > +
>> > +#define mk_pmd(page,prot) pfn_pmd(page_to_pfn(page),prot);
>> > +
>> > +static inline int has_transparent_hugepage(void)
>> > +{
>> > + return 1;
>> > +}
>> > +
>> > +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>> > +
>> > +static inline struct page *pmd_page(pmd_t pmd)
>> > +{
>> > + /*
>> > + * for a section, we need to mask off more of the pmd
>> > + * before looking up the page as it is a section descriptor.
>> > + */
>> > + if (_PMD_HUGE(pmd))
>> > + return phys_to_page(pmd_val(pmd) & HPAGE_MASK);
>> > +
>> > + return phys_to_page(pmd_val(pmd) & PHYS_MASK);
>> > +}
>> > +
>> > #endif /* __ASSEMBLY__ */
>> >
>> > #endif /* _ASM_PGTABLE_2LEVEL_H */
>> > diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
>> > index 31c071f..8360814 100644
>> > --- a/arch/arm/include/asm/pgtable-3level.h
>> > +++ b/arch/arm/include/asm/pgtable-3level.h
>> > @@ -197,6 +197,8 @@ PMD_BIT_FUNC(mknotpresent, &= ~PMD_TYPE_MASK);
>> > #define pfn_pmd(pfn,prot) (__pmd(((phys_addr_t)(pfn) << PAGE_SHIFT) | pgprot_val(prot)))
>> > #define mk_pmd(page,prot) pfn_pmd(page_to_pfn(page),prot)
>> >
>> > +#define pmd_page(pmd) pfn_to_page(__phys_to_pfn(pmd_val(pmd) & PHYS_MASK))
>> > +
>> > static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
>> > {
>> > const pmdval_t mask = PMD_SECT_USER | PMD_SECT_XN | PMD_SECT_RDONLY;
>> > diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
>> > index 767aa7c..2d96381 100644
>> > --- a/arch/arm/include/asm/pgtable.h
>> > +++ b/arch/arm/include/asm/pgtable.h
>> > @@ -169,11 +169,14 @@ extern pgd_t swapper_pg_dir[PTRS_PER_PGD];
>> >
>> > static inline pte_t *pmd_page_vaddr(pmd_t pmd)
>> > {
>> > +#ifdef SYS_SUPPORTS_HUGETLBFS
>> > + if ((pmd_val(pmd) & PMD_TYPE_MASK) == PMD_TYPE_SECT)
>> > + return __va(pmd_val(pmd) & HPAGE_MASK);
>> > +#endif
>> > +
>> > return __va(pmd_val(pmd) & PHYS_MASK & (s32)PAGE_MASK);
>> > }
>> >
>> > -#define pmd_page(pmd) pfn_to_page(__phys_to_pfn(pmd_val(pmd) & PHYS_MASK))
>> > -
>> > #ifndef CONFIG_HIGHPTE
>> > #define __pte_map(pmd) pmd_page_vaddr(*(pmd))
>> > #define __pte_unmap(pte) do { } while (0)
>> > --
>> > 1.7.9.5
>> >
>> The whole series looks functionally correct to me:
>>
>> Reviewed-by: Christoffer Dall <c.dall@virtualopensystems.com>
>>
>
> A big thank you for going through this Christoffer.
>
> I'm correcting/simplifying/testing the huge pages code and will send out another
> version soon.
>
Great. This really makes a performance difference for running VMs, so
I'm happy to have the code for KVM/ARM. Please remember to cc me
and/or the kvmarm mailing list for a new version of this series.
-Christoffer
end of thread, other threads:[~2013-01-08 18:17 UTC | newest]
Thread overview: 25+ messages
2012-10-18 16:15 [RFC PATCH 0/6] ARM: mm: HugeTLB + THP support Steve Capper
2012-10-18 16:15 ` [RFC PATCH 1/6] ARM: mm: correct pte_same behaviour for LPAE Steve Capper
2013-01-04 5:03 ` Christoffer Dall
2013-01-08 17:56 ` Steve Capper
2012-10-18 16:15 ` [RFC PATCH 2/6] ARM: mm: Add support for flushing HugeTLB pages Steve Capper
2013-01-04 5:03 ` Christoffer Dall
2013-01-08 17:56 ` Steve Capper
2012-10-18 16:15 ` [RFC PATCH 3/6] ARM: mm: HugeTLB support for LPAE systems Steve Capper
2013-01-04 5:03 ` Christoffer Dall
2013-01-08 17:57 ` Steve Capper
2013-01-08 18:10 ` Christoffer Dall
2012-10-18 16:15 ` [RFC PATCH 4/6] ARM: mm: HugeTLB support for non-LPAE systems Steve Capper
2013-01-04 5:04 ` Christoffer Dall
2013-01-08 17:58 ` Steve Capper
2013-01-08 18:13 ` Christoffer Dall
2012-10-18 16:15 ` [RFC PATCH 5/6] ARM: mm: Transparent huge page support for LPAE systems Steve Capper
2013-01-04 5:04 ` Christoffer Dall
2013-01-08 17:59 ` Steve Capper
2013-01-08 18:15 ` Christoffer Dall
2012-10-18 16:15 ` [RFC PATCH 6/6] ARM: mm: Transparent huge page support for non-LPAE systems Steve Capper
2013-01-04 5:04 ` Christoffer Dall
2013-01-08 17:59 ` Steve Capper
2013-01-08 18:17 ` Christoffer Dall
2012-12-21 13:41 ` [RFC PATCH 0/6] ARM: mm: HugeTLB + THP support Gregory CLEMENT
2012-12-23 11:11 ` Will Deacon