linux-arm-kernel.lists.infradead.org archive mirror
* [RFC] ARM hugetlb support
@ 2012-01-30  7:57 bill4carson at gmail.com
  2012-01-30  7:57 ` [PATCH 1/7] Add various hugetlb arm high level hooks bill4carson at gmail.com
                   ` (7 more replies)
  0 siblings, 8 replies; 50+ messages in thread
From: bill4carson at gmail.com @ 2012-01-30  7:57 UTC (permalink / raw)
  To: linux-arm-kernel



Hi All


This patch series adds huge page support for ARM. For now, 2MB (two 1MB
sections) and 16MB (supersection) huge pages are supported. A Versatile
Express Cortex-A9x4 tile is used as the test board, and verification is done
with libhugetlbfs and ltp (a minimal user-space smoke test is also sketched
after the diffstat below).

Any suggestions would be welcome.


Signed-off-by: Bill Carson <bill4carson@gmail.com>
---
 arch/arm/Kconfig                      |   29 ++++
 arch/arm/include/asm/glue-proc.h      |    3 +
 arch/arm/include/asm/hugetlb.h        |  240 +++++++++++++++++++++++++++++++++
 arch/arm/include/asm/page.h           |   15 ++
 arch/arm/include/asm/pgtable-2level.h |    8 +
 arch/arm/include/asm/pgtable.h        |   28 ++++
 arch/arm/include/asm/proc-fns.h       |    3 +
 arch/arm/mm/Makefile                  |    1 +
 arch/arm/mm/dma-mapping.c             |    3 -
 arch/arm/mm/fault.c                   |   15 ++
 arch/arm/mm/hugetlb.c                 |  187 +++++++++++++++++++++++++
 arch/arm/mm/pgd.c                     |   28 ++++
 arch/arm/mm/proc-v7-2level.S          |   96 +++++++++++++
 include/linux/mm_types.h              |   11 ++
 14 files changed, 664 insertions(+), 3 deletions(-)
 create mode 100644 arch/arm/include/asm/hugetlb.h
 create mode 100644 arch/arm/mm/hugetlb.c
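
For reference, a minimal user-space smoke test of the kind libhugetlbfs and
ltp exercise could look like the sketch below. It assumes hugetlbfs is
mounted at /mnt/huge and that huge pages have been reserved via
/proc/sys/vm/nr_hugepages; the mount point and file name are examples only.

#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>

#define LENGTH	(2 * 1024 * 1024)	/* one 2MB huge page */

int main(void)
{
	char *p;
	int fd = open("/mnt/huge/test", O_CREAT | O_RDWR, 0600);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* a mapping of a hugetlbfs file is backed by huge pages */
	p = mmap(NULL, LENGTH, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		close(fd);
		return 1;
	}
	memset(p, 0x5a, LENGTH);	/* fault in and touch the huge page */
	printf("huge page mapping at %p looks good\n", p);
	munmap(p, LENGTH);
	close(fd);
	unlink("/mnt/huge/test");
	return 0;
}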


* [PATCH 1/7] Add various hugetlb arm high level hooks
  2012-01-30  7:57 [RFC] ARM hugetlb support bill4carson at gmail.com
@ 2012-01-30  7:57 ` bill4carson at gmail.com
  2012-02-06 17:07   ` Catalin Marinas
  2012-02-07 12:15   ` Catalin Marinas
  2012-01-30  7:57 ` [PATCH 2/7] Add various hugetlb page table fix bill4carson at gmail.com
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 50+ messages in thread
From: bill4carson at gmail.com @ 2012-01-30  7:57 UTC (permalink / raw)
  To: linux-arm-kernel

From: Bill Carson <bill4carson@gmail.com>

Why is a 2MB huge page exported to user space rather than 1MB?

A 1MB huge page lives only at pgd level and needs no pte-level machinery.
The problem is that huge page based VMAs and 4K page based VMAs mingle at
pgd level, and the existing ARM 4K pagetable code manages pgd entries in
pairs covering 2MB, so whatever the fault order of the two kinds of VMA,
a page fault in one VMA can clobber the pgd entry of the other.

Example:

Create a 4K page based VMA and a 1MB page based VMA:

VMA_4K range is 0x40280000 ~ 0x40280400, pgd level index = 0x402
VMA_1M range is 0x40300000 ~ 0x40400000, pgd level index = 0x403

If the page fault hits VMA_1M first, all we need to do is allocate a 1MB
page and set up one pgd entry at index 0x403.

When VMA_4K faults afterwards, the current ARM 4K pagetable code fills in
both pgd entries of the pair, at indices 0x402 and 0x403, which overwrites
the VMA_1M entry and leads to obscure kernel oopses. The fault order of
VMA_4K and VMA_1M cannot be predicted, and ruling out this kind of
pagetable corruption would require heavy and fragile changes to the
existing 4K pagetable management.

The solution is to choose a huge page size that never shares a 4K page
pgd entry pair, and the natural choice is 2MB.
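
The index arithmetic behind the example above, as a stand-alone sketch
(the helpers are illustrative only and are not part of this patch):

#include <stdio.h>

/* Each first-level (pgd) entry maps 1MB, so its index is VA >> 20.
 * The existing ARM 4K-page code allocates and fills pte tables per
 * *pair* of pgd entries, i.e. per 2MB-aligned block (VA >> 21).
 */
static unsigned int pgd_index_1m(unsigned long va) { return va >> 20; }
static unsigned int pgd_pair(unsigned long va)     { return va >> 21; }

int main(void)
{
	unsigned long vma_4k = 0x40280000UL;	/* VMA_4K start */
	unsigned long vma_1m = 0x40300000UL;	/* VMA_1M start */

	printf("VMA_4K: pgd index 0x%x, pgd pair 0x%x\n",
	       pgd_index_1m(vma_4k), pgd_pair(vma_4k));	/* 0x402, 0x201 */
	printf("VMA_1M: pgd index 0x%x, pgd pair 0x%x\n",
	       pgd_index_1m(vma_1m), pgd_pair(vma_1m));	/* 0x403, 0x201 */

	/* Both VMAs land in pair 0x201: a fault in VMA_4K rewrites the
	 * entries at 0x402 and 0x403 and wipes the 1MB section at 0x403.
	 * A 2MB huge page is always pair-aligned, so it can never share
	 * a pgd entry pair with a 4K mapping.
	 */
	return 0;
}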

Signed-off-by: Bill Carson <bill4carson@gmail.com>
---
 arch/arm/include/asm/hugetlb.h |  240 ++++++++++++++++++++++++++++++++++++++++
 arch/arm/include/asm/page.h    |   15 +++
 arch/arm/mm/hugetlb.c          |  187 +++++++++++++++++++++++++++++++
 3 files changed, 442 insertions(+), 0 deletions(-)
 create mode 100644 arch/arm/include/asm/hugetlb.h
 create mode 100644 arch/arm/mm/hugetlb.c

diff --git a/arch/arm/include/asm/hugetlb.h b/arch/arm/include/asm/hugetlb.h
new file mode 100644
index 0000000..d7ad0fc
--- /dev/null
+++ b/arch/arm/include/asm/hugetlb.h
@@ -0,0 +1,240 @@
+/*
+ * hugetlb.h, ARM Huge Tlb Page support.
+ *
+ * Copyright (c) Bill Carson
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+ * See the GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ *
+ */
+
+#ifndef __ASM_HUGETLB_H
+#define __ASM_HUGETLB_H
+
+#include <asm/page.h>
+#include <asm/pgtable-2level.h>
+#include <asm/tlb.h>
+
+
+/* 2M and 16M hugepage linux ptes are stored in arrays
+ *
+ * 2M hugepage
+ * ===========
+ * One linux pte covers two HW ptes,
+ * so at most 4096M/2M = 2048 huge linux ptes are needed.
+ * Two 4K pages are used to store these entries (2048 * 4 = 8192 bytes)
+ * as a two-dimensional array, huge_2m_pte[2][1024].
+ *
+ * How to find the hugepage linux pte corresponding to a given address?
+ * VA[31] is used as the row index;
+ * VA[30:21] is used as the column index;
+ *
+ * 16M hugepage
+ * ============
+ * One linux pte covers one 16MB HW mapping,
+ * so at most 4096M/16M = 256 huge linux ptes are needed;
+ * 256 * 4 = 1024 bytes are allocated to store them
+ * as a simple one-dimensional array, huge_16m_pte[256].
+ *
+ * VA[31:24] is used to index this array;
+ */
+
+/* 2M hugepage */
+#define HUGEPAGE_2M_PTE_ARRAY_ROW(addr)  ((addr & 0x80000000) >> 31)
+#define HUGEPAGE_2M_PTE_ARRAY_COL(addr)  ((addr & 0x7fe00000) >> 21)
+/* 16M hugepage */
+#define HUGEPAGE_16M_PTE_ARRAY_INDEX(addr)  ((addr & 0xff000000) >> 24)
+
+#define ALIGN_16M_PMD_ENTRY(pmd) ((unsigned long)pmd & (~0x3f))
+
+
+static inline int is_hugepage_only_range(struct mm_struct *mm,
+					 unsigned long addr,
+					 unsigned long len)
+{
+	return 0;
+}
+
+static inline int prepare_hugepage_range(struct file *file,
+					 unsigned long addr,
+					 unsigned long len)
+{
+	struct hstate *h = hstate_file(file);
+	/* addr/len should be aligned with huge page size */
+	if (len & ~huge_page_mask(h))
+		return -EINVAL;
+	if (addr & ~huge_page_mask(h))
+		return -EINVAL;
+
+	return 0;
+}
+
+static inline void hugetlb_prefault_arch_hook(struct mm_struct *mm)
+{
+}
+
+static inline void hugetlb_free_pgd_range(struct mmu_gather *tlb,
+			unsigned long addr, unsigned long end,
+			unsigned long floor, unsigned long ceiling)
+{
+}
+
+static inline void set_hugepte_section(struct mm_struct *mm, unsigned long addr,
+				   pte_t *ptep, pte_t pte)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+
+	int col, row;
+	pte_t **huge_linuxpte = &mm->huge_2m_pte[0];
+
+	row = HUGEPAGE_2M_PTE_ARRAY_ROW(addr);
+	col = HUGEPAGE_2M_PTE_ARRAY_COL(addr);
+
+	/* a valid pte pointer is expected */
+	BUG_ON(huge_linuxpte[row] == 0);
+	BUG_ON(ptep != &huge_linuxpte[row][col]);
+
+	/* set linux pte first */
+	huge_linuxpte[row][col] = pte;
+
+	/* set hardware pte */
+	pgd = pgd_offset(mm, addr);
+	pud = pud_offset(pgd, addr);
+	pmd = pmd_offset(pud, addr);
+
+	set_hugepte_at(mm, addr, pmd, pte);
+}
+
+static inline void set_hugepte_supersection(struct mm_struct *mm,
+			unsigned long addr, pte_t *ptep, pte_t pte)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+
+	int index;
+	pte_t *huge_linuxpte = mm->huge_16m_pte;
+
+	index = HUGEPAGE_16M_PTE_ARRAY_INDEX(addr);
+
+	BUG_ON(huge_linuxpte == 0);
+	BUG_ON(ptep != &huge_linuxpte[index]);
+
+	/* set linux pte first */
+	huge_linuxpte[index] = pte;
+
+	/* set hardware pte */
+	addr &= SUPERSECTION_MASK;
+	pgd = pgd_offset(mm, addr);
+	pud = pud_offset(pgd, addr);
+	pmd = pmd_offset(pud, addr);
+
+	set_hugepte_at(mm, addr, pmd, pte);
+}
+
+static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
+				   pte_t *ptep, pte_t pte)
+{
+	if (pte & L_PTE_HPAGE_2M)
+		set_hugepte_section(mm, addr, ptep, pte);
+	else
+		set_hugepte_supersection(mm, addr, ptep, pte);
+}
+
+static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
+					    unsigned long addr, pte_t *ptep)
+{
+	pte_t pte = *ptep;
+	pte_t fake = (L_PTE_HUGEPAGE | L_PTE_YOUNG);
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+
+	/* clear linux pte */
+	*ptep = 0;
+	pgd = pgd_offset(mm, addr);
+	pud = pud_offset(pgd, addr);
+	pmd = pmd_offset(pud, addr);
+
+	if (pte & L_PTE_HPAGE_16M) {
+		pmd = (pmd_t *)ALIGN_16M_PMD_ENTRY(pmd);
+		fake |= L_PTE_HPAGE_16M;
+
+	} else
+		fake |= L_PTE_HPAGE_2M;
+
+	/* let set_hugepte_at clear HW entry */
+	set_hugepte_at(mm, addr, pmd, fake);
+	return pte;
+}
+
+static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
+					 unsigned long addr, pte_t *ptep)
+{
+	if (*ptep & L_PTE_HPAGE_16M)
+		flush_tlb_page(vma, addr & SUPERSECTION_MASK);
+	else {
+		flush_tlb_page(vma, addr & SECTION_MASK);
+		flush_tlb_page(vma, (addr & SECTION_MASK)^0x100000);
+	}
+}
+
+static inline int huge_pte_none(pte_t pte)
+{
+	return pte_none(pte);
+}
+
+static inline pte_t huge_pte_wrprotect(pte_t pte)
+{
+	return pte_wrprotect(pte);
+}
+
+static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
+					   unsigned long addr, pte_t *ptep)
+{
+	pte_t old_pte = *ptep;
+	set_huge_pte_at(mm, addr, ptep, pte_wrprotect(old_pte));
+}
+
+static inline pte_t huge_ptep_get(pte_t *ptep)
+{
+	return *ptep;
+}
+
+static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
+					     unsigned long addr,
+					     pte_t *ptep, pte_t pte,
+					     int dirty)
+{
+	int changed = !pte_same(huge_ptep_get(ptep), pte);
+	if (changed) {
+		set_huge_pte_at(vma->vm_mm, addr, ptep, pte);
+		huge_ptep_clear_flush(vma, addr, &pte);
+	}
+
+	return changed;
+}
+
+static inline int arch_prepare_hugepage(struct page *page)
+{
+	return 0;
+}
+
+static inline void arch_release_hugepage(struct page *page)
+{
+}
+
+#endif /* __ASM_HUGETLB_H */
+
diff --git a/arch/arm/include/asm/page.h b/arch/arm/include/asm/page.h
index 97b440c..3e6769a 100644
--- a/arch/arm/include/asm/page.h
+++ b/arch/arm/include/asm/page.h
@@ -15,6 +15,21 @@
 #define PAGE_SIZE		(_AC(1,UL) << PAGE_SHIFT)
 #define PAGE_MASK		(~(PAGE_SIZE-1))
 
+#ifdef CONFIG_HUGEPAGE_SIZE_2MB
+/* we have 2MB hugepage for two 1MB section mapping */
+#define HPAGE_SHIFT		(SECTION_SHIFT + 1)
+#define HPAGE_SIZE		(_AC(1, UL) << HPAGE_SHIFT)
+#define HPAGE_MASK		(~(HPAGE_SIZE - 1))
+#define HUGETLB_PAGE_ORDER	(HPAGE_SHIFT - PAGE_SHIFT)
+#endif
+
+#ifdef CONFIG_HUGEPAGE_SIZE_16MB
+#define HPAGE_SHIFT		SUPERSECTION_SHIFT
+#define HPAGE_SIZE		(_AC(1, UL) << HPAGE_SHIFT)
+#define HPAGE_MASK		(~(HPAGE_SIZE - 1))
+#define HUGETLB_PAGE_ORDER	(HPAGE_SHIFT - PAGE_SHIFT)
+#endif
+
 #ifndef __ASSEMBLY__
 
 #ifndef CONFIG_MMU
diff --git a/arch/arm/mm/hugetlb.c b/arch/arm/mm/hugetlb.c
new file mode 100644
index 0000000..fe7d787
--- /dev/null
+++ b/arch/arm/mm/hugetlb.c
@@ -0,0 +1,187 @@
+/*
+ * hugetlb.c, ARM Huge Tlb Page support.
+ *
+ * Copyright (c) Bill Carson
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+ * See the GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ *
+ */
+
+#include <linux/hugetlb.h>
+
+pte_t *huge_pte_alloc_section(struct mm_struct *mm, unsigned long addr,
+		      unsigned long sz)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd = NULL;
+	int col, row;
+	pte_t **huge_linuxpte = &mm->huge_2m_pte[0];
+
+	/* check this mapping exist at pmd level */
+	pgd = pgd_offset(mm, addr);
+	if (pgd_present(*pgd)) {
+		pud = pud_offset(pgd, addr);
+		pmd = pmd_offset(pud, addr);
+		if (*pmd & PMD_TYPE_TABLE) {
+			printk(KERN_ERR "alloc huge pte for non huge mapping!\n");
+			BUG();
+		}
+	}
+
+	row = HUGEPAGE_2M_PTE_ARRAY_ROW(addr);
+	col = HUGEPAGE_2M_PTE_ARRAY_COL(addr);
+
+	if (huge_linuxpte[row] == 0)
+		huge_linuxpte[row] = (pte_t *) __get_free_page(PGALLOC_GFP);
+
+	return &huge_linuxpte[row][col];
+}
+
+pte_t *huge_pte_alloc_supersection(struct mm_struct *mm, unsigned long addr,
+		      unsigned long sz)
+{
+	pte_t *linuxpte_super = mm->huge_16m_pte;
+	size_t size = ((SUPERSECTION_MASK >> SUPERSECTION_SHIFT) + 1) *
+			sizeof(pte_t *);
+	int index ;
+
+	if (linuxpte_super == NULL) {
+		linuxpte_super = kzalloc(size, GFP_ATOMIC);
+		if (linuxpte_super == NULL) {
+			printk(KERN_ERR "Cannot allocate memory for pte\n");
+			return NULL;
+		}
+		mm->huge_16m_pte = linuxpte_super;
+	}
+
+	index = HUGEPAGE_16M_PTE_ARRAY_INDEX(addr);
+	return &linuxpte_super[index];
+}
+
+
+pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr,
+		      unsigned long sz)
+{
+	if (sz == (SECTION_SIZE*2))
+		return huge_pte_alloc_section(mm, addr, sz);
+	else
+		return huge_pte_alloc_supersection(mm, addr, sz);
+}
+
+pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd = NULL;
+
+	/* check this mapping exist at pmd level */
+	pgd = pgd_offset(mm, addr);
+	if (pgd_present(*pgd)) {
+		pud = pud_offset(pgd, addr);
+		pmd = pmd_offset(pud, addr);
+		if (!pmd_present(*pmd))
+			return NULL;
+	}
+
+	BUG_ON((*pmd & PMD_TYPE_MASK) != PMD_TYPE_SECT);
+
+	/* then return linux version pte addr */
+	if ((*pmd & PMD_SECT_SUPER) == 0) {
+		/* section */
+		int col, row;
+		pte_t **huge_linuxpte = &mm->huge_2m_pte[0];
+		row = HUGEPAGE_2M_PTE_ARRAY_ROW(addr);
+		col = HUGEPAGE_2M_PTE_ARRAY_COL(addr);
+
+		return huge_linuxpte[row] ? &huge_linuxpte[row][col] : NULL;
+	} else{
+		/* supersection */
+		pte_t *linuxpte_super = mm->huge_16m_pte;
+		int index ;
+		index = HUGEPAGE_16M_PTE_ARRAY_INDEX(addr);
+		return linuxpte_super ? &linuxpte_super[index] : NULL;
+	}
+
+}
+
+int huge_pmd_unshare(struct mm_struct *mm, unsigned long *addr, pte_t *ptep)
+{
+	return 0;
+}
+
+struct page *follow_huge_addr(struct mm_struct *mm, unsigned long address,
+				int write)
+{
+	return ERR_PTR(-EINVAL);
+}
+
+int pmd_huge(pmd_t pmd)
+{
+	return (pmd_val(pmd) & PMD_TYPE_MASK) == PMD_TYPE_SECT;
+}
+
+int pud_huge(pud_t pud)
+{
+	return  0;
+}
+
+struct page *
+follow_huge_pmd(struct mm_struct *mm, unsigned long address,
+		pmd_t *pmd, int write)
+{
+	struct page *page = NULL;
+	unsigned long pfn;
+
+	BUG_ON((pmd_val(*pmd) & PMD_TYPE_MASK) != PMD_TYPE_SECT);
+	
+	if ((pmd_val(*pmd) & PMD_SECT_SUPER) == 0)
+		 pfn = (pmd_val(*pmd) & SECTION_MASK) >> PAGE_SHIFT;
+	else
+		 pfn = (pmd_val(*pmd) & SUPERSECTION_MASK) >> PAGE_SHIFT;
+
+	page = pfn_to_page(pfn);
+	return page;
+}
+
+static int __init add_huge_page_size(unsigned long long size)
+{
+	int shift = __ffs(size);
+
+	/* Check that it is a page size supported by the hardware and
+	 * that it fits within pagetable and slice limits. */
+	if (!is_power_of_2(size) ||
+	    ((shift != (SECTION_SHIFT+1)) && (shift != SUPERSECTION_SHIFT)))
+		return -EINVAL;
+
+	/* Return if huge page size has already been setup */
+	if (size_to_hstate(size))
+		return 0;
+
+	hugetlb_add_hstate(shift - PAGE_SHIFT);
+	return 0;
+}
+
+static int __init hugepage_setup_sz(char *str)
+{
+	unsigned long long size;
+
+	size = memparse(str, &str);
+	if (add_huge_page_size(size) != 0)
+		printk(KERN_WARNING "Invalid huge page size specified(%llu)\n",
+			 size);
+
+	return 1;
+}
+__setup("hugepagesz=", hugepage_setup_sz);
-- 
1.7.1


* [PATCH 2/7] Add various hugetlb page table fix
  2012-01-30  7:57 [RFC] ARM hugetlb support bill4carson at gmail.com
  2012-01-30  7:57 ` [PATCH 1/7] Add various hugetlb arm high level hooks bill4carson at gmail.com
@ 2012-01-30  7:57 ` bill4carson at gmail.com
  2012-01-31  9:57   ` Catalin Marinas
  2012-01-31  9:58   ` Russell King - ARM Linux
  2012-01-30  7:57 ` [PATCH 3/7] Introduce set_hugepte_ext api for huge page hardware page table setup bill4carson at gmail.com
                   ` (5 subsequent siblings)
  7 siblings, 2 replies; 50+ messages in thread
From: bill4carson at gmail.com @ 2012-01-30  7:57 UTC (permalink / raw)
  To: linux-arm-kernel

From: Bill Carson <bill4carson@gmail.com>

    - Add L_PTE_HUGEPAGE to mark huge pages
    - Modify pte_pfn for hugetlb
    - Add set_hugepte_at for use by the hugetlb ARM high-level hooks

Signed-off-by: Bill Carson <bill4carson@gmail.com>
---
 arch/arm/include/asm/pgtable-2level.h |    8 ++++++++
 arch/arm/include/asm/pgtable.h        |   28 ++++++++++++++++++++++++++++
 2 files changed, 36 insertions(+), 0 deletions(-)

diff --git a/arch/arm/include/asm/pgtable-2level.h b/arch/arm/include/asm/pgtable-2level.h
index 2317a71..062c93c 100644
--- a/arch/arm/include/asm/pgtable-2level.h
+++ b/arch/arm/include/asm/pgtable-2level.h
@@ -123,6 +123,11 @@
 #define L_PTE_USER		(_AT(pteval_t, 1) << 8)
 #define L_PTE_XN		(_AT(pteval_t, 1) << 9)
 #define L_PTE_SHARED		(_AT(pteval_t, 1) << 10)	/* shared(v6), coherent(xsc3) */
+#ifdef CONFIG_ARM_HUGETLB_SUPPORT
+#define L_PTE_HUGEPAGE	(_AT(pteval_t, 1) << 11) /* mark hugepage */
+#define L_PTE_HPAGE_2M  (_AT(pteval_t, 1) << 12) /* only when HUGEPAGE set */
+#define L_PTE_HPAGE_16M (_AT(pteval_t, 1) << 13) /* only when HUGEPAGE set */
+#endif
 
 /*
  * These are the memory types, defined to be compatible with
@@ -178,6 +183,9 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
 #define pmd_addr_end(addr,end) (end)
 
 #define set_pte_ext(ptep,pte,ext) cpu_set_pte_ext(ptep,pte,ext)
+#ifdef CONFIG_ARM_HUGETLB_SUPPORT
+#define set_hugepte_ext(ptep,pte,ext) cpu_set_hugepte_ext(ptep,pte,ext)
+#endif
 
 #endif /* __ASSEMBLY__ */
 
diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index f66626d..da875d8 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -187,7 +187,21 @@ static inline pte_t *pmd_page_vaddr(pmd_t pmd)
 #define pte_offset_map(pmd,addr)	(__pte_map(pmd) + pte_index(addr))
 #define pte_unmap(pte)			__pte_unmap(pte)
 
+#ifdef CONFIG_ARM_HUGETLB_SUPPORT
+
+#ifdef CONFIG_HUGEPAGE_SIZE_2MB
+#define hugepte_pfn(pte)    ((pte_val(pte) & SECTION_MASK) >> PAGE_SHIFT)
+#endif
+#ifdef CONFIG_HUGEPAGE_SIZE_16MB
+#define hugepte_pfn(pte)    ((pte_val(pte) & SUPERSECTION_MASK) >> PAGE_SHIFT)
+#endif
+#define pte_is_huge(pte)	(pte_val(pte) & L_PTE_HUGEPAGE)
+#define pte_pfn(pte)	    (pte_is_huge(pte) ? \
+				hugepte_pfn(pte) : ((pte_val(pte) & PHYS_MASK) >> PAGE_SHIFT))
+#else
 #define pte_pfn(pte)		((pte_val(pte) & PHYS_MASK) >> PAGE_SHIFT)
+#endif /*!CONFIG_ARM_HUGETLB_SUPPORT*/
+
 #define pfn_pte(pfn,prot)	__pte(__pfn_to_phys(pfn) | pgprot_val(prot))
 
 #define pte_page(pte)		pfn_to_page(pte_pfn(pte))
@@ -213,6 +227,14 @@ static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
 		set_pte_ext(ptep, pteval, PTE_EXT_NG);
 	}
 }
+#ifdef CONFIG_ARM_HUGETLB_SUPPORT
+static inline void set_hugepte_at(struct mm_struct *mm, unsigned long addr,
+			      pte_t *ptep, pte_t pteval)
+{
+	__sync_icache_dcache(pteval);
+	set_hugepte_ext(ptep, pteval, PTE_EXT_NG);
+}
+#endif
 
 #define pte_none(pte)		(!pte_val(pte))
 #define pte_present(pte)	(pte_val(pte) & L_PTE_PRESENT)
@@ -235,6 +257,12 @@ PTE_BIT_FUNC(mkclean,   &= ~L_PTE_DIRTY);
 PTE_BIT_FUNC(mkdirty,   |= L_PTE_DIRTY);
 PTE_BIT_FUNC(mkold,     &= ~L_PTE_YOUNG);
 PTE_BIT_FUNC(mkyoung,   |= L_PTE_YOUNG);
+#ifdef CONFIG_HUGEPAGE_SIZE_2MB
+PTE_BIT_FUNC(mkhuge,    |= L_PTE_HUGEPAGE | L_PTE_HPAGE_2M);
+#endif
+#ifdef CONFIG_HUGEPAGE_SIZE_16MB
+PTE_BIT_FUNC(mkhuge,    |= L_PTE_HUGEPAGE | L_PTE_HPAGE_16M);
+#endif
 
 static inline pte_t pte_mkspecial(pte_t pte) { return pte; }
 
-- 
1.7.1


* [PATCH 3/7] Introduce set_hugepte_ext api for huge page hardware page table setup
  2012-01-30  7:57 [RFC] ARM hugetlb support bill4carson at gmail.com
  2012-01-30  7:57 ` [PATCH 1/7] Add various hugetlb arm high level hooks bill4carson at gmail.com
  2012-01-30  7:57 ` [PATCH 2/7] Add various hugetlb page table fix bill4carson at gmail.com
@ 2012-01-30  7:57 ` bill4carson at gmail.com
  2012-01-30  7:57 ` [PATCH 4/7] Store huge page linux pte in mm_struct bill4carson at gmail.com
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 50+ messages in thread
From: bill4carson at gmail.com @ 2012-01-30  7:57 UTC (permalink / raw)
  To: linux-arm-kernel

From: Bill Carson <bill4carson@gmail.com>

All hugetlb ARM high-level hooks eventually reach this routine via set_hugepte_at.
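
For context, the call path wired up by patches 1-3 looks roughly like this
(CPU_NAME expands to cpu_v7 on ARMv7, so the glue macro resolves to the
routine added by this patch):

	set_huge_pte_at()                /* asm/hugetlb.h, patch 1 */
	  -> set_hugepte_at()            /* asm/pgtable.h, patch 2 */
	    -> set_hugepte_ext()         /* asm/pgtable-2level.h, patch 2 */
	      -> cpu_v7_set_hugepte_ext  /* arch/arm/mm/proc-v7-2level.S, this patch */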

Signed-off-by: Bill Carson <bill4carson@gmail.com>
---
 arch/arm/include/asm/glue-proc.h |    3 +
 arch/arm/include/asm/proc-fns.h  |    3 +
 arch/arm/mm/proc-v7-2level.S     |   96 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 102 insertions(+), 0 deletions(-)

diff --git a/arch/arm/include/asm/glue-proc.h b/arch/arm/include/asm/glue-proc.h
index e2be7f1..2bbd452 100644
--- a/arch/arm/include/asm/glue-proc.h
+++ b/arch/arm/include/asm/glue-proc.h
@@ -256,6 +256,9 @@
 #define cpu_dcache_clean_area		__glue(CPU_NAME,_dcache_clean_area)
 #define cpu_do_switch_mm		__glue(CPU_NAME,_switch_mm)
 #define cpu_set_pte_ext			__glue(CPU_NAME,_set_pte_ext)
+#ifdef CONFIG_ARM_HUGETLB_SUPPORT
+#define cpu_set_hugepte_ext		__glue(CPU_NAME,_set_hugepte_ext)
+#endif
 #define cpu_suspend_size		__glue(CPU_NAME,_suspend_size)
 #define cpu_do_suspend			__glue(CPU_NAME,_do_suspend)
 #define cpu_do_resume			__glue(CPU_NAME,_do_resume)
diff --git a/arch/arm/include/asm/proc-fns.h b/arch/arm/include/asm/proc-fns.h
index f3628fb..75bd755 100644
--- a/arch/arm/include/asm/proc-fns.h
+++ b/arch/arm/include/asm/proc-fns.h
@@ -87,6 +87,9 @@ extern void cpu_do_switch_mm(unsigned long pgd_phys, struct mm_struct *mm);
 extern void cpu_set_pte_ext(pte_t *ptep, pte_t pte);
 #else
 extern void cpu_set_pte_ext(pte_t *ptep, pte_t pte, unsigned int ext);
+#ifdef CONFIG_ARM_HUGETLB_SUPPORT
+extern void cpu_set_hugepte_ext(pte_t *ptep, pte_t pte, unsigned int ext);
+#endif
 #endif
 extern void cpu_reset(unsigned long addr) __attribute__((noreturn));
 
diff --git a/arch/arm/mm/proc-v7-2level.S b/arch/arm/mm/proc-v7-2level.S
index 3a4b3e7..7aee5b5 100644
--- a/arch/arm/mm/proc-v7-2level.S
+++ b/arch/arm/mm/proc-v7-2level.S
@@ -111,6 +111,102 @@ ENTRY(cpu_v7_set_pte_ext)
 	mov	pc, lr
 ENDPROC(cpu_v7_set_pte_ext)
 
+#ifdef CONFIG_ARM_HUGETLB_SUPPORT
+ENTRY(cpu_v7_set_hugepte_ext)
+set_hugepage:
+	@mask out AP[2:0] TEX[2:0] in first level section descriptor
+	bic	r3, r1, #0x0000fc00
+	@clear NX/IMP
+	bic	r3, r3,#0x210
+
+	@clear BIT1:0
+	bic	r3, r3, #PMD_TYPE_MASK
+
+	@set extension bit
+	orr	r3, r3, #PMD_SECT_nG @HUGEPAGE always non-global
+
+	@set SECT mapping,1M section or 16M supersection
+	orr	r3, r3, #PMD_SECT_AP_WRITE
+	orr	r3, r3, #PMD_TYPE_SECT
+
+	@BIT18 1: 16M supersection 0: 1M section
+	bic	r3, r3, #PMD_SECT_SUPER
+
+	@ shared bit
+	tst	r1,#L_PTE_SHARED
+	orrne	r3,r3,#PMD_SECT_S
+
+	@ shared device ?
+	tst	r1, #1 << 4
+	orrne	r3, r3, #PMD_SECT_TEX(1)
+
+	eor	r1, r1,  #L_PTE_DIRTY
+	tst	r1, #L_PTE_RDONLY | L_PTE_DIRTY
+	orrne	r3, r3, #PMD_SECT_APX
+
+	tst	r1, #L_PTE_USER
+	orrne	r3, r3, #PMD_SECT_AP_READ
+#ifdef CONFIG_CPU_USE_DOMAINS
+	tstne	r3, #PMD_SECT_APX
+	bicne	r3, r3, #PMD_SECT_APX | PMD_SECT_AP_WRITE
+#endif
+
+	@set domain
+	bic 	r3, r3, #(0xf << 5)
+	orr	r3, r3, #PMD_DOMAIN(0x1)
+
+	@ for supersection mapping
+	@ clear domain setting and extend addr
+	@ set BIT18 to denote supersection
+	ldr 	r2,=L_PTE_HPAGE_2M
+	tst	r1,r2
+	bne	page_2M
+
+	@ if not 2M huge page, then it's 16M
+page_16M:
+	ldr	r2, =((0xf << 5) | (0xf << 20))
+	bic	r3, r3, r2
+	orr	r3, r3, #PMD_SECT_SUPER
+
+page_2M:
+	tst	r1, #L_PTE_XN
+	orrne	r3, r3, #PMD_SECT_XN
+
+	tst	r1, #L_PTE_YOUNG
+	tstne	r1, #L_PTE_PRESENT
+	moveq	r3, #0
+
+	ldr     r2,=L_PTE_HPAGE_2M
+	tst	r1, r2
+	bne     setpte_2M
+
+	@ set 16M huge page
+setpte_16M:
+	.rept 15
+ 	str r3, [r0]
+	mcr	p15, 0, r0, c7, c10, 1		@ flush_pte
+	add	r0, r0, #4
+	.endr
+	b	out
+
+	@ set 2M huge page
+setpte_2M:
+	@ 1st 1MB mapping
+ 	str r3, [r0]
+	mcr	p15, 0, r0, c7, c10, 1		@ flush_pte
+	@ 2nd 1MB mapping
+	cmp	r3,#0
+	movne	r2,#0x100000
+	addne	r3, r3, r2
+	add	r0, r0, #4
+out:
+ 	str r3, [r0]
+	mcr	p15, 0, r0, c7, c10, 1		@ flush_pte
+	mov pc, lr
+ENDPROC(cpu_v7_set_hugepte_ext)
+
+#endif /* CONFIG_ARM_HUGETLB_SUPPORT */
+
 	/*
 	 * Memory region attributes with SCTLR.TRE=1
 	 *
-- 
1.7.1


* [PATCH 4/7] Store huge page linux pte in mm_struct
  2012-01-30  7:57 [RFC] ARM hugetlb support bill4carson at gmail.com
                   ` (2 preceding siblings ...)
  2012-01-30  7:57 ` [PATCH 3/7] Introduce set_hugepte_ext api for huge page hardware page table setup bill4carson at gmail.com
@ 2012-01-30  7:57 ` bill4carson at gmail.com
  2012-01-31  9:37   ` Catalin Marinas
  2012-01-31 10:01   ` Russell King - ARM Linux
  2012-01-30  7:57 ` [PATCH 5/7] Using do_page_fault for section fault handling bill4carson at gmail.com
                   ` (3 subsequent siblings)
  7 siblings, 2 replies; 50+ messages in thread
From: bill4carson at gmail.com @ 2012-01-30  7:57 UTC (permalink / raw)
  To: linux-arm-kernel

From: Bill Carson <bill4carson@gmail.com>

The huge page Linux ptes are stored in the mm_struct rather than in
thread_info. When a parent task with a huge page VMA calls fork, the
parent's huge page pagetable entries are copied into the child's pagetable
in:

int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
			    struct vm_area_struct *vma)

Only struct mm_struct *dst is available here, and the child's thread_info
cannot be derived from it; finding the owning task_struct (and from it the
thread_info) would mean walking the global task list and comparing each
task_struct->mm against dst. Storing the huge page Linux ptes in mm_struct
keeps the lookup direct and efficient.
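
As a rough sketch of the resulting lookup (the helper name is made up; the
indexing mirrors huge_pte_offset in patch 1):

/* Given only an mm_struct, the huge page Linux pte is reachable directly,
 * without searching the global task list for an owning task. */
static pte_t *huge_2m_linuxpte(struct mm_struct *mm, unsigned long addr)
{
	int row = HUGEPAGE_2M_PTE_ARRAY_ROW(addr);
	int col = HUGEPAGE_2M_PTE_ARRAY_COL(addr);

	return mm->huge_2m_pte[row] ? &mm->huge_2m_pte[row][col] : NULL;
}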

Signed-off-by: Bill Carson <bill4carson@gmail.com>
---
 arch/arm/mm/pgd.c        |   28 ++++++++++++++++++++++++++++
 include/linux/mm_types.h |   11 +++++++++++
 2 files changed, 39 insertions(+), 0 deletions(-)

diff --git a/arch/arm/mm/pgd.c b/arch/arm/mm/pgd.c
index a3e78cc..b04a69a 100644
--- a/arch/arm/mm/pgd.c
+++ b/arch/arm/mm/pgd.c
@@ -91,6 +91,14 @@ pgd_t *pgd_alloc(struct mm_struct *mm)
 		pte_unmap(new_pte);
 	}
 
+#ifdef CONFIG_ARM_HUGETLB_SUPPORT
+	/* reset the hugepage linux pte pointer
+	 * for new mm_struct when we do the fork
+	 */
+	mm->huge_2m_pte[HUGE_2M_PTE_1ST_ARRAY] = 0;
+	mm->huge_2m_pte[HUGE_2M_PTE_2ND_ARRAY] = 0;
+	mm->huge_16m_pte   = 0;
+#endif
 	return new_pgd;
 
 no_pte:
@@ -103,6 +111,25 @@ no_pgd:
 	return NULL;
 }
 
+#ifdef CONFIG_ARM_HUGETLB_SUPPORT
+static void free_huge_linuxpte(struct mm_struct *mm)
+{
+	pte_t **huge_linuxpte = &mm->huge_2m_pte[0];
+	int i;
+
+	for (i = 0; i < HUGE_2M_PTE_SIZE; i++)
+		if (huge_linuxpte[i] != 0)
+			free_page((unsigned long)huge_linuxpte[i]);
+
+	if (mm->huge_16m_pte != NULL)
+		kfree(mm->huge_16m_pte);
+}
+#else
+static void free_huge_linuxpte(struct mm_struct *mm)
+{
+}
+#endif
+
 void pgd_free(struct mm_struct *mm, pgd_t *pgd_base)
 {
 	pgd_t *pgd;
@@ -135,6 +162,7 @@ no_pud:
 	pgd_clear(pgd);
 	pud_free(mm, pud);
 no_pgd:
+	free_huge_linuxpte(mm);
 #ifdef CONFIG_ARM_LPAE
 	/*
 	 * Free modules/pkmap or identity pmd tables.
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 3cc3062..88f76e6 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -23,6 +23,11 @@
 struct address_space;
 
 #define USE_SPLIT_PTLOCKS	(NR_CPUS >= CONFIG_SPLIT_PTLOCK_CPUS)
+#ifdef CONFIG_ARM_HUGETLB_SUPPORT
+#define HUGE_2M_PTE_SIZE      2
+#define HUGE_2M_PTE_1ST_ARRAY 0
+#define HUGE_2M_PTE_2ND_ARRAY 1
+#endif
 
 /*
  * Each physical page in the system has a struct page associated with
@@ -388,6 +393,12 @@ struct mm_struct {
 #ifdef CONFIG_CPUMASK_OFFSTACK
 	struct cpumask cpumask_allocation;
 #endif
+
+#ifdef CONFIG_ARM_HUGETLB_SUPPORT
+	/* we place hugepage linux pte at mm_struct  */
+	pte_t *huge_2m_pte[HUGE_2M_PTE_SIZE];
+	pte_t *huge_16m_pte;
+#endif
 };
 
 static inline void mm_init_cpumask(struct mm_struct *mm)
-- 
1.7.1


* [PATCH 5/7] Using do_page_fault for section fault handling
  2012-01-30  7:57 [RFC] ARM hugetlb support bill4carson at gmail.com
                   ` (3 preceding siblings ...)
  2012-01-30  7:57 ` [PATCH 4/7] Store huge page linux pte in mm_struct bill4carson at gmail.com
@ 2012-01-30  7:57 ` bill4carson at gmail.com
  2012-01-30  7:57 ` [PATCH 6/7] Add hugetlb Kconfig option bill4carson at gmail.com
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 50+ messages in thread
From: bill4carson at gmail.com @ 2012-01-30  7:57 UTC (permalink / raw)
  To: linux-arm-kernel

From: Bill Carson <bill4carson@gmail.com>

Signed-off-by: Bill Carson <bill4carson@gmail.com>
---
 arch/arm/mm/fault.c |   15 +++++++++++++++
 1 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index bb7eac3..af6703d 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -19,6 +19,7 @@
 #include <linux/sched.h>
 #include <linux/highmem.h>
 #include <linux/perf_event.h>
+#include <linux/hugetlb.h>
 
 #include <asm/exception.h>
 #include <asm/system.h>
@@ -485,6 +486,7 @@ do_translation_fault(unsigned long addr, unsigned int fsr,
 }
 #endif					/* CONFIG_MMU */
 
+#ifndef CONFIG_ARM_HUGETLB_SUPPORT
 /*
  * Some section permission faults need to be handled gracefully.
  * They can happen due to a __{get,put}_user during an oops.
@@ -496,6 +498,19 @@ do_sect_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
 	return 0;
 }
 
+#else
+
+/* A normal 4K page based vma never faults into the section trap,
+ * so we can use do_page_fault to handle section permission faults.
+ */
+static int
+do_sect_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
+{
+	do_page_fault(addr, fsr, regs);
+	return 0;
+}
+#endif
+
 /*
  * This abort handler always returns "fault".
  */
-- 
1.7.1


* [PATCH 6/7] Add hugetlb Kconfig option
  2012-01-30  7:57 [RFC] ARM hugetlb support bill4carson at gmail.com
                   ` (4 preceding siblings ...)
  2012-01-30  7:57 ` [PATCH 5/7] Using do_page_fault for section fault handling bill4carson at gmail.com
@ 2012-01-30  7:57 ` bill4carson at gmail.com
  2012-01-30  7:57 ` [PATCH 7/7] Minor compiling fix bill4carson at gmail.com
  2012-01-31  9:29 ` [RFC] ARM hugetlb support Catalin Marinas
  7 siblings, 0 replies; 50+ messages in thread
From: bill4carson at gmail.com @ 2012-01-30  7:57 UTC (permalink / raw)
  To: linux-arm-kernel

From: Bill Carson <bill4carson@gmail.com>

Signed-off-by: Bill Carson <bill4carson@gmail.com>
---
 arch/arm/Kconfig |   29 +++++++++++++++++++++++++++++
 1 files changed, 29 insertions(+), 0 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index a48aecc..161bca6 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -985,6 +985,35 @@ config ARCH_ZYNQ
 	  Support for Xilinx Zynq ARM Cortex A9 Platform
 endchoice
 
+config SYS_SUPPORTS_HUGETLBFS
+		def_bool n
+
+config ARM_HUGETLB_SUPPORT
+	bool "Support HUGETLB for ARMv7 (EXPERIMENTAL)"
+	depends on CPU_V7 && EXPERIMENTAL
+	select SYS_SUPPORTS_HUGETLBFS
+	select HUGETLBFS
+	default y
+
+choice
+	prompt "Huge Page Size"
+	depends on ARM_HUGETLB_SUPPORT
+	default HUGEPAGE_SIZE_2MB
+
+config HUGEPAGE_SIZE_2MB
+	bool "2MB"
+	depends on ARM_HUGETLB_SUPPORT
+	help
+	 This option selects a 2MB huge page size (two 1MB sections).
+
+config HUGEPAGE_SIZE_16MB
+	bool "16MB"
+	depends on ARM_HUGETLB_SUPPORT
+	help
+	  This option selects a 16MB (supersection) huge page size.
+
+endchoice
+
 #
 # This is sorted alphabetically by mach-* pathname.  However, plat-*
 # Kconfigs may be included either alphabetically (according to the
-- 
1.7.1


* [PATCH 7/7] Minor compiling fix
  2012-01-30  7:57 [RFC] ARM hugetlb support bill4carson at gmail.com
                   ` (5 preceding siblings ...)
  2012-01-30  7:57 ` [PATCH 6/7] Add hugetlb Kconfig option bill4carson at gmail.com
@ 2012-01-30  7:57 ` bill4carson at gmail.com
  2012-01-31  9:29 ` [RFC] ARM hugetlb support Catalin Marinas
  7 siblings, 0 replies; 50+ messages in thread
From: bill4carson at gmail.com @ 2012-01-30  7:57 UTC (permalink / raw)
  To: linux-arm-kernel

From: Bill Carson <bill4carson@gmail.com>

Signed-off-by: Bill Carson <bill4carson@gmail.com>
---
 arch/arm/mm/Makefile      |    1 +
 arch/arm/mm/dma-mapping.c |    3 ---
 2 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/arch/arm/mm/Makefile b/arch/arm/mm/Makefile
index bca7e61..9348e1e 100644
--- a/arch/arm/mm/Makefile
+++ b/arch/arm/mm/Makefile
@@ -99,4 +99,5 @@ AFLAGS_proc-v7.o	:=-Wa,-march=armv7-a
 obj-$(CONFIG_CACHE_FEROCEON_L2)	+= cache-feroceon-l2.o
 obj-$(CONFIG_CACHE_L2X0)	+= cache-l2x0.o
 obj-$(CONFIG_CACHE_XSC3L2)	+= cache-xsc3l2.o
+obj-$(CONFIG_HUGETLB_PAGE)	+= hugetlb.o
 obj-$(CONFIG_CACHE_TAUROS2)	+= cache-tauros2.o
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 1aa664a..8dc5fb4 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -152,9 +152,6 @@ static struct arm_vmregion_head consistent_head = {
 	.vm_end		= CONSISTENT_END,
 };
 
-#ifdef CONFIG_HUGETLB_PAGE
-#error ARM Coherent DMA allocator does not (yet) support huge TLB
-#endif
 
 /*
  * Initialise the consistent memory allocation.
-- 
1.7.1


* [RFC] ARM hugetlb support
  2012-01-30  7:57 [RFC] ARM hugetlb support bill4carson at gmail.com
                   ` (6 preceding siblings ...)
  2012-01-30  7:57 ` [PATCH 7/7] Minor compiling fix bill4carson at gmail.com
@ 2012-01-31  9:29 ` Catalin Marinas
  2012-02-01  1:56   ` bill4carson
  7 siblings, 1 reply; 50+ messages in thread
From: Catalin Marinas @ 2012-01-31  9:29 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On 30 January 2012 07:57,  <bill4carson@gmail.com> wrote:
> This patch aims to support huge page for ARM. For now, 2MB(two 1MB page)/16MB
> huge page are supported, Versatile Express Cortex-A9x4 tile is used as test
> board. Verifications are running with libhugetlbfs and ltp.

I haven't reviewed your patches yet but just a few thoughts. Do you
have a clear case where 16MB super-sections are needed? Do you have
memory beyond 4G that can only be accessed with super-sections? This
is an optional feature and not all processors have it. It also doesn't
help much with the TLB hit rate since the micro-TLB most likely only
supports 1MB sections.

BTW, I have a (simpler) implementation of hugetlbfs with 2MB sections
but for LPAE only:

http://git.kernel.org/?p=linux/kernel/git/cmarinas/linux-arm-arch.git;a=shortlog;h=refs/heads/hugetlb

-- 
Catalin


* [PATCH 4/7] Store huge page linux pte in mm_struct
  2012-01-30  7:57 ` [PATCH 4/7] Store huge page linux pte in mm_struct bill4carson at gmail.com
@ 2012-01-31  9:37   ` Catalin Marinas
  2012-01-31 10:01   ` Russell King - ARM Linux
  1 sibling, 0 replies; 50+ messages in thread
From: Catalin Marinas @ 2012-01-31  9:37 UTC (permalink / raw)
  To: linux-arm-kernel

On 30 January 2012 07:57,  <bill4carson@gmail.com> wrote:
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 3cc3062..88f76e6 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -23,6 +23,11 @@
>  struct address_space;
>
>  #define USE_SPLIT_PTLOCKS	(NR_CPUS >= CONFIG_SPLIT_PTLOCK_CPUS)
> +#ifdef CONFIG_ARM_HUGETLB_SUPPORT
> +#define HUGE_2M_PTE_SIZE      2
> +#define HUGE_2M_PTE_1ST_ARRAY 0
> +#define HUGE_2M_PTE_2ND_ARRAY 1
> +#endif
>
>  /*
>   * Each physical page in the system has a struct page associated with
> @@ -388,6 +393,12 @@ struct mm_struct {
>  #ifdef CONFIG_CPUMASK_OFFSTACK
>  	struct cpumask cpumask_allocation;
>  #endif
> +
> +#ifdef CONFIG_ARM_HUGETLB_SUPPORT
> +	/* we place hugepage linux pte at mm_struct  */
> +	pte_t *huge_2m_pte[HUGE_2M_PTE_SIZE];
> +	pte_t *huge_16m_pte;
> +#endif
>  };
>
>  static inline void mm_init_cpumask(struct mm_struct *mm)

Please don't touch generic code. There is mm_context_t defined in
arch/arm/include/asm/mmu.h, just add the stuff you need there.

-- 
Catalin


* [PATCH 2/7] Add various hugetlb page table fix
  2012-01-30  7:57 ` [PATCH 2/7] Add various hugetlb page table fix bill4carson at gmail.com
@ 2012-01-31  9:57   ` Catalin Marinas
  2012-01-31  9:58   ` Russell King - ARM Linux
  1 sibling, 0 replies; 50+ messages in thread
From: Catalin Marinas @ 2012-01-31  9:57 UTC (permalink / raw)
  To: linux-arm-kernel

On 30 January 2012 07:57,  <bill4carson@gmail.com> wrote:
> diff --git a/arch/arm/include/asm/pgtable-2level.h b/arch/arm/include/asm/pgtable-2level.h
> index 2317a71..062c93c 100644
> --- a/arch/arm/include/asm/pgtable-2level.h
> +++ b/arch/arm/include/asm/pgtable-2level.h
> @@ -123,6 +123,11 @@
>  #define L_PTE_USER		(_AT(pteval_t, 1) << 8)
>  #define L_PTE_XN		(_AT(pteval_t, 1) << 9)
>  #define L_PTE_SHARED		(_AT(pteval_t, 1) << 10)	/* shared(v6), coherent(xsc3) */
> +#ifdef CONFIG_ARM_HUGETLB_SUPPORT
> +#define L_PTE_HUGEPAGE	(_AT(pteval_t, 1) << 11) /* mark hugepage */

Do we actually need this bit? The checks are done starting from the
top pgd/pud/pmd. So we stop at the pmd level and check whether it is a
table or a section mapping. I don't think we ever check a pte entry
for whether it is huge or not (as, at least with the implementation
you posted, it does not support 64K pages).

> +#define L_PTE_HPAGE_2M  (_AT(pteval_t, 1) << 12) /* only when HUGEPAGE set */
> +#define L_PTE_HPAGE_16M (_AT(pteval_t, 1) << 13) /* only when HUGEPAGE set */

If we go for 2MB only, we can ignore both definitions here.

-- 
Catalin


* [PATCH 2/7] Add various hugetlb page table fix
  2012-01-30  7:57 ` [PATCH 2/7] Add various hugetlb page table fix bill4carson at gmail.com
  2012-01-31  9:57   ` Catalin Marinas
@ 2012-01-31  9:58   ` Russell King - ARM Linux
  2012-01-31 12:25     ` Catalin Marinas
  1 sibling, 1 reply; 50+ messages in thread
From: Russell King - ARM Linux @ 2012-01-31  9:58 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jan 30, 2012 at 03:57:13PM +0800, bill4carson at gmail.com wrote:
> diff --git a/arch/arm/include/asm/pgtable-2level.h b/arch/arm/include/asm/pgtable-2level.h
> index 2317a71..062c93c 100644
> --- a/arch/arm/include/asm/pgtable-2level.h
> +++ b/arch/arm/include/asm/pgtable-2level.h
> @@ -123,6 +123,11 @@
>  #define L_PTE_USER		(_AT(pteval_t, 1) << 8)
>  #define L_PTE_XN		(_AT(pteval_t, 1) << 9)
>  #define L_PTE_SHARED		(_AT(pteval_t, 1) << 10)	/* shared(v6), coherent(xsc3) */
> +#ifdef CONFIG_ARM_HUGETLB_SUPPORT
> +#define L_PTE_HUGEPAGE	(_AT(pteval_t, 1) << 11) /* mark hugepage */
> +#define L_PTE_HPAGE_2M  (_AT(pteval_t, 1) << 12) /* only when HUGEPAGE set */
> +#define L_PTE_HPAGE_16M (_AT(pteval_t, 1) << 13) /* only when HUGEPAGE set */
> +#endif

(1) How does this work when normal pages can have bit 11 set if they're an
odd PFN?

(2) How do we even get to PTE level when a 2 or 16MB section doesn't have
a pte table (as the L1 entry is used for the section or supersection
mapping) ?


* [PATCH 4/7] Store huge page linux pte in mm_struct
  2012-01-30  7:57 ` [PATCH 4/7] Store huge page linux pte in mm_struct bill4carson at gmail.com
  2012-01-31  9:37   ` Catalin Marinas
@ 2012-01-31 10:01   ` Russell King - ARM Linux
  2012-02-01  5:45     ` bill4carson
  1 sibling, 1 reply; 50+ messages in thread
From: Russell King - ARM Linux @ 2012-01-31 10:01 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jan 30, 2012 at 03:57:15PM +0800, bill4carson at gmail.com wrote:
> From: Bill Carson <bill4carson@gmail.com>
> 
> One easy way to store huge page linux pte is mm_struct instead of thread_info
> that's because when parent task with huge page VMA calls fork, parent huge page
> pagetable entries are copied into child pagetable. This is done in
> 
> int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
> 			    struct vm_area_struct *vma)
> 
> We cannot derive child's thread_info just using struct mm_struct *dst.
> if we have struct mm_struct **dst, then it's easy to find the corresponding
> task_struct as well as thread_info, but we only get struct mm_struct *dst.
> It's possible to find the desired task_struct by iterating the global task list
> by comparing task_struct->mm with dst.
> So mm_struct is used for huge page linux pte for faster lookup and efficient.

I really do not understand this description, and it doesn't seem to tie
up with the code.  What problem are you trying to solve here?

Note that a mm_struct can be shared between multiple task_structs, so
if your thinking is that something in the mm_struct or page table needs
to know about a task_struct, you're ideas are wrong.


* [PATCH 2/7] Add various hugetlb page table fix
  2012-01-31  9:58   ` Russell King - ARM Linux
@ 2012-01-31 12:25     ` Catalin Marinas
  2012-02-01  3:10       ` bill4carson
  0 siblings, 1 reply; 50+ messages in thread
From: Catalin Marinas @ 2012-01-31 12:25 UTC (permalink / raw)
  To: linux-arm-kernel

On 31 January 2012 09:58, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Mon, Jan 30, 2012 at 03:57:13PM +0800, bill4carson at gmail.com wrote:
>> diff --git a/arch/arm/include/asm/pgtable-2level.h b/arch/arm/include/asm/pgtable-2level.h
>> index 2317a71..062c93c 100644
>> --- a/arch/arm/include/asm/pgtable-2level.h
>> +++ b/arch/arm/include/asm/pgtable-2level.h
>> @@ -123,6 +123,11 @@
>>  #define L_PTE_USER		(_AT(pteval_t, 1) << 8)
>>  #define L_PTE_XN		(_AT(pteval_t, 1) << 9)
>>  #define L_PTE_SHARED		(_AT(pteval_t, 1) << 10)	/* shared(v6), coherent(xsc3) */
>> +#ifdef CONFIG_ARM_HUGETLB_SUPPORT
>> +#define L_PTE_HUGEPAGE	(_AT(pteval_t, 1) << 11) /* mark hugepage */
>> +#define L_PTE_HPAGE_2M  (_AT(pteval_t, 1) << 12) /* only when HUGEPAGE set */
>> +#define L_PTE_HPAGE_16M (_AT(pteval_t, 1) << 13) /* only when HUGEPAGE set */
>> +#endif
>
> (1) How does this work when normal pages can have bit 11 set if they're an
> odd PFN?

Isn't that bit 12?

> (2) How do we even get to PTE level when a 2 or 16MB section doesn't have
> a pte table (as the L1 entry is used for the section or supersection
> mapping) ?

We don't, that's why I think we don't even need this bit defined.

-- 
Catalin


* [RFC] ARM hugetlb support
  2012-01-31  9:29 ` [RFC] ARM hugetlb support Catalin Marinas
@ 2012-02-01  1:56   ` bill4carson
  2012-02-02 14:38     ` Catalin Marinas
  0 siblings, 1 reply; 50+ messages in thread
From: bill4carson @ 2012-02-01  1:56 UTC (permalink / raw)
  To: linux-arm-kernel



On 2012-01-31 17:29, Catalin Marinas wrote:
> Hi,
>
> On 30 January 2012 07:57,<bill4carson@gmail.com>  wrote:
>> This patch aims to support huge page for ARM. For now, 2MB(two 1MB page)/16MB
>> huge page are supported, Versatile Express Cortex-A9x4 tile is used as test
>> board. Verifications are running with libhugetlbfs and ltp.
> I haven't reviewed your patches yet but just a few thoughts. Do you
> have a clear case where 16MB super-sections is needed? Do you have
> memory beyond 4G that can only be accessed with super-sections? This
> is an optional feature and not all processors have it.
The initial support only focuses on ARMv7 based processors like the
Cortex-A9; basically that's what I have at hand, one Versatile Express
Cortex-A9x4 tile board.

> It also doesn't
> help much with the TLB hit rate since the micro-TLB most likely only
> support 1MB sections.
>
I am afraid I can't agree with you on this.
True, there is no specific statement about whether the micro-TLB supports
16MB super-sections, but the main TLB does help the hit rate, and the
main TLB supports all page size mappings.


> BTW, I have a (simpler) implementation of hugetlbfs with 2MB sections
> but for LPAE only:
>
> http://git.kernel.org/?p=linux/kernel/git/cmarinas/linux-arm-arch.git;a=shortlog;h=refs/heads/hugetlb
>
Thanks for your information, I will study your code carefully :)



-- 
I am a slow learner
but I will keep trying to fight for my dreams!

--bill


* [PATCH 2/7] Add various hugetlb page table fix
  2012-01-31 12:25     ` Catalin Marinas
@ 2012-02-01  3:10       ` bill4carson
  2012-02-06 16:26         ` Catalin Marinas
  0 siblings, 1 reply; 50+ messages in thread
From: bill4carson @ 2012-02-01  3:10 UTC (permalink / raw)
  To: linux-arm-kernel



On 2012-01-31 20:25, Catalin Marinas wrote:
> On 31 January 2012 09:58, Russell King - ARM Linux
> <linux@arm.linux.org.uk>  wrote:
>> On Mon, Jan 30, 2012 at 03:57:13PM +0800, bill4carson at gmail.com wrote:
>>> diff --git a/arch/arm/include/asm/pgtable-2level.h b/arch/arm/include/asm/pgtable-2level.h
>>> index 2317a71..062c93c 100644
>>> --- a/arch/arm/include/asm/pgtable-2level.h
>>> +++ b/arch/arm/include/asm/pgtable-2level.h
>>> @@ -123,6 +123,11 @@
>>>   #define L_PTE_USER           (_AT(pteval_t, 1)<<  8)
>>>   #define L_PTE_XN             (_AT(pteval_t, 1)<<  9)
>>>   #define L_PTE_SHARED         (_AT(pteval_t, 1)<<  10)        /* shared(v6), coherent(xsc3) */
>>> +#ifdef CONFIG_ARM_HUGETLB_SUPPORT
>>> +#define L_PTE_HUGEPAGE       (_AT(pteval_t, 1)<<  11) /* mark hugepage */
>>> +#define L_PTE_HPAGE_2M  (_AT(pteval_t, 1)<<  12) /* only when HUGEPAGE set */
>>> +#define L_PTE_HPAGE_16M (_AT(pteval_t, 1)<<  13) /* only when HUGEPAGE set */
>>> +#endif
>> (1) How does this work when normal pages can have bit 11 set if they're an
>> odd PFN?
> Isn't that bit 12?
>
>> (2) How do we even get to PTE level when a 2 or 16MB section doesn't have
>> a pte table (as the L1 entry is used for the section or supersection
>> mapping) ?
> We don't, that's why I think we don't even need this bit defined.
>
First, thanks to Russell and Catalin for taking the time to review this patch :)

a:
By pte here I mean the Linux pte. The generic mm layer needs a Linux pte
regardless of whether the mapping is 4K, 1MB or 16MB.

b:
Why is L_PTE_HUGEPAGE needed?

The hugetlb subsystem calls pte_page to derive the corresponding page
struct from a given pte, and pte_pfn is used first to convert the pte
into a page frame number. This is where we need to be careful: in a
normal page pte the pfn sits in bits [31:12], while in a huge page (1MB)
pte it sits in bits [31:20], so one bit MUST distinguish a normal page
pte from a huge page pte.

That one bit can only be BIT11, the last unused Linux PTE flag for normal
pages; that is why L_PTE_HUGEPAGE is defined.

Once L_PTE_HUGEPAGE is set, bits [19:12] are free, which is more than
enough to mark whether the pte maps a 1MB or a 16MB huge page.
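
To make that concrete, here is a sketch of what pte_pfn has to do once the
huge page bits exist. It is written as a run-time dispatch purely for
illustration; the posted patch selects one huge page size at build time.

/* pfn location by pte type:
 *   normal pte : bits [31:12]
 *   1MB pte    : bits [31:20] (SECTION_MASK)
 *   16MB pte   : bits [31:24] (SUPERSECTION_MASK)
 */
#define pte_pfn_sketch(pte)						\
	(pte_is_huge(pte) ?						\
		(((pte_val(pte) & L_PTE_HPAGE_16M) ?			\
			(pte_val(pte) & SUPERSECTION_MASK) :		\
			(pte_val(pte) & SECTION_MASK)) >> PAGE_SHIFT) :	\
		((pte_val(pte) & PHYS_MASK) >> PAGE_SHIFT))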



-- 
I am a slow learner
but I will keep trying to fight for my dreams!

--bill


* [PATCH 4/7] Store huge page linux pte in mm_struct
  2012-01-31 10:01   ` Russell King - ARM Linux
@ 2012-02-01  5:45     ` bill4carson
  2012-02-06  2:04       ` bill4carson
  0 siblings, 1 reply; 50+ messages in thread
From: bill4carson @ 2012-02-01  5:45 UTC (permalink / raw)
  To: linux-arm-kernel



On 2012-01-31 18:01, Russell King - ARM Linux wrote:
> On Mon, Jan 30, 2012 at 03:57:15PM +0800, bill4carson at gmail.com wrote:
>> From: Bill Carson<bill4carson@gmail.com>
>>
>> One easy way to store huge page linux pte is mm_struct instead of thread_info
>> that's because when parent task with huge page VMA calls fork, parent huge page
>> pagetable entries are copied into child pagetable. This is done in
>>
>> int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
>> 			    struct vm_area_struct *vma)
>>
>> We cannot derive child's thread_info just using struct mm_struct *dst.
>> if we have struct mm_struct **dst, then it's easy to find the corresponding
>> task_struct as well as thread_info, but we only get struct mm_struct *dst.
>> It's possible to find the desired task_struct by iterating the global task list
>> by comparing task_struct->mm with dst.
>> So mm_struct is used for huge page linux pte for faster lookup and efficient.
> I really do not understand this description, and it doesn't seem to tie
> up with the code.  What problem are you trying to solve here?
>
> Note that a mm_struct can be shared between multiple task_structs, so
> if your thinking is that something in the mm_struct or page table needs
> to know about a task_struct, you're ideas are wrong.
>
A normal page has both a hardware pte and a Linux pte; together the two
kinds of ptes occupy a 4K pte table page, half each, and each of the two
pmd level entries of a pair points into that page.

For huge pages, from the hardware's perspective the mapping exists only at
pmd level; the mm subsystem, however, still needs a Linux version of the
*pte* for the huge page mapping, and the problem is where to store these
huge page Linux ptes.

A:
Store the huge page Linux ptes in mm_struct.

B:
Store the huge page Linux ptes in mm_context_t, as Catalin suggested.
This is almost the same as option A, but it only touches ARM code and
takes little effort (a rough sketch is below).

C:
Modify pgd_alloc to allocate 2048 pgd_t entries plus two extra pointers,
and use those two pointers to store the huge page Linux ptes.
It is feasible, but I am not sure it is a good idea.
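
A rough sketch of option B, showing only the new fields (the existing
members of mm_context_t in arch/arm/include/asm/mmu.h are elided):

/* arch/arm/include/asm/mmu.h */
typedef struct {
	/* ... existing context fields (ASID id, kvm_seq, ...) ... */
#ifdef CONFIG_ARM_HUGETLB_SUPPORT
	pte_t *huge_2m_pte[HUGE_2M_PTE_SIZE];
	pte_t *huge_16m_pte;
#endif
} mm_context_t;

The hooks would then use mm->context.huge_2m_pte instead of
mm->huge_2m_pte, leaving include/linux/mm_types.h untouched.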

Russell and Catalin:

Could you please give your advice on this?


-- 
I am a slow learner
but I will keep trying to fight for my dreams!

--bill


* [RFC] ARM hugetlb support
  2012-02-01  1:56   ` bill4carson
@ 2012-02-02 14:38     ` Catalin Marinas
  2012-02-03  1:41       ` bill4carson
  0 siblings, 1 reply; 50+ messages in thread
From: Catalin Marinas @ 2012-02-02 14:38 UTC (permalink / raw)
  To: linux-arm-kernel

On 1 February 2012 01:56, bill4carson <bill4carson@gmail.com> wrote:
> On 2012-01-31 17:29, Catalin Marinas wrote:
>> It also doesn't
>> help much with the TLB hit rate since the micro-TLB most likely only
>> support 1MB sections.
>
> I am afraid I can't agree with you on this.
> Truly there is no specific statement about whether micro-TLB support 16MB
> super-sections,
> but main-TLB actually helps the hit rate, and main-TLB support all page size
> mapping.

The feedback over time was that 16MB didn't help much in terms of TLB
misses compared to 1MB but that was for standard usage. I guess the
hugetlbfs has specific scenarios that may benefit.

Anyway, could you make the 2MB/16MB huge page size configurable at
boot time (command line option like on other architectures)? And also
check the ID_MMFR0[31:28] for whether supersections are supported.

>> BTW, I have a (simpler) implementation of hugetlbfs with 2MB sections
>> but for LPAE only:
>>
>>
>> http://git.kernel.org/?p=linux/kernel/git/cmarinas/linux-arm-arch.git;a=shortlog;h=refs/heads/hugetlb
>>
> Thanks for your information, I will study your code carefully :)

Most of the simplicity comes from the fact that LPAE doesn't need a
separate Linux PTE table.

-- 
Catalin


* [RFC] ARM hugetlb support
  2012-02-02 14:38     ` Catalin Marinas
@ 2012-02-03  1:41       ` bill4carson
  2012-02-06 16:29         ` Catalin Marinas
  0 siblings, 1 reply; 50+ messages in thread
From: bill4carson @ 2012-02-03  1:41 UTC (permalink / raw)
  To: linux-arm-kernel



On 2012-02-02 22:38, Catalin Marinas wrote:
> On 1 February 2012 01:56, bill4carson<bill4carson@gmail.com>  wrote:
> On 2012-01-31 17:29, Catalin Marinas wrote:
>>> It also doesn't
>>> help much with the TLB hit rate since the micro-TLB most likely only
>>> support 1MB sections.
>> I am afraid I can't agree with you on this.
>> Truly there is no specific statement about whether micro-TLB support 16MB
>> super-sections,
>> but main-TLB actually helps the hit rate, and main-TLB support all page size
>> mapping.
> The feedback over time was that 16MB didn't help much in terms of TLB
> misses compared to 1MB but that was for standard usage. I guess the
> hugetlbfs has specific scenarios that may benefit.
>
> Anyway, could you make the 2MB/16MB huge page size configurable at
> boot time (command line option like on other architectures)?
Actually, only x86 supports configuring multiple huge page sizes at boot
time; other architectures have a fixed huge page size once built.

I'm not sure it is a MUST to support multiple page sizes at boot time,
but I am sure it would make the macro "pte_pfn" much more complicated.

> And also
> check the ID_MMFR0[31:28] for whether supersections are supported.
>
Thanks for your tips :)


>>> BTW, I have a (simpler) implementation of hugetlbfs with 2MB sections
>>> but for LPAE only:
>>>
>>>
>>> http://git.kernel.org/?p=linux/kernel/git/cmarinas/linux-arm-arch.git;a=shortlog;h=refs/heads/hugetlb
>>>
>> Thanks for your information, I will study your code carefully :)
> Most of the simplicity comes from the fact that LPAE doesn't need a
> separate Linux PTE table.
>

-- 
I am a slow learner
but I will keep trying to fight for my dreams!

--bill


* [PATCH 4/7] Store huge page linux pte in mm_struct
  2012-02-01  5:45     ` bill4carson
@ 2012-02-06  2:04       ` bill4carson
  2012-02-06 10:29         ` Catalin Marinas
  0 siblings, 1 reply; 50+ messages in thread
From: bill4carson @ 2012-02-06  2:04 UTC (permalink / raw)
  To: linux-arm-kernel



On 2012-02-01 13:45, bill4carson wrote:
>
>
> On 2012-01-31 18:01, Russell King - ARM Linux wrote:
>> On Mon, Jan 30, 2012 at 03:57:15PM +0800, bill4carson at gmail.com wrote:
>>> From: Bill Carson<bill4carson@gmail.com>
>>>
>>> One easy way to store huge page linux pte is mm_struct instead of
>>> thread_info
>>> that's because when parent task with huge page VMA calls fork, parent
>>> huge page
>>> pagetable entries are copied into child pagetable. This is done in
>>>
>>> int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct
>>> *src,
>>> struct vm_area_struct *vma)
>>>
>>> We cannot derive child's thread_info just using struct mm_struct *dst.
>>> if we have struct mm_struct **dst, then it's easy to find the
>>> corresponding
>>> task_struct as well as thread_info, but we only get struct mm_struct
>>> *dst.
>>> It's possible to find the desired task_struct by iterating the global
>>> task list
>>> by comparing task_struct->mm with dst.
>>> So mm_struct is used for huge page linux pte for faster lookup and
>>> efficient.
>> I really do not understand this description, and it doesn't seem to tie
>> up with the code. What problem are you trying to solve here?
>>
>> Note that a mm_struct can be shared between multiple task_structs, so
>> if your thinking is that something in the mm_struct or page table needs
>> to know about a task_struct, you're ideas are wrong.
>>
> Normal page based pte has hardware version and linux version, these two
> kinds of
> ptes occupy half of 4K page, and each of two pmd level entry pointer to
> this half 4K
> page.
>
> For huge page, mappings only exist in pmd level from hardware
> perspective, however
> mm subsystem also needs to know the linux version *pte* of this huge
> page based
> mapping, problems is where to store these huge page linux pte?
>
> A:
> Store huge page linux pte in mm_struct
>
> B:
> Store huge page linux pte in mm_context_t suggested by Catalin.
> This is almost like option A, while it's nice to modify arm code only
> and take small effort to make it happen.
>
> C:
> Modify pgd_alloc to allocate 2048 pgd_t entries plus two void pointers
> Use these two additional pointers to store huge page linux pte.
> It's feasible, but I don't know whether this is a good idea.
>
> Russelll and Catalin:
>
> Could you please give your advice on this?
>
>

Hi, Russell

I didn't mean to derive a task_struct from an mm_struct; as you said,
multiple tasks could possibly share one mm_struct.

What I did is store the huge page Linux ptes in the mm_struct (or
mm_context_t), because all of the upper-layer hugetlb hooks take an
mm_struct as a parameter. IMHO, that's the best I could figure out right
now; if this idea sounds wrong or silly, please let me know.
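
For the mm_context_t variant that would mean something along these lines
(only a sketch; the existing fields are untouched and the field name is
my own placeholder):

typedef struct {
	/* ... existing fields (ASID id, etc.) unchanged ... */
	pte_t *huge_linux_pte;	/* Linux ptes for huge page mappings */
} mm_context_t;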

And I'm wondering could you please review the other parts of this patch?


thanks


-- 
I am a slow learner
but I will keep trying to fight for my dreams!

--bill

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 4/7] Store huge page linux pte in mm_struct
  2012-02-06  2:04       ` bill4carson
@ 2012-02-06 10:29         ` Catalin Marinas
  2012-02-06 14:40           ` carson bill
  0 siblings, 1 reply; 50+ messages in thread
From: Catalin Marinas @ 2012-02-06 10:29 UTC (permalink / raw)
  To: linux-arm-kernel

bill4carson wrote:
> Normal page based pte has hardware version and linux version, these
> two kinds of ptes occupy half of 4K page, and each of two pmd level
> entry pointer to this half 4K page.
>
> For huge page, mappings only exist in pmd level from hardware
> perspective, however mm subsystem also needs to know the linux version
> *pte* of this huge page based mapping, problems is where to store
> these huge page linux pte?
>
> A:
> Store huge page linux pte in mm_struct
>
> B:
> Store huge page linux pte in mm_context_t suggested by Catalin.
> This is almost like option A, while it's nice to modify arm code only
> and take small effort to make it happen.
>
> C:
> Modify pgd_alloc to allocate 2048 pgd_t entries plus two void pointers
> Use these two additional pointers to store huge page linux pte.
> It's feasible, but I don't know whether this is a good idea.

I wouldn't add this to pgd_alloc since not all tasks need huge pages.
First option is also not feasible as we shouldn't touch generic code.
This leaves us with B, unless better options are suggested.

I had a patch couple of years ago
(http://article.gmane.org/gmane.linux.ports.arm.kernel/55045) to use the
AF (Access Flag) bit and free up some bits in the page table, allowing
us to drop the Linux PTE. But we end up in a bigger configuration mess
as we already have to choose between classic MMU and LPAE. The
performance advantage wasn't that big either.

-- 
Catalin

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 4/7] Store huge page linux pte in mm_struct
  2012-02-06 10:29         ` Catalin Marinas
@ 2012-02-06 14:40           ` carson bill
  0 siblings, 0 replies; 50+ messages in thread
From: carson bill @ 2012-02-06 14:40 UTC (permalink / raw)
  To: linux-arm-kernel

2012/2/6, Catalin Marinas <catalin.marinas@arm.com>:
> bill4carson wrote:
>> Normal page based pte has hardware version and linux version, these
>> two kinds of ptes occupy half of 4K page, and each of two pmd level
>> entry pointer to this half 4K page.
>>
>> For huge page, mappings only exist in pmd level from hardware
>> perspective, however mm subsystem also needs to know the linux version
>> *pte* of this huge page based mapping, problems is where to store
>> these huge page linux pte?
>>
>> A:
>> Store huge page linux pte in mm_struct
>>
>> B:
>> Store huge page linux pte in mm_context_t suggested by Catalin.
>> This is almost like option A, while it's nice to modify arm code only
>> and take small effort to make it happen.
>>
>> C:
>> Modify pgd_alloc to allocate 2048 pgd_t entries plus two void pointers
>> Use these two additional pointers to store huge page linux pte.
>> It's feasible, but I don't know whether this is a good idea.
>
> I wouldn't add this to pgd_alloc since not all tasks need huge pages.
> First option is also not feasible as we shouldn't touch generic code.
> This leaves us with B, unless better options are suggested.
>

So far B sounds better; I will make that happen in V2.
I don't know whether the other gurus have any opinions on this patch.
Advice/suggestions/criticism are all truly welcome!


> I had a patch couple of years ago
> (http://article.gmane.org/gmane.linux.ports.arm.kernel/55045) to use the
> AF (Access Flag) bit and free up some bits in the page table, allowing
> us to drop the Linux PTE. But we end up in a bigger configuration mess
> as we already have to choose between classic MMU and LPAE. The
> performance advantage wasn't that big either.
> --
> Catalin
>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 2/7] Add various hugetlb page table fix
  2012-02-01  3:10       ` bill4carson
@ 2012-02-06 16:26         ` Catalin Marinas
  2012-02-07  1:42           ` bill4carson
  0 siblings, 1 reply; 50+ messages in thread
From: Catalin Marinas @ 2012-02-06 16:26 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Feb 01, 2012 at 03:10:21AM +0000, bill4carson wrote:
> Why L_PTE_HUGEPAGE is needed?
> 
> hugetlb subsystem will call pte_page to derive the corresponding page 
> struct from a given pte, and pte_pfn is used first to convert pte into
> a page frame number.

Are you sure the pte_pfn() conversion is right? Does it need to be
different from the 4K pfn? I haven't seen any other architecture doing
shifts other than PAGE_SHIFT even for huge pages.

-- 
Catalin

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [RFC] ARM hugetlb support
  2012-02-03  1:41       ` bill4carson
@ 2012-02-06 16:29         ` Catalin Marinas
  0 siblings, 0 replies; 50+ messages in thread
From: Catalin Marinas @ 2012-02-06 16:29 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Feb 03, 2012 at 01:41:18AM +0000, bill4carson wrote:
> On 2012-02-02 22:38, Catalin Marinas wrote:
> > On 1 February 2012 01:56, bill4carson<bill4carson@gmail.com>  wrote:
> >> On 2012-01-31 17:29, Catalin Marinas wrote:
> >>> It also doesn't
> >>> help much with the TLB hit rate since the micro-TLB most likely only
> >>> support 1MB sections.
> >> I am afraid I can't agree with you on this.
> >> Admittedly there is no specific statement about whether the micro-TLB
> >> supports 16MB super-sections, but the main-TLB actually helps the hit
> >> rate, and the main-TLB supports mappings of all page sizes.
> > The feedback over time was that 16MB didn't help much in terms of TLB
> > misses compared to 1MB but that was for standard usage. I guess the
> > hugetlbfs has specific scenarios that may benefit.
> >
> > Anyway, could you make the 2MB/16MB huge page size configurable at
> > boot time (command line option like on other architectures)?
> Actually, only X86 supports configuring multiple huge page sizes at
> boot time; the other architectures end up with a fixed huge page size
> once the kernel is built.

Boot time would be just fine on ARM as well. But as we go towards single
zImage, the super-sections may not always be available, so one can always
use sections as a fall-back.

> I'm not sure it's a MUST to support multiple page sizes at boot time,
> but I'm sure it would make the "pte_pfn" macro much more complicated.

I'm not entirely sure the complicated pte_pfn is needed (see my other
comment).

-- 
Catalin

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 1/7] Add various hugetlb arm high level hooks
  2012-01-30  7:57 ` [PATCH 1/7] Add various hugetlb arm high level hooks bill4carson at gmail.com
@ 2012-02-06 17:07   ` Catalin Marinas
  2012-02-07  2:00     ` bill4carson
  2012-02-07 12:15   ` Catalin Marinas
  1 sibling, 1 reply; 50+ messages in thread
From: Catalin Marinas @ 2012-02-06 17:07 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jan 30, 2012 at 07:57:12AM +0000, bill4carson at gmail.com wrote:
> +static inline void set_hugepte_section(struct mm_struct *mm, unsigned long addr,
> +                                  pte_t *ptep, pte_t pte)
> +{
> +       pgd_t *pgd;
> +       pud_t *pud;
> +       pmd_t *pmd;
> +
> +       int col, row;
> +       pte_t **huge_linuxpte = &mm->huge_2m_pte[0];
> +
> +       row = HUGEPAGE_2M_PTE_ARRAY_ROW(addr);
> +       col = HUGEPAGE_2M_PTE_ARRAY_COL(addr);
> +
> +       /* an valid pte pointer is expected */
> +       BUG_ON(huge_linuxpte[row] == 0);
> +       BUG_ON(ptep != &huge_linuxpte[row][col]);
> +
> +       /* set linux pte first */
> +       huge_linuxpte[row][col] = pte;
> +
> +       /* set hardware pte */
> +       pgd = pgd_offset(mm, addr);
> +       pud = pud_offset(pgd, addr);
> +       pmd = pmd_offset(pud, addr);
> +
> +       set_hugepte_at(mm, addr, pmd, pte);
> +}

I haven't followed the whole structure of your patches but do we need to
walk the page tables here? Isn't the ptep the same as the pmd when
passed to this function (at least it was with my LPAE implementation).

-- 
Catalin

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 2/7] Add various hugetlb page table fix
  2012-02-06 16:26         ` Catalin Marinas
@ 2012-02-07  1:42           ` bill4carson
  2012-02-07 11:50             ` Catalin Marinas
  0 siblings, 1 reply; 50+ messages in thread
From: bill4carson @ 2012-02-07  1:42 UTC (permalink / raw)
  To: linux-arm-kernel



On 2012-02-07 00:26, Catalin Marinas wrote:
> On Wed, Feb 01, 2012 at 03:10:21AM +0000, bill4carson wrote:
>> Why L_PTE_HUGEPAGE is needed?
>>
>> hugetlb subsystem will call pte_page to derive the corresponding page
>> struct from a given pte, and pte_pfn is used first to convert pte into
>> a page frame number.
>
> Are you sure the pte_pfn() conversion is right? Does it need to be
> different from the 4K pfn?

Hello, Catalin

Let me take a few words to make this clear.

pte_page is defined as follows to derive the page struct from a given
pte. This macro is used both in the generic mm code and in the hugetlb
sub-system, so pte_pfn needs a way to tell a huge page based Linux pte
apart from a normal page based Linux pte; that's what L_PTE_HUGEPAGE is
for.

#define pte_page(pte)		pfn_to_page(pte_pfn(pte))


So L_PTE_HUGEPAGE is *NOT* set in a normal page based Linux pte, and
Linux pte bits[31:12] are the page frame number; otherwise we have a
huge page based Linux pte, where bits[31:20] hold the page frame number
for a SECTION mapping and bits[31:24] hold it for a SUPER-SECTION
mapping.

I think this is the full story behind the following code:

#ifdef CONFIG_ARM_HUGETLB_SUPPORT

#ifdef CONFIG_HUGEPAGE_SIZE_2MB
#define hugepte_pfn(pte)    ((pte_val(pte) & SECTION_MASK) >> PAGE_SHIFT)
#endif
#ifdef CONFIG_HUGEPAGE_SIZE_16MB
#define hugepte_pfn(pte)    ((pte_val(pte) & SUPERSECTION_MASK) >> PAGE_SHIFT)
#endif
#define pte_is_huge(pte)	(pte_val(pte) & L_PTE_HUGEPAGE)
#define pte_pfn(pte)	    (pte_is_huge(pte) ? \
				hugepte_pfn(pte) : ((pte_val(pte) & PHYS_MASK) >> PAGE_SHIFT))
#else
#define pte_pfn(pte)		((pte_val(pte) & PHYS_MASK) >> PAGE_SHIFT)
#endif /*!CONFIG_ARM_HUGETLB_SUPPORT*/


#define pte_page(pte)		pfn_to_page(pte_pfn(pte))
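
As a worked example (address made up by me): take a 2MB huge page at
physical 0x60200000. The low flag bits, including BIT11..BIT13, are
cleared by SECTION_MASK before the shift:

	pte         = 0x60200000 | <flag bits, including BIT11..BIT13>
	hugepte_pfn = (pte & SECTION_MASK) >> PAGE_SHIFT = 0x60200

i.e. the pfn of the first 4K page of the huge page, which is what
pfn_to_page() expects. Without the masking, bits 12/13 would leak into
the pfn.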




> I haven't seen any other architecture doing
> shifts other than PAGE_SHIFT even for huge pages.
>

#define L_PTE_HUGEPAGE	(_AT(pteval_t, 1) << 11) /* mark hugepage */
#define L_PTE_HPAGE_2M  (_AT(pteval_t, 1) << 12) /* only when HUGEPAGE set */
#define L_PTE_HPAGE_16M (_AT(pteval_t, 1) << 13) /* only when HUGEPAGE set */

As you can see, Linux pte BIT12 is used to denote a 2MB huge page and
BIT13 a 16MB huge page; that's why simply shifting by PAGE_SHIFT is not
enough.

I hope I understood your question and gave the right answer :)



-- 
I am a slow learner
but I will keep trying to fight for my dreams!

--bill

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 1/7] Add various hugetlb arm high level hooks
  2012-02-06 17:07   ` Catalin Marinas
@ 2012-02-07  2:00     ` bill4carson
  2012-02-07 11:54       ` Catalin Marinas
  0 siblings, 1 reply; 50+ messages in thread
From: bill4carson @ 2012-02-07  2:00 UTC (permalink / raw)
  To: linux-arm-kernel



On 2012-02-07 01:07, Catalin Marinas wrote:
> On Mon, Jan 30, 2012 at 07:57:12AM +0000, bill4carson at gmail.com wrote:
>> +static inline void set_hugepte_section(struct mm_struct *mm, unsigned long addr,
>> +                                  pte_t *ptep, pte_t pte)
>> +{
>> +       pgd_t *pgd;
>> +       pud_t *pud;
>> +       pmd_t *pmd;
>> +
>> +       int col, row;
>> +       pte_t **huge_linuxpte =&mm->huge_2m_pte[0];
>> +
>> +       row = HUGEPAGE_2M_PTE_ARRAY_ROW(addr);
>> +       col = HUGEPAGE_2M_PTE_ARRAY_COL(addr);
>> +
>> +       /* an valid pte pointer is expected */
>> +       BUG_ON(huge_linuxpte[row] == 0);
>> +       BUG_ON(ptep !=&huge_linuxpte[row][col]);
>> +
>> +       /* set linux pte first */
>> +       huge_linuxpte[row][col] = pte;
>> +
>> +       /* set hardware pte */
>> +       pgd = pgd_offset(mm, addr);
>> +       pud = pud_offset(pgd, addr);
>> +       pmd = pmd_offset(pud, addr);
>> +
>> +       set_hugepte_at(mm, addr, pmd, pte);
>> +}
>
> I haven't followed the whole structure of your patches but do we need to
> walk the page tables here? Isn't the ptep the same as the pmd when
> passed to this function (at least it was with my LPAE implementation).
>

Here, ptep is not the same as the pmd; the mm layer always manages the
Linux pte. For a normal page, the Linux pte and the hardware pte are
just 2048 bytes apart, so cpu_v7_set_pte_ext can set both the Linux and
the hardware pte easily.

For a huge page, the Linux pte is stored somewhere else, far away from
the hardware page table. In set_hugepte_section, "ptep" is the address
of the huge page based Linux pte, so the Linux pte value "pte" is set
there first; the hardware pmd address can only be derived from "addr"
by a page table walk, and then the hardware pmd value is set via
set_hugepte_at, where cpu_v7_set_hugepte_ext does the whole job.
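
To illustrate the normal-page case (a rough sketch only, assuming the
classic non-LPAE 2-level layout; the helper name is made up and not
part of the patch):

/* For 4K pages the hardware PTE table sits 2048 bytes above the
 * Linux PTE table inside the same 4K page, so one pointer reaches
 * both.  For huge pages no such fixed offset exists, hence the
 * pgd/pud/pmd walk from "addr". */
static inline pte_t *linux_to_hw_pte(pte_t *linux_ptep)
{
	return (pte_t *)((unsigned long)linux_ptep + 2048);
}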




-- 
I am a slow learner
but I will keep trying to fight for my dreams!

--bill

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 2/7] Add various hugetlb page table fix
  2012-02-07  1:42           ` bill4carson
@ 2012-02-07 11:50             ` Catalin Marinas
  2012-02-07 13:24               ` carson bill
  0 siblings, 1 reply; 50+ messages in thread
From: Catalin Marinas @ 2012-02-07 11:50 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Feb 07, 2012 at 01:42:01AM +0000, bill4carson wrote:
> On 2012-02-07 00:26, Catalin Marinas wrote:
> > On Wed, Feb 01, 2012 at 03:10:21AM +0000, bill4carson wrote:
> >> Why L_PTE_HUGEPAGE is needed?
> >>
> >> hugetlb subsystem will call pte_page to derive the corresponding page
> >> struct from a given pte, and pte_pfn is used first to convert pte into
> >> a page frame number.
> >
> > Are you sure the pte_pfn() conversion is right? Does it need to be
> > different from the 4K pfn?
...
> pte_page is defined as following to derive page struct from a given pte.
> This macro is used both in generic mm as well as hugetlb sub-system, so
> we need do the switch in pte_pfn to mark huge page based linux pte out
> of normal page based linux pte, that's what L_PTE_HUGEPAGE for.
> 
> #define pte_page(pte)		pfn_to_page(pte_pfn(pte))
> 
> So L_PTE_HUGEPAGE is *NOT* set in normal page based linux pte,
> linux pte bits[31:12] is the page frame number;

I agree.

> otherwise, we got a huge page based linux pte, and linux pte
> bits[31:20] is page frame number for SECTION mapping, and bits[31:24]
> is page frame number for SUPER-SECTION mapping.

Actually it is still 31:12 but with bits 19:12 or 23:12 masked out. So
you do the correct shift by PAGE_SHIFT with the additional masking for
huge pages (harmless).

But do we actually need this masking? Do the huge_pte_offset() or
huge_pte_alloc() functions return the Linux pte (pmd) for the huge page?
If yes, can we not ensure that bits 19:12 are already zero? This
shouldn't be any different from the 4K Linux pte but with an address
aligned to 1MB.

-- 
Catalin

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 1/7] Add various hugetlb arm high level hooks
  2012-02-07  2:00     ` bill4carson
@ 2012-02-07 11:54       ` Catalin Marinas
  0 siblings, 0 replies; 50+ messages in thread
From: Catalin Marinas @ 2012-02-07 11:54 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Feb 07, 2012 at 02:00:25AM +0000, bill4carson wrote:
> On 2012-02-07 01:07, Catalin Marinas wrote:
> > On Mon, Jan 30, 2012 at 07:57:12AM +0000, bill4carson at gmail.com wrote:
> >> +static inline void set_hugepte_section(struct mm_struct *mm, unsigned long addr,
> >> +                                  pte_t *ptep, pte_t pte)
> >> +{
> >> +       pgd_t *pgd;
> >> +       pud_t *pud;
> >> +       pmd_t *pmd;
> >> +
> >> +       int col, row;
> >> +       pte_t **huge_linuxpte =&mm->huge_2m_pte[0];
> >> +
> >> +       row = HUGEPAGE_2M_PTE_ARRAY_ROW(addr);
> >> +       col = HUGEPAGE_2M_PTE_ARRAY_COL(addr);
> >> +
> >> +       /* an valid pte pointer is expected */
> >> +       BUG_ON(huge_linuxpte[row] == 0);
> >> +       BUG_ON(ptep !=&huge_linuxpte[row][col]);
> >> +
> >> +       /* set linux pte first */
> >> +       huge_linuxpte[row][col] = pte;
> >> +
> >> +       /* set hardware pte */
> >> +       pgd = pgd_offset(mm, addr);
> >> +       pud = pud_offset(pgd, addr);
> >> +       pmd = pmd_offset(pud, addr);
> >> +
> >> +       set_hugepte_at(mm, addr, pmd, pte);
> >> +}
> >
> > I haven't followed the whole structure of your patches but do we need to
> > walk the page tables here? Isn't the ptep the same as the pmd when
> > passed to this function (at least it was with my LPAE implementation).
> 
> Here, ptep is not the same as pmd, mm layer always manages linux pte.
> For normal page, linux pte and hardware pte is just an 2048 bytes offset
> away and cpu_v7_set_pte_ext can set both linux/hardware pte easily.
> 
> For huge page, linux pte is stored somewhere else,far away from hardware
> pte table, in set_hugepte_section, "ptep" is the huge page based linux
> pte address, first set linux pte value "pte" in there; and hardware pmd
> address could only be derived from "addr" by a page table walk, then
> setting hardware pmd value in set_hugepte_at, which
> cpu_v7_set_hugepte_ext does the whole job.

OK. I thought we already have some pointer in the Linux pmd for the
actual hardware pmd. Walking the page table isn't expensive in this
case, with only 2 levels. For LPAE we don't even need this since we
don't have a separate Linux pte/pmd.

-- 
Catalin

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 1/7] Add various hugetlb arm high level hooks
  2012-01-30  7:57 ` [PATCH 1/7] Add various hugetlb arm high level hooks bill4carson at gmail.com
  2012-02-06 17:07   ` Catalin Marinas
@ 2012-02-07 12:15   ` Catalin Marinas
  2012-02-07 12:57     ` carson bill
  1 sibling, 1 reply; 50+ messages in thread
From: Catalin Marinas @ 2012-02-07 12:15 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jan 30, 2012 at 07:57:12AM +0000, bill4carson at gmail.com wrote:
> +/* 2M and 16M hugepage linux ptes are stored in an array
> + *
> + * 2M hugepage
> + * ===========
> + * one linux pte caters to two HW ptes,
> + * so the maximum huge linux pte needed is 4096M/2M = 2048 entry pointers.
> + * Two 4K page is used to store these entry pointers(2048 * 4 = 8192 bytes)
> + * in a two-dimension array, huge_2m_pte[2][1024].

Actually we only need to cover TASK_SIZE so for a 2:2 split you only
need half of the above.

> + *
> + * How to find the hugepage linux pte corresponding to a specific address ?
> + * VA[31] is used as row index;
> + * VA[30:21] is used as column  index;

I haven't fully reviewed the code but can we not drop this row/column
set up and just use a VA[31:21] as the index?

> + *
> + * 16M hugepage
> + * ============
> + * one linux pte caters for one HW pte,

Actually that's a bit misleading as we need 16 consecutive pmd entries
for a supersection. So one Linux pmd caters for 16 HW pmds.
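
For reference, that mirrors what the existing supersection mapping code
in arch/arm/mm/mmu.c does (phys/prot below are placeholders):

	for (i = 0; i < 16; i++)	/* replicate in 16 consecutive hw slots */
		pmd[i] = __pmd(phys | PMD_TYPE_SECT | PMD_SECT_SUPER | prot);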

> + * so maxium huge linux pte needed is 4096M/16M = 256 entry pointers,
> + * 256 * 4 = 1024 bytes spaces is allocated to store these linux pte;
> + * this is a simple one-dimension array huge_16m_pte[256].
> + *
> + * VA[31:24] is used to index this array;

Maybe we should call them Linux pmd rather than pte in the comments? It
is less confusing (I know that the generic hugetlb code calls them
ptes).

-- 
Catalin

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 1/7] Add various hugetlb arm high level hooks
  2012-02-07 12:15   ` Catalin Marinas
@ 2012-02-07 12:57     ` carson bill
  0 siblings, 0 replies; 50+ messages in thread
From: carson bill @ 2012-02-07 12:57 UTC (permalink / raw)
  To: linux-arm-kernel

2012/2/7, Catalin Marinas <catalin.marinas@arm.com>:
> On Mon, Jan 30, 2012 at 07:57:12AM +0000, bill4carson at gmail.com wrote:
>> +/* 2M and 16M hugepage linux ptes are stored in an array
>> + *
>> + * 2M hugepage
>> + * ===========
>> + * one linux pte caters to two HW ptes,
>> + * so the maximum huge linux pte needed is 4096M/2M = 2048 entry
>> pointers.
>> + * Two 4K page is used to store these entry pointers(2048 * 4 = 8192
>> bytes)
>> + * in a two-dimension array, huge_2m_pte[2][1024].
>
> Actually we only need to cover TASK_SIZE so for a 2:2 split you only
> need half of the above.

I haven't polished this part yet.
Yes, to be precise, this array depends on the TASK_SIZE configuration.
The maximum array size is 1536 entries when TASK_SIZE is 0xc0000000,
so kmalloc instead of __get_free_page can be used to allocate
1536 * sizeof(pte_t *) bytes. In this way VA[31:21] could be used as
the index, getting rid of the row/column stuff.
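
Roughly, the flat scheme would look like this (just a sketch with
placeholder names, not the actual V2 code):

	/* one slot per 2MB of user address space */
	#define HUGE_PTE_SLOTS		(PAGE_OFFSET >> 21)	/* 1536 for a 3G/1G split */
	#define HUGE_PTE_IDX(addr)	((addr) >> 21)		/* VA[31:21] */

	pte_t *huge_linux_pte;	/* in mm->context, one entry per 2MB */

	huge_linux_pte = kzalloc(HUGE_PTE_SLOTS * sizeof(*huge_linux_pte),
				 GFP_KERNEL);
	ptep = &huge_linux_pte[HUGE_PTE_IDX(addr)];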

I will keep this on V2 todo list :)


>
>> + *
>> + * How to find the hugepage linux pte corresponding to a specific address
>> ?
>> + * VA[31] is used as row index;
>> + * VA[30:21] is used as column  index;
>
> I haven't fully reviewed the code but can we not drop this row/column
> set up and just use a VA[31:21] as the index?

Please see above reply.

>
>> + *
>> + * 16M hugepage
>> + * ============
>> + * one linux pte caters for one HW pte,
>
> Actually that's a bit misleading as we need 16 consecutive pmd entries
> for a supersection. So one Linux pmd caters for 16 HW pmds.

You are right, I will correct this in V2.

>
>> + * so maxium huge linux pte needed is 4096M/16M = 256 entry pointers,
>> + * 256 * 4 = 1024 bytes spaces is allocated to store these linux pte;
>> + * this is a simple one-dimension array huge_16m_pte[256].
>> + *
>> + * VA[31:24] is used to index this array;
>
> Maybe we should call them Linux pmd rather than pte in the comments? It
> is less confusing (I know that the generic hugetlb code calls them
> ptes).

OK, I will correct this in V2.

>
> --
> Catalin
>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 2/7] Add various hugetlb page table fix
  2012-02-07 11:50             ` Catalin Marinas
@ 2012-02-07 13:24               ` carson bill
  2012-02-07 14:11                 ` Catalin Marinas
  0 siblings, 1 reply; 50+ messages in thread
From: carson bill @ 2012-02-07 13:24 UTC (permalink / raw)
  To: linux-arm-kernel

2012/2/7, Catalin Marinas <catalin.marinas@arm.com>:
> On Tue, Feb 07, 2012 at 01:42:01AM +0000, bill4carson wrote:
> On 2012-02-07 00:26, Catalin Marinas wrote:
>> > On Wed, Feb 01, 2012 at 03:10:21AM +0000, bill4carson wrote:
>> >> Why L_PTE_HUGEPAGE is needed?
>> >>
>> >> hugetlb subsystem will call pte_page to derive the corresponding page
>> >> struct from a given pte, and pte_pfn is used first to convert pte into
>> >> a page frame number.
>> >
>> > Are you sure the pte_pfn() conversion is right? Does it need to be
>> > different from the 4K pfn?
> ...
>> pte_page is defined as following to derive page struct from a given pte.
>> This macro is used both in generic mm as well as hugetlb sub-system, so
>> we need do the switch in pte_pfn to mark huge page based linux pte out
>> of normal page based linux pte, that's what L_PTE_HUGEPAGE for.
>>
>> #define pte_page(pte)		pfn_to_page(pte_pfn(pte))
>>
>> So L_PTE_HUGEPAGE is *NOT* set in normal page based linux pte,
>> linux pte bits[31:12] is the page frame number;
>
> I agree.
>
>> otherwise, we got a huge page based linux pte, and linux pte
>> bits[31:20] is page frame number for SECTION mapping, and bits[31:24]
>> is page frame number for SUPER-SECTION mapping.
>
> Actually it is still 31:12 but with bits 19:12 or 23:12 masked out. So
> you do the correct shift by PAGE_SHIFT with the additional masking for
> huge pages (harmless).
>
> But do we actually need this masking? Do the huge_pte_offset() or
> huge_pte_alloc() functions return the Linux pte (pmd) for the huge page?
> If yes, can we not ensure that bits 19:12 are already zero? This
> shouldn't be any different from the 4K Linux pte but with an address
> aligned to 1MB.
>

I'm afraid there is some misunderstanding.
huge_pte_offset() returns the address of the huge Linux pte if it
exists; huge_pte_alloc() allocates a location to store the huge Linux
pte and returns that address; none of the above functions returns the
huge Linux pte *value*.

make_huge_pte() returns the huge Linux pte for a given page and the vma
protection bits; please notice that pte_mkhuge is used to mark this pte
as a huge Linux pte by setting L_PTE_HUGEPAGE, and then set_huge_pte_at()
is used to set the huge Linux pte as well as the huge hardware pte.


static pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page,
				int writable)
{
	pte_t entry;

	if (writable) {
		entry = pte_mkwrite(pte_mkdirty(mk_pte(page,
					vma->vm_page_prot)));
	} else {
		entry = huge_pte_wrprotect(mk_pte(page, vma->vm_page_prot));
	}
	entry = pte_mkyoung(entry);
	entry = pte_mkhuge(entry);

	return entry;
}

Hence, a normal Linux pte must have L_PTE_HUGEPAGE cleared, and
a huge Linux pte must have L_PTE_HUGEPAGE (BIT11) set.
This also leads to L_PTE_HPAGE_2M (BIT12) or L_PTE_HPAGE_16M (BIT13)
being set respectively; that's why the masking is needed in pte_pfn.


+#ifdef CONFIG_ARM_HUGETLB_SUPPORT
+#define L_PTE_HUGEPAGE	(_AT(pteval_t, 1) << 11) /* mark hugepage */
+#define L_PTE_HPAGE_2M  (_AT(pteval_t, 1) << 12) /* only when HUGEPAGE set */
+#define L_PTE_HPAGE_16M (_AT(pteval_t, 1) << 13) /* only when HUGEPAGE set */
+#endif
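
For the 2MB configuration, the corresponding pte_mkhuge would then be
roughly (a sketch only, not the exact patch code):

#define pte_mkhuge(pte)	(__pte(pte_val(pte) | L_PTE_HUGEPAGE | L_PTE_HPAGE_2M))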


> --
> Catalin
>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 2/7] Add various hugetlb page table fix
  2012-02-07 13:24               ` carson bill
@ 2012-02-07 14:11                 ` Catalin Marinas
  2012-02-07 14:46                   ` carson bill
  0 siblings, 1 reply; 50+ messages in thread
From: Catalin Marinas @ 2012-02-07 14:11 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Feb 07, 2012 at 01:24:09PM +0000, carson bill wrote:
> 2012/2/7, Catalin Marinas <catalin.marinas@arm.com>:
> > On Tue, Feb 07, 2012 at 01:42:01AM +0000, bill4carson wrote:
> >> On 2012-02-07 00:26, Catalin Marinas wrote:
> >> > On Wed, Feb 01, 2012 at 03:10:21AM +0000, bill4carson wrote:
> >> >> Why L_PTE_HUGEPAGE is needed?
> >> >>
> >> >> hugetlb subsystem will call pte_page to derive the corresponding page
> >> >> struct from a given pte, and pte_pfn is used first to convert pte into
> >> >> a page frame number.
> >> >
> >> > Are you sure the pte_pfn() conversion is right? Does it need to be
> >> > different from the 4K pfn?
> > ...
> >> pte_page is defined as following to derive page struct from a given pte.
> >> This macro is used both in generic mm as well as hugetlb sub-system, so
> >> we need do the switch in pte_pfn to mark huge page based linux pte out
> >> of normal page based linux pte, that's what L_PTE_HUGEPAGE for.
> >>
> >> #define pte_page(pte)		pfn_to_page(pte_pfn(pte))
> >>
> >> So L_PTE_HUGEPAGE is *NOT* set in normal page based linux pte,
> >> linux pte bits[31:12] is the page frame number;
> >
> > I agree.
> >
> >> otherwise, we got a huge page based linux pte, and linux pte
> >> bits[31:20] is page frame number for SECTION mapping, and bits[31:24]
> >> is page frame number for SUPER-SECTION mapping.
> >
> > Actually it is still 31:12 but with bits 19:12 or 23:12 masked out. So
> > you do the correct shift by PAGE_SHIFT with the additional masking for
> > huge pages (harmless).
> >
> > But do we actually need this masking? Do the huge_pte_offset() or
> > huge_pte_alloc() functions return the Linux pte (pmd) for the huge page?
> > If yes, can we not ensure that bits 19:12 are already zero? This
> > shouldn't be any different from the 4K Linux pte but with an address
> > aligned to 1MB.
> 
> I'm afraid there is some misunderstanding.
> huge_pte_offset() returns the huge linux pte address if they exist;
> huge_pte_alloc()  allocates a location to store huge linux pte, and
> return this address;
> none of the above functions return the huge linux pte *value*.

I agree, huge_pte_offset() returns a pointer to the Linux pte/pmd if it
exists. My point is that the values stored in Linux pte/pmd have bits
20:12 cleared already as the address is at least 2MB aligned (well,
apart from the additional L_PTE_HPAGE_* bits that you declared). Is this
correct? If yes, then you don't need any additional masking for
pte_pfn() even if it is passed a Linux pmd.

> make_huge_pte() will return huge linux pte for a given page and vma
> protection bits,
> please notice pte_mkhuge is used to mark this pte as huge linux pte by setting
> L_PTE_HUGEPAGE, then set_huge_pte_at() is used to set huge linux pte as well
> huge hardware pte.
> 
> 
> 2113static pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page,
> 2114                                int writable)
> 2115{
> 2116        pte_t entry;
> 2117
> 2118        if (writable) {
> 2119                entry =
> 2120                    pte_mkwrite(pte_mkdirty(mk_pte(page,
> vma->vm_page_prot)));
> 2121        } else {
> 2122                entry = huge_pte_wrprotect(mk_pte(page, vma->vm_page_prot));
> 2123        }
> 2124        entry = pte_mkyoung(entry);
> 2125        entry = pte_mkhuge(entry);
> 2126
> 2127        return entry;
> 2128}
> 
> Hence, a normal Linux pte must have L_PTE_HUGEPAGE cleared, and
> a huge Linux pte must have L_PTE_HUGEPAGE (BIT11) set.
> This also leads to L_PTE_HPAGE_2M (BIT12) or L_PTE_HPAGE_16M (BIT13)
> being set respectively; that's why the masking is needed in pte_pfn.

But if you avoid setting L_PTE_HPAGE_*, then we don't need the masking
for pte_pfn. In which case, we don't need to differentiate between a
normal and a huge pte in pte_pfn(), so no need for L_PTE_HUGEPAGE. The
set_huge_pte_at() function is only called with a huge pte, so it doesn't
need to check the L_PTE_HUGEPAGE bit either.
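
(In other words, the existing
	#define pte_pfn(pte)	((pte_val(pte) & PHYS_MASK) >> PAGE_SHIFT)
would then work unchanged for huge ptes too, since a 2MB/16MB aligned
address already has those low bits clear.)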

-- 
Catalin

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 2/7] Add various hugetlb page table fix
  2012-02-07 14:11                 ` Catalin Marinas
@ 2012-02-07 14:46                   ` carson bill
  2012-02-07 15:09                     ` Catalin Marinas
  0 siblings, 1 reply; 50+ messages in thread
From: carson bill @ 2012-02-07 14:46 UTC (permalink / raw)
  To: linux-arm-kernel

2012/2/7, Catalin Marinas <catalin.marinas@arm.com>:
> On Tue, Feb 07, 2012 at 01:24:09PM +0000, carson bill wrote:
>> 2012/2/7, Catalin Marinas <catalin.marinas@arm.com>:
>> > On Tue, Feb 07, 2012 at 01:42:01AM +0000, bill4carson wrote:
>> >> On 2012-02-07 00:26, Catalin Marinas wrote:
>> >> > On Wed, Feb 01, 2012 at 03:10:21AM +0000, bill4carson wrote:
>> >> >> Why L_PTE_HUGEPAGE is needed?
>> >> >>
>> >> >> hugetlb subsystem will call pte_page to derive the corresponding
>> >> >> page
>> >> >> struct from a given pte, and pte_pfn is used first to convert pte
>> >> >> into
>> >> >> a page frame number.
>> >> >
>> >> > Are you sure the pte_pfn() conversion is right? Does it need to be
>> >> > different from the 4K pfn?
>> > ...
>> >> pte_page is defined as following to derive page struct from a given
>> >> pte.
>> >> This macro is used both in generic mm as well as hugetlb sub-system, so
>> >> we need do the switch in pte_pfn to mark huge page based linux pte out
>> >> of normal page based linux pte, that's what L_PTE_HUGEPAGE for.
>> >>
>> >> #define pte_page(pte)		pfn_to_page(pte_pfn(pte))
>> >>
>> >> So L_PTE_HUGEPAGE is *NOT* set in normal page based linux pte,
>> >> linux pte bits[31:12] is the page frame number;
>> >
>> > I agree.
>> >
>> >> otherwise, we got a huge page based linux pte, and linux pte
>> >> bits[31:20] is page frame number for SECTION mapping, and bits[31:24]
>> >> is page frame number for SUPER-SECTION mapping.
>> >
>> > Actually it is still 31:12 but with bits 19:12 or 23:12 masked out. So
>> > you do the correct shift by PAGE_SHIFT with the additional masking for
>> > huge pages (harmless).
>> >
>> > But do we actually need this masking? Do the huge_pte_offset() or
>> > huge_pte_alloc() functions return the Linux pte (pmd) for the huge page?
>> > If yes, can we not ensure that bits 19:12 are already zero? This
>> > shouldn't be any different from the 4K Linux pte but with an address
>> > aligned to 1MB.
>>
>> I'm afraid there is some misunderstanding.
>> huge_pte_offset() returns the huge linux pte address if they exist;
>> huge_pte_alloc()  allocates a location to store huge linux pte, and
>> return this address;
>> none of the above functions return the huge linux pte *value*.
>
> I agree, huge_pte_offset() returns a pointer to the Linux pte/pmd if it
> exists. My point is that the values stored in Linux pte/pmd have bits
> 20:12 cleared already as the address is at least 2MB aligned (well,
> apart from the additional L_PTE_HPAGE_* bits that you declared). Is this
> correct? If yes, then you don't need any additional masking for
> pte_pfn() even if it is passed a Linux pmd.

Yes, pte_pfn doesn't need any modification if we don't need any of the
L_PTE_HPAGE_* bits.


>
>> make_huge_pte() will return huge linux pte for a given page and vma
>> protection bits,
>> please notice pte_mkhuge is used to mark this pte as huge linux pte by
>> setting
>> L_PTE_HUGEPAGE, then set_huge_pte_at() is used to set huge linux pte as
>> well
>> huge hardware pte.
>>
>>
>> 2113static pte_t make_huge_pte(struct vm_area_struct *vma, struct page
>> *page,
>> 2114                                int writable)
>> 2115{
>> 2116        pte_t entry;
>> 2117
>> 2118        if (writable) {
>> 2119                entry =
>> 2120                    pte_mkwrite(pte_mkdirty(mk_pte(page,
>> vma->vm_page_prot)));
>> 2121        } else {
>> 2122                entry = huge_pte_wrprotect(mk_pte(page,
>> vma->vm_page_prot));
>> 2123        }
>> 2124        entry = pte_mkyoung(entry);
>> 2125        entry = pte_mkhuge(entry);
>> 2126
>> 2127        return entry;
>> 2128}
>>
>> Hence, a normal Linux pte must have L_PTE_HUGEPAGE cleared, and
>> a huge Linux pte must have L_PTE_HUGEPAGE (BIT11) set.
>> This also leads to L_PTE_HPAGE_2M (BIT12) or L_PTE_HPAGE_16M (BIT13)
>> being set respectively; that's why the masking is needed in pte_pfn.
>
> But if you avoid setting L_PTE_HPAGE_*, then we don't need the masking
> for pte_pfn. In which case, we don't need to differentiate between a
> normal and a huge pte in pte_pfn(), so no need for L_PTE_HUGEPAGE. The
> set_huge_pte_at() function is only called with a huge pte, so it doesn't
> need to check the L_PTE_HUGEPAGE bit either.
>

I understand what you mean now, and the original design was almost like
you said. But the consequence of eliminating L_PTE_HUGEPAGE as well as
L_PTE_HPAGE_* is that the huge page size becomes fixed at build time; I
mean a boot-time huge page size configuration feature like X86 has will
NOT be feasible anymore!

It looks like we have to make a choice now; what do you think, Catalin?

> --
> Catalin
>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 2/7] Add various hugetlb page table fix
  2012-02-07 14:46                   ` carson bill
@ 2012-02-07 15:09                     ` Catalin Marinas
  2012-02-07 15:41                       ` carson bill
  0 siblings, 1 reply; 50+ messages in thread
From: Catalin Marinas @ 2012-02-07 15:09 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Feb 07, 2012 at 02:46:50PM +0000, carson bill wrote:
> 2012/2/7, Catalin Marinas <catalin.marinas@arm.com>:
> > But if you avoid setting L_PTE_HPAGE_*, then we don't need the
> > masking for pte_pfn. In which case, we don't need to differentiate
> > between a normal and a huge pte in pte_pfn(), so no need for
> > L_PTE_HUGEPAGE. The set_huge_pte_at() function is only called with a
> > huge pte, so it doesn't need to check the L_PTE_HUGEPAGE bit either.
> 
> I understood what you mean now, and the original design is almost like
> you said.  But the consequences of eliminating L_PTE_HUGEPAGE as well
> as L_PTE_HPAGE_* only leave us with huge page size fixed at build
> time, I mean boot time huge page size configuration feature like X86
> will NOT be feasible anymore!

Yes it will :). Just store the page size in some variable that you check
at run-time. We won't support mixed huge page sizes though.

-- 
Catalin

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 2/7] Add various hugetlb page table fix
  2012-02-07 15:09                     ` Catalin Marinas
@ 2012-02-07 15:41                       ` carson bill
  0 siblings, 0 replies; 50+ messages in thread
From: carson bill @ 2012-02-07 15:41 UTC (permalink / raw)
  To: linux-arm-kernel

2012/2/7, Catalin Marinas <catalin.marinas@arm.com>:
> On Tue, Feb 07, 2012 at 02:46:50PM +0000, carson bill wrote:
>> 2012/2/7, Catalin Marinas <catalin.marinas@arm.com>:
>> > But if you avoid setting L_PTE_HPAGE_*, then we don't need the
>> > masking for pte_pfn. In which case, we don't need to differentiate
>> > between a normal and a huge pte in pte_pfn(), so no need for
>> > L_PTE_HUGEPAGE. The set_huge_pte_at() function is only called with a
>> > huge pte, so it doesn't need to check the L_PTE_HUGEPAGE bit either.
>>
>> I understood what you mean now, and the original design is almost like
>> you said.  But the consequences of eliminating L_PTE_HUGEPAGE as well
>> as L_PTE_HPAGE_* only leave us with huge page size fixed at build
>> time, I mean boot time huge page size configuration feature like X86
>> will NOT be feasible anymore!
>
> Yes it will :). Just store the page size in some variable that you check
> at run-time. We won't support mixed huge page sizes though.
>

Got it :)
Please review the other parts of the patch and let me know if any
questions arise.



> --
> Catalin
>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 1/7] Add various hugetlb arm high level hooks
  2012-02-13  9:44 [RFC-PATCH V2] " Bill Carson
@ 2012-02-13  9:44 ` Bill Carson
  2012-02-28 17:30   ` Catalin Marinas
                     ` (2 more replies)
  0 siblings, 3 replies; 50+ messages in thread
From: Bill Carson @ 2012-02-13  9:44 UTC (permalink / raw)
  To: linux-arm-kernel

Signed-off-by: Bill Carson <bill4carson@gmail.com>
---
 arch/arm/include/asm/hugetlb.h |  178 ++++++++++++++++++++++++++++++++++++++++
 arch/arm/include/asm/page.h    |   15 ++++
 arch/arm/mm/hugetlb.c          |  132 +++++++++++++++++++++++++++++
 3 files changed, 325 insertions(+), 0 deletions(-)
 create mode 100644 arch/arm/include/asm/hugetlb.h
 create mode 100644 arch/arm/mm/hugetlb.c

diff --git a/arch/arm/include/asm/hugetlb.h b/arch/arm/include/asm/hugetlb.h
new file mode 100644
index 0000000..3c34528
--- /dev/null
+++ b/arch/arm/include/asm/hugetlb.h
@@ -0,0 +1,178 @@
+/*
+ * hugetlb.h, ARM Huge Tlb Page support.
+ *
+ * Copyright (c) Bill Carson
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+ * See the GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ *
+ */
+
+#ifndef __ASM_HUGETLB_H
+#define __ASM_HUGETLB_H
+
+#include <asm/page.h>
+#include <asm/pgtable-2level.h>
+#include <asm/tlb.h>
+
+
+/* 2M and 16M hugepage linux ptes are stored in mmu_context_t->huge_linux_pte
+ *
+ * 2M hugepage
+ * ===========
+ * one huge linux pte caters to two HW ptes,
+ *
+ * 16M hugepage
+ * ============
+ * one huge linux pte caters for sixteen HW ptes,
+ *
+ * The number of huge linux ptes depends on PAGE_OFFSET configuration
+ * which is defined as following:
+ */
+#define HUGE_LINUX_PTE_COUNT	( PAGE_OFFSET >> HPAGE_SHIFT)
+#define HUGE_LINUX_PTE_SIZE		(HUGE_LINUX_PTE_COUNT * sizeof(pte_t *))
+#define HUGE_LINUX_PTE_INDEX(addr) (addr >> HPAGE_SHIFT)
+
+static inline int is_hugepage_only_range(struct mm_struct *mm,
+					 unsigned long addr,
+					 unsigned long len)
+{
+	return 0;
+}
+
+static inline int prepare_hugepage_range(struct file *file,
+					 unsigned long addr,
+					 unsigned long len)
+{
+	struct hstate *h = hstate_file(file);
+	/* addr/len should be aligned with huge page size */
+	if (len & ~huge_page_mask(h))
+		return -EINVAL;
+	if (addr & ~huge_page_mask(h))
+		return -EINVAL;
+
+	return 0;
+}
+
+static inline void hugetlb_prefault_arch_hook(struct mm_struct *mm)
+{
+}
+
+static inline void hugetlb_free_pgd_range(struct mmu_gather *tlb,
+			unsigned long addr, unsigned long end,
+			unsigned long floor, unsigned long ceiling)
+{
+}
+
+static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
+				   pte_t *ptep, pte_t pte)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	pte_t *linuxpte = mm->context.huge_linux_pte;
+
+	BUG_ON(linuxpte == NULL);
+	BUG_ON(HUGE_LINUX_PTE_INDEX(addr) >= HUGE_LINUX_PTE_COUNT);
+	BUG_ON(ptep != &linuxpte[HUGE_LINUX_PTE_INDEX(addr)]);
+
+	/* set huge linux pte first */
+	*ptep = pte;
+	
+	/* then set hardware pte */
+	addr &= HPAGE_MASK;
+	pgd = pgd_offset(mm, addr);
+	pud = pud_offset(pgd, addr);
+	pmd = pmd_offset(pud, addr);
+	set_hugepte_at(mm, addr, pmd, pte);
+}
+
+static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
+					    unsigned long addr, pte_t *ptep)
+{
+	pte_t pte = *ptep;
+	pte_t fake = L_PTE_YOUNG;
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+
+	/* clear linux pte */
+	*ptep = 0;
+
+	/* let set_hugepte_at clear HW entry */
+	addr &= HPAGE_MASK;
+	pgd = pgd_offset(mm, addr);
+	pud = pud_offset(pgd, addr);
+	pmd = pmd_offset(pud, addr);
+	set_hugepte_at(mm, addr, pmd, fake);
+	return pte;
+}
+
+static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
+					 unsigned long addr, pte_t *ptep)
+{
+	if (HPAGE_SHIFT == SUPERSECTION_SHIFT)
+		flush_tlb_page(vma, addr & SUPERSECTION_MASK);
+	else {
+		flush_tlb_page(vma, addr & SECTION_MASK);
+		flush_tlb_page(vma, (addr & SECTION_MASK)^0x100000);
+	}
+}
+
+static inline int huge_pte_none(pte_t pte)
+{
+	return pte_none(pte);
+}
+
+static inline pte_t huge_pte_wrprotect(pte_t pte)
+{
+	return pte_wrprotect(pte);
+}
+
+static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
+					   unsigned long addr, pte_t *ptep)
+{
+	pte_t old_pte = *ptep;
+	set_huge_pte_at(mm, addr, ptep, pte_wrprotect(old_pte));
+}
+
+static inline pte_t huge_ptep_get(pte_t *ptep)
+{
+	return *ptep;
+}
+
+static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
+					     unsigned long addr,
+					     pte_t *ptep, pte_t pte,
+					     int dirty)
+{
+	int changed = !pte_same(huge_ptep_get(ptep), pte);
+	if (changed) {
+		set_huge_pte_at(vma->vm_mm, addr, ptep, pte);
+		huge_ptep_clear_flush(vma, addr, &pte);
+	}
+
+	return changed;
+}
+
+static inline int arch_prepare_hugepage(struct page *page)
+{
+	return 0;
+}
+
+static inline void arch_release_hugepage(struct page *page)
+{
+}
+
+#endif /* __ASM_HUGETLB_H */
+
diff --git a/arch/arm/include/asm/page.h b/arch/arm/include/asm/page.h
index 97b440c..3e6769a 100644
--- a/arch/arm/include/asm/page.h
+++ b/arch/arm/include/asm/page.h
@@ -15,6 +15,21 @@
 #define PAGE_SIZE		(_AC(1,UL) << PAGE_SHIFT)
 #define PAGE_MASK		(~(PAGE_SIZE-1))
 
+#ifdef CONFIG_HUGEPAGE_SIZE_2MB
+/* we have 2MB hugepage for two 1MB section mapping */
+#define HPAGE_SHIFT		(SECTION_SHIFT + 1)
+#define HPAGE_SIZE		(_AC(1, UL) << HPAGE_SHIFT)
+#define HPAGE_MASK		(~(HPAGE_SIZE - 1))
+#define HUGETLB_PAGE_ORDER	(HPAGE_SHIFT - PAGE_SHIFT)
+#endif
+
+#ifdef CONFIG_HUGEPAGE_SIZE_16MB
+#define HPAGE_SHIFT		SUPERSECTION_SHIFT
+#define HPAGE_SIZE		(_AC(1, UL) << HPAGE_SHIFT)
+#define HPAGE_MASK		(~(HPAGE_SIZE - 1))
+#define HUGETLB_PAGE_ORDER	(HPAGE_SHIFT - PAGE_SHIFT)
+#endif
+
 #ifndef __ASSEMBLY__
 
 #ifndef CONFIG_MMU
diff --git a/arch/arm/mm/hugetlb.c b/arch/arm/mm/hugetlb.c
new file mode 100644
index 0000000..165bd8f
--- /dev/null
+++ b/arch/arm/mm/hugetlb.c
@@ -0,0 +1,132 @@
+/*
+ * hugetlb.c, ARM Huge Tlb Page support.
+ *
+ * Copyright (c) Bill Carson
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+ * See the GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ *
+ */
+
+#include <linux/hugetlb.h>
+
+pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr,
+		      unsigned long sz)
+{
+	pte_t *linuxpte = mm->context.huge_linux_pte;
+	int index;
+
+	if (linuxpte == NULL) {
+		linuxpte = kzalloc(HUGE_LINUX_PTE_SIZE, GFP_ATOMIC);
+		if (linuxpte == NULL) {
+			printk(KERN_ERR "Cannot allocate memory for huge linux pte\n");
+			return NULL;
+		}
+		mm->context.huge_linux_pte = linuxpte;
+	}
+	/* huge page mapping only cover user space address */
+	BUG_ON(HUGE_LINUX_PTE_INDEX(addr) >= HUGE_LINUX_PTE_COUNT);
+	index = HUGE_LINUX_PTE_INDEX(addr);
+	return &linuxpte[index];
+}
+
+pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd = NULL;
+	pte_t *linuxpte = mm->context.huge_linux_pte;
+
+	/* check this mapping exist at pmd level */
+	pgd = pgd_offset(mm, addr);
+	if (pgd_present(*pgd)) {
+		pud = pud_offset(pgd, addr);
+		pmd = pmd_offset(pud, addr);
+		if (!pmd_present(*pmd))
+			return NULL;
+	}
+
+	BUG_ON(HUGE_LINUX_PTE_INDEX(addr) >= HUGE_LINUX_PTE_COUNT);
+	BUG_ON((*pmd & PMD_TYPE_MASK) != PMD_TYPE_SECT);
+	return &linuxpte[HUGE_LINUX_PTE_INDEX(addr)];
+}
+
+int huge_pmd_unshare(struct mm_struct *mm, unsigned long *addr, pte_t *ptep)
+{
+	return 0;
+}
+
+struct page *follow_huge_addr(struct mm_struct *mm, unsigned long address,
+				int write)
+{
+	return ERR_PTR(-EINVAL);
+}
+
+int pmd_huge(pmd_t pmd)
+{
+	return (pmd_val(pmd) & PMD_TYPE_MASK) == PMD_TYPE_SECT;
+}
+
+int pud_huge(pud_t pud)
+{
+	return 0;
+}
+
+struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
+				pmd_t *pmd, int write)
+{
+	struct page *page = NULL;
+	unsigned long pfn;
+
+	BUG_ON((pmd_val(*pmd) & PMD_TYPE_MASK) != PMD_TYPE_SECT);
+	pfn = ((pmd_val(*pmd) & HPAGE_MASK) >> PAGE_SHIFT);
+	page = pfn_to_page(pfn);
+	return page;
+}
+
+static int __init add_huge_page_size(unsigned long long size)
+{
+	int shift = __ffs(size);
+	u32 mmfr3 = 0;
+
+	/* Check that it is a page size supported by the hardware and
+	 * that it fits within pagetable and slice limits. */
+	if (!is_power_of_2(size) || (shift != HPAGE_SHIFT))
+		return -EINVAL;
+
+	/* If user wants super-section support, then check if our cpu
+	 * has this feature supported in ID_MMFR3 */
+	if (shift == SUPERSECTION_SHIFT) {
+		__asm__("mrc p15, 0, %0, c0, c1, 7\n" : "=r" (mmfr3));
+		if (mmfr3 & 0xF0000000) {
+			printk("Super-Section is NOT supported by this CPU, mmfr3:0x%x\n", mmfr3);
+			return -EINVAL;
+		}
+	}
+
+	/* Return if huge page size has already been setup */
+	if (size_to_hstate(size))
+		return 0;
+
+	hugetlb_add_hstate(shift - PAGE_SHIFT);
+	return 0;
+}
+
+static int __init hugepage_setup_sz(char *str)
+{
+	unsigned long long size;
+
+	size = memparse(str, &str);
+	if (add_huge_page_size(size) != 0)
+		printk(KERN_WARNING "Invalid huge page size specified(%llu)\n",
+			 size);
+
+	return 1;
+}
+__setup("hugepagesz=", hugepage_setup_sz);
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 1/7] Add various hugetlb arm high level hooks
  2012-02-13  9:44 ` [PATCH 1/7] Add various hugetlb arm high level hooks Bill Carson
@ 2012-02-28 17:30   ` Catalin Marinas
  2012-02-29  2:34     ` bill4carson
  2012-02-29 10:32   ` Catalin Marinas
  2012-02-29 12:31   ` Arnd Bergmann
  2 siblings, 1 reply; 50+ messages in thread
From: Catalin Marinas @ 2012-02-28 17:30 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Feb 13, 2012 at 09:44:22AM +0000, Bill Carson wrote:
> diff --git a/arch/arm/include/asm/page.h b/arch/arm/include/asm/page.h
> index 97b440c..3e6769a 100644
> --- a/arch/arm/include/asm/page.h
> +++ b/arch/arm/include/asm/page.h
> @@ -15,6 +15,21 @@
>  #define PAGE_SIZE		(_AC(1,UL) << PAGE_SHIFT)
>  #define PAGE_MASK		(~(PAGE_SIZE-1))
>  
> +#ifdef CONFIG_HUGEPAGE_SIZE_2MB
> +/* we have 2MB hugepage for two 1MB section mapping */
> +#define HPAGE_SHIFT		(SECTION_SHIFT + 1)
> +#define HPAGE_SIZE		(_AC(1, UL) << HPAGE_SHIFT)
> +#define HPAGE_MASK		(~(HPAGE_SIZE - 1))
> +#define HUGETLB_PAGE_ORDER	(HPAGE_SHIFT - PAGE_SHIFT)
> +#endif
> +
> +#ifdef CONFIG_HUGEPAGE_SIZE_16MB
> +#define HPAGE_SHIFT		SUPERSECTION_SHIFT
> +#define HPAGE_SIZE		(_AC(1, UL) << HPAGE_SHIFT)
> +#define HPAGE_MASK		(~(HPAGE_SIZE - 1))
> +#define HUGETLB_PAGE_ORDER	(HPAGE_SHIFT - PAGE_SHIFT)
> +#endif

Ah, you still have these just config time options. Can you not make an
hpage_shift variable like PowerPC or IA-64?

(I haven't yet reviewed the rest of the code)

-- 
Catalin

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 1/7] Add various hugetlb arm high level hooks
  2012-02-28 17:30   ` Catalin Marinas
@ 2012-02-29  2:34     ` bill4carson
  2012-02-29  9:39       ` Catalin Marinas
  0 siblings, 1 reply; 50+ messages in thread
From: bill4carson @ 2012-02-29  2:34 UTC (permalink / raw)
  To: linux-arm-kernel



On 2012-02-29 01:30, Catalin Marinas wrote:
> On Mon, Feb 13, 2012 at 09:44:22AM +0000, Bill Carson wrote:
>> diff --git a/arch/arm/include/asm/page.h b/arch/arm/include/asm/page.h
>> index 97b440c..3e6769a 100644
>> --- a/arch/arm/include/asm/page.h
>> +++ b/arch/arm/include/asm/page.h
>> @@ -15,6 +15,21 @@
>>   #define PAGE_SIZE		(_AC(1,UL)<<  PAGE_SHIFT)
>>   #define PAGE_MASK		(~(PAGE_SIZE-1))
>>
>> +#ifdef CONFIG_HUGEPAGE_SIZE_2MB
>> +/* we have 2MB hugepage for two 1MB section mapping */
>> +#define HPAGE_SHIFT		(SECTION_SHIFT + 1)
>> +#define HPAGE_SIZE		(_AC(1, UL)<<  HPAGE_SHIFT)
>> +#define HPAGE_MASK		(~(HPAGE_SIZE - 1))
>> +#define HUGETLB_PAGE_ORDER	(HPAGE_SHIFT - PAGE_SHIFT)
>> +#endif
>> +
>> +#ifdef CONFIG_HUGEPAGE_SIZE_16MB
>> +#define HPAGE_SHIFT		SUPERSECTION_SHIFT
>> +#define HPAGE_SIZE		(_AC(1, UL)<<  HPAGE_SHIFT)
>> +#define HPAGE_MASK		(~(HPAGE_SIZE - 1))
>> +#define HUGETLB_PAGE_ORDER	(HPAGE_SHIFT - PAGE_SHIFT)
>> +#endif
>
> Ah, you still have these just config time options. Can you not make an
> hpage_shift variable like PowerPC or IA-64?
>
Hi, Catalin

Thank you for reviewing this patch.


unsigned int hpage_shift = SECTION_SHIFT + 1; /* default to 2MB page */
#define HPAGE_SHIFT hpage_shift

The user could then configure hpage_shift through the "hugepagesz="
parameter. I think this is what I am supposed to do; if so,
hugepage_setup_sz and cpu_v7_set_hugepte_ext need modification.
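
For example, hugepage_setup_sz could end up along these lines (only a
rough sketch of the idea, not the final V3 code):

unsigned int hpage_shift = SECTION_SHIFT + 1;	/* default to 2MB */

static int __init hugepage_setup_sz(char *str)
{
	unsigned long long size = memparse(str, &str);

	if (size == SZ_2M || size == SZ_16M) {
		hpage_shift = __ffs(size);
		hugetlb_add_hstate(hpage_shift - PAGE_SHIFT);
	} else {
		printk(KERN_WARNING "Invalid huge page size (%llu)\n", size);
	}
	return 1;
}
__setup("hugepagesz=", hugepage_setup_sz);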


> (I haven't yet reviewed the rest of the code)
>


So, should I send another version now, or wait until you walk through
the rest of the patch set?

And again, thanks for reviewing the patch :)


-- 
I am a slow learner
but I will keep trying to fight for my dreams!

--bill

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 1/7] Add various hugetlb arm high level hooks
  2012-02-29  2:34     ` bill4carson
@ 2012-02-29  9:39       ` Catalin Marinas
  2012-02-29 10:21         ` bill4carson
  0 siblings, 1 reply; 50+ messages in thread
From: Catalin Marinas @ 2012-02-29  9:39 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Feb 29, 2012 at 02:34:34AM +0000, bill4carson wrote:
> On 2012-02-29 01:30, Catalin Marinas wrote:
> > On Mon, Feb 13, 2012 at 09:44:22AM +0000, Bill Carson wrote:
> >> diff --git a/arch/arm/include/asm/page.h b/arch/arm/include/asm/page.h
> >> index 97b440c..3e6769a 100644
> >> --- a/arch/arm/include/asm/page.h
> >> +++ b/arch/arm/include/asm/page.h
> >> @@ -15,6 +15,21 @@
> >>   #define PAGE_SIZE		(_AC(1,UL)<<  PAGE_SHIFT)
> >>   #define PAGE_MASK		(~(PAGE_SIZE-1))
> >>
> >> +#ifdef CONFIG_HUGEPAGE_SIZE_2MB
> >> +/* we have 2MB hugepage for two 1MB section mapping */
> >> +#define HPAGE_SHIFT		(SECTION_SHIFT + 1)
> >> +#define HPAGE_SIZE		(_AC(1, UL)<<  HPAGE_SHIFT)
> >> +#define HPAGE_MASK		(~(HPAGE_SIZE - 1))
> >> +#define HUGETLB_PAGE_ORDER	(HPAGE_SHIFT - PAGE_SHIFT)
> >> +#endif
> >> +
> >> +#ifdef CONFIG_HUGEPAGE_SIZE_16MB
> >> +#define HPAGE_SHIFT		SUPERSECTION_SHIFT
> >> +#define HPAGE_SIZE		(_AC(1, UL)<<  HPAGE_SHIFT)
> >> +#define HPAGE_MASK		(~(HPAGE_SIZE - 1))
> >> +#define HUGETLB_PAGE_ORDER	(HPAGE_SHIFT - PAGE_SHIFT)
> >> +#endif
> >
> > Ah, you still have these just config time options. Can you not make an
> > hpage_shift variable like PowerPC or IA-64?
> 
> Thank you for reviewing this patch.

I haven't finished yet (another email to come :)).

> unsigned int hpage_shift = SECTION_SHIFT + 1; /* default to 2MB page */
> #define HPAGE_SHIFT hpage_shift

Something like that. You could even use PMD_SHIFT which is 21 already
and it makes it clearer that we use huge pages at the pmd level.

> So, should I send another version now, or wait until you walk through
> the rest of the patch set?

Just wait, I'll have a look through all the patches.

BTW, there are several coding style issues, I won't go through them but
it would help if you have a look at Documentation/CodingStyle
(especially comment style, if/else brackets).

Thanks.

-- 
Catalin

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 1/7] Add various hugetlb arm high level hooks
  2012-02-29  9:39       ` Catalin Marinas
@ 2012-02-29 10:21         ` bill4carson
  2012-02-29 10:23           ` Catalin Marinas
  0 siblings, 1 reply; 50+ messages in thread
From: bill4carson @ 2012-02-29 10:21 UTC (permalink / raw)
  To: linux-arm-kernel



On 2012-02-29 17:39, Catalin Marinas wrote:
> On Wed, Feb 29, 2012 at 02:34:34AM +0000, bill4carson wrote:
>> On 2012-02-29 01:30, Catalin Marinas wrote:
>>> On Mon, Feb 13, 2012 at 09:44:22AM +0000, Bill Carson wrote:
>>>> diff --git a/arch/arm/include/asm/page.h b/arch/arm/include/asm/page.h
>>>> index 97b440c..3e6769a 100644
>>>> --- a/arch/arm/include/asm/page.h
>>>> +++ b/arch/arm/include/asm/page.h
>>>> @@ -15,6 +15,21 @@
>>>>    #define PAGE_SIZE		(_AC(1,UL)<<   PAGE_SHIFT)
>>>>    #define PAGE_MASK		(~(PAGE_SIZE-1))
>>>>
>>>> +#ifdef CONFIG_HUGEPAGE_SIZE_2MB
>>>> +/* we have 2MB hugepage for two 1MB section mapping */
>>>> +#define HPAGE_SHIFT		(SECTION_SHIFT + 1)
>>>> +#define HPAGE_SIZE		(_AC(1, UL)<<   HPAGE_SHIFT)
>>>> +#define HPAGE_MASK		(~(HPAGE_SIZE - 1))
>>>> +#define HUGETLB_PAGE_ORDER	(HPAGE_SHIFT - PAGE_SHIFT)
>>>> +#endif
>>>> +
>>>> +#ifdef CONFIG_HUGEPAGE_SIZE_16MB
>>>> +#define HPAGE_SHIFT		SUPERSECTION_SHIFT
>>>> +#define HPAGE_SIZE		(_AC(1, UL)<<   HPAGE_SHIFT)
>>>> +#define HPAGE_MASK		(~(HPAGE_SIZE - 1))
>>>> +#define HUGETLB_PAGE_ORDER	(HPAGE_SHIFT - PAGE_SHIFT)
>>>> +#endif
>>>
>>> Ah, you still have these just config time options. Can you not make an
>>> hpage_shift variable like PowerPC or IA-64?
>>
>> Thank you for reviewing this patch.
>
> I haven't finished yet (another email to come :)).
>
I'm looking forward to any comments/suggestions ^_^


>> unsigned int hpage_shift = SECTION_SHIFT + 1; /* default to 2MB page */
>> #define HPAGE_SHIFT hpage_shift
>
> Something like that. You could even use PMD_SHIFT which is 21 already
> and it makes it clearer that we use huge pages at the pmd level.
>
OK, I will keep that in mind for V3.


>> So, should I send another version now, or wait until you walk through
>> the rest of the patch set?
>
> Just wait, I'll have a look through all the patches.
>
> BTW, there are several coding style issues, I won't go through them but
> it would help if you have a look at Documentation/CodingStyle
> (especially comment style, if/else brackets).
>

Yes, I forgot to run checkpatch.pl to sanitize this patch set.
I will keep that in mind for all future versions.

thanks

> Thanks.
>

-- 
I am a slow learner
but I will keep trying to fight for my dreams!

--bill

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 1/7] Add various hugetlb arm high level hooks
  2012-02-29 10:21         ` bill4carson
@ 2012-02-29 10:23           ` Catalin Marinas
  0 siblings, 0 replies; 50+ messages in thread
From: Catalin Marinas @ 2012-02-29 10:23 UTC (permalink / raw)
  To: linux-arm-kernel

On 29 February 2012 10:21, bill4carson <bill4carson@gmail.com> wrote:
> On 2012-02-29 17:39, Catalin Marinas wrote:
>> BTW, there are several coding style issues, I won't go through them but
>> it would help if you have a look at Documentation/CodingStyle
>> (especially comment style, if/else brackets).
>
> Yes, I forgot to run checkpatch.pl to sanitize this patch set.

Don't just rely on checkpatch.pl, have a look at the CodingStyle doc :)

-- 
Catalin

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 1/7] Add various hugetlb arm high level hooks
  2012-02-13  9:44 ` [PATCH 1/7] Add various hugetlb arm high level hooks Bill Carson
  2012-02-28 17:30   ` Catalin Marinas
@ 2012-02-29 10:32   ` Catalin Marinas
  2012-02-29 11:28     ` bill4carson
  2012-02-29 12:31   ` Arnd Bergmann
  2 siblings, 1 reply; 50+ messages in thread
From: Catalin Marinas @ 2012-02-29 10:32 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Feb 13, 2012 at 09:44:22AM +0000, Bill Carson wrote:
> --- /dev/null
> +++ b/arch/arm/include/asm/hugetlb.h
...
> +#include <asm/page.h>
> +#include <asm/pgtable-2level.h>

Just include asm/pgtable.h, it includes the right 2level.h file
automatically (and I plan to add LPAE support as well).

> +static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
> +				   pte_t *ptep, pte_t pte)
> +{
> +	pgd_t *pgd;
> +	pud_t *pud;
> +	pmd_t *pmd;
> +	pte_t *linuxpte = mm->context.huge_linux_pte;
> +
> +	BUG_ON(linuxpte == NULL);
> +	BUG_ON(HUGE_LINUX_PTE_INDEX(addr) >= HUGE_LINUX_PTE_COUNT);
> +	BUG_ON(ptep != &linuxpte[HUGE_LINUX_PTE_INDEX(addr)]);
> +
> +	/* set huge linux pte first */
> +	*ptep = pte;
> +	
> +	/* then set hardware pte */
> +	addr &= HPAGE_MASK;
> +	pgd = pgd_offset(mm, addr);
> +	pud = pud_offset(pgd, addr);
> +	pmd = pmd_offset(pud, addr);
> +	set_hugepte_at(mm, addr, pmd, pte);

You may want to add a comment here that we only have two levels of page
tables (and there is no need for pud_none() checks).

Also I would say the set_hugepte_at function name is easily confused
with set_huge_pte_at(), maybe change it to __set_huge_pte_at() or
something else.
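
Roughly, with both suggestions applied (a sketch only; __set_huge_pte_at() is
the hypothetical new name, not code from this series):

	static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
					   pte_t *ptep, pte_t pte)
	{
		pgd_t *pgd;
		pud_t *pud;
		pmd_t *pmd;

		/* update the shadow Linux pte first */
		*ptep = pte;

		/*
		 * Classic MMU: only two levels of page tables, so the
		 * pud/pmd folding below cannot fail and no pud_none()
		 * checks are needed.
		 */
		addr &= HPAGE_MASK;
		pgd = pgd_offset(mm, addr);
		pud = pud_offset(pgd, addr);
		pmd = pmd_offset(pud, addr);
		__set_huge_pte_at(mm, addr, pmd, pte);	/* formerly set_hugepte_at() */
	}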

> +static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
> +					    unsigned long addr, pte_t *ptep)
> +{
> +	pte_t pte = *ptep;
> +	pte_t fake = L_PTE_YOUNG;
> +	pgd_t *pgd;
> +	pud_t *pud;
> +	pmd_t *pmd;
> +
> +	/* clear linux pte */
> +	*ptep = 0;
> +
> +	/* let set_hugepte_at clear HW entry */
> +	addr &= HPAGE_MASK;
> +	pgd = pgd_offset(mm, addr);
> +	pud = pud_offset(pgd, addr);
> +	pmd = pmd_offset(pud, addr);
> +	set_hugepte_at(mm, addr, pmd, fake);

Same here.

> +static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
> +					   unsigned long addr, pte_t *ptep)
> +{
> +	pte_t old_pte = *ptep;
> +	set_huge_pte_at(mm, addr, ptep, pte_wrprotect(old_pte));
> +}

You could use the generic ptep_set_wrprotect()

> +static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
> +					     unsigned long addr,
> +					     pte_t *ptep, pte_t pte,
> +					     int dirty)
> +{
> +	int changed = !pte_same(huge_ptep_get(ptep), pte);
> +	if (changed) {
> +		set_huge_pte_at(vma->vm_mm, addr, ptep, pte);
> +		huge_ptep_clear_flush(vma, addr, &pte);
> +	}
> +
> +	return changed;
> +}

You could also use the generic ptep_set_access_flags().

> --- /dev/null
> +++ b/arch/arm/mm/hugetlb.c
...
> +pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr,
> +		      unsigned long sz)
> +{
> +	pte_t *linuxpte = mm->context.huge_linux_pte;
> +	int index;
> +
> +	if (linuxpte == NULL) {
> +		linuxpte = kzalloc(HUGE_LINUX_PTE_SIZE, GFP_ATOMIC);
> +		if (linuxpte == NULL) {
> +			printk(KERN_ERR "Cannot allocate memory for huge linux pte\n");

pr_err()?

> +			return NULL;
> +		}
> +		mm->context.huge_linux_pte = linuxpte;
> +	}
> +	/* huge page mapping only cover user space address */
> +	BUG_ON(HUGE_LINUX_PTE_INDEX(addr) >= HUGE_LINUX_PTE_COUNT);
> +	index = HUGE_LINUX_PTE_INDEX(addr);
> +	return &linuxpte[HUGE_LINUX_PTE_INDEX(addr)];
> +}
> +
> +pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
> +{
> +	pgd_t *pgd;
> +	pud_t *pud;
> +	pmd_t *pmd = NULL;
> +	pte_t *linuxpte = mm->context.huge_linux_pte;
> +
> +	/* check this mapping exist at pmd level */
> +	pgd = pgd_offset(mm, addr);
> +	if (pgd_present(*pgd)) {
> +		pud = pud_offset(pgd, addr);
> +		pmd = pmd_offset(pud, addr);
> +		if (!pmd_present(*pmd))
> +			return NULL;
> +	}

You could add checks for the pud as well, they would be optimised out by
the compiler but it would be easier to add support for LPAE as well. In
my LPAE hugetlb implementation, I have something like this:

	pgd = pgd_offset(mm, addr);
	if (pgd_present(*pgd)) {
		pud = pud_offset(pgd, addr);
		if (pud_present(*pud))
			pmd = pmd_offset(pud, addr);
	}

> +	BUG_ON(HUGE_LINUX_PTE_INDEX(addr) >= HUGE_LINUX_PTE_COUNT);
> +	BUG_ON((*pmd & PMD_TYPE_MASK) != PMD_TYPE_SECT);
> +	return &linuxpte[HUGE_LINUX_PTE_INDEX(addr)];

You could add a macro to make it easier for LPAE:

#define huge_pte(mm, pmd, addr)		\
	(&mm->context.huge_linux_pte[HUGE_LINUX_PTE_INDEX(addr)])

With LPAE, it would simply be a (pte_t *)pmd cast.
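
A sketch of the two variants side by side (illustrative only; the classic-MMU
names follow the posted patch, the LPAE one is hypothetical):

	#ifndef CONFIG_ARM_LPAE
	/* classic MMU: the Linux view of the huge pte lives in a per-mm shadow array */
	#define huge_pte(mm, pmd, addr) \
		(&(mm)->context.huge_linux_pte[HUGE_LINUX_PTE_INDEX(addr)])
	#else
	/* LPAE: the pmd entry itself holds the huge pte */
	#define huge_pte(mm, pmd, addr)	((pte_t *)(pmd))
	#endif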

> +int pud_huge(pud_t pud)
> +{
> +	return  0; } struct page * follow_huge_pmd(struct mm_struct *mm, unsigned long address, pmd_t *pmd, int write)

Something went wrong around here.

> +{
> +	struct page *page = NULL;

You don't need to initialise page here.

> +	unsigned long pfn;
> +
> +	BUG_ON((pmd_val(*pmd) & PMD_TYPE_MASK) != PMD_TYPE_SECT);
> +	pfn = ((pmd_val(*pmd) & HPAGE_MASK) >> PAGE_SHIFT);
> +	page = pfn_to_page(pfn);
> +	return page;
> +}
> +
> +static int __init add_huge_page_size(unsigned long long size)
> +{
> +	int shift = __ffs(size);
> +	u32 mmfr3 = 0;
> +
> +	/* Check that it is a page size supported by the hardware and
> +	 * that it fits within pagetable and slice limits. */
> +	if (!is_power_of_2(size) || (shift != HPAGE_SHIFT))
> +		return -EINVAL;

You could use get_order() instead of __ffs(), the latter just finds the
first bit set.

But here you should have set hpage_shift.
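
For example, something along these lines (a sketch under the assumption that
hpage_shift exists and that only PMD_SHIFT/SUPERSECTION_SHIFT sizes are valid;
the hardware feature check is left out):

	static int __init add_huge_page_size(unsigned long long size)
	{
		int shift = __ffs(size);

		if (!is_power_of_2(size))
			return -EINVAL;
		if (shift != PMD_SHIFT && shift != SUPERSECTION_SHIFT)
			return -EINVAL;

		hpage_shift = shift;
		if (!size_to_hstate(size))
			hugetlb_add_hstate(shift - PAGE_SHIFT);
		return 0;
	}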

-- 
Catalin

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 1/7] Add various hugetlb arm high level hooks
  2012-02-29 10:32   ` Catalin Marinas
@ 2012-02-29 11:28     ` bill4carson
  2012-02-29 11:36       ` Catalin Marinas
  2012-02-29 15:38       ` Catalin Marinas
  0 siblings, 2 replies; 50+ messages in thread
From: bill4carson @ 2012-02-29 11:28 UTC (permalink / raw)
  To: linux-arm-kernel



On 2012-02-29 18:32, Catalin Marinas wrote:
> On Mon, Feb 13, 2012 at 09:44:22AM +0000, Bill Carson wrote:
>> --- /dev/null
>> +++ b/arch/arm/include/asm/hugetlb.h
> ...
>> +#include <asm/page.h>
>> +#include <asm/pgtable-2level.h>
>
> Just include asm/pgtable.h, it includes the right 2level.h file
> automatically (and I plan to add LPAE support as well).
>
>> +static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
>> +				   pte_t *ptep, pte_t pte)
>> +{
>> +	pgd_t *pgd;
>> +	pud_t *pud;
>> +	pmd_t *pmd;
>> +	pte_t *linuxpte = mm->context.huge_linux_pte;
>> +
>> +	BUG_ON(linuxpte == NULL);
>> +	BUG_ON(HUGE_LINUX_PTE_INDEX(addr) >= HUGE_LINUX_PTE_COUNT);
>> +	BUG_ON(ptep != &linuxpte[HUGE_LINUX_PTE_INDEX(addr)]);
>> +
>> +	/* set huge linux pte first */
>> +	*ptep = pte;
>> +	
>> +	/* then set hardware pte */
>> +	addr &= HPAGE_MASK;
>> +	pgd = pgd_offset(mm, addr);
>> +	pud = pud_offset(pgd, addr);
>> +	pmd = pmd_offset(pud, addr);
>> +	set_hugepte_at(mm, addr, pmd, pte);
>
> You may want to add a comment here that we only have two levels of page
> tables (and there is no need for pud_none() checks).
>
I will add a comment to clarify this.

> Also I would say the set_hugepte_at function name is easily confused
> with set_huge_pte_at(), maybe change it to __set_huge_pte_at() or
> something else.
>
Yes, I will make that change.


>> +static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
>> +					    unsigned long addr, pte_t *ptep)
>> +{
>> +	pte_t pte = *ptep;
>> +	pte_t fake = L_PTE_YOUNG;
>> +	pgd_t *pgd;
>> +	pud_t *pud;
>> +	pmd_t *pmd;
>> +
>> +	/* clear linux pte */
>> +	*ptep = 0;
>> +
>> +	/* let set_hugepte_at clear HW entry */
>> +	addr &= HPAGE_MASK;
>> +	pgd = pgd_offset(mm, addr);
>> +	pud = pud_offset(pgd, addr);
>> +	pmd = pmd_offset(pud, addr);
>> +	set_hugepte_at(mm, addr, pmd, fake);
>
> Same here.
>
>> +static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
>> +					   unsigned long addr, pte_t *ptep)
>> +{
>> +	pte_t old_pte = *ptep;
>> +	set_huge_pte_at(mm, addr, ptep, pte_wrprotect(old_pte));
>> +}
>
> You could use the generic ptep_set_wrprotect()

I'm a bit confused about this.

The generic ptep_set_wrprotect() cannot set a huge pte; that's why
set_huge_pte_at() is used here instead.


>
>> +static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
>> +					     unsigned long addr,
>> +					     pte_t *ptep, pte_t pte,
>> +					     int dirty)
>> +{
>> +	int changed = !pte_same(huge_ptep_get(ptep), pte);
>> +	if (changed) {
>> +		set_huge_pte_at(vma->vm_mm, addr, ptep, pte);
>> +		huge_ptep_clear_flush(vma, addr, &pte);
>> +	}
>> +
>> +	return changed;
>> +}
>
> You could also use the generic ptep_set_access_flags().

Same as above.

IMHO, we cannot use the generic hooks here, because we are setting the huge
pte with a different API than set_pte_at().


>
>> --- /dev/null
>> +++ b/arch/arm/mm/hugetlb.c
> ...
>> +pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr,
>> +		      unsigned long sz)
>> +{
>> +	pte_t *linuxpte = mm->context.huge_linux_pte;
>> +	int index;
>> +
>> +	if (linuxpte == NULL) {
>> +		linuxpte = kzalloc(HUGE_LINUX_PTE_SIZE, GFP_ATOMIC);
>> +		if (linuxpte == NULL) {
>> +			printk(KERN_ERR "Cannot allocate memory for huge linux pte\n");
>
> pr_err()?
Yes, pr_err should be used in here.
thanks


>
>> +			return NULL;
>> +		}
>> +		mm->context.huge_linux_pte = linuxpte;
>> +	}
>> +	/* huge page mapping only cover user space address */
>> +	BUG_ON(HUGE_LINUX_PTE_INDEX(addr) >= HUGE_LINUX_PTE_COUNT);
>> +	index = HUGE_LINUX_PTE_INDEX(addr);
>> +	return &linuxpte[HUGE_LINUX_PTE_INDEX(addr)];
>> +}
>> +
>> +pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
>> +{
>> +	pgd_t *pgd;
>> +	pud_t *pud;
>> +	pmd_t *pmd = NULL;
>> +	pte_t *linuxpte = mm->context.huge_linux_pte;
>> +
>> +	/* check this mapping exist at pmd level */
>> +	pgd = pgd_offset(mm, addr);
>> +	if (pgd_present(*pgd)) {
>> +		pud = pud_offset(pgd, addr);
>> +		pmd = pmd_offset(pud, addr);
>> +		if (!pmd_present(*pmd))
>> +			return NULL;
>> +	}
>
> You could add checks for the pud as well, they would be optimised out by
> the compiler but it would be easier to add support for LPAE as well. In
> my LPAE hugetlb implementation, I have something like this:
>
> 	pgd = pgd_offset(mm, addr);
> 	if (pgd_present(*pgd)) {
> 		pud = pud_offset(pgd, addr);
> 		if (pud_present(*pud))
> 			pmd = pmd_offset(pud, addr);
> 	}
>
Ok, I will add the pud checks as per your comment.



>> +	BUG_ON(HUGE_LINUX_PTE_INDEX(addr) >= HUGE_LINUX_PTE_COUNT);
>> +	BUG_ON((*pmd & PMD_TYPE_MASK) != PMD_TYPE_SECT);
>> +	return &linuxpte[HUGE_LINUX_PTE_INDEX(addr)];
>
> You could add a macro to make it easier for LPAE:
>
> #define huge_pte(mm, pmd, addr)		\
> 	(&mm->context.huge_linux_pte[HUGE_LINUX_PTE_INDEX(addr)])
>
Nice, I will keep this in V3.


> With LPAE, it would simply be a (pte_t *)pmd cast.
>
>> +int pud_huge(pud_t pud)
>> +{
>> +	return  0; } struct page * follow_huge_pmd(struct mm_struct *mm, unsigned long address, pmd_t *pmd, int write)
>
> Something went wrong around here.
>
Crap! I will make it cleaner next time, I promise!


>> +{
>> +	struct page *page = NULL;
>
> You don't need to initialise page here.
OK, I will drop the "NULL".

>
>> +	unsigned long pfn;
>> +
>> +	BUG_ON((pmd_val(*pmd) & PMD_TYPE_MASK) != PMD_TYPE_SECT);
>> +	pfn = ((pmd_val(*pmd) & HPAGE_MASK) >> PAGE_SHIFT);
>> +	page = pfn_to_page(pfn);
>> +	return page;
>> +}
>> +
>> +static int __init add_huge_page_size(unsigned long long size)
>> +{
>> +	int shift = __ffs(size);
>> +	u32 mmfr3 = 0;
>> +
>> +	/* Check that it is a page size supported by the hardware and
>> +	 * that it fits within pagetable and slice limits. */
>> +	if (!is_power_of_2(size) || (shift != HPAGE_SHIFT))
>> +		return -EINVAL;
>
> You could use get_order() instead of __ffs(), the latter just finds the
> first bit set.
With all due respect, I'm afraid I can't agree with you on this.
Here, we should use __ffs() to return the "shift", not the order.

For "hugepagesz=2M", hpage_shift/HPAGE_SHIFT should be set to 21,
*not* to the order 9 (21 - 12); that's what HUGETLB_PAGE_ORDER is for.
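
A quick check of the arithmetic (illustrative fragment only):

	/* hugepagesz=2M: size = 2UL << 20, PAGE_SHIFT = 12 */
	shift = __ffs(2UL << 20);	/* 21 -> what hpage_shift needs      */
	order = get_order(2UL << 20);	/* 9  -> 21 - 12, HUGETLB_PAGE_ORDER */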


>
> But here you should have set hpage_shift.
>

-- 
I am a slow learner
but I will keep trying to fight for my dreams!

--bill

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 1/7] Add various hugetlb arm high level hooks
  2012-02-29 11:28     ` bill4carson
@ 2012-02-29 11:36       ` Catalin Marinas
  2012-02-29 15:38       ` Catalin Marinas
  1 sibling, 0 replies; 50+ messages in thread
From: Catalin Marinas @ 2012-02-29 11:36 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Feb 29, 2012 at 11:28:30AM +0000, bill4carson wrote:
> On 2012-02-29 18:32, Catalin Marinas wrote:
> > On Mon, Feb 13, 2012 at 09:44:22AM +0000, Bill Carson wrote:
> >> +static int __init add_huge_page_size(unsigned long long size)
> >> +{
> >> +	int shift = __ffs(size);
> >> +	u32 mmfr3 = 0;
> >> +
> >> +	/* Check that it is a page size supported by the hardware and
> >> +	 * that it fits within pagetable and slice limits. */
> >> +	if (!is_power_of_2(size) || (shift != HPAGE_SHIFT))
> >> +		return -EINVAL;
> >
> > You could use get_order() instead of __ffs(), the latter just finds the
> > first bit set.
> 
> With all due respect, I'm afraid I can't agree with you on this.
> Here, we should use __ffs() to return the "shift", not the order.
> 
> For "hugepagesz=2M", hpage_shift/HPAGE_SHIFT should be set to 21,
> *not* to the order 9 (21 - 12); that's what HUGETLB_PAGE_ORDER is for.

I agree (I got confused by get_order() and log2()).

-- 
Catalin

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 1/7] Add various hugetlb arm high level hooks
  2012-02-13  9:44 ` [PATCH 1/7] Add various hugetlb arm high level hooks Bill Carson
  2012-02-28 17:30   ` Catalin Marinas
  2012-02-29 10:32   ` Catalin Marinas
@ 2012-02-29 12:31   ` Arnd Bergmann
  2012-02-29 14:22     ` Catalin Marinas
  2 siblings, 1 reply; 50+ messages in thread
From: Arnd Bergmann @ 2012-02-29 12:31 UTC (permalink / raw)
  To: linux-arm-kernel

On Monday 13 February 2012, Bill Carson wrote:
> +
> +/* 2M and 16M hugepage linux ptes are stored in mmu_context_t->huge_linux_pte
> + *
> + * 2M hugepage
> + * ===========
> + * one huge linux pte caters to two HW ptes,
> + *

I think this needs more explanation. Why do you use a 2MB hugepage with two ptes here?
Wouldn't it be more logical to use 1MB hugepages?

	Arnd

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 1/7] Add various hugetlb arm high level hooks
  2012-02-29 12:31   ` Arnd Bergmann
@ 2012-02-29 14:22     ` Catalin Marinas
  0 siblings, 0 replies; 50+ messages in thread
From: Catalin Marinas @ 2012-02-29 14:22 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Feb 29, 2012 at 12:31:51PM +0000, Arnd Bergmann wrote:
> On Monday 13 February 2012, Bill Carson wrote:
> > +
> > +/* 2M and 16M hugepage linux ptes are stored in mmu_context_t->huge_linux_pte
> > + *
> > + * 2M hugepage
> > + * ===========
> > + * one huge linux pte caters to two HW ptes,
> > + *
> 
> I think this needs more explanation. Why do you use a 2MB hugepage with two ptes here?
> Wouldn't it be more logical to use 1MB hugepages?

That's because of how pgd_t is defined for the classic MMU (PMD_SHIFT also
being 21). I think Bill explained in the first version of the patch series why
using 1MB for huge pages doesn't work that well. But I agree, the
explanation should go in here as well.
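
For reference, roughly the classic-MMU layout being referred to (paraphrased
from the 2-level headers, not part of this patch set):

	/* arch/arm/include/asm/pgtable-2level-types.h (approximately) */
	typedef struct { pmdval_t pgd[2]; } pgd_t;	/* one pgd_t covers two 1MB sections */

	/* arch/arm/include/asm/pgtable-2level.h */
	#define PMD_SHIFT	21
	#define PGDIR_SHIFT	21

so a single pgd entry always spans 2MB, which is why the huge page size
exported to user space is 2MB rather than 1MB.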

-- 
Catalin

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 1/7] Add various hugetlb arm high level hooks
  2012-02-29 11:28     ` bill4carson
  2012-02-29 11:36       ` Catalin Marinas
@ 2012-02-29 15:38       ` Catalin Marinas
  2012-03-08  0:35         ` bill4carson
  1 sibling, 1 reply; 50+ messages in thread
From: Catalin Marinas @ 2012-02-29 15:38 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Feb 29, 2012 at 11:28:30AM +0000, bill4carson wrote:
> On 2012-02-29 18:32, Catalin Marinas wrote:
> > On Mon, Feb 13, 2012 at 09:44:22AM +0000, Bill Carson wrote:
> >> +static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
> >> +					   unsigned long addr, pte_t *ptep)
> >> +{
> >> +	pte_t old_pte = *ptep;
> >> +	set_huge_pte_at(mm, addr, ptep, pte_wrprotect(old_pte));
> >> +}
> >
> > You could use the generic ptep_set_wrprotect()
> 
> I'm a bit confused about this.
> 
> The generic ptep_set_wrprotect() cannot set a huge pte; that's why
> set_huge_pte_at() is used here instead.

Ah, the generic one can only work with LPAE, where set_huge_pte_at()
is just a set_pte_at(). So this part looks good.
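
i.e. something like this hypothetical LPAE variant (not code from this
series), where the huge pte is the pmd entry itself and no shadow array is
needed:

	static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
					   pte_t *ptep, pte_t pte)
	{
		/* ptep already points at the pmd-level entry covering the 2MB block */
		set_pte_at(mm, addr, ptep, pte);
	}

which is why the generic ptep_set_wrprotect() works there but not on the
classic MMU, where the hardware section entries have to be rewritten
separately.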

-- 
Catalin

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 1/7] Add various hugetlb arm high level hooks
  2012-02-29 15:38       ` Catalin Marinas
@ 2012-03-08  0:35         ` bill4carson
  2012-03-08  9:21           ` Catalin Marinas
  0 siblings, 1 reply; 50+ messages in thread
From: bill4carson @ 2012-03-08  0:35 UTC (permalink / raw)
  To: linux-arm-kernel



On 2012-02-29 23:38, Catalin Marinas wrote:
> On Wed, Feb 29, 2012 at 11:28:30AM +0000, bill4carson wrote:
>> On 2012?02?29? 18:32, Catalin Marinas wrote:
>>> On Mon, Feb 13, 2012 at 09:44:22AM +0000, Bill Carson wrote:
>>>> +static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
>>>> +					   unsigned long addr, pte_t *ptep)
>>>> +{
>>>> +	pte_t old_pte = *ptep;
>>>> +	set_huge_pte_at(mm, addr, ptep, pte_wrprotect(old_pte));
>>>> +}
>>>
>>> You could use the generic ptep_set_wrprotect()
>>
>> I'm a bit confused about this.
>>
>> The generic ptep_set_wrprotect() cannot set a huge pte; that's why
>> set_huge_pte_at() is used here instead.
>
> Ah, the generic one can only work with LPAE, where set_huge_pte_at()
> is just a set_pte_at(). So this part looks good.
>
Hi, Catalin

Thanks for taking the time to review PATCH 1/7 :)

I'm wondering if the rest of this patch set is OK;
I would like to start working on V3 as per your comments.



-- 
I am a slow learner
but I will keep trying to fight for my dreams!

--bill

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 1/7] Add various hugetlb arm high level hooks
  2012-03-08  0:35         ` bill4carson
@ 2012-03-08  9:21           ` Catalin Marinas
  0 siblings, 0 replies; 50+ messages in thread
From: Catalin Marinas @ 2012-03-08  9:21 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Mar 08, 2012 at 12:35:36AM +0000, bill4carson wrote:
> On 2012-02-29 23:38, Catalin Marinas wrote:
> > On Wed, Feb 29, 2012 at 11:28:30AM +0000, bill4carson wrote:
> >> On 2012-02-29 18:32, Catalin Marinas wrote:
> >>> On Mon, Feb 13, 2012 at 09:44:22AM +0000, Bill Carson wrote:
> >>>> +static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
> >>>> +					   unsigned long addr, pte_t *ptep)
> >>>> +{
> >>>> +	pte_t old_pte = *ptep;
> >>>> +	set_huge_pte_at(mm, addr, ptep, pte_wrprotect(old_pte));
> >>>> +}
> >>>
> >>> You could use the generic ptep_set_wrprotect()
> >>
> >> I'm a bit confused about this.
> >>
> >> The generic ptep_set_wrprotect() cannot set a huge pte; that's why
> >> set_huge_pte_at() is used here instead.
> >
> > Ah, the generic one can only work with LPAE, where set_huge_pte_at()
> > is just a set_pte_at(). So this part looks good.
> 
> Thanks for taking the time to review PATCH 1/7 :)
> 
> I'm wondering if the rest of this patch set is OK;
> I would like to start working on V3 as per your comments.

I haven't finished looking through the patches. Probably I will by the
end of this week.

-- 
Catalin

^ permalink raw reply	[flat|nested] 50+ messages in thread

end of thread, other threads:[~2012-03-08  9:21 UTC | newest]

Thread overview: 50+ messages
-- links below jump to the message on this page --
2012-01-30  7:57 [RFC] ARM hugetlb support bill4carson at gmail.com
2012-01-30  7:57 ` [PATCH 1/7] Add various hugetlb arm high level hooks bill4carson at gmail.com
2012-02-06 17:07   ` Catalin Marinas
2012-02-07  2:00     ` bill4carson
2012-02-07 11:54       ` Catalin Marinas
2012-02-07 12:15   ` Catalin Marinas
2012-02-07 12:57     ` carson bill
2012-01-30  7:57 ` [PATCH 2/7] Add various hugetlb page table fix bill4carson at gmail.com
2012-01-31  9:57   ` Catalin Marinas
2012-01-31  9:58   ` Russell King - ARM Linux
2012-01-31 12:25     ` Catalin Marinas
2012-02-01  3:10       ` bill4carson
2012-02-06 16:26         ` Catalin Marinas
2012-02-07  1:42           ` bill4carson
2012-02-07 11:50             ` Catalin Marinas
2012-02-07 13:24               ` carson bill
2012-02-07 14:11                 ` Catalin Marinas
2012-02-07 14:46                   ` carson bill
2012-02-07 15:09                     ` Catalin Marinas
2012-02-07 15:41                       ` carson bill
2012-01-30  7:57 ` [PATCH 3/7] Introduce set_hugepte_ext api for huge page hardware page table setup bill4carson at gmail.com
2012-01-30  7:57 ` [PATCH 4/7] Store huge page linux pte in mm_struct bill4carson at gmail.com
2012-01-31  9:37   ` Catalin Marinas
2012-01-31 10:01   ` Russell King - ARM Linux
2012-02-01  5:45     ` bill4carson
2012-02-06  2:04       ` bill4carson
2012-02-06 10:29         ` Catalin Marinas
2012-02-06 14:40           ` carson bill
2012-01-30  7:57 ` [PATCH 5/7] Using do_page_fault for section fault handling bill4carson at gmail.com
2012-01-30  7:57 ` [PATCH 6/7] Add hugetlb Kconfig option bill4carson at gmail.com
2012-01-30  7:57 ` [PATCH 7/7] Minor compiling fix bill4carson at gmail.com
2012-01-31  9:29 ` [RFC] ARM hugetlb support Catalin Marinas
2012-02-01  1:56   ` bill4carson
2012-02-02 14:38     ` Catalin Marinas
2012-02-03  1:41       ` bill4carson
2012-02-06 16:29         ` Catalin Marinas
  -- strict thread matches above, loose matches on Subject: below --
2012-02-13  9:44 [RFC-PATCH V2] " Bill Carson
2012-02-13  9:44 ` [PATCH 1/7] Add various hugetlb arm high level hooks Bill Carson
2012-02-28 17:30   ` Catalin Marinas
2012-02-29  2:34     ` bill4carson
2012-02-29  9:39       ` Catalin Marinas
2012-02-29 10:21         ` bill4carson
2012-02-29 10:23           ` Catalin Marinas
2012-02-29 10:32   ` Catalin Marinas
2012-02-29 11:28     ` bill4carson
2012-02-29 11:36       ` Catalin Marinas
2012-02-29 15:38       ` Catalin Marinas
2012-03-08  0:35         ` bill4carson
2012-03-08  9:21           ` Catalin Marinas
2012-02-29 12:31   ` Arnd Bergmann
2012-02-29 14:22     ` Catalin Marinas
