* [PATCH 0/4] sparc64: Add 16GB hugepage support
From: Nitin Gupta @ 2017-06-21 1:05 UTC (permalink / raw)
To: sparclinux
The SPARC architecture supports 16G hugepages, but the kernel did not
support them. This patch series adds that support and also cleans
up some page walk/alloc functions.
Patch 1/4: Core changes needed to add 16G hugepage support: To map a
single 16G hugepage, two PUD entries are used. Each PUD entry maps an
8G portion of the 16G page. This page table encoding scheme is the
same as that used for hugepages at the PMD level (8M, 256M and 2G
pages), where each PMD entry points successively to 8M regions within
a page. No page table entries below the PUD level are allocated for a
16G hugepage since they are not required.
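As a rough illustration of the encoding described above, the sketch
below computes how many PUD entries a 16G page needs and which physical
address each entry would point at. It assumes PUD_SHIFT is 33 (one PUD
entry covers 8G, as stated above); the helper names pud_entries_for()
and pud_target() are hypothetical and do not appear in the patches.

```c
#include <assert.h>
#include <stdint.h>

#define PUD_SHIFT        33                      /* assumed: one PUD entry covers 8G */
#define PUD_SIZE         (UINT64_C(1) << PUD_SHIFT)
#define HPAGE_16GB_SHIFT 34                      /* value added to page_64.h by patch 1 */

/* Number of PUD entries needed to map one hugepage of the given shift. */
unsigned int pud_entries_for(unsigned int hugepage_shift)
{
	assert(hugepage_shift >= PUD_SHIFT);
	return (unsigned int)((UINT64_C(1) << hugepage_shift) >> PUD_SHIFT);
}

/* Physical address that the idx-th PUD entry of a hugepage points at. */
uint64_t pud_target(uint64_t hugepage_paddr, unsigned int idx)
{
	return hugepage_paddr + (uint64_t)idx * PUD_SIZE;
}
```

For a 16G page this gives two entries, the second one pointing 8G past
the start of the page, which matches the "size >> shift" computation
used for nptes in set_huge_pte_at().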
TSB entries for a 16G page are created at every 4M boundary because
these pages use the HUGE_TSB, which is configured with a page size of
4M. When walking page tables (on a TSB miss), bits [32:22] are
transferred from vaddr to the PUD entry to resolve addresses at a 4M
boundary. The resolved address mapping is then stored in the HUGE_TSB.
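The bit transfer can be sketched in plain C as follows. This is only a
model of the masking step, mirroring the 0x1ffc00000UL mask used in
update_mmu_cache() in patch 1; the macro PUD_4M_MASK and the function
resolve_4m() are hypothetical names, and PUD_SHIFT = 33 is an
assumption derived from the 8G-per-PUD-entry layout.

```c
#include <stdint.h>

#define REAL_HPAGE_SHIFT 22   /* 4M, the page size the HUGE_TSB is configured with */
#define PUD_SHIFT        33   /* assumed: one PUD entry covers 8G */

/* Mask covering bits [32:22], i.e. 0x1ffc00000. */
#define PUD_4M_MASK \
	((UINT64_C(1) << PUD_SHIFT) - (UINT64_C(1) << REAL_HPAGE_SHIFT))

/*
 * Replace bits [32:22] of the PUD value with the same bits of the
 * faulting virtual address, so the result names the 4M region that
 * contains vaddr. All other bits of the PUD value are preserved.
 */
uint64_t resolve_4m(uint64_t pud_val, uint64_t vaddr)
{
	pud_val &= ~PUD_4M_MASK;
	pud_val |= vaddr & PUD_4M_MASK;
	return pud_val;
}
```

Bits below 22 of the virtual address are dropped on purpose: they are
the offset inside the 4M region and are handled by the TLB entry
itself.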
Patch 2/4: get_user_pages() etc. are used for direct IO. These
functions were not aware of hugepages at the PUD level and would try
to continue walking page tables beyond the PUD level. Since 16G
hugepages have page tables allocated only down to the PUD level,
walking further would result in invalid accesses. This patch adds the
case for PUD huge pages to these functions.
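When the walk stops at a huge PUD, the page backing a given address is
found by offsetting from the page the PUD entry points at. The sketch
below shows only that index arithmetic, in the spirit of the
"(addr & ~PUD_MASK) >> PAGE_SHIFT" expression in gup_huge_pud(). It
assumes 8K base pages (PAGE_SHIFT = 13) and PUD_SHIFT = 33; the
function name pud_page_index() is hypothetical.

```c
#include <stdint.h>

#define PAGE_SHIFT 13   /* assumed: 8K base pages on sparc64 */
#define PUD_SHIFT  33   /* assumed: one PUD entry covers 8G */
#define PUD_MASK   (~((UINT64_C(1) << PUD_SHIFT) - 1))

/*
 * Index of the base page that contains addr, counted from the first
 * page of the 8G region mapped by the PUD entry.
 */
uint64_t pud_page_index(uint64_t addr)
{
	return (addr & ~PUD_MASK) >> PAGE_SHIFT;
}
```

An 8G region holds 2^20 base pages of 8K each, so the index ranges from
0 to 1048575.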
Patch 3/4: Fixes a bug in gup_huge_pmd() which caused the wrong page
within a hugepage to be considered the head page. gup_huge_pud()
already has this fix.
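A toy model of the head-page problem, assuming a much simplified
stand-in for struct page: every tail page stores a pointer to its head,
and a PMD entry for a later 8M region of a larger hugepage points at a
tail page. All names here (toy_page, toy_compound_head() and so on) are
hypothetical and only mimic what PageTail() and compound_head() do in
the kernel.

```c
#include <stddef.h>

/* Simplified page descriptor: head is NULL for a head page. */
struct toy_page {
	struct toy_page *head;
};

/* Return the head of the compound page that p belongs to. */
struct toy_page *toy_compound_head(struct toy_page *p)
{
	return p->head ? p->head : p;
}

/* Turn pages[0..n-1] into one compound page headed by pages[0]. */
void toy_make_compound(struct toy_page *pages, size_t n)
{
	size_t i;

	pages[0].head = NULL;
	for (i = 1; i < n; i++)
		pages[i].head = &pages[0];
}

/* First page of the region-th PMD-sized chunk of the compound page. */
struct toy_page *toy_pmd_page(struct toy_page *pages, size_t region,
			      size_t pages_per_region)
{
	return &pages[region * pages_per_region];
}
```

For any region other than the first, toy_pmd_page() returns a tail
page, so treating it as the head would be wrong; passing it through
toy_compound_head() recovers the real head.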
Patch 4/4: Patch 1 added the case of PUD entry being huge in page
table walk and alloc functions. This new case further increased
nesting in these functions and made them harder to follow. This
patch flattens these functions for better readability.
Cc: sparclinux@vger.kernel.org
Changelog v4 vs v3:
- Added cover letter (patch 0/4) for patch series.
Changelog v3 vs v2:
- Fixed email headers so the subject shows up correctly.
Changelog v2 vs v1:
- Remove redundant brgez,pn (Bob Picco)
- Remove unnecessary label rename from 700 to 701 (Rob Gardner)
- Add patch description (Paul)
- Add 16G case to get_user_pages()
Nitin Gupta (4):
sparc64: Add 16GB hugepage support
sparc64: Support huge PUD case in get_user_pages
sparc64: Fix gup_huge_pmd
sparc64: Cleanup hugepage table walk functions
arch/sparc/include/asm/page_64.h | 3 +-
arch/sparc/include/asm/pgtable_64.h | 20 ++++++-
arch/sparc/include/asm/tsb.h | 30 +++++++++++
arch/sparc/kernel/tsb.S | 2 +-
arch/sparc/mm/gup.c | 49 ++++++++++++++++-
arch/sparc/mm/hugetlbpage.c | 102 +++++++++++++++++++++---------------
arch/sparc/mm/init_64.c | 41 ++++++++++++---
7 files changed, 193 insertions(+), 54 deletions(-)
--
2.9.2
* [PATCH 1/4] sparc64: Add 16GB hugepage support
From: Nitin Gupta @ 2017-06-21 1:05 UTC (permalink / raw)
To: David S. Miller
Cc: Nitin Gupta, Mike Kravetz, Kirill A. Shutemov, Tom Hromatka,
Michal Hocko, Ingo Molnar, Hugh Dickins, bob picco,
Pavel Tatashin, Steven Sistare, Paul Gortmaker, Thomas Tai,
Atish Patra, sparclinux, linux-kernel
Adds support for the 16GB hugepage size. To use this page size,
pass kernel parameters such as:
default_hugepagesz=16G hugepagesz=16G hugepages=10
Testing:
Tested with the stream benchmark which allocates 48G of
arrays backed by 16G hugepages and does RW operation on
them in parallel.
Orabug: 25362942
Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com>
---
arch/sparc/include/asm/page_64.h | 3 +-
arch/sparc/include/asm/pgtable_64.h | 5 +++
arch/sparc/include/asm/tsb.h | 30 +++++++++++++++
arch/sparc/kernel/tsb.S | 2 +-
arch/sparc/mm/hugetlbpage.c | 74 ++++++++++++++++++++++++++-----------
arch/sparc/mm/init_64.c | 41 ++++++++++++++++----
6 files changed, 125 insertions(+), 30 deletions(-)
diff --git a/arch/sparc/include/asm/page_64.h b/arch/sparc/include/asm/page_64.h
index 5961b2d..8ee1f97 100644
--- a/arch/sparc/include/asm/page_64.h
+++ b/arch/sparc/include/asm/page_64.h
@@ -17,6 +17,7 @@
#define HPAGE_SHIFT 23
#define REAL_HPAGE_SHIFT 22
+#define HPAGE_16GB_SHIFT 34
#define HPAGE_2GB_SHIFT 31
#define HPAGE_256MB_SHIFT 28
#define HPAGE_64K_SHIFT 16
@@ -28,7 +29,7 @@
#define HUGETLB_PAGE_ORDER (HPAGE_SHIFT - PAGE_SHIFT)
#define HAVE_ARCH_HUGETLB_UNMAPPED_AREA
#define REAL_HPAGE_PER_HPAGE (_AC(1,UL) << (HPAGE_SHIFT - REAL_HPAGE_SHIFT))
-#define HUGE_MAX_HSTATE 4
+#define HUGE_MAX_HSTATE 5
#endif
#ifndef __ASSEMBLY__
diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h
index 6fbd931..2444b02 100644
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -414,6 +414,11 @@ static inline bool is_hugetlb_pmd(pmd_t pmd)
return !!(pmd_val(pmd) & _PAGE_PMD_HUGE);
}
+static inline bool is_hugetlb_pud(pud_t pud)
+{
+ return !!(pud_val(pud) & _PAGE_PUD_HUGE);
+}
+
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
static inline pmd_t pmd_mkhuge(pmd_t pmd)
{
diff --git a/arch/sparc/include/asm/tsb.h b/arch/sparc/include/asm/tsb.h
index 32258e0..7b240a3 100644
--- a/arch/sparc/include/asm/tsb.h
+++ b/arch/sparc/include/asm/tsb.h
@@ -195,6 +195,35 @@ extern struct tsb_phys_patch_entry __tsb_phys_patch, __tsb_phys_patch_end;
nop; \
699:
+ /* PUD has been loaded into REG1, interpret the value, seeing
+ * if it is a HUGE PUD or a normal one. If it is not valid
+ * then jump to FAIL_LABEL. If it is a HUGE PUD, and it
+ * translates to a valid PTE, branch to PTE_LABEL.
+ *
+ * We have to propagate bits [32:22] from the virtual address
+ * to resolve at 4M granularity.
+ */
+#if defined(CONFIG_HUGETLB_PAGE) || defined(CONFIG_TRANSPARENT_HUGEPAGE)
+#define USER_PGTABLE_CHECK_PUD_HUGE(VADDR, REG1, REG2, FAIL_LABEL, PTE_LABEL) \
+ brz,pn REG1, FAIL_LABEL; \
+ sethi %uhi(_PAGE_PUD_HUGE), REG2; \
+ sllx REG2, 32, REG2; \
+ andcc REG1, REG2, %g0; \
+ be,pt %xcc, 700f; \
+ sethi %hi(0x1ffc0000), REG2; \
+ sllx REG2, 1, REG2; \
+ brgez,pn REG1, FAIL_LABEL; \
+ andn REG1, REG2, REG1; \
+ and VADDR, REG2, REG2; \
+ brlz,pt REG1, PTE_LABEL; \
+ or REG1, REG2, REG1; \
+700:
+#else
+#define USER_PGTABLE_CHECK_PUD_HUGE(VADDR, REG1, REG2, FAIL_LABEL, PTE_LABEL) \
+ brz,pn REG1, FAIL_LABEL; \
+ nop;
+#endif
+
/* PMD has been loaded into REG1, interpret the value, seeing
* if it is a HUGE PMD or a normal one. If it is not valid
* then jump to FAIL_LABEL. If it is a HUGE PMD, and it
@@ -242,6 +271,7 @@ extern struct tsb_phys_patch_entry __tsb_phys_patch, __tsb_phys_patch_end;
srlx REG2, 64 - PAGE_SHIFT, REG2; \
andn REG2, 0x7, REG2; \
ldxa [REG1 + REG2] ASI_PHYS_USE_EC, REG1; \
+ USER_PGTABLE_CHECK_PUD_HUGE(VADDR, REG1, REG2, FAIL_LABEL, 800f) \
brz,pn REG1, FAIL_LABEL; \
sllx VADDR, 64 - (PMD_SHIFT + PMD_BITS), REG2; \
srlx REG2, 64 - PAGE_SHIFT, REG2; \
diff --git a/arch/sparc/kernel/tsb.S b/arch/sparc/kernel/tsb.S
index 07c0df9..5f42ac0 100644
--- a/arch/sparc/kernel/tsb.S
+++ b/arch/sparc/kernel/tsb.S
@@ -117,7 +117,7 @@ tsb_miss_page_table_walk_sun4v_fastpath:
/* Valid PTE is now in %g5. */
#if defined(CONFIG_HUGETLB_PAGE) || defined(CONFIG_TRANSPARENT_HUGEPAGE)
- sethi %uhi(_PAGE_PMD_HUGE), %g7
+ sethi %uhi(_PAGE_PMD_HUGE | _PAGE_PUD_HUGE), %g7
sllx %g7, 32, %g7
andcc %g5, %g7, %g0
diff --git a/arch/sparc/mm/hugetlbpage.c b/arch/sparc/mm/hugetlbpage.c
index 88855e3..f0bb42d 100644
--- a/arch/sparc/mm/hugetlbpage.c
+++ b/arch/sparc/mm/hugetlbpage.c
@@ -143,6 +143,10 @@ static pte_t sun4v_hugepage_shift_to_tte(pte_t entry, unsigned int shift)
pte_val(entry) = pte_val(entry) & ~_PAGE_SZALL_4V;
switch (shift) {
+ case HPAGE_16GB_SHIFT:
+ hugepage_size = _PAGE_SZ16GB_4V;
+ pte_val(entry) |= _PAGE_PUD_HUGE;
+ break;
case HPAGE_2GB_SHIFT:
hugepage_size = _PAGE_SZ2GB_4V;
pte_val(entry) |= _PAGE_PMD_HUGE;
@@ -187,6 +191,9 @@ static unsigned int sun4v_huge_tte_to_shift(pte_t entry)
unsigned int shift;
switch (tte_szbits) {
+ case _PAGE_SZ16GB_4V:
+ shift = HPAGE_16GB_SHIFT;
+ break;
case _PAGE_SZ2GB_4V:
shift = HPAGE_2GB_SHIFT;
break;
@@ -263,7 +270,12 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
pgd = pgd_offset(mm, addr);
pud = pud_alloc(mm, pgd, addr);
- if (pud) {
+ if (!pud)
+ return NULL;
+
+ if (sz >= PUD_SIZE)
+ pte = (pte_t *)pud;
+ else {
pmd = pmd_alloc(mm, pud, addr);
if (!pmd)
return NULL;
@@ -288,12 +300,16 @@ pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
if (!pgd_none(*pgd)) {
pud = pud_offset(pgd, addr);
if (!pud_none(*pud)) {
- pmd = pmd_offset(pud, addr);
- if (!pmd_none(*pmd)) {
- if (is_hugetlb_pmd(*pmd))
- pte = (pte_t *)pmd;
- else
- pte = pte_offset_map(pmd, addr);
+ if (is_hugetlb_pud(*pud))
+ pte = (pte_t *)pud;
+ else {
+ pmd = pmd_offset(pud, addr);
+ if (!pmd_none(*pmd)) {
+ if (is_hugetlb_pmd(*pmd))
+ pte = (pte_t *)pmd;
+ else
+ pte = pte_offset_map(pmd, addr);
+ }
}
}
}
@@ -304,12 +320,20 @@ pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t entry)
{
- unsigned int i, nptes, orig_shift, shift;
- unsigned long size;
+ unsigned int nptes, orig_shift, shift;
+ unsigned long i, size;
pte_t orig;
size = huge_tte_to_size(entry);
- shift = size >= HPAGE_SIZE ? PMD_SHIFT : PAGE_SHIFT;
+
+ shift = PAGE_SHIFT;
+ if (size >= PUD_SIZE)
+ shift = PUD_SHIFT;
+ else if (size >= PMD_SIZE)
+ shift = PMD_SHIFT;
+ else
+ shift = PAGE_SHIFT;
+
nptes = size >> shift;
if (!pte_present(*ptep) && pte_present(entry))
@@ -332,19 +356,23 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
pte_t *ptep)
{
- unsigned int i, nptes, hugepage_shift;
+ unsigned int i, nptes, orig_shift, shift;
unsigned long size;
pte_t entry;
entry = *ptep;
size = huge_tte_to_size(entry);
- if (size >= HPAGE_SIZE)
- nptes = size >> PMD_SHIFT;
+
+ shift = PAGE_SHIFT;
+ if (size >= PUD_SIZE)
+ shift = PUD_SHIFT;
+ else if (size >= PMD_SIZE)
+ shift = PMD_SHIFT;
else
- nptes = size >> PAGE_SHIFT;
+ shift = PAGE_SHIFT;
- hugepage_shift = pte_none(entry) ? PAGE_SHIFT :
- huge_tte_to_shift(entry);
+ nptes = size >> shift;
+ orig_shift = pte_none(entry) ? PAGE_SHIFT : huge_tte_to_shift(entry);
if (pte_present(entry))
mm->context.hugetlb_pte_count -= nptes;
@@ -353,11 +381,11 @@ pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
for (i = 0; i < nptes; i++)
ptep[i] = __pte(0UL);
- maybe_tlb_batch_add(mm, addr, ptep, entry, 0, hugepage_shift);
+ maybe_tlb_batch_add(mm, addr, ptep, entry, 0, orig_shift);
/* An HPAGE_SIZE'ed page is composed of two REAL_HPAGE_SIZE'ed pages */
if (size == HPAGE_SIZE)
maybe_tlb_batch_add(mm, addr + REAL_HPAGE_SIZE, ptep, entry, 0,
- hugepage_shift);
+ orig_shift);
return entry;
}
@@ -370,7 +398,8 @@ int pmd_huge(pmd_t pmd)
int pud_huge(pud_t pud)
{
- return 0;
+ return !pud_none(pud) &&
+ (pud_val(pud) & (_PAGE_VALID|_PAGE_PUD_HUGE)) != _PAGE_VALID;
}
static void hugetlb_free_pte_range(struct mmu_gather *tlb, pmd_t *pmd,
@@ -434,8 +463,11 @@ static void hugetlb_free_pud_range(struct mmu_gather *tlb, pgd_t *pgd,
next = pud_addr_end(addr, end);
if (pud_none_or_clear_bad(pud))
continue;
- hugetlb_free_pmd_range(tlb, pud, addr, next, floor,
- ceiling);
+ if (is_hugetlb_pud(*pud))
+ pud_clear(pud);
+ else
+ hugetlb_free_pmd_range(tlb, pud, addr, next, floor,
+ ceiling);
} while (pud++, addr = next, addr != end);
start &= PGDIR_MASK;
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 3c40ebd..cc8d0d4 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -337,6 +337,10 @@ static int __init setup_hugepagesz(char *string)
hugepage_shift = ilog2(hugepage_size);
switch (hugepage_shift) {
+ case HPAGE_16GB_SHIFT:
+ hv_pgsz_mask = HV_PGSZ_MASK_16GB;
+ hv_pgsz_idx = HV_PGSZ_IDX_16GB;
+ break;
case HPAGE_2GB_SHIFT:
hv_pgsz_mask = HV_PGSZ_MASK_2GB;
hv_pgsz_idx = HV_PGSZ_IDX_2GB;
@@ -377,6 +381,7 @@ void update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t *
{
struct mm_struct *mm;
unsigned long flags;
+ bool is_huge_tsb;
pte_t pte = *ptep;
if (tlb_type != hypervisor) {
@@ -394,15 +399,37 @@ void update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t *
spin_lock_irqsave(&mm->context.lock, flags);
+ is_huge_tsb = false;
#if defined(CONFIG_HUGETLB_PAGE) || defined(CONFIG_TRANSPARENT_HUGEPAGE)
- if ((mm->context.hugetlb_pte_count || mm->context.thp_pte_count) &&
- is_hugetlb_pmd(__pmd(pte_val(pte)))) {
- /* We are fabricating 8MB pages using 4MB real hw pages. */
- pte_val(pte) |= (address & (1UL << REAL_HPAGE_SHIFT));
- __update_mmu_tsb_insert(mm, MM_TSB_HUGE, REAL_HPAGE_SHIFT,
- address, pte_val(pte));
- } else
+ if (mm->context.hugetlb_pte_count || mm->context.thp_pte_count) {
+ unsigned long hugepage_size = PAGE_SIZE;
+
+ if (is_vm_hugetlb_page(vma))
+ hugepage_size = huge_page_size(hstate_vma(vma));
+
+ if (hugepage_size >= PUD_SIZE) {
+ unsigned long mask = 0x1ffc00000UL;
+
+ /* Transfer bits [32:22] from address to resolve
+ * at 4M granularity.
+ */
+ pte_val(pte) &= ~mask;
+ pte_val(pte) |= (address & mask);
+ } else if (hugepage_size >= PMD_SIZE) {
+ /* We are fabricating 8MB pages using 4MB
+ * real hw pages.
+ */
+ pte_val(pte) |= (address & (1UL << REAL_HPAGE_SHIFT));
+ }
+
+ if (hugepage_size >= PMD_SIZE) {
+ __update_mmu_tsb_insert(mm, MM_TSB_HUGE,
+ REAL_HPAGE_SHIFT, address, pte_val(pte));
+ is_huge_tsb = true;
+ }
+ }
#endif
+ if (!is_huge_tsb)
__update_mmu_tsb_insert(mm, MM_TSB_BASE, PAGE_SHIFT,
address, pte_val(pte));
--
2.9.2
* [PATCH 2/4] sparc64: Support huge PUD case in get_user_pages
From: Nitin Gupta @ 2017-06-21 1:05 UTC (permalink / raw)
To: David S. Miller
Cc: Nitin Gupta, Kirill A. Shutemov, Tom Hromatka, Michal Hocko,
Ingo Molnar, Lorenzo Stoakes, Jan Kara, sparclinux, linux-kernel
get_user_pages() is used to do direct IO. It already
handles the case where the address range is backed
by PMD huge pages. This patch now adds the case where
the range could be backed by PUD huge pages.
Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com>
---
arch/sparc/include/asm/pgtable_64.h | 15 ++++++++++--
arch/sparc/mm/gup.c | 47 ++++++++++++++++++++++++++++++++++++-
2 files changed, 59 insertions(+), 3 deletions(-)
diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h
index 2444b02..4fefe37 100644
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -692,6 +692,8 @@ static inline unsigned long pmd_write(pmd_t pmd)
return pte_write(pte);
}
+#define pud_write(pud) pte_write(__pte(pud_val(pud)))
+
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
static inline unsigned long pmd_dirty(pmd_t pmd)
{
@@ -828,9 +830,18 @@ static inline unsigned long __pmd_page(pmd_t pmd)
return ((unsigned long) __va(pfn << PAGE_SHIFT));
}
+
+static inline unsigned long pud_page_vaddr(pud_t pud)
+{
+ pte_t pte = __pte(pud_val(pud));
+ unsigned long pfn;
+
+ pfn = pte_pfn(pte);
+
+ return ((unsigned long) __va(pfn << PAGE_SHIFT));
+}
+
#define pmd_page(pmd) virt_to_page((void *)__pmd_page(pmd))
-#define pud_page_vaddr(pud) \
- ((unsigned long) __va(pud_val(pud)))
#define pud_page(pud) virt_to_page((void *)pud_page_vaddr(pud))
#define pmd_clear(pmdp) (pmd_val(*(pmdp)) = 0UL)
#define pud_present(pud) (pud_val(pud) != 0U)
diff --git a/arch/sparc/mm/gup.c b/arch/sparc/mm/gup.c
index cd0e32b..7cfa9c5 100644
--- a/arch/sparc/mm/gup.c
+++ b/arch/sparc/mm/gup.c
@@ -103,6 +103,47 @@ static int gup_huge_pmd(pmd_t *pmdp, pmd_t pmd, unsigned long addr,
return 1;
}
+static int gup_huge_pud(pud_t *pudp, pud_t pud, unsigned long addr,
+ unsigned long end, int write, struct page **pages,
+ int *nr)
+{
+ struct page *head, *page;
+ int refs;
+
+ if (!(pud_val(pud) & _PAGE_VALID))
+ return 0;
+
+ if (write && !pud_write(pud))
+ return 0;
+
+ refs = 0;
+ head = pud_page(pud);
+ page = head + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
+ if (PageTail(head))
+ head = compound_head(head);
+ do {
+ VM_BUG_ON(compound_head(page) != head);
+ pages[*nr] = page;
+ (*nr)++;
+ page++;
+ refs++;
+ } while (addr += PAGE_SIZE, addr != end);
+
+ if (!page_cache_add_speculative(head, refs)) {
+ *nr -= refs;
+ return 0;
+ }
+
+ if (unlikely(pud_val(pud) != pud_val(*pudp))) {
+ *nr -= refs;
+ while (refs--)
+ put_page(head);
+ return 0;
+ }
+
+ return 1;
+}
+
static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
int write, struct page **pages, int *nr)
{
@@ -141,7 +182,11 @@ static int gup_pud_range(pgd_t pgd, unsigned long addr, unsigned long end,
next = pud_addr_end(addr, end);
if (pud_none(pud))
return 0;
- if (!gup_pmd_range(pud, addr, next, write, pages, nr))
+ if (unlikely(pud_large(pud))) {
+ if (!gup_huge_pud(pudp, pud, addr, next,
+ write, pages, nr))
+ return 0;
+ } else if (!gup_pmd_range(pud, addr, next, write, pages, nr))
return 0;
} while (pudp++, addr = next, addr != end);
--
2.9.2
* [PATCH 3/4] sparc64: Fix gup_huge_pmd
From: Nitin Gupta @ 2017-06-21 1:05 UTC (permalink / raw)
To: David S. Miller
Cc: Nitin Gupta, Lorenzo Stoakes, Jan Kara, Michal Hocko, sparclinux,
linux-kernel
The function assumes that each PMD points to the head of a
huge page. This is not correct, as a PMD can point to the
start of any 8M region within a, say, 256M hugepage. The
fix ensures that head points to the correct head page of
any PMD huge page.
Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com>
---
arch/sparc/mm/gup.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/sparc/mm/gup.c b/arch/sparc/mm/gup.c
index 7cfa9c5..b1c649d 100644
--- a/arch/sparc/mm/gup.c
+++ b/arch/sparc/mm/gup.c
@@ -80,6 +80,8 @@ static int gup_huge_pmd(pmd_t *pmdp, pmd_t pmd, unsigned long addr,
refs = 0;
head = pmd_page(pmd);
page = head + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
+ if (PageTail(head))
+ head = compound_head(head);
do {
VM_BUG_ON(compound_head(page) != head);
pages[*nr] = page;
--
2.9.2
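The head-page problem described in this patch can be modelled in user space. The sketch below is only an illustration: struct sketch_page and the sketch_* helpers are hypothetical stand-ins for the kernel's struct page, PageTail() and compound_head(), and the array is a scaled-down compound page rather than a real 256M hugepage.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: tail pages point at their head. */
struct sketch_page {
    struct sketch_page *head; /* NULL for a head page */
};

static int sketch_page_tail(const struct sketch_page *p)
{
    return p->head != NULL;
}

static struct sketch_page *sketch_compound_head(struct sketch_page *p)
{
    return p->head ? p->head : p;
}

/* Make pages[0..n-1] one compound page with pages[0] as its head. */
static void sketch_make_compound(struct sketch_page *pages, size_t n)
{
    assert(n > 0);
    pages[0].head = NULL;
    for (size_t i = 1; i < n; i++)
        pages[i].head = &pages[0];
}

/*
 * Models the two added lines: the page a PMD entry points at may be a
 * tail page, so resolve it to the real head before taking references.
 */
static struct sketch_page *sketch_resolve_head(struct sketch_page *pmd_target)
{
    if (sketch_page_tail(pmd_target))
        return sketch_compound_head(pmd_target);
    return pmd_target;
}
```

Without the resolution step, a PMD entry that points into the middle of the compound page would be treated as the head, which is the case the VM_BUG_ON() in the loop guards against.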
* [PATCH 4/4] sparc64: Cleanup hugepage table walk functions
2017-06-21 1:05 [PATCH 0/4] sparc64: Add 16GB hugepage support Nitin Gupta
@ 2017-06-21 1:05 ` Nitin Gupta
2017-06-21 1:05 ` Nitin Gupta
` (5 subsequent siblings)
6 siblings, 0 replies; 12+ messages in thread
From: Nitin Gupta @ 2017-06-21 1:05 UTC (permalink / raw)
To: David S. Miller
Cc: Nitin Gupta, Mike Kravetz, Michal Hocko, Ingo Molnar,
Hugh Dickins, sparclinux, linux-kernel
Flatten out nested code structure in huge_pte_offset()
and huge_pte_alloc().
Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com>
---
arch/sparc/mm/hugetlbpage.c | 54 +++++++++++++++++----------------------------
1 file changed, 20 insertions(+), 34 deletions(-)
diff --git a/arch/sparc/mm/hugetlbpage.c b/arch/sparc/mm/hugetlbpage.c
index f0bb42d..e8b7245 100644
--- a/arch/sparc/mm/hugetlbpage.c
+++ b/arch/sparc/mm/hugetlbpage.c
@@ -266,27 +266,19 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
pgd_t *pgd;
pud_t *pud;
pmd_t *pmd;
- pte_t *pte = NULL;
pgd = pgd_offset(mm, addr);
pud = pud_alloc(mm, pgd, addr);
if (!pud)
return NULL;
-
if (sz >= PUD_SIZE)
- pte = (pte_t *)pud;
- else {
- pmd = pmd_alloc(mm, pud, addr);
- if (!pmd)
- return NULL;
-
- if (sz >= PMD_SIZE)
- pte = (pte_t *)pmd;
- else
- pte = pte_alloc_map(mm, pmd, addr);
- }
-
- return pte;
+ return (pte_t *)pud;
+ pmd = pmd_alloc(mm, pud, addr);
+ if (!pmd)
+ return NULL;
+ if (sz >= PMD_SIZE)
+ return (pte_t *)pmd;
+ return pte_alloc_map(mm, pmd, addr);
}
pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
@@ -294,27 +286,21 @@ pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
pgd_t *pgd;
pud_t *pud;
pmd_t *pmd;
- pte_t *pte = NULL;
pgd = pgd_offset(mm, addr);
- if (!pgd_none(*pgd)) {
- pud = pud_offset(pgd, addr);
- if (!pud_none(*pud)) {
- if (is_hugetlb_pud(*pud))
- pte = (pte_t *)pud;
- else {
- pmd = pmd_offset(pud, addr);
- if (!pmd_none(*pmd)) {
- if (is_hugetlb_pmd(*pmd))
- pte = (pte_t *)pmd;
- else
- pte = pte_offset_map(pmd, addr);
- }
- }
- }
- }
-
- return pte;
+ if (pgd_none(*pgd))
+ return NULL;
+ pud = pud_offset(pgd, addr);
+ if (pud_none(*pud))
+ return NULL;
+ if (is_hugetlb_pud(*pud))
+ return (pte_t *)pud;
+ pmd = pmd_offset(pud, addr);
+ if (pmd_none(*pmd))
+ return NULL;
+ if (is_hugetlb_pmd(*pmd))
+ return (pte_t *)pmd;
+ return pte_offset_map(pmd, addr);
}
void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
--
2.9.2
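The flattened huge_pte_alloc() picks a page table level purely from the requested size. A minimal user-space sketch of that decision follows; the enum and the SKETCH_* constants are hypothetical, and the sizes assume 8M per PMD entry and 8G per PUD entry as described in the cover letter.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PMD_SIZE (UINT64_C(1) << 23) /* 8M, assumed */
#define SKETCH_PUD_SIZE (UINT64_C(1) << 33) /* 8G, assumed */

enum sketch_level {
    SKETCH_LEVEL_PTE,
    SKETCH_LEVEL_PMD,
    SKETCH_LEVEL_PUD
};

/*
 * Same order of early returns as the flattened function: the largest
 * level is tested first, and the PTE level is the fall-through case.
 */
static enum sketch_level sketch_level_for_size(uint64_t sz)
{
    assert(sz != 0);
    if (sz >= SKETCH_PUD_SIZE)
        return SKETCH_LEVEL_PUD;
    if (sz >= SKETCH_PMD_SIZE)
        return SKETCH_LEVEL_PMD;
    return SKETCH_LEVEL_PTE;
}
```

With these values a 16G request resolves to the PUD level and a 256M request to the PMD level, so no lower-level tables are touched for either.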
* Re: [PATCH 0/4] sparc64: Add 16GB hugepage support
2017-06-21 1:05 [PATCH 0/4] sparc64: Add 16GB hugepage support Nitin Gupta
` (3 preceding siblings ...)
2017-06-21 1:05 ` Nitin Gupta
@ 2017-06-21 1:21 ` David Miller
2017-06-21 2:15 ` Nitin Gupta
2017-06-21 4:11 ` David Miller
6 siblings, 0 replies; 12+ messages in thread
From: David Miller @ 2017-06-21 1:21 UTC (permalink / raw)
To: sparclinux
From: Nitin Gupta <nitin.m.gupta@oracle.com>
Date: Tue, 20 Jun 2017 18:05:53 -0700
> Patch 1/4: Core changes needed to add 16G hugepage support: To map a
> single 16G hugepage, two PUD entries are used. Each PUD entry maps
> 8G portion of a 16G page. This page table encoding scheme is same as
> that used for hugepages at PMD level (8M, 256M and 2G pages) where
> each PMD entry points successively to 8M regions within a page. No
> page table entries below the PUD level are allocated for 16G
> hugepage since those are not required.
>
> TSB entries for a 16G page are created at every 4M boundary since
> the HUGE_TSB is used for these pages which is configured with page
> size of 4M. When walking page tables (on a TSB miss), bits [32:22]
> are transferred from vaddr to PUD to resolve addresses at 4M
> boundary. The resolved address mapping is then stored in HUGE_TSB.
Ok this is a new feature.
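As an illustration of the quoted TSB description, the transfer of bits [32:22] from vaddr can be modelled in C. This is a rough sketch only: the function and macro names are hypothetical, the PUD entry is reduced to an 8G-aligned physical base with no attribute bits, and the actual TSB miss handling is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_TSB_SHIFT 22 /* 4M HUGE_TSB page size */
#define SKETCH_PUD_SHIFT 33 /* 8G per PUD entry */

/* Mask covering bits [32:22] of an address. */
#define SKETCH_VADDR_MASK \
    (((UINT64_C(1) << SKETCH_PUD_SHIFT) - 1) & \
     ~((UINT64_C(1) << SKETCH_TSB_SHIFT) - 1))

/*
 * Given the 8G-aligned physical base taken from a PUD entry and the
 * faulting virtual address, return the physical address of the 4M
 * chunk that a HUGE_TSB entry would describe.
 */
static uint64_t sketch_resolve_4m(uint64_t pud_paddr, uint64_t vaddr)
{
    /* The base must not carry any bits below the 8G boundary. */
    assert((pud_paddr & ((UINT64_C(1) << SKETCH_PUD_SHIFT) - 1)) == 0);
    return pud_paddr | (vaddr & SKETCH_VADDR_MASK);
}
```

All virtual addresses inside the same 4M chunk resolve to the same result, which is why one TSB entry per 4M boundary is enough.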
> Patch 2/4: get_user_pages() etc. are used for direct IO. These
> functions were not aware of hugepages at the PUD level and would try
> to continue walking page tables beyond the PUD level. Since 16G
> hugepages have page tables allocated till PUD level only, these
> accesses would result in invalid access. This patch adds the case
> for PUD huge pages to these functions.
>
> Patch 3/4: Fixes a bug in gup_huge_pmd() which caused wrong page
> within a hugepage to be considered as the head page. gup_huge_pud()
> already has this fix.
These are bug fixes that appear to be relevant for mainline right?
Please submit these two separately.
> Patch 4/4: Patch 1 added the case of PUD entry being huge in page
> table walk and alloc functions. This new case further increased
> nesting in these functions and made them harder to follow. This
> patch flattens these functions for better readability.
Ok, this is a cleanup and can go along with the new feature patch #1.
So submit the two bug fixes against 'sparc' GIT, then when I next
merge 'sparc' into 'sparc-next', you can submit #1 and #4.
Thanks.
* Re: [PATCH 0/4] sparc64: Add 16GB hugepage support
2017-06-21 1:05 [PATCH 0/4] sparc64: Add 16GB hugepage support Nitin Gupta
` (4 preceding siblings ...)
2017-06-21 1:21 ` [PATCH 0/4] sparc64: Add 16GB hugepage support David Miller
@ 2017-06-21 2:15 ` Nitin Gupta
2017-06-21 4:11 ` David Miller
6 siblings, 0 replies; 12+ messages in thread
From: Nitin Gupta @ 2017-06-21 2:15 UTC (permalink / raw)
To: sparclinux
On 6/20/17 6:21 PM, David Miller wrote:
> From: Nitin Gupta <nitin.m.gupta@oracle.com>
> Date: Tue, 20 Jun 2017 18:05:53 -0700
>
>> Patch 1/4: Core changes needed to add 16G hugepage support: To map a
>> single 16G hugepage, two PUD entries are used. Each PUD entry maps
>> 8G portion of a 16G page. This page table encoding scheme is same as
>> that used for hugepages at PMD level (8M, 256M and 2G pages) where
>> each PMD entry points successively to 8M regions within a page. No
>> page table entries below the PUD level are allocated for 16G
>> hugepage since those are not required.
>>
>> TSB entries for a 16G page are created at every 4M boundary since
>> the HUGE_TSB is used for these pages which is configured with page
>> size of 4M. When walking page tables (on a TSB miss), bits [32:22]
>> are transferred from vaddr to PUD to resolve addresses at 4M
>> boundary. The resolved address mapping is then stored in HUGE_TSB.
>
> Ok this is a new feature.
>
>> Patch 2/4: get_user_pages() etc. are used for direct IO. These
>> functions were not aware of hugepages at the PUD level and would try
>> to continue walking page tables beyond the PUD level. Since 16G
>> hugepages have page tables allocated till PUD level only, these
>> accesses would result in invalid access. This patch adds the case
>> for PUD huge pages to these functions.
>>
>> Patch 3/4: Fixes a bug in gup_huge_pmd() which caused wrong page
>> within a hugepage to be considered as the head page. gup_huge_pud()
>> already has this fix.
>
> These are bug fixes that appear to be relevant for mainline right?
>
> Please submit these two separately.
Yes, these two bugfixes are relevant for mainline. I think Patch 2
should go along with feature patches since without Patch 1, which adds
16G hugepage support, Patch 2 does not make any sense. Patch 2 is more
like an extension of Patch 1, and adds direct IO support over 16G pages.
>
>> Patch 4/4: Patch 1 added the case of PUD entry being huge in page
>> table walk and alloc functions. This new case further increased
>> nesting in these functions and made them harder to follow. This
>> patch flattens these functions for better readability.
>
> Ok, this is a cleanup and can go along with the new feature patch #1.
>
> So submit the two bug fixes against 'sparc' GIT, then when I next
> merge 'sparc' into 'sparc-next', you can submit #1 and #4.
>
Ok, please let me know if you agree with above and I should send only
Patch 3 for now. When this fix makes it into sparc-next, I will send
feature patches against the sparc tree.
Thanks,
Nitin
* Re: [PATCH 0/4] sparc64: Add 16GB hugepage support
2017-06-21 1:05 [PATCH 0/4] sparc64: Add 16GB hugepage support Nitin Gupta
` (5 preceding siblings ...)
2017-06-21 2:15 ` Nitin Gupta
@ 2017-06-21 4:11 ` David Miller
6 siblings, 0 replies; 12+ messages in thread
From: David Miller @ 2017-06-21 4:11 UTC (permalink / raw)
To: sparclinux
From: Nitin Gupta <nitin.m.gupta@oracle.com>
Date: Tue, 20 Jun 2017 19:15:55 -0700
> On 6/20/17 6:21 PM, David Miller wrote:
>> From: Nitin Gupta <nitin.m.gupta@oracle.com>
>> Date: Tue, 20 Jun 2017 18:05:53 -0700
>>
>>> Patch 1/4: Core changes needed to add 16G hugepage support: To map a
>>> single 16G hugepage, two PUD entries are used. Each PUD entry maps
>>> 8G portion of a 16G page. This page table encoding scheme is same as
>>> that used for hugepages at PMD level (8M, 256M and 2G pages) where
>>> each PMD entry points successively to 8M regions within a page. No
>>> page table entries below the PUD level are allocated for 16G
>>> hugepage since those are not required.
>>>
>>> TSB entries for a 16G page are created at every 4M boundary since
>>> the HUGE_TSB is used for these pages which is configured with page
>>> size of 4M. When walking page tables (on a TSB miss), bits [32:22]
>>> are transferred from vaddr to PUD to resolve addresses at 4M
>>> boundary. The resolved address mapping is then stored in HUGE_TSB.
>>
>> Ok this is a new feature.
>>
>>> Patch 2/4: get_user_pages() etc. are used for direct IO. These
>>> functions were not aware of hugepages at the PUD level and would try
>>> to continue walking page tables beyond the PUD level. Since 16G
>>> hugepages have page tables allocated till PUD level only, these
>>> accesses would result in invalid access. This patch adds the case
>>> for PUD huge pages to these functions.
>>>
>>> Patch 3/4: Fixes a bug in gup_huge_pmd() which caused wrong page
>>> within a hugepage to be considered as the head page. gup_huge_pud()
>>> already has this fix.
>>
>> These are bug fixes that appear to be relevant for mainline right?
>>
>> Please submit these two separately.
>
>
> Yes, these two bugfixes are relevant for mainline. I think Patch 2
> should go along with feature patches since without Patch 1, which adds
> 16G hugepage support, Patch 2 does not make any sense. Patch 2 is more
> like an extension of Patch 1, and adds direct IO support over 16G pages.
Ok, so PUD huge pages don't happen without patch #1 and therefore
patch #2 doesn't apply without 16GB page support.
But this makes me think that you must combine patch #1 and #2,
because #1 adds a regression in that the page table walk in
get_user_pages() is broken by it.
Or, reverse them, do patch #2 to teach get_user_pages() about
pud huge pages. Then you can safely add #1.
>>> Patch 4/4: Patch 1 added the case of PUD entry being huge in page
>>> table walk and alloc functions. This new case further increased
>>> nesting in these functions and made them harder to follow. This
>>> patch flattens these functions for better readability.
>>
>> Ok, this is a cleanup and can go along with the new feature patch #1.
>>
>> So submit the two bug fixes against 'sparc' GIT, then when I next
>> merge 'sparc' into 'sparc-next', you can submit #1 and #4.
>>
>
> Ok, please let me know if you agree with above and I should send only
> Patch 3 for now. When this fix makes it into sparc-next, I will send
> feature patches against the sparc tree.
I do agree. #3 for sparc and #1+#2 & #3 for sparc-next.
end of thread, other threads:[~2017-06-21 4:11 UTC | newest]
Thread overview: 12+ messages
2017-06-21 1:05 [PATCH 0/4] sparc64: Add 16GB hugepage support Nitin Gupta
2017-06-21 1:05 ` [PATCH 1/4] " Nitin Gupta
2017-06-21 1:05 ` Nitin Gupta
2017-06-21 1:05 ` [PATCH 2/4] sparc64: Support huge PUD case in get_user_pages Nitin Gupta
2017-06-21 1:05 ` Nitin Gupta
2017-06-21 1:05 ` [PATCH 3/4] sparc64: Fix gup_huge_pmd Nitin Gupta
2017-06-21 1:05 ` Nitin Gupta
2017-06-21 1:05 ` [PATCH 4/4] sparc64: Cleanup hugepage table walk functions Nitin Gupta
2017-06-21 1:05 ` Nitin Gupta
2017-06-21 1:21 ` [PATCH 0/4] sparc64: Add 16GB hugepage support David Miller
2017-06-21 2:15 ` Nitin Gupta
2017-06-21 4:11 ` David Miller