* [PATCH -V3 10/25] powerpc: Return all the valid pte ecndoing in KVM_PPC_GET_SMMU_INFO ioctl
From: Aneesh Kumar K.V @ 2013-03-15 9:39 UTC (permalink / raw)
To: benh, paulus; +Cc: linuxppc-dev, Aneesh Kumar K.V
In-Reply-To: <1363340407-22619-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
arch/powerpc/kvm/book3s_hv.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 48f6d99..f472414 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1508,14 +1508,24 @@ long kvm_vm_ioctl_allocate_rma(struct kvm *kvm, struct kvm_allocate_rma *ret)
static void kvmppc_add_seg_page_size(struct kvm_ppc_one_seg_page_size **sps,
int linux_psize)
{
+ int i, index = 0;
struct mmu_psize_def *def = &mmu_psize_defs[linux_psize];
if (!def->shift)
return;
(*sps)->page_shift = def->shift;
(*sps)->slb_enc = def->sllp;
- (*sps)->enc[0].page_shift = def->shift;
- (*sps)->enc[0].pte_enc = def->penc[linux_psize];
+ for (i = 0; i < MMU_PAGE_COUNT; i++) {
+ if (def->penc[i] != -1) {
+ if (index >= KVM_PPC_PAGE_SIZES_MAX_SZ) {
+ WARN_ON(1);
+ break;
+ }
+ (*sps)->enc[index].page_shift = mmu_psize_defs[i].shift;
+ (*sps)->enc[index].pte_enc = def->penc[i];
+ index++;
+ }
+ }
(*sps)++;
}
--
1.7.10
^ permalink raw reply related
* [PATCH -V3 06/25] powerpc: Reduce PTE table memory wastage
From: Aneesh Kumar K.V @ 2013-03-15 9:39 UTC (permalink / raw)
To: benh, paulus; +Cc: linuxppc-dev, Aneesh Kumar K.V
In-Reply-To: <1363340407-22619-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
We allocate one page for the last level of linux page table. With THP and
large page size of 16MB, that would mean we are wasting large part
of that page. To map 16MB area, we only need a PTE space of 2K with 64K
page size. This patch reduce the space wastage by sharing the page
allocated for the last level of linux page table with multiple pmd
entries. We call these smaller chunks PTE page fragments and allocated
page, PTE page.
In order to support systems which doesn't have 64K HPTE support, we also
add another 2K to PTE page fragment. The second half of the PTE fragments
is used for storing slot and secondary bit information of an HPTE. With this
we now have a 4K PTE fragment.
We use a simple approach to share the PTE page. On allocation, we bump the
PTE page refcount to 16 and share the PTE page with the next 16 pte alloc
request. This should help in the node locality of the PTE page fragment,
assuming that the immediate pte alloc request will mostly come from the
same NUMA node. We don't try to reuse the freed PTE page fragment. Hence
we could be waisting some space.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
arch/powerpc/include/asm/mmu-book3e.h | 4 +
arch/powerpc/include/asm/mmu-hash64.h | 4 +
arch/powerpc/include/asm/page.h | 4 +
arch/powerpc/include/asm/pgalloc-64.h | 72 ++++-------------
arch/powerpc/kernel/setup_64.c | 4 +-
arch/powerpc/mm/mmu_context_hash64.c | 35 ++++++++
arch/powerpc/mm/pgtable_64.c | 144 +++++++++++++++++++++++++++++++++
7 files changed, 209 insertions(+), 58 deletions(-)
diff --git a/arch/powerpc/include/asm/mmu-book3e.h b/arch/powerpc/include/asm/mmu-book3e.h
index 99d43e0..affbd68 100644
--- a/arch/powerpc/include/asm/mmu-book3e.h
+++ b/arch/powerpc/include/asm/mmu-book3e.h
@@ -231,6 +231,10 @@ typedef struct {
u64 high_slices_psize; /* 4 bits per slice for now */
u16 user_psize; /* page size index */
#endif
+#ifdef CONFIG_PPC_64K_PAGES
+ /* for 4K PTE fragment support */
+ struct page *pgtable_page;
+#endif
} mm_context_t;
/* Page size definitions, common between 32 and 64-bit
diff --git a/arch/powerpc/include/asm/mmu-hash64.h b/arch/powerpc/include/asm/mmu-hash64.h
index 35bb51e..300ac3c 100644
--- a/arch/powerpc/include/asm/mmu-hash64.h
+++ b/arch/powerpc/include/asm/mmu-hash64.h
@@ -498,6 +498,10 @@ typedef struct {
unsigned long acop; /* mask of enabled coprocessor types */
unsigned int cop_pid; /* pid value used with coprocessors */
#endif /* CONFIG_PPC_ICSWX */
+#ifdef CONFIG_PPC_64K_PAGES
+ /* for 4K PTE fragment support */
+ struct page *pgtable_page;
+#endif
} mm_context_t;
diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
index f072e97..38e7ff6 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -378,7 +378,11 @@ void arch_free_page(struct page *page, int order);
struct vm_area_struct;
+#ifdef CONFIG_PPC_64K_PAGES
+typedef pte_t *pgtable_t;
+#else
typedef struct page *pgtable_t;
+#endif
#include <asm-generic/memory_model.h>
#endif /* __ASSEMBLY__ */
diff --git a/arch/powerpc/include/asm/pgalloc-64.h b/arch/powerpc/include/asm/pgalloc-64.h
index cdbf555..3418989 100644
--- a/arch/powerpc/include/asm/pgalloc-64.h
+++ b/arch/powerpc/include/asm/pgalloc-64.h
@@ -150,6 +150,13 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t table,
#else /* if CONFIG_PPC_64K_PAGES */
+extern pte_t *page_table_alloc(struct mm_struct *, unsigned long, int);
+extern void page_table_free(struct mm_struct *, unsigned long *, int);
+extern void pgtable_free_tlb(struct mmu_gather *tlb, void *table, int shift);
+#ifdef CONFIG_SMP
+extern void __tlb_remove_table(void *_table);
+#endif
+
#define pud_populate(mm, pud, pmd) pud_set(pud, (unsigned long)pmd)
static inline void pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmd,
@@ -161,90 +168,42 @@ static inline void pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmd,
static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmd,
pgtable_t pte_page)
{
- pmd_populate_kernel(mm, pmd, page_address(pte_page));
+ pmd_set(pmd, (unsigned long)pte_page);
}
static inline pgtable_t pmd_pgtable(pmd_t pmd)
{
- return pmd_page(pmd);
+ return (pgtable_t)(pmd_val(pmd) & -sizeof(pte_t)*PTRS_PER_PTE);
}
static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
unsigned long address)
{
- return (pte_t *)__get_free_page(GFP_KERNEL | __GFP_REPEAT | __GFP_ZERO);
+ return (pte_t *)page_table_alloc(mm, address, 1);
}
static inline pgtable_t pte_alloc_one(struct mm_struct *mm,
- unsigned long address)
+ unsigned long address)
{
- struct page *page;
- pte_t *pte;
-
- pte = pte_alloc_one_kernel(mm, address);
- if (!pte)
- return NULL;
- page = virt_to_page(pte);
- pgtable_page_ctor(page);
- return page;
+ return (pgtable_t)page_table_alloc(mm, address, 0);
}
static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
{
- free_page((unsigned long)pte);
+ page_table_free(mm, (unsigned long *)pte, 1);
}
static inline void pte_free(struct mm_struct *mm, pgtable_t ptepage)
{
- pgtable_page_dtor(ptepage);
- __free_page(ptepage);
-}
-
-static inline void pgtable_free(void *table, unsigned index_size)
-{
- if (!index_size)
- free_page((unsigned long)table);
- else {
- BUG_ON(index_size > MAX_PGTABLE_INDEX_SIZE);
- kmem_cache_free(PGT_CACHE(index_size), table);
- }
+ page_table_free(mm, (unsigned long *)ptepage, 0);
}
-#ifdef CONFIG_SMP
-static inline void pgtable_free_tlb(struct mmu_gather *tlb,
- void *table, int shift)
-{
- unsigned long pgf = (unsigned long)table;
- BUG_ON(shift > MAX_PGTABLE_INDEX_SIZE);
- pgf |= shift;
- tlb_remove_table(tlb, (void *)pgf);
-}
-
-static inline void __tlb_remove_table(void *_table)
-{
- void *table = (void *)((unsigned long)_table & ~MAX_PGTABLE_INDEX_SIZE);
- unsigned shift = (unsigned long)_table & MAX_PGTABLE_INDEX_SIZE;
-
- pgtable_free(table, shift);
-}
-#else /* !CONFIG_SMP */
-static inline void pgtable_free_tlb(struct mmu_gather *tlb,
- void *table, int shift)
-{
- pgtable_free(table, shift);
-}
-#endif /* CONFIG_SMP */
-
static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t table,
unsigned long address)
{
- struct page *page = page_address(table);
-
tlb_flush_pgtable(tlb, address);
- pgtable_page_dtor(page);
- pgtable_free_tlb(tlb, page, 0);
+ pgtable_free_tlb(tlb, table, 0);
}
-
#endif /* CONFIG_PPC_64K_PAGES */
static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
@@ -258,7 +217,6 @@ static inline void pmd_free(struct mm_struct *mm, pmd_t *pmd)
kmem_cache_free(PGT_CACHE(PMD_INDEX_SIZE), pmd);
}
-
#define __pmd_free_tlb(tlb, pmd, addr) \
pgtable_free_tlb(tlb, pmd, PMD_INDEX_SIZE)
#ifndef CONFIG_PPC_64K_PAGES
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 6da881b..04d833c 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -575,7 +575,9 @@ void __init setup_arch(char **cmdline_p)
init_mm.end_code = (unsigned long) _etext;
init_mm.end_data = (unsigned long) _edata;
init_mm.brk = klimit;
-
+#ifdef CONFIG_PPC_64K_PAGES
+ init_mm.context.pgtable_page = NULL;
+#endif
irqstack_early_init();
exc_lvl_early_init();
emergency_stack_init();
diff --git a/arch/powerpc/mm/mmu_context_hash64.c b/arch/powerpc/mm/mmu_context_hash64.c
index 59cd773..fbfdca2 100644
--- a/arch/powerpc/mm/mmu_context_hash64.c
+++ b/arch/powerpc/mm/mmu_context_hash64.c
@@ -86,6 +86,9 @@ int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
spin_lock_init(mm->context.cop_lockp);
#endif /* CONFIG_PPC_ICSWX */
+#ifdef CONFIG_PPC_64K_PAGES
+ mm->context.pgtable_page = NULL;
+#endif
return 0;
}
@@ -97,13 +100,45 @@ void __destroy_context(int context_id)
}
EXPORT_SYMBOL_GPL(__destroy_context);
+#ifdef CONFIG_PPC_64K_PAGES
+static void destroy_pagetable_page(struct mm_struct *mm)
+{
+ int count;
+ struct page *page;
+
+ page = mm->context.pgtable_page;
+ if (!page)
+ return;
+
+ /* drop all the pending references */
+ count = atomic_read(&page->_mapcount) + 1;
+ /* We allow PTE_FRAG_NR(16) fragments from a PTE page */
+ count = atomic_sub_return(16 - count, &page->_count);
+ if (!count) {
+ pgtable_page_dtor(page);
+ reset_page_mapcount(page);
+ free_hot_cold_page(page, 0);
+ }
+}
+
+#else
+static inline void destroy_pagetable_page(struct mm_struct *mm)
+{
+ return;
+}
+#endif
+
+
void destroy_context(struct mm_struct *mm)
{
+
#ifdef CONFIG_PPC_ICSWX
drop_cop(mm->context.acop, mm);
kfree(mm->context.cop_lockp);
mm->context.cop_lockp = NULL;
#endif /* CONFIG_PPC_ICSWX */
+
+ destroy_pagetable_page(mm);
__destroy_context(mm->context.id);
subpage_prot_free(mm);
mm->context.id = MMU_NO_CONTEXT;
diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
index e212a27..66fad08 100644
--- a/arch/powerpc/mm/pgtable_64.c
+++ b/arch/powerpc/mm/pgtable_64.c
@@ -337,3 +337,147 @@ EXPORT_SYMBOL(__ioremap_at);
EXPORT_SYMBOL(iounmap);
EXPORT_SYMBOL(__iounmap);
EXPORT_SYMBOL(__iounmap_at);
+
+#ifdef CONFIG_PPC_64K_PAGES
+/*
+ * we support 16 fragments per PTE page. This is limited by how many
+ * bits we can pack in page->_mapcount. We use the first half for
+ * tracking the usage for rcu page table free.
+ */
+#define PTE_FRAG_NR 16
+/*
+ * We use a 2K PTE page fragment and another 2K for storing
+ * real_pte_t hash index
+ */
+#define PTE_FRAG_SIZE (2 * PTRS_PER_PTE * sizeof(pte_t))
+
+static pte_t *__get_from_cache(struct mm_struct *mm)
+{
+ int index;
+ pte_t *ret = NULL;
+ struct page *page;
+
+ page = mm->context.pgtable_page;
+ if (page) {
+ void *p = page_address(page);
+ index = atomic_add_return(1, &page->_mapcount);
+ ret = (pte_t *) (p + (index * PTE_FRAG_SIZE));
+ /*
+ * If we have taken up all the fragments mark PTE page NULL
+ */
+ if (index == PTE_FRAG_NR - 1)
+ mm->context.pgtable_page = NULL;
+ }
+ return ret;
+}
+
+static pte_t *get_from_cache(struct mm_struct *mm)
+{
+ pte_t *ret = NULL;
+
+ spin_lock(&mm->page_table_lock);
+ ret = __get_from_cache(mm);
+ spin_unlock(&mm->page_table_lock);
+
+ return ret;
+}
+
+static pte_t *__alloc_for_cache(struct mm_struct *mm, int kernel)
+{
+ pte_t *ret = NULL;
+ struct page *page = alloc_page(GFP_KERNEL | __GFP_NOTRACK |
+ __GFP_REPEAT | __GFP_ZERO);
+
+ if (page) {
+ spin_lock(&mm->page_table_lock);
+ if (likely(!mm->context.pgtable_page)) {
+ atomic_set(&page->_count, PTE_FRAG_NR);
+ atomic_set(&page->_mapcount, 0);
+ mm->context.pgtable_page = page;
+ ret = (unsigned long *)page_address(page);
+ if (!kernel)
+ pgtable_page_ctor(page);
+ page = NULL;
+ } else
+ ret = __get_from_cache(mm);
+ spin_unlock(&mm->page_table_lock);
+ }
+
+ if (page)
+ free_hot_cold_page(page, 0);
+ return ret;
+}
+
+pte_t *page_table_alloc(struct mm_struct *mm, unsigned long vmaddr, int kernel)
+{
+ pte_t *pte;
+
+ pte = get_from_cache(mm);
+ if (pte)
+ return pte;
+
+ return __alloc_for_cache(mm, kernel);
+}
+
+void page_table_free(struct mm_struct *mm, unsigned long *table, int kernel)
+{
+ struct page *page = virt_to_page(table);
+ if (put_page_testzero(page)) {
+ if (!kernel)
+ pgtable_page_dtor(page);
+ reset_page_mapcount(page);
+ free_hot_cold_page(page, 0);
+ }
+}
+
+#ifdef CONFIG_SMP
+static void page_table_free_rcu(void *table)
+{
+ struct page *page = virt_to_page(table);
+ if (put_page_testzero(page)) {
+ pgtable_page_dtor(page);
+ reset_page_mapcount(page);
+ free_hot_cold_page(page, 0);
+ }
+}
+
+void pgtable_free_tlb(struct mmu_gather *tlb, void *table, int shift)
+{
+ unsigned long pgf = (unsigned long)table;
+
+ BUG_ON(shift > MAX_PGTABLE_INDEX_SIZE);
+ pgf |= shift;
+ tlb_remove_table(tlb, (void *)pgf);
+}
+
+void __tlb_remove_table(void *_table)
+{
+ void *table = (void *)((unsigned long)_table & ~MAX_PGTABLE_INDEX_SIZE);
+ unsigned shift = (unsigned long)_table & MAX_PGTABLE_INDEX_SIZE;
+
+ if (!shift)
+ /* PTE page needs special handling */
+ page_table_free_rcu(table);
+ else {
+ BUG_ON(shift > MAX_PGTABLE_INDEX_SIZE);
+ kmem_cache_free(PGT_CACHE(shift), table);
+ }
+}
+#else
+void pgtable_free_tlb(struct mmu_gather *tlb, void *table, int shift)
+{
+ if (!shift) {
+ /* PTE page needs special handling */
+ struct page *page = virt_to_page(table);
+ if (put_page_testzero(page)) {
+ pgtable_page_dtor(page);
+ reset_page_mapcount(page);
+ free_hot_cold_page(page, 0);
+ }
+ } else {
+ BUG_ON(shift > MAX_PGTABLE_INDEX_SIZE);
+ kmem_cache_free(PGT_CACHE(shift), table);
+ }
+}
+#endif
+#endif /* CONFIG_PPC_64K_PAGES */
--
1.7.10
^ permalink raw reply related
* [PATCH -V3 19/25] powerpc/THP: Differentiate THP PMD entries from HUGETLB PMD entries
From: Aneesh Kumar K.V @ 2013-03-15 9:40 UTC (permalink / raw)
To: benh, paulus; +Cc: linuxppc-dev, Aneesh Kumar K.V
In-Reply-To: <1363340407-22619-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
HUGETLB clear the top bit of PMD entries and use that to indicate
a HUGETLB page directory. Since we store pfns in PMDs for THP,
we would have the top bit cleared by default. Add the top bit mask
for THP PMD entries and clear that when we are looking for pmd_pfn.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
arch/powerpc/include/asm/pgtable.h | 16 +++++++++++++---
arch/powerpc/mm/pgtable.c | 5 ++++-
arch/powerpc/mm/pgtable_64.c | 2 +-
3 files changed, 18 insertions(+), 5 deletions(-)
diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
index 9fbe2a7..9681de4 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -31,7 +31,7 @@ struct mm_struct;
#define PMD_HUGE_SPLITTING 0x008
#define PMD_HUGE_SAO 0x010 /* strong Access order */
#define PMD_HUGE_HASHPTE 0x020
-#define PMD_ISHUGE 0x040
+#define _PMD_ISHUGE 0x040
#define PMD_HUGE_DIRTY 0x080 /* C: page changed */
#define PMD_HUGE_ACCESSED 0x100 /* R: page referenced */
#define PMD_HUGE_RW 0x200 /* software: user write access allowed */
@@ -44,6 +44,14 @@ struct mm_struct;
#define PMD_HUGE_RPN_SHIFT PTE_RPN_SHIFT
#define HUGE_PAGE_SIZE (ASM_CONST(1) << 24)
#define HUGE_PAGE_MASK (~(HUGE_PAGE_SIZE - 1))
+/*
+ * HugeTLB looks at the top bit of the Linux page table entries to
+ * decide whether it is a huge page directory or not. Mark HUGE
+ * PMD to differentiate
+ */
+#define PMD_HUGE_NOT_HUGETLB (ASM_CONST(1) << 63)
+#define PMD_ISHUGE (_PMD_ISHUGE | PMD_HUGE_NOT_HUGETLB)
+#define PMD_HUGE_PROTBITS (0xfff | PMD_HUGE_NOT_HUGETLB)
#ifndef __ASSEMBLY__
extern void hpte_need_hugepage_flush(struct mm_struct *mm, unsigned long addr,
@@ -70,8 +78,9 @@ static inline int pmd_trans_splitting(pmd_t pmd)
static inline int pmd_trans_huge(pmd_t pmd)
{
- return pmd_val(pmd) & PMD_ISHUGE;
+ return ((pmd_val(pmd) & PMD_ISHUGE) == PMD_ISHUGE);
}
+
/* We will enable it in the last patch */
#define has_transparent_hugepage() 0
#else
@@ -84,7 +93,8 @@ static inline unsigned long pmd_pfn(pmd_t pmd)
/*
* Only called for hugepage pmd
*/
- return pmd_val(pmd) >> PMD_HUGE_RPN_SHIFT;
+ unsigned long val = pmd_val(pmd) & ~PMD_HUGE_PROTBITS;
+ return val >> PMD_HUGE_RPN_SHIFT;
}
static inline int pmd_young(pmd_t pmd)
diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index 9f33780..cf3ca8e 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -517,7 +517,10 @@ static pmd_t pmd_set_protbits(pmd_t pmd, pgprot_t pgprot)
pmd_t pfn_pmd(unsigned long pfn, pgprot_t pgprot)
{
pmd_t pmd;
-
+ /*
+ * We cannot support that many PFNs
+ */
+ VM_BUG_ON(pfn & PMD_HUGE_NOT_HUGETLB);
pmd_val(pmd) = pfn << PMD_HUGE_RPN_SHIFT;
pmd_val(pmd) |= PMD_ISHUGE;
pmd = pmd_set_protbits(pmd, pgprot);
diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
index 6be1182..01a18d6 100644
--- a/arch/powerpc/mm/pgtable_64.c
+++ b/arch/powerpc/mm/pgtable_64.c
@@ -345,7 +345,7 @@ EXPORT_SYMBOL(__iounmap_at);
struct page *pmd_page(pmd_t pmd)
{
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
- if (pmd_val(pmd) & PMD_ISHUGE)
+ if ((pmd_val(pmd) & PMD_ISHUGE) == PMD_ISHUGE)
return pfn_to_page(pmd_pfn(pmd));
#endif
return virt_to_page(pmd_page_vaddr(pmd));
--
1.7.10
^ permalink raw reply related
* [PATCH -V3 17/25] powerpc/THP: Implement transparent hugepages for ppc64
From: Aneesh Kumar K.V @ 2013-03-15 9:39 UTC (permalink / raw)
To: benh, paulus; +Cc: linuxppc-dev, Aneesh Kumar K.V
In-Reply-To: <1363340407-22619-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
We now have pmd entries covering to 16MB range. To implement THP on powerpc,
we double the size of PMD. The second half is used to deposit the pgtable (PTE page).
We also use the depoisted PTE page for tracking the HPTE information. The information
include [ secondary group | 3 bit hidx | valid ]. We use one byte per each HPTE entry.
With 16MB hugepage and 64K HPTE we need 256 entries and with 4K HPTE we need
4096 entries. Both will fit in a 4K PTE page.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
arch/powerpc/include/asm/page.h | 2 +-
arch/powerpc/include/asm/pgtable-ppc64-64k.h | 3 +-
arch/powerpc/include/asm/pgtable-ppc64.h | 2 +-
arch/powerpc/include/asm/pgtable.h | 240 ++++++++++++++++++++
arch/powerpc/mm/pgtable.c | 314 ++++++++++++++++++++++++++
arch/powerpc/mm/pgtable_64.c | 13 ++
arch/powerpc/platforms/Kconfig.cputype | 1 +
7 files changed, 572 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
index 38e7ff6..b927447 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -40,7 +40,7 @@
#ifdef CONFIG_HUGETLB_PAGE
extern unsigned int HPAGE_SHIFT;
#else
-#define HPAGE_SHIFT PAGE_SHIFT
+#define HPAGE_SHIFT PMD_SHIFT
#endif
#define HPAGE_SIZE ((1UL) << HPAGE_SHIFT)
#define HPAGE_MASK (~(HPAGE_SIZE - 1))
diff --git a/arch/powerpc/include/asm/pgtable-ppc64-64k.h b/arch/powerpc/include/asm/pgtable-ppc64-64k.h
index 3c529b4..5c5541a 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64-64k.h
+++ b/arch/powerpc/include/asm/pgtable-ppc64-64k.h
@@ -33,7 +33,8 @@
#define PGDIR_MASK (~(PGDIR_SIZE-1))
/* Bits to mask out from a PMD to get to the PTE page */
-#define PMD_MASKED_BITS 0x1ff
+/* PMDs point to PTE table fragments which are 4K aligned. */
+#define PMD_MASKED_BITS 0xfff
/* Bits to mask out from a PGD/PUD to get to the PMD page */
#define PUD_MASKED_BITS 0x1ff
diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h b/arch/powerpc/include/asm/pgtable-ppc64.h
index 0182c20..c0747c7 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64.h
+++ b/arch/powerpc/include/asm/pgtable-ppc64.h
@@ -150,7 +150,7 @@
#define pmd_present(pmd) (pmd_val(pmd) != 0)
#define pmd_clear(pmdp) (pmd_val(*(pmdp)) = 0)
#define pmd_page_vaddr(pmd) (pmd_val(pmd) & ~PMD_MASKED_BITS)
-#define pmd_page(pmd) virt_to_page(pmd_page_vaddr(pmd))
+extern struct page *pmd_page(pmd_t pmd);
#define pud_set(pudp, pudval) (pud_val(*(pudp)) = (pudval))
#define pud_none(pud) (!pud_val(pud))
diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
index 4b52726..9fbe2a7 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -23,7 +23,247 @@ struct mm_struct;
*/
#define PTE_PAGE_HIDX_OFFSET (PTRS_PER_PTE * 8)
+/* A large part matches with pte bits */
+#define PMD_HUGE_PRESENT 0x001 /* software: pte contains a translation */
+#define PMD_HUGE_USER 0x002 /* matches one of the PP bits */
+#define PMD_HUGE_FILE 0x002 /* (!present only) software: pte holds file offset */
+#define PMD_HUGE_EXEC 0x004 /* No execute on POWER4 and newer (we invert) */
+#define PMD_HUGE_SPLITTING 0x008
+#define PMD_HUGE_SAO 0x010 /* strong Access order */
+#define PMD_HUGE_HASHPTE 0x020
+#define PMD_ISHUGE 0x040
+#define PMD_HUGE_DIRTY 0x080 /* C: page changed */
+#define PMD_HUGE_ACCESSED 0x100 /* R: page referenced */
+#define PMD_HUGE_RW 0x200 /* software: user write access allowed */
+#define PMD_HUGE_BUSY 0x800 /* software: PTE & hash are busy */
+#define PMD_HUGE_HPTEFLAGS (PMD_HUGE_BUSY | PMD_HUGE_HASHPTE)
+/*
+ * We keep both the pmd and pte rpn shift same, eventhough we use only
+ * lower 12 bits for hugepage flags at pmd level
+ */
+#define PMD_HUGE_RPN_SHIFT PTE_RPN_SHIFT
+#define HUGE_PAGE_SIZE (ASM_CONST(1) << 24)
+#define HUGE_PAGE_MASK (~(HUGE_PAGE_SIZE - 1))
+
#ifndef __ASSEMBLY__
+extern void hpte_need_hugepage_flush(struct mm_struct *mm, unsigned long addr,
+ pmd_t *pmdp);
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+extern pmd_t pfn_pmd(unsigned long pfn, pgprot_t pgprot);
+extern pmd_t mk_pmd(struct page *page, pgprot_t pgprot);
+extern pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot);
+extern void set_pmd_at(struct mm_struct *mm, unsigned long addr,
+ pmd_t *pmdp, pmd_t pmd);
+extern void update_mmu_cache_pmd(struct vm_area_struct *vma, unsigned long addr,
+ pmd_t *pmd);
+static inline int pmd_large(pmd_t pmd)
+{
+ return (pmd_val(pmd) & (PMD_ISHUGE | PMD_HUGE_PRESENT)) ==
+ (PMD_ISHUGE | PMD_HUGE_PRESENT);
+}
+
+static inline int pmd_trans_splitting(pmd_t pmd)
+{
+ return (pmd_val(pmd) & (PMD_ISHUGE|PMD_HUGE_SPLITTING)) ==
+ (PMD_ISHUGE|PMD_HUGE_SPLITTING);
+}
+
+static inline int pmd_trans_huge(pmd_t pmd)
+{
+ return pmd_val(pmd) & PMD_ISHUGE;
+}
+/* We will enable it in the last patch */
+#define has_transparent_hugepage() 0
+#else
+#define pmd_large(pmd) 0
+#define has_transparent_hugepage() 0
+#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+
+static inline unsigned long pmd_pfn(pmd_t pmd)
+{
+ /*
+ * Only called for hugepage pmd
+ */
+ return pmd_val(pmd) >> PMD_HUGE_RPN_SHIFT;
+}
+
+static inline int pmd_young(pmd_t pmd)
+{
+ return pmd_val(pmd) & PMD_HUGE_ACCESSED;
+}
+
+static inline pmd_t pmd_mkhuge(pmd_t pmd)
+{
+ /* Do nothing, mk_pmd() does this part. */
+ return pmd;
+}
+
+#define __HAVE_ARCH_PMD_WRITE
+static inline int pmd_write(pmd_t pmd)
+{
+ return pmd_val(pmd) & PMD_HUGE_RW;
+}
+
+static inline pmd_t pmd_mkold(pmd_t pmd)
+{
+ pmd_val(pmd) &= ~PMD_HUGE_ACCESSED;
+ return pmd;
+}
+
+static inline pmd_t pmd_wrprotect(pmd_t pmd)
+{
+ pmd_val(pmd) &= ~PMD_HUGE_RW;
+ return pmd;
+}
+
+static inline pmd_t pmd_mkdirty(pmd_t pmd)
+{
+ pmd_val(pmd) |= PMD_HUGE_DIRTY;
+ return pmd;
+}
+
+static inline pmd_t pmd_mkyoung(pmd_t pmd)
+{
+ pmd_val(pmd) |= PMD_HUGE_ACCESSED;
+ return pmd;
+}
+
+static inline pmd_t pmd_mkwrite(pmd_t pmd)
+{
+ pmd_val(pmd) |= PMD_HUGE_RW;
+ return pmd;
+}
+
+static inline pmd_t pmd_mknotpresent(pmd_t pmd)
+{
+ pmd_val(pmd) &= ~PMD_HUGE_PRESENT;
+ return pmd;
+}
+
+static inline pmd_t pmd_mksplitting(pmd_t pmd)
+{
+ pmd_val(pmd) |= PMD_HUGE_SPLITTING;
+ return pmd;
+}
+
+/*
+ * Set the dirty and/or accessed bits atomically in a linux hugepage PMD, this
+ * function doesn't need to flush the hash entry
+ */
+static inline void __pmdp_set_access_flags(pmd_t *pmdp, pmd_t entry)
+{
+ unsigned long bits = pmd_val(entry) & (PMD_HUGE_DIRTY |
+ PMD_HUGE_ACCESSED |
+ PMD_HUGE_RW | PMD_HUGE_EXEC);
+#ifdef PTE_ATOMIC_UPDATES
+ unsigned long old, tmp;
+
+ __asm__ __volatile__(
+ "1: ldarx %0,0,%4\n\
+ andi. %1,%0,%6\n\
+ bne- 1b \n\
+ or %0,%3,%0\n\
+ stdcx. %0,0,%4\n\
+ bne- 1b"
+ :"=&r" (old), "=&r" (tmp), "=m" (*pmdp)
+ :"r" (bits), "r" (pmdp), "m" (*pmdp), "i" (PMD_HUGE_BUSY)
+ :"cc");
+#else
+ unsigned long old = pmd_val(*pmdp);
+ *pmdp = __pmd(old | bits);
+#endif
+}
+
+#define __HAVE_ARCH_PMD_SAME
+static inline int pmd_same(pmd_t pmd_a, pmd_t pmd_b)
+{
+ return (((pmd_val(pmd_a) ^ pmd_val(pmd_b)) & ~PMD_HUGE_HPTEFLAGS) == 0);
+}
+
+#define __HAVE_ARCH_PMDP_SET_ACCESS_FLAGS
+extern int pmdp_set_access_flags(struct vm_area_struct *vma,
+ unsigned long address, pmd_t *pmdp,
+ pmd_t entry, int dirty);
+
+static inline unsigned long pmd_hugepage_update(struct mm_struct *mm,
+ unsigned long addr,
+ pmd_t *pmdp, unsigned long clr)
+{
+#ifdef PTE_ATOMIC_UPDATES
+ unsigned long old, tmp;
+
+ __asm__ __volatile__(
+ "1: ldarx %0,0,%3\n\
+ andi. %1,%0,%6\n\
+ bne- 1b \n\
+ andc %1,%0,%4 \n\
+ stdcx. %1,0,%3 \n\
+ bne- 1b"
+ : "=&r" (old), "=&r" (tmp), "=m" (*pmdp)
+ : "r" (pmdp), "r" (clr), "m" (*pmdp), "i" (PMD_HUGE_BUSY)
+ : "cc" );
+#else
+ unsigned long old = pmd_val(*pmdp);
+ *pmdp = __pmd(old & ~clr);
+#endif
+
+#ifdef CONFIG_PPC_STD_MMU_64
+ if (old & PMD_HUGE_HASHPTE)
+ hpte_need_hugepage_flush(mm, addr, pmdp);
+#endif
+ return old;
+}
+
+static inline int __pmdp_test_and_clear_young(struct mm_struct *mm,
+ unsigned long addr, pmd_t *pmdp)
+{
+ unsigned long old;
+
+ if ((pmd_val(*pmdp) & (PMD_HUGE_ACCESSED | PMD_HUGE_HASHPTE)) == 0)
+ return 0;
+ old = pmd_hugepage_update(mm, addr, pmdp, PMD_HUGE_ACCESSED);
+ return ((old & PMD_HUGE_ACCESSED) != 0);
+}
+
+#define __HAVE_ARCH_PMDP_TEST_AND_CLEAR_YOUNG
+extern int pmdp_test_and_clear_young(struct vm_area_struct *vma,
+ unsigned long address, pmd_t *pmdp);
+#define __HAVE_ARCH_PMDP_CLEAR_YOUNG_FLUSH
+extern int pmdp_clear_flush_young(struct vm_area_struct *vma,
+ unsigned long address, pmd_t *pmdp);
+
+#define __HAVE_ARCH_PMDP_GET_AND_CLEAR
+static inline pmd_t pmdp_get_and_clear(struct mm_struct *mm,
+ unsigned long addr, pmd_t *pmdp)
+{
+ unsigned long old = pmd_hugepage_update(mm, addr, pmdp, ~0UL);
+ return __pmd(old);
+}
+
+#define __HAVE_ARCH_PMDP_SET_WRPROTECT
+static inline void pmdp_set_wrprotect(struct mm_struct *mm, unsigned long addr,
+ pmd_t *pmdp)
+{
+
+ if ((pmd_val(*pmdp) & PMD_HUGE_RW) == 0)
+ return;
+
+ pmd_hugepage_update(mm, addr, pmdp, PMD_HUGE_RW);
+}
+
+#define __HAVE_ARCH_PMDP_SPLITTING_FLUSH
+extern void pmdp_splitting_flush(struct vm_area_struct *vma,
+ unsigned long address, pmd_t *pmdp);
+
+#define __HAVE_ARCH_PGTABLE_DEPOSIT
+extern void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
+ pgtable_t pgtable);
+#define __HAVE_ARCH_PGTABLE_WITHDRAW
+extern pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp);
+
+#define __HAVE_ARCH_PMDP_INVALIDATE
+extern void pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
+ pmd_t *pmdp);
#include <asm/tlbflush.h>
diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index 214130a..9f33780 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -31,6 +31,7 @@
#include <asm/pgalloc.h>
#include <asm/tlbflush.h>
#include <asm/tlb.h>
+#include <asm/machdep.h>
#include "mmu_decl.h"
@@ -240,3 +241,316 @@ void assert_pte_locked(struct mm_struct *mm, unsigned long addr)
}
#endif /* CONFIG_DEBUG_VM */
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+static pmd_t set_hugepage_access_flags_filter(pmd_t pmd,
+ struct vm_area_struct *vma,
+ int dirty)
+{
+ return pmd;
+}
+
+/*
+ * This is called when relaxing access to a hugepage. It's also called in the page
+ * fault path when we don't hit any of the major fault cases, ie, a minor
+ * update of _PAGE_ACCESSED, _PAGE_DIRTY, etc... The generic code will have
+ * handled those two for us, we additionally deal with missing execute
+ * permission here on some processors
+ */
+int pmdp_set_access_flags(struct vm_area_struct *vma, unsigned long address,
+ pmd_t *pmdp, pmd_t entry, int dirty)
+{
+ int changed;
+ entry = set_hugepage_access_flags_filter(entry, vma, dirty);
+ changed = !pmd_same(*(pmdp), entry);
+ if (changed) {
+ __pmdp_set_access_flags(pmdp, entry);
+ /*
+ * Since we are not supporting SW TLB systems, we don't
+ * have any thing similar to flush_tlb_page_nohash()
+ */
+ }
+ return changed;
+}
+
+int pmdp_test_and_clear_young(struct vm_area_struct *vma,
+ unsigned long address, pmd_t *pmdp)
+{
+ return __pmdp_test_and_clear_young(vma->vm_mm, address, pmdp);
+}
+
+/*
+ * We currently remove entries from the hashtable regardless of whether
+ * the entry was young or dirty. The generic routines only flush if the
+ * entry was young or dirty which is not good enough.
+ *
+ * We should be more intelligent about this but for the moment we override
+ * these functions and force a tlb flush unconditionally
+ */
+int pmdp_clear_flush_young(struct vm_area_struct *vma,
+ unsigned long address, pmd_t *pmdp)
+{
+ return __pmdp_test_and_clear_young(vma->vm_mm, address, pmdp);
+}
+
+/*
+ * We mark the pmd splitting and invalidate all the hpte
+ * entries for this hugepage.
+ */
+void pmdp_splitting_flush(struct vm_area_struct *vma,
+ unsigned long address, pmd_t *pmdp)
+{
+ unsigned long old, tmp;
+
+ VM_BUG_ON(address & ~HPAGE_PMD_MASK);
+#ifdef PTE_ATOMIC_UPDATES
+
+ __asm__ __volatile__(
+ "1: ldarx %0,0,%3\n\
+ andi. %1,%0,%6\n\
+ bne- 1b \n\
+ ori %1,%0,%4 \n\
+ stdcx. %1,0,%3 \n\
+ bne- 1b"
+ : "=&r" (old), "=&r" (tmp), "=m" (*pmdp)
+ : "r" (pmdp), "i" (PMD_HUGE_SPLITTING), "m" (*pmdp), "i" (PMD_HUGE_BUSY)
+ : "cc" );
+#else
+ old = pmd_val(*pmdp);
+ *pmdp = __pmd(old | PMD_HUGE_SPLITTING);
+#endif
+ /*
+ * If we didn't had the splitting flag set, go and flush the
+ * HPTE entries and serialize against gup fast.
+ */
+ if (!(old & PMD_HUGE_SPLITTING)) {
+#ifdef CONFIG_PPC_STD_MMU_64
+ /* We need to flush the hpte */
+ if (old & PMD_HUGE_HASHPTE)
+ hpte_need_hugepage_flush(vma->vm_mm, address, pmdp);
+#endif
+ /* need tlb flush only to serialize against gup-fast */
+ flush_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
+ }
+}
+
+/*
+ * We want to put the pgtable in pmd and use pgtable for tracking
+ * the base page size hptes
+ */
+void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
+ pgtable_t pgtable)
+{
+ unsigned long *pgtable_slot;
+ assert_spin_locked(&mm->page_table_lock);
+ /*
+ * we store the pgtable in the second half of PMD
+ */
+ pgtable_slot = pmdp + PTRS_PER_PMD;
+ *pgtable_slot = (unsigned long)pgtable;
+}
+
+#define PTE_FRAG_SIZE (2 * PTRS_PER_PTE * sizeof(pte_t))
+pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp)
+{
+ pgtable_t pgtable;
+ unsigned long *pgtable_slot;
+
+ assert_spin_locked(&mm->page_table_lock);
+ pgtable_slot = pmdp + PTRS_PER_PMD;
+ pgtable = (pgtable_t) *pgtable_slot;
+ /*
+ * We store HPTE information in the deposited PTE fragment.
+ * zero out the content on withdraw.
+ */
+ memset(pgtable, 0, PTE_FRAG_SIZE);
+ return pgtable;
+}
+
+/*
+ * Since we are looking at latest ppc64, we don't need to worry about
+ * i/d cache coherency on exec fault
+ */
+static pmd_t set_pmd_filter(pmd_t pmd, unsigned long addr)
+{
+ pmd = __pmd(pmd_val(pmd) & ~PMD_HUGE_HPTEFLAGS);
+ return pmd;
+}
+
+/*
+ * We can make it less convoluted than __set_pte_at, because
+ * we can ignore lot of hardware here, because this is only for
+ * MPSS
+ */
+static inline void __set_pmd_at(struct mm_struct *mm, unsigned long addr,
+ pmd_t *pmdp, pmd_t pmd, int percpu)
+{
+ /*
+ * There is nothing in hash page table now, so nothing to
+ * invalidate, set_pte_at is used for adding new entry.
+ * For updating we should use update_hugepage_pmd()
+ */
+ *pmdp = pmd;
+}
+
+/*
+ * set a new huge pmd. We should not be called for updating
+ * an existing pmd entry. That should go via pmd_hugepage_update.
+ */
+void set_pmd_at(struct mm_struct *mm, unsigned long addr,
+ pmd_t *pmdp, pmd_t pmd)
+{
+ /*
+ * Note: mm->context.id might not yet have been assigned as
+ * this context might not have been activated yet when this
+ * is called.
+ */
+ pmd = set_pmd_filter(pmd, addr);
+
+ __set_pmd_at(mm, addr, pmdp, pmd, 0);
+
+}
+
+void pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
+ pmd_t *pmdp)
+{
+ pmd_hugepage_update(vma->vm_mm, address, pmdp, PMD_HUGE_PRESENT);
+ flush_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
+}
+
+/*
+ * A linux hugepage PMD was changed and the corresponding hash table entry
+ * neesd to be flushed.
+ *
+ * The linux hugepage PMD now include the pmd entries followed by the address
+ * to the stashed pgtable_t. The stashed pgtable_t contains the hpte bits.
+ * [ secondary group | 3 bit hidx | valid ]. We use one byte per each HPTE entry.
+ * With 16MB hugepage and 64K HPTE we need 256 entries and with 4K HPTE we need
+ * 4096 entries. Both will fit in a 4K pgtable_t.
+ */
+void hpte_need_hugepage_flush(struct mm_struct *mm, unsigned long addr,
+ pmd_t *pmdp)
+{
+ int ssize, i;
+ unsigned long s_addr;
+ unsigned int psize, valid;
+ unsigned char *hpte_slot_array;
+ unsigned long hidx, vpn, vsid, hash, shift, slot;
+
+ /*
+ * Flush all the hptes mapping this hugepage
+ */
+ s_addr = addr & HUGE_PAGE_MASK;
+ /*
+ * The hpte hindex are stored in the pgtable whose address is in the
+ * second half of the PMD
+ */
+ hpte_slot_array = *(char **)(pmdp + PTRS_PER_PMD);
+
+ /* get the base page size */
+ psize = get_slice_psize(mm, s_addr);
+ shift = mmu_psize_defs[psize].shift;
+
+ for (i = 0; i < HUGE_PAGE_SIZE/(1ul << shift); i++) {
+ /*
+ * 8 bits per each hpte entries
+ * 000| [ secondary group (one bit) | hidx (3 bits) | valid bit]
+ */
+ valid = hpte_slot_array[i] & 0x1;
+ if (!valid)
+ continue;
+ hidx = hpte_slot_array[i] >> 1;
+
+ /* get the vpn */
+ addr = s_addr + (i * (1ul << shift));
+ if (!is_kernel_addr(addr)) {
+ ssize = user_segment_size(addr);
+ vsid = get_vsid(mm->context.id, addr, ssize);
+ WARN_ON(vsid == 0);
+ } else {
+ vsid = get_kernel_vsid(addr, mmu_kernel_ssize);
+ ssize = mmu_kernel_ssize;
+ }
+
+ vpn = hpt_vpn(addr, vsid, ssize);
+ hash = hpt_hash(vpn, shift, ssize);
+ if (hidx & _PTEIDX_SECONDARY)
+ hash = ~hash;
+
+ slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
+ slot += hidx & _PTEIDX_GROUP_IX;
+ ppc_md.hpte_invalidate(slot, vpn, psize, ssize, 0);
+ }
+}
+
+static pmd_t pmd_set_protbits(pmd_t pmd, pgprot_t pgprot)
+{
+ unsigned long pmd_prot = 0;
+ unsigned long prot = pgprot_val(pgprot);
+
+ if (prot & _PAGE_PRESENT)
+ pmd_prot |= PMD_HUGE_PRESENT;
+ if (prot & _PAGE_USER)
+ pmd_prot |= PMD_HUGE_USER;
+ if (prot & _PAGE_FILE)
+ pmd_prot |= PMD_HUGE_FILE;
+ if (prot & _PAGE_EXEC)
+ pmd_prot |= PMD_HUGE_EXEC;
+ /*
+ * _PAGE_COHERENT should always be set
+ */
+ VM_BUG_ON(!(prot & _PAGE_COHERENT));
+
+ if (prot & _PAGE_SAO)
+ pmd_prot |= PMD_HUGE_SAO;
+ if (prot & _PAGE_DIRTY)
+ pmd_prot |= PMD_HUGE_DIRTY;
+ if (prot & _PAGE_ACCESSED)
+ pmd_prot |= PMD_HUGE_ACCESSED;
+ if (prot & _PAGE_RW)
+ pmd_prot |= PMD_HUGE_RW;
+
+ pmd_val(pmd) |= pmd_prot;
+ return pmd;
+}
+
+pmd_t pfn_pmd(unsigned long pfn, pgprot_t pgprot)
+{
+ pmd_t pmd;
+
+ pmd_val(pmd) = pfn << PMD_HUGE_RPN_SHIFT;
+ pmd_val(pmd) |= PMD_ISHUGE;
+ pmd = pmd_set_protbits(pmd, pgprot);
+ return pmd;
+}
+
+pmd_t mk_pmd(struct page *page, pgprot_t pgprot)
+{
+ return pfn_pmd(page_to_pfn(page), pgprot);
+}
+
+pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
+{
+ /* FIXME!! why are this bits cleared ? */
+ pmd_val(pmd) &= ~(PMD_HUGE_PRESENT |
+ PMD_HUGE_RW |
+ PMD_HUGE_EXEC);
+ pmd = pmd_set_protbits(pmd, newprot);
+ return pmd;
+}
+
+/*
+ * This is called at the end of handling a user page fault, when the
+ * fault has been handled by updating a HUGE PMD entry in the linux page tables.
+ * We use it to preload an HPTE into the hash table corresponding to
+ * the updated linux HUGE PMD entry.
+ */
+void update_mmu_cache_pmd(struct vm_area_struct *vma, unsigned long addr,
+ pmd_t *pmd)
+{
+ /* FIXME!!
+ * Will be done in a later patch
+ */
+}
+
+#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
index 66fad08..6be1182 100644
--- a/arch/powerpc/mm/pgtable_64.c
+++ b/arch/powerpc/mm/pgtable_64.c
@@ -338,6 +338,19 @@ EXPORT_SYMBOL(iounmap);
EXPORT_SYMBOL(__iounmap);
EXPORT_SYMBOL(__iounmap_at);
+/*
+ * For hugepage we have pfn in the pmd, we use PMD_HUGE_RPN_SHIFT bits for flags
+ * For PTE page, we have a PTE_FRAG_SIZE (4K) aligned virtual address.
+ */
+struct page *pmd_page(pmd_t pmd)
+{
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+ if (pmd_val(pmd) & PMD_ISHUGE)
+ return pfn_to_page(pmd_pfn(pmd));
+#endif
+ return virt_to_page(pmd_page_vaddr(pmd));
+}
+
#ifdef CONFIG_PPC_64K_PAGES
/*
* we support 16 fragments per PTE page. This is limited by how many
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index 72afd28..90ee19b 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -71,6 +71,7 @@ config PPC_BOOK3S_64
select PPC_FPU
select PPC_HAVE_PMU_SUPPORT
select SYS_SUPPORTS_HUGETLBFS
+ select HAVE_ARCH_TRANSPARENT_HUGEPAGE if PPC_64K_PAGES
config PPC_BOOK3E_64
bool "Embedded processors"
--
1.7.10
^ permalink raw reply related
* [PATCH -V3 12/25] powerpc: print both base and actual page size on hash failure
From: Aneesh Kumar K.V @ 2013-03-15 9:39 UTC (permalink / raw)
To: benh, paulus; +Cc: linuxppc-dev, Aneesh Kumar K.V
In-Reply-To: <1363340407-22619-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
arch/powerpc/include/asm/mmu-hash64.h | 3 ++-
arch/powerpc/mm/hash_utils_64.c | 12 +++++++-----
arch/powerpc/mm/hugetlbpage-hash64.c | 2 +-
3 files changed, 10 insertions(+), 7 deletions(-)
diff --git a/arch/powerpc/include/asm/mmu-hash64.h b/arch/powerpc/include/asm/mmu-hash64.h
index e42f4a3..e187254 100644
--- a/arch/powerpc/include/asm/mmu-hash64.h
+++ b/arch/powerpc/include/asm/mmu-hash64.h
@@ -324,7 +324,8 @@ int __hash_page_huge(unsigned long ea, unsigned long access, unsigned long vsid,
unsigned int shift, unsigned int mmu_psize);
extern void hash_failure_debug(unsigned long ea, unsigned long access,
unsigned long vsid, unsigned long trap,
- int ssize, int psize, unsigned long pte);
+ int ssize, int psize, int lpsize,
+ unsigned long pte);
extern int htab_bolt_mapping(unsigned long vstart, unsigned long vend,
unsigned long pstart, unsigned long prot,
int psize, int ssize);
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index a5a5067..56ff4bb 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -933,14 +933,14 @@ static inline int subpage_protection(struct mm_struct *mm, unsigned long ea)
void hash_failure_debug(unsigned long ea, unsigned long access,
unsigned long vsid, unsigned long trap,
- int ssize, int psize, unsigned long pte)
+ int ssize, int psize, int lpsize, unsigned long pte)
{
if (!printk_ratelimit())
return;
pr_info("mm: Hashing failure ! EA=0x%lx access=0x%lx current=%s\n",
ea, access, current->comm);
- pr_info(" trap=0x%lx vsid=0x%lx ssize=%d psize=%d pte=0x%lx\n",
- trap, vsid, ssize, psize, pte);
+ pr_info(" trap=0x%lx vsid=0x%lx ssize=%d base psize=%d psize %d pte=0x%lx\n",
+ trap, vsid, ssize, psize, lpsize, pte);
}
/* Result code is:
@@ -1113,7 +1113,7 @@ int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
*/
if (rc == -1)
hash_failure_debug(ea, access, vsid, trap, ssize, psize,
- pte_val(*ptep));
+ psize, pte_val(*ptep));
#ifndef CONFIG_PPC_64K_PAGES
DBG_LOW(" o-pte: %016lx\n", pte_val(*ptep));
#else
@@ -1191,7 +1191,9 @@ void hash_preload(struct mm_struct *mm, unsigned long ea,
*/
if (rc == -1)
hash_failure_debug(ea, access, vsid, trap, ssize,
- mm->context.user_psize, pte_val(*ptep));
+ mm->context.user_psize,
+ mm->context.user_psize,
+ pte_val(*ptep));
local_irq_restore(flags);
}
diff --git a/arch/powerpc/mm/hugetlbpage-hash64.c b/arch/powerpc/mm/hugetlbpage-hash64.c
index e0d52ee..06ecb55 100644
--- a/arch/powerpc/mm/hugetlbpage-hash64.c
+++ b/arch/powerpc/mm/hugetlbpage-hash64.c
@@ -129,7 +129,7 @@ repeat:
if (unlikely(slot == -2)) {
*ptep = __pte(old_pte);
hash_failure_debug(ea, access, vsid, trap, ssize,
- mmu_psize, old_pte);
+ mmu_psize, mmu_psize, old_pte);
return -1;
}
--
1.7.10
^ permalink raw reply related
* [PATCH -V3 14/25] mm/THP: HPAGE_SHIFT is not a #define on some arch
From: Aneesh Kumar K.V @ 2013-03-15 9:39 UTC (permalink / raw)
To: benh, paulus; +Cc: Andrea Arcangeli, linuxppc-dev, Aneesh Kumar K.V
In-Reply-To: <1363340407-22619-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
On archs like powerpc that support different hugepage sizes, HPAGE_SHIFT
and other derived values like HPAGE_PMD_ORDER are not constants. So move
that to hugepage_init
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
include/linux/huge_mm.h | 3 ---
mm/huge_memory.c | 9 ++++++---
2 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 1d76f8c..0022b70 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -119,9 +119,6 @@ extern void __split_huge_page_pmd(struct vm_area_struct *vma,
} while (0)
extern void split_huge_page_pmd_mm(struct mm_struct *mm, unsigned long address,
pmd_t *pmd);
-#if HPAGE_PMD_ORDER > MAX_ORDER
-#error "hugepages can't be allocated by the buddy allocator"
-#endif
extern int hugepage_madvise(struct vm_area_struct *vma,
unsigned long *vm_flags, int advice);
extern void __vma_adjust_trans_huge(struct vm_area_struct *vma,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index b5783d8..1940ee0 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -44,7 +44,7 @@ unsigned long transparent_hugepage_flags __read_mostly =
(1<<TRANSPARENT_HUGEPAGE_USE_ZERO_PAGE_FLAG);
/* default scan 8*512 pte (or vmas) every 30 second */
-static unsigned int khugepaged_pages_to_scan __read_mostly = HPAGE_PMD_NR*8;
+static unsigned int khugepaged_pages_to_scan __read_mostly;
static unsigned int khugepaged_pages_collapsed;
static unsigned int khugepaged_full_scans;
static unsigned int khugepaged_scan_sleep_millisecs __read_mostly = 10000;
@@ -59,7 +59,7 @@ static DECLARE_WAIT_QUEUE_HEAD(khugepaged_wait);
* it would have happened if the vma was large enough during page
* fault.
*/
-static unsigned int khugepaged_max_ptes_none __read_mostly = HPAGE_PMD_NR-1;
+static unsigned int khugepaged_max_ptes_none __read_mostly;
static int khugepaged(void *none);
static int mm_slots_hash_init(void);
@@ -621,11 +621,14 @@ static int __init hugepage_init(void)
int err;
struct kobject *hugepage_kobj;
- if (!has_transparent_hugepage()) {
+ if (!has_transparent_hugepage() || (HPAGE_PMD_ORDER > MAX_ORDER)) {
transparent_hugepage_flags = 0;
return -EINVAL;
}
+ khugepaged_pages_to_scan = HPAGE_PMD_NR*8;
+ khugepaged_max_ptes_none = HPAGE_PMD_NR-1;
+
err = hugepage_init_sysfs(&hugepage_kobj);
if (err)
return err;
--
1.7.10
^ permalink raw reply related
* [PATCH -V3 01/25] powerpc: Use signed formatting when printing error
From: Aneesh Kumar K.V @ 2013-03-15 9:39 UTC (permalink / raw)
To: benh, paulus; +Cc: linuxppc-dev, Aneesh Kumar K.V
In-Reply-To: <1363340407-22619-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
PAPR defines these errors as negative values. So print them accordingly
for easy debugging.
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
arch/powerpc/platforms/pseries/lpar.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index 0da39fe..a77c35b 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -155,7 +155,7 @@ static long pSeries_lpar_hpte_insert(unsigned long hpte_group,
*/
if (unlikely(lpar_rc != H_SUCCESS)) {
if (!(vflags & HPTE_V_BOLTED))
- pr_devel(" lpar err %lu\n", lpar_rc);
+ pr_devel(" lpar err %ld\n", lpar_rc);
return -2;
}
if (!(vflags & HPTE_V_BOLTED))
--
1.7.10
^ permalink raw reply related
* [PATCH -V3 22/25] powerpc/THP: get_user_pages_fast changes
From: Aneesh Kumar K.V @ 2013-03-15 9:40 UTC (permalink / raw)
To: benh, paulus; +Cc: linuxppc-dev, Aneesh Kumar K.V
In-Reply-To: <1363340407-22619-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
handle large pages for get_user_pages_fast. Also take care of large page splitting.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
arch/powerpc/mm/gup.c | 84 +++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 82 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/mm/gup.c b/arch/powerpc/mm/gup.c
index d7efdbf..835c1ae 100644
--- a/arch/powerpc/mm/gup.c
+++ b/arch/powerpc/mm/gup.c
@@ -55,6 +55,72 @@ static noinline int gup_pte_range(pmd_t pmd, unsigned long addr,
return 1;
}
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+static inline int gup_huge_pmd(pmd_t *pmdp, unsigned long addr,
+ unsigned long end, int write,
+ struct page **pages, int *nr)
+{
+ int refs;
+ pmd_t pmd;
+ unsigned long mask;
+ struct page *head, *page, *tail;
+
+ pmd = *pmdp;
+ mask = PMD_HUGE_PRESENT | PMD_HUGE_USER;
+ if (write)
+ mask |= PMD_HUGE_RW;
+
+ if ((pmd_val(pmd) & mask) != mask)
+ return 0;
+
+ /* large pages are never "special" */
+ VM_BUG_ON(!pfn_valid(pmd_pfn(pmd)));
+
+ refs = 0;
+ head = pmd_page(pmd);
+ page = head + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
+ tail = page;
+ do {
+ VM_BUG_ON(compound_head(page) != head);
+ pages[*nr] = page;
+ (*nr)++;
+ page++;
+ refs++;
+ } while (addr += PAGE_SIZE, addr != end);
+
+ if (!page_cache_add_speculative(head, refs)) {
+ *nr -= refs;
+ return 0;
+ }
+
+ if (unlikely(pmd_val(pmd) != pmd_val(*pmdp))) {
+ *nr -= refs;
+ while (refs--)
+ put_page(head);
+ return 0;
+ }
+ /*
+ * Any tail page need their mapcount reference taken before we
+ * return.
+ */
+ while (refs--) {
+ if (PageTail(tail))
+ get_huge_page_tail(tail);
+ tail++;
+ }
+
+ return 1;
+}
+#else
+
+static inline int gup_huge_pmd(pmd_t *pmdp, unsigned long addr,
+ unsigned long end, int write,
+ struct page **pages, int *nr)
+{
+ return 1;
+}
+#endif
+
static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
int write, struct page **pages, int *nr)
{
@@ -66,9 +132,23 @@ static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
pmd_t pmd = *pmdp;
next = pmd_addr_end(addr, end);
- if (pmd_none(pmd))
+ /*
+ * The pmd_trans_splitting() check below explains why
+ * pmdp_splitting_flush has to flush the tlb, to stop
+ * this gup-fast code from running while we set the
+ * splitting bit in the pmd. Returning zero will take
+ * the slow path that will call wait_split_huge_page()
+ * if the pmd is still in splitting state. gup-fast
+ * can't because it has irq disabled and
+ * wait_split_huge_page() would never return as the
+ * tlb flush IPI wouldn't run.
+ */
+ if (pmd_none(pmd) || pmd_trans_splitting(pmd))
return 0;
- if (is_hugepd(pmdp)) {
+ if (unlikely(pmd_large(pmd))) {
+ if (!gup_huge_pmd(pmdp, addr, next, write, pages, nr))
+ return 0;
+ } else if (is_hugepd(pmdp)) {
if (!gup_hugepd((hugepd_t *)pmdp, PMD_SHIFT,
addr, next, write, pages, nr))
return 0;
--
1.7.10
^ permalink raw reply related
* [PATCH -V3 23/25] powerpc/THP: Enable THP on PPC64
From: Aneesh Kumar K.V @ 2013-03-15 9:40 UTC (permalink / raw)
To: benh, paulus; +Cc: linuxppc-dev, Aneesh Kumar K.V
In-Reply-To: <1363340407-22619-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
We enable only if the we support 16MB page size.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
arch/powerpc/include/asm/pgtable.h | 31 +++++++++++++++++++++++++++++--
1 file changed, 29 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
index 9681de4..5617dee 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -81,8 +81,35 @@ static inline int pmd_trans_huge(pmd_t pmd)
return ((pmd_val(pmd) & PMD_ISHUGE) == PMD_ISHUGE);
}
-/* We will enable it in the last patch */
-#define has_transparent_hugepage() 0
+static inline int has_transparent_hugepage(void)
+{
+ if (!mmu_has_feature(MMU_FTR_16M_PAGE))
+ return 0;
+ /*
+ * We support THP only if HPAGE_SHIFT is 16MB.
+ */
+ if (!HPAGE_SHIFT || (HPAGE_SHIFT != mmu_psize_defs[MMU_PAGE_16M].shift))
+ return 0;
+ /*
+ * We need to make sure that we support 16MB hugepage in a segement
+ * with base page size 64K or 4K. We only enable THP with a PAGE_SIZE
+ * of 64K.
+ */
+ /*
+ * If we have 64K HPTE, we will be using that by default
+ */
+ if (mmu_psize_defs[MMU_PAGE_64K].shift &&
+ (mmu_psize_defs[MMU_PAGE_64K].penc[MMU_PAGE_16M] == -1))
+ return 0;
+ /*
+ * Ok we only have 4K HPTE
+ */
+ if (mmu_psize_defs[MMU_PAGE_4K].penc[MMU_PAGE_16M] == -1)
+ return 0;
+
+ return 1;
+}
+
#else
#define pmd_large(pmd) 0
#define has_transparent_hugepage() 0
--
1.7.10
^ permalink raw reply related
* [PATCH -V3 15/25] mm/THP: Add pmd args to pgtable deposit and withdraw APIs
From: Aneesh Kumar K.V @ 2013-03-15 9:39 UTC (permalink / raw)
To: benh, paulus; +Cc: Andrea Arcangeli, linuxppc-dev, Aneesh Kumar K.V
In-Reply-To: <1363340407-22619-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
This will be later used by powerpc THP support. In powerpc we want to use
pgtable for storing the hash index values. So instead of adding them to
mm_context list, we would like to store them in the second half of pmd
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
arch/s390/include/asm/pgtable.h | 5 +++--
arch/s390/mm/pgtable.c | 5 +++--
arch/sparc/include/asm/pgtable_64.h | 5 +++--
arch/sparc/mm/tlb.c | 5 +++--
include/asm-generic/pgtable.h | 5 +++--
mm/huge_memory.c | 18 +++++++++---------
mm/pgtable-generic.c | 5 +++--
7 files changed, 27 insertions(+), 21 deletions(-)
diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 098adbb..883296e 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -1232,10 +1232,11 @@ static inline void __pmd_idte(unsigned long address, pmd_t *pmdp)
#define SEGMENT_RW __pgprot(_HPAGE_TYPE_RW)
#define __HAVE_ARCH_PGTABLE_DEPOSIT
-extern void pgtable_trans_huge_deposit(struct mm_struct *mm, pgtable_t pgtable);
+extern void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
+ pgtable_t pgtable);
#define __HAVE_ARCH_PGTABLE_WITHDRAW
-extern pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm);
+extern pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp);
static inline int pmd_trans_splitting(pmd_t pmd)
{
diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index ae44d2a..9ab3224 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -920,7 +920,8 @@ void pmdp_splitting_flush(struct vm_area_struct *vma, unsigned long address,
}
}
-void pgtable_trans_huge_deposit(struct mm_struct *mm, pgtable_t pgtable)
+void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
+ pgtable_t pgtable)
{
struct list_head *lh = (struct list_head *) pgtable;
@@ -934,7 +935,7 @@ void pgtable_trans_huge_deposit(struct mm_struct *mm, pgtable_t pgtable)
mm->pmd_huge_pte = pgtable;
}
-pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm)
+pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp)
{
struct list_head *lh;
pgtable_t pgtable;
diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h
index 08fcce9..4c86de2 100644
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -853,10 +853,11 @@ extern void update_mmu_cache_pmd(struct vm_area_struct *vma, unsigned long addr,
pmd_t *pmd);
#define __HAVE_ARCH_PGTABLE_DEPOSIT
-extern void pgtable_trans_huge_deposit(struct mm_struct *mm, pgtable_t pgtable);
+extern void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
+ pgtable_t pgtable);
#define __HAVE_ARCH_PGTABLE_WITHDRAW
-extern pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm);
+extern pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp);
#endif
/* Encode and de-code a swap entry */
diff --git a/arch/sparc/mm/tlb.c b/arch/sparc/mm/tlb.c
index 3e8fec3..79922f4 100644
--- a/arch/sparc/mm/tlb.c
+++ b/arch/sparc/mm/tlb.c
@@ -150,7 +150,8 @@ void set_pmd_at(struct mm_struct *mm, unsigned long addr,
}
}
-void pgtable_trans_huge_deposit(struct mm_struct *mm, pgtable_t pgtable)
+void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
+ pgtable_t pgtable)
{
struct list_head *lh = (struct list_head *) pgtable;
@@ -164,7 +165,7 @@ void pgtable_trans_huge_deposit(struct mm_struct *mm, pgtable_t pgtable)
mm->pmd_huge_pte = pgtable;
}
-pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm)
+pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp)
{
struct list_head *lh;
pgtable_t pgtable;
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 5cf680a..6f87e9e 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -163,11 +163,12 @@ extern void pmdp_splitting_flush(struct vm_area_struct *vma,
#endif
#ifndef __HAVE_ARCH_PGTABLE_DEPOSIT
-extern void pgtable_trans_huge_deposit(struct mm_struct *mm, pgtable_t pgtable);
+extern void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
+ pgtable_t pgtable);
#endif
#ifndef __HAVE_ARCH_PGTABLE_WITHDRAW
-extern pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm);
+extern pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp);
#endif
#ifndef __HAVE_ARCH_PMDP_INVALIDATE
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 1940ee0..e91b763 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -742,7 +742,7 @@ static int __do_huge_pmd_anonymous_page(struct mm_struct *mm,
*/
page_add_new_anon_rmap(page, vma, haddr);
set_pmd_at(mm, haddr, pmd, entry);
- pgtable_trans_huge_deposit(mm, pgtable);
+ pgtable_trans_huge_deposit(mm, pmd, pgtable);
add_mm_counter(mm, MM_ANONPAGES, HPAGE_PMD_NR);
mm->nr_ptes++;
spin_unlock(&mm->page_table_lock);
@@ -784,7 +784,7 @@ static bool set_huge_zero_page(pgtable_t pgtable, struct mm_struct *mm,
entry = pmd_wrprotect(entry);
entry = pmd_mkhuge(entry);
set_pmd_at(mm, haddr, pmd, entry);
- pgtable_trans_huge_deposit(mm, pgtable);
+ pgtable_trans_huge_deposit(mm, pmd, pgtable);
mm->nr_ptes++;
return true;
}
@@ -929,7 +929,7 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
pmdp_set_wrprotect(src_mm, addr, src_pmd);
pmd = pmd_mkold(pmd_wrprotect(pmd));
set_pmd_at(dst_mm, addr, dst_pmd, pmd);
- pgtable_trans_huge_deposit(dst_mm, pgtable);
+ pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable);
dst_mm->nr_ptes++;
ret = 0;
@@ -999,7 +999,7 @@ static int do_huge_pmd_wp_zero_page_fallback(struct mm_struct *mm,
pmdp_clear_flush(vma, haddr, pmd);
/* leave pmd empty until pte is filled */
- pgtable = pgtable_trans_huge_withdraw(mm);
+ pgtable = pgtable_trans_huge_withdraw(mm, pmd);
pmd_populate(mm, &_pmd, pgtable);
for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE) {
@@ -1094,10 +1094,10 @@ static int do_huge_pmd_wp_page_fallback(struct mm_struct *mm,
goto out_free_pages;
VM_BUG_ON(!PageHead(page));
+ pgtable = pgtable_trans_huge_withdraw(mm, pmd);
pmdp_clear_flush(vma, haddr, pmd);
/* leave pmd empty until pte is filled */
- pgtable = pgtable_trans_huge_withdraw(mm);
pmd_populate(mm, &_pmd, pgtable);
for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE) {
@@ -1380,7 +1380,7 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
struct page *page;
pgtable_t pgtable;
pmd_t orig_pmd;
- pgtable = pgtable_trans_huge_withdraw(tlb->mm);
+ pgtable = pgtable_trans_huge_withdraw(tlb->mm, pmd);
orig_pmd = pmdp_get_and_clear(tlb->mm, addr, pmd);
tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
if (is_huge_zero_pmd(orig_pmd)) {
@@ -1712,7 +1712,7 @@ static int __split_huge_page_map(struct page *page,
pmd = page_check_address_pmd(page, mm, address,
PAGE_CHECK_ADDRESS_PMD_SPLITTING_FLAG);
if (pmd) {
- pgtable = pgtable_trans_huge_withdraw(mm);
+ pgtable = pgtable_trans_huge_withdraw(mm, pmd);
pmd_populate(mm, &_pmd, pgtable);
haddr = address;
@@ -2400,7 +2400,7 @@ static void collapse_huge_page(struct mm_struct *mm,
page_add_new_anon_rmap(new_page, vma, address);
set_pmd_at(mm, address, pmd, _pmd);
update_mmu_cache_pmd(vma, address, pmd);
- pgtable_trans_huge_deposit(mm, pgtable);
+ pgtable_trans_huge_deposit(mm, pmd, pgtable);
spin_unlock(&mm->page_table_lock);
*hpage = NULL;
@@ -2706,7 +2706,7 @@ static void __split_huge_zero_page_pmd(struct vm_area_struct *vma,
pmdp_clear_flush(vma, haddr, pmd);
/* leave pmd empty until pte is filled */
- pgtable = pgtable_trans_huge_withdraw(mm);
+ pgtable = pgtable_trans_huge_withdraw(mm, pmd);
pmd_populate(mm, &_pmd, pgtable);
for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE) {
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index 0c8323f..e1a6e4f 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -124,7 +124,8 @@ void pmdp_splitting_flush(struct vm_area_struct *vma, unsigned long address,
#ifndef __HAVE_ARCH_PGTABLE_DEPOSIT
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-void pgtable_trans_huge_deposit(struct mm_struct *mm, pgtable_t pgtable)
+void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
+ pgtable_t pgtable)
{
assert_spin_locked(&mm->page_table_lock);
@@ -141,7 +142,7 @@ void pgtable_trans_huge_deposit(struct mm_struct *mm, pgtable_t pgtable)
#ifndef __HAVE_ARCH_PGTABLE_WITHDRAW
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
/* no "address" argument so destroys page coloring of some arch */
-pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm)
+pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp)
{
pgtable_t pgtable;
--
1.7.10
^ permalink raw reply related
* [PATCH -V3 08/25] powerpc: Decode the pte-lp-encoding bits correctly.
From: Aneesh Kumar K.V @ 2013-03-15 9:39 UTC (permalink / raw)
To: benh, paulus; +Cc: linuxppc-dev, Aneesh Kumar K.V
In-Reply-To: <1363340407-22619-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
We look at both the segment base page size and actual page size and store
the pte-lp-encodings in an array per base page size.
We also update all relevant functions to take actual page size argument
so that we can use the correct PTE LP encoding in HPTE. This should also
get the basic Multiple Page Size per Segment (MPSS) support. This is needed
to enable THP on ppc64.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
arch/powerpc/include/asm/machdep.h | 3 +-
arch/powerpc/include/asm/mmu-hash64.h | 33 +++++----
arch/powerpc/kvm/book3s_hv.c | 2 +-
arch/powerpc/mm/hash_low_64.S | 18 +++--
arch/powerpc/mm/hash_native_64.c | 119 +++++++++++++++++++++---------
arch/powerpc/mm/hash_utils_64.c | 121 ++++++++++++++++++++-----------
arch/powerpc/mm/hugetlbpage-hash64.c | 4 +-
arch/powerpc/platforms/cell/beat_htab.c | 16 ++--
arch/powerpc/platforms/ps3/htab.c | 6 +-
arch/powerpc/platforms/pseries/lpar.c | 6 +-
10 files changed, 214 insertions(+), 114 deletions(-)
diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index 19d9d96..6cee6e0 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -50,7 +50,8 @@ struct machdep_calls {
unsigned long prpn,
unsigned long rflags,
unsigned long vflags,
- int psize, int ssize);
+ int psize, int apsize,
+ int ssize);
long (*hpte_remove)(unsigned long hpte_group);
void (*hpte_removebolted)(unsigned long ea,
int psize, int ssize);
diff --git a/arch/powerpc/include/asm/mmu-hash64.h b/arch/powerpc/include/asm/mmu-hash64.h
index 300ac3c..e42f4a3 100644
--- a/arch/powerpc/include/asm/mmu-hash64.h
+++ b/arch/powerpc/include/asm/mmu-hash64.h
@@ -154,7 +154,7 @@ extern unsigned long htab_hash_mask;
struct mmu_psize_def
{
unsigned int shift; /* number of bits */
- unsigned int penc; /* HPTE encoding */
+ int penc[MMU_PAGE_COUNT]; /* HPTE encoding */
unsigned int tlbiel; /* tlbiel supported for that page size */
unsigned long avpnm; /* bits to mask out in AVPN in the HPTE */
unsigned long sllp; /* SLB L||LP (exact mask to use in slbmte) */
@@ -181,6 +181,13 @@ struct mmu_psize_def
*/
#define VPN_SHIFT 12
+/*
+ * HPTE Large Page (LP) details
+ */
+#define LP_SHIFT 12
+#define LP_BITS 8
+#define LP_MASK(i) ((0xFF >> (i)) << LP_SHIFT)
+
#ifndef __ASSEMBLY__
static inline int segment_shift(int ssize)
@@ -237,14 +244,14 @@ static inline unsigned long hpte_encode_avpn(unsigned long vpn, int psize,
/*
* This function sets the AVPN and L fields of the HPTE appropriately
- * for the page size
+ * using the base page size and actual page size.
*/
-static inline unsigned long hpte_encode_v(unsigned long vpn,
- int psize, int ssize)
+static inline unsigned long hpte_encode_v(unsigned long vpn, int base_psize,
+ int actual_psize, int ssize)
{
unsigned long v;
- v = hpte_encode_avpn(vpn, psize, ssize);
- if (psize != MMU_PAGE_4K)
+ v = hpte_encode_avpn(vpn, base_psize, ssize);
+ if (actual_psize != MMU_PAGE_4K)
v |= HPTE_V_LARGE;
return v;
}
@@ -254,19 +261,17 @@ static inline unsigned long hpte_encode_v(unsigned long vpn,
* for the page size. We assume the pa is already "clean" that is properly
* aligned for the requested page size
*/
-static inline unsigned long hpte_encode_r(unsigned long pa, int psize)
+static inline unsigned long hpte_encode_r(unsigned long pa, int base_psize,
+ int actual_psize)
{
- unsigned long r;
-
/* A 4K page needs no special encoding */
- if (psize == MMU_PAGE_4K)
+ if (actual_psize == MMU_PAGE_4K)
return pa & HPTE_R_RPN;
else {
- unsigned int penc = mmu_psize_defs[psize].penc;
- unsigned int shift = mmu_psize_defs[psize].shift;
- return (pa & ~((1ul << shift) - 1)) | (penc << 12);
+ unsigned int penc = mmu_psize_defs[base_psize].penc[actual_psize];
+ unsigned int shift = mmu_psize_defs[actual_psize].shift;
+ return (pa & ~((1ul << shift) - 1)) | (penc << LP_SHIFT);
}
- return r;
}
/*
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 71d0c90..48f6d99 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1515,7 +1515,7 @@ static void kvmppc_add_seg_page_size(struct kvm_ppc_one_seg_page_size **sps,
(*sps)->page_shift = def->shift;
(*sps)->slb_enc = def->sllp;
(*sps)->enc[0].page_shift = def->shift;
- (*sps)->enc[0].pte_enc = def->penc;
+ (*sps)->enc[0].pte_enc = def->penc[linux_psize];
(*sps)++;
}
diff --git a/arch/powerpc/mm/hash_low_64.S b/arch/powerpc/mm/hash_low_64.S
index abdd5e2..0e980ac 100644
--- a/arch/powerpc/mm/hash_low_64.S
+++ b/arch/powerpc/mm/hash_low_64.S
@@ -196,7 +196,8 @@ htab_insert_pte:
mr r4,r29 /* Retrieve vpn */
li r7,0 /* !bolted, !secondary */
li r8,MMU_PAGE_4K /* page size */
- ld r9,STK_PARAM(R9)(r1) /* segment size */
+ li r9,MMU_PAGE_4K /* actual page size */
+ ld r10,STK_PARAM(R9)(r1) /* segment size */
_GLOBAL(htab_call_hpte_insert1)
bl . /* Patched by htab_finish_init() */
cmpdi 0,r3,0
@@ -219,7 +220,8 @@ _GLOBAL(htab_call_hpte_insert1)
mr r4,r29 /* Retrieve vpn */
li r7,HPTE_V_SECONDARY /* !bolted, secondary */
li r8,MMU_PAGE_4K /* page size */
- ld r9,STK_PARAM(R9)(r1) /* segment size */
+ li r9,MMU_PAGE_4K /* actual page size */
+ ld r10,STK_PARAM(R9)(r1) /* segment size */
_GLOBAL(htab_call_hpte_insert2)
bl . /* Patched by htab_finish_init() */
cmpdi 0,r3,0
@@ -515,7 +517,8 @@ htab_special_pfn:
mr r4,r29 /* Retrieve vpn */
li r7,0 /* !bolted, !secondary */
li r8,MMU_PAGE_4K /* page size */
- ld r9,STK_PARAM(R9)(r1) /* segment size */
+ li r9,MMU_PAGE_4K /* actual page size */
+ ld r10,STK_PARAM(R9)(r1) /* segment size */
_GLOBAL(htab_call_hpte_insert1)
bl . /* patched by htab_finish_init() */
cmpdi 0,r3,0
@@ -542,7 +545,8 @@ _GLOBAL(htab_call_hpte_insert1)
mr r4,r29 /* Retrieve vpn */
li r7,HPTE_V_SECONDARY /* !bolted, secondary */
li r8,MMU_PAGE_4K /* page size */
- ld r9,STK_PARAM(R9)(r1) /* segment size */
+ li r9,MMU_PAGE_4K /* actual page size */
+ ld r10,STK_PARAM(R9)(r1) /* segment size */
_GLOBAL(htab_call_hpte_insert2)
bl . /* patched by htab_finish_init() */
cmpdi 0,r3,0
@@ -840,7 +844,8 @@ ht64_insert_pte:
mr r4,r29 /* Retrieve vpn */
li r7,0 /* !bolted, !secondary */
li r8,MMU_PAGE_64K
- ld r9,STK_PARAM(R9)(r1) /* segment size */
+ li r9,MMU_PAGE_64K /* actual page size */
+ ld r10,STK_PARAM(R9)(r1) /* segment size */
_GLOBAL(ht64_call_hpte_insert1)
bl . /* patched by htab_finish_init() */
cmpdi 0,r3,0
@@ -863,7 +868,8 @@ _GLOBAL(ht64_call_hpte_insert1)
mr r4,r29 /* Retrieve vpn */
li r7,HPTE_V_SECONDARY /* !bolted, secondary */
li r8,MMU_PAGE_64K
- ld r9,STK_PARAM(R9)(r1) /* segment size */
+ li r9,MMU_PAGE_64K /* actual page size */
+ ld r10,STK_PARAM(R9)(r1) /* segment size */
_GLOBAL(ht64_call_hpte_insert2)
bl . /* patched by htab_finish_init() */
cmpdi 0,r3,0
diff --git a/arch/powerpc/mm/hash_native_64.c b/arch/powerpc/mm/hash_native_64.c
index 9d8983a..910b50d 100644
--- a/arch/powerpc/mm/hash_native_64.c
+++ b/arch/powerpc/mm/hash_native_64.c
@@ -39,7 +39,7 @@
DEFINE_RAW_SPINLOCK(native_tlbie_lock);
-static inline void __tlbie(unsigned long vpn, int psize, int ssize)
+static inline void __tlbie(unsigned long vpn, int psize, int apsize, int ssize)
{
unsigned long va;
unsigned int penc;
@@ -68,7 +68,7 @@ static inline void __tlbie(unsigned long vpn, int psize, int ssize)
break;
default:
/* We need 14 to 14 + i bits of va */
- penc = mmu_psize_defs[psize].penc;
+ penc = mmu_psize_defs[psize].penc[apsize];
va &= ~((1ul << mmu_psize_defs[psize].shift) - 1);
va |= penc << 12;
va |= ssize << 8;
@@ -80,7 +80,7 @@ static inline void __tlbie(unsigned long vpn, int psize, int ssize)
}
}
-static inline void __tlbiel(unsigned long vpn, int psize, int ssize)
+static inline void __tlbiel(unsigned long vpn, int psize, int apsize, int ssize)
{
unsigned long va;
unsigned int penc;
@@ -102,7 +102,7 @@ static inline void __tlbiel(unsigned long vpn, int psize, int ssize)
break;
default:
/* We need 14 to 14 + i bits of va */
- penc = mmu_psize_defs[psize].penc;
+ penc = mmu_psize_defs[psize].penc[apsize];
va &= ~((1ul << mmu_psize_defs[psize].shift) - 1);
va |= penc << 12;
va |= ssize << 8;
@@ -114,7 +114,8 @@ static inline void __tlbiel(unsigned long vpn, int psize, int ssize)
}
-static inline void tlbie(unsigned long vpn, int psize, int ssize, int local)
+static inline void tlbie(unsigned long vpn, int psize, int apsize,
+ int ssize, int local)
{
unsigned int use_local = local && mmu_has_feature(MMU_FTR_TLBIEL);
int lock_tlbie = !mmu_has_feature(MMU_FTR_LOCKLESS_TLBIE);
@@ -125,10 +126,10 @@ static inline void tlbie(unsigned long vpn, int psize, int ssize, int local)
raw_spin_lock(&native_tlbie_lock);
asm volatile("ptesync": : :"memory");
if (use_local) {
- __tlbiel(vpn, psize, ssize);
+ __tlbiel(vpn, psize, apsize, ssize);
asm volatile("ptesync": : :"memory");
} else {
- __tlbie(vpn, psize, ssize);
+ __tlbie(vpn, psize, apsize, ssize);
asm volatile("eieio; tlbsync; ptesync": : :"memory");
}
if (lock_tlbie && !use_local)
@@ -156,7 +157,7 @@ static inline void native_unlock_hpte(struct hash_pte *hptep)
static long native_hpte_insert(unsigned long hpte_group, unsigned long vpn,
unsigned long pa, unsigned long rflags,
- unsigned long vflags, int psize, int ssize)
+ unsigned long vflags, int psize, int apsize, int ssize)
{
struct hash_pte *hptep = htab_address + hpte_group;
unsigned long hpte_v, hpte_r;
@@ -183,8 +184,8 @@ static long native_hpte_insert(unsigned long hpte_group, unsigned long vpn,
if (i == HPTES_PER_GROUP)
return -1;
- hpte_v = hpte_encode_v(vpn, psize, ssize) | vflags | HPTE_V_VALID;
- hpte_r = hpte_encode_r(pa, psize) | rflags;
+ hpte_v = hpte_encode_v(vpn, psize, apsize, ssize) | vflags | HPTE_V_VALID;
+ hpte_r = hpte_encode_r(pa, psize, apsize) | rflags;
if (!(vflags & HPTE_V_BOLTED)) {
DBG_LOW(" i=%x hpte_v=%016lx, hpte_r=%016lx\n",
@@ -244,6 +245,45 @@ static long native_hpte_remove(unsigned long hpte_group)
return i;
}
+static inline int hpte_actual_psize(struct hash_pte *hptep, int psize)
+{
+ int i, shift;
+ unsigned int mask;
+ /* Look at the 8 bit LP value */
+ unsigned int lp = (hptep->r >> LP_SHIFT) & ((1 << LP_BITS) - 1);
+
+ /* First check if it is large page */
+ if (!(hptep->v & HPTE_V_LARGE))
+ return MMU_PAGE_4K;
+
+ /* start from 1 ignoring MMU_PAGE_4K */
+ for (i = 1; i < MMU_PAGE_COUNT; i++) {
+ /* valid entries have a shift value */
+ if (!mmu_psize_defs[i].shift)
+ continue;
+
+ /* invalid penc */
+ if (mmu_psize_defs[psize].penc[i] == -1)
+ continue;
+ /*
+ * encoding bits per actual page size
+ * PTE LP actual page size
+ * rrrr rrrz ≥8KB
+ * rrrr rrzz ≥16KB
+ * rrrr rzzz ≥32KB
+ * rrrr zzzz ≥64KB
+ * .......
+ */
+ shift = mmu_psize_defs[i].shift - LP_SHIFT;
+ if (shift > LP_BITS)
+ shift = LP_BITS;
+ mask = (1 << shift) - 1;
+ if ((lp & mask) == mmu_psize_defs[psize].penc[i])
+ return i;
+ }
+ return -1;
+}
+
static long native_hpte_updatepp(unsigned long slot, unsigned long newpp,
unsigned long vpn, int psize, int ssize,
int local)
@@ -251,6 +291,7 @@ static long native_hpte_updatepp(unsigned long slot, unsigned long newpp,
struct hash_pte *hptep = htab_address + slot;
unsigned long hpte_v, want_v;
int ret = 0;
+ int actual_psize;
want_v = hpte_encode_avpn(vpn, psize, ssize);
@@ -260,6 +301,7 @@ static long native_hpte_updatepp(unsigned long slot, unsigned long newpp,
native_lock_hpte(hptep);
hpte_v = hptep->v;
+ actual_psize = hpte_actual_psize(hptep, psize);
/* Even if we miss, we need to invalidate the TLB */
if (!HPTE_V_COMPARE(hpte_v, want_v) || !(hpte_v & HPTE_V_VALID)) {
@@ -274,7 +316,7 @@ static long native_hpte_updatepp(unsigned long slot, unsigned long newpp,
native_unlock_hpte(hptep);
/* Ensure it is out of the tlb too. */
- tlbie(vpn, psize, ssize, local);
+ tlbie(vpn, psize, actual_psize, ssize, local);
return ret;
}
@@ -315,6 +357,7 @@ static long native_hpte_find(unsigned long vpn, int psize, int ssize)
static void native_hpte_updateboltedpp(unsigned long newpp, unsigned long ea,
int psize, int ssize)
{
+ int actual_psize;
unsigned long vpn;
unsigned long vsid;
long slot;
@@ -327,13 +370,14 @@ static void native_hpte_updateboltedpp(unsigned long newpp, unsigned long ea,
if (slot == -1)
panic("could not find page to bolt\n");
hptep = htab_address + slot;
+ actual_psize = hpte_actual_psize(hptep, psize);
/* Update the HPTE */
hptep->r = (hptep->r & ~(HPTE_R_PP | HPTE_R_N)) |
(newpp & (HPTE_R_PP | HPTE_R_N));
/* Ensure it is out of the tlb too. */
- tlbie(vpn, psize, ssize, 0);
+ tlbie(vpn, psize, actual_psize, ssize, 0);
}
static void native_hpte_invalidate(unsigned long slot, unsigned long vpn,
@@ -343,6 +387,7 @@ static void native_hpte_invalidate(unsigned long slot, unsigned long vpn,
unsigned long hpte_v;
unsigned long want_v;
unsigned long flags;
+ int actual_psize;
local_irq_save(flags);
@@ -352,6 +397,7 @@ static void native_hpte_invalidate(unsigned long slot, unsigned long vpn,
native_lock_hpte(hptep);
hpte_v = hptep->v;
+ actual_psize = hpte_actual_psize(hptep, psize);
/* Even if we miss, we need to invalidate the TLB */
if (!HPTE_V_COMPARE(hpte_v, want_v) || !(hpte_v & HPTE_V_VALID))
native_unlock_hpte(hptep);
@@ -360,27 +406,24 @@ static void native_hpte_invalidate(unsigned long slot, unsigned long vpn,
hptep->v = 0;
/* Invalidate the TLB */
- tlbie(vpn, psize, ssize, local);
+ tlbie(vpn, psize, actual_psize, ssize, local);
local_irq_restore(flags);
}
-#define LP_SHIFT 12
-#define LP_BITS 8
-#define LP_MASK(i) ((0xFF >> (i)) << LP_SHIFT)
-
static void hpte_decode(struct hash_pte *hpte, unsigned long slot,
- int *psize, int *ssize, unsigned long *vpn)
+ int *psize, int *apsize, int *ssize, unsigned long *vpn)
{
unsigned long avpn, pteg, vpi;
unsigned long hpte_r = hpte->r;
unsigned long hpte_v = hpte->v;
unsigned long vsid, seg_off;
- int i, size, shift, penc;
+ int i, size, a_size, shift, penc;
- if (!(hpte_v & HPTE_V_LARGE))
- size = MMU_PAGE_4K;
- else {
+ if (!(hpte_v & HPTE_V_LARGE)) {
+ size = MMU_PAGE_4K;
+ a_size = MMU_PAGE_4K;
+ } else {
for (i = 0; i < LP_BITS; i++) {
if ((hpte_r & LP_MASK(i+1)) == LP_MASK(i+1))
break;
@@ -388,19 +431,26 @@ static void hpte_decode(struct hash_pte *hpte, unsigned long slot,
penc = LP_MASK(i+1) >> LP_SHIFT;
for (size = 0; size < MMU_PAGE_COUNT; size++) {
- /* 4K pages are not represented by LP */
- if (size == MMU_PAGE_4K)
- continue;
-
/* valid entries have a shift value */
if (!mmu_psize_defs[size].shift)
continue;
+ for (a_size = 0; a_size < MMU_PAGE_COUNT; a_size++) {
- if (penc == mmu_psize_defs[size].penc)
- break;
+ /* 4K pages are not represented by LP */
+ if (a_size == MMU_PAGE_4K)
+ continue;
+
+ /* valid entries have a shift value */
+ if (!mmu_psize_defs[a_size].shift)
+ continue;
+
+ if (penc == mmu_psize_defs[size].penc[a_size])
+ goto out;
+ }
}
}
+out:
/* This works for all page sizes, and for 256M and 1T segments */
*ssize = hpte_v >> HPTE_V_SSIZE_SHIFT;
shift = mmu_psize_defs[size].shift;
@@ -433,7 +483,8 @@ static void hpte_decode(struct hash_pte *hpte, unsigned long slot,
default:
*vpn = size = 0;
}
- *psize = size;
+ *psize = size;
+ *apsize = a_size;
}
/*
@@ -451,7 +502,7 @@ static void native_hpte_clear(void)
struct hash_pte *hptep = htab_address;
unsigned long hpte_v;
unsigned long pteg_count;
- int psize, ssize;
+ int psize, apsize, ssize;
pteg_count = htab_hash_mask + 1;
@@ -477,9 +528,9 @@ static void native_hpte_clear(void)
* already hold the native_tlbie_lock.
*/
if (hpte_v & HPTE_V_VALID) {
- hpte_decode(hptep, slot, &psize, &ssize, &vpn);
+ hpte_decode(hptep, slot, &psize, &apsize, &ssize, &vpn);
hptep->v = 0;
- __tlbie(vpn, psize, ssize);
+ __tlbie(vpn, psize, apsize, ssize);
}
}
@@ -540,7 +591,7 @@ static void native_flush_hash_range(unsigned long number, int local)
pte_iterate_hashed_subpages(pte, psize,
vpn, index, shift) {
- __tlbiel(vpn, psize, ssize);
+ __tlbiel(vpn, psize, psize, ssize);
} pte_iterate_hashed_end();
}
asm volatile("ptesync":::"memory");
@@ -557,7 +608,7 @@ static void native_flush_hash_range(unsigned long number, int local)
pte_iterate_hashed_subpages(pte, psize,
vpn, index, shift) {
- __tlbie(vpn, psize, ssize);
+ __tlbie(vpn, psize, psize, ssize);
} pte_iterate_hashed_end();
}
asm volatile("eieio; tlbsync; ptesync":::"memory");
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index bfeab83..a5a5067 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -125,7 +125,7 @@ static struct mmu_psize_def mmu_psize_defaults_old[] = {
[MMU_PAGE_4K] = {
.shift = 12,
.sllp = 0,
- .penc = 0,
+ .penc = {[MMU_PAGE_4K] = 0, [1 ... MMU_PAGE_COUNT - 1] = -1},
.avpnm = 0,
.tlbiel = 0,
},
@@ -139,14 +139,15 @@ static struct mmu_psize_def mmu_psize_defaults_gp[] = {
[MMU_PAGE_4K] = {
.shift = 12,
.sllp = 0,
- .penc = 0,
+ .penc = {[MMU_PAGE_4K] = 0, [1 ... MMU_PAGE_COUNT - 1] = -1},
.avpnm = 0,
.tlbiel = 1,
},
[MMU_PAGE_16M] = {
.shift = 24,
.sllp = SLB_VSID_L,
- .penc = 0,
+ .penc = {[0 ... MMU_PAGE_16M - 1] = -1, [MMU_PAGE_16M] = 0,
+ [MMU_PAGE_16M + 1 ... MMU_PAGE_COUNT - 1] = -1 },
.avpnm = 0x1UL,
.tlbiel = 0,
},
@@ -208,7 +209,7 @@ int htab_bolt_mapping(unsigned long vstart, unsigned long vend,
BUG_ON(!ppc_md.hpte_insert);
ret = ppc_md.hpte_insert(hpteg, vpn, paddr, tprot,
- HPTE_V_BOLTED, psize, ssize);
+ HPTE_V_BOLTED, psize, psize, ssize);
if (ret < 0)
break;
@@ -275,6 +276,30 @@ static void __init htab_init_seg_sizes(void)
of_scan_flat_dt(htab_dt_scan_seg_sizes, NULL);
}
+static int __init get_idx_from_shift(unsigned int shift)
+{
+ int idx = -1;
+
+ switch (shift) {
+ case 0xc:
+ idx = MMU_PAGE_4K;
+ break;
+ case 0x10:
+ idx = MMU_PAGE_64K;
+ break;
+ case 0x14:
+ idx = MMU_PAGE_1M;
+ break;
+ case 0x18:
+ idx = MMU_PAGE_16M;
+ break;
+ case 0x22:
+ idx = MMU_PAGE_16G;
+ break;
+ }
+ return idx;
+}
+
static int __init htab_dt_scan_page_sizes(unsigned long node,
const char *uname, int depth,
void *data)
@@ -294,60 +319,61 @@ static int __init htab_dt_scan_page_sizes(unsigned long node,
size /= 4;
cur_cpu_spec->mmu_features &= ~(MMU_FTR_16M_PAGE);
while(size > 0) {
- unsigned int shift = prop[0];
+ unsigned int base_shift = prop[0];
unsigned int slbenc = prop[1];
unsigned int lpnum = prop[2];
- unsigned int lpenc = 0;
struct mmu_psize_def *def;
- int idx = -1;
+ int idx, base_idx;
size -= 3; prop += 3;
- while(size > 0 && lpnum) {
- if (prop[0] == shift)
- lpenc = prop[1];
- prop += 2; size -= 2;
- lpnum--;
+ base_idx = get_idx_from_shift(base_shift);
+ if (base_idx < 0) {
+ /*
+ * skip the pte encoding also
+ */
+ prop += lpnum * 2; size -= lpnum * 2;
+ continue;
}
- switch(shift) {
- case 0xc:
- idx = MMU_PAGE_4K;
- break;
- case 0x10:
- idx = MMU_PAGE_64K;
- break;
- case 0x14:
- idx = MMU_PAGE_1M;
- break;
- case 0x18:
- idx = MMU_PAGE_16M;
+ def = &mmu_psize_defs[base_idx];
+ if (base_idx == MMU_PAGE_16M)
cur_cpu_spec->mmu_features |= MMU_FTR_16M_PAGE;
- break;
- case 0x22:
- idx = MMU_PAGE_16G;
- break;
- }
- if (idx < 0)
- continue;
- def = &mmu_psize_defs[idx];
- def->shift = shift;
- if (shift <= 23)
+
+ def->shift = base_shift;
+ if (base_shift <= 23)
def->avpnm = 0;
else
- def->avpnm = (1 << (shift - 23)) - 1;
+ def->avpnm = (1 << (base_shift - 23)) - 1;
def->sllp = slbenc;
- def->penc = lpenc;
- /* We don't know for sure what's up with tlbiel, so
+ /*
+ * We don't know for sure what's up with tlbiel, so
* for now we only set it for 4K and 64K pages
*/
- if (idx == MMU_PAGE_4K || idx == MMU_PAGE_64K)
+ if (base_idx == MMU_PAGE_4K || base_idx == MMU_PAGE_64K)
def->tlbiel = 1;
else
def->tlbiel = 0;
- DBG(" %d: shift=%02x, sllp=%04lx, avpnm=%08lx, "
- "tlbiel=%d, penc=%d\n",
- idx, shift, def->sllp, def->avpnm, def->tlbiel,
- def->penc);
+ while (size > 0 && lpnum) {
+ unsigned int shift = prop[0];
+ int penc = prop[1];
+
+ prop += 2; size -= 2;
+ lpnum--;
+
+ idx = get_idx_from_shift(shift);
+ if (idx < 0)
+ continue;
+
+ if (penc == -1)
+ pr_err("Invalid penc for base_shift=%d "
+ "shift=%d\n", base_shift, shift);
+
+ def->penc[idx] = penc;
+ DBG(" %d: shift=%02x, sllp=%04lx, "
+ "avpnm=%08lx, tlbiel=%d, penc=%d\n",
+ idx, shift, def->sllp, def->avpnm,
+ def->tlbiel, def->penc[idx]);
+ }
}
return 1;
}
@@ -396,10 +422,21 @@ static int __init htab_dt_scan_hugepage_blocks(unsigned long node,
}
#endif /* CONFIG_HUGETLB_PAGE */
+static void mmu_psize_set_default_penc(void)
+{
+ int bpsize, apsize;
+ for (bpsize = 0; bpsize < MMU_PAGE_COUNT; bpsize++)
+ for (apsize = 0; apsize < MMU_PAGE_COUNT; apsize++)
+ mmu_psize_defs[bpsize].penc[apsize] = -1;
+}
+
static void __init htab_init_page_sizes(void)
{
int rc;
+ /* se the invalid penc to -1 */
+ mmu_psize_set_default_penc();
+
/* Default to 4K pages only */
memcpy(mmu_psize_defs, mmu_psize_defaults_old,
sizeof(mmu_psize_defaults_old));
diff --git a/arch/powerpc/mm/hugetlbpage-hash64.c b/arch/powerpc/mm/hugetlbpage-hash64.c
index cecad34..e0d52ee 100644
--- a/arch/powerpc/mm/hugetlbpage-hash64.c
+++ b/arch/powerpc/mm/hugetlbpage-hash64.c
@@ -103,7 +103,7 @@ repeat:
/* Insert into the hash table, primary slot */
slot = ppc_md.hpte_insert(hpte_group, vpn, pa, rflags, 0,
- mmu_psize, ssize);
+ mmu_psize, mmu_psize, ssize);
/* Primary is full, try the secondary */
if (unlikely(slot == -1)) {
@@ -111,7 +111,7 @@ repeat:
HPTES_PER_GROUP) & ~0x7UL;
slot = ppc_md.hpte_insert(hpte_group, vpn, pa, rflags,
HPTE_V_SECONDARY,
- mmu_psize, ssize);
+ mmu_psize, mmu_psize, ssize);
if (slot == -1) {
if (mftb() & 0x1)
hpte_group = ((hash & htab_hash_mask) *
diff --git a/arch/powerpc/platforms/cell/beat_htab.c b/arch/powerpc/platforms/cell/beat_htab.c
index 472f9a7..246e1d8 100644
--- a/arch/powerpc/platforms/cell/beat_htab.c
+++ b/arch/powerpc/platforms/cell/beat_htab.c
@@ -90,7 +90,7 @@ static inline unsigned int beat_read_mask(unsigned hpte_group)
static long beat_lpar_hpte_insert(unsigned long hpte_group,
unsigned long vpn, unsigned long pa,
unsigned long rflags, unsigned long vflags,
- int psize, int ssize)
+ int psize, int apsize, int ssize)
{
unsigned long lpar_rc;
u64 hpte_v, hpte_r, slot;
@@ -103,9 +103,9 @@ static long beat_lpar_hpte_insert(unsigned long hpte_group,
"rflags=%lx, vflags=%lx, psize=%d)\n",
hpte_group, va, pa, rflags, vflags, psize);
- hpte_v = hpte_encode_v(vpn, psize, MMU_SEGSIZE_256M) |
+ hpte_v = hpte_encode_v(vpn, psize, apsize, MMU_SEGSIZE_256M) |
vflags | HPTE_V_VALID;
- hpte_r = hpte_encode_r(pa, psize) | rflags;
+ hpte_r = hpte_encode_r(pa, psize, apsize) | rflags;
if (!(vflags & HPTE_V_BOLTED))
DBG_LOW(" hpte_v=%016lx, hpte_r=%016lx\n", hpte_v, hpte_r);
@@ -314,7 +314,7 @@ void __init hpte_init_beat(void)
static long beat_lpar_hpte_insert_v3(unsigned long hpte_group,
unsigned long vpn, unsigned long pa,
unsigned long rflags, unsigned long vflags,
- int psize, int ssize)
+ int psize, int apsize, int ssize)
{
unsigned long lpar_rc;
u64 hpte_v, hpte_r, slot;
@@ -327,9 +327,9 @@ static long beat_lpar_hpte_insert_v3(unsigned long hpte_group,
"rflags=%lx, vflags=%lx, psize=%d)\n",
hpte_group, vpn, pa, rflags, vflags, psize);
- hpte_v = hpte_encode_v(vpn, psize, MMU_SEGSIZE_256M) |
+ hpte_v = hpte_encode_v(vpn, psize, apsize, MMU_SEGSIZE_256M) |
vflags | HPTE_V_VALID;
- hpte_r = hpte_encode_r(pa, psize) | rflags;
+ hpte_r = hpte_encode_r(pa, psize, apsize) | rflags;
if (!(vflags & HPTE_V_BOLTED))
DBG_LOW(" hpte_v=%016lx, hpte_r=%016lx\n", hpte_v, hpte_r);
@@ -373,7 +373,7 @@ static long beat_lpar_hpte_updatepp_v3(unsigned long slot,
unsigned long pss;
want_v = hpte_encode_avpn(vpn, psize, MMU_SEGSIZE_256M);
- pss = (psize == MMU_PAGE_4K) ? -1UL : mmu_psize_defs[psize].penc;
+ pss = (psize == MMU_PAGE_4K) ? -1UL : mmu_psize_defs[psize].penc[psize];
DBG_LOW(" update: "
"avpnv=%016lx, slot=%016lx, psize: %d, newpp %016lx ... ",
@@ -403,7 +403,7 @@ static void beat_lpar_hpte_invalidate_v3(unsigned long slot, unsigned long vpn,
DBG_LOW(" inval : slot=%lx, vpn=%016lx, psize: %d, local: %d\n",
slot, vpn, psize, local);
want_v = hpte_encode_avpn(vpn, psize, MMU_SEGSIZE_256M);
- pss = (psize == MMU_PAGE_4K) ? -1UL : mmu_psize_defs[psize].penc;
+ pss = (psize == MMU_PAGE_4K) ? -1UL : mmu_psize_defs[psize].penc[psize];
lpar_rc = beat_invalidate_htab_entry3(0, slot, want_v, pss);
diff --git a/arch/powerpc/platforms/ps3/htab.c b/arch/powerpc/platforms/ps3/htab.c
index 07a4bba..44f06d2 100644
--- a/arch/powerpc/platforms/ps3/htab.c
+++ b/arch/powerpc/platforms/ps3/htab.c
@@ -45,7 +45,7 @@ static DEFINE_SPINLOCK(ps3_htab_lock);
static long ps3_hpte_insert(unsigned long hpte_group, unsigned long vpn,
unsigned long pa, unsigned long rflags, unsigned long vflags,
- int psize, int ssize)
+ int psize, int apsize, int ssize)
{
int result;
u64 hpte_v, hpte_r;
@@ -61,8 +61,8 @@ static long ps3_hpte_insert(unsigned long hpte_group, unsigned long vpn,
*/
vflags &= ~HPTE_V_SECONDARY;
- hpte_v = hpte_encode_v(vpn, psize, ssize) | vflags | HPTE_V_VALID;
- hpte_r = hpte_encode_r(ps3_mm_phys_to_lpar(pa), psize) | rflags;
+ hpte_v = hpte_encode_v(vpn, psize, apsize, ssize) | vflags | HPTE_V_VALID;
+ hpte_r = hpte_encode_r(ps3_mm_phys_to_lpar(pa), psize, apsize) | rflags;
spin_lock_irqsave(&ps3_htab_lock, flags);
diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index a77c35b..3daced3 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -109,7 +109,7 @@ void vpa_init(int cpu)
static long pSeries_lpar_hpte_insert(unsigned long hpte_group,
unsigned long vpn, unsigned long pa,
unsigned long rflags, unsigned long vflags,
- int psize, int ssize)
+ int psize, int apsize, int ssize)
{
unsigned long lpar_rc;
unsigned long flags;
@@ -121,8 +121,8 @@ static long pSeries_lpar_hpte_insert(unsigned long hpte_group,
"pa=%016lx, rflags=%lx, vflags=%lx, psize=%d)\n",
hpte_group, vpn, pa, rflags, vflags, psize);
- hpte_v = hpte_encode_v(vpn, psize, ssize) | vflags | HPTE_V_VALID;
- hpte_r = hpte_encode_r(pa, psize) | rflags;
+ hpte_v = hpte_encode_v(vpn, psize, apsize, ssize) | vflags | HPTE_V_VALID;
+ hpte_r = hpte_encode_r(pa, psize, apsize) | rflags;
if (!(vflags & HPTE_V_BOLTED))
pr_devel(" hpte_v=%016lx, hpte_r=%016lx\n", hpte_v, hpte_r);
--
1.7.10
^ permalink raw reply related
* [PATCH -V3 21/25] powerpc: Handle hugepage in perf callchain
From: Aneesh Kumar K.V @ 2013-03-15 9:40 UTC (permalink / raw)
To: benh, paulus; +Cc: linuxppc-dev, Aneesh Kumar K.V
In-Reply-To: <1363340407-22619-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
arch/powerpc/perf/callchain.c | 32 +++++++++++++++++++++-----------
1 file changed, 21 insertions(+), 11 deletions(-)
diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c
index 578cac7..99262ce 100644
--- a/arch/powerpc/perf/callchain.c
+++ b/arch/powerpc/perf/callchain.c
@@ -115,7 +115,7 @@ static int read_user_stack_slow(void __user *ptr, void *ret, int nb)
{
pgd_t *pgdir;
pte_t *ptep, pte;
- unsigned shift;
+ unsigned shift, hugepage;
unsigned long addr = (unsigned long) ptr;
unsigned long offset;
unsigned long pfn;
@@ -125,20 +125,30 @@ static int read_user_stack_slow(void __user *ptr, void *ret, int nb)
if (!pgdir)
return -EFAULT;
- ptep = find_linux_pte_or_hugepte(pgdir, addr, &shift, NULL);
+ ptep = find_linux_pte_or_hugepte(pgdir, addr, &shift, &hugepage);
if (!shift)
shift = PAGE_SHIFT;
- /* align address to page boundary */
- offset = addr & ((1UL << shift) - 1);
- addr -= offset;
-
- if (ptep == NULL)
- return -EFAULT;
- pte = *ptep;
- if (!pte_present(pte) || !(pte_val(pte) & _PAGE_USER))
+ if (!ptep)
return -EFAULT;
- pfn = pte_pfn(pte);
+
+ if (hugepage) {
+ pmd_t pmd = *(pmd_t *)ptep;
+ shift = mmu_psize_defs[MMU_PAGE_16M].shift;
+ offset = addr & ((1UL << shift) - 1);
+
+ if (!pmd_large(pmd) || !(pmd_val(pmd) & PMD_HUGE_USER))
+ return -EFAULT;
+ pfn = pmd_pfn(pmd);
+ } else {
+ offset = addr & ((1UL << shift) - 1);
+
+ pte = *ptep;
+ if (!pte_present(pte) || !(pte_val(pte) & _PAGE_USER))
+ return -EFAULT;
+ pfn = pte_pfn(pte);
+ }
+
if (!page_is_ram(pfn))
return -EFAULT;
--
1.7.10
^ permalink raw reply related
* [PATCH -V3 00/25] THP support for PPC64
From: Aneesh Kumar K.V @ 2013-03-15 9:39 UTC (permalink / raw)
To: benh, paulus; +Cc: linuxppc-dev
Hi,
This patchset adds transparent hugepage support for PPC64.
TODO:
* powernv still doesn't boot
* hash preload support in update_mmu_cache_pmd
Some numbers:
The latency measurements code from Anton found at
http://ozlabs.org/~anton/junkcode/latency2001.c
THP disabled 64K page size
------------------------
[root@llmp24l02 ~]# ./latency2001 8G
8589934592 731.73 cycles 205.77 ns
[root@llmp24l02 ~]# ./latency2001 8G
8589934592 743.39 cycles 209.05 ns
[root@llmp24l02 ~]#
THP disabled large page via hugetlbfs
-------------------------------------
[root@llmp24l02 ~]# ./latency2001 -l 8G
8589934592 416.09 cycles 117.01 ns
[root@llmp24l02 ~]# ./latency2001 -l 8G
8589934592 415.74 cycles 116.91 ns
THP enabled 64K page size.
----------------
[root@llmp24l02 ~]# ./latency2001 8G
8589934592 405.07 cycles 113.91 ns
[root@llmp24l02 ~]# ./latency2001 8G
8589934592 411.82 cycles 115.81 ns
[root@llmp24l02 ~]#
We are close to hugetlbfs in latency and we can achieve this with zero
config/page reservation. Most of the allocations above are fault allocated.
I haven't really measured the collapse alloc impact.
Another test that does 50000000 random access over 1GB area goes from
2.65 seconds to 1.07 seconds with this patchset.
Change from V2:
* Change patch "powerpc: Reduce PTE table memory wastage" to use much simpler approach
for PTE page sharing.
* Changes to handle huge pages in KVM code.
* Address other review comments
Changes from V1
* Address review comments
* More patch split
* Add batch hpte invalidate for hugepages.
Changes from RFC V2:
* Address review comments
* More code cleanup and patch split
Changes from RFC V1:
* HugeTLB fs now works
* Compile issues fixed
* rebased to v3.8
* Patch series reorded so that ppc64 cleanups and MM THP changes are moved
early in the series. This should help in picking those patches early.
Thanks,
-aneesh
^ permalink raw reply
* [PATCH -V3 11/25] powerpc: Update tlbie/tlbiel as per ISA doc
From: Aneesh Kumar K.V @ 2013-03-15 9:39 UTC (permalink / raw)
To: benh, paulus; +Cc: linuxppc-dev, Aneesh Kumar K.V
In-Reply-To: <1363340407-22619-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
This make sure we handle multiple page size segment correctly.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
arch/powerpc/mm/hash_native_64.c | 32 ++++++++++++++++++++++++++++++--
1 file changed, 30 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/mm/hash_native_64.c b/arch/powerpc/mm/hash_native_64.c
index af9eb1c..409c916 100644
--- a/arch/powerpc/mm/hash_native_64.c
+++ b/arch/powerpc/mm/hash_native_64.c
@@ -61,7 +61,10 @@ static inline void __tlbie(unsigned long vpn, int psize, int apsize, int ssize)
switch (psize) {
case MMU_PAGE_4K:
+ /* clear out bits after (52) [0....52.....63] */
+ va &= ~((1ul << (64 - 52)) - 1);
va |= ssize << 8;
+ va |= mmu_psize_defs[apsize].sllp << 6;
asm volatile(ASM_FTR_IFCLR("tlbie %0,0", PPC_TLBIE(%1,%0), %2)
: : "r" (va), "r"(0), "i" (CPU_FTR_ARCH_206)
: "memory");
@@ -69,9 +72,20 @@ static inline void __tlbie(unsigned long vpn, int psize, int apsize, int ssize)
default:
/* We need 14 to 14 + i bits of va */
penc = mmu_psize_defs[psize].penc[apsize];
- va &= ~((1ul << mmu_psize_defs[psize].shift) - 1);
+ /* clear out bits after (44) [0....44.....63] */
+ va &= ~((1ul << (64 - 44)) - 1);
va |= penc << 12;
va |= ssize << 8;
+ /* Add AVAL part */
+ if (psize != apsize) {
+ /*
+ * MPSS, 64K base page size and 16MB parge page size
+ * We don't need all the bits, but this seems to work.
+ * vpn cover upto 65 bits of va. (0...65) and we need
+ * 56..62 bits of va.
+ */
+ va |= ((vpn >> 2) & 0xfe);
+ }
va |= 1; /* L */
asm volatile(ASM_FTR_IFCLR("tlbie %0,1", PPC_TLBIE(%1,%0), %2)
: : "r" (va), "r"(0), "i" (CPU_FTR_ARCH_206)
@@ -96,16 +110,30 @@ static inline void __tlbiel(unsigned long vpn, int psize, int apsize, int ssize)
switch (psize) {
case MMU_PAGE_4K:
+ /* clear out bits after(52) [0....52.....63] */
+ va &= ~((1ul << (64 - 52)) - 1);
va |= ssize << 8;
+ va |= mmu_psize_defs[apsize].sllp << 6;
asm volatile(".long 0x7c000224 | (%0 << 11) | (0 << 21)"
: : "r"(va) : "memory");
break;
default:
/* We need 14 to 14 + i bits of va */
penc = mmu_psize_defs[psize].penc[apsize];
- va &= ~((1ul << mmu_psize_defs[psize].shift) - 1);
+ /* clear out bits after(44) [0....44.....63] */
+ va &= ~((1ul << (64 - 44)) - 1);
va |= penc << 12;
va |= ssize << 8;
+ /* Add AVAL part */
+ if (psize != apsize) {
+ /*
+ * MPSS, 64K base page size and 16MB parge page size
+ * We don't need all the bits, but this seems to work.
+ * vpn cover upto 65 bits of va. (0...65) and we need
+ * 56..62 bits of va.
+ */
+ va |= ((vpn >> 2) & 0xfe);
+ }
va |= 1; /* L */
asm volatile(".long 0x7c000224 | (%0 << 11) | (1 << 21)"
: : "r"(va) : "memory");
--
1.7.10
^ permalink raw reply related
* [PATCH 6/6] powerpc/85xx: Update corenet64_smp_defconfig for B4_QDS
From: Shaveta Leekha @ 2013-03-15 7:55 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Shaveta Leekha
In-Reply-To: <1363334109-21922-1-git-send-email-shaveta@freescale.com>
Signed-off-by: Shaveta Leekha <shaveta@freescale.com>
---
arch/powerpc/configs/corenet64_smp_defconfig | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/arch/powerpc/configs/corenet64_smp_defconfig b/arch/powerpc/configs/corenet64_smp_defconfig
index 36a5c41..abf21ea 100644
--- a/arch/powerpc/configs/corenet64_smp_defconfig
+++ b/arch/powerpc/configs/corenet64_smp_defconfig
@@ -21,6 +21,7 @@ CONFIG_MODVERSIONS=y
# CONFIG_BLK_DEV_BSG is not set
CONFIG_PARTITION_ADVANCED=y
CONFIG_MAC_PARTITION=y
+CONFIG_B4_QDS=y
CONFIG_P5020_DS=y
CONFIG_P5040_DS=y
CONFIG_T4240_QDS=y
--
1.7.6.GIT
^ permalink raw reply related
* [PATCH 4/6] powerpc/fsl-booke: Add initial B4420QDS board device tree
From: Shaveta Leekha @ 2013-03-15 7:55 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Shaveta Leekha
In-Reply-To: <1363334109-21922-1-git-send-email-shaveta@freescale.com>
Signed-off-by: Shaveta Leekha <shaveta@freescale.com>
---
arch/powerpc/boot/dts/b4420qds.dts | 168 ++++++++++++++++++++++++++++++++++++
1 files changed, 168 insertions(+), 0 deletions(-)
create mode 100644 arch/powerpc/boot/dts/b4420qds.dts
diff --git a/arch/powerpc/boot/dts/b4420qds.dts b/arch/powerpc/boot/dts/b4420qds.dts
new file mode 100644
index 0000000..3152f8c
--- /dev/null
+++ b/arch/powerpc/boot/dts/b4420qds.dts
@@ -0,0 +1,168 @@
+/*
+ * B4420DS Device Tree Source
+ *
+ * Copyright 2012 Freescale Semiconductor, Inc.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ * * Neither the name of Freescale Semiconductor nor the
+ * names of its contributors may be used to endorse or promote products
+ * derived from this software without specific prior written permission.
+ *
+ *
+ * ALTERNATIVELY, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") as published by the Free Software
+ * Foundation, either version 2 of that License or (at your option) any
+ * later version.
+ *
+ * This software is provided by Freescale Semiconductor "as is" and any
+ * express or implied warranties, including, but not limited to, the implied
+ * warranties of merchantability and fitness for a particular purpose are
+ * disclaimed. In no event shall Freescale Semiconductor be liable for any
+ * direct, indirect, incidental, special, exemplary, or consequential damages
+ * (including, but not limited to, procurement of substitute goods or services;
+ * loss of use, data, or profits; or business interruption) however caused and
+ * on any theory of liability, whether in contract, strict liability, or tort
+ * (including negligence or otherwise) arising in any way out of the use of
+ * this software, even if advised of the possibility of such damage.
+ */
+
+/include/ "fsl/b4420si-pre.dtsi"
+
+/ {
+ model = "fsl,B4420QDS";
+ compatible = "fsl,B4420QDS";
+ #address-cells = <2>;
+ #size-cells = <2>;
+ interrupt-parent = <&mpic>;
+
+ ifc: localbus@ffe124000 {
+ reg = <0xf 0xfe124000 0 0x2000>;
+ ranges = <0 0 0xf 0xe8000000 0x08000000
+ 2 0 0xf 0xff800000 0x00010000
+ 3 0 0xf 0xffdf0000 0x00008000>;
+
+ nor@0,0 {
+ #address-cells = <1>;
+ #size-cells = <1>;
+ compatible = "cfi-flash";
+ reg = <0x0 0x0 0x8000000>;
+ bank-width = <2>;
+ device-width = <1>;
+ };
+
+ nand@2,0 {
+ #address-cells = <1>;
+ #size-cells = <1>;
+ compatible = "fsl,ifc-nand";
+ reg = <0x2 0x0 0x10000>;
+
+ partition@0 {
+ /* This location must not be altered */
+ /* 1MB for u-boot Bootloader Image */
+ reg = <0x0 0x00100000>;
+ label = "NAND U-Boot Image";
+ read-only;
+ };
+
+ partition@100000 {
+ /* 1MB for DTB Image */
+ reg = <0x00100000 0x00100000>;
+ label = "NAND DTB Image";
+ };
+
+ partition@200000 {
+ /* 10MB for Linux Kernel Image */
+ reg = <0x00200000 0x00A00000>;
+ label = "NAND Linux Kernel Image";
+ };
+
+ partition@c00000 {
+ /* 500MB for Root file System Image */
+ reg = <0x00c00000 0x1F400000>;
+ label = "NAND RFS Image";
+ };
+ };
+
+ board-control@3,0 {
+ compatible = "fsl,b4420qds-fpga", "fsl,fpga-qixis";
+ reg = <3 0 0x300>;
+ };
+ };
+
+ memory {
+ device_type = "memory";
+ };
+
+ soc: soc@ffe000000 {
+ ranges = <0x00000000 0xf 0xfe000000 0x1000000>;
+ reg = <0xf 0xfe000000 0 0x00001000>;
+ spi@110000 {
+ flash@0 {
+ #address-cells = <1>;
+ #size-cells = <1>;
+ compatible = "sst,sst25wf040";
+ reg = <0>;
+ spi-max-frequency = <40000000>; /* input clock */
+ };
+ };
+
+ sdhc@114000 {
+ /*Disabled as there is no sdhc connector on B4420QDS board*/
+ status = "disabled";
+ };
+
+ i2c@118000 {
+ eeprom@50 {
+ compatible = "at24,24c64";
+ reg = <0x50>;
+ };
+ eeprom@51 {
+ compatible = "at24,24c256";
+ reg = <0x51>;
+ };
+ eeprom@53 {
+ compatible = "at24,24c256";
+ reg = <0x53>;
+ };
+ eeprom@57 {
+ compatible = "at24,24c256";
+ reg = <0x57>;
+ };
+ rtc@68 {
+ compatible = "dallas,ds3232";
+ reg = <0x68>;
+ interrupts = <0x1 0x1 0 0>;
+ };
+ };
+
+ usb@210000 {
+ dr_mode = "host";
+ phy_type = "ulpi";
+ };
+
+ };
+
+ pci0: pcie@ffe200000 {
+ reg = <0xf 0xfe200000 0 0x10000>;
+ ranges = <0x02000000 0 0xe0000000 0xc 0x00000000 0x0 0x20000000
+ 0x01000000 0 0x00000000 0xf 0xf8000000 0x0 0x00010000>;
+ pcie@0 {
+ ranges = <0x02000000 0 0xe0000000
+ 0x02000000 0 0xe0000000
+ 0 0x20000000
+
+ 0x01000000 0 0x00000000
+ 0x01000000 0 0x00000000
+ 0 0x00010000>;
+ };
+ };
+
+};
+
+/include/ "fsl/b4420si-post.dtsi"
--
1.7.6.GIT
^ permalink raw reply related
* [PATCH 2/6] powerpc/fsl-booke: Add initial B4860QDS board device tree
From: Shaveta Leekha @ 2013-03-15 7:55 UTC (permalink / raw)
To: linuxppc-dev
Cc: Poonam Aggrwal, Shaveta Leekha, Minghuan Lian, Andy Fleming,
Ramneek Mehresh
In-Reply-To: <1363334109-21922-1-git-send-email-shaveta@freescale.com>
Signed-off-by: Shaveta Leekha <shaveta@freescale.com>
Signed-off-by: Minghuan Lian <Minghuan.Lian@freescale.com>
Signed-off-by: Andy Fleming <afleming@freescale.com>
Signed-off-by: Poonam Aggrwal <poonam.aggrwal@freescale.com>
Signed-off-by: Ramneek Mehresh <ramneek.mehresh@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
---
arch/powerpc/boot/dts/b4860qds.dts | 178 ++++++++++++++++++++++++++++++++++++
1 files changed, 178 insertions(+), 0 deletions(-)
create mode 100644 arch/powerpc/boot/dts/b4860qds.dts
diff --git a/arch/powerpc/boot/dts/b4860qds.dts b/arch/powerpc/boot/dts/b4860qds.dts
new file mode 100644
index 0000000..ae6ac05
--- /dev/null
+++ b/arch/powerpc/boot/dts/b4860qds.dts
@@ -0,0 +1,178 @@
+/*
+ * B4860DS Device Tree Source
+ *
+ * Copyright 2012 Freescale Semiconductor Inc.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ * * Neither the name of Freescale Semiconductor nor the
+ * names of its contributors may be used to endorse or promote products
+ * derived from this software without specific prior written permission.
+ *
+ *
+ * ALTERNATIVELY, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") as published by the Free Software
+ * Foundation, either version 2 of that License or (at your option) any
+ * later version.
+ *
+ * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY
+ * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/include/ "fsl/b4860si-pre.dtsi"
+
+/ {
+ model = "fsl,B4860QDS";
+ compatible = "fsl,B4860QDS";
+ #address-cells = <2>;
+ #size-cells = <2>;
+ interrupt-parent = <&mpic>;
+
+ ifc: localbus@ffe124000 {
+ reg = <0xf 0xfe124000 0 0x2000>;
+ ranges = <0 0 0xf 0xe8000000 0x08000000
+ 2 0 0xf 0xff800000 0x00010000
+ 3 0 0xf 0xffdf0000 0x00008000>;
+
+ nor@0,0 {
+ #address-cells = <1>;
+ #size-cells = <1>;
+ compatible = "cfi-flash";
+ reg = <0x0 0x0 0x8000000>;
+ bank-width = <2>;
+ device-width = <1>;
+ };
+
+ nand@2,0 {
+ #address-cells = <1>;
+ #size-cells = <1>;
+ compatible = "fsl,ifc-nand";
+ reg = <0x2 0x0 0x10000>;
+
+ partition@0 {
+ /* This location must not be altered */
+ /* 1MB for u-boot Bootloader Image */
+ reg = <0x0 0x00100000>;
+ label = "NAND U-Boot Image";
+ read-only;
+ };
+
+ partition@100000 {
+ /* 1MB for DTB Image */
+ reg = <0x00100000 0x00100000>;
+ label = "NAND DTB Image";
+ };
+
+ partition@200000 {
+ /* 10MB for Linux Kernel Image */
+ reg = <0x00200000 0x00A00000>;
+ label = "NAND Linux Kernel Image";
+ };
+
+ partition@c00000 {
+ /* 500MB for Root file System Image */
+ reg = <0x00c00000 0x1F400000>;
+ label = "NAND RFS Image";
+ };
+ };
+
+ board-control@3,0 {
+ compatible = "fsl,b4860qds-fpga", "fsl,fpga-qixis";
+ reg = <3 0 0x300>;
+ };
+ };
+
+ memory {
+ device_type = "memory";
+ };
+
+ soc: soc@ffe000000 {
+ ranges = <0x00000000 0xf 0xfe000000 0x1000000>;
+ reg = <0xf 0xfe000000 0 0x00001000>;
+ spi@110000 {
+ flash@0 {
+ #address-cells = <1>;
+ #size-cells = <1>;
+ compatible = "sst,sst25wf040";
+ reg = <0>;
+ spi-max-frequency = <40000000>; /* input clock */
+ };
+ };
+
+ sdhc@114000 {
+ status = "disabled";
+ };
+
+ i2c@118000 {
+ eeprom@50 {
+ compatible = "at24,24c64";
+ reg = <0x50>;
+ };
+ eeprom@51 {
+ compatible = "at24,24c256";
+ reg = <0x51>;
+ };
+ eeprom@53 {
+ compatible = "at24,24c256";
+ reg = <0x53>;
+ };
+ eeprom@57 {
+ compatible = "at24,24c256";
+ reg = <0x57>;
+ };
+ rtc@68 {
+ compatible = "dallas,ds3232";
+ reg = <0x68>;
+ interrupts = <0x1 0x1 0 0>;
+ };
+ };
+
+ usb@210000 {
+ dr_mode = "host";
+ phy_type = "ulpi";
+ };
+
+ };
+
+ pci0: pcie@ffe200000 {
+ reg = <0xf 0xfe200000 0 0x10000>;
+ ranges = <0x02000000 0 0xe0000000 0xc 0x00000000 0x0 0x20000000
+ 0x01000000 0 0x00000000 0xf 0xf8000000 0x0 0x00010000>;
+ pcie@0 {
+ ranges = <0x02000000 0 0xe0000000
+ 0x02000000 0 0xe0000000
+ 0 0x20000000
+
+ 0x01000000 0 0x00000000
+ 0x01000000 0 0x00000000
+ 0 0x00010000>;
+ };
+ };
+
+ rio: rapidio@ffe0c0000 {
+ reg = <0xf 0xfe0c0000 0 0x11000>;
+
+ port1 {
+ ranges = <0 0 0xc 0x20000000 0 0x10000000>;
+ };
+ port2 {
+ ranges = <0 0 0xc 0x30000000 0 0x10000000>;
+ };
+ };
+
+};
+
+/include/ "fsl/b4860si-post.dtsi"
--
1.7.6.GIT
^ permalink raw reply related
* [PATCH 3/6] powerpc/fsl-booke: Add initial silicon device tree files for B4420QDS
From: Shaveta Leekha @ 2013-03-15 7:55 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Shaveta Leekha, Andy Fleming, Zhao Chenhui
In-Reply-To: <1363334109-21922-1-git-send-email-shaveta@freescale.com>
Signed-off-by: Shaveta Leekha <shaveta@freescale.com>
Signed-off-by: Zhao Chenhui <chenhui.zhao@freescale.com>
Signed-off-by: Li Yang <leoli@freescale.com>
Signed-off-by: Andy Fleming <afleming@freescale.com>
---
arch/powerpc/boot/dts/fsl/b4420si-post.dtsi | 151 +++++++++++++++++++++++++++
arch/powerpc/boot/dts/fsl/b4420si-pre.dtsi | 70 ++++++++++++
2 files changed, 221 insertions(+), 0 deletions(-)
create mode 100644 arch/powerpc/boot/dts/fsl/b4420si-post.dtsi
create mode 100644 arch/powerpc/boot/dts/fsl/b4420si-pre.dtsi
diff --git a/arch/powerpc/boot/dts/fsl/b4420si-post.dtsi b/arch/powerpc/boot/dts/fsl/b4420si-post.dtsi
new file mode 100644
index 0000000..e45d605
--- /dev/null
+++ b/arch/powerpc/boot/dts/fsl/b4420si-post.dtsi
@@ -0,0 +1,151 @@
+/*
+ * B4420 Silicon/SoC Device Tree Source (post include)
+ *
+ * Copyright 2012 Freescale Semiconductor, Inc.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ * * Neither the name of Freescale Semiconductor nor the
+ * names of its contributors may be used to endorse or promote products
+ * derived from this software without specific prior written permission.
+ *
+ *
+ * ALTERNATIVELY, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") as published by the Free Software
+ * Foundation, either version 2 of that License or (at your option) any
+ * later version.
+ *
+ * This software is provided by Freescale Semiconductor "as is" and any
+ * express or implied warranties, including, but not limited to, the implied
+ * warranties of merchantability and fitness for a particular purpose are
+ * disclaimed. In no event shall Freescale Semiconductor be liable for any
+ * direct, indirect, incidental, special, exemplary, or consequential damages
+ * (including, but not limited to, procurement of substitute goods or services;
+ * loss of use, data, or profits; or business interruption) however caused and
+ * on any theory of liability, whether in contract, strict liability, or tort
+ * (including negligence or otherwise) arising in any way out of the use of
+ * this software, even if advised of the possibility of such damage.
+ */
+
+&ifc {
+ #address-cells = <2>;
+ #size-cells = <1>;
+ compatible = "fsl,ifc", "simple-bus";
+ interrupts = <25 2 0 0>;
+};
+
+/* controller at 0x200000 */
+&pci0 {
+ compatible = "fsl,b4420-pcie", "fsl,qoriq-pcie-v2.4";
+ device_type = "pci";
+ #size-cells = <2>;
+ #address-cells = <3>;
+ bus-range = <0x0 0xff>;
+ interrupts = <20 2 0 0>;
+ pcie@0 {
+ #interrupt-cells = <1>;
+ #size-cells = <2>;
+ #address-cells = <3>;
+ device_type = "pci";
+ interrupts = <20 2 0 0>;
+ interrupt-map-mask = <0xf800 0 0 7>;
+ interrupt-map = <
+ /* IDSEL 0x0 */
+ 0000 0 0 1 &mpic 40 1 0 0
+ 0000 0 0 2 &mpic 1 1 0 0
+ 0000 0 0 3 &mpic 2 1 0 0
+ 0000 0 0 4 &mpic 3 1 0 0
+ >;
+ };
+};
+
+&soc {
+ #address-cells = <1>;
+ #size-cells = <1>;
+ device_type = "soc";
+ compatible = "simple-bus";
+
+ soc-sram-error {
+ compatible = "fsl,soc-sram-error";
+ interrupts = <16 2 1 2>;
+ };
+
+ corenet-law@0 {
+ compatible = "fsl,corenet-law";
+ reg = <0x0 0x1000>;
+ fsl,num-laws = <32>;
+ };
+
+ ddr1: memory-controller@8000 {
+ compatible = "fsl,qoriq-memory-controller-v4.5", "fsl,qoriq-memory-controller";
+ reg = <0x8000 0x1000>;
+ interrupts = <16 2 1 8>;
+ };
+
+ cpc: l3-cache-controller@10000 {
+ compatible = "fsl,p5020-l3-cache-controller", "fsl,p4080-l3-cache-controller", "cache";
+ reg = <0x10000 0x1000>;
+ interrupts = <16 2 1 4>;
+ };
+
+ corenet-cf@18000 {
+ compatible = "fsl,corenet-cf";
+ reg = <0x18000 0x1000>;
+ interrupts = <16 2 1 0>;
+ fsl,ccf-num-csdids = <32>;
+ fsl,ccf-num-snoopids = <32>;
+ };
+
+ iommu@20000 {
+ compatible = "fsl,pamu";
+ reg = <0x20000 0x4000>;
+ interrupts = <
+ 24 2 0 0
+ 16 2 1 1>;
+ };
+
+/include/ "qoriq-mpic.dtsi"
+
+ guts: global-utilities@e0000 {
+ compatible = "fsl,b4420-device-config";
+ reg = <0xe0000 0xe00>;
+ fsl,has-rstcr;
+ fsl,liodn-bits = <12>;
+ };
+
+ rcpm: global-utilities@e2000 {
+ compatible = "fsl,b4420-rcpm", "fsl,qoriq-rcpm-2";
+ reg = <0xe2000 0x1000>;
+ };
+
+/include/ "qoriq-dma-0.dtsi"
+/include/ "qoriq-dma-1.dtsi"
+
+/include/ "qonverge-usb2-dr-0.dtsi"
+ usb0: usb@210000 {
+ compatible = "fsl-usb2-dr-v2.4", "fsl-usb2-dr";
+ };
+
+/include/ "qoriq-espi-0.dtsi"
+ spi@110000 {
+ fsl,espi-num-chipselects = <4>;
+ };
+
+/include/ "qoriq-esdhc-0.dtsi"
+ sdhc@114000 {
+ sdhci,auto-cmd12;
+ };
+/include/ "qoriq-i2c-0.dtsi"
+/include/ "qoriq-i2c-1.dtsi"
+/include/ "qoriq-duart-0.dtsi"
+/include/ "qoriq-duart-1.dtsi"
+
+ L2: l2-cache-controller@c20000 {
+ next-level-cache = <&cpc>;
+ };
+};
diff --git a/arch/powerpc/boot/dts/fsl/b4420si-pre.dtsi b/arch/powerpc/boot/dts/fsl/b4420si-pre.dtsi
new file mode 100644
index 0000000..c1ade3b
--- /dev/null
+++ b/arch/powerpc/boot/dts/fsl/b4420si-pre.dtsi
@@ -0,0 +1,70 @@
+/*
+ * B4420 Silicon/SoC Device Tree Source (pre include)
+ *
+ * Copyright 2012 Freescale Semiconductor, Inc.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ * * Neither the name of Freescale Semiconductor nor the
+ * names of its contributors may be used to endorse or promote products
+ * derived from this software without specific prior written permission.
+ *
+ *
+ * ALTERNATIVELY, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") as published by the Free Software
+ * Foundation, either version 2 of that License or (at your option) any
+ * later version.
+ *
+ * This software is provided by Freescale Semiconductor "as is" and any
+ * express or implied warranties, including, but not limited to, the implied
+ * warranties of merchantability and fitness for a particular purpose are
+ * disclaimed. In no event shall Freescale Semiconductor be liable for any
+ * direct, indirect, incidental, special, exemplary, or consequential damages
+ * (including, but not limited to, procurement of substitute goods or services;
+ * loss of use, data, or profits; or business interruption) however caused and
+ * on any theory of liability, whether in contract, strict liability, or tort
+ * (including negligence or otherwise) arising in any way out of the use of
+ * this software, even if advised of the possibility of such damage.
+ */
+
+/dts-v1/;
+/ {
+ compatible = "fsl,B4420";
+ #address-cells = <2>;
+ #size-cells = <2>;
+ interrupt-parent = <&mpic>;
+
+ aliases {
+ ccsr = &soc;
+
+ serial0 = &serial0;
+ serial1 = &serial1;
+ serial2 = &serial2;
+ serial3 = &serial3;
+ pci0 = &pci0;
+ dma0 = &dma0;
+ dma1 = &dma1;
+ sdhc = &sdhc;
+ };
+
+ cpus {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ PowerPC,e6500@0 {
+ device_type = "cpu";
+ reg = <0 1>;
+ next-level-cache = <&L2>;
+ };
+ PowerPC,e6500@1 {
+ device_type = "cpu";
+ reg = <2 3>;
+ next-level-cache = <&L2>;
+ };
+ };
+};
--
1.7.6.GIT
^ permalink raw reply related
* [PATCH 5/6] powerpc/fsl-booke: Add B4_QDS board support
From: Shaveta Leekha @ 2013-03-15 7:55 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Shaveta Leekha
In-Reply-To: <1363334109-21922-1-git-send-email-shaveta@freescale.com>
- Add support for B4 board's personalities in board file
b4_qds.c, It is common for B4 personalities B4860 and B4420QDS
- Add B4QDS support in Kconfig and Makefile
B4860QDS is a high-performance computing evaluation, development and
test platform supporting the B4860 QorIQ Power Architecture processor,
with following major features:
- Four dual-threaded e6500 Power Architecture processors
organized in one cluster-each core runs up to 1.8 GHz
- Two DDR3/3L controllers for high-speed memory interface each
runs at up to 1866.67 MHz
- CoreNet fabric that fully supports coherency using MESI protocol
between the e6500 cores, SC3900 FVP cores, memories and
external interfaces.
- Data Path Acceleration Architecture having FMAN, QMan, BMan, SEC 5.3 and RMAN
- Large internal cache memory with snooping and stashing capabilities
- Sixteen 10-GHz SerDes lanes that serve:
- Two SRIO interfaces. Each supports up to 4 lanes and
a total of up to 8 lanes
- Up to 8-lanes Common Public Radio Interface (CPRI) controller
for glue-less antenna connection
- Two 10-Gbit Ethernet controllers (10GEC)
- Six 1G/2.5-Gbit Ethernet controllers for network communications
- PCI Express controller
- Debug (Aurora)
- Various system peripherals
B4420 is a reduced personality of B4860 with fewer core/clusters(both SC3900 and e6500),
fewer DDR controllers, fewer serdes lanes, fewer SGMII interfaces and reduced target frequencies.
Key differences between B4860 and B4420:
B4420 has:
- Fewer e6500 cores:
1 cluster with 2 e6500 cores
- Fewer SC3900 cores/clusters:
1 cluster with 2 SC3900 cores per cluster
- Single DDRC
- 2X 4 lane serdes
- 3 SGMII interfaces
- no sRIO
- no 10G
Signed-off-by: Shaveta Leekha <shaveta@freescale.com>
---
arch/powerpc/platforms/85xx/Kconfig | 16 +++++
arch/powerpc/platforms/85xx/Makefile | 1 +
arch/powerpc/platforms/85xx/b4_qds.c | 102 ++++++++++++++++++++++++++++++++++
3 files changed, 119 insertions(+), 0 deletions(-)
create mode 100644 arch/powerpc/platforms/85xx/b4_qds.c
diff --git a/arch/powerpc/platforms/85xx/Kconfig b/arch/powerpc/platforms/85xx/Kconfig
index 31dc066..7bbd522 100644
--- a/arch/powerpc/platforms/85xx/Kconfig
+++ b/arch/powerpc/platforms/85xx/Kconfig
@@ -262,6 +262,22 @@ config SGY_CTS1000
endif # PPC32
+config B4_QDS
+ bool "Freescale B4 QDS"
+ select DEFAULT_UIMAGE
+ select E500
+ select PPC_E500MC
+ select PHYS_64BIT
+ select SWIOTLB
+ select MPC8xxx_GPIO
+ select HAS_RAPIDIO
+ select PPC_EPAPR_HV_PIC
+ help
+ This option enables support for the B4 QDS board
+ The B4 application development system B4 QDS is a complete
+ debugging environment intended for engineers developing
+ applications for the B4.
+
config P5020_DS
bool "Freescale P5020 DS"
select DEFAULT_UIMAGE
diff --git a/arch/powerpc/platforms/85xx/Makefile b/arch/powerpc/platforms/85xx/Makefile
index 712e233..a12ae2d 100644
--- a/arch/powerpc/platforms/85xx/Makefile
+++ b/arch/powerpc/platforms/85xx/Makefile
@@ -6,6 +6,7 @@ obj-$(CONFIG_SMP) += smp.o
obj-y += common.o
obj-$(CONFIG_BSC9131_RDB) += bsc913x_rdb.o
+obj-$(CONFIG_B4_QDS) += b4_qds.o corenet_ds.o
obj-$(CONFIG_MPC8540_ADS) += mpc85xx_ads.o
obj-$(CONFIG_MPC8560_ADS) += mpc85xx_ads.o
obj-$(CONFIG_MPC85xx_CDS) += mpc85xx_cds.o
diff --git a/arch/powerpc/platforms/85xx/b4_qds.c b/arch/powerpc/platforms/85xx/b4_qds.c
new file mode 100644
index 0000000..0c6702f
--- /dev/null
+++ b/arch/powerpc/platforms/85xx/b4_qds.c
@@ -0,0 +1,102 @@
+/*
+ * B4 QDS Setup
+ * Should apply for QDS platform of B4860 and it's personalities.
+ * viz B4860/B4420/B4220QDS
+ *
+ * Copyright 2012 Freescale Semiconductor Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or (at your
+ * option) any later version.
+ */
+
+#include <linux/kernel.h>
+#include <linux/pci.h>
+#include <linux/kdev_t.h>
+#include <linux/delay.h>
+#include <linux/interrupt.h>
+#include <linux/phy.h>
+
+#include <asm/time.h>
+#include <asm/machdep.h>
+#include <asm/pci-bridge.h>
+#include <mm/mmu_decl.h>
+#include <asm/prom.h>
+#include <asm/udbg.h>
+#include <asm/mpic.h>
+
+#include <linux/of_platform.h>
+#include <sysdev/fsl_soc.h>
+#include <sysdev/fsl_pci.h>
+#include <asm/ehv_pic.h>
+
+#include "corenet_ds.h"
+
+/*
+ * Called very early, device-tree isn't unflattened
+ */
+static int __init b4_qds_probe(void)
+{
+ unsigned long root = of_get_flat_dt_root();
+#ifdef CONFIG_SMP
+ extern struct smp_ops_t smp_85xx_ops;
+#endif
+
+ if ((of_flat_dt_is_compatible(root, "fsl,B4860QDS")) ||
+ (of_flat_dt_is_compatible(root, "fsl,B4420QDS")) ||
+ (of_flat_dt_is_compatible(root, "fsl,B4220QDS")))
+ return 1;
+
+ /* Check if we're running under the Freescale hypervisor */
+ if ((of_flat_dt_is_compatible(root, "fsl,B4860QDS-hv")) ||
+ (of_flat_dt_is_compatible(root, "fsl,B4420QDS-hv")) ||
+ (of_flat_dt_is_compatible(root, "fsl,B4220QDS-hv"))) {
+ ppc_md.init_IRQ = ehv_pic_init;
+ ppc_md.get_irq = ehv_pic_get_irq;
+ ppc_md.restart = fsl_hv_restart;
+ ppc_md.power_off = fsl_hv_halt;
+ ppc_md.halt = fsl_hv_halt;
+#ifdef CONFIG_SMP
+ /*
+ * Disable the timebase sync operations because we can't write
+ * to the timebase registers under the hypervisor.
+ */
+ smp_85xx_ops.give_timebase = NULL;
+ smp_85xx_ops.take_timebase = NULL;
+#endif
+ return 1;
+ }
+
+ return 0;
+}
+
+define_machine(b4_qds) {
+ .name = "B4 QDS",
+ .probe = b4_qds_probe,
+ .setup_arch = corenet_ds_setup_arch,
+ .init_IRQ = corenet_ds_pic_init,
+#ifdef CONFIG_PCI
+ .pcibios_fixup_bus = fsl_pcibios_fixup_bus,
+#endif
+/* coreint doesn't play nice with lazy EE, use legacy mpic for now */
+#ifdef CONFIG_PPC64
+ .get_irq = mpic_get_irq,
+#else
+ .get_irq = mpic_get_coreint_irq,
+#endif
+ .restart = fsl_rstcr_restart,
+ .calibrate_decr = generic_calibrate_decr,
+ .progress = udbg_progress,
+#ifdef CONFIG_PPC64
+ .power_save = book3e_idle,
+#else
+ .power_save = e500_idle,
+#endif
+};
+
+machine_arch_initcall(b4_qds, corenet_ds_publish_devices);
+
+#ifdef CONFIG_SWIOTLB
+machine_arch_initcall(b4_qds, swiotlb_setup_bus_notifier);
+#endif
--
1.7.6.GIT
^ permalink raw reply related
* [PATCH 1/6] powerpc/fsl-booke: Add initial silicon device tree files for B4860QDS
From: Shaveta Leekha @ 2013-03-15 7:55 UTC (permalink / raw)
To: linuxppc-dev
Cc: Zhao Chenhui, Minghuan Lian, Shaveta Leekha, Tang Yuantian,
Andy Fleming, Ramneek Mehresh, Varun Sethi
Signed-off-by: Shaveta Leekha <shaveta@freescale.com>
Signed-off-by: Zhao Chenhui <chenhui.zhao@freescale.com>
Signed-off-by: Li Yang <leoli@freescale.com>
Signed-off-by: Tang Yuantian <Yuantian.Tang@freescale.com>
Signed-off-by: Varun Sethi <Varun.Sethi@freescale.com>
Signed-off-by: Minghuan Lian <Minghuan.Lian@freescale.com>
Signed-off-by: Ramneek Mehresh <ramneek.mehresh@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Andy Fleming <afleming@freescale.com>
---
arch/powerpc/boot/dts/fsl/b4860si-post.dtsi | 184 +++++++++++++++++++++++++++
arch/powerpc/boot/dts/fsl/b4860si-pre.dtsi | 80 ++++++++++++
2 files changed, 264 insertions(+), 0 deletions(-)
create mode 100644 arch/powerpc/boot/dts/fsl/b4860si-post.dtsi
create mode 100644 arch/powerpc/boot/dts/fsl/b4860si-pre.dtsi
diff --git a/arch/powerpc/boot/dts/fsl/b4860si-post.dtsi b/arch/powerpc/boot/dts/fsl/b4860si-post.dtsi
new file mode 100644
index 0000000..2db68b2
--- /dev/null
+++ b/arch/powerpc/boot/dts/fsl/b4860si-post.dtsi
@@ -0,0 +1,184 @@
+/*
+ * B4860 Silicon/SoC Device Tree Source (post include)
+ *
+ * Copyright 2012 Freescale Semiconductor Inc.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ * * Neither the name of Freescale Semiconductor nor the
+ * names of its contributors may be used to endorse or promote products
+ * derived from this software without specific prior written permission.
+ *
+ *
+ * ALTERNATIVELY, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") as published by the Free Software
+ * Foundation, either version 2 of that License or (at your option) any
+ * later version.
+ *
+ * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY
+ * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+&ifc {
+ #address-cells = <2>;
+ #size-cells = <1>;
+ compatible = "fsl,ifc", "simple-bus";
+ interrupts = <25 2 0 0>;
+};
+
+/* controller at 0x200000 */
+&pci0 {
+ compatible = "fsl,b4860-pcie", "fsl,qoriq-pcie-v2.4";
+ device_type = "pci";
+ #size-cells = <2>;
+ #address-cells = <3>;
+ bus-range = <0x0 0xff>;
+ interrupts = <20 2 0 0>;
+ pcie@0 {
+ #interrupt-cells = <1>;
+ #size-cells = <2>;
+ #address-cells = <3>;
+ device_type = "pci";
+ interrupts = <20 2 0 0>;
+ interrupt-map-mask = <0xf800 0 0 7>;
+ interrupt-map = <
+ /* IDSEL 0x0 */
+ 0000 0 0 1 &mpic 40 1 0 0
+ 0000 0 0 2 &mpic 1 1 0 0
+ 0000 0 0 3 &mpic 2 1 0 0
+ 0000 0 0 4 &mpic 3 1 0 0
+ >;
+ };
+};
+
+&rio {
+ compatible = "fsl,srio";
+ interrupts = <16 2 1 11>;
+ #address-cells = <2>;
+ #size-cells = <2>;
+ ranges;
+
+ port1 {
+ #address-cells = <2>;
+ #size-cells = <2>;
+ cell-index = <1>;
+ };
+
+ port2 {
+ #address-cells = <2>;
+ #size-cells = <2>;
+ cell-index = <2>;
+ };
+};
+
+&soc {
+ #address-cells = <1>;
+ #size-cells = <1>;
+ device_type = "soc";
+ compatible = "simple-bus";
+
+ soc-sram-error {
+ compatible = "fsl,soc-sram-error";
+ interrupts = <16 2 1 2>;
+ };
+
+ corenet-law@0 {
+ compatible = "fsl,corenet-law";
+ reg = <0x0 0x1000>;
+ fsl,num-laws = <32>;
+ };
+
+ ddr1: memory-controller@8000 {
+ compatible = "fsl,qoriq-memory-controller-v4.5", "fsl,qoriq-memory-controller";
+ reg = <0x8000 0x1000>;
+ interrupts = <16 2 1 8>;
+ };
+
+ ddr2: memory-controller@9000 {
+ compatible = "fsl,qoriq-memory-controller-v4.5","fsl,qoriq-memory-controller";
+ reg = <0x9000 0x1000>;
+ interrupts = <16 2 1 9>;
+ };
+
+ cpc: l3-cache-controller@10000 {
+ compatible = "fsl,p5020-l3-cache-controller", "fsl,p4080-l3-cache-controller", "cache";
+ reg = <0x10000 0x1000
+ 0x11000 0x1000>;
+ interrupts = <16 2 1 4
+ 16 2 1 5>;
+ };
+
+ corenet-cf@18000 {
+ compatible = "fsl,corenet-cf";
+ reg = <0x18000 0x1000>;
+ interrupts = <16 2 1 0>;
+ fsl,ccf-num-csdids = <32>;
+ fsl,ccf-num-snoopids = <32>;
+ };
+
+ iommu@20000 {
+ compatible = "fsl,pamu-v1.0", "fsl,pamu";
+ reg = <0x20000 0x4000>;
+ interrupts = <
+ 24 2 0 0
+ 16 2 1 1>;
+ };
+
+/include/ "qoriq-mpic.dtsi"
+
+ guts: global-utilities@e0000 {
+ compatible = "fsl,b4860-device-config";
+ reg = <0xe0000 0xe00>;
+ fsl,has-rstcr;
+ fsl,liodn-bits = <12>;
+ };
+
+ clockgen: global-utilities@e1000 {
+ compatible = "fsl,b4860-clockgen", "fsl,qoriq-clockgen-2";
+ reg = <0xe1000 0x1000>;
+ };
+
+ rcpm: global-utilities@e2000 {
+ compatible = "fsl,b4860-rcpm", "fsl,qoriq-rcpm-2";
+ reg = <0xe2000 0x1000>;
+ };
+
+/include/ "qoriq-dma-0.dtsi"
+/include/ "qoriq-dma-1.dtsi"
+
+/include/ "qonverge-usb2-dr-0.dtsi"
+ usb0: usb@210000 {
+ compatible = "fsl-usb2-dr-v2.4", "fsl-usb2-dr";
+ };
+
+/include/ "qoriq-espi-0.dtsi"
+ spi@110000 {
+ fsl,espi-num-chipselects = <4>;
+ };
+
+/include/ "qoriq-esdhc-0.dtsi"
+ sdhc@114000 {
+ sdhci,auto-cmd12;
+ };
+/include/ "qoriq-i2c-0.dtsi"
+/include/ "qoriq-i2c-1.dtsi"
+/include/ "qoriq-duart-0.dtsi"
+/include/ "qoriq-duart-1.dtsi"
+
+ L2: l2-cache-controller@c20000 {
+ next-level-cache = <&cpc>;
+ };
+};
diff --git a/arch/powerpc/boot/dts/fsl/b4860si-pre.dtsi b/arch/powerpc/boot/dts/fsl/b4860si-pre.dtsi
new file mode 100644
index 0000000..33bc600
--- /dev/null
+++ b/arch/powerpc/boot/dts/fsl/b4860si-pre.dtsi
@@ -0,0 +1,80 @@
+/*
+ * B4860 Silicon/SoC Device Tree Source (pre include)
+ *
+ * Copyright 2012 Freescale Semiconductor Inc.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ * * Neither the name of Freescale Semiconductor nor the
+ * names of its contributors may be used to endorse or promote products
+ * derived from this software without specific prior written permission.
+ *
+ *
+ * ALTERNATIVELY, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") as published by the Free Software
+ * Foundation, either version 2 of that License or (at your option) any
+ * later version.
+ *
+ * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY
+ * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/dts-v1/;
+/ {
+ compatible = "fsl,B4860";
+ #address-cells = <2>;
+ #size-cells = <2>;
+ interrupt-parent = <&mpic>;
+
+ aliases {
+ ccsr = &soc;
+
+ serial0 = &serial0;
+ serial1 = &serial1;
+ serial2 = &serial2;
+ serial3 = &serial3;
+ pci0 = &pci0;
+ dma0 = &dma0;
+ dma1 = &dma1;
+ sdhc = &sdhc;
+ };
+
+ cpus {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ PowerPC,e6500@0 {
+ device_type = "cpu";
+ reg = <0 1>;
+ next-level-cache = <&L2>;
+ };
+ PowerPC,e6500@1 {
+ device_type = "cpu";
+ reg = <2 3>;
+ next-level-cache = <&L2>;
+ };
+ PowerPC,e6500@2 {
+ device_type = "cpu";
+ reg = <4 5>;
+ next-level-cache = <&L2>;
+ };
+ PowerPC,e6500@3 {
+ device_type = "cpu";
+ reg = <6 7>;
+ next-level-cache = <&L2>;
+ };
+ };
+};
--
1.7.6.GIT
^ permalink raw reply related
* [PATCH 3/3] VFIO: Direct access config reg without capability
From: Gavin Shan @ 2013-03-15 7:26 UTC (permalink / raw)
To: kvm, linuxppc-dev; +Cc: aik, alex.williamson, Gavin Shan
In-Reply-To: <1363332390-12754-1-git-send-email-shangw@linux.vnet.ibm.com>
The config registers in [0, 0x40] is being supported by VFIO. Apart
from that, the other config registers should be coverred by PCI or
PCIe capability. However, there might have some PCI devices (be2net)
who has config registers (0x7c) out of [0, 0x40], and don't have
corresponding PCI or PCIe capability. VFIO will return 0x0 on reading
those registers and writing is dropped. It caused the be2net driver
fails to be loaded because 0x0 returned from its config register 0x7c.
The patch changes the behaviour so that those config registers out
of [0, 0x40] and don't have corresponding PCI or PCIe capability
will be accessed directly.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
drivers/vfio/pci/vfio_pci_config.c | 31 ++++++++++++++++++++-----------
1 files changed, 20 insertions(+), 11 deletions(-)
diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
index 964ff22..5ea3afb 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -1471,18 +1471,27 @@ static ssize_t vfio_config_do_rw(struct vfio_pci_device *vdev, char __user *buf,
cap_id = vdev->pci_config_map[*ppos / 4];
+ /*
+ * Some PCI device config registers might not be coverred by
+ * capability and useful. We will enable direct access to
+ * those registers.
+ */
if (cap_id == PCI_CAP_ID_INVALID) {
- if (iswrite)
- return ret; /* drop */
-
- /*
- * Per PCI spec 3.0, section 6.1, reads from reserved and
- * unimplemented registers return 0
- */
- if (copy_to_user(buf, &val, count))
- return -EFAULT;
-
- return ret;
+ if (iswrite) {
+ if (copy_from_user(&val, buf, count))
+ return -EFAULT;
+ ret = vfio_user_config_write(vdev->pdev, (int)(*ppos),
+ val, count);
+ return ret ? ret : count;
+ } else {
+ ret = vfio_user_config_read(vdev->pdev, (int)(*ppos),
+ &val, count);
+ if (ret)
+ return ret;
+ if (copy_to_user(buf, &val, count))
+ return -EFAULT;
+ return count;
+ }
}
/*
--
1.7.5.4
^ permalink raw reply related
* [PATCH 2/3] VFIO: VFIO_DEVICE_SET_ADDR_MAPPING command
From: Gavin Shan @ 2013-03-15 7:26 UTC (permalink / raw)
To: kvm, linuxppc-dev; +Cc: aik, alex.williamson, Gavin Shan
In-Reply-To: <1363332390-12754-1-git-send-email-shangw@linux.vnet.ibm.com>
The address (domain/bus/slot/function) of the passed PCI device
looks quite different from perspective of host and guest. Some
architectures like PPC need to setup the mapping in host. The patch
introduces additional VFIO device IOCTL command to address that.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
include/uapi/linux/vfio.h | 16 ++++++++++++++++
1 files changed, 16 insertions(+), 0 deletions(-)
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 6e58d9b..ecc4f38 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -289,6 +289,22 @@ struct vfio_irq_set {
*/
#define VFIO_DEVICE_RESET _IO(VFIO_TYPE, VFIO_BASE + 11)
+/**
+ * VFIO_DEVICE_SET_ADDR_MAPPING - _IO(VFIO_TYPE, VFIO_BASE + 12)
+ *
+ * The address, which comprised of domain/bus/slot/function looks
+ * different between host and guest. We need to setup the mapping
+ * in host for some architectures like PPC so that the passed PCI
+ * devices could support RTAS smoothly.
+ */
+struct vfio_addr_mapping {
+ __u64 buid;
+ __u8 bus;
+ __u8 slot;
+ __u8 func;
+};
+#define VFIO_DEVICE_SET_ADDR_MAPPING _IO(VFIO_TYPE, VFIO_BASE + 12)
+
/*
* The VFIO-PCI bus driver makes use of the following fixed region and
* IRQ index mapping. Unimplemented regions return a size of zero.
--
1.7.5.4
^ permalink raw reply related
* [PATCH 1/3] VFIO: Architecture dependent VFIO device operations
From: Gavin Shan @ 2013-03-15 7:26 UTC (permalink / raw)
To: kvm, linuxppc-dev; +Cc: aik, alex.williamson, Gavin Shan
In-Reply-To: <1363332390-12754-1-git-send-email-shangw@linux.vnet.ibm.com>
Some architectures like PPC, especailly PowerNV platform, need to
do additional operations while adding or removing VFIO devices to
or from VFIO bus. The patch adds weak functions while to open,
release or ioctl for the specific VFIO device. Those functions could
be overrided by individual architectures if necessary.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
drivers/vfio/pci/vfio_pci.c | 42 +++++++++++++++++++++++++++++++++++-------
include/linux/vfio.h | 7 ++++++-
2 files changed, 41 insertions(+), 8 deletions(-)
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 8189cb6..1a53e77 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -143,32 +143,51 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev)
pci_restore_state(pdev);
}
+void __weak vfio_pci_arch_release(struct pci_dev *pdev)
+{
+ return;
+}
+
static void vfio_pci_release(void *device_data)
{
struct vfio_pci_device *vdev = device_data;
- if (atomic_dec_and_test(&vdev->refcnt))
+ if (atomic_dec_and_test(&vdev->refcnt)) {
+ vfio_pci_arch_release(vdev->pdev);
+
vfio_pci_disable(vdev);
+ }
module_put(THIS_MODULE);
}
+int __weak vfio_pci_arch_open(struct pci_dev *pdev)
+{
+ return 0;
+}
+
static int vfio_pci_open(void *device_data)
{
struct vfio_pci_device *vdev = device_data;
+ int ret;
if (!try_module_get(THIS_MODULE))
return -ENODEV;
if (atomic_inc_return(&vdev->refcnt) == 1) {
- int ret = vfio_pci_enable(vdev);
- if (ret) {
- module_put(THIS_MODULE);
- return ret;
- }
+ ret = vfio_pci_arch_open(vdev->pdev);
+ if (ret)
+ goto fail;
+
+ ret = vfio_pci_enable(vdev);
+ if (ret)
+ goto fail;
}
return 0;
+fail:
+ module_put(THIS_MODULE);
+ return ret;
}
static int vfio_pci_get_irq_count(struct vfio_pci_device *vdev, int irq_type)
@@ -206,6 +225,12 @@ static int vfio_pci_get_irq_count(struct vfio_pci_device *vdev, int irq_type)
return 0;
}
+long __weak vfio_pci_arch_ioctl(struct pci_dev *pdev,
+ unsigned int cmd, unsigned long arg)
+{
+ return -ENOTTY;
+}
+
static long vfio_pci_ioctl(void *device_data,
unsigned int cmd, unsigned long arg)
{
@@ -374,9 +399,12 @@ static long vfio_pci_ioctl(void *device_data,
return ret;
- } else if (cmd == VFIO_DEVICE_RESET)
+ } else if (cmd == VFIO_DEVICE_RESET) {
return vdev->reset_works ?
pci_reset_function(vdev->pdev) : -EINVAL;
+ } else {
+ return vfio_pci_arch_ioctl(vdev->pdev, cmd, arg);
+ }
return -ENOTTY;
}
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index ab9e862..a991c39 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -11,9 +11,9 @@
#ifndef VFIO_H
#define VFIO_H
-
#include <linux/iommu.h>
#include <linux/mm.h>
+#include <linux/pci.h>
#include <uapi/linux/vfio.h>
/**
@@ -40,6 +40,11 @@ struct vfio_device_ops {
int (*mmap)(void *device_data, struct vm_area_struct *vma);
};
+extern int vfio_pci_arch_open(struct pci_dev *pdev);
+extern long vfio_pci_arch_ioctl(struct pci_dev *pdev,
+ unsigned int cmd,
+ unsigned long arg);
+extern void vfio_pci_arch_release(struct pci_dev *pdev);
extern int vfio_add_group_dev(struct device *dev,
const struct vfio_device_ops *ops,
void *device_data);
--
1.7.5.4
^ permalink raw reply related
* [PATCH 0/3] VFIO change for EEH support
From: Gavin Shan @ 2013-03-15 7:26 UTC (permalink / raw)
To: kvm, linuxppc-dev; +Cc: aik, alex.williamson, Gavin Shan
The EEH (Enhanced Error Handling) is one of RAS features on IBM Power
machines. In order to support EEH, the VFIO needs some modification
as the patchset addresses. Firstly, the address (domain:bus:slot:function)
of passed PCI devices looks quite different from host and guest perspectives.
So we have to mantain the address mapping in host so that the EEH could
direct the EEH errors from guest to proper PCI device. Unfortunately, it
seems that the VFIO implementation doesn't include the mechanism yet. On
the other hand, it's totally business of individual platforms. So I introduced
some weak functions in VFIO driver and individual platforms can override
that to figure out more information that platform needs. Apart from that,
the last patch [3/3] is changing the current behavior of accessing uncoverred
config space for specific PCI device.
The patchset is expected to be applied after Alexy's patchset (supporting
VFIO on PowerNV platform). Besides, there're patchset based on it queued
in my personal tree for EEH core to support PowerKVM guest. With all of
them (Alexy's patchset, this patchset, EEH core patchset), I can sucessfully
pass PCI device to guest and recover it from EEH errors.
drivers/vfio/pci/vfio_pci.c | 42 ++++++++++++++++++++++++++++++------
drivers/vfio/pci/vfio_pci_config.c | 31 +++++++++++++++++---------
include/linux/vfio.h | 7 +++++-
include/uapi/linux/vfio.h | 16 +++++++++++++
4 files changed, 77 insertions(+), 19 deletions(-)
Thanks,
Gavin
^ permalink raw reply
* [PATCH V2] powerpc/85xx: Add platform_device declaration to fsl_pci.h
From: Jia Hongtao @ 2013-03-15 6:14 UTC (permalink / raw)
To: linuxppc-dev, galak; +Cc: B07421, b38951
mpc85xx_pci_err_probe(struct platform_device *op) need platform_device
declaration for definition. Otherwise, it will cause compile error if any
files including fsl_pci.h without declaration of platform_device.
Signed-off-by: Jia Hongtao <hongtao.jia@freescale.com>
---
Changes for V2:
- Just declare platform_device instead of including <linux/platform_device.h>
arch/powerpc/sysdev/fsl_pci.h | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/powerpc/sysdev/fsl_pci.h b/arch/powerpc/sysdev/fsl_pci.h
index c495c00..851dd56 100644
--- a/arch/powerpc/sysdev/fsl_pci.h
+++ b/arch/powerpc/sysdev/fsl_pci.h
@@ -14,6 +14,8 @@
#ifndef __POWERPC_FSL_PCI_H
#define __POWERPC_FSL_PCI_H
+struct platform_device;
+
#define PCIE_LTSSM 0x0404 /* PCIE Link Training and Status */
#define PCIE_LTSSM_L0 0x16 /* L0 state */
#define PCIE_IP_REV_2_2 0x02080202 /* PCIE IP block version Rev2.2 */
--
1.8.0
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox