* [RFC PATCH -V2 17/21] powerpc/THP: Differentiate THP PMD entries from HUGETLB PMD entries
From: Aneesh Kumar K.V @ 2013-02-21 16:47 UTC (permalink / raw)
To: benh, paulus; +Cc: linux-mm, linuxppc-dev, Aneesh Kumar K.V
In-Reply-To: <1361465248-10867-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
HUGETLB clears the top bit of a PMD entry and uses that to indicate
a HUGETLB page directory. Since we store pfns in PMDs for THP,
the top bit would be clear by default. Set the top bit in THP PMD
entries and mask it off when computing pmd_pfn.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
arch/powerpc/include/asm/pgtable.h | 15 ++++++++++++---
arch/powerpc/mm/pgtable.c | 5 ++++-
arch/powerpc/mm/pgtable_64.c | 2 +-
3 files changed, 17 insertions(+), 5 deletions(-)
diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
index ca1848a..5b8e93b 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -31,7 +31,7 @@ struct mm_struct;
#define PMD_HUGE_EXEC 0x004 /* No execute on POWER4 and newer (we invert) */
#define PMD_HUGE_SPLITTING 0x008
#define PMD_HUGE_HASHPTE 0x010
-#define PMD_ISHUGE 0x020
+#define _PMD_ISHUGE 0x020
#define PMD_HUGE_DIRTY 0x080 /* C: page changed */
#define PMD_HUGE_ACCESSED 0x100 /* R: page referenced */
#define PMD_HUGE_RW 0x200 /* software: user write access allowed */
@@ -44,6 +44,14 @@ struct mm_struct;
#define PMD_HUGE_RPN_SHIFT PTE_RPN_SHIFT
#define HUGE_PAGE_SIZE (ASM_CONST(1) << 24)
#define HUGE_PAGE_MASK (~(HUGE_PAGE_SIZE - 1))
+/*
+ * HugeTLB looks at the top bit of the Linux page table entries to
+ * decide whether it is a huge page directory or not. Mark HUGE
+ * PMD to differentiate
+ */
+#define PMD_HUGE_NOT_HUGETLB (ASM_CONST(1) << 63)
+#define PMD_ISHUGE (_PMD_ISHUGE | PMD_HUGE_NOT_HUGETLB)
+#define PMD_HUGE_PROTBITS (0xfff | PMD_HUGE_NOT_HUGETLB)
#ifndef __ASSEMBLY__
extern void hpte_need_hugepage_flush(struct mm_struct *mm, unsigned long addr,
@@ -61,7 +69,8 @@ static inline unsigned long pmd_pfn(pmd_t pmd)
/*
* Only called for huge page pmd
*/
- return pmd_val(pmd) >> PMD_HUGE_RPN_SHIFT;
+ unsigned long val = pmd_val(pmd) & ~PMD_HUGE_PROTBITS;
+ return val >> PMD_HUGE_RPN_SHIFT;
}
static inline int pmd_young(pmd_t pmd)
@@ -95,7 +104,7 @@ static inline int pmd_trans_splitting(pmd_t pmd)
static inline int pmd_trans_huge(pmd_t pmd)
{
- return pmd_val(pmd) & PMD_ISHUGE;
+ return ((pmd_val(pmd) & PMD_ISHUGE) == PMD_ISHUGE);
}
/* We will enable it in the last patch */
diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index d117982..ef91331 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -528,7 +528,10 @@ static pmd_t pmd_set_protbits(pmd_t pmd, pgprot_t pgprot)
pmd_t pfn_pmd(unsigned long pfn, pgprot_t pgprot)
{
pmd_t pmd;
-
+ /*
+ * We cannot support that many PFNs
+ */
+ VM_BUG_ON(pfn & PMD_HUGE_NOT_HUGETLB);
pmd_val(pmd) = pfn << PMD_HUGE_RPN_SHIFT;
/*
* pgtable_t is always 4K aligned, even in case where we use the
diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
index 3dc131d..5f22232 100644
--- a/arch/powerpc/mm/pgtable_64.c
+++ b/arch/powerpc/mm/pgtable_64.c
@@ -346,7 +346,7 @@ EXPORT_SYMBOL(__iounmap_at);
struct page *pmd_page(pmd_t pmd)
{
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
- if (pmd_val(pmd) & PMD_ISHUGE)
+ if ((pmd_val(pmd) & PMD_ISHUGE) == PMD_ISHUGE)
return pfn_to_page(pmd_pfn(pmd));
#endif
return virt_to_page(pmd_page_vaddr(pmd));
--
1.7.10
^ permalink raw reply related
* [RFC PATCH -V2 19/21] powerpc/THP: hypervisor require few WIMG bit set
From: Aneesh Kumar K.V @ 2013-02-21 16:47 UTC (permalink / raw)
To: benh, paulus; +Cc: linux-mm, linuxppc-dev, Aneesh Kumar K.V
In-Reply-To: <1361465248-10867-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Without this, the HPTE insert returns an H_PARAMETER error. Also use
the signed variant when printing the error.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
arch/powerpc/mm/largepage-hash64.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/powerpc/mm/largepage-hash64.c b/arch/powerpc/mm/largepage-hash64.c
index 2a5fc39..20a626e 100644
--- a/arch/powerpc/mm/largepage-hash64.c
+++ b/arch/powerpc/mm/largepage-hash64.c
@@ -123,6 +123,8 @@ repeat:
/* Add in WIMG bits. FIXME!! enabled by default */
rflags |= (new_pmd & (_PAGE_WRITETHRU | _PAGE_NO_CACHE |
_PAGE_COHERENT | _PAGE_GUARDED));
+#else
+ rflags |= _PAGE_COHERENT;
#endif
/* Insert into the hash table, primary slot */
slot = ppc_md.hpte_insert(hpte_group, vpn, pa, rflags, 0,
--
1.7.10
^ permalink raw reply related
* [RFC PATCH -V2 20/21] powerpc/THP: get_user_pages_fast changes
From: Aneesh Kumar K.V @ 2013-02-21 16:47 UTC (permalink / raw)
To: benh, paulus; +Cc: linux-mm, linuxppc-dev, Aneesh Kumar K.V
In-Reply-To: <1361465248-10867-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Handle large pages in get_user_pages_fast. Also take care of large page splitting.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
arch/powerpc/mm/gup.c | 84 +++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 82 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/mm/gup.c b/arch/powerpc/mm/gup.c
index d7efdbf..835c1ae 100644
--- a/arch/powerpc/mm/gup.c
+++ b/arch/powerpc/mm/gup.c
@@ -55,6 +55,72 @@ static noinline int gup_pte_range(pmd_t pmd, unsigned long addr,
return 1;
}
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+static inline int gup_huge_pmd(pmd_t *pmdp, unsigned long addr,
+ unsigned long end, int write,
+ struct page **pages, int *nr)
+{
+ int refs;
+ pmd_t pmd;
+ unsigned long mask;
+ struct page *head, *page, *tail;
+
+ pmd = *pmdp;
+ mask = PMD_HUGE_PRESENT | PMD_HUGE_USER;
+ if (write)
+ mask |= PMD_HUGE_RW;
+
+ if ((pmd_val(pmd) & mask) != mask)
+ return 0;
+
+ /* large pages are never "special" */
+ VM_BUG_ON(!pfn_valid(pmd_pfn(pmd)));
+
+ refs = 0;
+ head = pmd_page(pmd);
+ page = head + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
+ tail = page;
+ do {
+ VM_BUG_ON(compound_head(page) != head);
+ pages[*nr] = page;
+ (*nr)++;
+ page++;
+ refs++;
+ } while (addr += PAGE_SIZE, addr != end);
+
+ if (!page_cache_add_speculative(head, refs)) {
+ *nr -= refs;
+ return 0;
+ }
+
+ if (unlikely(pmd_val(pmd) != pmd_val(*pmdp))) {
+ *nr -= refs;
+ while (refs--)
+ put_page(head);
+ return 0;
+ }
+ /*
+ * Any tail page need their mapcount reference taken before we
+ * return.
+ */
+ while (refs--) {
+ if (PageTail(tail))
+ get_huge_page_tail(tail);
+ tail++;
+ }
+
+ return 1;
+}
+#else
+
+static inline int gup_huge_pmd(pmd_t *pmdp, unsigned long addr,
+ unsigned long end, int write,
+ struct page **pages, int *nr)
+{
+ return 1;
+}
+#endif
+
static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
int write, struct page **pages, int *nr)
{
@@ -66,9 +132,23 @@ static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
pmd_t pmd = *pmdp;
next = pmd_addr_end(addr, end);
- if (pmd_none(pmd))
+ /*
+ * The pmd_trans_splitting() check below explains why
+ * pmdp_splitting_flush has to flush the tlb, to stop
+ * this gup-fast code from running while we set the
+ * splitting bit in the pmd. Returning zero will take
+ * the slow path that will call wait_split_huge_page()
+ * if the pmd is still in splitting state. gup-fast
+ * can't because it has irq disabled and
+ * wait_split_huge_page() would never return as the
+ * tlb flush IPI wouldn't run.
+ */
+ if (pmd_none(pmd) || pmd_trans_splitting(pmd))
return 0;
- if (is_hugepd(pmdp)) {
+ if (unlikely(pmd_large(pmd))) {
+ if (!gup_huge_pmd(pmdp, addr, next, write, pages, nr))
+ return 0;
+ } else if (is_hugepd(pmdp)) {
if (!gup_hugepd((hugepd_t *)pmdp, PMD_SHIFT,
addr, next, write, pages, nr))
return 0;
--
1.7.10
^ permalink raw reply related
* [RFC PATCH -V2 07/21] powerpc: Use encode avpn where we need only avpn values
From: Aneesh Kumar K.V @ 2013-02-21 16:47 UTC (permalink / raw)
To: benh, paulus; +Cc: linux-mm, linuxppc-dev, Aneesh Kumar K.V
In-Reply-To: <1361465248-10867-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
arch/powerpc/mm/hash_native_64.c | 8 ++++----
arch/powerpc/platforms/cell/beat_htab.c | 10 +++++-----
arch/powerpc/platforms/ps3/htab.c | 2 +-
3 files changed, 10 insertions(+), 10 deletions(-)
diff --git a/arch/powerpc/mm/hash_native_64.c b/arch/powerpc/mm/hash_native_64.c
index ffc1e00..9d8983a 100644
--- a/arch/powerpc/mm/hash_native_64.c
+++ b/arch/powerpc/mm/hash_native_64.c
@@ -252,7 +252,7 @@ static long native_hpte_updatepp(unsigned long slot, unsigned long newpp,
unsigned long hpte_v, want_v;
int ret = 0;
- want_v = hpte_encode_v(vpn, psize, ssize);
+ want_v = hpte_encode_avpn(vpn, psize, ssize);
DBG_LOW(" update(vpn=%016lx, avpnv=%016lx, group=%lx, newpp=%lx)",
vpn, want_v & HPTE_V_AVPN, slot, newpp);
@@ -288,7 +288,7 @@ static long native_hpte_find(unsigned long vpn, int psize, int ssize)
unsigned long want_v, hpte_v;
hash = hpt_hash(vpn, mmu_psize_defs[psize].shift, ssize);
- want_v = hpte_encode_v(vpn, psize, ssize);
+ want_v = hpte_encode_avpn(vpn, psize, ssize);
/* Bolted mappings are only ever in the primary group */
slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
@@ -348,7 +348,7 @@ static void native_hpte_invalidate(unsigned long slot, unsigned long vpn,
DBG_LOW(" invalidate(vpn=%016lx, hash: %lx)\n", vpn, slot);
- want_v = hpte_encode_v(vpn, psize, ssize);
+ want_v = hpte_encode_avpn(vpn, psize, ssize);
native_lock_hpte(hptep);
hpte_v = hptep->v;
@@ -520,7 +520,7 @@ static void native_flush_hash_range(unsigned long number, int local)
slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
slot += hidx & _PTEIDX_GROUP_IX;
hptep = htab_address + slot;
- want_v = hpte_encode_v(vpn, psize, ssize);
+ want_v = hpte_encode_avpn(vpn, psize, ssize);
native_lock_hpte(hptep);
hpte_v = hptep->v;
if (!HPTE_V_COMPARE(hpte_v, want_v) ||
diff --git a/arch/powerpc/platforms/cell/beat_htab.c b/arch/powerpc/platforms/cell/beat_htab.c
index 0f6f839..472f9a7 100644
--- a/arch/powerpc/platforms/cell/beat_htab.c
+++ b/arch/powerpc/platforms/cell/beat_htab.c
@@ -191,7 +191,7 @@ static long beat_lpar_hpte_updatepp(unsigned long slot,
u64 dummy0, dummy1;
unsigned long want_v;
- want_v = hpte_encode_v(vpn, psize, MMU_SEGSIZE_256M);
+ want_v = hpte_encode_avpn(vpn, psize, MMU_SEGSIZE_256M);
DBG_LOW(" update: "
"avpnv=%016lx, slot=%016lx, psize: %d, newpp %016lx ... ",
@@ -228,7 +228,7 @@ static long beat_lpar_hpte_find(unsigned long vpn, int psize)
unsigned long want_v, hpte_v;
hash = hpt_hash(vpn, mmu_psize_defs[psize].shift, MMU_SEGSIZE_256M);
- want_v = hpte_encode_v(vpn, psize, MMU_SEGSIZE_256M);
+ want_v = hpte_encode_avpn(vpn, psize, MMU_SEGSIZE_256M);
for (j = 0; j < 2; j++) {
slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
@@ -283,7 +283,7 @@ static void beat_lpar_hpte_invalidate(unsigned long slot, unsigned long vpn,
DBG_LOW(" inval : slot=%lx, va=%016lx, psize: %d, local: %d\n",
slot, va, psize, local);
- want_v = hpte_encode_v(vpn, psize, MMU_SEGSIZE_256M);
+ want_v = hpte_encode_avpn(vpn, psize, MMU_SEGSIZE_256M);
raw_spin_lock_irqsave(&beat_htab_lock, flags);
dummy1 = beat_lpar_hpte_getword0(slot);
@@ -372,7 +372,7 @@ static long beat_lpar_hpte_updatepp_v3(unsigned long slot,
unsigned long want_v;
unsigned long pss;
- want_v = hpte_encode_v(vpn, psize, MMU_SEGSIZE_256M);
+ want_v = hpte_encode_avpn(vpn, psize, MMU_SEGSIZE_256M);
pss = (psize == MMU_PAGE_4K) ? -1UL : mmu_psize_defs[psize].penc;
DBG_LOW(" update: "
@@ -402,7 +402,7 @@ static void beat_lpar_hpte_invalidate_v3(unsigned long slot, unsigned long vpn,
DBG_LOW(" inval : slot=%lx, vpn=%016lx, psize: %d, local: %d\n",
slot, vpn, psize, local);
- want_v = hpte_encode_v(vpn, psize, MMU_SEGSIZE_256M);
+ want_v = hpte_encode_avpn(vpn, psize, MMU_SEGSIZE_256M);
pss = (psize == MMU_PAGE_4K) ? -1UL : mmu_psize_defs[psize].penc;
lpar_rc = beat_invalidate_htab_entry3(0, slot, want_v, pss);
diff --git a/arch/powerpc/platforms/ps3/htab.c b/arch/powerpc/platforms/ps3/htab.c
index d00d7b0..07a4bba 100644
--- a/arch/powerpc/platforms/ps3/htab.c
+++ b/arch/powerpc/platforms/ps3/htab.c
@@ -115,7 +115,7 @@ static long ps3_hpte_updatepp(unsigned long slot, unsigned long newpp,
unsigned long flags;
long ret;
- want_v = hpte_encode_v(vpn, psize, ssize);
+ want_v = hpte_encode_avpn(vpn, psize, ssize);
spin_lock_irqsave(&ps3_htab_lock, flags);
--
1.7.10
^ permalink raw reply related
* [PATCH] Handling of IRQ in MPC8xx GPIO
From: Christophe Leroy @ 2013-02-21 16:32 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Vitaly Bordug,
Marcelo Tosatti, Thomas Gleixner
Cc: linuxppc-dev, linux-kernel
This patch allows IRQs to be used to signal GPIO status changes on the MPC8xx
CPM IO ports. This in turn allows IRQs to be associated with GPIOs in the Device Tree. Example:
CPM1_PIO_C: gpio-controller@960 {
	#gpio-cells = <2>;
	compatible = "fsl,cpm1-pario-bank-c";
	reg = <0x960 0x10>;
	interrupts = <255 255 255 255 1 2 6 9 10 11 14 15 23 24 26 31>;
	interrupt-parent = <&CPM_PIC>;
	gpio-controller;
};
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
diff -ur linux-3.7.9/arch/powerpc/include/asm/cpm1.h linux/arch/powerpc/include/asm/cpm1.h
--- linux-3.7.9/arch/powerpc/include/asm/cpm1.h 2013-02-17 19:53:32.000000000 +0100
+++ linux/arch/powerpc/include/asm/cpm1.h 2012-11-03 03:18:35.000000000 +0100
@@ -560,6 +560,8 @@
#define CPM_PIN_SECONDARY 2
#define CPM_PIN_GPIO 4
#define CPM_PIN_OPENDRAIN 8
+#define CPM_PIN_FALLEDGE 16
+#define CPM_PIN_ANYEDGE 0
enum cpm_port {
CPM_PORTA,
diff -ur linux-3.7.9/arch/powerpc/sysdev/cpm1.c linux/arch/powerpc/sysdev/cpm1.c
--- linux-3.7.9/arch/powerpc/sysdev/cpm1.c 2013-02-17 19:53:32.000000000 +0100
+++ linux/arch/powerpc/sysdev/cpm1.c 2013-02-21 15:52:51.000000000 +0100
@@ -375,6 +375,10 @@
setbits16(&iop->odr_sor, pin);
else
clrbits16(&iop->odr_sor, pin);
+ if (flags & CPM_PIN_FALLEDGE)
+ setbits16(&iop->intr, pin);
+ else
+ clrbits16(&iop->intr, pin);
}
}
@@ -526,6 +530,9 @@
/* shadowed data register to clear/set bits safely */
u16 cpdata;
+
+ /* IRQ associated with Pins when relevant */
+ int irq[16];
};
static inline struct cpm1_gpio16_chip *
@@ -581,6 +588,30 @@
spin_unlock_irqrestore(&cpm1_gc->lock, flags);
}
+static int __cpm1_gpio16_to_irq(struct of_mm_gpio_chip *mm_gc,
+ unsigned int gpio)
+{
+ struct cpm1_gpio16_chip *cpm1_gc = to_cpm1_gpio16_chip(mm_gc);
+
+ return cpm1_gc->irq[gpio] ? cpm1_gc->irq[gpio] : -ENXIO;
+}
+
+static int cpm1_gpio16_to_irq(struct gpio_chip *gc, unsigned int gpio)
+{
+ struct of_mm_gpio_chip *mm_gc = to_of_mm_gpio_chip(gc);
+ struct cpm1_gpio16_chip *cpm1_gc = to_cpm1_gpio16_chip(mm_gc);
+ unsigned long flags;
+ int ret;
+
+ spin_lock_irqsave(&cpm1_gc->lock, flags);
+
+ ret = __cpm1_gpio16_to_irq(mm_gc, gpio);
+
+ spin_unlock_irqrestore(&cpm1_gc->lock, flags);
+
+ return ret;
+}
+
static int cpm1_gpio16_dir_out(struct gpio_chip *gc, unsigned int gpio, int val)
{
struct of_mm_gpio_chip *mm_gc = to_of_mm_gpio_chip(gc);
@@ -621,6 +652,7 @@
struct cpm1_gpio16_chip *cpm1_gc;
struct of_mm_gpio_chip *mm_gc;
struct gpio_chip *gc;
+ int i;
cpm1_gc = kzalloc(sizeof(*cpm1_gc), GFP_KERNEL);
if (!cpm1_gc)
@@ -628,6 +660,9 @@
spin_lock_init(&cpm1_gc->lock);
+ for (i = 0; i < 16; i++)
+ cpm1_gc->irq[i] = irq_of_parse_and_map(np, i);
+
mm_gc = &cpm1_gc->mm_gc;
gc = &mm_gc->gc;
@@ -637,6 +672,7 @@
gc->direction_output = cpm1_gpio16_dir_out;
gc->get = cpm1_gpio16_get;
gc->set = cpm1_gpio16_set;
+ gc->to_irq = cpm1_gpio16_to_irq;
return of_mm_gpiochip_add(np, mm_gc);
}
diff -ur linux-3.7.9/kernel/irq/irqdomain.c linux/kernel/irq/irqdomain.c
--- linux-3.7.9/kernel/irq/irqdomain.c 2013-02-17 19:53:32.000000000 +0100
+++ linux/kernel/irq/irqdomain.c 2012-12-13 19:52:38.000000000 +0100
@@ -763,7 +763,8 @@
BUG_ON(domain->revmap_type != IRQ_DOMAIN_MAP_LINEAR);
/* Check revmap bounds; complain if exceeded */
- if (WARN_ON(hwirq >= domain->revmap_data.linear.size))
+ /* 255 is a trick to allow UNDEF value in DTS */
+ if (hwirq == 255 || WARN_ON(hwirq >= domain->revmap_data.linear.size))
return 0;
return domain->revmap_data.linear.revmap[hwirq];
^ permalink raw reply
* Re: PS3: Strange issue with kexec and FreeBSD loader
From: Phileas Fogg @ 2013-02-21 20:38 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev
In-Reply-To: <1361406741.4676.44.camel@pasglop>
Benjamin Herrenschmidt wrote:
> On Wed, 2013-02-20 at 21:43 +0100, Phileas Fogg wrote:
>
>> I found the single commit which breaks kexec stuff for FreeBSD loader or other
>> custom ELF kernels on the PS3 console.
>>
>>
>> From 7230c5644188cd9e3fb380cc97dde00c464a3ba7 Mon Sep 17 00:00:00 2001
>> From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>> Date: Tue, 6 Mar 2012 18:27:59 +1100
>> Subject: [PATCH] powerpc: Rework lazy-interrupt handling
>
> Odd... That rework had its own issues and so several patches went in
> subsequently to address them. It's possible that the PS3 does more
> horrid stuff we missed here but I don't quite see how to relate that to
> your specific memory corruption problem...
>
> Do you see any "pattern" to the corruption ? Does it look like
> something known ? I.e., exception frame, ASCII data, MSR values, ...
>
> Ben.
>
>
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
>
Hi,
here is some data for analysis.
First, I modified kexec-tools and dumped the kernel and DT segments before they
are passed to the kexec_load syscall. I also modified the purgatory code and
made it dump the computed SHA256 checksum, the original SHA256 checksum and
the DT.
Here is the output from kexec-tools:
--------------------------------------
root@ps3-linux:~# kexec -l loader.ps3
segment[0].mem:0x1371000 memsz:262144
segment[1].mem:0x13b1000 memsz:36864
segment[2].mem:0x7fff000 memsz:4096
sha256_digest: 66 a6 c0 be d5 3c ba c2 85 6 97 4 d2 e1 aa 28 63 fa 7f 79 ce de
e7 7f 26 14 a1 fa 2a ea bc 83
Here is the output from the purgatory code:
---------------------------------------------
I'm in purgatory
sha256 digests do not match :(
digest: d4 dc 50 0a ef 78 8e 28 e0 9a fe 52 e1 72 1c b3 23 a6 f4 ea 40
7a 2d fd 6b 2a 66 95 63 f6 99 2a
sha256_digest: 66 a6 c0 be d5 3c ba c2 85 06 97 04 d2 e1 aa 28 63 fa 7f 79 ce
de e7 7f 26 14 a1 fa 2a ea bc 83
sha256_regions:
start=0x0000000001371000 len=0x0000000000040000
start=0x0000000007fff000 len=0x0000000000001000
Here is the DT dump from kexec-tools:
---------------------------------------
00000000 d0 0d fe ed 00 00 03 70 00 00 00 40 00 00 02 74 |.......p...@...t|
00000010 00 00 00 20 00 00 00 02 00 00 00 02 00 00 00 00 |... ............|
00000020 00 00 00 00 07 ff f0 00 00 00 00 00 00 00 03 70 |...............p|
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000040 00 00 00 01 2f 00 00 00 00 00 00 03 00 00 00 04 |..../...........|
00000050 00 00 00 00 00 00 00 02 00 00 00 03 00 00 00 04 |................|
00000060 00 00 00 0f 00 00 00 02 00 00 00 03 00 00 00 09 |................|
00000070 00 00 00 1b 00 00 00 00 73 6f 6e 79 2c 70 73 33 |........sony,ps3|
00000080 00 00 00 00 00 00 00 03 00 00 00 04 00 00 00 26 |...............&|
00000090 00 00 00 00 00 00 00 03 00 00 00 08 00 00 00 39 |...............9|
000000a0 00 00 00 00 38 6d 43 80 00 00 00 03 00 00 00 08 |....8mC.........|
000000b0 00 00 00 48 00 00 00 00 53 6f 6e 79 50 53 33 00 |...H....SonyPS3.|
000000c0 00 00 00 03 00 00 00 01 00 00 00 4e 00 00 00 00 |...........N....|
000000d0 00 00 00 01 2f 63 68 6f 73 65 6e 00 00 00 00 03 |..../chosen.....|
000000e0 00 00 00 08 00 00 00 53 00 00 00 00 00 00 00 00 |.......S........|
000000f0 00 00 00 03 00 00 00 07 00 00 00 4e 63 68 6f 73 |...........Nchos|
00000100 65 6e 00 00 00 00 00 03 00 00 00 02 00 00 00 66 |en.............f|
00000110 20 00 00 00 00 00 00 02 00 00 00 01 2f 63 70 75 | .........../cpu|
00000120 73 00 00 00 00 00 00 03 00 00 00 04 00 00 00 00 |s...............|
00000130 00 00 00 01 00 00 00 03 00 00 00 04 00 00 00 0f |................|
00000140 00 00 00 00 00 00 00 03 00 00 00 05 00 00 00 4e |...............N|
00000150 63 70 75 73 00 00 00 00 00 00 00 01 2f 63 70 75 |cpus......../cpu|
00000160 73 2f 63 70 75 40 30 00 00 00 00 03 00 00 00 04 |s/cpu@0.........|
00000170 00 00 00 6f 00 00 00 00 00 00 00 03 00 00 00 04 |...o............|
00000180 00 00 00 7f 00 00 00 80 00 00 00 03 00 00 00 04 |................|
00000190 00 00 00 91 00 00 80 00 00 00 00 03 00 00 00 04 |................|
000001a0 00 00 00 9e 63 70 75 00 00 00 00 03 00 00 00 04 |....cpu.........|
000001b0 00 00 00 aa 00 00 00 80 00 00 00 03 00 00 00 04 |................|
000001c0 00 00 00 bc 00 00 80 00 00 00 00 03 00 00 00 08 |................|
000001d0 00 00 00 c9 00 00 00 00 00 00 00 00 00 00 00 01 |................|
000001e0 00 00 00 03 00 00 00 04 00 00 00 4e 63 70 75 00 |...........Ncpu.|
000001f0 00 00 00 03 00 00 00 04 00 00 00 e4 00 00 00 00 |................|
00000200 00 00 00 03 00 00 00 04 00 00 00 e8 00 00 00 00 |................|
00000210 00 00 00 02 00 00 00 02 00 00 00 01 2f 6d 65 6d |............/mem|
00000220 6f 72 79 00 00 00 00 03 00 00 00 07 00 00 00 9e |ory.............|
00000230 6d 65 6d 6f 72 79 00 00 00 00 00 03 00 00 00 07 |memory..........|
00000240 00 00 00 4e 6d 65 6d 6f 72 79 00 00 00 00 00 03 |...Nmemory......|
00000250 00 00 00 10 00 00 00 e4 00 00 00 00 00 00 00 00 |................|
00000260 00 00 00 00 08 00 00 00 00 00 00 02 00 00 00 02 |................|
00000270 00 00 00 09 23 61 64 64 72 65 73 73 2d 63 65 6c |....#address-cel|
00000280 6c 73 00 23 73 69 7a 65 2d 63 65 6c 6c 73 00 63 |ls.#size-cells.c|
00000290 6f 6d 70 61 74 69 62 6c 65 00 6c 69 6e 75 78 2c |ompatible.linux,|
000002a0 61 76 5f 6d 75 6c 74 69 5f 6f 75 74 00 6c 69 6e |av_multi_out.lin|
000002b0 75 78 2c 72 74 63 5f 64 69 66 66 00 6d 6f 64 65 |ux,rtc_diff.mode|
000002c0 6c 00 6e 61 6d 65 00 6c 69 6e 75 78 2c 6d 65 6d |l.name.linux,mem|
000002d0 6f 72 79 2d 6c 69 6d 69 74 00 62 6f 6f 74 61 72 |ory-limit.bootar|
000002e0 67 73 00 63 6c 6f 63 6b 2d 66 72 65 71 75 65 6e |gs.clock-frequen|
000002f0 63 79 00 64 2d 63 61 63 68 65 2d 6c 69 6e 65 2d |cy.d-cache-line-|
00000300 73 69 7a 65 00 64 2d 63 61 63 68 65 2d 73 69 7a |size.d-cache-siz|
00000310 65 00 64 65 76 69 63 65 5f 74 79 70 65 00 69 2d |e.device_type.i-|
00000320 63 61 63 68 65 2d 6c 69 6e 65 2d 73 69 7a 65 00 |cache-line-size.|
00000330 69 2d 63 61 63 68 65 2d 73 69 7a 65 00 69 62 6d |i-cache-size.ibm|
00000340 2c 70 70 63 2d 69 6e 74 65 72 72 75 70 74 2d 73 |,ppc-interrupt-s|
00000350 65 72 76 65 72 23 73 00 72 65 67 00 74 69 6d 65 |erver#s.reg.time|
00000360 62 61 73 65 2d 66 72 65 71 75 65 6e 63 79 00 00 |base-frequency..|
00000370
Here is the DT dump from the purgatory code after the verify function failed:
------------------------------------------------------------------------------
00000000 d0 0d fe ed 00 00 03 70 00 00 00 40 00 00 02 74 |.......p...@...t|
00000010 00 00 00 20 00 00 00 02 00 00 00 02 00 00 00 00 |... ............|
00000020 00 00 00 00 07 ff f0 00 00 00 00 00 00 00 03 70 |...............p|
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000040 00 00 00 01 2f 00 00 00 00 00 00 03 00 00 00 04 |..../...........|
00000050 00 00 00 00 00 00 00 02 00 00 00 03 00 00 00 04 |................|
00000060 00 00 00 0f 00 00 00 02 00 00 00 03 00 00 00 09 |................|
00000070 00 00 00 1b 00 00 00 00 73 6f 6e 79 2c 70 73 33 |........sony,ps3|
00000080 80 00 00 00 00 00 80 30 80 00 00 00 00 00 80 02 |.......0........|
00000090 c0 00 00 00 00 01 a4 a0 00 00 00 08 00 00 00 39 |...............9|
000000a0 00 00 00 00 38 6d 43 80 00 00 00 03 00 00 00 08 |....8mC.........|
000000b0 00 00 00 48 00 00 00 00 53 6f 6e 79 50 53 33 00 |...H....SonyPS3.|
000000c0 00 00 00 03 00 00 00 01 00 00 00 4e 00 00 00 00 |...........N....|
000000d0 00 00 00 01 2f 63 68 6f 73 65 6e 00 00 00 00 03 |..../chosen.....|
000000e0 00 00 00 08 00 00 00 53 00 00 00 00 00 00 00 00 |.......S........|
000000f0 00 00 00 03 00 00 00 07 00 00 00 4e 63 68 6f 73 |...........Nchos|
00000100 65 6e 00 00 00 00 00 03 00 00 00 02 00 00 00 66 |en.............f|
00000110 20 00 00 00 00 00 00 02 00 00 00 01 2f 63 70 75 | .........../cpu|
00000120 73 00 00 00 00 00 00 03 00 00 00 04 00 00 00 00 |s...............|
00000130 00 00 00 01 00 00 00 03 00 00 00 04 00 00 00 0f |................|
00000140 00 00 00 00 00 00 00 03 00 00 00 05 00 00 00 4e |...............N|
00000150 63 70 75 73 00 00 00 00 00 00 00 01 2f 63 70 75 |cpus......../cpu|
00000160 73 2f 63 70 75 40 30 00 00 00 00 03 00 00 00 04 |s/cpu@0.........|
00000170 00 00 00 6f 00 00 00 00 00 00 00 03 00 00 00 04 |...o............|
00000180 00 00 00 7f 00 00 00 80 00 00 00 03 00 00 00 04 |................|
00000190 00 00 00 91 00 00 80 00 00 00 00 03 00 00 00 04 |................|
000001a0 00 00 00 9e 63 70 75 00 00 00 00 03 00 00 00 04 |....cpu.........|
000001b0 00 00 00 aa 00 00 00 80 00 00 00 03 00 00 00 04 |................|
000001c0 00 00 00 bc 00 00 80 00 00 00 00 03 00 00 00 08 |................|
000001d0 00 00 00 c9 00 00 00 00 00 00 00 00 00 00 00 01 |................|
000001e0 00 00 00 03 00 00 00 04 00 00 00 4e 63 70 75 00 |...........Ncpu.|
000001f0 00 00 00 03 00 00 00 04 00 00 00 e4 00 00 00 00 |................|
00000200 00 00 00 03 00 00 00 04 00 00 00 e8 00 00 00 00 |................|
00000210 00 00 00 02 00 00 00 02 00 00 00 09 2f 6d 65 6d |............/mem|
00000220 6f 72 79 00 00 00 00 03 00 00 00 07 00 00 00 9e |ory.............|
00000230 6d 65 6d 6f 72 79 00 00 00 00 00 03 00 00 00 07 |memory..........|
00000240 00 00 00 4e 6d 65 6d 6f 72 79 00 00 00 00 00 03 |...Nmemory......|
00000250 00 00 00 10 00 00 00 e4 00 00 00 00 00 00 00 00 |................|
00000260 00 00 00 00 08 00 00 00 00 00 00 02 00 00 00 02 |................|
00000270 00 00 00 09 23 61 64 64 72 65 73 73 2d 63 65 6c |....#address-cel|
00000280 6c 73 00 23 73 69 7a 65 2d 63 65 6c 6c 73 00 63 |ls.#size-cells.c|
00000290 6f 6d 70 61 74 69 62 6c 65 00 6c 69 6e 75 78 2c |ompatible.linux,|
000002a0 61 76 5f 6d 75 6c 74 69 5f 6f 75 74 00 6c 69 6e |av_multi_out.lin|
000002b0 75 78 2c 72 74 63 5f 64 69 66 66 00 6d 6f 64 65 |ux,rtc_diff.mode|
000002c0 6c 00 6e 61 6d 65 00 6c 69 6e 75 78 2c 6d 65 6d |l.name.linux,mem|
000002d0 6f 72 79 2d 6c 69 6d 69 74 00 62 6f 6f 74 61 72 |ory-limit.bootar|
000002e0 67 73 00 63 6c 6f 63 6b 2d 66 72 65 71 75 65 6e |gs.clock-frequen|
000002f0 63 79 00 64 2d 63 61 63 68 65 2d 6c 69 6e 65 2d |cy.d-cache-line-|
00000300 73 69 7a 65 00 64 2d 63 61 63 68 65 2d 73 69 7a |size.d-cache-siz|
00000310 65 00 64 65 76 69 63 65 5f 74 79 70 65 00 69 2d |e.device_type.i-|
00000320 63 61 63 68 65 2d 6c 69 6e 65 2d 73 69 7a 65 00 |cache-line-size.|
00000330 69 2d 63 61 63 68 65 2d 73 69 7a 65 00 69 62 6d |i-cache-size.ibm|
00000340 2c 70 70 63 2d 69 6e 74 65 72 72 75 70 74 2d 73 |,ppc-interrupt-s|
00000350 65 72 76 65 72 23 73 00 72 65 67 00 74 69 6d 65 |erver#s.reg.time|
00000360 62 61 73 65 2d 66 72 65 71 75 65 6e 63 79 00 00 |base-frequency..|
00000370
And here is the diff between 2 hexdumps:
-----------------------------------------
--- dt.kexec.hex
+++ dt.dump.hex
@@ -6,8 +6,8 @@
00000050 00 00 00 00 00 00 00 02 00 00 00 03 00 00 00 04 |................|
00000060 00 00 00 0f 00 00 00 02 00 00 00 03 00 00 00 09 |................|
00000070 00 00 00 1b 00 00 00 00 73 6f 6e 79 2c 70 73 33 |........sony,ps3|
-00000080 00 00 00 00 00 00 00 03 00 00 00 04 00 00 00 26 |...............&|
-00000090 00 00 00 00 00 00 00 03 00 00 00 08 00 00 00 39 |...............9|
+00000080 80 00 00 00 00 00 80 30 80 00 00 00 00 00 80 02 |.......0........|
+00000090 c0 00 00 00 00 01 a4 a0 00 00 00 08 00 00 00 39 |...............9|
000000a0 00 00 00 00 38 6d 43 80 00 00 00 03 00 00 00 08 |....8mC.........|
000000b0 00 00 00 48 00 00 00 00 53 6f 6e 79 50 53 33 00 |...H....SonyPS3.|
000000c0 00 00 00 03 00 00 00 01 00 00 00 4e 00 00 00 00 |...........N....|
@@ -31,7 +31,7 @@
000001e0 00 00 00 03 00 00 00 04 00 00 00 4e 63 70 75 00 |...........Ncpu.|
000001f0 00 00 00 03 00 00 00 04 00 00 00 e4 00 00 00 00 |................|
00000200 00 00 00 03 00 00 00 04 00 00 00 e8 00 00 00 00 |................|
-00000210 00 00 00 02 00 00 00 02 00 00 00 01 2f 6d 65 6d |............/mem|
+00000210 00 00 00 02 00 00 00 02 00 00 00 09 2f 6d 65 6d |............/mem|
00000220 6f 72 79 00 00 00 00 03 00 00 00 07 00 00 00 9e |ory.............|
00000230 6d 65 6d 6f 72 79 00 00 00 00 00 03 00 00 00 07 |memory..........|
00000240 00 00 00 4e 6d 65 6d 6f 72 79 00 00 00 00 00 03 |...Nmemory......|
As you can see, the data differs at offsets 0x80, 0x90 and 0x210.
The new 8 bytes at offset 0x90 in dt.dump.hex look suspiciously like a kernel
virtual address: 0xc00000000001a4a0.
I'll try out Geoff's advice about the DABR register later and see if I can get
the address of the code which corrupts the data in the DT.
regards
^ permalink raw reply
* Re: PS3: Strange issue with kexec and FreeBSD loader
From: Benjamin Herrenschmidt @ 2013-02-21 20:35 UTC (permalink / raw)
To: Phileas Fogg; +Cc: linuxppc-dev
In-Reply-To: <512685B7.5080404@mail.ru>
On Thu, 2013-02-21 at 21:38 +0100, Phileas Fogg wrote:
> The new 8 bytes at offset 0x90 in dt.dump.hex look suspiciously like
> the kernel virtual address: 0xc00000000001a4a0.
It does indeed. What does that address correspond to in the kernel
text ? Can you disassemble around it with "objdump -D vmlinux" ?
Cheers,
Ben.
^ permalink raw reply
* Re: linux-next: manual merge of the signal tree with the powerpc tree
From: Michael Neuling @ 2013-02-21 20:43 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: Stephen Rothwell, linux-kernel, linux-next, Paul Mackerras,
Al Viro, linuxppc-dev
In-Reply-To: <1361425813.4676.47.camel@pasglop>
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
> On Thu, 2013-02-21 at 15:52 +1100, Stephen Rothwell wrote:
> > Hi Al,
> >
> > Today's linux-next merge of the signal tree got conflicts in
> > arch/powerpc/kernel/signal_32.c and arch/powerpc/kernel/signal_64.c
> > between commit 2b0a576d15e0 ("powerpc: Add new transactional memory state
> > to the signal context") from the powerpc tree and commit 7cce246557bf
> > ("powerpc: switch to generic sigaltstack") from the signal tree.
> >
> > I fixed it up (I think - see below) and can carry the fix as necessary
> > (no action is required).
>
> Mikey, can you check everything's all right ?
>
> I'm happy to wait for Al stuff to go in first & fixup the conflict
> before I send the pull request to Linus. I'm off travelling around but I
> should be able to get stuff out this week-end.
The merge looks fine to me. My TM signal tests still pass on
next-20130221.
Thanks sfr!
Mikey
^ permalink raw reply
* Re: PS3: Strange issue with kexec and FreeBSD loader
From: Phileas Fogg @ 2013-02-21 21:44 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev
In-Reply-To: <1361478942.4676.53.camel@pasglop>
Benjamin Herrenschmidt wrote:
> On Thu, 2013-02-21 at 21:38 +0100, Phileas Fogg wrote:
>> The new 8 bytes at offset 0x90 in dt.dump.hex look suspiciously like
>> the kernel virtual address: 0xc00000000001a4a0.
>
> It does indeed. What does that address correspond to in the kernel
> text ? Can you disassemble around it with "objdump -D vmlinux" ?
>
> Cheers,
> Ben.
Here.
I used the OpenWRT ELF for testing and it's stripped.
Then I compiled Linux 3.8 myself and didn't strip it.
The addresses differ between the two, but the code is the same and
it is kexec code :)
Stripped OpenWRT image:
------------------------
c00000000001a474: 48 00 00 05 bl 0xc00000000001a478
c00000000001a478: 7c a8 02 a6 mflr r5
c00000000001a47c: 38 a5 00 1c addi r5,r5,28
c00000000001a480: 7c 21 0b 78 mr r1,r1
c00000000001a484: 80 85 00 00 lwz r4,0(r5)
c00000000001a488: 2c 04 00 00 cmpwi r4,0
c00000000001a48c: 40 82 00 62 bnea- 0x60
c00000000001a490: 4b ff ff f0 b 0xc00000000001a480
c00000000001a494: 00 00 00 00 .long 0x0
c00000000001a498: a0 6d 00 48 lhz r3,72(r13)
c00000000001a49c: 48 00 00 11 bl 0xc00000000001a4ac
c00000000001a4a0: 38 80 00 02 li r4,2 <-------- !!!
c00000000001a4a4: 98 8d 00 4b stb r4,75(r13)
c00000000001a4a8: 4b ff ff cc b 0xc00000000001a474
c00000000001a4ac: 39 20 00 02 li r9,2
c00000000001a4b0: 39 40 00 30 li r10,48
c00000000001a4b4: 7d 68 02 a6 mflr r11
c00000000001a4b8: 7d 80 00 a6 mfmsr r12
c00000000001a4bc: 7d 89 48 78 andc r9,r12,r9
c00000000001a4c0: 7d 8a 50 78 andc r10,r12,r10
c00000000001a4c4: 7d 21 01 64 mtmsrd r9,1
Unstripped Linux 3.8 kernel:
-----------------------------
c00000000001c02c <.kexec_wait>:
c00000000001c02c: 48 00 00 05 bl c00000000001c030 <.kexec_wait+0x4>
c00000000001c030: 7c a8 02 a6 mflr r5
c00000000001c034: 38 a5 00 1c addi r5,r5,28
c00000000001c038: 7c 21 0b 78 mr r1,r1
c00000000001c03c: 80 85 00 00 lwz r4,0(r5)
c00000000001c040: 2c 04 00 00 cmpwi r4,0
c00000000001c044: 40 82 00 62 bnea- 60 <reloc_start+0x60>
c00000000001c048: 4b ff ff f0 b c00000000001c038 <.kexec_wait+0xc>
c00000000001c04c <kexec_flag>:
c00000000001c04c: 00 00 00 00 .long 0x0
c00000000001c050 <.kexec_smp_wait>:
c00000000001c050: a0 6d 00 48 lhz r3,72(r13)
c00000000001c054: 48 00 00 11 bl c00000000001c064 <real_mode>
c00000000001c058: 38 80 00 02 li r4,2 <---------- !!!
c00000000001c05c: 98 8d 00 4b stb r4,75(r13)
c00000000001c060: 4b ff ff cc b c00000000001c02c <.kexec_wait>
c00000000001c064 <real_mode>:
c00000000001c064: 39 20 00 02 li r9,2
c00000000001c068: 39 40 00 30 li r10,48
regards
^ permalink raw reply
* Re: PS3: Strange issue with kexec and FreeBSD loader
From: Phileas Fogg @ 2013-02-21 22:06 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev
In-Reply-To: <1361478942.4676.53.camel@pasglop>
Benjamin Herrenschmidt wrote:
> On Thu, 2013-02-21 at 21:38 +0100, Phileas Fogg wrote:
>> The new 8 bytes at offset 0x90 in dt.dump.hex look suspicously like
>> the kernel virtual address: 0xc00000000001a4a0.
>
> It does indeed. What does that address correspond to in the kernel
> text ? Can you disassemble around it with "objdump -D vmlinux" ?
>
> Cheers,
> Ben.
Does it look like the new data at offsets 0x80 and 0x88 in the DT are the MSR flags
MSR_DR, MSR_IR and MSR_EE?
^ permalink raw reply
* Re: linux-next: manual merge of the signal tree with the powerpc tree
From: Stephen Rothwell @ 2013-02-21 21:30 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: Michael Neuling, linux-kernel, linux-next, Paul Mackerras,
Al Viro, linuxppc-dev
In-Reply-To: <27231.1361479429@ale.ozlabs.ibm.com>
[-- Attachment #1: Type: text/plain, Size: 1325 bytes --]
Hi Ben,
On Thu, 21 Feb 2013 14:43:49 -0600 Michael Neuling <mikey@neuling.org> wrote:
>
> Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
>
> > On Thu, 2013-02-21 at 15:52 +1100, Stephen Rothwell wrote:
> > >
> > > Today's linux-next merge of the signal tree got conflicts in
> > > arch/powerpc/kernel/signal_32.c and arch/powerpc/kernel/signal_64.c
> > > between commit 2b0a576d15e0 ("powerpc: Add new transactional memory state
> > > to the signal context") from the powerpc tree and commit 7cce246557bf
> > > ("powerpc: switch to generic sigaltstack") from the signal tree.
> > >
> > > I fixed it up (I think - see below) and can carry the fix as necessary
> > > (no action is required).
> >
> > Mikey, can you check everything's all right ?
> >
> > I'm happy to wait for Al stuff to go in first & fixup the conflict
> > before I send the pull request to Linus. I'm off travelling around but I
> > should be able to get stuff out this week-end.
>
> The merge looks fine to me. My TM signal tests still pass on
> next-20130221.
I think all you (or Al) need do is mention it to Linus when you send the
pull request - he is usually smart enough to fix these things :-) and
likes to see the interactions.
--
Cheers,
Stephen Rothwell sfr@canb.auug.org.au
[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]
^ permalink raw reply
* [patch 1/2] mm: remove free_area_cache use in powerpc architecture
From: akpm @ 2013-02-21 23:05 UTC (permalink / raw)
To: benh; +Cc: paulus, akpm, walken, linuxppc-dev, riel
From: Michel Lespinasse <walken@google.com>
Subject: mm: remove free_area_cache use in powerpc architecture
As all other architectures have been converted to use vm_unmapped_area(),
we are about to retire the free_area_cache.
This change simply removes the use of that cache in
slice_get_unmapped_area(), which will most certainly have a
performance cost. Next one will convert that function to use the
vm_unmapped_area() infrastructure and regain the performance.
Signed-off-by: Michel Lespinasse <walken@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
arch/powerpc/include/asm/page_64.h | 3
arch/powerpc/mm/hugetlbpage.c | 2
arch/powerpc/mm/slice.c | 108 +++------------------
arch/powerpc/platforms/cell/spufs/file.c | 2
4 files changed, 22 insertions(+), 93 deletions(-)
diff -puN arch/powerpc/include/asm/page_64.h~mm-remove-free_area_cache-use-in-powerpc-architecture arch/powerpc/include/asm/page_64.h
--- a/arch/powerpc/include/asm/page_64.h~mm-remove-free_area_cache-use-in-powerpc-architecture
+++ a/arch/powerpc/include/asm/page_64.h
@@ -99,8 +99,7 @@ extern unsigned long slice_get_unmapped_
unsigned long len,
unsigned long flags,
unsigned int psize,
- int topdown,
- int use_cache);
+ int topdown);
extern unsigned int get_slice_psize(struct mm_struct *mm,
unsigned long addr);
diff -puN arch/powerpc/mm/hugetlbpage.c~mm-remove-free_area_cache-use-in-powerpc-architecture arch/powerpc/mm/hugetlbpage.c
--- a/arch/powerpc/mm/hugetlbpage.c~mm-remove-free_area_cache-use-in-powerpc-architecture
+++ a/arch/powerpc/mm/hugetlbpage.c
@@ -742,7 +742,7 @@ unsigned long hugetlb_get_unmapped_area(
struct hstate *hstate = hstate_file(file);
int mmu_psize = shift_to_mmu_psize(huge_page_shift(hstate));
- return slice_get_unmapped_area(addr, len, flags, mmu_psize, 1, 0);
+ return slice_get_unmapped_area(addr, len, flags, mmu_psize, 1);
}
#endif
diff -puN arch/powerpc/mm/slice.c~mm-remove-free_area_cache-use-in-powerpc-architecture arch/powerpc/mm/slice.c
--- a/arch/powerpc/mm/slice.c~mm-remove-free_area_cache-use-in-powerpc-architecture
+++ a/arch/powerpc/mm/slice.c
@@ -240,23 +240,15 @@ static void slice_convert(struct mm_stru
static unsigned long slice_find_area_bottomup(struct mm_struct *mm,
unsigned long len,
struct slice_mask available,
- int psize, int use_cache)
+ int psize)
{
struct vm_area_struct *vma;
- unsigned long start_addr, addr;
+ unsigned long addr;
struct slice_mask mask;
int pshift = max_t(int, mmu_psize_defs[psize].shift, PAGE_SHIFT);
- if (use_cache) {
- if (len <= mm->cached_hole_size) {
- start_addr = addr = TASK_UNMAPPED_BASE;
- mm->cached_hole_size = 0;
- } else
- start_addr = addr = mm->free_area_cache;
- } else
- start_addr = addr = TASK_UNMAPPED_BASE;
+ addr = TASK_UNMAPPED_BASE;
-full_search:
for (;;) {
addr = _ALIGN_UP(addr, 1ul << pshift);
if ((TASK_SIZE - len) < addr)
@@ -272,63 +264,24 @@ full_search:
addr = _ALIGN_UP(addr + 1, 1ul << SLICE_HIGH_SHIFT);
continue;
}
- if (!vma || addr + len <= vma->vm_start) {
- /*
- * Remember the place where we stopped the search:
- */
- if (use_cache)
- mm->free_area_cache = addr + len;
+ if (!vma || addr + len <= vma->vm_start)
return addr;
- }
- if (use_cache && (addr + mm->cached_hole_size) < vma->vm_start)
- mm->cached_hole_size = vma->vm_start - addr;
addr = vma->vm_end;
}
- /* Make sure we didn't miss any holes */
- if (use_cache && start_addr != TASK_UNMAPPED_BASE) {
- start_addr = addr = TASK_UNMAPPED_BASE;
- mm->cached_hole_size = 0;
- goto full_search;
- }
return -ENOMEM;
}
static unsigned long slice_find_area_topdown(struct mm_struct *mm,
unsigned long len,
struct slice_mask available,
- int psize, int use_cache)
+ int psize)
{
struct vm_area_struct *vma;
unsigned long addr;
struct slice_mask mask;
int pshift = max_t(int, mmu_psize_defs[psize].shift, PAGE_SHIFT);
- /* check if free_area_cache is useful for us */
- if (use_cache) {
- if (len <= mm->cached_hole_size) {
- mm->cached_hole_size = 0;
- mm->free_area_cache = mm->mmap_base;
- }
-
- /* either no address requested or can't fit in requested
- * address hole
- */
- addr = mm->free_area_cache;
-
- /* make sure it can fit in the remaining address space */
- if (addr > len) {
- addr = _ALIGN_DOWN(addr - len, 1ul << pshift);
- mask = slice_range_to_mask(addr, len);
- if (slice_check_fit(mask, available) &&
- slice_area_is_free(mm, addr, len))
- /* remember the address as a hint for
- * next time
- */
- return (mm->free_area_cache = addr);
- }
- }
-
addr = mm->mmap_base;
while (addr > len) {
/* Go down by chunk size */
@@ -352,16 +305,8 @@ static unsigned long slice_find_area_top
* return with success:
*/
vma = find_vma(mm, addr);
- if (!vma || (addr + len) <= vma->vm_start) {
- /* remember the address as a hint for next time */
- if (use_cache)
- mm->free_area_cache = addr;
+ if (!vma || (addr + len) <= vma->vm_start)
return addr;
- }
-
- /* remember the largest hole we saw so far */
- if (use_cache && (addr + mm->cached_hole_size) < vma->vm_start)
- mm->cached_hole_size = vma->vm_start - addr;
/* try just below the current vma->vm_start */
addr = vma->vm_start;
@@ -373,28 +318,18 @@ static unsigned long slice_find_area_top
* can happen with large stack limits and large mmap()
* allocations.
*/
- addr = slice_find_area_bottomup(mm, len, available, psize, 0);
-
- /*
- * Restore the topdown base:
- */
- if (use_cache) {
- mm->free_area_cache = mm->mmap_base;
- mm->cached_hole_size = ~0UL;
- }
-
- return addr;
+ return slice_find_area_bottomup(mm, len, available, psize);
}
static unsigned long slice_find_area(struct mm_struct *mm, unsigned long len,
struct slice_mask mask, int psize,
- int topdown, int use_cache)
+ int topdown)
{
if (topdown)
- return slice_find_area_topdown(mm, len, mask, psize, use_cache);
+ return slice_find_area_topdown(mm, len, mask, psize);
else
- return slice_find_area_bottomup(mm, len, mask, psize, use_cache);
+ return slice_find_area_bottomup(mm, len, mask, psize);
}
#define or_mask(dst, src) do { \
@@ -415,7 +350,7 @@ static unsigned long slice_find_area(str
unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
unsigned long flags, unsigned int psize,
- int topdown, int use_cache)
+ int topdown)
{
struct slice_mask mask = {0, 0};
struct slice_mask good_mask;
@@ -430,8 +365,8 @@ unsigned long slice_get_unmapped_area(un
BUG_ON(mm->task_size == 0);
slice_dbg("slice_get_unmapped_area(mm=%p, psize=%d...\n", mm, psize);
- slice_dbg(" addr=%lx, len=%lx, flags=%lx, topdown=%d, use_cache=%d\n",
- addr, len, flags, topdown, use_cache);
+ slice_dbg(" addr=%lx, len=%lx, flags=%lx, topdown=%d\n",
+ addr, len, flags, topdown);
if (len > mm->task_size)
return -ENOMEM;
@@ -503,8 +438,7 @@ unsigned long slice_get_unmapped_area(un
/* Now let's see if we can find something in the existing
* slices for that size
*/
- newaddr = slice_find_area(mm, len, good_mask, psize, topdown,
- use_cache);
+ newaddr = slice_find_area(mm, len, good_mask, psize, topdown);
if (newaddr != -ENOMEM) {
/* Found within the good mask, we don't have to setup,
* we thus return directly
@@ -536,8 +470,7 @@ unsigned long slice_get_unmapped_area(un
* anywhere in the good area.
*/
if (addr) {
- addr = slice_find_area(mm, len, good_mask, psize, topdown,
- use_cache);
+ addr = slice_find_area(mm, len, good_mask, psize, topdown);
if (addr != -ENOMEM) {
slice_dbg(" found area at 0x%lx\n", addr);
return addr;
@@ -547,15 +480,14 @@ unsigned long slice_get_unmapped_area(un
/* Now let's see if we can find something in the existing slices
* for that size plus free slices
*/
- addr = slice_find_area(mm, len, potential_mask, psize, topdown,
- use_cache);
+ addr = slice_find_area(mm, len, potential_mask, psize, topdown);
#ifdef CONFIG_PPC_64K_PAGES
if (addr == -ENOMEM && psize == MMU_PAGE_64K) {
/* retry the search with 4k-page slices included */
or_mask(potential_mask, compat_mask);
addr = slice_find_area(mm, len, potential_mask, psize,
- topdown, use_cache);
+ topdown);
}
#endif
@@ -586,8 +518,7 @@ unsigned long arch_get_unmapped_area(str
unsigned long flags)
{
return slice_get_unmapped_area(addr, len, flags,
- current->mm->context.user_psize,
- 0, 1);
+ current->mm->context.user_psize, 0);
}
unsigned long arch_get_unmapped_area_topdown(struct file *filp,
@@ -597,8 +528,7 @@ unsigned long arch_get_unmapped_area_top
const unsigned long flags)
{
return slice_get_unmapped_area(addr0, len, flags,
- current->mm->context.user_psize,
- 1, 1);
+ current->mm->context.user_psize, 1);
}
unsigned int get_slice_psize(struct mm_struct *mm, unsigned long addr)
diff -puN arch/powerpc/platforms/cell/spufs/file.c~mm-remove-free_area_cache-use-in-powerpc-architecture arch/powerpc/platforms/cell/spufs/file.c
--- a/arch/powerpc/platforms/cell/spufs/file.c~mm-remove-free_area_cache-use-in-powerpc-architecture
+++ a/arch/powerpc/platforms/cell/spufs/file.c
@@ -352,7 +352,7 @@ static unsigned long spufs_get_unmapped_
/* Else, try to obtain a 64K pages slice */
return slice_get_unmapped_area(addr, len, flags,
- MMU_PAGE_64K, 1, 0);
+ MMU_PAGE_64K, 1);
}
#endif /* CONFIG_SPU_FS_64K_LS */
_
^ permalink raw reply
* [patch 2/2] mm: use vm_unmapped_area() on powerpc architecture
From: akpm @ 2013-02-21 23:05 UTC (permalink / raw)
To: benh; +Cc: paulus, akpm, walken, linuxppc-dev
From: Michel Lespinasse <walken@google.com>
Subject: mm: use vm_unmapped_area() on powerpc architecture
Update the powerpc slice_get_unmapped_area function to make use of
vm_unmapped_area() instead of implementing a brute force search.
Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
arch/powerpc/mm/slice.c | 123 ++++++++++++++++++++++++--------------
1 file changed, 78 insertions(+), 45 deletions(-)
diff -puN arch/powerpc/mm/slice.c~mm-use-vm_unmapped_area-on-powerpc-architecture arch/powerpc/mm/slice.c
--- a/arch/powerpc/mm/slice.c~mm-use-vm_unmapped_area-on-powerpc-architecture
+++ a/arch/powerpc/mm/slice.c
@@ -237,36 +237,69 @@ static void slice_convert(struct mm_stru
#endif
}
+/*
+ * Compute which slice addr is part of;
+ * set *boundary_addr to the start or end boundary of that slice
+ * (depending on 'end' parameter);
+ * return boolean indicating if the slice is marked as available in the
+ * 'available' slice_mask.
+ */
+static bool slice_scan_available(unsigned long addr,
+ struct slice_mask available,
+ int end,
+ unsigned long *boundary_addr)
+{
+ unsigned long slice;
+ if (addr < SLICE_LOW_TOP) {
+ slice = GET_LOW_SLICE_INDEX(addr);
+ *boundary_addr = (slice + end) << SLICE_LOW_SHIFT;
+ return !!(available.low_slices & (1u << slice));
+ } else {
+ slice = GET_HIGH_SLICE_INDEX(addr);
+ *boundary_addr = (slice + end) ?
+ ((slice + end) << SLICE_HIGH_SHIFT) : SLICE_LOW_TOP;
+ return !!(available.high_slices & (1u << slice));
+ }
+}
+
static unsigned long slice_find_area_bottomup(struct mm_struct *mm,
unsigned long len,
struct slice_mask available,
int psize)
{
- struct vm_area_struct *vma;
- unsigned long addr;
- struct slice_mask mask;
int pshift = max_t(int, mmu_psize_defs[psize].shift, PAGE_SHIFT);
+ unsigned long addr, found, next_end;
+ struct vm_unmapped_area_info info;
- addr = TASK_UNMAPPED_BASE;
-
- for (;;) {
- addr = _ALIGN_UP(addr, 1ul << pshift);
- if ((TASK_SIZE - len) < addr)
- break;
- vma = find_vma(mm, addr);
- BUG_ON(vma && (addr >= vma->vm_end));
+ info.flags = 0;
+ info.length = len;
+ info.align_mask = PAGE_MASK & ((1ul << pshift) - 1);
+ info.align_offset = 0;
- mask = slice_range_to_mask(addr, len);
- if (!slice_check_fit(mask, available)) {
- if (addr < SLICE_LOW_TOP)
- addr = _ALIGN_UP(addr + 1, 1ul << SLICE_LOW_SHIFT);
- else
- addr = _ALIGN_UP(addr + 1, 1ul << SLICE_HIGH_SHIFT);
+ addr = TASK_UNMAPPED_BASE;
+ while (addr < TASK_SIZE) {
+ info.low_limit = addr;
+ if (!slice_scan_available(addr, available, 1, &addr))
continue;
+
+ next_slice:
+ /*
+ * At this point [info.low_limit; addr) covers
+ * available slices only and ends at a slice boundary.
+ * Check if we need to reduce the range, or if we can
+ * extend it to cover the next available slice.
+ */
+ if (addr >= TASK_SIZE)
+ addr = TASK_SIZE;
+ else if (slice_scan_available(addr, available, 1, &next_end)) {
+ addr = next_end;
+ goto next_slice;
}
- if (!vma || addr + len <= vma->vm_start)
- return addr;
- addr = vma->vm_end;
+ info.high_limit = addr;
+
+ found = vm_unmapped_area(&info);
+ if (!(found & ~PAGE_MASK))
+ return found;
}
return -ENOMEM;
@@ -277,39 +310,39 @@ static unsigned long slice_find_area_top
struct slice_mask available,
int psize)
{
- struct vm_area_struct *vma;
- unsigned long addr;
- struct slice_mask mask;
int pshift = max_t(int, mmu_psize_defs[psize].shift, PAGE_SHIFT);
+ unsigned long addr, found, prev;
+ struct vm_unmapped_area_info info;
- addr = mm->mmap_base;
- while (addr > len) {
- /* Go down by chunk size */
- addr = _ALIGN_DOWN(addr - len, 1ul << pshift);
+ info.flags = VM_UNMAPPED_AREA_TOPDOWN;
+ info.length = len;
+ info.align_mask = PAGE_MASK & ((1ul << pshift) - 1);
+ info.align_offset = 0;
- /* Check for hit with different page size */
- mask = slice_range_to_mask(addr, len);
- if (!slice_check_fit(mask, available)) {
- if (addr < SLICE_LOW_TOP)
- addr = _ALIGN_DOWN(addr, 1ul << SLICE_LOW_SHIFT);
- else if (addr < (1ul << SLICE_HIGH_SHIFT))
- addr = SLICE_LOW_TOP;
- else
- addr = _ALIGN_DOWN(addr, 1ul << SLICE_HIGH_SHIFT);
+ addr = mm->mmap_base;
+ while (addr > PAGE_SIZE) {
+ info.high_limit = addr;
+ if (!slice_scan_available(addr - 1, available, 0, &addr))
continue;
- }
+ prev_slice:
/*
- * Lookup failure means no vma is above this address,
- * else if new region fits below vma->vm_start,
- * return with success:
+ * At this point [addr; info.high_limit) covers
+ * available slices only and starts at a slice boundary.
+ * Check if we need to reduce the range, or if we can
+ * extend it to cover the previous available slice.
*/
- vma = find_vma(mm, addr);
- if (!vma || (addr + len) <= vma->vm_start)
- return addr;
+ if (addr < PAGE_SIZE)
+ addr = PAGE_SIZE;
+ else if (slice_scan_available(addr - 1, available, 0, &prev)) {
+ addr = prev;
+ goto prev_slice;
+ }
+ info.low_limit = addr;
- /* try just below the current vma->vm_start */
- addr = vma->vm_start;
+ found = vm_unmapped_area(&info);
+ if (!(found & ~PAGE_MASK))
+ return found;
}
/*
_
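The range-building step in the bottom-up loop above (skip unavailable slices, then keep extending the window while the next slice is also available) can be sketched in isolation. This is a simplified user-space illustration, not the kernel code: it assumes a single flat bitmask with uniform slices, and struct slice_run and next_available_run() are hypothetical names that do not exist in arch/powerpc/mm/slice.c.

```c
#include <assert.h>
#include <stdint.h>

/* One contiguous run of available slices, as slice indices [start, end). */
struct slice_run {
	unsigned int start;
	unsigned int end;
};

/*
 * Find the first run of consecutive set bits in 'available' at or after
 * slice index 'from'. Returns 1 and fills *run, or 0 if no available
 * slice remains. Hypothetical helper for illustration only.
 */
static int next_available_run(uint32_t available, unsigned int nr_slices,
			      unsigned int from, struct slice_run *run)
{
	unsigned int i = from;

	assert(nr_slices <= 32);

	/* Skip slices that are not marked available. */
	while (i < nr_slices && !(available & (1u << i)))
		i++;
	if (i >= nr_slices)
		return 0;
	run->start = i;

	/* Extend the run across every adjacent available slice. */
	while (i < nr_slices && (available & (1u << i)))
		i++;
	run->end = i;
	return 1;
}
```

Each run found this way corresponds to one [low_limit, high_limit) window that would be handed to a single vm_unmapped_area() call; the real code additionally distinguishes low and high slices and clamps the window to TASK_SIZE.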
^ permalink raw reply
* Re: PS3: Strange issue with kexec and FreeBSD loader
From: Benjamin Herrenschmidt @ 2013-02-21 23:46 UTC (permalink / raw)
To: Phileas Fogg; +Cc: linuxppc-dev
In-Reply-To: <5126955B.9070808@mail.ru>
On Thu, 2013-02-21 at 22:44 +0100, Phileas Fogg wrote:
> Stripped OpenWRT image:
> ------------------------
>
> c00000000001a474: 48 00 00 05 bl 0xc00000000001a478
> c00000000001a478: 7c a8 02 a6 mflr r5
> c00000000001a47c: 38 a5 00 1c addi r5,r5,28
> c00000000001a480: 7c 21 0b 78 mr r1,r1
> c00000000001a484: 80 85 00 00 lwz r4,0(r5)
> c00000000001a488: 2c 04 00 00 cmpwi r4,0
> c00000000001a48c: 40 82 00 62 bnea- 0x60
> c00000000001a490: 4b ff ff f0 b 0xc00000000001a480
> c00000000001a494: 00 00 00 00 .long 0x0
> c00000000001a498: a0 6d 00 48 lhz r3,72(r13)
> c00000000001a49c: 48 00 00 11 bl 0xc00000000001a4ac
Smells like a bad stack pointer to me...
One thing I noticed is that kexec doesn't seem to hard disable
interrupts, which is ... fishy at best. It should do that
before it switches stacks around. Dunno if that's the cause
of the problem but it might be worth adding a hard_irq_disable()
after all the local_irq_disable(), making sure we are hard
disabled before going into asm.
Cheers,
Ben.
^ permalink raw reply
* Re: PS3: Strange issue with kexec and FreeBSD loader
From: Benjamin Herrenschmidt @ 2013-02-21 23:47 UTC (permalink / raw)
To: Phileas Fogg; +Cc: linuxppc-dev
In-Reply-To: <51269A4B.1020501@mail.ru>
On Thu, 2013-02-21 at 23:06 +0100, Phileas Fogg wrote:
> Does it look like the new data at offset 0x80 and 0x88 in DT are MSR
> flags
> MSR_DR, MSR_IR and MSR_EE ?
Yes, that looks plausible though I would have expected ME to be set as
well ... Or it could be a CCR value. But it does look like something
splattered the DT as if it was a stack... ie, bad r1 value.
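
For reference, a minimal sketch (not from the thread) of how a suspect doubleword could be checked against the MSR flags discussed here. The masks are the architected 64-bit PowerPC values; msr_decode() and its flag table are hypothetical helper names, and the actual values found at offsets 0x80 and 0x88 were not posted, so none are assumed below.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Architected MSR bit masks for 64-bit PowerPC. */
#define MSR_EE 0x8000ULL /* external interrupt enable */
#define MSR_ME 0x1000ULL /* machine check enable */
#define MSR_IR 0x0020ULL /* instruction relocation (MMU on for fetches) */
#define MSR_DR 0x0010ULL /* data relocation (MMU on for loads/stores) */
#define MSR_RI 0x0002ULL /* recoverable interrupt */

struct msr_flag {
	uint64_t mask;
	const char *name;
};

static const struct msr_flag msr_flags[] = {
	{ MSR_EE, "EE" }, { MSR_ME, "ME" }, { MSR_IR, "IR" },
	{ MSR_DR, "DR" }, { MSR_RI, "RI" },
};

/*
 * Write the names of the known flags set in 'val' into 'buf',
 * separated by spaces, and return 'buf'. Output is truncated
 * if 'buf' is too small. Hypothetical helper for illustration.
 */
static char *msr_decode(uint64_t val, char *buf, size_t len)
{
	size_t i;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (i = 0; i < sizeof(msr_flags) / sizeof(msr_flags[0]); i++) {
		size_t used = strlen(buf);

		if (!(val & msr_flags[i].mask))
			continue;
		snprintf(buf + used, len - used, "%s%s",
			 used ? " " : "", msr_flags[i].name);
	}
	return buf;
}
```

With these masks, msr_decode(0x8030, buf, sizeof(buf)) gives "EE IR DR", i.e. a value with ME clear. The same masks also explain the constants in real_mode from the earlier disassembly: `li r10,48` is MSR_IR | MSR_DR and `li r9,2` is MSR_RI.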
Cheers,
Ben.
^ permalink raw reply
* Re: [PATCH v6 00/46] CPU hotplug: stop_machine()-free CPU hotplug
From: Rusty Russell @ 2013-02-22 0:31 UTC (permalink / raw)
To: Srivatsa S. Bhat, tglx, peterz, tj, oleg, paulmck, mingo, akpm,
namhyung
Cc: linux-arch, linux, nikunj, linux-pm, fweisbec, linux-doc,
linux-kernel, rostedt, xiaoguangrong, rjw, sbw, wangyun,
srivatsa.bhat, netdev, vincent.guittot, walken, linuxppc-dev,
linux-arm-kernel
In-Reply-To: <20130218123714.26245.61816.stgit@srivatsabhat.in.ibm.com>
"Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> writes:
> Hi,
>
> This patchset removes CPU hotplug's dependence on stop_machine() from the CPU
> offline path and provides an alternative (set of APIs) to preempt_disable() to
> prevent CPUs from going offline, which can be invoked from atomic context.
> The motivation behind the removal of stop_machine() is to avoid its ill-effects
> and thus improve the design of CPU hotplug. (More description regarding this
> is available in the patches).
If you're doing a v7, please put your benchmark results somewhere!
The obvious place is in patch 44/46.
Thanks,
Rusty.
^ permalink raw reply
* Re: [RFC PATCH -V2 05/21] powerpc: Reduce PTE table memory wastage
From: David Gibson @ 2013-02-22 0:32 UTC (permalink / raw)
To: Aneesh Kumar K.V; +Cc: paulus, linuxppc-dev, linux-mm
In-Reply-To: <1361465248-10867-6-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
[-- Attachment #1: Type: text/plain, Size: 1013 bytes --]
On Thu, Feb 21, 2013 at 10:17:12PM +0530, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>
> We now have PTE page consuming only 2K of the 64K page.This is in order to
> facilitate transparent huge page support, which works much better if our PMDs
> cover 16MB instead of 256MB.
>
> Inorder to reduce the wastage, we now have multiple PTE page fragment
> from the same PTE page.
This needs a much better description of what you're doing here to
manage the allocations. It's certainly not easy to figure out from
the code.
[snip]
> +#ifdef CONFIG_PPC_64K_PAGES
> +typedef pte_t *pgtable_t;
> +#else
> typedef struct page *pgtable_t;
> +#endif
This looks really bogus. A pgtable_t is a pointer to PTEs on 64K, but
a pointer to a struct page on 4k.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
^ permalink raw reply
* [PATCH] powerpc: Remove unused postfix parameter to DEFINE_BITOP()
From: Michael Ellerman @ 2013-02-22 3:25 UTC (permalink / raw)
To: linuxppc-dev
None of the users of DEFINE_BITOP pass a postfix, and as far as I can
tell none ever did, so drop it.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
---
arch/powerpc/include/asm/bitops.h | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/arch/powerpc/include/asm/bitops.h b/arch/powerpc/include/asm/bitops.h
index ef918a2..810c5fc 100644
--- a/arch/powerpc/include/asm/bitops.h
+++ b/arch/powerpc/include/asm/bitops.h
@@ -55,7 +55,7 @@
#define BITOP_LE_SWIZZLE ((BITS_PER_LONG-1) & ~0x7)
/* Macro for generating the ***_bits() functions */
-#define DEFINE_BITOP(fn, op, prefix, postfix) \
+#define DEFINE_BITOP(fn, op, prefix) \
static __inline__ void fn(unsigned long mask, \
volatile unsigned long *_p) \
{ \
@@ -68,16 +68,15 @@ static __inline__ void fn(unsigned long mask, \
PPC405_ERR77(0,%3) \
PPC_STLCX "%0,0,%3\n" \
"bne- 1b\n" \
- postfix \
: "=&r" (old), "+m" (*p) \
: "r" (mask), "r" (p) \
: "cc", "memory"); \
}
-DEFINE_BITOP(set_bits, or, "", "")
-DEFINE_BITOP(clear_bits, andc, "", "")
-DEFINE_BITOP(clear_bits_unlock, andc, PPC_RELEASE_BARRIER, "")
-DEFINE_BITOP(change_bits, xor, "", "")
+DEFINE_BITOP(set_bits, or, "")
+DEFINE_BITOP(clear_bits, andc, "")
+DEFINE_BITOP(clear_bits_unlock, andc, PPC_RELEASE_BARRIER)
+DEFINE_BITOP(change_bits, xor, "")
static __inline__ void set_bit(int nr, volatile unsigned long *addr)
{
--
1.7.10.4
^ permalink raw reply related
* Re: [RFC PATCH -V2 01/21] powerpc: Use signed formatting when printing error
From: Paul Mackerras @ 2013-02-22 5:00 UTC (permalink / raw)
To: Aneesh Kumar K.V; +Cc: linuxppc-dev, linux-mm
In-Reply-To: <1361465248-10867-2-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
On Thu, Feb 21, 2013 at 10:17:08PM +0530, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>
> PAPR define these errors as negative values. So print them accordingly
^ defines
> for easy debugging.
>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
^ permalink raw reply
* Re: [RFC PATCH -V2 02/21] powerpc: Save DAR and DSISR in pt_regs on MCE
From: Paul Mackerras @ 2013-02-22 5:03 UTC (permalink / raw)
To: Aneesh Kumar K.V; +Cc: linuxppc-dev, linux-mm
In-Reply-To: <1361465248-10867-3-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
On Thu, Feb 21, 2013 at 10:17:09PM +0530, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>
> We were not saving DAR and DSISR on MCE. Save then and also print the values
> along with exception details in xmon.
The one reservation I have about this is that xmon will now be
printing bogus values on 32-bit and embedded processors. However, it
seems 32-bit doesn't set regs->dar on a DSI (300) interrupt either.
So:
Acked-by: Paul Mackerras <paulus@samba.org>
^ permalink raw reply
* Re: [RFC PATCH -V2 03/21] powerpc: Don't hard code the size of pte page
From: Paul Mackerras @ 2013-02-22 5:06 UTC (permalink / raw)
To: Aneesh Kumar K.V; +Cc: linuxppc-dev, linux-mm
In-Reply-To: <1361465248-10867-4-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
On Thu, Feb 21, 2013 at 10:17:10PM +0530, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>
> USE PTRS_PER_PTE to indicate the size of pte page.
>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> powerpc: Don't hard code the size of pte page
>
> USE PTRS_PER_PTE to indicate the size of pte page.
>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Description and signoff are duplicated. Description could be more
informative, for example - why would we want to do this?
> +/*
> + * hidx is in the second half of the page table. We use the
> + * 8 bytes per each pte entry.
The casual reader probably wouldn't know what "hidx" is. The comment
needs at least to use a better name than "hidx".
Paul.
^ permalink raw reply
* Re: [RFC PATCH -V2 04/21] powerpc: Reduce the PTE_INDEX_SIZE
From: Paul Mackerras @ 2013-02-22 5:07 UTC (permalink / raw)
To: Aneesh Kumar K.V; +Cc: linuxppc-dev, linux-mm
In-Reply-To: <1361465248-10867-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
On Thu, Feb 21, 2013 at 10:17:11PM +0530, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>
> This make one PMD cover 16MB range. That helps in easier implementation of THP
> on power. THP core code make use of one pmd entry to track the huge page and
> the range mapped by a single pmd entry should be equal to the huge page size
> supported by the hardware.
>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
^ permalink raw reply
* Re: [RFC PATCH -V2 05/21] powerpc: Reduce PTE table memory wastage
From: Aneesh Kumar K.V @ 2013-02-22 5:14 UTC (permalink / raw)
To: David Gibson; +Cc: paulus, linuxppc-dev, linux-mm
In-Reply-To: <20130222003235.GJ21011@truffula.fritz.box>
David Gibson <david@gibson.dropbear.id.au> writes:
> On Thu, Feb 21, 2013 at 10:17:12PM +0530, Aneesh Kumar K.V wrote:
>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>>
>> We now have PTE page consuming only 2K of the 64K page.This is in order to
>> facilitate transparent huge page support, which works much better if our PMDs
>> cover 16MB instead of 256MB.
>>
>> Inorder to reduce the wastage, we now have multiple PTE page fragment
>> from the same PTE page.
>
> This needs a much better description of what you're doing here to
> manage the allocations. It's certainly not easy to figure out from
> the code.
I will add a more detailed description in the commit message.
We allocate one page for the last level of the Linux page table. With THP
and a large page size of 16MB, that would mean we would be wasting a large
part of that page. To map a 16MB area, we only need a PTE space of 2K with
a 64K page size. This patch reduces the space wastage by sharing the page
allocated for the last level of the Linux page table among multiple pmd
entries. We call these smaller chunks PTE page fragments, and the allocated
page a PTE page. We use page->_mapcount as a bitmap to indicate which
PTE fragments are free.
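
The sizing and the bitmap idea can be sketched in plain user-space C as below. This is only an illustration under stated assumptions, not the patch's code: the SKETCH_* constants and frag_*() helpers are hypothetical names, a 4K fragment (2K of PTEs plus an equal-sized hash index half, as raised in review) is assumed, a plain uint32_t stands in for page->_mapcount, and atomics, locking and the RCU half of the mask are left out entirely.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE  (64u * 1024)         /* 64K base page */
#define SKETCH_PMD_RANGE  (16u * 1024 * 1024)  /* 16MB mapped per PMD entry */
#define SKETCH_PTE_BYTES  8u                   /* bytes per PTE */

/* 16MB / 64K = 256 PTEs, times 8 bytes = 2K of PTE space per PMD entry. */
#define SKETCH_PTE_SPACE \
	((SKETCH_PMD_RANGE / SKETCH_PAGE_SIZE) * SKETCH_PTE_BYTES)

/* Assumption: each fragment also carries a hash index half, so 4K total. */
#define SKETCH_FRAG_SIZE  (2u * SKETCH_PTE_SPACE)
#define SKETCH_NR_FRAGS   (SKETCH_PAGE_SIZE / SKETCH_FRAG_SIZE)

/*
 * Claim the lowest free fragment in 'map' (bit set = in use).
 * Returns the fragment index, or -1 if the page is fully allocated.
 */
static int frag_alloc(uint32_t *map)
{
	unsigned int i;

	for (i = 0; i < SKETCH_NR_FRAGS; i++) {
		uint32_t bit = 1u << i;

		if (!(*map & bit)) {
			*map |= bit;
			return (int)i;
		}
	}
	return -1;
}

/*
 * Release fragment 'idx'. Returns nonzero when no fragment is left in
 * use, i.e. when the backing page itself could be freed.
 */
static int frag_free(uint32_t *map, unsigned int idx)
{
	assert(idx < SKETCH_NR_FRAGS);
	assert(*map & (1u << idx));
	*map &= ~(1u << idx);
	return *map == 0;
}

/* Byte offset of fragment 'idx' within its PTE page. */
static unsigned int frag_offset(unsigned int idx)
{
	return idx * SKETCH_FRAG_SIZE;
}
```

Under these assumptions a 64K page holds 16 fragments; with a 2K fragment it would hold 32, so the right constant depends on whether the hash index half is counted.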
>
>
> [snip]
>> +#ifdef CONFIG_PPC_64K_PAGES
>> +typedef pte_t *pgtable_t;
>> +#else
>> typedef struct page *pgtable_t;
>> +#endif
>
> This looks really bogus. A pgtable_t is a pointer to PTEs on 64K, but
> a pointer to a struct page on 4k.
>
We enable all the above only with 64K Pages.
-aneesh
^ permalink raw reply
* Re: [RFC PATCH -V2 05/21] powerpc: Reduce PTE table memory wastage
From: Paul Mackerras @ 2013-02-22 5:23 UTC (permalink / raw)
To: Aneesh Kumar K.V; +Cc: linuxppc-dev, linux-mm
In-Reply-To: <1361465248-10867-6-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
On Thu, Feb 21, 2013 at 10:17:12PM +0530, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>
> We now have PTE page consuming only 2K of the 64K page.This is in order to
In fact the PTE page together with the hash table indexes occupies 4k,
doesn't it? The comments in the code are similarly confusing since
they talk about 2k but actually allocate 4k.
> facilitate transparent huge page support, which works much better if our PMDs
> cover 16MB instead of 256MB.
>
> Inorder to reduce the wastage, we now have multiple PTE page fragment
^ In order (two words)
> from the same PTE page.
A patch like this needs a more complete description and explanation
than you have given. For instance, you could mention that the code
that you're adding for the 32-bit and non-64k cases is just a copy of
the previously generic code from pgalloc.h (actually, this movement
might be something that could be split out as a separate patch).
Also, you should describe in outline how you keep a list of pages that
aren't fully allocated and have a bitmap of which 4k sections are in
use, and also how your scheme interacts with RCU.
[snip]
> +#ifdef CONFIG_PPC_64K_PAGES
> +/*
> + * we support 15 fragments per PTE page. This is limited by how many
Why only 15? Don't we get 16 fragments per page?
> + * bits we can pack in page->_mapcount. We use the first half for
> + * tracking the usage for rcu page table free.
What does "first" mean? The high half or the low half?
> +unsigned long *page_table_alloc(struct mm_struct *mm, unsigned long vmaddr)
> +{
> + struct page *page;
> + unsigned int mask, bit;
> + unsigned long *table;
> +
> + /* Allocate fragments of a 4K page as 1K/2K page table */
A 4k page? Do you mean a 64k page? And what does 1K have to do with
anything?
> +#ifdef CONFIG_SMP
> +static void __page_table_free_rcu(void *table)
> +{
> + unsigned int bit;
> + struct page *page;
> + /*
> + * this is a PTE page free 2K page table
> + * fragment of a 64K page.
> + */
> + page = virt_to_page(table);
> + bit = 1 << ((__pa(table) & ~PAGE_MASK) / PTE_FRAG_SIZE);
> + bit <<= FRAG_MASK_BITS;
> + /*
> + * clear the higher half and if nobody used the page in
> + * between, even lower half would be zero.
> + */
> + if (atomic_xor_bits(&page->_mapcount, bit) == 0) {
> + pgtable_page_dtor(page);
> + atomic_set(&page->_mapcount, -1);
> + __free_page(page);
> + }
> +}
> +
> +static void page_table_free_rcu(struct mmu_gather *tlb, unsigned long *table)
> +{
> + struct page *page;
> + struct mm_struct *mm;
> + unsigned int bit, mask;
> +
> + mm = tlb->mm;
> + /* Free 2K page table fragment of a 64K page */
> + page = virt_to_page(table);
> + bit = 1 << ((__pa(table) & ~PAGE_MASK) / PTE_FRAG_SIZE);
> + spin_lock(&mm->page_table_lock);
> + /*
> + * stash the actual mask in higher half, and clear the lower half
> + * and selectively, add remove from pgtable list
> + */
> + mask = atomic_xor_bits(&page->_mapcount, bit | (bit << FRAG_MASK_BITS));
> + if (!(mask & FRAG_MASK))
> + list_del(&page->lru);
> + else {
> + /*
> + * Add the page table page to pgtable_list so that
> + * the free fragment can be used by the next alloc
> + */
> + list_del_init(&page->lru);
> + list_add_tail(&page->lru, &mm->context.pgtable_list);
> + }
> + spin_unlock(&mm->page_table_lock);
> + tlb_remove_table(tlb, table);
> +}
This looks like you're allowing a fragment that is being freed to be
reallocated and used again during the grace period when we are waiting
for any references to the fragment to disappear. Doesn't that allow a
race where one CPU traversing the page table and using the fragment in
its old location in the tree could see a PTE created after the
fragment was reallocated? In other words, why is it safe to allow the
fragment to be used during the grace period? If it is safe, it at
least needs a comment explaining why.
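To make the concern concrete, here is a small user-space model of the
two-halves bookkeeping. It is only a sketch under stated assumptions: the
MODEL_* constants (16 fragments of 4k, 16 mask bits) and all model_*
helpers are hypothetical names, not taken from the patch, and
model_xor_bits() is a non-atomic stand-in that is assumed to return the
new value the way atomic_xor_bits() is used above. The last two helpers
contrast an allocator that looks only at the low half with one that also
treats a pending free as busy.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical values for illustration; not taken from the patch. */
#define MODEL_PAGE_SIZE  (64u * 1024u)
#define MODEL_FRAG_SIZE  (4u * 1024u)
#define MODEL_MASK_BITS  16u
#define MODEL_FRAG_MASK  ((1u << MODEL_MASK_BITS) - 1u)

/* Bit for the fragment that contains the given offset within the page. */
static uint32_t model_frag_bit(uint32_t offset)
{
	return 1u << ((offset & (MODEL_PAGE_SIZE - 1u)) / MODEL_FRAG_SIZE);
}

/* Non-atomic stand-in for atomic_xor_bits(): toggle bits, return new value. */
static uint32_t model_xor_bits(uint32_t *word, uint32_t bits)
{
	*word ^= bits;
	return *word;
}

/* Allocation: set the fragment's bit in the low half. */
static uint32_t model_alloc(uint32_t *word, uint32_t offset)
{
	return model_xor_bits(word, model_frag_bit(offset));
}

/* Start of an RCU free: clear the low bit and stash it in the high half. */
static uint32_t model_free_rcu(uint32_t *word, uint32_t offset)
{
	uint32_t bit = model_frag_bit(offset);

	return model_xor_bits(word, bit | (bit << MODEL_MASK_BITS));
}

/* After the grace period: clear the high bit; 0 means the page is unused. */
static uint32_t model_free_done(uint32_t *word, uint32_t offset)
{
	return model_xor_bits(word, model_frag_bit(offset) << MODEL_MASK_BITS);
}

/* An allocator that checks only the low half sees the fragment as free. */
static int model_low_half_free(uint32_t word, uint32_t offset)
{
	return !(word & MODEL_FRAG_MASK & model_frag_bit(offset));
}

/* An allocator that folds the high half in treats pending frees as busy. */
static int model_either_half_free(uint32_t word, uint32_t offset)
{
	uint32_t busy = (word | (word >> MODEL_MASK_BITS)) & MODEL_FRAG_MASK;

	return !(busy & model_frag_bit(offset));
}

/* Walk one fragment through alloc, RCU free, and the end of the grace period. */
static void model_demo(void)
{
	uint32_t word = 0;
	uint32_t off = 3u * MODEL_FRAG_SIZE;

	model_alloc(&word, off);
	model_free_rcu(&word, off);
	/* During the grace period the two checks disagree. */
	assert(model_low_half_free(word, off));
	assert(!model_either_half_free(word, off));
	assert(model_free_done(&word, off) == 0);
}
```

In this model, between model_free_rcu() and model_free_done() the
low-half check reports the fragment as available while the folded check
does not. Whether the allocation path in the patch behaves like the first
or the second is exactly what the comment should spell out.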
Paul.
* Re: [RFC PATCH -V2 06/21] powerpc: Add size argument to pgtable_cache_add
From: Paul Mackerras @ 2013-02-22 5:27 UTC (permalink / raw)
To: Aneesh Kumar K.V; +Cc: linuxppc-dev, linux-mm
In-Reply-To: <1361465248-10867-7-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
On Thu, Feb 21, 2013 at 10:17:13PM +0530, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>
> We will use this later with THP changes. With THP we want to create PMD with
> twice the size. The second half will be used to depoist pgtable, which will
^ deposit?
> carry the hpte hash index value
I'm not familiar with what "deposit" and "withdraw" mean in the THP
context. If you can find a way to make the patch description more
informative for people who are not completely familiar with THP
(without adding a full-blown description of THP, of course) that would
be good.
Paul.