* Re: [PATCH -V2 3/5] mm: Move change_prot_numa outside CONFIG_ARCH_USES_NUMA_PROT_NONE
From: Benjamin Herrenschmidt @ 2013-12-05 21:00 UTC (permalink / raw)
To: Rik van Riel; +Cc: linux-mm, linuxppc-dev, paulus, Aneesh Kumar K.V
In-Reply-To: <52A0B786.608@redhat.com>
On Thu, 2013-12-05 at 12:27 -0500, Rik van Riel wrote:
> However, it appears that since the code was #ifdefed
> like that, the called code was made generic enough,
> that change_prot_numa should actually work for
> everything.
>
> In other words:
>
> Reviewed-by: Rik van Riel <riel@redhat.com>
Ok thanks, that's what I needed. Do you have any objection of me merging
that change via the powerpc tree along with the corresponding powerpc
bits from Aneesh ?
The other option would be to have it in a topic branch that I pull from
you.
Cheers,
Ben.
^ permalink raw reply
* Re: [PATCH -V3] mm: Move change_prot_numa outside CONFIG_ARCH_USES_NUMA_PROT_NONE
From: Rik van Riel @ 2013-12-05 20:25 UTC (permalink / raw)
To: Aneesh Kumar K.V, benh, paulus, mgorman; +Cc: linux-mm, linuxppc-dev
In-Reply-To: <1386268702-30806-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
On 12/05/2013 01:38 PM, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>
> change_prot_numa should work even if _PAGE_NUMA != _PAGE_PROTNONE.
> On archs like ppc64 that don't use _PAGE_PROTNONE and also have
> a separate page table outside linux pagetable, we just need to
> make sure that when calling change_prot_numa we flush the
> hardware page table entry so that next page access result in a numa
> fault.
>
> We still need to make sure we use the numa faulting logic only
> when CONFIG_NUMA_BALANCING is set. This implies the migrate-on-fault
> (Lazy migration) via mbind will only work if CONFIG_NUMA_BALANCING
> is set.
>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
^ permalink raw reply
* [PATCH -V3] mm: Move change_prot_numa outside CONFIG_ARCH_USES_NUMA_PROT_NONE
From: Aneesh Kumar K.V @ 2013-12-05 18:38 UTC (permalink / raw)
To: benh, paulus, mgorman, riel; +Cc: linux-mm, linuxppc-dev, Aneesh Kumar K.V
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
change_prot_numa should work even if _PAGE_NUMA != _PAGE_PROTNONE.
On archs like ppc64 that don't use _PAGE_PROTNONE and also have
a separate page table outside linux pagetable, we just need to
make sure that when calling change_prot_numa we flush the
hardware page table entry so that next page access result in a numa
fault.
We still need to make sure we use the numa faulting logic only
when CONFIG_NUMA_BALANCING is set. This implies the migrate-on-fault
(Lazy migration) via mbind will only work if CONFIG_NUMA_BALANCING
is set.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
Previous discussion around the patch can be found at
http://article.gmane.org/gmane.linux.kernel.mm/109305
changes from V2:
* Move the numa faulting definition within CONFIG_NUMA_BALANCING
include/linux/mm.h | 2 +-
mm/mempolicy.c | 5 ++---
2 files changed, 3 insertions(+), 4 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1cedd000cf29..a7b4e310bf42 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1842,7 +1842,7 @@ static inline pgprot_t vm_get_page_prot(unsigned long vm_flags)
}
#endif
-#ifdef CONFIG_ARCH_USES_NUMA_PROT_NONE
+#ifdef CONFIG_NUMA_BALANCING
unsigned long change_prot_numa(struct vm_area_struct *vma,
unsigned long start, unsigned long end);
#endif
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index eca4a3129129..9f73b29d304d 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -613,7 +613,7 @@ static inline int queue_pages_pgd_range(struct vm_area_struct *vma,
return 0;
}
-#ifdef CONFIG_ARCH_USES_NUMA_PROT_NONE
+#ifdef CONFIG_NUMA_BALANCING
/*
* This is used to mark a range of virtual addresses to be inaccessible.
* These are later cleared by a NUMA hinting fault. Depending on these
@@ -627,7 +627,6 @@ unsigned long change_prot_numa(struct vm_area_struct *vma,
unsigned long addr, unsigned long end)
{
int nr_updated;
- BUILD_BUG_ON(_PAGE_NUMA != _PAGE_PROTNONE);
nr_updated = change_protection(vma, addr, end, vma->vm_page_prot, 0, 1);
if (nr_updated)
@@ -641,7 +640,7 @@ static unsigned long change_prot_numa(struct vm_area_struct *vma,
{
return 0;
}
-#endif /* CONFIG_ARCH_USES_NUMA_PROT_NONE */
+#endif /* CONFIG_NUMA_BALANCING */
/*
* Walk through page tables and collect pages to be migrated.
--
1.8.3.2
^ permalink raw reply related
* Re: [PATCH -V2 3/5] mm: Move change_prot_numa outside CONFIG_ARCH_USES_NUMA_PROT_NONE
From: Rik van Riel @ 2013-12-05 17:52 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Aneesh Kumar K.V
Cc: linux-mm, linuxppc-dev, paulus, Mel Gorman
In-Reply-To: <1386220835.21910.21.camel@pasglop>
On 12/05/2013 12:20 AM, Benjamin Herrenschmidt wrote:
> On Thu, 2013-12-05 at 10:48 +0530, Aneesh Kumar K.V wrote:
>>
>> Ok, I can move the changes below #ifdef CONFIG_NUMA_BALANCING ? We call
>> change_prot_numa from task_numa_work and queue_pages_range(). The later
>> may be an issue. So doing the below will help ?
>>
>> -#ifdef CONFIG_ARCH_USES_NUMA_PROT_NONE
>> +#ifdef CONFIG_NUMA_BALANCING
>
> I will defer to Mel and Rik (should we also CC Andrea ?)
It looks like manual numa binding can also use lazy
page migration, but I am not sure if that can happen
without CONFIG_NUMA_BALANCING, or if mbind always uses
MPOL_MF_STRICT...
^ permalink raw reply
* Re: [PATCH -V2 3/5] mm: Move change_prot_numa outside CONFIG_ARCH_USES_NUMA_PROT_NONE
From: Rik van Riel @ 2013-12-05 17:27 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Aneesh Kumar K.V; +Cc: linux-mm, paulus, linuxppc-dev
In-Reply-To: <1386126782.16703.137.camel@pasglop>
On 12/03/2013 10:13 PM, Benjamin Herrenschmidt wrote:
> On Mon, 2013-11-18 at 14:58 +0530, Aneesh Kumar K.V wrote:
>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>>
>> change_prot_numa should work even if _PAGE_NUMA != _PAGE_PROTNONE.
>> On archs like ppc64 that don't use _PAGE_PROTNONE and also have
>> a separate page table outside linux pagetable, we just need to
>> make sure that when calling change_prot_numa we flush the
>> hardware page table entry so that next page access result in a numa
>> fault.
>
> That patch doesn't look right...
At first glance, indeed...
> You are essentially making change_prot_numa() do whatever it does (which
> I don't completely understand) *for all architectures* now, whether they
> have CONFIG_ARCH_USES_NUMA_PROT_NONE or not ... So because you want that
> behaviour on powerpc book3s64, you change everybody.
However, it appears that since the code was #ifdefed
like that, the called code was made generic enough,
that change_prot_numa should actually work for
everything.
In other words:
Reviewed-by: Rik van Riel <riel@redhat.com>
^ permalink raw reply
* [PATCH] powerpc: Fix PTE page address mismatch in pgtable ctor/dtor
From: Hong H. Pham @ 2013-12-05 15:54 UTC (permalink / raw)
To: linux-rt-users, linuxppc-dev
Cc: Paul Mackerras, Hong H. Pham, Aneesh Kumar K.V, linux-stable
In pte_alloc_one(), pgtable_page_ctor() is passed an address that has
not been converted by page_address() to the newly allocated PTE page.
When the PTE is freed, __pte_free_tlb() calls pgtable_page_dtor()
with an address to the PTE page that has been converted by page_address().
The mismatch in the PTE's page address causes pgtable_page_dtor() to access
invalid memory, so resources for that PTE (such as the page lock) is not
properly cleaned up.
This bug was introduced by commit d614bb041209fd7cb5e4b35e11a7b2f6ee8f62b8
"powerpc: Move the pte free routines from common header".
On a preempt-rt kernel, a spinlock is dynamically allocated for each
PTE in pgtable_page_ctor(). When the PTE is freed, calling
pgtable_page_dtor() with a mismatched page address causes a memory leak,
as the pointer to the PTE's spinlock is bogus.
On mainline, there isn't any immediately obvious symptoms, but the
problem still exists here.
Fixes: d614bb041209fd7c "powerpc: Move the pte free routes from common header"
Cc: Paul Mackerras <paulus@samba.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: linux-stable <stable@vger.kernel.org> # v3.10+
Signed-off-by: Hong H. Pham <hong.pham@windriver.com>
---
arch/powerpc/include/asm/pgalloc-32.h | 2 +-
arch/powerpc/include/asm/pgalloc-64.h | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/include/asm/pgalloc-32.h b/arch/powerpc/include/asm/pgalloc-32.h
index 27b2386..7ff24f0 100644
--- a/arch/powerpc/include/asm/pgalloc-32.h
+++ b/arch/powerpc/include/asm/pgalloc-32.h
@@ -87,7 +87,7 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t table,
struct page *page = page_address(table);
tlb_flush_pgtable(tlb, address);
- pgtable_page_dtor(page);
+ pgtable_page_dtor(table);
pgtable_free_tlb(tlb, page, 0);
}
#endif /* _ASM_POWERPC_PGALLOC_32_H */
diff --git a/arch/powerpc/include/asm/pgalloc-64.h b/arch/powerpc/include/asm/pgalloc-64.h
index f65e27b..b187dc5 100644
--- a/arch/powerpc/include/asm/pgalloc-64.h
+++ b/arch/powerpc/include/asm/pgalloc-64.h
@@ -147,7 +147,7 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t table,
struct page *page = page_address(table);
tlb_flush_pgtable(tlb, address);
- pgtable_page_dtor(page);
+ pgtable_page_dtor(table);
pgtable_free_tlb(tlb, page, 0);
}
--
1.8.3.2
^ permalink raw reply related
* Re: [PATCH 3/3] powerpc: mm: optimize for the correctly placed page
From: Liu ping fan @ 2013-12-05 13:27 UTC (permalink / raw)
To: Aneesh Kumar K.V; +Cc: linuxppc-dev, Paul Mackerras
In-Reply-To: <877gbjva24.fsf@linux.vnet.ibm.com>
On Thu, Dec 5, 2013 at 6:58 PM, Aneesh Kumar K.V
<aneesh.kumar@linux.vnet.ibm.com> wrote:
> Liu Ping Fan <kernelfans@gmail.com> writes:
>
>> The period check of _PAGE_NUMA can probably trigger the check on
>> the correctly placed page. For this case, we can just insert hpte and
>> do fast exception return.
>
> I still don't understand why we need to handle numa faults in hash
> page ? Are you trying to optimize the code path ? If so can you explain
> the benefits ? Some numbers showing it is helping ?
>
When return from hash_page(), we will take fast_exc_return_irq, while
from do_page_fault(), we take ret_from_except.
As the "fast" implies that there are more complicated logic to sync
the interrupt states in ret_from_except, which cost much.
Do you think so?
>>
>> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>> ---
>> arch/powerpc/mm/hash_utils_64.c | 34 +++++++++++++++++++++++++++++++++-
>> 1 file changed, 33 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
>> index 9bf1195..735678c 100644
>> --- a/arch/powerpc/mm/hash_utils_64.c
>> +++ b/arch/powerpc/mm/hash_utils_64.c
>> @@ -965,6 +965,10 @@ int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
>> const struct cpumask *tmp;
>> int rc, user_region = 0, local = 0;
>> int psize, ssize;
>> + pte_t old, new;
>> + struct vm_area_struct *vma;
>> + int page_nid, target_nid;
>> + struct page *test_page;
>>
>> DBG_LOW("hash_page(ea=%016lx, access=%lx, trap=%lx\n",
>> ea, access, trap);
>> @@ -1033,12 +1037,40 @@ int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
>>
>> /* Get PTE and page size from page tables */
>> ptep = find_linux_pte_or_hugepte(pgdir, ea, &hugeshift);
>> - if (ptep == NULL || !pte_present(*ptep) || pte_numa(*ptep)) {
>> + if (ptep == NULL || !pte_present(*ptep)) {
>> DBG_LOW(" no PTE !\n");
>> rc = 1;
>> goto bail;
>> }
>>
>> + old = pte_val(*ptep);
>> + if (pte_numa(old)) {
>> + /* If fail to lock, let do_page_fault() to handle it */
>> + if (down_read_trylock(&mm->mmap_sem)) {
>
> hmm is that something we want to do in hash_page ?
>
Yes, the function has no relation with hash. But I think it depends on
whether it is worth to optimize or not.
Thanks and regards,
Pingfan
>> + vma = find_vma(mm, ea);
>> + up_read(&mm->mmap_sem);
>> + test_page = pte_page(old);
>> + page_nid = page_to_nid(test_page);
>> + target_nid = numa_migrate_prep(test_page, vma, ea,
>> + page_nid);
>> + if (target_nid < 0) {
>> + new = pte_mknonnuma(old);
>> + /* If ptep is modified under us,
>> + * just retry the access
>> + */
>> + if (unlikely(cmpxchg(ptep, old, new) != old)) {
>> + put_page(test_page);
>> + return 0;
>> + }
>> + put_page(test_page);
>> + }
>> + } else {
>> + put_page(test_page);
>> + rc = 1;
>> + goto bail;
>> + }
>> + }
>> +
>> /* Add _PAGE_PRESENT to the required access perm */
>> access |= _PAGE_PRESENT;
>>
>
> -aneesh
>
^ permalink raw reply
* [PATCH 6/9] powerpc/pci: Use dev_is_pci() to check whether it is pci device
From: Yijing Wang @ 2013-12-05 12:01 UTC (permalink / raw)
To: Bjorn Helgaas, Benjamin Herrenschmidt, Paul Mackerras
Cc: linux-pci, linux-kernel, Yijing Wang, Hanjun Guo, Scott Wood,
linuxppc-dev
Use PCI standard marco dev_is_pci() instead of directly compare
pci_bus_type to check whether it is pci device.
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
---
arch/powerpc/sysdev/fsl_pci.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
index 4dfd61d..7066e52 100644
--- a/arch/powerpc/sysdev/fsl_pci.c
+++ b/arch/powerpc/sysdev/fsl_pci.c
@@ -122,7 +122,7 @@ static int fsl_pci_dma_set_mask(struct device *dev, u64 dma_mask)
* address width of the SoC such that we can address any internal
* SoC address from across PCI if needed
*/
- if ((dev->bus == &pci_bus_type) &&
+ if ((dev_is_pci(dev)) &&
dma_mask >= DMA_BIT_MASK(MAX_PHYS_ADDR_BITS)) {
set_dma_ops(dev, &dma_direct_ops);
set_dma_offset(dev, pci64_dma_offset);
--
1.7.1
^ permalink raw reply related
* Re: [PATCH 3/3] powerpc: mm: optimize for the correctly placed page
From: Aneesh Kumar K.V @ 2013-12-05 10:58 UTC (permalink / raw)
To: Liu Ping Fan, linuxppc-dev; +Cc: Paul Mackerras
In-Reply-To: <1386140348-7854-4-git-send-email-pingfank@linux.vnet.ibm.com>
Liu Ping Fan <kernelfans@gmail.com> writes:
> The period check of _PAGE_NUMA can probably trigger the check on
> the correctly placed page. For this case, we can just insert hpte and
> do fast exception return.
I still don't understand why we need to handle numa faults in hash
page ? Are you trying to optimize the code path ? If so can you explain
the benefits ? Some numbers showing it is helping ?
>
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> ---
> arch/powerpc/mm/hash_utils_64.c | 34 +++++++++++++++++++++++++++++++++-
> 1 file changed, 33 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
> index 9bf1195..735678c 100644
> --- a/arch/powerpc/mm/hash_utils_64.c
> +++ b/arch/powerpc/mm/hash_utils_64.c
> @@ -965,6 +965,10 @@ int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
> const struct cpumask *tmp;
> int rc, user_region = 0, local = 0;
> int psize, ssize;
> + pte_t old, new;
> + struct vm_area_struct *vma;
> + int page_nid, target_nid;
> + struct page *test_page;
>
> DBG_LOW("hash_page(ea=%016lx, access=%lx, trap=%lx\n",
> ea, access, trap);
> @@ -1033,12 +1037,40 @@ int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
>
> /* Get PTE and page size from page tables */
> ptep = find_linux_pte_or_hugepte(pgdir, ea, &hugeshift);
> - if (ptep == NULL || !pte_present(*ptep) || pte_numa(*ptep)) {
> + if (ptep == NULL || !pte_present(*ptep)) {
> DBG_LOW(" no PTE !\n");
> rc = 1;
> goto bail;
> }
>
> + old = pte_val(*ptep);
> + if (pte_numa(old)) {
> + /* If fail to lock, let do_page_fault() to handle it */
> + if (down_read_trylock(&mm->mmap_sem)) {
hmm is that something we want to do in hash_page ?
> + vma = find_vma(mm, ea);
> + up_read(&mm->mmap_sem);
> + test_page = pte_page(old);
> + page_nid = page_to_nid(test_page);
> + target_nid = numa_migrate_prep(test_page, vma, ea,
> + page_nid);
> + if (target_nid < 0) {
> + new = pte_mknonnuma(old);
> + /* If ptep is modified under us,
> + * just retry the access
> + */
> + if (unlikely(cmpxchg(ptep, old, new) != old)) {
> + put_page(test_page);
> + return 0;
> + }
> + put_page(test_page);
> + }
> + } else {
> + put_page(test_page);
> + rc = 1;
> + goto bail;
> + }
> + }
> +
> /* Add _PAGE_PRESENT to the required access perm */
> access |= _PAGE_PRESENT;
>
-aneesh
^ permalink raw reply
* RE: Error in frreing hugepages with preemption enabled
From: Bharat Bhushan @ 2013-12-05 11:14 UTC (permalink / raw)
To: Andrea Arcangeli, Alexander Graf
Cc: Scott Wood, linuxppc-dev@lists.ozlabs.org, kvm@vger.kernel.org,
kvm-ppc@vger.kernel.org
In-Reply-To: <20131203222121.GB18764@redhat.com>
> -----Original Message-----
> From: Andrea Arcangeli [mailto:aarcange@redhat.com]
> Sent: Wednesday, December 04, 2013 3:51 AM
> To: Alexander Graf
> Cc: Bhushan Bharat-R65777; linuxppc-dev@lists.ozlabs.org; kvm-
> ppc@vger.kernel.org; kvm@vger.kernel.org; Wood Scott-B07421; Ben Herrensc=
hmidt
> Subject: Re: Error in frreing hugepages with preemption enabled
>=20
> Hi everyone,
>=20
> On Fri, Nov 29, 2013 at 12:13:03PM +0100, Alexander Graf wrote:
> >
> > On 29.11.2013, at 05:38, Bharat Bhushan <Bharat.Bhushan@freescale.com> =
wrote:
> >
> > > Hi Alex,
> > >
> > > I am running KVM guest with host kernel having CONFIG_PREEMPT enabled=
. With
> allocated pages things seems to work fine but I uses hugepages for guest =
I see
> below prints when "quit" from qemu.
> > >
> > > (qemu) QEMU waiting for connection on: telnet:0.0.0.0:4444,server
> > > qemu-system-ppc64: pci_add_option_rom: failed to find romfile "efi-
> virtio.rom"
> > > q
> > > debug_smp_processor_id: 15 callbacks suppressed
> > > BUG: using smp_processor_id() in preemptible [00000000] code:
> > > qemu-system-ppc/2504 caller is .free_hugepd_range+0xb0/0x21c
> > > CPU: 1 PID: 2504 Comm: qemu-system-ppc Not tainted
> > > 3.12.0-rc3-07733-gabf4907 #175 Call Trace:
> > > [c0000000fb433400] [c000000000007d38] .show_stack+0x7c/0x1cc
> > > (unreliable) [c0000000fb4334d0] [c0000000005e8ce0]
> > > .dump_stack+0x9c/0xf4 [c0000000fb433560] [c0000000002de5ec]
> > > .debug_smp_processor_id+0x108/0x11c
> > > [c0000000fb4335f0] [c000000000025e10] .free_hugepd_range+0xb0/0x21c
> > > [c0000000fb433680] [c0000000000265bc]
> > > .hugetlb_free_pgd_range+0x2c8/0x3b0
> > > [c0000000fb4337a0] [c0000000000e428c] .free_pgtables+0x14c/0x158
> > > [c0000000fb433840] [c0000000000ef320] .exit_mmap+0xec/0x194
> > > [c0000000fb433960] [c00000000004d780] .mmput+0x64/0x124
> > > [c0000000fb4339e0] [c000000000051f40] .do_exit+0x29c/0x9c8
> > > [c0000000fb433ae0] [c0000000000527c8] .do_group_exit+0x50/0xc4
> > > [c0000000fb433b70] [c0000000000606a0]
> > > .get_signal_to_deliver+0x21c/0x5d8
> > > [c0000000fb433c70] [c000000000009b08] .do_signal+0x54/0x278
> > > [c0000000fb433db0] [c000000000009e50] .do_notify_resume+0x64/0x78
> > > [c0000000fb433e30] [c000000000000b44]
> > > .ret_from_except_lite+0x70/0x74
> > >
> > >
> > > This mean that free_hugepd_range() must be called with preemption ena=
bled.
> >
> > with preemption disabled.
> >
> > > I tried below change and this seems to work fine (I am not having
> > > expertise in this area so not sure this is correct way)
> >
> > Not sure - the scope looks odd to me. Let's ask Andrea - I'm sure he kn=
ows
> what to do :).
>=20
> :) So I had a look at the top of this function (0xb0) in the upstream ker=
nel and
> no smp_processor_id() call is apparent, is this stock git or a ppc tree? =
The
> first few calls seem not to call it but I may have overlooked something. =
It's
> just quicker if somebody with vmlinux finds the location of it.
>=20
> static void free_hugepd_range(struct mmu_gather *tlb, hugepd_t *hpdp, int
> pdshift,
> unsigned long start, unsigned long end,
> unsigned long floor, unsigned long ceiling) {
> pte_t *hugepte =3D hugepd_page(*hpdp);
> int i;
>=20
> unsigned long pdmask =3D ~((1UL << pdshift) - 1);
> unsigned int num_hugepd =3D 1;
>=20
> #ifdef CONFIG_PPC_FSL_BOOK3E
> /* Note: On fsl the hpdp may be the first of several */
> num_hugepd =3D (1 << (hugepd_shift(*hpdp) - pdshift)); #else
> unsigned int shift =3D hugepd_shift(*hpdp); #endif
>=20
> start &=3D pdmask;
> if (start < floor)
> return;
> if (ceiling) {
> ceiling &=3D pdmask;
> if (! ceiling)
> return;
> }
> if (end - 1 > ceiling - 1)
> return;
>=20
> for (i =3D 0; i < num_hugepd; i++, hpdp++)
> hpdp->pd =3D 0;
>=20
> tlb->need_flush =3D 1;
>=20
> #ifdef CONFIG_PPC_FSL_BOOK3E
> hugepd_free(tlb, hugepte);
> #else
> pgtable_free_tlb(tlb, hugepte, pdshift - shift); #endif }
>=20
> Generally smp_processor_id should never be used, exactly to avoid problem=
s like
> above with preempion enabled in .config.
>=20
> Instead it should be replaced with a get_cpu()/put_cpu() pair that is exa=
ctly
> meant to fix bugs like this and define proper critical sections around th=
e per-
> cpu variables.
>=20
> #define get_cpu() ({ preempt_disable(); smp_processor_id(); })
> #define put_cpu() preempt_enable()
>=20
> After you find where that smp_processor_id() is located, you should simpl=
y
> replace it with a get_cpu() and then you should insert a put_cpu immediat=
ely
> after the "cpu" info is not used anymore. That will define a proper and s=
trict
> critical section around the use of the per-cpu variables.
>=20
> With a ppc vmlinux it should be immediate to find the location of
> smp_processor_id but I don't have the ppc vmlinux here.
Thanks Andrea for the details description. It is really helpful
I will look into this.
Thanks
-Bharat
>=20
> Thanks!
> Andrea
>=20
> >
> >
> > Alex
> >
> > >
> > > diff --git a/arch/powerpc/mm/hugetlbpage.c
> > > b/arch/powerpc/mm/hugetlbpage.c index d67db4b..6bf8459 100644
> > > --- a/arch/powerpc/mm/hugetlbpage.c
> > > +++ b/arch/powerpc/mm/hugetlbpage.c
> > > @@ -563,8 +563,10 @@ static void hugetlb_free_pmd_range(struct mmu_ga=
ther
> *tlb, pud_t *pud,
> > > */
> > > next =3D addr + (1 << hugepd_shift(*(hugepd_t *)pmd));
> > > #endif
> > > + preempt_disable();
> > > free_hugepd_range(tlb, (hugepd_t *)pmd, PMD_SHIFT,
> > > addr, next, floor, ceiling);
> > > + preempt_enable();
> > > } while (addr =3D next, addr !=3D end);
> > >
> > > start &=3D PUD_MASK;
> > >
> > >
> > > Thanks
> > > -Bharat
> > >
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe kvm-ppc"
> > > in the body of a message to majordomo@vger.kernel.org More majordomo
> > > info at http://vger.kernel.org/majordomo-info.html
> >
^ permalink raw reply
* Re: [PATCH 1/3] powerpc: mm: make _PAGE_NUMA take effect
From: Aneesh Kumar K.V @ 2013-12-05 10:53 UTC (permalink / raw)
To: Liu Ping Fan, linuxppc-dev; +Cc: Paul Mackerras
In-Reply-To: <1386140348-7854-2-git-send-email-pingfank@linux.vnet.ibm.com>
Liu Ping Fan <kernelfans@gmail.com> writes:
> To enable the do_numa_page(), we should not fix _PAGE_NUMA in
> hash_page(), so bail out for the case of pte_numa().
>
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> ---
> arch/powerpc/mm/hash_utils_64.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
> index fb176e9..9bf1195 100644
> --- a/arch/powerpc/mm/hash_utils_64.c
> +++ b/arch/powerpc/mm/hash_utils_64.c
> @@ -1033,7 +1033,7 @@ int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
>
> /* Get PTE and page size from page tables */
> ptep = find_linux_pte_or_hugepte(pgdir, ea, &hugeshift);
> - if (ptep == NULL || !pte_present(*ptep)) {
> + if (ptep == NULL || !pte_present(*ptep) || pte_numa(*ptep)) {
> DBG_LOW(" no PTE !\n");
> rc = 1;
> goto bail;
why ? , All the hash routines do check for _PAGE_PRESENT via access
variable.
-aneesh
^ permalink raw reply
* Re: [PATCH] powerpc: fix xmon disassembler for little-endian
From: Philippe Bergheaud @ 2013-12-05 8:50 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: Tom Musta, linuxppc-dev
In-Reply-To: <1386218366.21910.16.camel@pasglop>
Benjamin Herrenschmidt wrote:
> On Wed, 2013-12-04 at 14:45 +0100, Philippe Bergheaud wrote:
>
>
>>>>+#ifdef __LITTLE_ENDIAN__
>>>>+#define GETWORD(v) (((v)[3] << 24) + ((v)[2] << 16) + ((v)[1] << 8) + (v)[0])
>>>>+#else
>>>>#define GETWORD(v) (((v)[0] << 24) + ((v)[1] << 16) + ((v)[2] << 8) + (v)[3])
>>>>+#endif
>>>>
>>>>#define isxdigit(c) (('0' <= (c) && (c) <= '9') \
>>>> || ('a' <= (c) && (c) <= 'f') \
>>>>
>>>
>>>
>>>Philippe: Wouldn't it be better to just do a 32-bit load and let the endianness be worked out
>>>by the hardware? i.e.
>>>
>>>#define GETWORD(v) (*(u32 *)v)
>>
>>Yes, your alternative is better.
>>Wouldn't it narrow the scope of the macro to aligned words on POWER7?
>>I think that all references to GETWORD operate on aligned words anyway.
>
>
> Well, xmon has to be robust ... as long as you are *certain* that even
> with crap entry state it won't try to access unaligned boundaries then
> go for it but we aren't looking at performance here.
Thank you Tom and Ben. We are definitely not looking at performance here.
I prefer to stay on the safe side, and leave the original patch untouched.
Philippe
^ permalink raw reply
* [PATCH] powerpc/p1010rdb-pa: modify phy interrupt.
From: Zhao Qiang @ 2013-12-05 6:00 UTC (permalink / raw)
To: bcousson, tony, devicetree; +Cc: Zhao Qiang, linuxppc-dev
It is not correct according to p1010rdb-pa user guide.
So modify it.
Signed-off-by: Zhao Qiang <B45475@freescale.com>
---
arch/powerpc/boot/dts/p1010rdb-pa.dtsi | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/boot/dts/p1010rdb-pa.dtsi b/arch/powerpc/boot/dts/p1010rdb-pa.dtsi
index 3d275e0..0cf6c34 100644
--- a/arch/powerpc/boot/dts/p1010rdb-pa.dtsi
+++ b/arch/powerpc/boot/dts/p1010rdb-pa.dtsi
@@ -39,7 +39,7 @@
};
&phy0 {
- interrupts = <3 1 0 0>;
+ interrupts = <1 1 0 0>;
};
&phy1 {
@@ -47,5 +47,5 @@
};
&phy2 {
- interrupts = <2 1 0 0>;
+ interrupts = <4 1 0 0>;
};
--
1.8.4
^ permalink raw reply related
* Re: [PATCH -V2 3/5] mm: Move change_prot_numa outside CONFIG_ARCH_USES_NUMA_PROT_NONE
From: Benjamin Herrenschmidt @ 2013-12-05 5:20 UTC (permalink / raw)
To: Aneesh Kumar K.V; +Cc: linux-mm, Rik van Riel, linuxppc-dev, paulus, Mel Gorman
In-Reply-To: <87a9gfri3u.fsf@linux.vnet.ibm.com>
On Thu, 2013-12-05 at 10:48 +0530, Aneesh Kumar K.V wrote:
>
> Ok, I can move the changes below #ifdef CONFIG_NUMA_BALANCING ? We call
> change_prot_numa from task_numa_work and queue_pages_range(). The later
> may be an issue. So doing the below will help ?
>
> -#ifdef CONFIG_ARCH_USES_NUMA_PROT_NONE
> +#ifdef CONFIG_NUMA_BALANCING
I will defer to Mel and Rik (should we also CC Andrea ?)
Cheers,
Ben.
^ permalink raw reply
* Re: [PATCH -V2 3/5] mm: Move change_prot_numa outside CONFIG_ARCH_USES_NUMA_PROT_NONE
From: Aneesh Kumar K.V @ 2013-12-05 5:18 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Mel Gorman, Rik van Riel
Cc: linux-mm, paulus, linuxppc-dev
In-Reply-To: <1386126782.16703.137.camel@pasglop>
Adding Mel and Rik to cc:
Benjamin Herrenschmidt <benh@au1.ibm.com> writes:
> On Mon, 2013-11-18 at 14:58 +0530, Aneesh Kumar K.V wrote:
>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>>
>> change_prot_numa should work even if _PAGE_NUMA != _PAGE_PROTNONE.
>> On archs like ppc64 that don't use _PAGE_PROTNONE and also have
>> a separate page table outside linux pagetable, we just need to
>> make sure that when calling change_prot_numa we flush the
>> hardware page table entry so that next page access result in a numa
>> fault.
>
> That patch doesn't look right...
>
> You are essentially making change_prot_numa() do whatever it does (which
> I don't completely understand) *for all architectures* now, whether they
> have CONFIG_ARCH_USES_NUMA_PROT_NONE or not ... So because you want that
> behaviour on powerpc book3s64, you change everybody.
>
> Is that correct ?
Yes.
>
> Also what exactly is that doing, can you explain ? From what I can see,
> it calls back into the core of mprotect to change the protection to
> vma->vm_page_prot, which I would have expected is already the protection
> there, with the added "prot_numa" flag passed down.
it set the _PAGE_NUMA bit. Now we also want to make sure that when
we set _PAGE_NUMA, we would get a pagefault on that so that we can track
that fault as a numa fault. To ensure that, we had the below BUILD_BUG
BUILD_BUG_ON(_PAGE_NUMA != _PAGE_PROTNONE);
But other than that the function doesn't really have any dependency on
_PAGE_PROTNONE. The only requirement is when we set _PAGE_NUMA, the
architecture should do enough to ensure that we get a page fault. Now on
ppc64 we does that by clearlying hpte entry and also clearing
_PAGE_PRESENT. Since we have _PAGE_PRESENT cleared hash_page will return
1 and we get to page fault handler.
>
> Your changeset comment says "On archs like ppc64 [...] we just need to
> make sure that when calling change_prot_numa we flush the
> hardware page table entry so that next page access result in a numa
> fault."
>
> But change_prot_numa() does a lot more than that ... it does
> pte_mknuma(), do we need it ? I assume we do or we wouldn't have added
> that PTE bit to begin with...
>
> Now it *might* be allright and it might be that no other architecture
> cares anyway etc... but I need at least some mm folks to ack on that
> patch before I can take it because it *will* change behaviour of other
> architectures.
>
Ok, I can move the changes below #ifdef CONFIG_NUMA_BALANCING ? We call
change_prot_numa from task_numa_work and queue_pages_range(). The later
may be an issue. So doing the below will help ?
-#ifdef CONFIG_ARCH_USES_NUMA_PROT_NONE
+#ifdef CONFIG_NUMA_BALANCING
-aneesh
^ permalink raw reply
* Re: [PATCH 3/8] IBM Akebono: Add support for a new PHY interface to the IBM emac driver
From: Benjamin Herrenschmidt @ 2013-12-05 4:49 UTC (permalink / raw)
To: Alistair Popple; +Cc: linuxppc-dev, David S. Miller, netdev
In-Reply-To: <1385086116-10972-3-git-send-email-alistair@popple.id.au>
On Fri, 2013-11-22 at 13:08 +1100, Alistair Popple wrote:
> The IBM PPC476GTR SoC that is used on the Akebono board uses a
> different ethernet PHY interface that has wake on lan (WOL) support
> with the IBM emac. This patch adds support to the IBM emac driver for
> this new PHY interface.
>
> At this stage the wake on lan functionality has not been implemented.
>
> Signed-off-by: Alistair Popple <alistair@popple.id.au>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: netdev@vger.kernel.org
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
^ permalink raw reply
* Re: [PATCH V4 00/10] perf: New conditional branch filter
From: Michael Ellerman @ 2013-12-05 4:47 UTC (permalink / raw)
To: acme
Cc: mikey, ak, linux-kernel, eranian, linuxppc-dev, sukadev, mingo,
Anshuman Khandual
In-Reply-To: <1386153162-11225-1-git-send-email-khandual@linux.vnet.ibm.com>
On Wed, 2013-12-04 at 16:02 +0530, Anshuman Khandual wrote:
> This patchset is the re-spin of the original branch stack sampling
> patchset which introduced new PERF_SAMPLE_BRANCH_COND branch filter. This patchset
> also enables SW based branch filtering support for book3s powerpc platforms which
> have PMU HW backed branch stack sampling support.
>
> Summary of code changes in this patchset:
>
> (1) Introduces a new PERF_SAMPLE_BRANCH_COND branch filter
> (2) Add the "cond" branch filter options in the "perf record" tool
> (3) Enable PERF_SAMPLE_BRANCH_COND in X86 platforms
> (4) Enable PERF_SAMPLE_BRANCH_COND in POWER8 platform
> (5) Update the documentation regarding "perf record" tool
Hi Arnaldo,
Can you please take just patches 1-5 into the perf tree? And do you mind
putting them in a topic branch so Benh can merge that.
The remaining patches are powerpc specific and still need some more review.
cheers
^ permalink raw reply
* Re: [PATCH] powerpc: fix xmon disassembler for little-endian
From: Benjamin Herrenschmidt @ 2013-12-05 4:39 UTC (permalink / raw)
To: Philippe Bergheaud; +Cc: Tom Musta, linuxppc-dev
In-Reply-To: <529F31F8.6040302@linux.vnet.ibm.com>
On Wed, 2013-12-04 at 14:45 +0100, Philippe Bergheaud wrote:
> >>+#ifdef __LITTLE_ENDIAN__
> >>+#define GETWORD(v) (((v)[3] << 24) + ((v)[2] << 16) + ((v)[1] << 8) + (v)[0])
> >>+#else
> >> #define GETWORD(v) (((v)[0] << 24) + ((v)[1] << 16) + ((v)[2] << 8) + (v)[3])
> >>+#endif
> >>
> >> #define isxdigit(c) (('0' <= (c) && (c) <= '9') \
> >> || ('a' <= (c) && (c) <= 'f') \
> >>
> >
> >
> > Philippe: Wouldn't it be better to just do a 32-bit load and let the endianness be worked out
> > by the hardware? i.e.
> >
> > #define GETWORD(v) (*(u32 *)v)
> Yes, your alternative is better.
> Wouldn't it narrow the scope of the macro to aligned words on POWER7?
> I think that all references to GETWORD operate on aligned words anyway.
Well, xmon has to be robust ... as long as you are *certain* that even
with crap entry state it won't try to access unaligned boundaries then
go for it but we aren't looking at performance here.
Cheers,
Ben.
^ permalink raw reply
* [PATCH v2] arch/powerpc: Dynamically allocate slb_shadow from memblock
From: Jeremy Kerr @ 2013-12-05 3:42 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Michael Neuling
In-Reply-To: <1386214268.68246.485420017657.2.gpush@pablo>
Currently, the slb_shadow buffer is our largest symbol:
[jk@pablo linux]$ nm --size-sort -r -S obj/vmlinux | head -1
c000000000da0000 0000000000040000 d slb_shadow
- we allocate 128 bytes per cpu; so 256k with NR_CPUS=2048. As we have
constant initialisers, it's allocated in .text, causing a larger vmlinux
image. We may also allocate unecessary slb_shadow buffers (> no. pacas),
since we use the build-time NR_CPUS rather than the run-time nr_cpu_ids.
We could move this to the bss, but then we still have the NR_CPUS vs
nr_cpu_ids potential for overallocation.
This change dynamically allocates the slb_shadow array, during
initialise_pacas(). At a cost of 104 bytes of text, we save 256k of
data:
[jk@pablo linux]$ size obj/vmlinux{.orig,}
text data bss dec hex filename
9202795 5244676 1169576 15617047 ee4c17 obj/vmlinux.orig
9202899 4982532 1169576 15355007 ea4c7f obj/vmlinux
Tested on pseries.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
---
v2: don't drop the endian conversion, use SLB_NUM_BOLTED
---
arch/powerpc/kernel/paca.c | 32 +++++++++++++++++++++++++-------
1 file changed, 25 insertions(+), 7 deletions(-)
diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index 9095a6f7..623c356f 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -99,12 +99,28 @@ static inline void free_lppacas(void) { }
* 3 persistent SLBs are registered here. The buffer will be zero
* initially, hence will all be invaild until we actually write them.
*/
-static struct slb_shadow slb_shadow[] __cacheline_aligned = {
- [0 ... (NR_CPUS-1)] = {
- .persistent = cpu_to_be32(SLB_NUM_BOLTED),
- .buffer_length = cpu_to_be32(sizeof(struct slb_shadow)),
- },
-};
+static struct slb_shadow *slb_shadow;
+
+static void __init allocate_slb_shadows(int nr_cpus, int limit)
+{
+ int size = PAGE_ALIGN(sizeof(struct slb_shadow) * nr_cpus);
+ slb_shadow = __va(memblock_alloc_base(size, PAGE_SIZE, limit));
+ memset(slb_shadow, 0, size);
+}
+
+static struct slb_shadow * __init init_slb_shadow(int cpu)
+{
+ struct slb_shadow *s = &slb_shadow[cpu];
+
+ s->persistent = cpu_to_be32(SLB_NUM_BOLTED);
+ s->buffer_length = cpu_to_be32(sizeof(*s));
+
+ return s;
+}
+
+#else /* CONFIG_PPC_STD_MMU_64 */
+
+static void __init allocate_slb_shadows(int nr_cpus, int limit) { }
#endif /* CONFIG_PPC_STD_MMU_64 */
@@ -142,7 +158,7 @@ void __init initialise_paca(struct paca_struct *new_paca, int cpu)
new_paca->__current = &init_task;
new_paca->data_offset = 0xfeeeeeeeeeeeeeeeULL;
#ifdef CONFIG_PPC_STD_MMU_64
- new_paca->slb_shadow_ptr = &slb_shadow[cpu];
+ new_paca->slb_shadow_ptr = init_slb_shadow(cpu);
#endif /* CONFIG_PPC_STD_MMU_64 */
}
@@ -190,6 +206,8 @@ void __init allocate_pacas(void)
allocate_lppacas(nr_cpu_ids, limit);
+ allocate_slb_shadows(nr_cpu_ids, limit);
+
/* Can't use for_each_*_cpu, as they aren't functional yet */
for (cpu = 0; cpu < nr_cpu_ids; cpu++)
initialise_paca(&paca[cpu], cpu);
^ permalink raw reply related
* [PATCH 1/2] arch/powerpc: Make slb_shadow a local
From: Jeremy Kerr @ 2013-12-05 3:31 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Michael Neuling
The only external user of slb_shadow is the pseries lpar code, and it
can access through the paca array instead.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
---
arch/powerpc/include/asm/lppaca.h | 2 --
arch/powerpc/kernel/paca.c | 2 +-
arch/powerpc/platforms/pseries/lpar.c | 2 +-
3 files changed, 2 insertions(+), 4 deletions(-)
diff --git a/arch/powerpc/include/asm/lppaca.h b/arch/powerpc/include/asm/lppaca.h
index 844c28de..d0a2a2f9 100644
--- a/arch/powerpc/include/asm/lppaca.h
+++ b/arch/powerpc/include/asm/lppaca.h
@@ -132,8 +132,6 @@ struct slb_shadow {
} save_area[SLB_NUM_BOLTED];
} ____cacheline_aligned;
-extern struct slb_shadow slb_shadow[];
-
/*
* Layout of entries in the hypervisor's dispatch trace log buffer.
*/
diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index 0620eaaa..9095a6f7 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -99,7 +99,7 @@ static inline void free_lppacas(void) { }
* 3 persistent SLBs are registered here. The buffer will be zero
* initially, hence will all be invaild until we actually write them.
*/
-struct slb_shadow slb_shadow[] __cacheline_aligned = {
+static struct slb_shadow slb_shadow[] __cacheline_aligned = {
[0 ... (NR_CPUS-1)] = {
.persistent = cpu_to_be32(SLB_NUM_BOLTED),
.buffer_length = cpu_to_be32(sizeof(struct slb_shadow)),
diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index 4fca3def..28cf0f33 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -92,7 +92,7 @@ void vpa_init(int cpu)
* PAPR says this feature is SLB-Buffer but firmware never
* reports that. All SPLPAR support SLB shadow buffer.
*/
- addr = __pa(&slb_shadow[cpu]);
+ addr = __pa(paca[cpu].slb_shadow_ptr);
if (firmware_has_feature(FW_FEATURE_SPLPAR)) {
ret = register_slb_shadow(hwcpu, addr);
if (ret)
^ permalink raw reply related
* [PATCH 2/2] arch/powerpc: Dynamically allocate slb_shadow from memblock
From: Jeremy Kerr @ 2013-12-05 3:31 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Michael Neuling
In-Reply-To: <1386214268.68092.227432126043.1.gpush@pablo>
Currently, the slb_shadow buffer is our largest symbol:
[jk@pablo linux]$ nm --size-sort -r -S obj/vmlinux | head -1
c000000000da0000 0000000000040000 d slb_shadow
- we allocate 128 bytes per cpu; so 256k with NR_CPUS=2048. As we have
constant initialisers, it's allocated in .text, causing a larger vmlinux
image. We may also allocate unecessary slb_shadow buffers (> no. pacas),
since we use the build-time NR_CPUS rather than the run-time nr_cpu_ids.
We could move this to the bss, but then we still have the NR_CPUS vs
nr_cpu_ids potential for overallocation.
This change dynamically allocates the slb_shadow array, during
initialise_pacas(). At a cost of 104 bytes of text, we save 256k of
data:
[jk@pablo linux]$ size obj/vmlinux{.orig,}
text data bss dec hex filename
9202795 5244676 1169576 15617047 ee4c17 obj/vmlinux.orig
9202899 4982532 1169576 15355007 ea4c7f obj/vmlinux
Tested on pseries.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
---
arch/powerpc/kernel/paca.c | 32 +++++++++++++++++++++++++-------
1 file changed, 25 insertions(+), 7 deletions(-)
diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index 9095a6f7..ad7a485f 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -99,12 +99,28 @@ static inline void free_lppacas(void) { }
* 3 persistent SLBs are registered here. The buffer will be zero
* initially, hence will all be invaild until we actually write them.
*/
-static struct slb_shadow slb_shadow[] __cacheline_aligned = {
- [0 ... (NR_CPUS-1)] = {
- .persistent = cpu_to_be32(SLB_NUM_BOLTED),
- .buffer_length = cpu_to_be32(sizeof(struct slb_shadow)),
- },
-};
+static struct slb_shadow *slb_shadow;
+
+static void __init allocate_slb_shadows(int nr_cpus, int limit)
+{
+ int size = PAGE_ALIGN(sizeof(struct slb_shadow) * nr_cpus);
+ slb_shadow = __va(memblock_alloc_base(size, PAGE_SIZE, limit));
+ memset(slb_shadow, 0, size);
+}
+
+static struct slb_shadow * __init init_slb_shadow(int cpu)
+{
+ struct slb_shadow *s = &slb_shadow[cpu];
+
+ s->persistent = ARRAY_SIZE(s->save_area);
+ s->buffer_length = sizeof(*s);
+
+ return s;
+}
+
+#else /* CONFIG_PPC_STD_MMU_64 */
+
+static void __init allocate_slb_shadows(int nr_cpus, int limit) { }
#endif /* CONFIG_PPC_STD_MMU_64 */
@@ -142,7 +158,7 @@ void __init initialise_paca(struct paca_struct *new_paca, int cpu)
new_paca->__current = &init_task;
new_paca->data_offset = 0xfeeeeeeeeeeeeeeeULL;
#ifdef CONFIG_PPC_STD_MMU_64
- new_paca->slb_shadow_ptr = &slb_shadow[cpu];
+ new_paca->slb_shadow_ptr = init_slb_shadow(cpu);
#endif /* CONFIG_PPC_STD_MMU_64 */
}
@@ -190,6 +206,8 @@ void __init allocate_pacas(void)
allocate_lppacas(nr_cpu_ids, limit);
+ allocate_slb_shadows(nr_cpu_ids, limit);
+
/* Can't use for_each_*_cpu, as they aren't functional yet */
for (cpu = 0; cpu < nr_cpu_ids; cpu++)
initialise_paca(&paca[cpu], cpu);
^ permalink raw reply related
* Re: [PATCH v6 09/17] serial: mpc512x: adjust for OF based clock lookup
From: Greg Kroah-Hartman @ 2013-12-05 0:55 UTC (permalink / raw)
To: Gerhard Sittig
Cc: Mike Turquette, Detlev Zundel, Jiri Slaby, linux-serial,
Scott Wood, Anatolij Gustschin, linuxppc-dev, linux-arm-kernel
In-Reply-To: <1385851897-23475-10-git-send-email-gsi@denx.de>
On Sat, Nov 30, 2013 at 11:51:29PM +0100, Gerhard Sittig wrote:
> after device tree based clock lookup became available, the peripheral
> driver need no longer construct clock names which include the PSC index,
> remove the "psc%d_mclk" template and unconditionally use 'mclk'
>
> acquire and release the "ipg" clock item for register access as well
>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Jiri Slaby <jslaby@suse.cz>
> Cc: linux-serial@vger.kernel.org
> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> # for v4
> Signed-off-by: Gerhard Sittig <gsi@denx.de>
> ---
> Greg, the difference since v4 of this patch is that v4 took the 'mclk'
> and 'ipg' clock items in reverse order, and thus potentially obfuscated
> the adjusted name for 'mclk' -- the updated version of the patch is
> identical in content but cleaner diff-wise
That's fine:
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
^ permalink raw reply
* Re: [PATCH v6 11/17] USB: fsl-mph-dr-of: adjust for OF based clock lookup
From: Greg Kroah-Hartman @ 2013-12-05 0:55 UTC (permalink / raw)
To: Gerhard Sittig
Cc: Mike Turquette, Detlev Zundel, linux-usb, Scott Wood,
Anatolij Gustschin, linuxppc-dev, linux-arm-kernel
In-Reply-To: <1385851897-23475-12-git-send-email-gsi@denx.de>
On Sat, Nov 30, 2013 at 11:51:31PM +0100, Gerhard Sittig wrote:
> after device tree based clock lookup became available, the peripheral
> driver need no longer construct clock names which include the component
> index -- remove the "usb%d_clk" template, always use "ipg" instead
>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: linux-usb@vger.kernel.org
> Signed-off-by: Gerhard Sittig <gsi@denx.de>
> ---
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
^ permalink raw reply
* Re: [PATCH v6 10/17] serial: mpc512x: setup the PSC FIFO clock as well
From: Greg Kroah-Hartman @ 2013-12-05 0:55 UTC (permalink / raw)
To: Gerhard Sittig
Cc: Mike Turquette, Detlev Zundel, Jiri Slaby, linux-serial,
Scott Wood, Anatolij Gustschin, linuxppc-dev, linux-arm-kernel
In-Reply-To: <1385851897-23475-11-git-send-email-gsi@denx.de>
On Sat, Nov 30, 2013 at 11:51:30PM +0100, Gerhard Sittig wrote:
> prepare and enable the FIFO clock upon PSC FIFO initialization,
> check for and propagage errors when enabling the PSC FIFO clock,
> disable and unprepare the FIFO clock upon PSC FIFO uninitialization
>
> devm_{get,put}_clk() doesn't apply here, as the SoC provides a
> single FIFO component which is shared among several PSC components,
> thus the FIFO isn't associated with a device (while the PSCs are)
>
> provide a fallback clock lookup approach in case the OF based clock
> lookup for the PSC FIFO fails, this allows for successful operation in
> the presence of an outdated device tree which lacks clock specs
>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Jiri Slaby <jslaby@suse.cz>
> Cc: linux-serial@vger.kernel.org
> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> # for v4
> Signed-off-by: Gerhard Sittig <gsi@denx.de>
> ---
> Greg, the addition since v4 is the clk_get_sys() call for the 'ipg'
> clock item (backwards compat for device trees w/o clock specs)
That's fine, my ACK still stands.
thanks,
greg k-h
^ permalink raw reply
* [PATCH v3 1/2] offb: little endian fixes
From: Cédric Le Goater @ 2013-12-04 16:49 UTC (permalink / raw)
To: benh; +Cc: Cédric Le Goater, linuxppc-dev
In-Reply-To: <1385240643.4882.133.camel@pasglop>
The "screen" properties : depth, width, height, linebytes need
to be converted to the host endian order when read from the device
tree.
The offb_init_palette_hacks() routine also made assumption on the
host endian order.
Signed-off-by: Cédric Le Goater <clg@fr.ibm.com>
---
Changes in v3:
- fixed offb_init_palette_hacks()
Changes in v2:
- replaced be32_to_cpu() by be32_to_cpup()
drivers/video/offb.c | 18 +++++++++++-------
1 file changed, 11 insertions(+), 7 deletions(-)
diff --git a/drivers/video/offb.c b/drivers/video/offb.c
index 9dbea2223401..43a0a52fc527 100644
--- a/drivers/video/offb.c
+++ b/drivers/video/offb.c
@@ -301,7 +301,7 @@ static struct fb_ops offb_ops = {
static void __iomem *offb_map_reg(struct device_node *np, int index,
unsigned long offset, unsigned long size)
{
- const u32 *addrp;
+ const __be32 *addrp;
u64 asize, taddr;
unsigned int flags;
@@ -369,7 +369,11 @@ static void offb_init_palette_hacks(struct fb_info *info, struct device_node *dp
}
of_node_put(pciparent);
} else if (dp && of_device_is_compatible(dp, "qemu,std-vga")) {
- const u32 io_of_addr[3] = { 0x01000000, 0x0, 0x0 };
+#ifdef __BIG_ENDIAN
+ const __be32 io_of_addr[3] = { 0x01000000, 0x0, 0x0 };
+#else
+ const __be32 io_of_addr[3] = { 0x00000001, 0x0, 0x0 };
+#endif
u64 io_addr = of_translate_address(dp, io_of_addr);
if (io_addr != OF_BAD_ADDR) {
par->cmap_adr = ioremap(io_addr + 0x3c8, 2);
@@ -535,7 +539,7 @@ static void __init offb_init_nodriver(struct device_node *dp, int no_real_node)
unsigned int flags, rsize, addr_prop = 0;
unsigned long max_size = 0;
u64 rstart, address = OF_BAD_ADDR;
- const u32 *pp, *addrp, *up;
+ const __be32 *pp, *addrp, *up;
u64 asize;
int foreign_endian = 0;
@@ -551,25 +555,25 @@ static void __init offb_init_nodriver(struct device_node *dp, int no_real_node)
if (pp == NULL)
pp = of_get_property(dp, "depth", &len);
if (pp && len == sizeof(u32))
- depth = *pp;
+ depth = be32_to_cpup(pp);
pp = of_get_property(dp, "linux,bootx-width", &len);
if (pp == NULL)
pp = of_get_property(dp, "width", &len);
if (pp && len == sizeof(u32))
- width = *pp;
+ width = be32_to_cpup(pp);
pp = of_get_property(dp, "linux,bootx-height", &len);
if (pp == NULL)
pp = of_get_property(dp, "height", &len);
if (pp && len == sizeof(u32))
- height = *pp;
+ height = be32_to_cpup(pp);
pp = of_get_property(dp, "linux,bootx-linebytes", &len);
if (pp == NULL)
pp = of_get_property(dp, "linebytes", &len);
if (pp && len == sizeof(u32) && (*pp != 0xffffffffu))
- pitch = *pp;
+ pitch = be32_to_cpup(pp);
else
pitch = width * ((depth + 7) / 8);
--
1.7.10.4
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox