LinuxPPC-Dev Archive on lore.kernel.org


* [PATCH] slub: Don't throw away partial remote slabs if there is no local memory
From: Anton Blanchard @ 2014-01-07  2:21 UTC (permalink / raw)
  To: benh, paulus, cl, penberg, mpm, nacc; +Cc: linux-mm, linuxppc-dev

We noticed a huge amount of slab memory consumed on a large ppc64 box:

Slab:            2094336 kB

Almost 2GB. This box is not balanced and some nodes do not have local
memory, causing slub to be very inefficient in its slab usage.

Each time we call kmem_cache_alloc_node slub checks the per cpu slab,
sees it isn't node local, deactivates it and tries to allocate a new
slab. On empty nodes we will allocate a new remote slab and use the
first slot, but as explained above when we get called a second time
we will just deactivate that slab and retry.

As such we end up only using 1 entry in each slab:

slab                    mem  objects
                       used   active
------------------------------------
kmalloc-16384       1404 MB    4.90%
task_struct          668 MB    2.90%
kmalloc-128          193 MB    3.61%
kmalloc-192          152 MB    5.23%
kmalloc-8192          72 MB   23.40%
kmalloc-16            64 MB    7.43%
kmalloc-512           33 MB   22.41%

The patch below checks that a node is not empty before deactivating a
slab and trying to allocate it again. With this patch applied we now
use about 352MB:

Slab:             360192 kB

And our efficiency is much better:

slab                    mem  objects
                       used   active
------------------------------------
kmalloc-16384         92 MB   74.27%
task_struct           23 MB   83.46%
idr_layer_cache       18 MB  100.00%
pgtable-2^12          17 MB  100.00%
kmalloc-65536         15 MB  100.00%
inode_cache           14 MB  100.00%
kmalloc-256           14 MB   97.81%
kmalloc-8192          14 MB   85.71%
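
The effect described above can be reproduced with a toy model. The sketch
below is not the SLUB code: simulate(), struct sim_result and the
"one slab per mismatch" rule are hypothetical simplifications, assuming a
CPU whose requested node has no memory so that every slab it gets is remote.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model, not the SLUB implementation. */
struct sim_result {
	size_t slabs_allocated;   /* slabs taken from the page allocator */
	size_t objects_allocated; /* objects handed out to callers */
};

/*
 * Simulate nr_allocs node-specific allocations from a CPU whose
 * requested node has no memory, so every slab is necessarily remote.
 *
 * check_node_has_memory == false: old behaviour, every node mismatch
 *   deactivates the per-cpu slab and allocates a fresh one.
 * check_node_has_memory == true: patched behaviour, a mismatch on a
 *   memoryless node keeps using the current slab.
 */
static struct sim_result simulate(size_t nr_allocs, size_t objs_per_slab,
				  bool check_node_has_memory)
{
	struct sim_result r = { 0, 0 };
	size_t free_in_cpu_slab = 0;        /* free objects in the per-cpu slab */
	bool have_cpu_slab = false;
	const bool node_has_memory = false; /* the requested node is empty */

	for (size_t i = 0; i < nr_allocs; i++) {
		/* The per-cpu slab is always remote here, so it never matches. */
		bool mismatch = have_cpu_slab;

		if (mismatch && (!check_node_has_memory || node_has_memory))
			have_cpu_slab = false;  /* models deactivate_slab() */

		if (!have_cpu_slab || free_in_cpu_slab == 0) {
			/* models new_slab: falls back to a remote node */
			r.slabs_allocated++;
			free_in_cpu_slab = objs_per_slab;
			have_cpu_slab = true;
		}
		free_in_cpu_slab--;
		r.objects_allocated++;
	}
	return r;
}
```

With 1000 allocations and 32 objects per slab, the unpatched path in this
model consumes one slab per object, while the patched path fills each slab
before taking another. The real allocator keeps deactivated slabs on partial
lists, so actual numbers will differ from this model.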

Signed-off-by: Anton Blanchard <anton@samba.org>
---

Thoughts? It seems like we could hit a similar situation if a machine
is balanced but we run out of memory on a single node.

Index: b/mm/slub.c
===================================================================
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2278,10 +2278,17 @@ redo:

 	if (unlikely(!node_match(page, node))) {
 		stat(s, ALLOC_NODE_MISMATCH);
-		deactivate_slab(s, page, c->freelist);
-		c->page = NULL;
-		c->freelist = NULL;
-		goto new_slab;
+
+		/*
+		 * If the node contains no memory there is no point in trying
+		 * to allocate a new node local slab
+		 */
+		if (node_spanned_pages(node)) {
+			deactivate_slab(s, page, c->freelist);
+			c->page = NULL;
+			c->freelist = NULL;
+			goto new_slab;
+		}
 	}

 	/*


* Re: [PATCH -V3 1/2] powerpc: mm: Move ppc64 page table range definitions to separate header
From: Aneesh Kumar K.V @ 2014-01-07  2:19 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: aarcange, linuxppc-dev, paulus, kirill.shutemov, linux-mm
In-Reply-To: <1389050101.12906.13.camel@pasglop>

Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:

> On Mon, 2014-01-06 at 14:33 +0530, Aneesh Kumar K.V wrote:
>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>> 
>> This avoids mmu-hash64.h including pgtable-ppc64.h. That inclusion
>> causes issues like
>
> I don't like this. We have that stuff split into too many includes
> already; it's a mess.

I understand. Let me know if you have any suggestions for cleaning that
up; I can do that.

>
> Why do we need to include it from mmu*.h ?

It is included in mmu-hash64.h, added by me via commit
78f1dbde9fd020419313c2a0c3b602ea2427118f:

/*
 * This is necessary to get the definition of PGTABLE_RANGE which we
 * need for various slices related matters. Note that this isn't the
 * complete pgtable.h but only a portion of it.
 */
#include <asm/pgtable-ppc64.h>
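
For readers unfamiliar with PGTABLE_RANGE, here is a minimal sketch of how
it is derived and the kind of bounds check it enables. The index sizes below
are assumed, illustrative values (the real ones come from
pgtable-ppc64-4k.h or pgtable-ppc64-64k.h), and range_fits_pgtable() is a
hypothetical helper, not a kernel function.

```c
#include <stdint.h>

/* Assumed, illustrative index sizes; not taken from this thread. */
#define PTE_INDEX_SIZE	9
#define PMD_INDEX_SIZE	7
#define PUD_INDEX_SIZE	7
#define PGD_INDEX_SIZE	9
#define PAGE_SHIFT	12

/* Size of the effective-address range mapped by the page tables. */
#define PGTABLE_EADDR_SIZE (PTE_INDEX_SIZE + PMD_INDEX_SIZE + \
			    PUD_INDEX_SIZE + PGD_INDEX_SIZE + PAGE_SHIFT)
#define PGTABLE_RANGE	(UINT64_C(1) << PGTABLE_EADDR_SIZE)

/*
 * Hypothetical helper: does [addr, addr + len) fit inside the range the
 * page tables can map? Written to avoid overflow in addr + len.
 */
static inline int range_fits_pgtable(uint64_t addr, uint64_t len)
{
	return len <= PGTABLE_RANGE && addr <= PGTABLE_RANGE - len;
}
```

With these assumed sizes the mapped range is 2^44 bytes. Only this constant
and the index sizes are needed for such a check, which is why pulling in
the whole of pgtable-ppc64.h is more than mmu-hash64.h strictly requires.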

-aneesh


* Re: [question] Can the execution of the atomtic operation instruction pair lwarx/stwcx be interrrupted by local HW interruptions?
From: wyang @ 2014-01-07  1:00 UTC (permalink / raw)
  To: Scott Wood; +Cc: Linuxppc-dev, Gavin Hu
In-Reply-To: <1389045939.11795.104.camel@snotra.buserror.net>

On 01/07/2014 06:05 AM, Scott Wood wrote:
> On Mon, 2014-01-06 at 13:27 +0800, wyang wrote:
>> On 01/06/2014 11:41 AM, Gavin Hu wrote:
>>
>>> Thanks for your response.  :)
>>> But that means that these atomic primitives like atomic_add()
>>> aren't actually atomic on the PPC architecture, right? Because they
>>> can be interrupted by local HW interrupts. Theoretically, the ISR
>>> can also access the atomic global variable.
>>>
>> Nope, my understanding is that if you want to sync kernel primitive
>> code with an ISR, you are responsible for disabling local interrupts.
>> atomic_add is not guaranteed to handle such a case.
> atomic_add() and other atomics do handle that case.  Interrupts are not
> disabled, but there's a stwcx. in the interrupt return code to make sure
> the reservation gets cleared.

Yeah, can you provide more detail about why they can handle that
case? The following is my understanding:

Let us assume that there is an atomic global variable (var_a) and its
initial value is 0.

The kernel attempts to execute atomic_add(1, var_a). After the lwarx, an
async interrupt happens, and the ISR also accesses the "var_a" variable
and executes atomic_add.

static __inline__ void atomic_add(int a, atomic_t *v)
{
     int t;

     __asm__ __volatile__(
"1:    lwarx    %0,0,%3        # atomic_add\n\
     <----- Interrupt happens here. The ISR also operates on the global
            variable "var_a", e.g. by also executing atomic_add(1, var_a),
            so var_a would be 1.
     add    %0,%2,%0\n"
     PPC405_ERR77(0,%3)
"    stwcx.    %0,0,%3 \n\
     <----- After the interrupt code returns, the reservation is
            cleared, so CR0 is not equal to 0 and we jump to label 1.
            var_a will then be 2.
     bne-    1b"
     : "=&r" (t), "+m" (v->counter)
     : "r" (a), "r" (&v->counter)
     : "cc");
}

So the value of var_a is 2 rather than 1. That's why I said that
atomic_add does not handle such a case. If I am missing something, please
correct me. :-)

Wei
>
> -Scott
>
>
>


* Re: [PATCH -V3 1/2] powerpc: mm: Move ppc64 page table range definitions to separate header
From: Benjamin Herrenschmidt @ 2014-01-06 23:15 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: aarcange, linuxppc-dev, paulus, kirill.shutemov, linux-mm
In-Reply-To: <1388999012-14424-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>

On Mon, 2014-01-06 at 14:33 +0530, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> This avoids mmu-hash64.h including pgtable-ppc64.h. That inclusion
> causes issues like

I don't like this. We have that stuff split into too many includes
already; it's a mess.

Why do we need to include it from mmu*.h ?

Cheers,
Ben.

>   CC      arch/powerpc/kernel/asm-offsets.s
> In file included from /home/aneesh/linus/arch/powerpc/include/asm/mmu-hash64.h:23:0,
>                  from /home/aneesh/linus/arch/powerpc/include/asm/mmu.h:196,
>                  from /home/aneesh/linus/arch/powerpc/include/asm/lppaca.h:36,
>                  from /home/aneesh/linus/arch/powerpc/include/asm/paca.h:21,
>                  from /home/aneesh/linus/arch/powerpc/include/asm/hw_irq.h:41,
>                  from /home/aneesh/linus/arch/powerpc/include/asm/irqflags.h:11,
>                  from include/linux/irqflags.h:15,
>                  from include/linux/spinlock.h:53,
>                  from include/linux/seqlock.h:35,
>                  from include/linux/time.h:5,
>                  from include/uapi/linux/timex.h:56,
>                  from include/linux/timex.h:56,
>                  from include/linux/sched.h:17,
>                  from arch/powerpc/kernel/asm-offsets.c:17:
> /home/aneesh/linus/arch/powerpc/include/asm/pgtable-ppc64.h:563:42: error: unknown type name ‘spinlock_t’
>  static inline int pmd_move_must_withdraw(spinlock_t *new_pmd_ptl,
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> ---
> 
> NOTE: We can either do this or stick a typedef struct spinlock spinlock_t; in pgtable-ppc64.h
> 
>  arch/powerpc/include/asm/mmu-hash64.h          |   2 +-
>  arch/powerpc/include/asm/pgtable-ppc64-range.h | 101 +++++++++++++++++++++++++
>  arch/powerpc/include/asm/pgtable-ppc64.h       | 101 +------------------------
>  3 files changed, 103 insertions(+), 101 deletions(-)
>  create mode 100644 arch/powerpc/include/asm/pgtable-ppc64-range.h
> 
> diff --git a/arch/powerpc/include/asm/mmu-hash64.h b/arch/powerpc/include/asm/mmu-hash64.h
> index 807014dde821..895b4df31fec 100644
> --- a/arch/powerpc/include/asm/mmu-hash64.h
> +++ b/arch/powerpc/include/asm/mmu-hash64.h
> @@ -20,7 +20,7 @@
>   * need for various slices related matters. Note that this isn't the
>   * complete pgtable.h but only a portion of it.
>   */
> -#include <asm/pgtable-ppc64.h>
> +#include <asm/pgtable-ppc64-range.h>
>  #include <asm/bug.h>
>  
>  /*
> diff --git a/arch/powerpc/include/asm/pgtable-ppc64-range.h b/arch/powerpc/include/asm/pgtable-ppc64-range.h
> new file mode 100644
> index 000000000000..b48b089fb209
> --- /dev/null
> +++ b/arch/powerpc/include/asm/pgtable-ppc64-range.h
> @@ -0,0 +1,101 @@
> +#ifndef _ASM_POWERPC_PGTABLE_PPC64_RANGE_H_
> +#define _ASM_POWERPC_PGTABLE_PPC64_RANGE_H_
> +/*
> + * This file contains the functions and defines necessary to modify and use
> + * the ppc64 hashed page table.
> + */
> +
> +#ifdef CONFIG_PPC_64K_PAGES
> +#include <asm/pgtable-ppc64-64k.h>
> +#else
> +#include <asm/pgtable-ppc64-4k.h>
> +#endif
> +#include <asm/barrier.h>
> +
> +#define FIRST_USER_ADDRESS	0
> +
> +/*
> + * Size of EA range mapped by our pagetables.
> + */
> +#define PGTABLE_EADDR_SIZE (PTE_INDEX_SIZE + PMD_INDEX_SIZE + \
> +			    PUD_INDEX_SIZE + PGD_INDEX_SIZE + PAGE_SHIFT)
> +#define PGTABLE_RANGE (ASM_CONST(1) << PGTABLE_EADDR_SIZE)
> +
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +#define PMD_CACHE_INDEX	(PMD_INDEX_SIZE + 1)
> +#else
> +#define PMD_CACHE_INDEX	PMD_INDEX_SIZE
> +#endif
> +/*
> + * Define the address range of the kernel non-linear virtual area
> + */
> +
> +#ifdef CONFIG_PPC_BOOK3E
> +#define KERN_VIRT_START ASM_CONST(0x8000000000000000)
> +#else
> +#define KERN_VIRT_START ASM_CONST(0xD000000000000000)
> +#endif
> +#define KERN_VIRT_SIZE	ASM_CONST(0x0000100000000000)
> +
> +/*
> + * The vmalloc space starts at the beginning of that region, and
> + * occupies half of it on hash CPUs and a quarter of it on Book3E
> + * (we keep a quarter for the virtual memmap)
> + */
> +#define VMALLOC_START	KERN_VIRT_START
> +#ifdef CONFIG_PPC_BOOK3E
> +#define VMALLOC_SIZE	(KERN_VIRT_SIZE >> 2)
> +#else
> +#define VMALLOC_SIZE	(KERN_VIRT_SIZE >> 1)
> +#endif
> +#define VMALLOC_END	(VMALLOC_START + VMALLOC_SIZE)
> +
> +/*
> + * The second half of the kernel virtual space is used for IO mappings,
> + * it's itself carved into the PIO region (ISA and PHB IO space) and
> + * the ioremap space
> + *
> + *  ISA_IO_BASE = KERN_IO_START, 64K reserved area
> + *  PHB_IO_BASE = ISA_IO_BASE + 64K to ISA_IO_BASE + 2G, PHB IO spaces
> + * IOREMAP_BASE = ISA_IO_BASE + 2G to VMALLOC_START + PGTABLE_RANGE
> + */
> +#define KERN_IO_START	(KERN_VIRT_START + (KERN_VIRT_SIZE >> 1))
> +#define FULL_IO_SIZE	0x80000000ul
> +#define  ISA_IO_BASE	(KERN_IO_START)
> +#define  ISA_IO_END	(KERN_IO_START + 0x10000ul)
> +#define  PHB_IO_BASE	(ISA_IO_END)
> +#define  PHB_IO_END	(KERN_IO_START + FULL_IO_SIZE)
> +#define IOREMAP_BASE	(PHB_IO_END)
> +#define IOREMAP_END	(KERN_VIRT_START + KERN_VIRT_SIZE)
> +
> +
> +/*
> + * Region IDs
> + */
> +#define REGION_SHIFT		60UL
> +#define REGION_MASK		(0xfUL << REGION_SHIFT)
> +#define REGION_ID(ea)		(((unsigned long)(ea)) >> REGION_SHIFT)
> +
> +#define VMALLOC_REGION_ID	(REGION_ID(VMALLOC_START))
> +#define KERNEL_REGION_ID	(REGION_ID(PAGE_OFFSET))
> +#define VMEMMAP_REGION_ID	(0xfUL)	/* Server only */
> +#define USER_REGION_ID		(0UL)
> +
> +/*
> + * Defines the address of the vmemap area, in its own region on
> + * hash table CPUs and after the vmalloc space on Book3E
> + */
> +#ifdef CONFIG_PPC_BOOK3E
> +#define VMEMMAP_BASE		VMALLOC_END
> +#define VMEMMAP_END		KERN_IO_START
> +#else
> +#define VMEMMAP_BASE		(VMEMMAP_REGION_ID << REGION_SHIFT)
> +#endif
> +#define vmemmap			((struct page *)VMEMMAP_BASE)
> +
> +#ifdef CONFIG_PPC_MM_SLICES
> +#define HAVE_ARCH_UNMAPPED_AREA
> +#define HAVE_ARCH_UNMAPPED_AREA_TOPDOWN
> +#endif /* CONFIG_PPC_MM_SLICES */
> +
> +#endif
> diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h b/arch/powerpc/include/asm/pgtable-ppc64.h
> index 4a191c472867..9935e9b79524 100644
> --- a/arch/powerpc/include/asm/pgtable-ppc64.h
> +++ b/arch/powerpc/include/asm/pgtable-ppc64.h
> @@ -1,102 +1,8 @@
>  #ifndef _ASM_POWERPC_PGTABLE_PPC64_H_
>  #define _ASM_POWERPC_PGTABLE_PPC64_H_
> -/*
> - * This file contains the functions and defines necessary to modify and use
> - * the ppc64 hashed page table.
> - */
> -
> -#ifdef CONFIG_PPC_64K_PAGES
> -#include <asm/pgtable-ppc64-64k.h>
> -#else
> -#include <asm/pgtable-ppc64-4k.h>
> -#endif
> -#include <asm/barrier.h>
> -
> -#define FIRST_USER_ADDRESS	0
> -
> -/*
> - * Size of EA range mapped by our pagetables.
> - */
> -#define PGTABLE_EADDR_SIZE (PTE_INDEX_SIZE + PMD_INDEX_SIZE + \
> -                	    PUD_INDEX_SIZE + PGD_INDEX_SIZE + PAGE_SHIFT)
> -#define PGTABLE_RANGE (ASM_CONST(1) << PGTABLE_EADDR_SIZE)
> -
> -#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> -#define PMD_CACHE_INDEX	(PMD_INDEX_SIZE + 1)
> -#else
> -#define PMD_CACHE_INDEX	PMD_INDEX_SIZE
> -#endif
> -/*
> - * Define the address range of the kernel non-linear virtual area
> - */
> -
> -#ifdef CONFIG_PPC_BOOK3E
> -#define KERN_VIRT_START ASM_CONST(0x8000000000000000)
> -#else
> -#define KERN_VIRT_START ASM_CONST(0xD000000000000000)
> -#endif
> -#define KERN_VIRT_SIZE	ASM_CONST(0x0000100000000000)
> -
> -/*
> - * The vmalloc space starts at the beginning of that region, and
> - * occupies half of it on hash CPUs and a quarter of it on Book3E
> - * (we keep a quarter for the virtual memmap)
> - */
> -#define VMALLOC_START	KERN_VIRT_START
> -#ifdef CONFIG_PPC_BOOK3E
> -#define VMALLOC_SIZE	(KERN_VIRT_SIZE >> 2)
> -#else
> -#define VMALLOC_SIZE	(KERN_VIRT_SIZE >> 1)
> -#endif
> -#define VMALLOC_END	(VMALLOC_START + VMALLOC_SIZE)
> -
> -/*
> - * The second half of the kernel virtual space is used for IO mappings,
> - * it's itself carved into the PIO region (ISA and PHB IO space) and
> - * the ioremap space
> - *
> - *  ISA_IO_BASE = KERN_IO_START, 64K reserved area
> - *  PHB_IO_BASE = ISA_IO_BASE + 64K to ISA_IO_BASE + 2G, PHB IO spaces
> - * IOREMAP_BASE = ISA_IO_BASE + 2G to VMALLOC_START + PGTABLE_RANGE
> - */
> -#define KERN_IO_START	(KERN_VIRT_START + (KERN_VIRT_SIZE >> 1))
> -#define FULL_IO_SIZE	0x80000000ul
> -#define  ISA_IO_BASE	(KERN_IO_START)
> -#define  ISA_IO_END	(KERN_IO_START + 0x10000ul)
> -#define  PHB_IO_BASE	(ISA_IO_END)
> -#define  PHB_IO_END	(KERN_IO_START + FULL_IO_SIZE)
> -#define IOREMAP_BASE	(PHB_IO_END)
> -#define IOREMAP_END	(KERN_VIRT_START + KERN_VIRT_SIZE)
> -
> -
> -/*
> - * Region IDs
> - */
> -#define REGION_SHIFT		60UL
> -#define REGION_MASK		(0xfUL << REGION_SHIFT)
> -#define REGION_ID(ea)		(((unsigned long)(ea)) >> REGION_SHIFT)
> -
> -#define VMALLOC_REGION_ID	(REGION_ID(VMALLOC_START))
> -#define KERNEL_REGION_ID	(REGION_ID(PAGE_OFFSET))
> -#define VMEMMAP_REGION_ID	(0xfUL)	/* Server only */
> -#define USER_REGION_ID		(0UL)
> -
> -/*
> - * Defines the address of the vmemap area, in its own region on
> - * hash table CPUs and after the vmalloc space on Book3E
> - */
> -#ifdef CONFIG_PPC_BOOK3E
> -#define VMEMMAP_BASE		VMALLOC_END
> -#define VMEMMAP_END		KERN_IO_START
> -#else
> -#define VMEMMAP_BASE		(VMEMMAP_REGION_ID << REGION_SHIFT)
> -#endif
> -#define vmemmap			((struct page *)VMEMMAP_BASE)
>  
> +#include <asm/pgtable-ppc64-range.h>
>  
> -/*
> - * Include the PTE bits definitions
> - */
>  #ifdef CONFIG_PPC_BOOK3S
>  #include <asm/pte-hash64.h>
>  #else
> @@ -104,11 +10,6 @@
>  #endif
>  #include <asm/pte-common.h>
>  
> -#ifdef CONFIG_PPC_MM_SLICES
> -#define HAVE_ARCH_UNMAPPED_AREA
> -#define HAVE_ARCH_UNMAPPED_AREA_TOPDOWN
> -#endif /* CONFIG_PPC_MM_SLICES */
> -
>  #ifndef __ASSEMBLY__
>  
>  /*
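
The alternative mentioned in the NOTE of the quoted patch, a forward typedef
instead of a new header, relies on the fact that C only needs an incomplete
type to declare and pass pointers. A minimal sketch, assuming a stand-in
function ptl_differs() (hypothetical, not the body of
pmd_move_must_withdraw()) and a dummy struct layout:

```c
/* What the page table header would carry: an incomplete type only. */
struct spinlock;                     /* size and members unknown here */
typedef struct spinlock spinlock_t;  /* enough to declare pointers */

/*
 * Hypothetical stand-in for an inline helper that takes lock pointers.
 * It compiles without the full definition because the pointers are only
 * compared, never dereferenced.
 */
static inline int ptl_differs(spinlock_t *new_ptl, spinlock_t *old_ptl)
{
	return new_ptl != old_ptl;
}

/*
 * The full definition arrives later, normally from the spinlock header.
 * The member below is a dummy for this sketch.
 */
struct spinlock {
	int locked;
};
```

One caveat with this approach: the header that owns the real typedef will
declare spinlock_t again, and repeating an identical typedef is only valid
C from C11 onward, so older language modes depend on compiler leniency.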


* Re: linux-next: build failure after merge of the final tree
From: Benjamin Herrenschmidt @ 2014-01-06 23:12 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: Mahesh Salgaonkar, linux-next, linuxppc-dev, linux-kernel
In-Reply-To: <20140106202856.5630590efc4bd6a466b7a668@canb.auug.org.au>

On Mon, 2014-01-06 at 20:28 +1100, Stephen Rothwell wrote:
> Hi all,
> 
> After merging the final tree, today's linux-next build (powerpc
> allyesconfig) failed like this:
> 
> arch/powerpc/kernel/exceptions-64s.S: Assembler messages:
> arch/powerpc/kernel/exceptions-64s.S:1312: Error: attempt to move .org backwards
> 
> The last time I got this error, I needed to apply patch "powerpc: Fix
> "attempt to move .org backwards" error", but that has been included in
> the powerpc tree now, so I guess something else has added code in a
> critical place. :-(
> 
> I have just left this broken for today.

I had to modify that patch when applying it, it's possible that the
"new" version isn't making as much room. Without that change it would
fail the build on some of my configs due to some of the asm for the
maskable exception handling being too far from the conditional branches
that call it.

I think it's time we do a bit of re-org of that file to figure out
precisely what has to be where and move things out more aggressively.

Cheers,
Ben.


* Re: [question] Can the execution of the atomtic operation instruction pair lwarx/stwcx be interrrupted by local HW interruptions?
From: Scott Wood @ 2014-01-06 22:05 UTC (permalink / raw)
  To: wyang; +Cc: Linuxppc-dev, Gavin Hu
In-Reply-To: <52CA3ED7.2020407@gmail.com>

On Mon, 2014-01-06 at 13:27 +0800, wyang wrote:
> 
> On 01/06/2014 11:41 AM, Gavin Hu wrote:
> 
> > Thanks for your response.  :)
> > But that means that these atomic primitives like atomic_add()
> > aren't actually atomic on the PPC architecture, right? Because they
> > can be interrupted by local HW interrupts. Theoretically, the ISR
> > can also access the atomic global variable.
> > 
> 
> Nope, my understanding is that if you want to sync kernel primitive
> code with an ISR, you are responsible for disabling local interrupts.
> atomic_add is not guaranteed to handle such a case.

atomic_add() and other atomics do handle that case.  Interrupts are not
disabled, but there's a stwcx. in the interrupt return code to make sure
the reservation gets cleared.
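
A rough way to see why clearing the reservation is sufficient is a
single-CPU model of the load-reserve / store-conditional loop. Everything
below is a hypothetical simulation in plain C (struct cpu_model and the
model_* functions are invented for this sketch); it is not the kernel's
atomic_add() and it ignores other CPUs and memory ordering.

```c
#include <stdbool.h>

/* Hypothetical single-CPU model of a reservation; not kernel code. */
struct cpu_model {
	int  var_a;        /* the shared counter */
	bool reservation;  /* set by lwarx, consumed or cleared by stwcx. */
	int  retries;      /* how often a store-conditional failed */
};

static int model_lwarx(struct cpu_model *c)
{
	c->reservation = true;
	return c->var_a;
}

/* Returns true when the conditional store succeeded. */
static bool model_stwcx(struct cpu_model *c, int val)
{
	bool ok = c->reservation;

	if (ok)
		c->var_a = val;
	c->reservation = false;  /* the reservation is gone either way */
	return ok;
}

static void model_atomic_add(struct cpu_model *c, int a, bool irq_after_lwarx);

/* ISR: does its own atomic add; the return path then drops the reservation. */
static void model_interrupt(struct cpu_model *c)
{
	model_atomic_add(c, 1, false);
	c->reservation = false;  /* models the stwcx. in the interrupt return code */
}

static void model_atomic_add(struct cpu_model *c, int a, bool irq_after_lwarx)
{
	int t;

	for (;;) {
		t = model_lwarx(c);
		if (irq_after_lwarx) {
			irq_after_lwarx = false;  /* the interrupt fires once */
			model_interrupt(c);
		}
		t += a;
		if (model_stwcx(c, t))
			break;
		c->retries++;  /* stale value discarded, reload and retry */
	}
}
```

Starting from var_a = 0, an interrupted model_atomic_add(&c, 1, true) ends
with var_a == 2 after one retry: the interrupted add reloads the value the
ISR wrote rather than storing its stale 0 + 1. In this model neither
increment is lost, which is the property an atomic add has to provide.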

-Scott


* Re: [PATCH 1/2] powerpc: Fix the setup of CPU-to-Node mappings during CPU online
From: Srivatsa S. Bhat @ 2014-01-06 16:04 UTC (permalink / raw)
  To: benh, paulus, nfont; +Cc: maddy, linuxppc-dev, linux-kernel
In-Reply-To: <20131230113517.11508.7224.stgit@srivatsabhat.in.ibm.com>

On 12/30/2013 05:05 PM, Srivatsa S. Bhat wrote:
> On POWER platforms, the hypervisor can notify the guest kernel about dynamic
> changes in the cpu-numa associativity (VPHN topology update). Hence the
> cpu-to-node mappings that we got from the firmware during boot, may no longer
> be valid after such updates. This is handled using the arch_update_cpu_topology()
> hook in the scheduler, and the sched-domains are rebuilt according to the new
> mappings.
> 
> But unfortunately, at the moment, CPU hotplug ignores these updated mappings
> and instead queries the firmware for the cpu-to-numa relationships and uses
> them during CPU online. So the kernel can end up assigning wrong NUMA nodes
> to CPUs during subsequent CPU hotplug online operations (after booting).
> 
> Further, a particularly problematic scenario can result from this bug:
> On POWER platforms, the SMT mode can be switched between 1, 2, 4 (and even 8)
> threads per core. The switch to Single-Threaded (ST) mode is performed by
> offlining all except the first CPU thread in each core. Switching back to
> SMT mode involves onlining those other threads back, in each core.
> 
> Now consider this scenario:
> 
> 1. During boot, the kernel gets the cpu-to-node mappings from the firmware
>    and assigns the CPUs to NUMA nodes appropriately, during CPU online.
> 
> 2. Later on, the hypervisor updates the cpu-to-node mappings dynamically and
>    communicates this update to the kernel. The kernel in turn updates its
>    cpu-to-node associations and rebuilds its sched domains. Everything is
>    fine so far.
> 
> 3. Now, the user switches the machine from SMT to ST mode (say, by running
>    ppc64_cpu --smt=1). This involves offlining all except 1 thread in each
>    core.
> 
> 4. The user then tries to switch back from ST to SMT mode (say, by running
>    ppc64_cpu --smt=4), and this involves onlining those threads back. Since
>    CPU hotplug ignores the new mappings, it queries the firmware and tries to
>    associate the newly onlined sibling threads to the old NUMA nodes. This
>    results in sibling threads within the same core getting associated with
>    different NUMA nodes, which is incorrect.
> 
>    The scheduler's build-sched-domains code gets thoroughly confused with this
>    and enters an infinite loop and causes soft-lockups, as explained in detail
>    in commit 3be7db6ab (powerpc: VPHN topology change updates all siblings).
> 
> 
> So to fix this, use the numa_cpu_lookup_table to remember the updated
> cpu-to-node mappings, and use them during CPU hotplug online operations.
> Further, we also need to ensure that all threads in a core are assigned to a
> common NUMA node, irrespective of whether all those threads were online during
> the topology update. To achieve this, we take care not to use cpu_sibling_mask()
> since it is not hotplug invariant. Instead, we use cpu_first_thread_sibling()
> and set up the mappings manually using the 'threads_per_core' value for that
> particular platform. This helps us ensure that we don't hit this bug with any
> combination of CPU hotplug and SMT mode switching.
> 
> Cc: stable@vger.kernel.org
> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> ---
>

Any thoughts about these patches?

Regards,
Srivatsa S. Bhat

 
>  arch/powerpc/include/asm/topology.h |   10 +++++
>  arch/powerpc/mm/numa.c              |   70 ++++++++++++++++++++++++++++++++++-
>  2 files changed, 76 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h
> index 89e3ef2..d0b5fca 100644
> --- a/arch/powerpc/include/asm/topology.h
> +++ b/arch/powerpc/include/asm/topology.h
> @@ -22,7 +22,15 @@ struct device_node;
> 
>  static inline int cpu_to_node(int cpu)
>  {
> -	return numa_cpu_lookup_table[cpu];
> +	int nid;
> +
> +	nid = numa_cpu_lookup_table[cpu];
> +
> +	/*
> +	 * During early boot, the numa-cpu lookup table might not have been
> +	 * setup for all CPUs yet. In such cases, default to node 0.
> +	 */
> +	return (nid < 0) ? 0 : nid;
>  }
> 
>  #define parent_node(node)	(node)
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index 078d3e0..6847d50 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -31,6 +31,8 @@
>  #include <asm/sparsemem.h>
>  #include <asm/prom.h>
>  #include <asm/smp.h>
> +#include <asm/cputhreads.h>
> +#include <asm/topology.h>
>  #include <asm/firmware.h>
>  #include <asm/paca.h>
>  #include <asm/hvcall.h>
> @@ -152,9 +154,22 @@ static void __init get_node_active_region(unsigned long pfn,
>  	}
>  }
> 
> -static void map_cpu_to_node(int cpu, int node)
> +static void reset_numa_cpu_lookup_table(void)
> +{
> +	unsigned int cpu;
> +
> +	for_each_possible_cpu(cpu)
> +		numa_cpu_lookup_table[cpu] = -1;
> +}
> +
> +static void update_numa_cpu_lookup_table(unsigned int cpu, int node)
>  {
>  	numa_cpu_lookup_table[cpu] = node;
> +}
> +
> +static void map_cpu_to_node(int cpu, int node)
> +{
> +	update_numa_cpu_lookup_table(cpu, node);
> 
>  	dbg("adding cpu %d to node %d\n", cpu, node);
> 
> @@ -522,11 +537,24 @@ static int of_drconf_to_nid_single(struct of_drconf_cell *drmem,
>   */
>  static int numa_setup_cpu(unsigned long lcpu)
>  {
> -	int nid = 0;
> -	struct device_node *cpu = of_get_cpu_node(lcpu, NULL);
> +	int nid;
> +	struct device_node *cpu;
> +
> +	/*
> +	 * If a valid cpu-to-node mapping is already available, use it
> +	 * directly instead of querying the firmware, since it represents
> +	 * the most recent mapping notified to us by the platform (eg: VPHN).
> +	 */
> +	if ((nid = numa_cpu_lookup_table[lcpu]) >= 0) {
> +		map_cpu_to_node(lcpu, nid);
> +		return nid;
> +	}
> +
> +	cpu = of_get_cpu_node(lcpu, NULL);
> 
>  	if (!cpu) {
>  		WARN_ON(1);
> +		nid = 0;
>  		goto out;
>  	}
> 
> @@ -1067,6 +1095,7 @@ void __init do_init_bootmem(void)
>  	 */
>  	setup_node_to_cpumask_map();
> 
> +	reset_numa_cpu_lookup_table();
>  	register_cpu_notifier(&ppc64_numa_nb);
>  	cpu_numa_callback(&ppc64_numa_nb, CPU_UP_PREPARE,
>  			  (void *)(unsigned long)boot_cpuid);
> @@ -1445,6 +1474,33 @@ static int update_cpu_topology(void *data)
>  	return 0;
>  }
> 
> +static int update_lookup_table(void *data)
> +{
> +	struct topology_update_data *update;
> +
> +	if (!data)
> +		return -EINVAL;
> +
> +	/*
> +	 * Upon topology update, the numa-cpu lookup table needs to be updated
> +	 * for all threads in the core, including offline CPUs, to ensure that
> +	 * future hotplug operations respect the cpu-to-node associativity
> +	 * properly.
> +	 */
> +	for (update = data; update; update = update->next) {
> +		int nid, base, j;
> +
> +		nid = update->new_nid;
> +		base = cpu_first_thread_sibling(update->cpu);
> +
> +		for (j = 0; j < threads_per_core; j++) {
> +			update_numa_cpu_lookup_table(base + j, nid);
> +		}
> +	}
> +
> +	return 0;
> +}
> +
>  /*
>   * Update the node maps and sysfs entries for each cpu whose home node
>   * has changed. Returns 1 when the topology has changed, and 0 otherwise.
> @@ -1513,6 +1569,14 @@ int arch_update_cpu_topology(void)
> 
>  	stop_machine(update_cpu_topology, &updates[0], &updated_cpus);
> 
> +	/*
> +	 * Update the numa-cpu lookup table with the new mappings, even for
> +	 * offline CPUs. It is best to perform this update from the stop-
> +	 * machine context.
> +	 */
> +	stop_machine(update_lookup_table, &updates[0],
> +					cpumask_of(raw_smp_processor_id()));
> +
>  	for (ud = &updates[0]; ud; ud = ud->next) {
>  		unregister_cpu_under_node(ud->cpu, ud->old_nid);
>  		register_cpu_under_node(ud->cpu, ud->new_nid);
> 
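
The core of the approach in the quoted patch, remembering the mapping for
every thread of a core whether or not it is online, can be sketched in
userspace as follows. NR_CPUS, the SMT4 setting and update_core_mapping()
are assumptions made for this sketch, and the sibling computation assumes
threads_per_core is a power of two.

```c
#define NR_CPUS 16                /* assumed size for the sketch */

static int threads_per_core = 4;  /* assumed SMT4; must be a power of two */
static int numa_cpu_lookup_table[NR_CPUS];

/* Mark every CPU as "no mapping known yet". */
static void reset_numa_cpu_lookup_table(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		numa_cpu_lookup_table[cpu] = -1;
}

/* First hardware thread of the core that cpu belongs to. */
static int cpu_first_thread_sibling(int cpu)
{
	return cpu & ~(threads_per_core - 1);
}

/*
 * Hypothetical helper: record the new node for all threads of the core,
 * including threads that are currently offline.
 */
static void update_core_mapping(int cpu, int nid)
{
	int base = cpu_first_thread_sibling(cpu);

	for (int j = 0; j < threads_per_core; j++)
		numa_cpu_lookup_table[base + j] = nid;
}

/* Fall back to node 0 while no mapping has been recorded. */
static int cpu_to_node(int cpu)
{
	int nid = numa_cpu_lookup_table[cpu];

	return (nid < 0) ? 0 : nid;
}
```

After reset_numa_cpu_lookup_table() and update_core_mapping(6, 3), CPUs 4
to 7 all report node 3 regardless of which of them were online at the time,
so a later switch from ST back to SMT mode finds consistent entries for the
sibling threads.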


* Re: [PATCH] ASoC: fsl_ssi: Fix printing return code on clk error
From: Mark Brown @ 2014-01-06 13:20 UTC (permalink / raw)
  To: Alexander Shiyan
  Cc: alsa-devel, Liam Girdwood, Takashi Iwai, Timur Tabi,
	Jaroslav Kysela, linuxppc-dev
In-Reply-To: <1388902876-29964-1-git-send-email-shc_work@mail.ru>


On Sun, Jan 05, 2014 at 10:21:16AM +0400, Alexander Shiyan wrote:
> Signed-off-by: Alexander Shiyan <shc_work@mail.ru>

Applied, thanks.



* Re: [RFC] linux/pci: move pci_platform_pm_ops to linux/pci.h
From: Rafael J. Wysocki @ 2014-01-06 12:13 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Linux PM list, roy.zang, Dongsheng Wang,
	linux-pci@vger.kernel.org, Scott Wood, linuxppc-dev
In-Reply-To: <CAErSpo7+8sLZVi24fi+xxTEW2_DTJ2_zUrW+vobMGsvCEFwY_Q@mail.gmail.com>

On Friday, December 20, 2013 09:42:59 AM Bjorn Helgaas wrote:
> On Fri, Dec 20, 2013 at 3:03 AM, Dongsheng Wang
> <dongsheng.wang@freescale.com> wrote:
> > From: Wang Dongsheng <dongsheng.wang@freescale.com>
> >
> > make Freescale platform use pci_platform_pm_ops struct.
> 
> This changelog doesn't say anything about what the patch does.
> 
> I infer that you want to use pci_platform_pm_ops from some Freescale
> code.  This patch should be posted along with the patches that add
> that Freescale code, so we can see how you intend to use it.
> 
> The existing use is in drivers/pci/pci-acpi.c, so it's possible that
> your new use should be added in the same way, in drivers/pci, so we
> don't have to make pci_platform_pm_ops part of the public PCI
> interface in include/linux/pci.h.
> 
> That said, if Rafael thinks this makes sense, it's OK with me.

Well, I'd like to know why exactly the change is needed in the first place.

Thanks!

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


* Re: [PATCH v2 0/9] cpuidle: rework device state count handling
From: Rafael J. Wysocki @ 2014-01-06 12:12 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz
  Cc: linux-samsung-soc, linux-pm, daniel.lezcano, linux-kernel,
	kyungmin.park, linuxppc-dev, lenb
In-Reply-To: <1387565251-7051-1-git-send-email-b.zolnierkie@samsung.com>

On Friday, December 20, 2013 07:47:22 PM Bartlomiej Zolnierkiewicz wrote:
> Hi,
> 
> Some cpuidle drivers assume that cpuidle core will handle cases where
> device->state_count is smaller than driver->state_count, unfortunately
> currently this is untrue (device->state_count is used only for handling
> cpuidle state sysfs entries and driver->state_count is used for all
> other cases) and will not be fixed in the future as device->state_count
> is planned to be removed [1].
> 
> This patchset fixes such drivers (ARM EXYNOS cpuidle driver and ACPI
> cpuidle driver), removes superfluous device->state_count initialization
> from drivers for which device->state_count equals driver->state_count
> (POWERPC pseries cpuidle driver and intel_idle driver) and finally
> removes state_count field from struct cpuidle_device.
> 
> Additionally (while at it) this patchset fixes C1E promotion disable
> quirk handling (in intel_idle driver) and converts cpuidle drivers code
> to use the common cpuidle_[un]register() routines (in POWERPC pseries
> cpuidle driver and intel_idle driver).
> 
> [1] http://permalink.gmane.org/gmane.linux.power-management.general/36908
> 
> Reference to v1:
> 	http://comments.gmane.org/gmane.linux.power-management.general/37390
> 
> Changes since v1:
> - synced patch series with next-20131220
> - added ACKs from Daniel Lezcano

I've queued up the series for 3.14, thanks!

> Best regards,
> --
> Bartlomiej Zolnierkiewicz
> Samsung R&D Institute Poland
> Samsung Electronics
> 
> 
> Bartlomiej Zolnierkiewicz (9):
>   ARM: EXYNOS: cpuidle: fix AFTR mode check
>   POWERPC: pseries: cpuidle: remove superfluous dev->state_count
>     initialization
>   POWERPC: pseries: cpuidle: use the common cpuidle_[un]register()
>     routines
>   ACPI / cpuidle: fix max idle state handling with hotplug CPU support
>   ACPI / cpuidle: remove dev->state_count setting
>   intel_idle: do C1E promotion disable quirk for hotplugged CPUs
>   intel_idle: remove superfluous dev->state_count initialization
>   intel_idle: use the common cpuidle_[un]register() routines
>   cpuidle: remove state_count field from struct cpuidle_device
> 
>  arch/arm/mach-exynos/cpuidle.c                  |   8 +-
>  arch/powerpc/platforms/pseries/processor_idle.c |  59 +---------
>  drivers/acpi/processor_idle.c                   |  29 +++--
>  drivers/cpuidle/cpuidle.c                       |   3 -
>  drivers/cpuidle/sysfs.c                         |   5 +-
>  drivers/idle/intel_idle.c                       | 140 +++++-------------------
>  include/linux/cpuidle.h                         |   1 -
>  7 files changed, 51 insertions(+), 194 deletions(-)
> 
> 

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


* linux-next: build failure after merge of the final tree
From: Stephen Rothwell @ 2014-01-06  9:28 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, linuxppc-dev
  Cc: Mahesh Salgaonkar, linux-next, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 585 bytes --]

Hi all,

After merging the final tree, today's linux-next build (powerpc
allyesconfig) failed like this:

arch/powerpc/kernel/exceptions-64s.S: Assembler messages:
arch/powerpc/kernel/exceptions-64s.S:1312: Error: attempt to move .org backwards

The last time I got this error, I needed to apply patch "powerpc: Fix
"attempt to move .org backwards" error", but that has been included in
the powerpc tree now, so I guess something else has added code in a
critical place. :-(

I have just left this broken for today.
-- 
Cheers,
Stephen Rothwell <sfr@canb.auug.org.au>

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* Re: Build regressions/improvements in v3.13-rc7
From: Geert Uytterhoeven @ 2014-01-06  9:05 UTC (permalink / raw)
  To: linux-kernel@vger.kernel.org
  Cc: the arch/x86 maintainers, linuxppc-dev@lists.ozlabs.org,
	Linux-sh list
In-Reply-To: <1388998868-31448-1-git-send-email-geert@linux-m68k.org>

On Mon, Jan 6, 2014 at 10:01 AM, Geert Uytterhoeven
<geert@linux-m68k.org> wrote:
> JFYI, when comparing v3.13-rc7[1] to v3.13-rc6[3], the summaries are:
>   - build errors: +14/-4

  + /scratch/kisskb/src/arch/sh/mm/cache-sh4.c: error:
'cached_to_uncached' undeclared (first use in this function):  =>
99:17
  + /scratch/kisskb/src/arch/sh/mm/cache-sh4.c: error: implicit
declaration of function 'cpu_context'
[-Werror=implicit-function-declaration]:  => 192:2
  + /scratch/kisskb/src/drivers/mtd/maps/vmu-flash.c: error: (near
initialization for 'vmu_flash_driver.drv'):  => 805:3, 803:3, 804:3
  + /scratch/kisskb/src/drivers/mtd/maps/vmu-flash.c: error: expected
declaration specifiers or '...' before string constant:  => 824:20,
822:16, 823:15
  + /scratch/kisskb/src/drivers/mtd/maps/vmu-flash.c: error: field
name not in record or union initializer:  => 805:3, 803:3, 804:3
  + /scratch/kisskb/src/include/linux/maple.h: error: field 'dev' has
incomplete type:  => 80:16
  + /scratch/kisskb/src/include/linux/maple.h: error: field 'drv' has
incomplete type:  => 85:23

sh-randconfig

  + /scratch/kisskb/src/drivers/tty/serial/nwpserial.c: error:
implicit declaration of function 'udelay'
[-Werror=implicit-function-declaration]:  => 53:3
  + error: No rule to make target drivers/scsi/aic7xxx/aicasm/*.[chyl]:  => N/A

powerpc-randconfig

  + error: No rule to make target /etc/sound/msndinit.bin:  => N/A
  + error: No rule to make target /etc/sound/msndperm.bin:  => N/A
  + error: No rule to make target /etc/sound/pndsperm.bin:  => N/A
  + error: No rule to make target /etc/sound/pndspini.bin:  => N/A

i386-randconfig

> [1] http://kisskb.ellerman.id.au/kisskb/head/7037/ (119 out of 120 configs)
> [3] http://kisskb.ellerman.id.au/kisskb/head/7026/ (119 out of 120 configs)

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply

* [PATCH -V3 1/2] powerpc: mm: Move ppc64 page table range definitions to separate header
From: Aneesh Kumar K.V @ 2014-01-06  9:03 UTC (permalink / raw)
  To: benh, paulus, aarcange, kirill.shutemov
  Cc: linux-mm, linuxppc-dev, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

This avoids mmu-hash64.h including pgtable-ppc64.h. That inclusion
causes issues like

  CC      arch/powerpc/kernel/asm-offsets.s
In file included from /home/aneesh/linus/arch/powerpc/include/asm/mmu-hash64.h:23:0,
                 from /home/aneesh/linus/arch/powerpc/include/asm/mmu.h:196,
                 from /home/aneesh/linus/arch/powerpc/include/asm/lppaca.h:36,
                 from /home/aneesh/linus/arch/powerpc/include/asm/paca.h:21,
                 from /home/aneesh/linus/arch/powerpc/include/asm/hw_irq.h:41,
                 from /home/aneesh/linus/arch/powerpc/include/asm/irqflags.h:11,
                 from include/linux/irqflags.h:15,
                 from include/linux/spinlock.h:53,
                 from include/linux/seqlock.h:35,
                 from include/linux/time.h:5,
                 from include/uapi/linux/timex.h:56,
                 from include/linux/timex.h:56,
                 from include/linux/sched.h:17,
                 from arch/powerpc/kernel/asm-offsets.c:17:
/home/aneesh/linus/arch/powerpc/include/asm/pgtable-ppc64.h:563:42: error: unknown type name ‘spinlock_t’
 static inline int pmd_move_must_withdraw(spinlock_t *new_pmd_ptl,

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---

NOTE: We can either do this or stick a typedef struct spinlock spinlock_t; in pgtable-ppc64.h
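
The alternative mentioned in the NOTE relies on the fact that C allows pointers to an incomplete struct type. A minimal userspace sketch of that idea follows; the `sketch_` names are stand-ins invented for illustration and are not the kernel's spinlock types.

```c
/*
 * Forward declaration: the struct is still incomplete here, but
 * pointers to it can already appear in prototypes and inline helpers.
 * All sketch_* names are hypothetical stand-ins.
 */
struct sketch_spinlock;
typedef struct sketch_spinlock sketch_spinlock_t;

/* Only compares pointers, so the struct layout is not needed yet. */
static inline int sketch_same_lock(sketch_spinlock_t *a, sketch_spinlock_t *b)
{
	return a == b;
}

/* The full definition may arrive later, e.g. from another header. */
struct sketch_spinlock {
	int locked;
};
```

The trade-off is that the forward typedef has to stay in sync with the real definition, whereas splitting the header removes the include cycle itself.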

 arch/powerpc/include/asm/mmu-hash64.h          |   2 +-
 arch/powerpc/include/asm/pgtable-ppc64-range.h | 101 +++++++++++++++++++++++++
 arch/powerpc/include/asm/pgtable-ppc64.h       | 101 +------------------------
 3 files changed, 103 insertions(+), 101 deletions(-)
 create mode 100644 arch/powerpc/include/asm/pgtable-ppc64-range.h

diff --git a/arch/powerpc/include/asm/mmu-hash64.h b/arch/powerpc/include/asm/mmu-hash64.h
index 807014dde821..895b4df31fec 100644
--- a/arch/powerpc/include/asm/mmu-hash64.h
+++ b/arch/powerpc/include/asm/mmu-hash64.h
@@ -20,7 +20,7 @@
  * need for various slices related matters. Note that this isn't the
  * complete pgtable.h but only a portion of it.
  */
-#include <asm/pgtable-ppc64.h>
+#include <asm/pgtable-ppc64-range.h>
 #include <asm/bug.h>
 
 /*
diff --git a/arch/powerpc/include/asm/pgtable-ppc64-range.h b/arch/powerpc/include/asm/pgtable-ppc64-range.h
new file mode 100644
index 000000000000..b48b089fb209
--- /dev/null
+++ b/arch/powerpc/include/asm/pgtable-ppc64-range.h
@@ -0,0 +1,101 @@
+#ifndef _ASM_POWERPC_PGTABLE_PPC64_RANGE_H_
+#define _ASM_POWERPC_PGTABLE_PPC64_RANGE_H_
+/*
+ * This file contains the functions and defines necessary to modify and use
+ * the ppc64 hashed page table.
+ */
+
+#ifdef CONFIG_PPC_64K_PAGES
+#include <asm/pgtable-ppc64-64k.h>
+#else
+#include <asm/pgtable-ppc64-4k.h>
+#endif
+#include <asm/barrier.h>
+
+#define FIRST_USER_ADDRESS	0
+
+/*
+ * Size of EA range mapped by our pagetables.
+ */
+#define PGTABLE_EADDR_SIZE (PTE_INDEX_SIZE + PMD_INDEX_SIZE + \
+			    PUD_INDEX_SIZE + PGD_INDEX_SIZE + PAGE_SHIFT)
+#define PGTABLE_RANGE (ASM_CONST(1) << PGTABLE_EADDR_SIZE)
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#define PMD_CACHE_INDEX	(PMD_INDEX_SIZE + 1)
+#else
+#define PMD_CACHE_INDEX	PMD_INDEX_SIZE
+#endif
+/*
+ * Define the address range of the kernel non-linear virtual area
+ */
+
+#ifdef CONFIG_PPC_BOOK3E
+#define KERN_VIRT_START ASM_CONST(0x8000000000000000)
+#else
+#define KERN_VIRT_START ASM_CONST(0xD000000000000000)
+#endif
+#define KERN_VIRT_SIZE	ASM_CONST(0x0000100000000000)
+
+/*
+ * The vmalloc space starts at the beginning of that region, and
+ * occupies half of it on hash CPUs and a quarter of it on Book3E
+ * (we keep a quarter for the virtual memmap)
+ */
+#define VMALLOC_START	KERN_VIRT_START
+#ifdef CONFIG_PPC_BOOK3E
+#define VMALLOC_SIZE	(KERN_VIRT_SIZE >> 2)
+#else
+#define VMALLOC_SIZE	(KERN_VIRT_SIZE >> 1)
+#endif
+#define VMALLOC_END	(VMALLOC_START + VMALLOC_SIZE)
+
+/*
+ * The second half of the kernel virtual space is used for IO mappings,
+ * it's itself carved into the PIO region (ISA and PHB IO space) and
+ * the ioremap space
+ *
+ *  ISA_IO_BASE = KERN_IO_START, 64K reserved area
+ *  PHB_IO_BASE = ISA_IO_BASE + 64K to ISA_IO_BASE + 2G, PHB IO spaces
+ * IOREMAP_BASE = ISA_IO_BASE + 2G to VMALLOC_START + PGTABLE_RANGE
+ */
+#define KERN_IO_START	(KERN_VIRT_START + (KERN_VIRT_SIZE >> 1))
+#define FULL_IO_SIZE	0x80000000ul
+#define  ISA_IO_BASE	(KERN_IO_START)
+#define  ISA_IO_END	(KERN_IO_START + 0x10000ul)
+#define  PHB_IO_BASE	(ISA_IO_END)
+#define  PHB_IO_END	(KERN_IO_START + FULL_IO_SIZE)
+#define IOREMAP_BASE	(PHB_IO_END)
+#define IOREMAP_END	(KERN_VIRT_START + KERN_VIRT_SIZE)
+
+
+/*
+ * Region IDs
+ */
+#define REGION_SHIFT		60UL
+#define REGION_MASK		(0xfUL << REGION_SHIFT)
+#define REGION_ID(ea)		(((unsigned long)(ea)) >> REGION_SHIFT)
+
+#define VMALLOC_REGION_ID	(REGION_ID(VMALLOC_START))
+#define KERNEL_REGION_ID	(REGION_ID(PAGE_OFFSET))
+#define VMEMMAP_REGION_ID	(0xfUL)	/* Server only */
+#define USER_REGION_ID		(0UL)
+
+/*
+ * Defines the address of the vmemap area, in its own region on
+ * hash table CPUs and after the vmalloc space on Book3E
+ */
+#ifdef CONFIG_PPC_BOOK3E
+#define VMEMMAP_BASE		VMALLOC_END
+#define VMEMMAP_END		KERN_IO_START
+#else
+#define VMEMMAP_BASE		(VMEMMAP_REGION_ID << REGION_SHIFT)
+#endif
+#define vmemmap			((struct page *)VMEMMAP_BASE)
+
+#ifdef CONFIG_PPC_MM_SLICES
+#define HAVE_ARCH_UNMAPPED_AREA
+#define HAVE_ARCH_UNMAPPED_AREA_TOPDOWN
+#endif /* CONFIG_PPC_MM_SLICES */
+
+#endif
diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h b/arch/powerpc/include/asm/pgtable-ppc64.h
index 4a191c472867..9935e9b79524 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64.h
+++ b/arch/powerpc/include/asm/pgtable-ppc64.h
@@ -1,102 +1,8 @@
 #ifndef _ASM_POWERPC_PGTABLE_PPC64_H_
 #define _ASM_POWERPC_PGTABLE_PPC64_H_
-/*
- * This file contains the functions and defines necessary to modify and use
- * the ppc64 hashed page table.
- */
-
-#ifdef CONFIG_PPC_64K_PAGES
-#include <asm/pgtable-ppc64-64k.h>
-#else
-#include <asm/pgtable-ppc64-4k.h>
-#endif
-#include <asm/barrier.h>
-
-#define FIRST_USER_ADDRESS	0
-
-/*
- * Size of EA range mapped by our pagetables.
- */
-#define PGTABLE_EADDR_SIZE (PTE_INDEX_SIZE + PMD_INDEX_SIZE + \
-                	    PUD_INDEX_SIZE + PGD_INDEX_SIZE + PAGE_SHIFT)
-#define PGTABLE_RANGE (ASM_CONST(1) << PGTABLE_EADDR_SIZE)
-
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-#define PMD_CACHE_INDEX	(PMD_INDEX_SIZE + 1)
-#else
-#define PMD_CACHE_INDEX	PMD_INDEX_SIZE
-#endif
-/*
- * Define the address range of the kernel non-linear virtual area
- */
-
-#ifdef CONFIG_PPC_BOOK3E
-#define KERN_VIRT_START ASM_CONST(0x8000000000000000)
-#else
-#define KERN_VIRT_START ASM_CONST(0xD000000000000000)
-#endif
-#define KERN_VIRT_SIZE	ASM_CONST(0x0000100000000000)
-
-/*
- * The vmalloc space starts at the beginning of that region, and
- * occupies half of it on hash CPUs and a quarter of it on Book3E
- * (we keep a quarter for the virtual memmap)
- */
-#define VMALLOC_START	KERN_VIRT_START
-#ifdef CONFIG_PPC_BOOK3E
-#define VMALLOC_SIZE	(KERN_VIRT_SIZE >> 2)
-#else
-#define VMALLOC_SIZE	(KERN_VIRT_SIZE >> 1)
-#endif
-#define VMALLOC_END	(VMALLOC_START + VMALLOC_SIZE)
-
-/*
- * The second half of the kernel virtual space is used for IO mappings,
- * it's itself carved into the PIO region (ISA and PHB IO space) and
- * the ioremap space
- *
- *  ISA_IO_BASE = KERN_IO_START, 64K reserved area
- *  PHB_IO_BASE = ISA_IO_BASE + 64K to ISA_IO_BASE + 2G, PHB IO spaces
- * IOREMAP_BASE = ISA_IO_BASE + 2G to VMALLOC_START + PGTABLE_RANGE
- */
-#define KERN_IO_START	(KERN_VIRT_START + (KERN_VIRT_SIZE >> 1))
-#define FULL_IO_SIZE	0x80000000ul
-#define  ISA_IO_BASE	(KERN_IO_START)
-#define  ISA_IO_END	(KERN_IO_START + 0x10000ul)
-#define  PHB_IO_BASE	(ISA_IO_END)
-#define  PHB_IO_END	(KERN_IO_START + FULL_IO_SIZE)
-#define IOREMAP_BASE	(PHB_IO_END)
-#define IOREMAP_END	(KERN_VIRT_START + KERN_VIRT_SIZE)
-
-
-/*
- * Region IDs
- */
-#define REGION_SHIFT		60UL
-#define REGION_MASK		(0xfUL << REGION_SHIFT)
-#define REGION_ID(ea)		(((unsigned long)(ea)) >> REGION_SHIFT)
-
-#define VMALLOC_REGION_ID	(REGION_ID(VMALLOC_START))
-#define KERNEL_REGION_ID	(REGION_ID(PAGE_OFFSET))
-#define VMEMMAP_REGION_ID	(0xfUL)	/* Server only */
-#define USER_REGION_ID		(0UL)
-
-/*
- * Defines the address of the vmemap area, in its own region on
- * hash table CPUs and after the vmalloc space on Book3E
- */
-#ifdef CONFIG_PPC_BOOK3E
-#define VMEMMAP_BASE		VMALLOC_END
-#define VMEMMAP_END		KERN_IO_START
-#else
-#define VMEMMAP_BASE		(VMEMMAP_REGION_ID << REGION_SHIFT)
-#endif
-#define vmemmap			((struct page *)VMEMMAP_BASE)
 
+#include <asm/pgtable-ppc64-range.h>
 
-/*
- * Include the PTE bits definitions
- */
 #ifdef CONFIG_PPC_BOOK3S
 #include <asm/pte-hash64.h>
 #else
@@ -104,11 +10,6 @@
 #endif
 #include <asm/pte-common.h>
 
-#ifdef CONFIG_PPC_MM_SLICES
-#define HAVE_ARCH_UNMAPPED_AREA
-#define HAVE_ARCH_UNMAPPED_AREA_TOPDOWN
-#endif /* CONFIG_PPC_MM_SLICES */
-
 #ifndef __ASSEMBLY__
 
 /*
-- 
1.8.3.2

^ permalink raw reply related

* [PATCH -V3 2/2] powerpc: thp: Fix crash on mremap
From: Aneesh Kumar K.V @ 2014-01-06  9:03 UTC (permalink / raw)
  To: benh, paulus, aarcange, kirill.shutemov
  Cc: linux-mm, linuxppc-dev, Aneesh Kumar K.V
In-Reply-To: <1388999012-14424-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

This patch fixes the crash below

NIP [c00000000004cee4] .__hash_page_thp+0x2a4/0x440
LR [c0000000000439ac] .hash_page+0x18c/0x5e0
...
Call Trace:
[c000000736103c40] [00001ffffb000000] 0x1ffffb000000(unreliable)
[437908.479693] [c000000736103d50] [c0000000000439ac] .hash_page+0x18c/0x5e0
[437908.479699] [c000000736103e30] [c00000000000924c] .do_hash_page+0x4c/0x58

On ppc64 we use the pgtable for storing the hpte slot information and
store the address of the pgtable at a constant offset (PTRS_PER_PMD) from
the pmd. On mremap, when we switch the pmd, we need to withdraw and deposit
the pgtable again, so that we find the pgtable at the PTRS_PER_PMD offset
from the new pmd.

We also want to move the withdraw and deposit before the set_pmd so
that, when a page fault finds the pmd to be trans huge, we can be sure that
the pgtable can be located at the offset.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/pgtable-ppc64.h | 14 ++++++++++++++
 include/asm-generic/pgtable.h            | 12 ++++++++++++
 mm/huge_memory.c                         | 14 +++++---------
 3 files changed, 31 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h b/arch/powerpc/include/asm/pgtable-ppc64.h
index 9935e9b79524..ff3afce40f3b 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64.h
+++ b/arch/powerpc/include/asm/pgtable-ppc64.h
@@ -12,6 +12,7 @@
 
 #ifndef __ASSEMBLY__
 
+#include <linux/spinlock.h>
 /*
  * This is the default implementation of various PTE accessors, it's
  * used in all cases except Book3S with 64K pages where we have a
@@ -459,5 +460,18 @@ extern pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp);
 #define __HAVE_ARCH_PMDP_INVALIDATE
 extern void pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
 			    pmd_t *pmdp);
+
+#define pmd_move_must_withdraw pmd_move_must_withdraw
+static inline int pmd_move_must_withdraw(spinlock_t *new_pmd_ptl,
+					 spinlock_t *old_pmd_ptl)
+{
+	/*
+	 * Archs like ppc64 use pgtable to store per pmd
+	 * specific information. So when we switch the pmd,
+	 * we should also withdraw and deposit the pgtable
+	 */
+	return true;
+}
+
 #endif /* __ASSEMBLY__ */
 #endif /* _ASM_POWERPC_PGTABLE_PPC64_H_ */
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index db0923458940..8e4f41d9af4d 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -558,6 +558,18 @@ static inline pmd_t pmd_read_atomic(pmd_t *pmdp)
 }
 #endif
 
+#ifndef pmd_move_must_withdraw
+static inline int pmd_move_must_withdraw(spinlock_t *new_pmd_ptl,
+					 spinlock_t *old_pmd_ptl)
+{
+	/*
+	 * With split pmd lock we also need to move preallocated
+	 * PTE page table if new_pmd is on different PMD page table.
+	 */
+	return new_pmd_ptl != old_pmd_ptl;
+}
+#endif
+
 /*
  * This function is meant to be used by sites walking pagetables with
  * the mmap_sem hold in read mode to protect against MADV_DONTNEED and
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9c0b17295ba0..b77bb5df4db9 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1502,19 +1502,15 @@ int move_huge_pmd(struct vm_area_struct *vma, struct vm_area_struct *new_vma,
 			spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);
 		pmd = pmdp_get_and_clear(mm, old_addr, old_pmd);
 		VM_BUG_ON(!pmd_none(*new_pmd));
-		set_pmd_at(mm, new_addr, new_pmd, pmd_mksoft_dirty(pmd));
-		if (new_ptl != old_ptl) {
-			pgtable_t pgtable;
 
-			/*
-			 * Move preallocated PTE page table if new_pmd is on
-			 * different PMD page table.
-			 */
+		if (pmd_move_must_withdraw(new_ptl, old_ptl)) {
+			pgtable_t pgtable;
 			pgtable = pgtable_trans_huge_withdraw(mm, old_pmd);
 			pgtable_trans_huge_deposit(mm, new_pmd, pgtable);
-
-			spin_unlock(new_ptl);
 		}
+		set_pmd_at(mm, new_addr, new_pmd, pmd_mksoft_dirty(pmd));
+		if (new_ptl != old_ptl)
+			spin_unlock(new_ptl);
 		spin_unlock(old_ptl);
 	}
 out:
-- 
1.8.3.2
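
The override mechanism used by pmd_move_must_withdraw() above (an arch header defines the function plus a same-named macro, and the generic header supplies a fallback under #ifndef) can be sketched in plain C. Everything prefixed `sketch_` or `SKETCH_` is a hypothetical stand-in, and the lock type is a dummy rather than the kernel's spinlock_t.

```c
#include <stdbool.h>

/* Dummy stand-in for a page table lock; hypothetical, illustration only. */
typedef struct sketch_lock { int dummy; } sketch_lock_t;

/*
 * "Arch header" part: built only when SKETCH_ARCH_OVERRIDE is defined
 * (for example with -DSKETCH_ARCH_OVERRIDE). The same-named macro is
 * what tells the generic code that an override exists.
 */
#ifdef SKETCH_ARCH_OVERRIDE
#define sketch_must_withdraw sketch_must_withdraw
static inline bool sketch_must_withdraw(sketch_lock_t *new_ptl,
					sketch_lock_t *old_ptl)
{
	(void)new_ptl;
	(void)old_ptl;
	return true;	/* always move the deposited table */
}
#endif

/* "Generic header" part: fallback used when no override was announced. */
#ifndef sketch_must_withdraw
static inline bool sketch_must_withdraw(sketch_lock_t *new_ptl,
					sketch_lock_t *old_ptl)
{
	/* Move the deposited table only when the two locks differ. */
	return new_ptl != old_ptl;
}
#endif
```

Built without SKETCH_ARCH_OVERRIDE, the generic fallback is selected; defining it switches every caller to the unconditional variant without touching the call sites.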

^ permalink raw reply related

* [PATCH] mtd: m25p80: Make the name of mtd_info fixed
From: Hou Zhiqiang @ 2014-01-06  6:34 UTC (permalink / raw)
  To: linux-mtd, linuxppc-dev
  Cc: scottwood, Hou Zhiqiang, mingkai.hu, computersforpeace

To give the SPI flash layout using "mtdparts=..." on the cmdline, we must
give mtd_info a fixed name, because the cmdlinepart parser will
match the name given on the cmdline with the mtd_info.

Now, if an OF node is used, mtd_info's name will be spi->dev->name. It
contains spi_master->bus_num, and spi_master->bus_num may be
dynamically fetched.
So, give the mtd_info a new fixed name "name.cs", where "name" is the name of
the spi_device_id and "cs" is the chip-select in spi_dev.

Signed-off-by: Hou Zhiqiang <b48286@freescale.com>
---
 drivers/mtd/devices/m25p80.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/mtd/devices/m25p80.c b/drivers/mtd/devices/m25p80.c
index eb558e8..d1ed480 100644
--- a/drivers/mtd/devices/m25p80.c
+++ b/drivers/mtd/devices/m25p80.c
@@ -1012,7 +1012,8 @@ static int m25p_probe(struct spi_device *spi)
 	if (data && data->name)
 		flash->mtd.name = data->name;
 	else
-		flash->mtd.name = dev_name(&spi->dev);
+		flash->mtd.name = kasprintf(GFP_KERNEL, "%s.%d",
+				id->name, spi->chip_select);

 	flash->mtd.type = MTD_NORFLASH;
 	flash->mtd.writesize = 1;
-- 
1.8.4.1

^ permalink raw reply related

* RE: [PATCH] mtd: m25p80: Add Power Management support
From: B48286 @ 2014-01-06  7:32 UTC (permalink / raw)
  To: 'Brian Norris'
  Cc: Scott Wood, linuxppc-dev@ozlabs.org, Mingkai.Hu@freescale.com,
	linux-mtd@lists.infradead.org, dwmw2@infradead.org
In-Reply-To: <20140103190049.GG5631@ld-irv-0074>

>On Wed, Dec 11, 2013 at 04:19:30PM +0800, Hou Zhiqiang wrote:
>> Add PM support using callback function suspend and resume in .driver
>> of spi_driver.
>>
>> Signed-off-by: Hou Zhiqiang <b48286@freescale.com>
>> ---
>>  drivers/mtd/devices/m25p80.c | 37 +++++++++++++++++++++++++++++++++++++
>>  1 file changed, 37 insertions(+)
>>
>> diff --git a/drivers/mtd/devices/m25p80.c b/drivers/mtd/devices/m25p80.c
>> index 7eda71d..b0c2b8c 100644
>> --- a/drivers/mtd/devices/m25p80.c
>> +++ b/drivers/mtd/devices/m25p80.c
>> @@ -66,6 +66,8 @@
>>  
>>  /* Used for Spansion flashes only. */
>>  #define	OPCODE_BRWR		0x17	/* Bank register write */
>> +#define	OPCODE_DP		0xb9	/* Enter deep power down mode */
>> +#define	OPCODE_RES		0xab	/* Exit deep power down mode */
>
>Where did you get these opcodes from? They are not in the Spansion datasheets
>I'm reading. And in fact, they are overloaded as RES (Read Electronic
>Signature, 0xab) and Bank Register Access (0xb9) in the datasheet I'm looking
>at. So this patch is wrong.
>

In the S25FL128P datasheet, the Deep Power Down command is b9h and the Release
from Deep Power Down command is abh. In the S25FL-A to S25FL-P Migration Guide
those commands are the same.

>Also, can you describe the purpose of these "deep power down" modes?
>I've never seen PM states where the *flash* needs to be put into a lower
>power mode. Typically the flash is pretty low-power when idle, and it may
>even be powered off completely when the system enters a lower-power state.
>Anyway, please describe why this patch is needed.
>

In standby mode, the MAX current consumption is 200mA, and in Deep Power Down
mode, the MAX is 20mA. In practice the typical value of current consumption
is 3mA, so I think it saves power significantly.

>>  
>>  /* Status Register bits. */
>>  #define	SR_WIP			1	/* Write in progress */
>> @@ -1128,11 +1130,46 @@ static int m25p_remove(struct spi_device *spi)
>>  	return mtd_device_unregister(&flash->mtd);
>>  }
>>  
>> +#ifdef CONFIG_PM
>> +static int m25p_suspend(struct device *dev, pm_message_t mesg) {
>> +	struct m25p *flash = dev_get_drvdata(dev);
>> +	int ret;
>> +
>> +	flash->command[0] = OPCODE_DP;
>
>As mentioned above, this opcode is not recognized by many flash supported
>in this driver. So we might want one or more of the following:
>
> (1) to assign different suspend/resume opcodes for use in
>     m25p_suspend/resume
> (2) to provide over-loadable callbacks so that different flash could
>     use different suspend/resume routines
>
>And of course, we need to avoid sending these commands at all to unsupported
>flash.
>

Yeah, in m25p_probe we can get the flash-specific PM commands from somewhere,
but where can we set the PM commands, in struct spi_device_id?
Do you have any good suggestions?
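
One possible shape for option (1) above, sketched in userspace C purely for discussion: a per-flash table that carries the suspend/resume opcodes together with a flag, so that nothing is sent to parts without deep power-down. The struct, the table entries and the function name are all hypothetical and do not exist in the m25p80 driver; only the 0xb9/0xab values come from this thread.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-flash PM description; not part of the m25p80 driver. */
struct sketch_flash_pm {
	const char *name;	/* flash name, as it might appear in an ID table */
	uint8_t suspend_op;	/* opcode sent on suspend */
	uint8_t resume_op;	/* opcode sent on resume */
	int has_deep_pd;	/* non-zero if deep power-down is supported */
};

static const struct sketch_flash_pm sketch_pm_table[] = {
	/* Opcodes as quoted in this thread for the S25FL128P. */
	{ "example-s25fl128p", 0xb9, 0xab, 1 },
	/* Hypothetical part with no deep power-down command. */
	{ "example-no-dpd",    0x00, 0x00, 0 },
};

/* Return the suspend opcode for `name`, or -1 if nothing should be sent. */
static int sketch_suspend_opcode(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(sketch_pm_table) / sizeof(sketch_pm_table[0]); i++) {
		const struct sketch_flash_pm *pm = &sketch_pm_table[i];

		if (strcmp(pm->name, name) == 0)
			return pm->has_deep_pd ? pm->suspend_op : -1;
	}
	return -1;	/* unknown flash: do not send anything */
}
```

A suspend routine built this way would skip the SPI write whenever the lookup returns -1, which covers the "avoid sending these commands to unsupported flash" requirement.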

>> +	mutex_lock(&flash->lock);
>> +	/* Wait until finished previous write/erase command. */
>> +	ret = wait_till_ready(flash);
>> +	if (ret) {
>> +		mutex_unlock(&flash->lock);
>> +		return ret;
>> +	}
>> +	ret = spi_write(flash->spi, flash->command, 1);
>> +	mutex_unlock(&flash->lock);
>> +
>> +	return ret;
>> +}
>> +
>> +static int m25p_resume(struct device *dev) {
>> +	struct m25p *flash = dev_get_drvdata(dev);
>> +	int ret;
>> +
>> +	flash->command[0] = OPCODE_RES;
>> +	ret = spi_write(flash->spi, flash->command, 1);
>> +
>> +	return ret;
>> +}
>> +#endif /* CONFIG_PM */
>>  
>>  static struct spi_driver m25p80_driver = {
>>  	.driver = {
>>  		.name	= "m25p80",
>>  		.owner	= THIS_MODULE,
>> +#ifdef CONFIG_PM
>> +		.suspend = m25p_suspend,
>> +		.resume = m25p_resume,
>> +#endif
>> +#endif
>>  	},
>>  	.id_table	=3D m25p_ids,
>>  	.probe	=3D m25p_probe,
>
>Brian

Zhiqiang Hou

^ permalink raw reply

* Re: [question] Can the execution of the atomtic operation instruction pair lwarx/stwcx be interrrupted by local HW interruptions?
From: wyang @ 2014-01-06  6:42 UTC (permalink / raw)
  To: Gavin Hu; +Cc: Linuxppc-dev
In-Reply-To: <CABiPGEeoHCRk_8=yKWnxLAnvh+xg8G-q2r-VbdjtFXudtBS9Hw@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 3063 bytes --]


On 01/06/2014 02:24 PM, Gavin Hu wrote:
> So, these primitive functions like atomic_add() and so on also can't
> prevent a process schedule switch on the local CPU core, right?

You are right!

BR
Wei
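
For readers following the thread, the retry structure of the quoted lwarx/stwcx. loop can be approximated in portable C11. This is only an analogy under stated assumptions: it uses a compare-and-exchange loop rather than a real reservation, the function name is made up, and it makes no claim about what the kernel does on interrupt entry or exit. The point it illustrates is that the update is committed only if the conditional store succeeds; otherwise the loop starts over.

```c
#include <stdatomic.h>

/*
 * Hypothetical userspace analogue of the lwarx/stwcx. retry loop.
 * The weak compare-exchange may fail (spuriously or because the value
 * changed); in that case `old` is refreshed and the loop tries again,
 * much like "bne- 1b" branching back to the lwarx.
 */
static inline int sketch_atomic_add_return(atomic_int *v, int a)
{
	int old = atomic_load_explicit(v, memory_order_relaxed);

	while (!atomic_compare_exchange_weak_explicit(v, &old, old + a,
						      memory_order_relaxed,
						      memory_order_relaxed))
		;	/* retry with the refreshed value of old */

	return old + a;
}
```

Nothing in this loop disables interrupts or preemption; it only guarantees that the read-modify-write of the one variable is applied as a unit.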
>
> Thanks!
>
>
> BR
> Gvain. Hu
>
>
> On Mon, Jan 6, 2014 at 1:27 PM, wyang <w90p710@gmail.com 
> <mailto:w90p710@gmail.com>> wrote:
>
>
>     On 01/06/2014 11:41 AM, Gavin Hu wrote:
>>     Thanks for your response.  :)
>>     But that means that these primitive operations like atomic_add()
>>     aren't actually primitive in the PPC architecture, right? Because
>>     they can be interrupted by local HW interrupts. Theoretically,
>>     the ISR can also access the atomic global variable.
>
>     Nope, my understanding is that if you want to sync kernel primitive
>     code with an ISR, you are responsible for disabling local
>     interrupts. atomic_add is not guaranteed to handle such a case.
>
>     Thanks
>     Wei
>
>
>>
>>
>>     The following code is the complete atomic_add() copied from arch/
>>     static __inline__ void atomic_add(int a, atomic_t *v)
>>     {
>>         int t;
>>
>>         __asm__ __volatile__(
>>     "1:    lwarx    %0,0,%3        # atomic_add\n\
>>         add    %0,%2,%0\n"
>>         PPC405_ERR77(0,%3)
>>     "    stwcx.    %0,0,%3 \n\
>>         bne-    1b"
>>         : "=&r" (t), "+m" (v->counter)
>>         : "r" (a), "r" (&v->counter)
>>         : "cc");
>>     }
>>
>>
>>     BR
>>     Gavin. Hu
>>
>>
>>     On Mon, Dec 30, 2013 at 9:54 AM, wyang <w90p710@gmail.com
>>     <mailto:w90p710@gmail.com>> wrote:
>>
>>         On 12/28/2013 01:41 PM, Gavin Hu wrote:
>>>         Hi
>>>
>>>         I notice that there is a pair of ppc instructions, lwarx and
>>>         stwcx, used for atomic operations, for instance
>>>         atomic_inc/atomic_dec.
>>>
>>>         Some ppc manuals emphasize that the mechanism is
>>>         that lwarx can reserve the target memory address, preventing
>>>         other COREs from modifying it.
>>>
>>>         I assume that there is an atomic operation executing on
>>>         CORE0 in a multicore system. In this situation, does
>>>         CORE0 disable the local HW interrupt?
>>>         Can the execution from the beginning of lwarx to the
>>>         end of stwcx be interrupted by HW interrupts/exceptions?
>>>         Anyway, they are two assembly instructions.
>>
>>         It should be just like other archs: the processor should respond to
>>         any interrupt after the execution of an instruction, so the
>>         local HW interrupt is not disabled.
>>
>>         Thanks
>>         Wei
>>>
>>>          Thanks a lot!
>>>
>>>         "1:    lwarx    %0,0,%2        # atomic_inc\n\
>>>             addic    %0,%0,1\n"
>>>         "    stwcx.    %0,0,%2 \n\
>>>
>>>
>>>         BR
>>>         Gavin. Hu
>>>
>>>
>>>         _______________________________________________
>>>         Linuxppc-dev mailing list
>>>         Linuxppc-dev@lists.ozlabs.org  <mailto:Linuxppc-dev@lists.ozlabs.org>
>>>         https://lists.ozlabs.org/listinfo/linuxppc-dev
>>
>>
>
>


[-- Attachment #2: Type: text/html, Size: 9361 bytes --]

^ permalink raw reply

* Re: [question] Can the execution of the atomtic operation instruction pair lwarx/stwcx be interrrupted by local HW interruptions?
From: Gavin Hu @ 2014-01-06  6:24 UTC (permalink / raw)
  To: wyang; +Cc: Linuxppc-dev
In-Reply-To: <52CA3ED7.2020407@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 2484 bytes --]

So, these primitive functions like atomic_add() and so on also can't
prevent a process schedule switch on the local CPU core, right?

Thanks!


BR
Gvain. Hu


On Mon, Jan 6, 2014 at 1:27 PM, wyang <w90p710@gmail.com> wrote:

>
> On 01/06/2014 11:41 AM, Gavin Hu wrote:
>
>  Thanks for your response.  :)
> But that means that these primitive operations like atomic_add() aren't
> actually primitive in the PPC architecture, right? Because they can be
> interrupted by local HW interrupts. Theoretically, the ISR can also access
> the atomic global variable.
>
>
> Nope, my understanding is that if you want to sync kernel primitive code with
> an ISR, you are responsible for disabling local interrupts. atomic_add is
> not guaranteed to handle such a case.
>
> Thanks
> Wei
>
>
>
>
> The following code is the complete atomic_add() copied from arch/
> static __inline__ void atomic_add(int a, atomic_t *v)
> {
>     int t;
>
>     __asm__ __volatile__(
> "1:    lwarx    %0,0,%3        # atomic_add\n\
>     add    %0,%2,%0\n"
>     PPC405_ERR77(0,%3)
> "    stwcx.    %0,0,%3 \n\
>     bne-    1b"
>     : "=&r" (t), "+m" (v->counter)
>     : "r" (a), "r" (&v->counter)
>     : "cc");
> }
>
>
>  BR
>  Gavin. Hu
>
>
> On Mon, Dec 30, 2013 at 9:54 AM, wyang <w90p710@gmail.com> wrote:
>
>>  On 12/28/2013 01:41 PM, Gavin Hu wrote:
>>
>> Hi
>>
>> I notice that there is a pair of ppc instructions, lwarx and stwcx, used
>> for atomic operations, for instance atomic_inc/atomic_dec.
>>
>>  Some ppc manuals emphasize that the mechanism is that lwarx
>> can reserve the target memory address, preventing other COREs from modifying
>> it.
>>
>>  I assume that there is an atomic operation executing on CORE0 in a
>> multicore system. In this situation, does CORE0 disable the local HW
>> interrupt?
>>  Can the execution from the beginning of lwarx to the end of stwcx
>> be interrupted by HW interrupts/exceptions?  Anyway, they are two
>> assembly instructions.
>>
>>
>>  It should be just like other archs: the processor should respond to any
>> interrupt after the execution of an instruction, so the local HW interrupt
>> is not disabled.
>>
>> Thanks
>> Wei
>>
>>
>>  Thanks a lot!
>>
>> "1:    lwarx    %0,0,%2        # atomic_inc\n\
>>     addic    %0,%0,1\n"
>> "    stwcx.    %0,0,%2 \n\
>>
>>
>>  BR
>>  Gavin. Hu
>>
>>
>>  _______________________________________________
>> Linuxppc-dev mailing list
>> Linuxppc-dev@lists.ozlabs.org
>> https://lists.ozlabs.org/listinfo/linuxppc-dev
>>
>>
>>
>
>

[-- Attachment #2: Type: text/html, Size: 5930 bytes --]

^ permalink raw reply

* Re: [02/12,v3] pci: fsl: add structure fsl_pci
From: Lian Minghuan-b31939 @ 2014-01-06  6:10 UTC (permalink / raw)
  To: Scott Wood, Minghuan Lian
  Cc: Bjorn Helgaas, linux-pci, linuxppc-dev, Zang Roy-R61911
In-Reply-To: <20140103221923.GB22546@home.buserror.net>

On 01/04/2014 06:19 AM, Scott Wood wrote:
> On Wed, Oct 23, 2013 at 06:41:24PM +0800, Minghuan Lian wrote:
>> PowerPC uses structure pci_controller to describe PCI controller,
>> but ARM uses structure pci_sys_data. In order to support PowerPC
>> and ARM simultaneously, the patch adds a structure fsl_pci that
>> contains most of the members of the pci_controller and pci_sys_data.
>> Meanwhile, it defines a interface fsl_arch_sys_to_pci() which should
>> be implemented in architecture-specific PCI controller driver to
>> convert pci_controller or pci_sys_data to fsl_pci.
>>
>> Signed-off-by: Minghuan Lian <Minghuan.Lian@freescale.com>
>>
>> ---
>> change log:
>> v1-v3:
>> Derived from http://patchwork.ozlabs.org/patch/278965/
>>
>> Based on upstream master.
>> Based on the discussion of RFC version here
>> http://patchwork.ozlabs.org/patch/274487/
>>
>>   include/linux/fsl/pci-common.h | 41 +++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 41 insertions(+)
>>
>> diff --git a/include/linux/fsl/pci-common.h b/include/linux/fsl/pci-common.h
>> index 5e4f683..e56a040 100644
>> --- a/include/linux/fsl/pci-common.h
>> +++ b/include/linux/fsl/pci-common.h
>> @@ -102,5 +102,46 @@ struct ccsr_pci {
>>   
>>   };
>>   
>> +/*
>> + * Structure of a PCI controller (host bridge)
>> + */
>> +struct fsl_pci {
>> +	struct list_head node;
>> +	bool is_pcie;
>> +	struct device_node *dn;
>> +	struct device *dev;
>> +
>> +	int first_busno;
>> +	int last_busno;
>> +	int self_busno;
>> +	struct resource busn;
>> +
>> +	struct pci_ops *ops;
>> +	struct ccsr_pci __iomem *regs;
>> +
>> +	u32 indirect_type;
>> +
>> +	struct resource io_resource;
>> +	resource_size_t io_base_phys;
>> +	resource_size_t pci_io_size;
>> +
>> +	struct resource mem_resources[3];
>> +	resource_size_t mem_offset[3];
>> +
>> +	int global_number;	/* PCI domain number */
>> +
>> +	resource_size_t dma_window_base_cur;
>> +	resource_size_t dma_window_size;
>> +
>> +	void *sys;
>> +};
> I don't like the extent to which this duplicates (not moves) PPC's struct
> pci_controller.  Also this leaves some fields like "indirect_type"
> unexplained (PPC_INDIRECT_TYPE_xxx is only in the PPC header).
>
> Does the arch-independent part of the driver really need all this?  Given
> how closely this tracks the PPC code, how would this work on ARM?
[Minghuan] I added the duplicate fields because PPC's struct
pci_controller needs them.
The common PCI driver gets the related information and passes it to the
PowerPC driver.
I do not want the PowerPC driver to parse the dts or access the controller
to get the information again.
Please see the following code for PowerPC:
+int fsl_arch_pci_sys_register(struct fsl_pci *pci)
+{
+    struct pci_controller *hose;
+
+    hose = pcibios_alloc_controller(pci->dn);
+
+    hose->private_data = pci;
+    hose->parent = pci->dev;
+    hose->first_busno = pci->first_busno;
+    hose->last_busno = pci->last_busno;
+    hose->ops = pci->ops;
+
+    hose->io_base_virt = ioremap(pci->io_base_phys + pci->io_resource.start,
+                     pci->pci_io_size);
+    hose->pci_io_size = pci->io_resource.start + pci->pci_io_size;
+    hose->io_base_phys = pci->io_base_phys;
+    hose->io_resource = pci->io_resource;
+
+    memcpy(hose->mem_offset, pci->mem_offset, sizeof(hose->mem_offset));
+    memcpy(hose->mem_resources, pci->mem_resources,
+        sizeof(hose->mem_resources));
+    hose->dma_window_base_cur = pci->dma_window_base_cur;
+    hose->dma_window_size = pci->dma_window_size;
+    pci->sys = hose;
+....
+    return 0;
+}



The following is for ARM; I will submit it after verification:

+
+static inline struct fsl_pcie *sys_to_pcie(struct pci_sys_data *sys)
+{
+    return sys->private_data;
+}
+
+static int fsl_pcie_setup(int nr, struct pci_sys_data *sys)
+{
+    struct fsl_pcie *pcie;
+
+    pcie = sys_to_pcie(sys);
+
+    if (!pcie)
+        return 0;
+
+    pcie->sys = sys;
+
+    sys->io_offset = pcie->io_base_phys;
+    pci_ioremap_io(sys->io_offset, pcie->io_resource.start);
+    pci_add_resource_offset(&sys->resources, &pcie->io_resource,
+                sys->io_offset);
+
+    sys->mem_offset = pcie->mem_offset;
+    pci_add_resource_offset(&sys->resources, &pcie->mem_resource,
+                sys->mem_offset);
+
+    return 1;
+}
+
+static struct pci_bus *
+fsl_pcie_scan_bus(int nr, struct pci_sys_data *sys)
+{
+    struct pci_bus *bus;
+    struct fsl_pcie *pcie = sys_to_pcie(sys);
+
+    bus = pci_create_root_bus(pcie->dev, sys->busnr,
+                  pcie->ops, sys, &sys->resources);
+    if (!bus)
+        return NULL;
+
+    pci_scan_child_bus(bus);
+
+    return bus;
+}
+
+static int fsl_pcie_map_irq(const struct pci_dev *dev, u8 slot, u8 pin)
+{
+    struct of_irq oirq;
+    int ret;
+
+    ret = of_irq_map_pci(dev, &oirq);
+    if (ret)
+        return ret;
+
+    return irq_create_of_mapping(oirq.controller, oirq.specifier,
+                     oirq.size);
+}
+
+static struct hw_pci fsl_hw_pcie = {
+    .ops        = &fsl_indirect_pci_ops,
+    .setup        = fsl_pcie_setup,
+    .scan        = fsl_pcie_scan_bus,
+    .map_irq    = fsl_pcie_map_irq,
+};

+static struct pci_bus *
+fake_pci_bus(struct fsl_pcie *pcie, int busnr)
+{
+    static struct pci_bus bus;
+    static struct pci_sys_data sys;
+
+    bus.number = busnr;
+    bus.sysdata = &sys;
+    sys.private_data = pcie;
+    bus.ops = pcie->ops;
+    return &bus;
+}
+
+static int fsl_pcie_register(struct fsl_pcie *pcie)
+{
+    pcie->controller = fsl_hw_pcie.nr_controllers;
+    fsl_hw_pcie.nr_controllers = 1;
+    fsl_hw_pcie.private_data = (void **)&pcie;
+
+    pci_common_init(&fsl_hw_pcie);
+    pci_assign_unassigned_resources();
+#ifdef CONFIG_PCI_DOMAINS
+    fsl_hw_pcie.domain++;
+#endif
+
+    return 0;
+}



>
> -Scott


* RE: [PATCH] powerpc/fsl-booke: Use SPRN_SPRGn rather than mfsprg/mtsprg
From: Dongsheng.Wang @ 2014-01-06  6:05 UTC (permalink / raw)
  To: Scott Wood, Benjamin Herrenschmidt
  Cc: anton@enomsg.org, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <a79e5eb1cf444a3782c9af9952efe69b@BN1PR03MB188.namprd03.prod.outlook.com>

Reviewed-by: Wang Dongsheng <dongsheng.wang@freescale.com>
Tested-by: Wang Dongsheng <dongsheng.wang@freescale.com>

Works well. :)

-Dongsheng

> -----Original Message-----
> From: Linuxppc-dev [mailto:linuxppc-dev-bounces+b40534=freescale.com@lists.ozlabs.org]
> On Behalf Of Dongsheng.Wang@freescale.com
> Sent: Friday, January 03, 2014 6:33 PM
> To: Wood Scott-B07421; Benjamin Herrenschmidt
> Cc: Anton Vorontsov; linuxppc-dev@lists.ozlabs.org
> Subject: RE: [PATCH] powerpc/fsl-booke: Use SPRN_SPRGn rather than mfsprg/mtsprg
>
> Looks good. I will test it as soon as possible.
>
> BTW, only SPRG3 needs to be saved.
> 32-bit: SPRG0-SPRG1, SPRG2-SPRG7 and SPRG9 are used to deal with exceptions,
> so those registers do not need to be saved (SPRG8 is not used). Only SPRG3 is
> used to save the current thread_info pointer.
>
> -Dongsheng
>
> > -----Original Message-----
> > From: Wood Scott-B07421
> > Sent: Friday, January 03, 2014 6:38 AM
> > To: Benjamin Herrenschmidt
> > Cc: linuxppc-dev@lists.ozlabs.org; Wood Scott-B07421; Wang Dongsheng-B40534;
> > Anton Vorontsov
> > Subject: [PATCH] powerpc/fsl-booke: Use SPRN_SPRGn rather than mfsprg/mtsprg
> >
> > This fixes a build break that was probably introduced with the removal
> > of -Wa,-me500 (commit f49596a4cf4753d13951608f24f939a59fdcc653), where
> > the assembler refuses to recognize SPRG4-7 with a generic PPC target.
> >
> > Signed-off-by: Scott Wood <scottwood@freescale.com>
> > Cc: Dongsheng Wang <dongsheng.wang@freescale.com>
> > Cc: Anton Vorontsov <avorontsov@mvista.com>
> > ---
> > Dongsheng, please test.
> > ---
> >  arch/powerpc/kernel/swsusp_booke.S | 32 ++++++++++++++++----------------
> >  1 file changed, 16 insertions(+), 16 deletions(-)
> >
> > diff --git a/arch/powerpc/kernel/swsusp_booke.S b/arch/powerpc/kernel/swsusp_booke.S
> > index 0f20405..553c140 100644
> > --- a/arch/powerpc/kernel/swsusp_booke.S
> > +++ b/arch/powerpc/kernel/swsusp_booke.S
> > @@ -74,21 +74,21 @@ _GLOBAL(swsusp_arch_suspend)
> >  	bne	1b
> >
> >  	/* Save SPRGs */
> > -	mfsprg	r4,0
> > +	mfspr	r4,SPRN_SPRG0
> >  	stw	r4,SL_SPRG0(r11)
> > -	mfsprg	r4,1
> > +	mfspr	r4,SPRN_SPRG1
> >  	stw	r4,SL_SPRG1(r11)
> > -	mfsprg	r4,2
> > +	mfspr	r4,SPRN_SPRG2
> >  	stw	r4,SL_SPRG2(r11)
> > -	mfsprg	r4,3
> > +	mfspr	r4,SPRN_SPRG3
> >  	stw	r4,SL_SPRG3(r11)
> > -	mfsprg	r4,4
> > +	mfspr	r4,SPRN_SPRG4
> >  	stw	r4,SL_SPRG4(r11)
> > -	mfsprg	r4,5
> > +	mfspr	r4,SPRN_SPRG5
> >  	stw	r4,SL_SPRG5(r11)
> > -	mfsprg	r4,6
> > +	mfspr	r4,SPRN_SPRG6
> >  	stw	r4,SL_SPRG6(r11)
> > -	mfsprg	r4,7
> > +	mfspr	r4,SPRN_SPRG7
> >  	stw	r4,SL_SPRG7(r11)
> >
> >  	/* Call the low level suspend stuff (we should probably have made
> > @@ -150,21 +150,21 @@ _GLOBAL(swsusp_arch_resume)
> >  	bl	_tlbil_all
> >
> >  	lwz	r4,SL_SPRG0(r11)
> > -	mtsprg	0,r4
> > +	mtspr	SPRN_SPRG0,r4
> >  	lwz	r4,SL_SPRG1(r11)
> > -	mtsprg	1,r4
> > +	mtspr	SPRN_SPRG1,r4
> >  	lwz	r4,SL_SPRG2(r11)
> > -	mtsprg	2,r4
> > +	mtspr	SPRN_SPRG2,r4
> >  	lwz	r4,SL_SPRG3(r11)
> > -	mtsprg	3,r4
> > +	mtspr	SPRN_SPRG3,r4
> >  	lwz	r4,SL_SPRG4(r11)
> > -	mtsprg	4,r4
> > +	mtspr	SPRN_SPRG4,r4
> >  	lwz	r4,SL_SPRG5(r11)
> > -	mtsprg	5,r4
> > +	mtspr	SPRN_SPRG5,r4
> >  	lwz	r4,SL_SPRG6(r11)
> > -	mtsprg	6,r4
> > +	mtspr	SPRN_SPRG6,r4
> >  	lwz	r4,SL_SPRG7(r11)
> > -	mtsprg	7,r4
> > +	mtspr	SPRN_SPRG7,r4
> >
> >  	/* restore the MSR */
> >  	lwz	r3,SL_MSR(r11)
> > --
> > 1.8.3.2
>
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
>


* Re: [question] Can the execution of the atomic operation instruction pair lwarx/stwcx be interrupted by local HW interrupts?
From: Gavin Hu @ 2014-01-06  5:51 UTC (permalink / raw)
  To: wyang; +Cc: Linuxppc-dev
In-Reply-To: <52CA3ED7.2020407@gmail.com>

Got it. Thanks!  :)


BR
Gavin. Hu


On Mon, Jan 6, 2014 at 1:27 PM, wyang <w90p710@gmail.com> wrote:

>
> On 01/06/2014 11:41 AM, Gavin Hu wrote:
>
>  Thanks for your response.  :)
> But that means that primitive operations like atomic_add() aren't
> actually primitive on the PPC architecture, right? Because they can be
> interrupted by local HW interrupts. Theoretically, the ISR can also access
> the global atomic variable.
>
>
> Nope, my understanding is that if you want to synchronize kernel primitive
> code with an ISR, it is your responsibility to disable local interrupts.
> atomic_add does not guarantee to handle such a case.
>
> Thanks
> Wei
>
>
>
>
> The following code is the complete atomic_inc() copied from arch/
> static __inline__ void atomic_add(int a, atomic_t *v)
> {
>     int t;
>
>     __asm__ __volatile__(
> "1:    lwarx    %0,0,%3        # atomic_add\n\
>     add    %0,%2,%0\n"
>     PPC405_ERR77(0,%3)
> "    stwcx.    %0,0,%3 \n\
>     bne-    1b"
>     : "=&r" (t), "+m" (v->counter)
>     : "r" (a), "r" (&v->counter)
>     : "cc");
> }
>
>
>  BR
>  Gavin. Hu
>
>
> On Mon, Dec 30, 2013 at 9:54 AM, wyang <w90p710@gmail.com> wrote:
>
>>  On 12/28/2013 01:41 PM, Gavin Hu wrote:
>>
>> Hi
>>
>> I notice that there is a pair of ppc instructions, lwarx and stwcx, used
>> for atomic operations, for instance atomic_inc/atomic_dec.
>>
>>  Some ppc manuals emphasize that the mechanism is that lwarx
>> reserves the target memory address, preventing other COREs from modifying
>> it.
>>
>>  Assume that an atomic operation is executing on CORE0 in a
>> multicore system. In this situation, does CORE0 disable the local HW
>> interrupt?
>>  Can execution between the beginning of lwarx and the end of stwcx
>> be interrupted by HW interrupts/exceptions?  After all, they are two
>> assembly instructions.
>>
>>
>>  It should be just like other architectures: the processor responds to any
>> interrupt after the execution of an instruction, so the local HW interrupt
>> is not disabled.
>>
>> Thanks
>> Wei
>>
>>
>>  Thanks a lot!
>>
>> "1:    lwarx    %0,0,%2        # atomic_inc\n\
>>     addic    %0,%0,1\n"
>> "    stwcx.    %0,0,%2 \n\
>>
>>
>>  BR
>>  Gavin. Hu
>>
>>
>>  _______________________________________________
>> Linuxppc-dev mailing list
>> Linuxppc-dev@lists.ozlabs.org
>> https://lists.ozlabs.org/listinfo/linuxppc-dev
>>
>>
>>
>
>


* Re: [03/12,v3] pci: fsl: add PCI indirect access support
From: Lian Minghuan-b31939 @ 2014-01-06  5:36 UTC (permalink / raw)
  To: Scott Wood, Minghuan Lian
  Cc: Bjorn Helgaas, linux-pci, linuxppc-dev, Zang Roy-R61911
In-Reply-To: <20140103223306.GC22546@home.buserror.net>

Hi Scott,

Please see my comments inline.

On 01/04/2014 06:33 AM, Scott Wood wrote:
> On Wed, Oct 23, 2013 at 06:41:25PM +0800, Minghuan Lian wrote:
>> The patch adds PCI indirect read/write functions. The main code
>> is ported from arch/powerpc/sysdev/indirect_pci.c. We use general
>> IO API iowrite32be/ioread32be instead of out_be32/in_be32, and
>> use structure fsl_pci instead of PowerPC's pci_controller.
>> The patch also provides fsl_pcie_check_link() to check PCI link.
>> The weak function fsl_arch_pci_exclude_device() is provided to
>> call ppc_md.pci_exclude_device() for PowerPC architecture.
>>
>> Signed-off-by: Minghuan Lian <Minghuan.Lian@freescale.com>
>>
>> ---
>> change log:
>> v1-v3:
>> Derived from http://patchwork.ozlabs.org/patch/278965/
>>
>> Based on upstream master.
>> Based on the discussion of RFC version here
>> http://patchwork.ozlabs.org/patch/274487/
>>
>>   drivers/pci/host/pci-fsl-common.c | 169 ++++++++++++++++++++++++++++++++------
>>   include/linux/fsl/pci-common.h    |   6 ++
>>   2 files changed, 151 insertions(+), 24 deletions(-)
>>
>> diff --git a/drivers/pci/host/pci-fsl-common.c b/drivers/pci/host/pci-fsl-common.c
>> index 69d338b..8bc9a64 100644
>> --- a/drivers/pci/host/pci-fsl-common.c
>> +++ b/drivers/pci/host/pci-fsl-common.c
>> @@ -35,52 +35,173 @@
>>   #include <sysdev/fsl_soc.h>
>>   #include <sysdev/fsl_pci.h>
>>   
>> -static int fsl_pcie_check_link(struct pci_controller *hose)
>> +/* Indirect type */
>> +#define INDIRECT_TYPE_EXT_REG			0x00000002
>> +#define INDIRECT_TYPE_SURPRESS_PRIMARY_BUS	0x00000004
>> +#define INDIRECT_TYPE_NO_PCIE_LINK		0x00000008
>> +#define INDIRECT_TYPE_BIG_ENDIAN		0x00000010
>> +#define INDIRECT_TYPE_FSL_CFG_REG_LINK		0x00000040
> Why are these here rather than in the header, given that you have
> indirect_type in the struct in the header?
[Minghuan] It's better to define the type in the header file. I will fix it.
>
>> +int __weak fsl_arch_pci_exclude_device(struct fsl_pci *pci, u8 bus, u8 devfn)
>> +{
>> +	return PCIBIOS_SUCCESSFUL;
>> +}
>> +
>> +static int fsl_pci_read_config(struct fsl_pci *pci, int bus, int devfn,
>> +				int offset, int len, u32 *val)
>> +{
>> +	u32 bus_no, reg, data;
>> +
>> +	if (pci->indirect_type & INDIRECT_TYPE_NO_PCIE_LINK) {
>> +		if (bus != pci->first_busno)
>> +			return PCIBIOS_DEVICE_NOT_FOUND;
>> +		if (devfn != 0)
>> +			return PCIBIOS_DEVICE_NOT_FOUND;
>> +	}
> A lot of this seems duplicated from arch/powerpc/sysdev/indirect_pci.c.
>
> How generally applicable is that file to non-PPC implementations?  At a
> minimum I see a similar file in arch/microblaze.  It should probably
> eventually be moved to common code, rather than duplicated again.  A
> prerequisite for that would be making common the dependencies it has on
> the rest of what is currently arch PCI infrastructure; until then, it's
> probably better to just have the common fsl-pci code know how to
> interface with the appropriate PPC/ARM code rather than trying to copy
> the infrastructure as well.
[Minghuan] Yes, this is a duplicate except that it uses struct fsl_pci. But
it is hard to move it to common code, because each set of indirect
read/write functions uses a different PCI controller structure, which is a
very basic structure, and ARM has no such structure.
If we cannot establish a unified PCI controller structure, we can only
abstract out a simple structure that includes the fields related to
indirect access, and we need a callback function to get the pointer, like
this:
((powerpc/microblaze/mips/ pci_controller
*)(pci_bus->sysdata))->indirect_struct.
Should we provide the common code for the indirect access API, or wait for
the common PCI controller structure?
> -Scott
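
The "simple structure plus callback" idea above can be sketched in plain
user-space C. This is only an illustration of the proposal, not code from
the patch series: every sketch_* name is hypothetical, the PowerPC-like
controller is a stand-in type, and the flag value is copied from the
INDIRECT_TYPE_NO_PCIE_LINK define quoted earlier in this message. The check
mirrors the NO_PCIE_LINK test in fsl_pci_read_config().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Value taken from the INDIRECT_TYPE_NO_PCIE_LINK define in the quoted patch. */
#define SKETCH_INDIRECT_TYPE_NO_PCIE_LINK 0x00000008u

/* Hypothetical: only the fields a shared indirect accessor would need. */
struct sketch_indirect_cfg {
	uint32_t indirect_type;
	int first_busno;
};

/* Hypothetical stand-in for an arch-specific controller structure. */
struct sketch_ppc_controller {
	int global_number;
	struct sketch_indirect_cfg indirect;
};

/* Callback type: each architecture maps its own sysdata to the shared part. */
typedef struct sketch_indirect_cfg *(*sketch_get_indirect_fn)(void *sysdata);

static struct sketch_indirect_cfg *sketch_ppc_get_indirect(void *sysdata)
{
	return &((struct sketch_ppc_controller *)sysdata)->indirect;
}

/*
 * Common code: with no PCIe link, only devfn 0 on the first bus is visible.
 * Returns 1 if the device may be accessed, 0 otherwise.
 */
static int sketch_device_reachable(sketch_get_indirect_fn get, void *sysdata,
				   int bus, int devfn)
{
	struct sketch_indirect_cfg *cfg = get(sysdata);

	assert(cfg != NULL);
	if (cfg->indirect_type & SKETCH_INDIRECT_TYPE_NO_PCIE_LINK) {
		if (bus != cfg->first_busno || devfn != 0)
			return 0;
	}
	return 1;
}
```

The common code only ever sees struct sketch_indirect_cfg, so a second
architecture would need just its own one-line callback; whether that is
preferable to waiting for a unified controller structure is the open
question in this thread.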


* Re: [question] Can the execution of the atomic operation instruction pair lwarx/stwcx be interrupted by local HW interrupts?
From: wyang @ 2014-01-06  5:27 UTC (permalink / raw)
  To: Gavin Hu; +Cc: Linuxppc-dev
In-Reply-To: <CABiPGEcHxgmuLMcMN1ByutJC22RJPHSZf3n5gF9BgenbfQTJvA@mail.gmail.com>


On 01/06/2014 11:41 AM, Gavin Hu wrote:
> Thanks for your response.  :)
> But that means that primitive operations like atomic_add()
> aren't actually primitive on the PPC architecture, right? Because they can
> be interrupted by local HW interrupts. Theoretically, the ISR can also
> access the global atomic variable.

Nope, my understanding is that if you want to synchronize kernel primitive
code with an ISR, it is your responsibility to disable local interrupts.
atomic_add does not guarantee to handle such a case.

Thanks
Wei

>
>
> The following code is the complete atomic_inc() copied from arch/
> static __inline__ void atomic_add(int a, atomic_t *v)
> {
>     int t;
>
>     __asm__ __volatile__(
> "1:    lwarx    %0,0,%3        # atomic_add\n\
>     add    %0,%2,%0\n"
>     PPC405_ERR77(0,%3)
> "    stwcx.    %0,0,%3 \n\
>     bne-    1b"
>     : "=&r" (t), "+m" (v->counter)
>     : "r" (a), "r" (&v->counter)
>     : "cc");
> }
>
>
> BR
> Gavin. Hu
>
>
> On Mon, Dec 30, 2013 at 9:54 AM, wyang <w90p710@gmail.com> wrote:
>
>     On 12/28/2013 01:41 PM, Gavin Hu wrote:
>>     Hi
>>
>>     I notice that there is a pair of ppc instructions, lwarx and stwcx,
>>     used for atomic operations, for instance atomic_inc/atomic_dec.
>>
>>     Some ppc manuals emphasize that the mechanism is that lwarx
>>     reserves the target memory address, preventing other COREs
>>     from modifying it.
>>
>>     Assume that an atomic operation is executing on CORE0
>>     in a multicore system. In this situation, does CORE0 disable
>>     the local HW interrupt?
>>     Can execution between the beginning of lwarx and the end of
>>     stwcx be interrupted by HW interrupts/exceptions?  After all,
>>     they are two assembly instructions.
>
>     It should be just like other architectures: the processor responds to
>     any interrupt after the execution of an instruction, so the local HW
>     interrupt is not disabled.
>
>     Thanks
>     Wei
>>
>>      Thanks a lot!
>>
>>     "1:    lwarx    %0,0,%2        # atomic_inc\n\
>>         addic    %0,%0,1\n"
>>     "    stwcx.    %0,0,%2 \n\
>>
>>
>>     BR
>>     Gavin. Hu
>>
>>
>>     _______________________________________________
>>     Linuxppc-dev mailing list
>>     Linuxppc-dev@lists.ozlabs.org
>>     https://lists.ozlabs.org/listinfo/linuxppc-dev
>
>
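
For readers following the quoted atomic_add() listing, here is a rough
portable C11 model of its retry loop. It is a sketch under stated
assumptions, not the kernel's implementation: the function name is
hypothetical, and the compare-exchange merely stands in for stwcx., which
stores only if the reservation taken by lwarx is still held. The "bne- 1b"
in the listing is what sends a failed store back to the load. The model
says nothing about how an interrupt affects the reservation; that depends
on the architecture and on the kernel's exception path.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/*
 * Hypothetical helper modelling the lwarx / add / stwcx. / bne- sequence.
 * Returns the value that was stored.
 */
static int sketch_atomic_add_return(atomic_int *v, int a)
{
	int old;

	assert(v != NULL);

	/* Roughly "lwarx": read the current value. */
	old = atomic_load_explicit(v, memory_order_relaxed);

	/*
	 * Roughly "stwcx." plus "bne- 1b": the store succeeds only if the
	 * word still holds 'old'. On failure 'old' is refreshed with the
	 * current value and the loop tries again.
	 */
	while (!atomic_compare_exchange_weak_explicit(v, &old, old + a,
						      memory_order_relaxed,
						      memory_order_relaxed))
		;

	return old + a;
}
```

The point of the loop is that a lost update is never written back: if the
word changes between the load and the conditional store, the store fails
and the whole read-modify-write is repeated.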



* [PATCH v2 2/2] powerpc/mpic_timer: fix convert ticks to time subtraction overflow
From: Dongsheng Wang @ 2014-01-06  5:23 UTC (permalink / raw)
  To: scottwood; +Cc: linuxppc-dev, Wang Dongsheng
In-Reply-To: <1388985811-32495-1-git-send-email-dongsheng.wang@freescale.com>

From: Wang Dongsheng <dongsheng.wang@freescale.com>

In some cases tmp_sec may be greater than ticks, because ticks and tmp_sec
are rounded during the calculation.

Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com>
---
v2:
Add the new patch in v2.

diff --git a/arch/powerpc/sysdev/mpic_timer.c b/arch/powerpc/sysdev/mpic_timer.c
index 70dcf9c..9d9b062 100644
--- a/arch/powerpc/sysdev/mpic_timer.c
+++ b/arch/powerpc/sysdev/mpic_timer.c
@@ -97,8 +97,11 @@ static void convert_ticks_to_time(struct timer_group_priv *priv,
 	time->tv_sec = (__kernel_time_t)div_u64(ticks, priv->timerfreq);
 	tmp_sec = (u64)time->tv_sec * (u64)priv->timerfreq;
 
-	time->tv_usec = (__kernel_suseconds_t)
-		div_u64((ticks - tmp_sec) * 1000000, priv->timerfreq);
+	time->tv_usec = 0;
+
+	if (tmp_sec <= ticks)
+		time->tv_usec = (__kernel_suseconds_t)
+			div_u64((ticks - tmp_sec) * 1000000, priv->timerfreq);
 
 	return;
 }
-- 
1.8.5
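
A small user-space sketch of why the guard matters. The helper names and
the sample numbers are hypothetical; only the shape of the arithmetic
follows the patch. Because the operands are unsigned 64-bit values,
"ticks - tmp_sec" does not go negative when tmp_sec is larger; it wraps to
a huge number, and the resulting microsecond value is meaningless.

```c
#include <assert.h>
#include <stdint.h>

/* Unguarded form: wraps around when whole_sec_ticks > ticks. */
static uint64_t sketch_usec_unguarded(uint64_t ticks, uint64_t whole_sec_ticks,
				      uint32_t freq)
{
	assert(freq != 0);
	return (ticks - whole_sec_ticks) * 1000000ULL / freq;
}

/* Guarded form, mirroring the "tmp_sec <= ticks" check added by the patch. */
static uint64_t sketch_usec_guarded(uint64_t ticks, uint64_t whole_sec_ticks,
				    uint32_t freq)
{
	assert(freq != 0);
	if (whole_sec_ticks > ticks)
		return 0;	/* treat the remainder as zero, as the patch does */
	return (ticks - whole_sec_ticks) * 1000000ULL / freq;
}
```

With a 1000 Hz timer, 1500 ticks and 1000 whole-second ticks give 500000
microseconds in both forms. With 999 ticks and 1000 whole-second ticks the
guarded form returns 0, while the unguarded form returns a value far above
the 999999 that a microsecond field can legitimately hold.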


* [PATCH v2 1/2] powerpc/mpic_timer: fix the time is not accurate caused by GTCCR toggle bit
From: Dongsheng Wang @ 2014-01-06  5:23 UTC (permalink / raw)
  To: scottwood; +Cc: linuxppc-dev, Wang Dongsheng

From: Wang Dongsheng <dongsheng.wang@freescale.com>

When the timer GTCCR toggle bit is inverted, the remaining time we
calculate is not accurate, so we need to ignore this bit.

Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com>
---
v2:
No change.

diff --git a/arch/powerpc/sysdev/mpic_timer.c b/arch/powerpc/sysdev/mpic_timer.c
index 22d7d57..70dcf9c 100644
--- a/arch/powerpc/sysdev/mpic_timer.c
+++ b/arch/powerpc/sysdev/mpic_timer.c
@@ -41,6 +41,7 @@
 #define MPIC_TIMER_TCR_ROVR_OFFSET	24
 
 #define TIMER_STOP			0x80000000
+#define GTCCR_TOG			0x80000000
 #define TIMERS_PER_GROUP		4
 #define MAX_TICKS			(~0U >> 1)
 #define MAX_TICKS_CASCADE		(~0U)
@@ -327,11 +328,13 @@ void mpic_get_remain_time(struct mpic_timer *handle, struct timeval *time)
 	casc_priv = priv->timer[handle->num].cascade_handle;
 	if (casc_priv) {
 		tmp_ticks = in_be32(&priv->regs[handle->num].gtccr);
+		tmp_ticks &= ~GTCCR_TOG;
 		ticks = ((u64)tmp_ticks & UINT_MAX) * (u64)MAX_TICKS_CASCADE;
 		tmp_ticks = in_be32(&priv->regs[handle->num - 1].gtccr);
 		ticks += tmp_ticks;
 	} else {
 		ticks = in_be32(&priv->regs[handle->num].gtccr);
+		ticks &= ~GTCCR_TOG;
 	}
 
 	convert_ticks_to_time(priv, ticks, time);
-- 
1.8.5
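
A minimal user-space illustration of the masking step, assuming (as the
patch's GTCCR_TOG define indicates) that the toggle flag is bit 31 of the
value read from the register and that the remaining bits hold the count.
The sketch_* names are hypothetical and no hardware is touched.

```c
#include <assert.h>
#include <stdint.h>

/* Same value as GTCCR_TOG in the patch. */
#define SKETCH_GTCCR_TOG 0x80000000u

/* Strip the toggle flag so that only the count bits remain. */
static uint32_t sketch_gtccr_count(uint32_t raw)
{
	uint32_t count = raw & ~SKETCH_GTCCR_TOG;

	assert((count & SKETCH_GTCCR_TOG) == 0);
	return count;
}
```

If the raw read is 0x80000010, treating it as a tick count without masking
gives 2147483664 ticks instead of 16, which is the inaccuracy the commit
message describes.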
