LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* Re: Bug in reclaim logic with exhausted nodes?
From: Nishanth Aravamudan @ 2014-03-13 17:01 UTC (permalink / raw)
  To: linux-mm; +Cc: mgorman, cl, linuxppc-dev, anton, rientjes
In-Reply-To: <20140311210614.GB946@linux.vnet.ibm.com>

There might have been an error in my original mail, so resending...

On 11.03.2014 [14:06:14 -0700], Nishanth Aravamudan wrote:
> We have seen the following situation on a test system:
> 
> 2-node system, each node has 32GB of memory.
> 
> 2 gigantic (16GB) pages reserved at boot-time, both of which are
> allocated from node 1.
> 
> SLUB notices this:
> 
> [    0.000000] SLUB: Unable to allocate memory from node 1
> [    0.000000] SLUB: Allocating a useless per node structure in order to
> be able to continue
> 
> After boot, user then did:
> 
> echo 24 > /proc/sys/vm/nr_hugepages
> 
> And tasks are stuck:
> 
> [<c0000000010980b8>] kexec_stack+0xb8/0x8000
> [<c0000000000144d0>] .__switch_to+0x1c0/0x390
> [<c0000000001ac708>] .throttle_direct_reclaim.isra.31+0x238/0x2c0
> [<c0000000001b0b34>] .try_to_free_pages+0xb4/0x210
> [<c0000000001a2f1c>] .__alloc_pages_nodemask+0x75c/0xb00
> [<c0000000001eafb0>] .alloc_fresh_huge_page+0x70/0x150
> [<c0000000001eb2d0>] .set_max_huge_pages.part.37+0x130/0x2f0
> [<c0000000001eb7c8>] .hugetlb_sysctl_handler_common+0x168/0x180
> [<c0000000002ae21c>] .proc_sys_call_handler+0xfc/0x120
> [<c00000000021dcc0>] .vfs_write+0xe0/0x260
> [<c00000000021e8c8>] .SyS_write+0x58/0xd0
> [<c000000000009e7c>] syscall_exit+0x0/0x7c
> 
> [<c00000004f9334b0>] 0xc00000004f9334b0
> [<c0000000000144d0>] .__switch_to+0x1c0/0x390
> [<c0000000001ac708>] .throttle_direct_reclaim.isra.31+0x238/0x2c0
> [<c0000000001b0b34>] .try_to_free_pages+0xb4/0x210
> [<c0000000001a2f1c>] .__alloc_pages_nodemask+0x75c/0xb00
> [<c0000000001eafb0>] .alloc_fresh_huge_page+0x70/0x150
> [<c0000000001eb2d0>] .set_max_huge_pages.part.37+0x130/0x2f0
> [<c0000000001eb7c8>] .hugetlb_sysctl_handler_common+0x168/0x180
> [<c0000000002ae21c>] .proc_sys_call_handler+0xfc/0x120
> [<c00000000021dcc0>] .vfs_write+0xe0/0x260
> [<c00000000021e8c8>] .SyS_write+0x58/0xd0
> [<c000000000009e7c>] syscall_exit+0x0/0x7c
> 
> [<c00000004f91f440>] 0xc00000004f91f440
> [<c0000000000144d0>] .__switch_to+0x1c0/0x390
> [<c0000000001ac708>] .throttle_direct_reclaim.isra.31+0x238/0x2c0
> [<c0000000001b0b34>] .try_to_free_pages+0xb4/0x210
> [<c0000000001a2f1c>] .__alloc_pages_nodemask+0x75c/0xb00
> [<c0000000001eafb0>] .alloc_fresh_huge_page+0x70/0x150
> [<c0000000001eb2d0>] .set_max_huge_pages.part.37+0x130/0x2f0
> [<c0000000001eb54c>] .nr_hugepages_store_common.isra.39+0xbc/0x1b0
> [<c0000000003662cc>] .kobj_attr_store+0x2c/0x50
> [<c0000000002b2c2c>] .sysfs_write_file+0xec/0x1c0
> [<c00000000021dcc0>] .vfs_write+0xe0/0x260
> [<c00000000021e8c8>] .SyS_write+0x58/0xd0
> [<c000000000009e7c>] syscall_exit+0x0/0x7c
> 
> kswapd1 is also pegged at this point at 100% cpu.
> 
> If we go in and manually:
> 
> echo 24 >
> /sys/devices/system/node/node0/hugepages/hugepages-16384kB/nr_hugepages
> 
> rather than relying on the interleaving allocator from the sysctl, the
> allocation succeeds (and the echo returns immediately).
> 
> I think we are hitting the following:
> 
> mm/hugetlb.c::alloc_fresh_huge_page_node():
> 
>         page = alloc_pages_exact_node(nid,
>                 htlb_alloc_mask(h)|__GFP_COMP|__GFP_THISNODE|
>                                                 __GFP_REPEAT|__GFP_NOWARN,
>                 huge_page_order(h));
> 
> include/linux/gfp.h:
> 
> #define GFP_THISNODE    (__GFP_THISNODE | __GFP_NOWARN | __GFP_NORETRY)
> 
> and mm/page_alloc.c::__alloc_pages_slowpath():
> 
>         /*
>          * GFP_THISNODE (meaning __GFP_THISNODE, __GFP_NORETRY and
>          * __GFP_NOWARN set) should not cause reclaim since the subsystem
>          * (f.e. slab) using GFP_THISNODE may choose to trigger reclaim
>          * using a larger set of nodes after it has established that the
>          * allowed per node queues are empty and that nodes are
>          * over allocated.
>          */
>         if (IS_ENABLED(CONFIG_NUMA) &&
>                         (gfp_mask & GFP_THISNODE) == GFP_THISNODE)
>                 goto nopage;
> 
> so we *do* reclaim in this callpath. Under my reading, since node1 is
> exhausted, no matter how much work kswapd1 does, it will never reclaim
> memory from node1 to satisfy a 16M page allocation request (or any
> other, for that matter).
> 
> I see the following possible changes/fixes, but am unsure if
> a) my analysis is right
> b) which is best.
> 
> 1) Since we did notice early in boot that (in this case) node 1 was
> exhausted, perhaps we should mark it as such there somehow, and if a
> __GFP_THISNODE allocation request comes through on such a node, we
> immediately fallthrough to nopage?
> 
> 2) There is the following check
>         /*
>          * For order > PAGE_ALLOC_COSTLY_ORDER, if __GFP_REPEAT is
>          * specified, then we retry until we no longer reclaim any pages
>          * (above), or we've reclaimed an order of pages at least as
>          * large as the allocation's order. In both cases, if the
>          * allocation still fails, we stop retrying.
>          */
>         if (gfp_mask & __GFP_REPEAT && pages_reclaimed < (1 << order))
>                 return 1;
> 
> I wonder if we should add a check to also be sure that the pages we are
> reclaiming, if __GFP_THISNODE is set, are from the right node?
> 
>        if (gfp_mask & __GFP_THISNODE && the progress we have made is on
>        		the node requested?)
> 
> 3) did_some_progress could be updated to track where the progress is
> occuring, and if we are in __GFP_THISNODE allocation request and we
> didn't make any progress on the correct node, we fail the allocation?
> 
> I think this situation could be reproduced (and am working on it) by
> exhausting a NUMA node with 16M hugepages and then using the generic
> RR allocator to ask for more. Other node exhaustion cases probably
> exist, but since we can't swap the hugepages, it seems like the most
> straightforward way to try and reproduce it.
> 
> Any thoughts on this? Am I way off base?
> 
> Thanks,
> Nish
> 
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev

^ permalink raw reply

* Re: [RFC PATCH 2/3] topology: support node_numa_mem() for determining the fallback node
From: Nishanth Aravamudan @ 2014-03-13 16:51 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Han Pingtian, Matt Mackall, Pekka Enberg,
	Linux Memory Management List, Paul Mackerras, Anton Blanchard,
	David Rientjes, Joonsoo Kim, linuxppc-dev, Wanpeng Li
In-Reply-To: <alpine.DEB.2.10.1402241353070.20839@nuc>

On 24.02.2014 [13:54:35 -0600], Christoph Lameter wrote:
> On Mon, 24 Feb 2014, Joonsoo Kim wrote:
> 
> > > It will not common get there because of the tracking. Instead a per cpu
> > > object will be used.
> > > > get_partial_node() always fails even if there are some partial slab on
> > > > memoryless node's neareast node.
> > >
> > > Correct and that leads to a page allocator action whereupon the node will
> > > be marked as empty.
> >
> > Why do we need to request to a page allocator if there is partial slab?
> > Checking whether node is memoryless or not is really easy, so we don't need
> > to skip this. To skip this is suboptimal solution.
> 
> The page allocator action is also used to determine to which other node we
> should fall back if the node is empty. So we need to call the page
> allocator when the per cpu slab is exhaused with the node of the
> memoryless node to get memory from the proper fallback node.

Where do we stand with these patches? I feel like no resolution was
really found...

Thanks,
Nish

^ permalink raw reply

* Re: Node 0 not necessary for powerpc?
From: Nishanth Aravamudan @ 2014-03-13 16:49 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linux-mm, linuxppc-dev, anton, rientjes
In-Reply-To: <alpine.DEB.2.10.1403120839110.6865@nuc>

On 12.03.2014 [08:41:40 -0500], Christoph Lameter wrote:
> On Tue, 11 Mar 2014, Nishanth Aravamudan wrote:
> > I have a P7 system that has no node0, but a node0 shows up in numactl
> > --hardware, which has no cpus and no memory (and no PCI devices):
> 
> Well as you see from the code there has been so far the assumption that
> node 0 has memory. I have never run a machine that has no node 0 memory.

Do you mean beyond the initialization? I didn't see anything obvious so
far in the code itself that assumes a given node has memory (in the
sense of the nid). What are your thoughts about how best to support
this?

Thanks,
Nish

^ permalink raw reply

* Re: Node 0 not necessary for powerpc?
From: Nishanth Aravamudan @ 2014-03-13 16:48 UTC (permalink / raw)
  To: David Rientjes; +Cc: linux-mm, cl, linuxppc-dev, anton
In-Reply-To: <alpine.DEB.2.02.1403111900100.19193@chino.kir.corp.google.com>

On 11.03.2014 [19:02:17 -0700], David Rientjes wrote:
> On Tue, 11 Mar 2014, Nishanth Aravamudan wrote:
> 
> > I have a P7 system that has no node0, but a node0 shows up in numactl
> > --hardware, which has no cpus and no memory (and no PCI devices):
> > 
> > numactl --hardware
> > available: 4 nodes (0-3)
> > node 0 cpus:
> > node 0 size: 0 MB
> > node 0 free: 0 MB
> > node 1 cpus: 0 1 2 3 4 5 6 7 8 9 10 11
> > node 1 size: 0 MB
> > node 1 free: 0 MB
> > node 2 cpus:
> > node 2 size: 7935 MB
> > node 2 free: 7716 MB
> > node 3 cpus:
> > node 3 size: 8395 MB
> > node 3 free: 8015 MB
> > node distances:
> > node   0   1   2   3 
> >   0:  10  20  10  20 
> >   1:  20  10  20  20 
> >   2:  10  20  10  20 
> >   3:  20  20  20  10 
> > 
> > This is because we statically initialize N_ONLINE to be [0] in
> > mm/page_alloc.c:
> > 
> >         [N_ONLINE] = { { [0] = 1UL } },
> > 
> > I'm not sure what the architectural requirements are here, but at least
> > on this test system, removing this initialization, it boots fine and is
> > running. I've not yet tried stress tests, but it's survived the
> > beginnings of kernbench so far.
> > 
> > numactl --hardware
> > available: 3 nodes (1-3)
> > node 1 cpus: 0 1 2 3 4 5 6 7 8 9 10 11
> > node 1 size: 0 MB
> > node 1 free: 0 MB
> > node 2 cpus:
> > node 2 size: 7935 MB
> > node 2 free: 7479 MB
> > node 3 cpus:
> > node 3 size: 8396 MB
> > node 3 free: 8375 MB
> > node distances:
> > node   1   2   3 
> >   1:  10  20  20 
> >   2:  20  10  20 
> >   3:  20  20  10
> > 
> > Perhaps we could put in a ARCH_DOES_NOT_NEED_NODE0 and only define it on
> > powerpc for now, conditionalizing the above initialization on that?
> > 
> 
> I don't know if anything has recently changed in the past year or so, but 
> I've booted x86 machines with a hacked BIOS so that all memory on node 0 
> is hotpluggable and offline, so I believe this is possible on x86 as well.

Good to know, thanks! This is also certainly not very common on powerpc,
but it is possible -- and the topology ends up being inaccurate because
of the static initialization.

Thanks,
Nish

^ permalink raw reply

* [PATCH] powerpc/perf: Fix handling of L3 events with bank == 1
From: Michael Ellerman @ 2014-03-13  8:30 UTC (permalink / raw)
  To: linuxppc-dev

Currently we reject events which have the L3 bank == 1, such as
0x000084918F, because the cache field is non-zero.

However that is incorrect, because although the bank is non-zero, the
value we would write into MMCRC is zero, and so we can count the event.

So fix the check to ignore the bank selector when checking whether the
cache selector is non-zero.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
---
 arch/powerpc/perf/power8-pmu.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index 3ad363d..fe2763b 100644
--- a/arch/powerpc/perf/power8-pmu.c
+++ b/arch/powerpc/perf/power8-pmu.c
@@ -325,9 +325,10 @@ static int power8_get_constraint(u64 event, unsigned long *maskp, unsigned long
 		 * HV writable, and there is no API for guest kernels to modify
 		 * it. The solution is for the hypervisor to initialise the
 		 * field to zeroes, and for us to only ever allow events that
-		 * have a cache selector of zero.
+		 * have a cache selector of zero. The bank selector (bit 3) is
+		 * irrelevant, as long as the rest of the value is 0.
 		 */
-		if (cache)
+		if (cache & 0x7)
 			return -1;
 
 	} else if (event & EVENT_IS_L1) {
-- 
1.8.3.2

^ permalink raw reply related

* Re: [PATCH 9/9] powerpc/pm: support deep sleep feature on T1040
From: Kevin Hao @ 2014-03-13  7:46 UTC (permalink / raw)
  To: Scott Wood; +Cc: linuxppc-dev, Chenhui Zhao, Jason.Jin, linux-kernel
In-Reply-To: <1394646185.13761.145.camel@snotra.buserror.net>

[-- Attachment #1: Type: text/plain, Size: 1829 bytes --]

On Wed, Mar 12, 2014 at 12:43:05PM -0500, Scott Wood wrote:
> > Shouldn't we use "readback, sync" here? The following is quoted form t4240RM:
> >   To guarantee that the results of any sequence of writes to configuration
> >   registers are in effect, the final configuration register write should be
> >   immediately followed by a read of the same register, and that should be
> >   followed by a SYNC instruction. Then accesses can safely be made to memory
> >   regions affected by the configuration register write.
> 
> I agree that the sync before the readback is probably not necessary,
> since transactions to the same address should already be ordered.
> 
> A sync after the readback helps if you're trying to order the readback
> with subsequent memory accesses, though in that case wouldn't a sync
> alone (no readback) be adequate?

No, we don't just want to order the subsequent memory access here.
The 'write, readback, sync' is the required sequence if we want to make
sure that the writing to CCSR register does really take effect.

>  Though maybe not always -- see the
> comment near the end of fsl_elbc_write_buf() in
> drivers/mtd/nand_fsl_elbc.c.  I guess the readback does more than just
> make sure the device has seen the write, ensuring that the device has
> finished the transaction to the point of acting on another one.

Agree.

> 
> The data dependency plus isync sequence, which is done by the normal I/O
> accessors used from C code, orders the readback versus all future
> instructions (not just I/O).  The delay loop is not I/O.

According to the PowerISA, the sequence 'load, date dependency, isync' only
order the load accesses. So if we want to order all the storage access as well
as execution synchronization, we should choose sync here.

Thanks,
Kevin

[-- Attachment #2: Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply

* RE: [PATCH] T1040RDB: add qe node for T1040RDB dts
From: qiang.zhao @ 2014-03-13  1:56 UTC (permalink / raw)
  To: Scott Wood; +Cc: linuxppc-dev@lists.ozlabs.org, Xiaobo Xie
In-Reply-To: <1394649930.13761.154.camel@snotra.buserror.net>

T24gV2VkLCAyMDE0LTAzLTEzIGF0IDI6NDYgQU0sIFNjb3R0IHdyb3RlOg0KDQo+IC0tLS0tT3Jp
Z2luYWwgTWVzc2FnZS0tLS0tDQo+IEZyb206IFdvb2QgU2NvdHQtQjA3NDIxDQo+IFNlbnQ6IFRo
dXJzZGF5LCBNYXJjaCAxMywgMjAxNCAyOjQ2IEFNDQo+IFRvOiBaaGFvIFFpYW5nLUI0NTQ3NQ0K
PiBDYzogbGludXhwcGMtZGV2QGxpc3RzLm96bGFicy5vcmc7IFdvb2QgU2NvdHQtQjA3NDIxOyBY
aWUgWGlhb2JvLVI2MzA2MQ0KPiBTdWJqZWN0OiBSZTogW1BBVENIXSBUMTA0MFJEQjogYWRkIHFl
IG5vZGUgZm9yIFQxMDQwUkRCIGR0cw0KPiANCj4gT24gV2VkLCAyMDE0LTAzLTEyIGF0IDE2OjI2
ICswODAwLCBaaGFvIFFpYW5nIHdyb3RlOg0KPiA+IFNpZ25lZC1vZmYtYnk6IFpoYW8gUWlhbmcg
PEI0NTQ3NUBmcmVlc2NhbGUuY29tPg0KPiA+IC0tLQ0KPiA+ICBhcmNoL3Bvd2VycGMvYm9vdC9k
dHMvdDEwNDByZGIuZHRzIHwgNDMNCj4gPiArKysrKysrKysrKysrKysrKysrKysrKysrKysrKysr
KysrKysrKw0KPiA+ICAxIGZpbGUgY2hhbmdlZCwgNDMgaW5zZXJ0aW9ucygrKQ0KPiANCj4gUHJl
c3VtYWJseSB0aGlzIGlzIG9uIHRvcCBvZiB0aGlzIHBhdGNoOg0KPiBodHRwOi8vcGF0Y2h3b3Jr
Lm96bGFicy5vcmcvcGF0Y2gvMzE0MTM4Lw0KPiANCj4gLi4uc2luY2UgdGhlcmUncyBubyBleGlz
dGluZyB0MTA0MCBkZXZpY2UgdHJlZSBzdXBwb3J0LiAgQWx3YXlzIG1lbnRpb24NCj4gd2hlbiB5
b3VyIHBhdGNoIGlzIG9uIHRvcCBvZiBhIHBhdGNoIHRoYXQgaGFzbid0IHlldCBiZWVuIG1lcmdl
ZCBhbmQNCj4gaXNuJ3QgaW4gdGhlIHNhbWUgcGF0Y2ggc2V0Lg0KPiANCj4gQXQgbGVhc3Qgc29t
ZSBvZiB0aGlzIHN0dWZmIHNlZW1zIGxpa2UgaXQgc2hvdWxkIGJlIGluIHQxMDQwc2ktcG9zdC5k
dHMNCj4gKG9yIGEgZmlsZSBpbmNsdWRlZCBieSBpdCksIHJhdGhlciB0aGFuIHRoZSBib2FyZCBk
dHMuDQoNCkV2ZXJ5IGJvYXJkIGNhbiB1c2UgdWNjIGRpZmZlcmVudGx5LCBJdCBpcyBub3QgY29y
cmVjdCB0byBwdXQgdGhpcyBub2RlIGludG8gdDEwNDBzaS1wb3N0LmR0c2kuDQpGb3IgZXhhbXBs
ZSB0MTA0MHFkcyBjYW4gdXNlIHVjYzEgdG8gdGRtIHdoaWxlIG1heWJlIHQxMDQwcmRiIHVzZSB1
Y2MxIHRvIHVhcnQuDQoNCj4gDQo+ID4gKwkJdGRtYTogdWNjQDIwMDAgew0KPiA+ICsJCQljb21w
YXRpYmxlID0gImZzbCx1Y2MtdGRtIjsNCj4gPiArCQkJcngtY2xvY2stbmFtZSA9ICJjbGszIjsN
Cj4gPiArCQkJdHgtY2xvY2stbmFtZSA9ICJjbGs0IjsNCj4gPiArCQkJZnNsLHJ4LXN5bmMtY2xv
Y2sgPSAicnN5bmNfcGluIjsNCj4gPiArCQkJZnNsLHR4LXN5bmMtY2xvY2sgPSAidHN5bmNfcGlu
IjsNCj4gPiArCQkJZnNsLHR4LXRpbWVzbG90ID0gPDB4ZmZmZmZmZmU+Ow0KPiA+ICsJCQlmc2ws
cngtdGltZXNsb3QgPSA8MHhmZmZmZmZmZT47DQo+ID4gKwkJCWZzbCx0ZG0tZnJhbWVyLXR5cGUg
PSAiZTEiOw0KPiA+ICsJCQlmc2wsdGRtLW1vZGUgPSAibm9ybWFsIjsNCj4gPiArCQkJZnNsLHRk
bS1pZCA9IDwwPjsNCj4gPiArCQkJZnNsLHNpcmFtLWVudHJ5LWlkID0gPDA+Ow0KPiA+ICsJCX07
DQo+ID4gKw0KPiA+ICsJCXNlcmlhbDogdWNjQDIyMDAgew0KPiA+ICsJCQlkZXZpY2VfdHlwZSA9
ICJzZXJpYWwiOw0KPiA+ICsJCQljb21wYXRpYmxlID0gInVjY191YXJ0IjsNCj4gPiArCQkJcG9y
dC1udW1iZXIgPSA8MT47DQo+ID4gKwkJCXJ4LWNsb2NrLW5hbWUgPSAiYnJnMiI7DQo+ID4gKwkJ
CXR4LWNsb2NrLW5hbWUgPSAiYnJnMiI7DQo+ID4gKwkJfTsNCj4gDQo+IE1pc3NpbmcgcmVnLg0K
PiANCj4gLVNjb3R0DQo+IA0KDQo=

^ permalink raw reply

* Re: [PATCH v3 10/52] arm, kvm: Fix CPU hotplug callback registration
From: Christoffer Dall @ 2014-03-12 23:21 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: ego, kvm, peterz, linux-kernel, linuxppc-dev, paulus, walken,
	kvmarm, linux-arch, linux, mingo, marc.zyngier, paulmck, linux-pm,
	Gleb Natapov, rusty, tglx, linux-arm-kernel, rjw, oleg, tj,
	Paolo Bonzini, akpm
In-Reply-To: <20140310203538.10746.25364.stgit@srivatsabhat.in.ibm.com>

On Tue, Mar 11, 2014 at 02:05:38AM +0530, Srivatsa S. Bhat wrote:
> Subsystems that want to register CPU hotplug callbacks, as well as perform
> initialization for the CPUs that are already online, often do it as shown
> below:
> 
> 	get_online_cpus();
> 
> 	for_each_online_cpu(cpu)
> 		init_cpu(cpu);
> 
> 	register_cpu_notifier(&foobar_cpu_notifier);
> 
> 	put_online_cpus();
> 
> This is wrong, since it is prone to ABBA deadlocks involving the
> cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently
> with CPU hotplug operations).
> 
> Instead, the correct and race-free way of performing the callback
> registration is:
> 
> 	cpu_notifier_register_begin();
> 
> 	for_each_online_cpu(cpu)
> 		init_cpu(cpu);
> 
> 	/* Note the use of the double underscored version of the API */
> 	__register_cpu_notifier(&foobar_cpu_notifier);
> 
> 	cpu_notifier_register_done();
> 
> 
> Fix the kvm code in arm by using this latter form of callback registration.
> 
> Cc: Christoffer Dall <christoffer.dall@linaro.org>
> Cc: Gleb Natapov <gleb@kernel.org>
> Cc: Russell King <linux@arm.linux.org.uk>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: kvmarm@lists.cs.columbia.edu
> Cc: kvm@vger.kernel.org
> Cc: linux-arm-kernel@lists.infradead.org
> Acked-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> ---
> 
>  arch/arm/kvm/arm.c |    7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index bd18bb8..f0e50a0 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -1051,21 +1051,26 @@ int kvm_arch_init(void *opaque)
>  		}
>  	}
>  
> +	cpu_notifier_register_begin();
> +
>  	err = init_hyp_mode();
>  	if (err)
>  		goto out_err;
>  
> -	err = register_cpu_notifier(&hyp_init_cpu_nb);
> +	err = __register_cpu_notifier(&hyp_init_cpu_nb);
>  	if (err) {
>  		kvm_err("Cannot register HYP init CPU notifier (%d)\n", err);
>  		goto out_err;
>  	}
>  
> +	cpu_notifier_register_done();
> +
>  	hyp_cpu_pm_init();
>  
>  	kvm_coproc_table_init();
>  	return 0;
>  out_err:
> +	cpu_notifier_register_done();
>  	return err;
>  }
>  
> 

Just so we're clear, the existing code was simply racy as not prone to
deadlocks, right?

This makes it clear that the test above for compatible CPUs can be quite
easily evaded by using CPU hotplug, but we don't really have a good
solution for handling that yet...  Hmmm, grumble grumble, I guess if you
hotplug unsupported CPUs on a KVM/ARM system for now, stuff will break.

In any case:
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>

^ permalink raw reply

* Re: [PATCH v3 32/52] powercap, intel-rapl: Fix CPU hotplug callback registration
From: Jacob Pan @ 2014-03-12 22:27 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: linux-arch, ego, walken, linux, akpm, linux-pm, peterz, rusty,
	rjw, oleg, linux-kernel, linuxppc-dev, paulus,
	Srinivas Pandruvada, tj, tglx, paulmck, mingo
In-Reply-To: <20140310203926.10746.11524.stgit@srivatsabhat.in.ibm.com>

On Tue, 11 Mar 2014 02:09:26 +0530
"Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> wrote:

> Subsystems that want to register CPU hotplug callbacks, as well as
> perform initialization for the CPUs that are already online, often do
> it as shown below:
> 
> 	get_online_cpus();
> 
> 	for_each_online_cpu(cpu)
> 		init_cpu(cpu);
> 
> 	register_cpu_notifier(&foobar_cpu_notifier);
> 
> 	put_online_cpus();
> 
> This is wrong, since it is prone to ABBA deadlocks involving the
> cpu_add_remove_lock and the cpu_hotplug.lock (when running
> concurrently with CPU hotplug operations).
> 
> Instead, the correct and race-free way of performing the callback
> registration is:
> 
> 	cpu_notifier_register_begin();
> 
> 	for_each_online_cpu(cpu)
> 		init_cpu(cpu);
> 
> 	/* Note the use of the double underscored version of the API
> */ __register_cpu_notifier(&foobar_cpu_notifier);
> 
> 	cpu_notifier_register_done();
> 
> 
> Fix the intel-rapl code in the powercap driver by using this latter
> form of callback registration. But retain the calls to
> get/put_online_cpus(), since they also protect the function
> rapl_cleanup_data(). By nesting get/put_online_cpus() *inside*
> cpu_notifier_register_begin/done(), we avoid the ABBA deadlock
> possibility mentioned above.
> 
> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> Cc: Ingo Molnar <mingo@kernel.org>
> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> ---
> 

Tested-by: Jacob Pan <jacob.jun.pan@linux.intel.com>

>  drivers/powercap/intel_rapl.c |   10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/powercap/intel_rapl.c
> b/drivers/powercap/intel_rapl.c index 3c67683..d6c74c1 100644
> --- a/drivers/powercap/intel_rapl.c
> +++ b/drivers/powercap/intel_rapl.c
> @@ -1369,6 +1369,9 @@ static int __init rapl_init(void)
>  
>  		return -ENODEV;
>  	}
> +
> +	cpu_notifier_register_begin();
> +
>  	/* prevent CPU hotplug during detection */
>  	get_online_cpus();
>  	ret = rapl_detect_topology();
> @@ -1380,20 +1383,23 @@ static int __init rapl_init(void)
>  		ret = -ENODEV;
>  		goto done;
>  	}
> -	register_hotcpu_notifier(&rapl_cpu_notifier);
> +	__register_hotcpu_notifier(&rapl_cpu_notifier);
>  done:
>  	put_online_cpus();
> +	cpu_notifier_register_done();
>  
>  	return ret;
>  }
>  
>  static void __exit rapl_exit(void)
>  {
> +	cpu_notifier_register_begin();
>  	get_online_cpus();
> -	unregister_hotcpu_notifier(&rapl_cpu_notifier);
> +	__unregister_hotcpu_notifier(&rapl_cpu_notifier);
>  	rapl_unregister_powercap();
>  	rapl_cleanup_data();
>  	put_online_cpus();
> +	cpu_notifier_register_done();
>  }
>  
>  module_init(rapl_init);
> 

[Jacob Pan]

^ permalink raw reply

* Re: [PATCH v3 00/52] CPU hotplug: Fix issues with callback registration
From: Srivatsa S. Bhat @ 2014-03-12 20:48 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-arch, ego, walken, linux, linux-pm, peterz, rusty, rjw,
	oleg, linux-kernel, linuxppc-dev, paulus, tj, tglx, paulmck,
	mingo
In-Reply-To: <20140311150733.efcc594dd7fe59c9c5fe9325@linux-foundation.org>

On 03/12/2014 03:37 AM, Andrew Morton wrote:
> On Tue, 11 Mar 2014 02:03:52 +0530 "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> wrote:
> 
>> Hi,
>>
>> Many subsystems and drivers have the need to register CPU hotplug callbacks
>> from their init routines and also perform initialization for the CPUs that are
>> already online. But unfortunately there is no race-free way to achieve this
>> today.
>>
>> For example, consider this piece of code:
>>
>> 	get_online_cpus();
>>
>> 	for_each_online_cpu(cpu)
>> 		init_cpu(cpu);
>>
>> 	register_cpu_notifier(&foobar_cpu_notifier);
>>
>> 	put_online_cpus();
>>
>> This is not safe because there is a possibility of an ABBA deadlock involving
>> the cpu_add_remove_lock and the cpu_hotplug.lock.
>>
>>           CPU 0                                         CPU 1
>>           -----                                         -----
>>
>>    Acquire cpu_hotplug.lock
>>    [via get_online_cpus()]
>>
>>                                               CPU online/offline operation
>>                                               takes cpu_add_remove_lock
>>                                               [via cpu_maps_update_begin()]
>>
>>    Try to acquire
>>    cpu_add_remove_lock
>>    [via register_cpu_notifier()]
>>
>>                                               CPU online/offline operation
>>                                               tries to acquire cpu_hotplug.lock
>>                                               [via cpu_hotplug_begin()]
> 
> Can't we fix this by using a different (ie: new) lock to protect
> cpu_chain?
> 

No, that won't be a better solution than this one :-( The reason is that
CPU_POST_DEAD notifiers are invoked with the cpu_hotplug.lock dropped (by
design). So if we introduce the new lock, the locking would look as shown
below at the CPU hotplug side:

[ Note that it is unsafe to acquire and release the cpu-chain lock separately
for each invocation of the notifiers, because that would allow manipulations
of the cpu-chain in between two sets of notifications (such as CPU_DOWN_PREPARE
and CPU_DEAD, corresponding to the same CPU hotplug operation), which is
clearly wrong. So we need to acquire the new lock at the very beginning of
the hotplug operation and release it at the very end, after all notifiers
have been invoked.]

cpu_maps_update_begin(); //acquire cpu_add_remove_lock
    cpu_hotplug_begin(); //acquire cpu_hotplug.lock

        cpu_chain_lock(); //acquire a new lock that protects the cpu_chain

		Invoke CPU_DOWN_PREPARE notifiers
		//take cpu offline using stop-machine
		Invoke CPU_DEAD notifiers

    cpu_hotplug_done(); //release cpu_hotplug.lock

		Invoke CPU_POST_DEAD notifiers

        cpu_chain_unlock(); //release a new lock that protects the cpu_chain

cpu_maps_update_done(); //release cpu_add_remove_lock

So, if you observe the nesting of locks, it looks weird, because
cpu_hotplug.lock is acquired first, followed by cpu_chain_lock,
but they are released in the same order! IOW, they don't nest "properly".

To avoid this, if we reorder the locks in such a way that cpu_chain_lock
is the outer lock compared to cpu_hotplug.lock, then it becomes exactly
same as cpu_add_remove_lock. In other words, we can reuse the
cpu_add_remove_lock for this very purpose of protecting the cpu-chains
without adding any new lock to the CPU hotplug core code. And this is
what the existing code already does. I just utilize this fact and make
sure that we don't deadlock in the scenarios mentioned in the cover-letter
of this patchset.

Regards,
Srivatsa S. Bhat

^ permalink raw reply

* Re: [PATCH] T1040RDB: add qe node for T1040RDB dts
From: Scott Wood @ 2014-03-12 18:45 UTC (permalink / raw)
  To: Zhao Qiang; +Cc: B07421, R63061, linuxppc-dev
In-Reply-To: <1394612762-36308-1-git-send-email-B45475@freescale.com>

On Wed, 2014-03-12 at 16:26 +0800, Zhao Qiang wrote:
> Signed-off-by: Zhao Qiang <B45475@freescale.com>
> ---
>  arch/powerpc/boot/dts/t1040rdb.dts | 43 ++++++++++++++++++++++++++++++++++++++
>  1 file changed, 43 insertions(+)

Presumably this is on top of this patch:
http://patchwork.ozlabs.org/patch/314138/

...since there's no existing t1040 device tree support.  Always mention
when your patch is on top of a patch that hasn't yet been merged and
isn't in the same patch set.

At least some of this stuff seems like it should be in t1040si-post.dts
(or a file included by it), rather than the board dts.

> +		tdma: ucc@2000 {
> +			compatible = "fsl,ucc-tdm";
> +			rx-clock-name = "clk3";
> +			tx-clock-name = "clk4";
> +			fsl,rx-sync-clock = "rsync_pin";
> +			fsl,tx-sync-clock = "tsync_pin";
> +			fsl,tx-timeslot = <0xfffffffe>;
> +			fsl,rx-timeslot = <0xfffffffe>;
> +			fsl,tdm-framer-type = "e1";
> +			fsl,tdm-mode = "normal";
> +			fsl,tdm-id = <0>;
> +			fsl,siram-entry-id = <0>;
> +		};
> +
> +		serial: ucc@2200 {
> +			device_type = "serial";
> +			compatible = "ucc_uart";
> +			port-number = <1>;
> +			rx-clock-name = "brg2";
> +			tx-clock-name = "brg2";
> +		};

Missing reg.

-Scott

^ permalink raw reply

* Re: [PATCH] T1040RDB: add qe node for T1040RDB dts
From: Scott Wood @ 2014-03-12 18:36 UTC (permalink / raw)
  To: Zhao Qiang; +Cc: B07421, R63061, linuxppc-dev
In-Reply-To: <1394612762-36308-1-git-send-email-B45475@freescale.com>

On Wed, 2014-03-12 at 16:26 +0800, Zhao Qiang wrote:
> Signed-off-by: Zhao Qiang <B45475@freescale.com>
> ---
>  arch/powerpc/boot/dts/t1040rdb.dts | 43 ++++++++++++++++++++++++++++++++++++++
>  1 file changed, 43 insertions(+)
> 
> diff --git a/arch/powerpc/boot/dts/t1040rdb.dts b/arch/powerpc/boot/dts/t1040rdb.dts
> index e2eee18..6ff0412 100644
> --- a/arch/powerpc/boot/dts/t1040rdb.dts
> +++ b/arch/powerpc/boot/dts/t1040rdb.dts
> @@ -268,6 +268,49 @@
>  			fsl,fman-mac = <&enet4>;
>  		};
>  	};
> +
> +	qe: qe@ffe139999 {
> +		ranges = <0x0 0xf 0xfe140000 0x40000>;
> +		reg = <0xf 0xfe140000 0 0x480>;

reg does not match unit address

Missing compatible

> +		si1: si@700 {
> +			#address-cells = <1>;
> +			#size-cells = <0>;
> +			compatible = "fsl,qe-si";
> +			reg = <0x700 0x80>;
> +		};

Missing binding

> +
> +		siram1: siram@1000 {
> +			#address-cells = <1>;
> +			#size-cells = <1>;
> +			compatible = "fsl,qe-siram";
> +			reg = <0x1000 0x800>;
> +		};

Missing binding

> +
> +		tdma: ucc@2000 {
> +			compatible = "fsl,ucc-tdm";
> +			rx-clock-name = "clk3";
> +			tx-clock-name = "clk4";
> +			fsl,rx-sync-clock = "rsync_pin";
> +			fsl,tx-sync-clock = "tsync_pin";
> +			fsl,tx-timeslot = <0xfffffffe>;
> +			fsl,rx-timeslot = <0xfffffffe>;
> +			fsl,tdm-framer-type = "e1";
> +			fsl,tdm-mode = "normal";
> +			fsl,tdm-id = <0>;
> +			fsl,siram-entry-id = <0>;
> +		};

Missing binding

> +		serial: ucc@2200 {
> +			device_type = "serial";
> +			compatible = "ucc_uart";
> +			port-number = <1>;
> +			rx-clock-name = "brg2";
> +			tx-clock-name = "brg2";
> +		};

Missing binding

-Scott

^ permalink raw reply

* [PATCH] powerpc: Update ppc4xx maintainer
From: Josh Boyer @ 2014-03-12 18:21 UTC (permalink / raw)
  To: benh; +Cc: Alistair Popple, linuxppc-dev

Alistair Popple has volunteered to take over maintainership of the ppc4xx
stuff upstream.  Switch the MAINTAINERS entry over to him.

Signed-off-by: Josh Boyer <jwboyer@gmail.com>
---
 MAINTAINERS | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 1ecfde1..6d220c8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5245,11 +5245,10 @@ F:	arch/powerpc/platforms/512x/
 F:	arch/powerpc/platforms/52xx/
 
 LINUX FOR POWERPC EMBEDDED PPC4XX
-M:	Josh Boyer <jwboyer@gmail.com>
+M:  Alistair Popple <alistair@popple.id.au>
 M:	Matt Porter <mporter@kernel.crashing.org>
 W:	http://www.penguinppc.org/
 L:	linuxppc-dev@lists.ozlabs.org
-T:	git git://git.kernel.org/pub/scm/linux/kernel/git/jwboyer/powerpc-4xx.git
 S:	Maintained
 F:	arch/powerpc/platforms/40x/
 F:	arch/powerpc/platforms/44x/
-- 
1.8.5.3

^ permalink raw reply related

* Re: [PATCH 9/9] powerpc/pm: support deep sleep feature on T1040
From: Scott Wood @ 2014-03-12 17:43 UTC (permalink / raw)
  To: Kevin Hao; +Cc: linuxppc-dev, Chenhui Zhao, Jason.Jin, linux-kernel
In-Reply-To: <20140312055755.GA17203@pek-khao-d1.corp.ad.wrs.com>

On Wed, 2014-03-12 at 13:57 +0800, Kevin Hao wrote:
> On Tue, Mar 11, 2014 at 08:10:24PM -0500, Scott Wood wrote:
> > > +	FSL_DIS_ALL_IRQ
> > > +
> > > +	/*
> > > +	 * Place DDR controller in self refresh mode.
> > > +	 * From here on, DDR can't be access any more.
> > > +	 */
> > > +	lwz	r10, 0(r13)
> > > +	oris	r10, r10, CCSR_DDR_SDRAM_CFG_2_FRC_SR@h
> > > +	stw	r10, 0(r13)
> > > +
> > > +	/* can't call udelay() here, so use a macro to delay */
> > > +	FSLDELAY(50)
> > 
> > A timebase loop doesn't require accessing DDR.
> > 
> > You also probably want to do a "sync, readback, data dependency, isync"
> > sequence to make sure that the store has hit CCSR before you begin your
> > delay (or is a delay required at all if you do that?).
> 
> Shouldn't we use "readback, sync" here? The following is quoted form t4240RM:
>   To guarantee that the results of any sequence of writes to configuration
>   registers are in effect, the final configuration register write should be
>   immediately followed by a read of the same register, and that should be
>   followed by a SYNC instruction. Then accesses can safely be made to memory
>   regions affected by the configuration register write.

I agree that the sync before the readback is probably not necessary,
since transactions to the same address should already be ordered.

A sync after the readback helps if you're trying to order the readback
with subsequent memory accesses, though in that case wouldn't a sync
alone (no readback) be adequate?  Though maybe not always -- see the
comment near the end of fsl_elbc_write_buf() in
drivers/mtd/nand_fsl_elbc.c.  I guess the readback does more than just
make sure the device has seen the write, ensuring that the device has
finished the transaction to the point of acting on another one.

The data dependency plus isync sequence, which is done by the normal I/O
accessors used from C code, orders the readback versus all future
instructions (not just I/O).  The delay loop is not I/O.

> > > +	/* Enable SCU15 to trigger on RCPM Concentrator 0 */
> > > +	lwz	r10, 0(r15)
> > > +	oris	r10, r10, DCSR_EPU_EPECR15_IC0@h
> > > +	stw	r10, 0(r15)
> > > +
> > > +	/* put Core0 in PH15 mode, trigger EPU FSM */
> > > +	lwz	r10, 0(r12)
> > > +	ori	r10, r10, CCSR_RCPM_PCPH15SETR_CORE0
> > > +	stw	r10, 0(r12)
> > 
> > Shouldn't there be a sync to ensure that the previous I/O happens before
> > the final store to enter PH15?
> 
> Do we really need a sync here? According to the PowerISA, the above stores
> should be performed in program order.
>   If two Store instructions or two Load instructions
>   specify storage locations that are both Caching
>   Inhibited and Guarded, the corresponding storage
>   accesses are performed in program order with
>   respect to any processor or mechanism.

OK, wasn't aware of that.

-Scott

^ permalink raw reply

* Re: Node 0 not necessary for powerpc?
From: Christoph Lameter @ 2014-03-12 13:41 UTC (permalink / raw)
  To: Nishanth Aravamudan; +Cc: linux-mm, linuxppc-dev, anton, rientjes
In-Reply-To: <20140311195632.GA946@linux.vnet.ibm.com>

On Tue, 11 Mar 2014, Nishanth Aravamudan wrote:
> I have a P7 system that has no node0, but a node0 shows up in numactl
> --hardware, which has no cpus and no memory (and no PCI devices):

Well as you see from the code there has been so far the assumption that
node 0 has memory. I have never run a machine that has no node 0 memory.

^ permalink raw reply

* [PATCH RFC v9 6/6] dma: mpc512x: register for device tree channel lookup
From: Alexander Popov @ 2014-03-12 11:47 UTC (permalink / raw)
  To: Gerhard Sittig, Dan Williams, Vinod Koul, Lars-Peter Clausen,
	Arnd Bergmann, Anatolij Gustschin, Andy Shevchenko,
	Alexander Popov, linuxppc-dev, dmaengine
  Cc: devicetree
In-Reply-To: <1394624875-24411-1-git-send-email-a13xp0p0v88@gmail.com>

Register the controller for device tree based lookup of DMA channels
(non-fatal for backwards compatibility with older device trees) and
provide the '#dma-cells' property in the shared mpc5121.dtsi file

Signed-off-by: Gerhard Sittig <gsi@denx.de>
Signed-off-by: Alexander Popov <a13xp0p0v88@gmail.com>
---
 arch/powerpc/boot/dts/mpc5121.dtsi |  1 +
 drivers/dma/mpc512x_dma.c          | 21 ++++++++++++++++++---
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/boot/dts/mpc5121.dtsi b/arch/powerpc/boot/dts/mpc5121.dtsi
index 2c0e155..7f9d14f 100644
--- a/arch/powerpc/boot/dts/mpc5121.dtsi
+++ b/arch/powerpc/boot/dts/mpc5121.dtsi
@@ -498,6 +498,7 @@
 			compatible = "fsl,mpc5121-dma";
 			reg = <0x14000 0x1800>;
 			interrupts = <65 0x8>;
+			#dma-cells = <1>;
 		};
 	};
 
diff --git a/drivers/dma/mpc512x_dma.c b/drivers/dma/mpc512x_dma.c
index ff7f678..453b1cb 100644
--- a/drivers/dma/mpc512x_dma.c
+++ b/drivers/dma/mpc512x_dma.c
@@ -52,6 +52,7 @@
 #include <linux/of_address.h>
 #include <linux/of_device.h>
 #include <linux/of_irq.h>
+#include <linux/of_dma.h>
 #include <linux/of_platform.h>
 
 #include <linux/random.h>
@@ -1018,11 +1019,23 @@ static int mpc_dma_probe(struct platform_device *op)
 	/* Register DMA engine */
 	dev_set_drvdata(dev, mdma);
 	retval = dma_async_device_register(dma);
-	if (retval) {
-		free_irq(mdma->irq, mdma);
-		irq_dispose_mapping(mdma->irq);
+	if (retval)
+		goto out_irq;
+
+	/* Register with OF helpers for DMA lookups (nonfatal) */
+	if (dev->of_node) {
+		retval = of_dma_controller_register(dev->of_node,
+						of_dma_xlate_by_chan_id, mdma);
+		if (retval)
+			dev_warn(dev, "could not register for OF lookup\n");
 	}
 
+	return 0;
+
+out_irq:
+	free_irq(mdma->irq, mdma);
+	irq_dispose_mapping(mdma->irq);
+
 	return retval;
 }
 
@@ -1031,6 +1044,8 @@ static int mpc_dma_remove(struct platform_device *op)
 	struct device *dev = &op->dev;
 	struct mpc_dma *mdma = dev_get_drvdata(dev);
 
+	if (dev->of_node)
+		of_dma_controller_free(dev->of_node);
 	dma_async_device_unregister(&mdma->dma);
 	free_irq(mdma->irq, mdma);
 	irq_dispose_mapping(mdma->irq);
-- 
1.8.4.2

^ permalink raw reply related

* [PATCH RFC v9 5/6] dma: mpc512x: add device tree binding document
From: Alexander Popov @ 2014-03-12 11:47 UTC (permalink / raw)
  To: Gerhard Sittig, Dan Williams, Vinod Koul, Lars-Peter Clausen,
	Arnd Bergmann, Anatolij Gustschin, Andy Shevchenko,
	Alexander Popov, linuxppc-dev, dmaengine
  Cc: devicetree
In-Reply-To: <1394624875-24411-1-git-send-email-a13xp0p0v88@gmail.com>

From: Gerhard Sittig <gsi@denx.de>

introduce a device tree binding document for the MPC512x DMA controller

Signed-off-by: Gerhard Sittig <gsi@denx.de>
[ a13xp0p0v88@gmail.com: turn this into a separate patch ]
---
 .../devicetree/bindings/dma/mpc512x-dma.txt        | 55 ++++++++++++++++++++++
 1 file changed, 55 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/dma/mpc512x-dma.txt

diff --git a/Documentation/devicetree/bindings/dma/mpc512x-dma.txt b/Documentation/devicetree/bindings/dma/mpc512x-dma.txt
new file mode 100644
index 0000000..a4867d5
--- /dev/null
+++ b/Documentation/devicetree/bindings/dma/mpc512x-dma.txt
@@ -0,0 +1,55 @@
+* Freescale MPC512x DMA Controller
+
+The DMA controller in the Freescale MPC512x SoC can move blocks of
+memory contents between memory and peripherals or memory to memory.
+
+Refer to the "Generic DMA Controller and DMA request bindings" description
+in the dma.txt file for a more detailled discussion of the binding.  The
+MPC512x DMA engine binding follows the common scheme, but doesn't provide
+support for the optional channels and requests counters (those values are
+derived from the detected hardware features) and has a fixed client
+specifier length of 1 integer cell (the value is the DMA channel, since
+the DMA controller uses a fixed assignment of request lines per channel).
+
+
+DMA controller node properties:
+
+Required properties:
+- compatible:		should be "fsl,mpc5121-dma"
+- reg:			address and size of the DMA controller's register set
+- interrupts:		interrupt spec for the DMA controller
+
+Optional properties:
+- #dma-cells:		must be <1>, describes the number of integer cells
+			needed to specify the 'dmas' property in client nodes,
+			strongly recommended since common client helper code
+			uses this property
+
+Example:
+
+	dma0: dma@14000 {
+		compatible = "fsl,mpc5121-dma";
+		reg = <0x14000 0x1800>;
+		interrupts = <65 0x8>;
+		#dma-cells = <1>;
+	};
+
+
+Client node properties:
+
+Required properties:
+- dmas:			list of DMA specifiers, consisting each of a handle
+			for the DMA controller and integer cells to specify
+			the channel used within the DMA controller
+- dma-names:		list of identifier strings for the DMA specifiers,
+			client device driver code uses these strings to
+			have DMA channels looked up at the controller
+
+Example:
+
+	sdhc@1500 {
+		compatible = "fsl,mpc5121-sdhc";
+		/* ... */
+		dmas = <&dma0 30>;
+		dma-names = "rx-tx";
+	};
-- 
1.8.4.2

^ permalink raw reply related

* [PATCH RFC v9 4/6] dma: of: Add common xlate function for matching by channel id
From: Alexander Popov @ 2014-03-12 11:47 UTC (permalink / raw)
  To: Gerhard Sittig, Dan Williams, Vinod Koul, Lars-Peter Clausen,
	Arnd Bergmann, Anatolij Gustschin, Andy Shevchenko,
	Alexander Popov, linuxppc-dev, dmaengine
  Cc: devicetree
In-Reply-To: <1394624875-24411-1-git-send-email-a13xp0p0v88@gmail.com>

This patch adds a new common OF dma xlate callback function which will match a
channel by it's id. The binding expects one integer argument which it will use to
lookup the channel by the id.

Unlike of_dma_simple_xlate this function is able to handle a system with
multiple DMA controllers. When registering the of dma provider with
of_dma_controller_register a pointer to the dma_device struct which is
associated with the dt node needs to passed as the data parameter.
New function will use this pointer to match only channels which belong to the
specified DMA controller.

Signed-off-by: Alexander Popov <a13xp0p0v88@gmail.com>
---
 drivers/dma/of-dma.c   | 35 +++++++++++++++++++++++++++++++++++
 include/linux/of_dma.h |  4 ++++
 2 files changed, 39 insertions(+)

diff --git a/drivers/dma/of-dma.c b/drivers/dma/of-dma.c
index e8fe9dc..d5fbeaa 100644
--- a/drivers/dma/of-dma.c
+++ b/drivers/dma/of-dma.c
@@ -218,3 +218,38 @@ struct dma_chan *of_dma_simple_xlate(struct of_phandle_args *dma_spec,
 			&dma_spec->args[0]);
 }
 EXPORT_SYMBOL_GPL(of_dma_simple_xlate);
+
+/**
+ * of_dma_xlate_by_chan_id - Translate dt property to DMA channel by channel id
+ * @dma_spec:	pointer to DMA specifier as found in the device tree
+ * @of_dma:	pointer to DMA controller data
+ *
+ * This function can be used as the of xlate callback for DMA driver which wants
+ * to match the channel based on the channel id. When using this xlate function
+ * the #dma-cells propety of the DMA controller dt node needs to be set to 1.
+ * The data parameter of of_dma_controller_register must be a pointer to the
+ * dma_device struct the function should match upon.
+ *
+ * Returns pointer to appropriate dma channel on success or NULL on error.
+ */
+struct dma_chan *of_dma_xlate_by_chan_id(struct of_phandle_args *dma_spec,
+					 struct of_dma *ofdma)
+{
+	struct dma_device *dev = ofdma->of_dma_data;
+	struct dma_chan *chan, *candidate = NULL;
+
+	if (!dev || dma_spec->args_count != 1)
+		return NULL;
+
+	list_for_each_entry(chan, &dev->channels, device_node)
+		if (chan->chan_id == dma_spec->args[0]) {
+			candidate = chan;
+			break;
+		}
+
+	if (!candidate)
+		return NULL;
+
+	return dma_get_slave_channel(candidate);
+}
+EXPORT_SYMBOL_GPL(of_dma_xlate_by_chan_id);
diff --git a/include/linux/of_dma.h b/include/linux/of_dma.h
index ae36298..56bc026 100644
--- a/include/linux/of_dma.h
+++ b/include/linux/of_dma.h
@@ -41,6 +41,8 @@ extern struct dma_chan *of_dma_request_slave_channel(struct device_node *np,
 						     const char *name);
 extern struct dma_chan *of_dma_simple_xlate(struct of_phandle_args *dma_spec,
 		struct of_dma *ofdma);
+extern struct dma_chan *of_dma_xlate_by_chan_id(struct of_phandle_args *dma_spec,
+		struct of_dma *ofdma);
 #else
 static inline int of_dma_controller_register(struct device_node *np,
 		struct dma_chan *(*of_dma_xlate)
@@ -66,6 +68,8 @@ static inline struct dma_chan *of_dma_simple_xlate(struct of_phandle_args *dma_s
 	return NULL;
 }
 
+#define of_dma_xlate_by_chan_id NULL
+
 #endif
 
 #endif /* __LINUX_OF_DMA_H */
-- 
1.8.4.2

^ permalink raw reply related

* [PATCH RFC v9 3/6] dma: mpc512x: replace devm_request_irq() with request_irq()
From: Alexander Popov @ 2014-03-12 11:47 UTC (permalink / raw)
  To: Gerhard Sittig, Dan Williams, Vinod Koul, Lars-Peter Clausen,
	Arnd Bergmann, Anatolij Gustschin, Andy Shevchenko,
	Alexander Popov, linuxppc-dev, dmaengine
In-Reply-To: <1394624875-24411-1-git-send-email-a13xp0p0v88@gmail.com>

Replace devm_request_irq() with request_irq() since there is no need
to use it because the original code always frees IRQ manually with
devm_free_irq(). Replace devm_free_irq() with free_irq() accordingly.

Signed-off-by: Alexander Popov <a13xp0p0v88@gmail.com>
---
 drivers/dma/mpc512x_dma.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/dma/mpc512x_dma.c b/drivers/dma/mpc512x_dma.c
index b1e430c..ff7f678 100644
--- a/drivers/dma/mpc512x_dma.c
+++ b/drivers/dma/mpc512x_dma.c
@@ -921,16 +921,15 @@ static int mpc_dma_probe(struct platform_device *op)
 	mdma->tcd = (struct mpc_dma_tcd *)((u8 *)(mdma->regs)
 							+ MPC_DMA_TCD_OFFSET);
 
-	retval = devm_request_irq(dev, mdma->irq, &mpc_dma_irq, 0, DRV_NAME,
-									mdma);
+	retval = request_irq(mdma->irq, &mpc_dma_irq, 0, DRV_NAME, mdma);
 	if (retval) {
 		dev_err(dev, "Error requesting IRQ!\n");
 		return -EINVAL;
 	}
 
 	if (mdma->is_mpc8308) {
-		retval = devm_request_irq(dev, mdma->irq2, &mpc_dma_irq, 0,
-				DRV_NAME, mdma);
+		retval = request_irq(mdma->irq2, &mpc_dma_irq, 0,
+							DRV_NAME, mdma);
 		if (retval) {
 			dev_err(dev, "Error requesting IRQ2!\n");
 			return -EINVAL;
@@ -1020,7 +1019,7 @@ static int mpc_dma_probe(struct platform_device *op)
 	dev_set_drvdata(dev, mdma);
 	retval = dma_async_device_register(dma);
 	if (retval) {
-		devm_free_irq(dev, mdma->irq, mdma);
+		free_irq(mdma->irq, mdma);
 		irq_dispose_mapping(mdma->irq);
 	}
 
@@ -1033,7 +1032,7 @@ static int mpc_dma_remove(struct platform_device *op)
 	struct mpc_dma *mdma = dev_get_drvdata(dev);
 
 	dma_async_device_unregister(&mdma->dma);
-	devm_free_irq(dev, mdma->irq, mdma);
+	free_irq(mdma->irq, mdma);
 	irq_dispose_mapping(mdma->irq);
 
 	return 0;
-- 
1.8.4.2

^ permalink raw reply related

* [PATCH RFC v9 2/6] dma: mpc512x: add support for peripheral transfers
From: Alexander Popov @ 2014-03-12 11:47 UTC (permalink / raw)
  To: Gerhard Sittig, Dan Williams, Vinod Koul, Lars-Peter Clausen,
	Arnd Bergmann, Anatolij Gustschin, Andy Shevchenko,
	Alexander Popov, linuxppc-dev, dmaengine
In-Reply-To: <1394624875-24411-1-git-send-email-a13xp0p0v88@gmail.com>

Introduce support for slave s/g transfer preparation and the associated
device control callback in the MPC512x DMA controller driver, which adds
support for data transfers between memory and peripheral I/O to the
previously supported mem-to-mem transfers.

Signed-off-by: Alexander Popov <a13xp0p0v88@gmail.com>
---
 drivers/dma/mpc512x_dma.c | 236 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 231 insertions(+), 5 deletions(-)

diff --git a/drivers/dma/mpc512x_dma.c b/drivers/dma/mpc512x_dma.c
index 2ce248b..b1e430c 100644
--- a/drivers/dma/mpc512x_dma.c
+++ b/drivers/dma/mpc512x_dma.c
@@ -2,6 +2,7 @@
  * Copyright (C) Freescale Semicondutor, Inc. 2007, 2008.
  * Copyright (C) Semihalf 2009
  * Copyright (C) Ilya Yanok, Emcraft Systems 2010
+ * Copyright (C) Alexander Popov, Promcontroller 2013
  *
  * Written by Piotr Ziecik <kosmo@semihalf.com>. Hardware description
  * (defines, structures and comments) was taken from MPC5121 DMA driver
@@ -29,8 +30,17 @@
  */
 
 /*
- * This is initial version of MPC5121 DMA driver. Only memory to memory
- * transfers are supported (tested using dmatest module).
+ * MPC512x and MPC8308 DMA driver. It supports
+ * memory to memory data transfers (tested using dmatest module) and
+ * data transfers between memory and peripheral I/O memory
+ * by means of slave s/g with these limitations:
+ *  - chunked transfers (transfers with more than one part) are refused
+ *     as long as proper support for scatter/gather is missing;
+ *  - transfers on MPC8308 always start from software as this SoC appears
+ *     not to have external request lines for peripheral flow control;
+ *  - minimal memory <-> I/O memory transfer chunk is 4 bytes and consequently
+ *     source and destination addresses must be 4-byte aligned
+ *     and transfer size must be aligned on (4 * maxburst) boundary;
  */
 
 #include <linux/module.h>
@@ -189,6 +199,7 @@ struct mpc_dma_desc {
 	dma_addr_t			tcd_paddr;
 	int				error;
 	struct list_head		node;
+	int				will_access_peripheral;
 };
 
 struct mpc_dma_chan {
@@ -201,6 +212,10 @@ struct mpc_dma_chan {
 	struct mpc_dma_tcd		*tcd;
 	dma_addr_t			tcd_paddr;
 
+	/* Settings for access to peripheral FIFO */
+	dma_addr_t			per_paddr;	/* FIFO address */
+	u32				tcd_nunits;
+
 	/* Lock for this structure */
 	spinlock_t			lock;
 };
@@ -251,8 +266,23 @@ static void mpc_dma_execute(struct mpc_dma_chan *mchan)
 	struct mpc_dma_desc *mdesc;
 	int cid = mchan->chan.chan_id;
 
-	/* Move all queued descriptors to active list */
-	list_splice_tail_init(&mchan->queued, &mchan->active);
+	while (!list_empty(&mchan->queued)) {
+		mdesc = list_first_entry(&mchan->queued,
+						struct mpc_dma_desc, node);
+		/*
+		 * Grab either several mem-to-mem transfer descriptors
+		 * or one peripheral transfer descriptor,
+		 * don't mix mem-to-mem and peripheral transfer descriptors
+		 * within the same 'active' list.
+		 */
+		if (mdesc->will_access_peripheral) {
+			if (list_empty(&mchan->active))
+				list_move_tail(&mdesc->node, &mchan->active);
+			break;
+		} else {
+			list_move_tail(&mdesc->node, &mchan->active);
+		}
+	}
 
 	/* Chain descriptors into one transaction */
 	list_for_each_entry(mdesc, &mchan->active, node) {
@@ -278,7 +308,17 @@ static void mpc_dma_execute(struct mpc_dma_chan *mchan)
 
 	if (first != prev)
 		mdma->tcd[cid].e_sg = 1;
-	out_8(&mdma->regs->dmassrt, cid);
+
+	if (mdma->is_mpc8308) {
+		/* MPC8308, no request lines, software initiated start */
+		out_8(&mdma->regs->dmassrt, cid);
+	} else if (first->will_access_peripheral) {
+		/* Peripherals involved, start by external request signal */
+		out_8(&mdma->regs->dmaserq, cid);
+	} else {
+		/* Memory to memory transfer, software initiated start */
+		out_8(&mdma->regs->dmassrt, cid);
+	}
 }
 
 /* Handle interrupt on one half of DMA controller (32 channels) */
@@ -596,6 +636,7 @@ mpc_dma_prep_memcpy(struct dma_chan *chan, dma_addr_t dst, dma_addr_t src,
 	}
 
 	mdesc->error = 0;
+	mdesc->will_access_peripheral = 0;
 	tcd = mdesc->tcd;
 
 	/* Prepare Transfer Control Descriptor for this transaction */
@@ -643,6 +684,188 @@ mpc_dma_prep_memcpy(struct dma_chan *chan, dma_addr_t dst, dma_addr_t src,
 	return &mdesc->desc;
 }
 
+static struct dma_async_tx_descriptor *
+mpc_dma_prep_slave_sg(struct dma_chan *chan, struct scatterlist *sgl,
+		unsigned int sg_len, enum dma_transfer_direction direction,
+		unsigned long flags, void *context)
+{
+	struct mpc_dma *mdma = dma_chan_to_mpc_dma(chan);
+	struct mpc_dma_chan *mchan = dma_chan_to_mpc_dma_chan(chan);
+	struct mpc_dma_desc *mdesc = NULL;
+	dma_addr_t per_paddr;
+	u32 tcd_nunits;
+	struct mpc_dma_tcd *tcd;
+	unsigned long iflags;
+	struct scatterlist *sg;
+	size_t len;
+	int iter, i;
+
+	/* Currently there is no proper support for scatter/gather */
+	if (sg_len != 1)
+		return NULL;
+
+	if (!is_slave_direction(direction))
+		return NULL;
+
+	for_each_sg(sgl, sg, sg_len, i) {
+		spin_lock_irqsave(&mchan->lock, iflags);
+
+		mdesc = list_first_entry(&mchan->free,
+						struct mpc_dma_desc, node);
+		if (!mdesc) {
+			spin_unlock_irqrestore(&mchan->lock, iflags);
+			/* Try to free completed descriptors */
+			mpc_dma_process_completed(mdma);
+			return NULL;
+		}
+
+		list_del(&mdesc->node);
+
+		per_paddr = mchan->per_paddr;
+		tcd_nunits = mchan->tcd_nunits;
+
+		spin_unlock_irqrestore(&mchan->lock, iflags);
+
+		if (per_paddr == 0 || tcd_nunits == 0)
+			goto err_prep;
+
+		mdesc->error = 0;
+		mdesc->will_access_peripheral = 1;
+
+		/* Prepare Transfer Control Descriptor for this transaction */
+		tcd = mdesc->tcd;
+
+		memset(tcd, 0, sizeof(struct mpc_dma_tcd));
+
+		if (!IS_ALIGNED(sg_dma_address(sg), 4))
+			goto err_prep;
+
+		if (direction == DMA_DEV_TO_MEM) {
+			tcd->saddr = per_paddr;
+			tcd->daddr = sg_dma_address(sg);
+			tcd->soff = 0;
+			tcd->doff = 4;
+		} else {
+			tcd->saddr = sg_dma_address(sg);
+			tcd->daddr = per_paddr;
+			tcd->soff = 4;
+			tcd->doff = 0;
+		}
+
+		tcd->ssize = MPC_DMA_TSIZE_4;
+		tcd->dsize = MPC_DMA_TSIZE_4;
+
+		len = sg_dma_len(sg);
+		tcd->nbytes = tcd_nunits * 4;
+		if (!IS_ALIGNED(len, tcd->nbytes))
+			goto err_prep;
+
+		iter = len / tcd->nbytes;
+		if (iter >= 1 << 15) {
+			/* len is too big */
+			goto err_prep;
+		}
+		/* citer_linkch contains the high bits of iter */
+		tcd->biter = iter & 0x1ff;
+		tcd->biter_linkch = iter >> 9;
+		tcd->citer = tcd->biter;
+		tcd->citer_linkch = tcd->biter_linkch;
+
+		tcd->e_sg = 0;
+		tcd->d_req = 1;
+
+		/* Place descriptor in prepared list */
+		spin_lock_irqsave(&mchan->lock, iflags);
+		list_add_tail(&mdesc->node, &mchan->prepared);
+		spin_unlock_irqrestore(&mchan->lock, iflags);
+	}
+
+	return &mdesc->desc;
+
+err_prep:
+	/* Put the descriptor back */
+	spin_lock_irqsave(&mchan->lock, iflags);
+	list_add_tail(&mdesc->node, &mchan->free);
+	spin_unlock_irqrestore(&mchan->lock, iflags);
+
+	return NULL;
+}
+
+static int mpc_dma_device_control(struct dma_chan *chan, enum dma_ctrl_cmd cmd,
+							unsigned long arg)
+{
+	struct mpc_dma_chan *mchan;
+	struct mpc_dma *mdma;
+	struct dma_slave_config *cfg;
+	unsigned long flags;
+
+	mchan = dma_chan_to_mpc_dma_chan(chan);
+	switch (cmd) {
+	case DMA_TERMINATE_ALL:
+		/* Disable channel requests */
+		mdma = dma_chan_to_mpc_dma(chan);
+
+		spin_lock_irqsave(&mchan->lock, flags);
+
+		out_8(&mdma->regs->dmacerq, chan->chan_id);
+		list_splice_tail_init(&mchan->prepared, &mchan->free);
+		list_splice_tail_init(&mchan->queued, &mchan->free);
+		list_splice_tail_init(&mchan->active, &mchan->free);
+
+		spin_unlock_irqrestore(&mchan->lock, flags);
+
+		return 0;
+	case DMA_SLAVE_CONFIG:
+		/* Constraints:
+		 *  - only transfers between a peripheral device and
+		 *     memory are supported;
+		 *  - minimal transfer chunk is 4 bytes and consequently
+		 *     source and destination addresses must be 4-byte aligned
+		 *     and transfer size must be aligned on (4 * maxburst)
+		 *     boundary;
+		 *  - during the transfer RAM address is being incremented by
+		 *     the size of minimal transfer chunk;
+		 *  - peripheral port's address is constant during the transfer.
+		 */
+
+		cfg = (void *)arg;
+
+		if (!is_slave_direction(cfg->direction))
+			return -EINVAL;
+
+		if (cfg->src_addr_width != DMA_SLAVE_BUSWIDTH_4_BYTES &&
+			cfg->dst_addr_width != DMA_SLAVE_BUSWIDTH_4_BYTES)
+			return -EINVAL;
+
+		spin_lock_irqsave(&mchan->lock, flags);
+
+		if (cfg->direction == DMA_DEV_TO_MEM) {
+			mchan->per_paddr = cfg->src_addr;
+			mchan->tcd_nunits = cfg->src_maxburst;
+		} else {
+			mchan->per_paddr = cfg->dst_addr;
+			mchan->tcd_nunits = cfg->dst_maxburst;
+		}
+
+		if (!IS_ALIGNED(mchan->per_paddr, 4)) {
+			spin_unlock_irqrestore(&mchan->lock, flags);
+			return -EINVAL;
+		}
+
+		if (mchan->tcd_nunits == 0)
+			mchan->tcd_nunits = 1;	/* Apply default */
+
+		spin_unlock_irqrestore(&mchan->lock, flags);
+
+		return 0;
+	default:
+		/* Unknown command */
+		break;
+	}
+
+	return -ENXIO;
+}
+
 static int mpc_dma_probe(struct platform_device *op)
 {
 	struct device_node *dn = op->dev.of_node;
@@ -727,9 +950,12 @@ static int mpc_dma_probe(struct platform_device *op)
 	dma->device_issue_pending = mpc_dma_issue_pending;
 	dma->device_tx_status = mpc_dma_tx_status;
 	dma->device_prep_dma_memcpy = mpc_dma_prep_memcpy;
+	dma->device_prep_slave_sg = mpc_dma_prep_slave_sg;
+	dma->device_control = mpc_dma_device_control;
 
 	INIT_LIST_HEAD(&dma->channels);
 	dma_cap_set(DMA_MEMCPY, dma->cap_mask);
+	dma_cap_set(DMA_SLAVE, dma->cap_mask);
 
 	for (i = 0; i < dma->chancnt; i++) {
 		mchan = &mdma->channels[i];
-- 
1.8.4.2

^ permalink raw reply related

* [PATCH RFC v9 1/6] dma: mpc512x: reorder mpc8308 specific instructions
From: Alexander Popov @ 2014-03-12 11:47 UTC (permalink / raw)
  To: Gerhard Sittig, Dan Williams, Vinod Koul, Lars-Peter Clausen,
	Arnd Bergmann, Anatolij Gustschin, Andy Shevchenko,
	Alexander Popov, linuxppc-dev, dmaengine
In-Reply-To: <1394624875-24411-1-git-send-email-a13xp0p0v88@gmail.com>

Concentrate the specific code for MPC8308 in the 'if' branch
and handle MPC512x in the 'else' branch.
This modification only reorders instructions but doesn't change behaviour.

Signed-off-by: Alexander Popov <a13xp0p0v88@gmail.com>
Acked-by: Anatolij Gustschin <agust@denx.de>
Acked-by: Gerhard Sittig <gsi@denx.de>
---
 drivers/dma/mpc512x_dma.c | 42 +++++++++++++++++++++++++-----------------
 1 file changed, 25 insertions(+), 17 deletions(-)

diff --git a/drivers/dma/mpc512x_dma.c b/drivers/dma/mpc512x_dma.c
index 448750d..2ce248b 100644
--- a/drivers/dma/mpc512x_dma.c
+++ b/drivers/dma/mpc512x_dma.c
@@ -52,9 +52,17 @@
 #define MPC_DMA_DESCRIPTORS	64
 
 /* Macro definitions */
-#define MPC_DMA_CHANNELS	64
 #define MPC_DMA_TCD_OFFSET	0x1000
 
+/*
+ * Maximum channel counts for individual hardware variants
+ * and the maximum channel count over all supported controllers,
+ * used for data structure size
+ */
+#define MPC8308_DMACHAN_MAX	16
+#define MPC512x_DMACHAN_MAX	64
+#define MPC_DMA_CHANNELS	64
+
 /* Arbitration mode of group and channel */
 #define MPC_DMA_DMACR_EDCG	(1 << 31)
 #define MPC_DMA_DMACR_ERGA	(1 << 3)
@@ -710,10 +718,10 @@ static int mpc_dma_probe(struct platform_device *op)
 
 	dma = &mdma->dma;
 	dma->dev = dev;
-	if (!mdma->is_mpc8308)
-		dma->chancnt = MPC_DMA_CHANNELS;
+	if (mdma->is_mpc8308)
+		dma->chancnt = MPC8308_DMACHAN_MAX;
 	else
-		dma->chancnt = 16; /* MPC8308 DMA has only 16 channels */
+		dma->chancnt = MPC512x_DMACHAN_MAX;
 	dma->device_alloc_chan_resources = mpc_dma_alloc_chan_resources;
 	dma->device_free_chan_resources = mpc_dma_free_chan_resources;
 	dma->device_issue_pending = mpc_dma_issue_pending;
@@ -747,7 +755,19 @@ static int mpc_dma_probe(struct platform_device *op)
 	 * - Round-robin group arbitration,
 	 * - Round-robin channel arbitration.
 	 */
-	if (!mdma->is_mpc8308) {
+	if (mdma->is_mpc8308) {
+		/* MPC8308 has 16 channels and lacks some registers */
+		out_be32(&mdma->regs->dmacr, MPC_DMA_DMACR_ERCA);
+
+		/* enable snooping */
+		out_be32(&mdma->regs->dmagpor, MPC_DMA_DMAGPOR_SNOOP_ENABLE);
+		/* Disable error interrupts */
+		out_be32(&mdma->regs->dmaeeil, 0);
+
+		/* Clear interrupts status */
+		out_be32(&mdma->regs->dmaintl, 0xFFFF);
+		out_be32(&mdma->regs->dmaerrl, 0xFFFF);
+	} else {
 		out_be32(&mdma->regs->dmacr, MPC_DMA_DMACR_EDCG |
 					MPC_DMA_DMACR_ERGA | MPC_DMA_DMACR_ERCA);
 
@@ -768,18 +788,6 @@ static int mpc_dma_probe(struct platform_device *op)
 		/* Route interrupts to IPIC */
 		out_be32(&mdma->regs->dmaihsa, 0);
 		out_be32(&mdma->regs->dmailsa, 0);
-	} else {
-		/* MPC8308 has 16 channels and lacks some registers */
-		out_be32(&mdma->regs->dmacr, MPC_DMA_DMACR_ERCA);
-
-		/* enable snooping */
-		out_be32(&mdma->regs->dmagpor, MPC_DMA_DMAGPOR_SNOOP_ENABLE);
-		/* Disable error interrupts */
-		out_be32(&mdma->regs->dmaeeil, 0);
-
-		/* Clear interrupts status */
-		out_be32(&mdma->regs->dmaintl, 0xFFFF);
-		out_be32(&mdma->regs->dmaerrl, 0xFFFF);
 	}
 
 	/* Register DMA engine */
-- 
1.8.4.2

^ permalink raw reply related

* [PATCH RFC v9 0/6] MPC512x DMA slave s/g support, OF DMA lookup
From: Alexander Popov @ 2014-03-12 11:47 UTC (permalink / raw)
  To: Gerhard Sittig, Dan Williams, Vinod Koul, Lars-Peter Clausen,
	Arnd Bergmann, Anatolij Gustschin, Andy Shevchenko,
	Alexander Popov, linuxppc-dev, dmaengine
  Cc: devicetree

2013/7/14 Gerhard Sittig <gsi@denx.de>:
> this series
> - introduces slave s/g support (that's support for DMA transfers which
>    involve peripherals in contrast to mem-to-mem transfers)
> - adds device tree based lookup support for DMA channels
> - combines floating patches and related feedback which already covered
>    several aspects of what the suggested LPB driver needs, to demonstrate
>    how integration might be done
> - carries Q&D SD card support to enable another DMA client during test,
>    while this patch needs to get dropped upon pickup

Changes in v2:
> - re-order mpc8308 related code paths for improved readability, no
>    change in behaviour, introduction of symbolic channel names here
>    already
> - squash 'execute() start condition' and 'terminate all' into the
>    introduction of 'slave s/g prep' and 'device control' support; refuse
>    s/g lists with more than one item since slave support is operational
>    yet proper s/g support is missing (can get addressed later)
> - always start transfers from software on MPC8308 as there are no
>    external request lines for peripheral flow control
> - drop dt-bindings header file and symbolic channel names in OF nodes

Changes in v3 and v4:
 Part 1/5:
 - use #define instead of enum since individual channels don't require
    special handling.
 Part 2/5:
 - add a flag "will_access_peripheral" to DMA transfer descriptor
    according recommendations of Gerhard Sittig.
    This flag is set in mpc_dma_prep_memcpy() and mpc_dma_prep_slave_sg()
    and is evaluated in mpc_dma_execute() to choose a type of start for
    the transfer.
 - prevent descriptors of transfers which involve peripherals from
    being chained together;
    each of such transfers needs hardware initiated start.
 - add locking while working with struct mpc_dma_chan
    according recommendations of Lars-Peter Clausen.
 - remove default nbytes value. Client kernel modules must set
    src_maxburst and dst_maxburst fields of struct dma_slave_config (dmaengine.h).

Changes in v5:
 Part 2/5:
 - add and improve comments;
 - improve the code moving transfer descriptors from 'queued' to 'active' list
    in mpc_dma_execute();
 - allow mpc_dma_prep_slave_sg() to run with non-empty 'active' list;
 - take 'mdesc' back to 'free' list in case of error in mpc_dma_prep_slave_sg();
 - improve checks of the transfer parameters;
 - provide the default value for 'maxburst' in mpc_dma_device_control().

Changes in v6:
 Part 2/5:
 - remove doubtful comment;
 - fix coding style issues;
 - set default value for 'maxburst' to 1 which applies to most cases;
 Part 3/5:
 - use dma_get_slave_channel() instead of dma_request_channel()
    in new function of_dma_xlate_by_chan_id() according recommendations of
    Arnd Bergmann;
 Part 4/5:
 - set DMA_PRIVATE flag for MPC512x DMA controller since its driver relies on
    of_dma_xlate_by_chan_id() which doesn't use dma_request_channel()
    any more; (removed in v7)
 - resolve little patch conflict;
 Part 5/5:
 - resolve little patch conflict;

Changes in v7:
 Part 2:
 - improve comment;
 Part 4:
 - split in two separate patches. Part 4/6 contains device tree
    binding document and in part 5/6 MPC512x DMA controller is registered
    for device tree channel lookup;
 - remove setting DMA_PRIVATE flag for MPC512x DMA controller from part 5/6;

Changes in v8:
 Part 2:
 - improve comments;
 - fix style issues;
 Part 6:
 - remove since it has become obsolete;

Changes in v9:
 A new patch (part 3/6) is added to this series according the
   feedback of Andy Shevchenko.
 Part 2/6:
 - keep style of the comments;
 - use is_slave_direction() instead of manual checks;
 - remove redundant else branches of the conditions;
 - make mpc_dma_device_control() return -ENXIO for unknown command;
 Part 6/6:
 - change according the new part 3/6;
 - fix style issues;

> known issues:
> - it's yet to get confirmed whether MPC8308 can use slave support or
>    whether the DMA controller's driver shall actively reject it, the
>    information that's available so far suggests that peripheral transfers
>    to IP bus attached I/O is useful and shall not get blocked right away
 - adding support for transfers which don't increment the RAM address or
    do increment the peripheral "port's" address is easy with
    this implementation; but which options of the common API
    should be used for specifying such transfers?
2014/02/13 Gerhard Sittig <gsi@denx.de>:
> - The MPC512x DMA completely lacks a binding document, so one
>    should get added.
> - The MPC8308 hardware is similar and can re-use the MPC512x
>    binding, which should be stated.
> - The Linux implementation currently has no OF based channel
>    lookup support, so '#dma-cells' is "a future feature".  I guess
>    the binding can and should already discuss the feature,
>    regardless of whether all implementations support it.


Alexander Popov (5):
  dma: mpc512x: reorder mpc8308 specific instructions
  dma: mpc512x: add support for peripheral transfers
  dma: mpc512x: replace devm_request_irq() with request_irq()
  dma: of: Add common xlate function for matching by channel id
  dma: mpc512x: register for device tree channel lookup

Gerhard Sittig (1):
  dma: mpc512x: add device tree binding document

 .../devicetree/bindings/dma/mpc512x-dma.txt        |  55 ++++
 arch/powerpc/boot/dts/mpc5121.dtsi                 |   1 +
 drivers/dma/mpc512x_dma.c                          | 308 +++++++++++++++++++--
 drivers/dma/of-dma.c                               |  35 +++
 include/linux/of_dma.h                             |   4 +
 5 files changed, 373 insertions(+), 30 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/dma/mpc512x-dma.txt

-- 
1.8.4.2

^ permalink raw reply

* Re: [PATCH 9/9] powerpc/pm: support deep sleep feature on T1040
From: Chenhui Zhao @ 2014-03-12 10:40 UTC (permalink / raw)
  To: Scott Wood; +Cc: linuxppc-dev, linux-kernel, Jason.Jin
In-Reply-To: <1394586624.13761.132.camel@snotra.buserror.net>

On Tue, Mar 11, 2014 at 08:10:24PM -0500, Scott Wood wrote:
> On Fri, 2014-03-07 at 12:58 +0800, Chenhui Zhao wrote:
> > From: Zhao Chenhui <chenhui.zhao@freescale.com>
> > 
> > T1040 supports deep sleep feature, which can switch off most parts of
> > the SoC when it is in deep sleep mode. This way, it becomes more
> > energy-efficient.
> > 
> > The DDR controller will also be powered off in deep sleep. Therefore,
> > the last stage (the latter part of fsl_dp_enter_low) will run without DDR
> > access. This piece of code and related TLBs will be prefetched.
> > 
> > Due to the different initialization code between 32-bit and 64-bit, they
> > have seperate resume entry and precedure.
> > 
> > The feature supports 32-bit and 64-bit kernel mode.
> > 
> > Signed-off-by: Zhao Chenhui <chenhui.zhao@freescale.com>
> > ---
> >  arch/powerpc/include/asm/booke_save_regs.h |    3 +
> >  arch/powerpc/kernel/cpu_setup_fsl_booke.S  |   17 ++
> >  arch/powerpc/kernel/head_fsl_booke.S       |   30 +++
> >  arch/powerpc/platforms/85xx/Makefile       |    2 +-
> >  arch/powerpc/platforms/85xx/deepsleep.c    |  201 +++++++++++++++++++
> >  arch/powerpc/platforms/85xx/qoriq_pm.c     |   38 ++++
> >  arch/powerpc/platforms/85xx/sleep.S        |  295 ++++++++++++++++++++++++++++
> >  arch/powerpc/sysdev/fsl_soc.h              |    7 +
> >  8 files changed, 592 insertions(+), 1 deletions(-)
> >  create mode 100644 arch/powerpc/platforms/85xx/deepsleep.c
> >  create mode 100644 arch/powerpc/platforms/85xx/sleep.S
> > 
> > diff --git a/arch/powerpc/include/asm/booke_save_regs.h b/arch/powerpc/include/asm/booke_save_regs.h
> > index 87c357a..37c1f6c 100644
> > --- a/arch/powerpc/include/asm/booke_save_regs.h
> > +++ b/arch/powerpc/include/asm/booke_save_regs.h
> > @@ -88,6 +88,9 @@
> >  #define HIBERNATION_FLAG	1
> >  #define DEEPSLEEP_FLAG		2
> >  
> > +#define CPLD_FLAG	1
> > +#define FPGA_FLAG	2
> 
> What is this?

We have two kind of boards, QDS and RDB. They have different register
map. Use the flag to indicate the current board is using which kind
of register map.

> 
> >  #ifndef __ASSEMBLY__
> >  extern void booke_cpu_state_save(void *buf, int type);
> >  extern void *booke_cpu_state_restore(void *buf, int type);
> > diff --git a/arch/powerpc/kernel/cpu_setup_fsl_booke.S b/arch/powerpc/kernel/cpu_setup_fsl_booke.S
> > index e59d6de..ea9bc28 100644
> > --- a/arch/powerpc/kernel/cpu_setup_fsl_booke.S
> > +++ b/arch/powerpc/kernel/cpu_setup_fsl_booke.S
> > @@ -318,6 +318,23 @@ flush_backside_L2_cache:
> >  2:
> >  	blr
> >  
> > +#define CPC_CPCCSR0		0x0
> > +#define CPC_CPCCSR0_CPCFL	0x800
> 
> This is supposed to be CPU setup, not platform setup.
> 
> > +/* r3 : the base address of CPC  */
> > +_GLOBAL(fsl_flush_cpc_cache)
> > +	lwz	r6, CPC_CPCCSR0(r3)
> > +	ori	r6, r6, CPC_CPCCSR0_CPCFL
> > +	stw	r6, CPC_CPCCSR0(r3)
> > +	sync
> > +
> > +	/* Wait until completing the flush */
> > +1:	lwz	r6, CPC_CPCCSR0(r3)
> > +	andi.	r6, r6, CPC_CPCCSR0_CPCFL
> > +	bne	1b
> > +
> > +	blr
> > +
> >  _GLOBAL(__flush_caches_e500v2)
> 
> I'm not sure that this belongs here either.

Will find a better place.

> 
> >  	mflr r0
> >  	bl	flush_dcache_L1
> > diff --git a/arch/powerpc/kernel/head_fsl_booke.S b/arch/powerpc/kernel/head_fsl_booke.S
> > index 20204fe..3285752 100644
> > --- a/arch/powerpc/kernel/head_fsl_booke.S
> > +++ b/arch/powerpc/kernel/head_fsl_booke.S
> > @@ -162,6 +162,19 @@ _ENTRY(__early_start)
> >  #include "fsl_booke_entry_mapping.S"
> >  #undef ENTRY_MAPPING_BOOT_SETUP
> >  
> > +#if defined(CONFIG_SUSPEND) && defined(CONFIG_FSL_CORENET_RCPM)
> > +	/* if deep_sleep_flag != 0, jump to the deep sleep resume entry */
> > +	LOAD_REG_ADDR(r4, deep_sleep_flag)
> > +	lwz	r3, 0(r4)
> > +	cmpwi	r3, 0
> > +	beq	11f
> > +	/* clear deep_sleep_flag */
> > +	li	r3, 0
> > +	stw	r3, 0(r4)
> > +	b	fsl_deepsleep_resume
> > +11:
> > +#endif
> 
> Why do you come in via the normal kernel entry, versus specifying a
> direct entry point for deep sleep resume?  How does U-Boot even know
> what the normal entry is when resuming?

I wish to return to a specified point (like 64-bit mode), but the code in
fsl_booke_entry_mapping.S only can run in the first page. Because it
only setups a temp mapping of 4KB.

> 
> Be careful of the "beq set_ivor" in the CONFIG_RELOCATABLE section
> above.  Also you probably don't want the relocation code to run again
> when resuming.

When resuming, will run from the point __early_start. Don't run the code
in the CONFIG_RELOCATABLE section.

> 
> > +#if defined(CONFIG_SUSPEND) && defined(CONFIG_FSL_CORENET_RCPM)
> > +_ENTRY(__entry_deep_sleep)
> > +/*
> > + * Bootloader will jump to here when resuming from deep sleep.
> > + * After executing the init code in fsl_booke_entry_mapping.S,
> > + * will jump to the real resume entry.
> > + */
> > +	li	r8, 1
> > +	bl	12f
> > +12:	mflr	r9
> > +	addi	r9, r9, (deep_sleep_flag - 12b)
> > +	stw	r8, 0(r9)
> > +	b __early_start
> > +deep_sleep_flag:
> > +	.long	0
> > +#endif
> 
> It's a bit ambiguous to say "entry_deep_sleep" when it's resuming rather
> than entering...

How about __fsl_entry_resume?

> 
> So you do have a special entry point.  Why do you go to __early_start
> only to quickly test a flag and branch away?
> 
> > diff --git a/arch/powerpc/platforms/85xx/Makefile b/arch/powerpc/platforms/85xx/Makefile
> > index 7fae817..9a4ea86 100644
> > --- a/arch/powerpc/platforms/85xx/Makefile
> > +++ b/arch/powerpc/platforms/85xx/Makefile
> > @@ -3,7 +3,7 @@
> >  #
> >  obj-$(CONFIG_SMP) += smp.o
> >  ifeq ($(CONFIG_FSL_CORENET_RCPM), y)
> > -obj-$(CONFIG_SUSPEND)	+= qoriq_pm.o
> > +obj-$(CONFIG_SUSPEND)	+= qoriq_pm.o deepsleep.o sleep.o
> >  endif
> >  
> >  obj-y += common.o
> > diff --git a/arch/powerpc/platforms/85xx/deepsleep.c b/arch/powerpc/platforms/85xx/deepsleep.c
> > new file mode 100644
> > index 0000000..ddd7185
> > --- /dev/null
> > +++ b/arch/powerpc/platforms/85xx/deepsleep.c
> > @@ -0,0 +1,201 @@
> > +/*
> > + * Support deep sleep feature
> 
> AFAICT this supports deep sleep on T1040, not on all 85xx that has it.

Yes, for now T1040 and T1042.

> 
> > + *
> > + * Copyright 2014 Freescale Semiconductor Inc.
> > + *
> > + * Author: Chenhui Zhao <chenhui.zhao@freescale.com>
> > + *
> > + * This program is free software; you can redistribute	it and/or modify it
> > + * under  the terms of	the GNU General	 Public License as published by the
> > + * Free Software Foundation;  either version 2 of the  License, or (at your
> > + * option) any later version.
> > + */
> > +
> > +#include <linux/kernel.h>
> > +#include <linux/of_platform.h>
> > +#include <linux/of_address.h>
> > +#include <asm/machdep.h>
> > +#include <sysdev/fsl_soc.h>
> > +#include <asm/booke_save_regs.h>
> > +
> > +#define SIZE_1MB	0x100000
> > +#define SIZE_2MB	0x200000
> > +
> > +#define CCSR_SCFG_DPSLPCR	0xfc000
> > +#define CCSR_SCFG_DPSLPCR_WDRR_EN	0x1
> > +#define CCSR_SCFG_SPARECR2	0xfc504
> > +#define CCSR_SCFG_SPARECR3	0xfc508
> > +
> > +#define CCSR_GPIO1_GPDIR	0x130000
> > +#define CCSR_GPIO1_GPODR	0x130004
> > +#define CCSR_GPIO1_GPDAT	0x130008
> > +#define CCSR_GPIO1_GPDIR_29		0x4
> > +
> > +/* 128 bytes buffer for restoring data broke by DDR training initialization */
> > +#define DDR_BUF_SIZE	128
> > +static u8 ddr_buff[DDR_BUF_SIZE] __aligned(64);
> > +
> > +static void *dcsr_base, *ccsr_base, *pld_base;
> > +static int pld_flag;
> > +
> > +int fsl_dp_iomap(void)
> > +{
> > +	struct device_node *np;
> > +	const u32 *prop;
> > +	int ret = 0;
> > +	u64 ccsr_phy_addr, dcsr_phy_addr;
> > +
> > +	np = of_find_node_by_type(NULL, "soc");
> > +	if (!np) {
> > +		pr_err("%s: Can't find the node of \"soc\"\n", __func__);
> > +		ret = -EINVAL;
> > +		goto ccsr_err;
> > +	}
> > +	prop = of_get_property(np, "ranges", NULL);
> > +	if (!prop) {
> > +		pr_err("%s: Can't find the property of \"ranges\"\n", __func__);
> > +		of_node_put(np);
> > +		ret = -EINVAL;
> > +		goto ccsr_err;
> > +	}
> 
> Use get_immrbase(), or better use specific nodes in the device tree.
> 
> > +	ccsr_phy_addr = of_translate_address(np, prop + 1);
> > +	ccsr_base = ioremap((phys_addr_t)ccsr_phy_addr, SIZE_2MB);
> > +	of_node_put(np);
> > +	if (!ccsr_base) {
> > +		ret = -ENOMEM;
> > +		goto ccsr_err;
> > +	}
> 
> Unnecessary cast.
> 
> Why 2 MiB?

All registers used are inside the 2MB address space.

> 
> > +	np = of_find_compatible_node(NULL, NULL, "fsl,dcsr");
> > +	if (!np) {
> > +		pr_err("%s: Can't find the node of \"fsl,dcsr\"\n", __func__);
> > +		ret = -EINVAL;
> > +		goto dcsr_err;
> > +	}
> > +	prop = of_get_property(np, "ranges", NULL);
> > +	if (!prop) {
> > +		pr_err("%s: Can't find the property of \"ranges\"\n", __func__);
> > +		of_node_put(np);
> > +		ret = -EINVAL;
> > +		goto dcsr_err;
> > +	}
> > +	dcsr_phy_addr = of_translate_address(np, prop + 1);
> > +	dcsr_base = ioremap((phys_addr_t)dcsr_phy_addr, SIZE_1MB);
> > +	of_node_put(np);
> > +	if (!dcsr_base) {
> > +		ret = -ENOMEM;
> > +		goto dcsr_err;
> > +	}
> 
> If you must do this, add a helper to get the dcsr base -- but do we not
> already have dcsr subnodes for what you are using?
> 
> > +
> > +	np = of_find_compatible_node(NULL, NULL, "fsl,fpga-qixis");
> > +	if (np) {
> > +		pld_flag = FPGA_FLAG;
> > +	} else {
> > +		np = of_find_compatible_node(NULL, NULL, "fsl,p104xrdb-cpld");
> > +		if (np) {
> > +			pld_flag = CPLD_FLAG;
> > +		} else {
> > +			pr_err("%s: Can't find the FPGA/CPLD node\n",
> > +					__func__);
> > +			ret = -EINVAL;
> > +			goto pld_err;
> > +		}
> > +	}
> 
> OK, so this file isn't even specific to T1040 -- it's specific to our
> reference boards?

Yes. There are some FPGA/CPLD setting which needed by deep sleep.

> 
> > +static void fsl_dp_ddr_save(void *ccsr_base)
> > +{
> > +	u32 ddr_buff_addr;
> > +
> > +	/*
> > +	 * DDR training initialization will break 128 bytes at the beginning
> > +	 * of DDR, therefore, save them so that the bootloader will restore
> > +	 * them. Assume that DDR is mapped to the address space started with
> > +	 * CONFIG_PAGE_OFFSET.
> > +	 */
> > +	memcpy(ddr_buff, (void *)CONFIG_PAGE_OFFSET, DDR_BUF_SIZE);
> > +
> > +	/* assume ddr_buff is in the physical address space of 4GB */
> > +	ddr_buff_addr = (u32)(__pa(ddr_buff) & 0xffffffff);
> 
> That assumption may break with a relocatable kernel.
> 
> > +
> > +}
> > +
> > +int fsl_enter_epu_deepsleep(void)
> > +{
> > +
> > +
> 
> No blank lines at begin/end of function.
> 
> > diff --git a/arch/powerpc/platforms/85xx/qoriq_pm.c b/arch/powerpc/platforms/85xx/qoriq_pm.c
> > index 915b13b..5f2c016 100644
> > --- a/arch/powerpc/platforms/85xx/qoriq_pm.c
> > +++ b/arch/powerpc/platforms/85xx/qoriq_pm.c
> > @@ -20,6 +20,8 @@
> >  #define FSL_SLEEP		0x1
> >  #define FSL_DEEP_SLEEP		0x2
> >  
> > +int (*fsl_enter_deepsleep)(void);
> > +
> >  /* specify the sleep state of the present platform */
> >  int sleep_pm_state;
> >  /* supported sleep modes by the present platform */
> > @@ -28,6 +30,7 @@ static unsigned int sleep_modes;
> >  static int qoriq_suspend_enter(suspend_state_t state)
> >  {
> >  	int ret = 0;
> > +	int cpu;
> >  
> >  	switch (state) {
> >  	case PM_SUSPEND_STANDBY:
> > @@ -39,6 +42,17 @@ static int qoriq_suspend_enter(suspend_state_t state)
> >  
> >  		break;
> >  
> > +	case PM_SUSPEND_MEM:
> > +
> > +		cpu = smp_processor_id();
> > +		qoriq_pm_ops->irq_mask(cpu);
> > +
> > +		ret = fsl_enter_deepsleep();
> > +
> > +		qoriq_pm_ops->irq_unmask(cpu);
> > +
> > +		break;
> > +
> >  	default:
> >  		ret = -EINVAL;
> >  
> > @@ -52,12 +66,30 @@ static int qoriq_suspend_valid(suspend_state_t state)
> >  	if (state == PM_SUSPEND_STANDBY && (sleep_modes & FSL_SLEEP))
> >  		return 1;
> >  
> > +	if (state == PM_SUSPEND_MEM && (sleep_modes & FSL_DEEP_SLEEP))
> > +		return 1;
> > +
> >  	return 0;
> >  }
> >  
> > +static int qoriq_suspend_begin(suspend_state_t state)
> > +{
> > +	if (state == PM_SUSPEND_MEM)
> > +		return fsl_dp_iomap();
> > +
> > +	return 0;
> > +}
> > +
> > +static void qoriq_suspend_end(void)
> > +{
> > +	fsl_dp_iounmap();
> > +}
> > +
> >  static const struct platform_suspend_ops qoriq_suspend_ops = {
> >  	.valid = qoriq_suspend_valid,
> >  	.enter = qoriq_suspend_enter,
> > +	.begin = qoriq_suspend_begin,
> > +	.end = qoriq_suspend_end,
> >  };
> >  
> >  static int __init qoriq_suspend_init(void)
> > @@ -71,6 +103,12 @@ static int __init qoriq_suspend_init(void)
> >  	if (np)
> >  		sleep_pm_state = PLAT_PM_LPM20;
> >  
> > +	np = of_find_compatible_node(NULL, NULL, "fsl,t1040-rcpm");
> > +	if (np) {
> > +		fsl_enter_deepsleep = fsl_enter_epu_deepsleep;
> > +		sleep_modes |= FSL_DEEP_SLEEP;
> > +	}
> > +
> >  	suspend_set_ops(&qoriq_suspend_ops);
> >  
> >  	return 0;
> > diff --git a/arch/powerpc/platforms/85xx/sleep.S b/arch/powerpc/platforms/85xx/sleep.S
> > new file mode 100644
> > index 0000000..95a5746
> > --- /dev/null
> > +++ b/arch/powerpc/platforms/85xx/sleep.S
> > @@ -0,0 +1,295 @@
> > +/*
> > + * Implement the low level part of deep sleep
> > + *
> > + * Copyright 2014 Freescale Semiconductor Inc.
> > + *
> > + * This program is free software; you can redistribute	it and/or modify it
> > + * under  the terms of	the GNU General	 Public License as published by the
> > + * Free Software Foundation;  either version 2 of the  License, or (at your
> > + * option) any later version.
> > + */
> > +
> > +#include <asm/page.h>
> > +#include <asm/ppc_asm.h>
> > +#include <asm/reg.h>
> > +#include <asm/asm-offsets.h>
> > +#include <asm/booke_save_regs.h>
> > +#include <asm/mmu.h>
> > +
> > +#define FSLDELAY(count)		\
> > +	li	r3, (count)@l;	\
> > +	slwi	r3, r3, 10;	\
> > +	mtctr	r3;		\
> > +101:	nop;			\
> > +	bdnz	101b;
> 
> You don't need a namespace prefix on local macros in a non-header file.
> 
> Is the timebase stopped where you're calling this from?

No. My purpose is to avoid jump in the last stage of entering deep sleep.
Jump may cause problem at that time.

> 
> > +#define FSL_DIS_ALL_IRQ		\
> > +	mfmsr	r8;			\
> > +	rlwinm	r8, r8, 0, ~MSR_CE;	\
> > +	rlwinm	r8, r8, 0, ~MSR_ME;	\
> > +	rlwinm	r8, r8, 0, ~MSR_EE;	\
> > +	rlwinm	r8, r8, 0, ~MSR_DE;	\
> > +	mtmsr	r8;			\
> > +	isync
> > +
> > +
> > +	.section .data
> > +	.align	6
> > +booke_regs_buffer:
> > +	.space REGS_BUFFER_SIZE
> > +
> > +	.section .txt
> > +	.align 6
> > +
> > +_GLOBAL(fsl_dp_enter_low)
> > +deepsleep_start:
> > +	LOAD_REG_ADDR(r9, buf_tmp)
> > +	PPC_STL	r3, 0(r9)
> > +	PPC_STL	r4, 8(r9)
> > +	PPC_STL	r5, 16(r9)
> > +	PPC_STL	r6, 24(r9)
> > +
> > +	LOAD_REG_ADDR(r3, booke_regs_buffer)
> > +	/* save the return address */
> > +	mflr	r5
> > +	PPC_STL r5, SR_LR(r3)
> > +	mfmsr	r5
> > +	PPC_STL r5, SR_MSR(r3)
> > +	li	r4, DEEPSLEEP_FLAG
> > +	bl	booke_cpu_state_save
> > +
> > +	LOAD_REG_ADDR(r9, buf_tmp)
> > +	PPC_LL	r31, 0(r9)
> > +	PPC_LL	r30, 8(r9)
> > +	PPC_LL	r29, 16(r9)
> > +	PPC_LL	r28, 24(r9)
> > +
> > +	/* flush caches */
> > +	LOAD_REG_ADDR(r3, cur_cpu_spec)
> > +	PPC_LL	r3, 0(r3)
> > +	PPC_LL	r3, CPU_FLUSH_CACHES(r3)
> > +	PPC_LCMPI  0, r3, 0
> > +	beq	6f
> > +#ifdef CONFIG_PPC64
> > +	PPC_LL	r3, 0(r3)
> > +#endif
> > +	mtctr	r3
> > +	bctrl
> > +6:
> > +#define CPC_OFFSET	0x10000
> > +	mr	r3, r31
> > +	addis	r3, r3, CPC_OFFSET@h
> > +	bl	fsl_flush_cpc_cache
> > +
> > +	LOAD_REG_ADDR(r8, deepsleep_start)
> > +	LOAD_REG_ADDR(r9, deepsleep_end)
> > +
> > +	/* prefecth TLB */
> > +#define CCSR_GPIO1_GPDAT	0x130008
> > +#define CCSR_GPIO1_GPDAT_29	0x4
> > +	LOAD_REG_IMMEDIATE(r11, CCSR_GPIO1_GPDAT)
> > +	add	r11, r31, r11
> > +	lwz	r10, 0(r11)
> > +
> > +#define CCSR_RCPM_PCPH15SETR	0xe20b4
> > +#define CCSR_RCPM_PCPH15SETR_CORE0	0x1
> > +	LOAD_REG_IMMEDIATE(r12, CCSR_RCPM_PCPH15SETR)
> > +	add	r12, r31, r12
> > +	lwz	r10, 0(r12)
> > +
> > +#define CCSR_DDR_SDRAM_CFG_2	0x8114
> > +#define CCSR_DDR_SDRAM_CFG_2_FRC_SR	0x80000000
> > +	LOAD_REG_IMMEDIATE(r13, CCSR_DDR_SDRAM_CFG_2)
> > +	add	r13, r31, r13
> > +	lwz	r10, 0(r13)
> > +
> > +#define	DCSR_EPU_EPGCR		0x000
> > +#define DCSR_EPU_EPGCR_GCE	0x80000000
> > +	li	r14, DCSR_EPU_EPGCR
> > +	add	r14, r30, r14
> > +	lwz	r10, 0(r14)
> > +
> > +#define	DCSR_EPU_EPECR15	0x33C
> > +#define DCSR_EPU_EPECR15_IC0	0x80000000
> > +	li	r15, DCSR_EPU_EPECR15
> > +	add	r15, r30, r15
> > +	lwz	r10, 0(r15)
> > +
> > +#define CCSR_SCFG_QMCRDTRSTCR		0xfc40c
> > +#define CCSR_SCFG_QMCRDTRSTCR_CRDTRST	0x80000000
> > +	LOAD_REG_IMMEDIATE(r16, CCSR_SCFG_QMCRDTRSTCR)
> > +	add	r16, r31, r16
> > +	lwz	r10, 0(r16)
> > +
> > +/*
> > + * There are two kind of register maps, one for CPLD and the other for FPGA
> > + */
> > +#define CPLD_MISCCSR		0x17
> > +#define CPLD_MISCCSR_SLEEPEN	0x40
> > +#define QIXIS_PWR_CTL2		0x21
> > +#define QIXIS_PWR_CTL2_PCTL	0x2
> > +	PPC_LCMPI  0, r28, FPGA_FLAG
> > +	beq	20f
> > +	addi	r29, r29, CPLD_MISCCSR
> > +20:
> > +	addi	r29, r29, QIXIS_PWR_CTL2
> > +	lbz	r10, 0(r29)
> 
> 
> Again, this is not marked as a board-specific file.  How do you expect
> customers to adapt this mechanism to their boards?

Will add comment.

> 
> > +
> > +	/* prefecth code to cache so that executing code after disable DDR */
> > +1:	lwz	r3, 0(r8)
> > +	addi	r8, r8, 4
> > +	cmpw	r8, r9
> > +	blt	1b
> > +	msync
> 
> Instructions don't execute from dcache...  I guess you're assuming it
> will allocate in, and stay in, L2/CPC.  It'd be safer to lock those
> cache lines in icache, or copy the code to SRAM.

Yes, they should be in L2/CPC. Will try to lock them in icache.

> 
> > +	FSL_DIS_ALL_IRQ
> > +
> > +	/*
> > +	 * Place DDR controller in self refresh mode.
> > +	 * From here on, DDR can't be access any more.
> > +	 */
> > +	lwz	r10, 0(r13)
> > +	oris	r10, r10, CCSR_DDR_SDRAM_CFG_2_FRC_SR@h
> > +	stw	r10, 0(r13)
> > +
> > +	/* can't call udelay() here, so use a macro to delay */
> > +	FSLDELAY(50)
> 
> A timebase loop doesn't require accessing DDR.

Don't wish to jump at this time. Maybe cause problems.

> 
> You also probably want to do a "sync, readback, data dependency, isync"
> sequence to make sure that the store has hit CCSR before you begin your
> delay (or is a delay required at all if you do that?).

Yes. It is safer with a sync sequence.

The DDR controller need some time to signal the external DDR modules to
enter self refresh mode.

-Chenhui

^ permalink raw reply

* Re: [PATCH 8/9] powerpc/85xx: add save/restore functions for core registers
From: Chenhui Zhao @ 2014-03-12  9:42 UTC (permalink / raw)
  To: Scott Wood; +Cc: linuxppc-dev, linux-kernel, Jason.Jin, Wang Dongsheng-B40534
In-Reply-To: <1394585114.13761.112.camel@snotra.buserror.net>

On Tue, Mar 11, 2014 at 07:45:14PM -0500, Scott Wood wrote:
> On Fri, 2014-03-07 at 12:58 +0800, Chenhui Zhao wrote:
> > From: Wang Dongsheng <dongsheng.wang@freescale.com>
> > 
> > Add booke_cpu_state_save() and booke_cpu_state_restore() functions which can be
> > used to save/restore CPU's registers in the case of deep sleep and hibernation.
> > 
> > Supported processors: E6500, E5500, E500MC, E500v2 and E500v1.
> > 
> > Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com>
> > Signed-off-by: Chenhui Zhao <chenhui.zhao@freescale.com>
> > ---
> >  arch/powerpc/include/asm/booke_save_regs.h |   96 ++++++++
> >  arch/powerpc/kernel/Makefile               |    1 +
> >  arch/powerpc/kernel/booke_save_regs.S      |  361 ++++++++++++++++++++++++++++
> >  3 files changed, 458 insertions(+), 0 deletions(-)
> >  create mode 100644 arch/powerpc/include/asm/booke_save_regs.h
> >  create mode 100644 arch/powerpc/kernel/booke_save_regs.S
> > 
> > diff --git a/arch/powerpc/include/asm/booke_save_regs.h b/arch/powerpc/include/asm/booke_save_regs.h
> > new file mode 100644
> > index 0000000..87c357a
> > --- /dev/null
> > +++ b/arch/powerpc/include/asm/booke_save_regs.h
> > @@ -0,0 +1,96 @@
> > +/*
> > + *  Save/restore e500 series core registers
> 
> Filename says booke, comment says e500.
> 
> Filename and comment also fail to point out that this is specifically
> for standby/suspend, not for hibernate which is implemented in
> swsusp_booke.S/swsusp_asm64.S.

Sorry for inconsistency. Will changes e500 to booke.
Hibernation and suspend can share the code.

> 
> > + *
> > + * Author: Wang Dongsheng <dongsheng.wang@freescale.com>
> > + *
> > + * Copyright 2014 Freescale Semiconductor Inc.
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License version 2 as
> > + * published by the Free Software Foundation.
> > + */
> > +
> > +#ifndef __ASM_FSL_SLEEP_H
> > +#define __ASM_FSL_SLEEP_H
> > +
> > +/*
> > + * 8 bytes for each register, which is compatible with
> > + * both 32-bit and 64-bit registers
> > + *
> > + * Acronyms:
> > + *	dw(data width)	0x08
> > + *
> > + * Map:
> > + * General-Purpose Registers
> > + *	GPR1(sp)		0
> > + *	GPR2			0x8		(dw * 1)
> > + *	GPR13 - GPR31		0x10 ~ 0xa0	(dw * 2 ~ dw * 20)
> 
> Putting these values in a comment separate from the code that defines it
> sounds like a good way to get a mismatch between the two.

Ok.

> 
> > + * Foating-point registers
> > + *	FPR14 - FPR31		0xa8 ~ 0x130	(dw * 21 ~ dw * 38)
> 
> Call enable_kernel_fp() or similar, rather than saving FP regs here.
> Likewise with altivec.  And why starting at FPR14?  Volatile versus
> nonvolatile is irrelevant because Linux doesn't participate in the FP
> ABI.  Everything is non-volatile *if* we have a user FP context
> resident, and everything is volatile otherwise.

Will remove them.

> 
> > + * Timer Registers
> > + *	TCR			0x168		(dw * 45)
> > + *	TB(64bit)		0x170		(dw * 46)
> > + *	TBU(32bit)		0x178		(dw * 47)
> > + *	TBL(32bit)		0x180		(dw * 48)
> 
> Why are you saving TBU/L separate from TB?  They're the same thing.

Will remove TBU and TBL.

> 
> > + * Interrupt Registers
> > + *	IVPR			0x188		(dw * 49)
> > + *	IVOR0 - IVOR15		0x190 ~ 0x208	(dw * 50 ~ dw * 65)
> > + *	IVOR32 - IVOR41		0x210 ~ 0x258	(dw * 66 ~ dw * 75)
> 
> What about IVOR42 (LRAT error)?

Will add it.

> 
> > + * Software-Use Registers
> > + *	SPRG1			0x260		(dw * 76), 64-bit need to save.
> > + *	SPRG3			0x268		(dw * 77), 32-bit need to save.
> 
> What about "CPU and NUMA node for VDSO getcpu" on 64-bit?  Currently
> SPRG3, but it will need to change for critical interrupt support.
> 
> > + * MMU Registers
> > + *	PID0 - PID2		0x270 ~ 0x280	(dw * 78 ~ dw * 80)
> 
> PID1/PID2 are e500v1/v2 only -- and Linux doesn't use them outside of
> KVM (and you're not in KVM when you're running this code).
> 
> Are we ever going to have a non-zero PID at this point?

I incline to the view that saving all registers regardless of used or
unused. The good point is that it can be compliant to the future
changes of the usage of registers.

What do you think?

> 
> > + * Debug Registers
> > + *	DBCR0 - DBCR2		0x288 ~ 0x298	(dw * 81 ~ dw * 83)
> > + *	IAC1 - IAC4		0x2a0 ~ 0x2b8	(dw * 84 ~ dw * 87)
> > + *	DAC1 - DAC2		0x2c0 ~ 0x2c8	(dw * 88 ~ dw * 89)
> > + *
> > + */
> 
> IAC3-4 are not implemented on e500.
> 
> Do we really need to save the debug registers?  We're not going to be in
> a debugged process when we do suspend.  If the concern is kgdb, it
> probably needs to be told to get out of the way during suspend for other
> reasons.

I think in the ideal case the suspend would not break any context. We
should try to save/restore all cpu state. Of cause, trade-off is
unavoidable in practice.

> 
> > +#define SR_GPR1			0x000
> > +#define SR_GPR2			0x008
> > +#define SR_GPR13		0x010
> > +#define SR_FPR14		0x0a8
> > +#define SR_CR			0x138
> > +#define SR_LR			0x140
> > +#define SR_MSR			0x148
> 
> These are very vague names to be putting in a header file.

How about BOOKE_xx_OFFSET?

> 
> > +/*
> > + * hibernation and deepsleep save/restore different number of registers,
> > + * use these flags to indicate.
> > + */
> > +#define HIBERNATION_FLAG	1
> > +#define DEEPSLEEP_FLAG		2
> 
> Again, namespacing -- but why is hibernation using this at all?  What's
> wrong with the existing hibernation support?

How about BOOKE_HIBERNATION_FLAG?

Just wish to share code between hibernation and suspend.

> 
> > diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
> > index fcc9a89..64acae6 100644
> > --- a/arch/powerpc/kernel/Makefile
> > +++ b/arch/powerpc/kernel/Makefile
> > @@ -73,6 +73,7 @@ obj-$(CONFIG_HIBERNATION)	+= swsusp_booke.o
> >  else
> >  obj-$(CONFIG_HIBERNATION)	+= swsusp_$(CONFIG_WORD_SIZE).o
> >  endif
> > +obj-$(CONFIG_E500)		+= booke_save_regs.o
> 
> Shouldn't this depend on whether suspend is enabled?
> 
> > diff --git a/arch/powerpc/kernel/booke_save_regs.S b/arch/powerpc/kernel/booke_save_regs.S
> > new file mode 100644
> > index 0000000..9ccd576
> > --- /dev/null
> > +++ b/arch/powerpc/kernel/booke_save_regs.S
> > @@ -0,0 +1,361 @@
> > +/*
> > + * Freescale Power Management, Save/Restore core state
> > + *
> > + * Copyright 2014 Freescale Semiconductor, Inc.
> > + * Author: Wang Dongsheng <dongsheng.wang@freescale.com>
> > + *
> > + * This program is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU General Public License
> > + * as published by the Free Software Foundation; either version
> > + * 2 of the License, or (at your option) any later version.
> > + */
> > +
> > +#include <asm/ppc_asm.h>
> > +#include <asm/booke_save_regs.h>
> > +
> > +/*
> > + * Supported Core List:
> > + * E500v1, E500v2, E500MC, E5500, E6500.
> > + */
> > +
> > +/* Save/Restore register operation define */
> > +#define do_save_gpr_reg(reg, addr) \
> > +	PPC_STL reg, addr(r10)
> > +
> > +#define do_save_fpr_reg(reg, addr) \
> > +	stfd	reg, addr(r10)
> > +
> > +#define do_save_spr_reg(reg, addr) \
> > +	mfspr	r0, SPRN_##reg ;\
> > +	PPC_STL r0, addr(r10)
> > +
> > +#define do_save_special_reg(special, addr) \
> > +	mf##special	r0 ;\
> > +	PPC_STL r0, addr(r10)
> > +
> > +#define do_restore_gpr_reg(reg, addr) \
> > +	PPC_LL reg, addr(r10)
> > +
> > +#define do_restore_fpr_reg(reg, addr) \
> > +	lfd	reg, addr(r10)
> > +
> > +#define do_restore_spr_reg(reg, addr) \
> > +	PPC_LL r0, addr(r10) ;\
> > +	mtspr	SPRN_##reg, r0
> > +
> > +#define do_restore_special_reg(special, addr) \
> > +	PPC_LL r0, addr(r10) ;\
> > +	mt##special	r0
> > +
> > +#define do_sr_general_gpr_regs(func) \
> > +	do_##func##_gpr_reg(r1, SR_GPR1) ;\
> > +	do_##func##_gpr_reg(r2, SR_GPR2) ;\
> > +	do_##func##_gpr_reg(r13, SR_GPR13 + 0x00) ;\
> > +	do_##func##_gpr_reg(r14, SR_GPR13 + 0x08) ;\
> > +	do_##func##_gpr_reg(r15, SR_GPR13 + 0x10) ;\
> > +	do_##func##_gpr_reg(r16, SR_GPR13 + 0x18) ;\
> > +	do_##func##_gpr_reg(r17, SR_GPR13 + 0x20) ;\
> > +	do_##func##_gpr_reg(r18, SR_GPR13 + 0x28) ;\
> > +	do_##func##_gpr_reg(r19, SR_GPR13 + 0x30) ;\
> > +	do_##func##_gpr_reg(r20, SR_GPR13 + 0x38) ;\
> > +	do_##func##_gpr_reg(r21, SR_GPR13 + 0x40) ;\
> > +	do_##func##_gpr_reg(r22, SR_GPR13 + 0x48) ;\
> > +	do_##func##_gpr_reg(r23, SR_GPR13 + 0x50) ;\
> > +	do_##func##_gpr_reg(r24, SR_GPR13 + 0x58) ;\
> > +	do_##func##_gpr_reg(r25, SR_GPR13 + 0x60) ;\
> > +	do_##func##_gpr_reg(r26, SR_GPR13 + 0x68) ;\
> > +	do_##func##_gpr_reg(r27, SR_GPR13 + 0x70) ;\
> > +	do_##func##_gpr_reg(r28, SR_GPR13 + 0x78) ;\
> > +	do_##func##_gpr_reg(r29, SR_GPR13 + 0x80) ;\
> > +	do_##func##_gpr_reg(r30, SR_GPR13 + 0x88) ;\
> > +	do_##func##_gpr_reg(r31, SR_GPR13 + 0x90)
> > +
> > +#define do_sr_general_pcr_regs(func) \
> > +	do_##func##_spr_reg(EPCR, SR_EPCR) ;\
> > +	do_##func##_spr_reg(HID0, SR_HID0 + 0x00)
> > +
> > +#define do_sr_e500_pcr_regs(func) \
> > +	do_##func##_spr_reg(HID1, SR_HID0 + 0x08)
> 
> PCR?
> 
> In the comments you said "Only e500, e500v2 need to save HID0 - HID1",
> yet you save HID0 in the "general" macro.

Will change it.

> 
> > +#define do_sr_save_tb_regs \
> > +	do_save_spr_reg(TBRU, SR_TBU) ;\
> > +	do_save_spr_reg(TBRL, SR_TBL)
> 
> What if TBU increments between those two reads?  Use the standard
> sequence to read the timebase.
> 
> > +#define do_sr_interrupt_regs(func) \
> > +	do_##func##_spr_reg(IVPR, SR_IVPR) ;\
> > +	do_##func##_spr_reg(IVOR0, SR_IVOR0 + 0x00) ;\
> > +	do_##func##_spr_reg(IVOR1, SR_IVOR0 + 0x08) ;\
> > +	do_##func##_spr_reg(IVOR2, SR_IVOR0 + 0x10) ;\
> > +	do_##func##_spr_reg(IVOR3, SR_IVOR0 + 0x18) ;\
> > +	do_##func##_spr_reg(IVOR4, SR_IVOR0 + 0x20) ;\
> > +	do_##func##_spr_reg(IVOR5, SR_IVOR0 + 0x28) ;\
> > +	do_##func##_spr_reg(IVOR6, SR_IVOR0 + 0x30) ;\
> > +	do_##func##_spr_reg(IVOR7, SR_IVOR0 + 0x38) ;\
> > +	do_##func##_spr_reg(IVOR8, SR_IVOR0 + 0x40) ;\
> > +	do_##func##_spr_reg(IVOR10, SR_IVOR0 + 0x50) ;\
> > +	do_##func##_spr_reg(IVOR11, SR_IVOR0 + 0x58) ;\
> > +	do_##func##_spr_reg(IVOR12, SR_IVOR0 + 0x60) ;\
> > +	do_##func##_spr_reg(IVOR13, SR_IVOR0 + 0x68) ;\
> > +	do_##func##_spr_reg(IVOR14, SR_IVOR0 + 0x70) ;\
> > +	do_##func##_spr_reg(IVOR15, SR_IVOR0 + 0x78)
> > +
> > +#define do_e500_sr_interrupt_regs(func) \
> > +	do_##func##_spr_reg(IVOR32, SR_IVOR32 + 0x00) ;\
> > +	do_##func##_spr_reg(IVOR33, SR_IVOR32 + 0x08) ;\
> > +	do_##func##_spr_reg(IVOR34, SR_IVOR32 + 0x10)
> > +
> > +#define do_e500mc_sr_interrupt_regs(func) \
> > +	do_##func##_spr_reg(IVOR9, SR_IVOR0 + 0x48) ;\
> > +	do_##func##_spr_reg(IVOR35, SR_IVOR32 + 0x18) ;\
> > +	do_##func##_spr_reg(IVOR36, SR_IVOR32 + 0x20) ;\
> > +	do_##func##_spr_reg(IVOR37, SR_IVOR32 + 0x28) ;\
> > +	do_##func##_spr_reg(IVOR38, SR_IVOR32 + 0x30) ;\
> > +	do_##func##_spr_reg(IVOR39, SR_IVOR32 + 0x38) ;\
> > +	do_##func##_spr_reg(IVOR40, SR_IVOR32 + 0x40) ;\
> > +	do_##func##_spr_reg(IVOR41, SR_IVOR32 + 0x48)
> > +
> > +#define do_e5500_sr_interrupt_regs(func) \
> > +	do_e500mc_sr_interrupt_regs(func)
> > +
> > +#define do_e6500_sr_interrupt_regs(func) \
> > +	do_##func##_spr_reg(IVOR32, SR_IVOR32 + 0x00) ;\
> > +	do_##func##_spr_reg(IVOR33, SR_IVOR32 + 0x08) ;\
> > +	do_e5500_sr_interrupt_regs(func)
> > +
> > +#define do_sr_general_software_regs(func) \
> > +	do_##func##_spr_reg(SPRG1, SR_SPRG1) ;\
> > +	do_##func##_spr_reg(SPRG3, SR_SPRG3) ;\
> > +	do_##func##_spr_reg(PID0, SR_PID0)
> > +
> > +#define do_sr_e500_mmu_regs(func) \
> > +	do_##func##_spr_reg(PID1, SR_PID0 + 0x08) ;\
> > +	do_##func##_spr_reg(PID2, SR_PID0 + 0x10)
> > +
> > +#define do_sr_debug_regs(func) \
> > +	do_##func##_spr_reg(DBCR0, SR_DBCR0 + 0x00) ;\
> > +	do_##func##_spr_reg(DBCR1, SR_DBCR0 + 0x08) ;\
> > +	do_##func##_spr_reg(DBCR2, SR_DBCR0 + 0x10) ;\
> > +	do_##func##_spr_reg(IAC1, SR_IAC1 + 0x00) ;\
> > +	do_##func##_spr_reg(IAC2, SR_IAC1 + 0x08) ;\
> > +	do_##func##_spr_reg(DAC1, SR_DAC1 + 0x00) ;\
> > +	do_##func##_spr_reg(DAC2, SR_DAC1 + 0x08)
> > +
> > +#define do_e6500_sr_debug_regs(func) \
> > +	do_##func##_spr_reg(IAC3, SR_IAC1 + 0x10) ;\
> > +	do_##func##_spr_reg(IAC4, SR_IAC1 + 0x18)
> > +
> > +	.section .text
> > +	.align	5
> > +booke_cpu_base_save:
> > +	do_sr_general_gpr_regs(save)
> > +	do_sr_general_pcr_regs(save)
> > +	do_sr_general_software_regs(save)
> > +1:
> > +	mfspr	r5, SPRN_TBRU
> > +	do_sr_general_time_regs(save)
> > +	mfspr	r6, SPRN_TBRU
> > +	cmpw	r5, r6
> > +	bne	1b
> 
> Oh, here's where you deal with that.  Why?  It just obfuscates things.

Will make it more readable.

> 
> > +booke_cpu_base_restore:
> > +	do_sr_general_gpr_regs(restore)
> > +	do_sr_general_pcr_regs(restore)
> > +	do_sr_general_software_regs(restore)
> > +
> > +	isync
> > +
> > +	/* Restore Time registers, clear tb lower to avoid wrap */
> > +	li	r0, 0
> > +	mtspr	SPRN_TBWL, r0
> > +	do_sr_general_time_regs(restore)
> 
> Why is zeroing TBL not done in the same place as you load the new TB?

Will fix it.

> 
> > +/* Base registers, e500v1, e500v2 need to do some special save/restore */
> > +e500_base_special_save:
> > +	lis	r12, 0
> > +	ori	r12, r12, PVR_VER_E500V1@l
> > +	cmpw	r11, r12
> > +	beq	1f
> > +
> > +	lis	r12, 0
> > +	ori	r12, r12, PVR_VER_E500V2@l
> > +	cmpw	r11, r12
> > +	bne	2f
> > +1:
> > +	do_sr_e500_pcr_regs(save)
> > +	do_sr_e500_mmu_regs(save)
> > +2:
> > +	blr
> > +
> > +e500_base_special_restore:
> > +	lis	r12, 0
> > +	ori	r12, r12, PVR_VER_E500V1@l
> > +	cmpw	r11, r12
> > +	beq	1f
> > +
> > +	lis	r12, 0
> > +	ori	r12, r12, PVR_VER_E500V2@l
> > +	cmpw	r11, r12
> > +	bne	2f
> > +1:
> > +	do_sr_e500_pcr_regs(save)
> > +	do_sr_e500_mmu_regs(save)
> > +2:
> > +	blr
> 
> Why is this separate from the other CPU-specific "append" code?
> 
> > +booke_cpu_append_save:
> > +	mfspr	r0, SPRN_PVR
> > +	rlwinm	r11, r0, 16, 16, 31
> > +
> > +	lis	r12, 0
> > +	ori	r12, r12, PVR_VER_E6500@l
> > +	cmpw	r11, r12
> > +	beq	e6500_append_save
> > +
> > +	lis	r12, 0
> > +	ori	r12, r12, PVR_VER_E5500@l
> > +	cmpw	r11, r12
> > +	beq	e5500_append_save
> > +
> > +	lis	r12, 0
> > +	ori	r12, r12, PVR_VER_E500MC@l
> > +	cmpw	r11, r12
> > +	beq	e500mc_append_save
> > +
> > +	lis	r12, 0
> > +	ori	r12, r12, PVR_VER_E500V2@l
> > +	cmpw	r11, r12
> > +	beq	e500v2_append_save
> > +
> > +	lis	r12, 0
> > +	ori	r12, r12, PVR_VER_E500V1@l
> > +	cmpw	r11, r12
> > +	beq	e500v1_append_save
> > +	b	1f
> > +
> > +e6500_append_save:
> > +	do_e6500_sr_interrupt_regs(save)
> > +	do_e6500_sr_debug_regs(save)
> > +	b	1f
> > +
> > +e5500_append_save:
> > +	do_e5500_sr_interrupt_regs(save)
> > +	b	1f
> > +
> > +e500mc_append_save:
> > +	do_e500mc_sr_interrupt_regs(save)
> > +	b	1f
> > +
> > +e500v2_append_save:
> > +e500v1_append_save:
> > +	do_e500_sr_interrupt_regs(save)
> > +1:
> > +	do_sr_interrupt_regs(save)
> > +	do_sr_debug_regs(save)
> > +
> > +	blr
> 
> What is meant by "append" here?

Means extra registers for specific e500 derivative core.

> 
> > +/*
> > + * r3 = the address of buffer
> > + * r4 = type: HIBERNATION_FLAG or DEEPSLEEP_FLAG
> > + */
> > +_GLOBAL(booke_cpu_state_save)
> > +	mflr	r9
> > +	mr	r10, r3
> > +
> > +	cmpwi	r4, HIBERNATION_FLAG
> > +	beq	1f
> > +	bl	booke_cpu_append_save
> > +1:
> > +	bl	e500_base_special_save
> > +	bl	booke_cpu_base_save
> > +
> > +	mtlr	r9
> > +	blr
> 
> You're assuming a special ABI from these subfunctions (e.g. r9
> non-volatile) but I don't see any comment to that effect on those
> functions.
> 
> -Scott

Will add comments.

-Chenhui

^ permalink raw reply

* Re: [PATCH 7/9] fsl: add EPU FSM configuration for deep sleep
From: Chenhui Zhao @ 2014-03-12  8:34 UTC (permalink / raw)
  To: Scott Wood; +Cc: linuxppc-dev, linux-kernel, Jason.Jin
In-Reply-To: <1394582923.13761.81.camel@snotra.buserror.net>

On Tue, Mar 11, 2014 at 07:08:43PM -0500, Scott Wood wrote:
> On Fri, 2014-03-07 at 12:58 +0800, Chenhui Zhao wrote:
> > From: Hongbo Zhang <hongbo.zhang@freescale.com>
> > 
> > In the last stage of deep sleep, software will trigger a Finite
> > State Machine (FSM) to control the hardware precedure, such as
> > board isolation, killing PLLs, removing power, and so on.
> > 
> > When the system is waked up by an interrupt, the FSM controls the
> > hardware to complete the early resume precedure.
> > 
> > This patch configure the EPU FSM preparing for deep sleep.
> > 
> > Signed-off-by: Hongbo Zhang <hongbo.zhang@freescale.com>
> > Signed-off-by: Chenhui Zhao <chenhui.zhao@freescale.com>
> 
> Couldn't this be part of qoriq_pm.c?

Put the code in drivers/platform/fsl/ so that LS1 can share these code.

> 
> > diff --git a/arch/powerpc/sysdev/fsl_soc.h b/arch/powerpc/sysdev/fsl_soc.h
> > index 9b9a34a..eb83a30 100644
> > --- a/arch/powerpc/sysdev/fsl_soc.h
> > +++ b/arch/powerpc/sysdev/fsl_soc.h
> > @@ -69,5 +69,8 @@ extern const struct fsl_pm_ops *qoriq_pm_ops;
> >  
> >  extern int fsl_rcpm_init(void);
> >  
> > +extern void fsl_dp_fsm_setup(void *dcsr_base);
> > +extern void fsl_dp_fsm_clean(void *dcsr_base);
> 
> __iomem

Thanks. Will add.

> 
> > +
> >  #endif
> >  #endif
> > diff --git a/drivers/platform/Kconfig b/drivers/platform/Kconfig
> > index 09fde58..6539e6d 100644
> > --- a/drivers/platform/Kconfig
> > +++ b/drivers/platform/Kconfig
> > @@ -6,3 +6,7 @@ source "drivers/platform/goldfish/Kconfig"
> >  endif
> >  
> >  source "drivers/platform/chrome/Kconfig"
> > +
> > +if FSL_SOC
> > +source "drivers/platform/fsl/Kconfig"
> > +endif
> 
> Chrome doesn't need an ifdef -- why does this?

Don't wish other platform see these options, and the X86 and GOLDFISH have
ifdefs.

> 
> > diff --git a/drivers/platform/Makefile b/drivers/platform/Makefile
> > index 3656b7b..37c6f72 100644
> > --- a/drivers/platform/Makefile
> > +++ b/drivers/platform/Makefile
> > @@ -6,3 +6,4 @@ obj-$(CONFIG_X86)		+= x86/
> >  obj-$(CONFIG_OLPC)		+= olpc/
> >  obj-$(CONFIG_GOLDFISH)		+= goldfish/
> >  obj-$(CONFIG_CHROME_PLATFORMS)	+= chrome/
> > +obj-$(CONFIG_FSL_SOC)		+= fsl/
> > diff --git a/drivers/platform/fsl/Kconfig b/drivers/platform/fsl/Kconfig
> > new file mode 100644
> > index 0000000..72ed053
> > --- /dev/null
> > +++ b/drivers/platform/fsl/Kconfig
> > @@ -0,0 +1,10 @@
> > +#
> > +# Freescale Specific Power Management Drivers
> > +#
> > +
> > +config FSL_SLEEP_FSM
> > +	bool
> > +	help
> > +	  This driver configures a hardware FSM (Finite State Machine) for deep sleep.
> > +	  The FSM is used to finish clean-ups at the last stage of system entering deep
> > +	  sleep, and also wakes up system when a wake up event happens.
> > diff --git a/drivers/platform/fsl/Makefile b/drivers/platform/fsl/Makefile
> > new file mode 100644
> > index 0000000..d99ca0e
> > --- /dev/null
> > +++ b/drivers/platform/fsl/Makefile
> > @@ -0,0 +1,5 @@
> > +#
> > +# Makefile for linux/drivers/platform/fsl
> > +# Freescale Specific Power Management Drivers
> > +#
> > +obj-$(CONFIG_FSL_SLEEP_FSM)	+= sleep_fsm.o
> 
> Why is this here while the other stuff is in arch/powerpc/sysdev?
> 
> > +/* Block offsets */
> > +#define	RCPM_BLOCK_OFFSET	0x00022000
> > +#define	EPU_BLOCK_OFFSET	0x00000000
> > +#define	NPC_BLOCK_OFFSET	0x00001000
> 
> Why don't these block offsets come from the device tree?

Have maped DCSR registers. Don't wish to remap them.

> 
> > +static void *g_dcsr_base;
> 
> __iomem

OK.

> 
> > +	/* Configure the EPU Counters */
> > +	epu_write(EPCCR15, 0x92840000);
> > +	epu_write(EPCCR14, 0x92840000);
> > +	epu_write(EPCCR12, 0x92840000);
> > +	epu_write(EPCCR11, 0x92840000);
> > +	epu_write(EPCCR10, 0x92840000);
> > +	epu_write(EPCCR9, 0x92840000);
> > +	epu_write(EPCCR8, 0x92840000);
> > +	epu_write(EPCCR5, 0x92840000);
> > +	epu_write(EPCCR4, 0x92840000);
> > +	epu_write(EPCCR2, 0x92840000);
> > +
> > +	/* Configure the SCUs Inputs */
> > +	epu_write(EPSMCR15, 0x76000000);
> > +	epu_write(EPSMCR14, 0x00000031);
> > +	epu_write(EPSMCR13, 0x00003100);
> > +	epu_write(EPSMCR12, 0x7F000000);
> > +	epu_write(EPSMCR11, 0x31740000);
> > +	epu_write(EPSMCR10, 0x65000030);
> > +	epu_write(EPSMCR9, 0x00003000);
> > +	epu_write(EPSMCR8, 0x64300000);
> > +	epu_write(EPSMCR7, 0x30000000);
> > +	epu_write(EPSMCR6, 0x7C000000);
> > +	epu_write(EPSMCR5, 0x00002E00);
> > +	epu_write(EPSMCR4, 0x002F0000);
> > +	epu_write(EPSMCR3, 0x2F000000);
> > +	epu_write(EPSMCR2, 0x6C700000);
> 
> Where do these magic numbers come from?  Which chips are they valid for?

They are for T1040. Can be found in the RCPM chapter of T1040RM.

> 
> > +void fsl_dp_fsm_clean(void *dcsr_base)
> > +{
> > +
> > +	epu_write(EPEVTCR2, 0);
> > +	epu_write(EPEVTCR9, 0);
> > +
> > +	epu_write(EPGCR, 0);
> > +	epu_write(EPECR15, 0);
> > +
> > +	rcpm_write(CSTTACR0, 0);
> > +	rcpm_write(CG1CR0, 0);
> > +
> > +}
> 
> Don't put blank lines at the beginning/end of a block.
> 
> -Scott

Thanks.

-Chenhui

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox