LinuxPPC-Dev Archive on lore.kernel.org
* Re: powerpc -next rebase WARNING
From: Josh Boyer @ 2012-05-22 11:03 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, Linux Kernel list
In-Reply-To: <1337661655.2779.167.camel@pasglop>

On Tue, May 22, 2012 at 12:40 AM, Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
> On Tue, 2012-05-22 at 11:51 +1000, Benjamin Herrenschmidt wrote:
>> Folks, bad news ... my fault.
>>
>> I accidentally forgot a --signoff on a git am command last week, meaning
>> that a pair of patches are in -next and not signed off by me.
>>
> >> For various (legal) reasons that cannot go into Linus' tree as-is, so I
>> have to rebase the tree to fix it.
>>
>> Sorry about that ...
>
> Note that the rebase only affects the top 3 commits, so if your
> tree is based on something older you're fine (Kumar, you seem to
> be ok, I haven't checked Josh).

Should be fine.  Even if not, all 4 of its users will manage to cope.

josh

^ permalink raw reply

* [PATCH] gianfar:don't add FCB length to hard_header_len
From: Jiajun Wu @ 2012-05-22  9:00 UTC (permalink / raw)
  To: netdev, davem; +Cc: Jiajun Wu, linuxppc-dev

The FCB (Frame Control Block) is not part of the netdev hard header.
Adding the FCB length to hard_header_len makes GRO fail at the MAC
comparison stage.

Signed-off-by: Jiajun Wu <b06378@freescale.com>
---
 drivers/net/ethernet/freescale/gianfar.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/freescale/gianfar.c b/drivers/net/ethernet/freescale/gianfar.c
index 1adb024..0741ade 100644
--- a/drivers/net/ethernet/freescale/gianfar.c
+++ b/drivers/net/ethernet/freescale/gianfar.c
@@ -1082,7 +1082,7 @@ static int gfar_probe(struct platform_device *ofdev)
 
 	if (dev->features & NETIF_F_IP_CSUM ||
 			priv->device_flags & FSL_GIANFAR_DEV_HAS_TIMER)
-		dev->hard_header_len += GMAC_FCB_LEN;
+		dev->needed_headroom = GMAC_FCB_LEN;
 
 	/* Program the isrg regs only if number of grps > 1 */
 	if (priv->num_grps > 1) {
-- 
1.5.6.5

^ permalink raw reply related
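
A minimal sketch of why counting the FCB in hard_header_len can break GRO.
It assumes the GRO layer decides whether two frames share an L2 header by
comparing hard_header_len bytes starting at the MAC header. That assumption,
the GMAC_FCB_LEN value of 8, and the helpers mac_headers_match(),
fill_frame() and frames_merge() are illustrative and not taken from the patch.

```c
#include <stddef.h>
#include <string.h>

#define ETH_HLEN     14	/* length of an Ethernet header */
#define GMAC_FCB_LEN  8	/* assumed FCB size, check gianfar.h */

/* Hypothetical stand-in for a GRO "same L2 header?" test: compare
 * maclen bytes starting at the MAC header of two received frames. */
static int mac_headers_match(const unsigned char *a, const unsigned char *b,
			     size_t maclen)
{
	return memcmp(a, b, maclen) == 0;
}

/* Fill a frame with a fixed Ethernet header followed by bytes that
 * differ from packet to packet (a stand-in for L3 header fields). */
static void fill_frame(unsigned char *frame, size_t len, unsigned char seq)
{
	size_t i;

	for (i = 0; i < len; i++)
		frame[i] = (i < ETH_HLEN) ? (unsigned char)(0xa0 + i) : seq;
}

/* Returns 1 if two consecutive frames of one flow would be treated as
 * mergeable when the L2 header is taken to be hard_header_len bytes. */
int frames_merge(size_t hard_header_len)
{
	unsigned char f1[64], f2[64];

	fill_frame(f1, sizeof(f1), 1);
	fill_frame(f2, sizeof(f2), 2);
	return mac_headers_match(f1, f2, hard_header_len);
}
```

In this model frames_merge(ETH_HLEN) returns 1, while
frames_merge(ETH_HLEN + GMAC_FCB_LEN) returns 0 because the comparison runs
past the Ethernet header into bytes that change per packet. Reserving the
FCB space through needed_headroom leaves the compared length at ETH_HLEN.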

* Re: powerpc -next rebase WARNING
From: Benjamin Herrenschmidt @ 2012-05-22  4:40 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Linux Kernel list
In-Reply-To: <1337651473.2779.140.camel@pasglop>

On Tue, 2012-05-22 at 11:51 +1000, Benjamin Herrenschmidt wrote:
> Folks, bad news ... my fault.
> 
> I accidentally forgot a --signoff on a git am command last week, meaning
> that a pair of patches are in -next and not signed off by me.
> 
> For various (legal) reasons that cannot go into Linus' tree as-is, so I
> have to rebase the tree to fix it.
> 
> Sorry about that ...

Note that the rebase only affects the top 3 commits, so if your
tree is based on something older you're fine (Kumar, you seem to
be ok, I haven't checked Josh).

Cheers,
Ben.

^ permalink raw reply

* Re: linux-next: PowerPC boot failures in next-20120521
From: Stephen Rothwell @ 2012-05-22  3:25 UTC (permalink / raw)
  To: David Rientjes
  Cc: Lee Schermerhorn, Peter Zijlstra, Linus, LKML, linux-next,
	H. Peter Anvin, Thomas Gleixner, ppc-dev, Ingo Molnar
In-Reply-To: <20120522130354.fdb335eb294f4206b4b2fed5@canb.auug.org.au>

[-- Attachment #1: Type: text/plain, Size: 1285 bytes --]

On Tue, 22 May 2012 13:03:54 +1000 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>
> On Mon, 21 May 2012 18:53:37 -0700 (PDT) David Rientjes <rientjes@google.com> wrote:
> >
> > Yeah, it's sched/numa since that's what introduced numa_init().  It does 
> > for_each_node() for each node and does a kmalloc_node() even though that 
> > node may not be online.  Slub ends up passing this node to the page 
> > allocator through alloc_pages_exact_node().  CONFIG_DEBUG_VM would have 
> > > caught this and your config confirms it's not enabled.
> > 
> > sched/numa either needs a memory hotplug notifier or it needs to pass 
> > NUMA_NO_NODE for nodes that aren't online.  Until we get the former, the 
> > following should fix it.
> > 
> > 
> > sched, numa: Allocate node_queue on any node for offline nodes
> > 
> > struct node_queue must be allocated with NUMA_NO_NODE for nodes that are 
> > not (yet) online, otherwise the page allocator has a bad zonelist.
> > 
> > Signed-off-by: David Rientjes <rientjes@google.com>
> 
> Thanks, that fixes it.
> 
> Tested-by: Stephen Rothwell <sfr@canb.auug.org.au>

And I will put that patch in linux-next until it (or something better)
appears.

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* Re: linux-next: PowerPC boot failures in next-20120521
From: Michael Neuling @ 2012-05-22  3:12 UTC (permalink / raw)
  To: David Rientjes
  Cc: Lee Schermerhorn, Stephen Rothwell, Peter Zijlstra, Linus, LKML,
	linux-next, H. Peter Anvin, Thomas Gleixner, ppc-dev, Ingo Molnar
In-Reply-To: <alpine.DEB.2.00.1205211954340.13522@chino.kir.corp.google.com>

David Rientjes <rientjes@google.com> wrote:

> On Tue, 22 May 2012, Michael Neuling wrote:
> 
> > Sorry, got it... CONFIG_DEBUG_VM enabled below...
> > 
> > pid_max: default: 32768 minimum: 301
> > Dentry cache hash table entries: 262144 (order: 5, 2097152 bytes)
> > Inode-cache hash table entries: 131072 (order: 4, 1048576 bytes)
> > Mount-cache hash table entries: 4096
> > Initializing cgroup subsys cpuacct
> > Initializing cgroup subsys devices
> > Initializing cgroup subsys freezer
> > POWER7 performance monitor hardware support registered
> > ------------[ cut here ]------------
> > kernel BUG at /scratch/mikey/src/linux-next/include/linux/gfp.h:318!
> 
> Yeah, this is what I was expecting, it's tripping on
> 
> 	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid));
> 
> and slub won't pass nid < 0.  You're sure my patch is applied? :)

I did have your patch applied but at "b4cdf91 sched/numa: Implement numa
balancer" (where git bisect spotted the fail).  

If I apply your patch on the full next-20120521 it does fix the problem.

Sorry for the confusion.

Thanks!
Mikey

^ permalink raw reply

* Re: linux-next: PowerPC boot failures in next-20120521
From: Stephen Rothwell @ 2012-05-22  3:03 UTC (permalink / raw)
  To: David Rientjes
  Cc: Lee Schermerhorn, Peter Zijlstra, Linus, LKML, linux-next,
	H. Peter Anvin, Thomas Gleixner, ppc-dev, Ingo Molnar
In-Reply-To: <alpine.DEB.2.00.1205211846120.20916@chino.kir.corp.google.com>

[-- Attachment #1: Type: text/plain, Size: 1082 bytes --]

Hi David,

On Mon, 21 May 2012 18:53:37 -0700 (PDT) David Rientjes <rientjes@google.com> wrote:
>
> Yeah, it's sched/numa since that's what introduced numa_init().  It does 
> for_each_node() for each node and does a kmalloc_node() even though that 
> node may not be online.  Slub ends up passing this node to the page 
> allocator through alloc_pages_exact_node().  CONFIG_DEBUG_VM would have 
> caught this and your config confirms it's not enabled.
> 
> sched/numa either needs a memory hotplug notifier or it needs to pass 
> NUMA_NO_NODE for nodes that aren't online.  Until we get the former, the 
> following should fix it.
> 
> 
> sched, numa: Allocate node_queue on any node for offline nodes
> 
> struct node_queue must be allocated with NUMA_NO_NODE for nodes that are 
> not (yet) online, otherwise the page allocator has a bad zonelist.
> 
> Signed-off-by: David Rientjes <rientjes@google.com>

Thanks, that fixes it.

Tested-by: Stephen Rothwell <sfr@canb.auug.org.au>

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* Re: linux-next: PowerPC boot failures in next-20120521
From: David Rientjes @ 2012-05-22  2:58 UTC (permalink / raw)
  To: Michael Neuling
  Cc: Lee Schermerhorn, Stephen Rothwell, Peter Zijlstra, Linus, LKML,
	linux-next, H. Peter Anvin, Thomas Gleixner, ppc-dev, Ingo Molnar
In-Reply-To: <3120.1337655100@neuling.org>

On Tue, 22 May 2012, Michael Neuling wrote:

> Sorry, got it... CONFIG_DEBUG_VM enabled below...
> 
> pid_max: default: 32768 minimum: 301
> Dentry cache hash table entries: 262144 (order: 5, 2097152 bytes)
> Inode-cache hash table entries: 131072 (order: 4, 1048576 bytes)
> Mount-cache hash table entries: 4096
> Initializing cgroup subsys cpuacct
> Initializing cgroup subsys devices
> Initializing cgroup subsys freezer
> POWER7 performance monitor hardware support registered
> ------------[ cut here ]------------
> kernel BUG at /scratch/mikey/src/linux-next/include/linux/gfp.h:318!

Yeah, this is what I was expecting, it's tripping on

	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid));

and slub won't pass nid < 0.  You're sure my patch is applied? :)

^ permalink raw reply
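
A userspace sketch of the check quoted above, for readers who want to see
which node ids trip it. The condition in exact_node_check_trips() is the one
from the VM_BUG_ON; the table node_is_online[], the MAX_NUMNODES value of 4
and the helper request_would_bug() are assumptions made for illustration.
The dispatch in request_would_bug() models the statement that slub won't
pass nid < 0 to the exact-node path.

```c
#include <stdbool.h>

#define MAX_NUMNODES 4		/* illustrative, not a real config value */
#define NUMA_NO_NODE (-1)	/* "no preference" node id */

/* Hypothetical machine: only node 0 is online. */
static const bool node_is_online[MAX_NUMNODES] = { true, false, false, false };

/* Same condition as the quoted VM_BUG_ON(). */
bool exact_node_check_trips(int nid)
{
	return nid < 0 || nid >= MAX_NUMNODES || !node_is_online[nid];
}

/* Model of the caller: NUMA_NO_NODE takes the any-node path and never
 * reaches the check; a concrete node id goes to the exact-node path. */
bool request_would_bug(int node)
{
	if (node == NUMA_NO_NODE)
		return false;
	return exact_node_check_trips(node);
}
```

Here request_would_bug(1) is true because node 1 is offline, while
request_would_bug(0) and request_would_bug(NUMA_NO_NODE) are both false.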

* Re: linux-next: PowerPC boot failures in next-20120521
From: Michael Neuling @ 2012-05-22  2:51 UTC (permalink / raw)
  To: David Rientjes
  Cc: Lee Schermerhorn, Stephen Rothwell, Peter Zijlstra, Linus, LKML,
	linux-next, H. Peter Anvin, Thomas Gleixner, ppc-dev, Ingo Molnar
In-Reply-To: <alpine.DEB.2.00.1205211942540.29814@chino.kir.corp.google.com>

David Rientjes <rientjes@google.com> wrote:

> On Tue, 22 May 2012, Michael Neuling wrote:
> 
> > > > > Trying David's patch just posted doesn't fix it.
> > > > > 
> > > > 
> > > > Hmm, what does CONFIG_DEBUG_VM say?
> > > 
> > > No set.
> > 
> > Sorry, should have read "Not set"
> > 
> 
> I mean if it's set, what does it emit to the kernel log with my patch 
> applied?
> 
> I made CONFIG_DEBUG_VM catch !node_online(node) about six months ago, so I 
> was thinking it would have caught this if either you or Stephen enable it.

Sorry, got it... CONFIG_DEBUG_VM enabled below...

pid_max: default: 32768 minimum: 301
Dentry cache hash table entries: 262144 (order: 5, 2097152 bytes)
Inode-cache hash table entries: 131072 (order: 4, 1048576 bytes)
Mount-cache hash table entries: 4096
Initializing cgroup subsys cpuacct
Initializing cgroup subsys devices
Initializing cgroup subsys freezer
POWER7 performance monitor hardware support registered
------------[ cut here ]------------
kernel BUG at /scratch/mikey/src/linux-next/include/linux/gfp.h:318!
Oops: Exception in kernel mode, sig: 5 [#1]
SMP NR_CPUS=1024 NUMA pSeries
Modules linked in:
NIP: c000000000199164 LR: c0000000001993e0 CTR: c0000000000b6b70
REGS: c00000007e583830 TRAP: 0700   Tainted: G        W     (3.4.0-rc6-mikey)
MSR: 9000000000029032 <SF,HV,EE,ME,IR,DR,RI>  CR: 28004028  XER: 02000000
SOFTE: 1
CFAR: c0000000001993c4
TASK = c00000007e560000[1] 'swapper/0' THREAD: c00000007e580000 CPU: 0
GPR00: 0000000000000001 c00000007e583ab0 c000000000c035a0 00000000000012d0 
GPR04: 0000000000000000 0000000000000001 c000000000e14900 0005055500000001 
GPR08: 0000000000000001 00000000000012d0 c000000000c6f398 0000000000000001 
GPR12: 0000000028004022 c00000000ff20000 0000000000000000 0000000000000000 
GPR16: 0000000000000000 0000000000000000 0000000000001380 0000000000000000 
GPR20: 0000000000000001 c000000000e14900 c000000000e148f0 0000000000210d00 
GPR24: 0000000000000001 00000000000000d0 00000000000002aa 0000000000000000 
GPR28: 00000000000000d0 0000000000000001 c000000000b58fc8 c00000007e021200 
NIP [c000000000199164] .new_slab+0xb4/0x440
LR [c0000000001993e0] .new_slab+0x330/0x440
Call Trace:
[c00000007e583ab0] [c0000000001993e0] .new_slab+0x330/0x440 (unreliable)
[c00000007e583b60] [c00000000072ce84] .__slab_alloc+0x3bc/0x52c
[c00000007e583ca0] [c000000000199b08] .kmem_cache_alloc_node_trace+0x98/0x280
[c00000007e583d60] [c000000000a5a440] .numa_init+0x9c/0x188
[c00000007e583e00] [c00000000000aa30] .do_one_initcall+0x60/0x1e0
[c00000007e583ec0] [c000000000a40b60] .kernel_init+0x128/0x294
[c00000007e583f90] [c000000000020788] .kernel_thread+0x54/0x70
Instruction dump:
7b5b8402 7f6407b4 7c1ce378 7d29e038 7b990020 61291200 79230020 419202b8 
2b9d00ff 78840020 38000001 409d0240 <0b000000> e95e8140 792977e2 7bab1f24 
---[ end trace 31fd0ba7d8756002 ]---

^ permalink raw reply

* Re: linux-next: PowerPC boot failures in next-20120521
From: David Rientjes @ 2012-05-22  2:44 UTC (permalink / raw)
  To: Michael Neuling
  Cc: Lee Schermerhorn, Stephen Rothwell, Peter Zijlstra, Linus, LKML,
	linux-next, H. Peter Anvin, Thomas Gleixner, ppc-dev, Ingo Molnar
In-Reply-To: <2246.1337654423@neuling.org>

On Tue, 22 May 2012, Michael Neuling wrote:

> > > > Trying David's patch just posted doesn't fix it.
> > > > 
> > > 
> > > Hmm, what does CONFIG_DEBUG_VM say?
> > 
> > No set.
> 
> Sorry, should have read "Not set"
> 

I mean if it's set, what does it emit to the kernel log with my patch 
applied?

I made CONFIG_DEBUG_VM catch !node_online(node) about six months ago, so I 
was thinking it would have caught this if either you or Stephen enable it.

^ permalink raw reply

* Re: linux-next: PowerPC boot failures in next-20120521
From: Michael Neuling @ 2012-05-22  2:40 UTC (permalink / raw)
  To: David Rientjes
  Cc: Lee Schermerhorn, Stephen Rothwell, Peter Zijlstra, Linus, LKML,
	linux-next, H. Peter Anvin, Thomas Gleixner, ppc-dev, Ingo Molnar
In-Reply-To: <2182.1337654352@neuling.org>

Michael Neuling <mikey@neuling.org> wrote:

> > > Trying David's patch just posted doesn't fix it.
> > > 
> > 
> > Hmm, what does CONFIG_DEBUG_VM say?
> 
> No set.

Sorry, should have read "Not set"

mikey

^ permalink raw reply

* Re: linux-next: PowerPC boot failures in next-20120521
From: Michael Neuling @ 2012-05-22  2:39 UTC (permalink / raw)
  To: David Rientjes
  Cc: Lee Schermerhorn, Stephen Rothwell, Peter Zijlstra, Linus, LKML,
	linux-next, H. Peter Anvin, Thomas Gleixner, ppc-dev, Ingo Molnar
In-Reply-To: <alpine.DEB.2.00.1205211924390.13682@chino.kir.corp.google.com>

> > Trying David's patch just posted doesn't fix it.
> > 
> 
> Hmm, what does CONFIG_DEBUG_VM say?

No set.

Mikey

^ permalink raw reply

* Re: linux-next: PowerPC boot failures in next-20120521
From: David Rientjes @ 2012-05-22  2:25 UTC (permalink / raw)
  To: Michael Neuling
  Cc: Lee Schermerhorn, Stephen Rothwell, Peter Zijlstra, Linus, LKML,
	linux-next, H. Peter Anvin, Thomas Gleixner, ppc-dev, Ingo Molnar
In-Reply-To: <328.1337652722@neuling.org>

On Tue, 22 May 2012, Michael Neuling wrote:

> console [tty0] enabled
> console [hvc0] enabled
> pid_max: default: 32768 minimum: 301
> Dentry cache hash table entries: 262144 (order: 5, 2097152 bytes)
> Inode-cache hash table entries: 131072 (order: 4, 1048576 bytes)
> Mount-cache hash table entries: 4096
> Initializing cgroup subsys cpuacct
> Initializing cgroup subsys devices
> Initializing cgroup subsys freezer
> POWER7 performance monitor hardware support registered
> Unable to handle kernel paging request for data at address 0x00001388
> Faulting instruction address: 0xc00000000014a070
> Oops: Kernel access of bad area, sig: 11 [#1]
> SMP NR_CPUS=1024 NUMA pSeries
> Modules linked in:
> NIP: c00000000014a070 LR: c0000000001978cc CTR: c0000000000b6870
> REGS: c00000007e5836b0 TRAP: 0300   Tainted: G        W     (3.4.0-rc6-mikey)
> MSR: 9000000000009032 <SF,HV,EE,ME,IR,DR,RI>  CR: 28004022  XER: 02000000
> SOFTE: 1
> CFAR: 00000000000050fc
> DAR: 0000000000001388, DSISR: 40000000
> TASK = c00000007e560000[1] 'swapper/0' THREAD: c00000007e580000 CPU: 0
> GPR00: 0000000000000000 c00000007e583930 c000000000c034d8 00000000000012d0 
> GPR04: 0000000000000000 0000000000001380 0000000000000000 0000000000000001 
> GPR08: c00000007e0dff60 0000000000000000 c000000000ca05a0 0000000000000000 
> GPR12: 0000000028004024 c00000000ff20000 0000000000000000 0000000000000000 
> GPR16: 0000000000000000 0000000000000000 0000000000000001 0000000000001380 
> GPR20: 0000000000000001 c000000000e14900 c000000000e148f0 0000000000000001 
> GPR24: c000000000c6f378 0000000000000000 0000000000001380 00000000000002aa 
> GPR28: 0000000000000000 0000000000000000 c000000000b576b0 c00000007e021200 
> NIP [c00000000014a070] .__alloc_pages_nodemask+0xd0/0x910
> LR [c0000000001978cc] .new_slab+0xcc/0x3d0
> Call Trace:
> [c00000007e583930] [c00000007e5839c0] 0xc00000007e5839c0 (unreliable)
> [c00000007e583ac0] [c0000000001978cc] .new_slab+0xcc/0x3d0
> [c00000007e583b70] [c00000000072ae98] .__slab_alloc+0x38c/0x4f8
> [c00000007e583cb0] [c000000000198190] .kmem_cache_alloc_node_trace+0x90/0x260
> [c00000007e583d60] [c000000000a5a404] .numa_init+0x9c/0x188
> [c00000007e583e00] [c00000000000aa30] .do_one_initcall+0x60/0x1e0
> [c00000007e583ec0] [c000000000a40b60] .kernel_init+0x128/0x294
> [c00000007e583f90] [c000000000020788] .kernel_thread+0x54/0x70
> Instruction dump:
> 0b000000 eb1e8000 3b800000 801800a8 2f800000 409e001c 7860efe3 38000000 
> 41820008 38000002 787c6fe2 7f9c0378 <e93a0008> 801800a4 3b600000 2fa90000 
> ---[ end trace 31fd0ba7d8756002 ]---
> 
> Which seems to be this code in __alloc_pages_nodemask
> ---
>         /*
>          * Check the zones suitable for the gfp_mask contain at least one
>          * valid zone. It's possible to have an empty zonelist as a result
>          * of GFP_THISNODE and a memoryless node
>          */
>         if (unlikely(!zonelist->_zonerefs->zone))
> c00000000014a070:       e9 3a 00 08     ld      r9,8(r26)
> ---
> 
> r26 is coming from r5 which is the struct zonelist *zonelist parameter
> to __alloc_pages_nodemask.  Having 0000000000001380 in there is clearly
> a bogus pointer.
> 
> Bisecting it points to b4cdf91668c27a5a6a5a3ed4234756c042dd8288
>   b4cdf91 sched/numa: Implement numa balancer
> 
> Trying David's patch just posted doesn't fix it.
> 

Hmm, what does CONFIG_DEBUG_VM say?

^ permalink raw reply

* Re: linux-next: PowerPC boot failures in next-20120521
From: Michael Neuling @ 2012-05-22  2:12 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: Lee Schermerhorn, Peter Zijlstra, Linus, LKML, linux-next,
	H. Peter Anvin, Thomas Gleixner, ppc-dev, Ingo Molnar
In-Reply-To: <20120522114051.0c9db9a7c2d660bc9e0e1be2@canb.auug.org.au>

> Hi all,
> 
> Last night's boot tests on various PowerPC systems failed like this:
> 
> calling  .numa_group_init+0x0/0x3c @ 1
> initcall .numa_group_init+0x0/0x3c returned 0 after 0 usecs
> calling  .numa_init+0x0/0x1dc @ 1
> Unable to handle kernel paging request for data at address 0x00001688
> Faulting instruction address: 0xc00000000016e154
> Oops: Kernel access of bad area, sig: 11 [#1]
> SMP NR_CPUS=32 NUMA pSeries
> Modules linked in:
> NIP: c00000000016e154 LR: c0000000001b9140 CTR: 0000000000000000
> REGS: c0000003fc8c76d0 TRAP: 0300   Not tainted  (3.4.0-autokern1)
> MSR: 8000000000009032 <SF,EE,ME,IR,DR,RI>  CR: 24044022  XER: 00000003
> SOFTE: 1
> CFAR: 000000000000562c
> DAR: 0000000000001688, DSISR: 40000000
> TASK = c0000003fc8c8000[1] 'swapper/0' THREAD: c0000003fc8c4000 CPU: 0
> GPR00: 0000000000000000 c0000003fc8c7950 c000000000d05b30 00000000000012d0 
> GPR04: 0000000000000000 0000000000001680 0000000000000000 c0000003fe032f60 
> GPR08: 0004005400000001 0000000000000000 ffffffffffffc980 c000000000d24fe0 
> GPR12: 0000000024044024 c00000000f33b000 0000000001a3fa78 00000000009bac00 
> GPR16: 0000000000e1f338 0000000002d513f0 0000000000001680 0000000000000000 
> GPR20: 0000000000000001 c0000003fc8c7c00 0000000000000000 0000000000000001 
> GPR24: 0000000000000001 c000000000d1b490 0000000000000000 0000000000001680 
> GPR28: 0000000000000000 0000000000000000 c000000000c7ce58 c0000003fe009200 
> NIP [c00000000016e154] .__alloc_pages_nodemask+0xc4/0x8f0
> LR [c0000000001b9140] .new_slab+0xd0/0x3c0
> Call Trace:
> [c0000003fc8c7950] [2e6e756d615f696e] 0x2e6e756d615f696e (unreliable)
> [c0000003fc8c7ae0] [c0000000001b9140] .new_slab+0xd0/0x3c0
> [c0000003fc8c7b90] [c0000000001b9844] .__slab_alloc+0x254/0x5b0
> [c0000003fc8c7cd0] [c0000000001bb7a4] .kmem_cache_alloc_node_trace+0x94/0x260
> [c0000003fc8c7d80] [c000000000ba36d0] .numa_init+0x98/0x1dc
> [c0000003fc8c7e10] [c00000000000ace4] .do_one_initcall+0x1a4/0x1e0
> [c0000003fc8c7ed0] [c000000000b7b354] .kernel_init+0x124/0x2e0
> [c0000003fc8c7f90] [c0000000000211c8] .kernel_thread+0x54/0x70
> Instruction dump:
> 5400d97e 7b170020 0b000000 eb3e8000 3b800000 80190088 2f800000 40de0014 
> 7860efe2 787c6fe2 78000fa4 7f9c0378 <e81b0008> 83f90000 2fa00000 7fff1838 
> ---[ end trace 31fd0ba7d8756001 ]---
> 
> swapper/0 (1) used greatest stack depth: 10864 bytes left
> Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> 
> I may be completely wrong, but I guess the obvious target would be the
> sched/numa branch that came in via the tip tree.
> 
> Config file attached.  I haven't had a chance to try to bisect this yet.
> 
> Anyone have any ideas?

I'm getting similar here:


console [tty0] enabled
console [hvc0] enabled
pid_max: default: 32768 minimum: 301
Dentry cache hash table entries: 262144 (order: 5, 2097152 bytes)
Inode-cache hash table entries: 131072 (order: 4, 1048576 bytes)
Mount-cache hash table entries: 4096
Initializing cgroup subsys cpuacct
Initializing cgroup subsys devices
Initializing cgroup subsys freezer
POWER7 performance monitor hardware support registered
Unable to handle kernel paging request for data at address 0x00001388
Faulting instruction address: 0xc00000000014a070
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=1024 NUMA pSeries
Modules linked in:
NIP: c00000000014a070 LR: c0000000001978cc CTR: c0000000000b6870
REGS: c00000007e5836b0 TRAP: 0300   Tainted: G        W     (3.4.0-rc6-mikey)
MSR: 9000000000009032 <SF,HV,EE,ME,IR,DR,RI>  CR: 28004022  XER: 02000000
SOFTE: 1
CFAR: 00000000000050fc
DAR: 0000000000001388, DSISR: 40000000
TASK = c00000007e560000[1] 'swapper/0' THREAD: c00000007e580000 CPU: 0
GPR00: 0000000000000000 c00000007e583930 c000000000c034d8 00000000000012d0 
GPR04: 0000000000000000 0000000000001380 0000000000000000 0000000000000001 
GPR08: c00000007e0dff60 0000000000000000 c000000000ca05a0 0000000000000000 
GPR12: 0000000028004024 c00000000ff20000 0000000000000000 0000000000000000 
GPR16: 0000000000000000 0000000000000000 0000000000000001 0000000000001380 
GPR20: 0000000000000001 c000000000e14900 c000000000e148f0 0000000000000001 
GPR24: c000000000c6f378 0000000000000000 0000000000001380 00000000000002aa 
GPR28: 0000000000000000 0000000000000000 c000000000b576b0 c00000007e021200 
NIP [c00000000014a070] .__alloc_pages_nodemask+0xd0/0x910
LR [c0000000001978cc] .new_slab+0xcc/0x3d0
Call Trace:
[c00000007e583930] [c00000007e5839c0] 0xc00000007e5839c0 (unreliable)
[c00000007e583ac0] [c0000000001978cc] .new_slab+0xcc/0x3d0
[c00000007e583b70] [c00000000072ae98] .__slab_alloc+0x38c/0x4f8
[c00000007e583cb0] [c000000000198190] .kmem_cache_alloc_node_trace+0x90/0x260
[c00000007e583d60] [c000000000a5a404] .numa_init+0x9c/0x188
[c00000007e583e00] [c00000000000aa30] .do_one_initcall+0x60/0x1e0
[c00000007e583ec0] [c000000000a40b60] .kernel_init+0x128/0x294
[c00000007e583f90] [c000000000020788] .kernel_thread+0x54/0x70
Instruction dump:
0b000000 eb1e8000 3b800000 801800a8 2f800000 409e001c 7860efe3 38000000 
41820008 38000002 787c6fe2 7f9c0378 <e93a0008> 801800a4 3b600000 2fa90000 
---[ end trace 31fd0ba7d8756002 ]---

Which seems to be this code in __alloc_pages_nodemask
---
        /*
         * Check the zones suitable for the gfp_mask contain at least one
         * valid zone. It's possible to have an empty zonelist as a result
         * of GFP_THISNODE and a memoryless node
         */
        if (unlikely(!zonelist->_zonerefs->zone))
c00000000014a070:       e9 3a 00 08     ld      r9,8(r26)
---

r26 is coming from r5 which is the struct zonelist *zonelist parameter
to __alloc_pages_nodemask.  Having 0000000000001380 in there is clearly
a bogus pointer.

Bisecting it points to b4cdf91668c27a5a6a5a3ed4234756c042dd8288
  b4cdf91 sched/numa: Implement numa balancer

Trying David's patch just posted doesn't fix it.

Mikey

^ permalink raw reply
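
The fault addresses in both oopses follow from the faulting load and the
bogus base register. A small sketch that decodes the marked instruction word
and recomputes the address; the struct ld_fields and the functions
decode_ld() and ld_effective_address() are hypothetical names, and the
decoder only handles the plain 64-bit "ld" encoding (primary opcode 58 with
the two low bits clear).

```c
#include <stdint.h>

struct ld_fields {
	unsigned rt;	/* target register */
	unsigned ra;	/* base register */
	int32_t disp;	/* signed byte displacement */
};

/* Decode a PowerPC "ld RT,disp(RA)" word. Returns 0 on success and
 * -1 if the word is not a plain ld. */
int decode_ld(uint32_t insn, struct ld_fields *out)
{
	int32_t d;

	if ((insn >> 26) != 58 || (insn & 3) != 0)
		return -1;
	out->rt = (insn >> 21) & 0x1f;
	out->ra = (insn >> 16) & 0x1f;
	d = (int32_t)(insn & 0xfffc);
	if (d & 0x8000)		/* sign-extend the 16-bit displacement */
		d -= 0x10000;
	out->disp = d;
	return 0;
}

/* Effective address of the load: base register value plus displacement. */
uint64_t ld_effective_address(uint64_t base, int32_t disp)
{
	return base + (uint64_t)(int64_t)disp;
}
```

Decoding e93a0008 gives rt = 9, ra = 26, disp = 8, which matches
"ld r9,8(r26)"; with r26 = 0x1380 the effective address is 0x1388, the DAR
in this oops. The first report's e81b0008 decodes to a load through r27,
and 0x1680 + 8 gives the 0x1688 seen there.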

* Re: linux-next: PowerPC boot failures in next-20120521
From: David Rientjes @ 2012-05-22  1:53 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: Lee Schermerhorn, Peter Zijlstra, Linus, LKML, linux-next,
	H. Peter Anvin, Thomas Gleixner, ppc-dev, Ingo Molnar
In-Reply-To: <20120522114051.0c9db9a7c2d660bc9e0e1be2@canb.auug.org.au>

On Tue, 22 May 2012, Stephen Rothwell wrote:

> Unable to handle kernel paging request for data at address 0x00001688
> Faulting instruction address: 0xc00000000016e154
> Oops: Kernel access of bad area, sig: 11 [#1]
> SMP NR_CPUS=32 NUMA pSeries
> Modules linked in:
> NIP: c00000000016e154 LR: c0000000001b9140 CTR: 0000000000000000
> REGS: c0000003fc8c76d0 TRAP: 0300   Not tainted  (3.4.0-autokern1)
> MSR: 8000000000009032 <SF,EE,ME,IR,DR,RI>  CR: 24044022  XER: 00000003
> SOFTE: 1
> CFAR: 000000000000562c
> DAR: 0000000000001688, DSISR: 40000000
> TASK = c0000003fc8c8000[1] 'swapper/0' THREAD: c0000003fc8c4000 CPU: 0
> GPR00: 0000000000000000 c0000003fc8c7950 c000000000d05b30 00000000000012d0 
> GPR04: 0000000000000000 0000000000001680 0000000000000000 c0000003fe032f60 
> GPR08: 0004005400000001 0000000000000000 ffffffffffffc980 c000000000d24fe0 
> GPR12: 0000000024044024 c00000000f33b000 0000000001a3fa78 00000000009bac00 
> GPR16: 0000000000e1f338 0000000002d513f0 0000000000001680 0000000000000000 
> GPR20: 0000000000000001 c0000003fc8c7c00 0000000000000000 0000000000000001 
> GPR24: 0000000000000001 c000000000d1b490 0000000000000000 0000000000001680 
> GPR28: 0000000000000000 0000000000000000 c000000000c7ce58 c0000003fe009200 
> NIP [c00000000016e154] .__alloc_pages_nodemask+0xc4/0x8f0
> LR [c0000000001b9140] .new_slab+0xd0/0x3c0
> Call Trace:
> [c0000003fc8c7950] [2e6e756d615f696e] 0x2e6e756d615f696e (unreliable)
> [c0000003fc8c7ae0] [c0000000001b9140] .new_slab+0xd0/0x3c0
> [c0000003fc8c7b90] [c0000000001b9844] .__slab_alloc+0x254/0x5b0
> [c0000003fc8c7cd0] [c0000000001bb7a4] .kmem_cache_alloc_node_trace+0x94/0x260
> [c0000003fc8c7d80] [c000000000ba36d0] .numa_init+0x98/0x1dc
> [c0000003fc8c7e10] [c00000000000ace4] .do_one_initcall+0x1a4/0x1e0
> [c0000003fc8c7ed0] [c000000000b7b354] .kernel_init+0x124/0x2e0
> [c0000003fc8c7f90] [c0000000000211c8] .kernel_thread+0x54/0x70
> Instruction dump:
> 5400d97e 7b170020 0b000000 eb3e8000 3b800000 80190088 2f800000 40de0014 
> 7860efe2 787c6fe2 78000fa4 7f9c0378 <e81b0008> 83f90000 2fa00000 7fff1838 
> ---[ end trace 31fd0ba7d8756001 ]---
> 
> swapper/0 (1) used greatest stack depth: 10864 bytes left
> Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> 
> I may be completely wrong, but I guess the obvious target would be the
> sched/numa branch that came in via the tip tree.
> 
> Config file attached.  I haven't had a chance to try to bisect this yet.
> 
> Anyone have any ideas?

Yeah, it's sched/numa since that's what introduced numa_init().  It does 
for_each_node() for each node and does a kmalloc_node() even though that 
node may not be online.  Slub ends up passing this node to the page 
allocator through alloc_pages_exact_node().  CONFIG_DEBUG_VM would have 
caught this and your config confirms it's not enabled.

sched/numa either needs a memory hotplug notifier or it needs to pass 
NUMA_NO_NODE for nodes that aren't online.  Until we get the former, the 
following should fix it.


sched, numa: Allocate node_queue on any node for offline nodes

struct node_queue must be allocated with NUMA_NO_NODE for nodes that are 
not (yet) online, otherwise the page allocator has a bad zonelist.

Signed-off-by: David Rientjes <rientjes@google.com>
---
diff --git a/kernel/sched/numa.c b/kernel/sched/numa.c
--- a/kernel/sched/numa.c
+++ b/kernel/sched/numa.c
@@ -885,7 +885,8 @@ static __init int numa_init(void)
 
 	for_each_node(node) {
 		struct node_queue *nq = kmalloc_node(sizeof(*nq),
-				GFP_KERNEL | __GFP_ZERO, node);
+				GFP_KERNEL | __GFP_ZERO,
+				node_online(node) ? node : NUMA_NO_NODE);
 		BUG_ON(!nq);
 
 		spin_lock_init(&nq->lock);

^ permalink raw reply
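
The node selection in the patch above can be tried out in userspace. This is
only a sketch: alloc_node_for() and the online[] array are hypothetical
stand-ins for the ternary in the diff and for node_online(), and nothing
here allocates memory.

```c
#include <stdbool.h>

#define NUMA_NO_NODE (-1)	/* "allocate on any node" */

/* Mirror of "node_online(node) ? node : NUMA_NO_NODE" from the diff,
 * with the online map passed in explicitly. */
int alloc_node_for(int node, const bool *online, int nr_nodes)
{
	if (node >= 0 && node < nr_nodes && online[node])
		return node;
	return NUMA_NO_NODE;
}
```

With online[] = { true, false, true, false }, nodes 0 and 2 keep their own
id and nodes 1 and 3 fall back to NUMA_NO_NODE, so a per-node structure is
still allocated for every possible node but never requested from an offline
one.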

* powerpc -next rebase WARNING
From: Benjamin Herrenschmidt @ 2012-05-22  1:51 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Linux Kernel list

Folks, bad news ... my fault.

I accidentally forgot a --signoff on a git am command last week, meaning
that a pair of patches are in -next and not signed off by me.

For various (legal) reasons that cannot go into Linus' tree as-is, so I
have to rebase the tree to fix it.

Sorry about that ...

Cheers,
Ben.

^ permalink raw reply

* linux-next: PowerPC boot failures in next-20120521
From: Stephen Rothwell @ 2012-05-22  1:40 UTC (permalink / raw)
  To: LKML
  Cc: Lee Schermerhorn, Peter Zijlstra, Linus, linux-next,
	H. Peter Anvin, Thomas Gleixner, ppc-dev, Ingo Molnar


[-- Attachment #1.1: Type: text/plain, Size: 2690 bytes --]

Hi all,

Last night's boot tests on various PowerPC systems failed like this:

calling  .numa_group_init+0x0/0x3c @ 1
initcall .numa_group_init+0x0/0x3c returned 0 after 0 usecs
calling  .numa_init+0x0/0x1dc @ 1
Unable to handle kernel paging request for data at address 0x00001688
Faulting instruction address: 0xc00000000016e154
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=32 NUMA pSeries
Modules linked in:
NIP: c00000000016e154 LR: c0000000001b9140 CTR: 0000000000000000
REGS: c0000003fc8c76d0 TRAP: 0300   Not tainted  (3.4.0-autokern1)
MSR: 8000000000009032 <SF,EE,ME,IR,DR,RI>  CR: 24044022  XER: 00000003
SOFTE: 1
CFAR: 000000000000562c
DAR: 0000000000001688, DSISR: 40000000
TASK = c0000003fc8c8000[1] 'swapper/0' THREAD: c0000003fc8c4000 CPU: 0
GPR00: 0000000000000000 c0000003fc8c7950 c000000000d05b30 00000000000012d0 
GPR04: 0000000000000000 0000000000001680 0000000000000000 c0000003fe032f60 
GPR08: 0004005400000001 0000000000000000 ffffffffffffc980 c000000000d24fe0 
GPR12: 0000000024044024 c00000000f33b000 0000000001a3fa78 00000000009bac00 
GPR16: 0000000000e1f338 0000000002d513f0 0000000000001680 0000000000000000 
GPR20: 0000000000000001 c0000003fc8c7c00 0000000000000000 0000000000000001 
GPR24: 0000000000000001 c000000000d1b490 0000000000000000 0000000000001680 
GPR28: 0000000000000000 0000000000000000 c000000000c7ce58 c0000003fe009200 
NIP [c00000000016e154] .__alloc_pages_nodemask+0xc4/0x8f0
LR [c0000000001b9140] .new_slab+0xd0/0x3c0
Call Trace:
[c0000003fc8c7950] [2e6e756d615f696e] 0x2e6e756d615f696e (unreliable)
[c0000003fc8c7ae0] [c0000000001b9140] .new_slab+0xd0/0x3c0
[c0000003fc8c7b90] [c0000000001b9844] .__slab_alloc+0x254/0x5b0
[c0000003fc8c7cd0] [c0000000001bb7a4] .kmem_cache_alloc_node_trace+0x94/0x260
[c0000003fc8c7d80] [c000000000ba36d0] .numa_init+0x98/0x1dc
[c0000003fc8c7e10] [c00000000000ace4] .do_one_initcall+0x1a4/0x1e0
[c0000003fc8c7ed0] [c000000000b7b354] .kernel_init+0x124/0x2e0
[c0000003fc8c7f90] [c0000000000211c8] .kernel_thread+0x54/0x70
Instruction dump:
5400d97e 7b170020 0b000000 eb3e8000 3b800000 80190088 2f800000 40de0014 
7860efe2 787c6fe2 78000fa4 7f9c0378 <e81b0008> 83f90000 2fa00000 7fff1838 
---[ end trace 31fd0ba7d8756001 ]---

swapper/0 (1) used greatest stack depth: 10864 bytes left
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b

I may be completely wrong, but I guess the obvious target would be the
sched/numa branch that came in via the tip tree.

Config file attached.  I haven't had a chance to try to bisect this yet.

Anyone have any ideas?
-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

[-- Attachment #1.2: dotconfig.bz2 --]
[-- Type: application/octet-stream, Size: 15419 bytes --]

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* Re: Build regressions/improvements in v3.4
From: Geert Uytterhoeven @ 2012-05-21 20:56 UTC (permalink / raw)
  To: linux-kernel; +Cc: David Howells, Linuxppc-dev
In-Reply-To: <1337629862-23393-1-git-send-email-geert@linux-m68k.org>

On Mon, May 21, 2012 at 9:51 PM, Geert Uytterhoeven
<geert@linux-m68k.org> wrote:
> JFYI, when comparing v3.4 to v3.4-rc7[3], the summaries are:
> =C2=A0- build errors: +5/-14

  + error: via-pmu-event.c: undefined reference to
`.input_allocate_device':  => .init.text+0x8aa4)
  + error: via-pmu-event.c: undefined reference to
`.input_free_device':  => .init.text+0x8b68)
  + error: via-pmu-event.c: undefined reference to
`.input_register_device':  => .init.text+0x8b54)

powerpc-randconfig

  + kernel/fork.c: error: implicit declaration of function
'alloc_task_struct_node' [-Werror=implicit-function-declaration]:  => 266:2
  + kernel/fork.c: error: implicit declaration of function
'free_task_struct' [-Werror=implicit-function-declaration]:  => 174:2

frv-defconfig (ouch)

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply

* Re: [PATCH v2] usb: fsl_udc: errata - postpone freeing current dTD
From: Felipe Balbi @ 2012-05-21 19:04 UTC (permalink / raw)
  To: Christoph Fritz
  Cc: Ben Dooks, Chen Peter-B29397, Nicolas Ferre, Hans J. Koch,
	Fabio Estevam, Kukjin Kim, Russell King, Thomas Dahlmann,
	Sascha Hauer, Christian Hemp, Haojian Zhuang, Daniel Mack,
	Neil Zhang, Oliver Neukum, Eric Miao, Li Yang-R58472,
	Greg Kroah-Hartman, linux-usb@vger.kernel.org, Felipe Balbi,
	Ido Shayevitz, Estevam Fabio-R49496,
	linuxppc-dev@lists.ozlabs.org
In-Reply-To: <20120521065722.GA4363@mars>

[-- Attachment #1: Type: text/plain, Size: 1891 bytes --]

Hi,

On Mon, May 21, 2012 at 08:57:22AM +0200, Christoph Fritz wrote:
> USB controller may access a wrong address for the dTD (endpoint transfer
> descriptor) and then hang. This happens a lot when doing tests with
> g_ether module and iperf, a tool for measuring maximum TCP and UDP
> bandwidth.
> 
> This hardware bug is explained in detail by errata number 2858 for i.MX23:
> http://cache.freescale.com/files/dsp/doc/errata/IMX23CE.pdf
> 
> All (?) SOCs with an IP from chipidea suffer from this problem.
> mv_udc_core fixes this bug by commit daec765.  There still may be
> unfixed drivers.
> 
> Signed-off-by: Christoph Fritz <chf.fritz@googlemail.com>
> Signed-off-by: Christian Hemp <c.hemp@phytec.de>
> ---
>  drivers/usb/gadget/fsl_udc_core.c |   15 ++++++++++++++-
>  1 files changed, 14 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/usb/gadget/fsl_udc_core.c b/drivers/usb/gadget/fsl_udc_core.c
> index 55abfb6..72f2139 100644
> --- a/drivers/usb/gadget/fsl_udc_core.c
> +++ b/drivers/usb/gadget/fsl_udc_core.c
> @@ -65,6 +65,8 @@ static struct usb_sys_interface *usb_sys_regs;
>  /* it is initialized in probe()  */
>  static struct fsl_udc *udc_controller = NULL;
>  
> +static struct ep_td_struct *last_free_td;

I don't want to see global variables anymore. In fact, please convert
this to the new udc_start()/udc_stop() calls and use the generic
map/unmap routines.

That'll help you get rid of a bunch of useless code in the driver. After
that you should remove all <asm/*> header includes and drop the ARCH
dependency.

You can also drop the big-/little-endian helpers as you can make use of
the generic writel()/readl() routines.

Please make sure this series comes in with enough time to reach the v3.6
merge window in about 3 months.

You can put this fix together on that series after you drop the global.

-- 
balbi

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* Re: [patch] hvc_xen: NULL dereference on allocation failure
From: Konrad Rzeszutek Wilk @ 2012-05-21 14:40 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Greg Kroah-Hartman, kernel-janitors@vger.kernel.org,
	linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	Dan Carpenter, Alan Cox
In-Reply-To: <alpine.DEB.2.00.1205151119260.26786@kaball-desktop>

On Tue, May 15, 2012 at 11:20:23AM +0100, Stefano Stabellini wrote:
> On Tue, 15 May 2012, Dan Carpenter wrote:
> > If kzalloc() returns a NULL here, we pass a NULL to
> > xencons_disconnect_backend() which will cause an Oops.
> > 
> > Also I removed the __GFP_ZERO while I was at it since kzalloc() implies
> > __GFP_ZERO.
> > 
> > Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
> 
> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

applied.
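
For readers following the thread, the bug class Dan describes can be shown
outside the kernel. This is only a minimal userspace sketch, not the
hvc_xen code: sk_info, sk_disconnect(), sk_alloc_fail() and sk_probe() are
hypothetical names, calloc() stands in for kzalloc(), and the error codes
are illustrative. The point is that the allocation result is checked before
any cleanup helper that dereferences it can run.

```c
#include <assert.h>	/* used by callers that check the results */
#include <errno.h>
#include <stdlib.h>

/* Hypothetical console state; stands in for a driver's private struct. */
struct sk_info {
	int connected;
};

/* Cleanup helper that dereferences its argument unconditionally. */
static void sk_disconnect(struct sk_info *info)
{
	info->connected = 0;
}

/* Allocator that always fails, for exercising the error path. */
static void *sk_alloc_fail(size_t n, size_t size)
{
	(void)n;
	(void)size;
	return NULL;
}

/*
 * Allocate the state, then run a setup step that may fail.
 * The allocator is passed in so that failure can be injected.
 */
static int sk_probe(void *(*alloc)(size_t, size_t), int setup_fails,
		    struct sk_info **out)
{
	struct sk_info *info = alloc(1, sizeof(*info));

	*out = NULL;
	if (info == NULL)
		return -ENOMEM;	/* nothing to disconnect yet */

	info->connected = 1;
	if (setup_fails) {
		sk_disconnect(info);	/* safe: info is known to be valid */
		free(info);
		return -ENODEV;
	}

	*out = info;
	return 0;
}
```

Calling sk_probe(sk_alloc_fail, 0, &info) takes the early return and never
reaches sk_disconnect(); without the NULL check the helper would be handed a
NULL pointer.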

^ permalink raw reply

* RE: [PATCH v2] usb: fsl_udc: errata - postpone freeing current dTD
From: Chen Peter-B29397 @ 2012-05-21  7:25 UTC (permalink / raw)
  To: Christoph Fritz, Li Yang-R58472, Felipe Balbi, Greg Kroah-Hartman
  Cc: Oliver Neukum, Kukjin Kim, Eric Miao, Ben Dooks, Fabio Estevam,
	Sascha Hauer, linux-usb@vger.kernel.org, Nicolas Ferre,
	Haojian Zhuang, Ido Shayevitz, Thomas Dahlmann,
	Estevam Fabio-R49496, Hans J. Koch, Daniel Mack, Christian Hemp,
	Russell King, Neil Zhang, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <20120521065722.GA4363@mars>


>
> USB controller may access a wrong address for the dTD (endpoint transfer
> descriptor) and then hang. This happens a lot when doing tests with
> g_ether module and iperf, a tool for measuring maximum TCP and UDP
> bandwidth.
>
> This hardware bug is explained in detail by errata number 2858 for i.MX23:
> http://cache.freescale.com/files/dsp/doc/errata/IMX23CE.pdf
>

> All (?) SOCs with an IP from chipidea suffer from this problem.
> mv_udc_core fixes this bug by commit daec765.  There still may be
> unfixed drivers.
>
> Signed-off-by: Christoph Fritz <chf.fritz@googlemail.com>
> Signed-off-by: Christian Hemp <c.hemp@phytec.de>
> ---
>  drivers/usb/gadget/fsl_udc_core.c |   15 ++++++++++++++-
>  1 files changed, 14 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/usb/gadget/fsl_udc_core.c b/drivers/usb/gadget/fsl_udc_core.c
> index 55abfb6..72f2139 100644
> --- a/drivers/usb/gadget/fsl_udc_core.c
> +++ b/drivers/usb/gadget/fsl_udc_core.c
> @@ -65,6 +65,8 @@ static struct usb_sys_interface *usb_sys_regs;
>  /* it is initialized in probe()  */
>  static struct fsl_udc *udc_controller = NULL;
>
> +static struct ep_td_struct *last_free_td;
> +
>  static const struct usb_endpoint_descriptor
>  fsl_ep0_desc = {
>  	.bLength =		USB_DT_ENDPOINT_SIZE,
> @@ -180,8 +182,13 @@ static void done(struct fsl_ep *ep, struct fsl_req *req, int status)
>  		curr_td = next_td;
>  		if (j != req->dtd_count - 1) {
>  			next_td = curr_td->next_td_virt;
> +			dma_pool_free(udc->td_pool, curr_td, curr_td->td_dma);
> +		} else {
> +			if (last_free_td != NULL)
> +				dma_pool_free(udc->td_pool, last_free_td,
> +						last_free_td->td_dma);
> +			last_free_td = curr_td;
>  		}
> -		dma_pool_free(udc->td_pool, curr_td, curr_td->td_dma);
>  	}
>
>  	if (req->mapped) {
> @@ -2579,6 +2586,8 @@ static int __init fsl_udc_probe(struct platform_device *pdev)
>  		goto err_unregister;
>  	}
>
> +	last_free_td = NULL;
> +
>  	ret = usb_add_gadget_udc(&pdev->dev, &udc_controller->gadget);
>  	if (ret)
>  		goto err_del_udc;
> @@ -2633,6 +2642,10 @@ static int __exit fsl_udc_remove(struct platform_device *pdev)
>  	kfree(udc_controller->status_req);
>  	kfree(udc_controller->eps);
>
> +	if (last_free_td != NULL)
> +		dma_pool_free(udc_controller->td_pool, last_free_td,
> +				last_free_td->td_dma);
> +
>  	dma_pool_destroy(udc_controller->td_pool);
>  	free_irq(udc_controller->irq, udc_controller);
>  	iounmap(dr_regs);

Reviewed-by: Peter Chen <peter.chen@freescale.com>
> --
> 1.7.2.5
>
>

^ permalink raw reply

* [PATCH v2] usb: fsl_udc: errata - postpone freeing current dTD
From: Christoph Fritz @ 2012-05-21  6:57 UTC (permalink / raw)
  To: Chen Peter-B29397, Li Yang-R58472, Felipe Balbi,
	Greg Kroah-Hartman
  Cc: Oliver Neukum, Kukjin Kim, Eric Miao, Ben Dooks, Fabio Estevam,
	Sascha Hauer, linux-usb@vger.kernel.org, Nicolas Ferre,
	Haojian Zhuang, Ido Shayevitz, Thomas Dahlmann,
	Estevam Fabio-R49496, Hans J. Koch, Daniel Mack, Christian Hemp,
	Russell King, Neil Zhang, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <1337583221.3394.21.camel@mars>

USB controller may access a wrong address for the dTD (endpoint transfer
descriptor) and then hang. This happens a lot when doing tests with
g_ether module and iperf, a tool for measuring maximum TCP and UDP
bandwidth.

This hardware bug is explained in detail by errata number 2858 for i.MX23:
http://cache.freescale.com/files/dsp/doc/errata/IMX23CE.pdf

All (?) SOCs with an IP from chipidea suffer from this problem.
mv_udc_core fixes this bug by commit daec765.  There still may be
unfixed drivers.

Signed-off-by: Christoph Fritz <chf.fritz@googlemail.com>
Signed-off-by: Christian Hemp <c.hemp@phytec.de>
---
 drivers/usb/gadget/fsl_udc_core.c |   15 ++++++++++++++-
 1 files changed, 14 insertions(+), 1 deletions(-)

diff --git a/drivers/usb/gadget/fsl_udc_core.c b/drivers/usb/gadget/fsl_udc_core.c
index 55abfb6..72f2139 100644
--- a/drivers/usb/gadget/fsl_udc_core.c
+++ b/drivers/usb/gadget/fsl_udc_core.c
@@ -65,6 +65,8 @@ static struct usb_sys_interface *usb_sys_regs;
 /* it is initialized in probe()  */
 static struct fsl_udc *udc_controller = NULL;
 
+static struct ep_td_struct *last_free_td;
+
 static const struct usb_endpoint_descriptor
 fsl_ep0_desc = {
 	.bLength =		USB_DT_ENDPOINT_SIZE,
@@ -180,8 +182,13 @@ static void done(struct fsl_ep *ep, struct fsl_req *req, int status)
 		curr_td = next_td;
 		if (j != req->dtd_count - 1) {
 			next_td = curr_td->next_td_virt;
+			dma_pool_free(udc->td_pool, curr_td, curr_td->td_dma);
+		} else {
+			if (last_free_td != NULL)
+				dma_pool_free(udc->td_pool, last_free_td,
+						last_free_td->td_dma);
+			last_free_td = curr_td;
 		}
-		dma_pool_free(udc->td_pool, curr_td, curr_td->td_dma);
 	}
 
 	if (req->mapped) {
@@ -2579,6 +2586,8 @@ static int __init fsl_udc_probe(struct platform_device *pdev)
 		goto err_unregister;
 	}
 
+	last_free_td = NULL;
+
 	ret = usb_add_gadget_udc(&pdev->dev, &udc_controller->gadget);
 	if (ret)
 		goto err_del_udc;
@@ -2633,6 +2642,10 @@ static int __exit fsl_udc_remove(struct platform_device *pdev)
 	kfree(udc_controller->status_req);
 	kfree(udc_controller->eps);
 
+	if (last_free_td != NULL)
+		dma_pool_free(udc_controller->td_pool, last_free_td,
+				last_free_td->td_dma);
+
 	dma_pool_destroy(udc_controller->td_pool);
 	free_irq(udc_controller->irq, udc_controller);
 	iounmap(dr_regs);
-- 
1.7.2.5
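
The deferral that the done() hunk introduces can be illustrated outside the
kernel. What follows is a minimal userspace sketch, not the driver code:
struct td, struct td_owner, td_alloc(), td_complete() and
td_owner_teardown() are hypothetical names, and free() stands in for
dma_pool_free(). As an assumption that goes beyond this patch, the
held-back descriptor lives in a per-owner structure rather than in a
file-scope variable.

```c
#include <assert.h>	/* used by callers that check the results */
#include <stdlib.h>

/* Hypothetical stand-in for a transfer descriptor. */
struct td {
	int id;
};

/* Hypothetical per-controller state holding the one postponed descriptor. */
struct td_owner {
	struct td *last_free_td;
	int freed;	/* number of descriptors actually released */
};

static struct td *td_alloc(int id)
{
	struct td *t = malloc(sizeof(*t));

	if (t != NULL)
		t->id = id;
	return t;
}

static void td_free(struct td_owner *o, struct td *t)
{
	free(t);
	o->freed++;
}

/*
 * Release the descriptors of one completed request. All but the last are
 * freed at once. The last one is parked, and the descriptor parked by the
 * previous request is freed in its place.
 */
static void td_complete(struct td_owner *o, struct td **tds, int count)
{
	for (int j = 0; j < count; j++) {
		if (j != count - 1) {
			td_free(o, tds[j]);
		} else {
			if (o->last_free_td != NULL)
				td_free(o, o->last_free_td);
			o->last_free_td = tds[j];
		}
	}
}

/* On teardown, release whatever is still parked. */
static void td_owner_teardown(struct td_owner *o)
{
	if (o->last_free_td != NULL) {
		td_free(o, o->last_free_td);
		o->last_free_td = NULL;
	}
}
```

At any time at most one descriptor is held back per owner, so the cost of
the workaround in this sketch is a single extra allocation that stays live
until the next completion or teardown.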

^ permalink raw reply related

* RE: [PATCH] usb: fsl_udc: errata - postpone freeing current dTD
From: Christoph Fritz @ 2012-05-21  6:53 UTC (permalink / raw)
  To: Chen Peter-B29397
  Cc: Oliver Neukum, Kukjin Kim, Eric Miao, Li Yang-R58472,
	Greg Kroah-Hartman, Sascha Hauer, linux-usb@vger.kernel.org,
	Nicolas Ferre, linuxppc-dev@lists.ozlabs.org, Felipe Balbi,
	Ido Shayevitz, Thomas Dahlmann, Estevam Fabio-R49496,
	Hans J. Koch, Haojian Zhuang, Daniel Mack, Christian Hemp,
	Russell King, Neil Zhang, Fabio Estevam, Ben Dooks
In-Reply-To: <F281D0F91ED19E4D8E63A7504E8A649803BB1E25@039-SN2MPN1-023.039d.mgd.msft.net>

Hi Chen,

On Mon, 2012-05-21 at 01:05 +0000, Chen Peter-B29397 wrote:
> 
>  
> > 
> > USB controller may access a wrong address for the dTD (endpoint transfer
> > descriptor) and then hang. This happens a lot when doing tests with
> > g_ether module and iperf, a tool for measuring maximum TCP and UDP
> > bandwidth.
> > 
> > This hardware bug is explained in detail by errata number 2858 for i.MX23:
> > http://cache.freescale.com/files/dsp/doc/errata/IMX23CE.pdf
> > 
> 
> Does this patch fix your problem?

yes, it does! :-)

> > +#if defined CONFIG_ARCH_MX51 || defined CONFIG_SOC_IMX35
> > +#define POSTPONE_FREE_LAST_DTD
> > +#else
> > +#undef POSTPONE_FREE_LAST_DTD
> > +#endif
> > +
> All i.MX SoCs have this problem. If PowerPC also has it, you can
> delete the #if defined. Otherwise, you can define it for all i.MX SoCs
> (CONFIG_ARCH_MXC || CONFIG_ARCH_MXS).

I was unsure about this too. I can only test imx35. And mx51 was defined
in your tree. So these two should be defined anyway.

Marvell doesn't use any defines in mv_udc_core for their fix, so I would
be fine without ifdefs too. Any objections? Please see next mail with
patch v2.


Thanks,
 -- Christoph

^ permalink raw reply

* RE: ppc/sata-fsl: orphan config value: CONFIG_MPC8315_DS
From: Li Yang-R58472 @ 2012-05-21  6:31 UTC (permalink / raw)
  To: Anthony Foiani, linuxppc-dev@lists.ozlabs.org
  Cc: Robert P.J.Day, Jeff Garzik, ashish kalra, Adrian Bunk
In-Reply-To: <ghave7n25.fsf@dworkin.scrye.com>



> -----Original Message-----
> From: Anthony Foiani [mailto:tkil@scrye.com]
> Sent: Friday, May 18, 2012 1:08 AM
> To: linuxppc-dev@lists.ozlabs.org
> Cc: ashish kalra; Li Yang-R58472; Jeff Garzik; Robert P.J.Day; Adrian
> Bunk
> Subject: ppc/sata-fsl: orphan config value: CONFIG_MPC8315_DS
>
>
> Greetings.
>
> I was occasionally running into problems at boot time on an MPC8315-based
> board (derived from the MPC831xRDB, apparently), using SATA to talk to an
> SSD.  My vendor suggested that I enable CONFIG_MPC8315_DS.
>
> That symbol is only found once in the entire kernel codebase:
>
>   $ git checkout v3.4-rc7
>   HEAD is now at 36be505... Linux 3.4-rc7
>
>   $ git grep -nH CONFIG_MPC8315_DS
>   drivers/ata/sata_fsl.c:729:#ifdef CONFIG_MPC8315_DS
>
> There is no kconfig support for it at all.
>
> It was added in 2007; further, this is the only commit in the entire git
> history that contains this string:
>
>    commit e7eac96e8f0e57a6e9f94943557bc2b23be31471
>    Author: ashish kalra <ashish.kalra@freescale.com>
>    Date:   Wed Oct 31 19:28:02 2007 +0800
>
>        ata/sata_fsl: Move MPC8315DS link speed limit workaround to specific ifdef
>
>        Signed-off-by: ashish kalra <ashish.kalra@freescale.com>
>        Signed-off-by: Li Yang <leoli@freescale.com>
>        Signed-off-by: Jeff Garzik <jeff@garzik.org>
>
>    diff --git a/drivers/ata/sata_fsl.c b/drivers/ata/sata_fsl.c
>    index 5892472..e076e1f 100644
>    --- a/drivers/ata/sata_fsl.c
>    +++ b/drivers/ata/sata_fsl.c
>    @@ -652,6 +652,7 @@ static int sata_fsl_port_start(struct ata_port *ap)
>            VPRINTK("HControl = 0x%x\n", ioread32(hcr_base + HCONTROL));
>            VPRINTK("CHBA  = 0x%x\n", ioread32(hcr_base + CHBA));
>
>    +#ifdef CONFIG_MPC8315_DS
>            /*
>             * Workaround for 8315DS board 3gbps link-up issue,
>             * currently limit SATA port to GEN1 speed
>    @@ -664,6 +665,7 @@ static int sata_fsl_port_start(struct ata_port *ap)
>            sata_fsl_scr_read(ap, SCR_CONTROL, &temp);
>            dev_printk(KERN_WARNING, dev, "scr_control, speed limited to %x\n",
>                            temp);
>    +#endif
>
>            return 0;
>     }
>
> This otherwise-unsupported variable was noted by Robert Day in 2008;
> Adrian Bunk suggested a patch, but the Freescale folks said that it was
> for a not-yet-mainlined board, so the patch was dropped:
>
>    http://marc.info/?l=linux-ide&m=121783965216004&w=2
>
> As Robert noticed again in 2010, it still wasn't mainlined:
>
>    http://marc.info/?l=linux-ide&m=121783965216004&w=2
>
> And, obviously, it still isn't today.
>=20
> Can the Freescale people tell us exactly what we should be testing to
> determine when to enforce this restriction?  A config variable that
> points to a non-existent board doesn't seem much help.

Thanks for bringing it up again.  Looks like we do have a problem here.

Btw, did it help with your problem by enabling it?

Leo

^ permalink raw reply

* Re: [PATCH] cpuidle: (POWER) Replace pseries_notify_cpuidle_add call with a elegant notifier to fix lockdep problem in start_secondary
From: Deepthi Dharwar @ 2012-05-21  4:55 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Paul Mackerras, Paul E. McKenney, PowerPC email list,
	linux-kernel@vger.kernel.org, Li Zhong
In-Reply-To: <1337561221.2458.1.camel@pasglop>

Hi Ben,

On 05/21/2012 06:17 AM, Benjamin Herrenschmidt wrote:

> On Fri, 2012-05-18 at 18:58 +0530, Deepthi Dharwar wrote:
>> This patch removes the pseries_notify_cpuidle_add_cpu() call and
>> replaces it with a CPU hotplug notifier.
>> This prevents cpuidle resources from being released and allocated
>> each time a CPU comes online on pseries.
>> The earlier design caused a lockdep problem in start_secondary,
>> as reported in this thread:
>>         https://lkml.org/lkml/2012/5/17/2
>>
>> This applies on 3.4-rc7
>>
>> Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
>> ---
> 
> Any reason why you don't do cpuidle_disable_device() when the
> CPU is going offline and cpuidle_enable_device() when it's coming
> back ?

  In the current design, disable and enable device are both called
  when the CPU comes online. This makes sure that we clean up and
  re-register; all the counters are reset.

  Not disabling the device when the CPU goes offline currently only
  means that the counters are retained while the CPU is offline.

  But I could add an offline check and disable the device there, if that
  results in a cleaner design.  I will test and send the patch along with
  a couple more pseries-idle fixes soon.

  Thanks for your review comments!

> I'm applying the patch for now since it fixes a real problem but
> if the above makes sense, please send a followup fix.
> 
> Cheers,
> Ben.
> 
>>  arch/powerpc/include/asm/processor.h            |    2 --
>>  arch/powerpc/platforms/pseries/processor_idle.c |   25
>> +++++++++++++++++------
>>  arch/powerpc/platforms/pseries/smp.c            |    1 -
>>  3 files changed, 19 insertions(+), 9 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/processor.h
>> b/arch/powerpc/include/asm/processor.h
>> index 8e2d037..c6bc22b 100644
>> --- a/arch/powerpc/include/asm/processor.h
>> +++ b/arch/powerpc/include/asm/processor.h
>> @@ -390,10 +390,8 @@ void cpu_idle_wait(void);
>>
>>  #ifdef CONFIG_PSERIES_IDLE
>>  extern void update_smt_snooze_delay(int snooze);
>> -extern int pseries_notify_cpuidle_add_cpu(int cpu);
>>  #else
>>  static inline void update_smt_snooze_delay(int snooze) {}
>> -static inline int pseries_notify_cpuidle_add_cpu(int cpu) { return 0; }
>>  #endif
>>
>>  extern void flush_instruction_cache(void);
>> diff --git a/arch/powerpc/platforms/pseries/processor_idle.c
>> b/arch/powerpc/platforms/pseries/processor_idle.c
>> index 41a34bc..d1a7dc0 100644
>> --- a/arch/powerpc/platforms/pseries/processor_idle.c
>> +++ b/arch/powerpc/platforms/pseries/processor_idle.c
>> @@ -11,6 +11,7 @@
>>  #include <linux/moduleparam.h>
>>  #include <linux/cpuidle.h>
>>  #include <linux/cpu.h>
>> +#include <linux/notifier.h>
>>
>>  #include <asm/paca.h>
>>  #include <asm/reg.h>
>> @@ -186,17 +187,28 @@ static struct cpuidle_state
>> shared_states[MAX_IDLE_STATE_COUNT] = {
>>  		.enter = &shared_cede_loop },
>>  };
>>
>> -int pseries_notify_cpuidle_add_cpu(int cpu)
>> +static int pseries_cpuidle_add_cpu_notifier(struct notifier_block *n,
>> +			unsigned long action, void *hcpu)
>>  {
>> +	int hotcpu = (unsigned long)hcpu;
>>  	struct cpuidle_device *dev =
>> -			per_cpu_ptr(pseries_cpuidle_devices, cpu);
>> -	if (dev && cpuidle_get_driver()) {
>> -		cpuidle_disable_device(dev);
>> -		cpuidle_enable_device(dev);
>> +			per_cpu_ptr(pseries_cpuidle_devices, hotcpu);
>> +
>> +	switch (action & 0xf) {
>> +	case CPU_ONLINE:
>> +		if (dev && cpuidle_get_driver()) {
>> +			cpuidle_disable_device(dev);
>> +			cpuidle_enable_device(dev);
>> +		}
>> +		break;
>>  	}
>> -	return 0;
>> +	return NOTIFY_OK;
>>  }
>>
>> +static struct notifier_block setup_hotplug_notifier = {
>> +	.notifier_call = pseries_cpuidle_add_cpu_notifier,
>> +};
>> +
>>  /*
>>   * pseries_cpuidle_driver_init()
>>   */
>> @@ -321,6 +333,7 @@ static int __init pseries_processor_idle_init(void)
>>  		return retval;
>>  	}
>>
>> +	register_cpu_notifier(&setup_hotplug_notifier);
>>  	printk(KERN_DEBUG "pseries_idle_driver registered\n");
>>
>>  	return 0;
>> diff --git a/arch/powerpc/platforms/pseries/smp.c
>> b/arch/powerpc/platforms/pseries/smp.c
>> index e16bb8d..71706bc 100644
>> --- a/arch/powerpc/platforms/pseries/smp.c
>> +++ b/arch/powerpc/platforms/pseries/smp.c
>> @@ -147,7 +147,6 @@ static void __devinit smp_xics_setup_cpu(int cpu)
>>  	set_cpu_current_state(cpu, CPU_STATE_ONLINE);
>>  	set_default_offline_state(cpu);
>>  #endif
>> -	pseries_notify_cpuidle_add_cpu(cpu);
>>  }
>>
>>  static int __devinit smp_pSeries_kick_cpu(int nr)
> 

^ permalink raw reply

* Re: [PATCH RESEND] cpuidle: (POWER) Replace pseries_notify_cpuidle_add call with a elegant notifier to fix lockdep problem in start_secondary
From: Deepthi Dharwar @ 2012-05-21  4:34 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Paul Mackerras, Paul E. McKenney, PowerPC email list,
	linux-kernel@vger.kernel.org, Li Zhong
In-Reply-To: <1337561378.2458.3.camel@pasglop>


This patch removes the pseries_notify_cpuidle_add_cpu() call and
replaces it with a CPU hotplug notifier.
This prevents cpuidle resources from being released and allocated
each time a CPU comes online on pseries.
The earlier design caused a lockdep problem in start_secondary,
as reported in this thread:
	https://lkml.org/lkml/2012/5/17/2

This applies on 3.4-rc7

Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/processor.h            |    2 --
 arch/powerpc/platforms/pseries/processor_idle.c |   25 +++++++++++++++++------
 arch/powerpc/platforms/pseries/smp.c            |    1 -
 3 files changed, 19 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index 8e2d037..c6bc22b 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -390,10 +390,8 @@ void cpu_idle_wait(void);
 
 #ifdef CONFIG_PSERIES_IDLE
 extern void update_smt_snooze_delay(int snooze);
-extern int pseries_notify_cpuidle_add_cpu(int cpu);
 #else
 static inline void update_smt_snooze_delay(int snooze) {}
-static inline int pseries_notify_cpuidle_add_cpu(int cpu) { return 0; }
 #endif
 
 extern void flush_instruction_cache(void);
diff --git a/arch/powerpc/platforms/pseries/processor_idle.c b/arch/powerpc/platforms/pseries/processor_idle.c
index 41a34bc..d1a7dc0 100644
--- a/arch/powerpc/platforms/pseries/processor_idle.c
+++ b/arch/powerpc/platforms/pseries/processor_idle.c
@@ -11,6 +11,7 @@
 #include <linux/moduleparam.h>
 #include <linux/cpuidle.h>
 #include <linux/cpu.h>
+#include <linux/notifier.h>
 
 #include <asm/paca.h>
 #include <asm/reg.h>
@@ -186,17 +187,28 @@ static struct cpuidle_state shared_states[MAX_IDLE_STATE_COUNT] = {
 		.enter = &shared_cede_loop },
 };
 
-int pseries_notify_cpuidle_add_cpu(int cpu)
+static int pseries_cpuidle_add_cpu_notifier(struct notifier_block *n,
+			unsigned long action, void *hcpu)
 {
+	int hotcpu = (unsigned long)hcpu;
 	struct cpuidle_device *dev =
-			per_cpu_ptr(pseries_cpuidle_devices, cpu);
-	if (dev && cpuidle_get_driver()) {
-		cpuidle_disable_device(dev);
-		cpuidle_enable_device(dev);
+			per_cpu_ptr(pseries_cpuidle_devices, hotcpu);
+
+	switch (action & 0xf) {
+	case CPU_ONLINE:
+		if (dev && cpuidle_get_driver()) {
+			cpuidle_disable_device(dev);
+			cpuidle_enable_device(dev);
+		}
+		break;
 	}
-	return 0;
+	return NOTIFY_OK;
 }
 
+static struct notifier_block setup_hotplug_notifier = {
+	.notifier_call = pseries_cpuidle_add_cpu_notifier,
+};
+
 /*
  * pseries_cpuidle_driver_init()
  */
@@ -321,6 +333,7 @@ static int __init pseries_processor_idle_init(void)
 		return retval;
 	}
 
+	register_cpu_notifier(&setup_hotplug_notifier);
 	printk(KERN_DEBUG "pseries_idle_driver registered\n");
 
 	return 0;
diff --git a/arch/powerpc/platforms/pseries/smp.c b/arch/powerpc/platforms/pseries/smp.c
index e16bb8d..71706bc 100644
--- a/arch/powerpc/platforms/pseries/smp.c
+++ b/arch/powerpc/platforms/pseries/smp.c
@@ -147,7 +147,6 @@ static void __devinit smp_xics_setup_cpu(int cpu)
 	set_cpu_current_state(cpu, CPU_STATE_ONLINE);
 	set_default_offline_state(cpu);
 #endif
-	pseries_notify_cpuidle_add_cpu(cpu);
 }
 
 static int __devinit smp_pSeries_kick_cpu(int nr)
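
The action dispatch in the notifier callback above can be tried in
userspace. This is a minimal sketch under stated assumptions, not the
kernel code: the SK_* constants use illustrative values (the real ones are
defined in <linux/cpu.h>), sk_idle_dev and sk_cpuidle_notifier() are
hypothetical names, and the SK_CPU_DEAD branch is not part of this patch. It
only models the disable-on-offline follow-up suggested in review. The sketch
masks out a "tasks frozen" flag so that suspend/resume variants of an event
take the same branch as the plain event.

```c
#include <assert.h>	/* used by callers that check the results */

/* Illustrative values only; the real constants live in <linux/cpu.h>. */
enum {
	SK_CPU_ONLINE       = 0x2,
	SK_CPU_DEAD         = 0x7,
	SK_CPU_TASKS_FROZEN = 0x10,	/* flag OR-ed into the action */
};

#define SK_NR_CPUS 4

/* Hypothetical per-CPU idle device state. */
struct sk_idle_dev {
	int enabled;
	int resets;	/* number of times the device was (re-)enabled */
};

static struct sk_idle_dev sk_devs[SK_NR_CPUS];

/* Returns 0 when the event was processed, -1 for an out-of-range CPU. */
static int sk_cpuidle_notifier(unsigned long action, int cpu)
{
	struct sk_idle_dev *dev;

	if (cpu < 0 || cpu >= SK_NR_CPUS)
		return -1;
	dev = &sk_devs[cpu];

	/* Strip the frozen flag so both variants share one case label. */
	switch (action & ~(unsigned long)SK_CPU_TASKS_FROZEN) {
	case SK_CPU_ONLINE:
		dev->enabled = 1;	/* models disable followed by enable */
		dev->resets++;
		break;
	case SK_CPU_DEAD:
		dev->enabled = 0;	/* assumed follow-up, not in the patch */
		break;
	}
	return 0;
}
```

Events the callback does not care about fall through the switch untouched,
which mirrors how the patch returns NOTIFY_OK for every action.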

^ permalink raw reply related

