linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* 4.0.0-rc4: panic in free_block
@ 2015-03-20 15:07 David Ahern
  2015-03-20 16:48 ` Linus Torvalds
  2015-03-20 21:17 ` Linus Torvalds
  0 siblings, 2 replies; 46+ messages in thread
From: David Ahern @ 2015-03-20 15:07 UTC (permalink / raw)
  To: linux-mm; +Cc: LKML, Linus Torvalds

I can easily reproduce the panic below doing a kernel build with make -j 
N, N=128, 256, etc. This is a 1024 cpu system running 4.0.0-rc4.

The top 3 frames are consistently:
     free_block+0x60
     cache_flusharray+0xac
     kmem_cache_free+0xfc

After that one path has been from __mmdrop and the others are like 
below, from remove_vma.

Unable to handle kernel paging request at virtual address 0006100000000000
tsk->{mm,active_mm}->context = 00000000000000ce
tsk->{mm,active_mm}->pgd = fff8803b56698000
               \|/ ____ \|/
               "@'/ .. \`@"
               /_| \__/ |_\
                  \__U_/
sh(173167): Oops [#1]
CPU: 760 PID: 173167 Comm: sh Not tainted 4.0.0-rc4+ #1
task: fff8803b4e928b00 ti: fff8803b51800000 task.ti: fff8803b51800000
TSTATE: 0000009911e01601 TPC: 000000000055de88 TNPC: 000000000055de8c Y: 
000003f0    Not tainted
TPC: <free_block+0x60/0x16c>
g0: fff1ffef00000001 g1: fff8803b75826e00 g2: 0000000000100100 g3: 
0000100000000000
g4: fff8803b4e928b00 g5: fff8803b76664000 g6: fff8803b51800000 g7: 
0006000000000000
o0: 0000000000008000 o1: 0000000000000046 o2: 0000000000000023 o3: 
00060100767db6a0
o4: 0000000000000000 o5: 0000000000000015 sp: fff8803b51802d11 ret_pc: 
fff8803b75826e08
RPC: <0xfff8803b75826e08>
l0: 0000000000200200 l1: 0000000000c005e8 l2: 0000000000d7e2b8 l3: 
fff8803b3edb4000
l4: 0000000000000007 l5: 0000000000000001 l6: 00000000000000b1 l7: 
ffffffffffefffff
i0: fff8000050409c60 i1: fff8803b7738f168 i2: 000000000000003c i3: 
fff8803b75826e28
i4: fff8803b51803670 i5: 0000000000000000 i6: fff8803b51802dc1 i7: 
000000000055eaa4
I7: <cache_flusharray+0xac/0xf4>
Call Trace:
  [000000000055eaa4] cache_flusharray+0xac/0xf4
  [000000000055e66c] kmem_cache_free+0xfc/0x1ac
  [000000000054139c] remove_vma+0x68/0x78
  [00000000005414ac] exit_mmap+0x100/0x130
  [000000000045acb4] mmput+0x50/0xe8
  [000000000056c284] flush_old_exec+0x500/0x5d8
  [00000000005b0614] load_elf_binary+0x254/0xff4
  [000000000056ba70] search_binary_handler+0xa4/0x28c
  [000000000056d068] do_execveat_common+0x44c/0x624
  [000000000056d3e0] do_execve+0x34/0x48
  [000000000056d40c] SyS_execve+0x18/0x2c
  [0000000000406254] linux_sparc_syscall+0x34/0x44
Disabling lock debugging due to kernel taint
Caller[000000000055eaa4]: cache_flusharray+0xac/0xf4
Caller[000000000055e66c]: kmem_cache_free+0xfc/0x1ac
Caller[000000000054139c]: remove_vma+0x68/0x78
Caller[00000000005414ac]: exit_mmap+0x100/0x130
Caller[000000000045acb4]: mmput+0x50/0xe8
Caller[000000000056c284]: flush_old_exec+0x500/0x5d8
Caller[00000000005b0614]: load_elf_binary+0x254/0xff4
Caller[000000000056ba70]: search_binary_handler+0xa4/0x28c
Caller[000000000056d068]: do_execveat_common+0x44c/0x624
Caller[000000000056d3e0]: do_execve+0x34/0x48
Caller[000000000056d40c]: SyS_execve+0x18/0x2c
Caller[0000000000406254]: linux_sparc_syscall+0x34/0x44
Caller[fff80001004134c8]: 0xfff80001004134c8
Instruction DUMP: 86230003  8730f00d  8728f006 <d658c007> 8600c007 
8e0ac008  2ac1c002  c658e030  d458e028

####

objdump for free_block on the vmlinux:

vmlinux-4.0.0-rc4+:     file format elf64-sparc


Disassembly of section .text:

000000000055de28 <free_block>:
free_block():
...
free_block():
/opt/dahern/linux.git/kbuild/../mm/slab.c:3265
   55de64:       10 68 00 47     b  %xcc, 55df80 <free_block+0x158>
   55de68:       85 30 b0 02     srlx  %g2, 2, %g2
clear_obj_pfmemalloc():
/opt/dahern/linux.git/kbuild/../mm/slab.c:224
   55de6c:       98 0b 3f fe     and  %o4, -2, %o4
   55de70:       d8 76 40 00     stx  %o4, [ %i1 ]
virt_to_head_page():
/opt/dahern/linux.git/kbuild/../include/linux/mm.h:554
   55de74:       c6 5c 80 00     ldx  [ %l2 ], %g3
   55de78:       ce 5c 40 00     ldx  [ %l1 ], %g7
   55de7c:       86 23 00 03     sub  %o4, %g3, %g3
   55de80:       87 30 f0 0d     srlx  %g3, 0xd, %g3
   55de84:       87 28 f0 06     sllx  %g3, 6, %g3
test_bit():
/opt/dahern/linux.git/kbuild/../include/asm-generic/bitops/non-atomic.h:105
   55de88:       d6 58 c0 07     ldx  [ %g3 + %g7 ], %o3
virt_to_head_page():
/opt/dahern/linux.git/kbuild/../include/linux/mm.h:554
   55de8c:       86 00 c0 07     add  %g3, %g7, %g3
...

Let me know if you need anything else.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: 4.0.0-rc4: panic in free_block
  2015-03-20 15:07 4.0.0-rc4: panic in free_block David Ahern
@ 2015-03-20 16:48 ` Linus Torvalds
  2015-03-20 16:53   ` David Ahern
  2015-03-20 21:17 ` Linus Torvalds
  1 sibling, 1 reply; 46+ messages in thread
From: Linus Torvalds @ 2015-03-20 16:48 UTC (permalink / raw)
  To: David Ahern, David S. Miller; +Cc: linux-mm, LKML, sparclinux

[ Added Davem and the sparc mailing list, since it happens on sparc
and that just makes me suspicious ]

On Fri, Mar 20, 2015 at 8:07 AM, David Ahern <david.ahern@oracle.com> wrote:
> I can easily reproduce the panic below doing a kernel build with make -j N,
> N=128, 256, etc. This is a 1024 cpu system running 4.0.0-rc4.

3.19 is fine? Because I dont' think I've seen any reports like this
for others, and what stands out is sparc (and to a lesser degree "1024
cpus", which obviously gets a lot less testing)

> The top 3 frames are consistently:
>     free_block+0x60
>     cache_flusharray+0xac
>     kmem_cache_free+0xfc
>
> After that one path has been from __mmdrop and the others are like below,
> from remove_vma.
>
> Unable to handle kernel paging request at virtual address 0006100000000000

One thing you *might* check is if the problem goes away if you select
CONFIG_SLUB instead of CONFIG_SLAB. I'd really like to just get rid of
SLAB. The whole "we have multiple different allocators" is a mess and
causes test coverage issues.

Apart from testing with CONFIG_SLUB, if 3.19 is ok and you seem to be
able to "easily reproduce" this, the obvious thing to do is to try to
bisect it.

                         Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: 4.0.0-rc4: panic in free_block
  2015-03-20 16:48 ` Linus Torvalds
@ 2015-03-20 16:53   ` David Ahern
  2015-03-20 16:58     ` Linus Torvalds
  0 siblings, 1 reply; 46+ messages in thread
From: David Ahern @ 2015-03-20 16:53 UTC (permalink / raw)
  To: Linus Torvalds, David S. Miller; +Cc: linux-mm, LKML, sparclinux

On 3/20/15 10:48 AM, Linus Torvalds wrote:
> [ Added Davem and the sparc mailing list, since it happens on sparc
> and that just makes me suspicious ]
>
> On Fri, Mar 20, 2015 at 8:07 AM, David Ahern <david.ahern@oracle.com> wrote:
>> I can easily reproduce the panic below doing a kernel build with make -j N,
>> N=128, 256, etc. This is a 1024 cpu system running 4.0.0-rc4.
>
> 3.19 is fine? Because I dont' think I've seen any reports like this
> for others, and what stands out is sparc (and to a lesser degree "1024
> cpus", which obviously gets a lot less testing)

I haven't tried 3.19 yet. Just backed up to 3.18 and it shows the same 
problem. And I can reproduce the 4.0 crash in a 128 cpu ldom (VM).

>
>> The top 3 frames are consistently:
>>      free_block+0x60
>>      cache_flusharray+0xac
>>      kmem_cache_free+0xfc
>>
>> After that one path has been from __mmdrop and the others are like below,
>> from remove_vma.
>>
>> Unable to handle kernel paging request at virtual address 0006100000000000
>
> One thing you *might* check is if the problem goes away if you select
> CONFIG_SLUB instead of CONFIG_SLAB. I'd really like to just get rid of
> SLAB. The whole "we have multiple different allocators" is a mess and
> causes test coverage issues.
>
> Apart from testing with CONFIG_SLUB, if 3.19 is ok and you seem to be
> able to "easily reproduce" this, the obvious thing to do is to try to
> bisect it.

I'll try SLUB. The ldom reboots 1000 times faster then resetting the h/w 
so a better chance of bisecting - if I can find a known good release.

David

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: 4.0.0-rc4: panic in free_block
  2015-03-20 16:53   ` David Ahern
@ 2015-03-20 16:58     ` Linus Torvalds
  2015-03-20 18:05       ` David Ahern
                         ` (2 more replies)
  0 siblings, 3 replies; 46+ messages in thread
From: Linus Torvalds @ 2015-03-20 16:58 UTC (permalink / raw)
  To: David Ahern; +Cc: David S. Miller, linux-mm, LKML, sparclinux

On Fri, Mar 20, 2015 at 9:53 AM, David Ahern <david.ahern@oracle.com> wrote:
>
> I haven't tried 3.19 yet. Just backed up to 3.18 and it shows the same
> problem. And I can reproduce the 4.0 crash in a 128 cpu ldom (VM).

Ok, so if 3.18 also has it, then trying 3.19 is pointless, this is
obviously an old problem. Which makes it even more likely that it's
sparc-related.

128 cpu's is still "unusual", of course, but by no means unheard of,
and I'f have expected others to report it too if it was wasy to
trigger on x86-64.

That said, SLAB is probably also almost unheard of in high-CPU
configurations, since slub has all the magical unlocked lists etc for
scalability. So maybe it's a generic SLAB bug, and nobody with lots of
CPU's is testing SLAB.

                      Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: 4.0.0-rc4: panic in free_block
  2015-03-20 16:58     ` Linus Torvalds
@ 2015-03-20 18:05       ` David Ahern
  2015-03-20 18:53         ` Linus Torvalds
  2015-03-20 19:47         ` David Miller
  2015-03-20 19:42       ` David Miller
  2015-03-20 20:01       ` Dave Hansen
  2 siblings, 2 replies; 46+ messages in thread
From: David Ahern @ 2015-03-20 18:05 UTC (permalink / raw)
  To: Linus Torvalds, linux-mm; +Cc: David S. Miller, LKML, sparclinux

On 3/20/15 10:58 AM, Linus Torvalds wrote:
> That said, SLAB is probably also almost unheard of in high-CPU
> configurations, since slub has all the magical unlocked lists etc for
> scalability. So maybe it's a generic SLAB bug, and nobody with lots of
> CPU's is testing SLAB.
>

Evidently, it is a well known problem internally that goes back to at 
least 2.6.39.

To this point I have not paid attention to the allocators. At what point 
is SLUB considered stable for large systems? Is 2.6.39 stable?

As for SLAB it is not clear if this is a sparc only problem. Perhaps the 
config should have a warning? It looks like SLAB is still the default 
for most arch.

DaveM: do you mind if I submit a patch to change the default for sparc 
to SLUB?

Now that the monster is unleashed, off to other problems...

Thanks,
David

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: 4.0.0-rc4: panic in free_block
  2015-03-20 18:05       ` David Ahern
@ 2015-03-20 18:53         ` Linus Torvalds
  2015-03-20 19:04           ` David Ahern
  2015-03-20 19:47         ` David Miller
  1 sibling, 1 reply; 46+ messages in thread
From: Linus Torvalds @ 2015-03-20 18:53 UTC (permalink / raw)
  To: David Ahern; +Cc: linux-mm, David S. Miller, LKML, sparclinux

On Fri, Mar 20, 2015 at 11:05 AM, David Ahern <david.ahern@oracle.com> wrote:
>
> Evidently, it is a well known problem internally that goes back to at least
> 2.6.39.
>
> To this point I have not paid attention to the allocators. At what point is
> SLUB considered stable for large systems? Is 2.6.39 stable?

SLUB should definitely be considered a stable allocator.  It's the
default allocator for at least Fedora, and that presumably means all
of Redhat.

SuSE seems to use SLAB still, though, so it must be getting lots of
testing on x86 too.

Did you test with SLUB? Does it work there?

> As for SLAB it is not clear if this is a sparc only problem. Perhaps the
> config should have a warning? It looks like SLAB is still the default for
> most arch.

I definitely think SLAB should work (although I *would* like to get
rid of the duplicate allocators), and I still do think it's likely a
sparc issue. Especially as it apparently goes way back. x86 gets a
*lot* more testing, and I don't really recall anything like this.

I'm not comfy enough reading sparc asm code to really pinpoint exactly
what looks to go wrong, so I have very little input on what the
problem could be.

                         Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: 4.0.0-rc4: panic in free_block
  2015-03-20 18:53         ` Linus Torvalds
@ 2015-03-20 19:04           ` David Ahern
  0 siblings, 0 replies; 46+ messages in thread
From: David Ahern @ 2015-03-20 19:04 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-mm, David S. Miller, LKML, sparclinux

On 3/20/15 12:53 PM, Linus Torvalds wrote:
> SLUB should definitely be considered a stable allocator.  It's the
> default allocator for at least Fedora, and that presumably means all
> of Redhat.
>
> SuSE seems to use SLAB still, though, so it must be getting lots of
> testing on x86 too.
>
> Did you test with SLUB? Does it work there?

sorry, forgot to add that detail in the last response: it works fine. No 
panics at all with SLUB.

David

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: 4.0.0-rc4: panic in free_block
  2015-03-20 16:58     ` Linus Torvalds
  2015-03-20 18:05       ` David Ahern
@ 2015-03-20 19:42       ` David Miller
  2015-03-20 20:01       ` Dave Hansen
  2 siblings, 0 replies; 46+ messages in thread
From: David Miller @ 2015-03-20 19:42 UTC (permalink / raw)
  To: torvalds; +Cc: david.ahern, linux-mm, linux-kernel, sparclinux

From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Fri, 20 Mar 2015 09:58:25 -0700

> 128 cpu's is still "unusual"

As unusual as the system I do all of my kernel builds on :-)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: 4.0.0-rc4: panic in free_block
  2015-03-20 18:05       ` David Ahern
  2015-03-20 18:53         ` Linus Torvalds
@ 2015-03-20 19:47         ` David Miller
  2015-03-20 19:54           ` David Ahern
  1 sibling, 1 reply; 46+ messages in thread
From: David Miller @ 2015-03-20 19:47 UTC (permalink / raw)
  To: david.ahern; +Cc: torvalds, linux-mm, linux-kernel, sparclinux

From: David Ahern <david.ahern@oracle.com>
Date: Fri, 20 Mar 2015 12:05:05 -0600

> DaveM: do you mind if I submit a patch to change the default for sparc
> to SLUB?

I think we're jumping the gun about all of this, and doing anything
with default Kconfig settings would be entirely premature until we
know what the real bug is.

On my T4-2 I've used nothing but SLAB and haven't hit any of these
problems.  I can't even remember the last time I turned SLUB on,
and it's just because I'm lazy.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: 4.0.0-rc4: panic in free_block
  2015-03-20 19:47         ` David Miller
@ 2015-03-20 19:54           ` David Ahern
  2015-03-20 20:19             ` David Miller
  0 siblings, 1 reply; 46+ messages in thread
From: David Ahern @ 2015-03-20 19:54 UTC (permalink / raw)
  To: David Miller; +Cc: torvalds, linux-mm, linux-kernel, sparclinux

On 3/20/15 1:47 PM, David Miller wrote:
> From: David Ahern <david.ahern@oracle.com>
> Date: Fri, 20 Mar 2015 12:05:05 -0600
>
>> DaveM: do you mind if I submit a patch to change the default for sparc
>> to SLUB?
>
> I think we're jumping the gun about all of this, and doing anything
> with default Kconfig settings would be entirely premature until we
> know what the real bug is.

The suggestion to change to SLUB as the default was based on Linus' 
comment "SLAB is probably also almost unheard of in high-CPU
configurations, since slub has all the magical unlocked lists etc for
scalability."

>
> On my T4-2 I've used nothing but SLAB and haven't hit any of these
> problems.  I can't even remember the last time I turned SLUB on,
> and it's just because I'm lazy.
>

Interesting. With -j <64 and talking softly it completes. But -j 128 and 
higher always ends in a panic.


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: 4.0.0-rc4: panic in free_block
  2015-03-20 16:58     ` Linus Torvalds
  2015-03-20 18:05       ` David Ahern
  2015-03-20 19:42       ` David Miller
@ 2015-03-20 20:01       ` Dave Hansen
  2 siblings, 0 replies; 46+ messages in thread
From: Dave Hansen @ 2015-03-20 20:01 UTC (permalink / raw)
  To: Linus Torvalds, David Ahern; +Cc: David S. Miller, linux-mm, LKML, sparclinux

On 03/20/2015 09:58 AM, Linus Torvalds wrote:
> 128 cpu's is still "unusual", of course, but by no means unheard of,
> and I'f have expected others to report it too if it was wasy to
> trigger on x86-64.

FWIW, I configured a kernel with SLAB and kicked off a bunch of compiles
on a 160-thread x86_64 system.  It definitely doesn't die _quickly_.
It's been running for an hour or two.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: 4.0.0-rc4: panic in free_block
  2015-03-20 19:54           ` David Ahern
@ 2015-03-20 20:19             ` David Miller
  0 siblings, 0 replies; 46+ messages in thread
From: David Miller @ 2015-03-20 20:19 UTC (permalink / raw)
  To: david.ahern; +Cc: torvalds, linux-mm, linux-kernel, sparclinux

From: David Ahern <david.ahern@oracle.com>
Date: Fri, 20 Mar 2015 13:54:09 -0600

> Interesting. With -j <64 and talking softly it completes. But -j 128
> and higher always ends in a panic.

Please share more details of your configuration.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: 4.0.0-rc4: panic in free_block
  2015-03-20 15:07 4.0.0-rc4: panic in free_block David Ahern
  2015-03-20 16:48 ` Linus Torvalds
@ 2015-03-20 21:17 ` Linus Torvalds
  2015-03-20 22:49   ` David Ahern
  1 sibling, 1 reply; 46+ messages in thread
From: Linus Torvalds @ 2015-03-20 21:17 UTC (permalink / raw)
  To: David Ahern; +Cc: linux-mm, LKML

On Fri, Mar 20, 2015 at 8:07 AM, David Ahern <david.ahern@oracle.com> wrote:
> Instruction DUMP: 86230003  8730f00d  8728f006 <d658c007> 8600c007 8e0ac008 2ac1c002  c658e030  d458e028

Ok, so it's d658c007 that faults, which is that

        ldx  [ %g3 + %g7 ], %o3

instruction.

Looking at your objdump:

> free_block():
> /opt/dahern/linux.git/kbuild/../mm/slab.c:3265
>   55de64:       10 68 00 47     b  %xcc, 55df80 <free_block+0x158>
>   55de68:       85 30 b0 02     srlx  %g2, 2, %g2
> clear_obj_pfmemalloc():
> /opt/dahern/linux.git/kbuild/../mm/slab.c:224
>   55de6c:       98 0b 3f fe     and  %o4, -2, %o4
>   55de70:       d8 76 40 00     stx  %o4, [ %i1 ]
> virt_to_head_page():
> /opt/dahern/linux.git/kbuild/../include/linux/mm.h:554
>   55de74:       c6 5c 80 00     ldx  [ %l2 ], %g3
>   55de78:       ce 5c 40 00     ldx  [ %l1 ], %g7
>   55de7c:       86 23 00 03     sub  %o4, %g3, %g3
>   55de80:       87 30 f0 0d     srlx  %g3, 0xd, %g3
>   55de84:       87 28 f0 06     sllx  %g3, 6, %g3
> test_bit():
> /opt/dahern/linux.git/kbuild/../include/asm-generic/bitops/non-atomic.h:105
>   55de88:       d6 58 c0 07     ldx  [ %g3 + %g7 ], %o3
> virt_to_head_page():
> /opt/dahern/linux.git/kbuild/../include/linux/mm.h:554
>   55de8c:       86 00 c0 07     add  %g3, %g7, %g3

I think that's the load of "page->flags" which is almost certainly
part of that initial

                page = virt_to_head_page(objp);

at the top of the loop in free_block(). In fact, it's probably from
compound_head_fast() which does

        if (unlikely(PageTail(page)))
                if (likely(__get_page_tail(page)))
                        return;

so I think it's about to test the PG_tail bit.

So it's the virt_to_page(x) that has returned 0006100000000000. For
sparc64, that's

   #define virt_to_page(kaddr)    pfn_to_page(__pa(kaddr)>>PAGE_SHIFT)

Looking at the code generation, I think %g7 (0x0006000000000000) is
VMEMMAP_BASE, and %g3 is "pfn << 6", where the "<< 6" is because a
"struct page" is 64 bytes.

And looking at that

        sub  %o4, %g3, %g3

I think that's "__pa(x)", so I think %o4 is 'x'. That also matches the
"and  %o4, -2, %o4", which would be the clear_obj_pfmemalloc().

And %o4 is 0.

In other words, if I read that sparc asm right (and it is very likely
that I do *not*), then "objp" is NULL, and that's why you crash.

That's odd, because we know that objp cannot be NULL in
kmem_slab_free() (even if we allowed it, like with kfree(),
remove_vma() cannot possibly have a NULL vma, since ti dereferences it
multiple times).

So I must be misreading this completely. Somebody with better sparc
debugging mojo should double-check my logic. How would objp be NULL?

                        Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: 4.0.0-rc4: panic in free_block
  2015-03-20 21:17 ` Linus Torvalds
@ 2015-03-20 22:49   ` David Ahern
  2015-03-21  0:18     ` David Ahern
  0 siblings, 1 reply; 46+ messages in thread
From: David Ahern @ 2015-03-20 22:49 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-mm, LKML

On 3/20/15 3:17 PM, Linus Torvalds wrote:
> In other words, if I read that sparc asm right (and it is very likely
> that I do *not*), then "objp" is NULL, and that's why you crash.

That does appear to be why. I put a WARN_ON before 
clear_obj_pfmemalloc() if objpp[i] is NULL. I got 2 splats during an 
'allyesconfig' build and the system stayed up.

>
> That's odd, because we know that objp cannot be NULL in
> kmem_slab_free() (even if we allowed it, like with kfree(),
> remove_vma() cannot possibly have a NULL vma, since ti dereferences it
> multiple times).
>
> So I must be misreading this completely. Somebody with better sparc
> debugging mojo should double-check my logic. How would objp be NULL?

I'll add checks to higher layers and see if it reveals anything.

I did ask around and apparently this bug is hit only with the new M7 
processors. DaveM: that's why you are not hitting this.

David

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: 4.0.0-rc4: panic in free_block
  2015-03-20 22:49   ` David Ahern
@ 2015-03-21  0:18     ` David Ahern
  2015-03-21  0:34       ` David Rientjes
  2015-03-21  0:47       ` Linus Torvalds
  0 siblings, 2 replies; 46+ messages in thread
From: David Ahern @ 2015-03-21  0:18 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-mm, LKML

On 3/20/15 4:49 PM, David Ahern wrote:
> On 3/20/15 3:17 PM, Linus Torvalds wrote:
>> In other words, if I read that sparc asm right (and it is very likely
>> that I do *not*), then "objp" is NULL, and that's why you crash.
>
> That does appear to be why. I put a WARN_ON before
> clear_obj_pfmemalloc() if objpp[i] is NULL. I got 2 splats during an
> 'allyesconfig' build and the system stayed up.
>
>>
>> That's odd, because we know that objp cannot be NULL in
>> kmem_slab_free() (even if we allowed it, like with kfree(),
>> remove_vma() cannot possibly have a NULL vma, since ti dereferences it
>> multiple times).
>>
>> So I must be misreading this completely. Somebody with better sparc
>> debugging mojo should double-check my logic. How would objp be NULL?
>
> I'll add checks to higher layers and see if it reveals anything.
>
> I did ask around and apparently this bug is hit only with the new M7
> processors. DaveM: that's why you are not hitting this.

Here's another data point: If I disable NUMA I don't see the problem. 
Performance drops, but no NULL pointer splats which would have been panics.

The 128 cpu ldom with NUMA enabled shows the problem every single time I 
do a kernel compile (-j 128). With NUMA disabled I have done 3 
allyesconfig compiles without hitting the problem. I'll put the compiles 
into a loop while I head out for dinner.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: 4.0.0-rc4: panic in free_block
  2015-03-21  0:18     ` David Ahern
@ 2015-03-21  0:34       ` David Rientjes
  2015-03-21  0:39         ` David Ahern
  2015-03-21  0:47       ` Linus Torvalds
  1 sibling, 1 reply; 46+ messages in thread
From: David Rientjes @ 2015-03-21  0:34 UTC (permalink / raw)
  To: David Ahern; +Cc: Linus Torvalds, linux-mm, LKML

On Fri, 20 Mar 2015, David Ahern wrote:

> Here's another data point: If I disable NUMA I don't see the problem.
> Performance drops, but no NULL pointer splats which would have been panics.
> 
> The 128 cpu ldom with NUMA enabled shows the problem every single time I do a
> kernel compile (-j 128). With NUMA disabled I have done 3 allyesconfig
> compiles without hitting the problem. I'll put the compiles into a loop while
> I head out for dinner.
> 

It might be helpful to enable CONFIG_DEBUG_SLAB if you're reproducing it.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: 4.0.0-rc4: panic in free_block
  2015-03-21  0:34       ` David Rientjes
@ 2015-03-21  0:39         ` David Ahern
  0 siblings, 0 replies; 46+ messages in thread
From: David Ahern @ 2015-03-21  0:39 UTC (permalink / raw)
  To: David Rientjes; +Cc: Linus Torvalds, linux-mm, LKML

On 3/20/15 6:34 PM, David Rientjes wrote:
> On Fri, 20 Mar 2015, David Ahern wrote:
>
>> Here's another data point: If I disable NUMA I don't see the problem.
>> Performance drops, but no NULL pointer splats which would have been panics.
>>
>> The 128 cpu ldom with NUMA enabled shows the problem every single time I do a
>> kernel compile (-j 128). With NUMA disabled I have done 3 allyesconfig
>> compiles without hitting the problem. I'll put the compiles into a loop while
>> I head out for dinner.
>>
>
> It might be helpful to enable CONFIG_DEBUG_SLAB if you're reproducing it.
>

I enabled that and a few other DEBUG configs earlier -- nothing popped 
out. I'll do it again -- and others when I return.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: 4.0.0-rc4: panic in free_block
  2015-03-21  0:18     ` David Ahern
  2015-03-21  0:34       ` David Rientjes
@ 2015-03-21  0:47       ` Linus Torvalds
  2015-03-21 17:45         ` David Ahern
  1 sibling, 1 reply; 46+ messages in thread
From: Linus Torvalds @ 2015-03-21  0:47 UTC (permalink / raw)
  To: David Ahern; +Cc: linux-mm, LKML

On Fri, Mar 20, 2015 at 5:18 PM, David Ahern <david.ahern@oracle.com> wrote:
> On 3/20/15 4:49 PM, David Ahern wrote:
>>
>> I did ask around and apparently this bug is hit only with the new M7
>> processors. DaveM: that's why you are not hitting this.

Quite frankly, this smells even more like an architecture bug. It
could be anywhere: it could be a CPU memory ordering issue, a compiler
bug, or a missing barrier or other thing.

How confident are you in the M7 memory ordering rules? It's a fairly
new core, no? With new speculative reads etc? Maybe the Linux
spinlocks don't have the right serialization, and more aggressive
reordering in the new core shows a bug?

Looking at this code, if this is a race, I see a few things that are
worth checking out

 - it does a very much overlapping "memmove()". The
sparc/lib/memmove.S file looks suspiciously bad (is that a
byte-at-a-time loop? Is it even correctly checking overlap?)

 - it relies on both percpu data and a spinlock. I'm sure the sparc
spinlock code has been tested *extensively* with old cores, but maybe
some new speculative read ends up breaking them?

I'm assuming M7 still TSO and 'ldsub' has acquire semantics? Is it
configurable like some sparc versions? I'm wondering whether the
Solaris locks might have some extra memory barriers due to supporting
the other (weaker) sparc memory models, and maybe they hid some M7
"feature" by mistake...

*Some* of the sparc memcpy routines have odd membar's in them.  Why
would a TSO machine need a memory barrier inside a memcpy. That just
makes me go "Ehh?"

> Here's another data point: If I disable NUMA I don't see the problem.
> Performance drops, but no NULL pointer splats which would have been panics.

So the NUMA case triggers the per-node "n->shared" logic, which
*should* be protected by "n->list_lock". Maybe there is some bug there
- but since that code seems to do ok on x86-64 (and apparently older
sparc too), I really would look at arch-specific issues first.

                        Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: 4.0.0-rc4: panic in free_block
  2015-03-21  0:47       ` Linus Torvalds
@ 2015-03-21 17:45         ` David Ahern
  2015-03-21 18:49           ` Linus Torvalds
  0 siblings, 1 reply; 46+ messages in thread
From: David Ahern @ 2015-03-21 17:45 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-mm, LKML

On 3/20/15 6:47 PM, Linus Torvalds wrote:
>
>> Here's another data point: If I disable NUMA I don't see the problem.
>> Performance drops, but no NULL pointer splats which would have been panics.
>
> So the NUMA case triggers the per-node "n->shared" logic, which
> *should* be protected by "n->list_lock". Maybe there is some bug there
> - but since that code seems to do ok on x86-64 (and apparently older
> sparc too), I really would look at arch-specific issues first.

You raise a lot of valid questions and something to look into. But if 
the root cause were such a fundamental issue (CPU memory ordering, 
compiler bug, etc) why would it only occur on this one code path -- free 
with SLAB and NUMA -- and so consistently?

Continuing to poke around, but open to any suggestions. I have enabled 
every DEBUG I can find in the memory code and nothing is popping out. In 
terms of races wouldn't all the DEBUG checks affect timing? Yet, I am 
still seeing the same stack traces due to the same root cause.

David

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: 4.0.0-rc4: panic in free_block
  2015-03-21 17:45         ` David Ahern
@ 2015-03-21 18:49           ` Linus Torvalds
  2015-03-22 17:36             ` David Miller
  0 siblings, 1 reply; 46+ messages in thread
From: Linus Torvalds @ 2015-03-21 18:49 UTC (permalink / raw)
  To: David Ahern, David Miller, sparclinux; +Cc: linux-mm, LKML

On Sat, Mar 21, 2015 at 10:45 AM, David Ahern <david.ahern@oracle.com> wrote:
>
> You raise a lot of valid questions and something to look into. But if the
> root cause were such a fundamental issue (CPU memory ordering, compiler bug,
> etc) why would it only occur on this one code path -- free with SLAB and
> NUMA -- and so consistently?

So the consistency could easily come from a compiler bug (or a missing
barrier in the kernel code) that just happens to trigger in a single
place (or in a few places, but then that's the only place that gets
exercised heavily enough to show it).

I agree that an actual hardware bug is unlikely, although that too is
possible: I can pretty much guarantee that if it were a CPU bug, it
wouldn't be some "memory ordering is entirely broken" bug in general,
it would be some very specific case that only happens with just the
right instruction timing and mix.

That said, while I bring up a CPU bug as a possibility, I really do
agree that it is *very* unlikely. Memory ordering is hard, and yes,
you can get it wrong, but at the same time CPU designers very much
know about it and tend to be pretty damn good about it. And as you
say, it generally wouldn't be *that* consistent. It might be
consistent for one particular kernel build (due to very particular
instruction mix and timings), but over lots of versions of the code
and many different debug options? Very very very unlikely.

> Continuing to poke around, but open to any suggestions. I have enabled every
> DEBUG I can find in the memory code and nothing is popping out. In terms of
> races wouldn't all the DEBUG checks affect timing? Yet, I am still seeing
> the same stack traces due to the same root cause.

Yes, generally debug options would change timings sufficiently that
any particular low-level race would certainly go away or at least
become much harder to hit. So if you have enabled spinlock debugging
etc, I don't really believe in a hw bug. It's  more likely that there
is some kernel architecture-specific code that triggers it. Or even
generic code that just happens to work on other cases due to random
issues (ie memory alignment etc).

I *would* suggest looking at that "memmove()" code. It really looks
like crap. It seems to do things byte-at-a-time for the overlapping
case, and the code seems to depend on memcpy always doing things
low-to-high, but there are multiple different memcpy implementations
so I don't know that that is always true. If one of the memcpy
functions sometimes copies the other way depending on size etc, it
could screw up.

Basically, that sparc64 memmove() implementation looks like it was
written by a dyslexic 5-year-old as a throw-away hack, and then never
got fixed.

Davem? I don't read sparc assembly, so I'm *really* not going to try
to verify that (a) all the memcpy implementations always copy
low-to-high and (b) that I even read the address comparisons in
memmove.S right.

I mention memmove just because it's actually fairly unusual for the
kernel. At the same time, if it really is broken for overlapping
regions, I'd expect *some* other places to show breakage too. So it's
probably fine, even if it does look very very bad to do things one
byte at a time backwards as a fallback.

                             Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: 4.0.0-rc4: panic in free_block
  2015-03-21 18:49           ` Linus Torvalds
@ 2015-03-22 17:36             ` David Miller
  2015-03-22 19:25               ` Bob Picco
  2015-03-22 19:47               ` Linus Torvalds
  0 siblings, 2 replies; 46+ messages in thread
From: David Miller @ 2015-03-22 17:36 UTC (permalink / raw)
  To: torvalds; +Cc: david.ahern, sparclinux, linux-mm, linux-kernel

From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Sat, 21 Mar 2015 11:49:12 -0700

> Davem? I don't read sparc assembly, so I'm *really* not going to try
> to verify that (a) all the memcpy implementations always copy
> low-to-high and (b) that I even read the address comparisons in
> memmove.S right.

All of the sparc memcpy implementations copy from low to high.
I'll eat my hat if they don't. :-)

The guard tests at the beginning of memmove() are saying:

	if (dst <= src)
		memcpy(...);
	if (src + len <= dst)
		memcpy(...);

And then the reverse copy loop (and we do have to copy in reverse for
correctness) is basically:

	src = (src + len - 1);
	dst = (dst + len - 1);

1:	tmp = *(u8 *)src;
	len -= 1;
	src -= 1;
	*(u8 *)dst = tmp;
	dst -= 1;
	if (len != 0)
		goto 1b;

And then we return the original 'dst' pointer.

So at first glance it looks at least correct.

memmove() is a good idea to look into though, as SLAB and SLUB are the
only really heavy users of it, and they do so with overlapping
contents.

And they end up using that byte-at-a-time code, since SLAB and SLUB
do mmemove() calls of the form:

	memmove(X + N, X, LEN);

In which case neither of the memcpy() guard tests will pass.

Maybe there is some subtle bug in there I just don't see right now.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: 4.0.0-rc4: panic in free_block
  2015-03-22 17:36             ` David Miller
@ 2015-03-22 19:25               ` Bob Picco
  2015-03-22 19:47               ` Linus Torvalds
  1 sibling, 0 replies; 46+ messages in thread
From: Bob Picco @ 2015-03-22 19:25 UTC (permalink / raw)
  To: David Miller; +Cc: torvalds, david.ahern, sparclinux, linux-mm, linux-kernel

David Miller wrote:	[Sun Mar 22 2015, 01:36:03PM EDT]
> From: Linus Torvalds <torvalds@linux-foundation.org>
> Date: Sat, 21 Mar 2015 11:49:12 -0700
> 
> > Davem? I don't read sparc assembly, so I'm *really* not going to try
> > to verify that (a) all the memcpy implementations always copy
> > low-to-high and (b) that I even read the address comparisons in
> > memmove.S right.
> 
> All of the sparc memcpy implementations copy from low to high.
> I'll eat my hat if they don't. :-)
> 
> The guard tests at the beginning of memmove() are saying:
> 
> 	if (dst <= src)
> 		memcpy(...);
> 	if (src + len <= dst)
> 		memcpy(...);
> 
> And then the reverse copy loop (and we do have to copy in reverse for
> correctness) is basically:
> 
> 	src = (src + len - 1);
> 	dst = (dst + len - 1);
> 
> 1:	tmp = *(u8 *)src;
> 	len -= 1;
> 	src -= 1;
> 	*(u8 *)dst = tmp;
> 	dst -= 1;
> 	if (len != 0)
> 		goto 1b;
> 
> And then we return the original 'dst' pointer.
> 
> So at first glance it looks at least correct.
> 
> memmove() is a good idea to look into though, as SLAB and SLUB are the
> only really heavy users of it, and they do so with overlapping
> contents.
> 
> And they end up using that byte-at-a-time code, since SLAB and SLUB
> do mmemove() calls of the form:
> 
> 	memmove(X + N, X, LEN);
> 
> In which case neither of the memcpy() guard tests will pass.
> 
> Maybe there is some subtle bug in there I just don't see right now.
My original pursuit of this issue focused on transfers to and from the shared
array. Basically substituting memcpy-s with a primitive unsigned long memory
mover. This might have been incorrect.

There were substantial doubts because of large modifications to 2.6.39 too.
Unstabile hardware cause(d|s) issue too.

Eliminating the shared array functions correctly. Though this removal changes
performance and timing dramatically.

This afternoon I included modification of two memmove-s and no issue thus far.
The issue APPEARS to come from memmove-s within cache_flusharray() and/or
drain_array(). Now we are covering moves within an array_cache.

The above was done on 2.6.39.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: 4.0.0-rc4: panic in free_block
  2015-03-22 17:36             ` David Miller
  2015-03-22 19:25               ` Bob Picco
@ 2015-03-22 19:47               ` Linus Torvalds
  2015-03-22 22:23                 ` David Miller
  1 sibling, 1 reply; 46+ messages in thread
From: Linus Torvalds @ 2015-03-22 19:47 UTC (permalink / raw)
  To: David Miller; +Cc: David Ahern, sparclinux, linux-mm, Linux Kernel Mailing List

On Sun, Mar 22, 2015 at 10:36 AM, David Miller <davem@davemloft.net> wrote:
>
> And they end up using that byte-at-a-time code, since SLAB and SLUB
> do mmemove() calls of the form:
>
>         memmove(X + N, X, LEN);

Actually, the common case in slab is overlapping but of the form

     memmove(p, p+x, len);

which goes to memcpy. It's basically re-compacting the array at the beginning.

Which was why I was asking how sure you are that memcpy *always*
copies from low to high.

I don't even know which version of memcpy ends up being used on M7.
Some of them do things like use VIS. I can follow some regular sparc
asm, there's no way I'm even *looking* at that. Is it really ok to use
VIS registers in random contexts?

                           Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: 4.0.0-rc4: panic in free_block
  2015-03-22 19:47               ` Linus Torvalds
@ 2015-03-22 22:23                 ` David Miller
  2015-03-22 23:35                   ` David Ahern
  2015-03-22 23:49                   ` Linus Torvalds
  0 siblings, 2 replies; 46+ messages in thread
From: David Miller @ 2015-03-22 22:23 UTC (permalink / raw)
  To: torvalds; +Cc: david.ahern, sparclinux, linux-mm, linux-kernel

From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Sun, 22 Mar 2015 12:47:08 -0700

> Which was why I was asking how sure you are that memcpy *always*
> copies from low to high.

Yeah I'm pretty sure.

> I don't even know which version of memcpy ends up being used on M7.
> Some of them do things like use VIS. I can follow some regular sparc
> asm, there's no way I'm even *looking* at that. Is it really ok to use
> VIS registers in random contexts?

Yes, using VIS how we do is alright, and in fact I did an audit of
this about 1 year ago.  This is another one of those "if this is
wrong, so much stuff would break"

The only thing funny some of these routines do is fetch 2 64-byte
blocks of data ahead in the inner loops, but that should be fine
right?

On the M7 we'll use the Niagara-4 memcpy.

Hmmm... I'll run this silly sparc kernel memmove through the glibc
testsuite and see if it barfs.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: 4.0.0-rc4: panic in free_block
  2015-03-22 22:23                 ` David Miller
@ 2015-03-22 23:35                   ` David Ahern
  2015-03-22 23:54                     ` David Miller
  2015-03-22 23:49                   ` Linus Torvalds
  1 sibling, 1 reply; 46+ messages in thread
From: David Ahern @ 2015-03-22 23:35 UTC (permalink / raw)
  To: David Miller, torvalds; +Cc: sparclinux, linux-mm, linux-kernel, Bob Picco

On 3/22/15 4:23 PM, David Miller wrote:
>> I don't even know which version of memcpy ends up being used on M7.
>> Some of them do things like use VIS. I can follow some regular sparc
>> asm, there's no way I'm even *looking* at that. Is it really ok to use
>> VIS registers in random contexts?
>
> Yes, using VIS how we do is alright, and in fact I did an audit of
> this about 1 year ago.  This is another one of those "if this is
> wrong, so much stuff would break"
>
> The only thing funny some of these routines do is fetch 2 64-byte
> blocks of data ahead in the inner loops, but that should be fine
> right?
>
> On the M7 we'll use the Niagara-4 memcpy.
>
> Hmmm... I'll run this silly sparc kernel memmove through the glibc
> testsuite and see if it barfs.
>

I don't know if you caught Bob's message; he has a hack to bypass memcpy 
and memmove in mm/slab.c use a for loop to move entries. With the hack 
he is not seeing the problem.

This is the hack:

+static void move_entries(void *dest, void *src, int nr)
+{
+       unsigned long *dp = dest;
+       unsigned long *sp = src;
+
+       for (; nr; nr--, dp++, sp++)
+               *dp = *sp;
+}
+

and then replace the mempy and memmove calls in transfer_objects, 
cache_flusharray and drain_array to use move_entries.

I just put it on 4.0.0-rc4 and ditto -- problem goes away, so it clearly 
suggests the memcpy or memmove are the root cause.

David

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: 4.0.0-rc4: panic in free_block
  2015-03-22 22:23                 ` David Miller
  2015-03-22 23:35                   ` David Ahern
@ 2015-03-22 23:49                   ` Linus Torvalds
  2015-03-22 23:57                     ` David Miller
  1 sibling, 1 reply; 46+ messages in thread
From: Linus Torvalds @ 2015-03-22 23:49 UTC (permalink / raw)
  To: David Miller; +Cc: David Ahern, sparclinux, linux-mm, Linux Kernel Mailing List

On Sun, Mar 22, 2015 at 3:23 PM, David Miller <davem@davemloft.net> wrote:
>
> Yes, using VIS how we do is alright, and in fact I did an audit of
> this about 1 year ago.  This is another one of those "if this is
> wrong, so much stuff would break"

Maybe. But it does seem like Bob Picco has narrowed it down to memmove().

It also bothers me enormously - and perhaps unreasonably - how that
memcpy code has memory barriers in it. I can see _zero_ reason for a
memory barrier inside a memcpy, unless the memcpy does something that
isn't valid to begin with. Are the VIS operatiosn perhaps using some
kind of non-temporal form that doesn't follow the TSO rules? Kind of
like the "movnt" that Intel has?

That kind of stuff makes me worry. For example, the only reason I see for

        membar          #StoreLoad | #StoreStore

after that VIS loop is that the stxa doesn't honor normal memory store
ordering rules, but if that's true, then shouldn't we have a membar
*before* the loop too? How about "ldx"? Does that also do some
unordered loads?

So the memory barriers in there just make me nervous, because they are
either entirely bogus or superfluous, or they are not - and if they
aren't, then that implies that some of the code does something really
odd with memory ordering.

I dunno. I really can't read that code at all, so I'm going entirely
by gut instinct here.

                          Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: 4.0.0-rc4: panic in free_block
  2015-03-22 23:35                   ` David Ahern
@ 2015-03-22 23:54                     ` David Miller
  2015-03-23  0:03                       ` David Ahern
  0 siblings, 1 reply; 46+ messages in thread
From: David Miller @ 2015-03-22 23:54 UTC (permalink / raw)
  To: david.ahern; +Cc: torvalds, sparclinux, linux-mm, linux-kernel, bpicco

From: David Ahern <david.ahern@oracle.com>
Date: Sun, 22 Mar 2015 17:35:49 -0600

> I don't know if you caught Bob's message; he has a hack to bypass
> memcpy and memmove in mm/slab.c use a for loop to move entries. With
> the hack he is not seeing the problem.
> 
> This is the hack:
> 
> +static void move_entries(void *dest, void *src, int nr)
> +{
> +       unsigned long *dp = dest;
> +       unsigned long *sp = src;
> +
> +       for (; nr; nr--, dp++, sp++)
> +               *dp = *sp;
> +}
> +
> 
> and then replace the mempy and memmove calls in transfer_objects,
> cache_flusharray and drain_array to use move_entries.
> 
> I just put it on 4.0.0-rc4 and ditto -- problem goes away, so it
> clearly suggests the memcpy or memmove are the root cause.

Thanks, didn't notice that.

So, something is amuck.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: 4.0.0-rc4: panic in free_block
  2015-03-22 23:49                   ` Linus Torvalds
@ 2015-03-22 23:57                     ` David Miller
  0 siblings, 0 replies; 46+ messages in thread
From: David Miller @ 2015-03-22 23:57 UTC (permalink / raw)
  To: torvalds; +Cc: david.ahern, sparclinux, linux-mm, linux-kernel

From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Sun, 22 Mar 2015 16:49:51 -0700

> On Sun, Mar 22, 2015 at 3:23 PM, David Miller <davem@davemloft.net> wrote:
>>
>> Yes, using VIS how we do is alright, and in fact I did an audit of
>> this about 1 year ago.  This is another one of those "if this is
>> wrong, so much stuff would break"
> 
> Maybe. But it does seem like Bob Picco has narrowed it down to memmove().
> 
> It also bothers me enormously - and perhaps unreasonably - how that
> memcpy code has memory barriers in it. I can see _zero_ reason for a
> memory barrier inside a memcpy, unless the memcpy does something that
> isn't valid to begin with. Are the VIS operatiosn perhaps using some
> kind of non-temporal form that doesn't follow the TSO rules? Kind of
> like the "movnt" that Intel has?

The special cache line clearing stores (the ASI_BLK_INIT_QUAD_LDD_P
stuff) do not adhere to the memory model.

I had elided the memory barriers originally, but it caused memory
corruption.

So yes, this is similar to the 'movnt' situation.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: 4.0.0-rc4: panic in free_block
  2015-03-22 23:54                     ` David Miller
@ 2015-03-23  0:03                       ` David Ahern
  2015-03-23  2:00                         ` David Miller
  0 siblings, 1 reply; 46+ messages in thread
From: David Ahern @ 2015-03-23  0:03 UTC (permalink / raw)
  To: David Miller; +Cc: torvalds, sparclinux, linux-mm, linux-kernel, bpicco

On 3/22/15 5:54 PM, David Miller wrote:
>> I just put it on 4.0.0-rc4 and ditto -- problem goes away, so it
>> clearly suggests the memcpy or memmove are the root cause.
>
> Thanks, didn't notice that.
>
> So, something is amuck.

to continue to refine the problem ... I modified only the memmove lines 
(not the memcpy) and it works fine. So its the memmove.

I'm sure this will get whitespaced damaged on the copy and paste but to 
be clear this is the patch I am currently running and system is stable. 
On Friday it failed on every single; with this patch I have allyesconfig 
builds with -j 128 in a loop (clean in between) and nothing -- no panics.

diff --git a/mm/slab.c b/mm/slab.c
index c4b89ea..f5e9716 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -802,6 +802,16 @@ static inline void ac_put_obj(struct kmem_cache 
*cachep, struct array_cache *ac,
         ac->entry[ac->avail++] = objp;
  }

+static void move_entries(void *dest, void *src, int nr)
+{
+       unsigned long *dp = dest;
+       unsigned long *sp = src;
+
+       for (; nr; nr--, dp++, sp++)
+               *dp = *sp;
+}
+
+
  /*
   * Transfer objects in one arraycache to another.
   * Locking must be handled by the caller.
@@ -3344,7 +3354,7 @@ free_done:
         spin_unlock(&n->list_lock);
         slabs_destroy(cachep, &list);
         ac->avail -= batchcount;
-       memmove(ac->entry, &(ac->entry[batchcount]), sizeof(void 
*)*ac->avail);
+       move_entries(ac->entry, &(ac->entry[batchcount]), ac->avail);
  }

  /*
@@ -3817,8 +3827,7 @@ static void drain_array(struct kmem_cache *cachep, 
struct kmem_cache_node *n,
                                 tofree = (ac->avail + 1) / 2;
                         free_block(cachep, ac->entry, tofree, node, &list);
                         ac->avail -= tofree;
-                       memmove(ac->entry, &(ac->entry[tofree]),
-                               sizeof(void *) * ac->avail);
+                       move_entries(ac->entry, &(ac->entry[tofree]), 
ac->avail);
                 }
                 spin_unlock_irq(&n->list_lock);
                 slabs_destroy(cachep, &list);


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: 4.0.0-rc4: panic in free_block
  2015-03-23  0:03                       ` David Ahern
@ 2015-03-23  2:00                         ` David Miller
  2015-03-23  2:19                           ` David Miller
  0 siblings, 1 reply; 46+ messages in thread
From: David Miller @ 2015-03-23  2:00 UTC (permalink / raw)
  To: david.ahern; +Cc: torvalds, sparclinux, linux-mm, linux-kernel, bpicco

From: David Ahern <david.ahern@oracle.com>
Date: Sun, 22 Mar 2015 18:03:30 -0600

> On 3/22/15 5:54 PM, David Miller wrote:
>>> I just put it on 4.0.0-rc4 and ditto -- problem goes away, so it
>>> clearly suggests the memcpy or memmove are the root cause.
>>
>> Thanks, didn't notice that.
>>
>> So, something is amuck.
> 
> to continue to refine the problem ... I modified only the memmove
> lines (not the memcpy) and it works fine. So its the memmove.
> 
> I'm sure this will get whitespaced damaged on the copy and paste but
> to be clear this is the patch I am currently running and system is
> stable. On Friday it failed on every single; with this patch I have
> allyesconfig builds with -j 128 in a loop (clean in between) and
> nothing -- no panics.

Can you just try calling memcpy(), that should work because
I think we agree that if the memcpy() implementation copies
from low to high it should work.

I wonder if the triggering factor is configuring for a high
number of cpus.  I always have NR_CPUS=128 since that's the
largest machine I have.  I'll give NR_CPUS=1024 a spin.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: 4.0.0-rc4: panic in free_block
  2015-03-23  2:00                         ` David Miller
@ 2015-03-23  2:19                           ` David Miller
  2015-03-23 16:25                             ` David Miller
  0 siblings, 1 reply; 46+ messages in thread
From: David Miller @ 2015-03-23  2:19 UTC (permalink / raw)
  To: david.ahern; +Cc: torvalds, sparclinux, linux-mm, linux-kernel, bpicco


Nevermind I think I figured out the problem.

It's the cache initializing stores, we can't do overlapping
copies where dst <= src in all cases because of them.

A store to a address modulo the cache line size (which for
these instructions is 64 bytes), clears that whole line.

But when we're doing these memmove() calls in SLAB/SLUB, we
can clear some bytes at the end of the line before they've
been read in.

And reading over NG4memcpy, this _can_ happen, the main unrolled
loop begins like this:

	load	src + 0x00
	load	src + 0x08
	load	src + 0x10
	load	src + 0x18
	load	src + 0x20
	store	dst + 0x00

Assume dst is 64 byte aligned and let's say that dst is src - 8 for
this memcpy() call, right?  That store at the end there is the one to
the first line in the cache line, thus clearing the whole line, which
thus clobbers "src + 0x28" before it even gets loaded.

I'm pretty sure this is what's happening.

And it's only going to trigger if the memcpy() is 128 bytes or larger.

I'll work on a fix.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: 4.0.0-rc4: panic in free_block
  2015-03-23  2:19                           ` David Miller
@ 2015-03-23 16:25                             ` David Miller
  2015-03-23 16:51                               ` John Stoffel
                                                 ` (3 more replies)
  0 siblings, 4 replies; 46+ messages in thread
From: David Miller @ 2015-03-23 16:25 UTC (permalink / raw)
  To: david.ahern; +Cc: torvalds, sparclinux, linux-mm, linux-kernel, bpicco

From: David Miller <davem@davemloft.net>
Date: Sun, 22 Mar 2015 22:19:06 -0400 (EDT)

> I'll work on a fix.

Ok, here is what I committed.   David et al., let me know if you still
see the crashes with this applied.

Of course, I'll queue this up for -stable as well.

Thanks!

====================
[PATCH] sparc64: Fix several bugs in memmove().

Firstly, handle zero length calls properly.  Believe it or not there
are a few of these happening during early boot.

Next, we can't just drop to a memcpy() call in the forward copy case
where dst <= src.  The reason is that the cache initializing stores
used in the Niagara memcpy() implementations can end up clearing out
cache lines before we've sourced their original contents completely.

For example, considering NG4memcpy, the main unrolled loop begins like
this:

     load   src + 0x00
     load   src + 0x08
     load   src + 0x10
     load   src + 0x18
     load   src + 0x20
     store  dst + 0x00

Assume dst is 64 byte aligned and let's say that dst is src - 8 for
this memcpy() call.  That store at the end there is the one to the
first line in the cache line, thus clearing the whole line, which thus
clobbers "src + 0x28" before it even gets loaded.

To avoid this, just fall through to a simple copy only mildly
optimized for the case where src and dst are 8 byte aligned and the
length is a multiple of 8 as well.  We could get fancy and call
GENmemcpy() but this is good enough for how this thing is actually
used.

Reported-by: David Ahern <david.ahern@oracle.com>
Reported-by: Bob Picco <bpicco@meloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 arch/sparc/lib/memmove.S | 35 ++++++++++++++++++++++++++++++++---
 1 file changed, 32 insertions(+), 3 deletions(-)

diff --git a/arch/sparc/lib/memmove.S b/arch/sparc/lib/memmove.S
index b7f6334..857ad4f 100644
--- a/arch/sparc/lib/memmove.S
+++ b/arch/sparc/lib/memmove.S
@@ -8,9 +8,11 @@
 
 	.text
 ENTRY(memmove) /* o0=dst o1=src o2=len */
-	mov		%o0, %g1
+	brz,pn		%o2, 99f
+	 mov		%o0, %g1
+
 	cmp		%o0, %o1
-	bleu,pt		%xcc, memcpy
+	bleu,pt		%xcc, 2f
 	 add		%o1, %o2, %g7
 	cmp		%g7, %o0
 	bleu,pt		%xcc, memcpy
@@ -24,7 +26,34 @@ ENTRY(memmove) /* o0=dst o1=src o2=len */
 	stb		%g7, [%o0]
 	bne,pt		%icc, 1b
 	 sub		%o0, 1, %o0
-
+99:
 	retl
 	 mov		%g1, %o0
+
+	/* We can't just call memcpy for these memmove cases.  On some
+	 * chips the memcpy uses cache initializing stores and when dst
+	 * and src are close enough, those can clobber the source data
+	 * before we've loaded it in.
+	 */
+2:	or		%o0, %o1, %g7
+	or		%o2, %g7, %g7
+	andcc		%g7, 0x7, %g0
+	bne,pn		%xcc, 4f
+	 nop
+
+3:	ldx		[%o1], %g7
+	add		%o1, 8, %o1
+	subcc		%o2, 8, %o2
+	add		%o0, 8, %o0
+	bne,pt		%icc, 3b
+	 stx		%g7, [%o0 - 0x8]
+	ba,a,pt		%xcc, 99b
+
+4:	ldub		[%o1], %g7
+	add		%o1, 1, %o1
+	subcc		%o2, 1, %o2
+	add		%o0, 1, %o0
+	bne,pt		%icc, 4b
+	 stb		%g7, [%o0 - 0x1]
+	ba,a,pt		%xcc, 99b
 ENDPROC(memmove)
-- 
1.7.11.7

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: 4.0.0-rc4: panic in free_block
  2015-03-23 16:25                             ` David Miller
@ 2015-03-23 16:51                               ` John Stoffel
  2015-03-23 19:16                                 ` David Miller
  2015-03-23 17:00                               ` Linus Torvalds
                                                 ` (2 subsequent siblings)
  3 siblings, 1 reply; 46+ messages in thread
From: John Stoffel @ 2015-03-23 16:51 UTC (permalink / raw)
  To: David Miller
  Cc: david.ahern, torvalds, sparclinux, linux-mm, linux-kernel, bpicco


David> ====================
David> [PATCH] sparc64: Fix several bugs in memmove().

David> Firstly, handle zero length calls properly.  Believe it or not there
David> are a few of these happening during early boot.

David> Next, we can't just drop to a memcpy() call in the forward copy case
David> where dst <= src.  The reason is that the cache initializing stores
David> used in the Niagara memcpy() implementations can end up clearing out
David> cache lines before we've sourced their original contents completely.

David> For example, considering NG4memcpy, the main unrolled loop begins like
David> this:

David>      load   src + 0x00
David>      load   src + 0x08
David>      load   src + 0x10
David>      load   src + 0x18
David>      load   src + 0x20
David>      store  dst + 0x00

David> Assume dst is 64 byte aligned and let's say that dst is src - 8 for
David> this memcpy() call.  That store at the end there is the one to the
David> first line in the cache line, thus clearing the whole line, which thus
David> clobbers "src + 0x28" before it even gets loaded.

David> To avoid this, just fall through to a simple copy only mildly
David> optimized for the case where src and dst are 8 byte aligned and the
David> length is a multiple of 8 as well.  We could get fancy and call
David> GENmemcpy() but this is good enough for how this thing is actually
David> used.

Would it make sense to have some memmove()/memcopy() tests on bootup
to catch problems like this?  I know this is a strange case, and
probably not too common, but how hard would it be to wire up tests
that go through 1 to 128 byte memmove() on bootup to make sure things
work properly?

This seems like one of those critical, but subtle things to be
checked.  And doing it only on bootup wouldn't slow anything down and
would (ideally) automatically get us coverage when people add new
archs or update the code.

John

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: 4.0.0-rc4: panic in free_block
  2015-03-23 16:25                             ` David Miller
  2015-03-23 16:51                               ` John Stoffel
@ 2015-03-23 17:00                               ` Linus Torvalds
  2015-03-23 19:08                                 ` David Miller
  2015-03-23 17:34                               ` David Ahern
  2015-03-24 14:57                               ` Bob Picco
  3 siblings, 1 reply; 46+ messages in thread
From: Linus Torvalds @ 2015-03-23 17:00 UTC (permalink / raw)
  To: David Miller
  Cc: David Ahern, sparclinux, linux-mm, Linux Kernel Mailing List,
	bpicco

On Mon, Mar 23, 2015 at 9:25 AM, David Miller <davem@davemloft.net> wrote:
>
> Ok, here is what I committed.

So I wonder - looking at that assembly, I get the feeling that it
isn't any better code than gcc could generate from simple C code.

Would it perhaps be better to turn memmove() into C?

That's particularly true because if I read this code right, it now
seems to seriously pessimise non-overlapping memmove's, in that it now
*always* uses that slow downward copy if the destination is below the
source.

Now, admittedly, the kernel doesn't use a lot of memmov's, but this
still falls back on the "byte at a time" model for a lot of cases (all
non-64-bit-aligned ones). I could imagine those existing. And some
people (reasonably) hate memcpy because they've been burnt by the
overlapping case and end up using memmove as a "safe alternative", so
it's not necessarily just the overlapping case that might trigger
this.

Maybe the code could be something like

    void *memmove(void *dst, const void *src, size_t n);
    {
        // non-overlapping cases
        if (src + n <= dst)
            return memcpy(dst, src, n);
        if (dst + n <= src)
            return memcpy(dst, src, n);

        // overlapping, but we know we
        //  (a) copy upwards
        //  (b) initialize the result in at most chunks of 64
        if (dst+64 <= src)
            return memcpy(dst, src, n);

        .. do the backwards thing ..
    }

(ok, maybe I got it wrong, but you get the idea).

I *think* gcc should do ok on the above kind of code, and not generate
wildly different code from your handcoded version.

                            Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: 4.0.0-rc4: panic in free_block
  2015-03-23 16:25                             ` David Miller
  2015-03-23 16:51                               ` John Stoffel
  2015-03-23 17:00                               ` Linus Torvalds
@ 2015-03-23 17:34                               ` David Ahern
  2015-03-23 19:35                                 ` David Miller
  2015-03-24 14:57                               ` Bob Picco
  3 siblings, 1 reply; 46+ messages in thread
From: David Ahern @ 2015-03-23 17:34 UTC (permalink / raw)
  To: David Miller; +Cc: torvalds, sparclinux, linux-mm, linux-kernel, bpicco

On 3/23/15 10:25 AM, David Miller wrote:
> [PATCH] sparc64: Fix several bugs in memmove().
>
> Firstly, handle zero length calls properly.  Believe it or not there
> are a few of these happening during early boot.
>
> Next, we can't just drop to a memcpy() call in the forward copy case
> where dst <= src.  The reason is that the cache initializing stores
> used in the Niagara memcpy() implementations can end up clearing out
> cache lines before we've sourced their original contents completely.
>
> For example, considering NG4memcpy, the main unrolled loop begins like
> this:
>
>       load   src + 0x00
>       load   src + 0x08
>       load   src + 0x10
>       load   src + 0x18
>       load   src + 0x20
>       store  dst + 0x00
>
> Assume dst is 64 byte aligned and let's say that dst is src - 8 for
> this memcpy() call.  That store at the end there is the one to the
> first line in the cache line, thus clearing the whole line, which thus
> clobbers "src + 0x28" before it even gets loaded.
>
> To avoid this, just fall through to a simple copy only mildly
> optimized for the case where src and dst are 8 byte aligned and the
> length is a multiple of 8 as well.  We could get fancy and call
> GENmemcpy() but this is good enough for how this thing is actually
> used.
>
> Reported-by: David Ahern <david.ahern@oracle.com>
> Reported-by: Bob Picco <bpicco@meloft.net>
> Signed-off-by: David S. Miller <davem@davemloft.net>

seems like a formality at this point, but this resolves the panic on the 
M7-based ldom and baremetal. The T5-8 failed to boot, but it could be a 
different problem.

Thanks for the fast turnaround,
David

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: 4.0.0-rc4: panic in free_block
  2015-03-23 17:00                               ` Linus Torvalds
@ 2015-03-23 19:08                                 ` David Miller
  2015-03-23 19:47                                   ` Linus Torvalds
  0 siblings, 1 reply; 46+ messages in thread
From: David Miller @ 2015-03-23 19:08 UTC (permalink / raw)
  To: torvalds; +Cc: david.ahern, sparclinux, linux-mm, linux-kernel, bpicco

From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Mon, 23 Mar 2015 10:00:02 -0700

> Maybe the code could be something like
> 
>     void *memmove(void *dst, const void *src, size_t n);
>     {
>         // non-overlapping cases
>         if (src + n <= dst)
>             return memcpy(dst, src, n);
>         if (dst + n <= src)
>             return memcpy(dst, src, n);
> 
>         // overlapping, but we know we
>         //  (a) copy upwards
>         //  (b) initialize the result in at most chunks of 64
>         if (dst+64 <= src)
>             return memcpy(dst, src, n);
> 
>         .. do the backwards thing ..
>     }
> 
> (ok, maybe I got it wrong, but you get the idea).
> 
> I *think* gcc should do ok on the above kind of code, and not generate
> wildly different code from your handcoded version.

Sure you could do that in C, but I really want to avoid using memcpy()
if dst and src overlap in any way at all.

Said another way, I don't want to codify that "64" thing.  The next
chip could do 128 byte initializing stores.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: 4.0.0-rc4: panic in free_block
  2015-03-23 16:51                               ` John Stoffel
@ 2015-03-23 19:16                                 ` David Miller
  2015-03-23 19:56                                   ` John Stoffel
  0 siblings, 1 reply; 46+ messages in thread
From: David Miller @ 2015-03-23 19:16 UTC (permalink / raw)
  To: john; +Cc: david.ahern, torvalds, sparclinux, linux-mm, linux-kernel, bpicco

From: "John Stoffel" <john@stoffel.org>
Date: Mon, 23 Mar 2015 12:51:03 -0400

> Would it make sense to have some memmove()/memcopy() tests on bootup
> to catch problems like this?  I know this is a strange case, and
> probably not too common, but how hard would it be to wire up tests
> that go through 1 to 128 byte memmove() on bootup to make sure things
> work properly?
> 
> This seems like one of those critical, but subtle things to be
> checked.  And doing it only on bootup wouldn't slow anything down and
> would (ideally) automatically get us coverage when people add new
> archs or update the code.

One of two things is already happening.

There have been assembler memcpy/memset development test harnesses
around that most arch developers are using, and those test things
rather extensively.

Also, the memcpy/memset routines on sparc in particular are completely
shared with glibc, we use the same exact code in both trees.  So it's
getting tested there too.

memmove() is just not handled this way.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: 4.0.0-rc4: panic in free_block
  2015-03-23 17:34                               ` David Ahern
@ 2015-03-23 19:35                                 ` David Miller
  2015-03-23 19:58                                   ` David Ahern
  2015-03-24  1:01                                   ` David Ahern
  0 siblings, 2 replies; 46+ messages in thread
From: David Miller @ 2015-03-23 19:35 UTC (permalink / raw)
  To: david.ahern; +Cc: torvalds, sparclinux, linux-mm, linux-kernel, bpicco

From: David Ahern <david.ahern@oracle.com>
Date: Mon, 23 Mar 2015 11:34:34 -0600

> seems like a formality at this point, but this resolves the panic on
> the M7-based ldom and baremetal. The T5-8 failed to boot, but it could
> be a different problem.

Specifically, does the T5-8 boot without my patch applied?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: 4.0.0-rc4: panic in free_block
  2015-03-23 19:08                                 ` David Miller
@ 2015-03-23 19:47                                   ` Linus Torvalds
  2015-03-23 19:52                                     ` David Miller
  0 siblings, 1 reply; 46+ messages in thread
From: Linus Torvalds @ 2015-03-23 19:47 UTC (permalink / raw)
  To: David Miller
  Cc: David Ahern, sparclinux, linux-mm, Linux Kernel Mailing List,
	bpicco

On Mon, Mar 23, 2015 at 12:08 PM, David Miller <davem@davemloft.net> wrote:
>
> Sure you could do that in C, but I really want to avoid using memcpy()
> if dst and src overlap in any way at all.
>
> Said another way, I don't want to codify that "64" thing.  The next
> chip could do 128 byte initializing stores.

But David, THAT IS NOT WHAT YOUR BROKEN ASM DOES ANYWAY!

Read it again. Your asm code does not check for overlap. Look at this:

        cmp             %o0, %o1
        bleu,pt         %xcc, 2f

and ponder. It's wrong.

So even if you don't want to take that "allow overlap more than 64
bytes apart" thing, my C version actually is *better* than the broken
asm version you have.

The new asm version is better than the old one, because the new
breakage is about really bad performance rather than actively
breaking, but still..

                         Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: 4.0.0-rc4: panic in free_block
  2015-03-23 19:47                                   ` Linus Torvalds
@ 2015-03-23 19:52                                     ` David Miller
  0 siblings, 0 replies; 46+ messages in thread
From: David Miller @ 2015-03-23 19:52 UTC (permalink / raw)
  To: torvalds; +Cc: david.ahern, sparclinux, linux-mm, linux-kernel, bpicco

From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Mon, 23 Mar 2015 12:47:49 -0700

> On Mon, Mar 23, 2015 at 12:08 PM, David Miller <davem@davemloft.net> wrote:
>>
>> Sure you could do that in C, but I really want to avoid using memcpy()
>> if dst and src overlap in any way at all.
>>
>> Said another way, I don't want to codify that "64" thing.  The next
>> chip could do 128 byte initializing stores.
> 
> But David, THAT IS NOT WHAT YOUR BROKEN ASM DOES ANYWAY!
> 
> Read it again. Your asm code does not check for overlap. Look at this:
> 
>         cmp             %o0, %o1
>         bleu,pt         %xcc, 2f
> 
> and ponder. It's wrong.

Right, it's not checking for overlap.  It's checking for "does a
forward copy work?"

That's the standard test for this, and it's what glibc uses in it's
generic memmove() implementation FWIW.  (granted, I know glibc is not
generally a good source for "right way to do things :-)

> The new asm version is better than the old one, because the new
> breakage is about really bad performance rather than actively
> breaking, but still..

I accept that it's suboptimal.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: 4.0.0-rc4: panic in free_block
  2015-03-23 19:16                                 ` David Miller
@ 2015-03-23 19:56                                   ` John Stoffel
  2015-03-23 20:08                                     ` David Miller
  0 siblings, 1 reply; 46+ messages in thread
From: John Stoffel @ 2015-03-23 19:56 UTC (permalink / raw)
  To: David Miller
  Cc: john, david.ahern, torvalds, sparclinux, linux-mm, linux-kernel,
	bpicco

>>>>> "David" == David Miller <davem@davemloft.net> writes:

David> From: "John Stoffel" <john@stoffel.org>
David> Date: Mon, 23 Mar 2015 12:51:03 -0400

>> Would it make sense to have some memmove()/memcopy() tests on bootup
>> to catch problems like this?  I know this is a strange case, and
>> probably not too common, but how hard would it be to wire up tests
>> that go through 1 to 128 byte memmove() on bootup to make sure things
>> work properly?
>> 
>> This seems like one of those critical, but subtle things to be
>> checked.  And doing it only on bootup wouldn't slow anything down and
>> would (ideally) automatically get us coverage when people add new
>> archs or update the code.

David> One of two things is already happening.

David> There have been assembler memcpy/memset development test harnesses
David> around that most arch developers are using, and those test things
David> rather extensively.

David> Also, the memcpy/memset routines on sparc in particular are completely
David> shared with glibc, we use the same exact code in both trees.  So it's
David> getting tested there too.

Thats' good to know.   I wasn't sure.

David> memmove() is just not handled this way.

Bummers.  So why isn't this covered by the glibc tests too?  Not
accusing, not at all!  Just wondering.  

Thanks for all your work David, I've been amazed at your energy here!

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: 4.0.0-rc4: panic in free_block
  2015-03-23 19:35                                 ` David Miller
@ 2015-03-23 19:58                                   ` David Ahern
  2015-03-24  1:01                                   ` David Ahern
  1 sibling, 0 replies; 46+ messages in thread
From: David Ahern @ 2015-03-23 19:58 UTC (permalink / raw)
  To: David Miller; +Cc: torvalds, sparclinux, linux-mm, linux-kernel, bpicco

On 3/23/15 1:35 PM, David Miller wrote:
> From: David Ahern <david.ahern@oracle.com>
> Date: Mon, 23 Mar 2015 11:34:34 -0600
>
>> seems like a formality at this point, but this resolves the panic on
>> the M7-based ldom and baremetal. The T5-8 failed to boot, but it could
>> be a different problem.
>
> Specifically, does the T5-8 boot without my patch applied?
>

I am running around in circles with it... it takes 15 minutes after a 
hard reset to get logged in, and I forgot that the 2.6.39 can't handle 
-j 1024 either (task scheduler problem), and then I wasted time waiting 
for sandwich shop to learn how to use mobile app ordering, ...

I'll respond as soon as I can.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: 4.0.0-rc4: panic in free_block
  2015-03-23 19:56                                   ` John Stoffel
@ 2015-03-23 20:08                                     ` David Miller
  0 siblings, 0 replies; 46+ messages in thread
From: David Miller @ 2015-03-23 20:08 UTC (permalink / raw)
  To: john; +Cc: david.ahern, torvalds, sparclinux, linux-mm, linux-kernel, bpicco

From: "John Stoffel" <john@stoffel.org>
Date: Mon, 23 Mar 2015 15:56:02 -0400

>>>>>> "David" == David Miller <davem@davemloft.net> writes:
> 
> David> From: "John Stoffel" <john@stoffel.org>
> David> Date: Mon, 23 Mar 2015 12:51:03 -0400
> 
>>> Would it make sense to have some memmove()/memcopy() tests on bootup
>>> to catch problems like this?  I know this is a strange case, and
>>> probably not too common, but how hard would it be to wire up tests
>>> that go through 1 to 128 byte memmove() on bootup to make sure things
>>> work properly?
>>> 
>>> This seems like one of those critical, but subtle things to be
>>> checked.  And doing it only on bootup wouldn't slow anything down and
>>> would (ideally) automatically get us coverage when people add new
>>> archs or update the code.
> 
> David> One of two things is already happening.
> 
> David> There have been assembler memcpy/memset development test harnesses
> David> around that most arch developers are using, and those test things
> David> rather extensively.
> 
> David> Also, the memcpy/memset routines on sparc in particular are completely
> David> shared with glibc, we use the same exact code in both trees.  So it's
> David> getting tested there too.
> 
> Thats' good to know.   I wasn't sure.
> 
> David> memmove() is just not handled this way.
> 
> Bummers.  So why isn't this covered by the glibc tests too?

Because the kernel's memmove() is different from the one we use in glibc
on sparc.  In fact, we use the generic C version in glibc which expands
to forward and backward word copies.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: 4.0.0-rc4: panic in free_block
  2015-03-23 19:35                                 ` David Miller
  2015-03-23 19:58                                   ` David Ahern
@ 2015-03-24  1:01                                   ` David Ahern
  1 sibling, 0 replies; 46+ messages in thread
From: David Ahern @ 2015-03-24  1:01 UTC (permalink / raw)
  To: David Miller; +Cc: torvalds, sparclinux, linux-mm, linux-kernel, bpicco

On 3/23/15 1:35 PM, David Miller wrote:
> From: David Ahern <david.ahern@oracle.com>
> Date: Mon, 23 Mar 2015 11:34:34 -0600
>
>> seems like a formality at this point, but this resolves the panic on
>> the M7-based ldom and baremetal. The T5-8 failed to boot, but it could
>> be a different problem.
>
> Specifically, does the T5-8 boot without my patch applied?
>

The T5-8 is having problems; has to be unrelated to this commit. T5-2 
(256 cpus) boots fine, and make -j 256 on an allyesconfig builds fine.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: 4.0.0-rc4: panic in free_block
  2015-03-23 16:25                             ` David Miller
                                                 ` (2 preceding siblings ...)
  2015-03-23 17:34                               ` David Ahern
@ 2015-03-24 14:57                               ` Bob Picco
  2015-03-24 16:05                                 ` David Miller
  3 siblings, 1 reply; 46+ messages in thread
From: Bob Picco @ 2015-03-24 14:57 UTC (permalink / raw)
  To: David Miller
  Cc: david.ahern, torvalds, sparclinux, linux-mm, linux-kernel, bpicco

David Miller wrote:	[Mon Mar 23 2015, 12:25:30PM EDT]
> From: David Miller <davem@davemloft.net>
> Date: Sun, 22 Mar 2015 22:19:06 -0400 (EDT)
> 
> > I'll work on a fix.
> 
> Ok, here is what I committed.   David et al., let me know if you still
> see the crashes with this applied.
> 
> Of course, I'll queue this up for -stable as well.
> 
> Thanks!
> 
> ====================
> [PATCH] sparc64: Fix several bugs in memmove().
> 
> Firstly, handle zero length calls properly.  Believe it or not there
> are a few of these happening during early boot.
> 
> Next, we can't just drop to a memcpy() call in the forward copy case
> where dst <= src.  The reason is that the cache initializing stores
> used in the Niagara memcpy() implementations can end up clearing out
> cache lines before we've sourced their original contents completely.
> 
> For example, considering NG4memcpy, the main unrolled loop begins like
> this:
> 
>      load   src + 0x00
>      load   src + 0x08
>      load   src + 0x10
>      load   src + 0x18
>      load   src + 0x20
>      store  dst + 0x00
> 
> Assume dst is 64 byte aligned and let's say that dst is src - 8 for
> this memcpy() call.  That store at the end there is the one to the
> first line in the cache line, thus clearing the whole line, which thus
> clobbers "src + 0x28" before it even gets loaded.
> 
> To avoid this, just fall through to a simple copy only mildly
> optimized for the case where src and dst are 8 byte aligned and the
> length is a multiple of 8 as well.  We could get fancy and call
> GENmemcpy() but this is good enough for how this thing is actually
> used.
> 
> Reported-by: David Ahern <david.ahern@oracle.com>
> Reported-by: Bob Picco <bpicco@meloft.net>
> Signed-off-by: David S. Miller <davem@davemloft.net>
> ---
Seems solid with 2.6.39 on M7-4. Jalap?no is happy with current sparc.git.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: 4.0.0-rc4: panic in free_block
  2015-03-24 14:57                               ` Bob Picco
@ 2015-03-24 16:05                                 ` David Miller
  0 siblings, 0 replies; 46+ messages in thread
From: David Miller @ 2015-03-24 16:05 UTC (permalink / raw)
  To: bpicco; +Cc: david.ahern, torvalds, sparclinux, linux-mm, linux-kernel

From: Bob Picco <bpicco@meloft.net>
Date: Tue, 24 Mar 2015 10:57:53 -0400

> Seems solid with 2.6.39 on M7-4. Jalap?no is happy with current sparc.git.

Thanks for all the testing, it's been integrated into the -stable
queues as well.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2015-03-24 16:05 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-03-20 15:07 4.0.0-rc4: panic in free_block David Ahern
2015-03-20 16:48 ` Linus Torvalds
2015-03-20 16:53   ` David Ahern
2015-03-20 16:58     ` Linus Torvalds
2015-03-20 18:05       ` David Ahern
2015-03-20 18:53         ` Linus Torvalds
2015-03-20 19:04           ` David Ahern
2015-03-20 19:47         ` David Miller
2015-03-20 19:54           ` David Ahern
2015-03-20 20:19             ` David Miller
2015-03-20 19:42       ` David Miller
2015-03-20 20:01       ` Dave Hansen
2015-03-20 21:17 ` Linus Torvalds
2015-03-20 22:49   ` David Ahern
2015-03-21  0:18     ` David Ahern
2015-03-21  0:34       ` David Rientjes
2015-03-21  0:39         ` David Ahern
2015-03-21  0:47       ` Linus Torvalds
2015-03-21 17:45         ` David Ahern
2015-03-21 18:49           ` Linus Torvalds
2015-03-22 17:36             ` David Miller
2015-03-22 19:25               ` Bob Picco
2015-03-22 19:47               ` Linus Torvalds
2015-03-22 22:23                 ` David Miller
2015-03-22 23:35                   ` David Ahern
2015-03-22 23:54                     ` David Miller
2015-03-23  0:03                       ` David Ahern
2015-03-23  2:00                         ` David Miller
2015-03-23  2:19                           ` David Miller
2015-03-23 16:25                             ` David Miller
2015-03-23 16:51                               ` John Stoffel
2015-03-23 19:16                                 ` David Miller
2015-03-23 19:56                                   ` John Stoffel
2015-03-23 20:08                                     ` David Miller
2015-03-23 17:00                               ` Linus Torvalds
2015-03-23 19:08                                 ` David Miller
2015-03-23 19:47                                   ` Linus Torvalds
2015-03-23 19:52                                     ` David Miller
2015-03-23 17:34                               ` David Ahern
2015-03-23 19:35                                 ` David Miller
2015-03-23 19:58                                   ` David Ahern
2015-03-24  1:01                                   ` David Ahern
2015-03-24 14:57                               ` Bob Picco
2015-03-24 16:05                                 ` David Miller
2015-03-22 23:49                   ` Linus Torvalds
2015-03-22 23:57                     ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).