* 4.0.0-rc4: panic in free_block @ 2015-03-20 15:07 David Ahern 2015-03-20 16:48 ` Linus Torvalds 2015-03-20 21:17 ` Linus Torvalds 0 siblings, 2 replies; 46+ messages in thread From: David Ahern @ 2015-03-20 15:07 UTC (permalink / raw) To: linux-mm; +Cc: LKML, Linus Torvalds I can easily reproduce the panic below doing a kernel build with make -j N, N=128, 256, etc. This is a 1024 cpu system running 4.0.0-rc4. The top 3 frames are consistently: free_block+0x60 cache_flusharray+0xac kmem_cache_free+0xfc After that one path has been from __mmdrop and the others are like below, from remove_vma. Unable to handle kernel paging request at virtual address 0006100000000000 tsk->{mm,active_mm}->context = 00000000000000ce tsk->{mm,active_mm}->pgd = fff8803b56698000 \|/ ____ \|/ "@'/ .. \`@" /_| \__/ |_\ \__U_/ sh(173167): Oops [#1] CPU: 760 PID: 173167 Comm: sh Not tainted 4.0.0-rc4+ #1 task: fff8803b4e928b00 ti: fff8803b51800000 task.ti: fff8803b51800000 TSTATE: 0000009911e01601 TPC: 000000000055de88 TNPC: 000000000055de8c Y: 000003f0 Not tainted TPC: <free_block+0x60/0x16c> g0: fff1ffef00000001 g1: fff8803b75826e00 g2: 0000000000100100 g3: 0000100000000000 g4: fff8803b4e928b00 g5: fff8803b76664000 g6: fff8803b51800000 g7: 0006000000000000 o0: 0000000000008000 o1: 0000000000000046 o2: 0000000000000023 o3: 00060100767db6a0 o4: 0000000000000000 o5: 0000000000000015 sp: fff8803b51802d11 ret_pc: fff8803b75826e08 RPC: <0xfff8803b75826e08> l0: 0000000000200200 l1: 0000000000c005e8 l2: 0000000000d7e2b8 l3: fff8803b3edb4000 l4: 0000000000000007 l5: 0000000000000001 l6: 00000000000000b1 l7: ffffffffffefffff i0: fff8000050409c60 i1: fff8803b7738f168 i2: 000000000000003c i3: fff8803b75826e28 i4: fff8803b51803670 i5: 0000000000000000 i6: fff8803b51802dc1 i7: 000000000055eaa4 I7: <cache_flusharray+0xac/0xf4> Call Trace: [000000000055eaa4] cache_flusharray+0xac/0xf4 [000000000055e66c] kmem_cache_free+0xfc/0x1ac [000000000054139c] remove_vma+0x68/0x78 [00000000005414ac] exit_mmap+0x100/0x130 [000000000045acb4] mmput+0x50/0xe8 [000000000056c284] flush_old_exec+0x500/0x5d8 [00000000005b0614] load_elf_binary+0x254/0xff4 [000000000056ba70] search_binary_handler+0xa4/0x28c [000000000056d068] do_execveat_common+0x44c/0x624 [000000000056d3e0] do_execve+0x34/0x48 [000000000056d40c] SyS_execve+0x18/0x2c [0000000000406254] linux_sparc_syscall+0x34/0x44 Disabling lock debugging due to kernel taint Caller[000000000055eaa4]: cache_flusharray+0xac/0xf4 Caller[000000000055e66c]: kmem_cache_free+0xfc/0x1ac Caller[000000000054139c]: remove_vma+0x68/0x78 Caller[00000000005414ac]: exit_mmap+0x100/0x130 Caller[000000000045acb4]: mmput+0x50/0xe8 Caller[000000000056c284]: flush_old_exec+0x500/0x5d8 Caller[00000000005b0614]: load_elf_binary+0x254/0xff4 Caller[000000000056ba70]: search_binary_handler+0xa4/0x28c Caller[000000000056d068]: do_execveat_common+0x44c/0x624 Caller[000000000056d3e0]: do_execve+0x34/0x48 Caller[000000000056d40c]: SyS_execve+0x18/0x2c Caller[0000000000406254]: linux_sparc_syscall+0x34/0x44 Caller[fff80001004134c8]: 0xfff80001004134c8 Instruction DUMP: 86230003 8730f00d 8728f006 <d658c007> 8600c007 8e0ac008 2ac1c002 c658e030 d458e028 #### objdump for free_block on the vmlinux: vmlinux-4.0.0-rc4+: file format elf64-sparc Disassembly of section .text: 000000000055de28 <free_block>: free_block(): ... free_block(): /opt/dahern/linux.git/kbuild/../mm/slab.c:3265 55de64: 10 68 00 47 b %xcc, 55df80 <free_block+0x158> 55de68: 85 30 b0 02 srlx %g2, 2, %g2 clear_obj_pfmemalloc(): /opt/dahern/linux.git/kbuild/../mm/slab.c:224 55de6c: 98 0b 3f fe and %o4, -2, %o4 55de70: d8 76 40 00 stx %o4, [ %i1 ] virt_to_head_page(): /opt/dahern/linux.git/kbuild/../include/linux/mm.h:554 55de74: c6 5c 80 00 ldx [ %l2 ], %g3 55de78: ce 5c 40 00 ldx [ %l1 ], %g7 55de7c: 86 23 00 03 sub %o4, %g3, %g3 55de80: 87 30 f0 0d srlx %g3, 0xd, %g3 55de84: 87 28 f0 06 sllx %g3, 6, %g3 test_bit(): /opt/dahern/linux.git/kbuild/../include/asm-generic/bitops/non-atomic.h:105 55de88: d6 58 c0 07 ldx [ %g3 + %g7 ], %o3 virt_to_head_page(): /opt/dahern/linux.git/kbuild/../include/linux/mm.h:554 55de8c: 86 00 c0 07 add %g3, %g7, %g3 ... Let me know if you need anything else. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: 4.0.0-rc4: panic in free_block 2015-03-20 15:07 4.0.0-rc4: panic in free_block David Ahern @ 2015-03-20 16:48 ` Linus Torvalds 2015-03-20 16:53 ` David Ahern 2015-03-20 21:17 ` Linus Torvalds 1 sibling, 1 reply; 46+ messages in thread From: Linus Torvalds @ 2015-03-20 16:48 UTC (permalink / raw) To: David Ahern, David S. Miller; +Cc: linux-mm, LKML, sparclinux [ Added Davem and the sparc mailing list, since it happens on sparc and that just makes me suspicious ] On Fri, Mar 20, 2015 at 8:07 AM, David Ahern <david.ahern@oracle.com> wrote: > I can easily reproduce the panic below doing a kernel build with make -j N, > N=128, 256, etc. This is a 1024 cpu system running 4.0.0-rc4. 3.19 is fine? Because I dont' think I've seen any reports like this for others, and what stands out is sparc (and to a lesser degree "1024 cpus", which obviously gets a lot less testing) > The top 3 frames are consistently: > free_block+0x60 > cache_flusharray+0xac > kmem_cache_free+0xfc > > After that one path has been from __mmdrop and the others are like below, > from remove_vma. > > Unable to handle kernel paging request at virtual address 0006100000000000 One thing you *might* check is if the problem goes away if you select CONFIG_SLUB instead of CONFIG_SLAB. I'd really like to just get rid of SLAB. The whole "we have multiple different allocators" is a mess and causes test coverage issues. Apart from testing with CONFIG_SLUB, if 3.19 is ok and you seem to be able to "easily reproduce" this, the obvious thing to do is to try to bisect it. Linus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: 4.0.0-rc4: panic in free_block 2015-03-20 16:48 ` Linus Torvalds @ 2015-03-20 16:53 ` David Ahern 2015-03-20 16:58 ` Linus Torvalds 0 siblings, 1 reply; 46+ messages in thread From: David Ahern @ 2015-03-20 16:53 UTC (permalink / raw) To: Linus Torvalds, David S. Miller; +Cc: linux-mm, LKML, sparclinux On 3/20/15 10:48 AM, Linus Torvalds wrote: > [ Added Davem and the sparc mailing list, since it happens on sparc > and that just makes me suspicious ] > > On Fri, Mar 20, 2015 at 8:07 AM, David Ahern <david.ahern@oracle.com> wrote: >> I can easily reproduce the panic below doing a kernel build with make -j N, >> N=128, 256, etc. This is a 1024 cpu system running 4.0.0-rc4. > > 3.19 is fine? Because I dont' think I've seen any reports like this > for others, and what stands out is sparc (and to a lesser degree "1024 > cpus", which obviously gets a lot less testing) I haven't tried 3.19 yet. Just backed up to 3.18 and it shows the same problem. And I can reproduce the 4.0 crash in a 128 cpu ldom (VM). > >> The top 3 frames are consistently: >> free_block+0x60 >> cache_flusharray+0xac >> kmem_cache_free+0xfc >> >> After that one path has been from __mmdrop and the others are like below, >> from remove_vma. >> >> Unable to handle kernel paging request at virtual address 0006100000000000 > > One thing you *might* check is if the problem goes away if you select > CONFIG_SLUB instead of CONFIG_SLAB. I'd really like to just get rid of > SLAB. The whole "we have multiple different allocators" is a mess and > causes test coverage issues. > > Apart from testing with CONFIG_SLUB, if 3.19 is ok and you seem to be > able to "easily reproduce" this, the obvious thing to do is to try to > bisect it. I'll try SLUB. The ldom reboots 1000 times faster then resetting the h/w so a better chance of bisecting - if I can find a known good release. David -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: 4.0.0-rc4: panic in free_block 2015-03-20 16:53 ` David Ahern @ 2015-03-20 16:58 ` Linus Torvalds 2015-03-20 18:05 ` David Ahern ` (2 more replies) 0 siblings, 3 replies; 46+ messages in thread From: Linus Torvalds @ 2015-03-20 16:58 UTC (permalink / raw) To: David Ahern; +Cc: David S. Miller, linux-mm, LKML, sparclinux On Fri, Mar 20, 2015 at 9:53 AM, David Ahern <david.ahern@oracle.com> wrote: > > I haven't tried 3.19 yet. Just backed up to 3.18 and it shows the same > problem. And I can reproduce the 4.0 crash in a 128 cpu ldom (VM). Ok, so if 3.18 also has it, then trying 3.19 is pointless, this is obviously an old problem. Which makes it even more likely that it's sparc-related. 128 cpu's is still "unusual", of course, but by no means unheard of, and I'f have expected others to report it too if it was wasy to trigger on x86-64. That said, SLAB is probably also almost unheard of in high-CPU configurations, since slub has all the magical unlocked lists etc for scalability. So maybe it's a generic SLAB bug, and nobody with lots of CPU's is testing SLAB. Linus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: 4.0.0-rc4: panic in free_block 2015-03-20 16:58 ` Linus Torvalds @ 2015-03-20 18:05 ` David Ahern 2015-03-20 18:53 ` Linus Torvalds 2015-03-20 19:47 ` David Miller 2015-03-20 19:42 ` David Miller 2015-03-20 20:01 ` Dave Hansen 2 siblings, 2 replies; 46+ messages in thread From: David Ahern @ 2015-03-20 18:05 UTC (permalink / raw) To: Linus Torvalds, linux-mm; +Cc: David S. Miller, LKML, sparclinux On 3/20/15 10:58 AM, Linus Torvalds wrote: > That said, SLAB is probably also almost unheard of in high-CPU > configurations, since slub has all the magical unlocked lists etc for > scalability. So maybe it's a generic SLAB bug, and nobody with lots of > CPU's is testing SLAB. > Evidently, it is a well known problem internally that goes back to at least 2.6.39. To this point I have not paid attention to the allocators. At what point is SLUB considered stable for large systems? Is 2.6.39 stable? As for SLAB it is not clear if this is a sparc only problem. Perhaps the config should have a warning? It looks like SLAB is still the default for most arch. DaveM: do you mind if I submit a patch to change the default for sparc to SLUB? Now that the monster is unleashed, off to other problems... Thanks, David -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: 4.0.0-rc4: panic in free_block 2015-03-20 18:05 ` David Ahern @ 2015-03-20 18:53 ` Linus Torvalds 2015-03-20 19:04 ` David Ahern 2015-03-20 19:47 ` David Miller 1 sibling, 1 reply; 46+ messages in thread From: Linus Torvalds @ 2015-03-20 18:53 UTC (permalink / raw) To: David Ahern; +Cc: linux-mm, David S. Miller, LKML, sparclinux On Fri, Mar 20, 2015 at 11:05 AM, David Ahern <david.ahern@oracle.com> wrote: > > Evidently, it is a well known problem internally that goes back to at least > 2.6.39. > > To this point I have not paid attention to the allocators. At what point is > SLUB considered stable for large systems? Is 2.6.39 stable? SLUB should definitely be considered a stable allocator. It's the default allocator for at least Fedora, and that presumably means all of Redhat. SuSE seems to use SLAB still, though, so it must be getting lots of testing on x86 too. Did you test with SLUB? Does it work there? > As for SLAB it is not clear if this is a sparc only problem. Perhaps the > config should have a warning? It looks like SLAB is still the default for > most arch. I definitely think SLAB should work (although I *would* like to get rid of the duplicate allocators), and I still do think it's likely a sparc issue. Especially as it apparently goes way back. x86 gets a *lot* more testing, and I don't really recall anything like this. I'm not comfy enough reading sparc asm code to really pinpoint exactly what looks to go wrong, so I have very little input on what the problem could be. Linus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: 4.0.0-rc4: panic in free_block 2015-03-20 18:53 ` Linus Torvalds @ 2015-03-20 19:04 ` David Ahern 0 siblings, 0 replies; 46+ messages in thread From: David Ahern @ 2015-03-20 19:04 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-mm, David S. Miller, LKML, sparclinux On 3/20/15 12:53 PM, Linus Torvalds wrote: > SLUB should definitely be considered a stable allocator. It's the > default allocator for at least Fedora, and that presumably means all > of Redhat. > > SuSE seems to use SLAB still, though, so it must be getting lots of > testing on x86 too. > > Did you test with SLUB? Does it work there? sorry, forgot to add that detail in the last response: it works fine. No panics at all with SLUB. David -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: 4.0.0-rc4: panic in free_block 2015-03-20 18:05 ` David Ahern 2015-03-20 18:53 ` Linus Torvalds @ 2015-03-20 19:47 ` David Miller 2015-03-20 19:54 ` David Ahern 1 sibling, 1 reply; 46+ messages in thread From: David Miller @ 2015-03-20 19:47 UTC (permalink / raw) To: david.ahern; +Cc: torvalds, linux-mm, linux-kernel, sparclinux From: David Ahern <david.ahern@oracle.com> Date: Fri, 20 Mar 2015 12:05:05 -0600 > DaveM: do you mind if I submit a patch to change the default for sparc > to SLUB? I think we're jumping the gun about all of this, and doing anything with default Kconfig settings would be entirely premature until we know what the real bug is. On my T4-2 I've used nothing but SLAB and haven't hit any of these problems. I can't even remember the last time I turned SLUB on, and it's just because I'm lazy. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: 4.0.0-rc4: panic in free_block 2015-03-20 19:47 ` David Miller @ 2015-03-20 19:54 ` David Ahern 2015-03-20 20:19 ` David Miller 0 siblings, 1 reply; 46+ messages in thread From: David Ahern @ 2015-03-20 19:54 UTC (permalink / raw) To: David Miller; +Cc: torvalds, linux-mm, linux-kernel, sparclinux On 3/20/15 1:47 PM, David Miller wrote: > From: David Ahern <david.ahern@oracle.com> > Date: Fri, 20 Mar 2015 12:05:05 -0600 > >> DaveM: do you mind if I submit a patch to change the default for sparc >> to SLUB? > > I think we're jumping the gun about all of this, and doing anything > with default Kconfig settings would be entirely premature until we > know what the real bug is. The suggestion to change to SLUB as the default was based on Linus' comment "SLAB is probably also almost unheard of in high-CPU configurations, since slub has all the magical unlocked lists etc for scalability." > > On my T4-2 I've used nothing but SLAB and haven't hit any of these > problems. I can't even remember the last time I turned SLUB on, > and it's just because I'm lazy. > Interesting. With -j <64 and talking softly it completes. But -j 128 and higher always ends in a panic. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: 4.0.0-rc4: panic in free_block 2015-03-20 19:54 ` David Ahern @ 2015-03-20 20:19 ` David Miller 0 siblings, 0 replies; 46+ messages in thread From: David Miller @ 2015-03-20 20:19 UTC (permalink / raw) To: david.ahern; +Cc: torvalds, linux-mm, linux-kernel, sparclinux From: David Ahern <david.ahern@oracle.com> Date: Fri, 20 Mar 2015 13:54:09 -0600 > Interesting. With -j <64 and talking softly it completes. But -j 128 > and higher always ends in a panic. Please share more details of your configuration. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: 4.0.0-rc4: panic in free_block 2015-03-20 16:58 ` Linus Torvalds 2015-03-20 18:05 ` David Ahern @ 2015-03-20 19:42 ` David Miller 2015-03-20 20:01 ` Dave Hansen 2 siblings, 0 replies; 46+ messages in thread From: David Miller @ 2015-03-20 19:42 UTC (permalink / raw) To: torvalds; +Cc: david.ahern, linux-mm, linux-kernel, sparclinux From: Linus Torvalds <torvalds@linux-foundation.org> Date: Fri, 20 Mar 2015 09:58:25 -0700 > 128 cpu's is still "unusual" As unusual as the system I do all of my kernel builds on :-) -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: 4.0.0-rc4: panic in free_block 2015-03-20 16:58 ` Linus Torvalds 2015-03-20 18:05 ` David Ahern 2015-03-20 19:42 ` David Miller @ 2015-03-20 20:01 ` Dave Hansen 2 siblings, 0 replies; 46+ messages in thread From: Dave Hansen @ 2015-03-20 20:01 UTC (permalink / raw) To: Linus Torvalds, David Ahern; +Cc: David S. Miller, linux-mm, LKML, sparclinux On 03/20/2015 09:58 AM, Linus Torvalds wrote: > 128 cpu's is still "unusual", of course, but by no means unheard of, > and I'f have expected others to report it too if it was wasy to > trigger on x86-64. FWIW, I configured a kernel with SLAB and kicked off a bunch of compiles on a 160-thread x86_64 system. It definitely doesn't die _quickly_. It's been running for an hour or two. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: 4.0.0-rc4: panic in free_block 2015-03-20 15:07 4.0.0-rc4: panic in free_block David Ahern 2015-03-20 16:48 ` Linus Torvalds @ 2015-03-20 21:17 ` Linus Torvalds 2015-03-20 22:49 ` David Ahern 1 sibling, 1 reply; 46+ messages in thread From: Linus Torvalds @ 2015-03-20 21:17 UTC (permalink / raw) To: David Ahern; +Cc: linux-mm, LKML On Fri, Mar 20, 2015 at 8:07 AM, David Ahern <david.ahern@oracle.com> wrote: > Instruction DUMP: 86230003 8730f00d 8728f006 <d658c007> 8600c007 8e0ac008 2ac1c002 c658e030 d458e028 Ok, so it's d658c007 that faults, which is that ldx [ %g3 + %g7 ], %o3 instruction. Looking at your objdump: > free_block(): > /opt/dahern/linux.git/kbuild/../mm/slab.c:3265 > 55de64: 10 68 00 47 b %xcc, 55df80 <free_block+0x158> > 55de68: 85 30 b0 02 srlx %g2, 2, %g2 > clear_obj_pfmemalloc(): > /opt/dahern/linux.git/kbuild/../mm/slab.c:224 > 55de6c: 98 0b 3f fe and %o4, -2, %o4 > 55de70: d8 76 40 00 stx %o4, [ %i1 ] > virt_to_head_page(): > /opt/dahern/linux.git/kbuild/../include/linux/mm.h:554 > 55de74: c6 5c 80 00 ldx [ %l2 ], %g3 > 55de78: ce 5c 40 00 ldx [ %l1 ], %g7 > 55de7c: 86 23 00 03 sub %o4, %g3, %g3 > 55de80: 87 30 f0 0d srlx %g3, 0xd, %g3 > 55de84: 87 28 f0 06 sllx %g3, 6, %g3 > test_bit(): > /opt/dahern/linux.git/kbuild/../include/asm-generic/bitops/non-atomic.h:105 > 55de88: d6 58 c0 07 ldx [ %g3 + %g7 ], %o3 > virt_to_head_page(): > /opt/dahern/linux.git/kbuild/../include/linux/mm.h:554 > 55de8c: 86 00 c0 07 add %g3, %g7, %g3 I think that's the load of "page->flags" which is almost certainly part of that initial page = virt_to_head_page(objp); at the top of the loop in free_block(). In fact, it's probably from compound_head_fast() which does if (unlikely(PageTail(page))) if (likely(__get_page_tail(page))) return; so I think it's about to test the PG_tail bit. So it's the virt_to_page(x) that has returned 0006100000000000. For sparc64, that's #define virt_to_page(kaddr) pfn_to_page(__pa(kaddr)>>PAGE_SHIFT) Looking at the code generation, I think %g7 (0x0006000000000000) is VMEMMAP_BASE, and %g3 is "pfn << 6", where the "<< 6" is because a "struct page" is 64 bytes. And looking at that sub %o4, %g3, %g3 I think that's "__pa(x)", so I think %o4 is 'x'. That also matches the "and %o4, -2, %o4", which would be the clear_obj_pfmemalloc(). And %o4 is 0. In other words, if I read that sparc asm right (and it is very likely that I do *not*), then "objp" is NULL, and that's why you crash. That's odd, because we know that objp cannot be NULL in kmem_slab_free() (even if we allowed it, like with kfree(), remove_vma() cannot possibly have a NULL vma, since ti dereferences it multiple times). So I must be misreading this completely. Somebody with better sparc debugging mojo should double-check my logic. How would objp be NULL? Linus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: 4.0.0-rc4: panic in free_block 2015-03-20 21:17 ` Linus Torvalds @ 2015-03-20 22:49 ` David Ahern 2015-03-21 0:18 ` David Ahern 0 siblings, 1 reply; 46+ messages in thread From: David Ahern @ 2015-03-20 22:49 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-mm, LKML On 3/20/15 3:17 PM, Linus Torvalds wrote: > In other words, if I read that sparc asm right (and it is very likely > that I do *not*), then "objp" is NULL, and that's why you crash. That does appear to be why. I put a WARN_ON before clear_obj_pfmemalloc() if objpp[i] is NULL. I got 2 splats during an 'allyesconfig' build and the system stayed up. > > That's odd, because we know that objp cannot be NULL in > kmem_slab_free() (even if we allowed it, like with kfree(), > remove_vma() cannot possibly have a NULL vma, since ti dereferences it > multiple times). > > So I must be misreading this completely. Somebody with better sparc > debugging mojo should double-check my logic. How would objp be NULL? I'll add checks to higher layers and see if it reveals anything. I did ask around and apparently this bug is hit only with the new M7 processors. DaveM: that's why you are not hitting this. David -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: 4.0.0-rc4: panic in free_block 2015-03-20 22:49 ` David Ahern @ 2015-03-21 0:18 ` David Ahern 2015-03-21 0:34 ` David Rientjes 2015-03-21 0:47 ` Linus Torvalds 0 siblings, 2 replies; 46+ messages in thread From: David Ahern @ 2015-03-21 0:18 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-mm, LKML On 3/20/15 4:49 PM, David Ahern wrote: > On 3/20/15 3:17 PM, Linus Torvalds wrote: >> In other words, if I read that sparc asm right (and it is very likely >> that I do *not*), then "objp" is NULL, and that's why you crash. > > That does appear to be why. I put a WARN_ON before > clear_obj_pfmemalloc() if objpp[i] is NULL. I got 2 splats during an > 'allyesconfig' build and the system stayed up. > >> >> That's odd, because we know that objp cannot be NULL in >> kmem_slab_free() (even if we allowed it, like with kfree(), >> remove_vma() cannot possibly have a NULL vma, since ti dereferences it >> multiple times). >> >> So I must be misreading this completely. Somebody with better sparc >> debugging mojo should double-check my logic. How would objp be NULL? > > I'll add checks to higher layers and see if it reveals anything. > > I did ask around and apparently this bug is hit only with the new M7 > processors. DaveM: that's why you are not hitting this. Here's another data point: If I disable NUMA I don't see the problem. Performance drops, but no NULL pointer splats which would have been panics. The 128 cpu ldom with NUMA enabled shows the problem every single time I do a kernel compile (-j 128). With NUMA disabled I have done 3 allyesconfig compiles without hitting the problem. I'll put the compiles into a loop while I head out for dinner. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: 4.0.0-rc4: panic in free_block 2015-03-21 0:18 ` David Ahern @ 2015-03-21 0:34 ` David Rientjes 2015-03-21 0:39 ` David Ahern 2015-03-21 0:47 ` Linus Torvalds 1 sibling, 1 reply; 46+ messages in thread From: David Rientjes @ 2015-03-21 0:34 UTC (permalink / raw) To: David Ahern; +Cc: Linus Torvalds, linux-mm, LKML On Fri, 20 Mar 2015, David Ahern wrote: > Here's another data point: If I disable NUMA I don't see the problem. > Performance drops, but no NULL pointer splats which would have been panics. > > The 128 cpu ldom with NUMA enabled shows the problem every single time I do a > kernel compile (-j 128). With NUMA disabled I have done 3 allyesconfig > compiles without hitting the problem. I'll put the compiles into a loop while > I head out for dinner. > It might be helpful to enable CONFIG_DEBUG_SLAB if you're reproducing it. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: 4.0.0-rc4: panic in free_block 2015-03-21 0:34 ` David Rientjes @ 2015-03-21 0:39 ` David Ahern 0 siblings, 0 replies; 46+ messages in thread From: David Ahern @ 2015-03-21 0:39 UTC (permalink / raw) To: David Rientjes; +Cc: Linus Torvalds, linux-mm, LKML On 3/20/15 6:34 PM, David Rientjes wrote: > On Fri, 20 Mar 2015, David Ahern wrote: > >> Here's another data point: If I disable NUMA I don't see the problem. >> Performance drops, but no NULL pointer splats which would have been panics. >> >> The 128 cpu ldom with NUMA enabled shows the problem every single time I do a >> kernel compile (-j 128). With NUMA disabled I have done 3 allyesconfig >> compiles without hitting the problem. I'll put the compiles into a loop while >> I head out for dinner. >> > > It might be helpful to enable CONFIG_DEBUG_SLAB if you're reproducing it. > I enabled that and a few other DEBUG configs earlier -- nothing popped out. I'll do it again -- and others when I return. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: 4.0.0-rc4: panic in free_block 2015-03-21 0:18 ` David Ahern 2015-03-21 0:34 ` David Rientjes @ 2015-03-21 0:47 ` Linus Torvalds 2015-03-21 17:45 ` David Ahern 1 sibling, 1 reply; 46+ messages in thread From: Linus Torvalds @ 2015-03-21 0:47 UTC (permalink / raw) To: David Ahern; +Cc: linux-mm, LKML On Fri, Mar 20, 2015 at 5:18 PM, David Ahern <david.ahern@oracle.com> wrote: > On 3/20/15 4:49 PM, David Ahern wrote: >> >> I did ask around and apparently this bug is hit only with the new M7 >> processors. DaveM: that's why you are not hitting this. Quite frankly, this smells even more like an architecture bug. It could be anywhere: it could be a CPU memory ordering issue, a compiler bug, or a missing barrier or other thing. How confident are you in the M7 memory ordering rules? It's a fairly new core, no? With new speculative reads etc? Maybe the Linux spinlocks don't have the right serialization, and more aggressive reordering in the new core shows a bug? Looking at this code, if this is a race, I see a few things that are worth checking out - it does a very much overlapping "memmove()". The sparc/lib/memmove.S file looks suspiciously bad (is that a byte-at-a-time loop? Is it even correctly checking overlap?) - it relies on both percpu data and a spinlock. I'm sure the sparc spinlock code has been tested *extensively* with old cores, but maybe some new speculative read ends up breaking them? I'm assuming M7 still TSO and 'ldsub' has acquire semantics? Is it configurable like some sparc versions? I'm wondering whether the Solaris locks might have some extra memory barriers due to supporting the other (weaker) sparc memory models, and maybe they hid some M7 "feature" by mistake... *Some* of the sparc memcpy routines have odd membar's in them. Why would a TSO machine need a memory barrier inside a memcpy. That just makes me go "Ehh?" > Here's another data point: If I disable NUMA I don't see the problem. > Performance drops, but no NULL pointer splats which would have been panics. So the NUMA case triggers the per-node "n->shared" logic, which *should* be protected by "n->list_lock". Maybe there is some bug there - but since that code seems to do ok on x86-64 (and apparently older sparc too), I really would look at arch-specific issues first. Linus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: 4.0.0-rc4: panic in free_block 2015-03-21 0:47 ` Linus Torvalds @ 2015-03-21 17:45 ` David Ahern 2015-03-21 18:49 ` Linus Torvalds 0 siblings, 1 reply; 46+ messages in thread From: David Ahern @ 2015-03-21 17:45 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-mm, LKML On 3/20/15 6:47 PM, Linus Torvalds wrote: > >> Here's another data point: If I disable NUMA I don't see the problem. >> Performance drops, but no NULL pointer splats which would have been panics. > > So the NUMA case triggers the per-node "n->shared" logic, which > *should* be protected by "n->list_lock". Maybe there is some bug there > - but since that code seems to do ok on x86-64 (and apparently older > sparc too), I really would look at arch-specific issues first. You raise a lot of valid questions and something to look into. But if the root cause were such a fundamental issue (CPU memory ordering, compiler bug, etc) why would it only occur on this one code path -- free with SLAB and NUMA -- and so consistently? Continuing to poke around, but open to any suggestions. I have enabled every DEBUG I can find in the memory code and nothing is popping out. In terms of races wouldn't all the DEBUG checks affect timing? Yet, I am still seeing the same stack traces due to the same root cause. David -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: 4.0.0-rc4: panic in free_block 2015-03-21 17:45 ` David Ahern @ 2015-03-21 18:49 ` Linus Torvalds 2015-03-22 17:36 ` David Miller 0 siblings, 1 reply; 46+ messages in thread From: Linus Torvalds @ 2015-03-21 18:49 UTC (permalink / raw) To: David Ahern, David Miller, sparclinux; +Cc: linux-mm, LKML On Sat, Mar 21, 2015 at 10:45 AM, David Ahern <david.ahern@oracle.com> wrote: > > You raise a lot of valid questions and something to look into. But if the > root cause were such a fundamental issue (CPU memory ordering, compiler bug, > etc) why would it only occur on this one code path -- free with SLAB and > NUMA -- and so consistently? So the consistency could easily come from a compiler bug (or a missing barrier in the kernel code) that just happens to trigger in a single place (or in a few places, but then that's the only place that gets exercised heavily enough to show it). I agree that an actual hardware bug is unlikely, although that too is possible: I can pretty much guarantee that if it were a CPU bug, it wouldn't be some "memory ordering is entirely broken" bug in general, it would be some very specific case that only happens with just the right instruction timing and mix. That said, while I bring up a CPU bug as a possibility, I really do agree that it is *very* unlikely. Memory ordering is hard, and yes, you can get it wrong, but at the same time CPU designers very much know about it and tend to be pretty damn good about it. And as you say, it generally wouldn't be *that* consistent. It might be consistent for one particular kernel build (due to very particular instruction mix and timings), but over lots of versions of the code and many different debug options? Very very very unlikely. > Continuing to poke around, but open to any suggestions. I have enabled every > DEBUG I can find in the memory code and nothing is popping out. In terms of > races wouldn't all the DEBUG checks affect timing? Yet, I am still seeing > the same stack traces due to the same root cause. Yes, generally debug options would change timings sufficiently that any particular low-level race would certainly go away or at least become much harder to hit. So if you have enabled spinlock debugging etc, I don't really believe in a hw bug. It's more likely that there is some kernel architecture-specific code that triggers it. Or even generic code that just happens to work on other cases due to random issues (ie memory alignment etc). I *would* suggest looking at that "memmove()" code. It really looks like crap. It seems to do things byte-at-a-time for the overlapping case, and the code seems to depend on memcpy always doing things low-to-high, but there are multiple different memcpy implementations so I don't know that that is always true. If one of the memcpy functions sometimes copies the other way depending on size etc, it could screw up. Basically, that sparc64 memmove() implementation looks like it was written by a dyslexic 5-year-old as a throw-away hack, and then never got fixed. Davem? I don't read sparc assembly, so I'm *really* not going to try to verify that (a) all the memcpy implementations always copy low-to-high and (b) that I even read the address comparisons in memmove.S right. I mention memmove just because it's actually fairly unusual for the kernel. At the same time, if it really is broken for overlapping regions, I'd expect *some* other places to show breakage too. So it's probably fine, even if it does look very very bad to do things one byte at a time backwards as a fallback. Linus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: 4.0.0-rc4: panic in free_block 2015-03-21 18:49 ` Linus Torvalds @ 2015-03-22 17:36 ` David Miller 2015-03-22 19:25 ` Bob Picco 2015-03-22 19:47 ` Linus Torvalds 0 siblings, 2 replies; 46+ messages in thread From: David Miller @ 2015-03-22 17:36 UTC (permalink / raw) To: torvalds; +Cc: david.ahern, sparclinux, linux-mm, linux-kernel From: Linus Torvalds <torvalds@linux-foundation.org> Date: Sat, 21 Mar 2015 11:49:12 -0700 > Davem? I don't read sparc assembly, so I'm *really* not going to try > to verify that (a) all the memcpy implementations always copy > low-to-high and (b) that I even read the address comparisons in > memmove.S right. All of the sparc memcpy implementations copy from low to high. I'll eat my hat if they don't. :-) The guard tests at the beginning of memmove() are saying: if (dst <= src) memcpy(...); if (src + len <= dst) memcpy(...); And then the reverse copy loop (and we do have to copy in reverse for correctness) is basically: src = (src + len - 1); dst = (dst + len - 1); 1: tmp = *(u8 *)src; len -= 1; src -= 1; *(u8 *)dst = tmp; dst -= 1; if (len != 0) goto 1b; And then we return the original 'dst' pointer. So at first glance it looks at least correct. memmove() is a good idea to look into though, as SLAB and SLUB are the only really heavy users of it, and they do so with overlapping contents. And they end up using that byte-at-a-time code, since SLAB and SLUB do mmemove() calls of the form: memmove(X + N, X, LEN); In which case neither of the memcpy() guard tests will pass. Maybe there is some subtle bug in there I just don't see right now. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: 4.0.0-rc4: panic in free_block 2015-03-22 17:36 ` David Miller @ 2015-03-22 19:25 ` Bob Picco 2015-03-22 19:47 ` Linus Torvalds 1 sibling, 0 replies; 46+ messages in thread From: Bob Picco @ 2015-03-22 19:25 UTC (permalink / raw) To: David Miller; +Cc: torvalds, david.ahern, sparclinux, linux-mm, linux-kernel David Miller wrote: [Sun Mar 22 2015, 01:36:03PM EDT] > From: Linus Torvalds <torvalds@linux-foundation.org> > Date: Sat, 21 Mar 2015 11:49:12 -0700 > > > Davem? I don't read sparc assembly, so I'm *really* not going to try > > to verify that (a) all the memcpy implementations always copy > > low-to-high and (b) that I even read the address comparisons in > > memmove.S right. > > All of the sparc memcpy implementations copy from low to high. > I'll eat my hat if they don't. :-) > > The guard tests at the beginning of memmove() are saying: > > if (dst <= src) > memcpy(...); > if (src + len <= dst) > memcpy(...); > > And then the reverse copy loop (and we do have to copy in reverse for > correctness) is basically: > > src = (src + len - 1); > dst = (dst + len - 1); > > 1: tmp = *(u8 *)src; > len -= 1; > src -= 1; > *(u8 *)dst = tmp; > dst -= 1; > if (len != 0) > goto 1b; > > And then we return the original 'dst' pointer. > > So at first glance it looks at least correct. > > memmove() is a good idea to look into though, as SLAB and SLUB are the > only really heavy users of it, and they do so with overlapping > contents. > > And they end up using that byte-at-a-time code, since SLAB and SLUB > do mmemove() calls of the form: > > memmove(X + N, X, LEN); > > In which case neither of the memcpy() guard tests will pass. > > Maybe there is some subtle bug in there I just don't see right now. My original pursuit of this issue focused on transfers to and from the shared array. Basically substituting memcpy-s with a primitive unsigned long memory mover. This might have been incorrect. There were substantial doubts because of large modifications to 2.6.39 too. Unstabile hardware cause(d|s) issue too. Eliminating the shared array functions correctly. Though this removal changes performance and timing dramatically. This afternoon I included modification of two memmove-s and no issue thus far. The issue APPEARS to come from memmove-s within cache_flusharray() and/or drain_array(). Now we are covering moves within an array_cache. The above was done on 2.6.39. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: 4.0.0-rc4: panic in free_block 2015-03-22 17:36 ` David Miller 2015-03-22 19:25 ` Bob Picco @ 2015-03-22 19:47 ` Linus Torvalds 2015-03-22 22:23 ` David Miller 1 sibling, 1 reply; 46+ messages in thread From: Linus Torvalds @ 2015-03-22 19:47 UTC (permalink / raw) To: David Miller; +Cc: David Ahern, sparclinux, linux-mm, Linux Kernel Mailing List On Sun, Mar 22, 2015 at 10:36 AM, David Miller <davem@davemloft.net> wrote: > > And they end up using that byte-at-a-time code, since SLAB and SLUB > do mmemove() calls of the form: > > memmove(X + N, X, LEN); Actually, the common case in slab is overlapping but of the form memmove(p, p+x, len); which goes to memcpy. It's basically re-compacting the array at the beginning. Which was why I was asking how sure you are that memcpy *always* copies from low to high. I don't even know which version of memcpy ends up being used on M7. Some of them do things like use VIS. I can follow some regular sparc asm, there's no way I'm even *looking* at that. Is it really ok to use VIS registers in random contexts? Linus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: 4.0.0-rc4: panic in free_block 2015-03-22 19:47 ` Linus Torvalds @ 2015-03-22 22:23 ` David Miller 2015-03-22 23:35 ` David Ahern 2015-03-22 23:49 ` Linus Torvalds 0 siblings, 2 replies; 46+ messages in thread From: David Miller @ 2015-03-22 22:23 UTC (permalink / raw) To: torvalds; +Cc: david.ahern, sparclinux, linux-mm, linux-kernel From: Linus Torvalds <torvalds@linux-foundation.org> Date: Sun, 22 Mar 2015 12:47:08 -0700 > Which was why I was asking how sure you are that memcpy *always* > copies from low to high. Yeah I'm pretty sure. > I don't even know which version of memcpy ends up being used on M7. > Some of them do things like use VIS. I can follow some regular sparc > asm, there's no way I'm even *looking* at that. Is it really ok to use > VIS registers in random contexts? Yes, using VIS how we do is alright, and in fact I did an audit of this about 1 year ago. This is another one of those "if this is wrong, so much stuff would break" The only thing funny some of these routines do is fetch 2 64-byte blocks of data ahead in the inner loops, but that should be fine right? On the M7 we'll use the Niagara-4 memcpy. Hmmm... I'll run this silly sparc kernel memmove through the glibc testsuite and see if it barfs. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: 4.0.0-rc4: panic in free_block 2015-03-22 22:23 ` David Miller @ 2015-03-22 23:35 ` David Ahern 2015-03-22 23:54 ` David Miller 2015-03-22 23:49 ` Linus Torvalds 1 sibling, 1 reply; 46+ messages in thread From: David Ahern @ 2015-03-22 23:35 UTC (permalink / raw) To: David Miller, torvalds; +Cc: sparclinux, linux-mm, linux-kernel, Bob Picco On 3/22/15 4:23 PM, David Miller wrote: >> I don't even know which version of memcpy ends up being used on M7. >> Some of them do things like use VIS. I can follow some regular sparc >> asm, there's no way I'm even *looking* at that. Is it really ok to use >> VIS registers in random contexts? > > Yes, using VIS how we do is alright, and in fact I did an audit of > this about 1 year ago. This is another one of those "if this is > wrong, so much stuff would break" > > The only thing funny some of these routines do is fetch 2 64-byte > blocks of data ahead in the inner loops, but that should be fine > right? > > On the M7 we'll use the Niagara-4 memcpy. > > Hmmm... I'll run this silly sparc kernel memmove through the glibc > testsuite and see if it barfs. > I don't know if you caught Bob's message; he has a hack to bypass memcpy and memmove in mm/slab.c use a for loop to move entries. With the hack he is not seeing the problem. This is the hack: +static void move_entries(void *dest, void *src, int nr) +{ + unsigned long *dp = dest; + unsigned long *sp = src; + + for (; nr; nr--, dp++, sp++) + *dp = *sp; +} + and then replace the mempy and memmove calls in transfer_objects, cache_flusharray and drain_array to use move_entries. I just put it on 4.0.0-rc4 and ditto -- problem goes away, so it clearly suggests the memcpy or memmove are the root cause. David -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: 4.0.0-rc4: panic in free_block 2015-03-22 23:35 ` David Ahern @ 2015-03-22 23:54 ` David Miller 2015-03-23 0:03 ` David Ahern 0 siblings, 1 reply; 46+ messages in thread From: David Miller @ 2015-03-22 23:54 UTC (permalink / raw) To: david.ahern; +Cc: torvalds, sparclinux, linux-mm, linux-kernel, bpicco From: David Ahern <david.ahern@oracle.com> Date: Sun, 22 Mar 2015 17:35:49 -0600 > I don't know if you caught Bob's message; he has a hack to bypass > memcpy and memmove in mm/slab.c use a for loop to move entries. With > the hack he is not seeing the problem. > > This is the hack: > > +static void move_entries(void *dest, void *src, int nr) > +{ > + unsigned long *dp = dest; > + unsigned long *sp = src; > + > + for (; nr; nr--, dp++, sp++) > + *dp = *sp; > +} > + > > and then replace the mempy and memmove calls in transfer_objects, > cache_flusharray and drain_array to use move_entries. > > I just put it on 4.0.0-rc4 and ditto -- problem goes away, so it > clearly suggests the memcpy or memmove are the root cause. Thanks, didn't notice that. So, something is amuck. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: 4.0.0-rc4: panic in free_block 2015-03-22 23:54 ` David Miller @ 2015-03-23 0:03 ` David Ahern 2015-03-23 2:00 ` David Miller 0 siblings, 1 reply; 46+ messages in thread From: David Ahern @ 2015-03-23 0:03 UTC (permalink / raw) To: David Miller; +Cc: torvalds, sparclinux, linux-mm, linux-kernel, bpicco On 3/22/15 5:54 PM, David Miller wrote: >> I just put it on 4.0.0-rc4 and ditto -- problem goes away, so it >> clearly suggests the memcpy or memmove are the root cause. > > Thanks, didn't notice that. > > So, something is amuck. to continue to refine the problem ... I modified only the memmove lines (not the memcpy) and it works fine. So its the memmove. I'm sure this will get whitespaced damaged on the copy and paste but to be clear this is the patch I am currently running and system is stable. On Friday it failed on every single; with this patch I have allyesconfig builds with -j 128 in a loop (clean in between) and nothing -- no panics. diff --git a/mm/slab.c b/mm/slab.c index c4b89ea..f5e9716 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -802,6 +802,16 @@ static inline void ac_put_obj(struct kmem_cache *cachep, struct array_cache *ac, ac->entry[ac->avail++] = objp; } +static void move_entries(void *dest, void *src, int nr) +{ + unsigned long *dp = dest; + unsigned long *sp = src; + + for (; nr; nr--, dp++, sp++) + *dp = *sp; +} + + /* * Transfer objects in one arraycache to another. * Locking must be handled by the caller. @@ -3344,7 +3354,7 @@ free_done: spin_unlock(&n->list_lock); slabs_destroy(cachep, &list); ac->avail -= batchcount; - memmove(ac->entry, &(ac->entry[batchcount]), sizeof(void *)*ac->avail); + move_entries(ac->entry, &(ac->entry[batchcount]), ac->avail); } /* @@ -3817,8 +3827,7 @@ static void drain_array(struct kmem_cache *cachep, struct kmem_cache_node *n, tofree = (ac->avail + 1) / 2; free_block(cachep, ac->entry, tofree, node, &list); ac->avail -= tofree; - memmove(ac->entry, &(ac->entry[tofree]), - sizeof(void *) * ac->avail); + move_entries(ac->entry, &(ac->entry[tofree]), ac->avail); } spin_unlock_irq(&n->list_lock); slabs_destroy(cachep, &list); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 46+ messages in thread
* Re: 4.0.0-rc4: panic in free_block 2015-03-23 0:03 ` David Ahern @ 2015-03-23 2:00 ` David Miller 2015-03-23 2:19 ` David Miller 0 siblings, 1 reply; 46+ messages in thread From: David Miller @ 2015-03-23 2:00 UTC (permalink / raw) To: david.ahern; +Cc: torvalds, sparclinux, linux-mm, linux-kernel, bpicco From: David Ahern <david.ahern@oracle.com> Date: Sun, 22 Mar 2015 18:03:30 -0600 > On 3/22/15 5:54 PM, David Miller wrote: >>> I just put it on 4.0.0-rc4 and ditto -- problem goes away, so it >>> clearly suggests the memcpy or memmove are the root cause. >> >> Thanks, didn't notice that. >> >> So, something is amuck. > > to continue to refine the problem ... I modified only the memmove > lines (not the memcpy) and it works fine. So its the memmove. > > I'm sure this will get whitespaced damaged on the copy and paste but > to be clear this is the patch I am currently running and system is > stable. On Friday it failed on every single; with this patch I have > allyesconfig builds with -j 128 in a loop (clean in between) and > nothing -- no panics. Can you just try calling memcpy(), that should work because I think we agree that if the memcpy() implementation copies from low to high it should work. I wonder if the triggering factor is configuring for a high number of cpus. I always have NR_CPUS=128 since that's the largest machine I have. I'll give NR_CPUS=1024 a spin. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: 4.0.0-rc4: panic in free_block 2015-03-23 2:00 ` David Miller @ 2015-03-23 2:19 ` David Miller 2015-03-23 16:25 ` David Miller 0 siblings, 1 reply; 46+ messages in thread From: David Miller @ 2015-03-23 2:19 UTC (permalink / raw) To: david.ahern; +Cc: torvalds, sparclinux, linux-mm, linux-kernel, bpicco Nevermind I think I figured out the problem. It's the cache initializing stores, we can't do overlapping copies where dst <= src in all cases because of them. A store to a address modulo the cache line size (which for these instructions is 64 bytes), clears that whole line. But when we're doing these memmove() calls in SLAB/SLUB, we can clear some bytes at the end of the line before they've been read in. And reading over NG4memcpy, this _can_ happen, the main unrolled loop begins like this: load src + 0x00 load src + 0x08 load src + 0x10 load src + 0x18 load src + 0x20 store dst + 0x00 Assume dst is 64 byte aligned and let's say that dst is src - 8 for this memcpy() call, right? That store at the end there is the one to the first line in the cache line, thus clearing the whole line, which thus clobbers "src + 0x28" before it even gets loaded. I'm pretty sure this is what's happening. And it's only going to trigger if the memcpy() is 128 bytes or larger. I'll work on a fix. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: 4.0.0-rc4: panic in free_block 2015-03-23 2:19 ` David Miller @ 2015-03-23 16:25 ` David Miller 2015-03-23 16:51 ` John Stoffel ` (3 more replies) 0 siblings, 4 replies; 46+ messages in thread From: David Miller @ 2015-03-23 16:25 UTC (permalink / raw) To: david.ahern; +Cc: torvalds, sparclinux, linux-mm, linux-kernel, bpicco From: David Miller <davem@davemloft.net> Date: Sun, 22 Mar 2015 22:19:06 -0400 (EDT) > I'll work on a fix. Ok, here is what I committed. David et al., let me know if you still see the crashes with this applied. Of course, I'll queue this up for -stable as well. Thanks! ==================== [PATCH] sparc64: Fix several bugs in memmove(). Firstly, handle zero length calls properly. Believe it or not there are a few of these happening during early boot. Next, we can't just drop to a memcpy() call in the forward copy case where dst <= src. The reason is that the cache initializing stores used in the Niagara memcpy() implementations can end up clearing out cache lines before we've sourced their original contents completely. For example, considering NG4memcpy, the main unrolled loop begins like this: load src + 0x00 load src + 0x08 load src + 0x10 load src + 0x18 load src + 0x20 store dst + 0x00 Assume dst is 64 byte aligned and let's say that dst is src - 8 for this memcpy() call. That store at the end there is the one to the first line in the cache line, thus clearing the whole line, which thus clobbers "src + 0x28" before it even gets loaded. To avoid this, just fall through to a simple copy only mildly optimized for the case where src and dst are 8 byte aligned and the length is a multiple of 8 as well. We could get fancy and call GENmemcpy() but this is good enough for how this thing is actually used. Reported-by: David Ahern <david.ahern@oracle.com> Reported-by: Bob Picco <bpicco@meloft.net> Signed-off-by: David S. Miller <davem@davemloft.net> --- arch/sparc/lib/memmove.S | 35 ++++++++++++++++++++++++++++++++--- 1 file changed, 32 insertions(+), 3 deletions(-) diff --git a/arch/sparc/lib/memmove.S b/arch/sparc/lib/memmove.S index b7f6334..857ad4f 100644 --- a/arch/sparc/lib/memmove.S +++ b/arch/sparc/lib/memmove.S @@ -8,9 +8,11 @@ .text ENTRY(memmove) /* o0=dst o1=src o2=len */ - mov %o0, %g1 + brz,pn %o2, 99f + mov %o0, %g1 + cmp %o0, %o1 - bleu,pt %xcc, memcpy + bleu,pt %xcc, 2f add %o1, %o2, %g7 cmp %g7, %o0 bleu,pt %xcc, memcpy @@ -24,7 +26,34 @@ ENTRY(memmove) /* o0=dst o1=src o2=len */ stb %g7, [%o0] bne,pt %icc, 1b sub %o0, 1, %o0 - +99: retl mov %g1, %o0 + + /* We can't just call memcpy for these memmove cases. On some + * chips the memcpy uses cache initializing stores and when dst + * and src are close enough, those can clobber the source data + * before we've loaded it in. + */ +2: or %o0, %o1, %g7 + or %o2, %g7, %g7 + andcc %g7, 0x7, %g0 + bne,pn %xcc, 4f + nop + +3: ldx [%o1], %g7 + add %o1, 8, %o1 + subcc %o2, 8, %o2 + add %o0, 8, %o0 + bne,pt %icc, 3b + stx %g7, [%o0 - 0x8] + ba,a,pt %xcc, 99b + +4: ldub [%o1], %g7 + add %o1, 1, %o1 + subcc %o2, 1, %o2 + add %o0, 1, %o0 + bne,pt %icc, 4b + stb %g7, [%o0 - 0x1] + ba,a,pt %xcc, 99b ENDPROC(memmove) -- 1.7.11.7 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 46+ messages in thread
* Re: 4.0.0-rc4: panic in free_block 2015-03-23 16:25 ` David Miller @ 2015-03-23 16:51 ` John Stoffel 2015-03-23 19:16 ` David Miller 2015-03-23 17:00 ` Linus Torvalds ` (2 subsequent siblings) 3 siblings, 1 reply; 46+ messages in thread From: John Stoffel @ 2015-03-23 16:51 UTC (permalink / raw) To: David Miller Cc: david.ahern, torvalds, sparclinux, linux-mm, linux-kernel, bpicco David> ==================== David> [PATCH] sparc64: Fix several bugs in memmove(). David> Firstly, handle zero length calls properly. Believe it or not there David> are a few of these happening during early boot. David> Next, we can't just drop to a memcpy() call in the forward copy case David> where dst <= src. The reason is that the cache initializing stores David> used in the Niagara memcpy() implementations can end up clearing out David> cache lines before we've sourced their original contents completely. David> For example, considering NG4memcpy, the main unrolled loop begins like David> this: David> load src + 0x00 David> load src + 0x08 David> load src + 0x10 David> load src + 0x18 David> load src + 0x20 David> store dst + 0x00 David> Assume dst is 64 byte aligned and let's say that dst is src - 8 for David> this memcpy() call. That store at the end there is the one to the David> first line in the cache line, thus clearing the whole line, which thus David> clobbers "src + 0x28" before it even gets loaded. David> To avoid this, just fall through to a simple copy only mildly David> optimized for the case where src and dst are 8 byte aligned and the David> length is a multiple of 8 as well. We could get fancy and call David> GENmemcpy() but this is good enough for how this thing is actually David> used. Would it make sense to have some memmove()/memcopy() tests on bootup to catch problems like this? I know this is a strange case, and probably not too common, but how hard would it be to wire up tests that go through 1 to 128 byte memmove() on bootup to make sure things work properly? This seems like one of those critical, but subtle things to be checked. And doing it only on bootup wouldn't slow anything down and would (ideally) automatically get us coverage when people add new archs or update the code. John -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: 4.0.0-rc4: panic in free_block 2015-03-23 16:51 ` John Stoffel @ 2015-03-23 19:16 ` David Miller 2015-03-23 19:56 ` John Stoffel 0 siblings, 1 reply; 46+ messages in thread From: David Miller @ 2015-03-23 19:16 UTC (permalink / raw) To: john; +Cc: david.ahern, torvalds, sparclinux, linux-mm, linux-kernel, bpicco From: "John Stoffel" <john@stoffel.org> Date: Mon, 23 Mar 2015 12:51:03 -0400 > Would it make sense to have some memmove()/memcopy() tests on bootup > to catch problems like this? I know this is a strange case, and > probably not too common, but how hard would it be to wire up tests > that go through 1 to 128 byte memmove() on bootup to make sure things > work properly? > > This seems like one of those critical, but subtle things to be > checked. And doing it only on bootup wouldn't slow anything down and > would (ideally) automatically get us coverage when people add new > archs or update the code. One of two things is already happening. There have been assembler memcpy/memset development test harnesses around that most arch developers are using, and those test things rather extensively. Also, the memcpy/memset routines on sparc in particular are completely shared with glibc, we use the same exact code in both trees. So it's getting tested there too. memmove() is just not handled this way. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: 4.0.0-rc4: panic in free_block 2015-03-23 19:16 ` David Miller @ 2015-03-23 19:56 ` John Stoffel 2015-03-23 20:08 ` David Miller 0 siblings, 1 reply; 46+ messages in thread From: John Stoffel @ 2015-03-23 19:56 UTC (permalink / raw) To: David Miller Cc: john, david.ahern, torvalds, sparclinux, linux-mm, linux-kernel, bpicco >>>>> "David" == David Miller <davem@davemloft.net> writes: David> From: "John Stoffel" <john@stoffel.org> David> Date: Mon, 23 Mar 2015 12:51:03 -0400 >> Would it make sense to have some memmove()/memcopy() tests on bootup >> to catch problems like this? I know this is a strange case, and >> probably not too common, but how hard would it be to wire up tests >> that go through 1 to 128 byte memmove() on bootup to make sure things >> work properly? >> >> This seems like one of those critical, but subtle things to be >> checked. And doing it only on bootup wouldn't slow anything down and >> would (ideally) automatically get us coverage when people add new >> archs or update the code. David> One of two things is already happening. David> There have been assembler memcpy/memset development test harnesses David> around that most arch developers are using, and those test things David> rather extensively. David> Also, the memcpy/memset routines on sparc in particular are completely David> shared with glibc, we use the same exact code in both trees. So it's David> getting tested there too. Thats' good to know. I wasn't sure. David> memmove() is just not handled this way. Bummers. So why isn't this covered by the glibc tests too? Not accusing, not at all! Just wondering. Thanks for all your work David, I've been amazed at your energy here! -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: 4.0.0-rc4: panic in free_block 2015-03-23 19:56 ` John Stoffel @ 2015-03-23 20:08 ` David Miller 0 siblings, 0 replies; 46+ messages in thread From: David Miller @ 2015-03-23 20:08 UTC (permalink / raw) To: john; +Cc: david.ahern, torvalds, sparclinux, linux-mm, linux-kernel, bpicco From: "John Stoffel" <john@stoffel.org> Date: Mon, 23 Mar 2015 15:56:02 -0400 >>>>>> "David" == David Miller <davem@davemloft.net> writes: > > David> From: "John Stoffel" <john@stoffel.org> > David> Date: Mon, 23 Mar 2015 12:51:03 -0400 > >>> Would it make sense to have some memmove()/memcopy() tests on bootup >>> to catch problems like this? I know this is a strange case, and >>> probably not too common, but how hard would it be to wire up tests >>> that go through 1 to 128 byte memmove() on bootup to make sure things >>> work properly? >>> >>> This seems like one of those critical, but subtle things to be >>> checked. And doing it only on bootup wouldn't slow anything down and >>> would (ideally) automatically get us coverage when people add new >>> archs or update the code. > > David> One of two things is already happening. > > David> There have been assembler memcpy/memset development test harnesses > David> around that most arch developers are using, and those test things > David> rather extensively. > > David> Also, the memcpy/memset routines on sparc in particular are completely > David> shared with glibc, we use the same exact code in both trees. So it's > David> getting tested there too. > > Thats' good to know. I wasn't sure. > > David> memmove() is just not handled this way. > > Bummers. So why isn't this covered by the glibc tests too? Because the kernel's memmove() is different from the one we use in glibc on sparc. In fact, we use the generic C version in glibc which expands to forward and backward word copies. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: 4.0.0-rc4: panic in free_block 2015-03-23 16:25 ` David Miller 2015-03-23 16:51 ` John Stoffel @ 2015-03-23 17:00 ` Linus Torvalds 2015-03-23 19:08 ` David Miller 2015-03-23 17:34 ` David Ahern 2015-03-24 14:57 ` Bob Picco 3 siblings, 1 reply; 46+ messages in thread From: Linus Torvalds @ 2015-03-23 17:00 UTC (permalink / raw) To: David Miller Cc: David Ahern, sparclinux, linux-mm, Linux Kernel Mailing List, bpicco On Mon, Mar 23, 2015 at 9:25 AM, David Miller <davem@davemloft.net> wrote: > > Ok, here is what I committed. So I wonder - looking at that assembly, I get the feeling that it isn't any better code than gcc could generate from simple C code. Would it perhaps be better to turn memmove() into C? That's particularly true because if I read this code right, it now seems to seriously pessimise non-overlapping memmove's, in that it now *always* uses that slow downward copy if the destination is below the source. Now, admittedly, the kernel doesn't use a lot of memmov's, but this still falls back on the "byte at a time" model for a lot of cases (all non-64-bit-aligned ones). I could imagine those existing. And some people (reasonably) hate memcpy because they've been burnt by the overlapping case and end up using memmove as a "safe alternative", so it's not necessarily just the overlapping case that might trigger this. Maybe the code could be something like void *memmove(void *dst, const void *src, size_t n); { // non-overlapping cases if (src + n <= dst) return memcpy(dst, src, n); if (dst + n <= src) return memcpy(dst, src, n); // overlapping, but we know we // (a) copy upwards // (b) initialize the result in at most chunks of 64 if (dst+64 <= src) return memcpy(dst, src, n); .. do the backwards thing .. } (ok, maybe I got it wrong, but you get the idea). I *think* gcc should do ok on the above kind of code, and not generate wildly different code from your handcoded version. Linus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: 4.0.0-rc4: panic in free_block 2015-03-23 17:00 ` Linus Torvalds @ 2015-03-23 19:08 ` David Miller 2015-03-23 19:47 ` Linus Torvalds 0 siblings, 1 reply; 46+ messages in thread From: David Miller @ 2015-03-23 19:08 UTC (permalink / raw) To: torvalds; +Cc: david.ahern, sparclinux, linux-mm, linux-kernel, bpicco From: Linus Torvalds <torvalds@linux-foundation.org> Date: Mon, 23 Mar 2015 10:00:02 -0700 > Maybe the code could be something like > > void *memmove(void *dst, const void *src, size_t n); > { > // non-overlapping cases > if (src + n <= dst) > return memcpy(dst, src, n); > if (dst + n <= src) > return memcpy(dst, src, n); > > // overlapping, but we know we > // (a) copy upwards > // (b) initialize the result in at most chunks of 64 > if (dst+64 <= src) > return memcpy(dst, src, n); > > .. do the backwards thing .. > } > > (ok, maybe I got it wrong, but you get the idea). > > I *think* gcc should do ok on the above kind of code, and not generate > wildly different code from your handcoded version. Sure you could do that in C, but I really want to avoid using memcpy() if dst and src overlap in any way at all. Said another way, I don't want to codify that "64" thing. The next chip could do 128 byte initializing stores. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: 4.0.0-rc4: panic in free_block 2015-03-23 19:08 ` David Miller @ 2015-03-23 19:47 ` Linus Torvalds 2015-03-23 19:52 ` David Miller 0 siblings, 1 reply; 46+ messages in thread From: Linus Torvalds @ 2015-03-23 19:47 UTC (permalink / raw) To: David Miller Cc: David Ahern, sparclinux, linux-mm, Linux Kernel Mailing List, bpicco On Mon, Mar 23, 2015 at 12:08 PM, David Miller <davem@davemloft.net> wrote: > > Sure you could do that in C, but I really want to avoid using memcpy() > if dst and src overlap in any way at all. > > Said another way, I don't want to codify that "64" thing. The next > chip could do 128 byte initializing stores. But David, THAT IS NOT WHAT YOUR BROKEN ASM DOES ANYWAY! Read it again. Your asm code does not check for overlap. Look at this: cmp %o0, %o1 bleu,pt %xcc, 2f and ponder. It's wrong. So even if you don't want to take that "allow overlap more than 64 bytes apart" thing, my C version actually is *better* than the broken asm version you have. The new asm version is better than the old one, because the new breakage is about really bad performance rather than actively breaking, but still.. Linus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: 4.0.0-rc4: panic in free_block 2015-03-23 19:47 ` Linus Torvalds @ 2015-03-23 19:52 ` David Miller 0 siblings, 0 replies; 46+ messages in thread From: David Miller @ 2015-03-23 19:52 UTC (permalink / raw) To: torvalds; +Cc: david.ahern, sparclinux, linux-mm, linux-kernel, bpicco From: Linus Torvalds <torvalds@linux-foundation.org> Date: Mon, 23 Mar 2015 12:47:49 -0700 > On Mon, Mar 23, 2015 at 12:08 PM, David Miller <davem@davemloft.net> wrote: >> >> Sure you could do that in C, but I really want to avoid using memcpy() >> if dst and src overlap in any way at all. >> >> Said another way, I don't want to codify that "64" thing. The next >> chip could do 128 byte initializing stores. > > But David, THAT IS NOT WHAT YOUR BROKEN ASM DOES ANYWAY! > > Read it again. Your asm code does not check for overlap. Look at this: > > cmp %o0, %o1 > bleu,pt %xcc, 2f > > and ponder. It's wrong. Right, it's not checking for overlap. It's checking for "does a forward copy work?" That's the standard test for this, and it's what glibc uses in it's generic memmove() implementation FWIW. (granted, I know glibc is not generally a good source for "right way to do things :-) > The new asm version is better than the old one, because the new > breakage is about really bad performance rather than actively > breaking, but still.. I accept that it's suboptimal. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: 4.0.0-rc4: panic in free_block 2015-03-23 16:25 ` David Miller 2015-03-23 16:51 ` John Stoffel 2015-03-23 17:00 ` Linus Torvalds @ 2015-03-23 17:34 ` David Ahern 2015-03-23 19:35 ` David Miller 2015-03-24 14:57 ` Bob Picco 3 siblings, 1 reply; 46+ messages in thread From: David Ahern @ 2015-03-23 17:34 UTC (permalink / raw) To: David Miller; +Cc: torvalds, sparclinux, linux-mm, linux-kernel, bpicco On 3/23/15 10:25 AM, David Miller wrote: > [PATCH] sparc64: Fix several bugs in memmove(). > > Firstly, handle zero length calls properly. Believe it or not there > are a few of these happening during early boot. > > Next, we can't just drop to a memcpy() call in the forward copy case > where dst <= src. The reason is that the cache initializing stores > used in the Niagara memcpy() implementations can end up clearing out > cache lines before we've sourced their original contents completely. > > For example, considering NG4memcpy, the main unrolled loop begins like > this: > > load src + 0x00 > load src + 0x08 > load src + 0x10 > load src + 0x18 > load src + 0x20 > store dst + 0x00 > > Assume dst is 64 byte aligned and let's say that dst is src - 8 for > this memcpy() call. That store at the end there is the one to the > first line in the cache line, thus clearing the whole line, which thus > clobbers "src + 0x28" before it even gets loaded. > > To avoid this, just fall through to a simple copy only mildly > optimized for the case where src and dst are 8 byte aligned and the > length is a multiple of 8 as well. We could get fancy and call > GENmemcpy() but this is good enough for how this thing is actually > used. > > Reported-by: David Ahern <david.ahern@oracle.com> > Reported-by: Bob Picco <bpicco@meloft.net> > Signed-off-by: David S. Miller <davem@davemloft.net> seems like a formality at this point, but this resolves the panic on the M7-based ldom and baremetal. The T5-8 failed to boot, but it could be a different problem. Thanks for the fast turnaround, David -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: 4.0.0-rc4: panic in free_block 2015-03-23 17:34 ` David Ahern @ 2015-03-23 19:35 ` David Miller 2015-03-23 19:58 ` David Ahern 2015-03-24 1:01 ` David Ahern 0 siblings, 2 replies; 46+ messages in thread From: David Miller @ 2015-03-23 19:35 UTC (permalink / raw) To: david.ahern; +Cc: torvalds, sparclinux, linux-mm, linux-kernel, bpicco From: David Ahern <david.ahern@oracle.com> Date: Mon, 23 Mar 2015 11:34:34 -0600 > seems like a formality at this point, but this resolves the panic on > the M7-based ldom and baremetal. The T5-8 failed to boot, but it could > be a different problem. Specifically, does the T5-8 boot without my patch applied? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: 4.0.0-rc4: panic in free_block 2015-03-23 19:35 ` David Miller @ 2015-03-23 19:58 ` David Ahern 2015-03-24 1:01 ` David Ahern 1 sibling, 0 replies; 46+ messages in thread From: David Ahern @ 2015-03-23 19:58 UTC (permalink / raw) To: David Miller; +Cc: torvalds, sparclinux, linux-mm, linux-kernel, bpicco On 3/23/15 1:35 PM, David Miller wrote: > From: David Ahern <david.ahern@oracle.com> > Date: Mon, 23 Mar 2015 11:34:34 -0600 > >> seems like a formality at this point, but this resolves the panic on >> the M7-based ldom and baremetal. The T5-8 failed to boot, but it could >> be a different problem. > > Specifically, does the T5-8 boot without my patch applied? > I am running around in circles with it... it takes 15 minutes after a hard reset to get logged in, and I forgot that the 2.6.39 can't handle -j 1024 either (task scheduler problem), and then I wasted time waiting for sandwich shop to learn how to use mobile app ordering, ... I'll respond as soon as I can. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: 4.0.0-rc4: panic in free_block 2015-03-23 19:35 ` David Miller 2015-03-23 19:58 ` David Ahern @ 2015-03-24 1:01 ` David Ahern 1 sibling, 0 replies; 46+ messages in thread From: David Ahern @ 2015-03-24 1:01 UTC (permalink / raw) To: David Miller; +Cc: torvalds, sparclinux, linux-mm, linux-kernel, bpicco On 3/23/15 1:35 PM, David Miller wrote: > From: David Ahern <david.ahern@oracle.com> > Date: Mon, 23 Mar 2015 11:34:34 -0600 > >> seems like a formality at this point, but this resolves the panic on >> the M7-based ldom and baremetal. The T5-8 failed to boot, but it could >> be a different problem. > > Specifically, does the T5-8 boot without my patch applied? > The T5-8 is having problems; has to be unrelated to this commit. T5-2 (256 cpus) boots fine, and make -j 256 on an allyesconfig builds fine. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: 4.0.0-rc4: panic in free_block 2015-03-23 16:25 ` David Miller ` (2 preceding siblings ...) 2015-03-23 17:34 ` David Ahern @ 2015-03-24 14:57 ` Bob Picco 2015-03-24 16:05 ` David Miller 3 siblings, 1 reply; 46+ messages in thread From: Bob Picco @ 2015-03-24 14:57 UTC (permalink / raw) To: David Miller Cc: david.ahern, torvalds, sparclinux, linux-mm, linux-kernel, bpicco David Miller wrote: [Mon Mar 23 2015, 12:25:30PM EDT] > From: David Miller <davem@davemloft.net> > Date: Sun, 22 Mar 2015 22:19:06 -0400 (EDT) > > > I'll work on a fix. > > Ok, here is what I committed. David et al., let me know if you still > see the crashes with this applied. > > Of course, I'll queue this up for -stable as well. > > Thanks! > > ==================== > [PATCH] sparc64: Fix several bugs in memmove(). > > Firstly, handle zero length calls properly. Believe it or not there > are a few of these happening during early boot. > > Next, we can't just drop to a memcpy() call in the forward copy case > where dst <= src. The reason is that the cache initializing stores > used in the Niagara memcpy() implementations can end up clearing out > cache lines before we've sourced their original contents completely. > > For example, considering NG4memcpy, the main unrolled loop begins like > this: > > load src + 0x00 > load src + 0x08 > load src + 0x10 > load src + 0x18 > load src + 0x20 > store dst + 0x00 > > Assume dst is 64 byte aligned and let's say that dst is src - 8 for > this memcpy() call. That store at the end there is the one to the > first line in the cache line, thus clearing the whole line, which thus > clobbers "src + 0x28" before it even gets loaded. > > To avoid this, just fall through to a simple copy only mildly > optimized for the case where src and dst are 8 byte aligned and the > length is a multiple of 8 as well. We could get fancy and call > GENmemcpy() but this is good enough for how this thing is actually > used. > > Reported-by: David Ahern <david.ahern@oracle.com> > Reported-by: Bob Picco <bpicco@meloft.net> > Signed-off-by: David S. Miller <davem@davemloft.net> > --- Seems solid with 2.6.39 on M7-4. Jalap?no is happy with current sparc.git. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: 4.0.0-rc4: panic in free_block 2015-03-24 14:57 ` Bob Picco @ 2015-03-24 16:05 ` David Miller 0 siblings, 0 replies; 46+ messages in thread From: David Miller @ 2015-03-24 16:05 UTC (permalink / raw) To: bpicco; +Cc: david.ahern, torvalds, sparclinux, linux-mm, linux-kernel From: Bob Picco <bpicco@meloft.net> Date: Tue, 24 Mar 2015 10:57:53 -0400 > Seems solid with 2.6.39 on M7-4. Jalap?no is happy with current sparc.git. Thanks for all the testing, it's been integrated into the -stable queues as well. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: 4.0.0-rc4: panic in free_block 2015-03-22 22:23 ` David Miller 2015-03-22 23:35 ` David Ahern @ 2015-03-22 23:49 ` Linus Torvalds 2015-03-22 23:57 ` David Miller 1 sibling, 1 reply; 46+ messages in thread From: Linus Torvalds @ 2015-03-22 23:49 UTC (permalink / raw) To: David Miller; +Cc: David Ahern, sparclinux, linux-mm, Linux Kernel Mailing List On Sun, Mar 22, 2015 at 3:23 PM, David Miller <davem@davemloft.net> wrote: > > Yes, using VIS how we do is alright, and in fact I did an audit of > this about 1 year ago. This is another one of those "if this is > wrong, so much stuff would break" Maybe. But it does seem like Bob Picco has narrowed it down to memmove(). It also bothers me enormously - and perhaps unreasonably - how that memcpy code has memory barriers in it. I can see _zero_ reason for a memory barrier inside a memcpy, unless the memcpy does something that isn't valid to begin with. Are the VIS operatiosn perhaps using some kind of non-temporal form that doesn't follow the TSO rules? Kind of like the "movnt" that Intel has? That kind of stuff makes me worry. For example, the only reason I see for membar #StoreLoad | #StoreStore after that VIS loop is that the stxa doesn't honor normal memory store ordering rules, but if that's true, then shouldn't we have a membar *before* the loop too? How about "ldx"? Does that also do some unordered loads? So the memory barriers in there just make me nervous, because they are either entirely bogus or superfluous, or they are not - and if they aren't, then that implies that some of the code does something really odd with memory ordering. I dunno. I really can't read that code at all, so I'm going entirely by gut instinct here. Linus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: 4.0.0-rc4: panic in free_block 2015-03-22 23:49 ` Linus Torvalds @ 2015-03-22 23:57 ` David Miller 0 siblings, 0 replies; 46+ messages in thread From: David Miller @ 2015-03-22 23:57 UTC (permalink / raw) To: torvalds; +Cc: david.ahern, sparclinux, linux-mm, linux-kernel From: Linus Torvalds <torvalds@linux-foundation.org> Date: Sun, 22 Mar 2015 16:49:51 -0700 > On Sun, Mar 22, 2015 at 3:23 PM, David Miller <davem@davemloft.net> wrote: >> >> Yes, using VIS how we do is alright, and in fact I did an audit of >> this about 1 year ago. This is another one of those "if this is >> wrong, so much stuff would break" > > Maybe. But it does seem like Bob Picco has narrowed it down to memmove(). > > It also bothers me enormously - and perhaps unreasonably - how that > memcpy code has memory barriers in it. I can see _zero_ reason for a > memory barrier inside a memcpy, unless the memcpy does something that > isn't valid to begin with. Are the VIS operatiosn perhaps using some > kind of non-temporal form that doesn't follow the TSO rules? Kind of > like the "movnt" that Intel has? The special cache line clearing stores (the ASI_BLK_INIT_QUAD_LDD_P stuff) do not adhere to the memory model. I had elided the memory barriers originally, but it caused memory corruption. So yes, this is similar to the 'movnt' situation. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 46+ messages in thread
end of thread, other threads:[~2015-03-24 16:05 UTC | newest] Thread overview: 46+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2015-03-20 15:07 4.0.0-rc4: panic in free_block David Ahern 2015-03-20 16:48 ` Linus Torvalds 2015-03-20 16:53 ` David Ahern 2015-03-20 16:58 ` Linus Torvalds 2015-03-20 18:05 ` David Ahern 2015-03-20 18:53 ` Linus Torvalds 2015-03-20 19:04 ` David Ahern 2015-03-20 19:47 ` David Miller 2015-03-20 19:54 ` David Ahern 2015-03-20 20:19 ` David Miller 2015-03-20 19:42 ` David Miller 2015-03-20 20:01 ` Dave Hansen 2015-03-20 21:17 ` Linus Torvalds 2015-03-20 22:49 ` David Ahern 2015-03-21 0:18 ` David Ahern 2015-03-21 0:34 ` David Rientjes 2015-03-21 0:39 ` David Ahern 2015-03-21 0:47 ` Linus Torvalds 2015-03-21 17:45 ` David Ahern 2015-03-21 18:49 ` Linus Torvalds 2015-03-22 17:36 ` David Miller 2015-03-22 19:25 ` Bob Picco 2015-03-22 19:47 ` Linus Torvalds 2015-03-22 22:23 ` David Miller 2015-03-22 23:35 ` David Ahern 2015-03-22 23:54 ` David Miller 2015-03-23 0:03 ` David Ahern 2015-03-23 2:00 ` David Miller 2015-03-23 2:19 ` David Miller 2015-03-23 16:25 ` David Miller 2015-03-23 16:51 ` John Stoffel 2015-03-23 19:16 ` David Miller 2015-03-23 19:56 ` John Stoffel 2015-03-23 20:08 ` David Miller 2015-03-23 17:00 ` Linus Torvalds 2015-03-23 19:08 ` David Miller 2015-03-23 19:47 ` Linus Torvalds 2015-03-23 19:52 ` David Miller 2015-03-23 17:34 ` David Ahern 2015-03-23 19:35 ` David Miller 2015-03-23 19:58 ` David Ahern 2015-03-24 1:01 ` David Ahern 2015-03-24 14:57 ` Bob Picco 2015-03-24 16:05 ` David Miller 2015-03-22 23:49 ` Linus Torvalds 2015-03-22 23:57 ` David Miller
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).