* Re: [lkp-robot] [x86/kconfig] 81d3871900: BUG:unable_to_handle_kernel [not found] <20171010121513.GC5445@yexl-desktop> @ 2017-10-11 2:31 ` Josh Poimboeuf 2017-10-11 17:01 ` Josh Poimboeuf 0 siblings, 1 reply; 21+ messages in thread From: Josh Poimboeuf @ 2017-10-11 2:31 UTC (permalink / raw) To: kernel test robot Cc: Ingo Molnar, Andy Lutomirski, Borislav Petkov, Brian Gerst, Denys Vlasenko, H. Peter Anvin, Jiri Slaby, Linus Torvalds, Mike Galbraith, Peter Zijlstra, Thomas Gleixner, LKML, lkp, linux-mm On Tue, Oct 10, 2017 at 08:15:13PM +0800, kernel test robot wrote: > > FYI, we noticed the following commit (built with gcc-4.8): > > commit: 81d387190039c14edac8de2b3ec789beb899afd9 ("x86/kconfig: Consolidate unwinders into multiple choice selection") > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master > > in testcase: boot > > on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -m 512M > > caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace): > > > +------------------------------------------+------------+------------+ > | | a34a766ff9 | 81d3871900 | > +------------------------------------------+------------+------------+ > | boot_successes | 24 | 5 | > | boot_failures | 12 | 31 | > | BUG:kernel_hang_in_test_stage | 12 | 1 | > | BUG:unable_to_handle_kernel | 0 | 30 | > | Oops:#[##] | 0 | 30 | > | Kernel_panic-not_syncing:Fatal_exception | 0 | 30 | > +------------------------------------------+------------+------------+ > > > > [ 5.324797] BUG: unable to handle kernel paging request at ffff88001c4b0000 > [ 5.326126] IP: slob_free+0x2bf/0x3d7 > [ 5.328023] PGD 17d9c067 > [ 5.328023] P4D 17d9c067 > [ 5.328023] PUD 17d9d067 > [ 5.328023] PMD 1f91e067 > [ 5.328023] PTE 800000001c4b0060 > [ 5.328023] > [ 5.328023] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC > [ 5.328023] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.13.0-rc1-00044-g81d3871 #1 > [ 5.328023] Hardware name: QEMU Standard PC (i440FX + PIIX, 
1996), BIOS 1.10.2-1 04/01/2014 > [ 5.328023] task: ffff8800002fa000 task.stack: ffffc900000d0000 > [ 5.328023] RIP: 0010:slob_free+0x2bf/0x3d7 > [ 5.328023] RSP: 0000:ffffc900000d3d58 EFLAGS: 00010002 > [ 5.328023] RAX: 0000000000000027 RBX: ffff88001c4affb0 RCX: 0000000000000000 > [ 5.328023] RDX: ffff88001c4af000 RSI: 0000000000000000 RDI: ffff88001c4afffe > [ 5.328023] RBP: ffff88001c4afffe R08: 0000000000000001 R09: 0000000000000000 > [ 5.328023] R10: ffffea000069a420 R11: ffff88001ffdb000 R12: ffff88001c4aff5c > [ 5.328023] R13: 0000000000000027 R14: 0000000000000027 R15: 0000000000000027 > [ 5.328023] FS: 0000000000000000(0000) GS:ffff88001f600000(0000) knlGS:0000000000000000 > [ 5.328023] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 5.328023] CR2: ffff88001c4b0000 CR3: 0000000016211000 CR4: 00000000000406b0 > [ 5.328023] Call Trace: > [ 5.328023] ? link_target+0xb2/0xc7 > [ 5.328023] kfree+0x158/0x1b6 > [ 5.328023] link_target+0xb2/0xc7 > [ 5.328023] new_node+0x32b/0x4d1 > [ 5.328023] gcov_event+0x33e/0x546 > [ 5.328023] ? gcov_persist_setup+0xbb/0xbb > [ 5.328023] gcov_enable_events+0x3c/0x89 > [ 5.328023] gcov_fs_init+0x134/0x191 > [ 5.328023] do_one_initcall+0x10e/0x2df > [ 5.328023] kernel_init_freeable+0x3ec/0x559 > [ 5.328023] ? rest_init+0x145/0x145 > [ 5.328023] kernel_init+0xc/0x1a8 > [ 5.328023] ret_from_fork+0x2a/0x40 > [ 5.328023] Code: e8 8d f7 ff ff 48 ff 05 c9 8c 91 02 85 c0 75 51 49 0f bf c5 48 ff 05 c2 8c 91 02 48 8d 3c 43 48 39 ef 75 3d 48 ff 05 ba 8c 91 02 <8b> 6d 00 66 85 ed 7e 09 48 ff 05 b3 8c 91 02 eb 05 bd 01 00 00 > [ 5.328023] RIP: slob_free+0x2bf/0x3d7 RSP: ffffc900000d3d58 > [ 5.328023] CR2: ffff88001c4b0000 > [ 5.328023] ---[ end trace f8ee1579929b04f0 ]--- Adding the slub maintainers. Is slob still supposed to work? The bisection is blaming the ORC unwinder, but I'm having trouble finding anything ORC specific about it. 
I wonder if the disabling of frame pointers changed the code generation
enough to trigger this bug somehow.

Looking at the panic, the code in slob_free() was:

   0:   e8 8d f7 ff ff          callq  0xfffffffffffff792
   5:   48 ff 05 c9 8c 91 02    incq   0x2918cc9(%rip)        # 0x2918cd5
   c:   85 c0                   test   %eax,%eax
   e:   75 51                   jne    0x61
  10:   49 0f bf c5             movswq %r13w,%rax
  14:   48 ff 05 c2 8c 91 02    incq   0x2918cc2(%rip)        # 0x2918cdd
  1b:   48 8d 3c 43             lea    (%rbx,%rax,2),%rdi
  1f:   48 39 ef                cmp    %rbp,%rdi
  22:   75 3d                   jne    0x61
  24:   48 ff 05 ba 8c 91 02    incq   0x2918cba(%rip)        # 0x2918ce5
  2b:*  8b 6d 00                mov    0x0(%rbp),%ebp         <-- trapping instruction
  2e:   66 85 ed                test   %bp,%bp
  31:   7e 09                   jle    0x3c
  33:   48 ff 05 b3 8c 91 02    incq   0x2918cb3(%rip)        # 0x2918ced
  3a:   eb 05                   jmp    0x41
  3c:   bd                      .byte 0xbd
  3d:   01 00                   add    %eax,(%rax)

The slob_free() code tried to read four bytes at ffff88001c4afffe, and
ended up reading past the page into a bad area. I think the bad address
(ffff88001c4afffe) was returned from slob_next() and it panicked trying
to read s->units in slob_units().
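The register dump can be cross-checked against this reading of the disassembly. The sketch below is only arithmetic on values copied from the oops. It assumes 4 KiB pages and treats the scale factor of 2 in the lea as the allocator's unit size in bytes; the helper names are made up for illustration and are not kernel functions.

```python
# Values copied from the oops above; everything else is an assumption.
PAGE_SIZE = 4096   # assumption: 4 KiB pages on this x86_64 guest
UNIT_BYTES = 2     # assumption: scale factor in "lea (%rbx,%rax,2),%rdi"

RBX = 0xffff88001c4affb0    # block being freed (base register of the lea)
UNITS = 0x27                # R13/RAX, sign-extended by movswq
RBP = 0xffff88001c4afffe    # pointer dereferenced by the trapping mov
CR2 = 0xffff88001c4b0000    # faulting address reported by the CPU


def block_end(block: int, units: int) -> int:
    """Address computed by the lea: block + units * UNIT_BYTES."""
    return block + units * UNIT_BYTES


def crosses_page(addr: int, size: int) -> bool:
    """True if the byte range [addr, addr + size) spans two pages."""
    return addr // PAGE_SIZE != (addr + size - 1) // PAGE_SIZE


def next_page_start(addr: int) -> int:
    """First address of the page following the one that contains addr."""
    return (addr // PAGE_SIZE + 1) * PAGE_SIZE


if __name__ == "__main__":
    print(f"lea result:        {block_end(RBX, UNITS):#x}")
    print(f"4-byte load cross: {crosses_page(RBP, 4)}")
    print(f"2-byte load cross: {crosses_page(RBP, 2)}")
    print(f"next page starts:  {next_page_start(RBP):#x}")
```

With these values the lea result equals RBP, so the cmp/jne falls through to the load. The 4-byte load at ...fffe runs two bytes into the next page, which starts exactly at the CR2 address, whereas a 2-byte load at the same address would have stayed inside the page.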
Interestingly, I've found that I get panics when booting with
CONFIG_SLOB enabled, with both ORC and frame pointers:

  general protection fault: 0000 [#1] PREEMPT SMP
  Modules linked in:
  CPU: 0 PID: 58 Comm: kworker/0:1 Not tainted 4.13.0-rc1+ #74
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc26 04/01/2014
  Workqueue: crypto mcryptd_flusher
  task: ffff880139a98000 task.stack: ffffc9000082c000
  RIP: 0010:skip_7+0x0/0x67
  RSP: 0000:ffffc9000082fd88 EFLAGS: 00010246
  RAX: ffff880134b65e34 RBX: 00000000f7654321 RCX: 0000000000000003
  RDX: 0000000000000000 RSI: ffffffff81d22039 RDI: ffff880135be0248
  RBP: ffffc9000082fd90 R08: 0000000000000000 R09: 0000000000000001
  R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff8238d260
  R13: ffff88013a7e53a8 R14: 00000000fffb7593 R15: 0000000000000000
  FS:  0000000000000000(0000) GS:ffff88013a600000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000000 CR3: 0000000001e11000 CR4: 00000000001406f0
  Call Trace:
   sha256_ctx_mgr_flush+0x28/0x30
   sha256_mb_flusher+0x53/0x120
   mcryptd_flusher+0xc4/0xf0
   process_one_work+0x253/0x6b0
   worker_thread+0x4d/0x3b0
   ? preempt_count_sub+0x9b/0x100
   kthread+0x12c/0x150
   ? process_one_work+0x6b0/0x6b0
   ? kthread_create_on_node+0x70/0x70
   ret_from_fork+0x2a/0x40
  Code: 89 87 30 01 00 00 c7 87 58 01 00 00 ff ff ff ff 48 83 bf a0 01 00 00 00 75 11 48 89 87 38 01 00 00 c7 87 5c 01 00 00 ff ff ff ff <c5> f9 6f 87 40 01 00 00 c5 f9 6f 8f 50 01 00 00 c4 e2 79 3b d1
  RIP: skip_7+0x0/0x67 RSP: ffffc9000082fd88

I have no idea how that crypto panic could be related to slob, but
at least it goes away when I switch to slub.

--
Josh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [lkp-robot] [x86/kconfig] 81d3871900: BUG:unable_to_handle_kernel 2017-10-11 2:31 ` [lkp-robot] [x86/kconfig] 81d3871900: BUG:unable_to_handle_kernel Josh Poimboeuf @ 2017-10-11 17:01 ` Josh Poimboeuf 2017-10-12 17:05 ` Christopher Lameter 2017-10-17 7:33 ` Joonsoo Kim 0 siblings, 2 replies; 21+ messages in thread From: Josh Poimboeuf @ 2017-10-11 17:01 UTC (permalink / raw) To: kernel test robot Cc: Ingo Molnar, Andy Lutomirski, Borislav Petkov, Brian Gerst, Denys Vlasenko, H. Peter Anvin, Jiri Slaby, Linus Torvalds, Mike Galbraith, Peter Zijlstra, Thomas Gleixner, LKML, lkp, linux-mm, Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton, Christoph Lameter I failed to add the slab maintainers to CC on the last attempt. Trying again. On Tue, Oct 10, 2017 at 09:31:06PM -0500, Josh Poimboeuf wrote: > On Tue, Oct 10, 2017 at 08:15:13PM +0800, kernel test robot wrote: > > > > FYI, we noticed the following commit (built with gcc-4.8): > > > > commit: 81d387190039c14edac8de2b3ec789beb899afd9 ("x86/kconfig: Consolidate unwinders into multiple choice selection") > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master > > > > in testcase: boot > > > > on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -m 512M > > > > caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace): > > > > > > +------------------------------------------+------------+------------+ > > | | a34a766ff9 | 81d3871900 | > > +------------------------------------------+------------+------------+ > > | boot_successes | 24 | 5 | > > | boot_failures | 12 | 31 | > > | BUG:kernel_hang_in_test_stage | 12 | 1 | > > | BUG:unable_to_handle_kernel | 0 | 30 | > > | Oops:#[##] | 0 | 30 | > > | Kernel_panic-not_syncing:Fatal_exception | 0 | 30 | > > +------------------------------------------+------------+------------+ > > > > > > > > [ 5.324797] BUG: unable to handle kernel paging request at ffff88001c4b0000 > > [ 5.326126] IP: slob_free+0x2bf/0x3d7 
> > [ 5.328023] PGD 17d9c067 > > [ 5.328023] P4D 17d9c067 > > [ 5.328023] PUD 17d9d067 > > [ 5.328023] PMD 1f91e067 > > [ 5.328023] PTE 800000001c4b0060 > > [ 5.328023] > > [ 5.328023] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC > > [ 5.328023] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.13.0-rc1-00044-g81d3871 #1 > > [ 5.328023] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 > > [ 5.328023] task: ffff8800002fa000 task.stack: ffffc900000d0000 > > [ 5.328023] RIP: 0010:slob_free+0x2bf/0x3d7 > > [ 5.328023] RSP: 0000:ffffc900000d3d58 EFLAGS: 00010002 > > [ 5.328023] RAX: 0000000000000027 RBX: ffff88001c4affb0 RCX: 0000000000000000 > > [ 5.328023] RDX: ffff88001c4af000 RSI: 0000000000000000 RDI: ffff88001c4afffe > > [ 5.328023] RBP: ffff88001c4afffe R08: 0000000000000001 R09: 0000000000000000 > > [ 5.328023] R10: ffffea000069a420 R11: ffff88001ffdb000 R12: ffff88001c4aff5c > > [ 5.328023] R13: 0000000000000027 R14: 0000000000000027 R15: 0000000000000027 > > [ 5.328023] FS: 0000000000000000(0000) GS:ffff88001f600000(0000) knlGS:0000000000000000 > > [ 5.328023] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [ 5.328023] CR2: ffff88001c4b0000 CR3: 0000000016211000 CR4: 00000000000406b0 > > [ 5.328023] Call Trace: > > [ 5.328023] ? link_target+0xb2/0xc7 > > [ 5.328023] kfree+0x158/0x1b6 > > [ 5.328023] link_target+0xb2/0xc7 > > [ 5.328023] new_node+0x32b/0x4d1 > > [ 5.328023] gcov_event+0x33e/0x546 > > [ 5.328023] ? gcov_persist_setup+0xbb/0xbb > > [ 5.328023] gcov_enable_events+0x3c/0x89 > > [ 5.328023] gcov_fs_init+0x134/0x191 > > [ 5.328023] do_one_initcall+0x10e/0x2df > > [ 5.328023] kernel_init_freeable+0x3ec/0x559 > > [ 5.328023] ? 
rest_init+0x145/0x145 > > [ 5.328023] kernel_init+0xc/0x1a8 > > [ 5.328023] ret_from_fork+0x2a/0x40 > > [ 5.328023] Code: e8 8d f7 ff ff 48 ff 05 c9 8c 91 02 85 c0 75 51 49 0f bf c5 48 ff 05 c2 8c 91 02 48 8d 3c 43 48 39 ef 75 3d 48 ff 05 ba 8c 91 02 <8b> 6d 00 66 85 ed 7e 09 48 ff 05 b3 8c 91 02 eb 05 bd 01 00 00 > > [ 5.328023] RIP: slob_free+0x2bf/0x3d7 RSP: ffffc900000d3d58 > > [ 5.328023] CR2: ffff88001c4b0000 > > [ 5.328023] ---[ end trace f8ee1579929b04f0 ]--- > > Adding the slub maintainers. Is slob still supposed to work? > > The bisection is blaming the ORC unwinder, but I'm having trouble > finding anything ORC specific about it. I wonder if the disabling of > frame pointers changed the code generation enough to trigger this bug > somehow. > > Looking at the panic, the code in slob_free() was: > > 0: e8 8d f7 ff ff callq 0xfffffffffffff792 > 5: 48 ff 05 c9 8c 91 02 incq 0x2918cc9(%rip) # 0x2918cd5 > c: 85 c0 test %eax,%eax > e: 75 51 jne 0x61 > 10: 49 0f bf c5 movswq %r13w,%rax > 14: 48 ff 05 c2 8c 91 02 incq 0x2918cc2(%rip) # 0x2918cdd > 1b: 48 8d 3c 43 lea (%rbx,%rax,2),%rdi > 1f: 48 39 ef cmp %rbp,%rdi > 22: 75 3d jne 0x61 > 24: 48 ff 05 ba 8c 91 02 incq 0x2918cba(%rip) # 0x2918ce5 > 2b:* 8b 6d 00 mov 0x0(%rbp),%ebp <-- trapping instruction > 2e: 66 85 ed test %bp,%bp > 31: 7e 09 jle 0x3c > 33: 48 ff 05 b3 8c 91 02 incq 0x2918cb3(%rip) # 0x2918ced > 3a: eb 05 jmp 0x41 > 3c: bd .byte 0xbd > 3d: 01 00 add %eax,(%rax) > > The slob_free() code tried to read four bytes at ffff88001c4afffe, and > ended up reading past the page into a bad area. I think the bad address > (ffff88001c4afffe) was returned from slob_next() and it panicked trying > to read s->units in slob_units(). 
> > Interestingly, I've found that I get panics when booting with > CONFIG_SLOB enabled, with both ORC and frame pointers: > > general protection fault: 0000 [#1] PREEMPT SMP > Modules linked in: > CPU: 0 PID: 58 Comm: kworker/0:1 Not tainted 4.13.0-rc1+ #74 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc26 04/01/2014 > Workqueue: crypto mcryptd_flusher > task: ffff880139a98000 task.stack: ffffc9000082c000 > RIP: 0010:skip_7+0x0/0x67 > RSP: 0000:ffffc9000082fd88 EFLAGS: 00010246 > RAX: ffff880134b65e34 RBX: 00000000f7654321 RCX: 0000000000000003 > RDX: 0000000000000000 RSI: ffffffff81d22039 RDI: ffff880135be0248 > RBP: ffffc9000082fd90 R08: 0000000000000000 R09: 0000000000000001 > R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff8238d260 > R13: ffff88013a7e53a8 R14: 00000000fffb7593 R15: 0000000000000000 > FS: 0000000000000000(0000) GS:ffff88013a600000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000000000000000 CR3: 0000000001e11000 CR4: 00000000001406f0 > Call Trace: > sha256_ctx_mgr_flush+0x28/0x30 > sha256_mb_flusher+0x53/0x120 > mcryptd_flusher+0xc4/0xf0 > process_one_work+0x253/0x6b0 > worker_thread+0x4d/0x3b0 > ? preempt_count_sub+0x9b/0x100 > kthread+0x12c/0x150 > ? process_one_work+0x6b0/0x6b0 > ? kthread_create_on_node+0x70/0x70 > ret_from_fork+0x2a/0x40 > Code: 89 87 30 01 00 00 c7 87 58 01 00 00 ff ff ff ff 48 83 bf a0 01 00 00 00 75 11 48 89 87 38 01 00 00 c7 87 5c 01 00 00 ff ff ff ff <c5> f9 6f 87 40 01 00 00 c5 f9 6f 8f 50 01 00 00 c4 e2 79 3b d1 > RIP: skip_7+0x0/0x67 RSP: ffffc9000082fd88 > > I have no idea how that crypto panic could could be related to slob, but > at least it goes away when I switch to slub. -- Josh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
* Re: [lkp-robot] [x86/kconfig] 81d3871900: BUG:unable_to_handle_kernel
  2017-10-11 17:01 ` Josh Poimboeuf
@ 2017-10-12 17:05 ` Christopher Lameter
  2017-10-12 17:54 ` Linus Torvalds
  ` (2 more replies)
  2017-10-17 7:33 ` Joonsoo Kim
  1 sibling, 3 replies; 21+ messages in thread
From: Christopher Lameter @ 2017-10-12 17:05 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: kernel test robot, Ingo Molnar, Andy Lutomirski, Borislav Petkov, Brian Gerst, Denys Vlasenko, H. Peter Anvin, Jiri Slaby, Linus Torvalds, Mike Galbraith, Peter Zijlstra, Thomas Gleixner, LKML, lkp, linux-mm, Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton

On Wed, 11 Oct 2017, Josh Poimboeuf wrote:

> I failed to add the slab maintainers to CC on the last attempt. Trying
> again.

Hmmm... Yea. SLOB is rarely used and tested. Good illustration of a simple
allocator and the K&R mechanism that was used in the early kernels.

> > Adding the slub maintainers. Is slob still supposed to work?

Have not seen anyone using it in a decade or so.

Does the same config with SLUB and slub_debug on the command line run
cleanly?

> > I have no idea how that crypto panic could be related to slob, but
> > at least it goes away when I switch to slub.

Can you run SLUB with full debug? Specify slub_debug on the command line or
set CONFIG_SLUB_DEBUG_ON.

^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [lkp-robot] [x86/kconfig] 81d3871900: BUG:unable_to_handle_kernel 2017-10-12 17:05 ` Christopher Lameter @ 2017-10-12 17:54 ` Linus Torvalds 2017-10-12 18:48 ` Andrew Morton 2017-10-12 17:54 ` Linus Torvalds 2017-10-13 4:45 ` Josh Poimboeuf 2 siblings, 1 reply; 21+ messages in thread From: Linus Torvalds @ 2017-10-12 17:54 UTC (permalink / raw) To: Christopher Lameter Cc: Josh Poimboeuf, kernel test robot, Ingo Molnar, Andy Lutomirski, Borislav Petkov, Brian Gerst, Denys Vlasenko, H. Peter Anvin, Jiri Slaby, Mike Galbraith, Peter Zijlstra, Thomas Gleixner, LKML, lkp, linux-mm, Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton On Thu, Oct 12, 2017 at 10:05 AM, Christopher Lameter <cl@linux.com> wrote: > On Wed, 11 Oct 2017, Josh Poimboeuf wrote: > >> I failed to add the slab maintainers to CC on the last attempt. Trying >> again. > > Hmmm... Yea. SLOB is rarely used and tested. Good illustration of a simple > allocator and the K&R mechanism that was used in the early kernels. Should we finally just get rid of SLOB? I'm not happy about the whole "three different allocators" crap. It's been there for much too long, and I've tried to cut it down before. People always protest, but three different allocators, one of which gets basically no testing, is not good. Linus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [lkp-robot] [x86/kconfig] 81d3871900: BUG:unable_to_handle_kernel 2017-10-12 17:54 ` Linus Torvalds @ 2017-10-12 18:48 ` Andrew Morton 2017-10-12 19:19 ` Geert Uytterhoeven 0 siblings, 1 reply; 21+ messages in thread From: Andrew Morton @ 2017-10-12 18:48 UTC (permalink / raw) To: Linus Torvalds Cc: Christopher Lameter, Josh Poimboeuf, kernel test robot, Ingo Molnar, Andy Lutomirski, Borislav Petkov, Brian Gerst, Denys Vlasenko, H. Peter Anvin, Jiri Slaby, Mike Galbraith, Peter Zijlstra, Thomas Gleixner, LKML, lkp, linux-mm, Pekka Enberg, David Rientjes, Joonsoo Kim, Matt Mackall On Thu, 12 Oct 2017 10:54:57 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Thu, Oct 12, 2017 at 10:05 AM, Christopher Lameter <cl@linux.com> wrote: > > On Wed, 11 Oct 2017, Josh Poimboeuf wrote: > > > >> I failed to add the slab maintainers to CC on the last attempt. Trying > >> again. > > > > Hmmm... Yea. SLOB is rarely used and tested. Good illustration of a simple > > allocator and the K&R mechanism that was used in the early kernels. > > Should we finally just get rid of SLOB? > > I'm not happy about the whole "three different allocators" crap. It's > been there for much too long, and I've tried to cut it down before. > People always protest, but three different allocators, one of which > gets basically no testing, is not good. > I am not aware of anyone using slob. We could disable it in Kconfig for a year, see what the feedback looks like. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [lkp-robot] [x86/kconfig] 81d3871900: BUG:unable_to_handle_kernel 2017-10-12 18:48 ` Andrew Morton @ 2017-10-12 19:19 ` Geert Uytterhoeven 0 siblings, 0 replies; 21+ messages in thread From: Geert Uytterhoeven @ 2017-10-12 19:19 UTC (permalink / raw) To: Andrew Morton Cc: Linus Torvalds, Christopher Lameter, Josh Poimboeuf, kernel test robot, Ingo Molnar, Andy Lutomirski, Borislav Petkov, Brian Gerst, Denys Vlasenko, H. Peter Anvin, Jiri Slaby, Mike Galbraith, Peter Zijlstra, Thomas Gleixner, LKML, LKP, Linux MM, Pekka Enberg, David Rientjes, Joonsoo Kim, Matt Mackall On Thu, Oct 12, 2017 at 8:48 PM, Andrew Morton <akpm@linux-foundation.org> wrote: > On Thu, 12 Oct 2017 10:54:57 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote: >> On Thu, Oct 12, 2017 at 10:05 AM, Christopher Lameter <cl@linux.com> wrote: >> > On Wed, 11 Oct 2017, Josh Poimboeuf wrote: >> > >> >> I failed to add the slab maintainers to CC on the last attempt. Trying >> >> again. >> > >> > Hmmm... Yea. SLOB is rarely used and tested. Good illustration of a simple >> > allocator and the K&R mechanism that was used in the early kernels. >> >> Should we finally just get rid of SLOB? >> >> I'm not happy about the whole "three different allocators" crap. It's >> been there for much too long, and I've tried to cut it down before. >> People always protest, but three different allocators, one of which >> gets basically no testing, is not good. > > I am not aware of anyone using slob. We could disable it in Kconfig > for a year, see what the feedback looks like. 
$ git grep CONFIG_SLOB=y
arch/arm/configs/clps711x_defconfig:CONFIG_SLOB=y
arch/arm/configs/collie_defconfig:CONFIG_SLOB=y
arch/arm/configs/multi_v4t_defconfig:CONFIG_SLOB=y
arch/arm/configs/omap1_defconfig:CONFIG_SLOB=y
arch/arm/configs/pxa_defconfig:CONFIG_SLOB=y
arch/arm/configs/tct_hammer_defconfig:CONFIG_SLOB=y
arch/arm/configs/xcep_defconfig:CONFIG_SLOB=y
arch/blackfin/configs/DNP5370_defconfig:CONFIG_SLOB=y
arch/h8300/configs/edosk2674_defconfig:CONFIG_SLOB=y
arch/h8300/configs/h8300h-sim_defconfig:CONFIG_SLOB=y
arch/h8300/configs/h8s-sim_defconfig:CONFIG_SLOB=y
arch/openrisc/configs/or1ksim_defconfig:CONFIG_SLOB=y
arch/sh/configs/rsk7201_defconfig:CONFIG_SLOB=y
arch/sh/configs/rsk7203_defconfig:CONFIG_SLOB=y
arch/sh/configs/se7206_defconfig:CONFIG_SLOB=y
arch/sh/configs/shmin_defconfig:CONFIG_SLOB=y
arch/sh/configs/shx3_defconfig:CONFIG_SLOB=y
kernel/configs/tiny.config:CONFIG_SLOB=y
$

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [lkp-robot] [x86/kconfig] 81d3871900: BUG:unable_to_handle_kernel 2017-10-12 17:05 ` Christopher Lameter 2017-10-12 17:54 ` Linus Torvalds 2017-10-12 17:54 ` Linus Torvalds @ 2017-10-13 4:45 ` Josh Poimboeuf 2017-10-13 13:56 ` Andrey Ryabinin 2017-10-13 15:22 ` Christopher Lameter 2 siblings, 2 replies; 21+ messages in thread From: Josh Poimboeuf @ 2017-10-13 4:45 UTC (permalink / raw) To: Christopher Lameter Cc: kernel test robot, Ingo Molnar, Andy Lutomirski, Borislav Petkov, Brian Gerst, Denys Vlasenko, H. Peter Anvin, Jiri Slaby, Linus Torvalds, Mike Galbraith, Peter Zijlstra, Thomas Gleixner, LKML, lkp, linux-mm, Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton On Thu, Oct 12, 2017 at 12:05:04PM -0500, Christopher Lameter wrote: > On Wed, 11 Oct 2017, Josh Poimboeuf wrote: > > > I failed to add the slab maintainers to CC on the last attempt. Trying > > again. > > > Hmmm... Yea. SLOB is rarely used and tested. Good illustration of a simple > allocator and the K&R mechanism that was used in the early kernels. > > > > Adding the slub maintainers. Is slob still supposed to work? > > Have not seen anyone using it in a decade or so. > > Does the same config with SLUB and slub_debug on the commandline run > cleanly? > > > > I have no idea how that crypto panic could could be related to slob, but > > > at least it goes away when I switch to slub. > > Can you run SLUB with full debug? specify slub_debug on the commandline or > set CONFIG_SLUB_DEBUG_ON Oddly enough, with CONFIG_SLUB+slub_debug, I get the same crypto panic I got with CONFIG_SLOB. The trapping instruction is: vmovdqa 0x140(%rdi),%xmm0 I'll try to bisect it tomorrow. It at least goes back to v4.10. I'm not really sure whether this panic is related to SLUB or SLOB at all. (Though the original panic reported upthread by the kernel test robot *does* look SLOB related.) 
general protection fault: 0000 [#1] PREEMPT SMP Modules linked in: CPU: 0 PID: 58 Comm: kworker/0:1 Not tainted 4.13.0 #81 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc26 04/01/2014 Workqueue: crypto mcryptd_flusher task: ffff880139108040 task.stack: ffffc9000082c000 RIP: 0010:skip_7+0x0/0x67 RSP: 0018:ffffc9000082fd88 EFLAGS: 00010246 RAX: ffff88013834172c RBX: 00000000f7654321 RCX: 0000000000000003 RDX: 0000000000000000 RSI: ffffffff81d254f9 RDI: ffff8801381b1a88 RBP: ffffc9000082fd90 R08: 0000000000000000 R09: 0000000000000001 R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff82392260 R13: ffff88013a7e6500 R14: 00000000fffb80f5 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff88013a600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f88491ef914 CR3: 0000000001e11000 CR4: 00000000001406f0 Call Trace: sha256_ctx_mgr_flush+0x28/0x30 sha256_mb_flusher+0x53/0x120 mcryptd_flusher+0xc4/0xf0 process_one_work+0x253/0x6b0 worker_thread+0x4d/0x3b0 ? preempt_count_sub+0x9b/0x100 kthread+0x133/0x150 ? process_one_work+0x6b0/0x6b0 ? kthread_create_on_node+0x70/0x70 ret_from_fork+0x2a/0x40 Code: 89 87 30 01 00 00 c7 87 58 01 00 00 ff ff ff ff 48 83 bf a0 01 00 00 00 75 11 48 89 87 38 01 00 00 c7 87 5c 01 00 00 ff ff ff ff <c5> f9 6f 87 40 01 00 00 c5 f9 6f 8f 50 01 00 00 c4 e2 79 3b d1 RIP: skip_7+0x0/0x67 RSP: ffffc9000082fd88 ---[ end trace d89a1613b7d1b8bc ]--- BUG: sleeping function called from invalid context at ./include/linux/percpu-rwsem.h:33 in_atomic(): 1, irqs_disabled(): 0, pid: 58, name: kworker/0:1 INFO: lockdep is turned off. 
Preemption disabled at: [<ffffffff81041933>] kernel_fpu_begin+0x13/0x20 CPU: 0 PID: 58 Comm: kworker/0:1 Tainted: G D 4.13.0 #81 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc26 04/01/2014 Workqueue: crypto mcryptd_flusher Call Trace: dump_stack+0x8e/0xcd ___might_sleep+0x185/0x260 __might_sleep+0x4a/0x80 exit_signals+0x33/0x2d0 do_exit+0xb4/0xd80 ? kthread+0x133/0x150 rewind_stack_do_exit+0x17/0x20 note: kworker/0:1[58] exited with preempt_count 1 tsc: Refined TSC clocksource calibration: 2793.538 MHz clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x28446877189, max_idle_ns: 440795280878 ns -- Josh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [lkp-robot] [x86/kconfig] 81d3871900: BUG:unable_to_handle_kernel 2017-10-13 4:45 ` Josh Poimboeuf @ 2017-10-13 13:56 ` Andrey Ryabinin 2017-10-13 16:19 ` Josh Poimboeuf 2017-10-13 19:09 ` Linus Torvalds 2017-10-13 15:22 ` Christopher Lameter 1 sibling, 2 replies; 21+ messages in thread From: Andrey Ryabinin @ 2017-10-13 13:56 UTC (permalink / raw) To: Josh Poimboeuf, Christopher Lameter Cc: kernel test robot, Ingo Molnar, Andy Lutomirski, Borislav Petkov, Brian Gerst, Denys Vlasenko, H. Peter Anvin, Jiri Slaby, Linus Torvalds, Mike Galbraith, Peter Zijlstra, Thomas Gleixner, LKML, lkp, linux-mm, Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton, Megha Dey, Herbert Xu, David S. Miller, linux-crypto On 10/13/2017 07:45 AM, Josh Poimboeuf wrote: > On Thu, Oct 12, 2017 at 12:05:04PM -0500, Christopher Lameter wrote: >> On Wed, 11 Oct 2017, Josh Poimboeuf wrote: >> >>> I failed to add the slab maintainers to CC on the last attempt. Trying >>> again. >> >> >> Hmmm... Yea. SLOB is rarely used and tested. Good illustration of a simple >> allocator and the K&R mechanism that was used in the early kernels. >> >>>> Adding the slub maintainers. Is slob still supposed to work? >> >> Have not seen anyone using it in a decade or so. >> >> Does the same config with SLUB and slub_debug on the commandline run >> cleanly? >> >>>> I have no idea how that crypto panic could could be related to slob, but >>>> at least it goes away when I switch to slub. >> >> Can you run SLUB with full debug? specify slub_debug on the commandline or >> set CONFIG_SLUB_DEBUG_ON > > Oddly enough, with CONFIG_SLUB+slub_debug, I get the same crypto panic I > got with CONFIG_SLOB. The trapping instruction is: > > vmovdqa 0x140(%rdi),%xmm0 It's unaligned access. Look at %rdi. vmovdqa requires 16-byte alignment. Apparently, something fed kmalloc()'ed data here. But kmalloc() guarantees only sizeof(unsigned long) alignment. 
slub_debug changes slub's object layout, so what happened to be 16-byte
aligned without slub_debug may become 8-byte aligned with slub_debug on.

> I'll try to bisect it tomorrow. It at least goes back to v4.10.

Probably no point. I bet this bug has always been here (since this code was added).

This could be fixed by an s/vmovdqa/vmovdqu change like the one below, but maybe
the right fix would be to align the data properly?

---
 arch/x86/crypto/sha256-mb/sha256_mb_mgr_flush_avx2.S | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/crypto/sha256-mb/sha256_mb_mgr_flush_avx2.S b/arch/x86/crypto/sha256-mb/sha256_mb_mgr_flush_avx2.S
index 8fe6338bcc84..7fd5d9b568c7 100644
--- a/arch/x86/crypto/sha256-mb/sha256_mb_mgr_flush_avx2.S
+++ b/arch/x86/crypto/sha256-mb/sha256_mb_mgr_flush_avx2.S
@@ -155,8 +155,8 @@ LABEL skip_ %I
 .endr
 
 	# Find min length
-	vmovdqa _lens+0*16(state), %xmm0
-	vmovdqa _lens+1*16(state), %xmm1
+	vmovdqu _lens+0*16(state), %xmm0
+	vmovdqu _lens+1*16(state), %xmm1
 
 	vpminud %xmm1, %xmm0, %xmm2		# xmm2 has {D,C,B,A}
 	vpalignr $8, %xmm2, %xmm3, %xmm3	# xmm3 has {x,x,D,C}
@@ -176,8 +176,8 @@ LABEL skip_ %I
 	vpsubd %xmm2, %xmm0, %xmm0
 	vpsubd %xmm2, %xmm1, %xmm1
 
-	vmovdqa %xmm0, _lens+0*16(state)
-	vmovdqa %xmm1, _lens+1*16(state)
+	vmovdqu %xmm0, _lens+0*16(state)
+	vmovdqu %xmm1, _lens+1*16(state)
 
 	# "state" and "args" are the same address, arg1
 	# len is arg2
-- 
2.13.6

> I'm
> not really sure whether this panic is related to SLUB or SLOB at all.
> (Though the original panic reported upthread by the kernel test robot
> *does* look SLOB related.)
> > general protection fault: 0000 [#1] PREEMPT SMP > Modules linked in: > CPU: 0 PID: 58 Comm: kworker/0:1 Not tainted 4.13.0 #81 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc26 04/01/2014 > Workqueue: crypto mcryptd_flusher > task: ffff880139108040 task.stack: ffffc9000082c000 > RIP: 0010:skip_7+0x0/0x67 > RSP: 0018:ffffc9000082fd88 EFLAGS: 00010246 > RAX: ffff88013834172c RBX: 00000000f7654321 RCX: 0000000000000003 > RDX: 0000000000000000 RSI: ffffffff81d254f9 RDI: ffff8801381b1a88 > RBP: ffffc9000082fd90 R08: 0000000000000000 R09: 0000000000000001 > R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff82392260 > R13: ffff88013a7e6500 R14: 00000000fffb80f5 R15: 0000000000000000 > FS: 0000000000000000(0000) GS:ffff88013a600000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 00007f88491ef914 CR3: 0000000001e11000 CR4: 00000000001406f0 > Call Trace: > sha256_ctx_mgr_flush+0x28/0x30 > sha256_mb_flusher+0x53/0x120 > mcryptd_flusher+0xc4/0xf0 > process_one_work+0x253/0x6b0 > worker_thread+0x4d/0x3b0 > ? preempt_count_sub+0x9b/0x100 > kthread+0x133/0x150 > ? process_one_work+0x6b0/0x6b0 > ? kthread_create_on_node+0x70/0x70 > ret_from_fork+0x2a/0x40 > Code: 89 87 30 01 00 00 c7 87 58 01 00 00 ff ff ff ff 48 83 bf a0 01 00 00 00 75 11 48 89 87 38 01 00 00 c7 87 5c 01 00 00 ff ff ff ff <c5> f9 6f 87 40 01 00 00 c5 f9 6f 8f 50 01 00 00 c4 e2 79 3b d1 > RIP: skip_7+0x0/0x67 RSP: ffffc9000082fd88 > ---[ end trace d89a1613b7d1b8bc ]--- > BUG: sleeping function called from invalid context at ./include/linux/percpu-rwsem.h:33 > in_atomic(): 1, irqs_disabled(): 0, pid: 58, name: kworker/0:1 > INFO: lockdep is turned off. 
> Preemption disabled at: > [<ffffffff81041933>] kernel_fpu_begin+0x13/0x20 > CPU: 0 PID: 58 Comm: kworker/0:1 Tainted: G D 4.13.0 #81 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc26 04/01/2014 > Workqueue: crypto mcryptd_flusher > Call Trace: > dump_stack+0x8e/0xcd > ___might_sleep+0x185/0x260 > __might_sleep+0x4a/0x80 > exit_signals+0x33/0x2d0 > do_exit+0xb4/0xd80 > ? kthread+0x133/0x150 > rewind_stack_do_exit+0x17/0x20 > note: kworker/0:1[58] exited with preempt_count 1 > tsc: Refined TSC clocksource calibration: 2793.538 MHz > clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x28446877189, max_idle_ns: 440795280878 ns > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [lkp-robot] [x86/kconfig] 81d3871900: BUG:unable_to_handle_kernel
  2017-10-13 13:56 ` Andrey Ryabinin
@ 2017-10-13 16:19 ` Josh Poimboeuf
  2017-10-13 19:09 ` Linus Torvalds
  1 sibling, 0 replies; 21+ messages in thread
From: Josh Poimboeuf @ 2017-10-13 16:19 UTC
  To: Andrey Ryabinin
  Cc: Christopher Lameter, kernel test robot, Ingo Molnar,
      Andy Lutomirski, Borislav Petkov, Brian Gerst, Denys Vlasenko,
      H. Peter Anvin, Jiri Slaby, Linus Torvalds, Mike Galbraith,
      Peter Zijlstra, Thomas Gleixner, LKML, lkp, linux-mm,
      Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton,
      Megha Dey, Herbert Xu, David S. Miller, linux-crypto

On Fri, Oct 13, 2017 at 04:56:43PM +0300, Andrey Ryabinin wrote:
> On 10/13/2017 07:45 AM, Josh Poimboeuf wrote:
> > On Thu, Oct 12, 2017 at 12:05:04PM -0500, Christopher Lameter wrote:
> >> On Wed, 11 Oct 2017, Josh Poimboeuf wrote:
> >>
> >>> I failed to add the slab maintainers to CC on the last attempt.  Trying
> >>> again.
> >>
> >>
> >> Hmmm... Yea. SLOB is rarely used and tested. Good illustration of a simple
> >> allocator and the K&R mechanism that was used in the early kernels.
> >>
> >>>> Adding the slub maintainers.  Is slob still supposed to work?
> >>
> >> Have not seen anyone using it in a decade or so.
> >>
> >> Does the same config with SLUB and slub_debug on the commandline run
> >> cleanly?
> >>
> >>>> I have no idea how that crypto panic could be related to slob, but
> >>>> at least it goes away when I switch to slub.
> >>
> >> Can you run SLUB with full debug? specify slub_debug on the commandline or
> >> set CONFIG_SLUB_DEBUG_ON
> >
> > Oddly enough, with CONFIG_SLUB+slub_debug, I get the same crypto panic I
> > got with CONFIG_SLOB.  The trapping instruction is:
> >
> >   vmovdqa 0x140(%rdi),%xmm0
> >
> It's unaligned access. Look at %rdi. vmovdqa requires 16-byte alignment.
> Apparently, something fed kmalloc()'ed data here. But kmalloc() guarantees only sizeof(unsigned long)
> alignment. slub_debug changes slub's objects layout, so what happened to be 16-byte aligned
> without slub_debug, may become 8-byte aligned with slub_debug on.
>
> >
> > I'll try to bisect it tomorrow.  It at least goes back to v4.10.
>
> Probably no point. I bet this bug has always been here (since this code was added).
>
> This could be fixed by s/vmovdqa/vmovdqu change like below, but maybe the right fix
> would be to align the data properly?
>
> ---
>  arch/x86/crypto/sha256-mb/sha256_mb_mgr_flush_avx2.S | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/crypto/sha256-mb/sha256_mb_mgr_flush_avx2.S b/arch/x86/crypto/sha256-mb/sha256_mb_mgr_flush_avx2.S
> index 8fe6338bcc84..7fd5d9b568c7 100644
> --- a/arch/x86/crypto/sha256-mb/sha256_mb_mgr_flush_avx2.S
> +++ b/arch/x86/crypto/sha256-mb/sha256_mb_mgr_flush_avx2.S
> @@ -155,8 +155,8 @@ LABEL skip_ %I
>  .endr
>
>      # Find min length
> -    vmovdqa _lens+0*16(state), %xmm0
> -    vmovdqa _lens+1*16(state), %xmm1
> +    vmovdqu _lens+0*16(state), %xmm0
> +    vmovdqu _lens+1*16(state), %xmm1
>
>      vpminud %xmm1, %xmm0, %xmm2        # xmm2 has {D,C,B,A}
>      vpalignr $8, %xmm2, %xmm3, %xmm3   # xmm3 has {x,x,D,C}
> @@ -176,8 +176,8 @@ LABEL skip_ %I
>      vpsubd  %xmm2, %xmm0, %xmm0
>      vpsubd  %xmm2, %xmm1, %xmm1
>
> -    vmovdqa %xmm0, _lens+0*16(state)
> -    vmovdqa %xmm1, _lens+1*16(state)
> +    vmovdqu %xmm0, _lens+0*16(state)
> +    vmovdqu %xmm1, _lens+1*16(state)
>
>      # "state" and "args" are the same address, arg1
>      # len is arg2
> --
> 2.13.6

Makes sense.  I can confirm that the above patch fixes the panic.

--
Josh
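The alignment arithmetic behind the diagnosis above can be checked in user space. This is a minimal sketch, not kernel code: it assumes the faulting operand is 0x140(%rdi) as in the trapping instruction, and takes the %rdi value (ffff8801381b1a88) from the general protection fault dump quoted in the thread. The helper names is_aligned() and operand_addr() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Displacement of the faulting operand: vmovdqa 0x140(%rdi),%xmm0 */
#define LENS_OFFSET 0x140u

/* True when addr is a multiple of align; align must be a power of two. */
static bool is_aligned(uint64_t addr, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr & (align - 1)) == 0;
}

/* Effective address the instruction touches for a given %rdi value. */
static uint64_t operand_addr(uint64_t rdi)
{
	return rdi + LENS_OFFSET;
}
```

With %rdi = 0xffff8801381b1a88 the operand lands at 0xffff8801381b1bc8, which is 8-byte aligned but not 16-byte aligned. That is consistent with vmovdqa faulting while vmovdqu, which has no alignment requirement, does not.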
* Re: [lkp-robot] [x86/kconfig] 81d3871900: BUG:unable_to_handle_kernel
  2017-10-13 13:56 ` Andrey Ryabinin
  2017-10-13 16:19 ` Josh Poimboeuf
@ 2017-10-13 19:09 ` Linus Torvalds
  2017-10-13 20:01 ` Andy Lutomirski
  2017-10-13 20:17 ` Jeffrey Walton
  1 sibling, 2 replies; 21+ messages in thread
From: Linus Torvalds @ 2017-10-13 19:09 UTC
  To: Andrey Ryabinin
  Cc: Josh Poimboeuf, Christopher Lameter, kernel test robot,
      Ingo Molnar, Andy Lutomirski, Borislav Petkov, Brian Gerst,
      Denys Vlasenko, H. Peter Anvin, Jiri Slaby, Mike Galbraith,
      Peter Zijlstra, Thomas Gleixner, LKML, LKP, linux-mm,
      Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton,
      Megha Dey, Herbert Xu, David S. Miller, Linux Crypto Mailing List

On Fri, Oct 13, 2017 at 6:56 AM, Andrey Ryabinin
<aryabinin@virtuozzo.com> wrote:
>
> This could be fixed by s/vmovdqa/vmovdqu change like below, but maybe the right fix
> would be to align the data properly?

I suspect anything that has the SHA extensions should also do
unaligned loads efficiently. The whole "aligned only" model is broken.
It's just doing two loads from the state pointer, there's likely no
point in trying to align it.

So your patch looks fine, but maybe somebody could add the required
alignment to the sha256 context allocation (which I don't know where
it is).

But yeah, that other SLOB panic looks unrelated to this.

              Linus
* Re: [lkp-robot] [x86/kconfig] 81d3871900: BUG:unable_to_handle_kernel
  2017-10-13 19:09 ` Linus Torvalds
@ 2017-10-13 20:01 ` Andy Lutomirski
  2017-10-13 20:17 ` Jeffrey Walton
  1 sibling, 0 replies; 21+ messages in thread
From: Andy Lutomirski @ 2017-10-13 20:01 UTC
  To: Linus Torvalds
  Cc: Andrey Ryabinin, Josh Poimboeuf, Christopher Lameter,
      kernel test robot, Ingo Molnar, Andy Lutomirski, Borislav Petkov,
      Brian Gerst, Denys Vlasenko, H. Peter Anvin, Jiri Slaby,
      Mike Galbraith, Peter Zijlstra, Thomas Gleixner, LKML, LKP,
      linux-mm, Pekka Enberg, David Rientjes, Joonsoo Kim,
      Andrew Morton, Megha Dey, Herbert Xu, David S. Miller,
      Linux Crypto Mailing List

On Fri, Oct 13, 2017 at 12:09 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Fri, Oct 13, 2017 at 6:56 AM, Andrey Ryabinin
> <aryabinin@virtuozzo.com> wrote:
>>
>> This could be fixed by s/vmovdqa/vmovdqu change like below, but maybe the right fix
>> would be to align the data properly?
>
> I suspect anything that has the SHA extensions should also do
> unaligned loads efficiently. The whole "aligned only" model is broken.
> It's just doing two loads from the state pointer, there's likely no
> point in trying to align it.
>
> So your patch looks fine, but maybe somebody could add the required
> alignment to the sha256 context allocation (which I don't know where
> it is).

IIRC if we try the latter, then we'll risk hitting the #*!&@% gcc bug
that mostly prevents 16-byte alignment from working on GCC before 4.8
or so.  That way lies debugging disasters.
* Re: [lkp-robot] [x86/kconfig] 81d3871900: BUG:unable_to_handle_kernel
  2017-10-13 19:09 ` Linus Torvalds
  2017-10-13 20:01 ` Andy Lutomirski
@ 2017-10-13 20:17 ` Jeffrey Walton
  1 sibling, 0 replies; 21+ messages in thread
From: Jeffrey Walton @ 2017-10-13 20:17 UTC
  To: Linus Torvalds
  Cc: Andrey Ryabinin, Josh Poimboeuf, Christopher Lameter,
      kernel test robot, Ingo Molnar, Andy Lutomirski, Borislav Petkov,
      Brian Gerst, Denys Vlasenko, H. Peter Anvin, Jiri Slaby,
      Mike Galbraith, Peter Zijlstra, Thomas Gleixner, LKML, LKP,
      linux-mm, Pekka Enberg, David Rientjes, Joonsoo Kim,
      Andrew Morton, Megha Dey, Herbert Xu, David S. Miller,
      Linux Crypto Mailing List

On Fri, Oct 13, 2017 at 3:09 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Fri, Oct 13, 2017 at 6:56 AM, Andrey Ryabinin
> <aryabinin@virtuozzo.com> wrote:
>>
>> This could be fixed by s/vmovdqa/vmovdqu change like below, but maybe the right fix
>> would be to align the data properly?
>
> I suspect anything that has the SHA extensions should also do
> unaligned loads efficiently. The whole "aligned only" model is broken.
> It's just doing two loads from the state pointer, there's likely no
> point in trying to align it.

+1, good engineering.

AVX2 requires 32-byte buffer alignment in some places. It is trickier
than this use case because __BIGGEST_ALIGNMENT__ doubled, but a lot of
code still assumes 16 bytes.

Jeff
* Re: [lkp-robot] [x86/kconfig] 81d3871900: BUG:unable_to_handle_kernel
  2017-10-13 4:45 ` Josh Poimboeuf
  2017-10-13 13:56 ` Andrey Ryabinin
@ 2017-10-13 15:22 ` Christopher Lameter
  2017-10-13 15:37 ` Josh Poimboeuf
  1 sibling, 1 reply; 21+ messages in thread
From: Christopher Lameter @ 2017-10-13 15:22 UTC
  To: Josh Poimboeuf
  Cc: kernel test robot, Ingo Molnar, Andy Lutomirski, Borislav Petkov,
      Brian Gerst, Denys Vlasenko, H. Peter Anvin, Jiri Slaby,
      Linus Torvalds, Mike Galbraith, Peter Zijlstra, Thomas Gleixner,
      LKML, lkp, linux-mm, Pekka Enberg, David Rientjes, Joonsoo Kim,
      Andrew Morton

On Thu, 12 Oct 2017, Josh Poimboeuf wrote:

> > Can you run SLUB with full debug? specify slub_debug on the commandline or
> > set CONFIG_SLUB_DEBUG_ON
>
> Oddly enough, with CONFIG_SLUB+slub_debug, I get the same crypto panic I
> got with CONFIG_SLOB.  The trapping instruction is:
>
>   vmovdqa 0x140(%rdi),%xmm0
>
> I'll try to bisect it tomorrow.  It at least goes back to v4.10.  I'm
> not really sure whether this panic is related to SLUB or SLOB at all.

Guess not. The slab allocators can fail if the metadata gets corrupted.
That is why we have extensive debug modes so we can find who is to blame
for corruptions.

> (Though the original panic reported upthread by the kernel test robot
> *does* look SLOB related.)

Yup. Just happened to be configured for SLOB then.
* Re: [lkp-robot] [x86/kconfig] 81d3871900: BUG:unable_to_handle_kernel
  2017-10-13 15:22 ` Christopher Lameter
@ 2017-10-13 15:37 ` Josh Poimboeuf
  0 siblings, 0 replies; 21+ messages in thread
From: Josh Poimboeuf @ 2017-10-13 15:37 UTC
  To: Christopher Lameter
  Cc: kernel test robot, Ingo Molnar, Andy Lutomirski, Borislav Petkov,
      Brian Gerst, Denys Vlasenko, H. Peter Anvin, Jiri Slaby,
      Linus Torvalds, Mike Galbraith, Peter Zijlstra, Thomas Gleixner,
      LKML, lkp, linux-mm, Pekka Enberg, David Rientjes, Joonsoo Kim,
      Andrew Morton

On Fri, Oct 13, 2017 at 10:22:54AM -0500, Christopher Lameter wrote:
> On Thu, 12 Oct 2017, Josh Poimboeuf wrote:
>
> > > Can you run SLUB with full debug? specify slub_debug on the commandline or
> > > set CONFIG_SLUB_DEBUG_ON
> >
> > Oddly enough, with CONFIG_SLUB+slub_debug, I get the same crypto panic I
> > got with CONFIG_SLOB.  The trapping instruction is:
> >
> >   vmovdqa 0x140(%rdi),%xmm0
> >
> > I'll try to bisect it tomorrow.  It at least goes back to v4.10.  I'm
> > not really sure whether this panic is related to SLUB or SLOB at all.
>
> Guess not. The slab allocators can fail if the metadata gets corrupted.
> That is why we have extensive debug modes so we can find who is to blame
> for corruptions.
>
> > (Though the original panic reported upthread by the kernel test robot
> > *does* look SLOB related.)
>
> Yup. Just happened to be configured for SLOB then.

Just to clarify, the upthread panic in SLOB is *not* related to the
crypto issue.  So somebody still needs to look at that one:

  https://lkml.kernel.org/r/20171010121513.GC5445@yexl-desktop

--
Josh
* Re: [lkp-robot] [x86/kconfig] 81d3871900: BUG:unable_to_handle_kernel
  2017-10-11 17:01 ` Josh Poimboeuf
  2017-10-12 17:05 ` Christopher Lameter
@ 2017-10-17 7:33 ` Joonsoo Kim
  2017-10-17 7:50 ` Thomas Gleixner
  2017-10-18 10:40 ` Linus Torvalds
  1 sibling, 2 replies; 21+ messages in thread
From: Joonsoo Kim @ 2017-10-17 7:33 UTC
  To: Josh Poimboeuf
  Cc: kernel test robot, Ingo Molnar, Andy Lutomirski, Borislav Petkov,
      Brian Gerst, Denys Vlasenko, H. Peter Anvin, Jiri Slaby,
      Linus Torvalds, Mike Galbraith, Peter Zijlstra, Thomas Gleixner,
      LKML, lkp, linux-mm, Pekka Enberg, David Rientjes, Andrew Morton,
      Christoph Lameter

On Wed, Oct 11, 2017 at 12:01:20PM -0500, Josh Poimboeuf wrote:
> I failed to add the slab maintainers to CC on the last attempt.  Trying
> again.
>
> On Tue, Oct 10, 2017 at 09:31:06PM -0500, Josh Poimboeuf wrote:
> > On Tue, Oct 10, 2017 at 08:15:13PM +0800, kernel test robot wrote:
> > >
> > > FYI, we noticed the following commit (built with gcc-4.8):
> > >
> > > commit: 81d387190039c14edac8de2b3ec789beb899afd9 ("x86/kconfig: Consolidate unwinders into multiple choice selection")
> > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> > >
> > > in testcase: boot
> > >
> > > on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -m 512M
> > >
> > > caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
> > >
> > >
> > > +------------------------------------------+------------+------------+
> > > |                                          | a34a766ff9 | 81d3871900 |
> > > +------------------------------------------+------------+------------+
> > > | boot_successes                           | 24         | 5          |
> > > | boot_failures                            | 12         | 31         |
> > > | BUG:kernel_hang_in_test_stage            | 12         | 1          |
> > > | BUG:unable_to_handle_kernel              | 0          | 30         |
> > > | Oops:#[##]                               | 0          | 30         |
> > > | Kernel_panic-not_syncing:Fatal_exception | 0          | 30         |
> > > +------------------------------------------+------------+------------+
> > >
> > >
> > >
> > > [    5.324797] BUG: unable to handle kernel paging request at ffff88001c4b0000
> > > [    5.326126] IP: slob_free+0x2bf/0x3d7
> > > [    5.328023] PGD 17d9c067
> > > [    5.328023] P4D 17d9c067
> > > [    5.328023] PUD 17d9d067
> > > [    5.328023] PMD 1f91e067
> > > [    5.328023] PTE 800000001c4b0060
> > > [    5.328023]
> > > [    5.328023] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> > > [    5.328023] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.13.0-rc1-00044-g81d3871 #1
> > > [    5.328023] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
> > > [    5.328023] task: ffff8800002fa000 task.stack: ffffc900000d0000
> > > [    5.328023] RIP: 0010:slob_free+0x2bf/0x3d7
> > > [    5.328023] RSP: 0000:ffffc900000d3d58 EFLAGS: 00010002
> > > [    5.328023] RAX: 0000000000000027 RBX: ffff88001c4affb0 RCX: 0000000000000000
> > > [    5.328023] RDX: ffff88001c4af000 RSI: 0000000000000000 RDI: ffff88001c4afffe
> > > [    5.328023] RBP: ffff88001c4afffe R08: 0000000000000001 R09: 0000000000000000
> > > [    5.328023] R10: ffffea000069a420 R11: ffff88001ffdb000 R12: ffff88001c4aff5c
> > > [    5.328023] R13: 0000000000000027 R14: 0000000000000027 R15: 0000000000000027
> > > [    5.328023] FS: 0000000000000000(0000) GS:ffff88001f600000(0000) knlGS:0000000000000000
> > > [    5.328023] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [    5.328023] CR2: ffff88001c4b0000 CR3: 0000000016211000 CR4: 00000000000406b0
> > > [    5.328023] Call Trace:
> > > [    5.328023]  ? link_target+0xb2/0xc7
> > > [    5.328023]  kfree+0x158/0x1b6
> > > [    5.328023]  link_target+0xb2/0xc7
> > > [    5.328023]  new_node+0x32b/0x4d1
> > > [    5.328023]  gcov_event+0x33e/0x546
> > > [    5.328023]  ? gcov_persist_setup+0xbb/0xbb
> > > [    5.328023]  gcov_enable_events+0x3c/0x89
> > > [    5.328023]  gcov_fs_init+0x134/0x191
> > > [    5.328023]  do_one_initcall+0x10e/0x2df
> > > [    5.328023]  kernel_init_freeable+0x3ec/0x559
> > > [    5.328023]  ? rest_init+0x145/0x145
> > > [    5.328023]  kernel_init+0xc/0x1a8
> > > [    5.328023]  ret_from_fork+0x2a/0x40
> > > [    5.328023] Code: e8 8d f7 ff ff 48 ff 05 c9 8c 91 02 85 c0 75 51 49 0f bf c5 48 ff 05 c2 8c 91 02 48 8d 3c 43 48 39 ef 75 3d 48 ff 05 ba 8c 91 02 <8b> 6d 00 66 85 ed 7e 09 48 ff 05 b3 8c 91 02 eb 05 bd 01 00 00
> > > [    5.328023] RIP: slob_free+0x2bf/0x3d7 RSP: ffffc900000d3d58
> > > [    5.328023] CR2: ffff88001c4b0000
> > > [    5.328023] ---[ end trace f8ee1579929b04f0 ]---
> >
> > Adding the slub maintainers.  Is slob still supposed to work?
> >
> > The bisection is blaming the ORC unwinder, but I'm having trouble
> > finding anything ORC specific about it.  I wonder if the disabling of
> > frame pointers changed the code generation enough to trigger this bug
> > somehow.
> >
> > Looking at the panic, the code in slob_free() was:
> >
> >    0:   e8 8d f7 ff ff          callq  0xfffffffffffff792
> >    5:   48 ff 05 c9 8c 91 02    incq   0x2918cc9(%rip)        # 0x2918cd5
> >    c:   85 c0                   test   %eax,%eax
> >    e:   75 51                   jne    0x61
> >   10:   49 0f bf c5             movswq %r13w,%rax
> >   14:   48 ff 05 c2 8c 91 02    incq   0x2918cc2(%rip)        # 0x2918cdd
> >   1b:   48 8d 3c 43             lea    (%rbx,%rax,2),%rdi
> >   1f:   48 39 ef                cmp    %rbp,%rdi
> >   22:   75 3d                   jne    0x61
> >   24:   48 ff 05 ba 8c 91 02    incq   0x2918cba(%rip)        # 0x2918ce5
> >   2b:*  8b 6d 00                mov    0x0(%rbp),%ebp         <-- trapping instruction
> >   2e:   66 85 ed                test   %bp,%bp
> >   31:   7e 09                   jle    0x3c
> >   33:   48 ff 05 b3 8c 91 02    incq   0x2918cb3(%rip)        # 0x2918ced
> >   3a:   eb 05                   jmp    0x41
> >   3c:   bd                      .byte 0xbd
> >   3d:   01 00                   add    %eax,(%rax)
> >
> > The slob_free() code tried to read four bytes at ffff88001c4afffe, and
> > ended up reading past the page into a bad area.  I think the bad address
> > (ffff88001c4afffe) was returned from slob_next() and it panicked trying
> > to read s->units in slob_units().

Hello,

It looks like a compiler bug. The code of slob_units() tries to read two
bytes at ffff88001c4afffe. It's valid. But the compiler generates
wrong code that tries to read four bytes.

static slobidx_t slob_units(slob_t *s)
{
        if (s->units > 0)
                return s->units;
        return 1;
}

s->units is defined as two bytes in this setup.

Wrongly generated code for this part.

'mov 0x0(%rbp), %ebp'

%ebp is four bytes.

I guess that this wrong four-byte read crosses over the valid memory
boundary and this issue happened.

Proper code (two bytes read) is generated if a different version of gcc
is used.

If someone knows the relevant compiler people, please Cc them.

Thanks.
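The boundary argument above can be made concrete with a small user-space sketch. It assumes the 4096-byte base page size of x86-64 and uses the faulting address from the oops (ffff88001c4afffe); the helper name crosses_page() and the SKETCH_PAGE_SIZE macro are hypothetical and not taken from mm/slob.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB base pages, as on x86-64. */
#define SKETCH_PAGE_SIZE 4096u

/* True when a read of `size` bytes starting at addr touches two pages. */
static bool crosses_page(uint64_t addr, size_t size)
{
	assert(size > 0);
	uint64_t first_page = addr / SKETCH_PAGE_SIZE;
	uint64_t last_page = (addr + size - 1) / SKETCH_PAGE_SIZE;

	return first_page != last_page;
}
```

A two-byte read at 0xffff88001c4afffe ends on the last byte of its page, while a four-byte read from the same address spills into the page starting at 0xffff88001c4b0000, which matches the CR2 value in the report.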
* Re: [lkp-robot] [x86/kconfig] 81d3871900: BUG:unable_to_handle_kernel
  2017-10-17 7:33 ` Joonsoo Kim
@ 2017-10-17 7:50 ` Thomas Gleixner
  2017-10-18 7:31 ` Joonsoo Kim
  2017-10-18 10:40 ` Linus Torvalds
  1 sibling, 1 reply; 21+ messages in thread
From: Thomas Gleixner @ 2017-10-17 7:50 UTC
  To: Joonsoo Kim
  Cc: Josh Poimboeuf, kernel test robot, Ingo Molnar, Andy Lutomirski,
      Borislav Petkov, Brian Gerst, Denys Vlasenko, H. Peter Anvin,
      Jiri Slaby, Linus Torvalds, Mike Galbraith, Peter Zijlstra, LKML,
      lkp, linux-mm, Pekka Enberg, David Rientjes, Andrew Morton,
      Christoph Lameter

On Tue, 17 Oct 2017, Joonsoo Kim wrote:
> On Wed, Oct 11, 2017 at 12:01:20PM -0500, Josh Poimboeuf wrote:
> > > Looking at the panic, the code in slob_free() was:
> > >
> > >    0:   e8 8d f7 ff ff          callq  0xfffffffffffff792
> > >    5:   48 ff 05 c9 8c 91 02    incq   0x2918cc9(%rip)        # 0x2918cd5
> > >    c:   85 c0                   test   %eax,%eax
> > >    e:   75 51                   jne    0x61
> > >   10:   49 0f bf c5             movswq %r13w,%rax
> > >   14:   48 ff 05 c2 8c 91 02    incq   0x2918cc2(%rip)        # 0x2918cdd
> > >   1b:   48 8d 3c 43             lea    (%rbx,%rax,2),%rdi
> > >   1f:   48 39 ef                cmp    %rbp,%rdi
> > >   22:   75 3d                   jne    0x61
> > >   24:   48 ff 05 ba 8c 91 02    incq   0x2918cba(%rip)        # 0x2918ce5
> > >   2b:*  8b 6d 00                mov    0x0(%rbp),%ebp         <-- trapping instruction
> > >   2e:   66 85 ed                test   %bp,%bp
> > >   31:   7e 09                   jle    0x3c
> > >   33:   48 ff 05 b3 8c 91 02    incq   0x2918cb3(%rip)        # 0x2918ced
> > >   3a:   eb 05                   jmp    0x41
> > >   3c:   bd                      .byte 0xbd
> > >   3d:   01 00                   add    %eax,(%rax)
> > >
> > > The slob_free() code tried to read four bytes at ffff88001c4afffe, and
> > > ended up reading past the page into a bad area.  I think the bad address
> > > (ffff88001c4afffe) was returned from slob_next() and it panicked trying
> > > to read s->units in slob_units().
>
> Hello,
>
> It looks like a compiler bug. The code of slob_units() tries to read two
> bytes at ffff88001c4afffe. It's valid. But the compiler generates
> wrong code that tries to read four bytes.
>
> static slobidx_t slob_units(slob_t *s)
> {
>         if (s->units > 0)
>                 return s->units;
>         return 1;
> }
>
> s->units is defined as two bytes in this setup.
>
> Wrongly generated code for this part.
>
> 'mov 0x0(%rbp), %ebp'
>
> %ebp is four bytes.
>
> I guess that this wrong four-byte read crosses over the valid memory
> boundary and this issue happened.
>
> Proper code (two bytes read) is generated if a different version of gcc
> is used.

Which version fails to generate proper code and which versions work?

Thanks,

	tglx
* Re: [lkp-robot] [x86/kconfig] 81d3871900: BUG:unable_to_handle_kernel
  2017-10-17 7:50 ` Thomas Gleixner
@ 2017-10-18 7:31 ` Joonsoo Kim
  0 siblings, 0 replies; 21+ messages in thread
From: Joonsoo Kim @ 2017-10-18 7:31 UTC
  To: Thomas Gleixner
  Cc: Josh Poimboeuf, kernel test robot, Ingo Molnar, Andy Lutomirski,
      Borislav Petkov, Brian Gerst, Denys Vlasenko, H. Peter Anvin,
      Jiri Slaby, Linus Torvalds, Mike Galbraith, Peter Zijlstra, LKML,
      lkp, linux-mm, Pekka Enberg, David Rientjes, Andrew Morton,
      Christoph Lameter

On Tue, Oct 17, 2017 at 09:50:04AM +0200, Thomas Gleixner wrote:
> On Tue, 17 Oct 2017, Joonsoo Kim wrote:
> > On Wed, Oct 11, 2017 at 12:01:20PM -0500, Josh Poimboeuf wrote:
> > > > Looking at the panic, the code in slob_free() was:
> > > >
> > > >    0:   e8 8d f7 ff ff          callq  0xfffffffffffff792
> > > >    5:   48 ff 05 c9 8c 91 02    incq   0x2918cc9(%rip)        # 0x2918cd5
> > > >    c:   85 c0                   test   %eax,%eax
> > > >    e:   75 51                   jne    0x61
> > > >   10:   49 0f bf c5             movswq %r13w,%rax
> > > >   14:   48 ff 05 c2 8c 91 02    incq   0x2918cc2(%rip)        # 0x2918cdd
> > > >   1b:   48 8d 3c 43             lea    (%rbx,%rax,2),%rdi
> > > >   1f:   48 39 ef                cmp    %rbp,%rdi
> > > >   22:   75 3d                   jne    0x61
> > > >   24:   48 ff 05 ba 8c 91 02    incq   0x2918cba(%rip)        # 0x2918ce5
> > > >   2b:*  8b 6d 00                mov    0x0(%rbp),%ebp         <-- trapping instruction
> > > >   2e:   66 85 ed                test   %bp,%bp
> > > >   31:   7e 09                   jle    0x3c
> > > >   33:   48 ff 05 b3 8c 91 02    incq   0x2918cb3(%rip)        # 0x2918ced
> > > >   3a:   eb 05                   jmp    0x41
> > > >   3c:   bd                      .byte 0xbd
> > > >   3d:   01 00                   add    %eax,(%rax)
> > > >
> > > > The slob_free() code tried to read four bytes at ffff88001c4afffe, and
> > > > ended up reading past the page into a bad area.  I think the bad address
> > > > (ffff88001c4afffe) was returned from slob_next() and it panicked trying
> > > > to read s->units in slob_units().
> >
> > Hello,
> >
> > It looks like a compiler bug. The code of slob_units() tries to read two
> > bytes at ffff88001c4afffe. It's valid. But the compiler generates
> > wrong code that tries to read four bytes.
> >
> > static slobidx_t slob_units(slob_t *s)
> > {
> >         if (s->units > 0)
> >                 return s->units;
> >         return 1;
> > }
> >
> > s->units is defined as two bytes in this setup.
> >
> > Wrongly generated code for this part.
> >
> > 'mov 0x0(%rbp), %ebp'
> >
> > %ebp is four bytes.
> >
> > I guess that this wrong four-byte read crosses over the valid memory
> > boundary and this issue happened.
> >
> > Proper code (two bytes read) is generated if a different version of gcc
> > is used.
>
> Which version fails to generate proper code and which versions work?
>

gcc 4.8 and 4.9 fail to generate proper code. gcc 5.1 and
the latest version work fine.

I guess that this problem is related to a corner case of some
optimization feature, since a minor code change makes the result
different. And, with -O2, proper code is generated even if gcc 4.8 is
used.

Thanks.
* Re: [lkp-robot] [x86/kconfig] 81d3871900: BUG:unable_to_handle_kernel
  2017-10-17 7:33 ` Joonsoo Kim
  2017-10-17 7:50 ` Thomas Gleixner
@ 2017-10-18 10:40 ` Linus Torvalds
  2017-10-18 13:15 ` Thomas Gleixner
  1 sibling, 1 reply; 21+ messages in thread
From: Linus Torvalds @ 2017-10-18 10:40 UTC
  To: Joonsoo Kim
  Cc: Josh Poimboeuf, kernel test robot, Ingo Molnar, Andy Lutomirski,
      Borislav Petkov, Brian Gerst, Denys Vlasenko, H. Peter Anvin,
      Jiri Slaby, Mike Galbraith, Peter Zijlstra, Thomas Gleixner,
      LKML, LKP, linux-mm, Pekka Enberg, David Rientjes, Andrew Morton,
      Christoph Lameter

On Tue, Oct 17, 2017 at 3:33 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
>
> It looks like a compiler bug. The code of slob_units() tries to read two
> bytes at ffff88001c4afffe. It's valid. But the compiler generates
> wrong code that tries to read four bytes.
>
> static slobidx_t slob_units(slob_t *s)
> {
>         if (s->units > 0)
>                 return s->units;
>         return 1;
> }
>
> s->units is defined as two bytes in this setup.
>
> Wrongly generated code for this part.
>
> 'mov 0x0(%rbp), %ebp'
>
> %ebp is four bytes.
>
> I guess that this wrong four-byte read crosses over the valid memory
> boundary and this issue happened.

Hmm. I can see why the compiler would do that (16-bit accesses are
slow), but it's definitely wrong.

Does it work ok if that slob_units() code is written as

  static slobidx_t slob_units(slob_t *s)
  {
        int units = READ_ONCE(s->units);

        if (units > 0)
                return units;
        return 1;
  }

which might be an acceptable workaround for now?

         Linus
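The proposed workaround can be approximated in user space as follows. This is only a sketch under stated assumptions: READ_ONCE_S16() is a hypothetical stand-in for the kernel's READ_ONCE() built on a volatile-qualified access, slob_t is reduced to a single two-byte field as described in the thread, and the expectation that a volatile access is emitted with its declared width reflects common compiler behaviour rather than a guarantee checked here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Two bytes wide, matching "s->units is defined as two bytes in this setup". */
typedef int16_t slobidx_t;

/* Simplified stand-in for the kernel's slob block header. */
typedef struct {
	slobidx_t units;
} slob_t;

/* Hypothetical user-space analogue of READ_ONCE() for a 16-bit field. */
#define READ_ONCE_S16(x) (*(const volatile int16_t *)&(x))

static slobidx_t slob_units(slob_t *s)
{
	assert(s != NULL);
	/* Load the field exactly once into a local, then test the local. */
	int units = READ_ONCE_S16(s->units);

	if (units > 0)
		return (slobidx_t)units;
	return 1;
}
```

The function keeps the original semantics: positive unit counts are returned as is, and zero or negative values collapse to 1.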
* Re: [lkp-robot] [x86/kconfig] 81d3871900: BUG:unable_to_handle_kernel
  2017-10-18 10:40 ` Linus Torvalds
@ 2017-10-18 13:15 ` Thomas Gleixner
  2017-10-19 2:14 ` Joonsoo Kim
  0 siblings, 1 reply; 21+ messages in thread
From: Thomas Gleixner @ 2017-10-18 13:15 UTC
  To: Linus Torvalds
  Cc: Joonsoo Kim, Josh Poimboeuf, kernel test robot, Ingo Molnar,
      Andy Lutomirski, Borislav Petkov, Brian Gerst, Denys Vlasenko,
      H. Peter Anvin, Jiri Slaby, Mike Galbraith, Peter Zijlstra, LKML,
      LKP, linux-mm, Pekka Enberg, David Rientjes, Andrew Morton,
      Christoph Lameter

On Wed, 18 Oct 2017, Linus Torvalds wrote:
> On Tue, Oct 17, 2017 at 3:33 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> >
> > It looks like a compiler bug. The code of slob_units() tries to read two
> > bytes at ffff88001c4afffe. It's valid. But the compiler generates
> > wrong code that tries to read four bytes.
> >
> > static slobidx_t slob_units(slob_t *s)
> > {
> >         if (s->units > 0)
> >                 return s->units;
> >         return 1;
> > }
> >
> > s->units is defined as two bytes in this setup.
> >
> > Wrongly generated code for this part.
> >
> > 'mov 0x0(%rbp), %ebp'
> >
> > %ebp is four bytes.
> >
> > I guess that this wrong four-byte read crosses over the valid memory
> > boundary and this issue happened.
>
> Hmm. I can see why the compiler would do that (16-bit accesses are
> slow), but it's definitely wrong.
>
> Does it work ok if that slob_units() code is written as
>
>   static slobidx_t slob_units(slob_t *s)
>   {
>         int units = READ_ONCE(s->units);
>
>         if (units > 0)
>                 return units;
>         return 1;
>   }
>
> which might be an acceptable workaround for now?

Discussed exactly that with Peter Zijlstra yesterday, but we came to the
conclusion that this is a whack a mole game. It might fix this slob issue,
but what guarantees that we don't have the same problem in some other
place? Just duct taping this particular instance makes me nervous.

Joonsoo says:

> gcc 4.8 and 4.9 fail to generate proper code. gcc 5.1 and
> the latest version work fine.

> I guess that this problem is related to a corner case of some
> optimization feature, since a minor code change makes the result
> different. And, with -O2, proper code is generated even if gcc 4.8 is
> used.

So it would be useful to figure out which optimization bit is causing that
and blacklist it for the affected compiler versions.

Thanks,

	tglx
* Re: [lkp-robot] [x86/kconfig] 81d3871900: BUG:unable_to_handle_kernel 2017-10-18 13:15 ` Thomas Gleixner @ 2017-10-19 2:14 ` Joonsoo Kim 0 siblings, 0 replies; 21+ messages in thread From: Joonsoo Kim @ 2017-10-19 2:14 UTC (permalink / raw) To: Thomas Gleixner Cc: Linus Torvalds, Josh Poimboeuf, kernel test robot, Ingo Molnar, Andy Lutomirski, Borislav Petkov, Brian Gerst, Denys Vlasenko, H. Peter Anvin, Jiri Slaby, Mike Galbraith, Peter Zijlstra, LKML, LKP, linux-mm, Pekka Enberg, David Rientjes, Andrew Morton, Christoph Lameter On Wed, Oct 18, 2017 at 03:15:03PM +0200, Thomas Gleixner wrote: > On Wed, 18 Oct 2017, Linus Torvalds wrote: > > On Tue, Oct 17, 2017 at 3:33 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote: > > > > > > It looks like a compiler bug. The code of slob_units() try to read two > > > bytes at ffff88001c4afffe. It's valid. But the compiler generates > > > wrong code that try to read four bytes. > > > > > > static slobidx_t slob_units(slob_t *s) > > > { > > > if (s->units > 0) > > > return s->units; > > > return 1; > > > } > > > > > > s->units is defined as two bytes in this setup. > > > > > > Wrongly generated code for this part. > > > > > > 'mov 0x0(%rbp), %ebp' > > > > > > %ebp is four bytes. > > > > > > I guess that this wrong four bytes read cross over the valid memory > > > boundary and this issue happend. > > > > Hmm. I can see why the compiler would do that (16-bit accesses are > > slow), but it's definitely wrong. > > > > Does it work ok if that slob_units() code is written as > > > > static slobidx_t slob_units(slob_t *s) > > { > > int units = READ_ONCE(s->units); > > > > if (units > 0) > > return units; > > return 1; > > } > > > > which might be an acceptable workaround for now? > > Discussed exactly that with Peter Zijlstra yesterday, but we came to the > conclusion that this is a whack a mole game. It might fix this slob issue, > but what guarantees that we don't have the same problem in some other > place? 
>
> Just duct taping this particular instance makes me nervous.

I have checked that the above patch works fine, but I agree with Thomas.

> Joonsoo says:
>
> > gcc 4.8 and 4.9 fails to generate proper code. gcc 5.1 and
> > the latest version works fine.
>
> > I guess that this problem is related to the corner case of some
> > optimization feature since minor code change makes the result
> > different. And, with -O2, proper code is generated even if gcc 4.8 is
> > used.
>
> So it would be useful to figure out which optimization bit is causing that
> and blacklist it for the affected compiler versions.

I have tried that but could not find any clue. What I did was compile with
-O2 and disable some options so that the option list is the same as with
-Os. A rough guideline is given in the gcc man page. However, I cannot
reproduce the issue this way.

Thanks.
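The failure mode described in this message can be sketched in user space. This is only an illustration, not kernel code: it assumes 4 KiB pages and a 2-byte `units` field, and the names `demo_slob_t`, `DEMO_READ_ONCE_I16`, `load_crosses_page` and `demo_slob_units` are hypothetical. It shows why a 2-byte load at ffff88001c4afffe stays inside its page while a 4-byte load from the same address reaches into the next page (the faulting address ffff88001c4b0000 in CR2), and how a volatile access, which is the idea behind the proposed READ_ONCE() workaround, asks the compiler for a load of exactly the declared width.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, as on the x86_64 test machine. */
#define DEMO_PAGE_SIZE 4096u

/* Hypothetical stand-in for slob_t in a setup where units is two bytes. */
typedef struct {
	int16_t units;
} demo_slob_t;

/*
 * Simplified, assumed equivalent of READ_ONCE() for a 16-bit field:
 * the volatile qualifier requests a single access of the declared width.
 */
#define DEMO_READ_ONCE_I16(x) (*(const volatile int16_t *)&(x))

/* Returns 1 if a load of `size` bytes at `addr` extends into the next page. */
static int load_crosses_page(uint64_t addr, unsigned int size)
{
	uint64_t offset = addr % DEMO_PAGE_SIZE;

	return offset + size > DEMO_PAGE_SIZE;
}

/* Same logic as the workaround quoted above, using the 16-bit read. */
static int demo_slob_units(demo_slob_t *s)
{
	int units = DEMO_READ_ONCE_I16(s->units);

	if (units > 0)
		return units;
	return 1;
}
```

For the address in the report, `load_crosses_page(0xffff88001c4afffeULL, 2)` returns 0 and `load_crosses_page(0xffff88001c4afffeULL, 4)` returns 1, which matches a 4-byte `mov` faulting on the following page when that page is not mapped.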
end of thread, other threads:[~2017-10-19 2:11 UTC | newest]

Thread overview: 21+ messages
[not found] <20171010121513.GC5445@yexl-desktop>
2017-10-11 2:31 ` [lkp-robot] [x86/kconfig] 81d3871900: BUG:unable_to_handle_kernel Josh Poimboeuf
2017-10-11 17:01 ` Josh Poimboeuf
2017-10-12 17:05 ` Christopher Lameter
2017-10-12 17:54 ` Linus Torvalds
2017-10-12 18:48 ` Andrew Morton
2017-10-12 19:19 ` Geert Uytterhoeven
2017-10-12 17:54 ` Linus Torvalds
2017-10-13 4:45 ` Josh Poimboeuf
2017-10-13 13:56 ` Andrey Ryabinin
2017-10-13 16:19 ` Josh Poimboeuf
2017-10-13 19:09 ` Linus Torvalds
2017-10-13 20:01 ` Andy Lutomirski
2017-10-13 20:17 ` Jeffrey Walton
2017-10-13 15:22 ` Christopher Lameter
2017-10-13 15:37 ` Josh Poimboeuf
2017-10-17 7:33 ` Joonsoo Kim
2017-10-17 7:50 ` Thomas Gleixner
2017-10-18 7:31 ` Joonsoo Kim
2017-10-18 10:40 ` Linus Torvalds
2017-10-18 13:15 ` Thomas Gleixner
2017-10-19 2:14 ` Joonsoo Kim