Re: [lkp-robot] [x86/kconfig] 81d3871900: BUG:unable_to_handle

* Re: [lkp-robot] [x86/kconfig]  81d3871900: BUG:unable_to_handle_kernel
       [not found] <20171010121513.GC5445@yexl-desktop>
@ 2017-10-11  2:31 ` Josh Poimboeuf
  2017-10-11 17:01   ` Josh Poimboeuf
  0 siblings, 1 reply; 21+ messages in thread
From: Josh Poimboeuf @ 2017-10-11  2:31 UTC (permalink / raw)
  To: kernel test robot
  Cc: Ingo Molnar, Andy Lutomirski, Borislav Petkov, Brian Gerst,
	Denys Vlasenko, H. Peter Anvin, Jiri Slaby, Linus Torvalds,
	Mike Galbraith, Peter Zijlstra, Thomas Gleixner, LKML, lkp,
	linux-mm

On Tue, Oct 10, 2017 at 08:15:13PM +0800, kernel test robot wrote:
> 
> FYI, we noticed the following commit (built with gcc-4.8):
> 
> commit: 81d387190039c14edac8de2b3ec789beb899afd9 ("x86/kconfig: Consolidate unwinders into multiple choice selection")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> 
> in testcase: boot
> 
> on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -m 512M
> 
> caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
> 
> 
> +------------------------------------------+------------+------------+
> |                                          | a34a766ff9 | 81d3871900 |
> +------------------------------------------+------------+------------+
> | boot_successes                           | 24         | 5          |
> | boot_failures                            | 12         | 31         |
> | BUG:kernel_hang_in_test_stage            | 12         | 1          |
> | BUG:unable_to_handle_kernel              | 0          | 30         |
> | Oops:#[##]                               | 0          | 30         |
> | Kernel_panic-not_syncing:Fatal_exception | 0          | 30         |
> +------------------------------------------+------------+------------+
> 
> 
> 
> [    5.324797] BUG: unable to handle kernel paging request at ffff88001c4b0000
> [    5.326126] IP: slob_free+0x2bf/0x3d7
> [    5.328023] PGD 17d9c067 
> [    5.328023] P4D 17d9c067 
> [    5.328023] PUD 17d9d067 
> [    5.328023] PMD 1f91e067 
> [    5.328023] PTE 800000001c4b0060
> [    5.328023] 
> [    5.328023] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> [    5.328023] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.13.0-rc1-00044-g81d3871 #1
> [    5.328023] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
> [    5.328023] task: ffff8800002fa000 task.stack: ffffc900000d0000
> [    5.328023] RIP: 0010:slob_free+0x2bf/0x3d7
> [    5.328023] RSP: 0000:ffffc900000d3d58 EFLAGS: 00010002
> [    5.328023] RAX: 0000000000000027 RBX: ffff88001c4affb0 RCX: 0000000000000000
> [    5.328023] RDX: ffff88001c4af000 RSI: 0000000000000000 RDI: ffff88001c4afffe
> [    5.328023] RBP: ffff88001c4afffe R08: 0000000000000001 R09: 0000000000000000
> [    5.328023] R10: ffffea000069a420 R11: ffff88001ffdb000 R12: ffff88001c4aff5c
> [    5.328023] R13: 0000000000000027 R14: 0000000000000027 R15: 0000000000000027
> [    5.328023] FS:  0000000000000000(0000) GS:ffff88001f600000(0000) knlGS:0000000000000000
> [    5.328023] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    5.328023] CR2: ffff88001c4b0000 CR3: 0000000016211000 CR4: 00000000000406b0
> [    5.328023] Call Trace:
> [    5.328023]  ? link_target+0xb2/0xc7
> [    5.328023]  kfree+0x158/0x1b6
> [    5.328023]  link_target+0xb2/0xc7
> [    5.328023]  new_node+0x32b/0x4d1
> [    5.328023]  gcov_event+0x33e/0x546
> [    5.328023]  ? gcov_persist_setup+0xbb/0xbb
> [    5.328023]  gcov_enable_events+0x3c/0x89
> [    5.328023]  gcov_fs_init+0x134/0x191
> [    5.328023]  do_one_initcall+0x10e/0x2df
> [    5.328023]  kernel_init_freeable+0x3ec/0x559
> [    5.328023]  ? rest_init+0x145/0x145
> [    5.328023]  kernel_init+0xc/0x1a8
> [    5.328023]  ret_from_fork+0x2a/0x40
> [    5.328023] Code: e8 8d f7 ff ff 48 ff 05 c9 8c 91 02 85 c0 75 51 49 0f bf c5 48 ff 05 c2 8c 91 02 48 8d 3c 43 48 39 ef 75 3d 48 ff 05 ba 8c 91 02 <8b> 6d 00 66 85 ed 7e 09 48 ff 05 b3 8c 91 02 eb 05 bd 01 00 00 
> [    5.328023] RIP: slob_free+0x2bf/0x3d7 RSP: ffffc900000d3d58
> [    5.328023] CR2: ffff88001c4b0000
> [    5.328023] ---[ end trace f8ee1579929b04f0 ]---

Adding the slub maintainers.  Is slob still supposed to work?

The bisection is blaming the ORC unwinder, but I'm having trouble
finding anything ORC specific about it.  I wonder if the disabling of
frame pointers changed the code generation enough to trigger this bug
somehow.

Looking at the panic, the code in slob_free() was:

   0:	e8 8d f7 ff ff       	callq  0xfffffffffffff792
   5:	48 ff 05 c9 8c 91 02 	incq   0x2918cc9(%rip)        # 0x2918cd5
   c:	85 c0                	test   %eax,%eax
   e:	75 51                	jne    0x61
  10:	49 0f bf c5          	movswq %r13w,%rax
  14:	48 ff 05 c2 8c 91 02 	incq   0x2918cc2(%rip)        # 0x2918cdd
  1b:	48 8d 3c 43          	lea    (%rbx,%rax,2),%rdi
  1f:	48 39 ef             	cmp    %rbp,%rdi
  22:	75 3d                	jne    0x61
  24:	48 ff 05 ba 8c 91 02 	incq   0x2918cba(%rip)        # 0x2918ce5
  2b:*	8b 6d 00             	mov    0x0(%rbp),%ebp		<-- trapping instruction
  2e:	66 85 ed             	test   %bp,%bp
  31:	7e 09                	jle    0x3c
  33:	48 ff 05 b3 8c 91 02 	incq   0x2918cb3(%rip)        # 0x2918ced
  3a:	eb 05                	jmp    0x41
  3c:	bd                   	.byte 0xbd
  3d:	01 00                	add    %eax,(%rax)

The slob_free() code tried to read four bytes at ffff88001c4afffe, and
ended up reading past the page into a bad area.  I think the bad address
(ffff88001c4afffe) was returned from slob_next() and it panicked trying
to read s->units in slob_units().
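
The arithmetic behind that reading can be checked with a small sketch.  It
assumes 4 KiB pages (the usual x86-64 base page size) and uses the RBP and
CR2 values from the oops above.  The helper names crosses_page() and
next_page_start() are made up for illustration; they are not kernel
functions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB base pages, as is usual on x86-64. */
#define SKETCH_PAGE_SIZE 4096ULL

/* Hypothetical helper: true if the bytes [addr, addr + len) span two pages. */
static bool crosses_page(uint64_t addr, size_t len)
{
	uint64_t first = addr / SKETCH_PAGE_SIZE;
	uint64_t last = (addr + len - 1) / SKETCH_PAGE_SIZE;

	return first != last;
}

/* Hypothetical helper: start of the page following the one that holds addr. */
static uint64_t next_page_start(uint64_t addr)
{
	return (addr / SKETCH_PAGE_SIZE + 1) * SKETCH_PAGE_SIZE;
}
```

With the values from the oops, crosses_page(0xffff88001c4afffe, 4) is true,
and next_page_start() gives ffff88001c4b0000, which is the CR2 value.  A
2-byte read at the same address (the width that the following
"test %bp,%bp" actually looks at) would have stayed inside the page.  The
address itself also matches the "lea (%rbx,%rax,2),%rdi" above: RBX
(ffff88001c4affb0) plus 2 * 0x27 is ffff88001c4afffe.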

Interestingly, I've found that I get panics when booting with
CONFIG_SLOB enabled, with both ORC and frame pointers:

  general protection fault: 0000 [#1] PREEMPT SMP
  Modules linked in:
  CPU: 0 PID: 58 Comm: kworker/0:1 Not tainted 4.13.0-rc1+ #74
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc26 04/01/2014
  Workqueue: crypto mcryptd_flusher
  task: ffff880139a98000 task.stack: ffffc9000082c000
  RIP: 0010:skip_7+0x0/0x67
  RSP: 0000:ffffc9000082fd88 EFLAGS: 00010246
  RAX: ffff880134b65e34 RBX: 00000000f7654321 RCX: 0000000000000003
  RDX: 0000000000000000 RSI: ffffffff81d22039 RDI: ffff880135be0248
  RBP: ffffc9000082fd90 R08: 0000000000000000 R09: 0000000000000001
  R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff8238d260
  R13: ffff88013a7e53a8 R14: 00000000fffb7593 R15: 0000000000000000
  FS:  0000000000000000(0000) GS:ffff88013a600000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000000 CR3: 0000000001e11000 CR4: 00000000001406f0
  Call Trace:
   sha256_ctx_mgr_flush+0x28/0x30
   sha256_mb_flusher+0x53/0x120
   mcryptd_flusher+0xc4/0xf0
   process_one_work+0x253/0x6b0
   worker_thread+0x4d/0x3b0
   ? preempt_count_sub+0x9b/0x100
   kthread+0x12c/0x150
   ? process_one_work+0x6b0/0x6b0
   ? kthread_create_on_node+0x70/0x70
   ret_from_fork+0x2a/0x40
  Code: 89 87 30 01 00 00 c7 87 58 01 00 00 ff ff ff ff 48 83 bf a0 01 00 00 00 75 11 48 89 87 38 01 00 00 c7 87 5c 01 00 00 ff ff ff ff <c5> f9 6f 87 40 01 00 00 c5 f9 6f 8f 50 01 00 00 c4 e2 79 3b d1
  RIP: skip_7+0x0/0x67 RSP: ffffc9000082fd88

I have no idea how that crypto panic could be related to slob, but
at least it goes away when I switch to slub.

-- 
Josh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .

* Re: [lkp-robot] [x86/kconfig]  81d3871900: BUG:unable_to_handle_kernel
  2017-10-11  2:31 ` [lkp-robot] [x86/kconfig] 81d3871900: BUG:unable_to_handle_kernel Josh Poimboeuf
@ 2017-10-11 17:01   ` Josh Poimboeuf
  2017-10-12 17:05     ` Christopher Lameter
  2017-10-17  7:33     ` Joonsoo Kim
  0 siblings, 2 replies; 21+ messages in thread
From: Josh Poimboeuf @ 2017-10-11 17:01 UTC (permalink / raw)
  To: kernel test robot
  Cc: Ingo Molnar, Andy Lutomirski, Borislav Petkov, Brian Gerst,
	Denys Vlasenko, H. Peter Anvin, Jiri Slaby, Linus Torvalds,
	Mike Galbraith, Peter Zijlstra, Thomas Gleixner, LKML, lkp,
	linux-mm, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Christoph Lameter

I failed to add the slab maintainers to CC on the last attempt.  Trying
again.

On Tue, Oct 10, 2017 at 09:31:06PM -0500, Josh Poimboeuf wrote:
> On Tue, Oct 10, 2017 at 08:15:13PM +0800, kernel test robot wrote:
> > 
> > FYI, we noticed the following commit (built with gcc-4.8):
> > 
> > commit: 81d387190039c14edac8de2b3ec789beb899afd9 ("x86/kconfig: Consolidate unwinders into multiple choice selection")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> > 
> > in testcase: boot
> > 
> > on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -m 512M
> > 
> > caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
> > 
> > 
> > +------------------------------------------+------------+------------+
> > |                                          | a34a766ff9 | 81d3871900 |
> > +------------------------------------------+------------+------------+
> > | boot_successes                           | 24         | 5          |
> > | boot_failures                            | 12         | 31         |
> > | BUG:kernel_hang_in_test_stage            | 12         | 1          |
> > | BUG:unable_to_handle_kernel              | 0          | 30         |
> > | Oops:#[##]                               | 0          | 30         |
> > | Kernel_panic-not_syncing:Fatal_exception | 0          | 30         |
> > +------------------------------------------+------------+------------+
> > 
> > 
> > 
> > [    5.324797] BUG: unable to handle kernel paging request at ffff88001c4b0000
> > [    5.326126] IP: slob_free+0x2bf/0x3d7
> > [    5.328023] PGD 17d9c067 
> > [    5.328023] P4D 17d9c067 
> > [    5.328023] PUD 17d9d067 
> > [    5.328023] PMD 1f91e067 
> > [    5.328023] PTE 800000001c4b0060
> > [    5.328023] 
> > [    5.328023] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> > [    5.328023] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.13.0-rc1-00044-g81d3871 #1
> > [    5.328023] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
> > [    5.328023] task: ffff8800002fa000 task.stack: ffffc900000d0000
> > [    5.328023] RIP: 0010:slob_free+0x2bf/0x3d7
> > [    5.328023] RSP: 0000:ffffc900000d3d58 EFLAGS: 00010002
> > [    5.328023] RAX: 0000000000000027 RBX: ffff88001c4affb0 RCX: 0000000000000000
> > [    5.328023] RDX: ffff88001c4af000 RSI: 0000000000000000 RDI: ffff88001c4afffe
> > [    5.328023] RBP: ffff88001c4afffe R08: 0000000000000001 R09: 0000000000000000
> > [    5.328023] R10: ffffea000069a420 R11: ffff88001ffdb000 R12: ffff88001c4aff5c
> > [    5.328023] R13: 0000000000000027 R14: 0000000000000027 R15: 0000000000000027
> > [    5.328023] FS:  0000000000000000(0000) GS:ffff88001f600000(0000) knlGS:0000000000000000
> > [    5.328023] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [    5.328023] CR2: ffff88001c4b0000 CR3: 0000000016211000 CR4: 00000000000406b0
> > [    5.328023] Call Trace:
> > [    5.328023]  ? link_target+0xb2/0xc7
> > [    5.328023]  kfree+0x158/0x1b6
> > [    5.328023]  link_target+0xb2/0xc7
> > [    5.328023]  new_node+0x32b/0x4d1
> > [    5.328023]  gcov_event+0x33e/0x546
> > [    5.328023]  ? gcov_persist_setup+0xbb/0xbb
> > [    5.328023]  gcov_enable_events+0x3c/0x89
> > [    5.328023]  gcov_fs_init+0x134/0x191
> > [    5.328023]  do_one_initcall+0x10e/0x2df
> > [    5.328023]  kernel_init_freeable+0x3ec/0x559
> > [    5.328023]  ? rest_init+0x145/0x145
> > [    5.328023]  kernel_init+0xc/0x1a8
> > [    5.328023]  ret_from_fork+0x2a/0x40
> > [    5.328023] Code: e8 8d f7 ff ff 48 ff 05 c9 8c 91 02 85 c0 75 51 49 0f bf c5 48 ff 05 c2 8c 91 02 48 8d 3c 43 48 39 ef 75 3d 48 ff 05 ba 8c 91 02 <8b> 6d 00 66 85 ed 7e 09 48 ff 05 b3 8c 91 02 eb 05 bd 01 00 00 
> > [    5.328023] RIP: slob_free+0x2bf/0x3d7 RSP: ffffc900000d3d58
> > [    5.328023] CR2: ffff88001c4b0000
> > [    5.328023] ---[ end trace f8ee1579929b04f0 ]---
> 
> Adding the slub maintainers.  Is slob still supposed to work?
> 
> The bisection is blaming the ORC unwinder, but I'm having trouble
> finding anything ORC specific about it.  I wonder if the disabling of
> frame pointers changed the code generation enough to trigger this bug
> somehow.
> 
> Looking at the panic, the code in slob_free() was:
> 
>    0:	e8 8d f7 ff ff       	callq  0xfffffffffffff792
>    5:	48 ff 05 c9 8c 91 02 	incq   0x2918cc9(%rip)        # 0x2918cd5
>    c:	85 c0                	test   %eax,%eax
>    e:	75 51                	jne    0x61
>   10:	49 0f bf c5          	movswq %r13w,%rax
>   14:	48 ff 05 c2 8c 91 02 	incq   0x2918cc2(%rip)        # 0x2918cdd
>   1b:	48 8d 3c 43          	lea    (%rbx,%rax,2),%rdi
>   1f:	48 39 ef             	cmp    %rbp,%rdi
>   22:	75 3d                	jne    0x61
>   24:	48 ff 05 ba 8c 91 02 	incq   0x2918cba(%rip)        # 0x2918ce5
>   2b:*	8b 6d 00             	mov    0x0(%rbp),%ebp		<-- trapping instruction
>   2e:	66 85 ed             	test   %bp,%bp
>   31:	7e 09                	jle    0x3c
>   33:	48 ff 05 b3 8c 91 02 	incq   0x2918cb3(%rip)        # 0x2918ced
>   3a:	eb 05                	jmp    0x41
>   3c:	bd                   	.byte 0xbd
>   3d:	01 00                	add    %eax,(%rax)
> 
> The slob_free() code tried to read four bytes at ffff88001c4afffe, and
> ended up reading past the page into a bad area.  I think the bad address
> (ffff88001c4afffe) was returned from slob_next() and it panicked trying
> to read s->units in slob_units().
> 
> Interestingly, I've found that I get panics when booting with
> CONFIG_SLOB enabled, with both ORC and frame pointers:
> 
>   general protection fault: 0000 [#1] PREEMPT SMP
>   Modules linked in:
>   CPU: 0 PID: 58 Comm: kworker/0:1 Not tainted 4.13.0-rc1+ #74
>   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc26 04/01/2014
>   Workqueue: crypto mcryptd_flusher
>   task: ffff880139a98000 task.stack: ffffc9000082c000
>   RIP: 0010:skip_7+0x0/0x67
>   RSP: 0000:ffffc9000082fd88 EFLAGS: 00010246
>   RAX: ffff880134b65e34 RBX: 00000000f7654321 RCX: 0000000000000003
>   RDX: 0000000000000000 RSI: ffffffff81d22039 RDI: ffff880135be0248
>   RBP: ffffc9000082fd90 R08: 0000000000000000 R09: 0000000000000001
>   R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff8238d260
>   R13: ffff88013a7e53a8 R14: 00000000fffb7593 R15: 0000000000000000
>   FS:  0000000000000000(0000) GS:ffff88013a600000(0000) knlGS:0000000000000000
>   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>   CR2: 0000000000000000 CR3: 0000000001e11000 CR4: 00000000001406f0
>   Call Trace:
>    sha256_ctx_mgr_flush+0x28/0x30
>    sha256_mb_flusher+0x53/0x120
>    mcryptd_flusher+0xc4/0xf0
>    process_one_work+0x253/0x6b0
>    worker_thread+0x4d/0x3b0
>    ? preempt_count_sub+0x9b/0x100
>    kthread+0x12c/0x150
>    ? process_one_work+0x6b0/0x6b0
>    ? kthread_create_on_node+0x70/0x70
>    ret_from_fork+0x2a/0x40
>   Code: 89 87 30 01 00 00 c7 87 58 01 00 00 ff ff ff ff 48 83 bf a0 01 00 00 00 75 11 48 89 87 38 01 00 00 c7 87 5c 01 00 00 ff ff ff ff <c5> f9 6f 87 40 01 00 00 c5 f9 6f 8f 50 01 00 00 c4 e2 79 3b d1
>   RIP: skip_7+0x0/0x67 RSP: ffffc9000082fd88
> 
> I have no idea how that crypto panic could be related to slob, but
> at least it goes away when I switch to slub.

-- 
Josh


* Re: [lkp-robot] [x86/kconfig]  81d3871900: BUG:unable_to_handle_kernel
  2017-10-11 17:01   ` Josh Poimboeuf
@ 2017-10-12 17:05     ` Christopher Lameter
  2017-10-12 17:54       ` Linus Torvalds
                         ` (2 more replies)
  2017-10-17  7:33     ` Joonsoo Kim
  1 sibling, 3 replies; 21+ messages in thread
From: Christopher Lameter @ 2017-10-12 17:05 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: kernel test robot, Ingo Molnar, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Denys Vlasenko, H. Peter Anvin, Jiri Slaby,
	Linus Torvalds, Mike Galbraith, Peter Zijlstra, Thomas Gleixner,
	LKML, lkp, linux-mm, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton

On Wed, 11 Oct 2017, Josh Poimboeuf wrote:

> I failed to add the slab maintainers to CC on the last attempt.  Trying
> again.


Hmmm... Yea. SLOB is rarely used and tested. Good illustration of a simple
allocator and the K&R mechanism that was used in the early kernels.

> > Adding the slub maintainers.  Is slob still supposed to work?

Have not seen anyone using it in a decade or so.

Does the same config with SLUB and slub_debug on the commandline run
cleanly?

> > I have no idea how that crypto panic could be related to slob, but
> > at least it goes away when I switch to slub.

Can you run SLUB with full debug? Specify slub_debug on the command line or
set CONFIG_SLUB_DEBUG_ON.

* Re: [lkp-robot] [x86/kconfig] 81d3871900: BUG:unable_to_handle_kernel
  2017-10-12 17:05     ` Christopher Lameter
@ 2017-10-12 17:54       ` Linus Torvalds
  2017-10-12 18:48         ` Andrew Morton
  2017-10-12 17:54       ` Linus Torvalds
  2017-10-13  4:45       ` Josh Poimboeuf
  2 siblings, 1 reply; 21+ messages in thread
From: Linus Torvalds @ 2017-10-12 17:54 UTC (permalink / raw)
  To: Christopher Lameter
  Cc: Josh Poimboeuf, kernel test robot, Ingo Molnar, Andy Lutomirski,
	Borislav Petkov, Brian Gerst, Denys Vlasenko, H. Peter Anvin,
	Jiri Slaby, Mike Galbraith, Peter Zijlstra, Thomas Gleixner, LKML,
	lkp, linux-mm, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton

On Thu, Oct 12, 2017 at 10:05 AM, Christopher Lameter <cl@linux.com> wrote:
> On Wed, 11 Oct 2017, Josh Poimboeuf wrote:
>
>> I failed to add the slab maintainers to CC on the last attempt.  Trying
>> again.
>
> Hmmm... Yea. SLOB is rarely used and tested. Good illustration of a simple
> allocator and the K&R mechanism that was used in the early kernels.

Should we finally just get rid of SLOB?

I'm not happy about the whole "three different allocators" crap. It's
been there for much too long, and I've tried to cut it down before.
People always protest, but three different allocators, one of which
gets basically no testing, is not good.

               Linus


* Re: [lkp-robot] [x86/kconfig] 81d3871900: BUG:unable_to_handle_kernel
  2017-10-12 17:54       ` Linus Torvalds
@ 2017-10-12 18:48         ` Andrew Morton
  2017-10-12 19:19           ` Geert Uytterhoeven
  0 siblings, 1 reply; 21+ messages in thread
From: Andrew Morton @ 2017-10-12 18:48 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Christopher Lameter, Josh Poimboeuf, kernel test robot,
	Ingo Molnar, Andy Lutomirski, Borislav Petkov, Brian Gerst,
	Denys Vlasenko, H. Peter Anvin, Jiri Slaby, Mike Galbraith,
	Peter Zijlstra, Thomas Gleixner, LKML, lkp, linux-mm,
	Pekka Enberg, David Rientjes, Joonsoo Kim, Matt Mackall

On Thu, 12 Oct 2017 10:54:57 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Thu, Oct 12, 2017 at 10:05 AM, Christopher Lameter <cl@linux.com> wrote:
> > On Wed, 11 Oct 2017, Josh Poimboeuf wrote:
> >
> >> I failed to add the slab maintainers to CC on the last attempt.  Trying
> >> again.
> >
> > Hmmm... Yea. SLOB is rarely used and tested. Good illustration of a simple
> > allocator and the K&R mechanism that was used in the early kernels.
> 
> Should we finally just get rid of SLOB?
> 
> I'm not happy about the whole "three different allocators" crap. It's
> been there for much too long, and I've tried to cut it down before.
> People always protest, but three different allocators, one of which
> gets basically no testing, is not good.
> 

I am not aware of anyone using slob.  We could disable it in Kconfig
for a year, see what the feedback looks like.


* Re: [lkp-robot] [x86/kconfig] 81d3871900: BUG:unable_to_handle_kernel
  2017-10-12 18:48         ` Andrew Morton
@ 2017-10-12 19:19           ` Geert Uytterhoeven
  0 siblings, 0 replies; 21+ messages in thread
From: Geert Uytterhoeven @ 2017-10-12 19:19 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linus Torvalds, Christopher Lameter, Josh Poimboeuf,
	kernel test robot, Ingo Molnar, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Denys Vlasenko, H. Peter Anvin, Jiri Slaby,
	Mike Galbraith, Peter Zijlstra, Thomas Gleixner, LKML, LKP,
	Linux MM, Pekka Enberg, David Rientjes, Joonsoo Kim, Matt Mackall

On Thu, Oct 12, 2017 at 8:48 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Thu, 12 Oct 2017 10:54:57 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote:
>> On Thu, Oct 12, 2017 at 10:05 AM, Christopher Lameter <cl@linux.com> wrote:
>> > On Wed, 11 Oct 2017, Josh Poimboeuf wrote:
>> >
>> >> I failed to add the slab maintainers to CC on the last attempt.  Trying
>> >> again.
>> >
>> > Hmmm... Yea. SLOB is rarely used and tested. Good illustration of a simple
>> > allocator and the K&R mechanism that was used in the early kernels.
>>
>> Should we finally just get rid of SLOB?
>>
>> I'm not happy about the whole "three different allocators" crap. It's
>> been there for much too long, and I've tried to cut it down before.
>> People always protest, but three different allocators, one of which
>> gets basically no testing, is not good.
>
> I am not aware of anyone using slob.  We could disable it in Kconfig
> for a year, see what the feedback looks like.

$ git grep CONFIG_SLOB=y
arch/arm/configs/clps711x_defconfig:CONFIG_SLOB=y
arch/arm/configs/collie_defconfig:CONFIG_SLOB=y
arch/arm/configs/multi_v4t_defconfig:CONFIG_SLOB=y
arch/arm/configs/omap1_defconfig:CONFIG_SLOB=y
arch/arm/configs/pxa_defconfig:CONFIG_SLOB=y
arch/arm/configs/tct_hammer_defconfig:CONFIG_SLOB=y
arch/arm/configs/xcep_defconfig:CONFIG_SLOB=y
arch/blackfin/configs/DNP5370_defconfig:CONFIG_SLOB=y
arch/h8300/configs/edosk2674_defconfig:CONFIG_SLOB=y
arch/h8300/configs/h8300h-sim_defconfig:CONFIG_SLOB=y
arch/h8300/configs/h8s-sim_defconfig:CONFIG_SLOB=y
arch/openrisc/configs/or1ksim_defconfig:CONFIG_SLOB=y
arch/sh/configs/rsk7201_defconfig:CONFIG_SLOB=y
arch/sh/configs/rsk7203_defconfig:CONFIG_SLOB=y
arch/sh/configs/se7206_defconfig:CONFIG_SLOB=y
arch/sh/configs/shmin_defconfig:CONFIG_SLOB=y
arch/sh/configs/shx3_defconfig:CONFIG_SLOB=y
kernel/configs/tiny.config:CONFIG_SLOB=y
$

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds


* Re: [lkp-robot] [x86/kconfig]  81d3871900: BUG:unable_to_handle_kernel
  2017-10-12 17:05     ` Christopher Lameter
  2017-10-12 17:54       ` Linus Torvalds
  2017-10-12 17:54       ` Linus Torvalds
@ 2017-10-13  4:45       ` Josh Poimboeuf
  2017-10-13 13:56         ` Andrey Ryabinin
  2017-10-13 15:22         ` Christopher Lameter
  2 siblings, 2 replies; 21+ messages in thread
From: Josh Poimboeuf @ 2017-10-13  4:45 UTC (permalink / raw)
  To: Christopher Lameter
  Cc: kernel test robot, Ingo Molnar, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Denys Vlasenko, H. Peter Anvin, Jiri Slaby,
	Linus Torvalds, Mike Galbraith, Peter Zijlstra, Thomas Gleixner,
	LKML, lkp, linux-mm, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton

On Thu, Oct 12, 2017 at 12:05:04PM -0500, Christopher Lameter wrote:
> On Wed, 11 Oct 2017, Josh Poimboeuf wrote:
> 
> > I failed to add the slab maintainers to CC on the last attempt.  Trying
> > again.
> 
> 
> Hmmm... Yea. SLOB is rarely used and tested. Good illustration of a simple
> allocator and the K&R mechanism that was used in the early kernels.
> 
> > > Adding the slub maintainers.  Is slob still supposed to work?
> 
> Have not seen anyone using it in a decade or so.
> 
> Does the same config with SLUB and slub_debug on the commandline run
> cleanly?
> 
> > > I have no idea how that crypto panic could be related to slob, but
> > > at least it goes away when I switch to slub.
> 
> Can you run SLUB with full debug? specify slub_debug on the commandline or
> set CONFIG_SLUB_DEBUG_ON

Oddly enough, with CONFIG_SLUB+slub_debug, I get the same crypto panic I
got with CONFIG_SLOB.  The trapping instruction is:

  vmovdqa 0x140(%rdi),%xmm0
  
I'll try to bisect it tomorrow.  It at least goes back to v4.10.  I'm
not really sure whether this panic is related to SLUB or SLOB at all.
(Though the original panic reported upthread by the kernel test robot
*does* look SLOB related.)

  general protection fault: 0000 [#1] PREEMPT SMP
  Modules linked in:
  CPU: 0 PID: 58 Comm: kworker/0:1 Not tainted 4.13.0 #81
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc26 04/01/2014
  Workqueue: crypto mcryptd_flusher
  task: ffff880139108040 task.stack: ffffc9000082c000
  RIP: 0010:skip_7+0x0/0x67
  RSP: 0018:ffffc9000082fd88 EFLAGS: 00010246
  RAX: ffff88013834172c RBX: 00000000f7654321 RCX: 0000000000000003
  RDX: 0000000000000000 RSI: ffffffff81d254f9 RDI: ffff8801381b1a88
  RBP: ffffc9000082fd90 R08: 0000000000000000 R09: 0000000000000001
  R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff82392260
  R13: ffff88013a7e6500 R14: 00000000fffb80f5 R15: 0000000000000000
  FS:  0000000000000000(0000) GS:ffff88013a600000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f88491ef914 CR3: 0000000001e11000 CR4: 00000000001406f0
  Call Trace:
   sha256_ctx_mgr_flush+0x28/0x30
   sha256_mb_flusher+0x53/0x120
   mcryptd_flusher+0xc4/0xf0
   process_one_work+0x253/0x6b0
   worker_thread+0x4d/0x3b0
   ? preempt_count_sub+0x9b/0x100
   kthread+0x133/0x150
   ? process_one_work+0x6b0/0x6b0
   ? kthread_create_on_node+0x70/0x70
   ret_from_fork+0x2a/0x40
  Code: 89 87 30 01 00 00 c7 87 58 01 00 00 ff ff ff ff 48 83 bf a0 01 00 00 00 75 11 48 89 87 38 01 00 00 c7 87 5c 01 00 00 ff ff ff ff <c5> f9 6f 87 40 01 00 00 c5 f9 6f 8f 50 01 00 00 c4 e2 79 3b d1
  RIP: skip_7+0x0/0x67 RSP: ffffc9000082fd88
  ---[ end trace d89a1613b7d1b8bc ]---
  BUG: sleeping function called from invalid context at ./include/linux/percpu-rwsem.h:33
  in_atomic(): 1, irqs_disabled(): 0, pid: 58, name: kworker/0:1
  INFO: lockdep is turned off.
  Preemption disabled at:
  [<ffffffff81041933>] kernel_fpu_begin+0x13/0x20
  CPU: 0 PID: 58 Comm: kworker/0:1 Tainted: G      D         4.13.0 #81
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc26 04/01/2014
  Workqueue: crypto mcryptd_flusher
  Call Trace:
   dump_stack+0x8e/0xcd
   ___might_sleep+0x185/0x260
   __might_sleep+0x4a/0x80
   exit_signals+0x33/0x2d0
   do_exit+0xb4/0xd80
   ? kthread+0x133/0x150
   rewind_stack_do_exit+0x17/0x20
  note: kworker/0:1[58] exited with preempt_count 1
  tsc: Refined TSC clocksource calibration: 2793.538 MHz
  clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x28446877189, max_idle_ns: 440795280878 ns

-- 
Josh


* Re: [lkp-robot] [x86/kconfig] 81d3871900: BUG:unable_to_handle_kernel
  2017-10-13  4:45       ` Josh Poimboeuf
@ 2017-10-13 13:56         ` Andrey Ryabinin
  2017-10-13 16:19           ` Josh Poimboeuf
  2017-10-13 19:09           ` Linus Torvalds
  2017-10-13 15:22         ` Christopher Lameter
  1 sibling, 2 replies; 21+ messages in thread
From: Andrey Ryabinin @ 2017-10-13 13:56 UTC (permalink / raw)
  To: Josh Poimboeuf, Christopher Lameter
  Cc: kernel test robot, Ingo Molnar, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Denys Vlasenko, H. Peter Anvin, Jiri Slaby,
	Linus Torvalds, Mike Galbraith, Peter Zijlstra, Thomas Gleixner,
	LKML, lkp, linux-mm, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Megha Dey, Herbert Xu, David S. Miller,
	linux-crypto

On 10/13/2017 07:45 AM, Josh Poimboeuf wrote:
> On Thu, Oct 12, 2017 at 12:05:04PM -0500, Christopher Lameter wrote:
>> On Wed, 11 Oct 2017, Josh Poimboeuf wrote:
>>
>>> I failed to add the slab maintainers to CC on the last attempt.  Trying
>>> again.
>>
>>
>> Hmmm... Yea. SLOB is rarely used and tested. Good illustration of a simple
>> allocator and the K&R mechanism that was used in the early kernels.
>>
>>>> Adding the slub maintainers.  Is slob still supposed to work?
>>
>> Have not seen anyone using it in a decade or so.
>>
>> Does the same config with SLUB and slub_debug on the commandline run
>> cleanly?
>>
>>>> I have no idea how that crypto panic could be related to slob, but
>>>> at least it goes away when I switch to slub.
>>
>> Can you run SLUB with full debug? specify slub_debug on the commandline or
>> set CONFIG_SLUB_DEBUG_ON
> 
> Oddly enough, with CONFIG_SLUB+slub_debug, I get the same crypto panic I
> got with CONFIG_SLOB.  The trapping instruction is:
> 
>   vmovdqa 0x140(%rdi),%xmm0


It's an unaligned access. Look at %rdi: vmovdqa requires 16-byte alignment.
Apparently, something fed kmalloc()'ed data here, but kmalloc() guarantees only sizeof(unsigned long)
alignment. slub_debug changes the layout of slub's objects, so what happened to be 16-byte aligned
without slub_debug may become only 8-byte aligned with slub_debug on.
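
To make the arithmetic explicit, here is a small user-space C sketch (not
kernel code; the helper names are made up for illustration). With the RDI
value from the register dump quoted further down (ffff8801381b1a88) and the
0x140 displacement of the trapping instruction, the operand address is
ffff8801381b1bc8. Its low four bits are 0x8, so it is 8-byte aligned but not
16-byte aligned:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if addr is a multiple of align; align must be a power of two. */
static bool is_aligned(uint64_t addr, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr & (align - 1)) == 0;
}

/* Effective address of a "disp(%reg)" memory operand. */
static uint64_t effective_addr(uint64_t base, uint64_t disp)
{
	return base + disp;
}
```

Calling is_aligned(effective_addr(0xffff8801381b1a88ULL, 0x140), 16) is the
check that vmovdqa effectively performs on its memory operand.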

   
> I'll try to bisect it tomorrow.  It at least goes back to v4.10.

Probably no point. I bet this bug has always been here (since this code was added).

This could be fixed by an s/vmovdqa/vmovdqu/ change like the one below, but maybe the right fix
would be to align the data properly?

---
 arch/x86/crypto/sha256-mb/sha256_mb_mgr_flush_avx2.S | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/crypto/sha256-mb/sha256_mb_mgr_flush_avx2.S b/arch/x86/crypto/sha256-mb/sha256_mb_mgr_flush_avx2.S
index 8fe6338bcc84..7fd5d9b568c7 100644
--- a/arch/x86/crypto/sha256-mb/sha256_mb_mgr_flush_avx2.S
+++ b/arch/x86/crypto/sha256-mb/sha256_mb_mgr_flush_avx2.S
@@ -155,8 +155,8 @@ LABEL skip_ %I
 .endr
 
 	# Find min length
-	vmovdqa _lens+0*16(state), %xmm0
-	vmovdqa _lens+1*16(state), %xmm1
+	vmovdqu _lens+0*16(state), %xmm0
+	vmovdqu _lens+1*16(state), %xmm1
 
 	vpminud %xmm1, %xmm0, %xmm2		# xmm2 has {D,C,B,A}
 	vpalignr $8, %xmm2, %xmm3, %xmm3	# xmm3 has {x,x,D,C}
@@ -176,8 +176,8 @@ LABEL skip_ %I
 	vpsubd	%xmm2, %xmm0, %xmm0
 	vpsubd	%xmm2, %xmm1, %xmm1
 
-	vmovdqa	%xmm0, _lens+0*16(state)
-	vmovdqa	%xmm1, _lens+1*16(state)
+	vmovdqu	%xmm0, _lens+0*16(state)
+	vmovdqu	%xmm1, _lens+1*16(state)
 
 	# "state" and "args" are the same address, arg1
 	# len is arg2
-- 
2.13.6



> I'm
> not really sure whether this panic is related to SLUB or SLOB at all.
> (Though the original panic reported upthread by the kernel test robot
> *does* look SLOB related.)
> 
>   general protection fault: 0000 [#1] PREEMPT SMP
>   Modules linked in:
>   CPU: 0 PID: 58 Comm: kworker/0:1 Not tainted 4.13.0 #81
>   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc26 04/01/2014
>   Workqueue: crypto mcryptd_flusher
>   task: ffff880139108040 task.stack: ffffc9000082c000
>   RIP: 0010:skip_7+0x0/0x67
>   RSP: 0018:ffffc9000082fd88 EFLAGS: 00010246
>   RAX: ffff88013834172c RBX: 00000000f7654321 RCX: 0000000000000003
>   RDX: 0000000000000000 RSI: ffffffff81d254f9 RDI: ffff8801381b1a88
>   RBP: ffffc9000082fd90 R08: 0000000000000000 R09: 0000000000000001
>   R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff82392260
>   R13: ffff88013a7e6500 R14: 00000000fffb80f5 R15: 0000000000000000
>   FS:  0000000000000000(0000) GS:ffff88013a600000(0000) knlGS:0000000000000000
>   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>   CR2: 00007f88491ef914 CR3: 0000000001e11000 CR4: 00000000001406f0
>   Call Trace:
>    sha256_ctx_mgr_flush+0x28/0x30
>    sha256_mb_flusher+0x53/0x120
>    mcryptd_flusher+0xc4/0xf0
>    process_one_work+0x253/0x6b0
>    worker_thread+0x4d/0x3b0
>    ? preempt_count_sub+0x9b/0x100
>    kthread+0x133/0x150
>    ? process_one_work+0x6b0/0x6b0
>    ? kthread_create_on_node+0x70/0x70
>    ret_from_fork+0x2a/0x40
>   Code: 89 87 30 01 00 00 c7 87 58 01 00 00 ff ff ff ff 48 83 bf a0 01 00 00 00 75 11 48 89 87 38 01 00 00 c7 87 5c 01 00 00 ff ff ff ff <c5> f9 6f 87 40 01 00 00 c5 f9 6f 8f 50 01 00 00 c4 e2 79 3b d1
>   RIP: skip_7+0x0/0x67 RSP: ffffc9000082fd88
>   ---[ end trace d89a1613b7d1b8bc ]---
>   BUG: sleeping function called from invalid context at ./include/linux/percpu-rwsem.h:33
>   in_atomic(): 1, irqs_disabled(): 0, pid: 58, name: kworker/0:1
>   INFO: lockdep is turned off.
>   Preemption disabled at:
>   [<ffffffff81041933>] kernel_fpu_begin+0x13/0x20
>   CPU: 0 PID: 58 Comm: kworker/0:1 Tainted: G      D         4.13.0 #81
>   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc26 04/01/2014
>   Workqueue: crypto mcryptd_flusher
>   Call Trace:
>    dump_stack+0x8e/0xcd
>    ___might_sleep+0x185/0x260
>    __might_sleep+0x4a/0x80
>    exit_signals+0x33/0x2d0
>    do_exit+0xb4/0xd80
>    ? kthread+0x133/0x150
>    rewind_stack_do_exit+0x17/0x20
>   note: kworker/0:1[58] exited with preempt_count 1
>   tsc: Refined TSC clocksource calibration: 2793.538 MHz
>   clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x28446877189, max_idle_ns: 440795280878 ns
> 

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [lkp-robot] [x86/kconfig]  81d3871900: BUG:unable_to_handle_kernel
  2017-10-13  4:45       ` Josh Poimboeuf
  2017-10-13 13:56         ` Andrey Ryabinin
@ 2017-10-13 15:22         ` Christopher Lameter
  2017-10-13 15:37           ` Josh Poimboeuf
  1 sibling, 1 reply; 21+ messages in thread
From: Christopher Lameter @ 2017-10-13 15:22 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: kernel test robot, Ingo Molnar, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Denys Vlasenko, H. Peter Anvin, Jiri Slaby,
	Linus Torvalds, Mike Galbraith, Peter Zijlstra, Thomas Gleixner,
	LKML, lkp, linux-mm, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton

On Thu, 12 Oct 2017, Josh Poimboeuf wrote:

> > Can you run SLUB with full debug? specify slub_debug on the commandline or
> > set CONFIG_SLUB_DEBUG_ON
>
> Oddly enough, with CONFIG_SLUB+slub_debug, I get the same crypto panic I
> got with CONFIG_SLOB.  The trapping instruction is:
>
>   vmovdqa 0x140(%rdi),%xmm0
>
> I'll try to bisect it tomorrow.  It at least goes back to v4.10.  I'm
> not really sure whether this panic is related to SLUB or SLOB at all.

Guess not. The slab allocators can fail if the metadata gets corrupted.
That is why we have extensive debug modes so we can find who is to blame
for corruptions.

> (Though the original panic reported upthread by the kernel test robot
> *does* look SLOB related.)

Yup. Just happened to be configured for SLOB then.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [lkp-robot] [x86/kconfig]  81d3871900: BUG:unable_to_handle_kernel
  2017-10-13 15:22         ` Christopher Lameter
@ 2017-10-13 15:37           ` Josh Poimboeuf
  0 siblings, 0 replies; 21+ messages in thread
From: Josh Poimboeuf @ 2017-10-13 15:37 UTC (permalink / raw)
  To: Christopher Lameter
  Cc: kernel test robot, Ingo Molnar, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Denys Vlasenko, H. Peter Anvin, Jiri Slaby,
	Linus Torvalds, Mike Galbraith, Peter Zijlstra, Thomas Gleixner,
	LKML, lkp, linux-mm, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton

On Fri, Oct 13, 2017 at 10:22:54AM -0500, Christopher Lameter wrote:
> On Thu, 12 Oct 2017, Josh Poimboeuf wrote:
> 
> > > Can you run SLUB with full debug? specify slub_debug on the commandline or
> > > set CONFIG_SLUB_DEBUG_ON
> >
> > Oddly enough, with CONFIG_SLUB+slub_debug, I get the same crypto panic I
> > got with CONFIG_SLOB.  The trapping instruction is:
> >
> >   vmovdqa 0x140(%rdi),%xmm0
> >
> > I'll try to bisect it tomorrow.  It at least goes back to v4.10.  I'm
> > not really sure whether this panic is related to SLUB or SLOB at all.
> 
> Guess not. The slab allocators can fail if the metadata gets corrupted.
> That is why we have extensive debug modes so we can find who is to blame
> for corruptions.
> 
> > (Though the original panic reported upthread by the kernel test robot
> > *does* look SLOB related.)
> 
> Yup. Just happened to be configured for SLOB then.

Just to clarify, the upthread panic in SLOB is *not* related to the
crypto issue.  So somebody still needs to look at that one:

  https://lkml.kernel.org/r/20171010121513.GC5445@yexl-desktop

-- 
Josh

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [lkp-robot] [x86/kconfig] 81d3871900: BUG:unable_to_handle_kernel
  2017-10-13 13:56         ` Andrey Ryabinin
@ 2017-10-13 16:19           ` Josh Poimboeuf
  2017-10-13 19:09           ` Linus Torvalds
  1 sibling, 0 replies; 21+ messages in thread
From: Josh Poimboeuf @ 2017-10-13 16:19 UTC (permalink / raw)
  To: Andrey Ryabinin
  Cc: Christopher Lameter, kernel test robot, Ingo Molnar,
	Andy Lutomirski, Borislav Petkov, Brian Gerst, Denys Vlasenko,
	H. Peter Anvin, Jiri Slaby, Linus Torvalds, Mike Galbraith,
	Peter Zijlstra, Thomas Gleixner, LKML, lkp, linux-mm,
	Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton,
	Megha Dey, Herbert Xu, David S. Miller, linux-crypto

On Fri, Oct 13, 2017 at 04:56:43PM +0300, Andrey Ryabinin wrote:
> On 10/13/2017 07:45 AM, Josh Poimboeuf wrote:
> > On Thu, Oct 12, 2017 at 12:05:04PM -0500, Christopher Lameter wrote:
> >> On Wed, 11 Oct 2017, Josh Poimboeuf wrote:
> >>
> >>> I failed to add the slab maintainers to CC on the last attempt.  Trying
> >>> again.
> >>
> >>
> >> Hmmm... Yea. SLOB is rarely used and tested. Good illustration of a simple
> >> allocator and the K&R mechanism that was used in the early kernels.
> >>
> >>>> Adding the slub maintainers.  Is slob still supposed to work?
> >>
> >> Have not seen anyone using it in a decade or so.
> >>
> >> Does the same config with SLUB and slub_debug on the commandline run
> >> cleanly?
> >>
> >>>> I have no idea how that crypto panic could be related to slob, but
> >>>> at least it goes away when I switch to slub.
> >>
> >> Can you run SLUB with full debug? specify slub_debug on the commandline or
> >> set CONFIG_SLUB_DEBUG_ON
> > 
> > Oddly enough, with CONFIG_SLUB+slub_debug, I get the same crypto panic I
> > got with CONFIG_SLOB.  The trapping instruction is:
> > 
> >   vmovdqa 0x140(%rdi),%xmm0
> 
> 
> It's an unaligned access. Look at %rdi: vmovdqa requires 16-byte alignment.
> Apparently, something fed kmalloc()'ed data here, but kmalloc() guarantees only sizeof(unsigned long)
> alignment. slub_debug changes the layout of slub's objects, so what happened to be 16-byte aligned
> without slub_debug may become only 8-byte aligned with slub_debug on.
> 
>    
> > I'll try to bisect it tomorrow.  It at least goes back to v4.10.
> 
> Probably no point. I bet this bug has always been here (since this code was added).
> 
> This could be fixed by an s/vmovdqa/vmovdqu/ change like the one below, but maybe the right fix
> would be to align the data properly?
> 
> ---
>  arch/x86/crypto/sha256-mb/sha256_mb_mgr_flush_avx2.S | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/crypto/sha256-mb/sha256_mb_mgr_flush_avx2.S b/arch/x86/crypto/sha256-mb/sha256_mb_mgr_flush_avx2.S
> index 8fe6338bcc84..7fd5d9b568c7 100644
> --- a/arch/x86/crypto/sha256-mb/sha256_mb_mgr_flush_avx2.S
> +++ b/arch/x86/crypto/sha256-mb/sha256_mb_mgr_flush_avx2.S
> @@ -155,8 +155,8 @@ LABEL skip_ %I
>  .endr
>  
>  	# Find min length
> -	vmovdqa _lens+0*16(state), %xmm0
> -	vmovdqa _lens+1*16(state), %xmm1
> +	vmovdqu _lens+0*16(state), %xmm0
> +	vmovdqu _lens+1*16(state), %xmm1
>  
>  	vpminud %xmm1, %xmm0, %xmm2		# xmm2 has {D,C,B,A}
>  	vpalignr $8, %xmm2, %xmm3, %xmm3	# xmm3 has {x,x,D,C}
> @@ -176,8 +176,8 @@ LABEL skip_ %I
>  	vpsubd	%xmm2, %xmm0, %xmm0
>  	vpsubd	%xmm2, %xmm1, %xmm1
>  
> -	vmovdqa	%xmm0, _lens+0*16(state)
> -	vmovdqa	%xmm1, _lens+1*16(state)
> +	vmovdqu	%xmm0, _lens+0*16(state)
> +	vmovdqu	%xmm1, _lens+1*16(state)
>  
>  	# "state" and "args" are the same address, arg1
>  	# len is arg2
> -- 
> 2.13.6

Makes sense.  I can confirm that the above patch fixes the panic.

-- 
Josh

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [lkp-robot] [x86/kconfig] 81d3871900: BUG:unable_to_handle_kernel
  2017-10-13 13:56         ` Andrey Ryabinin
  2017-10-13 16:19           ` Josh Poimboeuf
@ 2017-10-13 19:09           ` Linus Torvalds
  2017-10-13 20:01             ` Andy Lutomirski
  2017-10-13 20:17             ` Jeffrey Walton
  1 sibling, 2 replies; 21+ messages in thread
From: Linus Torvalds @ 2017-10-13 19:09 UTC (permalink / raw)
  To: Andrey Ryabinin
  Cc: Josh Poimboeuf, Christopher Lameter, kernel test robot,
	Ingo Molnar, Andy Lutomirski, Borislav Petkov, Brian Gerst,
	Denys Vlasenko, H. Peter Anvin, Jiri Slaby, Mike Galbraith,
	Peter Zijlstra, Thomas Gleixner, LKML, LKP, linux-mm,
	Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton,
	Megha Dey, Herbert Xu, David S. Miller, Linux Crypto Mailing List

On Fri, Oct 13, 2017 at 6:56 AM, Andrey Ryabinin
<aryabinin@virtuozzo.com> wrote:
>
> This could be fixed by an s/vmovdqa/vmovdqu/ change like the one below, but maybe the right fix
> would be to align the data properly?

I suspect anything that has the SHA extensions should also do
unaligned loads efficiently. The whole "aligned only" model is broken.
It's just doing two loads from the state pointer, there's likely no
point in trying to align it.

So your patch looks fine, but maybe somebody could add the required
alignment to the sha256 context allocation (which I don't know where
it is).
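
For reference, "adding the required alignment" to an allocation usually
means over-allocating and rounding the pointer up. The following is only a
user-space sketch of that idea with malloc(); the names are hypothetical and
it is not the sha256-mb allocation code (the kernel has PTR_ALIGN() for the
rounding step):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define CTX_ALIGN 16	/* alignment vmovdqa needs for a 128-bit operand */

/* Round p up to the next multiple of align (a power of two). */
static void *ptr_align_up(void *p, uintptr_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (void *)(((uintptr_t)p + align - 1) & ~(align - 1));
}

/*
 * Allocate size bytes aligned to CTX_ALIGN. *raw receives the pointer
 * that must later be passed to free(); the return value is the aligned
 * pointer to use, or NULL on allocation failure.
 */
static void *alloc_aligned_ctx(size_t size, void **raw)
{
	*raw = malloc(size + CTX_ALIGN - 1);
	if (!*raw)
		return NULL;
	return ptr_align_up(*raw, CTX_ALIGN);
}
```

The cost is up to CTX_ALIGN - 1 wasted bytes per allocation plus keeping the
raw pointer around, which is part of why just using unaligned loads is
attractive here.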

But yeah, that other SLOB panic looks unrelated to this.

                   Linus

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [lkp-robot] [x86/kconfig] 81d3871900: BUG:unable_to_handle_kernel
  2017-10-13 19:09           ` Linus Torvalds
@ 2017-10-13 20:01             ` Andy Lutomirski
  2017-10-13 20:17             ` Jeffrey Walton
  1 sibling, 0 replies; 21+ messages in thread
From: Andy Lutomirski @ 2017-10-13 20:01 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrey Ryabinin, Josh Poimboeuf, Christopher Lameter,
	kernel test robot, Ingo Molnar, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Denys Vlasenko, H. Peter Anvin, Jiri Slaby,
	Mike Galbraith, Peter Zijlstra, Thomas Gleixner, LKML, LKP,
	linux-mm, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Megha Dey, Herbert Xu, David S. Miller,
	Linux Crypto Mailing List

On Fri, Oct 13, 2017 at 12:09 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Fri, Oct 13, 2017 at 6:56 AM, Andrey Ryabinin
> <aryabinin@virtuozzo.com> wrote:
>>
>> This could be fixed by an s/vmovdqa/vmovdqu/ change like the one below, but maybe the right fix
>> would be to align the data properly?
>
> I suspect anything that has the SHA extensions should also do
> unaligned loads efficiently. The whole "aligned only" model is broken.
> It's just doing two loads from the state pointer, there's likely no
> point in trying to align it.
>
> So your patch looks fine, but maybe somebody could add the required
> alignment to the sha256 context allocation (which I don't know where
> it is).

IIRC if we try the latter, then we'll risk hitting the #*!&@% gcc bug
that mostly prevents 16-byte alignment from working on GCC before 4.8
or so.  That way lies debugging disasters.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [lkp-robot] [x86/kconfig] 81d3871900: BUG:unable_to_handle_kernel
  2017-10-13 19:09           ` Linus Torvalds
  2017-10-13 20:01             ` Andy Lutomirski
@ 2017-10-13 20:17             ` Jeffrey Walton
  1 sibling, 0 replies; 21+ messages in thread
From: Jeffrey Walton @ 2017-10-13 20:17 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrey Ryabinin, Josh Poimboeuf, Christopher Lameter,
	kernel test robot, Ingo Molnar, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Denys Vlasenko, H. Peter Anvin, Jiri Slaby,
	Mike Galbraith, Peter Zijlstra, Thomas Gleixner, LKML, LKP,
	linux-mm, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Megha Dey, Herbert Xu, David S. Miller,
	Linux Crypto Mailing List

On Fri, Oct 13, 2017 at 3:09 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Fri, Oct 13, 2017 at 6:56 AM, Andrey Ryabinin
> <aryabinin@virtuozzo.com> wrote:
>>
>> This could be fixed by an s/vmovdqa/vmovdqu/ change like the one below, but maybe the right fix
>> would be to align the data properly?
>
> I suspect anything that has the SHA extensions should also do
> unaligned loads efficiently. The whole "aligned only" model is broken.
> It's just doing two loads from the state pointer, there's likely no
> point in trying to align it.

+1, good engineering.

AVX2 requires 32-byte buffer alignment in some places. It is trickier
than this use case because __BIGGEST_ALIGNMENT__ doubled, but a lot of
code still assumes 16-bytes.

Jeff

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [lkp-robot] [x86/kconfig]  81d3871900: BUG:unable_to_handle_kernel
  2017-10-11 17:01   ` Josh Poimboeuf
  2017-10-12 17:05     ` Christopher Lameter
@ 2017-10-17  7:33     ` Joonsoo Kim
  2017-10-17  7:50       ` Thomas Gleixner
  2017-10-18 10:40       ` Linus Torvalds
  1 sibling, 2 replies; 21+ messages in thread
From: Joonsoo Kim @ 2017-10-17  7:33 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: kernel test robot, Ingo Molnar, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Denys Vlasenko, H. Peter Anvin, Jiri Slaby,
	Linus Torvalds, Mike Galbraith, Peter Zijlstra, Thomas Gleixner,
	LKML, lkp, linux-mm, Pekka Enberg, David Rientjes, Andrew Morton,
	Christoph Lameter

On Wed, Oct 11, 2017 at 12:01:20PM -0500, Josh Poimboeuf wrote:
> I failed to add the slab maintainers to CC on the last attempt.  Trying
> again.
> 
> On Tue, Oct 10, 2017 at 09:31:06PM -0500, Josh Poimboeuf wrote:
> > On Tue, Oct 10, 2017 at 08:15:13PM +0800, kernel test robot wrote:
> > > 
> > > FYI, we noticed the following commit (built with gcc-4.8):
> > > 
> > > commit: 81d387190039c14edac8de2b3ec789beb899afd9 ("x86/kconfig: Consolidate unwinders into multiple choice selection")
> > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> > > 
> > > in testcase: boot
> > > 
> > > on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -m 512M
> > > 
> > > caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
> > > 
> > > 
> > > +------------------------------------------+------------+------------+
> > > |                                          | a34a766ff9 | 81d3871900 |
> > > +------------------------------------------+------------+------------+
> > > | boot_successes                           | 24         | 5          |
> > > | boot_failures                            | 12         | 31         |
> > > | BUG:kernel_hang_in_test_stage            | 12         | 1          |
> > > | BUG:unable_to_handle_kernel              | 0          | 30         |
> > > | Oops:#[##]                               | 0          | 30         |
> > > | Kernel_panic-not_syncing:Fatal_exception | 0          | 30         |
> > > +------------------------------------------+------------+------------+
> > > 
> > > 
> > > 
> > > [    5.324797] BUG: unable to handle kernel paging request at ffff88001c4b0000
> > > [    5.326126] IP: slob_free+0x2bf/0x3d7
> > > [    5.328023] PGD 17d9c067 
> > > [    5.328023] P4D 17d9c067 
> > > [    5.328023] PUD 17d9d067 
> > > [    5.328023] PMD 1f91e067 
> > > [    5.328023] PTE 800000001c4b0060
> > > [    5.328023] 
> > > [    5.328023] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> > > [    5.328023] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.13.0-rc1-00044-g81d3871 #1
> > > [    5.328023] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
> > > [    5.328023] task: ffff8800002fa000 task.stack: ffffc900000d0000
> > > [    5.328023] RIP: 0010:slob_free+0x2bf/0x3d7
> > > [    5.328023] RSP: 0000:ffffc900000d3d58 EFLAGS: 00010002
> > > [    5.328023] RAX: 0000000000000027 RBX: ffff88001c4affb0 RCX: 0000000000000000
> > > [    5.328023] RDX: ffff88001c4af000 RSI: 0000000000000000 RDI: ffff88001c4afffe
> > > [    5.328023] RBP: ffff88001c4afffe R08: 0000000000000001 R09: 0000000000000000
> > > [    5.328023] R10: ffffea000069a420 R11: ffff88001ffdb000 R12: ffff88001c4aff5c
> > > [    5.328023] R13: 0000000000000027 R14: 0000000000000027 R15: 0000000000000027
> > > [    5.328023] FS:  0000000000000000(0000) GS:ffff88001f600000(0000) knlGS:0000000000000000
> > > [    5.328023] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [    5.328023] CR2: ffff88001c4b0000 CR3: 0000000016211000 CR4: 00000000000406b0
> > > [    5.328023] Call Trace:
> > > [    5.328023]  ? link_target+0xb2/0xc7
> > > [    5.328023]  kfree+0x158/0x1b6
> > > [    5.328023]  link_target+0xb2/0xc7
> > > [    5.328023]  new_node+0x32b/0x4d1
> > > [    5.328023]  gcov_event+0x33e/0x546
> > > [    5.328023]  ? gcov_persist_setup+0xbb/0xbb
> > > [    5.328023]  gcov_enable_events+0x3c/0x89
> > > [    5.328023]  gcov_fs_init+0x134/0x191
> > > [    5.328023]  do_one_initcall+0x10e/0x2df
> > > [    5.328023]  kernel_init_freeable+0x3ec/0x559
> > > [    5.328023]  ? rest_init+0x145/0x145
> > > [    5.328023]  kernel_init+0xc/0x1a8
> > > [    5.328023]  ret_from_fork+0x2a/0x40
> > > [    5.328023] Code: e8 8d f7 ff ff 48 ff 05 c9 8c 91 02 85 c0 75 51 49 0f bf c5 48 ff 05 c2 8c 91 02 48 8d 3c 43 48 39 ef 75 3d 48 ff 05 ba 8c 91 02 <8b> 6d 00 66 85 ed 7e 09 48 ff 05 b3 8c 91 02 eb 05 bd 01 00 00 
> > > [    5.328023] RIP: slob_free+0x2bf/0x3d7 RSP: ffffc900000d3d58
> > > [    5.328023] CR2: ffff88001c4b0000
> > > [    5.328023] ---[ end trace f8ee1579929b04f0 ]---
> > 
> > Adding the slub maintainers.  Is slob still supposed to work?
> > 
> > The bisection is blaming the ORC unwinder, but I'm having trouble
> > finding anything ORC specific about it.  I wonder if the disabling of
> > frame pointers changed the code generation enough to trigger this bug
> > somehow.
> > 
> > Looking at the panic, the code in slob_free() was:
> > 
> >    0:	e8 8d f7 ff ff       	callq  0xfffffffffffff792
> >    5:	48 ff 05 c9 8c 91 02 	incq   0x2918cc9(%rip)        # 0x2918cd5
> >    c:	85 c0                	test   %eax,%eax
> >    e:	75 51                	jne    0x61
> >   10:	49 0f bf c5          	movswq %r13w,%rax
> >   14:	48 ff 05 c2 8c 91 02 	incq   0x2918cc2(%rip)        # 0x2918cdd
> >   1b:	48 8d 3c 43          	lea    (%rbx,%rax,2),%rdi
> >   1f:	48 39 ef             	cmp    %rbp,%rdi
> >   22:	75 3d                	jne    0x61
> >   24:	48 ff 05 ba 8c 91 02 	incq   0x2918cba(%rip)        # 0x2918ce5
> >   2b:*	8b 6d 00             	mov    0x0(%rbp),%ebp		<-- trapping instruction
> >   2e:	66 85 ed             	test   %bp,%bp
> >   31:	7e 09                	jle    0x3c
> >   33:	48 ff 05 b3 8c 91 02 	incq   0x2918cb3(%rip)        # 0x2918ced
> >   3a:	eb 05                	jmp    0x41
> >   3c:	bd                   	.byte 0xbd
> >   3d:	01 00                	add    %eax,(%rax)
> > 
> > The slob_free() code tried to read four bytes at ffff88001c4afffe, and
> > ended up reading past the page into a bad area.  I think the bad address
> > (ffff88001c4afffe) was returned from slob_next() and it panicked trying
> > to read s->units in slob_units().

Hello,

It looks like a compiler bug. The code of slob_units() tries to read two
bytes at ffff88001c4afffe, which is valid. But the compiler generates
wrong code that tries to read four bytes.

static slobidx_t slob_units(slob_t *s) 
{
  if (s->units > 0)
    return s->units;
  return 1;
}

s->units is defined as two bytes in this setup.

Wrongly generated code for this part.

'mov 0x0(%rbp), %ebp'

%ebp is four bytes.

I guess that this wrong four-byte read crosses the valid memory
boundary, and that is how this issue happened.
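
The boundary arithmetic can be checked with a small user-space C sketch
(the helper names are made up, and 4 KiB pages are assumed). A two-byte read
at ffff88001c4afffe ends at ffff88001c4affff, still inside the page, while a
four-byte read ends at ffff88001c4b0001, in the next page, whose first byte
is the CR2 address from the oops (ffff88001c4b0000):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL	/* assumption: 4 KiB pages, as on x86-64 */

/* Address of the last byte touched by a len-byte access starting at addr. */
static uint64_t last_byte(uint64_t addr, uint64_t len)
{
	assert(len > 0);
	return addr + len - 1;
}

/* True if the access [addr, addr + len) spans more than one page. */
static bool crosses_page(uint64_t addr, uint64_t len)
{
	return (addr / SKETCH_PAGE_SIZE) !=
	       (last_byte(addr, len) / SKETCH_PAGE_SIZE);
}
```

With DEBUG_PAGEALLOC the following page can be unmapped, so only the wider
read faults.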

Proper code (a two-byte read) is generated if a different version of gcc
is used.

If someone knows the relevant compiler people, please Cc them.

Thanks.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [lkp-robot] [x86/kconfig] 81d3871900: BUG:unable_to_handle_kernel
  2017-10-17  7:33     ` Joonsoo Kim
@ 2017-10-17  7:50       ` Thomas Gleixner
  2017-10-18  7:31         ` Joonsoo Kim
  2017-10-18 10:40       ` Linus Torvalds
  1 sibling, 1 reply; 21+ messages in thread
From: Thomas Gleixner @ 2017-10-17  7:50 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Josh Poimboeuf, kernel test robot, Ingo Molnar, Andy Lutomirski,
	Borislav Petkov, Brian Gerst, Denys Vlasenko, H. Peter Anvin,
	Jiri Slaby, Linus Torvalds, Mike Galbraith, Peter Zijlstra, LKML,
	lkp, linux-mm, Pekka Enberg, David Rientjes, Andrew Morton,
	Christoph Lameter

On Tue, 17 Oct 2017, Joonsoo Kim wrote:
> On Wed, Oct 11, 2017 at 12:01:20PM -0500, Josh Poimboeuf wrote:
> > > Looking at the panic, the code in slob_free() was:
> > > 
> > >    0:	e8 8d f7 ff ff       	callq  0xfffffffffffff792
> > >    5:	48 ff 05 c9 8c 91 02 	incq   0x2918cc9(%rip)        # 0x2918cd5
> > >    c:	85 c0                	test   %eax,%eax
> > >    e:	75 51                	jne    0x61
> > >   10:	49 0f bf c5          	movswq %r13w,%rax
> > >   14:	48 ff 05 c2 8c 91 02 	incq   0x2918cc2(%rip)        # 0x2918cdd
> > >   1b:	48 8d 3c 43          	lea    (%rbx,%rax,2),%rdi
> > >   1f:	48 39 ef             	cmp    %rbp,%rdi
> > >   22:	75 3d                	jne    0x61
> > >   24:	48 ff 05 ba 8c 91 02 	incq   0x2918cba(%rip)        # 0x2918ce5
> > >   2b:*	8b 6d 00             	mov    0x0(%rbp),%ebp		<-- trapping instruction
> > >   2e:	66 85 ed             	test   %bp,%bp
> > >   31:	7e 09                	jle    0x3c
> > >   33:	48 ff 05 b3 8c 91 02 	incq   0x2918cb3(%rip)        # 0x2918ced
> > >   3a:	eb 05                	jmp    0x41
> > >   3c:	bd                   	.byte 0xbd
> > >   3d:	01 00                	add    %eax,(%rax)
> > > 
> > > The slob_free() code tried to read four bytes at ffff88001c4afffe, and
> > > ended up reading past the page into a bad area.  I think the bad address
> > > (ffff88001c4afffe) was returned from slob_next() and it panicked trying
> > > to read s->units in slob_units().
> 
> Hello,
> 
> It looks like a compiler bug. The code of slob_units() tries to read two
> bytes at ffff88001c4afffe, which is valid. But the compiler generates
> wrong code that tries to read four bytes.
> 
> static slobidx_t slob_units(slob_t *s) 
> {
>   if (s->units > 0)
>     return s->units;
>   return 1;
> }
> 
> s->units is defined as two bytes in this setup.
> 
> Wrongly generated code for this part.
> 
> 'mov 0x0(%rbp), %ebp'
> 
> %ebp is four bytes.
> 
> I guess that this wrong four-byte read crosses the valid memory
> boundary, and that is how this issue happened.
> 
> Proper code (a two-byte read) is generated if a different version of gcc
> is used.

Which version fails to generate proper code and which versions work?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [lkp-robot] [x86/kconfig] 81d3871900: BUG:unable_to_handle_kernel
  2017-10-17  7:50       ` Thomas Gleixner
@ 2017-10-18  7:31         ` Joonsoo Kim
  0 siblings, 0 replies; 21+ messages in thread
From: Joonsoo Kim @ 2017-10-18  7:31 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Josh Poimboeuf, kernel test robot, Ingo Molnar, Andy Lutomirski,
	Borislav Petkov, Brian Gerst, Denys Vlasenko, H. Peter Anvin,
	Jiri Slaby, Linus Torvalds, Mike Galbraith, Peter Zijlstra, LKML,
	lkp, linux-mm, Pekka Enberg, David Rientjes, Andrew Morton,
	Christoph Lameter

On Tue, Oct 17, 2017 at 09:50:04AM +0200, Thomas Gleixner wrote:
> On Tue, 17 Oct 2017, Joonsoo Kim wrote:
> > On Wed, Oct 11, 2017 at 12:01:20PM -0500, Josh Poimboeuf wrote:
> > > > Looking at the panic, the code in slob_free() was:
> > > > 
> > > >    0:	e8 8d f7 ff ff       	callq  0xfffffffffffff792
> > > >    5:	48 ff 05 c9 8c 91 02 	incq   0x2918cc9(%rip)        # 0x2918cd5
> > > >    c:	85 c0                	test   %eax,%eax
> > > >    e:	75 51                	jne    0x61
> > > >   10:	49 0f bf c5          	movswq %r13w,%rax
> > > >   14:	48 ff 05 c2 8c 91 02 	incq   0x2918cc2(%rip)        # 0x2918cdd
> > > >   1b:	48 8d 3c 43          	lea    (%rbx,%rax,2),%rdi
> > > >   1f:	48 39 ef             	cmp    %rbp,%rdi
> > > >   22:	75 3d                	jne    0x61
> > > >   24:	48 ff 05 ba 8c 91 02 	incq   0x2918cba(%rip)        # 0x2918ce5
> > > >   2b:*	8b 6d 00             	mov    0x0(%rbp),%ebp		<-- trapping instruction
> > > >   2e:	66 85 ed             	test   %bp,%bp
> > > >   31:	7e 09                	jle    0x3c
> > > >   33:	48 ff 05 b3 8c 91 02 	incq   0x2918cb3(%rip)        # 0x2918ced
> > > >   3a:	eb 05                	jmp    0x41
> > > >   3c:	bd                   	.byte 0xbd
> > > >   3d:	01 00                	add    %eax,(%rax)
> > > > 
> > > > The slob_free() code tried to read four bytes at ffff88001c4afffe, and
> > > > ended up reading past the page into a bad area.  I think the bad address
> > > > (ffff88001c4afffe) was returned from slob_next() and it panicked trying
> > > > to read s->units in slob_units().
> > 
> > Hello,
> > 
> > It looks like a compiler bug. The code of slob_units() tries to read two
> > bytes at ffff88001c4afffe, which is valid. But the compiler generates
> > wrong code that tries to read four bytes.
> > 
> > static slobidx_t slob_units(slob_t *s) 
> > {
> >   if (s->units > 0)
> >     return s->units;
> >   return 1;
> > }
> > 
> > s->units is defined as two bytes in this setup.
> > 
> > Wrongly generated code for this part.
> > 
> > 'mov 0x0(%rbp), %ebp'
> > 
> > %ebp is four bytes.
> > 
> > I guess that this wrong four-byte read crosses the valid memory
> > boundary, and that is how this issue happened.
> > 
> > Proper code (a two-byte read) is generated if a different version of gcc
> > is used.
> 
> Which version fails to generate proper code and which versions work?
> 

gcc 4.8 and 4.9 fail to generate proper code. gcc 5.1 and
the latest version work fine.

I guess that this problem is related to a corner case of some
optimization, since a minor code change makes the result
different. Also, with -O2, proper code is generated even with
gcc 4.8.

Thanks.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [lkp-robot] [x86/kconfig] 81d3871900: BUG:unable_to_handle_kernel
  2017-10-17  7:33     ` Joonsoo Kim
  2017-10-17  7:50       ` Thomas Gleixner
@ 2017-10-18 10:40       ` Linus Torvalds
  2017-10-18 13:15         ` Thomas Gleixner
  1 sibling, 1 reply; 21+ messages in thread
From: Linus Torvalds @ 2017-10-18 10:40 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Josh Poimboeuf, kernel test robot, Ingo Molnar, Andy Lutomirski,
	Borislav Petkov, Brian Gerst, Denys Vlasenko, H. Peter Anvin,
	Jiri Slaby, Mike Galbraith, Peter Zijlstra, Thomas Gleixner, LKML,
	LKP, linux-mm, Pekka Enberg, David Rientjes, Andrew Morton,
	Christoph Lameter

On Tue, Oct 17, 2017 at 3:33 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
>
> It looks like a compiler bug. The code of slob_units() tries to read two
> bytes at ffff88001c4afffe, which is valid. But the compiler generates
> wrong code that tries to read four bytes.
>
> static slobidx_t slob_units(slob_t *s)
> {
>   if (s->units > 0)
>     return s->units;
>   return 1;
> }
>
> s->units is defined as two bytes in this setup.
>
> Wrongly generated code for this part.
>
> 'mov 0x0(%rbp), %ebp'
>
> %ebp is four bytes.
>
> I guess that this wrong four-byte read crosses the valid memory
> boundary, and that is how this issue happened.

Hmm. I can see why the compiler would do that (16-bit accesses are
slow), but it's definitely wrong.
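
The boundary arithmetic behind that can be checked in a few lines. This is
a minimal sketch assuming 4 KiB pages; the helper names and PAGE_SIZE_BYTES
are illustrative, not kernel code. A two-byte read at ffff88001c4afffe ends
exactly at the page boundary, while a four-byte read spills two bytes into
the page at ffff88001c4b0000, which is the address in the reported paging
request:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as on the x86-64 test machine. */
#define PAGE_SIZE_BYTES 4096u

/* True if reading len bytes starting at addr touches the following page. */
static bool access_crosses_page(uint64_t addr, size_t len)
{
	uint64_t offset = addr & (PAGE_SIZE_BYTES - 1);

	return offset + len > PAGE_SIZE_BYTES;
}

/* First address of the page after the one containing addr. */
static uint64_t next_page_start(uint64_t addr)
{
	return (addr | (PAGE_SIZE_BYTES - 1)) + 1;
}
```

With addr = 0xffff88001c4afffe, access_crosses_page() is false for len 2
and true for len 4, and next_page_start() gives 0xffff88001c4b0000.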

Does it work ok if that slob_units() code is written as

  static slobidx_t slob_units(slob_t *s)
  {
     int units = READ_ONCE(s->units);

     if (units > 0)
         return units;
     return 1;
  }

which might be an acceptable workaround for now?
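
Outside the kernel tree, the same idea can be tried with a volatile access.
A rough sketch, assuming a 16-bit slobidx_t as in this setup;
READ_ONCE_SKETCH is a hypothetical stand-in for the kernel's READ_ONCE(),
the struct layout is simplified, and __typeof__ is a GNU extension:

```c
#include <stdint.h>

/* Assumption: 16-bit units, matching "two bytes in this setup". */
typedef int16_t slobidx_t;

struct slob_block {
	slobidx_t units;
};
typedef struct slob_block slob_t;

/*
 * Hypothetical userspace stand-in for READ_ONCE(): a volatile access
 * through a pointer of the field's own type. In practice gcc emits a
 * load of exactly that width for volatile accesses.
 */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))

static slobidx_t slob_units(slob_t *s)
{
	int units = READ_ONCE_SKETCH(s->units);

	if (units > 0)
		return (slobidx_t)units;
	return 1;
}
```

Whether the generated load is really two bytes still has to be confirmed
in the disassembly for the compiler in question.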

                   Linus


* Re: [lkp-robot] [x86/kconfig] 81d3871900: BUG:unable_to_handle_kernel
  2017-10-18 10:40       ` Linus Torvalds
@ 2017-10-18 13:15         ` Thomas Gleixner
  2017-10-19  2:14           ` Joonsoo Kim
  0 siblings, 1 reply; 21+ messages in thread
From: Thomas Gleixner @ 2017-10-18 13:15 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Joonsoo Kim, Josh Poimboeuf, kernel test robot, Ingo Molnar,
	Andy Lutomirski, Borislav Petkov, Brian Gerst, Denys Vlasenko,
	H. Peter Anvin, Jiri Slaby, Mike Galbraith, Peter Zijlstra, LKML,
	LKP, linux-mm, Pekka Enberg, David Rientjes, Andrew Morton,
	Christoph Lameter

On Wed, 18 Oct 2017, Linus Torvalds wrote:
> On Tue, Oct 17, 2017 at 3:33 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> >
> > It looks like a compiler bug. The code of slob_units() tries to read two
> > bytes at ffff88001c4afffe, which is valid. But the compiler generates
> > wrong code that tries to read four bytes.
> >
> > static slobidx_t slob_units(slob_t *s)
> > {
> >   if (s->units > 0)
> >     return s->units;
> >   return 1;
> > }
> >
> > s->units is defined as two bytes in this setup.
> >
> > Wrongly generated code for this part.
> >
> > 'mov 0x0(%rbp), %ebp'
> >
> > %ebp is four bytes.
> >
> > I guess that this wrong four-byte read crosses the valid memory
> > boundary, and that is how this issue happened.
> 
> Hmm. I can see why the compiler would do that (16-bit accesses are
> slow), but it's definitely wrong.
> 
> Does it work ok if that slob_units() code is written as
> 
>   static slobidx_t slob_units(slob_t *s)
>   {
>      int units = READ_ONCE(s->units);
> 
>      if (units > 0)
>          return units;
>      return 1;
>   }
> 
> which might be an acceptable workaround for now?

Discussed exactly that with Peter Zijlstra yesterday, but we came to the
conclusion that this is a whack-a-mole game. It might fix this slob issue,
but what guarantees that we don't have the same problem in some other
place? Just duct-taping this particular instance makes me nervous.

Joonsoo says:

> gcc 4.8 and 4.9 fail to generate proper code. gcc 5.1 and
> the latest version work fine.

> I guess that this problem is related to a corner case of some
> optimization feature, since a minor code change makes the result
> different. Also, with -O2, proper code is generated even if gcc 4.8 is
> used.

So it would be useful to figure out which optimization bit is causing that
and blacklist it for the affected compiler versions.
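
Whatever mechanism ends up being used, the version test itself is simple.
A sketch based only on the versions reported above (4.8 and 4.9 bad, 5.1
and the latest fine); the function and macro names are made up for
illustration, and other gcc versions are untested in this thread:

```c
/*
 * Versions reported in this thread: gcc 4.8 and 4.9 generate the
 * four-byte load, gcc 5.1 and the latest do not.
 */
static int gcc_reported_bad(int major, int minor)
{
	return major == 4 && (minor == 8 || minor == 9);
}

/* Compile-time form of the same check for the compiler doing the build. */
#if defined(__GNUC__) && !defined(__clang__)
#define BUILD_GCC_REPORTED_BAD \
	(__GNUC__ == 4 && (__GNUC_MINOR__ == 8 || __GNUC_MINOR__ == 9))
#else
#define BUILD_GCC_REPORTED_BAD 0
#endif
```

This only identifies the compiler; it does not say which optimization
is responsible.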

Thanks,

	tglx


* Re: [lkp-robot] [x86/kconfig] 81d3871900: BUG:unable_to_handle_kernel
  2017-10-18 13:15         ` Thomas Gleixner
@ 2017-10-19  2:14           ` Joonsoo Kim
  0 siblings, 0 replies; 21+ messages in thread
From: Joonsoo Kim @ 2017-10-19  2:14 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Linus Torvalds, Josh Poimboeuf, kernel test robot, Ingo Molnar,
	Andy Lutomirski, Borislav Petkov, Brian Gerst, Denys Vlasenko,
	H. Peter Anvin, Jiri Slaby, Mike Galbraith, Peter Zijlstra, LKML,
	LKP, linux-mm, Pekka Enberg, David Rientjes, Andrew Morton,
	Christoph Lameter

On Wed, Oct 18, 2017 at 03:15:03PM +0200, Thomas Gleixner wrote:
> On Wed, 18 Oct 2017, Linus Torvalds wrote:
> > On Tue, Oct 17, 2017 at 3:33 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> > >
> > > It looks like a compiler bug. The code of slob_units() tries to read two
> > > bytes at ffff88001c4afffe, which is valid. But the compiler generates
> > > wrong code that tries to read four bytes.
> > >
> > > static slobidx_t slob_units(slob_t *s)
> > > {
> > >   if (s->units > 0)
> > >     return s->units;
> > >   return 1;
> > > }
> > >
> > > s->units is defined as two bytes in this setup.
> > >
> > > Wrongly generated code for this part.
> > >
> > > 'mov 0x0(%rbp), %ebp'
> > >
> > > %ebp is four bytes.
> > >
> > > I guess that this wrong four-byte read crosses the valid memory
> > > boundary, and that is how this issue happened.
> > 
> > Hmm. I can see why the compiler would do that (16-bit accesses are
> > slow), but it's definitely wrong.
> > 
> > Does it work ok if that slob_units() code is written as
> > 
> >   static slobidx_t slob_units(slob_t *s)
> >   {
> >      int units = READ_ONCE(s->units);
> > 
> >      if (units > 0)
> >          return units;
> >      return 1;
> >   }
> > 
> > which might be an acceptable workaround for now?
> 
> Discussed exactly that with Peter Zijlstra yesterday, but we came to the
> conclusion that this is a whack-a-mole game. It might fix this slob issue,
> but what guarantees that we don't have the same problem in some other
> place? Just duct-taping this particular instance makes me nervous.

I have checked that the above patch works fine, but I agree with Thomas.

> Joonsoo says:
> 
> > gcc 4.8 and 4.9 fail to generate proper code. gcc 5.1 and
> > the latest version work fine.
> 
> > I guess that this problem is related to a corner case of some
> > optimization feature, since a minor code change makes the result
> > different. Also, with -O2, proper code is generated even if gcc 4.8 is
> > used.
> 
> So it would be useful to figure out which optimization bit is causing that
> and blacklist it for the affected compiler versions.

I have tried that but could not find any clue. What I did was compile
with -O2 and disable some options so that the option list matches
-Os. The gcc man page gives a rough guideline for this. However, I
could not reproduce the issue this way.

Thanks.


end of thread, other threads:[~2017-10-19  2:11 UTC | newest]

Thread overview: 21+ messages
     [not found] <20171010121513.GC5445@yexl-desktop>
2017-10-11  2:31 ` [lkp-robot] [x86/kconfig] 81d3871900: BUG:unable_to_handle_kernel Josh Poimboeuf
2017-10-11 17:01   ` Josh Poimboeuf
2017-10-12 17:05     ` Christopher Lameter
2017-10-12 17:54       ` Linus Torvalds
2017-10-12 18:48         ` Andrew Morton
2017-10-12 19:19           ` Geert Uytterhoeven
2017-10-12 17:54       ` Linus Torvalds
2017-10-13  4:45       ` Josh Poimboeuf
2017-10-13 13:56         ` Andrey Ryabinin
2017-10-13 16:19           ` Josh Poimboeuf
2017-10-13 19:09           ` Linus Torvalds
2017-10-13 20:01             ` Andy Lutomirski
2017-10-13 20:17             ` Jeffrey Walton
2017-10-13 15:22         ` Christopher Lameter
2017-10-13 15:37           ` Josh Poimboeuf
2017-10-17  7:33     ` Joonsoo Kim
2017-10-17  7:50       ` Thomas Gleixner
2017-10-18  7:31         ` Joonsoo Kim
2017-10-18 10:40       ` Linus Torvalds
2017-10-18 13:15         ` Thomas Gleixner
2017-10-19  2:14           ` Joonsoo Kim
