public inbox for linux-kernel@vger.kernel.org
* WARNING in lmce_supported() during reboot.
From: Kuniyuki Iwashima @ 2024-10-25 23:13 UTC
  To: x86, linux-edac, linux-kernel
  Cc: Tony Luck, Borislav Petkov, Thomas Gleixner, Ingo Molnar,
	Dave Hansen, H. Peter Anvin, Benjamin Herrenschmidt,
	Kuniyuki Iwashima

Hello x86 maintainers,

We have seen the splat below a few times when just rebooting hosts.

It happens rarely and seems to be timing-related, so we don't have a
reproducer.

The kernel source that produced the splat is here:
https://github.com/amazonlinux/linux/tree/kernel-6.1.61-85.141.amzn2023

and the WARN_ON_ONCE() that triggered in lmce_supported() is here:
https://github.com/amazonlinux/linux/blob/kernel-6.1.61-85.141.amzn2023/arch/x86/kernel/cpu/mce/intel.c#L124

Do you have any hints?

Thanks in advance.


ACPI: PM: Preparing to enter system sleep state S5
reboot: Restarting system
reboot: machine restart
------------[ cut here ]------------
WARNING: CPU: 1 PID: 0 at arch/x86/kernel/cpu/mce/intel.c:124 lmce_supported (arch/x86/kernel/cpu/mce/intel.c:124 arch/x86/kernel/cpu/mce/intel.c:99) 
Modules linked in: ib_core binfmt_misc ext4 crc16 mbcache jbd2 sunrpc mousedev atkbd psmouse ghash_clmulni_intel vivaldi_fmap libps2 aesni_intel crypto_simd cryptd i8042 serio ena button sch_fq_codel dm_mod fuse configfs dax loop dmi_sysfs simpledrm drm_shmem_helper drm_kms_helper cfbfillrect syscopyarea cfbimgblt sysfillrect sysimgblt fb_sys_fops cfbcopyarea drm i2c_core drm_panel_orientation_quirks backlight fb crc32_pclmul crc32c_intel fbdev efivarfs
Hardware name: Amazon EC2 c6i.4xlarge/, BIOS 1.0 10/16/2017
RIP: 0010:lmce_supported (arch/x86/kernel/cpu/mce/intel.c:124 arch/x86/kernel/cpu/mce/intel.c:99) 
Code: 81 fb 00 00 00 09 75 da b9 3a 00 00 00 0f 32 48 c1 e2 20 48 09 c2 48 89 d3 66 90 48 89 d8 48 c1 e8 14 83 e0 01 83 e3 01 75 ba <0f> 0b 31 c0 eb b4 31 d2 48 89 de bf 3a 00 00 00 e8 6b e6 57 00 eb
All code
========
   0:	81 fb 00 00 00 09    	cmp    $0x9000000,%ebx
   6:	75 da                	jne    0xffffffffffffffe2
   8:	b9 3a 00 00 00       	mov    $0x3a,%ecx
   d:	0f 32                	rdmsr
   f:	48 c1 e2 20          	shl    $0x20,%rdx
  13:	48 09 c2             	or     %rax,%rdx
  16:	48 89 d3             	mov    %rdx,%rbx
  19:	66 90                	xchg   %ax,%ax
  1b:	48 89 d8             	mov    %rbx,%rax
  1e:	48 c1 e8 14          	shr    $0x14,%rax
  22:	83 e0 01             	and    $0x1,%eax
  25:	83 e3 01             	and    $0x1,%ebx
  28:	75 ba                	jne    0xffffffffffffffe4
  2a:*	0f 0b                	ud2		<-- trapping instruction
  2c:	31 c0                	xor    %eax,%eax
  2e:	eb b4                	jmp    0xffffffffffffffe4
  30:	31 d2                	xor    %edx,%edx
  32:	48 89 de             	mov    %rbx,%rsi
  35:	bf 3a 00 00 00       	mov    $0x3a,%edi
  3a:	e8 6b e6 57 00       	call   0x57e6aa
  3f:	eb                   	.byte 0xeb

Code starting with the faulting instruction
===========================================
   0:	0f 0b                	ud2
   2:	31 c0                	xor    %eax,%eax
   4:	eb b4                	jmp    0xffffffffffffffba
   6:	31 d2                	xor    %edx,%edx
   8:	48 89 de             	mov    %rbx,%rsi
   b:	bf 3a 00 00 00       	mov    $0x3a,%edi
  10:	e8 6b e6 57 00       	call   0x57e680
  15:	eb                   	.byte 0xeb
RSP: 0018:ffffa18f00154fb8 EFLAGS: 00010046
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000000000003a
RDX: 0000000000000000 RSI: 00000000000000ff RDI: ffff965cfe2599c0
RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: ffffa18f00154ff8 R12: 0000000000000001
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff965cfe240000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f8485dfba30 CR3: 0000000389a10003 CR4: 00000000007706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<IRQ>
? show_trace_log_lvl (arch/x86/kernel/dumpstack.c:259) 
? show_trace_log_lvl (arch/x86/kernel/dumpstack.c:259) 
? mce_intel_feature_clear (arch/x86/kernel/cpu/mce/intel.c:465 arch/x86/kernel/cpu/mce/intel.c:502) 
? lmce_supported (arch/x86/kernel/cpu/mce/intel.c:124 arch/x86/kernel/cpu/mce/intel.c:99) 
? __warn (kernel/panic.c:672) 
? lmce_supported (arch/x86/kernel/cpu/mce/intel.c:124 arch/x86/kernel/cpu/mce/intel.c:99) 
? report_bug (lib/bug.c:201 lib/bug.c:219) 
? handle_bug (arch/x86/kernel/traps.c:324) 
? exc_invalid_op (arch/x86/kernel/traps.c:345 (discriminator 1)) 
? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:568) 
? lmce_supported (arch/x86/kernel/cpu/mce/intel.c:124 arch/x86/kernel/cpu/mce/intel.c:99) 
? clear_local_APIC (./arch/x86/include/asm/apic.h:393 arch/x86/kernel/apic/apic.c:1192) 
mce_intel_feature_clear (arch/x86/kernel/cpu/mce/intel.c:465 arch/x86/kernel/cpu/mce/intel.c:502) 
stop_this_cpu (arch/x86/kernel/process.c:780) 
__sysvec_reboot (arch/x86/kernel/smp.c:140) 
sysvec_reboot (arch/x86/kernel/smp.c:136 (discriminator 14)) 
</IRQ>
<TASK>
asm_sysvec_reboot (./arch/x86/include/asm/idtentry.h:656) 
RIP: 0010:acpi_idle_do_entry (./arch/x86/include/asm/irqflags.h:40 ./arch/x86/include/asm/irqflags.h:75 drivers/acpi/processor_idle.c:113 drivers/acpi/processor_idle.c:572) 
Code: 75 08 48 8b 15 b1 81 df 02 ed c3 cc cc cc cc 65 48 8b 04 25 00 ff 01 00 48 8b 00 a8 08 75 eb 66 90 0f 00 2d 58 c8 6a 00 fb f4 <fa> c3 cc cc cc cc e9 01 fc ff ff 90 0f 1f 44 00 00 41 56 41 55 41
All code
========
   0:	75 08                	jne    0xa
   2:	48 8b 15 b1 81 df 02 	mov    0x2df81b1(%rip),%rdx        # 0x2df81ba
   9:	ed                   	in     (%dx),%eax
   a:	c3                   	ret
   b:	cc                   	int3
   c:	cc                   	int3
   d:	cc                   	int3
   e:	cc                   	int3
   f:	65 48 8b 04 25 00 ff 	mov    %gs:0x1ff00,%rax
  16:	01 00 
  18:	48 8b 00             	mov    (%rax),%rax
  1b:	a8 08                	test   $0x8,%al
  1d:	75 eb                	jne    0xa
  1f:	66 90                	xchg   %ax,%ax
  21:	0f 00 2d 58 c8 6a 00 	verw   0x6ac858(%rip)        # 0x6ac880
  28:	fb                   	sti
  29:	f4                   	hlt
  2a:*	fa                   	cli		<-- trapping instruction
  2b:	c3                   	ret
  2c:	cc                   	int3
  2d:	cc                   	int3
  2e:	cc                   	int3
  2f:	cc                   	int3
  30:	e9 01 fc ff ff       	jmp    0xfffffffffffffc36
  35:	90                   	nop
  36:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
  3b:	41 56                	push   %r14
  3d:	41 55                	push   %r13
  3f:	41                   	rex.B

Code starting with the faulting instruction
===========================================
   0:	fa                   	cli
   1:	c3                   	ret
   2:	cc                   	int3
   3:	cc                   	int3
   4:	cc                   	int3
   5:	cc                   	int3
   6:	e9 01 fc ff ff       	jmp    0xfffffffffffffc0c
   b:	90                   	nop
   c:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
  11:	41 56                	push   %r14
  13:	41 55                	push   %r13
  15:	41                   	rex.B
RSP: 0018:ffffa18f000afe70 EFLAGS: 00000246
RAX: 0000000000004000 RBX: ffff965603d92400 RCX: 4000000000000000
RDX: ffff965cfe240000 RSI: ffff965601478800 RDI: ffff965601478864
RBP: 0000000000000001 R08: ffffffffb62182c0 R09: 0000000000000000
R10: 0000000000002703 R11: 000000000001993d R12: 0000000000000001
R13: ffffffffb6218340 R14: 0000000000000001 R15: 0000000000000000
acpi_idle_enter (drivers/acpi/processor_idle.c:711 (discriminator 3)) 
cpuidle_enter_state (drivers/cpuidle/cpuidle.c:239) 
cpuidle_enter (drivers/cpuidle/cpuidle.c:358) 
cpuidle_idle_call (kernel/sched/idle.c:240) 
do_idle (kernel/sched/idle.c:305) 
cpu_startup_entry (kernel/sched/idle.c:400 (discriminator 1)) 
start_secondary (arch/x86/kernel/smpboot.c:215 arch/x86/kernel/smpboot.c:249) 
secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:358) 
</TASK>
---[ end trace 0000000000000000 ]---
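
For readers following the disassembly above, here is a minimal sketch of the
check that reaches the ud2. It is reconstructed only from the decoded
instructions (rdmsr of 0x3a, a test of bit 20 via "shr $0x14", and a test of
bit 0), not copied from the kernel source. The names decode_feat_ctl,
struct lmce_check and the two bit macros are hypothetical, chosen for
illustration. Note that RDX, which holds the combined 64-bit MSR value after
the shl/or pair and is not written again before the trap, is 0 in the register
dump, which suggests the MSR read back as all zeroes on this CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions read off the disassembly; the names are illustrative. */
#define FEAT_CTL_LOCKED_BIT  0   /* and $0x1,%ebx            */
#define FEAT_CTL_LMCE_BIT    20  /* shr $0x14,%rax; and $0x1 */

struct lmce_check {
	bool would_warn;   /* lock bit clear: the ud2 (WARN) path is taken */
	bool lmce_enabled; /* value returned to the caller                 */
};

/* Mirror the two tests made on the value read from MSR 0x3a. */
static struct lmce_check decode_feat_ctl(uint64_t msr)
{
	struct lmce_check r;
	bool locked = (msr >> FEAT_CTL_LOCKED_BIT) & 1;
	bool lmce = (msr >> FEAT_CTL_LMCE_BIT) & 1;

	r.would_warn = !locked;
	/* The warning path does "xor %eax,%eax", i.e. it returns false. */
	r.lmce_enabled = locked && lmce;
	return r;
}
```

Under these assumptions, decode_feat_ctl(0) reports would_warn as true, which
matches the splat: an MSR value of 0 has the lock bit clear, so the function
warns and returns false.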



Thread overview: 7+ messages
2024-10-25 23:13 WARNING in lmce_supported() during reboot Kuniyuki Iwashima
2024-10-25 23:26 ` Benjamin Herrenschmidt
2024-10-25 23:57 ` Luck, Tony
2024-10-25 23:58 ` Dave Hansen
2024-10-26  2:33   ` Benjamin Herrenschmidt
2024-10-28 15:46     ` Dave Hansen
2024-10-30  2:26     ` Zhuo, Qiuxu
