From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
To: Kuniyuki Iwashima <kuniyu@amazon.com>,
x86@kernel.org, linux-edac@vger.kernel.org,
linux-kernel@vger.kernel.org
Cc: Tony Luck <tony.luck@intel.com>, Borislav Petkov <bp@alien8.de>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>,
Dave Hansen <dave.hansen@linux.intel.com>,
"H. Peter Anvin" <hpa@zytor.com>
Subject: Re: WARNING in lmce_supported() during reboot.
Date: Sat, 26 Oct 2024 10:26:15 +1100
Message-ID: <ab54f94827d200ac8a05b4ee180895b0cbd55014.camel@kernel.crashing.org>
In-Reply-To: <20241025231320.45417-1-kuniyu@amazon.com>
On Fri, 2024-10-25 at 16:13 -0700, Kuniyuki Iwashima wrote:
> Hello x86 maintainers,
>
> We have seen the splat below a few times when just rebooting hosts.
>
> It rarely happens and seems timing related, so we don't have a
> reproducer.
>
> Our kernel source in the splat is here,
> https://github.com/amazonlinux/linux/tree/kernel-6.1.61-85.141.amzn2023
>
> and the triggered WARN_ON_ONCE() in lmce_supported() is here.
> https://github.com/amazonlinux/linux/blob/kernel-6.1.61-85.141.amzn2023/arch/x86/kernel/cpu/mce/intel.c#L124
(switching to my lkml/spam friendly email)
I also hit it with 6.1.112-122.189.amzn2023.x86_64
Cheers,
Ben.
> Do you have any hints?
>
> Thanks in advance.
>
>
> ACPI: PM: Preparing to enter system sleep state S5
> reboot: Restarting system
> reboot: machine restart
> ------------[ cut here ]------------
> WARNING: CPU: 1 PID: 0 at arch/x86/kernel/cpu/mce/intel.c:124
> lmce_supported (arch/x86/kernel/cpu/mce/intel.c:124
> arch/x86/kernel/cpu/mce/intel.c:99)
> Modules linked in: ib_core binfmt_misc ext4 crc16 mbcache jbd2 sunrpc
> mousedev atkbd psmouse ghash_clmulni_intel vivaldi_fmap libps2
> aesni_intel crypto_simd cryptd i8042 serio ena button sch_fq_codel
> dm_mod fuse configfs dax loop dmi_sysfs simpledrm drm_shmem_helper
> drm_kms_helper cfbfillrect syscopyarea cfbimgblt sysfillrect
> sysimgblt fb_sys_fops cfbcopyarea drm i2c_core
> drm_panel_orientation_quirks backlight fb crc32_pclmul crc32c_intel
> fbdev efivarfs
> Hardware name: Amazon EC2 c6i.4xlarge/, BIOS 1.0 10/16/2017
> RIP: 0010:lmce_supported (arch/x86/kernel/cpu/mce/intel.c:124
> arch/x86/kernel/cpu/mce/intel.c:99)
> Code: 81 fb 00 00 00 09 75 da b9 3a 00 00 00 0f 32 48 c1 e2 20 48 09
> c2 48 89 d3 66 90 48 89 d8 48 c1 e8 14 83 e0 01 83 e3 01 75 ba <0f>
> 0b 31 c0 eb b4 31 d2 48 89 de bf 3a 00 00 00 e8 6b e6 57 00 eb
> All code
> ========
> 0: 81 fb 00 00 00 09 cmp $0x9000000,%ebx
> 6: 75 da jne 0xffffffffffffffe2
> 8: b9 3a 00 00 00 mov $0x3a,%ecx
> d: 0f 32 rdmsr
> f: 48 c1 e2 20 shl $0x20,%rdx
> 13: 48 09 c2 or %rax,%rdx
> 16: 48 89 d3 mov %rdx,%rbx
> 19: 66 90 xchg %ax,%ax
> 1b: 48 89 d8 mov %rbx,%rax
> 1e: 48 c1 e8 14 shr $0x14,%rax
> 22: 83 e0 01 and $0x1,%eax
> 25: 83 e3 01 and $0x1,%ebx
> 28: 75 ba jne 0xffffffffffffffe4
> 2a:* 0f 0b ud2 <-- trapping instruction
> 2c: 31 c0 xor %eax,%eax
> 2e: eb b4 jmp 0xffffffffffffffe4
> 30: 31 d2 xor %edx,%edx
> 32: 48 89 de mov %rbx,%rsi
> 35: bf 3a 00 00 00 mov $0x3a,%edi
> 3a: e8 6b e6 57 00 call 0x57e6aa
> 3f: eb .byte 0xeb
>
> Code starting with the faulting instruction
> ===========================================
> 0: 0f 0b ud2
> 2: 31 c0 xor %eax,%eax
> 4: eb b4 jmp 0xffffffffffffffba
> 6: 31 d2 xor %edx,%edx
> 8: 48 89 de mov %rbx,%rsi
> b: bf 3a 00 00 00 mov $0x3a,%edi
> 10: e8 6b e6 57 00 call 0x57e680
> 15: eb .byte 0xeb
> RSP: 0018:ffffa18f00154fb8 EFLAGS: 00010046
> RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000000000003a
> RDX: 0000000000000000 RSI: 00000000000000ff RDI: ffff965cfe2599c0
> RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: ffffa18f00154ff8 R12: 0000000000000001
> R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> FS: 0000000000000000(0000) GS:ffff965cfe240000(0000)
> knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f8485dfba30 CR3: 0000000389a10003 CR4: 00000000007706e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> PKRU: 55555554
> Call Trace:
> <IRQ>
> ? show_trace_log_lvl (arch/x86/kernel/dumpstack.c:259)
> ? show_trace_log_lvl (arch/x86/kernel/dumpstack.c:259)
> ? mce_intel_feature_clear (arch/x86/kernel/cpu/mce/intel.c:465
> arch/x86/kernel/cpu/mce/intel.c:502)
> ? lmce_supported (arch/x86/kernel/cpu/mce/intel.c:124
> arch/x86/kernel/cpu/mce/intel.c:99)
> ? __warn (kernel/panic.c:672)
> ? lmce_supported (arch/x86/kernel/cpu/mce/intel.c:124
> arch/x86/kernel/cpu/mce/intel.c:99)
> ? report_bug (lib/bug.c:201 lib/bug.c:219)
> ? handle_bug (arch/x86/kernel/traps.c:324)
> ? exc_invalid_op (arch/x86/kernel/traps.c:345 (discriminator 1))
> ? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:568)
> ? lmce_supported (arch/x86/kernel/cpu/mce/intel.c:124
> arch/x86/kernel/cpu/mce/intel.c:99)
> ? clear_local_APIC (./arch/x86/include/asm/apic.h:393
> arch/x86/kernel/apic/apic.c:1192)
> mce_intel_feature_clear (arch/x86/kernel/cpu/mce/intel.c:465
> arch/x86/kernel/cpu/mce/intel.c:502)
> stop_this_cpu (arch/x86/kernel/process.c:780)
> __sysvec_reboot (arch/x86/kernel/smp.c:140)
> sysvec_reboot (arch/x86/kernel/smp.c:136 (discriminator 14))
> </IRQ>
> <TASK>
> asm_sysvec_reboot (./arch/x86/include/asm/idtentry.h:656)
> RIP: 0010:acpi_idle_do_entry (./arch/x86/include/asm/irqflags.h:40
> ./arch/x86/include/asm/irqflags.h:75
> drivers/acpi/processor_idle.c:113 drivers/acpi/processor_idle.c:572)
> Code: 75 08 48 8b 15 b1 81 df 02 ed c3 cc cc cc cc 65 48 8b 04 25 00
> ff 01 00 48 8b 00 a8 08 75 eb 66 90 0f 00 2d 58 c8 6a 00 fb f4 <fa>
> c3 cc cc cc cc e9 01 fc ff ff 90 0f 1f 44 00 00 41 56 41 55 41
> All code
> ========
> 0: 75 08 jne 0xa
> 2: 48 8b 15 b1 81 df 02 mov 0x2df81b1(%rip),%rdx #
> 0x2df81ba
> 9: ed in (%dx),%eax
> a: c3 ret
> b: cc int3
> c: cc int3
> d: cc int3
> e: cc int3
> f: 65 48 8b 04 25 00 ff mov %gs:0x1ff00,%rax
> 16: 01 00
> 18: 48 8b 00 mov (%rax),%rax
> 1b: a8 08 test $0x8,%al
> 1d: 75 eb jne 0xa
> 1f: 66 90 xchg %ax,%ax
> 21: 0f 00 2d 58 c8 6a 00 verw 0x6ac858(%rip) #
> 0x6ac880
> 28: fb sti
> 29: f4 hlt
> 2a:* fa cli <-- trapping instruction
> 2b: c3 ret
> 2c: cc int3
> 2d: cc int3
> 2e: cc int3
> 2f: cc int3
> 30: e9 01 fc ff ff jmp 0xfffffffffffffc36
> 35: 90 nop
> 36: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
> 3b: 41 56 push %r14
> 3d: 41 55 push %r13
> 3f: 41 rex.B
>
> Code starting with the faulting instruction
> ===========================================
> 0: fa cli
> 1: c3 ret
> 2: cc int3
> 3: cc int3
> 4: cc int3
> 5: cc int3
> 6: e9 01 fc ff ff jmp 0xfffffffffffffc0c
> b: 90 nop
> c: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
> 11: 41 56 push %r14
> 13: 41 55 push %r13
> 15: 41 rex.B
> RSP: 0018:ffffa18f000afe70 EFLAGS: 00000246
> RAX: 0000000000004000 RBX: ffff965603d92400 RCX: 4000000000000000
> RDX: ffff965cfe240000 RSI: ffff965601478800 RDI: ffff965601478864
> RBP: 0000000000000001 R08: ffffffffb62182c0 R09: 0000000000000000
> R10: 0000000000002703 R11: 000000000001993d R12: 0000000000000001
> R13: ffffffffb6218340 R14: 0000000000000001 R15: 0000000000000000
> acpi_idle_enter (drivers/acpi/processor_idle.c:711 (discriminator 3))
> cpuidle_enter_state (drivers/cpuidle/cpuidle.c:239)
> cpuidle_enter (drivers/cpuidle/cpuidle.c:358)
> cpuidle_idle_call (kernel/sched/idle.c:240)
> do_idle (kernel/sched/idle.c:305)
> cpu_startup_entry (kernel/sched/idle.c:400 (discriminator 1))
> start_secondary (arch/x86/kernel/smpboot.c:215
> arch/x86/kernel/smpboot.c:249)
> secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:358)
> </TASK>
> ---[ end trace 0000000000000000 ]---
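
For reference, the register dump in the splat can be decoded by hand. The disassembly reads MSR 0x3a (RCX = 0x3a), tests bit 0 with "and $0x1,%ebx" and bit 20 with "shr $0x14,%rax; and $0x1,%eax", and traps on ud2 when bit 0 is clear. RBX is 0 at the trap, so the value read back appears to have been all zeroes. Below is a minimal userspace sketch of that check, assuming bit 0 is the lock bit and bit 20 the LMCE-enable bit. The macro and function names are hypothetical and this is not the kernel source:

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions inferred from the disassembly in the splat (assumption). */
#define SKETCH_FEAT_CTL_LOCKED        (1ULL << 0)   /* and $0x1,%ebx */
#define SKETCH_FEAT_CTL_LMCE_ENABLED  (1ULL << 20)  /* shr $0x14,%rax; and $0x1,%eax */

/*
 * Hypothetical helper mirroring the order of checks seen in the
 * disassembly: warn if the lock bit is clear, otherwise report the
 * LMCE-enable bit. `feat_ctl` is the 64-bit value read from MSR 0x3a.
 */
static bool sketch_lmce_check(uint64_t feat_ctl, bool *would_warn)
{
	*would_warn = !(feat_ctl & SKETCH_FEAT_CTL_LOCKED);
	if (*would_warn)
		return false;

	return (feat_ctl & SKETCH_FEAT_CTL_LMCE_ENABLED) != 0;
}
```

Calling sketch_lmce_check(0x0, &warn) with the RBX value from the splat sets warn to true and returns false, which matches the path that reaches the ud2.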