* [linus:master] [x86/bugs] 6613d82e61: general_protection_fault:#[##]
@ 2024-03-28 7:36 kernel test robot
From: kernel test robot @ 2024-03-28 7:36 UTC
To: Pawan Gupta; +Cc: oe-lkp, lkp, linux-kernel, Dave Hansen, kvm, oliver.sang
Hello,
we previously reported a performance issue for this commit in
https://lore.kernel.org/all/202403041300.a7fb1462-yujie.liu@intel.com/
and now we have noticed a persistent crash:
a0e2dab44d22b913 6613d82e617dd7eb8b0c40b2fe3
---------------- ---------------------------
       fail:runs  %reproduction    fail:runs
           |             |             |
            :100         99%        100:100 dmesg.EIP:restore_all_switch_stack
            :100         99%        100:100 dmesg.Kernel_panic-not_syncing:Fatal_exception
            :100         99%        100:100 dmesg.general_protection_fault:#[##]
Details are below, FYI.
kernel test robot noticed "general_protection_fault:#[##]" on:
commit: 6613d82e617dd7eb8b0c40b2fe3acea655b1d611 ("x86/bugs: Use ALTERNATIVE() instead of mds_user_clear static key")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
[test failed on linus/master 70293240c5ce675a67bfc48f419b093023b862b3]
[test failed on linux-next/master 13ee4a7161b6fd938aef6688ff43b163f6d83e37]
in testcase: trinity
version:
with the following parameters:
runtime: 600s
compiler: clang-17
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
(please refer to attached dmesg/kmsg for entire log/backtrace)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202403281553.79f5a16f-lkp@intel.com
[ 25.175767][ T670] VFS: Warning: trinity-c2 using old stat() call. Recompile your binary.
[ 25.245597][ T669] general protection fault: 0000 [#1] PREEMPT SMP
[ 25.246417][ T669] CPU: 1 PID: 669 Comm: trinity-c1 Not tainted 6.8.0-rc5-00004-g6613d82e617d #1 85a4928d2e6b42899c3861e57e26bdc646c4c5f9
[ 25.247743][ T669] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 25.248865][ T669] EIP: restore_all_switch_stack (kbuild/src/consumer/arch/x86/entry/entry_32.S:957)
[ 25.249510][ T669] Code: 4c 24 10 36 89 48 fc 8b 4c 24 0c 81 e1 ff ff 00 00 36 89 48 f8 8b 4c 24 08 36 89 48 f4 8b 4c 24 04 36 89 48 f0 59 8d 60 f0 58 <0f> 00 2d 00 94 d5 c1 cf 6a 00 68 88 6b d4 c1 eb 00 fc 0f a0 50 b8
All code
========
0: 4c 24 10 rex.WR and $0x10,%al
3: 36 89 48 fc ss mov %ecx,-0x4(%rax)
7: 8b 4c 24 0c mov 0xc(%rsp),%ecx
b: 81 e1 ff ff 00 00 and $0xffff,%ecx
11: 36 89 48 f8 ss mov %ecx,-0x8(%rax)
15: 8b 4c 24 08 mov 0x8(%rsp),%ecx
19: 36 89 48 f4 ss mov %ecx,-0xc(%rax)
1d: 8b 4c 24 04 mov 0x4(%rsp),%ecx
21: 36 89 48 f0 ss mov %ecx,-0x10(%rax)
25: 59 pop %rcx
26: 8d 60 f0 lea -0x10(%rax),%esp
29: 58 pop %rax
2a:* 0f 00 2d 00 94 d5 c1 verw -0x3e2a6c00(%rip) # 0xffffffffc1d59431 <-- trapping instruction
31: cf iret
32: 6a 00 push $0x0
34: 68 88 6b d4 c1 push $0xffffffffc1d46b88
39: eb 00 jmp 0x3b
3b: fc cld
3c: 0f a0 push %fs
3e: 50 push %rax
3f: b8 .byte 0xb8
Code starting with the faulting instruction
===========================================
0: 0f 00 2d 00 94 d5 c1 verw -0x3e2a6c00(%rip) # 0xffffffffc1d59407
7: cf iret
8: 6a 00 push $0x0
a: 68 88 6b d4 c1 push $0xffffffffc1d46b88
f: eb 00 jmp 0x11
11: fc cld
12: 0f a0 push %fs
14: 50 push %rax
15: b8 .byte 0xb8
[ 25.251494][ T669] EAX: 00000000 EBX: 000001a0 ECX: 000001a1 EDX: 00000000
[ 25.252271][ T669] ESI: 00000000 EDI: 00000000 EBP: 00000000 ESP: ffa2efdc
[ 25.253037][ T669] DS: 0000 ES: 0000 FS: 0000 GS: 0033 SS: 0068 EFLAGS: 00010046
[ 25.253892][ T669] CR0: 80050033 CR2: b7dabd6e CR3: 2cc341c0 CR4: 000406b0
[ 25.254655][ T669] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 25.255413][ T669] DR6: fffe0ff0 DR7: 00000400
[ 25.255952][ T669] Call Trace:
[ 25.256376][ T669] ? __die_body (kbuild/src/consumer/arch/x86/kernel/dumpstack.c:478 kbuild/src/consumer/arch/x86/kernel/dumpstack.c:420)
[ 25.256907][ T669] ? die_addr (kbuild/src/consumer/arch/x86/kernel/dumpstack.c:?)
[ 25.257411][ T669] ? exc_general_protection (kbuild/src/consumer/arch/x86/kernel/traps.c:698)
[ 25.258067][ T669] ? __entry_text_start (??:?)
[ 25.258691][ T669] ? irqentry_exit_to_user_mode (kbuild/src/consumer/kernel/entry/common.c:228)
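The trapping bytes `0f 00 2d 00 94 d5 c1` and the segment registers in the dump can be decoded by hand. The C sketch below uses hypothetical helper names of our own (`decode_modrm`, `verw_disp32`, `abs32_target`, `rip_target`, `decode_selector`, `is_null_selector`); they are not kernel or objdump interfaces. It shows that ModRM byte `0x2d` selects VERW (`0f 00 /5`) with mod=00 and rm=101, i.e. a bare 32-bit displacement. That single encoding is an absolute, DS-relative address in 32-bit protected mode but a RIP-relative address in 64-bit mode, so the operand syntax a disassembler prints depends on the mode it decodes in.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers (not kernel or objdump APIs) for hand-decoding the
 * trapping instruction and the segment selectors from the oops. */

struct modrm { unsigned mod, reg, rm; };

/* ModRM layout: mod = bits 7-6, reg = bits 5-3, rm = bits 2-0. */
static struct modrm decode_modrm(uint8_t b)
{
    struct modrm m = { (b >> 6) & 3u, (b >> 3) & 7u, b & 7u };
    return m;
}

/* Little-endian 32-bit displacement, reinterpreted as a signed value
 * without relying on implementation-defined narrowing conversions. */
static int32_t read_disp32(const uint8_t *p)
{
    uint32_t u = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                 ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    return u <= INT32_MAX ? (int32_t)u : -(int32_t)~u - 1;
}

/* Displacement of a 7-byte "VERW disp32": opcode 0f 00 /5, ModRM with
 * mod=00 and rm=101, then a 4-byte displacement. */
static int32_t verw_disp32(const uint8_t *insn)
{
    struct modrm m = decode_modrm(insn[2]);
    assert(insn[0] == 0x0f && insn[1] == 0x00 && m.reg == 5); /* VERW */
    assert(m.mod == 0 && m.rm == 5);                            /* disp32 */
    return read_disp32(insn + 3);
}

/* 32-bit protected mode: mod=00/rm=101 is an absolute offset, accessed
 * through the default DS segment. */
static uint32_t abs32_target(int32_t disp)
{
    return (uint32_t)disp;
}

/* 64-bit mode: the same encoding is relative to the address of the next
 * instruction; unsigned arithmetic wraps to 64 bits. */
static uint64_t rip_target(uint64_t insn_addr, unsigned insn_len, int32_t disp)
{
    return insn_addr + insn_len + (uint64_t)(int64_t)disp;
}

/* Segment selector: index = bits 15-3, TI (0 = GDT, 1 = LDT) = bit 2,
 * RPL = bits 1-0. */
struct selector { unsigned index, ti, rpl; };

static struct selector decode_selector(uint16_t sel)
{
    struct selector s = { sel >> 3, (sel >> 2) & 1u, sel & 3u };
    return s;
}

/* Selectors 0x0000-0x0003 (GDT index 0) are null selectors. */
static int is_null_selector(uint16_t sel)
{
    return (sel & 0xfffcu) == 0;
}
```

Applied to the "All code" listing, `rip_target(0x2a, 7, verw_disp32(insn))` gives 0xffffffffc1d59431, the address objdump annotates, while `abs32_target()` of the same displacement gives 0xc1d59400. The `rex.WR`, `%rax` and `%rsp` operands in that listing are likewise 64-bit decodings of the bytes. In the register dump, SS 0x0068 decodes to GDT index 13 with RPL 0 and GS 0x0033 to GDT index 6 with RPL 3, while DS and ES hold the null selector. In protected mode, a memory access through a segment register holding a null selector raises #GP(0), and the "0000" after "general protection fault:" is the error code.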
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240328/202403281553.79f5a16f-lkp@intel.com
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [linus:master] [x86/bugs] 6613d82e61: general_protection_fault:#[##]
From: Pawan Gupta @ 2024-03-28 21:17 UTC
To: kernel test robot; +Cc: oe-lkp, lkp, linux-kernel, Dave Hansen, kvm

On Thu, Mar 28, 2024 at 03:36:28PM +0800, kernel test robot wrote:
> compiler: clang-17
> test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
>
> (please refer to attached dmesg/kmsg for entire log/backtrace)
> [ 25.248865][ T669] EIP: restore_all_switch_stack (kbuild/src/consumer/arch/x86/entry/entry_32.S:957)
> [...]
> 25: 59 pop %rcx
> 26: 8d 60 f0 lea -0x10(%rax),%esp
> 29: 58 pop %rax
> 2a:* 0f 00 2d 00 94 d5 c1 verw -0x3e2a6c00(%rip) # 0xffffffffc1d59431 <-- trapping instruction

This looks like 64-bit (RIP-relative) addressing with CONFIG_X86_32=y on clang.

I haven't tried with clang, but I don't see this happening with gcc-11:

entry_INT80_32:
...
<+446>: mov 0x4(%esp),%ecx
<+450>: mov %ecx,%ss:-0x10(%eax)
<+454>: pop %ecx
<+455>: lea -0x10(%eax),%esp
<+458>: pop %eax
<+459>: verw 0xc1d5c700 <----------
<+466>: iret

> 31: cf iret
> 32: 6a 00 push $0x0
> 34: 68 88 6b d4 c1 push $0xffffffffc1d46b88
> 39: eb 00 jmp 0x3b
...

The config has CONFIG_X86_32=y, but it is possible that in a 32-bit build
with clang, the 64-bit expansion of "VERW (_ASM_RIP(addr))" is being used,
i.e. __ASM_FORM_RAW(b) below:

file: arch/x86/include/asm/asm.h
...
#ifndef __x86_64__
/* 32 bit */
# define __ASM_SEL(a,b) __ASM_FORM(a)
# define __ASM_SEL_RAW(a,b) __ASM_FORM_RAW(a)
#else
/* 64 bit */
# define __ASM_SEL(a,b) __ASM_FORM(b)
# define __ASM_SEL_RAW(a,b) __ASM_FORM_RAW(b) <--------
#endif
...
/* Adds a (%rip) suffix on 64 bits only; for immediate memory references */
#define _ASM_RIP(x) __ASM_SEL_RAW(x, x (__ASM_REGPFX rip))

Possibly __x86_64__ is being defined with clang even when CONFIG_X86_32=y.

I am not sure about the current level of 32-bit mode support in clang. This
seems inconclusive:

https://discourse.llvm.org/t/x86-32-bit-testing/65480

Does anyone care about 32-bit mode builds with clang?
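To make the selection concrete, the sketch below re-creates the `__ASM_SEL_RAW()`/`_ASM_RIP()` logic quoted above in a simplified, hypothetical form: the `DEMO_*` macros and `verw_operand()` are our own names, and the real header additionally routes through `__ASM_FORM_RAW()` and `__ASM_REGPFX`, which are left out here. It shows that the operand text for `_ASM_RIP(addr)` is either `addr` or `addr(%rip)`, chosen solely by whether the compiler predefines `__x86_64__`.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified re-creation of the selection logic quoted from
 * arch/x86/include/asm/asm.h. The DEMO_* names are ours; the real header
 * goes through __ASM_FORM_RAW() and __ASM_REGPFX, which are omitted. */

/* Two-level stringification, in the style of the kernel's __stringify(). */
#define DEMO_STR_1(x) #x
#define DEMO_STR(x)   DEMO_STR_1(x)

/* __ASM_SEL_RAW(a,b): the 32-bit form is a, the 64-bit form is b. */
#define DEMO_SEL_RAW_32(a, b) DEMO_STR(a)
#define DEMO_SEL_RAW_64(a, b) DEMO_STR(b)

/* _ASM_RIP(x): only the 64-bit form gets a (%rip) suffix. */
#define DEMO_RIP_32(x) DEMO_SEL_RAW_32(x, x(%rip))
#define DEMO_RIP_64(x) DEMO_SEL_RAW_64(x, x(%rip))

/* As in the quoted header, the choice depends only on whether the
 * compiler predefines __x86_64__, not on any kernel config option. */
#ifndef __x86_64__
# define DEMO_RIP(x)   DEMO_RIP_32(x)
# define DEMO_IS_64BIT 0
#else
# define DEMO_RIP(x)   DEMO_RIP_64(x)
# define DEMO_IS_64BIT 1
#endif

/* The VERW operand string this translation unit would produce. */
static const char *verw_operand(void)
{
    const char *op = DEMO_RIP(addr);
    /* The 32-bit form carries no register suffix at all. */
    assert(DEMO_IS_64BIT || strchr(op, '%') == NULL);
    return op;
}
```

`DEMO_RIP_32(addr)` yields the string `"addr"`, which corresponds to the absolute form `verw 0xc1d5c700` in the gcc-11 listing, and `DEMO_RIP_64(addr)` yields a string ending in `(%rip)`. Whether a particular compiler predefines `__x86_64__` for a 32-bit target can be checked from its predefined-macro dump, for example `clang -m32 -dM -E - </dev/null | grep __x86_64__`.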
* Re: [linus:master] [x86/bugs] 6613d82e61: general_protection_fault:#[##]
From: Linux regression tracking (Thorsten Leemhuis) @ 2024-04-14 6:41 UTC
To: Pawan Gupta
Cc: oe-lkp, lkp, linux-kernel, Dave Hansen, kvm, Linux kernel regressions list, kernel test robot

Hi, Thorsten here, the Linux kernel's regression tracker.

On 28.03.24 22:17, Pawan Gupta wrote:
> On Thu, Mar 28, 2024 at 03:36:28PM +0800, kernel test robot wrote:
>> compiler: clang-17
>> test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
>> [...]
>> | Reported-by: kernel test robot <oliver.sang@intel.com>
>> | Closes: https://lore.kernel.org/oe-lkp/202403281553.79f5a16f-lkp@intel.com

TWIMC, a user reported general protection faults with dosemu that were
bisected to a 6.6.y backport of the commit that causes the problem
discussed in this thread (6613d82e617dd7 ("x86/bugs: Use ALTERNATIVE()
instead of mds_user_clear static key")).

The user compiles with gcc, so it might be a different problem. It
happens with 6.8.y as well.

The problem occurs with x86-32 kernels, but strangely only on some of
the x86-32 systems the reporter has (on others everything works fine).
That makes me wonder whether the commit exposed an older problem that
only happens on some machines.

For details see https://bugzilla.kernel.org/show_bug.cgi?id=218707
I could not CC the reporter here due to the bugzilla privacy policy; if
you want to get in contact, please use bugzilla.

Ciao, Thorsten
* Re: [linus:master] [x86/bugs] 6613d82e61: general_protection_fault:#[##]
From: Pawan Gupta @ 2024-04-17 18:54 UTC
To: Linux regressions mailing list
Cc: oe-lkp, lkp, linux-kernel, Dave Hansen, kvm, kernel test robot

On Sun, Apr 14, 2024 at 08:41:52AM +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
> Hi, Thorsten here, the Linux kernel's regression tracker.
> [...]
> For details see https://bugzilla.kernel.org/show_bug.cgi?id=218707
> Could not CC the reporter here due to the bugzilla privacy policy; if
> you want to get in contact, please use bugzilla.

Sorry for the late response, I was off work. I will look into this and get back.
I might need help reproducing this issue, but let me first see if I can
reproduce with the info in the bugzilla.