* [tglx-devel:x86/microcode] [x86/microcode/32] 7f1f6ace4c: BUG:unable_to_handle_page_fault_for_address
@ 2023-08-29 7:59 kernel test robot
From: kernel test robot @ 2023-08-29 7:59 UTC
To: Thomas Gleixner; +Cc: oe-lkp, lkp, oliver.sang
Hello,
kernel test robot noticed "BUG:unable_to_handle_page_fault_for_address" on:
commit: 7f1f6ace4ce1c337bb7c10592ba522e8a91836e3 ("x86/microcode/32: Move early loading after paging enable")
https://git.kernel.org/cgit/linux/kernel/git/tglx/devel.git x86/microcode
in testcase: boot
compiler: clang-16
test machine: qemu-system-i386 -enable-kvm -cpu SandyBridge -smp 2 -m 4G
(please refer to the attached dmesg/kmsg for the entire log/backtrace)
+---------------------------------------------+------------+------------+
| | d2700f4067 | 7f1f6ace4c |
+---------------------------------------------+------------+------------+
| boot_successes | 13 | 0 |
| boot_failures | 0 | 12 |
| BUG:unable_to_handle_page_fault_for_address | 0 | 12 |
| Oops:#[##] | 0 | 12 |
| EIP:load_ucode_ap | 0 | 12 |
| Kernel_panic-not_syncing:Fatal_exception | 0 | 12 |
+---------------------------------------------+------------+------------+
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags:
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202308291502.2d535a18-oliver.sang@intel.com
[ 0.139017][ T0] BUG: unable to handle page fault for address: 027025c4
[ 0.139017][ T0] #PF: supervisor read access in kernel mode
[ 0.139017][ T0] #PF: error_code(0x0000) - not-present page
[ 0.139017][ T0] *pde = 00000000
[ 0.139017][ T0] Oops: 0000 [#1] PREEMPT SMP
[ 0.139017][ T0] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.5.0-rc3-00012-g7f1f6ace4ce1 #1 0b760317710af5cd94fc9e004982f482d14400de
[ 0.139017][ T0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 0.139017][ T0] EIP: load_ucode_ap (arch/x86/kernel/cpu/microcode/core.c:177 arch/x86/kernel/cpu/microcode/core.c:187)
[ 0.139017][ T0] Code: c2 a1 34 69 ce c2 eb c7 83 c4 04 5e 5f 5b 5d c3 00 00 cc cc 00 00 cc cc 00 00 cc cc 00 00 cc 3e 8d 74 26 00 55 89 e5 53 57 56 <80> 3d c4 25 70 02 00 74 05 5e 5f 5b 5d c3 b8 01 00 00 00 31 c9 0f
All code
========
0: c2 a1 34 retq $0x34a1
3: 69 ce c2 eb c7 83 imul $0x83c7ebc2,%esi,%ecx
9: c4 (bad)
a: 04 5e add $0x5e,%al
c: 5f pop %rdi
d: 5b pop %rbx
e: 5d pop %rbp
f: c3 retq
10: 00 00 add %al,(%rax)
12: cc int3
13: cc int3
14: 00 00 add %al,(%rax)
16: cc int3
17: cc int3
18: 00 00 add %al,(%rax)
1a: cc int3
1b: cc int3
1c: 00 00 add %al,(%rax)
1e: cc int3
1f: 3e 8d 74 26 00 lea %ds:0x0(%rsi,%riz,1),%esi
24: 55 push %rbp
25: 89 e5 mov %esp,%ebp
27: 53 push %rbx
28: 57 push %rdi
29: 56 push %rsi
2a:* 80 3d c4 25 70 02 00 cmpb $0x0,0x27025c4(%rip) # 0x27025f5 <-- trapping instruction
31: 74 05 je 0x38
33: 5e pop %rsi
34: 5f pop %rdi
35: 5b pop %rbx
36: 5d pop %rbp
37: c3 retq
38: b8 01 00 00 00 mov $0x1,%eax
3d: 31 c9 xor %ecx,%ecx
3f: 0f .byte 0xf
Code starting with the faulting instruction
===========================================
0: 80 3d c4 25 70 02 00 cmpb $0x0,0x27025c4(%rip) # 0x27025cb
7: 74 05 je 0xe
9: 5e pop %rsi
a: 5f pop %rdi
b: 5b pop %rbx
c: 5d pop %rbp
d: c3 retq
e: b8 01 00 00 00 mov $0x1,%eax
13: 31 c9 xor %ecx,%ecx
15: 0f .byte 0xf
[ 0.139017][ T0] EAX: 00000000 EBX: 01020800 ECX: ff008983 EDX: ff000000
[ 0.139017][ T0] ESI: 00000000 EDI: 00000000 EBP: c405df9c ESP: c405df90
[ 0.139017][ T0] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00210046
[ 0.139017][ T0] CR0: 80050033 CR2: 027025c4 CR3: 02cdf000 CR4: 00000090
[ 0.139017][ T0] Call Trace:
[ 0.139017][ T0] ? __die_body (arch/x86/kernel/dumpstack.c:478 arch/x86/kernel/dumpstack.c:420)
[ 0.139017][ T0] ? __die (arch/x86/kernel/dumpstack.c:434)
[ 0.139017][ T0] ? page_fault_oops (arch/x86/mm/fault.c:703)
[ 0.139017][ T0] ? kernelmode_fixup_or_oops (arch/x86/mm/fault.c:761)
[ 0.139017][ T0] ? __bad_area_nosemaphore (arch/x86/mm/fault.c:817)
[ 0.139017][ T0] ? bad_area_nosemaphore (arch/x86/mm/fault.c:866)
[ 0.139017][ T0] ? do_user_addr_fault (arch/x86/mm/fault.c:?)
[ 0.139017][ T0] ? exc_page_fault (arch/x86/include/asm/irqflags.h:19 arch/x86/include/asm/irqflags.h:67 arch/x86/include/asm/irqflags.h:127 arch/x86/mm/fault.c:1494 arch/x86/mm/fault.c:1542)
[ 0.139017][ T0] ? pvclock_clocksource_read_nowd (arch/x86/mm/fault.c:1499)
[ 0.139017][ T0] ? handle_exception (init_task.c:?)
[ 0.139017][ T0] ? pvclock_clocksource_read_nowd (arch/x86/mm/fault.c:1499)
[ 0.139017][ T0] ? load_ucode_ap (arch/x86/kernel/cpu/microcode/core.c:177 arch/x86/kernel/cpu/microcode/core.c:187)
[ 0.139017][ T0] ? pvclock_clocksource_read_nowd (arch/x86/mm/fault.c:1499)
[ 0.139017][ T0] ? load_ucode_ap (arch/x86/kernel/cpu/microcode/core.c:177 arch/x86/kernel/cpu/microcode/core.c:187)
[ 0.139017][ T0] start_secondary (arch/x86/kernel/smpboot.c:283)
[ 0.139017][ T0] startup_32_smp (??:?)
[ 0.139017][ T0] Modules linked in:
[ 0.139017][ T0] CR2: 00000000027025c4
[ 0.139017][ T0] ---[ end trace 0000000000000000 ]---
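The decodes above were produced in 64-bit mode (note `retq`, `%rbp` and the `%rip`-relative operand), although the oops comes from an i386 guest. A minimal C sketch, based on the IA-32 ModRM encoding rules, re-reads the trapping bytes `80 3d c4 25 70 02 00` as 32-bit code: opcode `0x80` with reg field 7 is `cmpb $imm8, r/m8`, and ModRM `0x3d` (mod=00, rm=101) selects a bare 32-bit displacement, which in 32-bit mode is an absolute address rather than a RIP-relative one. The helper names are illustrative only, not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fields of an x86 ModRM byte (illustrative helper, not kernel code). */
struct modrm {
	unsigned int mod; /* bits 7:6, addressing form */
	unsigned int reg; /* bits 5:3, opcode extension for 0x80 (7 = CMP) */
	unsigned int rm;  /* bits 2:0, register or memory form */
};

static struct modrm decode_modrm(uint8_t b)
{
	struct modrm m;

	m.mod = b >> 6;
	m.reg = (b >> 3) & 7;
	m.rm = b & 7;
	return m;
}

/*
 * Return the memory operand of "cmpb $imm8, [disp32]" (80 /7 ib with
 * mod=00, rm=101). In 32-bit mode this disp32 is the absolute address;
 * in 64-bit mode the same bytes would be RIP-relative.
 */
static uint32_t cmpb_abs_address(const uint8_t *insn, size_t len)
{
	struct modrm m;

	assert(len >= 7 && insn[0] == 0x80);
	m = decode_modrm(insn[1]);
	assert(m.mod == 0 && m.reg == 7 && m.rm == 5);

	/* disp32 is stored little-endian in bytes 2..5; byte 6 is imm8. */
	return (uint32_t)insn[2] | (uint32_t)insn[3] << 8 |
	       (uint32_t)insn[4] << 16 | (uint32_t)insn[5] << 24;
}
```

Fed the seven bytes from the `Code:` line, `cmpb_abs_address()` returns 0x027025c4, the same value as CR2, so the faulting load is an absolute access to 0x027025c4 and not the `0x27025f5` target that the 64-bit decode suggests.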
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20230829/202308291502.2d535a18-oliver.sang@intel.com
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* RE: [tglx-devel:x86/microcode] [x86/microcode/32] 7f1f6ace4c: BUG:unable_to_handle_page_fault_for_address
@ 2023-08-30 14:04 Zhuo, Qiuxu
From: Zhuo, Qiuxu @ 2023-08-30 14:04 UTC
To: Thomas Gleixner; +Cc: Du, Julie, oe-lkp@lists.linux.dev, lkp, Sang, Oliver
Hi Thomas,
> From: Sang, Oliver <oliver.sang@intel.com>
> Sent: Tuesday, August 29, 2023 4:00 PM
> To: Thomas Gleixner <tglx@linutronix.de>
> ...
> BUG:unable_to_handle_page_fault_for_address
> ...
> [ 0.139017][ T0] BUG: unable to handle page fault for address: 027025c4
> ...
> [ 0.139017][ T0] CR0: 80050033 CR2: 027025c4 CR3: 02cdf000 CR4:
Bit 31 (PG) of CR0 was set, which indicates that paging was enabled, as expected with your patch [1].
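The remaining register values tell the same story. A small C sketch, with bit masks defined locally and named after the kernel's `X86_CR0_*`/`X86_CR4_*` constants (architectural bit positions, not included from kernel headers; the `PF_*` names are hypothetical), shows that CR0=0x80050033 has both PE and PG set and that CR4=0x00000090 has PAE clear. With PAE off, 32-bit paging indexes the page directory with address bits 31:22, so the fault address 0x027025c4 lands in PDE 9, which the oops prints as `*pde = 00000000`, i.e. not mapped. That agrees with error_code 0x0000: a supervisor-mode read of a not-present page.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural bit positions, defined locally for this sketch. */
#define X86_CR0_PE  (1u << 0)  /* protected mode enable */
#define X86_CR0_PG  (1u << 31) /* paging enable */
#define X86_CR4_PAE (1u << 5)  /* physical address extension */

/* Page-fault error code bits (hypothetical names). */
#define PF_PROT  (1u << 0) /* 0: not-present page, 1: protection violation */
#define PF_WRITE (1u << 1) /* 0: read, 1: write */
#define PF_USER  (1u << 2) /* 0: supervisor mode, 1: user mode */

/* Paging is active only when both PE and PG are set in CR0. */
static bool paging_enabled(uint32_t cr0)
{
	const uint32_t need = X86_CR0_PE | X86_CR0_PG;

	return (cr0 & need) == need;
}

/* Page-directory index used by non-PAE 32-bit paging (bits 31:22). */
static unsigned int pde_index_nonpae(uint32_t addr)
{
	return addr >> 22;
}
```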
> ...
> [ 0.139017][ T0] EIP: load_ucode_ap
> (arch/x86/kernel/cpu/microcode/core.c:177
> arch/x86/kernel/cpu/microcode/core.c:187)
The fault at that EIP was caused by the following access, which uses the physical address of dis_ucode_ldr:
check_loader_disabled_ap() -> return *((bool *)__pa_nodebug(&dis_ucode_ldr));
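For reference, on 32-bit kernels `__pa_nodebug()` boils down to subtracting `PAGE_OFFSET` from a lowmem virtual address. The following sketch models that arithmetic in user space, assuming the default 3G/1G split (`PAGE_OFFSET = 0xC0000000`; the report does not state the configured value, and the `model_*` names are hypothetical). Under that assumption, the fault address 0x027025c4 is the physical alias of virtual 0xc27025c4. Such a physical pointer is only usable while paging is still off; once CR0.PG is set, the CPU translates 0x027025c4 as an ordinary virtual address, which is not mapped here.

```c
#include <stdint.h>

/* Assumed CONFIG_PAGE_OFFSET for a default 3G/1G i386 split. */
#define MODEL_PAGE_OFFSET 0xC0000000u

/* Model of __pa_nodebug() for lowmem: virtual -> physical. */
static uint32_t model_pa(uint32_t vaddr)
{
	return vaddr - MODEL_PAGE_OFFSET;
}

/* Inverse mapping for lowmem: physical -> virtual. */
static uint32_t model_va(uint32_t paddr)
{
	return paddr + MODEL_PAGE_OFFSET;
}
```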
Your patch [1] moved the 32-bit early microcode update to after paging is enabled.
So it seems that all of the following places need to access memory through virtual addresses.
With the following patch applied, I can no longer reproduce the issue.
Please let me know if you want me to create a separate patch.
[1] 7f1f6ace4ce1 ("x86/microcode/32: Move early loading after paging enable")
Thanks!
-Qiuxu
diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
index 95f5dcf87bcb..c4886552bb0d 100644
--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -86,17 +86,11 @@ static u32 final_levels[] = {
static bool amd_check_current_patch_level(void)
{
u32 lvl, dummy, i;
- u32 *levels;
native_rdmsr(MSR_AMD64_PATCH_LEVEL, lvl, dummy);
- if (IS_ENABLED(CONFIG_X86_32))
- levels = (u32 *)__pa_nodebug(&final_levels);
- else
- levels = final_levels;
-
- for (i = 0; levels[i]; i++) {
- if (lvl == levels[i])
+ for (i = 0; final_levels[i]; i++) {
+ if (lvl == final_levels[i])
return true;
}
return false;
@@ -104,36 +98,23 @@ static bool amd_check_current_patch_level(void)
static bool __init check_loader_disabled_bsp(void)
{
- static const char *__dis_opt_str = "dis_ucode_ldr";
-
-#ifdef CONFIG_X86_32
- const char *cmdline = (const char *)__pa_nodebug(boot_command_line);
- const char *option = (const char *)__pa_nodebug(__dis_opt_str);
- bool *res = (bool *)__pa_nodebug(&dis_ucode_ldr);
-
-#else /* CONFIG_X86_64 */
- const char *cmdline = boot_command_line;
- const char *option = __dis_opt_str;
- bool *res = &dis_ucode_ldr;
-#endif
-
/*
* CPUID(1).ECX[31]: reserved for hypervisor use. This is still not
* completely accurate as xen pv guests don't see that CPUID bit set but
* that's good enough as they don't land on the BSP path anyway.
*/
if (native_cpuid_ecx(1) & BIT(31))
- return *res;
+ return dis_ucode_ldr;
if (x86_cpuid_vendor() == X86_VENDOR_AMD) {
if (amd_check_current_patch_level())
- return *res;
+ return dis_ucode_ldr;
}
- if (cmdline_find_option_bool(cmdline, option) <= 0)
- *res = false;
+ if (cmdline_find_option_bool(boot_command_line, "dis_ucode_ldr") <= 0)
+ dis_ucode_ldr = false;
- return *res;
+ return dis_ucode_ldr;
}
void __init load_ucode_bsp(void)
@@ -173,11 +154,7 @@ void __init load_ucode_bsp(void)
static bool check_loader_disabled_ap(void)
{
-#ifdef CONFIG_X86_32
- return *((bool *)__pa_nodebug(&dis_ucode_ldr));
-#else
return dis_ucode_ldr;
-#endif
}
void load_ucode_ap(void)
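In the rewritten `check_loader_disabled_bsp()`, `dis_ucode_ldr` is set to false unless `cmdline_find_option_bool()` finds the option. My understanding is that the helper returns the 1-based position of a whole-word match, 0 when the option is absent, and a negative value when there is no command line, which is why the check is `<= 0`. The sketch below is a hypothetical, simplified stand-in (`find_bool_option()` is not a kernel function and skips details such as quote handling) that illustrates those return values.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical, simplified stand-in for a boolean command-line lookup
 * (not the kernel's cmdline_find_option_bool(); no quote handling).
 * Returns the 1-based offset of @opt when it appears as a whole
 * whitespace-separated word, 0 if it is absent, -1 if @cmdline is NULL.
 */
static int find_bool_option(const char *cmdline, const char *opt)
{
	size_t optlen = strlen(opt);
	const char *p = cmdline;
	const char *word;

	if (!cmdline)
		return -1;

	while (*p) {
		/* Skip the separators in front of the next word. */
		while (*p == ' ' || *p == '\t')
			p++;
		if (!*p)
			break;

		/* Find the end of this word and compare it as a whole. */
		word = p;
		while (*p && *p != ' ' && *p != '\t')
			p++;
		if ((size_t)(p - word) == optlen && !strncmp(word, opt, optlen))
			return (int)(word - cmdline) + 1;
	}
	return 0;
}
```

For example, `find_bool_option("quiet dis_ucode_ldr", "dis_ucode_ldr")` returns 7, while `"dis_ucode_ldr=1"` or a command line without the option yields 0, so in this model only a bare `dis_ucode_ldr` word would leave `dis_ucode_ldr` untouched.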
* RE: [tglx-devel:x86/microcode] [x86/microcode/32] 7f1f6ace4c: BUG:unable_to_handle_page_fault_for_address
@ 2023-08-30 17:08 Thomas Gleixner
From: Thomas Gleixner @ 2023-08-30 17:08 UTC
To: Zhuo, Qiuxu; +Cc: Du, Julie, oe-lkp@lists.linux.dev, lkp, Sang, Oliver
On Wed, Aug 30 2023 at 14:04, Qiuxu Zhuo wrote:
>> From: Sang, Oliver <oliver.sang@intel.com>
>> Sent: Tuesday, August 29, 2023 4:00 PM
>> To: Thomas Gleixner <tglx@linutronix.de>
>> ...
>> BUG:unable_to_handle_page_fault_for_address
>> ...
>> [ 0.139017][ T0] BUG: unable to handle page fault for address: 027025c4
>> ...
>> [ 0.139017][ T0] CR0: 80050033 CR2: 027025c4 CR3: 02cdf000 CR4:
>
> Bit 31 (PG) of CR0 was set, which indicates that paging was enabled, as expected with your patch [1].
>
>> ...
>> [ 0.139017][ T0] EIP: load_ucode_ap
>> (arch/x86/kernel/cpu/microcode/core.c:177
>> arch/x86/kernel/cpu/microcode/core.c:187)
>
> The fault at that EIP was caused by the following access, which uses the physical address of dis_ucode_ldr:
> check_loader_disabled_ap() -> return *((bool *)__pa_nodebug(&dis_ucode_ldr));
>
> Your patch [1] moved the 32-bit early microcode update to after paging is enabled.
> So it seems that all of the following places need to access memory through virtual addresses.
>
> With the following patch applied, I can no longer reproduce the issue.
> Please let me know if you want me to create a separate patch.
>
> [1] 7f1f6ace4ce1 ("x86/microcode/32: Move early loading after paging enable")
Ooops. I completely forgot to reply to this report. I already fixed that up
this morning and pushed out an updated version to the x86/microcode
branch. It's pretty much the same as what you did, give or take a detail.
Sorry for taking up your time.
Thanks,
Thomas
* RE: [tglx-devel:x86/microcode] [x86/microcode/32] 7f1f6ace4c: BUG:unable_to_handle_page_fault_for_address
@ 2023-08-31 2:38 Zhuo, Qiuxu
From: Zhuo, Qiuxu @ 2023-08-31 2:38 UTC
To: Thomas Gleixner; +Cc: Du, Julie, oe-lkp@lists.linux.dev, lkp, Sang, Oliver
> From: Thomas Gleixner <tglx@linutronix.de>
> ...
> Ooops. I completely forgot to reply to this report. I already fixed that up
> this morning and pushed out an updated version to the x86/microcode
> branch. It's pretty much the same as what you did, give or take a detail.
Hi Thomas,
Thanks for your reply.
I re-tested your branch [1] at the top commit [2].
The issue is fixed (it can no longer be reproduced).
[1] https://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git x86/microcode
[2] 85a3feda8043 ("x86/microcode/intel: Add a minimum required revision for late-loads")
-Qiuxu