* [tglx-devel:x86/microcode] [x86/microcode/32] 7f1f6ace4c: BUG:unable_to_handle_page_fault_for_address
@ 2023-08-29  7:59 kernel test robot
       [not found] ` <DS7PR11MB59651DCF238053376F5927C88FE6A@DS7PR11MB5965.namprd11.prod.outlook.com>
  0 siblings, 1 reply; 4+ messages in thread
From: kernel test robot @ 2023-08-29  7:59 UTC
  To: Thomas Gleixner; +Cc: oe-lkp, lkp, oliver.sang



Hello,

kernel test robot noticed "BUG:unable_to_handle_page_fault_for_address" on:

commit: 7f1f6ace4ce1c337bb7c10592ba522e8a91836e3 ("x86/microcode/32: Move early loading after paging enable")
https://git.kernel.org/cgit/linux/kernel/git/tglx/devel.git x86/microcode

in testcase: boot

compiler: clang-16
test machine: qemu-system-i386 -enable-kvm -cpu SandyBridge -smp 2 -m 4G

(please refer to attached dmesg/kmsg for entire log/backtrace)


+---------------------------------------------+------------+------------+
|                                             | d2700f4067 | 7f1f6ace4c |
+---------------------------------------------+------------+------------+
| boot_successes                              | 13         | 0          |
| boot_failures                               | 0          | 12         |
| BUG:unable_to_handle_page_fault_for_address | 0          | 12         |
| Oops:#[##]                                  | 0          | 12         |
| EIP:load_ucode_ap                           | 0          | 12         |
| Kernel_panic-not_syncing:Fatal_exception    | 0          | 12         |
+---------------------------------------------+------------+------------+


If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202308291502.2d535a18-oliver.sang@intel.com


[    0.139017][    T0] BUG: unable to handle page fault for address: 027025c4
[    0.139017][    T0] #PF: supervisor read access in kernel mode
[    0.139017][    T0] #PF: error_code(0x0000) - not-present page
[    0.139017][    T0] *pde = 00000000
[    0.139017][    T0] Oops: 0000 [#1] PREEMPT SMP
[    0.139017][    T0] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.5.0-rc3-00012-g7f1f6ace4ce1 #1 0b760317710af5cd94fc9e004982f482d14400de
[    0.139017][    T0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 0.139017][ T0] EIP: load_ucode_ap (arch/x86/kernel/cpu/microcode/core.c:177 arch/x86/kernel/cpu/microcode/core.c:187) 
[ 0.139017][ T0] Code: c2 a1 34 69 ce c2 eb c7 83 c4 04 5e 5f 5b 5d c3 00 00 cc cc 00 00 cc cc 00 00 cc cc 00 00 cc 3e 8d 74 26 00 55 89 e5 53 57 56 <80> 3d c4 25 70 02 00 74 05 5e 5f 5b 5d c3 b8 01 00 00 00 31 c9 0f
All code
========
   0:	c2 a1 34             	retq   $0x34a1
   3:	69 ce c2 eb c7 83    	imul   $0x83c7ebc2,%esi,%ecx
   9:	c4                   	(bad)  
   a:	04 5e                	add    $0x5e,%al
   c:	5f                   	pop    %rdi
   d:	5b                   	pop    %rbx
   e:	5d                   	pop    %rbp
   f:	c3                   	retq   
  10:	00 00                	add    %al,(%rax)
  12:	cc                   	int3   
  13:	cc                   	int3   
  14:	00 00                	add    %al,(%rax)
  16:	cc                   	int3   
  17:	cc                   	int3   
  18:	00 00                	add    %al,(%rax)
  1a:	cc                   	int3   
  1b:	cc                   	int3   
  1c:	00 00                	add    %al,(%rax)
  1e:	cc                   	int3   
  1f:	3e 8d 74 26 00       	lea    %ds:0x0(%rsi,%riz,1),%esi
  24:	55                   	push   %rbp
  25:	89 e5                	mov    %esp,%ebp
  27:	53                   	push   %rbx
  28:	57                   	push   %rdi
  29:	56                   	push   %rsi
  2a:*	80 3d c4 25 70 02 00 	cmpb   $0x0,0x27025c4(%rip)        # 0x27025f5		<-- trapping instruction
  31:	74 05                	je     0x38
  33:	5e                   	pop    %rsi
  34:	5f                   	pop    %rdi
  35:	5b                   	pop    %rbx
  36:	5d                   	pop    %rbp
  37:	c3                   	retq   
  38:	b8 01 00 00 00       	mov    $0x1,%eax
  3d:	31 c9                	xor    %ecx,%ecx
  3f:	0f                   	.byte 0xf

Code starting with the faulting instruction
===========================================
   0:	80 3d c4 25 70 02 00 	cmpb   $0x0,0x27025c4(%rip)        # 0x27025cb
   7:	74 05                	je     0xe
   9:	5e                   	pop    %rsi
   a:	5f                   	pop    %rdi
   b:	5b                   	pop    %rbx
   c:	5d                   	pop    %rbp
   d:	c3                   	retq   
   e:	b8 01 00 00 00       	mov    $0x1,%eax
  13:	31 c9                	xor    %ecx,%ecx
  15:	0f                   	.byte 0xf
[    0.139017][    T0] EAX: 00000000 EBX: 01020800 ECX: ff008983 EDX: ff000000
[    0.139017][    T0] ESI: 00000000 EDI: 00000000 EBP: c405df9c ESP: c405df90
[    0.139017][    T0] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00210046
[    0.139017][    T0] CR0: 80050033 CR2: 027025c4 CR3: 02cdf000 CR4: 00000090
[    0.139017][    T0] Call Trace:
[ 0.139017][ T0] ? __die_body (arch/x86/kernel/dumpstack.c:478 arch/x86/kernel/dumpstack.c:420) 
[ 0.139017][ T0] ? __die (arch/x86/kernel/dumpstack.c:434) 
[ 0.139017][ T0] ? page_fault_oops (arch/x86/mm/fault.c:703) 
[ 0.139017][ T0] ? kernelmode_fixup_or_oops (arch/x86/mm/fault.c:761) 
[ 0.139017][ T0] ? __bad_area_nosemaphore (arch/x86/mm/fault.c:817) 
[ 0.139017][ T0] ? bad_area_nosemaphore (arch/x86/mm/fault.c:866) 
[ 0.139017][ T0] ? do_user_addr_fault (arch/x86/mm/fault.c:?) 
[ 0.139017][ T0] ? exc_page_fault (arch/x86/include/asm/irqflags.h:19 arch/x86/include/asm/irqflags.h:67 arch/x86/include/asm/irqflags.h:127 arch/x86/mm/fault.c:1494 arch/x86/mm/fault.c:1542) 
[ 0.139017][ T0] ? pvclock_clocksource_read_nowd (arch/x86/mm/fault.c:1499) 
[ 0.139017][ T0] ? handle_exception (init_task.c:?) 
[ 0.139017][ T0] ? pvclock_clocksource_read_nowd (arch/x86/mm/fault.c:1499) 
[ 0.139017][ T0] ? load_ucode_ap (arch/x86/kernel/cpu/microcode/core.c:177 arch/x86/kernel/cpu/microcode/core.c:187) 
[ 0.139017][ T0] ? pvclock_clocksource_read_nowd (arch/x86/mm/fault.c:1499) 
[ 0.139017][ T0] ? load_ucode_ap (arch/x86/kernel/cpu/microcode/core.c:177 arch/x86/kernel/cpu/microcode/core.c:187) 
[ 0.139017][ T0] start_secondary (arch/x86/kernel/smpboot.c:283) 
[ 0.139017][ T0] startup_32_smp (??:?) 
[    0.139017][    T0] Modules linked in:
[    0.139017][    T0] CR2: 00000000027025c4
[    0.139017][    T0] ---[ end trace 0000000000000000 ]---
[ 0.139017][ T0] EIP: load_ucode_ap (arch/x86/kernel/cpu/microcode/core.c:177 arch/x86/kernel/cpu/microcode/core.c:187) 
[ 0.139017][ T0] Code: c2 a1 34 69 ce c2 eb c7 83 c4 04 5e 5f 5b 5d c3 00 00 cc cc 00 00 cc cc 00 00 cc cc 00 00 cc 3e 8d 74 26 00 55 89 e5 53 57 56 <80> 3d c4 25 70 02 00 74 05 5e 5f 5b 5d c3 b8 01 00 00 00 31 c9 0f


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20230829/202308291502.2d535a18-oliver.sang@intel.com



-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki



* RE: [tglx-devel:x86/microcode] [x86/microcode/32] 7f1f6ace4c: BUG:unable_to_handle_page_fault_for_address
       [not found] ` <DS7PR11MB59651DCF238053376F5927C88FE6A@DS7PR11MB5965.namprd11.prod.outlook.com>
@ 2023-08-30 14:04   ` Zhuo, Qiuxu
  2023-08-30 17:08     ` Thomas Gleixner
  0 siblings, 1 reply; 4+ messages in thread
From: Zhuo, Qiuxu @ 2023-08-30 14:04 UTC
  To: Thomas Gleixner; +Cc: Du, Julie, oe-lkp@lists.linux.dev, lkp, Sang, Oliver

Hi Thomas,

> From: Sang, Oliver <oliver.sang@intel.com>
> Sent: Tuesday, August 29, 2023 4:00 PM
> To: Thomas Gleixner <tglx@linutronix.de>
> ...
> BUG:unable_to_handle_page_fault_for_address
> ...
> [    0.139017][    T0] BUG: unable to handle page fault for address: 027025c4
> ...
> [    0.139017][    T0] CR0: 80050033 CR2: 027025c4 CR3: 02cdf000 CR4:

Bit 31 (PG) of CR0 is set, which indicates that paging was enabled at the time of the fault, as expected with your patch [1].
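
For illustration only, here is a minimal user-space sketch of that check.
It is not kernel code, and the names CR0_PG and cr0_paging_enabled() are
made up for this example. It tests the architectural paging-enable bit
(bit 31) in a CR0 value such as the 80050033 from the oops above.

```c
#include <stdbool.h>
#include <stdint.h>

/* CR0.PG, the paging-enable flag, is bit 31 on x86. */
#define CR0_PG (UINT32_C(1) << 31)

/* Hypothetical helper: true if the given CR0 value has paging enabled. */
static bool cr0_paging_enabled(uint32_t cr0)
{
	return (cr0 & CR0_PG) != 0;
}
```

Calling cr0_paging_enabled(0x80050033) with the CR0 value from the log
returns true, which matches the reading above.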

> ...
> [ 0.139017][ T0] EIP: load_ucode_ap
> (arch/x86/kernel/cpu/microcode/core.c:177
> arch/x86/kernel/cpu/microcode/core.c:187)

The fault at this EIP comes from dereferencing a physical address while paging is already enabled:
    check_loader_disabled_ap() -> return *((bool *)__pa_nodebug(&dis_ucode_ldr));

Your patch [1] moved the 32-bit early ucode update to after paging is enabled.
So it seems that all of the following places need to access memory through virtual addresses instead.
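
To make the address arithmetic concrete, here is a simplified user-space
sketch, not the kernel implementation. It assumes the default 32-bit
3G/1G split, i.e. PAGE_OFFSET = 0xC0000000, which I have not verified
against the config used by the test robot. ASSUMED_PAGE_OFFSET,
sketch_pa() and sketch_va() are hypothetical names for this example.

```c
#include <stdint.h>

/* Assumption: default x86-32 3G/1G split; the tested config may differ. */
#define ASSUMED_PAGE_OFFSET UINT32_C(0xC0000000)

/* Roughly what a __pa-style conversion does for lowmem on 32-bit. */
static uint32_t sketch_pa(uint32_t vaddr)
{
	return vaddr - ASSUMED_PAGE_OFFSET;
}

/* The inverse: the virtual alias of a lowmem physical address. */
static uint32_t sketch_va(uint32_t paddr)
{
	return paddr + ASSUMED_PAGE_OFFSET;
}
```

Under that assumption, sketch_va(0x027025c4) gives 0xc27025c4, so CR2
holds the physical address of dis_ucode_ldr. With paging enabled the CPU
treats 027025c4 as a virtual address, and nothing is mapped there
(*pde = 00000000), hence the not-present fault.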

I can't reproduce the issue if I apply the following patch.
Please let me know if you want me to create a separate patch.

[1] 7f1f6ace4ce1 ("x86/microcode/32: Move early loading after paging enable")

Thanks!
-Qiuxu

diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
index 95f5dcf87bcb..c4886552bb0d 100644
--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -86,17 +86,11 @@ static u32 final_levels[] = {
 static bool amd_check_current_patch_level(void)
 {
        u32 lvl, dummy, i;
-       u32 *levels;

        native_rdmsr(MSR_AMD64_PATCH_LEVEL, lvl, dummy);

-       if (IS_ENABLED(CONFIG_X86_32))
-               levels = (u32 *)__pa_nodebug(&final_levels);
-       else
-               levels = final_levels;
-
-       for (i = 0; levels[i]; i++) {
-               if (lvl == levels[i])
+       for (i = 0; final_levels[i]; i++) {
+               if (lvl == final_levels[i])
                        return true;
        }
        return false;
@@ -104,36 +98,23 @@ static bool amd_check_current_patch_level(void)

 static bool __init check_loader_disabled_bsp(void)
 {
-       static const char *__dis_opt_str = "dis_ucode_ldr";
-
-#ifdef CONFIG_X86_32
-       const char *cmdline = (const char *)__pa_nodebug(boot_command_line);
-       const char *option  = (const char *)__pa_nodebug(__dis_opt_str);
-       bool *res = (bool *)__pa_nodebug(&dis_ucode_ldr);
-
-#else /* CONFIG_X86_64 */
-       const char *cmdline = boot_command_line;
-       const char *option  = __dis_opt_str;
-       bool *res = &dis_ucode_ldr;
-#endif
-
        /*
         * CPUID(1).ECX[31]: reserved for hypervisor use. This is still not
         * completely accurate as xen pv guests don't see that CPUID bit set but
         * that's good enough as they don't land on the BSP path anyway.
         */
        if (native_cpuid_ecx(1) & BIT(31))
-               return *res;
+               return dis_ucode_ldr;

        if (x86_cpuid_vendor() == X86_VENDOR_AMD) {
                if (amd_check_current_patch_level())
-                       return *res;
+                       return dis_ucode_ldr;
        }

-       if (cmdline_find_option_bool(cmdline, option) <= 0)
-               *res = false;
+       if (cmdline_find_option_bool(boot_command_line, "dis_ucode_ldr") <= 0)
+               dis_ucode_ldr = false;

-       return *res;
+       return dis_ucode_ldr;
 }

 void __init load_ucode_bsp(void)
@@ -173,11 +154,7 @@ void __init load_ucode_bsp(void)

 static bool check_loader_disabled_ap(void)
 {
-#ifdef CONFIG_X86_32
-       return *((bool *)__pa_nodebug(&dis_ucode_ldr));
-#else
        return dis_ucode_ldr;
-#endif
 }

 void load_ucode_ap(void)


* RE: [tglx-devel:x86/microcode] [x86/microcode/32] 7f1f6ace4c: BUG:unable_to_handle_page_fault_for_address
  2023-08-30 14:04   ` Zhuo, Qiuxu
@ 2023-08-30 17:08     ` Thomas Gleixner
  2023-08-31  2:38       ` Zhuo, Qiuxu
  0 siblings, 1 reply; 4+ messages in thread
From: Thomas Gleixner @ 2023-08-30 17:08 UTC
  To: Zhuo, Qiuxu; +Cc: Du, Julie, oe-lkp@lists.linux.dev, lkp, Sang, Oliver

On Wed, Aug 30 2023 at 14:04, Qiuxu Zhuo wrote:
>> From: Sang, Oliver <oliver.sang@intel.com>
>> Sent: Tuesday, August 29, 2023 4:00 PM
>> To: Thomas Gleixner <tglx@linutronix.de>
>> ...
>> BUG:unable_to_handle_page_fault_for_address
>> ...
>> [    0.139017][    T0] BUG: unable to handle page fault for address: 027025c4
>> ...
>> [    0.139017][    T0] CR0: 80050033 CR2: 027025c4 CR3: 02cdf000 CR4:
>
> Bit 31 (PG) of CR0 is set, which indicates that paging was enabled at the time of the fault, as expected with your patch [1].
>
>> ...
>> [ 0.139017][ T0] EIP: load_ucode_ap
>> (arch/x86/kernel/cpu/microcode/core.c:177
>> arch/x86/kernel/cpu/microcode/core.c:187)
>
> The fault at this EIP comes from dereferencing a physical address while paging is already enabled:
>     check_loader_disabled_ap() -> return *((bool *)__pa_nodebug(&dis_ucode_ldr));
>
> Your patch [1] moved the 32-bit early ucode update to after paging is enabled.
> So it seems that all of the following places need to access memory through virtual addresses instead.
>
> I can't reproduce the issue if I apply the following patch.
> Please let me know if you want me to create a separate patch.
>
> [1] 7f1f6ace4ce1 ("x86/microcode/32: Move early loading after paging enable")

Ooops. I completely forgot to reply to this report. I've fixed that up
today in the morning already and pushed out an updated version into that
x86/microcode branch. It's pretty much the same as you did +/- a detail.

Sorry for stealing your time.

Thanks,

        Thomas


* RE: [tglx-devel:x86/microcode] [x86/microcode/32] 7f1f6ace4c: BUG:unable_to_handle_page_fault_for_address
  2023-08-30 17:08     ` Thomas Gleixner
@ 2023-08-31  2:38       ` Zhuo, Qiuxu
  0 siblings, 0 replies; 4+ messages in thread
From: Zhuo, Qiuxu @ 2023-08-31  2:38 UTC
  To: Thomas Gleixner; +Cc: Du, Julie, oe-lkp@lists.linux.dev, lkp, Sang, Oliver

> From: Thomas Gleixner <tglx@linutronix.de>
> ...
> Ooops. I completely forgot to reply to this report. I've fixed that up today in
> the morning already and pushed out an updated version into that
> x86/microcode branch. It's pretty much the same as you did +/- a detail.

Hi Thomas,

Thanks for your reply.

I re-tested your branch [1] at its top commit [2].
The issue is fixed; I can no longer reproduce it.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git x86/microcode
[2] 85a3feda8043 ("x86/microcode/intel: Add a minimum required revision for late-loads")

-Qiuxu
