* kexec reboot failed due to commit 75d090fd167ac
@ 2023-08-29 11:48 Aaron Lu
2023-08-29 12:14 ` Bagas Sanjaya
0 siblings, 1 reply; 20+ messages in thread
From: Aaron Lu @ 2023-08-29 11:48 UTC (permalink / raw)
To: Kirill A. Shutemov; +Cc: linux-kernel
[-- Attachment #1: Type: text/plain, Size: 390 bytes --]
Hi Kirill,
Ever since v6.5-rc1, I found that I can not use kexec to reboot an Intel
SPR test machine. With git bisect, the first bad commit is 75d090fd167ac
("x86/tdx: Add unaccepted memory support").
I have no idea why a tdx change would affect it, I'm not doing anything
related to tdx.
Any ideas?
The kernel config is attached, let me know if you need any other info.
Thanks,
Aaron
[-- Attachment #2: config-spr.gz --]
[-- Type: application/gzip, Size: 62274 bytes --]
^ permalink raw reply [flat|nested] 20+ messages in thread

* Re: kexec reboot failed due to commit 75d090fd167ac
2023-08-29 11:48 kexec reboot failed due to commit 75d090fd167ac Aaron Lu
@ 2023-08-29 12:14 ` Bagas Sanjaya
2023-08-29 12:51 ` Aaron Lu
0 siblings, 1 reply; 20+ messages in thread
From: Bagas Sanjaya @ 2023-08-29 12:14 UTC (permalink / raw)
To: Aaron Lu, Kirill A. Shutemov, Borislav Petkov
Cc: Linux Kernel Mailing List, Linux Regressions

[-- Attachment #1: Type: text/plain, Size: 820 bytes --]

On Tue, Aug 29, 2023 at 07:48:16PM +0800, Aaron Lu wrote:
> Hi Kirill,
>
> Ever since v6.5-rc1, I found that I can not use kexec to reboot an Intel
> SPR test machine. With git bisect, the first bad commit is 75d090fd167ac
> ("x86/tdx: Add unaccepted memory support").
>
> I have no idea why a tdx change would affect it, I'm not doing anything
> related to tdx.
>
> Any ideas?
>
> The kernel config is attached, let me know if you need any other info.

Can you provide system logs (e.g. journalctl output) when attempting to
reboot?

Anyway, thanks for the regression report. I'm adding it to regzbot:

#regzbot ^introduced: 75d090fd167aca
#regzbot title: unable to reboot with kexec due to TDX unaccepted memory support

Thanks.

--
An old man doll... just what I always wanted! - Clara

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: kexec reboot failed due to commit 75d090fd167ac
2023-08-29 12:14 ` Bagas Sanjaya
@ 2023-08-29 12:51 ` Aaron Lu
2023-08-29 12:59 ` Kirill A. Shutemov
0 siblings, 1 reply; 20+ messages in thread
From: Aaron Lu @ 2023-08-29 12:51 UTC (permalink / raw)
To: Bagas Sanjaya
Cc: Kirill A. Shutemov, Borislav Petkov, Linux Kernel Mailing List, Linux Regressions

On Tue, Aug 29, 2023 at 07:14:59PM +0700, Bagas Sanjaya wrote:
> On Tue, Aug 29, 2023 at 07:48:16PM +0800, Aaron Lu wrote:
> > Hi Kirill,
> >
> > Ever since v6.5-rc1, I found that I can not use kexec to reboot an Intel
> > SPR test machine. With git bisect, the first bad commit is 75d090fd167ac
> > ("x86/tdx: Add unaccepted memory support").
> >
> > I have no idea why a tdx change would affect it, I'm not doing anything
> > related to tdx.
> >
> > Any ideas?
> >
> > The kernel config is attached, let me know if you need any other info.
>
> Can you provide system logs (e.g. journalctl output) when attempting to
> reboot?

... ...
Aug 29 19:15:59 be3af2b6059f systemd-shutdown[1]: Syncing filesystems and block devices.
Aug 29 19:15:59 be3af2b6059f systemd-shutdown[1]: Sending SIGTERM to remaining processes...
Aug 29 19:16:00 be3af2b6059f systemd-journald[2629]: Journal stopped
-- Boot 7e5173842b8b4be581886ff25ad0c02f --
Aug 29 19:24:27 be3af2b6059f kernel: microcode: updated early: 0x2b000161 -> 0x2b000461, date = 2023-03-13
Aug 29 19:24:27 be3af2b6059f kernel: Linux version 6.3.8-100.fc37.x86_64 (mockbuild@bkernel02.iad2.fedoraproject.org)
Aug 29 19:24:27 be3af2b6059f kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-6.3.8-100.fc37.x86_64 root=UUID=4381321e-e0>

First 3 lines are from the first kernel, then I attempted to kexec reboot
to 6.4.0-rc5-00009-g75d090fd167a and remote console hung with the
reboot message of the first kernel. After a while, I know kexec failed
so I power cycled the machine to boot into a distro kernel, that is the
last 3 lines. There is no trace of the failed boot.

I guess the kexeced kernel failed to start early in the boot process
so the log is probably only available in serial, if any. Unfortunately,
there is no serial support for this machine.

Thanks,
Aaron

> Anyway, thanks for the regression report. I'm adding it to regzbot:
>
> #regzbot ^introduced: 75d090fd167aca
> #regzbot title: unable to reboot with kexec due to TDX unaccepted memory support
>
> Thanks.
>
> --
> An old man doll... just what I always wanted! - Clara

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: kexec reboot failed due to commit 75d090fd167ac
2023-08-29 12:51 ` Aaron Lu
@ 2023-08-29 12:59 ` Kirill A. Shutemov
2023-08-29 14:04 ` Aaron Lu
0 siblings, 1 reply; 20+ messages in thread
From: Kirill A. Shutemov @ 2023-08-29 12:59 UTC (permalink / raw)
To: Aaron Lu
Cc: Bagas Sanjaya, Borislav Petkov, Linux Kernel Mailing List, Linux Regressions

On Tue, Aug 29, 2023 at 08:51:34PM +0800, Aaron Lu wrote:
> On Tue, Aug 29, 2023 at 07:14:59PM +0700, Bagas Sanjaya wrote:
> > On Tue, Aug 29, 2023 at 07:48:16PM +0800, Aaron Lu wrote:
> > > Hi Kirill,
> > >
> > > Ever since v6.5-rc1, I found that I can not use kexec to reboot an Intel
> > > SPR test machine. With git bisect, the first bad commit is 75d090fd167ac
> > > ("x86/tdx: Add unaccepted memory support").
> > >
> > > I have no idea why a tdx change would affect it, I'm not doing anything
> > > related to tdx.
> > >
> > > Any ideas?

Are we talking about bare metal? Or is it kexec in a VM?

> > > The kernel config is attached, let me know if you need any other info.
> >
> > Can you provide system logs (e.g. journalctl output) when attempting to
> > reboot?
>
> ... ...
> Aug 29 19:15:59 be3af2b6059f systemd-shutdown[1]: Syncing filesystems and block devices.
> Aug 29 19:15:59 be3af2b6059f systemd-shutdown[1]: Sending SIGTERM to remaining processes...
> Aug 29 19:16:00 be3af2b6059f systemd-journald[2629]: Journal stopped
> -- Boot 7e5173842b8b4be581886ff25ad0c02f --
> Aug 29 19:24:27 be3af2b6059f kernel: microcode: updated early: 0x2b000161 -> 0x2b000461, date = 2023-03-13
> Aug 29 19:24:27 be3af2b6059f kernel: Linux version 6.3.8-100.fc37.x86_64 (mockbuild@bkernel02.iad2.fedoraproject.org)
> Aug 29 19:24:27 be3af2b6059f kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-6.3.8-100.fc37.x86_64 root=UUID=4381321e-e0>
>
> First 3 lines are from the first kernel, then I attempted to kexec reboot
> to 6.4.0-rc5-00009-g75d090fd167a and remote console hung with the
> reboot message of the first kernel. After a while, I know kexec failed
> so I power cycled the machine to boot into a distro kernel, that is the
> last 3 lines. There is no trace of the failed boot.
>
> I guess the kexeced kernel failed to start early in the boot process
> so the log is probably only available in serial, if any. Unfortunately,
> there is no serial support for this machine.

Could you show dmesg of the first kernel before kexec?

--
Kiryl Shutsemau / Kirill A. Shutemov

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: kexec reboot failed due to commit 75d090fd167ac
2023-08-29 12:59 ` Kirill A. Shutemov
@ 2023-08-29 14:04 ` Aaron Lu
2023-09-07 13:14 ` Kirill A. Shutemov
0 siblings, 1 reply; 20+ messages in thread
From: Aaron Lu @ 2023-08-29 14:04 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Bagas Sanjaya, Borislav Petkov, Linux Kernel Mailing List, Linux Regressions

[-- Attachment #1: Type: text/plain, Size: 2531 bytes --]

On Tue, Aug 29, 2023 at 03:59:39PM +0300, Kirill A. Shutemov wrote:
> On Tue, Aug 29, 2023 at 08:51:34PM +0800, Aaron Lu wrote:
> > On Tue, Aug 29, 2023 at 07:14:59PM +0700, Bagas Sanjaya wrote:
> > > On Tue, Aug 29, 2023 at 07:48:16PM +0800, Aaron Lu wrote:
> > > > Hi Kirill,
> > > >
> > > > Ever since v6.5-rc1, I found that I can not use kexec to reboot an Intel
> > > > SPR test machine. With git bisect, the first bad commit is 75d090fd167ac
> > > > ("x86/tdx: Add unaccepted memory support").
> > > >
> > > > I have no idea why a tdx change would affect it, I'm not doing anything
> > > > related to tdx.
> > > >
> > > > Any ideas?
>
> Are we talking about bare metal? Or is it kexec in a VM?

Bare metal.

> > > > The kernel config is attached, let me know if you need any other info.
> > >
> > > Can you provide system logs (e.g. journalctl output) when attempting to
> > > reboot?
> >
> > ... ...
> > Aug 29 19:15:59 be3af2b6059f systemd-shutdown[1]: Syncing filesystems and block devices.
> > Aug 29 19:15:59 be3af2b6059f systemd-shutdown[1]: Sending SIGTERM to remaining processes...
> > Aug 29 19:16:00 be3af2b6059f systemd-journald[2629]: Journal stopped
> > -- Boot 7e5173842b8b4be581886ff25ad0c02f --
> > Aug 29 19:24:27 be3af2b6059f kernel: microcode: updated early: 0x2b000161 -> 0x2b000461, date = 2023-03-13
> > Aug 29 19:24:27 be3af2b6059f kernel: Linux version 6.3.8-100.fc37.x86_64 (mockbuild@bkernel02.iad2.fedoraproject.org)
> > Aug 29 19:24:27 be3af2b6059f kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-6.3.8-100.fc37.x86_64 root=UUID=4381321e-e0>
> >
> > First 3 lines are from the first kernel, then I attempted to kexec reboot
> > to 6.4.0-rc5-00009-g75d090fd167a and remote console hung with the
> > reboot message of the first kernel. After a while, I know kexec failed
> > so I power cycled the machine to boot into a distro kernel, that is the
> > last 3 lines. There is no trace of the failed boot.
> >
> > I guess the kexeced kernel failed to start early in the boot process
> > so the log is probably only available in serial, if any. Unfortunately,
> > there is no serial support for this machine.
>
> Could you show dmesg of the first kernel before kexec?

Attached.

BTW, kexec is invoked like this:
kver=6.4.0-rc5-00009-g75d090fd167a
kdir=$HOME/kernels/$kver
sudo kexec -l $kdir/vmlinuz-$kver --initrd=$kdir/initramfs-$kver.img --append="root=UUID=4381321e-e01e-455a-9d46-5e8c4c5b2d02 ro net.ifnames=0 acpi_rsdp=0x728e8014 no_hash_pointers sched_verbose selinux=0"

Thanks,
Aaron

[-- Attachment #2: dmesg_spr.gz --]
[-- Type: application/gzip, Size: 69139 bytes --]

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: kexec reboot failed due to commit 75d090fd167ac
2023-08-29 14:04 ` Aaron Lu
@ 2023-09-07 13:14 ` Kirill A. Shutemov
2023-09-08 6:02 ` Aaron Lu
0 siblings, 1 reply; 20+ messages in thread
From: Kirill A. Shutemov @ 2023-09-07 13:14 UTC (permalink / raw)
To: Aaron Lu
Cc: Bagas Sanjaya, Borislav Petkov, Linux Kernel Mailing List, Linux Regressions

On Tue, Aug 29, 2023 at 10:04:51PM +0800, Aaron Lu wrote:
> > Could you show dmesg of the first kernel before kexec?
>
> Attached.
>
> BTW, kexec is invoked like this:
> kver=6.4.0-rc5-00009-g75d090fd167a
> kdir=$HOME/kernels/$kver
> sudo kexec -l $kdir/vmlinuz-$kver --initrd=$kdir/initramfs-$kver.img --append="root=UUID=4381321e-e01e-455a-9d46-5e8c4c5b2d02 ro net.ifnames=0 acpi_rsdp=0x728e8014 no_hash_pointers sched_verbose selinux=0"

I don't understand why it happens.

Could you check if this patch changes anything:

diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 94b7abcf624b..172c476ff6f3 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -456,10 +456,12 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
 
 	debug_putstr("\nDecompressing Linux... ");
 
+#if 0
 	if (init_unaccepted_memory()) {
 		debug_putstr("Accepting memory... ");
 		accept_memory(__pa(output), __pa(output) + needed_size);
 	}
+#endif
 
 	__decompress(input_data, input_len, NULL, NULL, output, output_len,
 			NULL, error);
--
Kiryl Shutsemau / Kirill A. Shutemov

^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: kexec reboot failed due to commit 75d090fd167ac
2023-09-07 13:14 ` Kirill A. Shutemov
@ 2023-09-08 6:02 ` Aaron Lu
2023-09-08 12:32 ` Kirill A. Shutemov
0 siblings, 1 reply; 20+ messages in thread
From: Aaron Lu @ 2023-09-08 6:02 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Bagas Sanjaya, Borislav Petkov, Linux Kernel Mailing List, Linux Regressions

On Thu, Sep 07, 2023 at 04:14:09PM +0300, Kirill A. Shutemov wrote:
> On Tue, Aug 29, 2023 at 10:04:51PM +0800, Aaron Lu wrote:
> > > Could you show dmesg of the first kernel before kexec?
> >
> > Attached.
> >
> > BTW, kexec is invoked like this:
> > kver=6.4.0-rc5-00009-g75d090fd167a
> > kdir=$HOME/kernels/$kver
> > sudo kexec -l $kdir/vmlinuz-$kver --initrd=$kdir/initramfs-$kver.img --append="root=UUID=4381321e-e01e-455a-9d46-5e8c4c5b2d02 ro net.ifnames=0 acpi_rsdp=0x728e8014 no_hash_pointers sched_verbose selinux=0"
>
> I don't understand why it happens.
>
> Could you check if this patch changes anything:
>
> diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
> index 94b7abcf624b..172c476ff6f3 100644
> --- a/arch/x86/boot/compressed/misc.c
> +++ b/arch/x86/boot/compressed/misc.c
> @@ -456,10 +456,12 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
>
>  	debug_putstr("\nDecompressing Linux... ");
>
> +#if 0
>  	if (init_unaccepted_memory()) {
>  		debug_putstr("Accepting memory... ");
>  		accept_memory(__pa(output), __pa(output) + needed_size);
>  	}
> +#endif
>
>  	__decompress(input_data, input_len, NULL, NULL, output, output_len,
>  			NULL, error);
> --

It solved the problem.

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: kexec reboot failed due to commit 75d090fd167ac
2023-09-08 6:02 ` Aaron Lu
@ 2023-09-08 12:32 ` Kirill A. Shutemov
2023-09-08 15:58 ` Kees Cook
0 siblings, 1 reply; 20+ messages in thread
From: Kirill A. Shutemov @ 2023-09-08 12:32 UTC (permalink / raw)
To: Aaron Lu, Kees Cook
Cc: Bagas Sanjaya, Borislav Petkov, Linux Kernel Mailing List, Linux Regressions

On Fri, Sep 08, 2023 at 02:02:30PM +0800, Aaron Lu wrote:
> On Thu, Sep 07, 2023 at 04:14:09PM +0300, Kirill A. Shutemov wrote:
> > On Tue, Aug 29, 2023 at 10:04:51PM +0800, Aaron Lu wrote:
> > > > Could you show dmesg of the first kernel before kexec?
> > >
> > > Attached.
> > >
> > > BTW, kexec is invoked like this:
> > > kver=6.4.0-rc5-00009-g75d090fd167a
> > > kdir=$HOME/kernels/$kver
> > > sudo kexec -l $kdir/vmlinuz-$kver --initrd=$kdir/initramfs-$kver.img --append="root=UUID=4381321e-e01e-455a-9d46-5e8c4c5b2d02 ro net.ifnames=0 acpi_rsdp=0x728e8014 no_hash_pointers sched_verbose selinux=0"
> >
> > I don't understand why it happens.
> >
> > Could you check if this patch changes anything:
> >
> > diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
> > index 94b7abcf624b..172c476ff6f3 100644
> > --- a/arch/x86/boot/compressed/misc.c
> > +++ b/arch/x86/boot/compressed/misc.c
> > @@ -456,10 +456,12 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
> >
> >  	debug_putstr("\nDecompressing Linux... ");
> >
> > +#if 0
> >  	if (init_unaccepted_memory()) {
> >  		debug_putstr("Accepting memory... ");
> >  		accept_memory(__pa(output), __pa(output) + needed_size);
> >  	}
> > +#endif
> >
> >  	__decompress(input_data, input_len, NULL, NULL, output, output_len,
> >  			NULL, error);
> > --
>
> It solved the problem.

Looks like increasing BOOT_INIT_PGT_SIZE fixes the issue. I don't yet
understand why and how unaccepted memory is involved. I will look more
into it.

Enabling CONFIG_RANDOMIZE_BASE also makes the issue go away.

Kees, maybe you have a clue?

diff --git a/arch/x86/include/asm/boot.h b/arch/x86/include/asm/boot.h
index 9191280d9ea3..26ccce41d781 100644
--- a/arch/x86/include/asm/boot.h
+++ b/arch/x86/include/asm/boot.h
@@ -40,7 +40,7 @@
 #ifdef CONFIG_X86_64
 # define BOOT_STACK_SIZE	0x4000
 
-# define BOOT_INIT_PGT_SIZE	(6*4096)
+# define BOOT_INIT_PGT_SIZE	(7*4096)
 # ifdef CONFIG_RANDOMIZE_BASE
 /*
  * Assuming all cross the 512GB boundary:
--
Kiryl Shutsemau / Kirill A. Shutemov

^ permalink raw reply related [flat|nested] 20+ messages in thread
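[Editorial aside] The constant changed in the diff above is a multiple of 4096, i.e. a count of 4 KiB page-table pages (six before, seven after). As a rough illustration of how such a budget relates to what gets identity-mapped, the user-space sketch below counts the table pages needed to cover a set of physical ranges. It is not kernel code and makes no claim about the root cause here. It assumes 2 MiB leaf mappings, so PMD tables are the lowest level allocated, and that every range has a non-zero size; `struct range`, `count_tables()` and `pgt_pages()` are hypothetical names introduced only for this example.

```c
#include <stdint.h>

/* Hypothetical description of one physical range to identity-map. */
struct range {
	uint64_t start;
	uint64_t size;	/* assumed non-zero */
};

/*
 * Count how many distinct (addr >> shift) windows the ranges touch.
 * Each distinct window needs its own table at the level below.
 * The sketch tracks at most 64 windows, enough for small examples.
 */
static unsigned count_tables(const struct range *r, unsigned n, unsigned shift)
{
	uint64_t seen[64];
	unsigned nseen = 0;

	for (unsigned i = 0; i < n; i++) {
		uint64_t first = r[i].start >> shift;
		uint64_t last = (r[i].start + r[i].size - 1) >> shift;

		for (uint64_t v = first; v <= last; v++) {
			unsigned j;

			for (j = 0; j < nseen; j++)
				if (seen[j] == v)
					break;
			if (j == nseen && nseen < 64)
				seen[nseen++] = v;
		}
	}
	return nseen;
}

/* Page-table pages needed with 2 MiB leaves, for 4- or 5-level paging. */
static unsigned pgt_pages(const struct range *r, unsigned n, int five_level)
{
	unsigned pages = 1;				/* top-level table (PGD) */

	if (five_level)
		pages += count_tables(r, n, 48);	/* P4D tables, one per 256 TiB */
	pages += count_tables(r, n, 39);		/* PUD tables, one per 512 GiB */
	pages += count_tables(r, n, 30);		/* PMD tables, one per 1 GiB */
	return pages;
}
```

For four small ranges that sit in four different 1 GiB windows below 512 GiB, `pgt_pages()` returns 6 with 4-level paging and 7 with 5-level paging, because the extra level adds one more table. A range that straddles a 1 GiB boundary costs an additional PMD table.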
* Re: kexec reboot failed due to commit 75d090fd167ac
2023-09-08 12:32 ` Kirill A. Shutemov
@ 2023-09-08 15:58 ` Kees Cook
2023-09-08 16:17 ` Ard Biesheuvel
0 siblings, 1 reply; 20+ messages in thread
From: Kees Cook @ 2023-09-08 15:58 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Aaron Lu, Bagas Sanjaya, Borislav Petkov, Linux Kernel Mailing List, Linux Regressions, ardb

On Fri, Sep 08, 2023 at 03:32:33PM +0300, Kirill A. Shutemov wrote:
> On Fri, Sep 08, 2023 at 02:02:30PM +0800, Aaron Lu wrote:
> > On Thu, Sep 07, 2023 at 04:14:09PM +0300, Kirill A. Shutemov wrote:
> > > On Tue, Aug 29, 2023 at 10:04:51PM +0800, Aaron Lu wrote:
> > > > > Could you show dmesg of the first kernel before kexec?
> > > >
> > > > Attached.
> > > >
> > > > BTW, kexec is invoked like this:
> > > > kver=6.4.0-rc5-00009-g75d090fd167a
> > > > kdir=$HOME/kernels/$kver
> > > > sudo kexec -l $kdir/vmlinuz-$kver --initrd=$kdir/initramfs-$kver.img --append="root=UUID=4381321e-e01e-455a-9d46-5e8c4c5b2d02 ro net.ifnames=0 acpi_rsdp=0x728e8014 no_hash_pointers sched_verbose selinux=0"
> > >
> > > I don't understand why it happens.
> > >
> > > Could you check if this patch changes anything:
> > >
> > > diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
> > > index 94b7abcf624b..172c476ff6f3 100644
> > > --- a/arch/x86/boot/compressed/misc.c
> > > +++ b/arch/x86/boot/compressed/misc.c
> > > @@ -456,10 +456,12 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
> > >
> > >  	debug_putstr("\nDecompressing Linux... ");
> > >
> > > +#if 0
> > >  	if (init_unaccepted_memory()) {
> > >  		debug_putstr("Accepting memory... ");
> > >  		accept_memory(__pa(output), __pa(output) + needed_size);
> > >  	}
> > > +#endif
> > >
> > >  	__decompress(input_data, input_len, NULL, NULL, output, output_len,
> > >  			NULL, error);
> > > --
> >
> > It solved the problem.
>
> Looks like increasing BOOT_INIT_PGT_SIZE fixes the issue. I don't yet
> understand why and how unaccepted memory is involved. I will look more
> into it.
>
> Enabling CONFIG_RANDOMIZE_BASE also makes the issue go away.

Is this perhaps just luck? I.e. does it ever break on, say, 1000 boot
attempts? (i.e. maybe some position is bad and KASLR happens to usually
avoid it?)

> Kees, maybe you have a clue?

The only thing I can think of is that something isn't being counted
correctly due to the size of code, and it just happens that this commit
makes the code large enough to exceed some set of mappings?

>
> diff --git a/arch/x86/include/asm/boot.h b/arch/x86/include/asm/boot.h
> index 9191280d9ea3..26ccce41d781 100644
> --- a/arch/x86/include/asm/boot.h
> +++ b/arch/x86/include/asm/boot.h
> @@ -40,7 +40,7 @@
>  #ifdef CONFIG_X86_64
>  # define BOOT_STACK_SIZE	0x4000
>
> -# define BOOT_INIT_PGT_SIZE	(6*4096)
> +# define BOOT_INIT_PGT_SIZE	(7*4096)

That's why this might be working, for example? How large is the boot
image before/after the commit, etc?

>  # ifdef CONFIG_RANDOMIZE_BASE
>  /*
>   * Assuming all cross the 512GB boundary:
> --
> Kiryl Shutsemau / Kirill A. Shutemov

-Kees

--
Kees Cook

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: kexec reboot failed due to commit 75d090fd167ac
2023-09-08 15:58 ` Kees Cook
@ 2023-09-08 16:17 ` Ard Biesheuvel
2023-09-09 11:32 ` Kirill A. Shutemov
0 siblings, 1 reply; 20+ messages in thread
From: Ard Biesheuvel @ 2023-09-08 16:17 UTC (permalink / raw)
To: Kees Cook
Cc: Kirill A. Shutemov, Aaron Lu, Bagas Sanjaya, Borislav Petkov, Linux Kernel Mailing List, Linux Regressions

On Fri, Sep 8, 2023 at 5:58 PM Kees Cook <keescook@chromium.org> wrote:
>
> On Fri, Sep 08, 2023 at 03:32:33PM +0300, Kirill A. Shutemov wrote:
> > On Fri, Sep 08, 2023 at 02:02:30PM +0800, Aaron Lu wrote:
> > > On Thu, Sep 07, 2023 at 04:14:09PM +0300, Kirill A. Shutemov wrote:
> > > > On Tue, Aug 29, 2023 at 10:04:51PM +0800, Aaron Lu wrote:
> > > > > > Could you show dmesg of the first kernel before kexec?
> > > > >
> > > > > Attached.
> > > > >
> > > > > BTW, kexec is invoked like this:
> > > > > kver=6.4.0-rc5-00009-g75d090fd167a
> > > > > kdir=$HOME/kernels/$kver
> > > > > sudo kexec -l $kdir/vmlinuz-$kver --initrd=$kdir/initramfs-$kver.img --append="root=UUID=4381321e-e01e-455a-9d46-5e8c4c5b2d02 ro net.ifnames=0 acpi_rsdp=0x728e8014 no_hash_pointers sched_verbose selinux=0"
> > > >
> > > > I don't understand why it happens.
> > > >
> > > > Could you check if this patch changes anything:
> > > >
> > > > diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
> > > > index 94b7abcf624b..172c476ff6f3 100644
> > > > --- a/arch/x86/boot/compressed/misc.c
> > > > +++ b/arch/x86/boot/compressed/misc.c
> > > > @@ -456,10 +456,12 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
> > > >
> > > >  	debug_putstr("\nDecompressing Linux... ");
> > > >
> > > > +#if 0
> > > >  	if (init_unaccepted_memory()) {
> > > >  		debug_putstr("Accepting memory... ");
> > > >  		accept_memory(__pa(output), __pa(output) + needed_size);
> > > >  	}
> > > > +#endif
> > > >
> > > >  	__decompress(input_data, input_len, NULL, NULL, output, output_len,
> > > >  			NULL, error);
> > > > --
> > >
> > > It solved the problem.
> >
> > Looks like increasing BOOT_INIT_PGT_SIZE fixes the issue. I don't yet
> > understand why and how unaccepted memory is involved. I will look more
> > into it.
> >
> > Enabling CONFIG_RANDOMIZE_BASE also makes the issue go away.
>
> Is this perhaps just luck? I.e. does it ever break on, say, 1000 boot
> attempts? (i.e. maybe some position is bad and KASLR happens to usually
> avoid it?)
>
> > Kees, maybe you have a clue?
>
> The only thing I can think of is that something isn't being counted
> correctly due to the size of code, and it just happens that this commit
> makes the code large enough to exceed some set of mappings?
>
> >
> > diff --git a/arch/x86/include/asm/boot.h b/arch/x86/include/asm/boot.h
> > index 9191280d9ea3..26ccce41d781 100644
> > --- a/arch/x86/include/asm/boot.h
> > +++ b/arch/x86/include/asm/boot.h
> > @@ -40,7 +40,7 @@
> >  #ifdef CONFIG_X86_64
> >  # define BOOT_STACK_SIZE	0x4000
> >
> > -# define BOOT_INIT_PGT_SIZE	(6*4096)
> > +# define BOOT_INIT_PGT_SIZE	(7*4096)
>
> That's why this might be working, for example? How large is the boot
> image before/after the commit, etc?
>

Not sure why these changes would make a difference here, but choking
on accept_memory() on a non-TDX suggests that init_unaccepted_memory()
is poking into unmapped memory before it even decides that the
unaccepted memory does not exist.

init_unaccepted_memory() has

    ret = efi_get_conf_table(boot_params, &cfg_table_pa, &cfg_table_len);
    if (ret) {
        warn("EFI config table not found.");
        return false;
    }

which looks for <guid, phys_addr> tuples in an array pointed to by the
EFI system table, and if either of those is not mapped, things can be
expected to explode.

The only odd thing there is that this code is invoked after setting up
the 'demand paging' logic in the decompressor.

If you haven't yet, could you please retry the kexec boot with
earlyprintk=tty<insert your UART params here>?

^ permalink raw reply [flat|nested] 20+ messages in thread
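[Editorial aside] The lookup described in the message above, a scan over <guid, phys_addr> tuples reached through the EFI system table, can be sketched in plain user-space C as follows. This is a simplified illustration and not the decompressor's actual code: `struct guid`, `struct cfg_entry`, `struct sys_table` and `find_cfg_table()` are hypothetical names, and the layout is reduced to the fields needed to show where memory is read.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte identifier, standing in for an EFI GUID. */
struct guid {
	uint8_t b[16];
};

/* One <guid, phys_addr> tuple in the configuration table array. */
struct cfg_entry {
	struct guid guid;
	uint64_t table;		/* physical address of the vendor table */
};

/* Reduced stand-in for the system table: entry count and array pointer. */
struct sys_table {
	uint64_t nr_tables;
	const struct cfg_entry *tables;
};

/*
 * Return the address stored for 'want', or 0 if there is no match.
 * Both the read of st->nr_tables and each read of st->tables[i] touch
 * memory; in an early boot environment either one faults if the page
 * holding it has no mapping.
 */
static uint64_t find_cfg_table(const struct sys_table *st,
			       const struct guid *want)
{
	for (uint64_t i = 0; i < st->nr_tables; i++) {
		if (!memcmp(&st->tables[i].guid, want, sizeof(*want)))
			return st->tables[i].table;
	}
	return 0;
}
```

The point of the sketch is that the scan reads from two separate places, the system table itself and the array it points to, so both must be reachable before the function can even report that nothing was found.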
* Re: kexec reboot failed due to commit 75d090fd167ac
2023-09-08 16:17 ` Ard Biesheuvel
@ 2023-09-09 11:32 ` Kirill A. Shutemov
2023-09-11 14:56 ` Dave Young
0 siblings, 1 reply; 20+ messages in thread
From: Kirill A. Shutemov @ 2023-09-09 11:32 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: Kees Cook, Aaron Lu, Bagas Sanjaya, Borislav Petkov, Linux Kernel Mailing List, Linux Regressions

On Fri, Sep 08, 2023 at 06:17:53PM +0200, Ard Biesheuvel wrote:
> On Fri, Sep 8, 2023 at 5:58 PM Kees Cook <keescook@chromium.org> wrote:
> >
> > On Fri, Sep 08, 2023 at 03:32:33PM +0300, Kirill A. Shutemov wrote:
> > > On Fri, Sep 08, 2023 at 02:02:30PM +0800, Aaron Lu wrote:
> > > > On Thu, Sep 07, 2023 at 04:14:09PM +0300, Kirill A. Shutemov wrote:
> > > > > On Tue, Aug 29, 2023 at 10:04:51PM +0800, Aaron Lu wrote:
> > > > > > > Could you show dmesg of the first kernel before kexec?
> > > > > >
> > > > > > Attached.
> > > > > >
> > > > > > BTW, kexec is invoked like this:
> > > > > > kver=6.4.0-rc5-00009-g75d090fd167a
> > > > > > kdir=$HOME/kernels/$kver
> > > > > > sudo kexec -l $kdir/vmlinuz-$kver --initrd=$kdir/initramfs-$kver.img --append="root=UUID=4381321e-e01e-455a-9d46-5e8c4c5b2d02 ro net.ifnames=0 acpi_rsdp=0x728e8014 no_hash_pointers sched_verbose selinux=0"
> > > > >
> > > > > I don't understand why it happens.
> > > > >
> > > > > Could you check if this patch changes anything:
> > > > >
> > > > > diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
> > > > > index 94b7abcf624b..172c476ff6f3 100644
> > > > > --- a/arch/x86/boot/compressed/misc.c
> > > > > +++ b/arch/x86/boot/compressed/misc.c
> > > > > @@ -456,10 +456,12 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
> > > > >
> > > > >  	debug_putstr("\nDecompressing Linux... ");
> > > > >
> > > > > +#if 0
> > > > >  	if (init_unaccepted_memory()) {
> > > > >  		debug_putstr("Accepting memory... ");
> > > > >  		accept_memory(__pa(output), __pa(output) + needed_size);
> > > > >  	}
> > > > > +#endif
> > > > >
> > > > >  	__decompress(input_data, input_len, NULL, NULL, output, output_len,
> > > > >  			NULL, error);
> > > > > --
> > > >
> > > > It solved the problem.
> > >
> > > Looks like increasing BOOT_INIT_PGT_SIZE fixes the issue. I don't yet
> > > understand why and how unaccepted memory is involved. I will look more
> > > into it.
> > >
> > > Enabling CONFIG_RANDOMIZE_BASE also makes the issue go away.
> >
> > Is this perhaps just luck? I.e. does it ever break on, say, 1000 boot
> > attempts? (i.e. maybe some position is bad and KASLR happens to usually
> > avoid it?)

Yes, it can be luck.

> > > Kees, maybe you have a clue?
> >
> > The only thing I can think of is that something isn't being counted
> > correctly due to the size of code, and it just happens that this commit
> > makes the code large enough to exceed some set of mappings?
> >
> > >
> > > diff --git a/arch/x86/include/asm/boot.h b/arch/x86/include/asm/boot.h
> > > index 9191280d9ea3..26ccce41d781 100644
> > > --- a/arch/x86/include/asm/boot.h
> > > +++ b/arch/x86/include/asm/boot.h
> > > @@ -40,7 +40,7 @@
> > >  #ifdef CONFIG_X86_64
> > >  # define BOOT_STACK_SIZE	0x4000
> > >
> > > -# define BOOT_INIT_PGT_SIZE	(6*4096)
> > > +# define BOOT_INIT_PGT_SIZE	(7*4096)
> >
> > That's why this might be working, for example? How large is the boot
> > image before/after the commit, etc?
> >
>
> Not sure why these changes would make a difference here, but choking
> on accept_memory() on a non-TDX suggests that init_unaccepted_memory()
> is poking into unmapped memory before it even decides that the
> unaccepted memory does not exist.
>
> init_unaccepted_memory() has
>
>     ret = efi_get_conf_table(boot_params, &cfg_table_pa, &cfg_table_len);
>     if (ret) {
>         warn("EFI config table not found.");
>         return false;
>     }
>
> which looks for <guid, phys_addr> tuples in an array pointed to by the
> EFI system table, and if either of those is not mapped, things can be
> expected to explode.
>
> The only odd thing there is that this code is invoked after setting up
> the 'demand paging' logic in the decompressor.
>
> If you haven't yet, could you please retry the kexec boot with
> earlyprintk=tty<insert your UART params here>?

early console in extract_kernel
input_data: 0x000000807eb433a8
input_len: 0x0000000000d26271
output: 0x000000807b000000
output_len: 0x0000000004800c10
kernel_total_size: 0x0000000003e28000
needed_size: 0x0000000004a00000
trampoline_32bit: 0x000000000009d000

Decompressing Linux... out of pgt_buf in arch/x86/boot/compressed/ident_map_64.c!?
pages->pgt_buf_offset: 0x0000000000006000
pages->pgt_buf_size: 0x0000000000006000


Error: kernel_ident_mapping_init() failed

It crashes on #PF due to stbl->nr_tables dereference in
efi_get_conf_table() called from init_unaccepted_memory().

I don't see anything special about stbl location: 0x775d6018.

One other bit of information: disabling 5-level paging also helps the
issue.

I will debug further.

--
Kiryl Shutsemau / Kirill A. Shutemov

^ permalink raw reply [flat|nested] 20+ messages in thread
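[Editorial aside] In the console output above, `pgt_buf_offset` and `pgt_buf_size` are both 0x6000, which is 6 * 4096: every page of the buffer has been handed out and the next request fails. A minimal user-space sketch of that kind of bump allocator is shown below. It is an illustration under assumptions, not the code in ident_map_64.c; `struct pgt_pool` and `pool_alloc_page()` are hypothetical names.

```c
#include <stddef.h>
#include <stdio.h>

#define SKETCH_PAGE_SIZE 4096UL

/* Hypothetical pool: a fixed buffer carved into 4 KiB page-table pages. */
struct pgt_pool {
	unsigned char *buf;	/* start of the buffer */
	unsigned long size;	/* total bytes available */
	unsigned long offset;	/* bytes already handed out */
};

/*
 * Hand out the next page, or NULL once the buffer is exhausted.
 * Nothing is ever freed, so offset only grows until it equals size.
 */
static void *pool_alloc_page(struct pgt_pool *p)
{
	void *page;

	if (p->offset >= p->size) {
		fprintf(stderr, "out of pgt_buf: offset=%#lx size=%#lx\n",
			p->offset, p->size);
		return NULL;
	}

	page = p->buf + p->offset;
	p->offset += SKETCH_PAGE_SIZE;
	return page;
}
```

With a 6-page buffer, six calls succeed and the seventh returns NULL with offset equal to size, the same state the log reports. Any mapping request that needs one more table page than the budget allows ends up on that failure path.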
* Re: kexec reboot failed due to commit 75d090fd167ac
2023-09-09 11:32 ` Kirill A. Shutemov
@ 2023-09-11 14:56 ` Dave Young
2023-09-11 14:57 ` Kirill A. Shutemov
0 siblings, 1 reply; 20+ messages in thread
From: Dave Young @ 2023-09-11 14:56 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Ard Biesheuvel, Kees Cook, Aaron Lu, Bagas Sanjaya, Borislav Petkov, Linux Kernel Mailing List, Linux Regressions, kexec

Add kexec list in cc

On Sat, 9 Sept 2023 at 19:34, Kirill A. Shutemov <kirill.shutemov@linux.intel.com> wrote:
>
> On Fri, Sep 08, 2023 at 06:17:53PM +0200, Ard Biesheuvel wrote:
> > On Fri, Sep 8, 2023 at 5:58 PM Kees Cook <keescook@chromium.org> wrote:
> > >
> > > On Fri, Sep 08, 2023 at 03:32:33PM +0300, Kirill A. Shutemov wrote:
> > > > On Fri, Sep 08, 2023 at 02:02:30PM +0800, Aaron Lu wrote:
> > > > > On Thu, Sep 07, 2023 at 04:14:09PM +0300, Kirill A. Shutemov wrote:
> > > > > > On Tue, Aug 29, 2023 at 10:04:51PM +0800, Aaron Lu wrote:
> > > > > > > > Could you show dmesg of the first kernel before kexec?
> > > > > > >
> > > > > > > Attached.
> > > > > > >
> > > > > > > BTW, kexec is invoked like this:
> > > > > > > kver=6.4.0-rc5-00009-g75d090fd167a
> > > > > > > kdir=$HOME/kernels/$kver
> > > > > > > sudo kexec -l $kdir/vmlinuz-$kver --initrd=$kdir/initramfs-$kver.img --append="root=UUID=4381321e-e01e-455a-9d46-5e8c4c5b2d02 ro net.ifnames=0 acpi_rsdp=0x728e8014 no_hash_pointers sched_verbose selinux=0"
> > > > > >
> > > > > > I don't understand why it happens.
> > > > > >
> > > > > > Could you check if this patch changes anything:
> > > > > >
> > > > > > diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
> > > > > > index 94b7abcf624b..172c476ff6f3 100644
> > > > > > --- a/arch/x86/boot/compressed/misc.c
> > > > > > +++ b/arch/x86/boot/compressed/misc.c
> > > > > > @@ -456,10 +456,12 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
> > > > > >
> > > > > >  	debug_putstr("\nDecompressing Linux... ");
> > > > > >
> > > > > > +#if 0
> > > > > >  	if (init_unaccepted_memory()) {
> > > > > >  		debug_putstr("Accepting memory... ");
> > > > > >  		accept_memory(__pa(output), __pa(output) + needed_size);
> > > > > >  	}
> > > > > > +#endif
> > > > > >
> > > > > >  	__decompress(input_data, input_len, NULL, NULL, output, output_len,
> > > > > >  			NULL, error);
> > > > > > --
> > > > >
> > > > > It solved the problem.
> > > >
> > > > Looks like increasing BOOT_INIT_PGT_SIZE fixes the issue. I don't yet
> > > > understand why and how unaccepted memory is involved. I will look more
> > > > into it.
> > > >
> > > > Enabling CONFIG_RANDOMIZE_BASE also makes the issue go away.
> > >
> > > Is this perhaps just luck? I.e. does it ever break on, say, 1000 boot
> > > attempts? (i.e. maybe some position is bad and KASLR happens to usually
> > > avoid it?)
>
> Yes, it can be luck.
>
> > > > Kees, maybe you have a clue?
> > >
> > > The only thing I can think of is that something isn't being counted
> > > correctly due to the size of code, and it just happens that this commit
> > > makes the code large enough to exceed some set of mappings?
> > >
> > > >
> > > > diff --git a/arch/x86/include/asm/boot.h b/arch/x86/include/asm/boot.h
> > > > index 9191280d9ea3..26ccce41d781 100644
> > > > --- a/arch/x86/include/asm/boot.h
> > > > +++ b/arch/x86/include/asm/boot.h
> > > > @@ -40,7 +40,7 @@
> > > >  #ifdef CONFIG_X86_64
> > > >  # define BOOT_STACK_SIZE	0x4000
> > > >
> > > > -# define BOOT_INIT_PGT_SIZE	(6*4096)
> > > > +# define BOOT_INIT_PGT_SIZE	(7*4096)
> > >
> > > That's why this might be working, for example? How large is the boot
> > > image before/after the commit, etc?
> > >
> >
> > Not sure why these changes would make a difference here, but choking
> > on accept_memory() on a non-TDX suggests that init_unaccepted_memory()
> > is poking into unmapped memory before it even decides that the
> > unaccepted memory does not exist.
> >
> > init_unaccepted_memory() has
> >
> >     ret = efi_get_conf_table(boot_params, &cfg_table_pa, &cfg_table_len);
> >     if (ret) {
> >         warn("EFI config table not found.");
> >         return false;
> >     }
> >
> > which looks for <guid, phys_addr> tuples in an array pointed to by the
> > EFI system table, and if either of those is not mapped, things can be
> > expected to explode.
> >
> > The only odd thing there is that this code is invoked after setting up
> > the 'demand paging' logic in the decompressor.
> >
> > If you haven't yet, could you please retry the kexec boot with
> > earlyprintk=tty<insert your UART params here>?
>
> early console in extract_kernel
> input_data: 0x000000807eb433a8
> input_len: 0x0000000000d26271
> output: 0x000000807b000000
> output_len: 0x0000000004800c10
> kernel_total_size: 0x0000000003e28000
> needed_size: 0x0000000004a00000
> trampoline_32bit: 0x000000000009d000
>
> Decompressing Linux... out of pgt_buf in arch/x86/boot/compressed/ident_map_64.c!?
> pages->pgt_buf_offset: 0x0000000000006000
> pages->pgt_buf_size: 0x0000000000006000
>
>
> Error: kernel_ident_mapping_init() failed
>
> It crashes on #PF due to stbl->nr_tables dereference in
> efi_get_conf_table() called from init_unaccepted_memory().
>
> I don't see anything special about stbl location: 0x775d6018.
>
> One other bit of information: disabling 5-level paging also helps the
> issue.
>
> I will debug further.
>
> --
> Kiryl Shutsemau / Kirill A. Shutemov
>

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: kexec reboot failed due to commit 75d090fd167ac
2023-09-11 14:56 ` Dave Young
@ 2023-09-11 14:57 ` Kirill A. Shutemov
2023-09-11 15:33 ` Tom Lendacky
2023-09-13 14:24 ` Kirill A. Shutemov
0 siblings, 2 replies; 20+ messages in thread
From: Kirill A. Shutemov @ 2023-09-11 14:57 UTC (permalink / raw)
To: Dave Young
Cc: Ard Biesheuvel, Kees Cook, Aaron Lu, Bagas Sanjaya, Borislav Petkov, Linux Kernel Mailing List, Linux Regressions, kexec, Tom Lendacky

On Mon, Sep 11, 2023 at 10:56:36PM +0800, Dave Young wrote:
> > early console in extract_kernel
> > input_data: 0x000000807eb433a8
> > input_len: 0x0000000000d26271
> > output: 0x000000807b000000
> > output_len: 0x0000000004800c10
> > kernel_total_size: 0x0000000003e28000
> > needed_size: 0x0000000004a00000
> > trampoline_32bit: 0x000000000009d000
> >
> > Decompressing Linux... out of pgt_buf in arch/x86/boot/compressed/ident_map_64.c!?
> > pages->pgt_buf_offset: 0x0000000000006000
> > pages->pgt_buf_size: 0x0000000000006000
> >
> >
> > Error: kernel_ident_mapping_init() failed
> >
> > It crashes on #PF due to stbl->nr_tables dereference in
> > efi_get_conf_table() called from init_unaccepted_memory().
> >
> > I don't see anything special about stbl location: 0x775d6018.
> >
> > One other bit of information: disabling 5-level paging also helps the
> > issue.
> >
> > I will debug further.

The problem is not limited to unaccepted memory, it also triggers if we
reach efi_get_rsdp_addr() in the same setup.

I think we have several problems here.

- 6 pages for !RANDOMIZE_BASE is only enough for kernel, cmdline,
  boot_data and setup_data if we assume that they are in different 1G
  regions and do not cross the 1G boundaries. 4-level paging: 1 for PGD, 1
  for PUD, 4 for PMD tables.

  Looks like we never map EFI/ACPI memory explicitly.

  It might work if kernel/cmdline/... are in a single 1G region and we
  have spare pages to handle page faults.

- No spare memory to handle mapping for cc_info and cc_info->cpuid_phys;

- I didn't increase BOOT_INIT_PGT_SIZE when I added 5-level paging support.
  And if we start pagetables from scratch ('else' case of 'if (p4d_offset...))
  we run out of memory.

I believe similar logic would apply for BOOT_PGT_SIZE for RANDOMIZE_BASE=y
case.

I don't know what the right fix is here. We can increase the constants to
be enough to cover existing cases, but it is very fragile. I am not sure I
saw all users. Some of them could be silently handled by the pagefault
handler in some setups. And it is hard to catch new users during code
review.

Also I'm not sure why we need the pagefault handler there. Looks like it
is just masking problems. I think everything has to be mapped explicitly.

Any comments?

-- 
Kiryl Shutsemau / Kirill A. Shutemov
^ permalink raw reply [flat|nested] 20+ messages in thread
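[Editorial sketch] The page-table budget discussed in the message above can be approximated with a small user-space C model. This is only an illustration under stated assumptions, not the decompressor's code: it assumes the identity mapping uses 2M leaf pages (so PMD tables are the lowest level allocated, matching the "1 for PGD, 1 for PUD, 4 for PMD tables" count), the names `struct region`, `count_slots()`, `pgt_pages_needed()` and `example_regions` are hypothetical, and the sizes given to the EFI system table and trampoline regions are made up. Only the start addresses and `needed_size` are taken from the log in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Physical range [start, end) that must be identity-mapped. */
struct region {
	uint64_t start;
	uint64_t end;
};

/*
 * Count the distinct (addr >> shift) slots touched by any region.
 * Each distinct slot needs one page-table page at the level below it.
 * Sketch limitation: at most 64 distinct slots are tracked.
 */
static unsigned count_slots(const struct region *r, size_t n, unsigned shift)
{
	uint64_t seen[64];
	unsigned nseen = 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t first = r[i].start >> shift;
		uint64_t last = (r[i].end - 1) >> shift;

		for (uint64_t s = first; s <= last; s++) {
			unsigned j;

			for (j = 0; j < nseen; j++)
				if (seen[j] == s)
					break;
			if (j == nseen && nseen < 64)
				seen[nseen++] = s;
		}
	}
	return nseen;
}

/* Estimate page-table pages needed when starting from empty tables. */
static unsigned pgt_pages_needed(const struct region *r, size_t n,
				 int five_level)
{
	unsigned pages = 1;			/* top-level table (PGD) */

	if (five_level)
		pages += count_slots(r, n, 48);	/* P4D tables, one per 256T */
	pages += count_slots(r, n, 39);		/* PUD tables, one per 512G */
	pages += count_slots(r, n, 30);		/* PMD tables, one per 1G */
	return pages;
}

/* Start addresses from the log; the last two sizes are hypothetical. */
static const struct region example_regions[] = {
	{ 0x807b000000ULL, 0x807b000000ULL + 0x4a00000ULL }, /* output + needed_size */
	{ 0x775d6018ULL,   0x775d6018ULL + 0x1000ULL },      /* stbl, assumed size */
	{ 0x9d000ULL,      0x9d000ULL + 0x2000ULL },         /* trampoline, assumed size */
};
```

With these three regions, `pgt_pages_needed(example_regions, 3, 0)` evaluates to 6 and `pgt_pages_needed(example_regions, 3, 1)` to 7. That is at least consistent with the reports that disabling 5-level paging or raising BOOT_INIT_PGT_SIZE to 7 pages hides the problem, but it is not a claim about what the decompressor actually maps: the real set of regions (cmdline, boot_params, setup_data and so on) is not shown in this thread.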
* Re: kexec reboot failed due to commit 75d090fd167ac
2023-09-11 14:57 ` Kirill A. Shutemov
@ 2023-09-11 15:33 ` Tom Lendacky
2023-09-11 15:53 ` Kirill A. Shutemov
2023-09-13 14:24 ` Kirill A. Shutemov
1 sibling, 1 reply; 20+ messages in thread
From: Tom Lendacky @ 2023-09-11 15:33 UTC (permalink / raw)
To: Kirill A. Shutemov, Dave Young
Cc: Ard Biesheuvel, Kees Cook, Aaron Lu, Bagas Sanjaya, Borislav Petkov, Linux Kernel Mailing List, Linux Regressions, kexec

On 9/11/23 09:57, Kirill A. Shutemov wrote:
> On Mon, Sep 11, 2023 at 10:56:36PM +0800, Dave Young wrote:
>>> early console in extract_kernel
>>> input_data: 0x000000807eb433a8
>>> input_len: 0x0000000000d26271
>>> output: 0x000000807b000000
>>> output_len: 0x0000000004800c10
>>> kernel_total_size: 0x0000000003e28000
>>> needed_size: 0x0000000004a00000
>>> trampoline_32bit: 0x000000000009d000
>>>
>>> Decompressing Linux... out of pgt_buf in arch/x86/boot/compressed/ident_map_64.c!?
>>> pages->pgt_buf_offset: 0x0000000000006000
>>> pages->pgt_buf_size: 0x0000000000006000
>>>
>>>
>>> Error: kernel_ident_mapping_init() failed
>>>
>>> It crashes on #PF due to stbl->nr_tables dereference in
>>> efi_get_conf_table() called from init_unaccepted_memory().
>>>
>>> I don't see anything special about stbl location: 0x775d6018.
>>>
>>> One other bit of information: disabling 5-level paging also helps the
>>> issue.
>>>
>>> I will debug further.
>
> The problem is not limited to unaccepted memory, it also triggers if we
> reach efi_get_rsdp_addr() in the same setup.
>
> I think we have several problems here.
>
> - 6 pages for !RANDOMIZE_BASE is only enough for kernel, cmdline,
>   boot_data and setup_data if we assume that they are in different 1G
>   regions and do not cross the 1G boundaries. 4-level paging: 1 for PGD, 1
>   for PUD, 4 for PMD tables.
>
>   Looks like we never map EFI/ACPI memory explicitly.
>
>   It might work if kernel/cmdline/... are in a single 1G region and we
>   have spare pages to handle page faults.
>
> - No spare memory to handle mapping for cc_info and cc_info->cpuid_phys;
>
> - I didn't increase BOOT_INIT_PGT_SIZE when I added 5-level paging support.
>   And if we start pagetables from scratch ('else' case of 'if (p4d_offset...))
>   we run out of memory.
>
> I believe similar logic would apply for BOOT_PGT_SIZE for RANDOMIZE_BASE=y
> case.
>
> I don't know what the right fix is here. We can increase the constants to
> be enough to cover existing cases, but it is very fragile. I am not sure I
> saw all users. Some of them could be silently handled by the pagefault
> handler in some setups. And it is hard to catch new users during code
> review.
>
> Also I'm not sure why we need the pagefault handler there. Looks like it
> is just masking problems. I think everything has to be mapped explicitly.
>
> Any comments?

There was a similar related issue around the cc_info blob that is captured
here: https://lore.kernel.org/lkml/20230601072043.24439-1-ltao@redhat.com/

Personally, I'm a fan of mapping the EFI tables that will be passed to the
kexec/kdump kernel. To me, that seems to more closely match the valid
mappings for the tables when control is transferred to the OS from UEFI on
the initial boot.

Thanks,
Tom

>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: kexec reboot failed due to commit 75d090fd167ac
2023-09-11 15:33 ` Tom Lendacky
@ 2023-09-11 15:53 ` Kirill A. Shutemov
2023-09-11 17:13 ` Tom Lendacky
0 siblings, 1 reply; 20+ messages in thread
From: Kirill A. Shutemov @ 2023-09-11 15:53 UTC (permalink / raw)
To: Tom Lendacky
Cc: Dave Young, Ard Biesheuvel, Kees Cook, Aaron Lu, Bagas Sanjaya, Borislav Petkov, Linux Kernel Mailing List, Linux Regressions, kexec

On Mon, Sep 11, 2023 at 10:33:01AM -0500, Tom Lendacky wrote:
> On 9/11/23 09:57, Kirill A. Shutemov wrote:
> > On Mon, Sep 11, 2023 at 10:56:36PM +0800, Dave Young wrote:
> > > > early console in extract_kernel
> > > > input_data: 0x000000807eb433a8
> > > > input_len: 0x0000000000d26271
> > > > output: 0x000000807b000000
> > > > output_len: 0x0000000004800c10
> > > > kernel_total_size: 0x0000000003e28000
> > > > needed_size: 0x0000000004a00000
> > > > trampoline_32bit: 0x000000000009d000
> > > >
> > > > Decompressing Linux... out of pgt_buf in arch/x86/boot/compressed/ident_map_64.c!?
> > > > pages->pgt_buf_offset: 0x0000000000006000
> > > > pages->pgt_buf_size: 0x0000000000006000
> > > >
> > > >
> > > > Error: kernel_ident_mapping_init() failed
> > > >
> > > > It crashes on #PF due to stbl->nr_tables dereference in
> > > > efi_get_conf_table() called from init_unaccepted_memory().
> > > >
> > > > I don't see anything special about stbl location: 0x775d6018.
> > > >
> > > > One other bit of information: disabling 5-level paging also helps the
> > > > issue.
> > > >
> > > > I will debug further.
> >
> > The problem is not limited to unaccepted memory, it also triggers if we
> > reach efi_get_rsdp_addr() in the same setup.
> >
> > I think we have several problems here.
> >
> > - 6 pages for !RANDOMIZE_BASE is only enough for kernel, cmdline,
> >   boot_data and setup_data if we assume that they are in different 1G
> >   regions and do not cross the 1G boundaries. 4-level paging: 1 for PGD, 1
> >   for PUD, 4 for PMD tables.
> >
> >   Looks like we never map EFI/ACPI memory explicitly.
> >
> >   It might work if kernel/cmdline/... are in a single 1G region and we
> >   have spare pages to handle page faults.
> >
> > - No spare memory to handle mapping for cc_info and cc_info->cpuid_phys;
> >
> > - I didn't increase BOOT_INIT_PGT_SIZE when I added 5-level paging support.
> >   And if we start pagetables from scratch ('else' case of 'if (p4d_offset...))
> >   we run out of memory.
> >
> > I believe similar logic would apply for BOOT_PGT_SIZE for RANDOMIZE_BASE=y
> > case.
> >
> > I don't know what the right fix is here. We can increase the constants to
> > be enough to cover existing cases, but it is very fragile. I am not sure I
> > saw all users. Some of them could be silently handled by the pagefault
> > handler in some setups. And it is hard to catch new users during code
> > review.
> >
> > Also I'm not sure why we need the pagefault handler there. Looks like it
> > is just masking problems. I think everything has to be mapped explicitly.
> >
> > Any comments?
>
> There was a similar related issue around the cc_info blob that is captured
> here: https://lore.kernel.org/lkml/20230601072043.24439-1-ltao@redhat.com/
>
> Personally, I'm a fan of mapping the EFI tables that will be passed to the
> kexec/kdump kernel. To me, that seems to more closely match the valid
> mappings for the tables when control is transferred to the OS from UEFI on
> the initial boot.

I don't see how it would help if initialize_identity_maps() resets
pagetables. See 'else' case of 'if (p4d_offset...).

-- 
Kiryl Shutsemau / Kirill A. Shutemov
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: kexec reboot failed due to commit 75d090fd167ac
2023-09-11 15:53 ` Kirill A. Shutemov
@ 2023-09-11 17:13 ` Tom Lendacky
0 siblings, 0 replies; 20+ messages in thread
From: Tom Lendacky @ 2023-09-11 17:13 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Dave Young, Ard Biesheuvel, Kees Cook, Aaron Lu, Bagas Sanjaya, Borislav Petkov, Linux Kernel Mailing List, Linux Regressions, kexec

On 9/11/23 10:53, Kirill A. Shutemov wrote:
> On Mon, Sep 11, 2023 at 10:33:01AM -0500, Tom Lendacky wrote:
>> On 9/11/23 09:57, Kirill A. Shutemov wrote:
>>> On Mon, Sep 11, 2023 at 10:56:36PM +0800, Dave Young wrote:
>>>>> early console in extract_kernel
>>>>> input_data: 0x000000807eb433a8
>>>>> input_len: 0x0000000000d26271
>>>>> output: 0x000000807b000000
>>>>> output_len: 0x0000000004800c10
>>>>> kernel_total_size: 0x0000000003e28000
>>>>> needed_size: 0x0000000004a00000
>>>>> trampoline_32bit: 0x000000000009d000
>>>>>
>>>>> Decompressing Linux... out of pgt_buf in arch/x86/boot/compressed/ident_map_64.c!?
>>>>> pages->pgt_buf_offset: 0x0000000000006000
>>>>> pages->pgt_buf_size: 0x0000000000006000
>>>>>
>>>>>
>>>>> Error: kernel_ident_mapping_init() failed
>>>>>
>>>>> It crashes on #PF due to stbl->nr_tables dereference in
>>>>> efi_get_conf_table() called from init_unaccepted_memory().
>>>>>
>>>>> I don't see anything special about stbl location: 0x775d6018.
>>>>>
>>>>> One other bit of information: disabling 5-level paging also helps the
>>>>> issue.
>>>>>
>>>>> I will debug further.
>>>
>>> The problem is not limited to unaccepted memory, it also triggers if we
>>> reach efi_get_rsdp_addr() in the same setup.
>>>
>>> I think we have several problems here.
>>>
>>> - 6 pages for !RANDOMIZE_BASE is only enough for kernel, cmdline,
>>>   boot_data and setup_data if we assume that they are in different 1G
>>>   regions and do not cross the 1G boundaries. 4-level paging: 1 for PGD, 1
>>>   for PUD, 4 for PMD tables.
>>>
>>>   Looks like we never map EFI/ACPI memory explicitly.
>>>
>>>   It might work if kernel/cmdline/... are in a single 1G region and we
>>>   have spare pages to handle page faults.
>>>
>>> - No spare memory to handle mapping for cc_info and cc_info->cpuid_phys;
>>>
>>> - I didn't increase BOOT_INIT_PGT_SIZE when I added 5-level paging support.
>>>   And if we start pagetables from scratch ('else' case of 'if (p4d_offset...))
>>>   we run out of memory.
>>>
>>> I believe similar logic would apply for BOOT_PGT_SIZE for RANDOMIZE_BASE=y
>>> case.
>>>
>>> I don't know what the right fix is here. We can increase the constants to
>>> be enough to cover existing cases, but it is very fragile. I am not sure I
>>> saw all users. Some of them could be silently handled by the pagefault
>>> handler in some setups. And it is hard to catch new users during code
>>> review.
>>>
>>> Also I'm not sure why we need the pagefault handler there. Looks like it
>>> is just masking problems. I think everything has to be mapped explicitly.
>>>
>>> Any comments?
>>
>> There was a similar related issue around the cc_info blob that is captured
>> here: https://lore.kernel.org/lkml/20230601072043.24439-1-ltao@redhat.com/
>>
>> Personally, I'm a fan of mapping the EFI tables that will be passed to the
>> kexec/kdump kernel. To me, that seems to more closely match the valid
>> mappings for the tables when control is transferred to the OS from UEFI on
>> the initial boot.
>
> I don't see how it would help if initialize_identity_maps() resets
> pagetables. See 'else' case of 'if (p4d_offset...).

Ok, I see what you mean now.

Thanks,
Tom

>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: kexec reboot failed due to commit 75d090fd167ac
2023-09-11 14:57 ` Kirill A. Shutemov
2023-09-11 15:33 ` Tom Lendacky
@ 2023-09-13 14:24 ` Kirill A. Shutemov
2023-09-21 9:54 ` Linux regression tracking (Thorsten Leemhuis)
1 sibling, 1 reply; 20+ messages in thread
From: Kirill A. Shutemov @ 2023-09-13 14:24 UTC (permalink / raw)
To: Dave Young
Cc: Ard Biesheuvel, Kees Cook, Aaron Lu, Bagas Sanjaya, Borislav Petkov, Linux Kernel Mailing List, Linux Regressions, kexec, Tom Lendacky, x86

On Mon, Sep 11, 2023 at 05:57:07PM +0300, Kirill A. Shutemov wrote:
> On Mon, Sep 11, 2023 at 10:56:36PM +0800, Dave Young wrote:
> > > early console in extract_kernel
> > > input_data: 0x000000807eb433a8
> > > input_len: 0x0000000000d26271
> > > output: 0x000000807b000000
> > > output_len: 0x0000000004800c10
> > > kernel_total_size: 0x0000000003e28000
> > > needed_size: 0x0000000004a00000
> > > trampoline_32bit: 0x000000000009d000
> > >
> > > Decompressing Linux... out of pgt_buf in arch/x86/boot/compressed/ident_map_64.c!?
> > > pages->pgt_buf_offset: 0x0000000000006000
> > > pages->pgt_buf_size: 0x0000000000006000
> > >
> > >
> > > Error: kernel_ident_mapping_init() failed
> > >
> > > It crashes on #PF due to stbl->nr_tables dereference in
> > > efi_get_conf_table() called from init_unaccepted_memory().
> > >
> > > I don't see anything special about stbl location: 0x775d6018.
> > >
> > > One other bit of information: disabling 5-level paging also helps the
> > > issue.
> > >
> > > I will debug further.
>
> The problem is not limited to unaccepted memory, it also triggers if we
> reach efi_get_rsdp_addr() in the same setup.
>
> I think we have several problems here.
>
> - 6 pages for !RANDOMIZE_BASE is only enough for kernel, cmdline,
>   boot_data and setup_data if we assume that they are in different 1G
>   regions and do not cross the 1G boundaries. 4-level paging: 1 for PGD, 1
>   for PUD, 4 for PMD tables.
>
>   Looks like we never map EFI/ACPI memory explicitly.
>
>   It might work if kernel/cmdline/... are in a single 1G region and we
>   have spare pages to handle page faults.
>
> - No spare memory to handle mapping for cc_info and cc_info->cpuid_phys;
>
> - I didn't increase BOOT_INIT_PGT_SIZE when I added 5-level paging support.
>   And if we start pagetables from scratch ('else' case of 'if (p4d_offset...))
>   we run out of memory.
>
> I believe similar logic would apply for BOOT_PGT_SIZE for RANDOMIZE_BASE=y
> case.
>
> I don't know what the right fix is here. We can increase the constants to
> be enough to cover existing cases, but it is very fragile. I am not sure I
> saw all users. Some of them could be silently handled by the pagefault
> handler in some setups. And it is hard to catch new users during code
> review.
>
> Also I'm not sure why we need the pagefault handler there. Looks like it
> is just masking problems. I think everything has to be mapped explicitly.
>
> Any comments?

I struggle to come up with anything better than increasing the constant to
a value that "ought to be enough for anybody" ©, let's say 128K.

And we can eliminate logic on no-KASLR vs. KASLR vs. KASLR+VERBOSE_BOOTUP.

Objections?

-- 
Kiryl Shutsemau / Kirill A. Shutemov
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: kexec reboot failed due to commit 75d090fd167ac
2023-09-13 14:24 ` Kirill A. Shutemov
@ 2023-09-21 9:54 ` Linux regression tracking (Thorsten Leemhuis)
2023-09-21 16:03 ` Kirill A. Shutemov
0 siblings, 1 reply; 20+ messages in thread
From: Linux regression tracking (Thorsten Leemhuis) @ 2023-09-21 9:54 UTC (permalink / raw)
To: Kirill A. Shutemov, Dave Young
Cc: Ard Biesheuvel, Kees Cook, Aaron Lu, Bagas Sanjaya, Borislav Petkov, Linux Kernel Mailing List, Linux Regressions, kexec, Tom Lendacky, x86

On 13.09.23 16:24, Kirill A. Shutemov wrote:
> On Mon, Sep 11, 2023 at 05:57:07PM +0300, Kirill A. Shutemov wrote:
>> On Mon, Sep 11, 2023 at 10:56:36PM +0800, Dave Young wrote:
>>>> early console in extract_kernel
>>>> input_data: 0x000000807eb433a8
>>>> input_len: 0x0000000000d26271
>>>> output: 0x000000807b000000
>>>> output_len: 0x0000000004800c10
>>>> kernel_total_size: 0x0000000003e28000
>>>> needed_size: 0x0000000004a00000
>>>> trampoline_32bit: 0x000000000009d000
>>>>
>>>> Decompressing Linux... out of pgt_buf in arch/x86/boot/compressed/ident_map_64.c!?
>>>> pages->pgt_buf_offset: 0x0000000000006000
>>>> pages->pgt_buf_size: 0x0000000000006000
>>>>
>>>> Error: kernel_ident_mapping_init() failed
> [...]
>> The problem is not limited to unaccepted memory, it also triggers if we
>> reach efi_get_rsdp_addr() in the same setup.
>>
>> I think we have several problems here.
> [...]
>> Any comments?
>
> I struggle to come up with anything better than increasing the constant to
> a value that "ought to be enough for anybody" ©, let's say 128K.
>
> And we can eliminate logic on no-KASLR vs. KASLR vs. KASLR+VERBOSE_BOOTUP.
>
> Objections?

Apparently not, as there was no reply since then (which is why I show up
here, as it looked like fixing this regression stalled).

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: kexec reboot failed due to commit 75d090fd167ac
2023-09-21 9:54 ` Linux regression tracking (Thorsten Leemhuis)
@ 2023-09-21 16:03 ` Kirill A. Shutemov
2023-09-22 10:12 ` Linux regression tracking #update (Thorsten Leemhuis)
0 siblings, 1 reply; 20+ messages in thread
From: Kirill A. Shutemov @ 2023-09-21 16:03 UTC (permalink / raw)
To: Linux regressions mailing list
Cc: Dave Young, Ard Biesheuvel, Kees Cook, Aaron Lu, Bagas Sanjaya, Borislav Petkov, Linux Kernel Mailing List, kexec, Tom Lendacky, x86

On Thu, Sep 21, 2023 at 11:54:15AM +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
> On 13.09.23 16:24, Kirill A. Shutemov wrote:
> > On Mon, Sep 11, 2023 at 05:57:07PM +0300, Kirill A. Shutemov wrote:
> >> On Mon, Sep 11, 2023 at 10:56:36PM +0800, Dave Young wrote:
> >>>> early console in extract_kernel
> >>>> input_data: 0x000000807eb433a8
> >>>> input_len: 0x0000000000d26271
> >>>> output: 0x000000807b000000
> >>>> output_len: 0x0000000004800c10
> >>>> kernel_total_size: 0x0000000003e28000
> >>>> needed_size: 0x0000000004a00000
> >>>> trampoline_32bit: 0x000000000009d000
> >>>>
> >>>> Decompressing Linux... out of pgt_buf in arch/x86/boot/compressed/ident_map_64.c!?
> >>>> pages->pgt_buf_offset: 0x0000000000006000
> >>>> pages->pgt_buf_size: 0x0000000000006000
> >>>>
> >>>> Error: kernel_ident_mapping_init() failed
> > [...]
> >> The problem is not limited to unaccepted memory, it also triggers if we
> >> reach efi_get_rsdp_addr() in the same setup.
> >>
> >> I think we have several problems here.
> > [...]
> >> Any comments?
> >
> > I struggle to come up with anything better than increasing the constant to
> > a value that "ought to be enough for anybody" ©, let's say 128K.
> >
> > And we can eliminate logic on no-KASLR vs. KASLR vs. KASLR+VERBOSE_BOOTUP.
> >
> > Objections?
>
> Apparently not, as there was no reply since then (which is why I show up
> here, as it looked like fixing this regression stalled).

It has been fixed in upstream by the commit f530ee95b72e
("x86/boot/compressed: Reserve more memory for page tables")

-- 
Kiryl Shutsemau / Kirill A. Shutemov
^ permalink raw reply [flat|nested] 20+ messages in thread
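[Editorial sketch] The "out of pgt_buf" failure quoted in this thread, and the effect of reserving a larger buffer, can be modelled in user space with a simple bump allocator. This is a sketch under stated assumptions, not the kernel's implementation: `struct pgt_pool`, `pool_alloc_page()`, `alloc_until_failure()` and `backing` are hypothetical names, the `size` and `offset` fields merely mirror the `pgt_buf_size` and `pgt_buf_offset` values printed in the log, and the 128K figure is the one floated earlier in the thread, which is not necessarily the value used by commit f530ee95b72e.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PGT_PAGE_SIZE 4096UL

/* Fixed-size pool of page-table pages, handed out front to back. */
struct pgt_pool {
	unsigned char *buf;	/* start of the reserved buffer */
	unsigned long size;	/* total bytes reserved */
	unsigned long offset;	/* bytes handed out so far */
};

/* Backing store large enough for the 128K case (32 pages). */
static unsigned char backing[32 * PGT_PAGE_SIZE];

/* Return one zeroed page, or NULL once the pool is exhausted. */
static void *pool_alloc_page(struct pgt_pool *p)
{
	unsigned char *page;

	if (p->offset >= p->size)
		return NULL;	/* the "out of pgt_buf" condition */

	page = p->buf + p->offset;
	p->offset += PGT_PAGE_SIZE;
	memset(page, 0, PGT_PAGE_SIZE);
	return page;
}

/* Try up to 'requests' allocations; return how many succeeded. */
static unsigned long alloc_until_failure(struct pgt_pool *p,
					 unsigned long requests)
{
	unsigned long ok = 0;

	while (ok < requests && pool_alloc_page(p))
		ok++;
	return ok;
}
```

A pool of 6 pages (`{ backing, 6 * PGT_PAGE_SIZE, 0 }`) satisfies six requests and fails the seventh with `offset == size == 0x6000`, the same pair of values shown in the log. A 128K pool (`{ backing, 128 * 1024, 0 }`) satisfies all seven in this model.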
* Re: kexec reboot failed due to commit 75d090fd167ac
2023-09-21 16:03 ` Kirill A. Shutemov
@ 2023-09-22 10:12 ` Linux regression tracking #update (Thorsten Leemhuis)
0 siblings, 0 replies; 20+ messages in thread
From: Linux regression tracking #update (Thorsten Leemhuis) @ 2023-09-22 10:12 UTC (permalink / raw)
To: Kirill A. Shutemov, Linux regressions mailing list
Cc: Dave Young, Ard Biesheuvel, Kees Cook, Aaron Lu, Bagas Sanjaya, Borislav Petkov, Linux Kernel Mailing List, kexec, Tom Lendacky, x86

On 21.09.23 18:03, Kirill A. Shutemov wrote:
> On Thu, Sep 21, 2023 at 11:54:15AM +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
>> On 13.09.23 16:24, Kirill A. Shutemov wrote:
>>> On Mon, Sep 11, 2023 at 05:57:07PM +0300, Kirill A. Shutemov wrote:
>>>> On Mon, Sep 11, 2023 at 10:56:36PM +0800, Dave Young wrote:
>>>>>> early console in extract_kernel
>>>>>> input_data: 0x000000807eb433a8
>>>>>> input_len: 0x0000000000d26271
>>>>>> output: 0x000000807b000000
>>>>>> output_len: 0x0000000004800c10
>>>>>> kernel_total_size: 0x0000000003e28000
>>>>>> needed_size: 0x0000000004a00000
>>>>>> trampoline_32bit: 0x000000000009d000
>>>>>>
>>>>>> Decompressing Linux... out of pgt_buf in arch/x86/boot/compressed/ident_map_64.c!?
>>>>>> pages->pgt_buf_offset: 0x0000000000006000
>>>>>> pages->pgt_buf_size: 0x0000000000006000
>>>>>>
>>>>>> Error: kernel_ident_mapping_init() failed
>>> [...]
>>>> The problem is not limited to unaccepted memory, it also triggers if we
>>>> reach efi_get_rsdp_addr() in the same setup.
>>>>
>>>> I think we have several problems here.
>>> [...]
>>>> Any comments?
>>>
>>> I struggle to come up with anything better than increasing the constant to
>>> a value that "ought to be enough for anybody" ©, let's say 128K.
>>>
>>> And we can eliminate logic on no-KASLR vs. KASLR vs. KASLR+VERBOSE_BOOTUP.
>>>
>>> Objections?
>>
>> Apparently not, as there was no reply since then (which is why I show up
>> here, as it looked like fixing this regression stalled).
>
> It has been fixed in upstream by the commit f530ee95b72e
> ("x86/boot/compressed: Reserve more memory for page tables")

Ahh, great, thx for letting me know. That commit sadly missed a Link: or
Closes: tag to the regression report, which Linus and the docs ask for
(and regression tracking relies on), then it would have noticed this
automatically. Whatever, things happen, thx again.

#regzbot fix: f530ee95b72e

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.
^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2023-09-22 10:12 UTC | newest]

Thread overview: 20+ messages
2023-08-29 11:48 kexec reboot failed due to commit 75d090fd167ac Aaron Lu
2023-08-29 12:14 ` Bagas Sanjaya
2023-08-29 12:51 ` Aaron Lu
2023-08-29 12:59 ` Kirill A. Shutemov
2023-08-29 14:04 ` Aaron Lu
2023-09-07 13:14 ` Kirill A. Shutemov
2023-09-08 6:02 ` Aaron Lu
2023-09-08 12:32 ` Kirill A. Shutemov
2023-09-08 15:58 ` Kees Cook
2023-09-08 16:17 ` Ard Biesheuvel
2023-09-09 11:32 ` Kirill A. Shutemov
2023-09-11 14:56 ` Dave Young
2023-09-11 14:57 ` Kirill A. Shutemov
2023-09-11 15:33 ` Tom Lendacky
2023-09-11 15:53 ` Kirill A. Shutemov
2023-09-11 17:13 ` Tom Lendacky
2023-09-13 14:24 ` Kirill A. Shutemov
2023-09-21 9:54 ` Linux regression tracking (Thorsten Leemhuis)
2023-09-21 16:03 ` Kirill A. Shutemov
2023-09-22 10:12 ` Linux regression tracking #update (Thorsten Leemhuis)