kexec reboot failed due to commit 75d090fd167ac

* kexec reboot failed due to commit 75d090fd167ac
@ 2023-08-29 11:48 Aaron Lu
  2023-08-29 12:14 ` Bagas Sanjaya
  0 siblings, 1 reply; 20+ messages in thread
From: Aaron Lu @ 2023-08-29 11:48 UTC (permalink / raw)
  To: Kirill A. Shutemov; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 390 bytes --]

Hi Kirill,

Ever since v6.5-rc1, I found that I can not use kexec to reboot an Intel
SPR test machine. With git bisect, the first bad commit is 75d090fd167ac
("x86/tdx: Add unaccepted memory support").

I have no idea why a tdx change would affect it, I'm not doing anything
related to tdx.

Any ideas?

The kernel config is attached, let me know if you need any other info.

Thanks,
Aaron

[-- Attachment #2: config-spr.gz --]
[-- Type: application/gzip, Size: 62274 bytes --]


* Re: kexec reboot failed due to commit 75d090fd167ac
  2023-08-29 11:48 kexec reboot failed due to commit 75d090fd167ac Aaron Lu
@ 2023-08-29 12:14 ` Bagas Sanjaya
  2023-08-29 12:51   ` Aaron Lu
  0 siblings, 1 reply; 20+ messages in thread
From: Bagas Sanjaya @ 2023-08-29 12:14 UTC (permalink / raw)
  To: Aaron Lu, Kirill A. Shutemov, Borislav Petkov
  Cc: Linux Kernel Mailing List, Linux Regressions

[-- Attachment #1: Type: text/plain, Size: 820 bytes --]

On Tue, Aug 29, 2023 at 07:48:16PM +0800, Aaron Lu wrote:
> Hi Kirill,
> 
> Ever since v6.5-rc1, I found that I can not use kexec to reboot an Intel
> SPR test machine. With git bisect, the first bad commit is 75d090fd167ac
> ("x86/tdx: Add unaccepted memory support").
> 
> I have no idea why a tdx change would affect it, I'm not doing anything
> related to tdx.
> 
> Any ideas?
> 
> The kernel config is attached, let me know if you need any other info.

Can you provide system logs (e.g. journalctl output) when attempting to
reboot?

Anyway, thanks for the regression report. I'm adding it to regzbot:

#regzbot ^introduced: 75d090fd167aca
#regzbot title: unable to reboot with kexec due to TDX unaccepted memory support

Thanks.

-- 
An old man doll... just what I always wanted! - Clara

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]


* Re: kexec reboot failed due to commit 75d090fd167ac
  2023-08-29 12:14 ` Bagas Sanjaya
@ 2023-08-29 12:51   ` Aaron Lu
  2023-08-29 12:59     ` Kirill A. Shutemov
  0 siblings, 1 reply; 20+ messages in thread
From: Aaron Lu @ 2023-08-29 12:51 UTC (permalink / raw)
  To: Bagas Sanjaya
  Cc: Kirill A. Shutemov, Borislav Petkov, Linux Kernel Mailing List,
	Linux Regressions

On Tue, Aug 29, 2023 at 07:14:59PM +0700, Bagas Sanjaya wrote:
> On Tue, Aug 29, 2023 at 07:48:16PM +0800, Aaron Lu wrote:
> > Hi Kirill,
> > 
> > Ever since v6.5-rc1, I found that I can not use kexec to reboot an Intel
> > SPR test machine. With git bisect, the first bad commit is 75d090fd167ac
> > ("x86/tdx: Add unaccepted memory support").
> > 
> > I have no idea why a tdx change would affect it, I'm not doing anything
> > related to tdx.
> > 
> > Any ideas?
> > 
> > The kernel config is attached, let me know if you need any other info.
> 
> Can you provide system logs (e.g. journalctl output) when attempting to
> reboot?

... ...
Aug 29 19:15:59 be3af2b6059f systemd-shutdown[1]: Syncing filesystems and block devices.
Aug 29 19:15:59 be3af2b6059f systemd-shutdown[1]: Sending SIGTERM to remaining processes...
Aug 29 19:16:00 be3af2b6059f systemd-journald[2629]: Journal stopped
-- Boot 7e5173842b8b4be581886ff25ad0c02f --
Aug 29 19:24:27 be3af2b6059f kernel: microcode: updated early: 0x2b000161 -> 0x2b000461, date = 2023-03-13
Aug 29 19:24:27 be3af2b6059f kernel: Linux version 6.3.8-100.fc37.x86_64 (mockbuild@bkernel02.iad2.fedoraproject.org)
Aug 29 19:24:27 be3af2b6059f kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-6.3.8-100.fc37.x86_64 root=UUID=4381321e-e0>

First 3 lines are from the first kernel, then I attempted to kexec reboot
to 6.4.0-rc5-00009-g75d090fd167a and the remote console hung with the
reboot message of the first kernel. After a while, I knew kexec had failed
so I power cycled the machine to boot into a distro kernel, that is the
last 3 lines. There is no trace of the failed boot.

I guess the kexeced kernel failed to start early in the boot process
so the log is probably only available in serial, if any. Unfortunately,
there is no serial support for this machine.

Thanks,
Aaron

> Anyway, thanks for the regression report. I'm adding it to regzbot:
> 
> #regzbot ^introduced: 75d090fd167aca
> #regzbot title: unable to reboot with kexec due to TDX unaccepted memory support
> 
> Thanks.
> 
> -- 
> An old man doll... just what I always wanted! - Clara




* Re: kexec reboot failed due to commit 75d090fd167ac
  2023-08-29 12:51   ` Aaron Lu
@ 2023-08-29 12:59     ` Kirill A. Shutemov
  2023-08-29 14:04       ` Aaron Lu
  0 siblings, 1 reply; 20+ messages in thread
From: Kirill A. Shutemov @ 2023-08-29 12:59 UTC (permalink / raw)
  To: Aaron Lu
  Cc: Bagas Sanjaya, Borislav Petkov, Linux Kernel Mailing List,
	Linux Regressions

On Tue, Aug 29, 2023 at 08:51:34PM +0800, Aaron Lu wrote:
> On Tue, Aug 29, 2023 at 07:14:59PM +0700, Bagas Sanjaya wrote:
> > On Tue, Aug 29, 2023 at 07:48:16PM +0800, Aaron Lu wrote:
> > > Hi Kirill,
> > > 
> > > Ever since v6.5-rc1, I found that I can not use kexec to reboot an Intel
> > > SPR test machine. With git bisect, the first bad commit is 75d090fd167ac
> > > ("x86/tdx: Add unaccepted memory support").
> > > 
> > > I have no idea why a tdx change would affect it, I'm not doing anything
> > > related to tdx.
> > > 
> > > Any ideas?

Are we talking about bare metal? Or is it kexec in a VM?

> > > The kernel config is attached, let me know if you need any other info.
> > 
> > Can you provide system logs (e.g. journalctl output) when attempting to
> > reboot?
> 
> ... ...
> Aug 29 19:15:59 be3af2b6059f systemd-shutdown[1]: Syncing filesystems and block devices.
> Aug 29 19:15:59 be3af2b6059f systemd-shutdown[1]: Sending SIGTERM to remaining processes...
> Aug 29 19:16:00 be3af2b6059f systemd-journald[2629]: Journal stopped
> -- Boot 7e5173842b8b4be581886ff25ad0c02f --
> Aug 29 19:24:27 be3af2b6059f kernel: microcode: updated early: 0x2b000161 -> 0x2b000461, date = 2023-03-13
> Aug 29 19:24:27 be3af2b6059f kernel: Linux version 6.3.8-100.fc37.x86_64 (mockbuild@bkernel02.iad2.fedoraproject.org)
> Aug 29 19:24:27 be3af2b6059f kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-6.3.8-100.fc37.x86_64 root=UUID=4381321e-e0>
> 
> First 3 lines are from the first kernel, then I attempted to kexec reboot
> to 6.4.0-rc5-00009-g75d090fd167a and the remote console hung with the
> reboot message of the first kernel. After a while, I knew kexec had failed
> so I power cycled the machine to boot into a distro kernel, that is the
> last 3 lines. There is no trace of the failed boot.
> 
> I guess the kexeced kernel failed to start early in the boot process
> so the log is probably only available in serial, if any. Unfortunately,
> there is no serial support for this machine.

Could you show dmesg of the first kernel before kexec?

-- 
  Kiryl Shutsemau / Kirill A. Shutemov


* Re: kexec reboot failed due to commit 75d090fd167ac
  2023-08-29 12:59     ` Kirill A. Shutemov
@ 2023-08-29 14:04       ` Aaron Lu
  2023-09-07 13:14         ` Kirill A. Shutemov
  0 siblings, 1 reply; 20+ messages in thread
From: Aaron Lu @ 2023-08-29 14:04 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Bagas Sanjaya, Borislav Petkov, Linux Kernel Mailing List,
	Linux Regressions

[-- Attachment #1: Type: text/plain, Size: 2531 bytes --]

On Tue, Aug 29, 2023 at 03:59:39PM +0300, Kirill A. Shutemov wrote:
> On Tue, Aug 29, 2023 at 08:51:34PM +0800, Aaron Lu wrote:
> > On Tue, Aug 29, 2023 at 07:14:59PM +0700, Bagas Sanjaya wrote:
> > > On Tue, Aug 29, 2023 at 07:48:16PM +0800, Aaron Lu wrote:
> > > > Hi Kirill,
> > > > 
> > > > Ever since v6.5-rc1, I found that I can not use kexec to reboot an Intel
> > > > SPR test machine. With git bisect, the first bad commit is 75d090fd167ac
> > > > ("x86/tdx: Add unaccepted memory support").
> > > > 
> > > > I have no idea why a tdx change would affect it, I'm not doing anything
> > > > related to tdx.
> > > > 
> > > > Any ideas?
> 
> Are we talking about bare metal? Or is it kexec in a VM?

Bare metal.

> > > > The kernel config is attached, let me know if you need any other info.
> > > 
> > > Can you provide system logs (e.g. journalctl output) when attempting to
> > > reboot?
> > 
> > ... ...
> > Aug 29 19:15:59 be3af2b6059f systemd-shutdown[1]: Syncing filesystems and block devices.
> > Aug 29 19:15:59 be3af2b6059f systemd-shutdown[1]: Sending SIGTERM to remaining processes...
> > Aug 29 19:16:00 be3af2b6059f systemd-journald[2629]: Journal stopped
> > -- Boot 7e5173842b8b4be581886ff25ad0c02f --
> > Aug 29 19:24:27 be3af2b6059f kernel: microcode: updated early: 0x2b000161 -> 0x2b000461, date = 2023-03-13
> > Aug 29 19:24:27 be3af2b6059f kernel: Linux version 6.3.8-100.fc37.x86_64 (mockbuild@bkernel02.iad2.fedoraproject.org)
> > Aug 29 19:24:27 be3af2b6059f kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-6.3.8-100.fc37.x86_64 root=UUID=4381321e-e0>
> > 
> > First 3 lines are from the first kernel, then I attempted to kexec reboot
> > to 6.4.0-rc5-00009-g75d090fd167a and the remote console hung with the
> > reboot message of the first kernel. After a while, I knew kexec had failed
> > so I power cycled the machine to boot into a distro kernel, that is the
> > last 3 lines. There is no trace of the failed boot.
> > 
> > I guess the kexeced kernel failed to start early in the boot process
> > so the log is probably only available in serial, if any. Unfortunately,
> > there is no serial support for this machine.
> 
> Could you show dmesg of the first kernel before kexec?

Attached.

BTW, kexec is invoked like this:
kver=6.4.0-rc5-00009-g75d090fd167a
kdir=$HOME/kernels/$kver
sudo kexec -l $kdir/vmlinuz-$kver --initrd=$kdir/initramfs-$kver.img --append="root=UUID=4381321e-e01e-455a-9d46-5e8c4c5b2d02 ro net.ifnames=0 acpi_rsdp=0x728e8014 no_hash_pointers sched_verbose selinux=0"

Thanks,
Aaron

[-- Attachment #2: dmesg_spr.gz --]
[-- Type: application/gzip, Size: 69139 bytes --]


* Re: kexec reboot failed due to commit 75d090fd167ac
  2023-08-29 14:04       ` Aaron Lu
@ 2023-09-07 13:14         ` Kirill A. Shutemov
  2023-09-08  6:02           ` Aaron Lu
  0 siblings, 1 reply; 20+ messages in thread
From: Kirill A. Shutemov @ 2023-09-07 13:14 UTC (permalink / raw)
  To: Aaron Lu
  Cc: Bagas Sanjaya, Borislav Petkov, Linux Kernel Mailing List,
	Linux Regressions

On Tue, Aug 29, 2023 at 10:04:51PM +0800, Aaron Lu wrote:
> > Could you show dmesg of the first kernel before kexec?
> 
> Attached.
> 
> BTW, kexec is invoked like this:
> kver=6.4.0-rc5-00009-g75d090fd167a
> kdir=$HOME/kernels/$kver
> sudo kexec -l $kdir/vmlinuz-$kver --initrd=$kdir/initramfs-$kver.img --append="root=UUID=4381321e-e01e-455a-9d46-5e8c4c5b2d02 ro net.ifnames=0 acpi_rsdp=0x728e8014 no_hash_pointers sched_verbose selinux=0"

I don't understand why it happens.

Could you check if this patch changes anything:

diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 94b7abcf624b..172c476ff6f3 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -456,10 +456,12 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
 
 	debug_putstr("\nDecompressing Linux... ");
 
+#if 0
 	if (init_unaccepted_memory()) {
 		debug_putstr("Accepting memory... ");
 		accept_memory(__pa(output), __pa(output) + needed_size);
 	}
+#endif
 
 	__decompress(input_data, input_len, NULL, NULL, output, output_len,
 			NULL, error);
-- 
  Kiryl Shutsemau / Kirill A. Shutemov


* Re: kexec reboot failed due to commit 75d090fd167ac
  2023-09-07 13:14         ` Kirill A. Shutemov
@ 2023-09-08  6:02           ` Aaron Lu
  2023-09-08 12:32             ` Kirill A. Shutemov
  0 siblings, 1 reply; 20+ messages in thread
From: Aaron Lu @ 2023-09-08  6:02 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Bagas Sanjaya, Borislav Petkov, Linux Kernel Mailing List,
	Linux Regressions

On Thu, Sep 07, 2023 at 04:14:09PM +0300, Kirill A. Shutemov wrote:
> On Tue, Aug 29, 2023 at 10:04:51PM +0800, Aaron Lu wrote:
> > > Could you show dmesg of the first kernel before kexec?
> > 
> > Attached.
> > 
> > BTW, kexec is invoked like this:
> > kver=6.4.0-rc5-00009-g75d090fd167a
> > kdir=$HOME/kernels/$kver
> > sudo kexec -l $kdir/vmlinuz-$kver --initrd=$kdir/initramfs-$kver.img --append="root=UUID=4381321e-e01e-455a-9d46-5e8c4c5b2d02 ro net.ifnames=0 acpi_rsdp=0x728e8014 no_hash_pointers sched_verbose selinux=0"
> 
> I don't understand why it happens.
> 
> Could you check if this patch changes anything:
> 
> diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
> index 94b7abcf624b..172c476ff6f3 100644
> --- a/arch/x86/boot/compressed/misc.c
> +++ b/arch/x86/boot/compressed/misc.c
> @@ -456,10 +456,12 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
>  
>  	debug_putstr("\nDecompressing Linux... ");
>  
> +#if 0
>  	if (init_unaccepted_memory()) {
>  		debug_putstr("Accepting memory... ");
>  		accept_memory(__pa(output), __pa(output) + needed_size);
>  	}
> +#endif
>  
>  	__decompress(input_data, input_len, NULL, NULL, output, output_len,
>  			NULL, error);
> -- 

It solved the problem.


* Re: kexec reboot failed due to commit 75d090fd167ac
  2023-09-08  6:02           ` Aaron Lu
@ 2023-09-08 12:32             ` Kirill A. Shutemov
  2023-09-08 15:58               ` Kees Cook
  0 siblings, 1 reply; 20+ messages in thread
From: Kirill A. Shutemov @ 2023-09-08 12:32 UTC (permalink / raw)
  To: Aaron Lu, Kees Cook
  Cc: Bagas Sanjaya, Borislav Petkov, Linux Kernel Mailing List,
	Linux Regressions

On Fri, Sep 08, 2023 at 02:02:30PM +0800, Aaron Lu wrote:
> On Thu, Sep 07, 2023 at 04:14:09PM +0300, Kirill A. Shutemov wrote:
> > On Tue, Aug 29, 2023 at 10:04:51PM +0800, Aaron Lu wrote:
> > > > Could you show dmesg of the first kernel before kexec?
> > > 
> > > Attached.
> > > 
> > > BTW, kexec is invoked like this:
> > > kver=6.4.0-rc5-00009-g75d090fd167a
> > > kdir=$HOME/kernels/$kver
> > > sudo kexec -l $kdir/vmlinuz-$kver --initrd=$kdir/initramfs-$kver.img --append="root=UUID=4381321e-e01e-455a-9d46-5e8c4c5b2d02 ro net.ifnames=0 acpi_rsdp=0x728e8014 no_hash_pointers sched_verbose selinux=0"
> > 
> > I don't understand why it happens.
> > 
> > Could you check if this patch changes anything:
> > 
> > diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
> > index 94b7abcf624b..172c476ff6f3 100644
> > --- a/arch/x86/boot/compressed/misc.c
> > +++ b/arch/x86/boot/compressed/misc.c
> > @@ -456,10 +456,12 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
> >  
> >  	debug_putstr("\nDecompressing Linux... ");
> >  
> > +#if 0
> >  	if (init_unaccepted_memory()) {
> >  		debug_putstr("Accepting memory... ");
> >  		accept_memory(__pa(output), __pa(output) + needed_size);
> >  	}
> > +#endif
> >  
> >  	__decompress(input_data, input_len, NULL, NULL, output, output_len,
> >  			NULL, error);
> > -- 
> 
> It solved the problem.

Looks like increasing BOOT_INIT_PGT_SIZE fixes the issue. I don't yet
understand why and how unaccepted memory is involved. I will look more
into it.

Enabling CONFIG_RANDOMIZE_BASE also makes the issue go away.

Kees, maybe you have a clue?

diff --git a/arch/x86/include/asm/boot.h b/arch/x86/include/asm/boot.h
index 9191280d9ea3..26ccce41d781 100644
--- a/arch/x86/include/asm/boot.h
+++ b/arch/x86/include/asm/boot.h
@@ -40,7 +40,7 @@
 #ifdef CONFIG_X86_64
 # define BOOT_STACK_SIZE	0x4000
 
-# define BOOT_INIT_PGT_SIZE	(6*4096)
+# define BOOT_INIT_PGT_SIZE	(7*4096)
 # ifdef CONFIG_RANDOMIZE_BASE
 /*
  * Assuming all cross the 512GB boundary:
-- 
  Kiryl Shutsemau / Kirill A. Shutemov


* Re: kexec reboot failed due to commit 75d090fd167ac
  2023-09-08 12:32             ` Kirill A. Shutemov
@ 2023-09-08 15:58               ` Kees Cook
  2023-09-08 16:17                 ` Ard Biesheuvel
  0 siblings, 1 reply; 20+ messages in thread
From: Kees Cook @ 2023-09-08 15:58 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Aaron Lu, Bagas Sanjaya, Borislav Petkov,
	Linux Kernel Mailing List, Linux Regressions, ardb

On Fri, Sep 08, 2023 at 03:32:33PM +0300, Kirill A. Shutemov wrote:
> On Fri, Sep 08, 2023 at 02:02:30PM +0800, Aaron Lu wrote:
> > On Thu, Sep 07, 2023 at 04:14:09PM +0300, Kirill A. Shutemov wrote:
> > > On Tue, Aug 29, 2023 at 10:04:51PM +0800, Aaron Lu wrote:
> > > > > Could you show dmesg of the first kernel before kexec?
> > > > 
> > > > Attached.
> > > > 
> > > > BTW, kexec is invoked like this:
> > > > kver=6.4.0-rc5-00009-g75d090fd167a
> > > > kdir=$HOME/kernels/$kver
> > > > sudo kexec -l $kdir/vmlinuz-$kver --initrd=$kdir/initramfs-$kver.img --append="root=UUID=4381321e-e01e-455a-9d46-5e8c4c5b2d02 ro net.ifnames=0 acpi_rsdp=0x728e8014 no_hash_pointers sched_verbose selinux=0"
> > > 
> > > I don't understand why it happens.
> > > 
> > > Could you check if this patch changes anything:
> > > 
> > > diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
> > > index 94b7abcf624b..172c476ff6f3 100644
> > > --- a/arch/x86/boot/compressed/misc.c
> > > +++ b/arch/x86/boot/compressed/misc.c
> > > @@ -456,10 +456,12 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
> > >  
> > >  	debug_putstr("\nDecompressing Linux... ");
> > >  
> > > +#if 0
> > >  	if (init_unaccepted_memory()) {
> > >  		debug_putstr("Accepting memory... ");
> > >  		accept_memory(__pa(output), __pa(output) + needed_size);
> > >  	}
> > > +#endif
> > >  
> > >  	__decompress(input_data, input_len, NULL, NULL, output, output_len,
> > >  			NULL, error);
> > > -- 
> > 
> > It solved the problem.
> 
> Looks like increasing BOOT_INIT_PGT_SIZE fixes the issue. I don't yet
> understand why and how unaccepted memory is involved. I will look more
> into it.
> 
> Enabling CONFIG_RANDOMIZE_BASE also makes the issue go away.

Is this perhaps just luck? I.e. does it ever break on, say, 1000 boot
attempts? (i.e. maybe some position is bad and KASLR happens to usually
avoid it?)

> Kees, maybe you have a clue?

The only thing I can think of is that something isn't being counted
correctly due to the size of code, and it just happens that this commit
makes the code large enough to exceed some set of mappings?

> 
> diff --git a/arch/x86/include/asm/boot.h b/arch/x86/include/asm/boot.h
> index 9191280d9ea3..26ccce41d781 100644
> --- a/arch/x86/include/asm/boot.h
> +++ b/arch/x86/include/asm/boot.h
> @@ -40,7 +40,7 @@
>  #ifdef CONFIG_X86_64
>  # define BOOT_STACK_SIZE	0x4000
>  
> -# define BOOT_INIT_PGT_SIZE	(6*4096)
> +# define BOOT_INIT_PGT_SIZE	(7*4096)

That's why this might be working, for example? How large is the boot
image before/after the commit, etc?

>  # ifdef CONFIG_RANDOMIZE_BASE
>  /*
>   * Assuming all cross the 512GB boundary:
> -- 
>   Kiryl Shutsemau / Kirill A. Shutemov

-Kees

-- 
Kees Cook


* Re: kexec reboot failed due to commit 75d090fd167ac
  2023-09-08 15:58               ` Kees Cook
@ 2023-09-08 16:17                 ` Ard Biesheuvel
  2023-09-09 11:32                   ` Kirill A. Shutemov
  0 siblings, 1 reply; 20+ messages in thread
From: Ard Biesheuvel @ 2023-09-08 16:17 UTC (permalink / raw)
  To: Kees Cook
  Cc: Kirill A. Shutemov, Aaron Lu, Bagas Sanjaya, Borislav Petkov,
	Linux Kernel Mailing List, Linux Regressions

On Fri, Sep 8, 2023 at 5:58 PM Kees Cook <keescook@chromium.org> wrote:
>
> On Fri, Sep 08, 2023 at 03:32:33PM +0300, Kirill A. Shutemov wrote:
> > On Fri, Sep 08, 2023 at 02:02:30PM +0800, Aaron Lu wrote:
> > > On Thu, Sep 07, 2023 at 04:14:09PM +0300, Kirill A. Shutemov wrote:
> > > > On Tue, Aug 29, 2023 at 10:04:51PM +0800, Aaron Lu wrote:
> > > > > > Could you show dmesg of the first kernel before kexec?
> > > > >
> > > > > Attached.
> > > > >
> > > > > BTW, kexec is invoked like this:
> > > > > kver=6.4.0-rc5-00009-g75d090fd167a
> > > > > kdir=$HOME/kernels/$kver
> > > > > sudo kexec -l $kdir/vmlinuz-$kver --initrd=$kdir/initramfs-$kver.img --append="root=UUID=4381321e-e01e-455a-9d46-5e8c4c5b2d02 ro net.ifnames=0 acpi_rsdp=0x728e8014 no_hash_pointers sched_verbose selinux=0"
> > > >
> > > > I don't understand why it happens.
> > > >
> > > > Could you check if this patch changes anything:
> > > >
> > > > diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
> > > > index 94b7abcf624b..172c476ff6f3 100644
> > > > --- a/arch/x86/boot/compressed/misc.c
> > > > +++ b/arch/x86/boot/compressed/misc.c
> > > > @@ -456,10 +456,12 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
> > > >
> > > >   debug_putstr("\nDecompressing Linux... ");
> > > >
> > > > +#if 0
> > > >   if (init_unaccepted_memory()) {
> > > >           debug_putstr("Accepting memory... ");
> > > >           accept_memory(__pa(output), __pa(output) + needed_size);
> > > >   }
> > > > +#endif
> > > >
> > > >   __decompress(input_data, input_len, NULL, NULL, output, output_len,
> > > >                   NULL, error);
> > > > --
> > >
> > > It solved the problem.
> >
> > Looks like increasing BOOT_INIT_PGT_SIZE fixes the issue. I don't yet
> > understand why and how unaccepted memory is involved. I will look more
> > into it.
> >
> > Enabling CONFIG_RANDOMIZE_BASE also makes the issue go away.
>
> Is this perhaps just luck? I.e. does it ever break on, say, 1000 boot
> attempts? (i.e. maybe some position is bad and KASLR happens to usually
> avoid it?)
>
> > Kees, maybe you have a clue?
>
> The only thing I can think of is that something isn't being counted
> correctly due to the size of code, and it just happens that this commit
> makes the code large enough to exceed some set of mappings?
>
> >
> > diff --git a/arch/x86/include/asm/boot.h b/arch/x86/include/asm/boot.h
> > index 9191280d9ea3..26ccce41d781 100644
> > --- a/arch/x86/include/asm/boot.h
> > +++ b/arch/x86/include/asm/boot.h
> > @@ -40,7 +40,7 @@
> >  #ifdef CONFIG_X86_64
> >  # define BOOT_STACK_SIZE     0x4000
> >
> > -# define BOOT_INIT_PGT_SIZE  (6*4096)
> > +# define BOOT_INIT_PGT_SIZE  (7*4096)
>
> That's why this might be working, for example? How large is the boot
> image before/after the commit, etc?
>

Not sure why these changes would make a difference here, but choking
on accept_memory() on a non-TDX system suggests that init_unaccepted_memory()
is poking into unmapped memory before it even decides that the
unaccepted memory does not exist.

init_unaccepted_memory() has

        ret = efi_get_conf_table(boot_params, &cfg_table_pa, &cfg_table_len);
        if (ret) {
                warn("EFI config table not found.");
                return false;
        }

which looks for <guid, phys_addr> tuples in an array pointed to by the
EFI system table, and if either of those is not mapped, things can be
expected to explode.

The only odd thing there is that this code is invoked after setting up
the 'demand paging' logic in the decompressor.

If you haven't yet, could you please retry the kexec boot with
earlyprintk=tty<insert your UART params here>?
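
To make the <guid, phys_addr> lookup above concrete, here is a minimal
userspace sketch of that kind of scan. It is not the kernel's
efi_get_conf_table(); struct guid, struct conf_entry and
find_conf_table() are hypothetical stand-ins, and the real EFI types and
table layout differ. The point is only that every entry comparison reads
memory, so in the decompressor the array has to be mapped before it is
walked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins for the EFI types; the names are illustrative. */
struct guid {
	uint8_t b[16];
};

struct conf_entry {
	struct guid guid;	/* identifies the table */
	uint64_t table_pa;	/* physical address of the table */
};

/*
 * Scan nr entries for wanted and return its physical address, or 0 when
 * the GUID is absent. Each iteration dereferences entries[i], which is
 * the access that faults if the array is not mapped.
 */
static uint64_t find_conf_table(const struct conf_entry *entries, size_t nr,
				const struct guid *wanted)
{
	size_t i;

	assert(entries != NULL || nr == 0);
	assert(wanted != NULL);

	for (i = 0; i < nr; i++) {
		if (!memcmp(&entries[i].guid, wanted, sizeof(*wanted)))
			return entries[i].table_pa;
	}
	return 0;
}
```

A caller would pass the array and entry count taken from the system
table and treat a return value of 0 as "table not present".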


* Re: kexec reboot failed due to commit 75d090fd167ac
  2023-09-08 16:17                 ` Ard Biesheuvel
@ 2023-09-09 11:32                   ` Kirill A. Shutemov
  2023-09-11 14:56                     ` Dave Young
  0 siblings, 1 reply; 20+ messages in thread
From: Kirill A. Shutemov @ 2023-09-09 11:32 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Kees Cook, Aaron Lu, Bagas Sanjaya, Borislav Petkov,
	Linux Kernel Mailing List, Linux Regressions

On Fri, Sep 08, 2023 at 06:17:53PM +0200, Ard Biesheuvel wrote:
> On Fri, Sep 8, 2023 at 5:58 PM Kees Cook <keescook@chromium.org> wrote:
> >
> > On Fri, Sep 08, 2023 at 03:32:33PM +0300, Kirill A. Shutemov wrote:
> > > On Fri, Sep 08, 2023 at 02:02:30PM +0800, Aaron Lu wrote:
> > > > On Thu, Sep 07, 2023 at 04:14:09PM +0300, Kirill A. Shutemov wrote:
> > > > > On Tue, Aug 29, 2023 at 10:04:51PM +0800, Aaron Lu wrote:
> > > > > > > Could you show dmesg of the first kernel before kexec?
> > > > > >
> > > > > > Attached.
> > > > > >
> > > > > > BTW, kexec is invoked like this:
> > > > > > kver=6.4.0-rc5-00009-g75d090fd167a
> > > > > > kdir=$HOME/kernels/$kver
> > > > > > sudo kexec -l $kdir/vmlinuz-$kver --initrd=$kdir/initramfs-$kver.img --append="root=UUID=4381321e-e01e-455a-9d46-5e8c4c5b2d02 ro net.ifnames=0 acpi_rsdp=0x728e8014 no_hash_pointers sched_verbose selinux=0"
> > > > >
> > > > > I don't understand why it happens.
> > > > >
> > > > > Could you check if this patch changes anything:
> > > > >
> > > > > diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
> > > > > index 94b7abcf624b..172c476ff6f3 100644
> > > > > --- a/arch/x86/boot/compressed/misc.c
> > > > > +++ b/arch/x86/boot/compressed/misc.c
> > > > > @@ -456,10 +456,12 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
> > > > >
> > > > >   debug_putstr("\nDecompressing Linux... ");
> > > > >
> > > > > +#if 0
> > > > >   if (init_unaccepted_memory()) {
> > > > >           debug_putstr("Accepting memory... ");
> > > > >           accept_memory(__pa(output), __pa(output) + needed_size);
> > > > >   }
> > > > > +#endif
> > > > >
> > > > >   __decompress(input_data, input_len, NULL, NULL, output, output_len,
> > > > >                   NULL, error);
> > > > > --
> > > >
> > > > It solved the problem.
> > >
> > > Looks like increasing BOOT_INIT_PGT_SIZE fixes the issue. I don't yet
> > > understand why and how unaccepted memory is involved. I will look more
> > > into it.
> > >
> > > Enabling CONFIG_RANDOMIZE_BASE also makes the issue go away.
> >
> > Is this perhaps just luck? I.e. does it ever break on, say, 1000 boot
> > attempts? (i.e. maybe some position is bad and KASLR happens to usually
> > avoid it?)

Yes, it can be luck.

> > > Kees, maybe you have a clue?
> >
> > The only thing I can think of is that something isn't being counted
> > correctly due to the size of code, and it just happens that this commit
> > makes the code large enough to exceed some set of mappings?
> >
> > >
> > > diff --git a/arch/x86/include/asm/boot.h b/arch/x86/include/asm/boot.h
> > > index 9191280d9ea3..26ccce41d781 100644
> > > --- a/arch/x86/include/asm/boot.h
> > > +++ b/arch/x86/include/asm/boot.h
> > > @@ -40,7 +40,7 @@
> > >  #ifdef CONFIG_X86_64
> > >  # define BOOT_STACK_SIZE     0x4000
> > >
> > > -# define BOOT_INIT_PGT_SIZE  (6*4096)
> > > +# define BOOT_INIT_PGT_SIZE  (7*4096)
> >
> > That's why this might be working, for example? How large is the boot
> > image before/after the commit, etc?
> >
> 
> Not sure why these changes would make a difference here, but choking
> on accept_memory() on a non-TDX system suggests that init_unaccepted_memory()
> is poking into unmapped memory before it even decides that the
> unaccepted memory does not exist.
> 
> init_unaccepted_memory() has
> 
>         ret = efi_get_conf_table(boot_params, &cfg_table_pa, &cfg_table_len);
>         if (ret) {
>                 warn("EFI config table not found.");
>                 return false;
>         }
> 
> which looks for <guid, phys_addr> tuples in an array pointed to by the
> EFI system table, and if either of those is not mapped, things can be
> expected to explode.
> 
> The only odd thing there is that this code is invoked after setting up
> the 'demand paging' logic in the decompressor.
> 
> If you haven't yet, could you please retry the kexec boot with
> earlyprintk=tty<insert your UART params here>?

early console in extract_kernel
input_data: 0x000000807eb433a8
input_len: 0x0000000000d26271
output: 0x000000807b000000
output_len: 0x0000000004800c10
kernel_total_size: 0x0000000003e28000
needed_size: 0x0000000004a00000
trampoline_32bit: 0x000000000009d000

Decompressing Linux... out of pgt_buf in arch/x86/boot/compressed/ident_map_64.c!?
pages->pgt_buf_offset: 0x0000000000006000
pages->pgt_buf_size: 0x0000000000006000


Error: kernel_ident_mapping_init() failed
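
The two pgt_buf values in the log are easier to read next to a model of
the pool. Below is a rough userspace sketch, assuming the decompressor
hands out page-table pages from a fixed buffer one page at a time.
struct pgt_pool, pool_alloc_page() and pool_capacity() are made-up
names, not the functions in ident_map_64.c; only the 6*4096 size and the
offset/size fields come from this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE		4096UL
#define BOOT_INIT_PGT_SIZE	(6 * PAGE_SIZE)	/* value before the test patch */

/* Hypothetical model of the decompressor's page-table pool. */
struct pgt_pool {
	unsigned char *buf;	/* backing storage for page-table pages */
	unsigned long size;	/* total bytes available (pgt_buf_size) */
	unsigned long offset;	/* bytes handed out so far (pgt_buf_offset) */
};

/* Hand out one zeroed page, or NULL once the pool is exhausted. */
static void *pool_alloc_page(struct pgt_pool *pool)
{
	unsigned char *page;

	assert(pool != NULL && pool->buf != NULL);

	if (pool->offset >= pool->size)
		return NULL;	/* the "out of pgt_buf" case */

	page = pool->buf + pool->offset;
	memset(page, 0, PAGE_SIZE);
	pool->offset += PAGE_SIZE;
	return page;
}

/* Count how many pages a pool of pool_size bytes can supply. */
static unsigned long pool_capacity(unsigned long pool_size)
{
	static unsigned char storage[8 * PAGE_SIZE];
	struct pgt_pool pool = { storage, pool_size, 0 };
	unsigned long pages = 0;

	assert(pool_size <= sizeof(storage));

	while (pool_alloc_page(&pool))
		pages++;
	return pages;
}
```

In this model a pool of BOOT_INIT_PGT_SIZE (0x6000) bytes serves six
requests and fails the seventh with offset == size == 0x6000, which is
the state the log shows. A pool of 7*4096 bytes serves one more.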

It crashes on #PF due to stbl->nr_tables dereference in
efi_get_conf_table() called from init_unaccepted_memory().

I don't see anything special about stbl location: 0x775d6018.

One other bit of information: disabling 5-level paging also helps the
issue.
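
For reference, a small sketch (not kernel code) of where the two
addresses in this report fall in the x86-64 page-table hierarchy.
table_index() and first_divergence() are made-up helper names; the
layout of 9 bits per level above a 12-bit page offset is standard x86-64
paging. It only shows which table entries are involved and does not by
itself explain why the pool runs out.

```c
#include <assert.h>
#include <stdint.h>

/* Each x86-64 paging level translates 9 bits; the page offset is 12 bits. */
#define PAGE_SHIFT	12
#define BITS_PER_LEVEL	9

/*
 * Index into the table at the given level for addr. Level 1 is the
 * lowest (PTE) level, level 4 is the top level with 4-level paging and
 * level 5 is the top level with 5-level paging.
 */
static unsigned int table_index(uint64_t addr, int level)
{
	assert(level >= 1 && level <= 5);
	return (unsigned int)((addr >> (PAGE_SHIFT + BITS_PER_LEVEL * (level - 1))) & 0x1ff);
}

/*
 * Walking down from the top level, return the first level at which a
 * and b select different entries, or 0 if they lie in the same 4K page.
 */
static int first_divergence(uint64_t a, uint64_t b, int top)
{
	int level;

	assert(top == 4 || top == 5);

	for (level = top; level >= 1; level--)
		if (table_index(a, level) != table_index(b, level))
			return level;
	return 0;
}
```

For stbl (0x775d6018) and output (0x807b000000) the level-5 index is 0
for both, while the level-4 index is 0 and 1 respectively, so the two
sit under different level-4 entries and cannot share any lower-level
tables.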

I will debug further.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov


* Re: kexec reboot failed due to commit 75d090fd167ac
  2023-09-09 11:32                   ` Kirill A. Shutemov
@ 2023-09-11 14:56                     ` Dave Young
  2023-09-11 14:57                       ` Kirill A. Shutemov
  0 siblings, 1 reply; 20+ messages in thread
From: Dave Young @ 2023-09-11 14:56 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Ard Biesheuvel, Kees Cook, Aaron Lu, Bagas Sanjaya,
	Borislav Petkov, Linux Kernel Mailing List, Linux Regressions,
	kexec

Adding the kexec list to Cc.

On Sat, 9 Sept 2023 at 19:34, Kirill A. Shutemov
<kirill.shutemov@linux.intel.com> wrote:
>
> On Fri, Sep 08, 2023 at 06:17:53PM +0200, Ard Biesheuvel wrote:
> > On Fri, Sep 8, 2023 at 5:58 PM Kees Cook <keescook@chromium.org> wrote:
> > >
> > > On Fri, Sep 08, 2023 at 03:32:33PM +0300, Kirill A. Shutemov wrote:
> > > > On Fri, Sep 08, 2023 at 02:02:30PM +0800, Aaron Lu wrote:
> > > > > On Thu, Sep 07, 2023 at 04:14:09PM +0300, Kirill A. Shutemov wrote:
> > > > > > On Tue, Aug 29, 2023 at 10:04:51PM +0800, Aaron Lu wrote:
> > > > > > > > Could you show dmesg of the first kernel before kexec?
> > > > > > >
> > > > > > > Attached.
> > > > > > >
> > > > > > > BTW, kexec is invoked like this:
> > > > > > > kver=6.4.0-rc5-00009-g75d090fd167a
> > > > > > > kdir=$HOME/kernels/$kver
> > > > > > > sudo kexec -l $kdir/vmlinuz-$kver --initrd=$kdir/initramfs-$kver.img --append="root=UUID=4381321e-e01e-455a-9d46-5e8c4c5b2d02 ro net.ifnames=0 acpi_rsdp=0x728e8014 no_hash_pointers sched_verbose selinux=0"
> > > > > >
> > > > > > I don't understand why it happens.
> > > > > >
> > > > > > Could you check if this patch changes anything:
> > > > > >
> > > > > > diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
> > > > > > index 94b7abcf624b..172c476ff6f3 100644
> > > > > > --- a/arch/x86/boot/compressed/misc.c
> > > > > > +++ b/arch/x86/boot/compressed/misc.c
> > > > > > @@ -456,10 +456,12 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
> > > > > >
> > > > > >   debug_putstr("\nDecompressing Linux... ");
> > > > > >
> > > > > > +#if 0
> > > > > >   if (init_unaccepted_memory()) {
> > > > > >           debug_putstr("Accepting memory... ");
> > > > > >           accept_memory(__pa(output), __pa(output) + needed_size);
> > > > > >   }
> > > > > > +#endif
> > > > > >
> > > > > >   __decompress(input_data, input_len, NULL, NULL, output, output_len,
> > > > > >                   NULL, error);
> > > > > > --
> > > > >
> > > > > It solved the problem.
> > > >
> > > > Looks like increasing BOOT_INIT_PGT_SIZE fixes the issue. I don't yet
> > > > understand why and how unaccepted memory is involved. I will look more
> > > > into it.
> > > >
> > > > Enabling CONFIG_RANDOMIZE_BASE also makes the issue go away.
> > >
> > > Is this perhaps just luck? I.e. does it ever break on, say, 1000 boot
> > > attempts? (i.e. maybe some position is bad and KASLR happens to usually
> > > avoid it?)
>
> Yes, it can be luck.
>
> > > > Kees, maybe you have a clue?
> > >
> > > The only thing I can think of is that something isn't being counted
> > > correctly due to the size of code, and it just happens that this commit
> > > makes the code large enough to exceed some set of mappings?
> > >
> > > >
> > > > diff --git a/arch/x86/include/asm/boot.h b/arch/x86/include/asm/boot.h
> > > > index 9191280d9ea3..26ccce41d781 100644
> > > > --- a/arch/x86/include/asm/boot.h
> > > > +++ b/arch/x86/include/asm/boot.h
> > > > @@ -40,7 +40,7 @@
> > > >  #ifdef CONFIG_X86_64
> > > >  # define BOOT_STACK_SIZE     0x4000
> > > >
> > > > -# define BOOT_INIT_PGT_SIZE  (6*4096)
> > > > +# define BOOT_INIT_PGT_SIZE  (7*4096)
> > >
> > > That's why this might be working, for example? How large is the boot
> > > image before/after the commit, etc?
> > >
> >
> > Not sure why these changes would make a difference here, but choking
> > on accept_memory() on a non-TDX system suggests that
> > init_unaccepted_memory() is poking into unmapped memory before it even
> > decides that the unaccepted memory does not exist.
> >
> > init_unaccepted_memory() has
> >
> >         ret = efi_get_conf_table(boot_params, &cfg_table_pa, &cfg_table_len);
> >         if (ret) {
> >                 warn("EFI config table not found.");
> >                 return false;
> >         }
> >
> > which looks for <guid, phys_addr> tuples in an array pointed to by the
> > EFI system table, and if either of those is not mapped, things can be
> > expected to explode.
> >
> > The only odd thing there is that this code is invoked after setting up
> > the 'demand paging' logic in the decompressor.
> >
> > If you haven't yet, could you please retry the kexec boot with
> > earlyprintk=tty<insert your UART params here>?
>
> early console in extract_kernel
> input_data: 0x000000807eb433a8
> input_len: 0x0000000000d26271
> output: 0x000000807b000000
> output_len: 0x0000000004800c10
> kernel_total_size: 0x0000000003e28000
> needed_size: 0x0000000004a00000
> trampoline_32bit: 0x000000000009d000
>
> Decompressing Linux... out of pgt_buf in arch/x86/boot/compressed/ident_map_64.c!?
> pages->pgt_buf_offset: 0x0000000000006000
> pages->pgt_buf_size: 0x0000000000006000
>
>
> Error: kernel_ident_mapping_init() failed
>
> It crashes on #PF due to stbl->nr_tables dereference in
> efi_get_conf_table() called from init_unaccepted_memory().
>
> I don't see anything special about stbl location: 0x775d6018.
>
> One other bit of information: disabling 5-level paging also helps the
> issue.
>
> I will debug further.
>
> --
>   Kiryl Shutsemau / Kirill A. Shutemov
>



* Re: kexec reboot failed due to commit 75d090fd167ac
  2023-09-11 14:56                     ` Dave Young
@ 2023-09-11 14:57                       ` Kirill A. Shutemov
  2023-09-11 15:33                         ` Tom Lendacky
  2023-09-13 14:24                         ` Kirill A. Shutemov
  0 siblings, 2 replies; 20+ messages in thread
From: Kirill A. Shutemov @ 2023-09-11 14:57 UTC (permalink / raw)
  To: Dave Young
  Cc: Ard Biesheuvel, Kees Cook, Aaron Lu, Bagas Sanjaya,
	Borislav Petkov, Linux Kernel Mailing List, Linux Regressions,
	kexec, Tom Lendacky

On Mon, Sep 11, 2023 at 10:56:36PM +0800, Dave Young wrote:
> > early console in extract_kernel
> > input_data: 0x000000807eb433a8
> > input_len: 0x0000000000d26271
> > output: 0x000000807b000000
> > output_len: 0x0000000004800c10
> > kernel_total_size: 0x0000000003e28000
> > needed_size: 0x0000000004a00000
> > trampoline_32bit: 0x000000000009d000
> >
> > Decompressing Linux... out of pgt_buf in arch/x86/boot/compressed/ident_map_64.c!?
> > pages->pgt_buf_offset: 0x0000000000006000
> > pages->pgt_buf_size: 0x0000000000006000
> >
> >
> > Error: kernel_ident_mapping_init() failed
> >
> > It crashes on #PF due to stbl->nr_tables dereference in
> > efi_get_conf_table() called from init_unaccepted_memory().
> >
> > I don't see anything special about stbl location: 0x775d6018.
> >
> > One other bit of information: disabling 5-level paging also helps the
> > issue.
> >
> > I will debug further.

The problem is not limited to unaccepted memory, it also triggers if we
reach efi_get_rsdp_addr() in the same setup.

I think we have several problems here.

- 6 pages for !RANDOMIZE_BASE is only enough for kernel, cmdline,
  boot_data and setup_data if we assume that they are in different 1G
  regions and do not cross the 1G boundaries. 4-level paging: 1 for PGD, 1
  for PUD, 4 for PMD tables.

  Looks like we never map EFI/ACPI memory explicitly.

  It might work if kernel/cmdline/... are in single 1G and we have
  spare pages to handle page faults.

- No spare memory to handle mapping for cc_info and cc_info->cpuid_phys;

- I didn't increase BOOT_INIT_PGT_SIZE when I added 5-level paging support.
  And if we start page tables from scratch ('else' case of
  'if (p4d_offset...)'), we run out of memory.

I believe similar logic would apply for BOOT_PGT_SIZE for RANDOMIZE_BASE=y
case.
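
As a rough illustration of the counting above (a simplified model, not the
decompressor's code): with 2M mappings, an identity map needs one top-level
table, one PUD table per 512G block touched, one PMD table per 1G block
touched and, with 5-level paging, one P4D table per 256T block touched. The
names struct region, count_blocks() and pgt_pages_needed() are made up for
this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BLOCKS 64	/* arbitrary limit, plenty for a handful of regions */

/* Physical range [start, end) that must be identity-mapped (hypothetical type). */
struct region {
	uint64_t start;
	uint64_t end;
};

/* Count the distinct aligned blocks of size (1 << shift) touched by any region. */
static size_t count_blocks(const struct region *r, size_t n, unsigned int shift)
{
	uint64_t seen[MAX_BLOCKS];
	size_t nseen = 0;

	for (size_t i = 0; i < n; i++) {
		assert(r[i].end > r[i].start);

		uint64_t first = r[i].start >> shift;
		uint64_t last = (r[i].end - 1) >> shift;

		for (uint64_t idx = first; idx <= last; idx++) {
			size_t j;

			/* Linear search: has this block been counted already? */
			for (j = 0; j < nseen; j++)
				if (seen[j] == idx)
					break;
			if (j == nseen) {
				assert(nseen < MAX_BLOCKS);
				seen[nseen++] = idx;
			}
		}
	}
	return nseen;
}

/* Page-table pages needed when building the identity map from scratch. */
size_t pgt_pages_needed(const struct region *r, size_t n, int five_level)
{
	size_t pages = 1;				/* top-level table (PGD) */

	if (five_level)
		pages += count_blocks(r, n, 48);	/* one P4D table per 256T */
	pages += count_blocks(r, n, 39);		/* one PUD table per 512G */
	pages += count_blocks(r, n, 30);		/* one PMD table per 1G */
	return pages;
}
```

Under this model, four small regions in four different 1G blocks of the same
512G come to 6 pages with 4-level paging and 7 with 5-level paging. Feeding in
only the kernel output range from the log above (0x807b000000 plus needed_size
0x4a00000) and a few bytes at stbl (0x775d6018) already gives 5 and 6 pages
respectively, because the kernel sits above 512G while stbl sits below it.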

I don't know what the right fix is here. We can increase the constants to be
enough to cover existing cases, but that is very fragile. I am not sure I
saw all users. Some of them could be silently handled by the page fault
handler in some setups. And it is hard to catch new users during code review.

Also, I'm not sure why we need the page fault handler there. It looks like
it is just masking problems. I think everything has to be mapped explicitly.

Any comments?

-- 
  Kiryl Shutsemau / Kirill A. Shutemov


* Re: kexec reboot failed due to commit 75d090fd167ac
  2023-09-11 14:57                       ` Kirill A. Shutemov
@ 2023-09-11 15:33                         ` Tom Lendacky
  2023-09-11 15:53                           ` Kirill A. Shutemov
  2023-09-13 14:24                         ` Kirill A. Shutemov
  1 sibling, 1 reply; 20+ messages in thread
From: Tom Lendacky @ 2023-09-11 15:33 UTC (permalink / raw)
  To: Kirill A. Shutemov, Dave Young
  Cc: Ard Biesheuvel, Kees Cook, Aaron Lu, Bagas Sanjaya,
	Borislav Petkov, Linux Kernel Mailing List, Linux Regressions,
	kexec

On 9/11/23 09:57, Kirill A. Shutemov wrote:
> On Mon, Sep 11, 2023 at 10:56:36PM +0800, Dave Young wrote:
>>> early console in extract_kernel
>>> input_data: 0x000000807eb433a8
>>> input_len: 0x0000000000d26271
>>> output: 0x000000807b000000
>>> output_len: 0x0000000004800c10
>>> kernel_total_size: 0x0000000003e28000
>>> needed_size: 0x0000000004a00000
>>> trampoline_32bit: 0x000000000009d000
>>>
>>> Decompressing Linux... out of pgt_buf in arch/x86/boot/compressed/ident_map_64.c!?
>>> pages->pgt_buf_offset: 0x0000000000006000
>>> pages->pgt_buf_size: 0x0000000000006000
>>>
>>>
>>> Error: kernel_ident_mapping_init() failed
>>>
>>> It crashes on #PF due to stbl->nr_tables dereference in
>>> efi_get_conf_table() called from init_unaccepted_memory().
>>>
>>> I don't see anything special about stbl location: 0x775d6018.
>>>
>>> One other bit of information: disabling 5-level paging also helps the
>>> issue.
>>>
>>> I will debug further.
> 
> The problem is not limited to unaccepted memory, it also triggers if we
> reach efi_get_rsdp_addr() in the same setup.
> 
> I think we have several problems here.
> 
> - 6 pages for !RANDOMIZE_BASE is only enough for kernel, cmdline,
>    boot_data and setup_data if we assume that they are in different 1G
>    regions and do not cross the 1G boundaries. 4-level paging: 1 for PGD, 1
>    for PUD, 4 for PMD tables.
> 
>    Looks like we never map EFI/ACPI memory explicitly.
> 
>    It might work if kernel/cmdline/... are in single 1G and we have
>    spare pages to handle page faults.
> 
> - No spare memory to handle mapping for cc_info and cc_info->cpuid_phys;
> 
> - I didn't increase BOOT_INIT_PGT_SIZE when I added 5-level paging support.
>    And if we start page tables from scratch ('else' case of
>    'if (p4d_offset...)'), we run out of memory.
> 
> I believe similar logic would apply for BOOT_PGT_SIZE for RANDOMIZE_BASE=y
> case.
> 
> I don't know what the right fix is here. We can increase the constants to be
> enough to cover existing cases, but that is very fragile. I am not sure I
> saw all users. Some of them could be silently handled by the page fault
> handler in some setups. And it is hard to catch new users during code review.
> 
> Also, I'm not sure why we need the page fault handler there. It looks like
> it is just masking problems. I think everything has to be mapped explicitly.
> 
> Any comments?

There was a similar related issue around the cc_info blob that is captured 
here: https://lore.kernel.org/lkml/20230601072043.24439-1-ltao@redhat.com/

Personally, I'm a fan of mapping the EFI tables that will be passed to the 
kexec/kdump kernel. To me, that seems to more closely match the valid 
mappings for the tables when control is transferred to the OS from UEFI on 
the initial boot.

Thanks,
Tom



* Re: kexec reboot failed due to commit 75d090fd167ac
  2023-09-11 15:33                         ` Tom Lendacky
@ 2023-09-11 15:53                           ` Kirill A. Shutemov
  2023-09-11 17:13                             ` Tom Lendacky
  0 siblings, 1 reply; 20+ messages in thread
From: Kirill A. Shutemov @ 2023-09-11 15:53 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: Dave Young, Ard Biesheuvel, Kees Cook, Aaron Lu, Bagas Sanjaya,
	Borislav Petkov, Linux Kernel Mailing List, Linux Regressions,
	kexec

On Mon, Sep 11, 2023 at 10:33:01AM -0500, Tom Lendacky wrote:
> On 9/11/23 09:57, Kirill A. Shutemov wrote:
> > On Mon, Sep 11, 2023 at 10:56:36PM +0800, Dave Young wrote:
> > > > early console in extract_kernel
> > > > input_data: 0x000000807eb433a8
> > > > input_len: 0x0000000000d26271
> > > > output: 0x000000807b000000
> > > > output_len: 0x0000000004800c10
> > > > kernel_total_size: 0x0000000003e28000
> > > > needed_size: 0x0000000004a00000
> > > > trampoline_32bit: 0x000000000009d000
> > > > 
> > > > Decompressing Linux... out of pgt_buf in arch/x86/boot/compressed/ident_map_64.c!?
> > > > pages->pgt_buf_offset: 0x0000000000006000
> > > > pages->pgt_buf_size: 0x0000000000006000
> > > > 
> > > > 
> > > > Error: kernel_ident_mapping_init() failed
> > > > 
> > > > It crashes on #PF due to stbl->nr_tables dereference in
> > > > efi_get_conf_table() called from init_unaccepted_memory().
> > > > 
> > > > I don't see anything special about stbl location: 0x775d6018.
> > > > 
> > > > One other bit of information: disabling 5-level paging also helps the
> > > > issue.
> > > > 
> > > > I will debug further.
> > 
> > The problem is not limited to unaccepted memory, it also triggers if we
> > reach efi_get_rsdp_addr() in the same setup.
> > 
> > I think we have several problems here.
> > 
> > - 6 pages for !RANDOMIZE_BASE is only enough for kernel, cmdline,
> >    boot_data and setup_data if we assume that they are in different 1G
> >    regions and do not cross the 1G boundaries. 4-level paging: 1 for PGD, 1
> >    for PUD, 4 for PMD tables.
> > 
> >    Looks like we never map EFI/ACPI memory explicitly.
> > 
> >    It might work if kernel/cmdline/... are in single 1G and we have
> >    spare pages to handle page faults.
> > 
> > - No spare memory to handle mapping for cc_info and cc_info->cpuid_phys;
> > 
> > - I didn't increase BOOT_INIT_PGT_SIZE when I added 5-level paging support.
> >    And if we start page tables from scratch ('else' case of
> >    'if (p4d_offset...)'), we run out of memory.
> > 
> > I believe similar logic would apply for BOOT_PGT_SIZE for RANDOMIZE_BASE=y
> > case.
> > 
> > I don't know what the right fix is here. We can increase the constants to be
> > enough to cover existing cases, but that is very fragile. I am not sure I
> > saw all users. Some of them could be silently handled by the page fault
> > handler in some setups. And it is hard to catch new users during code review.
> > 
> > Also, I'm not sure why we need the page fault handler there. It looks like
> > it is just masking problems. I think everything has to be mapped explicitly.
> > 
> > Any comments?
> 
> There was a similar related issue around the cc_info blob that is captured
> here: https://lore.kernel.org/lkml/20230601072043.24439-1-ltao@redhat.com/
> 
> Personally, I'm a fan of mapping the EFI tables that will be passed to the
> kexec/kdump kernel. To me, that seems to more closely match the valid
> mappings for the tables when control is transferred to the OS from UEFI on
> the initial boot.

I don't see how it would help if initialize_identity_maps() resets
pagetables. See 'else' case of 'if (p4d_offset...).

-- 
  Kiryl Shutsemau / Kirill A. Shutemov


* Re: kexec reboot failed due to commit 75d090fd167ac
  2023-09-11 15:53                           ` Kirill A. Shutemov
@ 2023-09-11 17:13                             ` Tom Lendacky
  0 siblings, 0 replies; 20+ messages in thread
From: Tom Lendacky @ 2023-09-11 17:13 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Dave Young, Ard Biesheuvel, Kees Cook, Aaron Lu, Bagas Sanjaya,
	Borislav Petkov, Linux Kernel Mailing List, Linux Regressions,
	kexec

On 9/11/23 10:53, Kirill A. Shutemov wrote:
> On Mon, Sep 11, 2023 at 10:33:01AM -0500, Tom Lendacky wrote:
>> On 9/11/23 09:57, Kirill A. Shutemov wrote:
>>> On Mon, Sep 11, 2023 at 10:56:36PM +0800, Dave Young wrote:
>>>>> early console in extract_kernel
>>>>> input_data: 0x000000807eb433a8
>>>>> input_len: 0x0000000000d26271
>>>>> output: 0x000000807b000000
>>>>> output_len: 0x0000000004800c10
>>>>> kernel_total_size: 0x0000000003e28000
>>>>> needed_size: 0x0000000004a00000
>>>>> trampoline_32bit: 0x000000000009d000
>>>>>
>>>>> Decompressing Linux... out of pgt_buf in arch/x86/boot/compressed/ident_map_64.c!?
>>>>> pages->pgt_buf_offset: 0x0000000000006000
>>>>> pages->pgt_buf_size: 0x0000000000006000
>>>>>
>>>>>
>>>>> Error: kernel_ident_mapping_init() failed
>>>>>
>>>>> It crashes on #PF due to stbl->nr_tables dereference in
>>>>> efi_get_conf_table() called from init_unaccepted_memory().
>>>>>
>>>>> I don't see anything special about stbl location: 0x775d6018.
>>>>>
>>>>> One other bit of information: disabling 5-level paging also helps the
>>>>> issue.
>>>>>
>>>>> I will debug further.
>>>
>>> The problem is not limited to unaccepted memory, it also triggers if we
>>> reach efi_get_rsdp_addr() in the same setup.
>>>
>>> I think we have several problems here.
>>>
>>> - 6 pages for !RANDOMIZE_BASE is only enough for kernel, cmdline,
>>>     boot_data and setup_data if we assume that they are in different 1G
>>>     regions and do not cross the 1G boundaries. 4-level paging: 1 for PGD, 1
>>>     for PUD, 4 for PMD tables.
>>>
>>>     Looks like we never map EFI/ACPI memory explicitly.
>>>
>>>     It might work if kernel/cmdline/... are in single 1G and we have
>>>     spare pages to handle page faults.
>>>
>>> - No spare memory to handle mapping for cc_info and cc_info->cpuid_phys;
>>>
>>> - I didn't increase BOOT_INIT_PGT_SIZE when I added 5-level paging support.
>>>     And if we start page tables from scratch ('else' case of
>>>     'if (p4d_offset...)'), we run out of memory.
>>>
>>> I believe similar logic would apply for BOOT_PGT_SIZE for RANDOMIZE_BASE=y
>>> case.
>>>
>>> I don't know what the right fix is here. We can increase the constants to be
>>> enough to cover existing cases, but that is very fragile. I am not sure I
>>> saw all users. Some of them could be silently handled by the page fault
>>> handler in some setups. And it is hard to catch new users during code review.
>>>
>>> Also, I'm not sure why we need the page fault handler there. It looks like
>>> it is just masking problems. I think everything has to be mapped explicitly.
>>>
>>> Any comments?
>>
>> There was a similar related issue around the cc_info blob that is captured
>> here: https://lore.kernel.org/lkml/20230601072043.24439-1-ltao@redhat.com/
>>
>> Personally, I'm a fan of mapping the EFI tables that will be passed to the
>> kexec/kdump kernel. To me, that seems to more closely match the valid
>> mappings for the tables when control is transferred to the OS from UEFI on
>> the initial boot.
> 
> I don't see how it would help if initialize_identity_maps() resets
> pagetables. See 'else' case of 'if (p4d_offset...).

Ok, I see what you mean now.

Thanks,
Tom



* Re: kexec reboot failed due to commit 75d090fd167ac
  2023-09-11 14:57                       ` Kirill A. Shutemov
  2023-09-11 15:33                         ` Tom Lendacky
@ 2023-09-13 14:24                         ` Kirill A. Shutemov
  2023-09-21  9:54                           ` Linux regression tracking (Thorsten Leemhuis)
  1 sibling, 1 reply; 20+ messages in thread
From: Kirill A. Shutemov @ 2023-09-13 14:24 UTC (permalink / raw)
  To: Dave Young
  Cc: Ard Biesheuvel, Kees Cook, Aaron Lu, Bagas Sanjaya,
	Borislav Petkov, Linux Kernel Mailing List, Linux Regressions,
	kexec, Tom Lendacky, x86

On Mon, Sep 11, 2023 at 05:57:07PM +0300, Kirill A. Shutemov wrote:
> On Mon, Sep 11, 2023 at 10:56:36PM +0800, Dave Young wrote:
> > > early console in extract_kernel
> > > input_data: 0x000000807eb433a8
> > > input_len: 0x0000000000d26271
> > > output: 0x000000807b000000
> > > output_len: 0x0000000004800c10
> > > kernel_total_size: 0x0000000003e28000
> > > needed_size: 0x0000000004a00000
> > > trampoline_32bit: 0x000000000009d000
> > >
> > > Decompressing Linux... out of pgt_buf in arch/x86/boot/compressed/ident_map_64.c!?
> > > pages->pgt_buf_offset: 0x0000000000006000
> > > pages->pgt_buf_size: 0x0000000000006000
> > >
> > >
> > > Error: kernel_ident_mapping_init() failed
> > >
> > > It crashes on #PF due to stbl->nr_tables dereference in
> > > efi_get_conf_table() called from init_unaccepted_memory().
> > >
> > > I don't see anything special about stbl location: 0x775d6018.
> > >
> > > One other bit of information: disabling 5-level paging also helps the
> > > issue.
> > >
> > > I will debug further.
> 
> The problem is not limited to unaccepted memory, it also triggers if we
> reach efi_get_rsdp_addr() in the same setup.
> 
> I think we have several problems here.
> 
> - 6 pages for !RANDOMIZE_BASE is only enough for kernel, cmdline,
>   boot_data and setup_data if we assume that they are in different 1G
>   regions and do not cross the 1G boundaries. 4-level paging: 1 for PGD, 1
>   for PUD, 4 for PMD tables.
> 
>   Looks like we never map EFI/ACPI memory explicitly.
> 
>   It might work if kernel/cmdline/... are in single 1G and we have
>   spare pages to handle page faults.
> 
> - No spare memory to handle mapping for cc_info and cc_info->cpuid_phys;
> 
> - I didn't increase BOOT_INIT_PGT_SIZE when I added 5-level paging support.
>   And if we start page tables from scratch ('else' case of
>   'if (p4d_offset...)'), we run out of memory.
> 
> I believe similar logic would apply for BOOT_PGT_SIZE for RANDOMIZE_BASE=y
> case.
> 
> I don't know what the right fix is here. We can increase the constants to be
> enough to cover existing cases, but that is very fragile. I am not sure I
> saw all users. Some of them could be silently handled by the page fault
> handler in some setups. And it is hard to catch new users during code review.
> 
> Also, I'm not sure why we need the page fault handler there. It looks like
> it is just masking problems. I think everything has to be mapped explicitly.
> 
> Any comments?

I struggle to come up with anything better than increasing the constant to
a value that "ought to be enough for anybody" ©, let's say 128K.

And we can eliminate logic on no-KASLR vs. KASLR vs. KASLR+VERBOSE_BOOTUP.
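
For scale, 128K is 32 page-table pages versus the 6 that BOOT_INIT_PGT_SIZE
provided. A minimal userspace model of a bump allocator over a fixed buffer
shows the failure mode in the log quoted above, where offset and size are both
0x6000 when the next page is requested. This is only a sketch: struct pgt_pool,
pool_alloc_page() and pool_capacity_pages() are hypothetical names, not the
code in ident_map_64.c.

```c
#include <assert.h>
#include <stddef.h>

#define PGT_PAGE_SIZE 4096UL

/* Fixed-size pool of page-table pages handed out in order (hypothetical). */
struct pgt_pool {
	unsigned char *buf;	/* backing storage */
	size_t size;		/* total bytes in buf */
	size_t offset;		/* bytes already handed out */
};

/* Return the next free page, or NULL once the pool is exhausted. */
void *pool_alloc_page(struct pgt_pool *pool)
{
	void *page;

	assert(pool->size % PGT_PAGE_SIZE == 0);

	/* Nothing is ever freed, so offset == size means "out of pages". */
	if (pool->offset >= pool->size)
		return NULL;

	page = pool->buf + pool->offset;
	pool->offset += PGT_PAGE_SIZE;
	return page;
}

/* Number of page-table pages a pool of the given byte size can hand out. */
size_t pool_capacity_pages(size_t size)
{
	return size / PGT_PAGE_SIZE;
}
```

With a 6-page buffer, the seventh pool_alloc_page() call returns NULL with
offset == size == 0x6000, and pool_capacity_pages(128 * 1024) is 32.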

Objections?

-- 
  Kiryl Shutsemau / Kirill A. Shutemov


* Re: kexec reboot failed due to commit 75d090fd167ac
  2023-09-13 14:24                         ` Kirill A. Shutemov
@ 2023-09-21  9:54                           ` Linux regression tracking (Thorsten Leemhuis)
  2023-09-21 16:03                             ` Kirill A. Shutemov
  0 siblings, 1 reply; 20+ messages in thread
From: Linux regression tracking (Thorsten Leemhuis) @ 2023-09-21  9:54 UTC (permalink / raw)
  To: Kirill A. Shutemov, Dave Young
  Cc: Ard Biesheuvel, Kees Cook, Aaron Lu, Bagas Sanjaya,
	Borislav Petkov, Linux Kernel Mailing List, Linux Regressions,
	kexec, Tom Lendacky, x86

On 13.09.23 16:24, Kirill A. Shutemov wrote:
> On Mon, Sep 11, 2023 at 05:57:07PM +0300, Kirill A. Shutemov wrote:
>> On Mon, Sep 11, 2023 at 10:56:36PM +0800, Dave Young wrote:
>>>> early console in extract_kernel
>>>> input_data: 0x000000807eb433a8
>>>> input_len: 0x0000000000d26271
>>>> output: 0x000000807b000000
>>>> output_len: 0x0000000004800c10
>>>> kernel_total_size: 0x0000000003e28000
>>>> needed_size: 0x0000000004a00000
>>>> trampoline_32bit: 0x000000000009d000
>>>>
>>>> Decompressing Linux... out of pgt_buf in arch/x86/boot/compressed/ident_map_64.c!?
>>>> pages->pgt_buf_offset: 0x0000000000006000
>>>> pages->pgt_buf_size: 0x0000000000006000
>>>>
>>>> Error: kernel_ident_mapping_init() failed
> [...]
>> The problem is not limited to unaccepted memory, it also triggers if we
>> reach efi_get_rsdp_addr() in the same setup.
>>
>> I think we have several problems here.
> [...]
>> Any comments?
> 
> I struggle to come up with anything better than increasing the constant to
> a value that "ought to be enough for anybody" ©, let's say 128K.
> 
> And we can eliminate logic on no-KASLR vs. KASLR vs. KASLR+VERBOSE_BOOTUP.
> 
> Objections?

Apparently not, as there was no reply since then (which is why I show up
here, as it looked like fixing this regression stalled).

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke


* Re: kexec reboot failed due to commit 75d090fd167ac
  2023-09-21  9:54                           ` Linux regression tracking (Thorsten Leemhuis)
@ 2023-09-21 16:03                             ` Kirill A. Shutemov
  2023-09-22 10:12                               ` Linux regression tracking #update (Thorsten Leemhuis)
  0 siblings, 1 reply; 20+ messages in thread
From: Kirill A. Shutemov @ 2023-09-21 16:03 UTC (permalink / raw)
  To: Linux regressions mailing list
  Cc: Dave Young, Ard Biesheuvel, Kees Cook, Aaron Lu, Bagas Sanjaya,
	Borislav Petkov, Linux Kernel Mailing List, kexec, Tom Lendacky,
	x86

On Thu, Sep 21, 2023 at 11:54:15AM +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
> On 13.09.23 16:24, Kirill A. Shutemov wrote:
> > On Mon, Sep 11, 2023 at 05:57:07PM +0300, Kirill A. Shutemov wrote:
> >> On Mon, Sep 11, 2023 at 10:56:36PM +0800, Dave Young wrote:
> >>>> early console in extract_kernel
> >>>> input_data: 0x000000807eb433a8
> >>>> input_len: 0x0000000000d26271
> >>>> output: 0x000000807b000000
> >>>> output_len: 0x0000000004800c10
> >>>> kernel_total_size: 0x0000000003e28000
> >>>> needed_size: 0x0000000004a00000
> >>>> trampoline_32bit: 0x000000000009d000
> >>>>
> >>>> Decompressing Linux... out of pgt_buf in arch/x86/boot/compressed/ident_map_64.c!?
> >>>> pages->pgt_buf_offset: 0x0000000000006000
> >>>> pages->pgt_buf_size: 0x0000000000006000
> >>>>
> >>>> Error: kernel_ident_mapping_init() failed
> > [...]
> >> The problem is not limited to unaccepted memory, it also triggers if we
> >> reach efi_get_rsdp_addr() in the same setup.
> >>
> >> I think we have several problems here.
> > [...]
> >> Any comments?
> > 
> > I struggle to come up with anything better than increasing the constant to
> > a value that "ought to be enough for anybody" ©, let's say 128K.
> > 
> > And we can eliminate logic on no-KASLR vs. KASLR vs. KASLR+VERBOSE_BOOTUP.
> > 
> > Objections?
> 
> Apparently not, as there was no reply since then (which is why I show up
> here, as it looked like fixing this regression stalled).

It has been fixed upstream by commit f530ee95b72e
("x86/boot/compressed: Reserve more memory for page tables").

-- 
  Kiryl Shutsemau / Kirill A. Shutemov


* Re: kexec reboot failed due to commit 75d090fd167ac
  2023-09-21 16:03                             ` Kirill A. Shutemov
@ 2023-09-22 10:12                               ` Linux regression tracking #update (Thorsten Leemhuis)
  0 siblings, 0 replies; 20+ messages in thread
From: Linux regression tracking #update (Thorsten Leemhuis) @ 2023-09-22 10:12 UTC (permalink / raw)
  To: Kirill A. Shutemov, Linux regressions mailing list
  Cc: Dave Young, Ard Biesheuvel, Kees Cook, Aaron Lu, Bagas Sanjaya,
	Borislav Petkov, Linux Kernel Mailing List, kexec, Tom Lendacky,
	x86

On 21.09.23 18:03, Kirill A. Shutemov wrote:
> On Thu, Sep 21, 2023 at 11:54:15AM +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
>> On 13.09.23 16:24, Kirill A. Shutemov wrote:
>>> On Mon, Sep 11, 2023 at 05:57:07PM +0300, Kirill A. Shutemov wrote:
>>>> On Mon, Sep 11, 2023 at 10:56:36PM +0800, Dave Young wrote:
>>>>>> early console in extract_kernel
>>>>>> input_data: 0x000000807eb433a8
>>>>>> input_len: 0x0000000000d26271
>>>>>> output: 0x000000807b000000
>>>>>> output_len: 0x0000000004800c10
>>>>>> kernel_total_size: 0x0000000003e28000
>>>>>> needed_size: 0x0000000004a00000
>>>>>> trampoline_32bit: 0x000000000009d000
>>>>>>
>>>>>> Decompressing Linux... out of pgt_buf in arch/x86/boot/compressed/ident_map_64.c!?
>>>>>> pages->pgt_buf_offset: 0x0000000000006000
>>>>>> pages->pgt_buf_size: 0x0000000000006000
>>>>>>
>>>>>> Error: kernel_ident_mapping_init() failed
>>> [...]
>>>> The problem is not limited to unaccepted memory, it also triggers if we
>>>> reach efi_get_rsdp_addr() in the same setup.
>>>>
>>>> I think we have several problems here.
>>> [...]
>>>> Any comments?
>>>
>>> I struggle to come up with anything better than increasing the constant to
>>> a value that "ought to be enough for anybody" ©, let's say 128K.
>>>
>>> And we can eliminate logic on no-KASLR vs. KASLR vs. KASLR+VERBOSE_BOOTUP.
>>>
>>> Objections?
>>
>> Apparently not, as there was no reply since then (which is why I show up
>> here, as it looked like fixing this regression stalled).
> 
> It has been fixed upstream by commit f530ee95b72e
> ("x86/boot/compressed: Reserve more memory for page tables").

Ahh, great, thx for letting me know. That commit sadly lacked a Link: or
Closes: tag pointing to the regression report, which Linus and the docs ask
for (and regression tracking relies on); with one, regression tracking would
have noticed the fix automatically. Whatever, things happen, thx again.

#regzbot fix: f530ee95b72e

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.


Thread overview: 20+ messages
2023-08-29 11:48 kexec reboot failed due to commit 75d090fd167ac Aaron Lu
2023-08-29 12:14 ` Bagas Sanjaya
2023-08-29 12:51   ` Aaron Lu
2023-08-29 12:59     ` Kirill A. Shutemov
2023-08-29 14:04       ` Aaron Lu
2023-09-07 13:14         ` Kirill A. Shutemov
2023-09-08  6:02           ` Aaron Lu
2023-09-08 12:32             ` Kirill A. Shutemov
2023-09-08 15:58               ` Kees Cook
2023-09-08 16:17                 ` Ard Biesheuvel
2023-09-09 11:32                   ` Kirill A. Shutemov
2023-09-11 14:56                     ` Dave Young
2023-09-11 14:57                       ` Kirill A. Shutemov
2023-09-11 15:33                         ` Tom Lendacky
2023-09-11 15:53                           ` Kirill A. Shutemov
2023-09-11 17:13                             ` Tom Lendacky
2023-09-13 14:24                         ` Kirill A. Shutemov
2023-09-21  9:54                           ` Linux regression tracking (Thorsten Leemhuis)
2023-09-21 16:03                             ` Kirill A. Shutemov
2023-09-22 10:12                               ` Linux regression tracking #update (Thorsten Leemhuis)
