* Regression for PXE boot from patch "Remove the 'bugger off' message" in stable 6.6.18
@ 2025-03-04 14:49 Ulrich Gemkow
2025-03-04 16:20 ` Greg KH
` (2 more replies)
0 siblings, 3 replies; 13+ messages in thread
From: Ulrich Gemkow @ 2025-03-04 14:49 UTC (permalink / raw)
To: stable; +Cc: regressions, ardb
[-- Attachment #1: Type: text/plain, Size: 1617 bytes --]
Hello,
starting with stable kernel 6.6.18 we have problems with PXE booting.
A bisect shows that the following patch is guilty:
From 768171d7ebbce005210e1cf8456f043304805c15 Mon Sep 17 00:00:00 2001
From: Ard Biesheuvel <ardb@kernel.org>
Date: Tue, 12 Sep 2023 09:00:55 +0000
Subject: x86/boot: Remove the 'bugger off' message
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Link: https://lore.kernel.org/r/20230912090051.4014114-21-ardb@google.com
With this patch applied, PXE starts and requests the kernel and the initrd.
The boot process then stops without showing anything on the console.
It seems that the kernel crashes very early.
With stable kernel 6.6.17 PXE boot works without problems.
Reverting this single patch (which is part of a larger set of
patches) solved the problem for us; PXE boot is working again.
We use the packages syslinux-efi and syslinux-common from Debian 12.
The boot files used are /efi64/syslinux.efi and /ldlinux.e64.
Our config file (for 6.6.80) is attached.
Regarding the patch description, we really do not boot with a floppy :-)
Any help would be greatly appreciated; I have a bit of a bad feeling
about simply reverting a patch at such a deep level in the kernel.
Thank you and best regards
Ulrich
--
|-----------------------------------------------------------------------
| Ulrich Gemkow
| University of Stuttgart
| Institute of Communication Networks and Computer Engineering (IKR)
|-----------------------------------------------------------------------
[-- Attachment #2: config.xz --]
[-- Type: application/x-xz, Size: 25532 bytes --]
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Regression for PXE boot from patch "Remove the 'bugger off' message" in stable 6.6.18
2025-03-04 14:49 Regression for PXE boot from patch "Remove the 'bugger off' message" in stable 6.6.18 Ulrich Gemkow
@ 2025-03-04 16:20 ` Greg KH
2025-03-04 16:59 ` Ulrich Gemkow
2025-03-06 10:00 ` Ard Biesheuvel
2025-03-06 14:36 ` Ard Biesheuvel
2 siblings, 1 reply; 13+ messages in thread
From: Greg KH @ 2025-03-04 16:20 UTC (permalink / raw)
To: Ulrich Gemkow; +Cc: stable, regressions, ardb
On Tue, Mar 04, 2025 at 03:49:35PM +0100, Ulrich Gemkow wrote:
> Hello,
>
> starting with stable kernel 6.6.18 we have problems with PXE booting.
> A bisect shows that the following patch is guilty:
>
> From 768171d7ebbce005210e1cf8456f043304805c15 Mon Sep 17 00:00:00 2001
> From: Ard Biesheuvel <ardb@kernel.org>
> Date: Tue, 12 Sep 2023 09:00:55 +0000
> Subject: x86/boot: Remove the 'bugger off' message
>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> Signed-off-by: Ingo Molnar <mingo@kernel.org>
> Acked-by: H. Peter Anvin (Intel) <hpa@zytor.com>
> Link: https://lore.kernel.org/r/20230912090051.4014114-21-ardb@google.com
>
> With this patch applied PXE starts, requests the kernel and the initrd.
> Without showing anything on the console, the boot process stops.
> It seems, that the kernel crashes very early.
>
> With stable kernel 6.6.17 PXE boot works without problems.
>
> Reverting this single patch (which is part of a larger set of
> patches) solved the problem for us, PXE boot is working again.
>
> We use the packages syslinux-efi and syslinux-common from Debian 12.
> The used boot files are /efi64/syslinux.efi and /ldlinux.e64.
>
> Our config-File (for 6.6.80) is attached.
>
> Regarding the patch description, we really do not boot with a floppy :-)
>
> Any help would be greatly appreciated, I have a bit of a bad feeling
> about simply reverting a patch at such a deep level in the kernel.
Do kernels newer than 6.7.y work properly? What about the latest
6.12.y release?
thanks,
greg k-h
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Regression for PXE boot from patch "Remove the 'bugger off' message" in stable 6.6.18
2025-03-04 16:20 ` Greg KH
@ 2025-03-04 16:59 ` Ulrich Gemkow
2025-03-04 17:40 ` Greg KH
0 siblings, 1 reply; 13+ messages in thread
From: Ulrich Gemkow @ 2025-03-04 16:59 UTC (permalink / raw)
To: Greg KH; +Cc: stable, regressions, ardb
Hello,
On Tuesday 04 March 2025, Greg KH wrote:
> On Tue, Mar 04, 2025 at 03:49:35PM +0100, Ulrich Gemkow wrote:
> > Hello,
> >
> > starting with stable kernel 6.6.18 we have problems with PXE booting.
> > A bisect shows that the following patch is guilty:
> >
> > From 768171d7ebbce005210e1cf8456f043304805c15 Mon Sep 17 00:00:00 2001
> > From: Ard Biesheuvel <ardb@kernel.org>
> > Date: Tue, 12 Sep 2023 09:00:55 +0000
> > Subject: x86/boot: Remove the 'bugger off' message
> >
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > Signed-off-by: Ingo Molnar <mingo@kernel.org>
> > Acked-by: H. Peter Anvin (Intel) <hpa@zytor.com>
> > Link: https://lore.kernel.org/r/20230912090051.4014114-21-ardb@google.com
> >
> > With this patch applied PXE starts, requests the kernel and the initrd.
> > Without showing anything on the console, the boot process stops.
> > It seems, that the kernel crashes very early.
> >
> > With stable kernel 6.6.17 PXE boot works without problems.
> >
> > Reverting this single patch (which is part of a larger set of
> > patches) solved the problem for us, PXE boot is working again.
> >
> > We use the packages syslinux-efi and syslinux-common from Debian 12.
> > The used boot files are /efi64/syslinux.efi and /ldlinux.e64.
> >
> > Our config-File (for 6.6.80) is attached.
> >
> > Regarding the patch description, we really do not boot with a floppy :-)
> >
> > Any help would be greatly appreciated, I have a bit of a bad feeling
> > about simply reverting a patch at such a deep level in the kernel.
>
> Does newer kernels than 6.7.y work properly? What about the latest
> 6.12.y release?
>
> thanks,
>
> greg k-h
>
Thanks for looking into this!
The latest 6.12.y kernel has the same problem; it also needs the
mentioned patch reverted. I did not test kernels in between, but I am
happy to do so if that would give a hint.
Thanks again and best regards
Ulrich
--
|-----------------------------------------------------------------------
| Ulrich Gemkow
| University of Stuttgart
| Institute of Communication Networks and Computer Engineering (IKR)
|-----------------------------------------------------------------------
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Regression for PXE boot from patch "Remove the 'bugger off' message" in stable 6.6.18
2025-03-04 16:59 ` Ulrich Gemkow
@ 2025-03-04 17:40 ` Greg KH
0 siblings, 0 replies; 13+ messages in thread
From: Greg KH @ 2025-03-04 17:40 UTC (permalink / raw)
To: Ulrich Gemkow; +Cc: stable, regressions, ardb
On Tue, Mar 04, 2025 at 05:59:32PM +0100, Ulrich Gemkow wrote:
> Hallo,
>
> On Tuesday 04 March 2025, Greg KH wrote:
> > On Tue, Mar 04, 2025 at 03:49:35PM +0100, Ulrich Gemkow wrote:
> > > Hello,
> > >
> > > starting with stable kernel 6.6.18 we have problems with PXE booting.
> > > A bisect shows that the following patch is guilty:
> > >
> > > From 768171d7ebbce005210e1cf8456f043304805c15 Mon Sep 17 00:00:00 2001
> > > From: Ard Biesheuvel <ardb@kernel.org>
> > > Date: Tue, 12 Sep 2023 09:00:55 +0000
> > > Subject: x86/boot: Remove the 'bugger off' message
> > >
> > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > > Signed-off-by: Ingo Molnar <mingo@kernel.org>
> > > Acked-by: H. Peter Anvin (Intel) <hpa@zytor.com>
> > > Link: https://lore.kernel.org/r/20230912090051.4014114-21-ardb@google.com
> > >
> > > With this patch applied PXE starts, requests the kernel and the initrd.
> > > Without showing anything on the console, the boot process stops.
> > > It seems, that the kernel crashes very early.
> > >
> > > With stable kernel 6.6.17 PXE boot works without problems.
> > >
> > > Reverting this single patch (which is part of a larger set of
> > > patches) solved the problem for us, PXE boot is working again.
> > >
> > > We use the packages syslinux-efi and syslinux-common from Debian 12.
> > > The used boot files are /efi64/syslinux.efi and /ldlinux.e64.
> > >
> > > Our config-File (for 6.6.80) is attached.
> > >
> > > Regarding the patch description, we really do not boot with a floppy :-)
> > >
> > > Any help would be greatly appreciated, I have a bit of a bad feeling
> > > about simply reverting a patch at such a deep level in the kernel.
> >
> > Does newer kernels than 6.7.y work properly? What about the latest
> > 6.12.y release?
> >
> > thanks,
> >
> > greg k-h
> >
>
> Thanks for looking into this!
>
> The latest 6.12.y kernel has the same problem, it also needs reverting
> the mentioned patch. I did not test Kernels in between but I am happy
> to do so, when this gives a hint.
>
> Thanks again and best regards
Great, then this is an issue in Linus's tree and should be fixed there
first.
thanks,
greg k-h
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Regression for PXE boot from patch "Remove the 'bugger off' message" in stable 6.6.18
2025-03-04 14:49 Regression for PXE boot from patch "Remove the 'bugger off' message" in stable 6.6.18 Ulrich Gemkow
2025-03-04 16:20 ` Greg KH
@ 2025-03-06 10:00 ` Ard Biesheuvel
2025-03-06 10:07 ` Ulrich Gemkow
2025-03-06 14:36 ` Ard Biesheuvel
2 siblings, 1 reply; 13+ messages in thread
From: Ard Biesheuvel @ 2025-03-06 10:00 UTC (permalink / raw)
To: Ulrich Gemkow; +Cc: stable, regressions
On Tue, 4 Mar 2025 at 15:49, Ulrich Gemkow
<ulrich.gemkow@ikr.uni-stuttgart.de> wrote:
>
> Hello,
>
> starting with stable kernel 6.6.18 we have problems with PXE booting.
> A bisect shows that the following patch is guilty:
>
> From 768171d7ebbce005210e1cf8456f043304805c15 Mon Sep 17 00:00:00 2001
> From: Ard Biesheuvel <ardb@kernel.org>
> Date: Tue, 12 Sep 2023 09:00:55 +0000
> Subject: x86/boot: Remove the 'bugger off' message
>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> Signed-off-by: Ingo Molnar <mingo@kernel.org>
> Acked-by: H. Peter Anvin (Intel) <hpa@zytor.com>
> Link: https://lore.kernel.org/r/20230912090051.4014114-21-ardb@google.com
>
> With this patch applied PXE starts, requests the kernel and the initrd.
> Without showing anything on the console, the boot process stops.
> It seems, that the kernel crashes very early.
>
> With stable kernel 6.6.17 PXE boot works without problems.
>
> Reverting this single patch (which is part of a larger set of
> patches) solved the problem for us, PXE boot is working again.
>
> We use the packages syslinux-efi and syslinux-common from Debian 12.
> The used boot files are /efi64/syslinux.efi and /ldlinux.e64.
>
> Our config-File (for 6.6.80) is attached.
>
> Regarding the patch description, we really do not boot with a floppy :-)
>
> Any help would be greatly appreciated, I have a bit of a bad feeling
> about simply reverting a patch at such a deep level in the kernel.
>
Hello Ulrich,
Thanks for the report, and apologies for the breakage.
I will look into this today - hopefully it is something that can be
resolved swiftly.
Can you share your syslinux config too, please?
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Regression for PXE boot from patch "Remove the 'bugger off' message" in stable 6.6.18
2025-03-06 10:00 ` Ard Biesheuvel
@ 2025-03-06 10:07 ` Ulrich Gemkow
0 siblings, 0 replies; 13+ messages in thread
From: Ulrich Gemkow @ 2025-03-06 10:07 UTC (permalink / raw)
To: Ard Biesheuvel; +Cc: stable, regressions
[-- Attachment #1: Type: text/plain, Size: 2193 bytes --]
On Thursday 06 March 2025, Ard Biesheuvel wrote:
> On Tue, 4 Mar 2025 at 15:49, Ulrich Gemkow
> <ulrich.gemkow@ikr.uni-stuttgart.de> wrote:
> >
> > Hello,
> >
> > starting with stable kernel 6.6.18 we have problems with PXE booting.
> > A bisect shows that the following patch is guilty:
> >
> > From 768171d7ebbce005210e1cf8456f043304805c15 Mon Sep 17 00:00:00 2001
> > From: Ard Biesheuvel <ardb@kernel.org>
> > Date: Tue, 12 Sep 2023 09:00:55 +0000
> > Subject: x86/boot: Remove the 'bugger off' message
> >
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > Signed-off-by: Ingo Molnar <mingo@kernel.org>
> > Acked-by: H. Peter Anvin (Intel) <hpa@zytor.com>
> > Link: https://lore.kernel.org/r/20230912090051.4014114-21-ardb@google.com
> >
> > With this patch applied PXE starts, requests the kernel and the initrd.
> > Without showing anything on the console, the boot process stops.
> > It seems, that the kernel crashes very early.
> >
> > With stable kernel 6.6.17 PXE boot works without problems.
> >
> > Reverting this single patch (which is part of a larger set of
> > patches) solved the problem for us, PXE boot is working again.
> >
> > We use the packages syslinux-efi and syslinux-common from Debian 12.
> > The used boot files are /efi64/syslinux.efi and /ldlinux.e64.
> >
> > Our config-File (for 6.6.80) is attached.
> >
> > Regarding the patch description, we really do not boot with a floppy :-)
> >
> > Any help would be greatly appreciated, I have a bit of a bad feeling
> > about simply reverting a patch at such a deep level in the kernel.
> >
>
> Hello Ulrich,
>
> Thanks for the report, and apologies for the breakage.
>
> I will look into this today - hopefully it is something that can be
> resolved swiftly.
>
> Can you share your syslinux config too, please?
>
Hello Ard,
Thank you! The config file is attached. Please feel free to
ask for more info.
Best regards
Ulrich
--
|-----------------------------------------------------------------------
| Ulrich Gemkow
| University of Stuttgart
| Institute of Communication Networks and Computer Engineering (IKR)
|-----------------------------------------------------------------------
[-- Attachment #2: Local.cfg --]
[-- Type: text/plain, Size: 147 bytes --]
DEFAULT lnc
LABEL lnc
KERNEL image/bzImage-Local
INITRD lnc-ramdisc-simple.gz
APPEND rw root=/dev/ram0 ip=::::::on quiet ignore_rlimit_data
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Regression for PXE boot from patch "Remove the 'bugger off' message" in stable 6.6.18
2025-03-04 14:49 Regression for PXE boot from patch "Remove the 'bugger off' message" in stable 6.6.18 Ulrich Gemkow
2025-03-04 16:20 ` Greg KH
2025-03-06 10:00 ` Ard Biesheuvel
@ 2025-03-06 14:36 ` Ard Biesheuvel
2025-03-06 14:38 ` H. Peter Anvin
2 siblings, 1 reply; 13+ messages in thread
From: Ard Biesheuvel @ 2025-03-06 14:36 UTC (permalink / raw)
To: Ulrich Gemkow, H. Peter Anvin; +Cc: stable, regressions
(cc Peter)
On Tue, 4 Mar 2025 at 15:49, Ulrich Gemkow
<ulrich.gemkow@ikr.uni-stuttgart.de> wrote:
>
> Hello,
>
> starting with stable kernel 6.6.18 we have problems with PXE booting.
> A bisect shows that the following patch is guilty:
>
> From 768171d7ebbce005210e1cf8456f043304805c15 Mon Sep 17 00:00:00 2001
> From: Ard Biesheuvel <ardb@kernel.org>
> Date: Tue, 12 Sep 2023 09:00:55 +0000
> Subject: x86/boot: Remove the 'bugger off' message
>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> Signed-off-by: Ingo Molnar <mingo@kernel.org>
> Acked-by: H. Peter Anvin (Intel) <hpa@zytor.com>
> Link: https://lore.kernel.org/r/20230912090051.4014114-21-ardb@google.com
>
> With this patch applied PXE starts, requests the kernel and the initrd.
> Without showing anything on the console, the boot process stops.
> It seems, that the kernel crashes very early.
>
> With stable kernel 6.6.17 PXE boot works without problems.
>
> Reverting this single patch (which is part of a larger set of
> patches) solved the problem for us, PXE boot is working again.
>
> We use the packages syslinux-efi and syslinux-common from Debian 12.
> The used boot files are /efi64/syslinux.efi and /ldlinux.e64.
>
I managed to track this down to a bug in syslinux, fixed by the hunk
below. The problem is that syslinux violates the x86 boot protocol,
which stipulates that the setup header (starting at 0x1f1 bytes into
the bzImage) must be copied into a zeroed boot_params structure, but
it also copies the preceding bytes, which could be any value, as they
overlap with the PE/COFF header or other header data. This produces a
command line pointer with garbage in the top 32 bits, resulting in an
early crash.
In your case, you might be able to work around this by removing the
padding value (=0xffffffff) from arch/x86/boot/setup.ld, given that
you are building with CONFIG_EFI_STUB disabled. However, this still
requires fixing on the syslinux side.
[syslinux base commit 05ac953c23f90b2328d393f7eecde96e41aed067]
--- a/efi/main.c
+++ b/efi/main.c
@@ -1139,10 +1139,14 @@
bp = (struct boot_params *)(UINTN)addr;
memset((void *)bp, 0x0, BOOT_PARAM_BLKSIZE);
- /* Copy the first two sectors to boot_params */
- memcpy((char *)bp, kernel_buf, 2 * 512);
hdr = (struct linux_header *)bp;
+ /* Copy the setup header to boot_params */
+ memcpy(&hdr->setup_sects,
+ &((struct linux_header *)kernel_buf)->setup_sects,
+ sizeof(struct linux_header) -
+ offsetof(struct linux_header, setup_sects));
+
setup_sz = (hdr->setup_sects + 1) * 512;
if (hdr->version >= 0x20a) {
pref_address = hdr->pref_address;
--- a/com32/include/syslinux/linux.h
+++ b/com32/include/syslinux/linux.h
@@ -116,6 +116,7 @@ struct linux_header {
uint64_t pref_address;
uint32_t init_size;
uint32_t handover_offset;
+ uint32_t kernel_info_offset;
} __packed;
struct screen_info {
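To illustrate the failure mode described above, here is a minimal, self-contained sketch. It is not the syslinux code and does not mirror the patch exactly: it sizes the copy using the boot protocol rule that the setup header ends at 0x202 plus the byte at 0x201, rather than using sizeof(struct linux_header). The offsets 0x1f1 (setup header start) and 0x0c8 (ext_cmd_line_ptr, the upper 32 bits of the command line pointer) are taken from the x86 boot protocol and should be treated as assumptions here; all function names and the fake image are made up for the demonstration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Offsets assumed from the Linux x86 boot protocol (zero page layout). */
#define BOOT_PARAMS_SIZE 4096   /* size of the boot_params block */
#define SETUP_HEADER_OFF 0x1f1  /* first byte of the setup header */
#define HEADER_LEN_OFF   0x201  /* header ends at 0x202 + this byte */
#define EXT_CMDLINE_OFF  0x0c8  /* upper 32 bits of the cmdline pointer */

/* Buggy variant: copies two whole sectors, including bytes before 0x1f1. */
static void load_two_sectors(uint8_t *bp, const uint8_t *image)
{
    memset(bp, 0, BOOT_PARAMS_SIZE);
    memcpy(bp, image, 2 * 512);
}

/* Conforming variant: copies only the setup header into the zeroed block. */
static void load_setup_header(uint8_t *bp, const uint8_t *image)
{
    size_t end = 0x202 + (size_t)image[HEADER_LEN_OFF];

    assert(end > SETUP_HEADER_OFF && end <= 2 * 512);
    memset(bp, 0, BOOT_PARAMS_SIZE);
    memcpy(bp + SETUP_HEADER_OFF, image + SETUP_HEADER_OFF,
           end - SETUP_HEADER_OFF);
}

/* Reads the field the early boot code combines with cmd_line_ptr. */
static uint32_t ext_cmd_line_ptr(const uint8_t *bp)
{
    uint32_t v;

    memcpy(&v, bp + EXT_CMDLINE_OFF, sizeof(v));
    return v;
}

/* Hypothetical 1 KiB image whose pre-header bytes are 0xff padding. */
static void make_fake_image(uint8_t *image)
{
    memset(image, 0, 2 * 512);
    memset(image, 0xff, SETUP_HEADER_OFF);
    image[HEADER_LEN_OFF] = 0x6a; /* arbitrary header length for the demo */
}
```

With an image built by make_fake_image(), load_two_sectors() leaves ext_cmd_line_ptr() at 0xffffffff, while load_setup_header() leaves it at 0, which is what the kernel expects from a loader that follows the protocol.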
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Regression for PXE boot from patch "Remove the 'bugger off' message" in stable 6.6.18
2025-03-06 14:36 ` Ard Biesheuvel
@ 2025-03-06 14:38 ` H. Peter Anvin
2025-03-06 14:44 ` Ard Biesheuvel
0 siblings, 1 reply; 13+ messages in thread
From: H. Peter Anvin @ 2025-03-06 14:38 UTC (permalink / raw)
To: Ard Biesheuvel, Ulrich Gemkow; +Cc: stable, regressions
On March 6, 2025 6:36:04 AM PST, Ard Biesheuvel <ardb@kernel.org> wrote:
>(cc Peter)
>
>On Tue, 4 Mar 2025 at 15:49, Ulrich Gemkow
><ulrich.gemkow@ikr.uni-stuttgart.de> wrote:
>>
>> Hello,
>>
>> starting with stable kernel 6.6.18 we have problems with PXE booting.
>> A bisect shows that the following patch is guilty:
>>
>> From 768171d7ebbce005210e1cf8456f043304805c15 Mon Sep 17 00:00:00 2001
>> From: Ard Biesheuvel <ardb@kernel.org>
>> Date: Tue, 12 Sep 2023 09:00:55 +0000
>> Subject: x86/boot: Remove the 'bugger off' message
>>
>> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
>> Signed-off-by: Ingo Molnar <mingo@kernel.org>
>> Acked-by: H. Peter Anvin (Intel) <hpa@zytor.com>
>> Link: https://lore.kernel.org/r/20230912090051.4014114-21-ardb@google.com
>>
>> With this patch applied PXE starts, requests the kernel and the initrd.
>> Without showing anything on the console, the boot process stops.
>> It seems, that the kernel crashes very early.
>>
>> With stable kernel 6.6.17 PXE boot works without problems.
>>
>> Reverting this single patch (which is part of a larger set of
>> patches) solved the problem for us, PXE boot is working again.
>>
>> We use the packages syslinux-efi and syslinux-common from Debian 12.
>> The used boot files are /efi64/syslinux.efi and /ldlinux.e64.
>>
>
>I managed to track this down to a bug in syslinux, fixed by the hunk
>below. The problem is that syslinux violates the x86 boot protocol,
>which stipulates that the setup header (starting at 0x1f1 bytes into
>the bzImage) must be copied into a zeroed boot_params structure, but
>it also copies the preceding bytes, which could be any value, as they
>overlap with the PE/COFF header or other header data. This produces a
>command line pointer with garbage in the top 32 bits, resulting in an
>early crash.
>
>In your case, you might be able to work around this by removing the
>padding value (=0xffffffff) from arch/x86/boot/setup.ld, given that
>you are building with CONFIG_EFI_STUB disabled. However, this still
>requires fixing on the syslinux side.
>
>
>
>[syslinux base commit 05ac953c23f90b2328d393f7eecde96e41aed067]
>
>--- a/efi/main.c
>+++ b/efi/main.c
>@@ -1139,10 +1139,14 @@
> bp = (struct boot_params *)(UINTN)addr;
>
> memset((void *)bp, 0x0, BOOT_PARAM_BLKSIZE);
>- /* Copy the first two sectors to boot_params */
>- memcpy((char *)bp, kernel_buf, 2 * 512);
> hdr = (struct linux_header *)bp;
>
>+ /* Copy the setup header to boot_params */
>+ memcpy(&hdr->setup_sects,
>+ &((struct linux_header *)kernel_buf)->setup_sects,
>+ sizeof(struct linux_header) -
>+ offsetof(struct linux_header, setup_sects));
>+
> setup_sz = (hdr->setup_sects + 1) * 512;
> if (hdr->version >= 0x20a) {
> pref_address = hdr->pref_address;
>--- a/com32/include/syslinux/linux.h
>+++ b/com32/include/syslinux/linux.h
>@@ -116,6 +116,7 @@ struct linux_header {
> uint64_t pref_address;
> uint32_t init_size;
> uint32_t handover_offset;
>+ uint32_t kernel_info_offset;
> } __packed;
>
> struct screen_info {
Interesting. Embarrassing, first of all :) but also interesting, because this is exactly why we have the "sentinel" field at 0x1f0 to catch *this specific error* and work around it.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Regression for PXE boot from patch "Remove the 'bugger off' message" in stable 6.6.18
2025-03-06 14:38 ` H. Peter Anvin
@ 2025-03-06 14:44 ` Ard Biesheuvel
2025-03-06 15:23 ` H. Peter Anvin
0 siblings, 1 reply; 13+ messages in thread
From: Ard Biesheuvel @ 2025-03-06 14:44 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: Ulrich Gemkow, stable, regressions
On Thu, 6 Mar 2025 at 15:39, H. Peter Anvin <hpa@zytor.com> wrote:
>
> On March 6, 2025 6:36:04 AM PST, Ard Biesheuvel <ardb@kernel.org> wrote:
> >(cc Peter)
> >
> >On Tue, 4 Mar 2025 at 15:49, Ulrich Gemkow
> ><ulrich.gemkow@ikr.uni-stuttgart.de> wrote:
> >>
> >> Hello,
> >>
> >> starting with stable kernel 6.6.18 we have problems with PXE booting.
> >> A bisect shows that the following patch is guilty:
> >>
> >> From 768171d7ebbce005210e1cf8456f043304805c15 Mon Sep 17 00:00:00 2001
> >> From: Ard Biesheuvel <ardb@kernel.org>
> >> Date: Tue, 12 Sep 2023 09:00:55 +0000
> >> Subject: x86/boot: Remove the 'bugger off' message
> >>
> >> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> >> Signed-off-by: Ingo Molnar <mingo@kernel.org>
> >> Acked-by: H. Peter Anvin (Intel) <hpa@zytor.com>
> >> Link: https://lore.kernel.org/r/20230912090051.4014114-21-ardb@google.com
> >>
> >> With this patch applied PXE starts, requests the kernel and the initrd.
> >> Without showing anything on the console, the boot process stops.
> >> It seems, that the kernel crashes very early.
> >>
> >> With stable kernel 6.6.17 PXE boot works without problems.
> >>
> >> Reverting this single patch (which is part of a larger set of
> >> patches) solved the problem for us, PXE boot is working again.
> >>
> >> We use the packages syslinux-efi and syslinux-common from Debian 12.
> >> The used boot files are /efi64/syslinux.efi and /ldlinux.e64.
> >>
> >
> >I managed to track this down to a bug in syslinux, fixed by the hunk
> >below. The problem is that syslinux violates the x86 boot protocol,
> >which stipulates that the setup header (starting at 0x1f1 bytes into
> >the bzImage) must be copied into a zeroed boot_params structure, but
> >it also copies the preceding bytes, which could be any value, as they
> >overlap with the PE/COFF header or other header data. This produces a
> >command line pointer with garbage in the top 32 bits, resulting in an
> >early crash.
> >
> >In your case, you might be able to work around this by removing the
> >padding value (=0xffffffff) from arch/x86/boot/setup.ld, given that
> >you are building with CONFIG_EFI_STUB disabled. However, this still
> >requires fixing on the syslinux side.
> >
> >
> >
> >[syslinux base commit 05ac953c23f90b2328d393f7eecde96e41aed067]
> >
> >--- a/efi/main.c
> >+++ b/efi/main.c
> >@@ -1139,10 +1139,14 @@
> > bp = (struct boot_params *)(UINTN)addr;
> >
> > memset((void *)bp, 0x0, BOOT_PARAM_BLKSIZE);
> >- /* Copy the first two sectors to boot_params */
> >- memcpy((char *)bp, kernel_buf, 2 * 512);
> > hdr = (struct linux_header *)bp;
> >
> >+ /* Copy the setup header to boot_params */
> >+ memcpy(&hdr->setup_sects,
> >+ &((struct linux_header *)kernel_buf)->setup_sects,
> >+ sizeof(struct linux_header) -
> >+ offsetof(struct linux_header, setup_sects));
> >+
> > setup_sz = (hdr->setup_sects + 1) * 512;
> > if (hdr->version >= 0x20a) {
> > pref_address = hdr->pref_address;
> >--- a/com32/include/syslinux/linux.h
> >+++ b/com32/include/syslinux/linux.h
> >@@ -116,6 +116,7 @@ struct linux_header {
> > uint64_t pref_address;
> > uint32_t init_size;
> > uint32_t handover_offset;
> >+ uint32_t kernel_info_offset;
> > } __packed;
> >
> > struct screen_info {
>
> Interesting. Embarrassing, first of all :) but also interesting, because this is exactly why we have the "sentinel" field at 0x1f0 to catch *this specific error* and work around it.
We're crashing way earlier than the sentinel check - the bogus command
line pointer is dereferenced via
startup_64()
configure_5level_paging()
cmdline_find_option_bool()
whereas sanitize_bootparams() is only called much later, from extract_kernel().
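The ordering problem can be sketched in a few lines of C. This is a heavily reduced model, not kernel code: the struct, its layout and the fake_* helpers are hypothetical, and the real sanitizing routine clears considerably more than one field. It only shows why a consumer that runs before the sentinel check sees the garbage upper half of the command line pointer.

```c
#include <stdint.h>

/* Hypothetical, reduced stand-in for the kernel's struct boot_params. */
struct fake_boot_params {
    uint32_t ext_cmd_line_ptr; /* upper 32 bits, outside the setup header */
    uint8_t  sentinel;         /* 0 if the loader zeroed the structure */
    uint32_t cmd_line_ptr;     /* lower 32 bits, inside the setup header */
};

/* If the sentinel is set, distrust the data outside the setup header. */
static void fake_sanitize(struct fake_boot_params *bp)
{
    if (bp->sentinel) {
        bp->ext_cmd_line_ptr = 0;
        bp->sentinel = 0;
    }
}

/* What an early consumer of the command line would compute. */
static uint64_t fake_cmdline_addr(const struct fake_boot_params *bp)
{
    return ((uint64_t)bp->ext_cmd_line_ptr << 32) | bp->cmd_line_ptr;
}
```

If fake_cmdline_addr() is called on a block with a non-zero sentinel before fake_sanitize() has run, the result carries the bogus upper bits; after fake_sanitize() it is just the 32-bit cmd_line_ptr.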
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Regression for PXE boot from patch "Remove the 'bugger off' message" in stable 6.6.18
2025-03-06 14:44 ` Ard Biesheuvel
@ 2025-03-06 15:23 ` H. Peter Anvin
2025-03-06 16:03 ` Ard Biesheuvel
0 siblings, 1 reply; 13+ messages in thread
From: H. Peter Anvin @ 2025-03-06 15:23 UTC (permalink / raw)
To: Ard Biesheuvel; +Cc: Ulrich Gemkow, stable, regressions
On March 6, 2025 6:44:11 AM PST, Ard Biesheuvel <ardb@kernel.org> wrote:
>On Thu, 6 Mar 2025 at 15:39, H. Peter Anvin <hpa@zytor.com> wrote:
>>
>> On March 6, 2025 6:36:04 AM PST, Ard Biesheuvel <ardb@kernel.org> wrote:
>> >(cc Peter)
>> >
>> >On Tue, 4 Mar 2025 at 15:49, Ulrich Gemkow
>> ><ulrich.gemkow@ikr.uni-stuttgart.de> wrote:
>> >>
>> >> Hello,
>> >>
>> >> starting with stable kernel 6.6.18 we have problems with PXE booting.
>> >> A bisect shows that the following patch is guilty:
>> >>
>> >> From 768171d7ebbce005210e1cf8456f043304805c15 Mon Sep 17 00:00:00 2001
>> >> From: Ard Biesheuvel <ardb@kernel.org>
>> >> Date: Tue, 12 Sep 2023 09:00:55 +0000
>> >> Subject: x86/boot: Remove the 'bugger off' message
>> >>
>> >> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
>> >> Signed-off-by: Ingo Molnar <mingo@kernel.org>
>> >> Acked-by: H. Peter Anvin (Intel) <hpa@zytor.com>
>> >> Link: https://lore.kernel.org/r/20230912090051.4014114-21-ardb@google.com
>> >>
>> >> With this patch applied PXE starts, requests the kernel and the initrd.
>> >> Without showing anything on the console, the boot process stops.
>> >> It seems, that the kernel crashes very early.
>> >>
>> >> With stable kernel 6.6.17 PXE boot works without problems.
>> >>
>> >> Reverting this single patch (which is part of a larger set of
>> >> patches) solved the problem for us, PXE boot is working again.
>> >>
>> >> We use the packages syslinux-efi and syslinux-common from Debian 12.
>> >> The used boot files are /efi64/syslinux.efi and /ldlinux.e64.
>> >>
>> >
>> >I managed to track this down to a bug in syslinux, fixed by the hunk
>> >below. The problem is that syslinux violates the x86 boot protocol,
>> >which stipulates that the setup header (starting at 0x1f1 bytes into
>> >the bzImage) must be copied into a zeroed boot_params structure, but
>> >it also copies the preceding bytes, which could be any value, as they
>> >overlap with the PE/COFF header or other header data. This produces a
>> >command line pointer with garbage in the top 32 bits, resulting in an
>> >early crash.
>> >
>> >In your case, you might be able to work around this by removing the
>> >padding value (=0xffffffff) from arch/x86/boot/setup.ld, given that
>> >you are building with CONFIG_EFI_STUB disabled. However, this still
>> >requires fixing on the syslinux side.
>> >
>> >
>> >
>> >[syslinux base commit 05ac953c23f90b2328d393f7eecde96e41aed067]
>> >
>> >--- a/efi/main.c
>> >+++ b/efi/main.c
>> >@@ -1139,10 +1139,14 @@
>> > bp = (struct boot_params *)(UINTN)addr;
>> >
>> > memset((void *)bp, 0x0, BOOT_PARAM_BLKSIZE);
>> >- /* Copy the first two sectors to boot_params */
>> >- memcpy((char *)bp, kernel_buf, 2 * 512);
>> > hdr = (struct linux_header *)bp;
>> >
>> >+ /* Copy the setup header to boot_params */
>> >+ memcpy(&hdr->setup_sects,
>> >+ &((struct linux_header *)kernel_buf)->setup_sects,
>> >+ sizeof(struct linux_header) -
>> >+ offsetof(struct linux_header, setup_sects));
>> >+
>> > setup_sz = (hdr->setup_sects + 1) * 512;
>> > if (hdr->version >= 0x20a) {
>> > pref_address = hdr->pref_address;
>> >--- a/com32/include/syslinux/linux.h
>> >+++ b/com32/include/syslinux/linux.h
>> >@@ -116,6 +116,7 @@ struct linux_header {
>> > uint64_t pref_address;
>> > uint32_t init_size;
>> > uint32_t handover_offset;
>> >+ uint32_t kernel_info_offset;
>> > } __packed;
>> >
>> > struct screen_info {
>>
>> Interesting. Embarrassing, first of all :) but also interesting, because this is exactly why we have the "sentinel" field at 0x1f0 to catch *this specific error* and work around it.
>
>We're crashing way earlier than the sentinel check - the bogus command
>line pointer is dereferenced via
>
>startup_64()
> configure_5level_paging()
> cmdline_find_option_bool()
>
>whereas sanitize_bootparams() is only called much later, from extract_kernel().
That is a bug in the kernel then. The whole point of the sentinel check is that it needs to be done before any of the fields touched by the sentinel check are accessed.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Regression for PXE boot from patch "Remove the 'bugger off' message" in stable 6.6.18
2025-03-06 15:23 ` H. Peter Anvin
@ 2025-03-06 16:03 ` Ard Biesheuvel
2025-03-06 16:50 ` Ulrich Gemkow
0 siblings, 1 reply; 13+ messages in thread
From: Ard Biesheuvel @ 2025-03-06 16:03 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: Ulrich Gemkow, stable, regressions
On Thu, 6 Mar 2025 at 16:23, H. Peter Anvin <hpa@zytor.com> wrote:
>
> On March 6, 2025 6:44:11 AM PST, Ard Biesheuvel <ardb@kernel.org> wrote:
> >On Thu, 6 Mar 2025 at 15:39, H. Peter Anvin <hpa@zytor.com> wrote:
> >>
> >> On March 6, 2025 6:36:04 AM PST, Ard Biesheuvel <ardb@kernel.org> wrote:
> >> >(cc Peter)
> >> >
> >> >
> >> >I managed to track this down to a bug in syslinux, fixed by the hunk
> >> >below. The problem is that syslinux violates the x86 boot protocol,
> >> >which stipulates that the setup header (starting at 0x1f1 bytes into
> >> >the bzImage) must be copied into a zeroed boot_params structure, but
> >> >it also copies the preceding bytes, which could be any value, as they
> >> >overlap with the PE/COFF header or other header data. This produces a
> >> >command line pointer with garbage in the top 32 bits, resulting in an
> >> >early crash.
> >> >
...
> >>
> >> Interesting. Embarrassing, first of all :) but also interesting, because this is exactly why we have the "sentinel" field at 0x1f0 to catch *this specific error* and work around it.
> >
> >We're crashing way earlier than the sentinel check - the bogus command
> >line pointer is dereferenced via
> >
> >startup_64()
> > configure_5level_paging()
> > cmdline_find_option_bool()
> >
> >whereas sanitize_bootparams() is only called much later, from extract_kernel().
>
> That is a bug in the kernel then. The whole point of the sentinel check is that it needs to be done before any of the fields touched by the sentinel check are accessed.
Indeed - I have just sent out a fix for this.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Regression for PXE boot from patch "Remove the 'bugger off' message" in stable 6.6.18
2025-03-06 16:03 ` Ard Biesheuvel
@ 2025-03-06 16:50 ` Ulrich Gemkow
2025-03-06 17:07 ` Ard Biesheuvel
0 siblings, 1 reply; 13+ messages in thread
From: Ulrich Gemkow @ 2025-03-06 16:50 UTC (permalink / raw)
To: Ard Biesheuvel; +Cc: H. Peter Anvin, stable, regressions
On Thursday 06 March 2025, Ard Biesheuvel wrote:
> On Thu, 6 Mar 2025 at 16:23, H. Peter Anvin <hpa@zytor.com> wrote:
> >
> > On March 6, 2025 6:44:11 AM PST, Ard Biesheuvel <ardb@kernel.org> wrote:
> > >On Thu, 6 Mar 2025 at 15:39, H. Peter Anvin <hpa@zytor.com> wrote:
> > >>
> > >> On March 6, 2025 6:36:04 AM PST, Ard Biesheuvel <ardb@kernel.org> wrote:
> > >> >(cc Peter)
> > >> >
> > >> >
> > >> >I managed to track this down to a bug in syslinux, fixed by the hunk
> > >> >below. The problem is that syslinux violates the x86 boot protocol,
> > >> >which stipulates that the setup header (starting at 0x1f1 bytes into
> > >> >the bzImage) must be copied into a zeroed boot_params structure, but
> > >> >it also copies the preceding bytes, which could be any value, as they
> > >> >overlap with the PE/COFF header or other header data. This produces a
> > >> >command line pointer with garbage in the top 32 bits, resulting in an
> > >> >early crash.
> > >> >
> ...
> > >>
> > >> Interesting. Embarrassing, first of all :) but also interesting, because this is exactly why we have the "sentinel" field at 0x1f0 to catch *this specific error* and work around it.
> > >
> > >We're crashing way earlier than the sentinel check - the bogus command
> > >line pointer is dereferenced via
> > >
> > >startup_64()
> > > configure_5level_paging()
> > > cmdline_find_option_bool()
> > >
> > >whereas sanitize_bootparams() is only called much later, from extract_kernel().
> >
> > That is a bug in the kernel then. The whole point of the sentinel check is that it needs to be done before any of the fields touched by the sentinel check are accessed.
>
> Indeed - I have just sent out a fix for this.
>
Hello Ard,
thanks for the patch! It does not apply cleanly to 6.6.80 (the includes
are different) so I applied it manually and it helps - the system boots.
Please allow the remark regarding the patch description that in
our kernel CONFIG_X86_5LEVEL is not set. The patch helps anyway :-)
Thanks again and best regards
Ulrich
--
|-----------------------------------------------------------------------
| Ulrich Gemkow
| University of Stuttgart
| Institute of Communication Networks and Computer Engineering (IKR)
|-----------------------------------------------------------------------
* Re: Regression for PXE boot from patch "Remove the 'bugger off' message" in stable 6.6.18
2025-03-06 16:50 ` Ulrich Gemkow
@ 2025-03-06 17:07 ` Ard Biesheuvel
0 siblings, 0 replies; 13+ messages in thread
From: Ard Biesheuvel @ 2025-03-06 17:07 UTC (permalink / raw)
To: Ulrich Gemkow; +Cc: H. Peter Anvin, stable, regressions
On Thu, 6 Mar 2025 at 17:50, Ulrich Gemkow
<ulrich.gemkow@ikr.uni-stuttgart.de> wrote:
>
> On Thursday 06 March 2025, Ard Biesheuvel wrote:
> > On Thu, 6 Mar 2025 at 16:23, H. Peter Anvin <hpa@zytor.com> wrote:
> > >
> > > On March 6, 2025 6:44:11 AM PST, Ard Biesheuvel <ardb@kernel.org> wrote:
> > > >On Thu, 6 Mar 2025 at 15:39, H. Peter Anvin <hpa@zytor.com> wrote:
> > > >>
> > > >> On March 6, 2025 6:36:04 AM PST, Ard Biesheuvel <ardb@kernel.org> wrote:
> > > >> >(cc Peter)
> > > >> >
> > > >> >
> > > >> >I managed to track this down to a bug in syslinux, fixed by the hunk
> > > >> >below. The problem is that syslinux violates the x86 boot protocol,
> > > >> >which stipulates that the setup header (starting at 0x1f1 bytes into
> > > >> >the bzImage) must be copied into a zeroed boot_params structure, but
> > > >> >it also copies the preceding bytes, which could be any value, as they
> > > >> >overlap with the PE/COFF header or other header data. This produces a
> > > >> >command line pointer with garbage in the top 32 bits, resulting in an
> > > >> >early crash.
> > > >> >
> > ...
> > > >>
> > > >> Interesting. Embarrassing, first of all :) but also interesting, because this is exactly why we have the "sentinel" field at 0x1f0 to catch *this specific error* and work around it.
> > > >
> > > >We're crashing way earlier than the sentinel check - the bogus command
> > > >line pointer is dereferenced via
> > > >
> > > >startup_64()
> > > > configure_5level_paging()
> > > > cmdline_find_option_bool()
> > > >
> > > >whereas sanitize_bootparams() is only called much later, from extract_kernel().
> > >
> > > That is a bug in the kernel then. The whole point of the sentinel check is that it needs to be done before any of the fields touched by the sentinel check are accessed.
> >
> > Indeed - I have just sent out a fix for this.
> >
>
> Hello Ard,
>
> thanks for the patch! It does not apply cleanly to 6.6.80 (the includes
> are different) so I applied it manually and it helps - the system boots.
>
> Please allow the remark regarding the patch description that in
> our kernel CONFIG_X86_5LEVEL is not set. The patch helps anyway :-)
>
> Thanks again and best regards
>
Thanks for testing. I will take this as a Tested-by.
Thread overview: 13+ messages
2025-03-04 14:49 Regression for PXE boot from patch "Remove the 'bugger off' message" in stable 6.6.18 Ulrich Gemkow
2025-03-04 16:20 ` Greg KH
2025-03-04 16:59 ` Ulrich Gemkow
2025-03-04 17:40 ` Greg KH
2025-03-06 10:00 ` Ard Biesheuvel
2025-03-06 10:07 ` Ulrich Gemkow
2025-03-06 14:36 ` Ard Biesheuvel
2025-03-06 14:38 ` H. Peter Anvin
2025-03-06 14:44 ` Ard Biesheuvel
2025-03-06 15:23 ` H. Peter Anvin
2025-03-06 16:03 ` Ard Biesheuvel
2025-03-06 16:50 ` Ulrich Gemkow
2025-03-06 17:07 ` Ard Biesheuvel