* 6.18 CVM guest kernel boot issues with non-UEFI bootloader
@ 2026-01-29 3:57 Kevin Hui
2026-01-30 18:49 ` Tom Lendacky
0 siblings, 1 reply; 5+ messages in thread
From: Kevin Hui @ 2026-01-29 3:57 UTC (permalink / raw)
To: linux-coco
Cc: thomas.lendacky, youngjaelee, hvolkmer, chrisboltz, tzn,
andrisaar, grobler
Hello,
We’ve been using a non-UEFI bootloader to launch SEV-SNP CVMs and noticed that
there’s an issue with booting the newest 6.18 guest kernel with
the stage0 bootloader (https://github.com/project-oak/oak/tree/main/stage0_bin).
The guest kernel boots successfully with 6.17 and below, but fails with 6.18. We
were able to reproduce this with host kernels 6.13.2 and 6.18.3 on Milan/Genoa
hosts.
We traced the commit that started causing boot issues to
68a501d7fd82454525797971c6a0005ceeb93153 and noticed that for some reason the
variable snp_vmpl was non-zero, even though our stack doesn’t run SVSM. This
triggers the sev_es_terminate() call and subsequently crashes the CVM. We
noticed that the commit removes a supposedly redundant rmpadjust() check, but
from our observations it seems that the failed rmpadjust() short-circuited the
check and avoided the underlying issue.
I was chatting with Tom about this, and taking a deeper look at the issue, we
suspect that the BSS is cleared after the sev_enable() call in
arch/x86/boot/compressed/head_64.S, and that because of this snp_vmpl contains
random junk and is not zeroed. When coming through UEFI, it seems that the BSS
is cleared via drivers/firmware/efi/libstub/x86-stub.c, but in a non-UEFI
bootloader there is no call to startup_64 and so this path is never invoked,
leaving whatever random data was in bss to remain.
Perhaps the proper fix for this is to put the variables that are set as part of
sev_enable() into .data so that both non-UEFI and UEFI bootloaders will have the
same treatment from the kernel, but I would love to hear everyone’s thoughts on
this.
Thanks,
Kevin
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: 6.18 CVM guest kernel boot issues with non-UEFI bootloader
2026-01-29 3:57 6.18 CVM guest kernel boot issues with non-UEFI bootloader Kevin Hui
@ 2026-01-30 18:49 ` Tom Lendacky
2026-01-30 20:49 ` Ard Biesheuvel
0 siblings, 1 reply; 5+ messages in thread
From: Tom Lendacky @ 2026-01-30 18:49 UTC (permalink / raw)
To: Kevin Hui, linux-coco, Ard Biesheuvel
Cc: youngjaelee, hvolkmer, chrisboltz, tzn, andrisaar, grobler
Adding Ard
On 1/28/26 21:57, Kevin Hui wrote:
> Hello,
>
> We’ve been using a non-UEFI bootloader to launch SEV-SNP CVMs and noticed that
> there’s an issue with booting the newest 6.18 guest kernel with
> the stage0 bootloader (https://github.com/project-oak/oak/tree/main/stage0_bin).
> The guest kernel boots successfully with 6.17 and below, but fails with 6.18. We
> were able to reproduce this with host kernels 6.13.2 and 6.18.3 on Milan/Genoa
> hosts.
>
> We traced the commit that started causing boot issues to
> 68a501d7fd82454525797971c6a0005ceeb93153 and noticed that for some reason the
> variable snp_vmpl was non-zero, even though our stack doesn’t run SVSM. This
> triggers the sev_es_terminate() call and subsequently crashes the CVM. We
> noticed that the commit removes a supposedly redundant rmpadjust() check, but
> from our observations it seems that the failed rmpadjust() short-circuited the
> check and avoided the underlying issue.
>
> I was chatting with Tom about this, and taking a deeper look at the issue, we
> suspect that the BSS is cleared after the sev_enable() call in
> arch/x86/boot/compressed/head_64.S, and that because of this snp_vmpl contains
> random junk and is not zeroed. When coming through UEFI, it seems that the BSS
> is cleared via drivers/firmware/efi/libstub/x86-stub.c, but in a non-UEFI
> bootloader there is no call to startup_64 and so this path is never invoked,
> leaving whatever random data was in bss to remain.
>
> Perhaps the proper fix for this is to put the variables that are set as part of
> sev_enable() into .data so that both non-UEFI and UEFI bootloaders will have the
> same treatment from the kernel, but I would love to hear everyone’s thoughts on
> this.
>
> Thanks,
> Kevin
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: 6.18 CVM guest kernel boot issues with non-UEFI bootloader
2026-01-30 18:49 ` Tom Lendacky
@ 2026-01-30 20:49 ` Ard Biesheuvel
2026-01-30 22:05 ` Tom Lendacky
0 siblings, 1 reply; 5+ messages in thread
From: Ard Biesheuvel @ 2026-01-30 20:49 UTC (permalink / raw)
To: Tom Lendacky, Kevin Hui, linux-coco
Cc: youngjaelee, hvolkmer, chrisboltz, tzn, andrisaar, grobler
Hi Tom Kevin,
On Fri, 30 Jan 2026, at 19:49, Tom Lendacky wrote:
> Adding Ard
>
> On 1/28/26 21:57, Kevin Hui wrote:
>> Hello,
>>
>> We’ve been using a non-UEFI bootloader to launch SEV-SNP CVMs and noticed that
>> there’s an issue with booting the newest 6.18 guest kernel with
>> the stage0 bootloader (https://github.com/project-oak/oak/tree/main/stage0_bin).
>> The guest kernel boots successfully with 6.17 and below, but fails with 6.18. We
>> were able to reproduce this with host kernels 6.13.2 and 6.18.3 on Milan/Genoa
>> hosts.
>>
>> We traced the commit that started causing boot issues to
>> 68a501d7fd82454525797971c6a0005ceeb93153 and noticed that for some reason the
>> variable snp_vmpl was non-zero, even though our stack doesn’t run SVSM. This
>> triggers the sev_es_terminate() call and subsequently crashes the CVM. We
>> noticed that the commit removes a supposedly redundant rmpadjust() check, but
>> from our observations it seems that the failed rmpadjust() short-circuited the
>> check and avoided the underlying issue.
>>
>> I was chatting with Tom about this, and taking a deeper look at the issue, we
>> suspect that the BSS is cleared after the sev_enable() call in
>> arch/x86/boot/compressed/head_64.S, and that because of this snp_vmpl contains
>> random junk and is not zeroed. When coming through UEFI, it seems that the BSS
>> is cleared via drivers/firmware/efi/libstub/x86-stub.c, but in a non-UEFI
>> bootloader there is no call to startup_64 and so this path is never invoked,
>> leaving whatever random data was in bss to remain.
>>
If there is no call to startup_64(), where does the call to sev_enable() originate from?
>> Perhaps the proper fix for this is to put the variables that are set as part of
>> sev_enable() into .data so that both non-UEFI and UEFI bootloaders will have the
>> same treatment from the kernel, but I would love to hear everyone’s thoughts on
>> this.
>>
AIUI, the root cause of the issue is that C code is being called (sev_enable()) before BSS has been cleared, right?
If so, I don't think moving variables around is the right solution here: instead, BSS should be cleared before calling C code. And by the looks of it, this is not even SEV-specific, given that load_stage1_idt() is also a C function.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: 6.18 CVM guest kernel boot issues with non-UEFI bootloader
2026-01-30 20:49 ` Ard Biesheuvel
@ 2026-01-30 22:05 ` Tom Lendacky
2026-01-30 22:12 ` Ard Biesheuvel
0 siblings, 1 reply; 5+ messages in thread
From: Tom Lendacky @ 2026-01-30 22:05 UTC (permalink / raw)
To: Ard Biesheuvel, Kevin Hui, linux-coco
Cc: youngjaelee, hvolkmer, chrisboltz, tzn, andrisaar, grobler
On 1/30/26 14:49, Ard Biesheuvel wrote:
> Hi Tom Kevin,
>
> On Fri, 30 Jan 2026, at 19:49, Tom Lendacky wrote:
>> Adding Ard
>>
>> On 1/28/26 21:57, Kevin Hui wrote:
>>> Hello,
>>>
>>> We’ve been using a non-UEFI bootloader to launch SEV-SNP CVMs and noticed that
>>> there’s an issue with booting the newest 6.18 guest kernel with
>>> the stage0 bootloader (https://github.com/project-oak/oak/tree/main/stage0_bin).
>>> The guest kernel boots successfully with 6.17 and below, but fails with 6.18. We
>>> were able to reproduce this with host kernels 6.13.2 and 6.18.3 on Milan/Genoa
>>> hosts.
>>>
>>> We traced the commit that started causing boot issues to
>>> 68a501d7fd82454525797971c6a0005ceeb93153 and noticed that for some reason the
>>> variable snp_vmpl was non-zero, even though our stack doesn’t run SVSM. This
>>> triggers the sev_es_terminate() call and subsequently crashes the CVM. We
>>> noticed that the commit removes a supposedly redundant rmpadjust() check, but
>>> from our observations it seems that the failed rmpadjust() short-circuited the
>>> check and avoided the underlying issue.
>>>
>>> I was chatting with Tom about this, and taking a deeper look at the issue, we
>>> suspect that the BSS is cleared after the sev_enable() call in
>>> arch/x86/boot/compressed/head_64.S, and that because of this snp_vmpl contains
>>> random junk and is not zeroed. When coming through UEFI, it seems that the BSS
>>> is cleared via drivers/firmware/efi/libstub/x86-stub.c, but in a non-UEFI
>>> bootloader there is no call to startup_64 and so this path is never invoked,
>>> leaving whatever random data was in bss to remain.
>>>
>
> If there is no call to startup_64(), where does the call to sev_enable() originate from?
There is a call to startup_64().
>
>>> Perhaps the proper fix for this is to put the variables that are set as part of
>>> sev_enable() into .data so that both non-UEFI and UEFI bootloaders will have the
>>> same treatment from the kernel, but I would love to hear everyone’s thoughts on
>>> this.
>>>
>
> AIUI, the root cause of the issue is that C code is being called (sev_enable()) before BSS has been cleared, right?
Right, but the issue also is that the BSS gets cleared after relocation,
too. So I don't think using BSS is correct, because it would have to be
cleared before the call to sev_enable(). But the call to clear the BSS
after relocation would have to stay because to, otherwise the guest crashes.
>
> If so, I don't think moving variables around is the right solution here: instead, BSS should be cleared before calling C code. And by the looks of it, this is not even SEV-specific, given that load_stage1_idt() is also a C function.
Because of how things are called and because of the relocation, I think
putting the variables in .data is probably the right thing to do. I've
noticed a few other files under arch/x86/boot/compressed that have
variables defined with __section(".data") probably for the same reason.
The load_stage1_idt() function only works with variables on a stack or in
.data (boot_idt), which is probably why that works ok.
Thanks,
Tom
>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: 6.18 CVM guest kernel boot issues with non-UEFI bootloader
2026-01-30 22:05 ` Tom Lendacky
@ 2026-01-30 22:12 ` Ard Biesheuvel
0 siblings, 0 replies; 5+ messages in thread
From: Ard Biesheuvel @ 2026-01-30 22:12 UTC (permalink / raw)
To: Tom Lendacky, Kevin Hui, linux-coco
Cc: youngjaelee, hvolkmer, chrisboltz, tzn, andrisaar, grobler
On Fri, 30 Jan 2026, at 23:05, Tom Lendacky wrote:
> On 1/30/26 14:49, Ard Biesheuvel wrote:
>> Hi Tom Kevin,
>>
>> On Fri, 30 Jan 2026, at 19:49, Tom Lendacky wrote:
>>> Adding Ard
>>>
>>> On 1/28/26 21:57, Kevin Hui wrote:
...
>>>> Perhaps the proper fix for this is to put the variables that are set as part of
>>>> sev_enable() into .data so that both non-UEFI and UEFI bootloaders will have the
>>>> same treatment from the kernel, but I would love to hear everyone’s thoughts on
>>>> this.
>>>>
>>
>> AIUI, the root cause of the issue is that C code is being called (sev_enable()) before BSS has been cleared, right?
>
> Right, but the issue also is that the BSS gets cleared after relocation,
> too. So I don't think using BSS is correct, because it would have to be
> cleared before the call to sev_enable(). But the call to clear the BSS
> after relocation would have to stay because to, otherwise the guest crashes.
>
Yeah, and relocating the image should migrate the contents of BSS too. So I agree that this is tricky. I hate the decompressor ...
>>
>> If so, I don't think moving variables around is the right solution here: instead, BSS should be cleared before calling C code. And by the looks of it, this is not even SEV-specific, given that load_stage1_idt() is also a C function.
>
> Because of how things are called and because of the relocation, I think
> putting the variables in .data is probably the right thing to do. I've
> noticed a few other files under arch/x86/boot/compressed that have
> variables defined with __section(".data") probably for the same reason.
>
> The load_stage1_idt() function only works with variables on a stack or in
> .data (boot_idt), which is probably why that works ok.
>
Indeed. In fact, creating any global state before relocation is rather tricky because of how the image moves itself around. But I guess it is unavoidable here.
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2026-01-30 22:13 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-01-29 3:57 6.18 CVM guest kernel boot issues with non-UEFI bootloader Kevin Hui
2026-01-30 18:49 ` Tom Lendacky
2026-01-30 20:49 ` Ard Biesheuvel
2026-01-30 22:05 ` Tom Lendacky
2026-01-30 22:12 ` Ard Biesheuvel
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox